DevOps Debugging Part 6: df & du

4 min read · Nov 25, 2022


This is a multi-part series exploring essential Unix commands for debugging applications. These skills are critical when an outage occurs or something doesn’t work as expected. The series is aimed at DevOps engineers, SREs and Linux sysadmins. Below is a quick navigation if you want to jump to the other parts.

In this part we cover df and du. Running out of disk space is one of the most common causes of unavailability in infrastructure, and as a sysadmin you are expected to find the root cause. There are some usual suspects to check, but when the cause is not obvious some investigation is required. Keep in mind that we will not cover everything these commands can do, only how to use them to debug servers and applications.

Installation

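df and du are part of coreutils, which is preinstalled on practically every Linux distribution (the old fileutils package was merged into coreutils long ago, so there is nothing extra to install). If the commands are missing on redhat/centos/ubuntu/osx, run:

# redhat/centos/amazon linux
$ yum install coreutils
# ubuntu
$ apt-get install coreutils
# OSX/Mac (usually already installed)
# test for installation
$ df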

If you get a command not found error, please reach out in the comments section below.

Usage

See the disk space for each filesystem.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        20G  3.2G   17G  17% /
devtmpfs        477M     0  477M   0% /dev
tmpfs           484M     0  484M   0% /dev/shm
tmpfs            97M  840K   96M   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           484M     0  484M   0% /sys/fs/cgroup
/dev/loop1       56M   56M     0 100% /snap/core18/2566
/dev/loop0       26M   26M     0 100% /snap/amazon-ssm-agent/5656
/dev/loop2       64M   64M     0 100% /snap/core20/1623
/dev/loop3       68M   68M     0 100% /snap/lxd/22753
/dev/loop4       48M   48M     0 100% /snap/snapd/16778
/dev/xvda15     105M  5.2M  100M   5% /boot/efi
/dev/loop5       48M   48M     0 100% /snap/snapd/17029
/dev/loop6       25M   25M     0 100% /snap/amazon-ssm-agent/6312
tmpfs            97M     0   97M   0% /run/user/1000

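The Filesystem column lists the device behind each mount, whether it is a physical disk or a virtual one such as tmpfs or a loop device. The Size, Used, Avail and Use% columns give a good indication of whether the server is running out of space. The /dev/loop entries at 100% are read-only snap images and are always full, so they are not a cause for concern; the line to watch here is /dev/root.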
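Reading the Use% column by eye works on one server, but the same check can be scripted. The following is a minimal sketch, not a monitoring solution: flag_full is a hypothetical helper name, the 90% default is an arbitrary assumption, and it assumes mount points without spaces.

```shell
# Print mount points whose usage is at or above a threshold (default 90%).
# Reads `df -P` output on stdin; -P forces one line per filesystem.
# Loop devices are skipped because read-only snap images always report 100%.
# Assumes mount points contain no spaces.
flag_full() {
    awk -v limit="${1:-90}" '
        NR == 1 { next }                 # skip the header row
        $1 ~ /^\/dev\/loop/ { next }     # ignore snap loop mounts
        {
            use = $5
            sub(/%/, "", use)            # "95%" becomes "95"
            if (use + 0 >= limit + 0) print $6, $5
        }'
}
```

To use it, pipe df into the function, for example: df -P | flag_full 80. It prints nothing when every filesystem is below the threshold.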

# list inodes
$ df -i
Filesystem      Inodes IUsed   IFree IUse% Mounted on
/dev/root      2580480 99990 2480490    4% /
devtmpfs        121968   330  121638    1% /dev
tmpfs           123679     2  123677    1% /dev/shm
tmpfs           123679   540  123139    1% /run
tmpfs           123679     3  123676    1% /run/lock
tmpfs           123679    19  123660    1% /sys/fs/cgroup
/dev/loop1       10858 10858       0  100% /snap/core18/2566
/dev/loop0          16    16       0  100% /snap/amazon-ssm-agent/5656
/dev/loop2       11882 11882       0  100% /snap/core20/1623
/dev/loop3         802   802       0  100% /snap/lxd/22753
/dev/loop4         486   486       0  100% /snap/snapd/16778
/dev/xvda15          0     0       0     - /boot/efi
/dev/loop5         486   486       0  100% /snap/snapd/17029
/dev/loop6          16    16       0  100% /snap/amazon-ssm-agent/6312
tmpfs           123679    22  123657    1% /run/user/1000

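df -i lists inode usage instead of disk space. An inode holds the metadata of a single file or directory, so the number of inodes caps how many files can be created on the filesystem. A filesystem can run out of inodes while df -h still shows free space, which typically happens when an application creates a very large number of tiny files.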
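When IUse% is high, the next question is which directory holds all those files. One possible approach, sketched below, is to count entries per parent directory with find. inode_hotspots is a hypothetical helper name, and the sketch assumes file names without newlines.

```shell
# List the directories holding the most entries under a starting path.
# Roughly every file, directory and symlink uses one inode, so a directory
# with an unusually high count is a likely culprit when IUse% is near 100%.
# -xdev keeps find on a single filesystem; sed strips the last path
# component so each line becomes the parent directory of one entry.
# Assumes file names contain no newlines.
inode_hotspots() {
    find "${1:-.}" -xdev 2>/dev/null \
        | sed 's|/[^/]*$||' \
        | sort | uniq -c | sort -rn \
        | head -n "${2:-10}"
}
```

For example, sudo is usually needed to read everything under /var, so from a root shell you could run inode_hotspots /var 5 to see the five busiest directories. On a filesystem with millions of files this can take a while.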

$ sudo du -hc --max-depth=1 /var/log
84K /var/log/apt
40K /var/log/unattended-upgrades
4.0K /var/log/openvpn
4.0K /var/log/private
4.0K /var/log/dist-upgrade
636K /var/log/mongodb
17M /var/log/journal
4.0K /var/log/landscape
36K /var/log/amazon
18M /var/log
18M total

From the above I can tell that the journal directory takes up most of the space in /var/log. journal is where journald stores the logs of all systemd services, so this is expected. Keep in mind that it can take some time to get output back, since du has to visit every file and add up the sizes before displaying a total. It is not uncommon for this command to appear to hang if there are lots of tiny files in one directory.
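With more than a handful of subdirectories it helps to sort the du output so the biggest consumers come first. A minimal sketch, where largest_dirs is a hypothetical helper name:

```shell
# Show the largest immediate subdirectories of a path, biggest first.
# -x stays on one filesystem, -k reports sizes in KiB so sort -n can
# compare them, and -d 1 is the short form of --max-depth=1.
# The first row is the total for the starting path itself.
largest_dirs() {
    du -xk -d 1 "${1:-.}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Running largest_dirs /var/log 5 from a root shell would put /var/log itself first, followed by its four largest subdirectories. Repeating the call on the biggest subdirectory is a simple way to walk down to the culprit.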

Debugging

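During an outage df can be used to find out whether the server has run out of space. When a disk is full, even basic operations such as creating or saving files can fail. The main aim is to free up space so the application can run again. Use du to find the cause, with the usual suspects being /var/lib/docker or /var/log. You could run docker system prune, or truncate a log file by doing > /var/log/syslog. Afterwards, double check your log rotate policies and make them more aggressive if needed. If the problem is inode exhaustion, raising the ulimit will not help, because ulimit only limits per-process resources. On most filesystems the number of inodes is fixed when the filesystem is created, so the fix is to delete the many small files that are consuming inodes, or to move the data to a filesystem created with more inodes.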
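The find-then-truncate step can be sketched as two small helpers. This is an illustration rather than a recommended runbook: big_files and truncate_in_place are hypothetical names, and the 100 MiB default threshold is an assumption.

```shell
# List regular files above a size threshold (in KiB) under a directory.
# The default of 102400 KiB is 100 MiB.
big_files() {
    find "$1" -xdev -type f -size +"${2:-102400}"k 2>/dev/null
}

# Empty a file in place. The file keeps its inode, so a process that
# still has it open carries on logging to the same file. Deleting the file
# instead would not free the space until that process closes it.
truncate_in_place() {
    : > "$1"
}
```

From a root shell, big_files /var/log lists the candidates, and truncate_in_place can then be called on a specific file once you are sure its contents are no longer needed. Truncation is irreversible, so copy anything needed for the post-mortem to another disk first.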

Alternatives

df is a powerful tool for finding out if your disk space is running out. There are alternatives however that give deeper insights.

$ ncdu

ncdu is an interactive viewer for disk usage, similar to du. Keep in mind that ncdu can be a bit slow to load, since it has to scan every file under the directory you point it at.

$ gdu

gdu (https://github.com/dundee/gdu) is another alternative that lets you drill down to the exact path of the files taking up space. It is claimed to be faster than ncdu.

$ ls -lh

ls can be used to determine file sizes and eventually get to the root cause.
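Adding the -S flag makes ls sort by size, which saves scanning the listing by eye. A small sketch, with largest_in_dir as a hypothetical helper name:

```shell
# Show the largest entries directly inside a directory.
# -S sorts by size, largest first; grep drops the "total" line printed by -l.
# Subdirectories show the size of the directory entry, not of their contents.
largest_in_dir() {
    ls -lhS "${1:-.}" | grep -v '^total ' | head -n "${2:-5}"
}
```

For example, largest_in_dir /var/log 3 shows the three largest files directly under /var/log. Because ls does not add up directory contents, du remains the better tool for finding which directory to look in.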

Conclusion

In the next part we are going to cover openssl for debugging applications. These parts will be released on a weekly basis. If you want to skip the queue, please buy the book here:

https://www.amazon.com/dp/B0BJC4Y1N1

Please leave comments and share your outage debugging stories.