DevOps Debugging Part 6: df & du

Neeran Gul
Nov 25, 2022

This is a multi-part series where we will explore essential Unix commands for debugging applications. These skills are critical when an outage occurs or something doesn’t work as expected. This is aimed at DevOps Engineers, SREs and Linux sysadmins. Below is a quick navigation if you want to jump to the other parts.

  1. netcat
  2. curl
  3. dig
  4. ps
  5. less
  6. df & du
  7. openssl
  8. lsof
  9. netstat
  10. iostat

In this part we are going to cover df and du. Running out of disk space is one of the most common causes of unavailability in infrastructure. As a sysadmin you are expected to find the root cause. There are some usual suspects to check, but when the cause is not clear some investigation is required. Keep in mind that we will not cover everything these commands can do, but rather how to use them to debug servers and applications.

Installation

To install df and du on redhat/centos/ubuntu/osx run:

# redhat/centos/amazon linux
$ yum install coreutils
# ubuntu
$ apt-get install coreutils
# OSX/Mac (usually already installed)

# test for installation
$ df

If you get a command not found back then please reach out below in the comments section.

Usage

See the disk space for each filesystem.

$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 20G 3.2G 17G 17% /
devtmpfs 477M 0 477M 0% /dev
tmpfs 484M 0 484M 0% /dev/shm
tmpfs 97M 840K 96M 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 484M 0 484M 0% /sys/fs/cgroup
/dev/loop1 56M 56M 0 100% /snap/core18/2566
/dev/loop0 26M 26M 0 100% /snap/amazon-ssm-agent/5656
/dev/loop2 64M 64M 0 100% /snap/core20/1623
/dev/loop3 68M 68M 0 100% /snap/lxd/22753
/dev/loop4 48M 48M 0 100% /snap/snapd/16778
/dev/xvda15 105M 5.2M 100M 5% /boot/efi
/dev/loop5 48M 48M 0 100% /snap/snapd/17029
/dev/loop6 25M 25M 0 100% /snap/amazon-ssm-agent/6312
tmpfs 97M 0 97M 0% /run/user/1000

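The Filesystem column lists the device behind each mount, whether it is a physical disk or a virtual one such as tmpfs or a loop device. The Size, Used, Avail and Use% columns give a good indication of whether this server is running out of space. The /dev/loop entries at 100% are read-only snap images that are always full, so they can be ignored.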
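When there are many mounts it can help to filter the df output down to the ones that are nearly full. The following is a minimal sketch, not a standard tool: full_filesystems is a hypothetical helper name, the 90% default is an arbitrary assumption, and it assumes mount points contain no spaces. It reads df -P (POSIX format) output on stdin.

```shell
# Hypothetical helper: print mount point and Use% for every filesystem
# whose usage is at or above a threshold (default 90, an assumed value).
# Expects `df -P` output on stdin; mount points with spaces are not handled.
full_filesystems() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {                  # skip the header line
            use = $5
            sub(/%/, "", use)     # "95%" -> "95"
            if (use + 0 >= t) print $6, $5
        }'
}

# Show filesystems that are at least 80% full.
# With GNU df, adding `-x squashfs` would hide the snap loop mounts.
df -P | full_filesystems 80
```

On a host like the one above this would list the /snap loop mounts, plus any real filesystem that crosses the threshold.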

# list inodes
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/root 2580480 99990 2480490 4% /
devtmpfs 121968 330 121638 1% /dev
tmpfs 123679 2 123677 1% /dev/shm
tmpfs 123679 540 123139 1% /run
tmpfs 123679 3 123676 1% /run/lock
tmpfs 123679 19 123660 1% /sys/fs/cgroup
/dev/loop1 10858 10858 0 100% /snap/core18/2566
/dev/loop0 16 16 0 100% /snap/amazon-ssm-agent/5656
/dev/loop2 11882 11882 0 100% /snap/core20/1623
/dev/loop3 802 802 0 100% /snap/lxd/22753
/dev/loop4 486 486 0 100% /snap/snapd/16778
/dev/xvda15 0 0 0 - /boot/efi
/dev/loop5 486 486 0 100% /snap/snapd/17029
/dev/loop6 16 16 0 100% /snap/amazon-ssm-agent/6312
tmpfs 123679 22 123657 1% /run/user/1000

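It is also possible to list inodes on the server, as shown above. Each file or directory uses one inode, so the inode count limits how many files can be created on the filesystem. A filesystem can run out of inodes while it still has free space.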
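If IUse% is high, the next question is which directory holds all the files. The sketch below counts entries under each immediate subdirectory, which approximates inode usage (hard links are counted more than once). inode_hogs is a hypothetical helper name, not an existing command.

```shell
# Hypothetical helper: count files and directories under each immediate
# subdirectory of a directory, largest first. This approximates inode usage.
inode_hogs() {
    dir="${1:-.}"
    for sub in "$dir"/*/; do
        [ -d "$sub" ] || continue
        # -xdev keeps find on the same filesystem; tr strips padding from wc
        count="$(find "$sub" -xdev 2>/dev/null | wc -l | tr -d ' ')"
        printf '%s %s\n' "$count" "${sub%/}"
    done | sort -rn
}

# Example (run as root so every directory is readable):
#   inode_hogs /var
```

Recent versions of GNU du also have an --inodes option that reports inode counts instead of sizes, if it is available on your system.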

Use du to see how much space each directory takes up:

$ sudo du -hc --max-depth=1 /var/log
84K /var/log/apt
40K /var/log/unattended-upgrades
4.0K /var/log/openvpn
4.0K /var/log/private
4.0K /var/log/dist-upgrade
636K /var/log/mongodb
17M /var/log/journal
4.0K /var/log/landscape
36K /var/log/amazon
18M /var/log
18M total

From the above I can tell that the journal directory takes up the most space in /var/log. journal is where journald stores the logs of systemd services, so this is expected. Keep in mind that it can take some time to get output back, since du has to walk every file and add up the sizes. It is not uncommon for this command to appear to hang if there are lots of tiny files in one directory.
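With more directories than this, sorting the du output makes the biggest consumer stand out. Here is a minimal sketch, assuming a du that accepts -d (the short form of --max-depth); largest_dirs is a hypothetical helper name. It uses -k so the sizes sort numerically, and -x so du stays on one filesystem.

```shell
# Hypothetical helper: list the largest directories one level below a
# directory, sizes in KiB, biggest first. The first line is the total.
largest_dirs() {
    dir="${1:-/var/log}"
    count="${2:-10}"
    # -x: stay on one filesystem, -k: sizes in KiB, -d 1: one level deep
    du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example (run as root to include directories such as /var/log/private):
#   largest_dirs /var/log 5
```

The first line of the output is the total for the directory itself, followed by its subdirectories in descending order.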

Debugging

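During an outage df can be used to find out if the server has run out of space. When the disk is full, anything that needs to write to it will most probably fail, including saving files. The main aim is to free up space so the application can run again. Use du to find the cause, with the usual suspects being /var/lib/docker or /var/log. Maybe run docker system prune, or truncate a log file by doing > /var/log/syslog. Review your log rotation policies and make them more aggressive if needed. If it is inode related, raising ulimit will not help, because ulimit sets per-process limits and not filesystem capacity. Instead, delete unneeded small files (cache and session files are common culprits), or recreate the filesystem with more inodes, since on filesystems such as ext4 the inode count is fixed at creation time.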
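Truncating a log in place, rather than deleting it, matters because a process that still has the file open keeps the space allocated until it closes the file. The sketch below wraps the > redirection from above with a small safety check. truncate_log is a hypothetical helper name; run it in a root shell for files owned by root.

```shell
# Hypothetical helper: empty a log file in place and report its old size.
# The file is kept, so a process that has it open can keep writing to it.
truncate_log() {
    file="$1"
    [ -f "$file" ] || { echo "not a regular file: $file" >&2; return 1; }
    before="$(wc -c < "$file" | tr -d ' ')"
    : > "$file"    # same effect as `> /var/log/syslog` in the text above
    echo "truncated $file (was $before bytes)"
}

# Example (as root):
#   truncate_log /var/log/syslog
```

Running df -h before and after shows how much space was reclaimed.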

Alternatives

df is a powerful tool for finding out if your disk space is running out. There are alternatives however that give deeper insights.

$ ncdu

ncdu is an interactive alternative to du for browsing disk usage. Keep in mind that ncdu can be a bit slow to load, as it depends on how many files you have on the filesystem.

$ gdu

gdu (https://github.com/dundee/gdu) is another alternative that can drill down to the exact path of the files using the space. It is claimed to be faster than ncdu.

$ ls -lh

ls -lh shows file sizes in human-readable units, which helps to eventually get to the root cause.
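Once du has narrowed things down to a directory, find and ls together can point at the individual files. This is a minimal sketch: big_files is a hypothetical helper name and the 100 MiB default is an arbitrary assumption. The k suffix for -size works in GNU and BSD find.

```shell
# Hypothetical helper: list regular files larger than a size in KiB
# (default 102400 KiB = 100 MiB, an assumed value) under a directory.
big_files() {
    dir="${1:-.}"
    min_kb="${2:-102400}"
    # -xdev: stay on one filesystem; ls -S sorts each batch by size
    find "$dir" -xdev -type f -size +"${min_kb}"k -exec ls -lhS {} + 2>/dev/null
}

# Example (as root): files over 50 MiB in /var/log
#   big_files /var/log 51200
```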

Conclusion

In the next part we are going to cover openssl for debugging applications. These parts will be released on a weekly basis. If you want to skip the queue, please buy the book here:

https://www.amazon.com/dp/B0BJC4Y1N1

Please leave comments and share your outage debugging stories.
