DevOps Debugging Part 6: df & du
This is a multi-part series exploring essential Unix commands for debugging applications. These skills are critical when an outage occurs or something doesn't work as expected. It is aimed at DevOps engineers, SREs and Linux sysadmins.
In this part we are going to cover df and du. Running out of disk space is one of the most common causes of unavailability in infrastructure, and as a sysadmin you are expected to find the root cause. There are some usual suspects to check, but when the cause is not clear some investigation is required. Keep in mind that we will not cover everything these commands can do, only how to use them to debug servers and applications.
Installation
To install df and du on redhat/centos/ubuntu/osx run:
# redhat/centos/amazon linux
$ yum install coreutils
# ubuntu
$ apt-get install coreutils
# OSX/Mac (usually already installed)
# test for installation
$ df
If you get a "command not found" error, please reach out in the comments section below.
Usage
See the disk space for each filesystem.
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 20G 3.2G 17G 17% /
devtmpfs 477M 0 477M 0% /dev
tmpfs 484M 0 484M 0% /dev/shm
tmpfs 97M 840K 96M 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 484M 0 484M 0% /sys/fs/cgroup
/dev/loop1 56M 56M 0 100% /snap/core18/2566
/dev/loop0 26M 26M 0 100% /snap/amazon-ssm-agent/5656
/dev/loop2 64M 64M 0 100% /snap/core20/1623
/dev/loop3 68M 68M 0 100% /snap/lxd/22753
/dev/loop4 48M 48M 0 100% /snap/snapd/16778
/dev/xvda15 105M 5.2M 100M 5% /boot/efi
/dev/loop5 48M 48M 0 100% /snap/snapd/17029
/dev/loop6 25M 25M 0 100% /snap/amazon-ssm-agent/6312
tmpfs 97M 0 97M 0% /run/user/1000
The Filesystem column lists the device backing each mount, whether it is physical or virtual. The Size, Used, Avail and Use% columns give a good indication of whether the server is running out of space. The /dev/loop devices at 100% are read-only snap images, which are always full by design and can be ignored.
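To turn the table above into a quick check, the following is a minimal sketch that prints only the filesystems at or above a usage threshold. The function name flag_full and the 80% threshold are illustrative choices, not part of df. It reads the POSIX output format (df -P), assumes mount points contain no spaces, and skips /dev/loop devices on the assumption that they are read-only snap images.

```shell
#!/bin/sh
# Print "mountpoint use%" for every filesystem at or above a threshold.
# Reads `df -P` style output on stdin so it can be fed canned input too.
flag_full() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR == 1 { next }                  # skip the header row
        $1 ~ /^\/dev\/loop/ { next }      # assumption: loop devices are snap images
        {
            use = $5                      # e.g. "17%"
            sub(/%$/, "", use)
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Usage: check the live system against an 80% threshold.
df -P | flag_full 80
```

No output means nothing is above the threshold. The same function can be pointed at df -Pi output to check inode usage instead, since the column layout is the same.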
# list inodes
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/root 2580480 99990 2480490 4% /
devtmpfs 121968 330 121638 1% /dev
tmpfs 123679 2 123677 1% /dev/shm
tmpfs 123679 540 123139 1% /run
tmpfs 123679 3 123676 1% /run/lock
tmpfs 123679 19 123660 1% /sys/fs/cgroup
/dev/loop1 10858 10858 0 100% /snap/core18/2566
/dev/loop0 16 16 0 100% /snap/amazon-ssm-agent/5656
/dev/loop2 11882 11882 0 100% /snap/core20/1623
/dev/loop3 802 802 0 100% /snap/lxd/22753
/dev/loop4 486 486 0 100% /snap/snapd/16778
/dev/xvda15 0 0 0 - /boot/efi
/dev/loop5 486 486 0 100% /snap/snapd/17029
/dev/loop6 16 16 0 100% /snap/amazon-ssm-agent/6312
tmpfs 123679 22 123657 1% /run/user/1000
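It is also possible to list inodes on the server. An inode is the on-disk structure that stores the metadata of a single file or directory, so the number of inodes limits how many files can be created on the filesystem. A filesystem can run out of inodes while df -h still shows free space, typically when it holds a very large number of small files.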
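When IUse% is high, the next question is which directory holds all those files. The sketch below counts entries under each immediate subdirectory using find and wc. The function name inode_hogs is hypothetical, hidden directories are skipped by the glob, and the count is only an approximation of inode usage because hard links share a single inode.

```shell
#!/bin/sh
# Count filesystem entries (files, directories, links) under each immediate
# subdirectory of a directory and print them largest first.
inode_hogs() {
    dir="${1:-.}"
    for sub in "$dir"/*/; do
        [ -d "$sub" ] || continue
        # -xdev keeps find on the same filesystem as the starting point;
        # tr strips the padding some wc implementations add
        count=$(find "$sub" -xdev | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$count" "${sub%/}"
    done | sort -rn
}

# Usage: which directories under /var hold the most entries?
inode_hogs /var 2>/dev/null | head -n 5
```

Run it again on the top result to drill down one level at a time.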
# list disk usage per directory
$ sudo du -hc --max-depth=1 /var/log
84K /var/log/apt
40K /var/log/unattended-upgrades
4.0K /var/log/openvpn
4.0K /var/log/private
4.0K /var/log/dist-upgrade
636K /var/log/mongodb
17M /var/log/journal
4.0K /var/log/landscape
36K /var/log/amazon
18M /var/log
18M total
With the above I can tell that the journal directory is taking up the most space in the /var/log directory. journal is where journald stores the logs of all systemd units, so this is expected. Keep in mind that it can take some time to get output back, since du has to walk every file to add up the sizes. It is not uncommon for this command to appear to hang if there are lots of tiny files in one directory.
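On a directory with many entries it helps to sort the du output so the biggest offenders come first. This is a minimal sketch with a hypothetical helper named top_dirs. It reports sizes in KiB instead of human-readable units so that sort -n orders them correctly, and it uses -d 1, the short form of --max-depth=1 in GNU du.

```shell
#!/bin/sh
# List the largest entries one level below a directory, biggest first.
# The first line is usually the total for the directory itself.
top_dirs() {
    dir="${1:-.}"
    count="${2:-10}"
    # -x stays on one filesystem, -k reports KiB, -d 1 limits the depth
    du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Usage: the five largest entries under /var/log (run with sudo to see
# directories that need root).
top_dirs /var/log 5
```

The -x flag is a deliberate choice here: it stops du from wandering into other mounted filesystems, which keeps the numbers comparable with a single line of df.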
Debugging
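During an outage df can be used to find out if the server has run out of space. When the disk is full, anything that needs to write to it can fail: you may not be able to save files, and even shell tab completion can break. The main aim is to free up space so the application can run again. Use du to find the cause, with the usual suspects being /var/lib/docker or /var/log. Maybe run a docker system prune or truncate the log file by doing > /var/log/syslog. Double check your log rotation policies and make them more aggressive if needed. If it is inode related, raising ulimit will not help, because ulimit only caps per-process resources. The inode count is usually fixed when the filesystem is created (as on ext4), so either delete the small files that are consuming inodes or recreate the filesystem with more inodes.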
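The truncation step mentioned above can be wrapped in a small helper that also reports how much was released. This is only a sketch: truncate_log is a hypothetical name, and the path in the usage line is an example. Truncating rather than deleting matters because a file that is deleted while a process still has it open keeps occupying space until that process closes it.

```shell
#!/bin/sh
# Empty a log file in place and report how many bytes it held.
truncate_log() {
    file="$1"
    [ -f "$file" ] || { echo "not a regular file: $file" >&2; return 1; }
    before=$(wc -c < "$file" | tr -d ' ')
    : > "$file"          # same effect as the bare "> file" redirect
    echo "freed $before bytes from $file"
}

# Usage (example path, needs root for system logs):
# truncate_log /var/log/syslog
```

Treat this as an emergency measure: the log content is gone afterwards, so copy anything you need for the post-mortem to another volume first.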
Alternatives
df is a powerful tool for finding out if your disk space is running out. There are alternatives, however, that give deeper insights.
$ ncdu
ncdu can be used to browse disk usage interactively, similar to du. Keep in mind that ncdu can be a bit slow to load, depending on how many files you have on the filesystem.
$ gdu
gdu (https://github.com/dundee/gdu) is an alternative and is able to find the exact path to files. It is claimed to be faster than ncdu.
$ ls -lh
ls can be used to determine file sizes and eventually get to the root cause.
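Once du has narrowed things down to one directory, sorting the ls output by size points at the individual file. A minimal sketch, assuming a hypothetical helper named largest_files and an ls that supports -h (GNU and BSD both do):

```shell
#!/bin/sh
# Show the N largest entries directly inside a directory.
# -S sorts by size (largest first), -l and -h add human-readable sizes.
largest_files() {
    dir="${1:-.}"
    count="${2:-5}"
    # LC_ALL=C fixes the wording of the "total ..." summary line so that
    # grep can drop it reliably
    LC_ALL=C ls -lhS "$dir" | grep -v '^total ' | head -n "$count"
}

# Usage: the five largest files in /var/log.
largest_files /var/log 5
```

Note that ls reports the apparent size of each file, which can differ from the on-disk usage that du shows, for example with sparse files.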
Conclusion
In the next part we are going to cover openssl for debugging applications. These parts will be released on a weekly basis. If you want to skip the queue, please buy the book here:
https://www.amazon.com/dp/B0BJC4Y1N1
Please leave comments and share your outage debugging stories.