DevOps Debugging Part 10: iostat (Final)

Neeran Gul
3 min read · Dec 24, 2022
Photo by Benjamin Davies on Unsplash

This is a multi-part series exploring essential Unix commands for debugging applications. These skills are critical when an outage occurs or something doesn’t work as expected. It is aimed at DevOps engineers, SREs and Linux sysadmins. Below is a quick navigation list if you want to jump to the other parts.

  1. netcat
  2. curl
  3. dig
  4. ps
  5. less
  6. df & du
  7. openssl
  8. lsof
  9. netstat
  10. iostat

In this part we are going to cover iostat. I/O refers to the read and write operations taking place on a server. Normally these don’t cause issues, but when CPU and memory usage look normal and the server is still sluggish, disk I/O can be the critical factor. In that situation there are so many read and write operations happening that the disk cannot keep up, which slows everything else down. Keep in mind that we will not cover the full usage of the command and every fancy thing it can do, but rather how to use it to debug servers and applications.

Installation

To install iostat on redhat/centos/ubuntu/osx run:

# redhat/centos/amazon linux
$ yum install sysstat
# ubuntu
$ apt-get install sysstat
# OSX/Mac (usually already installed)
# test for installation
$ iostat

If you get a “command not found” error back, please reach out in the comments section below.
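Before digging further, it can help to confirm where the shell is finding the binary. The following is a minimal sketch; `check_tool` is a hypothetical helper name, and the hint it prints assumes a Linux host where iostat comes from the sysstat package.

```shell
# check_tool is a hypothetical helper, not part of sysstat.
# It reports where a command lives on PATH, or prints an install hint.
check_tool() {
  tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool found at $(command -v "$tool")"
  else
    # Assumption: on Linux, iostat is shipped in the sysstat package.
    echo "$tool not found; try installing the sysstat package"
    return 1
  fi
}

# "|| true" keeps a script running even when the tool is missing.
check_tool iostat || true
```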

Usage

Have a look at I/O operations. The -h flag makes the report easier to read.

$ iostat -h
Linux 5.15.0-1019-aws (ip-172-31-33-31)   10/16/22   _x86_64_   (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.9%    0.0%    0.5%    0.2%    0.2%   98.1%

      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     0.01         0.2k         0.0k         0.0k      16.2M       0.0k       0.0k loop0
     0.00         0.0k         0.0k         0.0k     521.0k       0.0k       0.0k loop1
     0.00         0.0k         0.0k         0.0k       2.3M       0.0k       0.0k loop2
     0.00         0.0k         0.0k         0.0k       1.2M       0.0k       0.0k loop3
     0.02         1.0k         0.0k         0.0k      90.7M       0.0k       0.0k loop4
     0.01         0.4k         0.0k         0.0k      38.6M       0.0k       0.0k loop5
     0.01         0.3k         0.0k         0.0k      29.6M       0.0k       0.0k loop6
     4.12        26.8k        52.3k         0.0k       2.3G       4.5G       0.0k xvda

As we can see in the above output, the CPU is 98.1% idle, meaning there is barely anything happening on this server. Note that the first report iostat prints shows averages since the system booted, not the current moment. The main stat to watch out for here is %iowait: the percentage of time the CPU sat idle while the system had outstanding disk I/O requests. If your %iowait is high then applications running on the server will most likely not perform as expected.
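To watch %iowait over time rather than from a single snapshot, iostat can be run with an interval and a count. The sketch below pulls just the %iowait value out of each CPU report; `iowait_values` is a hypothetical helper, and it assumes the `avg-cpu:` header layout shown in the output above.

```shell
# iowait_values is a hypothetical helper: it reads iostat output on stdin
# and prints the %iowait value from each avg-cpu report.
iowait_values() {
  awk '
    /^avg-cpu:/ {
      # The values line has no "avg-cpu:" label, so the value column is
      # one position to the left of the matching header column.
      for (i = 2; i <= NF; i++) if ($i == "%iowait") col = i - 1
      want = 1
      next
    }
    want { print $col; want = 0 }
  '
}

# Usage: CPU report every 5 seconds, 3 reports in total.
# The first value is the since-boot average; later ones cover each interval.
#   iostat -c 5 3 | iowait_values
```

A value that stays high across several intervals is a stronger signal than one spike in a single report.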

Debugging

During an outage disk I/O is usually one of the last things to look at, but that depends on your system. For example, data science environments that train models can see I/O spikes, and large ETL jobs and NFS volumes can be breeding grounds for high I/O. Have a look at iostat to determine whether I/O really is the issue, then see if you can move the workload to a dedicated disk or distribute it so that it doesn’t affect everything else. If even a dedicated disk doesn’t help, consider moving the working data into memory, since memory is much faster than disk.
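One way to decide whether I/O really is the issue is to check how busy each device is in the extended report (`iostat -dx`), which includes a %util column. The sketch below is a rough filter under stated assumptions: `busy_devices` is a hypothetical helper, and the default threshold of 80 is an arbitrary starting point rather than a recommended value.

```shell
# busy_devices is a hypothetical helper: it reads `iostat -dx` style output
# on stdin and prints devices whose %util is at or above a threshold.
# The %util column is located by header name, since the column order
# differs between sysstat versions.
busy_devices() {
  awk -v limit="${1:-80}" '
    $1 == "Device" || $1 == "Device:" {
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && ($col + 0) >= limit { print $1, $col }
  '
}

# Usage: extended device report every 5 seconds, 3 reports.
#   iostat -dx 5 3 | busy_devices 80
# To see which processes generate the I/O, pidstat (also in sysstat) helps:
#   pidstat -d 5 3
```

If one device shows up repeatedly while the others stay quiet, that is the disk whose workload is worth moving or splitting.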

Alternatives

iostat is a powerful tool but there are alternatives out there that provide almost the same functionality.

$ sar

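sar is part of the same sysstat package. It can report the same kind of CPU and I/O statistics at an interval and, when its data collection is enabled, keeps a history that you can look back through after an incident.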
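As a rough illustration, `sar -u` reports CPU utilization including %iowait, and `sar -d` reports block device activity. The sketch below extracts the average %iowait from a `sar -u` run; `sar_avg_iowait` is a hypothetical helper, and it assumes an English locale where the summary row is labelled `Average:`.

```shell
# sar_avg_iowait is a hypothetical helper: it reads `sar -u` output on stdin
# and prints the %iowait value from the "Average:" summary row.
sar_avg_iowait() {
  awk '
    # Header row: remember how far %iowait sits from the end of the line,
    # because the leading timestamp may be one or two fields wide.
    !seen {
      for (i = 1; i <= NF; i++)
        if ($i == "%iowait") { back = NF - i; seen = 1 }
    }
    seen && $1 == "Average:" { print $(NF - back) }
  '
}

# Usage: sample CPU utilization every second, 5 times.
#   sar -u 1 5 | sar_avg_iowait
# Block device activity at the same cadence:
#   sar -d 1 5
```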

$ htop

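htop can show system I/O, including per-process read and write rates if you enable its I/O columns, although it does not give the same per-device breakdown as iostat.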

Conclusion

Thank you for sticking around these few weeks to get to the end of this series. I hope these parts become a reference and come in handy during difficult times. To carry a portable copy with you at all times, consider purchasing the book here:

https://www.amazon.com/dp/B0BJC4Y1N1

Please leave comments and share your outage debugging stories.

Neeran Gul

Industry veteran providing strong mentorship and sharing experiences.