DevOps Debugging Part 4: ps

Neeran Gul
Nov 11, 2022

This is a multi-part series in which we explore essential Unix commands for debugging applications. These skills are critical when an outage occurs or something doesn't work as expected. The series is aimed at DevOps engineers, SREs and Linux sysadmins. Below is a quick navigation if you want to jump to the other parts.

  1. netcat
  2. curl
  3. dig
  4. ps
  5. less
  6. df & du
  7. openssl
  8. lsof
  9. netstat
  10. iostat

In this part we are going to cover ps. This command is a great way to debug Linux processes and to find bottlenecks during outages. When a server is under high load or unavailable, it is common practice to find the cause and then work towards a mitigation. With ps we can identify parent and child processes to narrow down our targets for a root cause analysis. Keep in mind that we will not cover every use of the command and all the fancy things it can do, but rather how to use it to debug servers and applications.

Installation

To install ps on Red Hat/CentOS/Ubuntu/macOS, run:

# redhat/centos/amazon linux
$ yum install procps
# ubuntu
$ apt-get install procps
# OSX/Mac (usually already installed)
# test for installation
$ ps --help
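A caveat on the test above: the BSD ps that ships with macOS does not support GNU-style long options such as --help, so it may answer with a usage error even though ps is installed. A more portable check is the POSIX command -v builtin. The sketch below wraps it in a hypothetical helper named has_ps.

```shell
# Hypothetical helper: report whether a ps binary is on the PATH.
# "command -v" is POSIX, so this behaves the same in sh, bash and zsh.
has_ps() {
    if command -v ps >/dev/null 2>&1; then
        echo "ps found at $(command -v ps)"
    else
        echo "ps not found: install the procps package" >&2
        return 1
    fi
}

has_ps
```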

If you get a "command not found" error, please reach out in the comments section below.

Usage

List all running processes inside an instance or Docker container.

$ ps ax
PID TTY STAT TIME COMMAND
1 ? Ss 0:03 /sbin/init
2 ? S 0:00 [kthreadd]
3 ? I< 0:00 [rcu_gp]
4 ? I< 0:00 [rcu_par_gp]
5 ? I< 0:00 [netns]
6 ? I 0:00 [kworker/0:0-events]
...

We can see the PID (Process ID), STAT (Process state) and the command that is running. Let’s look for a particular process.

$ ps ax | grep mongo
449 ? Ssl 0:56 /usr/bin/mongod --config /etc/mongod.conf

As we can see above, I can use grep to search for a particular process. In my case the mongodb daemon is running with the above command. With this I can confirm that the process itself is running, so if it isn’t responding I can try a few things.
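One catch with ps ax | grep mongo is that grep can also list its own process, since the word mongo appears in grep's command line (whether it shows up depends on timing). Two common workarounds are the bracket trick and pgrep, which ships in the same procps package. So that the sketch runs anywhere, it uses a background sleep 4242 as a hypothetical stand-in for mongod; the duration is arbitrary.

```shell
# Hypothetical stand-in for mongod: a background sleep with an unusual duration.
secs=4242
sleep "$secs" &
demo_pid=$!
sleep 1   # give the background job a moment to exec sleep

# Bracket trick: the regex [s]leep still matches the text "sleep", but
# grep's own command line contains "[s]leep", which the regex does not
# match, so grep never reports itself.
bracket_out=$(ps ax | grep "[s]leep $secs")
echo "$bracket_out"

# pgrep never reports its own process.
# -f matches the pattern against the full command line, not only the name.
pgrep_out=$(pgrep -f "sleep $secs")
echo "$pgrep_out"

# Remove the stand-in process.
kill "$demo_pid"
```

On the server from the example, the equivalents would be ps ax | grep '[m]ongod' and pgrep -f mongod.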

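# Try restarting the process through its service manager first
$ sudo systemctl restart mongod
# If a restart does not help, send SIGTERM so it can shut down cleanly
$ sudo kill 449
# Kill by force (SIGKILL) if it ignores SIGTERM
$ sudo kill -9 449
# A process in Z (zombie) state is already dead and cannot be killed;
# it disappears once its parent reaps it, so restart or kill the parent.
# A process in D (uninterruptible sleep) state ignores even SIGKILL until
# its I/O completes; if it never does, a reboot may be the last resort.
$ sudo reboot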
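Before choosing between kill and a reboot, it helps to read the state letter for that specific PID. The sketch below is a hypothetical helper, explain_state, that maps the first letter of the STAT column to a rough next step, and it also lists zombies together with the parent PID that should reap them. The state letters (R, S, D, Z, T and so on) come from the ps manual page; the advice strings are a simplification, not a runbook.

```shell
# Hypothetical helper: print the state of a PID and a rough next step.
explain_state() {
    # "stat=" prints only the STAT value with no header; tr strips padding.
    stat=$(ps -o stat= -p "$1" | tr -d ' ')
    if [ -z "$stat" ]; then
        echo "$1: no such process"
        return 1
    fi
    case "$stat" in
        Z*) echo "$1 [$stat] zombie: already dead, kill or restart its parent" ;;
        D*) echo "$1 [$stat] uninterruptible sleep: cannot be killed until its I/O returns" ;;
        T*) echo "$1 [$stat] stopped: resume it with kill -CONT $1" ;;
        *)  echo "$1 [$stat] alive: try a restart, then kill, then kill -9" ;;
    esac
}

# List every zombie (STAT starting with Z) with its parent PID, keeping the header.
ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'

# Example: inspect the current shell.
explain_state $$
```

On the server from the example, explain_state 449 would report on the mongod process.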

Sometimes if a process is killed, it can be respawned by the parent process. Let’s find out the parent of our mongodb process.

$ ps -ef | grep mongo
mongodb 449 1 0 09:27 ? 00:00:58 /usr/bin/mongod --config /etc/mongod.conf

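We can see from the above output that the parent process ID (the third column, PPID) is 1. PID 1 is not the kernel itself but the init process, the first userspace process the kernel starts at boot; on this machine that is systemd (the /sbin/init entry in the earlier ps ax output), meaning systemd started this process. We won't try to kill PID 1, but if we had rogue parent processes, we could get their PIDs the same way and kill them.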
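If the parent turns out to be some other process rather than PID 1, you can keep asking ps for the parent PID until you reach init. The sketch below uses a hypothetical helper, show_ancestry. ps -o ppid= -p PID prints only the parent PID (the trailing = suppresses the header), and the loop stops at PID 1 or at a process whose parent PID is 0, as can happen inside a container.

```shell
# Hypothetical helper: print a PID and each of its ancestors up to PID 1.
show_ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # PID, parent PID and command name for this level (no header);
        # stop if the process has vanished in the meantime.
        ps -o pid=,ppid=,comm= -p "$pid" || break
        if [ "$pid" -eq 1 ]; then
            break
        fi
        # Move one level up; tr strips the padding ps adds around numbers.
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Example: walk up from the current shell.
show_ancestry $$
```

On Linux, ps -ef --forest (procps) draws the whole process tree at once, which is handy when a parent has many children to look at.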

Debugging

During an outage, ps is a great way to determine whether a process is running. SSH into the server, run ps and check whether the process is there. It might not be the Python or Java process itself; it could be that Docker is not running or that the kubelet has died. If the process is gone, check whether disk space is running out or read its logs to find out why it died. Once the root cause is determined, restart the process with systemctl.
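That first "is it running at all?" check can be scripted so it covers every daemon a host depends on. A minimal sketch, assuming pgrep is available (it ships in procps alongside ps); the check_running function and the process names dockerd, kubelet and mongod are hypothetical placeholders for whatever your host should be running. pgrep -x matches the process name exactly, so mongo does not accidentally match mongod.

```shell
# Hypothetical helper: report which of the given process names are running.
# Returns non-zero if any of them is missing, so it can gate a script.
check_running() {
    missing=0
    for name in "$@"; do
        # -x: match the process name exactly, so "mongo" does not match "mongod".
        if pids=$(pgrep -x "$name"); then
            # $pids is left unquoted on purpose so multiple PIDs join on one line.
            echo "OK      $name (PID $(echo $pids))"
        else
            echo "MISSING $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example names only; replace them with the daemons your host depends on.
check_running dockerd kubelet mongod || echo "at least one expected process is not running"
```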

Alternatives

ps is very useful at a low level for figuring out bottlenecks in a given Docker container or Kubernetes worker. However, there are better alternatives for interactive use.

htop gives a much more detailed view of what is running on a server. We can see how much CPU or memory each process is using, as well as aggregate CPU and memory usage, alongside each PID. The only downside of htop is that when a server is under high load or hung, it might not be possible to start its interface, since it needs more resources than ps.

Similarly, top is the poor man's version of htop. It shows pretty much the same information but is more lightweight. A number of alternatives have appeared in recent years, for example gtop, vtop and ptop, but keep in mind that these require external dependencies such as Python or npm to install.
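When htop or top will not start on an overloaded server, a one-shot view of the heaviest processes can often still be pulled from ps itself. A sketch assuming the Linux procps ps, since the --sort option may not be available in the BSD ps on macOS; the top_procs name and the default of 10 rows are arbitrary choices for illustration.

```shell
# Hypothetical helper: show the N processes using the most CPU (default 10).
# Requires procps ps on Linux; --sort=-%cpu sorts by CPU usage, highest first.
top_procs() {
    n=${1:-10}
    # head keeps the header line plus N process lines.
    ps -eo pid,ppid,stat,%cpu,%mem,comm --sort=-%cpu | head -n "$((n + 1))"
}

top_procs 5

# The same snapshot ordered by memory usage instead.
ps -eo pid,ppid,stat,%cpu,%mem,comm --sort=-%mem | head -n 6
```

Note that procps computes %CPU as the CPU time a process has used divided by the time it has been running, so it is an average over the process's lifetime rather than the instantaneous figure top shows. A long-lived process that only just started spinning may therefore rank lower than expected.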

Conclusion

In the next part we are going to cover less for debugging applications. These parts will be released on a weekly basis; if you want to skip the queue, please buy the book here:

https://www.amazon.com/dp/B0BJC4Y1N1

Please leave comments and share your outage debugging stories.
