Reading /proc/stat: CPU Monitoring from the Ground Up

After watching a great bash tutorial series, I went down a rabbit hole digging into exactly how Linux exposes CPU stats through the /proc filesystem, and ended up building on top of it.

linux, bash, devops, systems, monitoring

A few weeks ago I stumbled onto You Suck At Programming's bash learning series. Dave is one of those programmers who makes you feel dumb in the best possible way, the kind of person whose 20 minutes of working on screen is worth more than most tutorials. A big chunk of the proc-cpu-monitor project started with his code from the series.

I liked the concept a lot, so I kept poking at it and here is what I learned.


The /proc filesystem

The /proc filesystem in Linux is a pseudo-filesystem - it exists purely in memory and provides a live interface into kernel data structures. There's no disk involved. What looks like a file is really the kernel answering a read request with a snapshot of its internal state.

"The /proc filesystem in Linux is a pseudo-filesystem that provides an interface to kernel data structures. Unlike regular filesystems that store data on disk, /proc exists purely in memory, exposing kernel and process information as files and directories." — The Linux Kernel Documentation

You can see this yourself:

cat /proc/stat

What comes back is a plain text dump of aggregate CPU statistics, everything accumulated since the system first booted. Those cumulative numbers are what we work with.


Breaking down /proc/stat

Each cpu line in /proc/stat gives you ten fields. Here's what they actually mean:

user - normal processes in user mode

nice - niced processes in user mode

system - kernel mode time

idle - doing nothing

iowait - waiting for I/O

irq - servicing hardware interrupts

softirq - servicing software interrupts

steal - time stolen by the hypervisor (involuntary wait while running as a guest)

guest - time spent running a virtual CPU for a guest OS

guest_nice - time spent running a niced guest
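To see these fields with their names attached, a quick awk one-liner over the aggregate line works (field order as listed above, per the proc man page):

```shell
# Print the ten fields of the aggregate "cpu" line with labels.
# The pattern /^cpu / (with a trailing space) skips the per-core cpu0, cpu1, ... lines.
awk '/^cpu / {
  printf "user=%s nice=%s system=%s idle=%s iowait=%s irq=%s softirq=%s steal=%s guest=%s guest_nice=%s\n",
         $2, $3, $4, $5, $6, $7, $8, $9, $10, $11
}' /proc/stat
```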

All of these are cumulative tick counts. To get a meaningful CPU percentage you need to take two snapshots, compute the delta on each field, then divide by the total delta. That's the core loop in the script.
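That snapshot-delta loop can be sketched in bash like this. It's a simplified version, not the project's exact code; `snapshot` and `cpu_percent` are names I made up for the sketch:

```shell
# Sum the aggregate "cpu" line into busy and idle tick counts.
# Field order (man 5 proc): user nice system idle iowait irq softirq steal ...
# guest/guest_nice are already folded into user/nice, so we leave them out.
snapshot() {
  local _label user nice system idle iowait irq softirq steal rest
  read -r _label user nice system idle iowait irq softirq steal rest < /proc/stat
  echo "$((user + nice + system + irq + softirq + steal)) $((idle + iowait))"
}

# Busy percentage between two snapshots: delta(busy) / delta(total).
cpu_percent() {   # usage: cpu_percent busy1 idle1 busy2 idle2
  local busy=$(( $3 - $1 )) idle=$(( $4 - $2 )) total
  total=$(( busy + idle ))
  if (( total > 0 )); then echo $(( 100 * busy / total )); else echo 0; fi
}

read -r b1 i1 <<< "$(snapshot)"
sleep 1
read -r b2 i2 <<< "$(snapshot)"
echo "CPU: $(cpu_percent "$b1" "$i1" "$b2" "$i2")%"
```

Integer math truncates, which is fine for a one-second display loop; scale by 10000 first if you want decimal places.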


What is up with iowait

iowait deserves its own section because it's genuinely weird. The naive reading is "percentage of time the CPU was waiting for I/O", but that's not quite right.

The real definition: the percentage of time the system was idle while at least one process was waiting for disk I/O to finish.

That distinction matters for a few reasons:

  • CPUs don't actually wait. When a task blocks on I/O, the kernel schedules something else. The CPU keeps running. iowait is time that was idle, not time that was blocked.
  • Multi-core makes it worse. If you have 8 cores and one process is waiting on disk, the waiting task isn't running on any CPU. The other 7 cores have no idea. So iowait gets diluted or miscounted depending on which CPU the kernel attributes idle time to.

This means iowait can undercount on multi-core machines. A system doing heavy I/O might show surprisingly low %wa just because the other cores were busy with other work.

That said, iowait still has real signal. If a service is performing poorly while CPU usage looks healthy, elevated iowait can be the clue that sends you to the disk:

# Cross-reference with iostat to confirm
iostat -x 1

# vmstat shows %wa column too
vmstat 1

If you see high %util and await times on a disk (iostat) alongside high %wa (vmstat), you've likely got an I/O bottleneck. In database contexts, common culprits are missing indexes that force full table scans, or queries doing large sequential reads that saturate the disk before the CPU even gets involved.

iowait won't tell you which query is the problem, but it'll tell you where to start looking.


What I added to the original script

The base from YSAP reads /proc/stat and renders a CPU usage graph in the terminal. I extended it in three directions:

System uptime via /proc/uptime — adds a formatted uptime counter to the display header. The file gives you two floats: total uptime and idle time. A bit of bash arithmetic formats it into Xd Xh Xm Xs.
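The arithmetic is just repeated division and modulo on the first field. A minimal sketch (the helper name is mine, not the project's):

```shell
# Format a whole number of seconds as "Xd Xh Xm Xs".
format_uptime() {   # usage: format_uptime SECONDS
  local secs=$1
  printf '%dd %dh %dm %ds' \
    $(( secs / 86400 )) $(( secs % 86400 / 3600 )) \
    $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# /proc/uptime holds two floats: total uptime and aggregate idle time.
read -r up _idle < /proc/uptime
echo "up $(format_uptime "${up%.*}")"   # ${up%.*} strips the fractional part
```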

Load average bar via /proc/loadavg — the file exposes the 1-, 5-, and 15-minute load averages as its first three fields. I took the 1-minute value and rendered it as a proportional bar next to the CPU graph, so you can see load trend at a glance without doing mental math.
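A sketch of that bar, assuming the load is scaled against the core count (the actual script's scaling may differ). Since bash has no floats, it leans on /proc/loadavg always printing two decimal places:

```shell
# Render a load value as a fixed-width proportional bar.
load_bar() {   # usage: load_bar LOAD CORES WIDTH   e.g. load_bar "0.42" 8 20
  local hundredths=$(( 10#${1/./} ))   # "0.42" -> 42; 10# guards against octal
  local max=$(( $2 * 100 ))            # full bar = all cores busy
  local filled=$(( hundredths * $3 / max ))
  (( filled > $3 )) && filled=$3       # clamp when load exceeds core count
  local bar="" i
  for (( i = 0; i < $3; i++ )); do
    if (( i < filled )); then bar+="#"; else bar+="-"; fi
  done
  echo "[$bar]"
}

read -r one _rest < /proc/loadavg
load_bar "$one" "$(nproc)" 20
```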

Color coding — the CPU usage bar changes color based on thresholds: green when usage is low, yellow when it's moderate, red when it's high. It's implemented with ANSI escape codes. Turns out making a terminal tool look decent is mostly just knowing which escape sequences to use.
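The threshold logic is a few lines. The cutoffs below are illustrative, not the project's exact values:

```shell
# Emit the ANSI SGR color code for a given usage percentage.
color_for() {   # usage: color_for PERCENT
  local pct=$1
  if   (( pct < 50 )); then printf '\033[32m'   # green
  elif (( pct < 80 )); then printf '\033[33m'   # yellow
  else                      printf '\033[31m'   # red
  fi
}

pct=85
printf '%b%s%%%b\n' "$(color_for "$pct")" "$pct" '\033[0m'   # reset afterwards
```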

