How can I see i/o stats for a briefly running process?

For long running processes like init, I can do things like

$ cat /proc/[pid]/io

What can I do if I want to see stats for a briefly running process such as a command line utility like ls? I don’t even know how to see the pid for such a briefly running process…

Asked By: labyrinth

You can start the command in the background, then get its PID via the $! variable.

Example:

$ ls & cat /proc/$!/io
[1] 6410
rchar: 7004
wchar: 0
syscr: 13
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
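
Note that with something as short-lived as ls, the process can be gone before cat even looks at /proc/$!/io, in which case the file no longer exists. A minimal variant using a somewhat longer-running command (the find invocation here is only an example) makes that race less likely:

$ find / -name '*.conf' > /dev/null 2>&1 &
$ cat /proc/$!/io
$ wait
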
Answered By: cuonglm

Basically, it sounds like you want general advice on profiling an application’s I/O at runtime. You’ve been doing this with /proc/$PID/io, which gives you an idea of how much bandwidth is moving between disk and memory for the application. Polling that file tells you roughly what the process is doing, but it’s an incomplete picture since it only shows how much data is being shoved to and from disk.
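
If the process sticks around long enough, a simple loop over that file gives you a crude time series. A minimal polling sketch (the PID and the one-second interval are placeholders):

$ pid=1234                      # placeholder: the PID you want to watch
$ while [ -d /proc/$pid ]; do cat /proc/$pid/io; sleep 1; done
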

To solve your stated problem, you basically have the following options:

  • Use platform instrumentation. On Linux, writing a SystemTap script is the most feature-complete solution, but depending on how hardcore you want to get, that may be more work than you’re willing to expend for the benefit.

  • Use application-based instrumentation. There are a lot of ways to do this, but gprof is a popular option. Some applications may also provide their own instrumentation, but you’d have to check. (A minimal gprof workflow is sketched just after this list.)
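
As a rough illustration of the application-based route, the usual gprof workflow looks something like this (myprog.c stands in for your own source; gprof mostly reports where time is spent rather than I/O volume):

$ gcc -pg -o myprog myprog.c      # build with profiling instrumentation
$ ./myprog                        # run normally; writes gmon.out on exit
$ gprof ./myprog gmon.out | less  # inspect the flat profile and call graph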

Probably the best alternative is to combine existing platform instrumentation tools to achieve the desired effect and get the most out of them.


I’m not aware of a program that will kick off an application and do all of this for you (that doesn’t mean one doesn’t exist, just that I haven’t heard of it), so your best bet is to start gathering system-wide information and filter for the PID you’re concerned about after the fact (so that you get a full sample).

First things first, I would enable auditing of execve calls so that you can save the PID of the application you’re kicking off. Once you have the PID, you can remove auditing.
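
As a sketch of what that could look like with auditd (the key name procio is arbitrary, and ls stands in for whatever you are launching):

$ sudo auditctl -a always,exit -F arch=b64 -S execve -k procio
$ ls                                                            # the command you want to trace
$ sudo ausearch -k procio -i | less                             # find the execve record and its pid= field
$ sudo auditctl -d always,exit -F arch=b64 -S execve -k procio  # drop the rule again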

Run mount -t debugfs debugfs /sys/kernel/debug to mount debugfs if it isn’t mounted already; blktrace needs it.
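
On many systems debugfs is already mounted, so a quick check avoids a spurious error (mountpoint is part of util-linux):

$ mountpoint -q /sys/kernel/debug || sudo mount -t debugfs debugfs /sys/kernel/debug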

On my system I ran blktrace -d /dev/sda -a read -a write -o - | blkparse -i - but you can adjust accordingly. Here is some example blktrace output:

8,0   15        3 1266874889.709440165 32679  Q   W 20511277 + 8 [rpc.mountd]

In the above output, the fifth column (32679) is the PID of the application performing the write. The parts we care about are the Q (event type: queued), the W (RWBS field; W means it’s a write, and since there’s no S in that field the implication is that it was asynchronous), and the 20511277 + 8 (the operation starts at block 20511277 and continues for another eight blocks). Determining read/write sizes is just a matter of adding the blocks together and multiplying by the block size.
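
If you would rather get a total than read the events by eye, a small awk filter over the blkparse output can sum the queued sectors for one PID. A sketch only: the PID is the one from the example above, the columns are blkparse’s default layout, the 30-second window is arbitrary, and 512-byte sectors are assumed:

$ sudo blktrace -d /dev/sda -a read -a write -w 30   # writes sda.blktrace.* trace files
$ blkparse -i sda | awk -v pid=32679 '
      $5 == pid && $6 == "Q" { sectors += $10 }
      END { printf "%.1f KiB queued for PID %s\n", sectors * 512 / 1024, pid }'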

blktrace will also tell you about more than just throughput; it will let you see whether anything is going on with merges that you care about.

Once you have blktrace running, you can spawn the process under strace -c, which will give you a feel for the average latency associated with each system call (including read and write operations). Depending on how reliable each invocation needs to be, latency can be important, and the summary can also tell you more about what the application is doing (pointing out areas worth tuning) without any application instrumentation at all.
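
For reference, the syscall-summary run could look like this (ls /tmp stands in for whatever you are measuring; -f follows any children and -o writes the summary table to a file):

$ strace -c -f -o strace-summary.txt ls /tmp
$ cat strace-summary.txt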

Between those two you should get a pretty good sample of what your program is doing, without losing data or accidentally counting the I/O of other applications. Obviously there are more ways to do this than what I’ve described, but this is how I would have solved the problem.

One should also be able to collect I/O-related latency measures by messing with blkparse’s output options, for instance. I just didn’t, because I haven’t played with them enough.

Answered By: Bratchley