GPU usage monitoring (CUDA)
I installed the CUDA toolkit on my computer and started a BOINC project on the GPU. In BOINC I can see that it is running on the GPU, but is there a tool that can show me more details about what is running on the GPU – GPU usage and memory usage?
For Intel GPUs there is intel-gpu-tools from the http://intellinuxgraphics.org/ project, which provides the command intel_gpu_top (amongst other things). It is similar to top and htop, but specifically for the Intel GPU.
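On Debian/Ubuntu the package is typically named intel-gpu-tools (an assumption for other distributions), and intel_gpu_top usually needs root:
$ sudo apt-get install intel-gpu-tools
$ sudo intel_gpu_top
Sample output: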
render busy: 18%: ███▋ render space: 39/131072
bitstream busy: 0%: bitstream space: 0/131072
blitter busy: 28%: █████▋ blitter space: 28/131072
task percent busy
GAM: 33%: ██████▋ vert fetch: 0 (0/sec)
GAFS: 3%: ▋ prim fetch: 0 (0/sec)
VS: 0%: VS invocations: 559188 (150/sec)
SF: 0%: GS invocations: 0 (0/sec)
VF: 0%: GS prims: 0 (0/sec)
DS: 0%: CL invocations: 186396 (50/sec)
CL: 0%: CL prims: 186396 (50/sec)
SOL: 0%: PS invocations: 8191776208 (38576436/sec)
GS: 0%: PS depth pass: 8158502721 (38487525/sec)
HS: 0%:
TE: 0%:
GAFM: 0%:
SVG: 0%:
For Nvidia GPUs there is the tool nvidia-smi, which can show memory usage, GPU utilization and the temperature of the GPU. There is also a list of compute processes and a few more options, but my graphics card (GeForce 9600 GT) is not fully supported.
Sun May 13 20:02:49 2012
+------------------------------------------------------+
| NVIDIA-SMI 3.295.40 Driver Version: 295.40 |
|-------------------------------+----------------------+----------------------+
| Nb. Name | Bus Id Disp. | Volatile ECC SB / DB |
| Fan Temp Power Usage /Cap | Memory Usage | GPU Util. Compute M. |
|===============================+======================+======================|
| 0. GeForce 9600 GT | 0000:01:00.0 N/A | N/A N/A |
| 0% 51 C N/A N/A / N/A | 90% 459MB / 511MB | N/A Default |
|-------------------------------+----------------------+----------------------|
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0. Not Supported |
+-----------------------------------------------------------------------------+
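To print just the fields mentioned above (memory, utilization, temperature) as a readable report, nvidia-smi's display filter can be used; a sketch, with section names taken from nvidia-smi --help (availability depends on the card and driver):
$ nvidia-smi -q -d MEMORY,UTILIZATION,TEMPERATURE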
For OS X:
- Including Mountain Lion
- Excluding Mountain Lion
The last version of atMonitor to support GPU-related features is atMonitor 2.7.1 – and the link to 2.7.1 actually delivers 2.7b.
For the more recent version of the app, the atMonitor FAQ explains:
To make atMonitor compatible with MacOS 10.8 we have removed all GPU related features.
I experimented with 2.7b, a.k.a. 2.7.1, on Mountain Lion with a MacBookPro5,2 that has an NVIDIA GeForce 9600M GT. The app ran for a few seconds before quitting; it showed temperature but not usage.
For Linux, nvidia-smi -l 1 will continually give you GPU usage info, with a refresh interval of 1 second.
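Newer drivers also ship a dedicated device-monitoring subcommand; a sketch selecting the utilization and memory columns (flags per nvidia-smi dmon --help; not supported on all cards):
$ nvidia-smi dmon -s um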
For completeness, AMD has two options:
- fglrx (closed-source drivers):
$ aticonfig --odgc --odgt
- mesa (open-source drivers): you can use RadeonTop to view your GPU utilization, both the total activity percent and individual blocks (a non-interactive dump sketch follows below).
$ sudo apt-get -y install radeontop; radeontop
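RadeonTop can also log without the interactive UI; a sketch, assuming -d selects the dump target (- for stdout) and -l limits the number of samples, per radeontop --help:
$ radeontop -d - -l 10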
For Nvidia on Linux I use the following Python script, which takes an optional delay and repeats like iostat and vmstat:
https://gist.github.com/matpalm/9c0c7c6a6f3681a0d39d
$ gpu_stat.py 1 2
{"util":{"PCIe":"0", "memory":"10", "video":"0", "graphics":"11"}, "used_mem":"161", "time": 1424839016}
{"util":{"PCIe":"0", "memory":"10", "video":"0", "graphics":"9"}, "used_mem":"161", "time":1424839018}
nvidia-smi does not work on some Linux machines (it returns N/A for many properties). You can use nvidia-settings instead (this is also what Mat Kelcey used in his Python script).
nvidia-settings -q GPUUtilization -q useddedicatedgpumemory
You can also use:
watch -n0.1 "nvidia-settings -q GPUUtilization -q useddedicatedgpumemory"
for continuous monitoring.
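For scripting, the terse output mode can strip the attribute decoration; a sketch, assuming -t behaves as documented in nvidia-settings --help:
$ nvidia-settings -q '[gpu:0]/GPUUtilization' -t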
Recently I have written a simple command-line utility called gpustat (a wrapper around nvidia-smi): please take a look at https://github.com/wookayin/gpustat.
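A typical invocation sketch (flags per gpustat --help: -c shows the command name, -p the PID, and -i refreshes continuously):
$ pip install gpustat
$ gpustat -cp -i 1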
For Linux, I use this htop-like tool that I wrote myself. It monitors and gives an overview of the GPU temperature as well as the core/VRAM/PCI-E and memory bus usage. It does not monitor what's running on the GPU, though.
I have a GeForce GTX 1060 video card, and I found that the following command gives me info about card utilization, temperature, fan speed and power consumption:
$ nvidia-smi --format=csv --query-gpu=power.draw,utilization.gpu,fan.speed,temperature.gpu
You can see list of all query options with:
$ nvidia-smi --help-query-gpu
I have had processes terminate (probably killed or crashed) and continue to use resources without being listed in nvidia-smi. Usually these processes were just taking GPU memory.
If you think you have a process using resources on a GPU and it is not being shown in nvidia-smi, you can try running this command to double-check. It will show you which processes are using your GPUs.
sudo fuser -v /dev/nvidia*
This works on EL7; Ubuntu or other distributions might have their nvidia devices listed under another name/location.
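If one of those orphaned processes is the culprit, fuser can also kill everything holding the device; a sketch, to be used with care (-k sends a signal, SIGKILL by default, to every process using the files):
$ sudo fuser -k /dev/nvidia*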
The following function appends information such as the PID, user name, CPU usage, memory usage, GPU memory usage, program arguments and run time of processes being run on the GPU to the output of nvidia-smi:
function better-nvidia-smi () {
    nvidia-smi
    join -1 1 -2 3 \
        <(nvidia-smi --query-compute-apps=pid,used_memory \
                     --format=csv \
          | sed "s/ //g" | sed "s/,/ /g" \
          | awk 'NR<=1 {print toupper($0)} NR>1 {print $0}' \
          | sed "/\[NotSupported\]/d" \
          | awk 'NR<=1{print $0;next}{print $0| "sort -k1"}') \
        <(ps -a -o user,pgrp,pid,pcpu,pmem,time,command \
          | awk 'NR<=1{print $0;next}{print $0| "sort -k3"}') \
    | column -t
}
Example output:
$ better-nvidia-smi
Fri Sep 29 16:52:58 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.13 Driver Version: 378.13 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 730 Off | 0000:01:00.0 N/A | N/A |
| 32% 49C P8 N/A / N/A | 872MiB / 976MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 Graphics Device Off | 0000:06:00.0 Off | N/A |
| 23% 35C P8 17W / 250W | 199MiB / 11172MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 5113 C python 187MiB |
+-----------------------------------------------------------------------------+
PID USED_GPU_MEMORY[MIB] USER PGRP %CPU %MEM TIME COMMAND
9178 187MiB tmborn 9175 129 2.6 04:32:19 ../path/to/python script.py args 42
Glances has a plugin which shows GPU utilization and memory usage: http://glances.readthedocs.io/en/stable/aoa/gpu.html
It uses the nvidia-ml-py3 library: https://pypi.python.org/pypi/nvidia-ml-py3
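A minimal setup sketch, assuming pip and that the plugin activates automatically once the NVIDIA bindings are present:
$ pip install glances nvidia-ml-py3
$ glances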
You can use nvtop; it's similar to htop, but for NVIDIA GPUs. Link: https://github.com/Syllo/nvtop
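nvtop is packaged in many distributions; a sketch for Debian/Ubuntu (package name assumed; building from source per the repository README also works):
$ sudo apt install nvtop
$ nvtop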
This script is more readable and is designed for easy mods and extensions.
You can replace gnome-terminal with your favorite terminal window program.
#! /bin/bash
if [ "$1" = "--guts" ]; then
echo; echo " ctrl-c to gracefully close"
f "$a"
f "$b"
exit 0; fi
# easy to customize here using "nvidia-smi --help-query-gpu" as a guide
a='--query-gpu=pstate,memory.used,utilization.memory,utilization.gpu,encoder.stats.sessionCount'
b='--query-gpu=encoder.stats.averageFps,encoder.stats.averageLatency,temperature.gpu,power.draw'
p=0.5 # refresh period in seconds
s=110x9 # view port as width_in_chars x line_count
c="s/^/ /; s/, +/t/g"
t="`echo '' |tr 'n' 't'`"
function f() { echo; nvidia-smi --format=csv "$1" |sed -r "$c" |column -t "-s$t" "-o "; }
export c t a b; export -f f
gnome-terminal --hide-menubar --geometry=$s -- watch -t -n$p "`readlink -f "$0"`" --guts
#
License: GNU GPLv2, TranSeed Research
You can use
nvidia-smi pmon -i 0
to monitor every process on GPU 0, including compute/graphics mode, SM usage, memory usage, encoder usage and decoder usage.
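pmon also accepts metric and timing options; a sketch (flags per nvidia-smi pmon --help: -s selects the utilization/memory stats, -d the delay in seconds, -c the sample count; not supported on all cards):
$ nvidia-smi pmon -i 0 -s um -d 1 -c 10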
I didn't see it in the available answers (except maybe in a comment), so I thought I'd add that you can get a nicer, refreshing nvidia-smi with watch. This refreshes the screen with each update rather than scrolling constantly.
watch -n 1 nvidia-smi
for one-second interval updates. Replace the 1 with whatever you want, including fractional seconds:
watch -n 5 nvidia-smi
watch -n 0.1 nvidia-smi
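watch can also highlight what changed between refreshes, which makes utilization spikes easier to spot (the -d flag is standard in procps watch):
watch -d -n 1 nvidia-smi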
Recently, I have written a monitoring tool called nvitop, the interactive NVIDIA-GPU process viewer. It is written in pure Python and is easy to install.
Install from PyPI:
pip3 install --upgrade nvitop
Install the latest version from GitHub:
pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop
Run as a resource monitor:
nvitop
nvitop will show the GPU status like nvidia-smi but with additional fancy bars and history graphs.
For the processes, it will use psutil to collect process information and display the USER, %CPU, %MEM, TIME and COMMAND fields, which is much more detailed than nvidia-smi. Besides, it is responsive to user inputs in monitor mode: you can interrupt or kill your processes on the GPUs.
nvitop comes with a tree-view screen and an environment screen.
In addition, nvitop can be integrated into other applications. For example, you can integrate it into PyTorch training code:
import os
from nvitop.core import host, CudaDevice, HostProcess, GpuProcess
from torch.utils.tensorboard import SummaryWriter

device = CudaDevice(0)
this_process = GpuProcess(os.getpid(), device)
writer = SummaryWriter()
for epoch in range(n_epochs):  # n_epochs defined by your training setup
    # some training code here
    # ...

    this_process.update_gpu_status()
    writer.add_scalars(
        'monitoring',
        {
            'device/memory_used': float(device.memory_used()) / (1 << 20),  # convert bytes to MiBs
            'device/memory_percent': device.memory_percent(),
            'device/memory_utilization': device.memory_utilization(),
            'device/gpu_utilization': device.gpu_utilization(),
            'host/cpu_percent': host.cpu_percent(),
            'host/memory_percent': host.virtual_memory().percent,
            'process/cpu_percent': this_process.cpu_percent(),
            'process/memory_percent': this_process.memory_percent(),
            'process/used_gpu_memory': float(this_process.gpu_memory()) / (1 << 20),  # convert bytes to MiBs
            'process/gpu_sm_utilization': this_process.gpu_sm_utilization(),
            'process/gpu_memory_utilization': this_process.gpu_memory_utilization(),
        },
        global_step=epoch,  # or your own global step counter
    )
See https://github.com/XuehaiPan/nvitop for more details.
Note: nvitop is dual-licensed under the GPLv3 and Apache-2.0 licenses. Please feel free to use it as a dependency for your own projects. See the Copyright Notice for more details.
You can use "GPU Dashboards in Jupyter Lab". From the introduction:
NVDashboard is an open-source package for the real-time visualization of NVIDIA GPU metrics in interactive Jupyter Lab environments. NVDashboard is a great way for all GPU users to monitor system resources. However, it is especially valuable for users of RAPIDS, NVIDIA's open-source suite of GPU-accelerated data-science software libraries. (ref)
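A setup sketch (the jupyterlab-nvdashboard package name comes from the project's documentation; older JupyterLab versions may also need a labextension install step):
$ pip install jupyterlab-nvdashboard
$ jupyter lab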
To monitor GPU usage in real time, you can use the nvidia-smi command with the --loop option on systems with NVIDIA GPUs. Open a terminal and run the following command:
nvidia-smi --query-gpu=timestamp,name,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv --loop=1
This command will display GPU usage information in real time with a refresh interval of 1 second (you can change the interval by modifying the value after --loop=). The displayed information includes the timestamp, GPU name, GPU utilization, memory utilization, total memory, free memory, and used memory.