How to use the command 'nvidia-smi' (with examples)
The 'nvidia-smi' (NVIDIA System Management Interface) command is a tool provided by NVIDIA for managing and monitoring NVIDIA GPU devices. It provides detailed information about the available GPUs and the processes using them.
Use case 1: Display information on all available GPUs and processes using them
Code:
nvidia-smi
Motivation: This example is useful when you want to quickly check the current status of all GPUs and the processes that are currently using them. It helps in identifying any GPU bottlenecks or processes that are utilizing the GPU extensively.
Explanation: Run without any arguments, the command prints a summary for all available GPUs, including their utilization, memory usage, and the processes currently using them. A couple of ways to keep this view refreshing automatically are shown after the example output.
Example Output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 36C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:1F.0 Off | 0 |
| N/A 31C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
...
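If you want this summary view to refresh on its own rather than re-running the command by hand, two common variants are nvidia-smi's built-in loop flag and the generic 'watch' utility; the 2-second interval below is just an illustrative choice:

# Re-print the summary every 2 seconds using the built-in loop flag
nvidia-smi -l 2

# Same idea using watch, which clears and redraws the screen each time
watch -n 2 nvidia-smi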
Use case 2: Display more detailed GPU information
Code:
nvidia-smi --query
Motivation: This example is useful when you need more detailed information about the GPUs, including their power usage, driver version, CUDA version, and more. It provides a comprehensive overview of the GPU properties.
Explanation: This command with the '--query' argument prints a detailed, text-based report for each GPU, covering its product name, persistence mode, bus ID, display mode, ECC status, temperature, utilization, memory usage, power readings, clock speeds, compute mode, and more. Ways to trim this report or make it script-friendly are shown after the example output.
Example Output:
==============NVSMI LOG==============

Timestamp                                 : ...
Driver Version                            : 460.32.03
CUDA Version                              : 11.2

Attached GPUs                             : 2
GPU 00000000:00:1E.0
    Product Name                          : Tesla K80
    Persistence Mode                      : Disabled
    FB Memory Usage
        Total                             : 11441 MiB
        Used                              : 0 MiB
        Free                              : 11441 MiB
    Temperature
        GPU Current Temp                  : 36 C
    ...
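The full '--query' report is fairly long. If you only need part of it, or want machine-readable output for a script, nvidia-smi also accepts section filters and field queries; the sections and fields chosen below are only examples:

# Limit the detailed report to selected sections
nvidia-smi -q -d MEMORY,UTILIZATION,POWER

# Query specific fields as CSV, handy for logging or scripting
nvidia-smi --query-gpu=name,temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv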
Use case 3: Monitor overall GPU usage with 1-second update interval
Code:
nvidia-smi dmon
Motivation: This example is useful when you want to continuously monitor the overall GPU usage of all available GPUs in real-time. It helps in tracking the GPU performance and utilization over time.
Explanation: The 'dmon' subcommand prints a scrolling, per-device monitoring view that refreshes every second by default. Each row shows a GPU's power draw, temperature, SM and memory utilization, encoder/decoder utilization, and clock speeds. Related monitoring variants are shown after the example output.
Example Output:
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    36     -     0     0     0     0   324   324
    1    26    31     -     0     0     0     0   324   324
...
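A couple of closely related monitoring modes are worth knowing; the interval and metric letters below are arbitrary examples (with 'dmon', 'p' selects power/temperature and 'u' selects utilization):

# Device monitoring every 5 seconds, showing only power/temperature and utilization
nvidia-smi dmon -d 5 -s pu

# Per-process monitoring: which processes are using each GPU and how heavily
nvidia-smi pmon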
Conclusion:
The 'nvidia-smi' command is a versatile tool for managing and monitoring NVIDIA GPUs. With the ability to display GPU information, monitor GPU usage, and provide detailed statistics, it assists in optimizing the performance and troubleshooting of GPU-intensive applications.