How to use the command 'nvidia-smi' (with examples)

The 'nvidia-smi' (NVIDIA System Management Interface) command is a tool provided by NVIDIA for managing and monitoring NVIDIA GPU devices. It reports detailed information about the available GPUs and the processes using them.
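
As a quick orientation before the use cases below, the '-L' flag lists one line per detected GPU (index, model name, and UUID); this is a minimal sketch, and the exact output depends on your hardware and driver:

nvidia-smi -L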

Use case 1: Display information on all available GPUs and processes using them

Code:

nvidia-smi

Motivation: This example is useful when you want to quickly check the current status of all GPUs and the processes using them. It helps identify GPU bottlenecks or processes that are using the GPU heavily.

Explanation: This command without any arguments displays a summary of information for all available GPUs, including their utilization, memory usage, and the processes currently using them.

Example Output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:1F.0 Off |                    0 |
| N/A   31C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

...
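
When the full table is more than you need, the same data can be pulled in a scriptable form. The sketch below assumes the listed field names (the full list is printed by 'nvidia-smi --help-query-gpu') and emits utilization and memory usage as CSV, which is convenient for logging:

nvidia-smi --query-gpu=timestamp,name,utilization.gpu,memory.used,memory.total --format=csv

Appending '-l 5' repeats the query every 5 seconds instead of printing a single snapshot.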

Use case 2: Display more detailed GPU information

Code:

nvidia-smi --query

Motivation: This example is useful when you need more detailed information about the GPUs, including their power usage, driver version, CUDA version, and more. It provides a comprehensive overview of the GPU properties.

Explanation: This command with the '--query' argument displays a wide range of information about the GPU devices, including their name, persistence mode, bus ID, display mode, ECC status, utilization, memory usage, power usage, compute mode, and more.

Example Output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:1F.0 Off |                    0 |
| N/A   31C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
...
...
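
On machines with several GPUs this detailed report gets long. As a sketch, assuming GPU index 0 is the device of interest, the '-d' flag restricts the report to selected sections and '-i' targets a single GPU ('-q' is the short form of '--query'):

nvidia-smi -q -d MEMORY,UTILIZATION -i 0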

Use case 3: Monitor overall GPU usage with a 1-second update interval

Code:

nvidia-smi dmon

Motivation: This example is useful when you want to continuously monitor the overall GPU usage of all available GPUs in real-time. It helps in tracking the GPU performance and utilization over time.

Explanation: This command with the 'dmon' argument starts device monitoring and prints a new sample every second by default. It reports GPU utilization, memory usage, temperature, power draw, and more.

Example Output:

#    timestamp  gpu       pwr_gfx      pwr_mem      pwr_pc      pwr_tot
    1422296980    -             -            -           -            -
    1422296981    0  7840 / 14900   1219 / 875  7821 / 300   9440 / 720
    1422296982    0  7698 / 14900   1209 / 875  7819 / 300   9507 / 720
    1422296983    0  7731 / 14900   1216 / 875  7829 / 300   9547 / 720
...
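
The monitoring loop can be tuned with a few options. As a sketch (the values are illustrative), '-d' sets the sampling interval in seconds, '-c' stops after a fixed number of samples, and '-s' selects the metric groups to report (for example 'p' for power and temperature, 'u' for utilization, 'm' for memory):

nvidia-smi dmon -s pum -d 5 -c 12

For a per-process rather than per-device view, 'nvidia-smi pmon' offers a similar rolling display.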

Conclusion:

The 'nvidia-smi' command is a versatile tool for managing and monitoring NVIDIA GPUs. With the ability to display GPU information, monitor GPU usage, and provide detailed statistics, it helps in optimizing performance and troubleshooting GPU-intensive applications.
