How to use the command bpftrace (with examples)
- Linux
- December 25, 2023
bpftrace is a high-level tracing language for Linux eBPF (Extended Berkeley Packet Filter). It allows users to write programs that trace and profile the operation of the kernel or other processes, providing insights into performance bottlenecks, debugging, and system monitoring. This article will illustrate various use cases of the bpftrace command.
Use case 1: Display bpftrace version
Code:
bpftrace -V
Motivation: This use case is helpful when you want to verify the version of bpftrace installed on your system. It is crucial to ensure compatibility and take advantage of any new features or bug fixes introduced in the latest release.
Explanation:
The -V option is the short form of --version. It prints the installed bpftrace version and exits.
Example output:
bpftrace v0.11.0
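When a script depends on features from a particular release, the version string can be checked before running it. The following is a minimal sketch, not part of bpftrace itself: the helper names bpftrace_version and version_at_least are hypothetical, and version_at_least assumes a sort that supports -V (as in GNU coreutils).

```shell
# Hypothetical helper: pull the first x.y.z number out of a version line,
# e.g. the output of `bpftrace -V`.
bpftrace_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: succeed when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage (requires bpftrace to be installed):
#   installed=$(bpftrace_version "$(bpftrace -V)")
#   version_at_least "$installed" 0.11.0 && echo "new enough"
```

Version sort is used here because a plain string comparison would rank 0.9.4 above 0.11.0.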
Use case 2: List all available probes
Code:
sudo bpftrace -l
Motivation: Listing all available probes can be useful when you want to explore the available options for tracing different aspects of the system or application. It allows you to find specific probes to monitor and understand various events or behaviors.
Explanation:
The -l option lists the probes that bpftrace can attach to. Run the command with superuser privileges (sudo) so that it can read the kernel's probe information.
Example output:
kprobe:sys_read
kprobe:sys_write
uprobe:/lib/x86_64-linux-gnu/libc.so.6:memcpy
...
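The full probe list is long, so it helps to narrow or summarize it. bpftrace -l accepts an optional search pattern with wildcards, and the list can also be piped through standard tools. The sketch below assumes each output line has the form type:name, with the probe type before the first colon; count_probe_types is a hypothetical helper name.

```shell
# Hypothetical helper: read a probe list on stdin and count the probes of
# each type (the text before the first colon on every line).
count_probe_types() {
    awk -F: 'NF > 1 { n[$1]++ } END { for (t in n) print t, n[t] }' | sort
}

# Usage (requires root and an installed bpftrace):
#   sudo bpftrace -l | count_probe_types
#   sudo bpftrace -l 'tracepoint:syscalls:sys_enter_open*'   # pattern search
```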
Use case 3: Run a one-liner program (e.g., syscall count by program)
Code:
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Motivation: This example demonstrates how to write a simple one-liner program that counts the number of syscalls performed by each program running on the system. It can help identify processes that make excessive system calls and optimize their usage.
Explanation:
The -e option runs the program passed as the quoted argument. Here the program attaches to the tracepoint:raw_syscalls:sys_enter probe, which fires on entry to every system call, and increments a counter in the map @[comm], keyed by the name of the calling process (comm). bpftrace prints the map when tracing stops, for example when you press Ctrl-C.
Example output:
Attaching 1 probe...
^C
@[perl]: 14
@[systemd-journald]: 29
@[systemd]: 31
@[bpfcc]: 72
@[sshd]: 97
@[influxd]: 214
@[irqbalance]: 214
...
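bpftrace prints each map entry as @[key]: count once tracing stops. To pick out only the busiest programs from saved output, the lines can be post-processed. The sketch below assumes that line format, and top_counts is a hypothetical helper name.

```shell
# Hypothetical helper: read bpftrace map output on stdin and print the
# N largest entries as "count key", largest first. Lines that are not
# map entries (such as "Attaching 1 probe...") are ignored.
top_counts() {
    sed -n 's/^@\[\(.*\)]: \([0-9][0-9]*\)$/\2 \1/p' | sort -rn | head -n "$1"
}

# Usage (requires root; stop tracing with Ctrl-C):
#   sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' > counts.txt
#   top_counts 3 < counts.txt
```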
Use case 4: Run a program from a file
Code:
sudo bpftrace {{path/to/file}}
Motivation: Running a bpftrace program from a file allows for more complex scripts to be executed. This is especially useful when working on large-scale tracing or complex debugging scenarios that require more extensive scripts.
Explanation:
Passing a file path as the argument makes bpftrace read the program from that file and run it.
Example output: (Assuming the program in the file performs a specific custom tracing)
Attaching 1 probe...
tracepoint:syscalls:sys_enter_openat filename=0x7f086927cd30 flags=0x800 read=5 write=0 (b'/.', c'O_RDONLY|O_DIRECTORY')
tracepoint:syscalls:sys_enter_openat filename=0x7f086927d570 flags=0x800 read=5 write=0 (b'/dev', c'O_RDONLY|O_DIRECTORY')
tracepoint:syscalls:sys_enter_openat filename=0x7f086927cff8 flags=0x800 read=5 write=0 (b'/proc', c'O_RDONLY|O_DIRECTORY')
...
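As an illustration of what such a file might contain, the sketch below writes a small bpftrace program to disk. The file name opens.bt and the program itself are hypothetical and are not the script that produced the output above. The args->filename field access follows the syntax of older releases such as 0.11.

```shell
# Write a hypothetical bpftrace program to a file. The quoted 'EOF' keeps
# the shell from expanding anything inside the here-document.
script="${TMPDIR:-/tmp}/opens.bt"
cat > "$script" <<'EOF'
// Print the process name and the path for every openat() call.
tracepoint:syscalls:sys_enter_openat
{
    printf("%s %s\n", comm, str(args->filename));
}
EOF

# Usage (requires root; stop with Ctrl-C):
#   sudo bpftrace "$script"
```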
Use case 5: Trace a program by PID
Code:
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 123/ { @[comm] = count(); }'
Motivation: Tracing a specific program by its process ID (PID) can be useful when you want to monitor and analyze a particular process’s behavior or performance in real-time.
Explanation:
The filter /pid == 123/, placed between the probe name and the action block, restricts the action to events from the process with PID 123; replace 123 with the PID you want to trace. Each matching syscall increments the counter stored in @[comm].
Example output:
Attaching 1 probe...
^C
@[python]: 27
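Editing the PID inside the quoted program by hand is error-prone, so the program text can be generated instead. This is a minimal sketch; syscalls_by_pid_program is a hypothetical helper name, and the program it prints is the one-liner from this use case with the PID filled in.

```shell
# Hypothetical helper: print the syscall-counting one-liner with the
# filter set to the PID given as $1.
syscalls_by_pid_program() {
    printf 'tracepoint:raw_syscalls:sys_enter /pid == %d/ { @[comm] = count(); }\n' "$1"
}

# Usage (requires root; replace 123 with a real PID):
#   sudo bpftrace -e "$(syscalls_by_pid_program 123)"
```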
Use case 6: Do a dry run and display the output in eBPF format
Code:
sudo bpftrace -d -e '{{one_line_program}}'
Motivation: A dry run with the -d option lets you check that a bpftrace program parses and compiles without attaching it to any probes. The debug information it prints (the program's syntax tree and the generated intermediate code that is later compiled to eBPF bytecode) also helps in understanding how the program is translated.
Explanation:
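With -d, bpftrace prints debug information for the program and exits without attaching to any probes. The program itself is passed in quotes after the -e option. In older releases such as 0.11, -d takes no argument; some newer releases expect a stage name after -d (for example ast), so check bpftrace --help for the form your version accepts.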
Example output:
1: BEGIN
| start: {
| @start_ts = nsecs;
| printf("Probe begin\n");
| }
3: EXIT
| end: {
| @end_ts = nsecs;
| printf("Probe exit\n");
| }
Conclusion:
bpftrace is a powerful tool for tracing and profiling various aspects of a Linux system. By understanding different use cases and examples of the bpftrace command, users can harness its capabilities to gain insights into system behavior, optimize performance, and troubleshoot issues effectively.