xref: /aosp_15_r20/external/bcc/tools/argdist_example.txt (revision 387f9dfdfa2baef462e92476d413c7bc2470293e)
1Demonstrations of argdist.
2
3
4argdist probes functions you specify and collects parameter values into a
5histogram or a frequency count. This can be used to understand the distribution
6of values a certain parameter takes, filter and print interesting parameters
7without attaching a debugger, and obtain general execution statistics on
8various functions.
9
10For example, suppose you want to find what allocation sizes are common in
11your application:
12
13# ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size'
14[01:42:29]
15p:c:malloc(size_t size):size_t:size
16        COUNT      EVENT
17[01:42:30]
18p:c:malloc(size_t size):size_t:size
19        COUNT      EVENT
20[01:42:31]
21p:c:malloc(size_t size):size_t:size
22        COUNT      EVENT
23        1          size = 16
24[01:42:32]
25p:c:malloc(size_t size):size_t:size
26        COUNT      EVENT
27        2          size = 16
28[01:42:33]
29p:c:malloc(size_t size):size_t:size
30        COUNT      EVENT
31        3          size = 16
32[01:42:34]
33p:c:malloc(size_t size):size_t:size
34        COUNT      EVENT
35        4          size = 16
36^C
37
38It seems that the application is allocating blocks of size 16. The COUNT
39column contains the number of occurrences of a particular event, and the
40EVENT column describes the event. In this case, the "size" parameter was
41probed and its value was 16, repeatedly.
42
43Now, suppose you wanted a histogram of buffer sizes passed to the write()
44function across the system:
45
46# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
47[01:45:22]
48p:c:write(int fd, void *buf, size_t len):size_t:len
49     len                 : count     distribution
50         0 -> 1          : 0        |                                        |
51         2 -> 3          : 2        |*************                           |
52         4 -> 7          : 0        |                                        |
53         8 -> 15         : 2        |*************                           |
54        16 -> 31         : 0        |                                        |
55        32 -> 63         : 6        |****************************************|
56[01:45:23]
57p:c:write(int fd, void *buf, size_t len):size_t:len
58     len                 : count     distribution
59         0 -> 1          : 0        |                                        |
60         2 -> 3          : 11       |***************                         |
61         4 -> 7          : 0        |                                        |
62         8 -> 15         : 4        |*****                                   |
63        16 -> 31         : 0        |                                        |
64        32 -> 63         : 28       |****************************************|
65        64 -> 127        : 12       |*****************                       |
66[01:45:24]
67p:c:write(int fd, void *buf, size_t len):size_t:len
68     len                 : count     distribution
69         0 -> 1          : 0        |                                        |
70         2 -> 3          : 21       |****************                        |
71         4 -> 7          : 0        |                                        |
72         8 -> 15         : 6        |****                                    |
73        16 -> 31         : 0        |                                        |
74        32 -> 63         : 52       |****************************************|
75        64 -> 127        : 26       |********************                    |
76^C
77
78It seems that most writes fall into three buckets: very small writes of 2-3
79bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes.
80
81But these are writes across the board -- what if you wanted to focus on writes
82to STDOUT?
83
84# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
85[01:47:17]
86p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
87     len                 : count     distribution
88         0 -> 1          : 0        |                                        |
89         2 -> 3          : 0        |                                        |
90         4 -> 7          : 0        |                                        |
91         8 -> 15         : 1        |****************************************|
92        16 -> 31         : 0        |                                        |
93        32 -> 63         : 1        |****************************************|
94[01:47:18]
95p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
96     len                 : count     distribution
97         0 -> 1          : 0        |                                        |
98         2 -> 3          : 0        |                                        |
99         4 -> 7          : 0        |                                        |
100         8 -> 15         : 2        |*************                           |
101        16 -> 31         : 0        |                                        |
102        32 -> 63         : 3        |********************                    |
103        64 -> 127        : 6        |****************************************|
104[01:47:19]
105p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
106     len                 : count     distribution
107         0 -> 1          : 0        |                                        |
108         2 -> 3          : 0        |                                        |
109         4 -> 7          : 0        |                                        |
110         8 -> 15         : 3        |*********                               |
111        16 -> 31         : 0        |                                        |
112        32 -> 63         : 5        |***************                         |
113        64 -> 127        : 13       |****************************************|
114^C
115
116The "fd==1" part is a filter that is applied to every invocation of write().
117Only if the filter condition is true, the value is recorded.
118
119You can also use argdist to trace kernel functions. For example, suppose you
120wanted a histogram of kernel allocation (kmalloc) sizes across the system,
121printed twice with 3 second intervals:
122
123# ./argdist -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size'
124[01:50:00]
125p::__kmalloc(size_t size):size_t:size
126     size                : count     distribution
127         0 -> 1          : 0        |                                        |
128         2 -> 3          : 0        |                                        |
129         4 -> 7          : 0        |                                        |
130         8 -> 15         : 6        |****************************************|
131[01:50:03]
132p::__kmalloc(size_t size):size_t:size
133     size                : count     distribution
134         0 -> 1          : 0        |                                        |
135         2 -> 3          : 0        |                                        |
136         4 -> 7          : 0        |                                        |
137         8 -> 15         : 22       |****************************************|
138        16 -> 31         : 0        |                                        |
139        32 -> 63         : 0        |                                        |
140        64 -> 127        : 5        |*********                               |
141       128 -> 255        : 2        |***                                     |
142
143Occasionally, numeric information isn't enough and you want to capture strings.
144What are the strings printed by puts() across the system?
145
146# ./argdist -i 10 -n 1 -C 'p:c:puts(char *str):char*:str'
147[01:53:54]
148p:c:puts(char *str):char*:str
149        COUNT      EVENT
150        2          str = Press ENTER to start.
151
152It looks like the message "Press ENTER to start." was printed twice during the
15310 seconds we were tracing.
154
155What about reads? You could trace gets() across the system and print the
156strings input by the user (note how "r" is used instead of "p" to attach a
157probe to the function's return):
158
159# ./argdist -i 10 -n 1 -C 'r:c:gets():char*:(char*)$retval:$retval!=0'
160[02:12:23]
161r:c:gets():char*:$retval:$retval!=0
162        COUNT      EVENT
163        1          (char*)$retval = hi there
164        3          (char*)$retval = sasha
165        8          (char*)$retval = hello
166
167Similarly, we could get a histogram of the error codes returned by read():
168
169# ./argdist -i 10 -c 1 -H 'r:c:read()'
170[02:15:36]
171r:c:read()
172     retval              : count     distribution
173         0 -> 1          : 29       |****************************************|
174         2 -> 3          : 11       |***************                         |
175         4 -> 7          : 0        |                                        |
176         8 -> 15         : 3        |****                                    |
177        16 -> 31         : 2        |**                                      |
178        32 -> 63         : 22       |******************************          |
179        64 -> 127        : 5        |******                                  |
180       128 -> 255        : 0        |                                        |
181       256 -> 511        : 1        |*                                       |
182       512 -> 1023       : 1        |*                                       |
183      1024 -> 2047       : 0        |                                        |
184      2048 -> 4095       : 2        |**                                      |
185
186In return probes, you can also trace the latency of the function (unless it is
187recursive) and the parameters it had on entry. For example, we can identify
188which processes are performing slow synchronous filesystem reads -- say,
189longer than 0.1ms (100,000ns):
190
191# ./argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
192[01:08:48]
193r::__vfs_read():u32:$PID:$latency > 100000
194        COUNT      EVENT
195        1          $PID = 10457
196        21         $PID = 2780
197[01:08:49]
198r::__vfs_read():u32:$PID:$latency > 100000
199        COUNT      EVENT
200        1          $PID = 10457
201        21         $PID = 2780
202^C
203
204It looks like process 2780 performed 21 slow reads.
205
206You can print the name of the process. This is helpful for short lived processes
207and for easier identification of processes response. For example, we can identify
208the process using the epoll I/O multiplexing system call
209
210# ./argdist -C 't:syscalls:sys_exit_epoll_wait():char*:$COMM'
211[19:57:56]
212t:syscalls:sys_exit_epoll_wait():char*:$COMM
213	COUNT      EVENT
214	4          $COMM = b'node'
215[19:57:57]
216t:syscalls:sys_exit_epoll_wait():char*:$COMM
217	COUNT      EVENT
218	2          $COMM = b'open5gs-sgwud'
219	3          $COMM = b'open5gs-sgwcd'
220	3          $COMM = b'open5gs-nrfd'
221	3          $COMM = b'open5gs-udmd'
222	4          $COMM = b'open5gs-scpd'
223
224Occasionally, entry parameter values are also interesting. For example, you
225might be curious how long it takes malloc() to allocate memory -- nanoseconds
226per byte allocated. Let's go:
227
228# ./argdist -H 'r:c:malloc(size_t size):u64:$latency/$entry(size);ns per byte' -n 1 -i 10
229[01:11:13]
230     ns per byte         : count     distribution
231         0 -> 1          : 0        |                                        |
232         2 -> 3          : 4        |*****************                       |
233         4 -> 7          : 3        |*************                           |
234         8 -> 15         : 2        |********                                |
235        16 -> 31         : 1        |****                                    |
236        32 -> 63         : 0        |                                        |
237        64 -> 127        : 7        |*******************************         |
238       128 -> 255        : 1        |****                                    |
239       256 -> 511        : 0        |                                        |
240       512 -> 1023       : 1        |****                                    |
241      1024 -> 2047       : 1        |****                                    |
242      2048 -> 4095       : 9        |****************************************|
243      4096 -> 8191       : 1        |****                                    |
244
245It looks like a tri-modal distribution. Some allocations are extremely cheap,
246and take 2-15 nanoseconds per byte. Other allocations are slower, and take
24764-127 nanoseconds per byte. And some allocations are slower still, and take
248multiple microseconds per byte.
249
250You could also group results by more than one field. For example, __kmalloc
251takes an additional flags parameter that describes how to allocate memory:
252
253# ./argdist -c -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size'
254[03:42:29]
255p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
256        COUNT      EVENT
257        1          flags = 16, size = 152
258        2          flags = 131280, size = 8
259        7          flags = 131280, size = 16
260[03:42:30]
261p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
262        COUNT      EVENT
263        1          flags = 16, size = 152
264        6          flags = 131280, size = 8
265        19         flags = 131280, size = 16
266[03:42:31]
267p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
268        COUNT      EVENT
269        2          flags = 16, size = 152
270        10         flags = 131280, size = 8
271        31         flags = 131280, size = 16
272[03:42:32]
273p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
274        COUNT      EVENT
275        2          flags = 16, size = 152
276        14         flags = 131280, size = 8
277        43         flags = 131280, size = 16
278^C
279
280The flags value must be expanded by hand, but it's still helpful to eliminate
281certain kinds of allocations or visually group them together.
282
283argdist also has basic support for kernel tracepoints. It is sometimes more
284convenient to use tracepoints because they are documented and don't vary a lot
285between kernel versions. For example, let's trace the net:net_dev_start_xmit
286tracepoint and print out the protocol field from the tracepoint structure:
287
288# argdist -C 't:net:net_dev_start_xmit():u16:args->protocol'
289[13:01:49]
290t:net:net_dev_start_xmit():u16:args->protocol
291        COUNT      EVENT
292        8          args->protocol = 2048
293^C
294
295Note that to discover the format of the net:net_dev_start_xmit tracepoint, you
296use the tplist tool (tplist -v net:net_dev_start_xmit).
297
298
299Occasionally, it is useful to filter certain expressions by string. This is not
300trivially supported by BPF, but argdist provides a STRCMP helper you can use in
301filter expressions. For example, to get a histogram of latencies opening a
302specific file, run this:
303
304# argdist -c -H 'r:c:open(char *file):u64:$latency/1000:STRCMP("test.txt",$entry(file))'
305[02:16:38]
306[02:16:39]
307[02:16:40]
308     $latency/1000       : count     distribution
309         0 -> 1          : 0        |                                        |
310         2 -> 3          : 0        |                                        |
311         4 -> 7          : 0        |                                        |
312         8 -> 15         : 0        |                                        |
313        16 -> 31         : 2        |****************************************|
314[02:16:41]
315     $latency/1000       : count     distribution
316         0 -> 1          : 0        |                                        |
317         2 -> 3          : 0        |                                        |
318         4 -> 7          : 0        |                                        |
319         8 -> 15         : 1        |**********                              |
320        16 -> 31         : 4        |****************************************|
321[02:16:42]
322     $latency/1000       : count     distribution
323         0 -> 1          : 0        |                                        |
324         2 -> 3          : 0        |                                        |
325         4 -> 7          : 0        |                                        |
326         8 -> 15         : 1        |********                                |
327        16 -> 31         : 5        |****************************************|
328[02:16:43]
329     $latency/1000       : count     distribution
330         0 -> 1          : 0        |                                        |
331         2 -> 3          : 0        |                                        |
332         4 -> 7          : 0        |                                        |
333         8 -> 15         : 1        |********                                |
334        16 -> 31         : 5        |****************************************|
335
336
337Here's a final example that finds how many write() system calls are performed
338by each process on the system:
339
340# argdist -c -C 'p:c:write():int:$PID;write per process' -n 2
341[06:47:18]
342write by process
343        COUNT      EVENT
344        3          $PID = 8889
345        7          $PID = 7615
346        7          $PID = 2480
347[06:47:19]
348write by process
349        COUNT      EVENT
350        9          $PID = 8889
351        23         $PID = 7615
352        23         $PID = 2480
353
354
355USAGE message:
356
357# argdist -h
358usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v]
359               [-c] [-T TOP] [-H specifier] [-C[specifier] [-I header]
360
361Trace a function and display a summary of its parameter values.
362
363optional arguments:
364  -h, --help            show this help message and exit
365  -p PID, --pid PID     id of the process to trace (optional)
366  -t TID, --tid TID     id of the thread to trace (optional)
367  -z STRING_SIZE, --string-size STRING_SIZE
368                        maximum string size to read from char* arguments
369  -i INTERVAL, --interval INTERVAL
370                        output interval, in seconds (default 1 second)
371  -d DURATION, --duration DURATION
372			total duration of trace, in seconds
373  -n COUNT, --number COUNT
374                        number of outputs
375  -v, --verbose         print resulting BPF program code before executing
376  -c, --cumulative      do not clear histograms and freq counts at each interval
377  -T TOP, --top TOP     number of top results to show (not applicable to
378                        histograms)
379  -H specifier, --histogram specifier
380                        probe specifier to capture histogram of (see examples
381                        below)
382  -C specifier, --count specifier
383                        probe specifier to capture count of (see examples
384                        below)
385  -I header, --include header
386                        additional header files to include in the BPF program
387                        as either full path, or relative to current working directory,
388                        or relative to default kernel header search path
389
390Probe specifier syntax:
391        {p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
392Where:
393        p,r,t,u    -- probe at function entry, function exit, kernel tracepoint,
394                      or USDT probe
395                      in exit probes: can use $retval, $entry(param), $latency
396        library    -- the library that contains the function
397                      (leave empty for kernel functions)
398        category   -- the category of the kernel tracepoint (e.g. net, sched)
399        signature  -- the function's parameters, as in the C header
400        type       -- the type of the expression to collect (supports multiple)
401        expr       -- the expression to collect (supports multiple)
402        filter     -- the filter that is applied to collected values
403        label      -- the label for this probe in the resulting output
404
405EXAMPLES:
406
407argdist -H 'p::__kmalloc(u64 size):u64:size'
408        Print a histogram of allocation sizes passed to kmalloc
409
410argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
411        Print a frequency count of how many times process 1005 called malloc
412        with an allocation size of 16 bytes
413
414argdist -C 'r:c:gets():char*:$retval#snooped strings'
415        Snoop on all strings returned by gets()
416
417argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
418        Print a histogram of nanoseconds per byte from kmalloc allocations
419
420argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
421        Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC
422
423argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
424        Print frequency counts of how many times writes were issued to a
425        particular file descriptor number, in process 1005, but only show
426        the top 5 busiest fds
427
428argdist -p 1005 -H 'r:c:read()'
429        Print a histogram of error codes returned by read() in process 1005
430
431argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
432        Print frequency of reads by process where the latency was >0.1ms
433
434argdist -C 'r::__vfs_read():u32:$COMM:$latency > 100000'
435        Print frequency of reads by process name where the latency was >0.1ms
436
437argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
438        Print a histogram of read sizes that were longer than 1ms
439
440argdist -H \
441        'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
442        Print a histogram of buffer sizes passed to write() across all
443        processes, where the file descriptor was 1 (STDOUT)
444
445argdist -C 'p:c:fork()#fork calls'
446        Count fork() calls in libc across all processes
447        Can also use funccount.py, which is easier and more flexible
448
449argdist -H 't:block:block_rq_complete():u32:args->nr_sector'
450        Print histogram of number of sectors in completing block I/O requests
451
452argdist -C 't:irq:irq_handler_entry():int:args->irq'
453        Aggregate interrupts by interrupt request (IRQ)
454
455argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
456        Print frequency of function addresses used as a pthread start function,
457        relying on the USDT pthread_start probe in process 1337
458
459argdist -H 'p:c:sleep(u32 seconds):u32:seconds' \
460        -H 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
461        Print histograms of sleep() and nanosleep() parameter values
462
463argdist -p 2780 -z 120 \
464        -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
465        Spy on writes to STDOUT performed by process 2780, up to a string size
466        of 120 characters
467
468argdist -I 'kernel/sched/sched.h' \
469        -C 'p::__account_cfs_rq_runtime(struct cfs_rq *cfs_rq):s64:cfs_rq->runtime_remaining'
470        Trace on the cfs scheduling runqueue remaining runtime. The struct cfs_rq is defined
471        in kernel/sched/sched.h which is in kernel source tree and not in kernel-devel
472        package.  So this command needs to run at the kernel source tree root directory
473        so that the added header file can be found by the compiler.
474
475argdist -C 'p::do_sys_open(int dfd, const char __user *filename, int flags,
476            umode_t mode):char*:filename:STRCMP("sample.txt", filename)'
477        Trace open of the file "sample.txt". It should be noted that 'filename'
478        passed to the do_sys_open is a char * user pointer. Hence parameter
479        'filename' should be tagged with __user for kprobes (const char __user
480        *filename).  This information distinguishes if the 'filename' should be
481        copied from userspace to the bpf stack or from kernel space to the bpf
482        stack.
483