Demonstrations of argdist.


argdist probes functions you specify and collects parameter values into a
histogram or a frequency count. This can be used to understand the distribution
of values a certain parameter takes, filter and print interesting parameters
without attaching a debugger, and obtain general execution statistics on
various functions.

For example, suppose you want to find what allocation sizes are common in
your application:

# ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size'
[01:42:29]
p:c:malloc(size_t size):size_t:size
        COUNT      EVENT
[01:42:30]
p:c:malloc(size_t size):size_t:size
        COUNT      EVENT
[01:42:31]
p:c:malloc(size_t size):size_t:size
        COUNT      EVENT
        1          size = 16
[01:42:32]
p:c:malloc(size_t size):size_t:size
        COUNT      EVENT
        2          size = 16
[01:42:33]
p:c:malloc(size_t size):size_t:size
        COUNT      EVENT
        3          size = 16
[01:42:34]
p:c:malloc(size_t size):size_t:size
        COUNT      EVENT
        4          size = 16
^C

It seems that the application is allocating blocks of size 16. The COUNT
column contains the number of occurrences of a particular event, and the
EVENT column describes the event. In this case, the "size" parameter was
probed and its value was 16, repeatedly.

Now, suppose you wanted a histogram of buffer sizes passed to the write()
function across the system:

# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
[01:45:22]
p:c:write(int fd, void *buf, size_t len):size_t:len
     len                 : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 2        |*************                           |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 2        |*************                           |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 6        |****************************************|
[01:45:23]
p:c:write(int fd, void *buf, size_t len):size_t:len
     len                 : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 11       |***************                         |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 4        |*****                                   |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 28       |****************************************|
        64 -> 127        : 12       |*****************                       |
[01:45:24]
p:c:write(int fd, void *buf, size_t len):size_t:len
     len                 : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 21       |****************                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 6        |****                                    |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 52       |****************************************|
        64 -> 127        : 26       |********************                    |
^C

It seems that most writes fall into three buckets: very small writes of 2-3
bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes.
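
The same histogram can also be limited to a single process with the -p switch
used in the malloc example above. This is only a sketch -- substitute the PID
of the process you actually care about:

# ./argdist -p 2420 -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'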

But these are writes across the board -- what if you wanted to focus on writes
to STDOUT?

# ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
[01:47:17]
p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
     len                 : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |****************************************|
        16 -> 31         : 0        |                                        |
        32 -> 63         : 1        |****************************************|
[01:47:18]
p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
     len                 : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 2        |*************                           |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 3        |********************                    |
        64 -> 127        : 6        |****************************************|
[01:47:19]
p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
     len                 : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 3        |*********                               |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 5        |***************                         |
        64 -> 127        : 13       |****************************************|
^C

The "fd==1" part is a filter that is applied to every invocation of write().
The value is recorded only if the filter condition is true.

You can also use argdist to trace kernel functions. For example, suppose you
wanted a histogram of kernel allocation (kmalloc) sizes across the system,
printed twice with 3 second intervals:

# ./argdist -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size'
[01:50:00]
p::__kmalloc(size_t size):size_t:size
     size                : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 6        |****************************************|
[01:50:03]
p::__kmalloc(size_t size):size_t:size
     size                : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 22       |****************************************|
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 5        |*********                               |
       128 -> 255        : 2        |***                                     |

Occasionally, numeric information isn't enough and you want to capture strings.
What are the strings printed by puts() across the system?

# ./argdist -i 10 -n 1 -C 'p:c:puts(char *str):char*:str'
[01:53:54]
p:c:puts(char *str):char*:str
        COUNT      EVENT
        2          str = Press ENTER to start.

It looks like the message "Press ENTER to start." was printed twice during the
10 seconds we were tracing.

What about reads? You could trace gets() across the system and print the
strings input by the user (note how "r" is used instead of "p" to attach a
probe to the function's return):

# ./argdist -i 10 -n 1 -C 'r:c:gets():char*:(char*)$retval:$retval!=0'
[02:12:23]
r:c:gets():char*:$retval:$retval!=0
        COUNT      EVENT
        1          (char*)$retval = hi there
        3          (char*)$retval = sasha
        8          (char*)$retval = hello

Similarly, we could get a histogram of the values returned by read():

# ./argdist -i 10 -n 1 -H 'r:c:read()'
[02:15:36]
r:c:read()
     retval              : count     distribution
         0 -> 1          : 29       |****************************************|
         2 -> 3          : 11       |***************                         |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 3        |****                                    |
        16 -> 31         : 2        |**                                      |
        32 -> 63         : 22       |******************************          |
        64 -> 127        : 5        |******                                  |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |*                                       |
       512 -> 1023       : 1        |*                                       |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 2        |**                                      |
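
When a return probe has no explicit expression, $retval is collected by
default, as in the read() example above. To focus only on failing reads, a
hedged variant (untested, but following the specifier grammar shown in the
USAGE message at the end) would collect $retval with an explicit filter:

# ./argdist -i 10 -n 1 -C 'r:c:read():long:$retval:$retval<0'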

In return probes, you can also trace the latency of the function (unless it is
recursive) and the parameters it had on entry. For example, we can identify
which processes are performing slow synchronous filesystem reads -- say,
longer than 0.1ms (100,000ns):

# ./argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
[01:08:48]
r::__vfs_read():u32:$PID:$latency > 100000
        COUNT      EVENT
        1          $PID = 10457
        21         $PID = 2780
[01:08:49]
r::__vfs_read():u32:$PID:$latency > 100000
        COUNT      EVENT
        1          $PID = 10457
        21         $PID = 2780
^C

It looks like process 2780 performed 21 slow reads.

You can also print the name of the process. This is helpful for short-lived
processes and makes it easier to identify which process is responsible. For
example, we can identify the processes using the epoll I/O multiplexing
system call:

# ./argdist -C 't:syscalls:sys_exit_epoll_wait():char*:$COMM'
[19:57:56]
t:syscalls:sys_exit_epoll_wait():char*:$COMM
        COUNT      EVENT
        4          $COMM = b'node'
[19:57:57]
t:syscalls:sys_exit_epoll_wait():char*:$COMM
        COUNT      EVENT
        2          $COMM = b'open5gs-sgwud'
        3          $COMM = b'open5gs-sgwcd'
        3          $COMM = b'open5gs-nrfd'
        3          $COMM = b'open5gs-udmd'
        4          $COMM = b'open5gs-scpd'

Occasionally, entry parameter values are also interesting. For example, you
might be curious how long it takes malloc() to allocate memory -- nanoseconds
per byte allocated. Here, $entry(size) captures the size argument when malloc()
is entered, and $latency is the time from entry to return, so dividing the two
gives nanoseconds per byte. Let's go:

# ./argdist -H 'r:c:malloc(size_t size):u64:$latency/$entry(size);ns per byte' -n 1 -i 10
[01:11:13]
     ns per byte         : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 4        |*****************                       |
         4 -> 7          : 3        |*************                           |
         8 -> 15         : 2        |********                                |
        16 -> 31         : 1        |****                                    |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 7        |*******************************         |
       128 -> 255        : 1        |****                                    |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |****                                    |
      1024 -> 2047       : 1        |****                                    |
      2048 -> 4095       : 9        |****************************************|
      4096 -> 8191       : 1        |****                                    |

It looks like a tri-modal distribution. Some allocations are extremely cheap,
and take 2-15 nanoseconds per byte. Other allocations are slower, and take
64-127 nanoseconds per byte. And some allocations are slower still, and take
multiple microseconds per byte.

You could also group results by more than one field. For example, __kmalloc
takes an additional flags parameter that describes how to allocate memory:

# ./argdist -c -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size'
[03:42:29]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
        COUNT      EVENT
        1          flags = 16, size = 152
        2          flags = 131280, size = 8
        7          flags = 131280, size = 16
[03:42:30]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
        COUNT      EVENT
        1          flags = 16, size = 152
        6          flags = 131280, size = 8
        19         flags = 131280, size = 16
[03:42:31]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
        COUNT      EVENT
        2          flags = 16, size = 152
        10         flags = 131280, size = 8
        31         flags = 131280, size = 16
[03:42:32]
p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
        COUNT      EVENT
        2          flags = 16, size = 152
        14         flags = 131280, size = 8
        43         flags = 131280, size = 16
^C

The flags value must be expanded by hand, but it's still helpful to eliminate
certain kinds of allocations or visually group them together.
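
Once an interesting flags value shows up, you can drill into it with a filter
on that parameter. This is only a sketch -- 131280 is simply the raw gfp_t
value observed above, and it will differ between kernels and workloads:

# ./argdist -c -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags==131280'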

argdist also has basic support for kernel tracepoints. It is sometimes more
convenient to use tracepoints because they are documented and don't vary a lot
between kernel versions. For example, let's trace the net:net_dev_start_xmit
tracepoint and print out the protocol field from the tracepoint structure:

# argdist -C 't:net:net_dev_start_xmit():u16:args->protocol'
[13:01:49]
t:net:net_dev_start_xmit():u16:args->protocol
        COUNT      EVENT
        8          args->protocol = 2048
^C

Note that to discover the format of the net:net_dev_start_xmit tracepoint, you
use the tplist tool (tplist -v net:net_dev_start_xmit).

Occasionally, it is useful to filter certain expressions by string. This is not
trivially supported by BPF, but argdist provides a STRCMP helper you can use in
filter expressions. For example, to get a histogram of latencies opening a
specific file, run this:

# argdist -c -H 'r:c:open(char *file):u64:$latency/1000:STRCMP("test.txt",$entry(file))'
[02:16:38]
[02:16:39]
[02:16:40]
     $latency/1000       : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 2        |****************************************|
[02:16:41]
     $latency/1000       : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |**********                              |
        16 -> 31         : 4        |****************************************|
[02:16:42]
     $latency/1000       : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |********                                |
        16 -> 31         : 5        |****************************************|
[02:16:43]
     $latency/1000       : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |********                                |
        16 -> 31         : 5        |****************************************|

Here's a final example that finds how many write() system calls are performed
by each process on the system:

# argdist -c -C 'p:c:write():int:$PID;write per process' -n 2
[06:47:18]
write per process
        COUNT      EVENT
        3          $PID = 8889
        7          $PID = 7615
        7          $PID = 2480
[06:47:19]
write per process
        COUNT      EVENT
        9          $PID = 8889
        23         $PID = 7615
        23         $PID = 2480
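
On a busy system the per-process list can get long. The -T switch keeps only
the top results; here is a sketch that limits the same count to the three
busiest PIDs:

# argdist -c -C 'p:c:write():int:$PID;write per process' -n 2 -T 3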

USAGE message:

# argdist -h
usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v]
               [-c] [-T TOP] [-H specifier] [-C specifier] [-I header]

Trace a function and display a summary of its parameter values.

optional arguments:
  -h, --help            show this help message and exit
  -p PID, --pid PID     id of the process to trace (optional)
  -t TID, --tid TID     id of the thread to trace (optional)
  -z STRING_SIZE, --string-size STRING_SIZE
                        maximum string size to read from char* arguments
  -i INTERVAL, --interval INTERVAL
                        output interval, in seconds (default 1 second)
  -d DURATION, --duration DURATION
                        total duration of trace, in seconds
  -n COUNT, --number COUNT
                        number of outputs
  -v, --verbose         print resulting BPF program code before executing
  -c, --cumulative      do not clear histograms and freq counts at each interval
  -T TOP, --top TOP     number of top results to show (not applicable to
                        histograms)
  -H specifier, --histogram specifier
                        probe specifier to capture histogram of (see examples
                        below)
  -C specifier, --count specifier
                        probe specifier to capture count of (see examples
                        below)
  -I header, --include header
                        additional header files to include in the BPF program
                        as either full path, or relative to current working
                        directory, or relative to default kernel header search
                        path

Probe specifier syntax:
  {p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
Where:
  p,r,t,u   -- probe at function entry, function exit, kernel tracepoint,
               or USDT probe
               in exit probes: can use $retval, $entry(param), $latency
  library   -- the library that contains the function
               (leave empty for kernel functions)
  category  -- the category of the kernel tracepoint (e.g. net, sched)
  signature -- the function's parameters, as in the C header
  type      -- the type of the expression to collect (supports multiple)
  expr      -- the expression to collect (supports multiple)
  filter    -- the filter that is applied to collected values
  label     -- the label for this probe in the resulting output

EXAMPLES:

argdist -H 'p::__kmalloc(u64 size):u64:size'
        Print a histogram of allocation sizes passed to kmalloc

argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
        Print a frequency count of how many times process 1005 called malloc
        with an allocation size of 16 bytes

argdist -C 'r:c:gets():char*:$retval#snooped strings'
        Snoop on all strings returned by gets()

argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
        Print a histogram of nanoseconds per byte from kmalloc allocations

argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
        Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC

argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
        Print frequency counts of how many times writes were issued to a
        particular file descriptor number, in process 1005, but only show
        the top 5 busiest fds

argdist -p 1005 -H 'r:c:read()'
        Print a histogram of error codes returned by read() in process 1005

argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
        Print frequency of reads by process where the latency was >0.1ms

argdist -C 'r::__vfs_read():u32:$COMM:$latency > 100000'
        Print frequency of reads by process name where the latency was >0.1ms

argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000'
        Print a histogram of read sizes that were longer than 1ms

argdist -H \
        'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
        Print a histogram of buffer sizes passed to write() across all
        processes, where the file descriptor was 1 (STDOUT)

argdist -C 'p:c:fork()#fork calls'
        Count fork() calls in libc across all processes
        Can also use funccount.py, which is easier and more flexible

argdist -H 't:block:block_rq_complete():u32:args->nr_sector'
        Print histogram of number of sectors in completing block I/O requests

argdist -C 't:irq:irq_handler_entry():int:args->irq'
        Aggregate interrupts by interrupt request (IRQ)

argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
        Print frequency of function addresses used as a pthread start function,
        relying on the USDT pthread_start probe in process 1337

argdist -H 'p:c:sleep(u32 seconds):u32:seconds' \
        -H 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
        Print histograms of sleep() and nanosleep() parameter values

argdist -p 2780 -z 120 \
        -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
        Spy on writes to STDOUT performed by process 2780, up to a string size
        of 120 characters

argdist -I 'kernel/sched/sched.h' \
        -C 'p::__account_cfs_rq_runtime(struct cfs_rq *cfs_rq):s64:cfs_rq->runtime_remaining'
        Trace the remaining runtime of the CFS scheduling runqueue. The struct
        cfs_rq is defined in kernel/sched/sched.h, which is in the kernel
        source tree and not in the kernel-devel package, so this command needs
        to run at the kernel source tree root directory so that the added
        header file can be found by the compiler.

argdist -C 'p::do_sys_open(int dfd, const char __user *filename, int flags,
            umode_t mode):char*:filename:STRCMP("sample.txt", filename)'
        Trace opens of the file "sample.txt". Note that 'filename' passed to
        do_sys_open is a char * user pointer, so the parameter should be tagged
        with __user for kprobes (const char __user *filename). This information
        determines whether 'filename' is copied to the BPF stack from user
        space or from kernel space.