1# bcc Python Developer Tutorial 2 3This tutorial is about developing [bcc](https://github.com/iovisor/bcc) tools and programs using the Python interface. There are two parts: observability then networking. Snippets are taken from various programs in bcc: see their files for licences. 4 5Also see the bcc developer's [reference_guide.md](reference_guide.md), and a tutorial for end-users of tools: [tutorial.md](tutorial.md). There is also a lua interface for bcc. 6 7## Observability 8 9This observability tutorial contains 17 lessons, and 46 enumerated things to learn. 10 11### Lesson 1. Hello World 12 13Start by running [examples/hello_world.py](../examples/hello_world.py), while running some commands (eg, "ls") in another session. It should print "Hello, World!" for new processes. If not, start by fixing bcc: see [INSTALL.md](../INSTALL.md). 14 15``` 16# ./examples/hello_world.py 17 bash-13364 [002] d... 24573433.052937: : Hello, World! 18 bash-13364 [003] d... 24573436.642808: : Hello, World! 19[...] 20``` 21 22Here's the code for hello_world.py: 23 24```Python 25from bcc import BPF 26BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print() 27``` 28 29There are six things to learn from this: 30 311. ```text='...'```: This defines a BPF program inline. The program is written in C. 32 331. ```kprobe__sys_clone()```: This is a short-cut for kernel dynamic tracing via kprobes. If the C function begins with ``kprobe__``, the rest is treated as a kernel function name to instrument, in this case, ```sys_clone()```. 34 351. ```void *ctx```: ctx has arguments, but since we aren't using them here, we'll just cast it to ```void *```. 36 371. ```bpf_trace_printk()```: A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is ok for some quick examples, but has limitations: 3 args max, 1 %s only, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is via BPF_PERF_OUTPUT(), covered later. 38 391. ```return 0;```: Necessary formality (if you want to know why, see [#139](https://github.com/iovisor/bcc/issues/139)). 40 411. ```.trace_print()```: A bcc routine that reads trace_pipe and prints the output. 42 43### Lesson 2. sys_sync() 44 45Write a program that traces the sys_sync() kernel function. Print "sys_sync() called" when it runs. Test by running ```sync``` in another session while tracing. The hello_world.py program has everything you need for this. 46 47Improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python. 48 49### Lesson 3. hello_fields.py 50 51This program is in [examples/tracing/hello_fields.py](../examples/tracing/hello_fields.py). Sample output (run commands in another session): 52 53``` 54# ./examples/tracing/hello_fields.py 55TIME(s) COMM PID MESSAGE 5624585001.174885999 sshd 1432 Hello, World! 5724585001.195710000 sshd 15780 Hello, World! 5824585001.991976000 systemd-udevd 484 Hello, World! 5924585002.276147000 bash 15787 Hello, World! 60``` 61 62Code: 63 64```Python 65from bcc import BPF 66from bcc.utils import printb 67 68# define BPF program 69prog = """ 70int hello(void *ctx) { 71 bpf_trace_printk("Hello, World!\\n"); 72 return 0; 73} 74""" 75 76# load BPF program 77b = BPF(text=prog) 78b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello") 79 80# header 81print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE")) 82 83# format output 84while 1: 85 try: 86 (task, pid, cpu, flags, ts, msg) = b.trace_fields() 87 except ValueError: 88 continue 89 except KeyboardInterrupt: 90 exit() 91 printb(b"%-18.9f %-16s %-6d %s" % (ts, task, pid, msg)) 92``` 93 94This is similar to hello_world.py, and traces new processes via sys_clone() again, but has a few more things to learn: 95 961. ```prog =```: This time we declare the C program as a variable, and later refer to it. This is useful if you want to add some string substitutions based on command line arguments. 97 981. ```hello()```: Now we're just declaring a C function, instead of the ```kprobe__``` shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a ```pt_reg* ctx``` as first argument. If you need to define some helper function that will not be executed on a probe, they need to be defined as ```static inline``` in order to be inlined by the compiler. Sometimes you would also need to add ```_always_inline``` function attribute to it. 99 1001. ```b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")```: Creates a kprobe for the kernel clone system call function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions. 101 1021. ```b.trace_fields()```: Returns a fixed set of fields from trace_pipe. Similar to trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT(). 103 104### Lesson 4. sync_timing.py 105 106Remember the days of sysadmins typing ```sync``` three times on a slow console before ```reboot```, to give the first asynchronous sync time to complete? Then someone thought ```sync;sync;sync``` was clever, to run them all on one line, which became industry practice despite defeating the original purpose! And then sync became synchronous, so more reasons it was silly. Anyway. 107 108The following example times how quickly the ```do_sync``` function is called, and prints output if it has been called more recently than one second ago. A ```sync;sync;sync``` will print output for the 2nd and 3rd sync's: 109 110``` 111# ./examples/tracing/sync_timing.py 112Tracing for quick sync's... Ctrl-C to end 113At time 0.00 s: multiple syncs detected, last 95 ms ago 114At time 0.10 s: multiple syncs detected, last 96 ms ago 115``` 116 117This program is [examples/tracing/sync_timing.py](../examples/tracing/sync_timing.py): 118 119```Python 120from __future__ import print_function 121from bcc import BPF 122from bcc.utils import printb 123 124# load BPF program 125b = BPF(text=""" 126#include <uapi/linux/ptrace.h> 127 128BPF_HASH(last); 129 130int do_trace(struct pt_regs *ctx) { 131 u64 ts, *tsp, delta, key = 0; 132 133 // attempt to read stored timestamp 134 tsp = last.lookup(&key); 135 if (tsp != NULL) { 136 delta = bpf_ktime_get_ns() - *tsp; 137 if (delta < 1000000000) { 138 // output if time is less than 1 second 139 bpf_trace_printk("%d\\n", delta / 1000000); 140 } 141 last.delete(&key); 142 } 143 144 // update stored timestamp 145 ts = bpf_ktime_get_ns(); 146 last.update(&key, &ts); 147 return 0; 148} 149""") 150 151b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace") 152print("Tracing for quick sync's... Ctrl-C to end") 153 154# format output 155start = 0 156while 1: 157 try: 158 (task, pid, cpu, flags, ts, ms) = b.trace_fields() 159 if start == 0: 160 start = ts 161 ts = ts - start 162 printb(b"At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms)) 163 except KeyboardInterrupt: 164 exit() 165``` 166 167Things to learn: 168 1691. ```bpf_ktime_get_ns()```: Returns the time as nanoseconds. 1701. ```BPF_HASH(last)```: Creates a BPF map object that is a hash (associative array), called "last". We didn't specify any further arguments, so it defaults to key and value types of u64. 1711. ```key = 0```: We'll only store one key/value pair in this hash, where the key is hardwired to zero. 1721. ```last.lookup(&key)```: Lookup the key in the hash, and return a pointer to its value if it exists, else NULL. We pass the key in as an address to a pointer. 1731. ```if (tsp != NULL) {```: The verifier requires that pointer values derived from a map lookup must be checked for a null value before they can be dereferenced and used. 1741. ```last.delete(&key)```: Delete the key from the hash. This is currently required because of [a kernel bug in `.update()`](https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a6ed3ea65d9868fdf9eff84e6fe4f666b8d14b02) (fixed in 4.8.10). 1751. ```last.update(&key, &ts)```: Associate the value in the 2nd argument to the key, overwriting any previous value. This records the timestamp. 176 177### Lesson 5. sync_count.py 178 179Modify the sync_timing.py program (prior lesson) to store the count of all kernel sync system calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash. 180 181### Lesson 6. disksnoop.py 182 183Browse the [examples/tracing/disksnoop.py](../examples/tracing/disksnoop.py) program to see what is new. Here is some sample output: 184 185``` 186# ./disksnoop.py 187TIME(s) T BYTES LAT(ms) 18816458043.436012 W 4096 3.13 18916458043.437326 W 4096 4.44 19016458044.126545 R 4096 42.82 19116458044.129872 R 4096 3.24 192[...] 193``` 194 195And a code snippet: 196 197```Python 198[...] 199REQ_WRITE = 1 # from include/linux/blk_types.h 200 201# load BPF program 202b = BPF(text=""" 203#include <uapi/linux/ptrace.h> 204#include <linux/blk-mq.h> 205 206BPF_HASH(start, struct request *); 207 208void trace_start(struct pt_regs *ctx, struct request *req) { 209 // stash start timestamp by request ptr 210 u64 ts = bpf_ktime_get_ns(); 211 212 start.update(&req, &ts); 213} 214 215void trace_completion(struct pt_regs *ctx, struct request *req) { 216 u64 *tsp, delta; 217 218 tsp = start.lookup(&req); 219 if (tsp != 0) { 220 delta = bpf_ktime_get_ns() - *tsp; 221 bpf_trace_printk("%d %x %d\\n", req->__data_len, 222 req->cmd_flags, delta / 1000); 223 start.delete(&req); 224 } 225} 226""") 227if BPF.get_kprobe_functions(b'blk_start_request'): 228 b.attach_kprobe(event="blk_start_request", fn_name="trace_start") 229b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start") 230if BPF.get_kprobe_functions(b'__blk_account_io_done'): 231 b.attach_kprobe(event="__blk_account_io_done", fn_name="trace_completion") 232else: 233 b.attach_kprobe(event="blk_account_io_done", fn_name="trace_completion") 234[...] 235``` 236 237Things to learn: 238 2391. ```REQ_WRITE```: We're defining a kernel constant in the Python program because we'll use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) with the appropriate #includes. 2401. ```trace_start(struct pt_regs *ctx, struct request *req)```: This function will later be attached to kprobes. The arguments to kprobe functions are ```struct pt_regs *ctx```, for registers and BPF context, and then the actual arguments to the function. We'll attach this to blk_start_request(), where the first argument is ```struct request *```. 2411. ```start.update(&req, &ts)```: We're using the pointer to the request struct as a key in our hash. What? This is commonplace in tracing. Pointers to structs turn out to be great keys, as they are unique: two structs can't have the same pointer address. (Just be careful about when it gets free'd and reused.) So what we're really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There's two common keys used for storing timestamps: pointers to structs, and, thread IDs (for timing function entry to return). 2421. ```req->__data_len```: We're dereferencing members of ```struct request```. See its definition in the kernel source for what members are there. bcc actually rewrites these expressions to be a series of ```bpf_probe_read_kernel()``` calls. Sometimes bcc can't handle a complex dereference, and you need to call ```bpf_probe_read_kernel()``` directly. 243 244This is a pretty interesting program, and if you can understand all the code, you'll understand many important basics. We're still using the bpf_trace_printk() hack, so let's fix that next. 245 246### Lesson 7. hello_perf_output.py 247 248Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This will also mean we stop getting the free trace_field() members like PID and timestamp, and will need to fetch them directly. Sample output while commands are run in another session: 249 250``` 251# ./hello_perf_output.py 252TIME(s) COMM PID MESSAGE 2530.000000000 bash 22986 Hello, perf_output! 2540.021080275 systemd-udevd 484 Hello, perf_output! 2550.021359520 systemd-udevd 484 Hello, perf_output! 2560.021590610 systemd-udevd 484 Hello, perf_output! 257[...] 258``` 259 260Code is [examples/tracing/hello_perf_output.py](../examples/tracing/hello_perf_output.py): 261 262```Python 263from bcc import BPF 264 265# define BPF program 266prog = """ 267#include <linux/sched.h> 268 269// define output data structure in C 270struct data_t { 271 u32 pid; 272 u64 ts; 273 char comm[TASK_COMM_LEN]; 274}; 275BPF_PERF_OUTPUT(events); 276 277int hello(struct pt_regs *ctx) { 278 struct data_t data = {}; 279 280 data.pid = bpf_get_current_pid_tgid(); 281 data.ts = bpf_ktime_get_ns(); 282 bpf_get_current_comm(&data.comm, sizeof(data.comm)); 283 284 events.perf_submit(ctx, &data, sizeof(data)); 285 286 return 0; 287} 288""" 289 290# load BPF program 291b = BPF(text=prog) 292b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello") 293 294# header 295print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE")) 296 297# process event 298start = 0 299def print_event(cpu, data, size): 300 global start 301 event = b["events"].event(data) 302 if start == 0: 303 start = event.ts 304 time_s = (float(event.ts - start)) / 1000000000 305 print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid, 306 "Hello, perf_output!")) 307 308# loop with callback to print_event 309b["events"].open_perf_buffer(print_event) 310while 1: 311 b.perf_buffer_poll() 312``` 313 314Things to learn: 315 3161. ```struct data_t```: This defines the C struct we'll use to pass data from kernel to user space. 3171. ```BPF_PERF_OUTPUT(events)```: This names our output channel "events". 3181. ```struct data_t data = {};```: Create an empty data_t struct that we'll then populate. 3191. ```bpf_get_current_pid_tgid()```: Returns the process ID in the lower 32 bits (kernel's view of the PID, which in user space is usually presented as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By directly setting this to a u32, we discard the upper 32 bits. Should you be presenting the PID or the TGID? For a multi-threaded app, the TGID will be the same, so you need the PID to differentiate them, if that's what you want. It's also a question of expectations for the end user. 3201. ```bpf_get_current_comm()```: Populates the first argument address with the current process name. 3211. ```events.perf_submit()```: Submit the event for user space to read via a perf ring buffer. 3221. ```def print_event()```: Define a Python function that will handle reading events from the ```events``` stream. 3231. ```b["events"].event(data)```: Now get the event as a Python object, auto-generated from the C declaration. 3241. ```b["events"].open_perf_buffer(print_event)```: Associate the Python ```print_event``` function with the ```events``` stream. 3251. ```while 1: b.perf_buffer_poll()```: Block waiting for events. 326 327### Lesson 8. sync_perf_output.py 328 329Rewrite sync_timing.py, from a prior lesson, to use ```BPF_PERF_OUTPUT```. 330 331### Lesson 9. bitehist.py 332 333The following tool records a histogram of disk I/O sizes. Sample output: 334 335``` 336# ./bitehist.py 337Tracing... Hit Ctrl-C to end. 338^C 339 kbytes : count distribution 340 0 -> 1 : 3 | | 341 2 -> 3 : 0 | | 342 4 -> 7 : 211 |********** | 343 8 -> 15 : 0 | | 344 16 -> 31 : 0 | | 345 32 -> 63 : 0 | | 346 64 -> 127 : 1 | | 347 128 -> 255 : 800 |**************************************| 348``` 349 350Code is [examples/tracing/bitehist.py](../examples/tracing/bitehist.py): 351 352```Python 353from __future__ import print_function 354from bcc import BPF 355from time import sleep 356 357# load BPF program 358b = BPF(text=""" 359#include <uapi/linux/ptrace.h> 360#include <linux/blkdev.h> 361 362BPF_HISTOGRAM(dist); 363 364int kprobe__blk_account_io_done(struct pt_regs *ctx, struct request *req) 365{ 366 dist.increment(bpf_log2l(req->__data_len / 1024)); 367 return 0; 368} 369""") 370 371# header 372print("Tracing... Hit Ctrl-C to end.") 373 374# trace until Ctrl-C 375try: 376 sleep(99999999) 377except KeyboardInterrupt: 378 print() 379 380# output 381b["dist"].print_log2_hist("kbytes") 382``` 383 384A recap from earlier lessons: 385 386- ```kprobe__```: This prefix means the rest will be treated as a kernel function name that will be instrumented using kprobe. 387- ```struct pt_regs *ctx, struct request *req```: Arguments to kprobe. The ```ctx``` is registers and BPF context, the ```req``` is the first argument to the instrumented function: ```blk_account_io_done()```. 388- ```req->__data_len```: Dereferencing that member. 389 390New things to learn: 391 3921. ```BPF_HISTOGRAM(dist)```: Defines a BPF map object that is a histogram, and names it "dist". 3931. ```dist.increment()```: Increments the histogram bucket index provided as first argument by one by default. Optionally, custom increments can be passed as the second argument. 3941. ```bpf_log2l()```: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram. 3951. ```b["dist"].print_log2_hist("kbytes")```: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient. 396 397### Lesson 10. disklatency.py 398 399Write a program that times disk I/O, and prints a histogram of their latency. Disk I/O instrumentation and timing can be found in the disksnoop.py program from a prior lesson, and histogram code can be found in bitehist.py from a prior lesson. 400 401### Lesson 11. vfsreadlat.py 402 403This example is split into separate Python and C files. Example output: 404 405``` 406# ./vfsreadlat.py 1 407Tracing... Hit Ctrl-C to end. 408 usecs : count distribution 409 0 -> 1 : 0 | | 410 2 -> 3 : 2 |*********** | 411 4 -> 7 : 7 |****************************************| 412 8 -> 15 : 4 |********************** | 413 414 usecs : count distribution 415 0 -> 1 : 29 |****************************************| 416 2 -> 3 : 28 |************************************** | 417 4 -> 7 : 4 |***** | 418 8 -> 15 : 8 |*********** | 419 16 -> 31 : 0 | | 420 32 -> 63 : 0 | | 421 64 -> 127 : 0 | | 422 128 -> 255 : 0 | | 423 256 -> 511 : 2 |** | 424 512 -> 1023 : 0 | | 425 1024 -> 2047 : 0 | | 426 2048 -> 4095 : 0 | | 427 4096 -> 8191 : 4 |***** | 428 8192 -> 16383 : 6 |******** | 429 16384 -> 32767 : 9 |************ | 430 32768 -> 65535 : 6 |******** | 431 65536 -> 131071 : 2 |** | 432 433 usecs : count distribution 434 0 -> 1 : 11 |****************************************| 435 2 -> 3 : 2 |******* | 436 4 -> 7 : 10 |************************************ | 437 8 -> 15 : 8 |***************************** | 438 16 -> 31 : 1 |*** | 439 32 -> 63 : 2 |******* | 440[...] 441``` 442 443Browse the code in [examples/tracing/vfsreadlat.py](../examples/tracing/vfsreadlat.py) and [examples/tracing/vfsreadlat.c](../examples/tracing/vfsreadlat.c). Things to learn: 444 4451. ```b = BPF(src_file = "vfsreadlat.c")```: Read the BPF C program from a separate source file. 4461. ```b.attach_kretprobe(event="vfs_read", fn_name="do_return")```: Attaches the BPF C function ```do_return()``` to the return of the kernel function ```vfs_read()```. This is a kretprobe: instrumenting the return from a function, rather than its entry. 4471. ```b["dist"].clear()```: Clears the histogram. 448 449### Lesson 12. urandomread.py 450 451Tracing while a ```dd if=/dev/urandom of=/dev/null bs=8k count=5``` is run: 452 453``` 454# ./urandomread.py 455TIME(s) COMM PID GOTBITS 45624652832.956994001 smtp 24690 384 45724652837.726500999 dd 24692 65536 45824652837.727111001 dd 24692 65536 45924652837.727703001 dd 24692 65536 46024652837.728294998 dd 24692 65536 46124652837.728888001 dd 24692 65536 462``` 463 464Hah! I caught smtp by accident. Code is [examples/tracing/urandomread.py](../examples/tracing/urandomread.py): 465 466```Python 467from __future__ import print_function 468from bcc import BPF 469 470# load BPF program 471b = BPF(text=""" 472TRACEPOINT_PROBE(random, urandom_read) { 473 // args is from /sys/kernel/debug/tracing/events/random/urandom_read/format 474 bpf_trace_printk("%d\\n", args->got_bits); 475 return 0; 476} 477""") 478 479# header 480print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "GOTBITS")) 481 482# format output 483while 1: 484 try: 485 (task, pid, cpu, flags, ts, msg) = b.trace_fields() 486 except ValueError: 487 continue 488 print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg)) 489``` 490 491Things to learn: 492 4931. ```TRACEPOINT_PROBE(random, urandom_read)```: Instrument the kernel tracepoint ```random:urandom_read```. These have a stable API, and thus are recommend to use instead of kprobes, wherever possible. You can run ```perf list``` for a list of tracepoints. Linux >= 4.7 is required to attach BPF programs to tracepoints. 4941. ```args->got_bits```: ```args``` is auto-populated to be a structure of the tracepoint arguments. The comment above says where you can see that structure. Eg: 495 496``` 497# cat /sys/kernel/debug/tracing/events/random/urandom_read/format 498name: urandom_read 499ID: 972 500format: 501 field:unsigned short common_type; offset:0; size:2; signed:0; 502 field:unsigned char common_flags; offset:2; size:1; signed:0; 503 field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 504 field:int common_pid; offset:4; size:4; signed:1; 505 506 field:int got_bits; offset:8; size:4; signed:1; 507 field:int pool_left; offset:12; size:4; signed:1; 508 field:int input_left; offset:16; size:4; signed:1; 509 510print fmt: "got_bits %d nonblocking_pool_entropy_left %d input_entropy_left %d", REC->got_bits, REC->pool_left, REC->input_left 511``` 512 513In this case, we were printing the ```got_bits``` member. 514 515### Lesson 13. disksnoop.py fixed 516 517Convert disksnoop.py from a previous lesson to use the ```block:block_rq_issue``` and ```block:block_rq_complete``` tracepoints. 518 519### Lesson 14. strlen_count.py 520 521This program instruments a user-level function, the ```strlen()``` library function, and frequency counts its string argument. Example output: 522 523``` 524# ./strlen_count.py 525Tracing strlen()... Hit Ctrl-C to end. 526^C COUNT STRING 527 1 " " 528 1 "/bin/ls" 529 1 "." 530 1 "cpudist.py.1" 531 1 ".bashrc" 532 1 "ls --color=auto" 533 1 "key_t" 534[...] 535 10 "a7:~# " 536 10 "/root" 537 12 "LC_ALL" 538 12 "en_US.UTF-8" 539 13 "en_US.UTF-8" 540 20 "~" 541 70 "#%^,~:-=?+/}" 542 340 "\x01\x1b]0;root@bgregg-test: ~\x07\x02root@bgregg-test:~# " 543``` 544 545These are various strings that are being processed by this library function while tracing, along with their frequency counts. ```strlen()``` was called on "LC_ALL" 12 times, for example. 546 547Code is [examples/tracing/strlen_count.py](../examples/tracing/strlen_count.py): 548 549```Python 550from __future__ import print_function 551from bcc import BPF 552from time import sleep 553 554# load BPF program 555b = BPF(text=""" 556#include <uapi/linux/ptrace.h> 557 558struct key_t { 559 char c[80]; 560}; 561BPF_HASH(counts, struct key_t); 562 563int count(struct pt_regs *ctx) { 564 if (!PT_REGS_PARM1(ctx)) 565 return 0; 566 567 struct key_t key = {}; 568 u64 zero = 0, *val; 569 570 bpf_probe_read_user(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx)); 571 // could also use `counts.increment(key)` 572 val = counts.lookup_or_try_init(&key, &zero); 573 if (val) { 574 (*val)++; 575 } 576 return 0; 577}; 578""") 579b.attach_uprobe(name="c", sym="strlen", fn_name="count") 580 581# header 582print("Tracing strlen()... Hit Ctrl-C to end.") 583 584# sleep until Ctrl-C 585try: 586 sleep(99999999) 587except KeyboardInterrupt: 588 pass 589 590# print output 591print("%10s %s" % ("COUNT", "STRING")) 592counts = b.get_table("counts") 593for k, v in sorted(counts.items(), key=lambda counts: counts[1].value): 594 print("%10d \"%s\"" % (v.value, k.c.encode('string-escape'))) 595``` 596 597Things to learn: 598 5991. ```PT_REGS_PARM1(ctx)```: This fetches the first argument to ```strlen()```, which is the string. 6001. ```b.attach_uprobe(name="c", sym="strlen", fn_name="count")```: Attach to library "c" (if this is the main program, use its pathname), instrument the user-level function ```strlen()```, and on execution call our C function ```count()```. 601 602### Lesson 15. nodejs_http_server.py 603 604This program instruments a user statically-defined tracing (USDT) probe, which is the user-level version of a kernel tracepoint. Sample output: 605 606``` 607# ./nodejs_http_server.py 24728 608TIME(s) COMM PID ARGS 60924653324.561322998 node 24728 path:/index.html 61024653335.343401998 node 24728 path:/images/welcome.png 61124653340.510164998 node 24728 path:/images/favicon.png 612``` 613 614Relevant code from [examples/tracing/nodejs_http_server.py](../examples/tracing/nodejs_http_server.py): 615 616```Python 617from __future__ import print_function 618from bcc import BPF, USDT 619import sys 620 621if len(sys.argv) < 2: 622 print("USAGE: nodejs_http_server PID") 623 exit() 624pid = sys.argv[1] 625debug = 0 626 627# load BPF program 628bpf_text = """ 629#include <uapi/linux/ptrace.h> 630int do_trace(struct pt_regs *ctx) { 631 uint64_t addr; 632 char path[128]={0}; 633 bpf_usdt_readarg(6, ctx, &addr); 634 bpf_probe_read_user(&path, sizeof(path), (void *)addr); 635 bpf_trace_printk("path:%s\\n", path); 636 return 0; 637}; 638""" 639 640# enable USDT probe from given PID 641u = USDT(pid=int(pid)) 642u.enable_probe(probe="http__server__request", fn_name="do_trace") 643if debug: 644 print(u.get_text()) 645 print(bpf_text) 646 647# initialize BPF 648b = BPF(text=bpf_text, usdt_contexts=[u]) 649``` 650 651Things to learn: 652 6531. ```bpf_usdt_readarg(6, ctx, &addr)```: Read the address of argument 6 from the USDT probe into ```addr```. 6541. ```bpf_probe_read_user(&path, sizeof(path), (void *)addr)```: Now the string ```addr``` points to into our ```path``` variable. 6551. ```u = USDT(pid=int(pid))```: Initialize USDT tracing for the given PID. 6561. ```u.enable_probe(probe="http__server__request", fn_name="do_trace")```: Attach our ```do_trace()``` BPF C function to the Node.js ```http__server__request``` USDT probe. 6571. ```b = BPF(text=bpf_text, usdt_contexts=[u])```: Need to pass in our USDT object, ```u```, to BPF object creation. 658 659### Lesson 16. task_switch.c 660 661This is an older tutorial included as a bonus lesson. Use this for recap and to reinforce what you've already learned. 662 663This is a slightly more complex tracing example than Hello World. This program 664will be invoked for every task change in the kernel, and record in a BPF map 665the new and old pids. 666 667The C program below introduces a new concept: the prev argument. This 668argument is treated specially by the BCC frontend, such that accesses 669to this variable are read from the saved context that is passed by the 670kprobe infrastructure. The prototype of the args starting from 671position 1 should match the prototype of the kernel function being 672kprobed. If done so, the program will have seamless access to the 673function parameters. 674 675```c 676#include <uapi/linux/ptrace.h> 677#include <linux/sched.h> 678 679struct key_t { 680 u32 prev_pid; 681 u32 curr_pid; 682}; 683 684BPF_HASH(stats, struct key_t, u64, 1024); 685int count_sched(struct pt_regs *ctx, struct task_struct *prev) { 686 struct key_t key = {}; 687 u64 zero = 0, *val; 688 689 key.curr_pid = bpf_get_current_pid_tgid(); 690 key.prev_pid = prev->pid; 691 692 // could also use `stats.increment(key);` 693 val = stats.lookup_or_try_init(&key, &zero); 694 if (val) { 695 (*val)++; 696 } 697 return 0; 698} 699``` 700 701The userspace component loads the file shown above, and attaches it to the 702`finish_task_switch` kernel function. 703The `[]` operator of the BPF object gives access to each BPF_HASH in the 704program, allowing pass-through access to the values residing in the kernel. Use 705the object as you would any other python dict object: read, update, and deletes 706are all allowed. 707```python 708from bcc import BPF 709from time import sleep 710 711b = BPF(src_file="task_switch.c") 712b.attach_kprobe(event="finish_task_switch", fn_name="count_sched") 713 714# generate many schedule events 715for i in range(0, 100): sleep(0.01) 716 717for k, v in b["stats"].items(): 718 print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value)) 719``` 720 721These programs can be found in the files [examples/tracing/task_switch.c](../examples/tracing/task_switch.c) and [examples/tracing/task_switch.py](../examples/tracing/task_switch.py) respectively. 722 723### Lesson 17. Further Study 724 725For further study, see Sasha Goldshtein's [linux-tracing-workshop](https://github.com/goldshtn/linux-tracing-workshop), which contains additional labs. There are also many tools in bcc /tools to study. 726 727Please read [CONTRIBUTING-SCRIPTS.md](../CONTRIBUTING-SCRIPTS.md) if you wish to contribute tools to bcc. At the bottom of the main [README.md](../README.md), you'll also find methods for contacting us. Good luck, and happy tracing! 728 729## Networking 730 731To do. 732