xref: /aosp_15_r20/external/bcc/docs/tutorial_bcc_python_developer.md (revision 387f9dfdfa2baef462e92476d413c7bc2470293e)
1# bcc Python Developer Tutorial
2
3This tutorial is about developing [bcc](https://github.com/iovisor/bcc) tools and programs using the Python interface. There are two parts: observability then networking. Snippets are taken from various programs in bcc: see their files for licences.
4
5Also see the bcc developer's [reference_guide.md](reference_guide.md), and a tutorial for end-users of tools: [tutorial.md](tutorial.md). There is also a lua interface for bcc.
6
7## Observability
8
9This observability tutorial contains 17 lessons, and 46 enumerated things to learn.
10
11### Lesson 1. Hello World
12
13Start by running [examples/hello_world.py](../examples/hello_world.py), while running some commands (eg, "ls") in another session. It should print "Hello, World!" for new processes. If not, start by fixing bcc: see [INSTALL.md](../INSTALL.md).
14
15```
16# ./examples/hello_world.py
17            bash-13364 [002] d... 24573433.052937: : Hello, World!
18            bash-13364 [003] d... 24573436.642808: : Hello, World!
19[...]
20```
21
22Here's the code for hello_world.py:
23
24```Python
25from bcc import BPF
26BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
27```
28
29There are six things to learn from this:
30
311. ```text='...'```: This defines a BPF program inline. The program is written in C.
32
331. ```kprobe__sys_clone()```: This is a short-cut for kernel dynamic tracing via kprobes. If the C function begins with ``kprobe__``, the rest is treated as a kernel function name to instrument, in this case, ```sys_clone()```.
34
351. ```void *ctx```: ctx has arguments, but since we aren't using them here, we'll just cast it to ```void *```.
36
371. ```bpf_trace_printk()```: A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is ok for some quick examples, but has limitations: 3 args max, 1 %s only, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is via BPF_PERF_OUTPUT(), covered later.
38
391. ```return 0;```: Necessary formality (if you want to know why, see [#139](https://github.com/iovisor/bcc/issues/139)).
40
411. ```.trace_print()```: A bcc routine that reads trace_pipe and prints the output.
42
43### Lesson 2. sys_sync()
44
45Write a program that traces the sys_sync() kernel function. Print "sys_sync() called" when it runs. Test by running ```sync``` in another session while tracing. The hello_world.py program has everything you need for this.
46
47Improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python.
48
49### Lesson 3. hello_fields.py
50
51This program is in [examples/tracing/hello_fields.py](../examples/tracing/hello_fields.py). Sample output (run commands in another session):
52
53```
54# ./examples/tracing/hello_fields.py
55TIME(s)            COMM             PID    MESSAGE
5624585001.174885999 sshd             1432   Hello, World!
5724585001.195710000 sshd             15780  Hello, World!
5824585001.991976000 systemd-udevd    484    Hello, World!
5924585002.276147000 bash             15787  Hello, World!
60```
61
62Code:
63
64```Python
65from bcc import BPF
66from bcc.utils import printb
67
68# define BPF program
69prog = """
70int hello(void *ctx) {
71    bpf_trace_printk("Hello, World!\\n");
72    return 0;
73}
74"""
75
76# load BPF program
77b = BPF(text=prog)
78b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
79
80# header
81print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
82
83# format output
84while 1:
85    try:
86        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
87    except ValueError:
88        continue
89    except KeyboardInterrupt:
90        exit()
91    printb(b"%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
92```
93
94This is similar to hello_world.py, and traces new processes via sys_clone() again, but has a few more things to learn:
95
961. ```prog =```: This time we declare the C program as a variable, and later refer to it. This is useful if you want to add some string substitutions based on command line arguments.
97
981. ```hello()```: Now we're just declaring a C function, instead of the ```kprobe__``` shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a ```pt_reg* ctx``` as first argument. If you need to define some helper function that will not be executed on a probe, they need to be defined as ```static inline``` in order to be inlined by the compiler. Sometimes you would also need to add ```_always_inline``` function attribute to it.
99
1001. ```b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")```: Creates a kprobe for the kernel clone system call function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions.
101
1021. ```b.trace_fields()```: Returns a fixed set of fields from trace_pipe. Similar to trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT().
103
104### Lesson 4. sync_timing.py
105
106Remember the days of sysadmins typing ```sync``` three times on a slow console before ```reboot```, to give the first asynchronous sync time to complete? Then someone thought ```sync;sync;sync``` was clever, to run them all on one line, which became industry practice despite defeating the original purpose! And then sync became synchronous, so more reasons it was silly. Anyway.
107
108The following example times how quickly the ```do_sync``` function is called, and prints output if it has been called more recently than one second ago. A ```sync;sync;sync``` will print output for the 2nd and 3rd sync's:
109
110```
111# ./examples/tracing/sync_timing.py
112Tracing for quick sync's... Ctrl-C to end
113At time 0.00 s: multiple syncs detected, last 95 ms ago
114At time 0.10 s: multiple syncs detected, last 96 ms ago
115```
116
117This program is [examples/tracing/sync_timing.py](../examples/tracing/sync_timing.py):
118
119```Python
120from __future__ import print_function
121from bcc import BPF
122from bcc.utils import printb
123
124# load BPF program
125b = BPF(text="""
126#include <uapi/linux/ptrace.h>
127
128BPF_HASH(last);
129
130int do_trace(struct pt_regs *ctx) {
131    u64 ts, *tsp, delta, key = 0;
132
133    // attempt to read stored timestamp
134    tsp = last.lookup(&key);
135    if (tsp != NULL) {
136        delta = bpf_ktime_get_ns() - *tsp;
137        if (delta < 1000000000) {
138            // output if time is less than 1 second
139            bpf_trace_printk("%d\\n", delta / 1000000);
140        }
141        last.delete(&key);
142    }
143
144    // update stored timestamp
145    ts = bpf_ktime_get_ns();
146    last.update(&key, &ts);
147    return 0;
148}
149""")
150
151b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace")
152print("Tracing for quick sync's... Ctrl-C to end")
153
154# format output
155start = 0
156while 1:
157    try:
158        (task, pid, cpu, flags, ts, ms) = b.trace_fields()
159        if start == 0:
160            start = ts
161        ts = ts - start
162        printb(b"At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms))
163    except KeyboardInterrupt:
164        exit()
165```
166
167Things to learn:
168
1691. ```bpf_ktime_get_ns()```: Returns the time as nanoseconds.
1701. ```BPF_HASH(last)```: Creates a BPF map object that is a hash (associative array), called "last". We didn't specify any further arguments, so it defaults to key and value types of u64.
1711. ```key = 0```: We'll only store one key/value pair in this hash, where the key is hardwired to zero.
1721. ```last.lookup(&key)```: Lookup the key in the hash, and return a pointer to its value if it exists, else NULL. We pass the key in as an address to a pointer.
1731. ```if (tsp != NULL) {```: The verifier requires that pointer values derived from a map lookup must be checked for a null value before they can be dereferenced and used.
1741. ```last.delete(&key)```: Delete the key from the hash. This is currently required because of [a kernel bug in `.update()`](https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a6ed3ea65d9868fdf9eff84e6fe4f666b8d14b02) (fixed in 4.8.10).
1751. ```last.update(&key, &ts)```: Associate the value in the 2nd argument to the key, overwriting any previous value. This records the timestamp.
176
177### Lesson 5. sync_count.py
178
179Modify the sync_timing.py program (prior lesson) to store the count of all kernel sync system calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash.
180
181### Lesson 6. disksnoop.py
182
183Browse the [examples/tracing/disksnoop.py](../examples/tracing/disksnoop.py) program to see what is new. Here is some sample output:
184
185```
186# ./disksnoop.py
187TIME(s)            T  BYTES    LAT(ms)
18816458043.436012    W  4096        3.13
18916458043.437326    W  4096        4.44
19016458044.126545    R  4096       42.82
19116458044.129872    R  4096        3.24
192[...]
193```
194
195And a code snippet:
196
197```Python
198[...]
199REQ_WRITE = 1		# from include/linux/blk_types.h
200
201# load BPF program
202b = BPF(text="""
203#include <uapi/linux/ptrace.h>
204#include <linux/blk-mq.h>
205
206BPF_HASH(start, struct request *);
207
208void trace_start(struct pt_regs *ctx, struct request *req) {
209	// stash start timestamp by request ptr
210	u64 ts = bpf_ktime_get_ns();
211
212	start.update(&req, &ts);
213}
214
215void trace_completion(struct pt_regs *ctx, struct request *req) {
216	u64 *tsp, delta;
217
218	tsp = start.lookup(&req);
219	if (tsp != 0) {
220		delta = bpf_ktime_get_ns() - *tsp;
221		bpf_trace_printk("%d %x %d\\n", req->__data_len,
222		    req->cmd_flags, delta / 1000);
223		start.delete(&req);
224	}
225}
226""")
227if BPF.get_kprobe_functions(b'blk_start_request'):
228        b.attach_kprobe(event="blk_start_request", fn_name="trace_start")
229b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start")
230if BPF.get_kprobe_functions(b'__blk_account_io_done'):
231    b.attach_kprobe(event="__blk_account_io_done", fn_name="trace_completion")
232else:
233    b.attach_kprobe(event="blk_account_io_done", fn_name="trace_completion")
234[...]
235```
236
237Things to learn:
238
2391. ```REQ_WRITE```: We're defining a kernel constant in the Python program because we'll use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) with the appropriate #includes.
2401. ```trace_start(struct pt_regs *ctx, struct request *req)```: This function will later be attached to kprobes. The arguments to kprobe functions are ```struct pt_regs *ctx```, for registers and BPF context, and then the actual arguments to the function. We'll attach this to blk_start_request(), where the first argument is ```struct request *```.
2411. ```start.update(&req, &ts)```: We're using the pointer to the request struct as a key in our hash. What? This is commonplace in tracing. Pointers to structs turn out to be great keys, as they are unique: two structs can't have the same pointer address. (Just be careful about when it gets free'd and reused.) So what we're really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There's two common keys used for storing timestamps: pointers to structs, and, thread IDs (for timing function entry to return).
2421. ```req->__data_len```: We're dereferencing members of ```struct request```. See its definition in the kernel source for what members are there. bcc actually rewrites these expressions to be a series of ```bpf_probe_read_kernel()``` calls. Sometimes bcc can't handle a complex dereference, and you need to call ```bpf_probe_read_kernel()``` directly.
243
244This is a pretty interesting program, and if you can understand all the code, you'll understand many important basics. We're still using the bpf_trace_printk() hack, so let's fix that next.
245
246### Lesson 7. hello_perf_output.py
247
248Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This will also mean we stop getting the free trace_field() members like PID and timestamp, and will need to fetch them directly. Sample output while commands are run in another session:
249
250```
251# ./hello_perf_output.py
252TIME(s)            COMM             PID    MESSAGE
2530.000000000        bash             22986  Hello, perf_output!
2540.021080275        systemd-udevd    484    Hello, perf_output!
2550.021359520        systemd-udevd    484    Hello, perf_output!
2560.021590610        systemd-udevd    484    Hello, perf_output!
257[...]
258```
259
260Code is [examples/tracing/hello_perf_output.py](../examples/tracing/hello_perf_output.py):
261
262```Python
263from bcc import BPF
264
265# define BPF program
266prog = """
267#include <linux/sched.h>
268
269// define output data structure in C
270struct data_t {
271    u32 pid;
272    u64 ts;
273    char comm[TASK_COMM_LEN];
274};
275BPF_PERF_OUTPUT(events);
276
277int hello(struct pt_regs *ctx) {
278    struct data_t data = {};
279
280    data.pid = bpf_get_current_pid_tgid();
281    data.ts = bpf_ktime_get_ns();
282    bpf_get_current_comm(&data.comm, sizeof(data.comm));
283
284    events.perf_submit(ctx, &data, sizeof(data));
285
286    return 0;
287}
288"""
289
290# load BPF program
291b = BPF(text=prog)
292b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
293
294# header
295print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
296
297# process event
298start = 0
299def print_event(cpu, data, size):
300    global start
301    event = b["events"].event(data)
302    if start == 0:
303            start = event.ts
304    time_s = (float(event.ts - start)) / 1000000000
305    print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid,
306        "Hello, perf_output!"))
307
308# loop with callback to print_event
309b["events"].open_perf_buffer(print_event)
310while 1:
311    b.perf_buffer_poll()
312```
313
314Things to learn:
315
3161. ```struct data_t```: This defines the C struct we'll use to pass data from kernel to user space.
3171. ```BPF_PERF_OUTPUT(events)```: This names our output channel "events".
3181. ```struct data_t data = {};```: Create an empty data_t struct that we'll then populate.
3191. ```bpf_get_current_pid_tgid()```: Returns the process ID in the lower 32 bits (kernel's view of the PID, which in user space is usually presented as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By directly setting this to a u32, we discard the upper 32 bits. Should you be presenting the PID or the TGID? For a multi-threaded app, the TGID will be the same, so you need the PID to differentiate them, if that's what you want. It's also a question of expectations for the end user.
3201. ```bpf_get_current_comm()```: Populates the first argument address with the current process name.
3211. ```events.perf_submit()```: Submit the event for user space to read via a perf ring buffer.
3221. ```def print_event()```: Define a Python function that will handle reading events from the ```events``` stream.
3231. ```b["events"].event(data)```: Now get the event as a Python object, auto-generated from the C declaration.
3241. ```b["events"].open_perf_buffer(print_event)```: Associate the Python ```print_event``` function with the ```events``` stream.
3251. ```while 1: b.perf_buffer_poll()```: Block waiting for events.
326
327### Lesson 8. sync_perf_output.py
328
329Rewrite sync_timing.py, from a prior lesson, to use ```BPF_PERF_OUTPUT```.
330
331### Lesson 9. bitehist.py
332
333The following tool records a histogram of disk I/O sizes. Sample output:
334
335```
336# ./bitehist.py
337Tracing... Hit Ctrl-C to end.
338^C
339     kbytes          : count     distribution
340       0 -> 1        : 3        |                                      |
341       2 -> 3        : 0        |                                      |
342       4 -> 7        : 211      |**********                            |
343       8 -> 15       : 0        |                                      |
344      16 -> 31       : 0        |                                      |
345      32 -> 63       : 0        |                                      |
346      64 -> 127      : 1        |                                      |
347     128 -> 255      : 800      |**************************************|
348```
349
350Code is [examples/tracing/bitehist.py](../examples/tracing/bitehist.py):
351
352```Python
353from __future__ import print_function
354from bcc import BPF
355from time import sleep
356
357# load BPF program
358b = BPF(text="""
359#include <uapi/linux/ptrace.h>
360#include <linux/blkdev.h>
361
362BPF_HISTOGRAM(dist);
363
364int kprobe__blk_account_io_done(struct pt_regs *ctx, struct request *req)
365{
366	dist.increment(bpf_log2l(req->__data_len / 1024));
367	return 0;
368}
369""")
370
371# header
372print("Tracing... Hit Ctrl-C to end.")
373
374# trace until Ctrl-C
375try:
376	sleep(99999999)
377except KeyboardInterrupt:
378	print()
379
380# output
381b["dist"].print_log2_hist("kbytes")
382```
383
384A recap from earlier lessons:
385
386- ```kprobe__```: This prefix means the rest will be treated as a kernel function name that will be instrumented using kprobe.
387- ```struct pt_regs *ctx, struct request *req```: Arguments to kprobe. The ```ctx``` is registers and BPF context, the ```req``` is the first argument to the instrumented function: ```blk_account_io_done()```.
388- ```req->__data_len```: Dereferencing that member.
389
390New things to learn:
391
3921. ```BPF_HISTOGRAM(dist)```: Defines a BPF map object that is a histogram, and names it "dist".
3931. ```dist.increment()```: Increments the histogram bucket index provided as first argument by one by default. Optionally, custom increments can be passed as the second argument.
3941. ```bpf_log2l()```: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram.
3951. ```b["dist"].print_log2_hist("kbytes")```: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient.
396
397### Lesson 10. disklatency.py
398
399Write a program that times disk I/O, and prints a histogram of their latency. Disk I/O instrumentation and timing can be found in the disksnoop.py program from a prior lesson, and histogram code can be found in bitehist.py from a prior lesson.
400
401### Lesson 11. vfsreadlat.py
402
403This example is split into separate Python and C files. Example output:
404
405```
406# ./vfsreadlat.py 1
407Tracing... Hit Ctrl-C to end.
408     usecs               : count     distribution
409         0 -> 1          : 0        |                                        |
410         2 -> 3          : 2        |***********                             |
411         4 -> 7          : 7        |****************************************|
412         8 -> 15         : 4        |**********************                  |
413
414     usecs               : count     distribution
415         0 -> 1          : 29       |****************************************|
416         2 -> 3          : 28       |**************************************  |
417         4 -> 7          : 4        |*****                                   |
418         8 -> 15         : 8        |***********                             |
419        16 -> 31         : 0        |                                        |
420        32 -> 63         : 0        |                                        |
421        64 -> 127        : 0        |                                        |
422       128 -> 255        : 0        |                                        |
423       256 -> 511        : 2        |**                                      |
424       512 -> 1023       : 0        |                                        |
425      1024 -> 2047       : 0        |                                        |
426      2048 -> 4095       : 0        |                                        |
427      4096 -> 8191       : 4        |*****                                   |
428      8192 -> 16383      : 6        |********                                |
429     16384 -> 32767      : 9        |************                            |
430     32768 -> 65535      : 6        |********                                |
431     65536 -> 131071     : 2        |**                                      |
432
433     usecs               : count     distribution
434         0 -> 1          : 11       |****************************************|
435         2 -> 3          : 2        |*******                                 |
436         4 -> 7          : 10       |************************************    |
437         8 -> 15         : 8        |*****************************           |
438        16 -> 31         : 1        |***                                     |
439        32 -> 63         : 2        |*******                                 |
440[...]
441```
442
443Browse the code in [examples/tracing/vfsreadlat.py](../examples/tracing/vfsreadlat.py) and [examples/tracing/vfsreadlat.c](../examples/tracing/vfsreadlat.c). Things to learn:
444
4451. ```b = BPF(src_file = "vfsreadlat.c")```: Read the BPF C program from a separate source file.
4461. ```b.attach_kretprobe(event="vfs_read", fn_name="do_return")```: Attaches the BPF C function ```do_return()``` to the return of the kernel function ```vfs_read()```. This is a kretprobe: instrumenting the return from a function, rather than its entry.
4471. ```b["dist"].clear()```: Clears the histogram.
448
449### Lesson 12. urandomread.py
450
451Tracing while a ```dd if=/dev/urandom of=/dev/null bs=8k count=5``` is run:
452
453```
454# ./urandomread.py
455TIME(s)            COMM             PID    GOTBITS
45624652832.956994001 smtp             24690  384
45724652837.726500999 dd               24692  65536
45824652837.727111001 dd               24692  65536
45924652837.727703001 dd               24692  65536
46024652837.728294998 dd               24692  65536
46124652837.728888001 dd               24692  65536
462```
463
464Hah! I caught smtp by accident. Code is [examples/tracing/urandomread.py](../examples/tracing/urandomread.py):
465
466```Python
467from __future__ import print_function
468from bcc import BPF
469
470# load BPF program
471b = BPF(text="""
472TRACEPOINT_PROBE(random, urandom_read) {
473    // args is from /sys/kernel/debug/tracing/events/random/urandom_read/format
474    bpf_trace_printk("%d\\n", args->got_bits);
475    return 0;
476}
477""")
478
479# header
480print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "GOTBITS"))
481
482# format output
483while 1:
484    try:
485        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
486    except ValueError:
487        continue
488    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
489```
490
491Things to learn:
492
4931. ```TRACEPOINT_PROBE(random, urandom_read)```: Instrument the kernel tracepoint ```random:urandom_read```. These have a stable API, and thus are recommend to use instead of kprobes, wherever possible. You can run ```perf list``` for a list of tracepoints. Linux >= 4.7 is required to attach BPF programs to tracepoints.
4941. ```args->got_bits```: ```args``` is auto-populated to be a structure of the tracepoint arguments. The comment above says where you can see that structure. Eg:
495
496```
497# cat /sys/kernel/debug/tracing/events/random/urandom_read/format
498name: urandom_read
499ID: 972
500format:
501	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
502	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
503	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
504	field:int common_pid;	offset:4;	size:4;	signed:1;
505
506	field:int got_bits;	offset:8;	size:4;	signed:1;
507	field:int pool_left;	offset:12;	size:4;	signed:1;
508	field:int input_left;	offset:16;	size:4;	signed:1;
509
510print fmt: "got_bits %d nonblocking_pool_entropy_left %d input_entropy_left %d", REC->got_bits, REC->pool_left, REC->input_left
511```
512
513In this case, we were printing the ```got_bits``` member.
514
515### Lesson 13. disksnoop.py fixed
516
517Convert disksnoop.py from a previous lesson to use the ```block:block_rq_issue``` and ```block:block_rq_complete``` tracepoints.
518
519### Lesson 14. strlen_count.py
520
521This program instruments a user-level function, the ```strlen()``` library function, and frequency counts its string argument. Example output:
522
523```
524# ./strlen_count.py
525Tracing strlen()... Hit Ctrl-C to end.
526^C     COUNT STRING
527         1 " "
528         1 "/bin/ls"
529         1 "."
530         1 "cpudist.py.1"
531         1 ".bashrc"
532         1 "ls --color=auto"
533         1 "key_t"
534[...]
535        10 "a7:~# "
536        10 "/root"
537        12 "LC_ALL"
538        12 "en_US.UTF-8"
539        13 "en_US.UTF-8"
540        20 "~"
541        70 "#%^,~:-=?+/}"
542       340 "\x01\x1b]0;root@bgregg-test: ~\x07\x02root@bgregg-test:~# "
543```
544
545These are various strings that are being processed by this library function while tracing, along with their frequency counts. ```strlen()``` was called on "LC_ALL" 12 times, for example.
546
547Code is [examples/tracing/strlen_count.py](../examples/tracing/strlen_count.py):
548
549```Python
550from __future__ import print_function
551from bcc import BPF
552from time import sleep
553
554# load BPF program
555b = BPF(text="""
556#include <uapi/linux/ptrace.h>
557
558struct key_t {
559    char c[80];
560};
561BPF_HASH(counts, struct key_t);
562
563int count(struct pt_regs *ctx) {
564    if (!PT_REGS_PARM1(ctx))
565        return 0;
566
567    struct key_t key = {};
568    u64 zero = 0, *val;
569
570    bpf_probe_read_user(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx));
571    // could also use `counts.increment(key)`
572    val = counts.lookup_or_try_init(&key, &zero);
573    if (val) {
574      (*val)++;
575    }
576    return 0;
577};
578""")
579b.attach_uprobe(name="c", sym="strlen", fn_name="count")
580
581# header
582print("Tracing strlen()... Hit Ctrl-C to end.")
583
584# sleep until Ctrl-C
585try:
586    sleep(99999999)
587except KeyboardInterrupt:
588    pass
589
590# print output
591print("%10s %s" % ("COUNT", "STRING"))
592counts = b.get_table("counts")
593for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
594    print("%10d \"%s\"" % (v.value, k.c.encode('string-escape')))
595```
596
597Things to learn:
598
5991. ```PT_REGS_PARM1(ctx)```: This fetches the first argument to ```strlen()```, which is the string.
6001. ```b.attach_uprobe(name="c", sym="strlen", fn_name="count")```: Attach to library "c" (if this is the main program, use its pathname), instrument the user-level function ```strlen()```, and on execution call our C function ```count()```.
601
602### Lesson 15. nodejs_http_server.py
603
604This program instruments a user statically-defined tracing (USDT) probe, which is the user-level version of a kernel tracepoint. Sample output:
605
606```
607# ./nodejs_http_server.py 24728
608TIME(s)            COMM             PID    ARGS
60924653324.561322998 node             24728  path:/index.html
61024653335.343401998 node             24728  path:/images/welcome.png
61124653340.510164998 node             24728  path:/images/favicon.png
612```
613
614Relevant code from [examples/tracing/nodejs_http_server.py](../examples/tracing/nodejs_http_server.py):
615
616```Python
617from __future__ import print_function
618from bcc import BPF, USDT
619import sys
620
621if len(sys.argv) < 2:
622    print("USAGE: nodejs_http_server PID")
623    exit()
624pid = sys.argv[1]
625debug = 0
626
627# load BPF program
628bpf_text = """
629#include <uapi/linux/ptrace.h>
630int do_trace(struct pt_regs *ctx) {
631    uint64_t addr;
632    char path[128]={0};
633    bpf_usdt_readarg(6, ctx, &addr);
634    bpf_probe_read_user(&path, sizeof(path), (void *)addr);
635    bpf_trace_printk("path:%s\\n", path);
636    return 0;
637};
638"""
639
640# enable USDT probe from given PID
641u = USDT(pid=int(pid))
642u.enable_probe(probe="http__server__request", fn_name="do_trace")
643if debug:
644    print(u.get_text())
645    print(bpf_text)
646
647# initialize BPF
648b = BPF(text=bpf_text, usdt_contexts=[u])
649```
650
651Things to learn:
652
6531. ```bpf_usdt_readarg(6, ctx, &addr)```: Read the address of argument 6 from the USDT probe into ```addr```.
6541. ```bpf_probe_read_user(&path, sizeof(path), (void *)addr)```: Now the string ```addr``` points to into our ```path``` variable.
6551. ```u = USDT(pid=int(pid))```: Initialize USDT tracing for the given PID.
6561. ```u.enable_probe(probe="http__server__request", fn_name="do_trace")```: Attach our ```do_trace()``` BPF C function to the Node.js ```http__server__request``` USDT probe.
6571. ```b = BPF(text=bpf_text, usdt_contexts=[u])```: Need to pass in our USDT object, ```u```, to BPF object creation.
658
659### Lesson 16. task_switch.c
660
661This is an older tutorial included as a bonus lesson. Use this for recap and to reinforce what you've already learned.
662
663This is a slightly more complex tracing example than Hello World. This program
664will be invoked for every task change in the kernel, and record in a BPF map
665the new and old pids.
666
667The C program below introduces a new concept: the prev argument. This
668argument is treated specially by the BCC frontend, such that accesses
669to this variable are read from the saved context that is passed by the
670kprobe infrastructure. The prototype of the args starting from
671position 1 should match the prototype of the kernel function being
672kprobed. If done so, the program will have seamless access to the
673function parameters.
674
675```c
676#include <uapi/linux/ptrace.h>
677#include <linux/sched.h>
678
679struct key_t {
680    u32 prev_pid;
681    u32 curr_pid;
682};
683
684BPF_HASH(stats, struct key_t, u64, 1024);
685int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
686    struct key_t key = {};
687    u64 zero = 0, *val;
688
689    key.curr_pid = bpf_get_current_pid_tgid();
690    key.prev_pid = prev->pid;
691
692    // could also use `stats.increment(key);`
693    val = stats.lookup_or_try_init(&key, &zero);
694    if (val) {
695      (*val)++;
696    }
697    return 0;
698}
699```
700
701The userspace component loads the file shown above, and attaches it to the
702`finish_task_switch` kernel function.
703The `[]` operator of the BPF object gives access to each BPF_HASH in the
704program, allowing pass-through access to the values residing in the kernel. Use
705the object as you would any other python dict object: read, update, and deletes
706are all allowed.
707```python
708from bcc import BPF
709from time import sleep
710
711b = BPF(src_file="task_switch.c")
712b.attach_kprobe(event="finish_task_switch", fn_name="count_sched")
713
714# generate many schedule events
715for i in range(0, 100): sleep(0.01)
716
717for k, v in b["stats"].items():
718    print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value))
719```
720
721These programs can be found in the files [examples/tracing/task_switch.c](../examples/tracing/task_switch.c) and [examples/tracing/task_switch.py](../examples/tracing/task_switch.py) respectively.
722
723### Lesson 17. Further Study
724
725For further study, see Sasha Goldshtein's [linux-tracing-workshop](https://github.com/goldshtn/linux-tracing-workshop), which contains additional labs. There are also many tools in bcc /tools to study.
726
727Please read [CONTRIBUTING-SCRIPTS.md](../CONTRIBUTING-SCRIPTS.md) if you wish to contribute tools to bcc. At the bottom of the main [README.md](../README.md), you'll also find methods for contacting us. Good luck, and happy tracing!
728
729## Networking
730
731To do.
732