1Demonstrations of funcinterval, the Linux eBPF/bcc version. 2 3eBPF/bcc is very suitable for platform performance tuning. 4By funclatency, we can profile specific functions to know how latency 5this function costs. However, sometimes performance drop is not about the 6latency of function but the interval between function calls. 7funcinterval is born for this purpose. 8 9Another story, hardware performance tuning on the platform we will use 10protocol analyzer to analyze performance, but most protocol analyzers lack 11the distribution feature. Using a protocol analyzer you need a lot of time 12to check every detail latency. By funcinterval, we can save a lot of time 13by distribution feature. 14 15For example: 16 17# ./funcinterval xhci_ring_ep_doorbell -d 2 -u 18Tracing 1 functions for "xhci_ring_ep_doorbell"... Hit Ctrl-C to end. 19 20 usecs : count distribution 21 0 -> 1 : 0 | | 22 2 -> 3 : 0 | | 23 4 -> 7 : 0 | | 24 8 -> 15 : 0 | | 25 16 -> 31 : 0 | | 26 32 -> 63 : 134 | | 27 64 -> 127 : 2862 |******************** | 28 128 -> 255 : 5552 |****************************************| 29 256 -> 511 : 216 |* | 30 512 -> 1023 : 2 | | 31Detaching... 32 33This example output shows that the interval latency of xhci_ring_ep_doorbell 34took between 64 and 255 microseconds. USB MAC will start its job after USB 35doorbell register ringing, above information that can help hardware engineer to 36analyze, the performance drop is because software rings the doorbell too 37late or just slowly hardware DMA. 38 39# ./funcinterval blk_start_request -i 2 -u 40Tracing 1 functions for "blk_start_request"... Hit Ctrl-C to end. 41 42 usecs : count distribution 43 0 -> 1 : 0 | | 44 2 -> 3 : 0 | | 45 4 -> 7 : 0 | | 46 8 -> 15 : 5 |* | 47 16 -> 31 : 0 | | 48 32 -> 63 : 1 | | 49 64 -> 127 : 2 | | 50 128 -> 255 : 1 | | 51 256 -> 511 : 0 | | 52 512 -> 1023 : 1 | | 53 1024 -> 2047 : 1 | | 54 2048 -> 4095 : 117 |****************************************| 55 4096 -> 8191 : 13 |**** | 56 8192 -> 16383 : 1 | | 57 58If using biolatency tool that has no difference between two platforms. 59Maybe the problem is related to the interval time instead of block 60device I/O latency. 61 62# ./funcinterval ion_ioctl -i 2 -m 63Tracing 1 functions for "ion_ioctl"... Hit Ctrl-C to end. 64 65 msecs : count distribution 66 0 -> 1 : 215 |****************************************| 67 2 -> 3 : 0 | | 68 4 -> 7 : 4 | | 69 8 -> 15 : 5 | | 70 16 -> 31 : 29 |***** | 71 72You can also check the ion_ioctl. By the above output, we know the activity 73frequency of ion_ioctl() is high mostly(less than 1 ms), but has 29 times low 74frequency. 75 76# ./funcinterval t:block:block_bio_queue -d 30 -u 77Tracing tracepoint for "block:block_bio_queue"... Hit Ctrl-C to end. 78 79 usecs : count distribution 80 0 -> 1 : 0 | | 81 2 -> 3 : 0 | | 82 4 -> 7 : 0 | | 83 8 -> 15 : 2 | | 84 16 -> 31 : 262 | | 85 32 -> 63 : 9075 |******************* | 86 64 -> 127 : 18668 |****************************************| 87 128 -> 255 : 1492 |*** | 88 256 -> 511 : 2616 |***** | 89 512 -> 1023 : 7226 |*************** | 90 1024 -> 2047 : 8982 |******************* | 91 2048 -> 4095 : 2394 |***** | 92 4096 -> 8191 : 163 | | 93 8192 -> 16383 : 42 | | 94 16384 -> 32767 : 2 | | 95 32768 -> 65535 : 0 | | 96 65536 -> 131071 : 0 | | 97 131072 -> 262143 : 0 | | 98 262144 -> 524287 : 0 | | 99 524288 -> 1048575 : 1 | | 100Detaching... 101 102# ./funcinterval t:block:block_rq_issue -d 30 -u 103Tracing tracepoint for "block:block_rq_issue"... Hit Ctrl-C to end. 104 105 usecs : count distribution 106 0 -> 1 : 0 | | 107 2 -> 3 : 0 | | 108 4 -> 7 : 0 | | 109 8 -> 15 : 0 | | 110 16 -> 31 : 5 | | 111 32 -> 63 : 18 | | 112 64 -> 127 : 32 | | 113 128 -> 255 : 95 | | 114 256 -> 511 : 2194 |****** | 115 512 -> 1023 : 13830 |****************************************| 116 1024 -> 2047 : 9001 |************************** | 117 2048 -> 4095 : 1569 |**** | 118 4096 -> 8191 : 96 | | 119 8192 -> 16383 : 17 | | 120Detaching... 121 122funcinterval also supports tracepoint filter. The above two cases are under EMMC 123throughput testing, by those results you know which layer has a slower interval 124time. In our case, mmc-cmdqd is slower than block layer. 125 126 127# ./funcinterval -p `pidof -s top` c:malloc -i 3 128Tracing uprobe function for "malloc"... Hit Ctrl-C to end. 129 130 nsecs : count distribution 131 0 -> 1 : 0 | | 132 2 -> 3 : 0 | | 133 4 -> 7 : 0 | | 134 8 -> 15 : 0 | | 135 16 -> 31 : 0 | | 136 32 -> 63 : 0 | | 137 64 -> 127 : 0 | | 138 128 -> 255 : 0 | | 139 256 -> 511 : 0 | | 140 512 -> 1023 : 0 | | 141 1024 -> 2047 : 0 | | 142 2048 -> 4095 : 0 | | 143 4096 -> 8191 : 7 |************************* | 144 8192 -> 16383 : 11 |****************************************| 145 16384 -> 32767 : 4 |************** | 146 32768 -> 65535 : 1 |*** | 147 65536 -> 131071 : 1 |*** | 148 131072 -> 262143 : 1 |*** | 149 262144 -> 524287 : 0 | | 150 524288 -> 1048575 : 0 | | 151 1048576 -> 2097151 : 0 | | 152 2097152 -> 4194303 : 0 | | 153 4194304 -> 8388607 : 1 |*** | 154 155 156 nsecs : count distribution 157 0 -> 1 : 0 | | 158 2 -> 3 : 0 | | 159 4 -> 7 : 0 | | 160 8 -> 15 : 0 | | 161 16 -> 31 : 0 | | 162 32 -> 63 : 0 | | 163 64 -> 127 : 0 | | 164 128 -> 255 : 0 | | 165 256 -> 511 : 0 | | 166 512 -> 1023 : 0 | | 167 1024 -> 2047 : 0 | | 168 2048 -> 4095 : 0 | | 169 4096 -> 8191 : 8 |******************************** | 170 8192 -> 16383 : 10 |****************************************| 171 16384 -> 32767 : 4 |**************** | 172 32768 -> 65535 : 1 |**** | 173 65536 -> 131071 : 1 |**** | 174 131072 -> 262143 : 1 |**** | 175 262144 -> 524287 : 0 | | 176 524288 -> 1048575 : 0 | | 177 1048576 -> 2097151 : 0 | | 178 2097152 -> 4194303 : 0 | | 179 4194304 -> 8388607 : 1 |**** | 180 181Time the interval of libc's malloc for top utility every 3 seconds. 182 183# ./funcinterval /usr/local/bin/python:main 184Tracing uprobe function for "main"... Hit Ctrl-C to end. 185^C 186 nsecs : count distribution 187 0 -> 1 : 0 | | 188 2 -> 3 : 0 | | 189 4 -> 7 : 0 | | 190 8 -> 15 : 0 | | 191 16 -> 31 : 0 | | 192 32 -> 63 : 0 | | 193 64 -> 127 : 0 | | 194 128 -> 255 : 0 | | 195 256 -> 511 : 0 | | 196 512 -> 1023 : 0 | | 197 1024 -> 2047 : 0 | | 198 2048 -> 4095 : 0 | | 199 4096 -> 8191 : 0 | | 200 8192 -> 16383 : 0 | | 201 16384 -> 32767 : 0 | | 202 32768 -> 65535 : 0 | | 203 65536 -> 131071 : 0 | | 204 131072 -> 262143 : 0 | | 205 262144 -> 524287 : 0 | | 206 524288 -> 1048575 : 0 | | 207 1048576 -> 2097151 : 0 | | 208 2097152 -> 4194303 : 0 | | 209 4194304 -> 8388607 : 0 | | 210 8388608 -> 16777215 : 0 | | 211 16777216 -> 33554431 : 0 | | 212 33554432 -> 67108863 : 0 | | 213 67108864 -> 134217727 : 0 | | 214 134217728 -> 268435455 : 0 | | 215 268435456 -> 536870911 : 1 |****************************************| 216 536870912 -> 1073741823 : 1 |****************************************| 2171073741824 -> 2147483647 : 1 |****************************************| 2182147483648 -> 4294967295 : 1 |****************************************| 219Detaching... 220 221Time the interal of python's main function. 222 223USAGE message: 224 225# ./funcinterval -h 226usage: funcinterval [-h] [-p PID] [-i INTERVAL] [-d DURATION] [-T] [-u] [-m] 227 [-v] 228 pattern 229 230Time interval and print latency as a histogram 231 232positional arguments: 233 pattern Function name for tracing 234 235optional arguments: 236 -h, --help show this help message and exit 237 -p PID, --pid PID trace this PID only 238 -i INTERVAL, --interval INTERVAL 239 summary interval, in seconds 240 -d DURATION, --duration DURATION 241 total duration of trace, in seconds 242 -T, --timestamp include timestamp on output 243 -u, --microseconds microsecond histogram 244 -m, --milliseconds millisecond histogram 245 -v, --verbose print the BPF program (for debugging purposes) 246 247examples: 248 # time the interval of do_sys_open() 249 ./funcinterval do_sys_open 250 # time the interval of xhci_ring_ep_doorbell(), in microseconds 251 ./funcinterval -u xhci_ring_ep_doorbell 252 # time the interval of do_nanosleep(), in milliseconds 253 ./funcinterval -m do_nanosleep 254 # output every 5 seconds, with timestamps 255 ./funcinterval -mTi 5 vfs_read 256 # time process 181 only 257 ./funcinterval -p 181 vfs_read 258 # time the interval of mm_vmscan_direct_reclaim_begin tracepoint 259 ./funcinterval t:vmscan:mm_vmscan_direct_reclaim_begin 260 # time the interval of c:malloc used by top every 3 seconds 261 ./funcinterval -p `pidof -s top` -i 3 c:malloc 262 # time /usr/local/bin/python main function 263 ./funcinterval /usr/local/bin/python:main 264