Demonstrations of drsnoop, the Linux eBPF/bcc version.


drsnoop traces direct reclaim events system-wide and prints various details.
Example output:

# ./drsnoop
COMM           PID     LAT(ms) PAGES
summond        17678      0.19   143
summond        17669      0.55   313
summond        17669      0.15   145
summond        17669      0.27   237
summond        17669      0.48   111
summond        17669      0.16    75
head           17821      0.29   339
head           17825      0.17   109
summond        17669      0.14    73
summond        17496    104.84    40
summond        17678      0.32   167
summond        17678      0.14   106
summond        17678      0.16    67
summond        17678      0.29   267
summond        17678      0.27    69
summond        17678      0.32    46
base64         17816      0.16    85
summond        17678      0.43   283
summond        17678      0.14   182
head           17736      0.57   135
^C

While tracing, processes were allocating pages. Because the system was short
of available memory, direct reclaim events occurred, which increase the time
those processes spend waiting for memory.

drsnoop is useful when allocstall (in /proc/vmstat) keeps increasing and you
want to find out whether it is caused by some critical processes.

The -p option can be used to filter on a PID, which is filtered in-kernel. Here
I've used it with -T to print timestamps:

# ./drsnoop -Tp 17491
TIME(s)        COMM           PID     LAT(ms) PAGES
107.364115000  summond        17491      0.24    50
107.364550000  summond        17491      0.26    38
107.365266000  summond        17491      0.36    72
107.365753000  summond        17491      0.22    49
^C

This shows the summond process allocating pages and direct reclaim events
occurring, without much effect on its delays.
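
To check whether allocstall is increasing before reaching for drsnoop, the
counter can be sampled from /proc/vmstat. The following is a minimal sketch,
not part of drsnoop: the function names are hypothetical, and it assumes that
the kernel exposes either a single "allocstall" line or several per-zone lines
whose names start with "allocstall" (for example allocstall_normal).

```python
import time


def allocstall_total(vmstat_text):
    """Sum every counter whose name starts with 'allocstall'.

    Assumption: /proc/vmstat lines have the form '<name> <integer>'.
    """
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("allocstall"):
            total += int(parts[1])
    return total


def allocstall_delta(interval=1.0, path="/proc/vmstat"):
    """Return how much the allocstall total grew over `interval` seconds."""
    with open(path) as f:
        before = allocstall_total(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = allocstall_total(f.read())
    return after - before
```

Calling allocstall_delta(5) on a Linux system returns the number of direct
reclaim stalls counted during those 5 seconds. A value that keeps growing
suggests it is worth running drsnoop to see which processes are involved.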

The -U option includes the UID in the output:

# ./drsnoop -U
UID    COMM           PID     LAT(ms) PAGES
1000   summond        17678      0.32    46
0      base64         17816      0.16    85
1000   summond        17678      0.43   283
1000   summond        17678      0.14   182
0      head           17821      0.29   339
0      head           17825      0.17   109
^C

The -u option filters on a UID:

# ./drsnoop -Uu 1000
UID    COMM           PID     LAT(ms) PAGES
1000   summond        17678      0.19   143
1000   summond        17669      0.55   313
1000   summond        17669      0.15   145
1000   summond        17669      0.27   237
1000   summond        17669      0.48   111
1000   summond        17669      0.16    75
1000   summond        17669      0.14    73
1000   summond        17678      0.32   167
^C

A maximum tracing duration can be set with the -d option. For example, to trace
for 2 seconds:

# ./drsnoop -d 2
COMM           PID     LAT(ms) PAGES
head           21715      0.15   195

The -n option can be used to filter on process name using partial matches:

# ./drsnoop -n mond
COMM           PID     LAT(ms) PAGES
summond        10271      0.03    51
summond        10271      0.03    51
summond        10259      0.05    51
summond        10269    319.41    37
summond        10270    111.73    35
summond        10270      0.11    78
summond        10270      0.12    71
summond        10270      0.03    35
summond        10277    111.62    41
summond        10277      0.08    45
summond        10277      0.06    32
^C

This caught the 'summond' command because its name contains 'mond', the string
passed to the '-n' option.
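
The partial match can also be applied after the fact to output that was
captured without -n. This is only an illustrative sketch: filter_by_name is a
hypothetical helper, it assumes the default four-column output (COMM, PID,
LAT(ms), PAGES), and it treats "partial match" as a plain substring test.

```python
def filter_by_name(lines, name):
    """Keep data rows whose COMM column contains `name` as a substring."""
    kept = []
    for line in lines:
        fields = line.split()
        # Skip the header, the trailing ^C, and anything not in 4-column form.
        if len(fields) != 4 or fields[0] == "COMM":
            continue
        if name in fields[0]:
            kept.append(line)
    return kept


captured = [
    "COMM           PID     LAT(ms) PAGES",
    "summond        17678      0.32    46",
    "base64         17816      0.16    85",
    "head           17821      0.29   339",
    "^C",
]
matches = filter_by_name(captured, "mond")
```

Here matches holds only the summond row, since 'base64' and 'head' do not
contain 'mond'.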


The -v option can be used to show the system memory state (currently only free
memory) at the start of each direct reclaim:

# ./drsnoop.py -v
COMM           PID     LAT(ms) PAGES FREE(KB)
base64         34924      0.23   151    86260
base64         34962      0.26   149    86260
head           34931      0.24   150    86260
base64         34902      0.19   148    86260
head           34963      0.19   151    86228
base64         34959      0.17   151    86228
head           34965      0.29   190    86228
base64         34957      0.24   152    86228
summond        34870      0.15   151    86080
summond        34870      0.12   115    86184

USAGE message:

# ./drsnoop -h
usage: drsnoop.py [-h] [-T] [-U] [-p PID] [-t TID] [-u UID] [-d DURATION]
                  [-n NAME]

Trace direct reclaim

optional arguments:
  -h, --help            show this help message and exit
  -T, --timestamp       include timestamp on output
  -U, --print-uid       print UID column
  -p PID, --pid PID     trace this PID only
  -t TID, --tid TID     trace this TID only
  -u UID, --uid UID     trace this UID only
  -d DURATION, --duration DURATION
                        total duration of trace in seconds
  -n NAME, --name NAME  only print process names containing this name

examples:
    ./drsnoop           # trace all direct reclaim
    ./drsnoop -T        # include timestamps
    ./drsnoop -U        # include UID
    ./drsnoop -p 181    # only trace PID 181
    ./drsnoop -t 123    # only trace TID 123
    ./drsnoop -u 1000   # only trace UID 1000
    ./drsnoop -d 10     # trace for 10 seconds only
    ./drsnoop -n main   # only print process names containing "main"
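
Long traces are easier to read once they are aggregated per process name. The
sketch below is a hypothetical post-processing helper, not a feature of
drsnoop. It assumes the default four-column output (COMM, PID, LAT(ms), PAGES)
saved as text lines, and it reports the event count, the total of the PAGES
column, and the worst latency for each command.

```python
from collections import defaultdict


def summarize(lines):
    """Aggregate default drsnoop rows by COMM.

    Returns {comm: {"events": int, "pages": int, "max_lat_ms": float}}.
    """
    stats = defaultdict(lambda: {"events": 0, "pages": 0, "max_lat_ms": 0.0})
    for line in lines:
        fields = line.split()
        # Skip the header, the trailing ^C, and rows with extra columns.
        if len(fields) != 4 or fields[0] == "COMM":
            continue
        comm, _pid, lat, pages = fields
        try:
            lat_ms = float(lat)
            page_count = int(pages)
        except ValueError:
            continue  # not a data row
        entry = stats[comm]
        entry["events"] += 1
        entry["pages"] += page_count
        entry["max_lat_ms"] = max(entry["max_lat_ms"], lat_ms)
    return dict(stats)


rows = [
    "COMM           PID     LAT(ms) PAGES",
    "summond        17669      0.14    73",
    "summond        17496    104.84    40",
    "head           17821      0.29   339",
    "^C",
]
summary = summarize(rows)
```

With these rows, the summond entry makes the 104.84 ms outlier stand out, even
though most of its events complete in well under a millisecond.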