Demonstrations of kvm exit reasons, the Linux eBPF/bcc version.


Frequent virtual machine exits can cause performance problems. This tool
helps locate the most frequent exit reasons, so that the exits can be
reduced or even avoided, by displaying the detailed exit reasons and the
count of each vm exit for all vms running on one physical machine.


Features of this tool
=====================

- Although there is a patch: [KVM: x86: add full vm-exit reason debug entries]
  (https://patchwork.kernel.org/project/kvm/patch/[email protected]/)
  that tries to add more vm-exit reason debug entries, as the comments on it
  said, the code allocates lots of memory that may never be consumed, misses
  some arch-specific kvm causes, and cannot do kernel aggregation. bcc, as a
  user space tool, can implement all of these functions more easily and
  flexibly.
- The bcc python logic provides kernel aggregation and custom output, such as
  collapsing all tids for one pid (i.e. one vm's qemu process id) with exit
  reasons sorted in descending order. For more information, see the
  "USAGE message" section below.
- The bpf in-kernel percpu_array and percpu_cache further improve performance.
  For more information, see the "Help to understand" section below.


Limitations
===========

Because hardware-assisted virtualization differs between architectures,
the tool currently supports only VMX on Intel. AMD support is planned.


Example output:
===============

# ./kvmexit.py
Display kvm exit reasons and statistics for all threads... Hit Ctrl-C to end.
PID      TID      KVM_EXIT_REASON                     COUNT
^C1273551  1273568  EXIT_REASON_HLT                     12
1273551  1273568  EXIT_REASON_MSR_WRITE               6
1274253  1274261  EXIT_REASON_EXTERNAL_INTERRUPT      1
1274253  1274261  EXIT_REASON_HLT                     12
1274253  1274261  EXIT_REASON_MSR_WRITE               4

# ./kvmexit.py 6
Display kvm exit reasons and statistics for all threads after sleeping 6 secs.
PID      TID      KVM_EXIT_REASON                     COUNT
1273903  1273922  EXIT_REASON_EXTERNAL_INTERRUPT      175
1273903  1273922  EXIT_REASON_CPUID                   10
1273903  1273922  EXIT_REASON_HLT                     6043
1273903  1273922  EXIT_REASON_IO_INSTRUCTION          24
1273903  1273922  EXIT_REASON_MSR_WRITE               15025
1273903  1273922  EXIT_REASON_PAUSE_INSTRUCTION       11
1273903  1273922  EXIT_REASON_EOI_INDUCED             12
1273903  1273922  EXIT_REASON_EPT_VIOLATION           6
1273903  1273922  EXIT_REASON_EPT_MISCONFIG           380
1273903  1273922  EXIT_REASON_PREEMPTION_TIMER        194
1273551  1273568  EXIT_REASON_EXTERNAL_INTERRUPT      18
1273551  1273568  EXIT_REASON_HLT                     989
1273551  1273568  EXIT_REASON_IO_INSTRUCTION          10
1273551  1273568  EXIT_REASON_MSR_WRITE               2205
1273551  1273568  EXIT_REASON_PAUSE_INSTRUCTION       1
1273551  1273568  EXIT_REASON_EOI_INDUCED             5
1273551  1273568  EXIT_REASON_EPT_MISCONFIG           61
1273551  1273568  EXIT_REASON_PREEMPTION_TIMER        14

# ./kvmexit.py -p 1273795 5
Display kvm exit reasons and statistics for PID 1273795 after sleeping 5 secs.
KVM_EXIT_REASON                     COUNT
MSR_WRITE                           13467
HLT                                 5060
PREEMPTION_TIMER                    345
EPT_MISCONFIG                       264
EXTERNAL_INTERRUPT                  169
EPT_VIOLATION                       18
PAUSE_INSTRUCTION                   6
IO_INSTRUCTION                      4
EOI_INDUCED                         2

# ./kvmexit.py -p 1273795 5 -a
Display kvm exit reasons and statistics for PID 1273795 and its all threads after sleeping 5 secs.
TID      KVM_EXIT_REASON                     COUNT
1273819  EXTERNAL_INTERRUPT                  64
1273819  HLT                                 2802
1273819  IO_INSTRUCTION                      4
1273819  MSR_WRITE                           7196
1273819  PAUSE_INSTRUCTION                   2
1273819  EOI_INDUCED                         2
1273819  EPT_VIOLATION                       6
1273819  EPT_MISCONFIG                       162
1273819  PREEMPTION_TIMER                    194
1273820  EXTERNAL_INTERRUPT                  78
1273820  HLT                                 2054
1273820  MSR_WRITE                           5199
1273820  EPT_VIOLATION                       2
1273820  EPT_MISCONFIG                       77
1273820  PREEMPTION_TIMER                    102

# ./kvmexit.py -p 1273795 -v 0
Display kvm exit reasons and statistics for PID 1273795 VCPU 0... Hit Ctrl-C to end.
KVM_EXIT_REASON                     COUNT
^CMSR_WRITE                           2076
HLT                                 795
PREEMPTION_TIMER                    86
EXTERNAL_INTERRUPT                  20
EPT_MISCONFIG                       10
PAUSE_INSTRUCTION                   2
IO_INSTRUCTION                      2
EPT_VIOLATION                       1
EOI_INDUCED                         1

# ./kvmexit.py -p 1273795 -v 0 4
Display kvm exit reasons and statistics for PID 1273795 VCPU 0 after sleeping 4 secs.
KVM_EXIT_REASON                     COUNT
MSR_WRITE                           4726
HLT                                 1827
PREEMPTION_TIMER                    78
EPT_MISCONFIG                       67
EXTERNAL_INTERRUPT                  28
IO_INSTRUCTION                      4
EOI_INDUCED                         2
PAUSE_INSTRUCTION                   2

# ./kvmexit.py -p 1273795 -v 4 4
Traceback (most recent call last):
  File "tools/kvmexit.py", line 306, in <module>
    raise Exception("There's no v%s for PID %d." % (tgt_vcpu, args.pid))
  Exception: There's no vCPU 4 for PID 1273795.

# ./kvmexit.py -t 1273819 10
Display kvm exit reasons and statistics for TID 1273819 after sleeping 10 secs.
KVM_EXIT_REASON                     COUNT
MSR_WRITE                           13318
HLT                                 5274
EPT_MISCONFIG                       263
PREEMPTION_TIMER                    171
EXTERNAL_INTERRUPT                  109
IO_INSTRUCTION                      8
PAUSE_INSTRUCTION                   5
EOI_INDUCED                         4
EPT_VIOLATION                       2

# ./kvmexit.py -T '1273820,1273819'
Display kvm exit reasons and statistics for TIDS ['1273820', '1273819']... Hit Ctrl-C to end.
TIDS     KVM_EXIT_REASON                     COUNT
^C1273819  EXTERNAL_INTERRUPT                  300
1273819  HLT                                 13718
1273819  IO_INSTRUCTION                      26
1273819  MSR_WRITE                           37457
1273819  PAUSE_INSTRUCTION                   13
1273819  EOI_INDUCED                         13
1273819  EPT_VIOLATION                       53
1273819  EPT_MISCONFIG                       654
1273819  PREEMPTION_TIMER                    958
1273820  EXTERNAL_INTERRUPT                  212
1273820  HLT                                 9002
1273820  MSR_WRITE                           25495
1273820  PAUSE_INSTRUCTION                   2
1273820  EPT_VIOLATION                       64
1273820  EPT_MISCONFIG                       396
1273820  PREEMPTION_TIMER                    268


Help to understand
==================

We use a PERCPU_ARRAY: pcpuArrayA and a percpu_hash: hashA to collaboratively
store each kvm exit reason and its count.
The reason is that when a vcpu exits and re-enters, it tends to keep running
on the same physical cpu (pcpu below) as in the last cycle, which we call a
'cache hit'. We therefore use a PERCPU_ARRAY to record the 'cache hit' case to
speed things up, and fall back to a percpu_hash for the other cases.

BTW, we originally used a common hash to do this, with a u64(exit_reason)
key and a struct exit_info {tgid_pid, exit_reason} value. But because of
the big lock in bpf_hash, each update is quite expensive.

Now imagine that pid_tgidA (vcpu A) exits and is going to run on
pcpuArrayA. The BPF code flow is as follows:

               pid_tgidA keeps running on the same pcpu
                     //                          \\
                    //                            \\
                   //      Y                N      \\
                  //                                \\
        a. cache_hit                           b. cache_miss
(cacheA's pid_tgid matches pid_tgidA)                ||
              |                                      ||
              |                                      ||
"increase percpu exit_ct and return"                 ||
          [*Note*]                                   ||
                            has pid_tgidA ever exited on pcpuArrayA?
                                    //                      \\
                                   //                        \\
                                  //                          \\
                                 //     Y               N      \\
                                //                              \\
                   b.a load_last_hashA         b.b initialize_hashA_with_zero
                                \                              /
                                 \                            /
                                  \                          /
                                   "increase percpu exit_ct"
                                              ||
                                              ||
                      has another pid_tgid been running on pcpuArrayA?
                                    //                      \\
                                   //     Y           N      \\
                                  //                          \\
                 b.*.a save_theLastHit_hashB              do_nothing
                                  \\                          //
                                   \\                        //
                                    \\                      //
                                     b.* save_to_pcpuArrayA


[*Note*] In case "a." we do not update the table, because the vcpu may hit the
same pcpu again on its next exit. The table is updated only once this pcpu is
no longer hit by the same pid_tgid (vcpu), which happens in "b.*.a" and "b.*".
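
The flow above can be modeled in plain Python to make the bookkeeping easier
to follow. This is only a sketch of one pcpu's state, not the tool's BPF code:
the class name PcpuExitCounter, its attributes, and the totals() helper are
hypothetical names chosen for illustration, and ordinary dicts stand in for
the PERCPU_ARRAY slot and the percpu_hash.

```python
from collections import defaultdict


class PcpuExitCounter:
    """Model of one pcpu: a one-entry cache plus a per-thread hash.

    Hypothetical illustration only; names do not come from kvmexit.py.
    """

    def __init__(self):
        self.cache_pid_tgid = None             # owner of the cache slot
        self.cache_counts = defaultdict(int)   # exit_reason -> count
        self.hash = {}                         # pid_tgid -> {exit_reason: count}

    def on_exit(self, pid_tgid, reason):
        # a. cache hit: the same vcpu thread exited on this pcpu again,
        #    so only the cached counters are touched.
        if self.cache_pid_tgid == pid_tgid:
            self.cache_counts[reason] += 1
            return "hit"

        # b. cache miss.
        # b.a load the counts saved earlier for this thread, or
        # b.b start from zero if it never exited on this pcpu before.
        counts = defaultdict(int, self.hash.get(pid_tgid, {}))
        counts[reason] += 1

        # b.*.a another thread owned the cache: flush its counts to the hash.
        if self.cache_pid_tgid is not None:
            self.hash[self.cache_pid_tgid] = dict(self.cache_counts)

        # b.* the current thread becomes the cache owner.
        self.cache_pid_tgid = pid_tgid
        self.cache_counts = counts
        return "miss"

    def totals(self):
        """Merge hash and cache; the cache holds the newest owner counts."""
        merged = {key: dict(value) for key, value in self.hash.items()}
        if self.cache_pid_tgid is not None:
            merged[self.cache_pid_tgid] = dict(self.cache_counts)
        return merged


if __name__ == "__main__":
    cpu = PcpuExitCounter()
    for pid_tgid, reason in [(1, "HLT"), (1, "HLT"), (2, "MSR_WRITE"), (1, "HLT")]:
        print(pid_tgid, reason, cpu.on_exit(pid_tgid, reason))
    print(cpu.totals())
```

In this model the hash is written only on a miss, which mirrors the [*Note*]
above: repeated hits by the same thread never touch the shared table.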


USAGE message:
==============

# ./kvmexit.py -h
usage: kvmexit.py [-h] [-p PID [-v VCPU | -a] ] [-t TID | -T 'TID1,TID2'] [duration]

Display kvm_exit_reason and its statistics at a timed interval

optional arguments:
  -h, --help            show this help message and exit
  -p PID, --pid PID     display process with this PID only, collapse all tids with exit reasons sorted in descending order
  -v VCPU, --v VCPU     display this VCPU only for this PID
  -a, --alltids         display all TIDS for this PID
  -t TID, --tid TID     display thread with this TID only with exit reasons sorted in descending order
  -T 'TID1,TID2', --tids 'TID1,TID2'
                        display threads for a union like {395490, 395491}
  duration              duration of display, after sleeping several seconds

examples:
    ./kvmexit                              # Display kvm_exit_reason and its statistics in real-time until Ctrl-C
    ./kvmexit 5                            # Display in real-time after sleeping 5s
    ./kvmexit -p 3195281                   # Collapse all tids for pid 3195281 with exit reasons sorted in descending order
    ./kvmexit -p 3195281 20                # Collapse all tids for pid 3195281 with exit reasons sorted in descending order, and display after sleeping 20s
    ./kvmexit -p 3195281 -v 0              # Display only vcpu0 for pid 3195281, descending sort by default
    ./kvmexit -p 3195281 -a                # Display all tids for pid 3195281
    ./kvmexit -t 395490                    # Display only for tid 395490 with exit reasons sorted in descending order
    ./kvmexit -t 395490 20                 # Display only for tid 395490 with exit reasons sorted in descending order after sleeping 20s
    ./kvmexit -T '395490,395491'           # Display for a union like {395490, 395491}
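
The "collapse all tids" behavior of -p can be pictured as a small aggregation
step in user space. The sketch below is an assumption about how such a step
might look, not the code in kvmexit.py: the function name collapse_by_pid and
the (pid, tid, reason, count) row layout are hypothetical, and the sample rows
are a subset of the per-thread figures shown in the -a example.

```python
from collections import Counter


def collapse_by_pid(rows, pid):
    """Sum counts per exit reason over all tids of one pid.

    rows: iterable of (pid, tid, reason, count) tuples (hypothetical layout).
    Returns (reason, count) pairs sorted by count, largest first.
    """
    totals = Counter()
    for row_pid, _tid, reason, count in rows:
        if row_pid == pid:
            totals[reason] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Sample rows taken from the per-thread (-a) example output.
sample_rows = [
    (1273795, 1273819, "HLT", 2802),
    (1273795, 1273819, "MSR_WRITE", 7196),
    (1273795, 1273819, "IO_INSTRUCTION", 4),
    (1273795, 1273820, "HLT", 2054),
    (1273795, 1273820, "MSR_WRITE", 5199),
]

if __name__ == "__main__":
    for reason, count in collapse_by_pid(sample_rows, 1273795):
        print("%-35s %d" % (reason, count))
```

Rows that belong to other pids are skipped, so the same helper can be fed the
unfiltered output of a whole-machine run.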