1 perf-arm-spe(1)
5 ----
6 perf-arm-spe - Support for Arm Statistical Profiling Extension within Perf tools
9 --------
11 'perf record' -e arm_spe//
14 -----------
16 The SPE (Statistical Profiling Extension) feature provides accurate attribution of latencies and
17 events down to individual instructions. Rather than being interrupt-driven, it picks an
32 This is chosen from a sample population; for SPE this is an IMPLEMENTATION DEFINED choice of all
33 architectural instructions or all micro-ops. Sampling happens at a programmable interval. The
34 architecture provides a mechanism for the SPE driver to infer the minimum interval at which it shou…
35 sample. This minimum interval is used by the driver if no interval is specified. A pseudo-random
41 Program counter, PMU events, timings and data addresses related to the operation are recorded.
62 ----------------
64 Up until this point no decoding of the SPE data was done by either the kernel or Perf. Only when the
67 recording. These samples are the same as if normal sampling was done by Perf without using SPE,
69 just the instruction pointer, but an SPE sample can have data addresses and latency attributes.
72 -------------
74 - Sampling, rather than tracing, cuts down the profiling problem to something more manageable for
77 - Allows precise attribution data, including: Full PC of instruction, data virtual and physical
80 - Allows correlation between an instruction and events, such as TLB and cache misses. (Data source
84 However, SPE does not provide any call-graph information, and relies on statistical methods.
87 ----------
93 The 'sample_collision' PMU event can be used to determine the number of lost samples. Although this
99 -----------------------------------------
101 If an implementation samples micro-operations instead of instructions, the results of sampling must
104 For example, if a given instruction A is always converted into two micro-operations, A0 and A1, it
108 estimated from the 'sample_pop' and 'inst_retired' PMU events.
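As a rough illustration of the estimate mentioned above, the ratio of the two counts can be computed once both are known. This is only a sketch: the helper name `spe_pop_ratio` and the counts in the usage line are hypothetical, and how the two counts are collected is deliberately left out.

```shell
# Hypothetical helper: prints sample_pop / inst_retired to two decimal places.
# $1 = sample_pop count, $2 = inst_retired count (both assumed to have been
# collected over the same interval).
spe_pop_ratio() {
    awk -v pop="$1" -v ret="$2" 'BEGIN {
        if (ret == 0) exit 1        # avoid dividing by zero
        printf "%.2f\n", pop / ret
    }'
}

# Made-up counts, for illustration only.
spe_pop_ratio 3000 2000
```

A ratio well above 1 would suggest that the sample population is larger than the set of retired instructions, for example because micro-operations rather than instructions are being sampled.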
111 -------------------
123 The SPE interrupt must also be described by the firmware. If the module is loaded and KPTI is
124 disabled (or isn't required to be disabled) but the SPE PMU still doesn't show in
125 /sys/bus/event_source/devices/, then it's possible that the SPE interrupt isn't described by
128 Capturing SPE with perf command-line tools
129 ------------------------------------------
131 You can record a session with SPE samples:
133 perf record -e arm_spe// -- ./mybench
135 The sample period is set from the -c option, and because the minimum interval is used by default
141 These are placed between the // in the event and comma separated. For example '-e
144 branch_filter=1 - collect branches only (PMSFCR.B)
145 event_filter=<mask> - filter on specific events (PMSEVFR) - see bitfield description below
146 jitter=1 - use jitter to avoid resonance when sampling (PMSIRR.RND)
147 load_filter=1 - collect loads only (PMSFCR.LD)
148 min_latency=<n> - collect only samples with this latency or higher* (PMSLATFR)
149 …pa_enable=1 - collect physical address (as well as VA) of loads/stores (PMSCR.PA) - requir…
150 …pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requir…
151 store_filter=1 - collect stores only (PMSFCR.ST)
152 ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
153 …discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PM…
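Since the options are comma separated and placed between the two slashes, the event string can be assembled mechanically. The following is a minimal sketch; the helper name `spe_event` is hypothetical and the option values are chosen only for illustration.

```shell
# Hypothetical helper: joins its arguments with commas and wraps them in
# arm_spe/.../ so the result can be passed to 'perf record -e'.
# The function body runs in a subshell so the IFS change does not leak.
spe_event() (
    IFS=,
    printf 'arm_spe/%s/\n' "$*"
)

# With no options this prints the plain event used earlier: arm_spe//
spe_event

# Two options from the list above, with illustrative values.
spe_event load_filter=1 min_latency=10
```

The resulting string could then be used as, for example, `perf record -e "$(spe_event load_filter=1 min_latency=10)" -- ./mybench`.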
160 bit 1 - instruction retired (i.e. omit speculative instructions)
161 bit 3 - L1D refill
162 bit 5 - TLB refill
163 bit 7 - mispredict
164 bit 11 - misaligned access
168 perf record -e arm_spe/event_filter=2/ -- ./mybench
172 perf record -e arm_spe/event_filter=0x80/ -- ./mybench
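The values 2 and 0x80 used above are simply bit 1 and bit 7 of the mask. A small sketch of how such a mask can be derived from bit numbers follows; the helper name `spe_event_mask` is hypothetical, and how the hardware treats a mask with several bits set is not covered here.

```shell
# Hypothetical helper: ORs together (1 << bit) for every bit number given
# and prints the result in hexadecimal, suitable for event_filter=<mask>.
spe_event_mask() {
    mask=0
    for bit in "$@"; do
        mask=$(( mask | (1 << bit) ))
    done
    printf '0x%x\n' "$mask"
}

spe_event_mask 1      # bit 1, instruction retired: prints 0x2
spe_event_mask 7      # bit 7, mispredict: prints 0x80
spe_event_mask 3 5    # bits 3 and 5, L1D refill and TLB refill: prints 0x28
```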
178 attributes/events of the SPE record. Because instructions can have multiple events associated with
185 21 l1d-miss
186 897 l1d-access
187 5 llc-miss
188 7 llc-access
189 2 tlb-miss
190 1K tlb-access
192 0 remote-access
199 instruction unless you want to further downsample the already sampled SPE data:
201 perf report --itrace=i1i
205 perf report --mem-mode
210 - "Cannot find PMU `arm_spe'. Missing kernel support?"
215 - "Arm SPE CONTEXT packets not found in the traces."
220 - Excessively large perf.data file size
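For the first error in the list above, a quick check is whether an SPE PMU entry is visible under /sys/bus/event_source/devices/ at all. The sketch below assumes the entry name starts with `arm_spe`; the helper name `spe_pmu_present` and its optional directory argument are hypothetical and exist only to make the check easy to try.

```shell
# Hypothetical helper: returns 0 if any arm_spe* entry exists under the
# event source directory, 1 otherwise.
# Assumption: the SPE PMU entry name begins with "arm_spe".
spe_pmu_present() {
    dir=${1:-/sys/bus/event_source/devices}
    for entry in "$dir"/arm_spe*; do
        # If the glob matched nothing, $entry is the literal pattern
        # and the existence test fails.
        [ -e "$entry" ] && return 0
    done
    return 1
}

if spe_pmu_present; then
    echo "SPE PMU found"
else
    echo "SPE PMU not found"
fi
```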
224 PMU events
227 SPE has events that can be counted on core PMUs. These are prefixed with
231 These events will only count when an SPE event is running on the same core that
232 the PMU event is opened on, otherwise they read as 0. There are various ways to
233 ensure that the PMU event and SPE event are scheduled together depending on the
234 way the event is opened. For example, both events can be opened as per-process events
235 on the same process, although it's not guaranteed that the PMU event is enabled
236 first when context switching. For that reason it may be better to open the PMU
237 event as a systemwide event and then open SPE on the process of interest.
242 SPE related (SAMPLE_* etc) core PMU events can be used without the overhead of
244 First run a system wide SPE session (or on the core of interest) using options
247 perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
248 perf stat -e SAMPLE_FEED_LD
251 --------
253 linkperf:perf-record[1], linkperf:perf-script[1], linkperf:perf-report[1],
254 linkperf:perf-inject[1]