perf-record.txt - OpenGrok cross reference for /linux-6.14.4/tools/perf/Documentation/perf-record.txt

1 perf-record(1)
5 ----
6 perf-record - Run a command and record its profile into perf.data
9 --------
11 'perf record' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf record' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
15 -----------
17 from it, into perf.data - without displaying anything.
23 -------
27 -e::
28 --event=::
29 	Select the PMU event. Selection can be:
31         - a symbolic event name	(use 'perf list' to list all events)
33         - a raw PMU event in the form of rN where N is a hexadecimal value
38         - a symbolic or raw PMU event followed by an optional colon
39 	  and a list of event modifiers, e.g., cpu-cycles:p.  See the
40 	  linkperf:perf-list[1] man page for details on event modifiers.
42 	- a symbolically formed PMU event like 'pmu/param1=0x3,param2/' where
43 	  'param1', 'param2', etc are defined as formats for the PMU in
44 	  /sys/bus/event_source/devices/<pmu>/format/*.
46 	- a symbolically formed event like 'pmu/config=M,config1=N,config3=K/'
50           corresponding entries in /sys/bus/event_source/devices/<pmu>/format/*
51           param1 and param2 are defined as formats for the PMU in:
52           /sys/bus/event_source/devices/<pmu>/format/*
54 	  There are also some parameters which are not defined in .../<pmu>/format/*.
57 	  - 'period': Set event sampling period
58 	  - 'freq': Set event sampling frequency
59 	  - 'time': Disable/enable time stamping. Acceptable values are 1 for
62 	  - 'call-graph': Disable/enable callgraph. Acceptable strings are "fp" for
65 	  - 'stack-size': user stack size for dwarf mode
66 	  - 'name' : User defined event name. Single quotes (') may be used to
69 	  - 'aux-output': Generate AUX records instead of events. This requires
71 	  - 'aux-action': "pause" or "resume" to pause or resume an AUX
73 			  "start-paused" on an AUX area event itself, will
75 	  - 'aux-sample-size': Set sample size for AUX area sampling. If the
76 	  '--aux-sample' option has been used, set aux-sample-size=0 to disable
79           See the linkperf:perf-list[1] man page for more parameters.
84 	  Also not defined in .../<pmu>/format/* are PMU driver specific
87 	  to the PMU driver.  For example:
89 	  perf record -e some_event/@cfg1,@cfg2=config/ ...
91 	  will see 'cfg1' and 'cfg2=config' pushed to the PMU driver associated
94 	  understood and supported by the PMU driver.
96         - a hardware breakpoint event in the form of '\mem:addr[/len][:access]'
101           If you want to profile read-write accesses in 0x1000, just set
106 	- a group of events surrounded by a pair of braces ("{event1,event2,...}").
108 	  prevent shell interpretation.  You also need to use --group on
111 --filter=<filter>::
112 	Event filter.  This option should follow an event selector (-e).
114 	the kernel.  If the event is a hardware trace PMU (e.g. Intel PT
119 	- tracepoint filters
121 	In the case of tracepoints, multiple '--filter' options are combined
124 	- address filters
126 	A hardware trace PMU advertises its ability to accept a number of
127 	address filters	by specifying a non-zero value in
128 	/sys/bus/event_source/devices/<pmu>/nr_addr_filters.
135 	- 'filter': defines a region that will be traced.
136 	- 'start': defines an address at which tracing will begin.
137 	- 'stop': defines an address at which tracing will stop.
138 	- 'tracestop': defines a region in which tracing will stop.
164 	To see the filter that is passed, use the -v option.
172 	- bpf filters
181 	  --filter 'period > 1000, cpu == 1'
183 	  --filter 'mem_op == load || mem_op == store, mem_lvl > l1'
191 	Also, the user should request to collect that information (with -d option in
194 	  $ sudo perf record -e cycles --filter 'mem_op == load'
196 	   Hint: please add -d option to perf record.
223 --exclude-perf::
225 	an event selector (-e) which selects tracepoint event(s). It adds a
227 	'--filter' exists, the new filter expression will be combined with
230 -a::
231 --all-cpus::
232         System-wide collection from all CPUs (default if no target is specified).
234 -p::
235 --pid=::
238 -t::
239 --tid=::
242         --inherit.
244 -u::
245 --uid=::
248 -r::
249 --realtime=::
252 --no-buffering::
255 -c::
256 --count=::
259 -o::
260 --output=::
263 -i::
264 --no-inherit::
267 -F::
268 --freq=::
272 	See --strict-freq.
274 --strict-freq::
277 -m::
278 --mmap-pages=::
280 	specification in bytes with appended unit character - B/K/M/G.
281 	The size is rounded up to the nearest power-of-two page value.
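The rounding described above can be sketched in shell. This is an illustration only: `round_up_mmap_pages` is a hypothetical helper, not part of perf, and the 4096-byte default page size is an assumption. It reflects one reading of the sentence above, so perf's own rounding may differ in detail.

```shell
# Hypothetical helper (not part of perf): estimate how many pages a byte
# size passed to -m would occupy after rounding up to a power of two.
# The default page size of 4096 bytes is an assumption.
round_up_mmap_pages() {
    bytes=$1
    page_size=${2:-4096}
    # Number of whole pages needed to hold the requested size.
    pages=$(( (bytes + page_size - 1) / page_size ))
    # Smallest power of two that is not below that page count.
    pow2=1
    while [ "$pow2" -lt "$pages" ]; do
        pow2=$((pow2 * 2))
    done
    echo "$pow2"
}
```

For example, `round_up_mmap_pages 5242880` (5M) gives 2048: 1280 pages of 4096 bytes, rounded up to the next power of two.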
286 -g::
287 	Enables call-graph (stack chain/backtrace) recording for both
290 --call-graph::
291 	Setup and enable call-graph (stack chain/backtrace) recording,
292 	implies -g.  Default is "fp" (for user space).
300 	Valid options are "fp" (frame pointer), "dwarf" (DWARF's CFI -
305 	-fomit-frame-pointer, using the "fp" method will produce bogus
317 	"--call-graph dwarf,4096".
322 	like "--call-graph fp,32".
324 -q::
325 --quiet::
328 -v::
329 --verbose::
332 -s::
333 --stat::
334 	Record per-thread event counts.  Use it with 'perf report -T' to see
337 -d::
338 --data::
341 --phys-data::
344 --data-page-size::
347 --code-page-size::
350 -T::
351 --timestamp::
352 	Record the sample timestamps. Use it with 'perf report -D' to see the
355 -P::
356 --period::
359 --sample-cpu::
362 --sample-identifier::
367 -n::
368 --no-samples::
371 -R::
372 --raw-samples::
375 -C::
376 --cpu::
378 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
379 In per-thread mode with inheritance mode on (default), samples are captured only when
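To make the CPU list syntax concrete, the sketch below expands a list into the individual CPUs it names. `expand_cpu_list` is a hypothetical helper written for illustration; it is not part of perf and only mirrors the comma and range syntax described above.

```shell
# Hypothetical helper (not part of perf): expand a -C style CPU list such
# as "0,2-4" into the individual CPU numbers it selects.
expand_cpu_list() {
    list=$1
    out=""
    old_ifs=$IFS
    IFS=,
    for item in $list; do
        case $item in
            *-*)
                # A range "first-last" selects every CPU in between.
                first=${item%-*}
                last=${item#*-}
                cpu=$first
                while [ "$cpu" -le "$last" ]; do
                    out="$out $cpu"
                    cpu=$((cpu + 1))
                done
                ;;
            *)
                out="$out $item"
                ;;
        esac
    done
    IFS=$old_ifs
    # Drop the leading space before printing.
    echo "${out# }"
}
```

`expand_cpu_list 0,2-4` prints `0 2 3 4`, which is the set of CPUs that `perf record -C 0,2-4` would be asked to monitor.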
385 -B::
386 --no-buildid::
393 pathname. You can also set the "record.build-id" config variable to
396 -N::
397 --no-buildid-cache::
400 is sufficient.  You can also set the "record.build-id" config variable to
401 'no-cache' to have the same effect.
403 -G name,...::
404 --cgroup name,...::
406 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
410 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
413 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
416 command line can be used: 'perf record -e cycles -G cgroup_name -a -e cycles'.
418 -b::
419 --branch-any::
421 This is a shortcut for --branch-filter any. See --branch-filter for more information.
423 -j::
424 --branch-filter::
431         - any:  any type of branches
432         - any_call: any function call or system call
433         - any_ret: any function return or system call return
434         - ind_call: any indirect call
435         - ind_jmp: any indirect jump
436         - call: direct calls, including far (to/from kernel) calls
437         - u:  only when the branch target is at the user level
438         - k: only when the branch target is in the kernel
439         - hv: only when the target is at the hypervisor level
440 	- in_tx: only when the target is in a hardware transaction
441 	- no_tx: only when the target is not in a hardware transaction
442 	- abort_tx: only when the target is a hardware transaction abort
443 	- cond: conditional branches
444 	- call_stack: save call stack
445 	- no_flags: don't save branch flags, e.g., prediction, misprediction, etc.
446 	- no_cycles: don't save branch cycles
447 	- hw_index: save branch hardware index
448 	- save_type: save branch type during sampling in case binary is not available later
449 		     For the platforms with Intel Arch LBR support (12th-Gen+ client or
450 		     4th-Gen Xeon+ server), the save branch type is unconditionally enabled
452 	- priv: save privilege state during sampling in case binary is not available later
453 	- counter: save occurrences of the event since the last branch entry. Currently, the
464 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
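When the filters are assembled in a script, the comma separated list can be built from separate words. The helper below is a hypothetical convenience, not part of perf; it does not check that each name is a valid filter.

```shell
# Hypothetical helper (not part of perf): join filter names into the comma
# separated form expected by --branch-filter. No validation is performed.
join_branch_filters() {
    old_ifs=$IFS
    IFS=,
    # "$*" joins the arguments using the first character of IFS.
    joined="$*"
    IFS=$old_ifs
    echo "$joined"
}
```

`join_branch_filters any_ret u k` prints `any_ret,u,k`, so a script could run `perf record -j "$(join_branch_filters any_ret u k)" -- <command>`.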
467 -W::
468 --weight::
473 --namespaces::
476 --all-cgroups::
479 --transaction::
482 --per-thread::
483 Use per-thread mmaps.  By default per-cpu mmaps are created.  This option
484 overrides that and uses per-thread mmaps.  A side-effect of that is that
485 inheritance is automatically disabled.  --per-thread is ignored with a warning
486 if combined with -a or -C options.
488 -D::
489 --delay=::
490 After starting the program, wait msecs before measuring (-1: start with events
492 -D 10-20,30-40 means wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable
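The arithmetic behind a range list can be spelled out with a short sketch. `describe_delay_ranges` is a hypothetical helper, not part of perf; it assumes the ranges are given in increasing order and in milliseconds, as in the example above.

```shell
# Hypothetical helper (not part of perf): turn a -D range list such as
# "10-20,30-40" into the wait/enable schedule it describes.
# Assumes ranges are in milliseconds and listed in increasing order.
describe_delay_ranges() {
    prev_end=0
    out=""
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        start=${range%-*}
        stop=${range#*-}
        # Wait from the end of the previous window, then enable until stop.
        out="$out, wait $((start - prev_end)) ms, enable $((stop - start)) ms"
        prev_end=$stop
    done
    IFS=$old_ifs
    echo "${out#, }"
}
```

`describe_delay_ranges 10-20,30-40` prints `wait 10 ms, enable 10 ms, wait 10 ms, enable 10 ms`, matching the description of `-D 10-20,30-40`.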
496 -I::
497 --intr-regs::
502 --intr-regs=\?. To name registers, pass a comma separated list such as
503 --intr-regs=ax,bx. The list of registers is architecture dependent.
505 --user-regs::
506 Similar to -I, but capture user registers at sample time. To list the available
507 user registers use --user-regs=\?.
509 --running-time::
512 -k::
513 --clockid::
519 -S::
520 --snapshot::
525   - 'e': take one last snapshot on exit; guarantees that there is at least one
527   - <size>: if the PMU supports this, specify the desired snapshot size.
532 --aux-sample[=OPTIONS]::
533 Select AUX area sampling. At least one of the events selected by the -e option
538 --proc-map-timeout::
539 When processing pre-existing threads /proc/XXX/maps, it may take a long time,
543 --switch-events::
545 PERF_RECORD_SWITCH_CPU_WIDE. In some cases (e.g. Intel PT, CoreSight or Arm SPE)
547 by the option --no-switch-events.
549 --vmlinux=PATH::
553 --buildid-all::
554 Record build-id of all DSOs regardless of whether they are actually hit or not.
556 --buildid-mmap::
557 Record build ids in mmap2 events, disables build id cache (implies --no-buildid).
559 --aio[=n]::
564 --affinity=mode::
567   - node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
568   - cpu  - thread affinity mask is set to cpu of the processed mmap buffer
570 --mmap-flush=number::
579 possibly compressed (-z) and written to the output, perf.data or pipe.
589 -z::
590 --compression-level[=n]::
591 Produce compressed trace using specified level n (default: 1 - fastest compression,
592 22 - smallest trace)
594 --all-kernel::
597 --all-user::
600 --kernel-callchains::
604 --user-callchains::
608 Don't use both --kernel-callchains and --user-callchains at the same time or no
611 --timestamp-filename::
614 --timestamp-boundary::
617 --switch-output[=mode]::
621   - "signal" - when receiving a SIGUSR2 (default value) or
622   - <size>   - when reaching the size threshold, size is expected to
623                be a number with appended unit character - B/K/M/G
624   - <time>   - when reaching the time threshold, time is expected to
625                be a number with appended unit character - s/m/h/d
628                on your configuration - the number and size of your ring
629                buffers (-m). It is generally more precise for higher sizes
636 Implies --timestamp-filename, --no-buildid and --no-buildid-cache.
640   --switch-output --no-no-buildid  --no-no-buildid-cache
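The three forms of the mode argument can be told apart by their suffix. The sketch below classifies a value using only the unit characters listed above; `switch_output_kind` is a hypothetical helper, not part of perf, and perf's own parser may accept or reject values differently.

```shell
# Hypothetical helper (not part of perf): classify a --switch-output mode
# as "signal", "size" or "time" from the unit characters documented above.
switch_output_kind() {
    mode=$1
    # No argument, or the literal word, selects the SIGUSR2 behaviour.
    if [ -z "$mode" ] || [ "$mode" = signal ]; then
        echo signal
        return
    fi
    number=${mode%?}          # everything except the last character
    unit=${mode#"$number"}    # the last character
    case $number in
        ""|*[!0-9]*) echo invalid; return ;;
    esac
    case $unit in
        B|K|M|G) echo size ;;
        s|m|h|d) echo time ;;
        *)       echo invalid ;;
    esac
}
```

With this sketch, `switch_output_kind 1G` prints `size` and `switch_output_kind 30s` prints `time`, corresponding to `--switch-output=1G` and `--switch-output=30s`.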
642 --switch-output-event::
643 Events that will cause the switch of the perf.data file, auto-selecting
644 --switch-output=signal, the results are similar as internally the side band
647 Uses the same syntax as --event, it will just not be recorded, serving only to
648 switch the perf.data file as soon as the --switch-output event is processed by
655 --switch-max-files=N::
657 When rotating perf.data with --switch-output, only keep N files.
659 --dry-run::
660 Parse options then exit. --dry-run can be used to detect errors in cmdline
663 'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj
666 --synth=TYPE::
669 task status for pre-existing threads.
672 choice in this option.  For example, --synth=no would have MMAP events for
677   - 'task'    - synthesize FORK and COMM events for each task
678   - 'mmap'    - synthesize MMAP events for each process (implies 'task')
679   - 'cgroup'  - synthesize CGROUP events for each cgroup
680   - 'all'     - synthesize all events (default)
681   - 'no'      - do not synthesize any of the above events
683 --tail-synthesize::
684 Instead of collecting non-sample events (for example, fork, comm, mmap) at
686 The collected non-sample events reflect the status of the system when
689 --overwrite::
695 When '--overwrite' and '--switch-output' are used perf records and drops
701 config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
703 Implies --tail-synthesize.
705 --kcore::
708 --max-size=<size>::
710 appended unit character - B/K/M/G
712 --num-thread-synthesize::
717 --pfm-events events::
718 Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
719 including support for event filters. For example '--pfm-events
722 events cannot be mixed together. The latter must be used with the -e
723 option. The -e option and this one can be mixed and matched.  Events
727 --control=fifo:ctl-fifo[,ack-fifo]::
728 --control=fd:ctl-fd[,ack-fd]::
729 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
730 Listen on ctl-fd descriptor for command to control measurement.
734   - 'enable'           : enable events
735   - 'disable'          : disable events
736   - 'enable name'      : enable event 'name'
737   - 'disable name'     : disable event 'name'
738   - 'snapshot'         : AUX area tracing snapshot
739   - 'stop'             : stop perf record
740   - 'ping'             : ping
741   - 'evlist [-v|-g|-F]' : display all events
743                          -F  Show just the sample frequency used for each event.
744                          -v  Show all fields.
745                          -g  Show event group information.
747 Measurements can be started with events disabled using --delay=-1 option. Optionally
748 send control command completion ('ack\n') to ack-fd descriptor to synchronize with the
757  test -p ${ctl_fifo} && unlink ${ctl_fifo}
762  test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
766  perf record -D -1 -e cpu-cycles -a               \
767              --control fd:${ctl_fd},${ctl_fd_ack} \
768              -- sleep 30 &
771  sleep 5  && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
772  sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
774  exec {ctl_fd_ack}>&-
777  exec {ctl_fd}>&-
780  wait -n ${perf_pid}
783 --threads=<spec>::
797     0,2-4/2-4:1,5-7/5-7
800 the first thread monitors CPUs 0 and 2-4 with the affinity mask 2-4,
801 the second monitors CPUs 1 and 5-7 with the affinity mask 5-7.
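The structure of such a spec, with threads separated by ':' and each thread written as <cpus>/<affinity mask>, can be made visible with a small sketch. `describe_thread_spec` is a hypothetical helper, not part of perf, and it does no validation of the CPU lists.

```shell
# Hypothetical helper (not part of perf): split a --threads spec of the form
# <cpus>/<affinity>:<cpus>/<affinity>... into one line per streaming thread.
describe_thread_spec() {
    n=1
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        # Text before the first '/' is the monitored CPU list, the rest is
        # the affinity mask for that thread.
        echo "thread $n: cpus=${entry%%/*} affinity=${entry#*/}"
        n=$((n + 1))
    done
    IFS=$old_ifs
}
```

For the spec above, `describe_thread_spec '0,2-4/2-4:1,5-7/5-7'` prints one line for each of the two threads, listing CPUs `0,2-4` with mask `2-4` and CPUs `1,5-7` with mask `5-7`.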
806     - cpu    - create new data streaming thread for every monitored cpu
807     - core   - create new thread to monitor CPUs grouped by a core
808     - package - create new thread to monitor CPUs grouped by a package
809     - numa   - create new thread to monitor CPUs grouped by a NUMA domain
812 order not to spawn multiple per-cpu streaming threads but still avoid LOST
815 filtered through the mask provided by -C option.
817 --debuginfod[=URLs]::
826 --off-cpu::
827 	Enable off-cpu profiling with BPF.  The BPF program will collect
829 	as sample data of a software event named "offcpu-time".  The
836 --setup-filter=<action>::
841 include::intel-hybrid.txt[]
844 --------
845 linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1]