Lines Matching +full:sample +full:- +full:time
1 perf-record(1)
5 ----
6 perf-record - Run a command and record its profile into perf.data
9 --------
11 'perf record' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf record' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
15 -----------
17 from it, into perf.data - without displaying anything.
23 -------
27 -e::
28 --event=::
31 - a symbolic event name (use 'perf list' to list all events)
33 - a raw PMU event in the form of rN where N is a hexadecimal value
38 - a symbolic or raw PMU event followed by an optional colon
39 and a list of event modifiers, e.g., cpu-cycles:p. See the
40 linkperf:perf-list[1] man page for details on event modifiers.
42 - a symbolically formed PMU event like 'pmu/param1=0x3,param2/' where
46 - a symbolically formed event like 'pmu/config=M,config1=N,config3=K/'
57 - 'period': Set event sampling period
58 - 'freq': Set event sampling frequency
59 - 'time': Disable/enable time stamping. Acceptable values are 1 for
60 enabling time stamping, 0 for disabling time stamping.
62 - 'call-graph': Disable/enable callgraph. Acceptable strings are "fp" for
65 - 'stack-size': user stack size for dwarf mode
66 - 'name' : User defined event name. Single quotes (') may be used to
69 - 'aux-output': Generate AUX records instead of events. This requires
71 - 'aux-action': "pause" or "resume" to pause or resume an AUX
73 "start-paused" on an AUX area event itself, will
75 - 'aux-sample-size': Set sample size for AUX area sampling. If the
76 '--aux-sample' option has been used, set aux-sample-size=0 to disable
79 See the linkperf:perf-list[1] man page for more parameters.
89 perf record -e some_event/@cfg1,@cfg2=config/ ...
96 - a hardware breakpoint event in the form of '\mem:addr[/len][:access]'
101 If you want to profile read-write accesses in 0x1000, just set
106 - a group of events surrounded by a pair of braces ("{event1,event2,...}").
108 prevent the shell interpretation. You also need to use --group on
111 --filter=<filter>::
112 Event filter. This option should follow an event selector (-e).
119 - tracepoint filters
121 In the case of tracepoints, multiple '--filter' options are combined
124 - address filters
127 address filters by specifying a non-zero value in
135 - 'filter': defines a region that will be traced.
136 - 'start': defines an address at which tracing will begin.
137 - 'stop': defines an address at which tracing will stop.
138 - 'tracestop': defines a region in which tracing will stop.
164 To see the filter that is passed, use the -v option.
172 - bpf filters
174 A BPF filter can access the sample data and make a decision based on the
175 data. Users need to set an appropriate sample type to use the BPF
178 The sample data field can be specified in lowercase letters. Multiple
181 --filter 'period > 1000, cpu == 1'
183 --filter 'mem_op == load || mem_op == store, mem_lvl > l1'
191 Also, the user should request collection of that information (with the -d option in
194 $ sudo perf record -e cycles --filter 'mem_op == load'
196 Hint: please add -d option to perf record.
204 ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
223 --exclude-perf::
225 an event selector (-e) which selects tracepoint event(s). It adds a
227 '--filter' exists, the new filter expression will be combined with
230 -a::
231 --all-cpus::
232 System-wide collection from all CPUs (default if no target is specified).
234 -p::
235 --pid=::
238 -t::
239 --tid=::
242 --inherit.
244 -u::
245 --uid=::
248 -r::
249 --realtime=::
252 --no-buffering::
255 -c::
256 --count=::
257 Event period to sample.
259 -o::
260 --output=::
263 -i::
264 --no-inherit::
267 -F::
268 --freq=::
272 See --strict-freq.
274 --strict-freq::
277 -m::
278 --mmap-pages=::
280 specification in bytes with appended unit character - B/K/M/G.
281 The size is rounded up to the nearest power-of-two page value.
286 -g::
287 Enables call-graph (stack chain/backtrace) recording for both
290 --call-graph::
291 Setup and enable call-graph (stack chain/backtrace) recording,
292 implies -g. Default is "fp" (for user space).
300 Valid options are "fp" (frame pointer), "dwarf" (DWARF's CFI -
305 --fomit-frame-pointer, using the "fp" method will produce bogus
312 doesn't work with branch stack sampling at the same time.
317 "--call-graph dwarf,4096".
322 like "--call-graph fp,32".
324 -q::
325 --quiet::
328 -v::
329 --verbose::
332 -s::
333 --stat::
334 Record per-thread event counts. Use it with 'perf report -T' to see
337 -d::
338 --data::
339 Record the sample virtual addresses.
341 --phys-data::
342 Record the sample physical addresses.
344 --data-page-size::
347 --code-page-size::
350 -T::
351 --timestamp::
352 Record the sample timestamps. Use it with 'perf report -D' to see the
355 -P::
356 --period::
357 Record the sample period.
359 --sample-cpu::
360 Record the sample cpu.
362 --sample-identifier::
363 Record the sample identifier, i.e., the PERF_SAMPLE_IDENTIFIER bit set in
367 -n::
368 --no-samples::
369 Don't sample.
371 -R::
372 --raw-samples::
373 Collect raw sample records from all opened counters (default for tracepoint counters).
375 -C::
376 --cpu::
378 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
379 In per-thread mode with inheritance mode on (default), samples are captured only when
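
The CPU list syntax described for -C (comma-separated entries, with ranges written as lo-hi) can be illustrated with a small POSIX shell sketch. The helper name `expand_cpu_list` is hypothetical and is not part of perf; it only expands a list such as 0,2-4 into individual CPU numbers and makes no claim about how perf itself parses the option.

```shell
# Hypothetical helper (not part of perf): expand a -C/--cpu style list.
# The body runs in a subshell, so the IFS and globbing changes stay local.
expand_cpu_list() (
    set -f                      # no pathname expansion while splitting
    IFS=','                     # split the argument on commas
    out=""
    for part in $1; do
        case $part in
            "" | *[!0-9-]* | -* | *- | *-*-*)
                echo "invalid CPU list element: $part" >&2
                exit 1 ;;
            *-*)                # a range such as 2-4
                cpu=${part%-*}
                hi=${part#*-}
                while [ "$cpu" -le "$hi" ]; do
                    out="$out $cpu"
                    cpu=$((cpu + 1))
                done ;;
            *)                  # a single CPU number
                out="$out $part" ;;
        esac
    done
    printf '%s\n' "${out# }"
)
```

For example, `expand_cpu_list 0,2-4` prints the four CPUs that `-C 0,2-4` would name.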
385 -B::
386 --no-buildid::
389 the recording process to take a long time, as it needs to process all
393 pathname. You can also set the "record.build-id" config variable to
396 -N::
397 --no-buildid-cache::
400 is sufficient. You can also set the "record.build-id" config variable to
401 'no-cache' to have the same effect.
403 -G name,...::
404 --cgroup name,...::
406 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
410 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
413 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
416 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
418 -b::
419 --branch-any::
421 This is a shortcut for --branch-filter any. See --branch-filter for more information.
423 -j::
424 --branch-filter::
425 Enable taken branch stack sampling. Each sample captures a series of consecutive
426 taken branches. The number of branches captured with each sample depends on the
431 - any: any type of branches
432 - any_call: any function call or system call
433 - any_ret: any function return or system call return
434 - ind_call: any indirect call
435 - ind_jmp: any indirect jump
436 - call: direct calls, including far (to/from kernel) calls
437 - u: only when the branch target is at the user level
438 - k: only when the branch target is in the kernel
439 - hv: only when the target is at the hypervisor level
440 - in_tx: only when the target is in a hardware transaction
441 - no_tx: only when the target is not in a hardware transaction
442 - abort_tx: only when the target is a hardware transaction abort
443 - cond: conditional branches
444 - call_stack: save call stack
445 - no_flags: don't save branch flags, e.g., prediction, misprediction, etc.
446 - no_cycles: don't save branch cycles
447 - hw_index: save branch hardware index
448 - save_type: save branch type during sampling in case binary is not available later
449 For the platforms with Intel Arch LBR support (12th-Gen+ client or
450 4th-Gen Xeon+ server), the save branch type is unconditionally enabled
452 - priv: save privilege state during sampling in case binary is not available later
453 - counter: save occurrences of the event since the last branch entry. Currently, the
464 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
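
As a rough illustration of the comma-separated form, the following POSIX shell sketch checks a --branch-filter argument against the filter names listed above before it is handed to perf. The function name `check_branch_filter` is hypothetical, and the name list is copied from this excerpt only, so it may not match the set accepted by a particular perf version.

```shell
# Hypothetical pre-check (not part of perf) for a --branch-filter list.
# The known names are taken from the list in this document.
check_branch_filter() (
    set -f                      # no pathname expansion while splitting
    IFS=','                     # split the argument on commas
    known=" any any_call any_ret ind_call ind_jmp call u k hv in_tx no_tx abort_tx cond call_stack no_flags no_cycles hw_index save_type priv counter "
    for name in $1; do
        case $known in
            *" $name "*) ;;     # name is in the known list
            *)
                echo "unknown branch filter: $name" >&2
                exit 1 ;;
        esac
    done
)
```

A call such as `check_branch_filter any_ret,u,k` returns success, while a misspelled name returns a non-zero status and reports the offending entry on stderr.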
467 -W::
468 --weight::
469 Enable weighted sampling. An additional weight is recorded per sample and can be
473 --namespaces::
476 --all-cgroups::
479 --transaction::
482 --per-thread::
483 Use per-thread mmaps. By default per-cpu mmaps are created. This option
484 overrides that and uses per-thread mmaps. A side-effect of that is that
485 inheritance is automatically disabled. --per-thread is ignored with a warning
486 if combined with -a or -C options.
488 -D::
489 --delay=::
490 After starting the program, wait msecs before measuring (-1: start with events
492 -D 10-20,30-40 means wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable
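
The range form of -D can be read as a timeline of absolute start-stop offsets in msecs. The sketch below, a hypothetical POSIX shell helper named `delay_schedule` that is not part of perf, converts such a list into the wait/enable sequence described above. It assumes the ranges are given in ascending order and do not overlap.

```shell
# Hypothetical helper (not part of perf): turn "-D" ranges into a schedule.
# Assumes ascending, non-overlapping start-stop pairs in msecs.
delay_schedule() (
    set -f                      # no pathname expansion while splitting
    IFS=','                     # split the argument on commas
    now=0                       # end of the previous enabled window
    for range in $1; do
        case $range in
            *[!0-9-]* | -* | *- | *-*-*) ok=no ;;
            *-*) ok=yes ;;
            *) ok=no ;;
        esac
        if [ "$ok" != yes ]; then
            echo "invalid range: $range" >&2
            exit 1
        fi
        start=${range%-*}
        stop=${range#*-}
        echo "wait $((start - now)) msecs"
        echo "enable for $((stop - start)) msecs"
        now=$stop
    done
)
```

Running `delay_schedule 10-20,30-40` prints four lines: wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable for 10 msecs.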
496 -I::
497 --intr-regs::
499 each sample. List of captured registers depends on the architecture. This option
500 is off by default. It is possible to select the registers to sample using their
502 --intr-regs=\?. To name registers, pass a comma separated list such as
503 --intr-regs=ax,bx. The list of registers is architecture dependent.
505 --user-regs::
506 Similar to -I, but capture user registers at sample time. To list the available
507 user registers use --user-regs=\?.
509 --running-time::
510 Record running and enabled time for read events (:S)
512 -k::
513 --clockid::
514 Sets the clock id to use for the various time fields in the perf_event_type
519 -S::
520 --snapshot::
525 - 'e': take one last snapshot on exit; guarantees that there is at least one
527 - <size>: if the PMU supports this, specify the desired snapshot size.
532 --aux-sample[=OPTIONS]::
533 Select AUX area sampling. At least one of the events selected by the -e option
535 data from the AUX area. Optionally sample size may be specified, otherwise it
538 --proc-map-timeout::
539 When processing /proc/XXX/maps for pre-existing threads, it may take a long time,
540 because the file may be huge. A time out is needed in such cases.
541 This option sets the time out limit. The default value is 500 ms.
543 --switch-events::
547 by the option --no-switch-events.
549 --vmlinux=PATH::
553 --buildid-all::
554 Record build-id of all DSOs regardless of whether they are actually hit or not.
556 --buildid-mmap::
557 Record build ids in mmap2 events, disables build id cache (implies --no-buildid).
559 --aio[=n]::
564 --affinity=mode::
567 - node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
568 - cpu - thread affinity mask is set to cpu of the processed mmap buffer
570 --mmap-flush=number::
577 The default option value is 1 byte which means that every time that the output
579 possibly compressed (-z) and written to the output, perf.data or pipe.
586 can take less time than executing more output write syscalls with smaller data
589 -z::
590 --compression-level[=n]::
591 Produce compressed trace using specified level n (default: 1 - fastest compression,
592 22 - smallest trace)
594 --all-kernel::
597 --all-user::
600 --kernel-callchains::
604 --user-callchains::
608 Don't use both --kernel-callchains and --user-callchains at the same time or no
611 --timestamp-filename::
614 --timestamp-boundary::
615 Record timestamp boundary (time of first/last samples).
617 --switch-output[=mode]::
621 - "signal" - when receiving a SIGUSR2 (default value) or
622 - <size> - when reaching the size threshold, size is expected to
623 be a number with appended unit character - B/K/M/G
624 - <time> - when reaching the time threshold, time is expected to
625 be a number with appended unit character - s/m/h/d
628 on your configuration - the number and size of your ring
629 buffers (-m). It is generally more precise for higher sizes
636 Implies --timestamp-filename, --no-buildid and --no-buildid-cache.
640 --switch-output --no-no-buildid --no-no-buildid-cache
642 --switch-output-event::
643 Events that will cause the switch of the perf.data file, auto-selecting
644 --switch-output=signal, the results are similar as internally the side band
647 Uses the same syntax as --event, it will just not be recorded, serving only to
648 switch the perf.data file as soon as the --switch-output event is processed by
655 --switch-max-files=N::
657 When rotating perf.data with --switch-output, only keep N files.
659 --dry-run::
660 Parse options then exit. --dry-run can be used to detect errors in cmdline
663 'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj
666 --synth=TYPE::
669 task status for pre-existing threads.
672 choice in this option. For example, --synth=no would have MMAP events for
677 - 'task' - synthesize FORK and COMM events for each task
678 - 'mmap' - synthesize MMAP events for each process (implies 'task')
679 - 'cgroup' - synthesize CGROUP events for each cgroup
680 - 'all' - synthesize all events (default)
681 - 'no' - do not synthesize any of the above events
683 --tail-synthesize::
684 Instead of collecting non-sample events (for example, fork, comm, mmap) at
686 The collected non-sample events reflect the status of the system when
689 --overwrite::
695 When '--overwrite' and '--switch-output' are used perf records and drops
701 config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
703 Implies --tail-synthesize.
705 --kcore::
708 --max-size=<size>::
709 Limit the sample data max size, <size> is expected to be a number with
710 appended unit character - B/K/M/G
712 --num-thread-synthesize::
717 --pfm-events events::
719 including support for event filters. For example '--pfm-events
722 events cannot be mixed together. The latter must be used with the -e
723 option. The -e option and this one can be mixed and matched. Events
727 --control=fifo:ctl-fifo[,ack-fifo]::
728 --control=fd:ctl-fd[,ack-fd]::
729 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
730 Listen on ctl-fd descriptor for command to control measurement.
734 - 'enable' : enable events
735 - 'disable' : disable events
736 - 'enable name' : enable event 'name'
737 - 'disable name' : disable event 'name'
738 - 'snapshot' : AUX area tracing snapshot
739 - 'stop' : stop perf record
740 - 'ping' : ping
741 - 'evlist [-v|-g|-F]' : display all events
743 -F Show just the sample frequency used for each event.
744 -v Show all fields.
745 -g Show event group information.
747 Measurements can be started with events disabled using --delay=-1 option. Optionally
748 send control command completion ('ack\n') to ack-fd descriptor to synchronize with the
757 test -p ${ctl_fifo} && unlink ${ctl_fifo}
762 test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
766 perf record -D -1 -e cpu-cycles -a \
767 --control fd:${ctl_fd},${ctl_fd_ack} \
768 -- sleep 30 &
771 sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
772 sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
774 exec {ctl_fd_ack}>&-
777 exec {ctl_fd}>&-
780 wait -n ${perf_pid}
783 --threads=<spec>::
797 0,2-4/2-4:1,5-7/5-7
800 the first thread monitors CPUs 0 and 2-4 with the affinity mask 2-4,
801 the second monitors CPUs 1 and 5-7 with the affinity mask 5-7.
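
To make the mask layout concrete, here is a minimal POSIX shell sketch that splits a --threads mask spec into its per-thread parts: entries are separated by ':' and each entry has the form cpus/affinity. The function name `describe_thread_spec` is hypothetical and is not part of perf; it only restates the structure explained above.

```shell
# Hypothetical helper (not part of perf): describe a --threads mask spec.
# Each ':'-separated entry is expected to look like <cpus>/<affinity>.
describe_thread_spec() (
    set -f                      # no pathname expansion while splitting
    IFS=':'                     # one entry per streaming thread
    n=0
    for entry in $1; do
        n=$((n + 1))
        case $entry in
            */*/* | /* | */) valid=no ;;
            */*) valid=yes ;;
            *) valid=no ;;
        esac
        if [ "$valid" != yes ]; then
            echo "invalid thread entry: $entry" >&2
            exit 1
        fi
        # Text before the slash is the CPU list, after it the affinity mask.
        echo "thread $n: cpus=${entry%/*} affinity=${entry#*/}"
    done
)
```

With the spec from the example, `describe_thread_spec 0,2-4/2-4:1,5-7/5-7` reports CPUs 0,2-4 with affinity 2-4 for the first thread and CPUs 1,5-7 with affinity 5-7 for the second.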
806 - cpu - create new data streaming thread for every monitored cpu
807 - core - create new thread to monitor CPUs grouped by a core
808 - package - create new thread to monitor CPUs grouped by a package
809 - numa - create new thread to monitor CPUs grouped by a NUMA domain
812 order not to spawn multiple per-cpu streaming threads but still avoid LOST
815 filtered through the mask provided by -C option.
817 --debuginfod[=URLs]::
826 --off-cpu::
827 Enable off-cpu profiling with BPF. The BPF program will collect
829 as sample data of a software event named "offcpu-time". The
830 sample period will have the time the task slept in nanoseconds.
836 --setup-filter=<action>::
841 include::intel-hybrid.txt[]
844 --------
845 linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1]