Lines Matching +full:quality +full:- +full:of +full:- +full:service
1 .. SPDX-License-Identifier: GPL-2.0
19 Modern processors are generally able to enter states in which the execution of
21 memory or executed. Those states are the *idle* states of the processor.
23 Since part of the processor hardware is not used in idle states, entering them
27 CPU idle time management is an energy-efficiency feature concerned with using
28 the idle states of processors for this purpose.
31 ------------
34 is the part of the kernel responsible for the distribution of computational
37 software as individual single-core processors. In other words, a CPU is an
42 First, if the whole processor can only follow one sequence of instructions (one
46 Second, if the processor is multi-core, each core in it is able to follow at
47 least one program at a time. The cores need not be entirely independent of each
48 other (for example, they may share caches), but still most of the time they
49 work physically in parallel with each other, so if each of them executes only
50 one program, those programs run mostly independently of each other at the same
54 that the core belongs to (in fact, it may apply to an entire hierarchy of larger
55 units containing the core). Namely, if all of the cores in the larger unit
61 Finally, each core in a multi-core processor may be able to follow more than one
65 the cores present themselves to software as "bundles" each consisting of
66 multiple individual single-core "processors", referred to as *hardware threads*
67 (or hyper-threads specifically on Intel hardware), that each can follow one
68 sequence of instructions. Then, the hardware threads are CPUs from the CPU idle
70 by one of them, the hardware thread (or CPU) that asked for it is stopped, but
71 nothing more happens, unless all of the other hardware threads within the same
78 ---------
84 Tasks are the CPU scheduler's representation of work. Each task consists of a
85 sequence of instructions to execute, or code, data to be manipulated while
94 assigns it to one of the available CPUs to run and if there are no more runnable
103 in Linux idle CPUs run the code of the "idle" task called *the idle loop*. That
104 code may cause the processor to be put into one of its idle states, if they are
107 next wakeup event, or there are strict latency constraints preventing any of the
112 .. _idle-loop:
117 The idle loop code takes two major steps in every iteration of it. First, it
124 The role of the governor is to find an idle state most suitable for the
126 asked to enter by logical CPUs are represented in an abstract way independent of
127 the platform or the processor architecture and organized in a one-dimensional
130 time. This allows ``CPUIdle`` governors to be independent of the underlying
134 taken into account by the governor, the *target residency* and the (worst-case)
137 substantial), in order to save more energy than it would save by entering one of
138 the shallower idle states instead. [The "depth" of an idle state roughly
147 There are two types of information that can influence the governor's decisions.
148 First of all, the governor knows the time until the closest timer event. That
152 and exit it. However, the CPU may be woken up by a non-timer event at any time
162 There are four ``CPUIdle`` governors available: ``menu``, `TEO <teo-gov_>`_,
163 ``ladder`` and ``haltpoll``. Which of them is used by default depends on the
164 configuration of the kernel and in particular on whether or not the scheduler
165 tick can be `stopped by the idle loop <idle-cpus-and-tick_>`_. Available
167 can be changed at runtime. The name of the ``CPUIdle`` governor currently
175 majority of Intel platforms, ``intel_idle`` and ``acpi_idle``, one with
179 decision on which one of them to use has to be made early (on Intel platforms
181 reason or if it does not recognize the processor). The name of the ``CPUIdle``
186 .. _idle-cpus-and-tick:
192 the time sharing strategy of the CPU scheduler. Of course, if there are
196 given a slice of the CPU time to run its code, subject to the scheduling class,
198 switched over to running (the code of) another task. The currently running task
200 is there to make the switch happen regardless. That is not the only role of the
205 configuration, the length of the tick period is between 1 ms and 10 ms).
208 the tick period length. Moreover, in that case the idle duration of any CPU
215 of the CPU time on them is the idle loop. Since the time of an idle CPU need
223 (non-tick) timer due to trigger within the tick range, stopping the tick clearly
224 would be a waste of time, even though the timer hardware may not need to be
225 reprogrammed in that case. Second, if the governor is expecting a non-timer
230 state then, as that would contradict its own expectation of a wakeup in short
232 waste of time and in this case the timer hardware would need to be reprogrammed,
234 does not occur any time soon, the hardware may spend an indefinite amount of time
235 in the shallow idle state selected by the governor, which will be a waste of
236 energy. Hence, if the governor is expecting a wakeup of any kind within the
243 stopped already (in one of the previous iterations of the loop), it is better
247 loop altogether. That can be done through the build-time configuration of it
249 ``nohz=off`` to it in the command line. In both cases, as the stopping of the
255 generally regarded as more energy-efficient than the systems running kernels in
261 .. _menu-gov:
267 It is quite complex, but the basic principle of its design is straightforward.
275 and variance of them. If the variance is small (smaller than 400 square
278 interval" value. Otherwise, the longest of the saved observed idle duration
280 Again, if the variance of them is small (in the above sense), the average is
303 falls into to obtain an approximation of the predicted idle duration that is
304 compared to the "typical interval" determined previously and the minimum of
312 Now, the governor is ready to walk the list of idle states and choose one of
313 them. For this purpose, it compares the target residency of each state with
314 the predicted idle duration and the exit latency of it with the latency
315 limit coming from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
321 if it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_. That
323 the tick has not been stopped already (in a previous iteration of the idle
330 .. _teo-gov:
337 <menu-gov_>`_: it always tries to find the deepest idle state suitable for the
340 .. kernel-doc:: drivers/cpuidle/governors/teo.c
341 :doc: teo-description
343 .. _idle-states-representation:
345 Representation of Idle States
348 For the CPU idle time management purposes all of the physical idle states
349 supported by the processor have to be represented as a one-dimensional array of
351 the processor hardware to enter an idle state of certain properties. If there
352 is a hierarchy of units in the processor, one |struct cpuidle_state| object can
353 cover a combination of idle states supported by the units at different levels of
355 of it <idle-loop_>`_, must reflect the properties of the idle state at the
356 deepest level (i.e. the idle state of the unit containing all of the other
362 enter a specific idle state of its own (say "MX") if the other core is in idle
367 Then, the target residency of the |struct cpuidle_state| object representing
368 idle state "X" must reflect the minimum time to spend in idle state "MX" of
371 that state. Analogously, the exit latency parameter of that object must cover
372 the exit time of idle state "MX" of the module (and usually its entry time too),
378 There are processors without direct coordination between different levels of the
379 hierarchy of units inside them, however. In those cases asking for an idle
382 handling of the hierarchy. Then, the definition of the idle state objects is
383 entirely up to the driver, but still the physical properties of the idle state
386 latency of that idle state must not exceed the exit latency parameter of the
395 statistics of the given idle state. That information is exposed by the kernel
400 CPU at the initialization time. That directory contains a set of subdirectories
401 called :file:`state0`, :file:`state1` and so on, up to the number of idle state
402 objects defined for the given CPU minus one. Each of these directories
404 deeper the (effective) idle state represented by it. Each of them contains
405 a number of files (attributes) representing the properties of the idle state
409 Total number of times this idle state had been asked for, but the
414 Total number of times this idle state had been asked for, but certainly
419 Description of the idle state.
425 The default status of this state, "enabled" or "disabled".
428 Exit latency of the idle state in microseconds.
431 Name of the idle state.
438 Target residency of the idle state in microseconds.
445 Total number of times the hardware has been asked by the given CPU to
449 Total number of times a request to enter this idle state on the given
462 asked for by the other CPUs, so it must be disabled for all of them in order to
463 never be asked for by any of them. [Note that, due to the way the ``ladder``
468 this particular CPU, but it still may be disabled for some or all of the other
476 objects representing combinations of idle states at different levels of the
477 hierarchy of units in the processor, and it generally is hard to obtain idle
485 this idle state and entered a shallower one instead of it (or it did not even
487 asking the hardware to enter an idle state and the subsequent wakeup of the CPU
489 Moreover, if the idle state object in question represents a combination of idle
490 states at different levels of the hierarchy of units in the processor,
499 and :file:`rejected` files report the number of times the given idle state
502 .. _cpu-pm-qos:
504 Power Management Quality of Service for CPUs
507 The power management quality of service (PM QoS) framework in the Linux kernel
509 energy-efficiency features of the kernel to prevent performance from dropping
514 individual CPUs. Kernel code (e.g. device drivers) can set both of them with
515 the help of special internal interfaces provided by the PM QoS framework. User
518 signed 32-bit integer) to it. In turn, the resume latency constraint for a CPU
520 32-bit integer) to the :file:`power/pm_qos_resume_latency_us` file under
529 framework maintains a list of requests that have been made so far for the
535 PM QoS request to be created and added to a global priority list of CPU latency
540 used to determine the new effective value of the entire list of requests and
543 affected by it, which is the case if it is the minimum of the requested values
553 with that file descriptor to be removed from the global priority list of CPU
561 this single PM QoS request to be updated regardless of which user space
564 to avoid confusion. [Arguably, the only legitimate use of this mechanism in
569 CPU in question every time the list of requests is updated this way or another
572 CPU idle time governors are expected to regard the minimum of the global
574 the given CPU as the upper limit for the exit latency of the idle states that
583 `disabled for individual CPUs <idle-states-representation_>`_, there are kernel
592 That default mechanism usually is the least common denominator for all of the
594 however, so it is rather crude and not very energy-efficient. For this reason,
599 the name of an available governor (e.g. ``cpuidle.governor=menu``) and that
600 governor will be used instead of the default one. It is possible to force
610 and ``idle=nomwait``. The first two of them disable the ``acpi_idle`` and
614 which of the two parameters is added to the kernel command line. In the
616 instruction of the CPUs (which, as a rule, suspends the execution of the program
619 more or less "lightweight" sequence of instructions in a tight loop. [Note
621 CPUs from saving almost any energy at all may not be the only effect of it.
623 P-states (see |cpufreq|) that require any number of CPUs in a package to be
624 idle, so it very well may hurt single-thread computations performance as well as
625 energy-efficiency. Thus using it for performance reasons may not be a good idea
628 The ``idle=nomwait`` option prevents the use of the ``MWAIT`` instruction of
630 driver will use the ``HLT`` instruction instead of ``MWAIT``. On systems
632 and forces the use of the ``acpi_idle`` driver instead. Note that in either
636 In addition to the architecture-level kernel command line options affecting CPU
640 where ``<n>`` is an idle state index also used in the name of the given
642 `Representation of Idle States <idle-states-representation_>`_), causes the
643 ``intel_idle`` and ``acpi_idle`` drivers, respectively, to discard all of the
645 for any of those idle states or expose them to the governor. [The behavior of
650 Also, the ``acpi_idle`` driver is part of the ``processor`` kernel module that