
 - Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
 - Modified by Paul Jackson <[email protected]>
 - Modified by Christoph Lameter <[email protected]>
 - Modified by Paul Menage <[email protected]>
 - Modified by Hidetoshi Seto <[email protected]>
1.1 What are cpusets ?
----------------------

Cpusets provide a mechanism for assigning a set of CPUs and Memory
Nodes to a set of tasks.  In this document "Memory Node" refers to
an on-line node that contains memory.

Cpusets constrain the CPU and Memory placement of tasks to only
the resources within a task's current cpuset.  They form a nested
hierarchy visible in a virtual file system.  These are the essential
hooks, beyond what is already present, required to manage dynamic
job placement on large systems.

Cpusets use the generic cgroup subsystem described in
Documentation/admin-guide/cgroup-v1/cgroups.rst.
1.2 Why are cpusets needed ?
----------------------------

The management of large computer systems, with many processors (CPUs)
and multiple Memory Nodes having
non-uniform access times (NUMA), presents additional challenges for
the efficient scheduling and memory placement of processes.

Larger systems, which benefit more from careful processor and memory
placement to reduce memory access times and contention,
and which typically represent a larger investment for the customer,
can benefit from explicitly placing jobs on properly sized subsets of
the system.  This can be especially valuable on:

* Servers running different applications (for instance, a web server
  and a database), or
* NUMA systems running large HPC applications with demanding
  performance characteristics.

These subsets, or "soft partitions", must be able to be dynamically
adjusted as the job mix changes, without impacting other concurrently
executing jobs.  The location of the running jobs' pages may also be moved
when the memory locations are changed.

The kernel cpuset patch provides the minimum essential kernel
mechanisms required to efficiently implement such subsets.  It
leverages existing CPU and memory placement facilities in the Linux
kernel to avoid any additional impact on the critical scheduler or
memory allocator code.
1.3 How are cpusets implemented ?
---------------------------------

Cpusets extend the usual kernel mechanisms for specifying on which
CPUs a task may be scheduled (sched_setaffinity) and on which Memory
Nodes it may obtain memory (mbind, set_mempolicy), as follows:

 - Cpusets are sets of allowed CPUs and Memory Nodes, known to the
   kernel.
 - Each task in the system is attached to a cpuset, via a pointer
   in the task structure to a reference counted cgroup structure.
 - Calls to sched_setaffinity are filtered to just those CPUs
   allowed in that task's cpuset.
 - Calls to mbind and set_mempolicy are filtered to just
   those Memory Nodes allowed in that task's cpuset.
 - The root cpuset contains all the system's CPUs and Memory
   Nodes.
 - For any cpuset, one can define child cpusets containing a subset
   of the parent's CPU and Memory Node resources.
 - The hierarchy of cpusets can be mounted at /dev/cpuset, for
   browsing and manipulation from user space.
 - A cpuset may be marked exclusive, which ensures that no other
   cpuset (except direct ancestors and descendants) may contain
   any overlapping CPUs or Memory Nodes.
 - You can list all the tasks (by pid) attached to any cpuset.

The implementation of cpusets requires a few, simple hooks
into the rest of the kernel, none in performance critical paths:

 - in init/main.c, to initialize the root cpuset at system boot.
 - in fork and exit, to attach and detach a task from its cpuset.
 - in sched_setaffinity, to mask the requested CPUs by what's
   allowed in that task's cpuset.
 - in sched.c migrate_live_tasks(), to keep migrating tasks within
   the CPUs allowed by their cpuset, if possible.
 - in the mbind and set_mempolicy system calls, to mask the requested
   Memory Nodes by what's allowed in that task's cpuset.
 - in page_alloc.c, to restrict memory to allowed nodes.
 - in vmscan.c, to restrict page recovery to the current cpuset.
No new system calls are added for cpusets - all support for querying and
modifying cpusets is via this cpuset file system.
The /proc/<pid>/status file for each task has four added lines,
displaying the task's cpus_allowed (on which CPUs it may be scheduled)
and mems_allowed (on which Memory Nodes it may obtain memory),
in the two formats seen in the following example::

  Cpus_allowed:   ffffffff,ffffffff,ffffffff,ffffffff
  Cpus_allowed_list:      0-127
  Mems_allowed:   ffffffff,ffffffff
  Mems_allowed_list:      0-63
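The two "_list" lines use the same cpulist format as the cpuset control
files, so they are easy to pick out with standard tools.  A minimal
sketch, run here against sample text in /proc/<pid>/status format (the
sample values are hypothetical; on a live system you would read the
real file):

```shell
# Extract the cpuset-related list-format lines from text in
# /proc/<pid>/status format. The sample data below is hypothetical;
# on a live system, read /proc/<pid>/status directly.
status_sample='Cpus_allowed:   ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list:      0-127
Mems_allowed:   ffffffff,ffffffff
Mems_allowed_list:      0-63'

printf '%s\n' "$status_sample" | grep -E '^(Cpus|Mems)_allowed_list:'
```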
Each cpuset is represented by a directory in the cgroup file system
containing (on top of the standard cgroup files) the following files
describing that cpuset:

 - cpuset.cpus: list of CPUs in that cpuset
 - cpuset.mems: list of Memory Nodes in that cpuset
 - cpuset.memory_migrate flag: if set, move pages to cpuset's nodes
 - cpuset.cpu_exclusive flag: is cpu placement exclusive?
 - cpuset.mem_exclusive flag: is memory placement exclusive?
 - cpuset.mem_hardwall flag: is memory allocation hardwalled
 - cpuset.memory_pressure: measure of how much paging pressure in cpuset
 - cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes
 - cpuset.memory_spread_slab flag: OBSOLETE. Doesn't have any function.
 - cpuset.sched_load_balance flag: if set, load balance within CPUs on that cpuset
 - cpuset.sched_relax_domain_level: the searching range when migrating tasks

In addition, only the root cpuset has the following file:

 - cpuset.memory_pressure_enabled flag: compute memory_pressure?
New cpusets are created using the mkdir system call or shell command.
The named hierarchical structure of nested cpusets allows partitioning
a large system into nested, dynamically changeable, "soft-partitions".

The attachment of each task, automatically inherited at fork by any
children of that task, to a cpuset allows organizing the work load on
a system into related sets of tasks.  A task
may be re-attached to any other cpuset, if allowed by the permissions
on the necessary cpuset file system directories.
The following rules apply to each cpuset:

 - Its CPUs and Memory Nodes must be a subset of its parent's.
 - It can't be marked exclusive unless its parent is.
 - If its cpu or memory is exclusive, they may not overlap any sibling.
The use of a Linux virtual file system (vfs)
to represent the cpuset hierarchy provides for a familiar permission
and name space for cpusets, with a minimum of additional kernel code.
The cpus and mems files in the root (top_cpuset) cpuset are
read-only.  The cpus file automatically tracks the value of
cpu_online_mask using a CPU hotplug notifier, and the mems file
automatically tracks the value of node_states[N_MEMORY]--i.e.,
nodes with memory--using the cpuset_track_online_nodes() hook.
The cpuset.effective_cpus and cpuset.effective_mems files are
normally read-only copies of the cpuset.cpus and cpuset.mems files
respectively.
See Documentation/admin-guide/cgroup-v2.rst for more information about
cpuset v2 behavior.
1.4 What are exclusive cpusets ?
--------------------------------

If a cpuset is cpu or mem exclusive, no other cpuset, other than
a direct ancestor or descendant, may share any of the same CPUs or
Memory Nodes.

A cpuset that is cpuset.mem_exclusive *or* cpuset.mem_hardwall is "hardwalled",
i.e. it restricts kernel allocations for page, buffer and other data
commonly shared by the kernel across multiple users.  All cpusets,
whether hardwalled or not, restrict allocations of memory for user
space.  This enables configuring a system so that several independent
jobs can share common kernel data, such as file system pages, while
isolating each job's user allocation in its own cpuset.  To do this,
construct a large mem_exclusive cpuset to hold all the jobs, and
construct child, non-mem_exclusive cpusets for each individual job.
Only a small amount of kernel memory, such as requests from interrupt
handlers, is allowed to be taken outside even a mem_exclusive cpuset.
1.5 What is memory_pressure ?
-----------------------------
The memory_pressure of a cpuset provides a simple per-cpuset metric
of the rate that the tasks in a cpuset are attempting to free up
in-use memory on the nodes of the cpuset to satisfy additional memory
requests.

This enables batch managers monitoring jobs running in dedicated
cpusets to efficiently detect what level of memory pressure that job
is causing.

This is useful both on tightly managed systems running a wide mix of
submitted jobs, which may choose to terminate or re-prioritize jobs that
are trying to use more memory than allowed on the nodes assigned to them,
and with tightly coupled, long running, massively parallel scientific
computing jobs that will dramatically fail to meet required performance
goals if they start to use more memory than allowed to them.

This mechanism provides a very economical way for the batch manager
to monitor a cpuset for signs of memory pressure.  It's up to the
batch manager or other user code to decide what to do about it and
take action.

Unless this feature is enabled by writing "1" to the special file
/dev/cpuset/memory_pressure_enabled, the hook in the rebalance
code of __alloc_pages() for this metric reduces to simply noticing
that the cpuset_memory_pressure_enabled flag is zero.  So only
systems that enable this feature will compute the metric.

Why a per-cpuset, running average:

    Because this meter is per-cpuset, rather than per-task or mm,
    the system load imposed by a batch scheduler monitoring this
    metric is sharply reduced on large systems, because a scan of
    the tasklist can be avoided on each set of queries.

    Because this meter is a running average, instead of an accumulating
    counter, a batch scheduler can detect memory pressure with a
    single read, instead of having to read and accumulate results
    for a period of time.

    Because this meter is per-cpuset rather than per-task or mm,
    the batch scheduler can obtain the key information, memory
    pressure in a cpuset, with a single read, rather than having to
    query and accumulate results over all the (dynamically changing)
    set of tasks in the cpuset.

A per-cpuset simple digital filter (requires a spinlock and 3 words
of data per-cpuset) is kept, and updated by any task attached to that
cpuset, if it enters the synchronous (direct) page reclaim code.

A per-cpuset file provides an integer number representing the recent
(half-life of 10 seconds) rate of direct page reclaims caused by
the tasks in the cpuset.
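The filter's behavior can be pictured with a toy model: a counter that
is bumped on every reclaim event and halved once per elapsed half-life
period.  This is only an illustrative sketch; the function name and
the simplified integer math are assumptions, not the kernel's actual
fmeter code:

```shell
# Toy model of a half-life decaying rate meter. The counter is bumped
# by the number of events observed and halved once per elapsed
# half-life. Simplified sketch, not the kernel's fmeter implementation.
HALF_LIFE=10   # seconds

val=0
last=0

fmeter_update() {   # $1 = current time in seconds, $2 = events since last call
    local now=$1 events=$2
    local elapsed=$(( now - last ))
    while [ "$elapsed" -ge "$HALF_LIFE" ]; do
        val=$(( val / 2 ))
        elapsed=$(( elapsed - HALF_LIFE ))
    done
    val=$(( val + events ))
    last=$now
}

fmeter_update 0 1000    # 1000 reclaim events at t=0
fmeter_update 10 0      # one half-life later, no new events
echo "$val"             # the old rate has decayed to half
```

A single read of the resulting value is enough to judge recent
pressure, which is exactly the property the running average is chosen
for.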
1.6 What is memory spread ?
---------------------------
There are two boolean flag files per cpuset that control where the
kernel allocates pages for the file system buffers and related in-
kernel data structures.  They are called 'cpuset.memory_spread_page'
and 'cpuset.memory_spread_slab'.

If the per-cpuset boolean flag file 'cpuset.memory_spread_page' is set, then
the kernel will spread the file system buffers (page cache) evenly
over all the nodes that the faulting task is allowed to use, instead
of preferring to put those pages on the node where the task is running.

If the per-cpuset boolean flag file 'cpuset.memory_spread_slab' is set,
then the kernel will spread some file system related slab caches,
such as those for inodes and dentries, evenly over all the nodes that the
faulting task is allowed to use, instead of preferring to put those
pages on the node where the task is running.

Setting memory spreading causes allocations for the affected page
or slab caches to ignore the task's NUMA mempolicy and be spread
instead.

The memory spreading setting of a cpuset is modified by writing a
"0" or "1" to the 'cpuset.memory_spread_page' or 'cpuset.memory_spread_slab' file
for that cpuset.  If a "1" is written to that file, then that turns
on memory spreading for tasks in that cpuset.

Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
PFA_SPREAD_PAGE for each task that is in that cpuset or subsequently
joins that cpuset.  The page allocation calls for the page cache
are modified to perform an inline check for this PFA_SPREAD_PAGE task
flag, and if set, a call to the routine cpuset_mem_spread_node()
returns the node to prefer for the allocation.
The cpuset_mem_spread_node() routine is simple.  It uses the
value of a per-task rotor cpuset_mem_spread_rotor to select the next
node in the current task's mems_allowed to prefer for the allocation.
This memory placement policy is also known (in other contexts) as
round-robin or interleave.
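A toy model of that rotor, with a hypothetical node list standing in
for the task's mems_allowed, may make the round-robin behavior
concrete (an illustrative sketch only, not kernel code):

```shell
# Toy model of the per-task spread rotor: each allocation prefers the
# next node, round-robin, from the task's allowed nodes.
# The node list and the helper name here are hypothetical.
nodes="0 1 2 3"   # stand-in for the task's mems_allowed
rotor=0           # stand-in for cpuset_mem_spread_rotor

next_spread_node() {
    set -- $nodes             # expand allowed nodes into $1..$n
    shift $(( rotor % $# ))   # step to the rotor's current position
    echo "$1"
    rotor=$(( rotor + 1 ))
}

for _ in 1 2 3 4 5; do
    next_spread_node          # successive allocations cycle the nodes
done
```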
This policy can provide substantial improvements for jobs that need
to place thread local data on the corresponding node, but that need
to access large file system data sets that must be spread across
the several nodes in the job's cpuset in order to fit.  Without this
policy, especially for jobs that might have one thread reading in the
data set, the memory allocation across the nodes in the job's cpuset
can become very uneven.
1.7 What is sched_load_balance ?
--------------------------------

The kernel scheduler automatically load balances tasks.  If one CPU
is underutilized, kernel code running on that
CPU will look for tasks on other more overloaded CPUs and move those
tasks to itself, within the constraints of such placement mechanisms
as cpusets and sched_setaffinity.

This default load balancing across all CPUs is not well suited for
very large systems, where balancing across many CPUs is expensive,
nor for systems running realtime or otherwise specialized workloads.
If the system is managed using cpusets to place independent jobs
on separate sets of CPUs, full load balancing is unnecessary.

When the per-cpuset flag "cpuset.sched_load_balance" is enabled (the default
setting), it requests that all the CPUs in that cpuset's allowed
'cpuset.cpus' be contained in a single sched domain, so that the
scheduler can load balance a task from any of those CPUs to any other.

When the per-cpuset flag "cpuset.sched_load_balance" is disabled, then the
scheduler will avoid load balancing across the CPUs in that cpuset,
--except-- in so far as is necessary because some overlapping cpuset
has "cpuset.sched_load_balance" enabled.

So, for example, if the top cpuset has the flag "cpuset.sched_load_balance"
enabled, then the scheduler will have one sched domain covering all
CPUs, and the setting of that flag in any other cpuset won't matter.
In that case, make sure you don't leave tasks in
the top cpuset that might use non-trivial amounts of CPU, as such tasks
may be artificially constrained to some subset of CPUs.

It is necessary for sched domains to be flat because load balancing
across partially overlapping sets of CPUs would risk unstable dynamics.

This mismatch is why there is not a simple one-to-one relation
between which cpusets have "cpuset.sched_load_balance" enabled and the
resulting sched domain configuration.  As a general rule,
don't leave tasks that might use non-trivial amounts of CPU in
partially load balanced cpusets, as they may be artificially
constrained to some subset of the CPUs allowed to them, for lack of
load balancing to the other CPUs.
1.7.1 sched_load_balance implementation details.
------------------------------------------------

The per-cpuset flag 'cpuset.sched_load_balance' defaults to enabled (contrary
to most cpuset flags.)  When enabled for a cpuset, the kernel will
ensure that it can load balance across all the CPUs in that cpuset
(makes sure that all the CPUs in the cpus_allowed of that cpuset are
in the same sched domain.)

The cpuset code builds such sched
domains as it can while still providing load balancing for any set
of CPUs allowed to a cpuset having 'cpuset.sched_load_balance' enabled.

The internal kernel cpuset to scheduler interface passes the scheduler
a partition of the load balanced CPUs in the system.  This partition
is recomputed whenever:

 - the 'cpuset.sched_load_balance' flag of a cpuset with non-empty CPUs changes,
 - or CPUs come or go from a cpuset with this flag enabled,
 - or 'cpuset.sched_relax_domain_level' value of a cpuset with non-empty CPUs
   and with this flag enabled changes,
 - or a cpuset with non-empty CPUs and with this flag enabled is removed,
 - or a cpu is offlined/onlined.

This partition defines which sched domains the scheduler should
setup - one sched domain for each element (struct cpumask) in the
partition.  The scheduler remembers the currently active sched
domain partitions and incrementally updates them,
removing the old and adding the new, for each change.
1.8 What is sched_relax_domain_level ?
--------------------------------------

In sched domains, the scheduler migrates tasks in two ways: periodic
load balance on tick, and at the time of certain schedule events.

When a task is woken up, the scheduler tries to move it to an idle CPU.
For example, if a task A running on CPU X activates another task B
on the same CPU X, and if CPU Y is X's sibling and is idle, then the
scheduler migrates task B to CPU Y so that task B can start on CPU Y
without waiting for task A on CPU X.

Searching for movable tasks and/or idle CPUs has a cost, so the
scheduler might not search all CPUs in the domain on every such event.
On some architectures the searching range on these events is limited
to the same socket or node where the CPU is located, while the load
balance on tick searches all CPUs.

For example, assume CPU Z is relatively far from CPU X.  Even if CPU Z
is idle while CPU X and its siblings are busy, the scheduler can't
migrate woken task B from X to Z since it is out of its searching
range.  As a result, task B on CPU X needs to wait for task A, or for
the load balance
on the next tick.  For some applications in special situations, waiting
1 tick may be too long.

The 'cpuset.sched_relax_domain_level' file allows you to request changing
this searching range.  The file takes an int value indicating the
size of the searching range in levels, approximately as follows,
otherwise the initial value -1 that indicates the cpuset has no request::

  -1   no request. use system default or follow request of others.
   0   no search.
   1   search siblings (hyperthreads in a core).
   2   search cores in a package.
   3   search cpus in a node [= system wide on non-NUMA system]

Not all levels can be present, and values can change depending on the
system architecture; check
/sys/kernel/debug/sched/domains/cpu*/domain*/ for system-specific
details.

This file is per-cpuset and affects the sched domain to which the
cpuset belongs.  Therefore if the flag 'cpuset.sched_load_balance' of
a cpuset is disabled, then 'cpuset.sched_relax_domain_level' has no
effect, since there is no sched domain belonging to that cpuset.

If multiple cpusets are overlapping and hence form a single sched
domain, the largest value among those is used.  Be careful: if one
requests 0 and others are -1 then 0 is used.
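That aggregation rule (treat -1 as "no request" and take the largest
remaining request) can be sketched as a small helper; the function
name is hypothetical and for illustration only:

```shell
# Sketch of how overlapping cpusets' relax_domain_level requests
# combine: -1 means "no request"; the largest value among the
# requests wins. Hypothetical helper, not kernel code.
aggregate_relax_level() {
    local result=-1 v
    for v in "$@"; do
        if [ "$v" -gt "$result" ]; then
            result=$v
        fi
    done
    echo "$result"
}

aggregate_relax_level -1 -1 0    # one cpuset requests 0, so 0 is used
```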
Note that modifying this file will have both good and bad effects,
and whether it is acceptable or not depends on your situation.
Don't modify this file if you are not sure.

If your situation is:

 - The migration costs between each cpu can be assumed considerably
   small (for you) due to your special application's behavior or
   special hardware support for CPU cache etc.
 - The searching cost doesn't have an impact (for you), or you can
   make the searching cost small enough by managing the cpuset to be
   compact etc.
 - Low latency is required even if it sacrifices cache hit rate etc.

then increasing 'sched_relax_domain_level' would benefit you.
1.9 How do I use cpusets ?
--------------------------

In order to minimize the impact of cpusets on critical kernel code,
such as the scheduler, the kernel does not permanently pin a task's
memory placement.

If a cpuset has its Memory Nodes modified, then for each task attached
to that cpuset, the next time that the kernel attempts to allocate
a page of memory for that task, the kernel will notice the change
in the task's cpuset, and update its per-task memory placement to
remain within the new cpuset's memory placement.  A memory policy set
by mbind or set_mempolicy is likewise effectively
updated by the kernel, on the next allocation of a page for that task,
to track the new cpuset.

If the 'cpuset.memory_migrate' flag is set, pages already allocated
are migrated so as to preserve their relative placement.
For example if the page was on the second valid node of the prior cpuset
then the page will be placed on the second valid node of the new cpuset.

If hotplug functionality is used to remove all the CPUs currently
assigned to a cpuset, then all the tasks in that cpuset will be moved
to the nearest ancestor
with non-empty cpus.  But the moving of some (or all) tasks might fail if
the cpuset is bound with another cgroup subsystem which has some
restrictions on task attachment.  If hotplug
functionality for removing Memory Nodes is available, a similar exception
is expected to apply there as well.

There is one further exception for memory nodes: if a GFP_ATOMIC
allocation, which must be satisfied immediately, cannot be satisfied in
the current task's cpuset, then we relax the cpuset, and look for
memory anywhere we can find it.
To start a new job that is to be contained within a cpuset, the steps are:

 1) mkdir /sys/fs/cgroup/cpuset
 2) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
 3) Create the new cpuset by doing mkdir's and write's (or echo's) in
    the /sys/fs/cgroup/cpuset virtual file system.
 4) Start a task that will be the "founding father" of the new job.
 5) Attach that task to the new cpuset by writing its pid to the
    /sys/fs/cgroup/cpuset tasks file for that cpuset.
 6) fork, exec or clone the job tasks from this founding father task.

For example, the following sequence of commands will setup a cpuset
named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
and then start a subshell 'sh' in that cpuset::

  mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
  cd /sys/fs/cgroup/cpuset
  mkdir Charlie
  cd Charlie
  /bin/echo 2-3 > cpuset.cpus
  /bin/echo 1 > cpuset.mems
  /bin/echo $$ > tasks
  sh

There are ways to query or modify cpusets:

 - via the cpuset file system directly, using the various cd, mkdir, echo,
   cat, rmdir commands from the shell, or their equivalent from C.
 - via the C library libcpuset.
 - via the C library libcgroup.
 - via the python application cset.
2.1 Basic Usage
---------------

Creating, modifying and using cpusets can be done through the cpuset
virtual filesystem.

To mount it, type::

  # mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset

Then under /sys/fs/cgroup/cpuset you can find a tree that corresponds to the
tree of the cpusets in the system.  For instance, /sys/fs/cgroup/cpuset
is the cpuset that holds the whole system.

To create a new cpuset, make a directory under /sys/fs/cgroup/cpuset,
cd into it, and assign resources to it.

Add some cpus::

  # /bin/echo 0-7 > cpuset.cpus

Add some mems::

  # /bin/echo 0-7 > cpuset.mems

Note that for legacy reasons, the "cpuset" filesystem exists as a
wrapper around the cgroup filesystem.  The command::

  mount -t cpuset X /sys/fs/cgroup/cpuset

is equivalent to::

  mount -t cgroup -ocpuset,noprefix X /sys/fs/cgroup/cpuset
2.2 Adding/removing cpus
------------------------

This is the syntax to use when writing in the cpus or mems files
in cpuset directories::

  # /bin/echo 1-4 > cpuset.cpus     -> set cpus list to cpus 1,2,3,4
  # /bin/echo 1,2,3,4 > cpuset.cpus -> set cpus list to cpus 1,2,3,4

To add a CPU to a cpuset, write the new list of CPUs including the
CPU to be added.  To add 6 to the above cpuset::

  # /bin/echo 1-4,6 > cpuset.cpus   -> set cpus list to cpus 1,2,3,4,6

Similarly, to remove a CPU from a cpuset, write the new list of CPUs
without the CPU to be removed.  To remove all the CPUs::

  # /bin/echo "" > cpuset.cpus      -> clear cpus list
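When scripting around these files, it can help to expand a cpulist
string back into individual CPU numbers.  A small sketch (the helper
name is an assumption; the list syntax is the one shown above):

```shell
# Expand a kernel cpulist string such as "1-4,6" into individual CPU
# numbers, one per line. Hypothetical helper for scripting around
# cpuset.cpus / cpuset.mems files.
expand_cpulist() {
    local part
    local IFS=','
    for part in $1; do               # split on commas
        case "$part" in
            *-*) seq "${part%-*}" "${part#*-}" ;;   # expand a range
            *)   printf '%s\n' "$part" ;;           # single CPU
        esac
    done
}

expand_cpulist "1-4,6"
```

The same helper works for cpuset.mems, since Memory Node lists use the
identical format.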
2.3 Setting flags
-----------------

The syntax is very simple::

  # /bin/echo 1 > cpuset.cpu_exclusive -> set flag 'cpuset.cpu_exclusive'
  # /bin/echo 0 > cpuset.cpu_exclusive -> unset flag 'cpuset.cpu_exclusive'

2.4 Attaching processes
-----------------------