Currently THP only works for anonymous memory mappings and
tmpfs/shmem. But in the future it can expand to other filesystems.

The first factor has the downside of requiring a larger clear-page
copy-page in page faults, which is a potentially negative effect. The
second, longer lasting and much more important factor will affect all
subsequent accesses to the memory for the whole runtime of the
application. With virtualization and nested pagetables, the TLB can map
entries of larger size only if both KVM and the Linux guest are using
hugepages, but a significant speedup already happens if only one of
the two is using hugepages, just because the TLB miss is going to run
faster.

Modern kernels support "multi-size THP" (mTHP), which introduces the
ability to allocate memory in blocks that are bigger than a base page
but smaller than traditional PMD-size (as described above), in
increments of a power-of-2 number of pages. mTHP can back anonymous
memory (for example 16K, 32K, 64K, etc). These THPs continue to be
PTE-mapped, but in many cases can still provide similar benefits to
those outlined above: page faults are significantly reduced (by a
factor of e.g. 4, 8, 16, etc), but latency spikes are much less
prominent because the size of each page isn't as huge as the PMD-sized
variant and there is less memory to clear in each page fault. Some
architectures also employ TLB compression mechanisms to squeeze more
entries in when a set of PTEs are virtually and physically contiguous
and approximately aligned.

THP can be enabled system wide or restricted to certain tasks or even
memory ranges inside a task's address space. Unless THP is completely
disabled, there is a ``khugepaged`` daemon that scans memory and
collapses sequences of basic pages into PMD-sized huge pages.

Transparent Hugepage Support maximizes the usefulness of free memory
if compared to the reservation approach of hugetlbfs by allowing all
unused memory to be used as cache or other movable (or even unmovable)
entities. It doesn't require reservation to prevent hugepage
allocation failures from being noticeable from userland. It allows
paging and all other advanced VM features to be available on the
hugepages. It requires no modifications for applications to take
advantage of it.

Applications however can be further optimized to take advantage of
this feature, like for example they've been optimized before to avoid
a flood of mmap system calls for every malloc(4k).

This is why it's
possible to disable hugepages system-wide and to only have them inside
MADV_HUGEPAGE madvise regions.

Embedded systems should enable hugepages only inside madvise regions
to eliminate any risk of wasting any precious byte of memory and to
only run faster.

Applications that get a lot of benefit from hugepages and that don't
risk losing memory by using hugepages, should use
madvise(MADV_HUGEPAGE) on their critical mmapped regions.

Global THP controls
-------------------

Transparent Hugepage Support for anonymous memory can be entirely
disabled (mostly for debugging purposes) or only enabled inside
MADV_HUGEPAGE regions (to avoid the risk of consuming more memory
resources) or enabled system wide. This can be achieved
per-supported-THP-size with one of::

    echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
    echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
    echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled

where <size> is the hugepage size being addressed, the available sizes
for which vary by system. For example::

    echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled

Alternatively it is possible to specify that a given hugepage size
will inherit the top-level "enabled" value::

    echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled

For example::

    echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled

The top-level setting (for use with "inherit") can be set by issuing
one of the following commands::
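
    echo always >/sys/kernel/mm/transparent_hugepage/enabled
    echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
    echo never >/sys/kernel/mm/transparent_hugepage/enabled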

By default, PMD-sized hugepages have enabled="inherit" and all other
hugepage sizes have enabled="never".
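
Reading any of these files shows the available policies with the
currently selected one in brackets. For example (the value shown here
is only illustrative; the default depends on the kernel
configuration)::

    $ cat /sys/kernel/mm/transparent_hugepage/enabled
    always madvise [never]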

It's also possible to limit defrag efforts in the VM to generate
anonymous hugepages in case they're not immediately free to madvise
regions or to never try to defrag memory and simply fallback to regular
pages unless hugepages are immediately available. Clearly if we spend CPU
time to defrag memory, we would expect to gain even more by the fact we
use hugepages later instead of regular pages. The defrag behaviour can
be selected with one of::
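
    echo always >/sys/kernel/mm/transparent_hugepage/defrag
    echo defer >/sys/kernel/mm/transparent_hugepage/defrag
    echo defer+madvise >/sys/kernel/mm/transparent_hugepage/defrag
    echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
    echo never >/sys/kernel/mm/transparent_hugepage/defrag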

always
    means that an application requesting THP will stall on allocation
    failure and directly reclaim pages and compact memory in an effort
    to allocate a THP immediately. This may be desirable for virtual
    machines that benefit heavily from THP use and are willing to delay
    the VM start to utilise them.

defer
    means that an application will wake kswapd in the background
    to reclaim pages and wake kcompactd to compact memory so that
    THP is available in the near future. It's the responsibility
    of khugepaged to then install the THP pages later.

defer+madvise
    will enter direct reclaim and compaction like ``always``, but only
    for regions that have used madvise(MADV_HUGEPAGE); all
    other regions will wake kswapd in the background to reclaim
    pages and wake kcompactd to compact memory so that THP is
    available in the near future.

madvise
    will enter direct reclaim like ``always`` but only for regions that
    have used madvise(MADV_HUGEPAGE). This is the default behaviour.

never
    should be self-explanatory.

By default, the kernel tries to use a huge, PMD-mappable zero page on
read page faults to anonymous mappings. It's possible to disable the
huge zero page by writing 0 or enable it back by writing 1::
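
    echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
    echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page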

Some userspace (such as a test program, or an optimized memory
allocation library) may want to know the size (in bytes) of a
PMD-mappable transparent hugepage::
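
    cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size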

All THPs at fault and collapse time will be added to _deferred_list,
and will therefore be split under memory pressure if they are considered
"underused". A THP is underused if the number of zero-filled pages in
the THP is above max_ptes_none (see below). It is possible to disable
this behaviour by writing 0 to shrink_underused, and enable it by writing
1 to it::
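
    echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
    echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused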

khugepaged will be automatically started when PMD-sized THP is enabled
(either the per-size anon control or the top-level control is set
to "always" or "madvise"), and it'll be automatically shut down when
PMD-sized THP is disabled (when both the per-size anon control and the
top-level control are "never").

Khugepaged controls
-------------------

khugepaged currently only searches for opportunities to collapse to
PMD-sized THP and no attempt is made to collapse to other THP
sizes.

khugepaged runs usually at low frequency so while one may not want to
invoke defrag algorithms synchronously during the page faults, it
should be worth invoking defrag at least in khugepaged. However it's
also possible to disable defrag in khugepaged by writing 0 or enable
defrag in khugepaged by writing 1::
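
    echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
    echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag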

You can also control how many pages khugepaged should scan at each
pass::
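
    /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan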

and how many milliseconds to wait in khugepaged between each pass (you
can set this to 0 to run khugepaged at 100% utilization of one core)::
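
    /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs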

and how many milliseconds to wait in khugepaged if there's a hugepage
allocation failure to throttle the next allocation attempt::
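
    /sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs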

The khugepaged progress can be seen in the number of pages collapsed
(note that this counter may not be an exact count of the number of
pages collapsed, since "collapsed" could mean multiple things: (1) a
PTE mapping being replaced by a PMD mapping, or (2) all 4K physical
pages replaced by one 2M hugepage. Each may happen independently, or
together, depending on the type of memory and the failures that occur;
as such, this value should be interpreted roughly as a sign of
progress)::
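
    /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed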

``max_ptes_none`` specifies how many extra small pages (that are
not already mapped) can be allocated when collapsing a group
of small pages into one large page::
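
    /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none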

A higher value leads to programs using additional memory, while a
lower value leads to less THP performance gain. The value of
max_ptes_none has a negligible effect on CPU time, so it can usually
be ignored.

``max_ptes_swap`` specifies how many pages can be brought in from
swap when collapsing a group of pages::
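
    /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap

A higher value can cause excessive swap IO and waste memory. A lower
value can prevent THPs from being collapsed, resulting in fewer pages
being collapsed into THPs, and lower memory access performance.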

``max_ptes_shared`` specifies how many pages can be shared across multiple
processes. khugepaged might treat pages of THPs as shared if any page of
the THP is shared. Exceeding the number would block the collapse::
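
    /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared

A higher value may increase memory footprint for some workloads.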

You can change the sysfs boot time default for the top-level "enabled"
control by passing the parameter ``transparent_hugepage=always`` or
``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
kernel command line.

Alternatively, each supported anonymous THP size can be controlled by
passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
``never`` or ``inherit``.

For example, the following will set 16K, 32K, 64K THP to ``always``,
set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
to ``never``::

    thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never

``thp_anon=`` may be specified multiple times to configure all THP sizes as
required. If ``thp_anon=`` is specified at least once, any anon THP sizes
not explicitly configured on the command line are implicitly set to
``never``.

``transparent_hugepage`` setting only affects the global toggle. If
``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``.
However, if a valid ``transparent_hugepage`` setting is provided but PMD_ORDER
is not defined within a valid ``thp_anon``, its policy will default to
``never``.

Similarly to ``transparent_hugepage``, you can control the hugepage
allocation policy for the internal shmem mount by using the kernel
parameter ``transparent_hugepage_shmem=<policy>``, where ``<policy>`` is one of the
valid policies for shmem (``always``, ``within_size``, ``advise``,
``never``, ``deny``, and ``force``).

Similarly to ``transparent_hugepage_shmem``, you can control the default
hugepage allocation policy for the tmpfs mount by using the kernel
parameter ``transparent_hugepage_tmpfs=<policy>``, where ``<policy>`` is one of the
four valid policies for tmpfs (``always``, ``within_size``, ``advise``,
``never``). The tmpfs mount default policy is ``never``.

``thp_shmem=`` may be specified multiple times to configure all THP sizes
as required. If ``thp_shmem=`` is specified at least once, any shmem THP
sizes not explicitly configured on the command line are implicitly set to
``never``.

``transparent_hugepage_shmem`` setting only affects the global toggle. If
``thp_shmem`` is not specified, PMD_ORDER hugepage will default to
``inherit``. However, if a valid ``transparent_hugepage_shmem`` setting is
provided, but the PMD_ORDER hugepage policy is not defined within a valid
``thp_shmem``, its policy will default to ``never``.

Hugepages in tmpfs/shmem
========================

Traditionally, tmpfs only supported a single huge page size ("PMD").
Today, it also supports smaller sizes just like anonymous memory, often
referred to as "multi-size THP" (mTHP). Huge pages of any size are
commonly represented in the kernel as "large folios".

While there is fine control over the huge page sizes to use for the
internal shmem mount (see below), ordinary tmpfs mounts will make use of
all available huge page sizes without any control over the exact sizes,
behaving more like other file systems.

tmpfs mounts
------------

The tmpfs mount option ``huge=`` accepts, among other values:

always
    Attempt to allocate huge pages every time we need a new page;

``mount -o remount,huge= /mountpoint`` works fine after mount: remounting
``huge=never`` will not attempt to break up huge pages at all, just stop more
from being allocated.
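
As an illustrative sketch (the mount point and size are arbitrary), a
tmpfs instance can be mounted with an explicit huge page policy::

    mount -t tmpfs -o size=2G,huge=within_size tmpfs /mnt/mytmpfs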

In addition to policies listed above, the sysfs knob
/sys/kernel/mm/transparent_hugepage/shmem_enabled will affect the
allocation policy of tmpfs mounts, when set to the following values:

deny
    For use in emergencies, to force the huge option off from
    all mounts;
force
    Force the huge option on for all - very useful for testing;

shmem / internal tmpfs
----------------------

To control the THP allocation policy for this internal tmpfs mount, the
sysfs knob /sys/kernel/mm/transparent_hugepage/shmem_enabled and the knob
per THP size in
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled'
can be used.

The global knob has the same semantics as the ``huge=`` mount options
for tmpfs mounts, except that the different huge page sizes can be
controlled individually, and will only use the setting of the global
knob when the per-size knob is set to 'inherit'.

The per-size knob accepts, among other values:

always
    Attempt to allocate <size> huge pages every time we need a new page;

inherit
    Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
    have enabled="inherit" and all other hugepage sizes have enabled="never";
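
For example, assuming the system supports 64K THP, 64K huge pages can be
restricted to allocations that fit within i_size with::

    echo within_size > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled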

Need of application restart
===========================

The transparent_hugepage/enabled and
transparent_hugepage/hugepages-<size>kB/enabled values and tmpfs mount
option only affect future behavior. So to make them effective you need
to restart any application that could have been using hugepages. This
also applies to the regions registered in khugepaged.

Monitoring usage
================

The number of PMD-sized anonymous transparent huge pages currently used by the
system is available by reading the AnonHugePages field in ``/proc/meminfo``.
To identify what applications are using PMD-sized anonymous transparent huge
pages, it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages
fields for each mapping. (Note that AnonHugePages only applies to traditional
PMD-sized THP for historical reasons and should have been called
AnonHugePmdMapped.)

The number of file transparent huge pages mapped to userspace is available
by reading ShmemPmdMapped and ShmemHugePages fields in ``/proc/meminfo``.
To identify what applications are mapping file transparent huge pages, it
is necessary to read ``/proc/PID/smaps`` and count the FilePmdMapped fields
for each mapping.
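
As a minimal sketch (``<PID>`` is a placeholder for the process being
inspected), the per-mapping fields can be summed with standard tools::

    grep AnonHugePages /proc/<PID>/smaps | awk '{sum += $2} END {print sum " kB"}'

The same one-liner works for the FilePmdMapped fields by changing the
pattern passed to grep.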

There are a number of counters in ``/proc/vmstat`` that may be used to
monitor how successfully the system is providing huge pages for use.

thp_fault_alloc
    is incremented every time a huge page is successfully
    allocated and charged to handle a page fault.

thp_collapse_alloc
    is incremented by khugepaged when it has found
    a range of pages to collapse into one huge page and has
    successfully allocated a new huge page to store the data.

thp_fault_fallback
    is incremented if a page fault fails to allocate or charge
    a huge page and instead falls back to using small pages.

thp_fault_fallback_charge
    is incremented if a page fault fails to charge a huge page and
    instead falls back to using small pages even though the
    allocation was successful.

thp_collapse_alloc_failed
    is incremented if khugepaged found a range
    of pages that should be collapsed into one huge page but failed
    the allocation.

thp_file_fallback
    is incremented if a shmem huge page is attempted to be allocated
    but fails and instead falls back to using small pages. (Note that
    despite being named after "file", the counter measures only shmem.)

thp_file_fallback_charge
    is incremented if a shmem huge page cannot be charged and instead
    falls back to using small pages even though the allocation was
    successful. (Note that despite being named after "file", the counter
    measures only shmem.)

thp_split_page_failed
    is incremented if kernel fails to split huge
    page. This can happen if the page was pinned by somebody.

thp_deferred_split_page
    is incremented when a huge page is put onto split
    queue. This happens when a huge page is partially unmapped and
    splitting it would free up some memory. Pages on split queue are
    going to be split under memory pressure.

thp_zero_page_alloc_failed
    is incremented if kernel fails to allocate
    huge zero page and falls back to using small pages.

thp_swpout
    is incremented every time a huge page is swapped out in one
    piece without splitting.

thp_swpout_fallback
    is incremented if a huge page has to be split before swapout,
    usually because the kernel failed to allocate some contiguous swap
    space for the huge page.
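
All of these ``thp_*`` counters can be dumped at once; for example::

    grep thp_ /proc/vmstat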

In ``/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats``, there are
also individual counters for each huge page size, which can be utilized to
monitor the system's effectiveness in providing huge pages for usage. Each
counter has its own corresponding file.

anon_fault_alloc
    is incremented every time a huge page is successfully
    allocated and charged to handle a page fault.

anon_fault_fallback
    is incremented if a page fault fails to allocate or charge
    a huge page and instead falls back to using huge pages with
    lower orders or small pages.

anon_fault_fallback_charge
    is incremented if a page fault fails to charge a huge page and
    instead falls back to using huge pages with lower orders or
    small pages even though the allocation was successful.

zswpout
    is incremented every time a huge page is swapped out to zswap in one
    piece without splitting.

swpin
    is incremented every time a huge page is swapped in from a non-zswap
    swap device in one piece.

swpin_fallback
    is incremented if swapin fails to allocate or charge a huge page
    and instead falls back to using huge pages with lower orders or
    small pages.

swpin_fallback_charge
    is incremented if swapin fails to charge a huge page and instead
    falls back to using huge pages with lower orders or small pages
    even though the allocation was successful.

swpout
    is incremented every time a huge page is swapped out to a non-zswap
    swap device in one piece without splitting.

swpout_fallback
    is incremented if a huge page has to be split before swapout,
    usually because the kernel failed to allocate some contiguous swap
    space for the huge page.

shmem_fallback
    is incremented if a shmem huge page is attempted to be allocated
    but fails and instead falls back to using small pages.

shmem_fallback_charge
    is incremented if a shmem huge page cannot be charged and instead
    falls back to using small pages even though the allocation was
    successful.

split_failed
    is incremented if kernel fails to split huge
    page. This can happen if the page was pinned by somebody.

deferred_split_page
    is incremented when a huge page is put onto split queue.
    This happens when a huge page is partially unmapped and splitting
    it would free up some memory. Pages on split queue are going to
    be split under memory pressure, if splitting is possible.
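
As an illustrative example (assuming the system supports 64K THP), an
individual per-size counter can be read directly::

    cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/anon_fault_alloc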

As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in ``/proc/vmstat`` to help
monitor this overhead.

compact_stall
    is incremented every time a process stalls to run
    memory compaction so that a huge page is free for use.

compact_fail
    is incremented if the system tries to compact memory
    but failed.
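
These compaction counters (together with ``compact_success``, which is
incremented when compaction frees a huge page) can be inspected in one
go; for example::

    grep -E 'compact_(stall|fail|success)' /proc/vmstat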

It is possible to establish how long the stalls were using the function
tracer to record how long was spent in __alloc_pages() and
using the mm_page_alloc tracepoint to identify which allocations were
for huge pages.
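
A minimal sketch of such a measurement, assuming tracefs is mounted at
``/sys/kernel/tracing``, could look like::

    echo __alloc_pages > /sys/kernel/tracing/set_graph_function
    echo function_graph > /sys/kernel/tracing/current_tracer
    echo 1 > /sys/kernel/tracing/events/kmem/mm_page_alloc/enable
    cat /sys/kernel/tracing/trace_pipe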

Optimizing the applications
===========================

To be guaranteed that the kernel will map a THP immediately in any
memory region, the mmap region has to be hugepage naturally
aligned. posix_memalign() could provide this guarantee.

Hugetlbfs
=========

You can use hugetlbfs on a kernel that has transparent hugepage support
enabled just fine as always. No difference can be noted in hugetlbfs
other than there will be less overall fragmentation. All the
usual features belonging to hugetlbfs are preserved and
unaffected. libhugetlbfs will also work fine as usual.