
.. SPDX-License-Identifier: GPL-2.0

object. Userland access outside of VMAs is invalid except in the case where an
-------
Locking
-------

Terminology
-----------

* **mmap locks** - Each MM has a read/write semaphore :c:member:`!mmap_lock`
  which locks at a process address space granularity and which can be acquired via

* **VMA locks** - The VMA lock is at VMA granularity (of course) which behaves

* **rmap locks** - When trying to access VMAs through the reverse mapping via a
  (reachable from a folio via :c:member:`!folio->mapping`). VMAs must be stabilised via
  :c:func:`!i_mmap_[try]lock_write` for file-backed memory. We refer to these
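
As a brief illustration, here is a sketch of taking the rmap locks on the read
side; :c:func:`!i_mmap_lock_read` and :c:func:`!anon_vma_lock_read` are the
kernel helpers, while the surrounding function and walk logic are assumed for
illustration only:

.. code-block:: c

   #include <linux/fs.h>
   #include <linux/rmap.h>

   /* Illustrative only: stabilise VMAs reachable via the reverse mapping. */
   static void example_stabilise_rmap(struct address_space *mapping,
                                      struct anon_vma *anon_vma)
   {
           /* File-backed: protects the mapping->i_mmap interval tree. */
           i_mmap_lock_read(mapping);
           /* ... walk VMAs in mapping->i_mmap here ... */
           i_mmap_unlock_read(mapping);

           /* Anonymous: protects the anon_vma interval tree. */
           anon_vma_lock_read(anon_vma);
           /* ... walk VMAs via anon_vma->rb_root here ... */
           anon_vma_unlock_read(anon_vma);
   }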

Lock usage
----------

* Obtain an mmap read lock at the MM granularity via :c:func:`!mmap_read_lock` (or a
  acquire the lock atomically so might fail, in which case fall-back logic is
  anonymous or file-backed) to obtain the required VMA (a sketch of this
  pattern follows the list).

* Obtain an mmap write lock at the MM granularity via :c:func:`!mmap_write_lock` (or a
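
A rough sketch of the read-side pattern described above; :c:func:`!mmap_read_lock`,
:c:func:`!vma_lookup` and :c:func:`!mmap_read_unlock` are the real kernel helpers,
while the function itself and what it does with the VMA are illustrative
assumptions:

.. code-block:: c

   #include <linux/mm.h>
   #include <linux/mmap_lock.h>

   /* Illustrative: inspect the VMA covering addr under the mmap read lock. */
   static int example_inspect_vma(struct mm_struct *mm, unsigned long addr)
   {
           struct vm_area_struct *vma;
           int writable = -ENOENT;

           mmap_read_lock(mm);
           vma = vma_lookup(mm, addr);     /* NULL if addr is not mapped */
           if (vma)
                   /* The VMA is stable here; its fields may be read. */
                   writable = !!(vma->vm_flags & VM_WRITE);
           mmap_read_unlock(mm);

           /* The VMA must not be touched once the lock is dropped. */
           return writable;
   }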

========= ======== ========= ======= ===== =========== ==========
mmap lock VMA lock rmap lock Stable? Read? Write most? Write all?
========= ======== ========= ======= ===== =========== ==========
\-        \-       \-        N       N     N           N
\-        R        \-        Y       Y     N           N
\-        \-       R/W       Y       Y     N           N
R/W       \-/R     \-/R/W    Y       Y     N           N
W         W        \-/R      Y       Y     Y           N
========= ======== ========= ======= ===== =========== ==========

attempting to do the reverse is invalid as it can result in deadlock - if

.. note:: We exclude VMA lock-specific fields here to avoid confusion, as these

:c:member:`!vm_mm`         Containing mm_struct.                     None - written once on
:c:member:`!vm_page_prot`  Architecture-specific page table          mmap write, VMA write.
:c:member:`!vm_flags`      Read-only access to VMA flags describing  N/A
:c:member:`!__vm_flags`    Private, writable access to VMA flags     mmap write, VMA write.
:c:member:`!vm_file`       If the VMA is file-backed, points to a    None - written once on
:c:member:`!vm_ops`        If the VMA is file-backed, then either    None - written once on
                           the driver or file-system provides a      initial map by
                           :c:struct:`!struct vm_operations_struct`  :c:func:`!f_ops->mmap()`.

                           driver-specific metadata.

.. table:: Config-specific fields

   is set or the VMA is file-backed. The
   … to perform readahead. This field is swap-specific
   … describes the current state of numab-specific

mapping is file-backed, to place the VMA   i_mmap write.
:c:member:`!struct address_space->i_mmap`
interval tree if the VMA is file-backed.   i_mmap write.
:c:member:`!vma->anon_vma` if it is
non-:c:macro:`!NULL`.
… anonymous folios mapped exclusively to   setting non-:c:macro:`!NULL`:
… by the :c:macro:`!page_table_lock`. This   When non-:c:macro:`!NULL` and
anonymous mappings, to be able to access both related :c:struct:`!struct anon_vma` objects

.. note:: If a file-backed mapping is mapped with :c:macro:`!MAP_PRIVATE` set

Page tables
-----------

In Linux these are divided into five levels - PGD, P4D, PUD, PMD and PTE. Huge

1. **Traversing** page tables - Simply reading page tables in order to traverse

2. **Installing** page table mappings - Whether creating a new mapping or

3. **Zapping/unmapping** page table entries - This is what the kernel calls

4. **Freeing** page tables - When finally the kernel removes page tables from a

locks described in the terminology section above - that is the mmap lock, the

That is - as long as you keep the relevant VMA **stable** - you are good to go

serialise - see the page table implementation detail section for more details).
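
To make the traversal case concrete, here is a hedged sketch of walking from
the PGD down to the PMD, assuming the caller holds the mmap read lock so the
VMA (and hence its page tables) cannot be torn down; PTE-level access needs
the extra care described later in this document:

.. code-block:: c

   #include <linux/mm.h>
   #include <linux/pgtable.h>

   /* Illustrative: descend the upper page table levels for addr. */
   static pmd_t *example_walk_to_pmd(struct mm_struct *mm, unsigned long addr)
   {
           pgd_t *pgd;
           p4d_t *p4d;
           pud_t *pud;

           pgd = pgd_offset(mm, addr);
           if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
                   return NULL;

           p4d = p4d_offset(pgd, addr);
           if (p4d_none(*p4d) || unlikely(p4d_bad(*p4d)))
                   return NULL;

           pud = pud_offset(p4d, addr);
           if (pud_none(*pud) || unlikely(pud_bad(*pud)))
                   return NULL;

           return pmd_offset(pud, addr);
   }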

Lock ordering
-------------

.. code-block::

  inode->i_rwsem (while writing or truncating, not reading or faulting)
    mm->mmap_lock
      mapping->invalidate_lock (in filemap_fault)
        mapping->i_mmap_rwsem
          anon_vma->rwsem
            mm->page_table_lock or pte_lock
              mapping->private_lock (in block_dirty_folio)
                lruvec->lru_lock (in folio_lruvec_lock_irq)
              inode->i_lock (in set_page_dirty's __mark_inode_dirty)
              bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
                sb_lock (within inode_lock in fs/fs-writeback.c)
                i_pages lock (widely used, in set_page_dirty,
                          in arch-dependent flush_dcache_mmap_lock,
                          within bdi.wb->list_lock in __sync_single_inode)

There is also a file-system specific lock ordering comment located at the top of

.. code-block::

  ->i_mmap_rwsem (truncate_pagecache)
    ->private_lock (__free_pte->block_dirty_folio)
      ->swap_lock (exclusive_swap_page, others)
        ->i_pages lock

  ->i_rwsem
    ->invalidate_lock (acquired by fs in truncate path)
      ->i_mmap_rwsem (truncate->unmap_mapping_range)

  ->mmap_lock
    ->i_mmap_rwsem
      ->page_table_lock or pte_lock (various, mainly in memory.c)
        ->i_pages lock (arch-dependent flush_dcache_mmap_lock)

  ->mmap_lock
    ->invalidate_lock (filemap_fault)
      ->lock_page (filemap_fault, access_process_vm)

  ->i_rwsem (generic_perform_write)
    ->mmap_lock (fault_in_readable->do_page_fault)

  bdi->wb.list_lock
    sb_lock (fs/fs-writeback.c)
    ->i_pages lock (__sync_single_inode)

  ->i_mmap_rwsem
    ->anon_vma.lock (vma_merge)

  ->anon_vma.lock
    ->page_table_lock or pte_lock (anon_vma_prepare and various)

  ->page_table_lock or pte_lock
    ->swap_lock (try_to_unmap_one)
    ->private_lock (try_to_unmap_one)
    ->i_pages lock (try_to_unmap_one)
    ->lruvec->lru_lock (follow_page_mask->mark_page_accessed)
    ->lruvec->lru_lock (check_pte_range->folio_isolate_lru)
    ->private_lock (folio_remove_rmap_pte->set_page_dirty)
    ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
    bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
    ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
    bdi.wb->list_lock (zap_pte_range->set_page_dirty)
    ->inode->i_lock (zap_pte_range->set_page_dirty)
    ->private_lock (zap_pte_range->block_dirty_folio)

------------------------------
Locking Implementation Details
------------------------------

.. warning:: Locking rules for PTE-level page tables are very different from

Page table locking details
--------------------------

* **Higher level page table locks** - Higher level page tables, that is PGD, P4D
  and PUD, each make use of the process address space granularity
  :c:member:`!mm->page_table_lock` lock when modified.

* **Fine-grained page table locks** - PMDs and PTEs each have fine-grained locks
  mapped into higher memory (if a 32-bit system) and carefully locked via

**must** be held, except if you can safely assume nobody can access the page

PTE-level page tables are different from page tables at other levels, and there

* On 32-bit architectures, they may be in high memory (meaning they need to be
* When empty, they can be unlinked and RCU-freed while holding an mmap lock or

So accessing PTE-level page tables requires at least holding an RCU read lock;

PMD entry still refers to the same PTE-level page table.
If the writer does not care whether it is the same PTE-level page table, it

To access PTE-level page tables, a helper like :c:func:`!pte_offset_map_lock` or
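
A minimal sketch of using this helper, assuming the caller has already walked
to the PMD; if the PTE-level table has been freed or replaced concurrently the
helper returns :c:macro:`!NULL` and the caller must retry or fail (the retry
policy here is an illustrative assumption):

.. code-block:: c

   #include <linux/mm.h>

   /* Illustrative: map, lock, read and release a PTE entry. */
   static int example_read_pte(struct mm_struct *mm, pmd_t *pmd,
                               unsigned long addr, pte_t *out)
   {
           spinlock_t *ptl;
           pte_t *pte;

           pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
           if (!pte)
                   return -EAGAIN; /* PTE table vanished; caller may retry */

           *out = ptep_get(pte);   /* safe: the PTE lock is held */
           pte_unmap_unlock(pte, ptl);
           return 0;
   }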

functionality like GUP-fast locklessly traverses (that is reads) page tables,

(for instance x86-64 does not require any special precautions).

can never assume that page table locks give us entirely exclusive access, and

functions - :c:func:`!pgdp_get`, :c:func:`!p4dp_get`, :c:func:`!pudp_get`,

GUP-fast (see :c:func:`!gup_fast` and its various page table level handlers like

by :c:func:`!set_pXX` functions - :c:func:`!set_pgd`, :c:func:`!set_p4d`,

as in :c:func:`!pXX_clear` functions - :c:func:`!pgd_clear`,
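
For instance, a sketch of an entry update honouring these rules; the accessor
helpers are real, while the surrounding function is assumed for illustration:

.. code-block:: c

   #include <linux/pgtable.h>

   /* Illustrative: read and clear a PTE using the atomic accessors. */
   static void example_clear_pte(struct mm_struct *mm, unsigned long addr,
                                 pte_t *ptep)
   {
           pte_t entry = ptep_get(ptep);   /* atomic read of the entry */

           if (!pte_none(entry))
                   pte_clear(mm, addr, ptep);      /* atomic write */
   }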

PGD, P4D or PUD, the :c:member:`!mm->page_table_lock` must be held. This is

references the :c:member:`!mm->page_table_lock`.

Allocating a PTE will either use the :c:member:`!mm->page_table_lock` or, if

access to entries contained within a PTE, especially when we wish to modify

:c:func:`!pte_lockptr` to obtain a spin lock at PTE granularity contained within

PTE-specific lock, and then *again* checking that the PMD entry is as expected.
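
A sketch of that check/lock/recheck sequence (roughly what
:c:func:`!pte_offset_map_lock` does internally); simplified, and assuming a
caller that merely wants to operate on the PTE table under the PMD:

.. code-block:: c

   #include <linux/mm.h>
   #include <linux/pgtable.h>

   /* Illustrative: lock the PTE table under pmd, guarding against races. */
   static void example_lock_pte_table(struct mm_struct *mm, pmd_t *pmd)
   {
           pmd_t pmdval = pmdp_get_lockless(pmd);
           spinlock_t *ptl;

           if (pmd_none(pmdval) || !pmd_present(pmdval))
                   return;

           ptl = pte_lockptr(mm, pmd);
           spin_lock(ptl);
           if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
                   /* Raced: the PTE table may have been freed or replaced. */
                   spin_unlock(ptl);
                   return;
           }
           /* The PMD is unchanged: safe to work on the PTE-level table. */
           spin_unlock(ptl);
   }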

prevent racing faults, and rmap operations), as a file-backed mapping can be
truncated under the :c:struct:`!struct address_space->i_mmap_rwsem` alone.

through the :c:struct:`!struct anon_vma->rb_root` or the :c:member:`!struct
address_space->i_mmap` interval trees) can have its page tables torn down.

that no new ones overlap these and that no route remains to permit access to addresses

VMA lock internals
------------------

VMA read locking is entirely optimistic - if the lock is contended or a competing

VMA read locks hold the read lock on the :c:member:`!vma->vm_lock` semaphore for

This ensures the semantics we require - VMA write locks provide exclusive write
access to the VMA.

Writing requires the mmap to be write-locked and the VMA lock to be acquired via
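
A sketch of this write-side pattern; :c:func:`!mmap_write_lock`,
:c:func:`!vma_lookup` and :c:func:`!vma_start_write` are the kernel helpers
(the latter assuming :c:macro:`!CONFIG_PER_VMA_LOCK`), and the modification
itself is left as an assumption:

.. code-block:: c

   #include <linux/mm.h>
   #include <linux/mmap_lock.h>

   /* Illustrative: write-lock the VMA covering addr before modifying it. */
   static void example_modify_vma(struct mm_struct *mm, unsigned long addr)
   {
           struct vm_area_struct *vma;

           mmap_write_lock(mm);
           vma = vma_lookup(mm, addr);
           if (vma) {
                   vma_start_write(vma);   /* excludes VMA read lock holders */
                   /* ... modify VMA metadata here ... */
           }
           mmap_write_unlock(mm);  /* also releases all VMA write locks */
   }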

All this is achieved by the use of per-mm and per-VMA sequence counts, which are
used in order to reduce complexity, especially for operations which write-lock

If the mm sequence count, :c:member:`!mm->mm_lock_seq`, is equal to the VMA
sequence count, :c:member:`!vma->vm_lock_seq`, then the VMA is write-locked. If

also increments :c:member:`!mm->mm_lock_seq` via
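
Expressed as a sketch, treating both counts as plain integers under
:c:macro:`!CONFIG_PER_VMA_LOCK` (the kernel wraps these accesses in helpers
with appropriate memory ordering):

.. code-block:: c

   #include <linux/mm_types.h>

   /* Illustrative only: the write-locked test implied by the rule above. */
   static bool example_vma_write_locked(struct vm_area_struct *vma)
   {
           return vma->vm_lock_seq == vma->vm_mm->mm_lock_seq;
   }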

:c:member:`!vma->vm_lock` read/write semaphore and hold it, while checking that

On the write side, we acquire a write lock on the :c:member:`!vma->vm_lock`

complexity with a long-term held write lock.
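
A sketch of that write-side sequence, loosely following
:c:func:`!vma_start_write`: the semaphore is taken only long enough to publish
the new sequence count, avoiding a long-term held write lock (field layout as
assumed under :c:macro:`!CONFIG_PER_VMA_LOCK`):

.. code-block:: c

   #include <linux/mm_types.h>
   #include <linux/rwsem.h>

   /* Illustrative: publish the mm's sequence count under vm_lock. */
   static void example_vma_start_write(struct vm_area_struct *vma,
                                       int mm_lock_seq)
   {
           down_write(&vma->vm_lock->lock);
           vma->vm_lock_seq = mm_lock_seq; /* VMA now counts as write-locked */
           up_write(&vma->vm_lock->lock);
   }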

fast RCU-based per-VMA lock acquisition (especially on page fault, though

mmap write lock downgrading
---------------------------

When an mmap write lock is held, one has exclusive access to resources within the

.. list-table:: Lock exclusivity
   :header-rows: 1
   :stub-columns: 1

   * -
     - R
     - D
     - W
   * - R
     - N
     - N
     - Y
   * - D
     - N
     - Y
     - Y
   * - W
     - Y
     - Y
     - Y
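
A sketch of downgrading in practice; :c:func:`!mmap_write_downgrade` converts
the held write lock into the downgraded (D) state, which is later dropped as a
read lock (the work performed at each stage is an illustrative assumption):

.. code-block:: c

   #include <linux/mmap_lock.h>

   /* Illustrative: exclusive setup, then downgraded (D) read-mode work. */
   static void example_downgrade(struct mm_struct *mm)
   {
           mmap_write_lock(mm);
           /* ... updates requiring exclusive (W) access ... */
           mmap_write_downgrade(mm);
           /* readers may now proceed, but new writers remain excluded */
           mmap_read_unlock(mm);   /* downgraded locks drop as read locks */
   }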

Stack expansion
---------------