Lines Matching full:online

18 XFS Online Fsck Design
21 This document captures the design of the online filesystem check feature for
25 - To help kernel distributors understand exactly what the XFS online fsck
34 As the online fsck code is merged, the links in this document to topic branches
42 Parts 2 and 3 present a high-level overview of how the online fsck process works
49 might be built atop online fsck.
112 Each kernel patchset adding an online repair function will use the same branch
118 The online fsck tool described here will be the third tool in the history of
162 while the filesystem is online.
188 | The userspace driver program for the new online fsck tool can be |
190 | The kernel portion of online fsck that validates metadata is called |
191 | "online scrub", and the portion of the kernel that fixes metadata is called |
192 | "online repair". |
208 In summary, online fsck takes advantage of resource sharding and redundant
217 Because it is necessary for online fsck to lock and scan live metadata objects,
218 online fsck consists of three separate code components.
224 and repair each type of online fsck work item.
229 | For brevity, this document shortens the phrase "online fsck work |
240 In principle, online fsck should be able to check and to repair everything that
242 However, online fsck cannot be running 100% of the time, which means that
247 A second limitation of online fsck is that it must follow the same resource
251 In other words, online fsck is not a complete replacement for offline fsck, and
252 a complete run of online fsck may take longer than offline fsck.
254 different motivations of online fsck, which are to **minimize system downtime**
269 discover the online fsck capabilities of the kernel, and open the
312 to online fsck; neither of the previous tools has this capability.
403 Despite these limitations, the advantage that online repair holds is clear:
425 but are only needed for online fsck or for reorganization of the filesystem.
460 Introducing concurrency helps online repair avoid various locking problems, but
488 the new index already online.
498 To minimize changes to the rest of the codebase, XFS online repair keeps the
544 sections 2.12 ("Online Index Operations") through 2.14 ("Incremental View
549 Since quotas are non-negative integer counts of resource usage, online
558 Each online fsck function will be discussed as case studies later in this
564 During the development of online fsck, several risk factors were identified
574 reduces the ability of online fsck to find inconsistencies and repair them.
593 render the filesystem unusable, the online repair functions have been
597 - **Misbehavior**: Online fsck requires many privileges -- raw IO to block
639 With ample hardware availability in mind, the testing strategy for the online
652 This improves code quality by enabling the authors of online fsck to find and
659 Even before development work began on online fsck, fstests (when run on XFS)
664 During development of the online checking code, fstests was modified to run
668 To start development of online repair, fstests was modified to run
671 after it exists, or trigger complaints from the online check.
673 To complete the first phase of development of online repair, fstests was
675 This enables a comparison of the effectiveness of online repair as compared to
683 Before development of online fsck even began, a set of fstests were created
695 This part of the test suite was extended to cover online fsck in exactly the
708 3. Online repair (``xfs_scrub``) to detect and fix
713 The testing plan for online fsck includes extending the existing fs testing
747 4. Online checking (``xfs_scrub -n``)
748 5. Online repair (``xfs_scrub``)
749 … 6. Both repair tools (``xfs_scrub`` and then ``xfs_repair`` if online repair doesn't succeed)
763 allow the online fsck developers to compare online fsck against offline fsck,
777 A unique requirement to online fsck is the ability to operate on a filesystem
780 impact on the running system, the online repair code should never introduce
812 The primary user of online fsck is the system administrator, just like offline
814 Online fsck presents two modes of operation to administrators:
815 A foreground CLI process for online fsck on demand, and a background service
847 run online fsck automatically on weekends by default.
903 service window to run the online repair tool to correct the problem.
990 enabling online fsck and other requested functionality such as free space
1035 Online filesystem checking judges the consistency of each primary metadata
1041 what online checking can consult.
1131 Every online fsck scrubbing function is expected to read every ondisk metadata
1207 The XFS btree code has keyspace scanning functions that online fsck uses to
1507 and correction in the online and offline checking tools.
1509 Eventual Consistency vs. Online Fsck
1517 online checking must coordinate with chained operations that are in progress to
1519 Furthermore, online repair must not run when operations are pending because
1523 Only online fsck has this requirement of total consistency of AG metadata, and
1525 Online fsck coordinates with transaction chains as follows:
1532 * When online fsck wants to examine an AG, it should lock the AG header
1538 This may lead to online fsck taking a long time to complete, but regular
1549 Midway through the development of online scrubbing, the fsstress tests
1550 uncovered a misinteraction between online fsck and compound transaction chains
1665 However, online fsck changes the rules -- remember that although physical
1698 3. Teach online fsck to walk all transactions waiting for whichever lock(s)
1710 Online fsck uses an atomic intent item counter and lock cycling to coordinate
1721 is an explicit deprioritization of online fsck to benefit file operations.
1768 Online fsck for XFS separates the regular filesystem from the checking and
1770 However, there are a few parts of online fsck (such as the intent drains, and
1771 later, live update hooks) where it is useful for the online fsck code to know
1773 Since it is not expected that online fsck will be constantly running in the
1775 these hooks when online fsck is compiled into the kernel but not actively
1781 replace a static branch to hook code with ``nop`` sleds when online fsck isn't
1787 When online fsck enables the static key, the sled is replaced with an
1790 program that invoked online fsck, and can be amortized if multiple threads
1791 enter online fsck at the same time, or if multiple filesystems are being
1794 CPU initialization requires memory allocation, online fsck must be careful not
1817 distributor turns off online fsck at build time.
1827 Online scrub has resource acquisition helpers (e.g. ``xchk_perag_lock``) to
1846 Some online checking functions work by scanning the filesystem to build a
1849 For online repair to rebuild a metadata structure, it must compute the record
1869 At any given time, online fsck does not need to keep the entire record set in
1871 Continued development of online fsck demonstrated that the ability to perform
1883 | The first edition of online repair inserted records into a new btree as |
1912 to share functionality between online fsck functions.
1922 error as an out of memory error. For online repair, squashing error conditions
1930 Online fsck must not drive the system into OOM conditions, which means that
2002 Array access patterns in online fsck tend to fall into three categories.
2080 During the fourth demonstration of online repair, a community reviewer remarked
2081 that for performance reasons, online repair ought to load batches of records
2199 Given that indexed lookups of scan data are required for both strategies, online
2264 An online fsck function that wants to create an xfbtree should proceed as
2336 As mentioned previously, early iterations of online repair built new btree
2349 To prepare for online fsck, each of the four bulk loaders was studied, notes
2561 Online repair functions minimize the chances of this occurring by using very
2770 Whenever online fsck builds a new data structure to replace one that is
2781 As part of a repair, online fsck relies heavily on the reverse mapping records
2843 As stated earlier, online repair functions use very large transactions to
3004 There is a very high potential for cache coherency issues if online fsck is not
3007 When online fsck wants to open a damaged file for scrubbing, it must use
3080 online fsck to check them, since there is no way to quiesce a percpu counter
3082 Although online fsck can read the filesystem metadata to compute the correct
3086 Earlier versions of online scrub would return to userspace with an incomplete
3092 To satisfy this requirement, online fsck must prevent other programs in the
3159 Like every other type of online repair, repairs are made by writing those
3163 Therefore, online fsck must build the infrastructure to manage a live scan of
3254 Online fsck functions scan all files in the filesystem as follows:
3285 `online quotacheck
3351 However, online fsck differs from regular XFS operations because it may examine
3355 The next few sections detail the specific ways in which online fsck takes care
3380 To capture these nuances, the online fsck code has a separate ``xchk_irele``
3407 Online fsck cannot abide these conventions, because for a directory tree
3414 Solving both of these problems is straightforward -- any time online fsck
3421 However, trylock loops mean that online fsck must be prepared to measure the
3431 Online fsck must verify that the dotdot dirent of a directory points up to a
3456 The second piece of support that online fsck functions need during a full
3460 Two pieces of Linux kernel infrastructure enable online fsck to monitor regular
3465 In this case, the downstream consumer is always an online fsck function.
3466 Because multiple fsck functions can run in parallel, online fsck uses the Linux
3503 - The online fsck function should define a structure to hold scan data, a lock
3508 - The online fsck code must contain a C function to catch the hook action code
3513 - Prior to unlocking inodes to start the scan, online fsck must call
3517 - Online fsck must call ``xfs_hooks_del`` to disable the hook once the scan is
3522 zero when online fsck is not running.
3529 The code paths of the online fsck scanning code and the :ref:`hooked<fshooks>`
3607 It is useful to compare the mount time quotacheck code to the online repair
3625 Like most online fsck functions, online quotacheck can't write to regular
3628 Therefore, online quotacheck records file resource usage to a shadow dquot
3656 For online quotacheck, hooks are placed in steps 2 and 4.
3690 `online quotacheck
3853 Therefore, online repair of file-based metadata creates a temporary file in
3863 This dependency is the reason why online repair can only use pageable kernel
3910 | requirement means that online repair would have to be able to perform |
3943 Online repair code should use the ``xrep_tempfile_create`` function to create a
3988 for online repair because:
3995 b. Reverse-mapping is critical for the operation of online fsck, so the old
4004 d. Online repair needs to swap the contents of two files that are by definition
4250 referential integrity, so prior to performing the mapping exchange, online
4260 However, this iunlink processing omits the cross-link detection of online
4266 To repair a metadata file, online repair proceeds as follows:
4388 The best that online repair can do at this time is to read directory data
4477 Both online and offline repair can use this strategy.
4623 Online reconstruction of a file's parent pointer information works similarly to
4809 However, one of online repair's design goals is to avoid locking the entire
4925 Without parent pointers, the directory parent pointer online scrub code can
5028 for online fsck functionality.
5298 in this document and now has some familiarity with how XFS performs online
5313 necessary refinements to online repair and lack of customer demand mean that
5393 online fsck can use that instead of adding a separate vectored scrub system
5407 One serious shortcoming of the online fsck code is that the amount of time that
5434 The third piece is the ability to force an online repair.