<h2>How mount actually works</h2>

<p>The <a href=https://landley.net/toybox/help.html#mount>mount command</a>
calls the <a href=https://man7.org/linux/man-pages/man2/mount.2.html>mount
system call</a>, which has five arguments:</p>

<blockquote><b>
int mount(const char *source, const char *target, const char *filesystemtype,
 unsigned long mountflags, const void *data);
</b></blockquote>

<p>The command "<b>mount -t ext2 /dev/sda1 /path/to/mountpoint -o ro,noatime</b>"
parses its command line arguments to feed them into those five system call
arguments. In this example, the <b>source</b> is "/dev/sda1", the <b>target</b>
is "/path/to/mountpoint", and the <b>filesystemtype</b> is "ext2".</p>

<p>The other two syscall arguments (<b>mountflags</b> and <b>data</b>)
come from the "-o option,option,option" argument. The mountflags argument goes
to the VFS (explained below), and the data argument is passed to the filesystem
driver.</p>

<p>The mount command's options string is a list of comma separated values. If
there's more than one -o argument on the mount command line, they get glued
together (in order) with a comma. The mount command also checks the file
<b>/etc/fstab</b> for default options, and the options you specify on the command
line get appended to those defaults (if any). Most other command line mount
flags are just synonyms for adding option flags (for example
"mount -o remount -w" is equivalent to "mount -o remount,rw"). Behind the
scenes they all get appended to the -o string and fed to a common parser.</p>

<p>VFS stands for "Virtual File System" and is the common infrastructure shared
by different filesystems. It handles common things like making the filesystem
read only. The mount command assembles an option string to supply to the "data"
argument of the mount syscall, but first it parses it for VFS options
(ro,noexec,nodev,nosuid,noatime...),
each of which corresponds to a flag
from <b>#include &lt;sys/mount.h&gt;</b>. The mount command removes those options
from the string and sets the corresponding bit in mountflags, then the
remaining options (if any) form the data argument for the filesystem driver.</p>

<blockquote>
<p>Implementation details: the mountflag MS_SILENT gets set by
default even if there's nothing in /etc/fstab. Some actions (such as --bind
and --move mounts, I.E. -o bind and -o move) are just VFS actions and don't
require any specific filesystem at all. The "-o remount" flag requires looking
up the filesystem in /proc/mounts and reconstructing the full option string,
because you don't _just_ pass in the changed flags but have to describe
the complete new filesystem state to the system call. Some of the options
in /etc/fstab are for the mount command itself (such as "user", which only does
anything if the mount command has the suid bit set) and don't get passed
through to the system call.</p>
</blockquote>

<p>When mounting a new filesystem, the "<b>filesystemtype</b>" argument to the mount system
call specifies which filesystem driver to use. All the loaded drivers are
listed in /proc/filesystems, but calling mount can also trigger a module load
request to add another. A filesystem driver is responsible for putting files
and subdirectories under the mount point: any time you open, close, read,
write, truncate, list the contents of a directory, move, or delete a file,
you're talking to a filesystem driver to do it.
(Or when you call
ioctl(), stat(), statvfs(), utime()...)</p>

<h2>Four filesystem types (block backed, server backed, ramfs, synthetic).</h2>

<p>Different drivers implement different filesystems, which come in four
different types: the filesystem's backing store can be a fixed length
block of storage, the backing store can be some server the driver connects to,
the files can remain in memory with no backing store,
or the filesystem driver can algorithmically create the filesystem's contents
on the fly.</p>

<ol>
<li><h3>Block device backed filesystems, such as ext2 and vfat.</h3>

<p>This kind of filesystem driver acts as a lens to look at a block device
through. The source argument for block backed filesystems is a path to a
block device (such as "/dev/hda1") which stores the contents of the
filesystem in a fixed length block of sequential storage, with a separate
driver providing that block device.</p>

<p>Block backed filesystems are the "conventional" filesystem type most people
think of when they mount things. The name means that the "backing store"
(where the data lives when the system is switched off) is on a block device.</p>
</li>

<li><h3>Server backed filesystems, such as cifs/samba or fuse.</h3>

<p>These drivers convert filesystem operations into a sequential stream of
bytes, which they can send through a pipe to talk to a program. The filesystem
server could be a local Filesystem in Userspace daemon (connected to a local
process through a pipe filehandle), behind a network socket (CIFS and v9fs),
behind a char device (/dev/ttyS0), and so on. The common attribute is that there's
some program on the other end sending and receiving a sequential bytestream.
The backing store is a server somewhere, and the filesystem driver is talking
to a process that reads and writes data in some known protocol.</p>

<p>The source argument for these filesystems indicates where the filesystem
lives.
It's often in a URL-like format for network filesystems, but it's
really just a blob of data that the filesystem driver understands.</p>

<p>A lot of server backed filesystems want to open their own connection so they
don't have to pass their data through a persistent local userspace process,
not really for performance reasons but because in low memory conditions a
chicken-and-egg problem can develop: all the process's pages have
been swapped out, but the filesystem needs to write data to its backing
store in order to free up memory so it can swap the process's pages back in.
If this mechanism is providing the root filesystem, this can deadlock and
freeze the system solid. So while you _can_ pass some of them a filehandle,
more often than not you don't.</p>

<p>These are also known as "pipe backed" filesystems (or "network filesystems"
because that's a common case, although a network doesn't need to be involved).
Conceptually they're char device backed filesystems analogous to the block
backed filesystems (block devices provide seekable storage, char devices
provide serial I/O), but you don't commonly specify a character device in
/dev when mounting them because you're talking to a specific server process,
not a whole machine.</p>
</li>

<li><h3>Ram backed filesystems (ramfs and tmpfs).</h3>

<p>These are very simple filesystems that don't implement a backing store,
but just keep the data in memory. Data written to these gets stored in the
disk cache, and the driver ignores requests
to flush it to backing store (reporting all the pages as pinned and
unfreeable).</p>

<p>These filesystem drivers essentially mount the VFS's page/dentry cache as if it were a
filesystem. (Page cache stores file contents, dentry cache stores directory
entries.)
They grow and shrink dynamically as needed: when you write files
into them they allocate more memory to store them, and when you delete files
the memory is freed.</p>

<p>The "ramfs" driver provides the simplest possible ram filesystem,
which is too simple for most real use cases. The "tmpfs" driver adds
a size limitation (by default 50% of system RAM, but it's adjustable as a mount
option) so the system doesn't run out of memory and lock up if you
"cat /dev/zero &gt; file", can report how much space is remaining
when asked (ramfs always says 0 bytes free), and can write its data
out to swap space (like processes do) when the system is under memory pressure.</p>

<blockquote>
<p>Note that "ramdisk" is not the same as "ramfs". The ramdisk driver uses a
chunk of memory to implement a block device, and then you can format that
block device and mount it with a block device backed filesystem driver.
(This is the same "two device drivers" approach you always have with block
backed filesystems: one driver provides /dev/ram0 and the second driver mounts
it as vfat.) Ram disks are significantly less efficient than ramfs,
allocating a fixed amount of memory up front for the block device instead of
growing and shrinking dynamically as files are written into and deleted from the
page and dentry caches the way ramfs does.</p>
</blockquote>

<p>Initramfs (I.E. rootfs) is a ram backed filesystem mounted on / which
can't be unmounted for the same reason PID 1 can't exit. The boot
process can extract a cpio.gz archive into it (either statically linked
into the kernel or loaded as a second file by the bootloader), and
if it contains an executable "init" binary at the top level, that binary
will be run as PID 1. If you specify "root=" on the kernel command line,
initramfs will be ramfs and will get overmounted with the specified
filesystem if no "/init" binary can be run out of the initramfs.
If you don't specify root= then initramfs will be tmpfs, which is probably
what you want when the system is running from initramfs.</p>
</li>

<li><h3>Synthetic filesystems (proc, sysfs, devtmpfs, devpts...)</h3>

<p>These filesystems don't have any backing store because they don't
store arbitrary data the way the first three types of filesystems do.</p>

<p>Instead they present artificial contents, which can represent processes or
hardware or anything the driver writer wants them to show. Listing or reading
from these files calls a driver function that produces whatever output it's
programmed to, and writing to these files submits data to the driver, which
can do anything it wants with it.</p>

<p>Synthetic filesystems are often implemented to provide monitoring and control
knobs for parts of the operating system, as an alternative to adding more
system calls (or ioctl, sysctl, etc.). They provide a more human friendly user
interface which programs can use but which users can also interact with
directly from the command line via "cat" and redirecting the output of
"echo" into special files.</p>

<blockquote>
<p>The first synthetic filesystem in Linux was "proc", which was initially
intended to hold a directory for each process in the system to provide
information to tools like "ps" and "top" (the /proc/[0-9]* entries),
but it became a dumping ground for any information the kernel wanted to export.
Eventually the kernel developers <a href=https://lwn.net/Articles/57369/>genericized</a>
the synthetic filesystem infrastructure so the system could have multiple
different synthetic filesystems, but /proc remains full of
unrelated historic legacy exports kept for backwards compatibility.</p>
</blockquote>
</li>
</ol>

<p>TODO: explain overmounts, mount --move, mount namespaces.</p>