/*
 * Copyright 2007 Haiku Inc. All rights reserved.
 * Distributed under the terms of the MIT License.
 *
 * Authors:
 *		Ingo Weinhold
 */

/*!
	\page fs_modules File System Modules

	To support a particular file system (FS), a kernel module implementing a
	special interface (\c file_system_module_info defined in \c <fs_interface.h>)
	has to be provided. As for any other module, the \c std_ops() hook is
	invoked with \c B_MODULE_INIT directly after the FS module has been loaded
	by the kernel, and with \c B_MODULE_UNINIT before it is unloaded, thus
	providing a simple mechanism for one-time module initializations. The same
	module is used for accessing any volume of that FS type.


	\section objects File System Objects

	There are several types of objects a FS module has to deal with directly or
	indirectly:

	- A \em volume is an instance of a file system. For a disk-based file
	  system it corresponds to a disk, partition, or disk image file. When
	  mounting a volume, the virtual file system layer (VFS) assigns it a
	  unique number (ID, of type \c dev_t) and associates it with a handle
	  (type \c void*) provided by the file system. The VFS creates an instance
	  of struct \c fs_volume that stores these two, an operation vector
	  (\c fs_volume_ops), and other volume-related items.
	  Whenever the FS is asked to perform an operation, the \c fs_volume object
	  is supplied, and whenever the FS requests a volume-related service from
	  the kernel, it also has to pass the \c fs_volume object or, in some cases,
	  just the volume ID.
	  Normally the handle is a pointer to a data structure the FS allocates to
	  associate data with the volume.

	- A \em node is contained by a volume. It can be of type file, directory, or
	  symbolic link (symlink). Just like volumes, nodes are associated with an
	  ID (type \c ino_t) and, if in use, also with a handle (type \c void*).
	  As for volumes, the VFS creates an instance of a structure (\c fs_vnode)
	  for each node in use, storing the FS's handle for the node and an
	  operation vector (\c fs_vnode_ops).
	  Unlike the volume ID, the node ID is defined by the FS.
	  It often has a meaning to the FS, e.g. file systems using inodes might
	  choose the inode number corresponding to the node. As long as the volume
	  is mounted and the node is known to the VFS, its node ID must not change.
	  The node handle is again a pointer to a data structure allocated by the
	  FS.

	- A \em vnode (VFS node) is the VFS representation of a node. A volume may
	  contain a great number of nodes, but at any given time only a few are
	  represented by vnodes, usually only those that are currently in use
	  (sometimes a few more).

	- An \em entry (directory entry) belongs to a directory, has a name, and
	  refers to a node. It is important to understand the difference between
	  entries and nodes: A node doesn't have a name, only the entries that refer
	  to it have. If a FS allows more than one entry to refer to a single
	  node, it is said to support "hard links". It is possible that no
	  entry refers to a node. This happens when a node (e.g. a file) is still
	  open, but the last entry referring to it has been removed (the node will
	  be deleted when it is closed). While entries are to be understood as
	  independent entities, the FS interface does not use IDs or handles to
	  refer to them; it always uses directory and entry name pairs to do that.

	- An \em attribute is a named and typed data container belonging to a node.
	  A node may have any number of attributes; they are organized in a
	  (depending on the FS, virtual or actually existing) attribute directory,
	  through which one can iterate.

	- An \em index is supposed to provide fast searching capabilities for
	  attributes with a certain name. A volume's index directory allows for
	  iterating through the indices.

	- A \em query is a fully virtual object for searching for entries via an
	  expression matching entry name, node size, node modification date, and/or
	  node attributes. The mechanism of retrieving the entries found by a query
	  is similar to that for reading a directory's contents. A query can be
	  live, in which case the creator of the query is notified by the FS
	  whenever an entry no longer matches the query expression or starts
	  matching.


	\section concepts Generic Concepts

	A FS module has to (or can) provide quite a lot of hook functions. There are
	a few concepts that apply to several groups of them:

	- <em>Opening, Closing, and Cookies</em>: Many FS objects can be opened and
	  closed, namely nodes in general, directories, attribute directories,
	  attributes, the index directory, and queries. In each case there are three
	  hook functions: <tt>open*()</tt>, <tt>close*()</tt>, and
	  <tt>free*_cookie()</tt>. The <tt>open*()</tt> hook is passed all that is
	  needed to identify the object to be opened and, in some cases, additional
	  parameters, e.g. specifying a particular opening mode. The implementation
	  is required to return a cookie (type \c void*), usually a pointer to a
	  data structure the FS allocates. In some cases (e.g.
	  when an iteration state is associated with the cookie) a new cookie must
	  be allocated for each instance of opening the object. The cookie is passed
	  to all hooks that operate on an object opened this way. The
	  <tt>close*()</tt> hook is invoked to signal that the cookie is to be
	  closed. At this point the cookie might still be in use. Blocking FS hooks
	  (e.g. blocking read/write operations) using the same cookie have to be
	  unblocked. When the cookie stops being in use, the
	  <tt>free*_cookie()</tt> hook is called; it has to free the cookie.

	- <em>Entry Iteration</em>: For the FS objects serving as containers for
	  other objects, i.e.
	  directories, attribute directories, the index
	  directory, and queries, the cookie mechanism is used for a stateful
	  iteration through the contained objects. The <tt>read_*()</tt> hook reads
	  the next one or more entries into a <tt>struct dirent</tt> buffer. The
	  <tt>rewind_*()</tt> hook resets the iteration state to the first entry.

	- <em>Stat Information</em>: In the case of nodes, attributes, and indices,
	  detailed information about an object is requested via a
	  <tt>read*_stat()</tt> hook and must be written into a <tt>struct stat</tt>
	  buffer.


	\section vnodes VNodes

	A vnode is the VFS representation of a node. As soon as access to a node
	is requested, the VFS creates a corresponding vnode. The requesting entity
	gets a reference to the vnode for the time it works with the vnode and
	releases the reference when done. When the last reference to a vnode has
	been surrendered, the vnode is unused and the VFS can decide to destroy it
	(usually it is cached for a while longer).

	When the VFS creates a vnode, it invokes the volume's
	\link fs_volume_ops::get_vnode get_vnode() \endlink
	hook to let it create the respective node handle (unless the FS requests the
	creation of the vnode explicitly by calling publish_vnode()). That's the
	only hook that specifies a node by ID; all other node-related hooks are
	defined in the respective node's operation vector and they are passed the
	respective \c fs_vnode object. When the VFS deletes the vnode, it invokes
	the node's \link fs_vnode_ops::put_vnode put_vnode() \endlink
	hook or, if the node was marked removed,
	\link fs_vnode_ops::remove_vnode remove_vnode() \endlink.

	There are only four FS hooks through which the VFS gains knowledge of the
	existence of a node. The first one is the
	\link file_system_module_info::mount mount() \endlink
	hook.
	It is supposed to call \c publish_vnode() for the root node of the
	volume and return its ID. The second one is the
	\link fs_vnode_ops::lookup lookup() \endlink
	hook. Given a \c fs_vnode object of a directory and an entry name, it is
	supposed to call \c get_vnode() for the node the entry refers to and return
	the node ID.
	The remaining two hooks,
	\link fs_vnode_ops::read_dir read_dir() \endlink and
	\link fs_volume_ops::read_query read_query() \endlink,
	both return entries in a <tt>struct dirent</tt> structure, which also
	contains the ID of the node the entry refers to.


	\section mandatory_hooks Mandatory Hooks

	Which hooks a FS module should provide mainly depends on what functionality
	it features. E.g. a FS without support for attributes, indices, and/or
	queries can omit the respective hooks (i.e. set them to \c NULL in the
	module, \c fs_volume_ops, and \c fs_vnode_ops structures). Some hooks are
	mandatory, though. A minimal read-only FS module must implement:

	- \link file_system_module_info::mount mount() \endlink and
	  \link fs_volume_ops::unmount unmount() \endlink:
	  Mounting and unmounting a volume is required for pretty obvious reasons.

	- \link fs_vnode_ops::lookup lookup() \endlink:
	  The VFS uses this hook to resolve path names. It is probably one of the
	  most frequently invoked hooks.

	- \link fs_volume_ops::get_vnode get_vnode() \endlink and
	  \link fs_vnode_ops::put_vnode put_vnode() \endlink:
	  Create and destroy, respectively, the FS's private node handle when
	  the VFS creates/deletes the vnode for a particular node.

	- \link fs_vnode_ops::read_stat read_stat() \endlink:
	  Return a <tt>struct stat</tt> info for the given node, consisting of the
	  type and size of the node, its owner and access permissions, as well as
	  certain access times.

	- \link fs_vnode_ops::open open() \endlink,
	  \link fs_vnode_ops::close close() \endlink, and
	  \link fs_vnode_ops::free_cookie free_cookie() \endlink:
	  Open and close a node as explained in \ref concepts.

	- \link fs_vnode_ops::read read() \endlink:
	  Read data from an opened node (file). Even if the FS does not feature
	  files, the hook has to be present anyway; it should return an error in
	  this case.

	- \link fs_vnode_ops::open_dir open_dir() \endlink,
	  \link fs_vnode_ops::close_dir close_dir() \endlink, and
	  \link fs_vnode_ops::free_dir_cookie free_dir_cookie() \endlink:
	  Open and close a directory for entry iteration as explained in
	  \ref concepts.

	- \link fs_vnode_ops::read_dir read_dir() \endlink and
	  \link fs_vnode_ops::rewind_dir rewind_dir() \endlink:
	  Read the next entry/entries from a directory and reset the
	  iterator to the first entry, respectively, as explained in \ref concepts.

	Although not strictly mandatory, a FS should additionally implement the
	following hooks:

	- \link fs_volume_ops::read_fs_info read_fs_info() \endlink:
	  Return general information about the volume, e.g. total and free size, and
	  what special features (attributes, MIME types, queries) the volume/FS
	  supports.

	- \link fs_vnode_ops::read_symlink read_symlink() \endlink:
	  Read the value of a symbolic link. Needed only if the FS and volume
	  support symbolic links at all. If absent, symbolic links stored on the
	  volume won't be interpreted.

	- \link fs_vnode_ops::access access() \endlink:
	  Return whether the current user has the given access permissions for a
	  node. If the hook is absent, the user is considered to have all
	  permissions.
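
	The cookie handling and entry iteration expected of the directory hooks
	listed above can be illustrated with a small, self-contained sketch. It is
	not based on the real interface: the \c sketch_* types and functions are
	hypothetical stand-ins with simplified signatures (the actual hooks are
	declared in \c <fs_interface.h>, take \c fs_volume and \c fs_vnode
	arguments, and fill a <tt>struct dirent</tt> buffer), and reporting the end
	of the directory as success with zero entries is a convention assumed by
	this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-ins, for illustration only. They are NOT the types
// or signatures declared in <fs_interface.h>.
typedef int sketch_status;
enum {
	SKETCH_OK = 0,
	SKETCH_NO_MEMORY = -1,
	SKETCH_BUFFER_TOO_SMALL = -2
};

// Hypothetical private node handle of a directory: a fixed table of names.
typedef struct {
	const char* const* names;
	size_t count;
} sketch_dir;

// Iteration state; a fresh one is allocated for every open.
typedef struct {
	size_t next;
} sketch_dir_cookie;

// open_dir analogue: allocate a cookie positioned at the first entry.
static sketch_status
sketch_open_dir(const sketch_dir* dir, void** _cookie)
{
	assert(dir != NULL && _cookie != NULL);
	sketch_dir_cookie* cookie = malloc(sizeof(sketch_dir_cookie));
	if (cookie == NULL)
		return SKETCH_NO_MEMORY;
	cookie->next = 0;
	*_cookie = cookie;
	return SKETCH_OK;
}

// read_dir analogue: copy the next entry name and advance the cookie.
// Sets *_num to 1 on success and to 0 at the end of the directory.
static sketch_status
sketch_read_dir(const sketch_dir* dir, void* _cookie, char* buffer,
	size_t bufferSize, unsigned* _num)
{
	sketch_dir_cookie* cookie = _cookie;
	*_num = 0;
	if (cookie->next >= dir->count)
		return SKETCH_OK;
	const char* name = dir->names[cookie->next];
	if (strlen(name) + 1 > bufferSize)
		return SKETCH_BUFFER_TOO_SMALL;	// cookie is left unchanged
	strcpy(buffer, name);
	cookie->next++;
	*_num = 1;
	return SKETCH_OK;
}

// rewind_dir analogue: reset the iteration state to the first entry.
static sketch_status
sketch_rewind_dir(void* _cookie)
{
	((sketch_dir_cookie*)_cookie)->next = 0;
	return SKETCH_OK;
}

// close_dir analogue: nothing can block in this sketch, so there is
// nothing to unblock. The cookie stays valid until it is freed.
static sketch_status
sketch_close_dir(void* _cookie)
{
	(void)_cookie;
	return SKETCH_OK;
}

// free_dir_cookie analogue: the cookie is no longer in use; release it.
static sketch_status
sketch_free_dir_cookie(void* _cookie)
{
	free(_cookie);
	return SKETCH_OK;
}

// Caller-side view: iterate the directory twice using a single cookie and
// return the total number of entries seen.
static unsigned
sketch_count_two_passes(const sketch_dir* dir)
{
	void* cookie = NULL;
	if (sketch_open_dir(dir, &cookie) != SKETCH_OK)
		return 0;
	unsigned total = 0;
	for (int pass = 0; pass < 2; pass++) {
		char name[256];
		unsigned num = 0;
		while (sketch_read_dir(dir, cookie, name, sizeof(name), &num)
				== SKETCH_OK && num > 0)
			total += num;
		sketch_rewind_dir(cookie);
	}
	sketch_close_dir(cookie);
	sketch_free_dir_cookie(cookie);
	return total;
}
```

	In this sketch the hypothetical \c sketch_count_two_passes() visits every
	entry once per pass, because rewinding only resets the state kept in the
	cookie; the directory handle itself holds no iteration state, so several
	cookies could iterate the same directory independently.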


	\section permissions Checking Access Permission

	While there is the \link fs_vnode_ops::access access() \endlink hook
	that explicitly checks access permission for a node, it is not used by the
	VFS to check access permissions for the other hooks. This has two reasons:
	It could be cheaper for the FS to do that in the respective hook (at least
	it's not more expensive), and the FS can make sure that there are no race
	conditions between the check and the start of the operation for the hook.
	The downside is that in most hooks the FS has to check those permissions.
	It is possible to simplify things a bit, though:

	- For operations that require the file system object in question (node,
	  directory, index, attribute, attribute directory, query) to be open, most
	  of the checks can already be done in the respective <tt>open*()</tt> hook.
	  E.g. in fs_vnode_ops::read() or fs_vnode_ops::write() one only has to
	  check whether the file has been opened for reading/writing, not whether
	  the current process has the respective permissions.

	- The core of the fs_vnode_ops::access() hook can be moved into a private
	  function that can be easily reused in other hooks to check the permissions
	  for the respective operations. In most cases this will reduce permission
	  checking to one or two additional "if"s in the hooks where it is required.


	\section node_monitoring Node Monitoring

	One of the nice features of Haiku's API is an easy way to monitor
	directories or nodes for changes. That is, one can register for watching a
	given node for certain modification events and will get a notification
	message whenever one of those events occurs. While other parts of the
	operating system do the actual notification message delivery, it is the
	responsibility of each file system to announce changes.
	It has to use the following functions to do that:

	- notify_entry_created(): A directory entry has been created.

	- notify_entry_removed(): A directory entry has been removed.

	- notify_entry_moved(): A directory entry has been renamed and/or moved
	  to another directory.

	- notify_stat_changed(): One or more members of the stat data for a node
	  have changed. E.g. the \c st_size member changes when the file is
	  truncated or data have been written to it beyond its former size. The
	  modification time (\c st_mtime) changes whenever a node is
	  write-accessed. To avoid a flood of messages for small and frequent write
	  operations on an open file, the file system can limit the number of
	  notifications and mark them with the B_WATCH_INTERIM_STAT flag. When
	  closing a modified file, a notification without that flag should be
	  issued.

	- notify_attribute_changed(): An attribute of a node has been added,
	  removed, or changed.

	If the file system supports queries, it needs to call the following
	functions to make live queries work:

	- notify_query_entry_created(): A change caused an entry that didn't match
	  the query predicate before to match now.

	- notify_query_entry_removed(): A change caused an entry that matched
	  the query predicate before to no longer match.


	\section caches Caches

	The Haiku kernel provides three kinds of caches that can be used by a
	file system implementation to speed up file system operations:

	- <em>Block cache</em>: Interesting for disk-based file systems. The device
	  the file system volume is located on is considered to be divided into
	  equally sized blocks of data that can be accessed via the block cache API
	  (e.g. block_cache_get() and block_cache_put()).
	  As long as the system has
	  enough memory, the block cache will keep all blocks that have been
	  accessed in memory, thus allowing further accesses to be very fast.
	  The block cache also has transaction support, which is of interest for
	  journaled file systems.

	- <em>File cache</em>: Stores file contents. The FS can decide to create
	  a file cache for any of its files. The fs_vnode_ops::read() and
	  fs_vnode_ops::write() hooks can then simply be implemented by calling the
	  file_cache_read() and file_cache_write() functions, respectively, which
	  will read the data from/write the data to the file cache. For reading
	  uncached data or writing back cached data to the file, the file cache
	  will invoke the fs_vnode_ops::io() hook.
	  Only files for which the file cache is used can be memory mapped (cf.
	  mmap()).

	- <em>Entry cache</em>: Can be used to speed up resolving paths. Normally
	  the VFS will call the fs_vnode_ops::lookup() hook for each element of the
	  path to be resolved, which, depending on the file system, can be more or
	  less expensive. When the FS uses the entry cache, those calls will be
	  avoided most of the time. All the file system has to do is invoke the
	  entry_cache_add() function when it encounters an entry that might not yet
	  be known to the entry cache and entry_cache_remove() when a directory
	  entry has been removed.
*/

// TODO:
// * FS layers