/*
 * Copyright 2007 Haiku Inc. All rights reserved.
 * Distributed under the terms of the MIT License.
 *
 * Authors:
 *		Ingo Weinhold
 */

/*!
	\page fs_modules File System Modules

	To support a particular file system (FS), a kernel module implementing a
	special interface (\c file_system_module_info defined in \c <fs_interface.h>)
	has to be provided. As for any other module, the \c std_ops() hook is invoked
	with \c B_MODULE_INIT directly after the FS module has been loaded by the
	kernel, and with \c B_MODULE_UNINIT before it is unloaded, thus providing
	a simple mechanism for one-time module initializations. The same module is
	used for accessing any volume of that FS type.

	\section objects File System Objects

	There are several types of objects a FS module has to deal with directly or
	indirectly:

	- A \em volume is an instance of a file system. For a disk-based file
	  system it corresponds to a disk, partition, or disk image file. When
	  mounting a volume, the virtual file system layer (VFS) assigns a unique
	  number (ID, of type \c dev_t) to it, and the file system provides a
	  handle (type \c void*). The VFS creates an instance of struct \c fs_volume
	  that stores these two, an operation vector (\c fs_volume_ops), and other
	  volume-related items.
	  Whenever the FS is asked to perform an operation, the \c fs_volume object
	  is supplied, and whenever the FS requests a volume-related service from
	  the kernel, it also has to pass the \c fs_volume object or, in some cases,
	  just the volume ID.
	  Normally the handle is a pointer to a data structure the FS allocates to
	  associate data with the volume.

	- A \em node is contained by a volume. It can be of type file, directory, or
	  symbolic link (symlink). Just like volumes, nodes are associated with an ID
	  (type \c ino_t) and, if in use, also with a handle (type \c void*).
	  As for volumes, the VFS creates an instance of a structure (\c fs_vnode)
	  for each node in use, storing the FS's handle for the node and an
	  operation vector (\c fs_vnode_ops).
	  Unlike the volume ID, the node ID is defined by the FS.
	  It often has a meaning to the FS, e.g. file systems using inodes might
	  choose the inode number corresponding to the node. As long as the volume
	  is mounted and the node is known to the VFS, its node ID must not change.
	  The node handle is again a pointer to a data structure allocated by the
	  FS.

	- A \em vnode (VFS node) is the VFS representation of a node. A volume may
	  contain a great number of nodes, but at any time only a few are
	  represented by vnodes, usually only those that are currently in use
	  (sometimes a few more).

	- An \em entry (directory entry) belongs to a directory, has a name, and
	  refers to a node. It is important to understand the difference between
	  entries and nodes: A node doesn't have a name, only the entries that refer
	  to it have. If a FS allows more than one entry to refer to a single
	  node, it is said to support "hard links". It is possible that no
	  entry refers to a node. This happens when a node (e.g. a file) is still
	  open, but the last entry referring to it has been removed (the node will
	  be deleted when it is closed). While entries are to be understood as
	  independent entities, the FS interface does not use IDs or handles to
	  refer to them; it always uses directory and entry name pairs to do that.

	- An \em attribute is a named and typed data container belonging to a node.
	  A node may have any number of attributes; they are organized in a
	  (depending on the FS, virtual or actually existing) attribute directory,
	  through which one can iterate.

	- An \em index is supposed to provide fast searching capabilities for
	  attributes with a certain name. A volume's index directory allows for
	  iterating through the indices.

	- A \em query is a fully virtual object for searching for entries via an
	  expression matching entry name, node size, node modification date, and/or
	  node attributes. The mechanism of retrieving the entries found by a query
	  is similar to that for reading a directory's contents. A query can be
	  live, in which case the creator of the query is notified by the FS
	  whenever an entry no longer matches the query expression or starts
	  matching.


	\section concepts Generic Concepts

	A FS module has to (or can) provide quite a lot of hook functions. There are
	a few concepts that apply to several groups of them:

	- <em>Opening, Closing, and Cookies</em>: Many FS objects can be opened and
	  closed, namely nodes in general, directories, attribute directories,
	  attributes, the index directory, and queries. In each case there are three
	  hook functions: <tt>open*()</tt>, <tt>close*()</tt>, and
	  <tt>free*_cookie()</tt>. The <tt>open*()</tt> hook is passed all that is
	  needed to identify the object to be opened and, in some cases, additional
	  parameters, e.g. specifying a particular opening mode. The implementation
	  is required to return a cookie (type \c void*), usually a pointer to a
	  data structure the FS allocates. In some cases (e.g. when an iteration
	  state is associated with the cookie) a new cookie must be allocated for
	  each instance of opening the object. The cookie is passed to all hooks
	  that operate on an object opened this way. The <tt>close*()</tt>
	  hook is invoked to signal that the cookie is to be closed. At this point
	  the cookie might still be in use. Blocking FS hooks (e.g. blocking
	  read/write operations) using the same cookie have to be unblocked. When
	  the cookie stops being in use, the <tt>free*_cookie()</tt> hook is called;
	  it has to free the cookie.

	- <em>Entry Iteration</em>: For the FS objects serving as containers for
	  other objects, i.e.
	  directories, attribute directories, the index
	  directory, and queries, the cookie mechanism is used for a stateful
	  iteration through the contained objects. The <tt>read_*()</tt> hook reads
	  the next one or more entries into a <tt>struct dirent</tt> buffer. The
	  <tt>rewind_*()</tt> hook resets the iteration state to the first entry.

	- <em>Stat Information</em>: In case of nodes, attributes, and indices,
	  detailed information about an object is requested via a
	  <tt>read*_stat()</tt> hook and must be written into a <tt>struct stat</tt>
	  buffer.


	\section vnodes VNodes

	A vnode is the VFS representation of a node. As soon as an access to a node
	is requested, the VFS creates a corresponding vnode. The requesting entity
	gets a reference to the vnode for the time it works with the vnode and
	releases the reference when done. When the last reference to a vnode has
	been surrendered, the vnode is unused and the VFS can decide to destroy it
	(usually it is cached for a while longer).

	When the VFS creates a vnode, it invokes the volume's
	\link fs_volume_ops::get_vnode get_vnode() \endlink
	hook to let it create the respective node handle (unless the FS requests the
	creation of the vnode explicitly by calling publish_vnode()). That's the
	only hook that specifies a node by ID; all other node-related hooks are
	defined in the respective node's operation vector and they are passed the
	respective \c fs_vnode object. When the VFS deletes the vnode, it invokes
	the node's \link fs_vnode_ops::put_vnode put_vnode() \endlink
	hook or, if the node was marked removed,
	\link fs_vnode_ops::remove_vnode remove_vnode() \endlink.

	There are only four FS hooks through which the VFS gains knowledge of the
	existence of a node. The first one is the
	\link file_system_module_info::mount mount() \endlink
	hook.
	It is supposed to call \c publish_vnode() for the root node of the
	volume and return its ID. The second one is the
	\link fs_vnode_ops::lookup lookup() \endlink
	hook. Given the \c fs_vnode object of a directory and an entry name, it is
	supposed to call \c get_vnode() for the node the entry refers to and return
	the node ID.
	The remaining two hooks,
	\link fs_vnode_ops::read_dir read_dir() \endlink and
	\link fs_volume_ops::read_query read_query() \endlink,
	both return entries in a <tt>struct dirent</tt> structure, which also
	contains the ID of the node the entry refers to.


	\section mandatory_hooks Mandatory Hooks

	Which hooks a FS module should provide mainly depends on what functionality
	it features. E.g. a FS without support for attributes, indices, and/or
	queries can omit the respective hooks (i.e. set them to \c NULL in the
	module, \c fs_volume_ops, and \c fs_vnode_ops structures). Some hooks are
	mandatory, though. A minimal read-only FS module must implement:

	- \link file_system_module_info::mount mount() \endlink and
	  \link fs_volume_ops::unmount unmount() \endlink:
	  Without these hooks a volume can be neither mounted nor unmounted.

	- \link fs_vnode_ops::lookup lookup() \endlink:
	  The VFS uses this hook to resolve path names. It is probably one of the
	  most frequently invoked hooks.

	- \link fs_volume_ops::get_vnode get_vnode() \endlink and
	  \link fs_vnode_ops::put_vnode put_vnode() \endlink:
	  Create and destroy, respectively, the FS's private node handle when
	  the VFS creates/deletes the vnode for a particular node.

	- \link fs_vnode_ops::read_stat read_stat() \endlink:
	  Return <tt>struct stat</tt> information for the given node, consisting of
	  the type and size of the node, its owner and access permissions, as well
	  as certain access times.

	- \link fs_vnode_ops::open open() \endlink,
	  \link fs_vnode_ops::close close() \endlink, and
	  \link fs_vnode_ops::free_cookie free_cookie() \endlink:
	  Open and close a node as explained in \ref concepts.

	- \link fs_vnode_ops::read read() \endlink:
	  Read data from an opened node (file). Even if the FS does not feature
	  files, the hook has to be present anyway; it should return an error in
	  this case.

	- \link fs_vnode_ops::open_dir open_dir() \endlink,
	  \link fs_vnode_ops::close_dir close_dir() \endlink, and
	  \link fs_vnode_ops::free_dir_cookie free_dir_cookie() \endlink:
	  Open and close a directory for entry iteration as explained in
	  \ref concepts.

	- \link fs_vnode_ops::read_dir read_dir() \endlink and
	  \link fs_vnode_ops::rewind_dir rewind_dir() \endlink:
	  Read the next entry/entries from a directory and reset the
	  iterator to the first entry, respectively, as explained in \ref concepts.

	Although not strictly mandatory, a FS should additionally implement the
	following hooks:

	- \link fs_volume_ops::read_fs_info read_fs_info() \endlink:
	  Return general information about the volume, e.g. total and free size, and
	  what special features (attributes, MIME types, queries) the volume/FS
	  supports.

	- \link fs_vnode_ops::read_symlink read_symlink() \endlink:
	  Read the value of a symbolic link. Needed only if the FS and volume
	  support symbolic links at all. If absent, symbolic links stored on the
	  volume won't be interpreted.

	- \link fs_vnode_ops::access access() \endlink:
	  Return whether the current user has the given access permissions for a
	  node. If the hook is absent, the user is considered to have all
	  permissions.
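
	The cookie-based directory iteration that the mandatory
	<tt>open_dir()</tt>, <tt>read_dir()</tt>, <tt>rewind_dir()</tt>,
	<tt>close_dir()</tt>, and <tt>free_dir_cookie()</tt> hooks implement can be
	pictured with the following sketch. It is only an illustration under
	simplifying assumptions: the \c toy_* types and functions are hypothetical,
	the return values 0 and -1 stand in for real status codes, and the
	signatures are deliberately reduced. The real hooks declared in
	\c <fs_interface.h> are passed \c fs_volume and \c fs_vnode objects and
	fill <tt>struct dirent</tt> buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a directory node: just a list of entry names.
typedef struct toy_dir {
	const char* const* names;
	size_t count;
} toy_dir;

// The cookie carries the iteration state of one opening of the directory.
typedef struct toy_dir_cookie {
	size_t next;	// index of the next entry to return
} toy_dir_cookie;

// Models open_dir(): allocate a fresh cookie for each opening.
int
toy_open_dir(const toy_dir* dir, void** _cookie)
{
	(void)dir;
	toy_dir_cookie* cookie = (toy_dir_cookie*)malloc(sizeof(toy_dir_cookie));
	if (cookie == NULL)
		return -1;
	cookie->next = 0;
	*_cookie = cookie;
	return 0;
}

// Models read_dir(): copy the next entry name into the buffer and advance.
// *_count is set to 1 if an entry was returned and to 0 at the end.
int
toy_read_dir(const toy_dir* dir, void* _cookie, char* buffer, size_t size,
	size_t* _count)
{
	toy_dir_cookie* cookie = (toy_dir_cookie*)_cookie;
	assert(cookie != NULL);
	if (cookie->next >= dir->count) {
		*_count = 0;
		return 0;
	}
	const char* name = dir->names[cookie->next];
	if (strlen(name) + 1 > size)
		return -1;	// buffer too small, iteration state is left untouched
	strcpy(buffer, name);
	cookie->next++;
	*_count = 1;
	return 0;
}

// Models rewind_dir(): reset the iteration state to the first entry.
int
toy_rewind_dir(void* _cookie)
{
	((toy_dir_cookie*)_cookie)->next = 0;
	return 0;
}

// Models close_dir(): this toy FS has no blocking hooks to unblock.
int
toy_close_dir(void* _cookie)
{
	(void)_cookie;
	return 0;
}

// Models free_dir_cookie(): called once the cookie is no longer in use.
int
toy_free_dir_cookie(void* _cookie)
{
	free(_cookie);
	return 0;
}
```

	A caller modeled on the VFS would invoke \c toy_open_dir() once, call
	\c toy_read_dir() until it reports zero entries, and finish with
	\c toy_close_dir() followed by \c toy_free_dir_cookie(). Keeping the
	position in the cookie rather than in the node is what lets several
	openings of the same directory iterate independently.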


	\section permissions Checking Access Permission

	While there is the \link fs_vnode_ops::access access() \endlink hook
	that explicitly checks access permission for a node, it is not used by the
	VFS to check access permissions for the other hooks. This has two reasons:
	It could be cheaper for the FS to do that in the respective hook (at least
	it's not more expensive), and the FS can make sure that there are no race
	conditions between the check and the start of the operation for the hook.
	The downside is that in most hooks the FS has to check those permissions.
	It is possible to simplify things a bit, though:

	- For operations that require the file system object in question (node,
	  directory, index, attribute, attribute directory, query) to be open, most
	  of the checks can already be done in the respective <tt>open*()</tt> hook.
	  E.g. in fs_vnode_ops::read() or fs_vnode_ops::write() one only has to
	  check whether the file has been opened for reading/writing, not whether
	  the current process has the respective permissions.

	- The core of the fs_vnode_ops::access() hook can be moved into a private
	  function that can be easily reused in other hooks to check the permissions
	  for the respective operations. In most cases this will reduce permission
	  checking to one or two additional "if"s in the hooks where it is required.


	\section node_monitoring Node Monitoring

	One of the nice features of Haiku's API is an easy way to monitor
	directories or nodes for changes. That is, one can register for watching a
	given node for certain modification events and will get a notification
	message whenever one of those events occurs. While other parts of the
	operating system do the actual notification message delivery, it is the
	responsibility of each file system to announce changes.
	It has to use the
	following functions to do that:

	- notify_entry_created(): A directory entry has been created.

	- notify_entry_removed(): A directory entry has been removed.

	- notify_entry_moved(): A directory entry has been renamed and/or moved
	  to another directory.

	- notify_stat_changed(): One or more members of the stat data for a node
	  have changed. E.g. the \c st_size member changes when the file is
	  truncated or data have been written to it beyond its former size. The
	  modification time (\c st_mtime) changes whenever a node is write-accessed.
	  To avoid a flood of messages for small and frequent write operations on an
	  open file, the file system can limit the number of notifications and mark
	  them with the B_WATCH_INTERIM_STAT flag. When closing a modified file, a
	  notification without that flag should be issued.

	- notify_attribute_changed(): An attribute of a node has been added,
	  removed, or changed.

	If the file system supports queries, it needs to call the following
	functions to make live queries work:

	- notify_query_entry_created(): A change caused an entry that didn't match
	  the query predicate before to match now.

	- notify_query_entry_removed(): A change caused an entry that matched
	  the query predicate before to no longer match.


	\section caches Caches

	The Haiku kernel provides three kinds of caches that can be used by a
	file system implementation to speed up file system operations:

	- <em>Block cache</em>: Interesting for disk-based file systems. The device
	  the file system volume is located on is considered to be divided into
	  equally-sized blocks of data that can be accessed via the block cache API
	  (e.g. block_cache_get() and block_cache_put()).
	  As long as the system has
	  enough memory, the block cache will keep all blocks that have been
	  accessed in memory, thus allowing further accesses to be very fast.
	  The block cache also has transaction support, which is of interest for
	  journaled file systems.

	- <em>File cache</em>: Stores file contents. The FS can decide to create
	  a file cache for any of its files. The fs_vnode_ops::read() and
	  fs_vnode_ops::write() hooks can then simply be implemented by calling the
	  file_cache_read() and file_cache_write() functions, respectively, which
	  will read the data from/write the data to the file cache. For reading
	  uncached data or writing back cached data to the file, the file cache will
	  invoke the fs_vnode_ops::io() hook.
	  Only files for which the file cache is used can be memory mapped (cf.
	  mmap()).

	- <em>Entry cache</em>: Can be used to speed up resolving paths. Normally
	  the VFS will call the fs_vnode_ops::lookup() hook for each element of the
	  path to be resolved, which, depending on the file system, can be more or
	  less expensive. When the FS uses the entry cache, those calls will be
	  avoided most of the time. All the file system has to do is invoke the
	  entry_cache_add() function when it encounters an entry that might not yet
	  be known to the entry cache and entry_cache_remove() when a directory
	  entry has been removed.
*/

// TODO:
// * FS layers