/*
 * Copyright 2007 Haiku Inc. All rights reserved.
 * Distributed under the terms of the MIT License.
 *
 * Authors:
 *   Ingo Weinhold
 */

/*!
	\page fs_modules File System Modules

	To support a particular file system (FS), a kernel module implementing a
	special interface (\c file_system_module_info defined in \c <fs_interface.h>)
	has to be provided. As for any other module, the \c std_ops() hook is invoked
	with \c B_MODULE_INIT directly after the FS module has been loaded by the
	kernel, and with \c B_MODULE_UNINIT before it is unloaded, thus providing
	a simple mechanism for one-time module initialization. The same module is
	used for accessing any volume of that FS type.


	\section objects File System Objects

	There are several types of objects a FS module has to deal with directly or
	indirectly:

	- A \em volume is an instance of a file system. For a disk-based file
	  system it corresponds to a disk, partition, or disk image file. When
	  mounting a volume, the virtual file system layer (VFS) assigns it a
	  unique number (ID, of type \c dev_t), and the file system provides a
	  handle (type \c void*) for it. The VFS creates an instance of struct
	  \c fs_volume that stores these two, an operation vector
	  (\c fs_volume_ops), and other volume-related items.
	  Whenever the FS is asked to perform an operation, the \c fs_volume object
	  is supplied, and whenever the FS requests a volume-related service from
	  the kernel, it also has to pass the \c fs_volume object or, in some cases,
	  just the volume ID.
	  Normally the handle is a pointer to a data structure the FS allocates to
	  associate data with the volume.

	- A \em node is contained by a volume. It can be of type file, directory, or
	  symbolic link (symlink). Just like volumes, nodes are associated with an
	  ID (type \c ino_t) and, if in use, also with a handle (type \c void*).
	  As for volumes, the VFS creates an instance of a structure (\c fs_vnode)
	  for each node in use, storing the FS's handle for the node and an
	  operation vector (\c fs_vnode_ops).
	  Unlike the volume ID, the node ID is defined by the FS.
	  It often has a meaning to the FS, e.g. file systems using inodes might
	  choose the inode number corresponding to the node. As long as the volume
	  is mounted and the node is known to the VFS, its node ID must not change.
	  The node handle is again a pointer to a data structure allocated by the
	  FS.

	- A \em vnode (VFS node) is the VFS representation of a node. A volume may
	  contain a great number of nodes, but at any given time only a few are
	  represented by vnodes, usually only those that are currently in use
	  (sometimes a few more).

	- An \em entry (directory entry) belongs to a directory, has a name, and
	  refers to a node. It is important to understand the difference between
	  entries and nodes: A node doesn't have a name; only the entries that refer
	  to it do. If a FS allows more than one entry to refer to a single node,
	  it is also said to support "hard links". It is possible that no entry
	  refers to a node. This happens when a node (e.g. a file) is still open,
	  but the last entry referring to it has been removed (the node will be
	  deleted when it is closed). While entries are to be understood as
	  independent entities, the FS interface does not use IDs or handles to
	  refer to them; it always uses directory and entry name pairs to do that.

	- An \em attribute is a named and typed data container belonging to a node.
	  A node may have any number of attributes; they are organized in a
	  (depending on the FS, virtual or actually existing) attribute directory,
	  through which one can iterate.

	- An \em index is supposed to provide fast searching capabilities for
	  attributes with a certain name. A volume's index directory allows for
	  iterating through the indices.

	- A \em query is a fully virtual object for searching for entries via an
	  expression matching entry name, node size, node modification date, and/or
	  node attributes. The mechanism of retrieving the entries found by a query
	  is similar to that for reading a directory's contents. A query can be
	  live, in which case the creator of the query is notified by the FS
	  whenever an entry no longer matches the query expression or starts
	  matching.


	\section concepts Generic Concepts

	A FS module has to (or can) provide quite a lot of hook functions. There are
	a few concepts that apply to several groups of them:

	- <em>Opening, Closing, and Cookies</em>: Many FS objects can be opened and
	  closed, namely nodes in general, directories, attribute directories,
	  attributes, the index directory, and queries. In each case there are three
	  hook functions: <tt>open*()</tt>, <tt>close*()</tt>, and
	  <tt>free*_cookie()</tt>. The <tt>open*()</tt> hook is passed all that is
	  needed to identify the object to be opened and, in some cases, additional
	  parameters, e.g. specifying a particular opening mode. The implementation
	  is required to return a cookie (type \c void*), usually a pointer to a
	  data structure the FS allocates. In some cases (e.g. when an iteration
	  state is associated with the cookie) a new cookie must be allocated for
	  each instance of opening the object. The cookie is passed to all hooks
	  that operate on an object opened this way. The <tt>close*()</tt> hook is
	  invoked to signal that the cookie is to be closed. At this point the
	  cookie might still be in use. Blocking FS hooks (e.g. blocking read/write
	  operations) using the same cookie have to be unblocked. When the cookie
	  stops being in use, the <tt>free*_cookie()</tt> hook is called; it has to
	  free the cookie.

	- <em>Entry Iteration</em>: For the FS objects serving as containers for
	  other objects, i.e. directories, attribute directories, the index
	  directory, and queries, the cookie mechanism is used for a stateful
	  iteration through the contained objects. The <tt>read_*()</tt> hook reads
	  the next one or more entries into a <tt>struct dirent</tt> buffer. The
	  <tt>rewind_*()</tt> hook resets the iteration state to the first entry.

	- <em>Stat Information</em>: In case of nodes, attributes, and indices,
	  detailed information about an object is requested via a
	  <tt>read*_stat()</tt> hook and must be written into a <tt>struct stat</tt>
	  buffer.
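
	The cookie life cycle and the stateful iteration described above can be
	sketched as follows. This is only a minimal illustration under stated
	assumptions, not the real interface: \c SketchDirectory,
	\c SketchDirCookie, and the <tt>sketch_*()</tt> functions are hypothetical
	stand-ins, and the real hooks declared in \c <fs_interface.h> take
	different parameters (among them the volume and vnode objects and a
	<tt>struct dirent</tt> buffer).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the FS's private data of a directory node.
struct SketchDirectory {
	std::vector<std::string> entryNames;
};

// Hypothetical cookie: the iteration state of one open instance.
struct SketchDirCookie {
	std::size_t nextIndex;
};

// open*(): a new cookie per open, since it carries iteration state.
int sketch_open_dir(const SketchDirectory&, void** _cookie)
{
	SketchDirCookie* cookie = new SketchDirCookie;
	cookie->nextIndex = 0;
	*_cookie = cookie;
	return 0;
}

// read_*(): copy out the next entry name and advance the iteration state.
// Returns the number of entries read, 0 at the end of the directory.
int sketch_read_dir(const SketchDirectory& directory, void* cookie,
	std::string& _name)
{
	SketchDirCookie* state = static_cast<SketchDirCookie*>(cookie);
	if (state->nextIndex >= directory.entryNames.size())
		return 0;
	_name = directory.entryNames[state->nextIndex++];
	return 1;
}

// rewind_*(): reset the iteration state to the first entry.
int sketch_rewind_dir(void* cookie)
{
	static_cast<SketchDirCookie*>(cookie)->nextIndex = 0;
	return 0;
}

// close*(): the cookie may still be in use, so it is not freed here. A real
// FS would unblock hooks blocking on this cookie; nothing blocks in this
// sketch.
int sketch_close_dir(void*)
{
	return 0;
}

// free*_cookie(): the cookie is no longer in use and has to be freed.
int sketch_free_dir_cookie(void* cookie)
{
	delete static_cast<SketchDirCookie*>(cookie);
	return 0;
}
```

	A caller would open the directory once, call \c sketch_read_dir() until it
	returns 0, and then invoke the close and free functions in that order.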


	\section vnodes VNodes

	A vnode is the VFS representation of a node. As soon as an access to a node
	is requested, the VFS creates a corresponding vnode. The requesting entity
	gets a reference to the vnode for the time it works with the vnode and
	releases the reference when done. When the last reference to a vnode has
	been surrendered, the vnode is unused and the VFS can decide to destroy it
	(usually it is cached for a while longer).

	When the VFS creates a vnode, it invokes the volume's
	\link fs_volume_ops::get_vnode get_vnode() \endlink
	hook to let the FS create the respective node handle (unless the FS
	requests the creation of the vnode explicitly by calling publish_vnode()).
	That's the only hook that specifies a node by ID; all other node-related
	hooks are defined in the respective node's operation vector and they are
	passed the respective \c fs_vnode object. When the VFS deletes the vnode,
	it invokes the node's \link fs_vnode_ops::put_vnode put_vnode() \endlink
	hook or, if the node was marked removed,
	\link fs_vnode_ops::remove_vnode remove_vnode() \endlink.

	There are only four FS hooks through which the VFS gains knowledge of the
	existence of a node. The first one is the
	\link file_system_module_info::mount mount() \endlink
	hook. It is supposed to call \c publish_vnode() for the root node of the
	volume and return its ID. The second one is the
	\link fs_vnode_ops::lookup lookup() \endlink
	hook. Given a \c fs_vnode object of a directory and an entry name, it is
	supposed to call \c get_vnode() for the node the entry refers to and return
	the node ID.
	The remaining two hooks,
	\link fs_vnode_ops::read_dir read_dir() \endlink and
	\link fs_volume_ops::read_query read_query() \endlink,
	both return entries in a <tt>struct dirent</tt> structure, which also
	contains the ID of the node the entry refers to.
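
	The reference counting of vnodes and the resulting hook invocations can be
	modelled roughly as shown below. The sketch makes several assumptions:
	\c SketchFS, \c SketchVFS, and all of their members are hypothetical, the
	real hooks have different signatures, and, unlike the real VFS, the sketch
	destroys a vnode immediately when its last reference is released instead
	of caching it for a while.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

typedef std::int64_t node_id;	// stand-in for ino_t

// Hypothetical private node data, i.e. what the node handle points to.
struct SketchNodeData {
	node_id id;
};

// Hypothetical FS side; the counters only record the hook invocations.
struct SketchFS {
	int getCalls = 0;
	int putCalls = 0;
	int removeCalls = 0;

	// get_vnode()-like hook: create the node handle for the given ID.
	void* GetVNode(node_id id)
	{
		getCalls++;
		return new SketchNodeData{id};
	}

	// put_vnode()-like hook: free the handle, the node itself stays.
	void PutVNode(void* handle)
	{
		putCalls++;
		delete static_cast<SketchNodeData*>(handle);
	}

	// remove_vnode()-like hook: a real FS would also delete the node's data.
	void RemoveVNode(void* handle)
	{
		removeCalls++;
		delete static_cast<SketchNodeData*>(handle);
	}
};

// Hypothetical VFS-side bookkeeping for one vnode.
struct SketchVNode {
	void* handle;
	int refCount;
	bool removed;
};

class SketchVFS {
public:
	explicit SketchVFS(SketchFS& fs) : fFS(fs) {}

	// Acquire a reference; the first access creates the vnode via the FS.
	void Acquire(node_id id)
	{
		auto it = fVNodes.find(id);
		if (it == fVNodes.end()) {
			SketchVNode vnode = {fFS.GetVNode(id), 0, false};
			it = fVNodes.insert({id, vnode}).first;
		}
		it->second.refCount++;
	}

	// Mark a vnode that is currently in use as removed.
	void MarkRemoved(node_id id)
	{
		fVNodes.at(id).removed = true;
	}

	// Release a reference; the last release destroys the vnode.
	void Release(node_id id)
	{
		auto it = fVNodes.find(id);
		assert(it != fVNodes.end() && it->second.refCount > 0);
		if (--it->second.refCount > 0)
			return;
		if (it->second.removed)
			fFS.RemoveVNode(it->second.handle);
		else
			fFS.PutVNode(it->second.handle);
		fVNodes.erase(it);
	}

private:
	SketchFS& fFS;
	std::map<node_id, SketchVNode> fVNodes;
};
```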


	\section mandatory_hooks Mandatory Hooks

	Which hooks a FS module should provide mainly depends on what functionality
	it features. E.g. a FS without support for attributes, indices, and/or
	queries can omit the respective hooks (i.e. set them to \c NULL in the
	module, \c fs_volume_ops, and \c fs_vnode_ops structures). Some hooks are
	mandatory, though. A minimal read-only FS module must implement:

	- \link file_system_module_info::mount mount() \endlink and
	  \link fs_volume_ops::unmount unmount() \endlink:
	  Without these hooks a volume can be neither mounted nor unmounted.

	- \link fs_vnode_ops::lookup lookup() \endlink:
	  The VFS uses this hook to resolve path names. It is probably one of the
	  most frequently invoked hooks.

	- \link fs_volume_ops::get_vnode get_vnode() \endlink and
	  \link fs_vnode_ops::put_vnode put_vnode() \endlink:
	  Create and destroy, respectively, the FS's private node handle when
	  the VFS creates/deletes the vnode for a particular node.

	- \link fs_vnode_ops::read_stat read_stat() \endlink:
	  Return a <tt>struct stat</tt> info for the given node, consisting of the
	  type and size of the node, its owner and access permissions, as well as
	  certain access times.

	- \link fs_vnode_ops::open open() \endlink,
	  \link fs_vnode_ops::close close() \endlink, and
	  \link fs_vnode_ops::free_cookie free_cookie() \endlink:
	  Open and close a node as explained in \ref concepts.

	- \link fs_vnode_ops::read read() \endlink:
	  Read data from an opened node (file). Even if the FS does not feature
	  files, the hook has to be present anyway; it should return an error in
	  this case.

	- \link fs_vnode_ops::open_dir open_dir() \endlink,
	  \link fs_vnode_ops::close_dir close_dir() \endlink, and
	  \link fs_vnode_ops::free_dir_cookie free_dir_cookie() \endlink:
	  Open and close a directory for entry iteration as explained in
	  \ref concepts.

	- \link fs_vnode_ops::read_dir read_dir() \endlink and
	  \link fs_vnode_ops::rewind_dir rewind_dir() \endlink:
	  Read the next entry/entries from a directory and reset the iterator to
	  the first entry, respectively, as explained in \ref concepts.

	Although not strictly mandatory, a FS should additionally implement the
	following hooks:

	- \link fs_volume_ops::read_fs_info read_fs_info() \endlink:
	  Return general information about the volume, e.g. total and free size, and
	  what special features (attributes, MIME types, queries) the volume/FS
	  supports.

	- \link fs_vnode_ops::read_symlink read_symlink() \endlink:
	  Read the value of a symbolic link. Needed only if the FS and volume
	  support symbolic links at all. If absent, symbolic links stored on the
	  volume won't be interpreted.

	- \link fs_vnode_ops::access access() \endlink:
	  Return whether the current user has the given access permissions for a
	  node. If the hook is absent, the user is considered to have all
	  permissions.
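
	How an omitted hook is treated by its caller can be illustrated with a
	heavily reduced operation vector. The sketch below only assumes the
	behavior described for the access() hook; \c SketchVNodeOps,
	\c sketch_has_access(), and \c read_only_access() are hypothetical, and
	the real \c fs_vnode_ops structure has many more hooks with different
	signatures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily reduced operation vector. Unsupported hooks are
// simply set to NULL.
struct SketchVNodeOps {
	int (*read_symlink)(char* buffer, std::size_t bufferSize);
	int (*access)(int mode);
};

// Caller side: an absent access() hook grants all permissions.
bool sketch_has_access(const SketchVNodeOps& ops, int mode)
{
	if (ops.access == NULL)
		return true;
	return ops.access(mode) == 0;
}

// A possible access() implementation for a read-only FS: any request that
// includes write permission (bit value 2, as in POSIX W_OK) is denied.
int read_only_access(int mode)
{
	return (mode & 2) != 0 ? -1 : 0;
}
```

	With an operation vector whose \c access member is \c NULL every request
	succeeds, whereas one that points to \c read_only_access() rejects write
	requests.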


	\section permissions Checking Access Permission

	While there is the \link fs_vnode_ops::access access() \endlink hook
	that explicitly checks access permission for a node, it is not used by the
	VFS to check access permissions for the other hooks. There are two reasons
	for this: It could be cheaper for the FS to do that in the respective hook
	(at least it's not more expensive), and the FS can make sure that there are
	no race conditions between the check and the start of the operation for the
	hook. The downside is that in most hooks the FS has to check those
	permissions. It is possible to simplify things a bit, though:

	- For operations that require the file system object in question (node,
	  directory, index, attribute, attribute directory, query) to be open, most
	  of the checks can already be done in the respective <tt>open*()</tt> hook.
	  E.g. in fs_vnode_ops::read() or fs_vnode_ops::write() one only has to
	  check whether the file has been opened for reading/writing, not whether
	  the current process has the respective permissions.

	- The core of the fs_vnode_ops::access() hook can be moved into a private
	  function that can be easily reused in other hooks to check the permissions
	  for the respective operations. In most cases this will reduce permission
	  checking to one or two additional "if"s in the hooks where it is required.
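
	Both simplifications can be combined as in the following sketch. It is an
	illustration under assumptions rather than code from an actual file
	system: all types, functions, and constants are hypothetical, the
	permission model is reduced to the classic owner/group/other bits without
	any special treatment of the superuser, and the real hooks take different
	parameters.

```cpp
#include <cassert>

enum { kRead = 4, kWrite = 2 };		// permission bits, as in POSIX modes
enum { kOK = 0, kDenied = -1 };		// hypothetical result codes

struct SketchNode {
	unsigned mode;	// permission bits, e.g. 0640
	unsigned uid;
	unsigned gid;
};

struct SketchCookie {
	unsigned openMode;	// kRead and/or kWrite, verified at open time
};

// Private helper containing the core of the access() hook.
bool has_access(const SketchNode& node, unsigned uid, unsigned gid,
	unsigned wanted)
{
	unsigned granted;
	if (uid == node.uid)
		granted = (node.mode >> 6) & 7;
	else if (gid == node.gid)
		granted = (node.mode >> 3) & 7;
	else
		granted = node.mode & 7;
	return (granted & wanted) == wanted;
}

// access()-like hook: a thin wrapper around the helper.
int sketch_access(const SketchNode& node, unsigned uid, unsigned gid,
	unsigned wanted)
{
	return has_access(node, uid, gid, wanted) ? kOK : kDenied;
}

// open()-like hook: the permission check happens once, right here.
int sketch_open(const SketchNode& node, unsigned uid, unsigned gid,
	unsigned openMode, SketchCookie& cookie)
{
	if (!has_access(node, uid, gid, openMode))
		return kDenied;
	cookie.openMode = openMode;
	return kOK;
}

// read()-like hook: only the open mode stored in the cookie is checked.
int sketch_read(const SketchCookie& cookie)
{
	return (cookie.openMode & kRead) != 0 ? kOK : kDenied;
}

// write()-like hook: likewise.
int sketch_write(const SketchCookie& cookie)
{
	return (cookie.openMode & kWrite) != 0 ? kOK : kDenied;
}
```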


	\section node_monitoring Node Monitoring

	One of the nice features of Haiku's API is an easy way to monitor
	directories or nodes for changes. That is, one can register for watching a
	given node for certain modification events and will get a notification
	message whenever one of those events occurs. While other parts of the
	operating system do the actual notification message delivery, it is the
	responsibility of each file system to announce changes. It has to use the
	following functions to do that:

	- notify_entry_created(): A directory entry has been created.

	- notify_entry_removed(): A directory entry has been removed.

	- notify_entry_moved(): A directory entry has been renamed and/or moved
	  to another directory.

	- notify_stat_changed(): One or more members of the stat data for a node
	  have changed. E.g. the \c st_size member changes when the file is
	  truncated or data have been written to it beyond its former size. The
	  modification time (\c st_mtime) changes whenever a node is
	  write-accessed. To avoid a flood of messages for small and frequent write
	  operations on an open file, the file system can limit the number of
	  notifications and mark them with the B_WATCH_INTERIM_STAT flag. When
	  closing a modified file, a notification without that flag should be
	  issued.

	- notify_attribute_changed(): An attribute of a node has been added,
	  removed, or changed.

	If the file system supports queries, it needs to call the following
	functions to make live queries work:

	- notify_query_entry_created(): A change caused an entry that didn't match
	  the query predicate before to match now.

	- notify_query_entry_removed(): A change caused an entry that matched
	  the query predicate before to no longer match.
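
	One possible way to limit the number of stat notifications for an open
	file, as mentioned for notify_stat_changed() above, is a simple time-based
	throttle. The sketch below is an assumption about how this could be done,
	not a description of an existing file system: \c SketchStatNotifier is
	hypothetical, it merely records what would be sent instead of calling
	notify_stat_changed(), and time stamps are passed in explicitly (in
	arbitrary units) to keep the example deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of an issued notification. The member stands for the
// interim marking of the notification.
struct SketchNotification {
	bool interim;
};

class SketchStatNotifier {
public:
	explicit SketchStatNotifier(std::int64_t minInterval)
		:
		fMinInterval(minInterval),
		fLastNotification(-1),
		fDirty(false)
	{
	}

	// To be called after each write to the open file.
	void Written(std::int64_t now)
	{
		fDirty = true;
		if (fLastNotification >= 0 && now - fLastNotification < fMinInterval)
			return;	// too soon after the previous notification
		fSent.push_back(SketchNotification{true});
		fLastNotification = now;
	}

	// To be called when the file is closed: a modified file always gets a
	// final notification that is not marked interim.
	void Closed()
	{
		if (!fDirty)
			return;
		fSent.push_back(SketchNotification{false});
		fDirty = false;
	}

	const std::vector<SketchNotification>& Sent() const
	{
		return fSent;
	}

private:
	std::int64_t fMinInterval;
	std::int64_t fLastNotification;
	bool fDirty;
	std::vector<SketchNotification> fSent;
};
```

	With a minimum interval of 1000, writes at the times 0, 10, 999, and 1000
	produce two interim notifications, and closing the file afterwards adds
	the final one.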


	\section caches Caches

	The Haiku kernel provides three kinds of caches that can be used by a
	file system implementation to speed up file system operations:

	- <em>Block cache</em>: Interesting for disk-based file systems. The device
	  the file system volume is located on is considered to be divided into
	  equally sized blocks of data that can be accessed via the block cache API
	  (e.g. block_cache_get() and block_cache_put()). As long as the system has
	  enough memory, the block cache will keep all blocks that have been
	  accessed in memory, thus allowing further accesses to be very fast.
	  The block cache also has transaction support, which is of interest for
	  journaled file systems.

	- <em>File cache</em>: Stores file contents. The FS can decide to create
	  a file cache for any of its files. The fs_vnode_ops::read() and
	  fs_vnode_ops::write() hooks can then simply be implemented by calling the
	  file_cache_read() and file_cache_write() functions, respectively, which
	  will read the data from/write the data to the file cache. For reading
	  uncached data or writing back cached data to the file, the file cache
	  will invoke the fs_vnode_ops::io() hook.
	  Only files for which the file cache is used can be memory mapped (cf.
	  mmap()).

	- <em>Entry cache</em>: Can be used to speed up resolving paths. Normally
	  the VFS will call the fs_vnode_ops::lookup() hook for each element of the
	  path to be resolved, which, depending on the file system, can be more or
	  less expensive. When the FS uses the entry cache, those calls will be
	  avoided most of the time. All the file system has to do is invoke the
	  entry_cache_add() function when it encounters an entry that might not yet
	  be known to the entry cache and entry_cache_remove() when a directory
	  entry has been removed.
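
	The interplay between the entry cache and the lookup() hook can be
	modelled as follows. The sketch is a simplified, self-contained model and
	makes assumptions throughout: \c SketchEntryCache, \c SketchFS, and
	\c sketch_resolve() are hypothetical, \c Add() and \c Remove() merely play
	the roles of entry_cache_add() and entry_cache_remove(), and the actual
	kernel functions have different signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

typedef std::int64_t node_id;	// stand-in for ino_t
typedef std::pair<node_id, std::string> EntryKey;	// (directory ID, name)

// Hypothetical entry cache, mapping directory/name pairs to node IDs.
class SketchEntryCache {
public:
	void Add(node_id dir, const std::string& name, node_id node)
	{
		fEntries[EntryKey(dir, name)] = node;
	}

	void Remove(node_id dir, const std::string& name)
	{
		fEntries.erase(EntryKey(dir, name));
	}

	bool Lookup(node_id dir, const std::string& name, node_id& _node) const
	{
		std::map<EntryKey, node_id>::const_iterator it
			= fEntries.find(EntryKey(dir, name));
		if (it == fEntries.end())
			return false;
		_node = it->second;
		return true;
	}

private:
	std::map<EntryKey, node_id> fEntries;
};

// Hypothetical FS whose lookup is considered expensive.
struct SketchFS {
	explicit SketchFS(SketchEntryCache& entryCache)
		: cache(entryCache), lookupCalls(0) {}

	// lookup()-like hook: announces every entry it resolves to the cache.
	bool Lookup(node_id dir, const std::string& name, node_id& _node)
	{
		lookupCalls++;
		std::map<EntryKey, node_id>::const_iterator it
			= entries.find(EntryKey(dir, name));
		if (it == entries.end())
			return false;
		_node = it->second;
		cache.Add(dir, name, _node);
		return true;
	}

	// Entry removal: the cache has to forget the entry, too.
	bool RemoveEntry(node_id dir, const std::string& name)
	{
		cache.Remove(dir, name);
		return entries.erase(EntryKey(dir, name)) > 0;
	}

	SketchEntryCache& cache;
	std::map<EntryKey, node_id> entries;	// stands for on-disk directories
	int lookupCalls;
};

// VFS-like resolution of one path element: ask the cache first, then the FS.
bool sketch_resolve(SketchEntryCache& cache, SketchFS& fs, node_id dir,
	const std::string& name, node_id& _node)
{
	if (cache.Lookup(dir, name, _node))
		return true;
	return fs.Lookup(dir, name, _node);
}
```

	Resolving the same entry twice invokes the lookup function only once; after
	the entry has been removed, the next resolution reaches the FS again and
	fails there.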
*/

// TODO:
//	* FS layers