Introduction Part III | mnemofs

File systems have existed for almost as long as storage media have, which is to say, decades. They started out very simple. As the needs of users grew, as operating systems evolved, and as storage media picked up more quirks in exchange for maximum efficiency under very specific conditions, file systems had to adapt, and they generally became more complex and more specialized.

The rest of this blog takes heavy inspiration from littlefs's design document; littlefs is a file system that deserves to be put in an art gallery.

Types of File Systems

There have been many file systems, each with its own quirks, but they can generally be divided into four types.

Block Based File Systems

These are file systems that represent the used storage space in the form of a tree. They are also the oldest type of file system out there.

They divide the storage into blocks in which files are stored. Any updates to the files are done in place: if block x contains my file and I want to update the file, the same block is rewritten with the new content.
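
As a minimal sketch of what "in place" means, here is a toy version in C. The device is faked with an in-memory array, and `update_in_place` is a made-up helper for illustration, not code from any real file system:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 256
#define NUM_BLOCKS 16

/* A fake in-memory "device" standing in for real storage. */
static uint8_t device[NUM_BLOCKS][BLOCK_SIZE];

/* Update `len` bytes of the file stored in block `blkno`, starting at
 * byte `off`, by rewriting the block in place. If power were lost
 * mid-write, the block would be left partly old and partly new, with
 * no record of what was supposed to change. */
static int update_in_place(uint32_t blkno, uint32_t off,
                           const uint8_t *data, uint32_t len)
{
  uint8_t buf[BLOCK_SIZE];

  if (blkno >= NUM_BLOCKS || off + len > BLOCK_SIZE)
    {
      return -1;
    }

  memcpy(buf, device[blkno], BLOCK_SIZE);   /* read the block          */
  memcpy(buf + off, data, len);             /* patch it in memory      */
  memcpy(device[blkno], buf, BLOCK_SIZE);   /* overwrite the old block */

  return 0;
}

int main(void)
{
  const uint8_t patch[] = {0xAA, 0xBB};

  update_in_place(3, 13, patch, sizeof(patch));
  printf("byte 13 of block 3 is now 0x%02X\n", device[3][13]);
  return 0;
}
```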

Can you see the problem? Suppose my block is 256 B in size and contains a file of 256 B as well. Some scattered bytes, say the 13th, 19th, 100th, and 105th, are to be updated to new values, and the device can only write 1 B at a time. To top it off, our luck is very bad: right after the 19th byte is updated, there is a power failure, the write operation stops, and so does your computer. Now there is no record of which changes remain or which have already been written.

You get a situation where some of the file is updated while the rest isn't.

Nightmare.

This is called non-atomicity, or non-resiliency. These file systems are not resilient to power loss because their operations are not atomic. If an operation is atomic, then even if it is interrupted partway through, any changes it made are rolled back to the pre-execution state.

Some examples of this type of file system are FAT and ext2.

Without modifications (as seen later), these file systems do not stand up to modern needs for atomicity. You want your file to be updated, but if that fails, even the previous state is better than a possibly garbled mess.

While this might not seem like much of an issue for plain text files, for files stored in an encoded format it is hell. Depending on the encoding, the likely outcome is that the entire file is effectively corrupted, because the decoder can no longer decode it and simply reports an error.

Another disadvantage of such file systems is that if one file is updated much more often than the others, the area of the storage device where that file lives is written far more than the rest, leading to uneven wear distribution. That particular area may end up dying before the others.

An advantage of such file systems is speed. Due to their simple design, they are very fast.

Log Based File Systems / Log Structured File Systems

At the other extreme, there are log based file systems, which take atomicity very seriously. Instead of treating the entire storage as an array, they treat it like a queue.

Every change has a corresponding entry, stored on the device in a first-in, first-out (FIFO) manner, i.e. each new entry is written right after the end of the last one. To recreate a file, all that is needed is to iterate over the entries in the queue. Usually each entry has a checksum of itself suffixed to it.

The checksum is usually a value obtained by hashing the entire entry. While reading an entry, if the stored checksum does not match the freshly calculated one, the entry is discarded, as that means it was not written properly. If you assume your file system is bug-free, this usually narrows the culprit down to power loss.
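
Here is a rough sketch of what such an entry and its validation might look like. This is purely illustrative: the layout, the `log_entry` struct, and the toy checksum are all made up (real file systems typically use CRC32 or similar):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative log entry layout: a header, a payload, and a checksum
 * of everything before it appended at the end. */
struct log_entry
{
  uint32_t file_id;   /* which file this change belongs to      */
  uint32_t offset;    /* where in the file the change applies   */
  uint32_t len;       /* payload length                         */
  uint8_t  data[32];  /* the new bytes (fixed size for brevity) */
  uint32_t checksum;  /* checksum of all fields above           */
};

/* A toy rotate-and-add checksum, only to keep the sketch short. */
static uint32_t checksum(const void *buf, size_t len)
{
  const uint8_t *p = buf;
  uint32_t sum = 0;

  while (len--)
    {
      sum = (sum << 1) | (sum >> 31);  /* rotate left by one bit */
      sum += *p++;
    }

  return sum;
}

/* On write: fill the entry, then stamp the checksum last. */
static void entry_seal(struct log_entry *e)
{
  e->checksum = checksum(e, offsetof(struct log_entry, checksum));
}

/* On read: recompute and compare. A mismatch means the entry was only
 * partially written (e.g. power loss) and must be discarded. */
static bool entry_valid(const struct log_entry *e)
{
  return e->checksum == checksum(e, offsetof(struct log_entry, checksum));
}

int main(void)
{
  struct log_entry e = {.file_id = 1, .offset = 13, .len = 2,
                        .data = {0xAA, 0xBB}};

  entry_seal(&e);
  printf("entry valid: %d\n", entry_valid(&e));

  e.data[0] = 0xCC;   /* simulate a torn or corrupted write */
  printf("after corruption: %d\n", entry_valid(&e));
  return 0;
}
```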

It might seem really great initially, but can you see the problem? It is very slow. Suppose each read (one byte) from the device costs one instruction (quite generous, since reading from storage is easily 100x slower than reading a value from RAM!), and you have a 4 GiB device. That is 4294967296 B, so that many instructions are needed just to traverse the entire storage, ignoring any calculations we do with the data. This would take about 1.48 seconds on a 10th generation Intel i5 running at its base frequency of 2.9 GHz (assuming no optimisations). As mentioned, one instruction per read is being generous. This means that every time you open a file, you wait about 1.48 seconds at the very least. And the processor used for this calculation is quite a beast in itself.

CPUs in embedded systems are slower. The STM32F401CCU6 has a CPU running at 84 MHz, so the same operation under the same assumptions would take it about 51 seconds. And remember, this is just 4 GiB of storage capacity.
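
The back-of-the-envelope numbers above come from this arithmetic (one instruction per byte read, and nothing else):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
  const uint64_t bytes   = 4ULL * 1024 * 1024 * 1024;  /* 4 GiB                */
  const double   desktop = 2.9e9;                      /* 2.9 GHz desktop CPU  */
  const double   mcu     = 84.0e6;                     /* 84 MHz STM32F401CCU6 */

  /* One instruction per byte read, one instruction per cycle. */
  printf("desktop: %.2f s\n", (double)bytes / desktop);  /* ~1.48 s */
  printf("mcu:     %.2f s\n", (double)bytes / mcu);      /* ~51.1 s */
  return 0;
}
```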

Nightmare, but in another direction.

An advantage of such file systems is, of course, absolute atomicity. Another advantage is wear leveling. Since entries are added in a FIFO manner, any pair of blocks differs in wear level by at most 1. The wear levels of the blocks of storage would always look like [x+1, x+1, ..., x+1, x+1, x, x, x, ..., x, x], where the last block with wear x+1 is the location of the last entry.

A disadvantage, apart from being slow, is: what happens if the storage becomes full? The user can keep making changes indefinitely, and so the entries representing those changes keep piling up. Space, however, is not infinite, or file system development would not have been so difficult.

File systems of this type include JFFS/JFFS2, YAFFS, and SPIFFS.

Journalling File Systems

The curious thing is that the two types of file systems above have complementary advantages and disadvantages. So naturally, a middle ground approach could end up benefiting from both, or from neither.

We create a block-based file system, but we also reserve a certain place on the device for the FIFO queue to store logs. We call this FIFO queue the journal or bounded log. Best of both worlds. The block file system contains a sort of "base" state, and further changes to it are stored as entries or logs in the journal.

To get the updated version of the file, all the FS has to do is take the "base" state and iterate over the journal, applying changes to it.
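
A sketch of that replay step, with a made-up `journal_entry` layout and checksums left out so the replay logic stays visible:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define FILE_SIZE   256
#define JOURNAL_LEN 8

/* One journal entry: "write `len` bytes at `offset`". */
struct journal_entry
{
  uint32_t offset;
  uint32_t len;
  uint8_t  data[16];
};

/* Rebuild the current view of a file: start from the "base" copy
 * stored in the block area, then apply journal entries in order. */
static void rebuild_file(const uint8_t base[FILE_SIZE],
                         const struct journal_entry *journal, size_t n,
                         uint8_t out[FILE_SIZE])
{
  memcpy(out, base, FILE_SIZE);

  for (size_t i = 0; i < n; i++)
    {
      memcpy(out + journal[i].offset, journal[i].data, journal[i].len);
    }
}

int main(void)
{
  uint8_t base[FILE_SIZE] = {0};
  uint8_t view[FILE_SIZE];

  struct journal_entry journal[JOURNAL_LEN] = {
    {.offset = 13, .len = 1, .data = {0xAA}},
    {.offset = 19, .len = 1, .data = {0xBB}},
  };

  rebuild_file(base, journal, 2, view);
  printf("byte 13 = 0x%02X, byte 19 = 0x%02X\n", view[13], view[19]);
  return 0;
}
```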

Independent of the implementation, the journal ties a fixed storage location to very frequently written data. This can cause an abnormal increase in the wear of the journal area, on top of the increased wear that block based file systems already suffer for frequently updated files. Another problem is that there are essentially two file systems running in parallel, so both code complexity and execution cost may increase.

Depending on the implementation, there may be some additional problems. The file system may decide to commit the logs once the journal is full: the "base" state is updated with the logged changes, and the journal is emptied. The changes may be committed one by one, but a power loss while committing the journal may cause garbled data as well.

This is the most popular category of file systems. It includes Linux's most popular file system, ext4, and Windows's most popular (actually, you don't have a choice as far as daily files go) file system, NTFS.

Copy On Write File Systems

A file system category based on an entirely new way of dealing with this problem is Copy-On-Write (CoW) file systems.

What does CoW mean? You want to update a value? Copy the entire thing and write on that copy. It's similar to how some developers program before learning about the existence of version control.

Suppose a block (I'll call it x) is at location p. To update it, you read the entire block into memory, update the necessary parts there, and write the updated block to a new location q.
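
A sketch of that read, modify, write-elsewhere sequence, again on a fake in-memory device. The "allocator" here just hands out the next free block, and `cow_update` is an illustrative helper, not real file system code:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 256
#define NUM_BLOCKS 16

static uint8_t  device[NUM_BLOCKS][BLOCK_SIZE]; /* fake storage      */
static uint32_t next_free = 1;                  /* trivial allocator */

/* Copy-on-write update: read block p, patch the copy in memory, and
 * write it to a freshly allocated block q. Block p is left untouched,
 * so a power loss during the write still leaves the old data intact.
 * Returns the new block number q. */
static uint32_t cow_update(uint32_t p, uint32_t off,
                           const uint8_t *data, uint32_t len)
{
  uint8_t  buf[BLOCK_SIZE];
  uint32_t q = next_free++;

  memcpy(buf, device[p], BLOCK_SIZE);  /* read old block              */
  memcpy(buf + off, data, len);        /* apply the change in memory  */
  memcpy(device[q], buf, BLOCK_SIZE);  /* write to the new location q */

  return q;
}

int main(void)
{
  const uint8_t patch[] = {0xAA};
  uint32_t p = 0;
  uint32_t q = cow_update(p, 13, patch, sizeof(patch));

  printf("old block %u byte 13: 0x%02X\n", (unsigned)p, device[p][13]);
  printf("new block %u byte 13: 0x%02X\n", (unsigned)q, device[q][13]);
  return 0;
}
```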

This sounds simple, but it's a bit more complicated than that when you dive into the nuances of it.

Files and directories exist in all of these types of file systems, CoW file systems included, and they form a hierarchy. A directory is a collection of files, so in any working CoW file system implementation, a directory holds some information about where each of its files is located.

Let's update our file! Assume the file is just one block in size for simplicity. So our file got updated, and its location shifted from p to q. But the directory that contains our file still points to p! So we need to update the directory. But now the parent of that directory faces the same problem, and this keeps propagating upward until it reaches the root of the entire file system's tree. The root has no parent, so updating it updates the tree. BUT, unless your root is confined to some specific places on the device, you also need to store where the root is located, and this update problem continues until it finally reaches something that either has a single fixed location (in which case it doesn't follow CoW, as it has to be updated without changing location) or has multiple fixed locations, in which case the file system has to figure out which of those locations contains the most recent update.
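
This ripple effect can be sketched like so. The `node` struct and the path-walking loop are hypothetical, just to show that every ancestor, up to and including the root, gets rewritten to a new block:

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_DEPTH 8

/* A toy directory node: it only remembers the block number of one
 * child. Real directories hold many entries; one is enough to show
 * the ripple effect. */
struct node
{
  uint32_t child_blk;
};

static uint32_t next_free = 100;   /* trivial "allocator" */

/* CoW-update a node so it points at the child's new block, and return
 * the node's own new block number. */
static uint32_t cow_update_node(struct node *n, uint32_t new_child_blk)
{
  n->child_blk = new_child_blk;  /* patch the in-memory copy    */
  return next_free++;            /* "write" it to a fresh block */
}

int main(void)
{
  /* path[0] is the root, path[depth - 1] is the file's parent dir. */
  struct node path[MAX_DEPTH] = {0};
  int depth = 3;

  uint32_t new_blk = 42;  /* the file's freshly written block (q) */

  /* Walk from the file's parent up to the root: every ancestor must
   * be rewritten so it points at the new copy below it. */
  for (int i = depth - 1; i >= 0; i--)
    {
      new_blk = cow_update_node(&path[i], new_blk);
      printf("ancestor %d rewritten to block %u\n", i, (unsigned)new_blk);
    }

  /* new_blk is now the new root block; its location must be recorded
   * somewhere that itself has a known, fixed place on the device. */
  printf("new root block: %u\n", (unsigned)new_blk);
  return 0;
}
```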

Yep...A lot of problems, and a lot of headache.

An advantage of CoW file systems is that there are two copies of the block: the old one and the updated one. If the update was not successful, the old version is used.

The wear leveling here depends on the block allocator, which is responsible for providing the location where an update should be stored. Thus, a good algorithm for the block allocator will give good and even wear (copium 🤞, but it's possible to do this, unlike other challenges that arise due to the nature of the file system).
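
One simple allocation strategy (not necessarily what littlefs or mnemofs does) is to always hand out the least-worn free block, assuming the allocator tracks an erase count per block:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 16

static uint32_t erase_count[NUM_BLOCKS]; /* wear level per block   */
static bool     in_use[NUM_BLOCKS];      /* occupied by live data? */

/* Pick the free block with the lowest erase count so that wear is
 * spread evenly across the device. Returns -1 if no block is free. */
static int alloc_block(void)
{
  int best = -1;

  for (int i = 0; i < NUM_BLOCKS; i++)
    {
      if (in_use[i])
        {
          continue;
        }

      if (best < 0 || erase_count[i] < erase_count[best])
        {
          best = i;
        }
    }

  if (best >= 0)
    {
      in_use[best] = true;
      erase_count[best]++;   /* the block gets erased before reuse */
    }

  return best;
}

int main(void)
{
  erase_count[0] = 5;   /* block 0 is already well worn */
  printf("allocated block %d\n", alloc_block());  /* picks a fresher block */
  return 0;
}
```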

A disadvantage of CoW file systems is pretty obvious, as shown above. A simple update takes too many copies and writes due to updating all the FS objects in the file's path.

Another disadvantage is that keeping copies takes up quite a lot of space. If you don't have enough space to update a file (which means rewriting not just the file but its ancestors as well), a CoW file system is a bad choice.

Another disadvantage is that CoW file systems require a garbage collector, which means additional memory usage and processing time. The garbage collector needs to figure out which blocks (old copies) it can safely erase, and which must be kept in case there is a power failure in the near future. This is not a computationally cheap thing to do.

The amount of extra space used by old copies is determined by the aggressiveness of the garbage collector, but no garbage collector should erase copies of the FS objects in the path of a file until it is sure the entire file has been updated.

Conclusion

There are too many types of file systems, and there are too many problems. They try to solve some problem but end up creating another or solving it partially. Mnemofs is heavily inspired by littlefs, which itself tries to take the best of both CoW and journaling file systems and combine them with some ingenious problem solving.

We'll look into mnemofs, and specifically, its LRU in the next part.


