Introduction Part I | mnemofs
So, designing and implementing a file system...it's a daunting task, and very overwhelming as well. BUUUUUT here's me delving into the concepts to the best of my knowledge. Grab a drink or some popcorn as it's going to take a while.
Non-Volatile Storage
Non-volatile storages like hard disk drives (HDDs), Solid State Drives (SSDs), Compact Disks (CDs), Pen Drives, NAND Flash Devices, etc. are basically entire arrays that you can use to store anything.
What makes it different from arrays in your Random Access Memory (RAM)? Many things. But the most important of them is that it is non-volatile. This means that if something is stored in it, it will remain stored in it even if power is no longer supplied to the device. As an additional bonus, as far as storage mediums that exist till date go, if you store even a byte at a location, say, x
, then it will be at x
unless deleted explicitly (or some unexpected event can modify your storage, but more about it later!).
Another difference is that, usually with an operating system (OS) running on your device, in RAM, you can not really specify where you want data to be stored in terms of absolute position in the device. If you try to write, say, a byte, at position x
, it needs to be within the allowed locations by the kernel of the OS. Without getting too much into it here, as it is out of scope, if you do not write within the confines specified, you get hit with bad karma in the form of Segmentation fault (core dumped)
.
There might be people trying to say you cannot do that to a non-volatile device running on an OS as a user either blah blah, while others will counter it with their own written kernel module to do it but those are technicalities, and it will only result in confusion.
Thus, having established that they are different, we'll be calling volatile memory (primary memory) like RAM as memory and non-volatile memory (or secondary memory) like HDDs, etc. as storage. Storage is slower than memory, but it is also cheaper per byte. Adding the advantage of the non-volatile nature of storage shows you why it's such a popular medium of storing things.
In and of itself, storage seems like a very good thing, but it is like a fox in sheep's clothing. It can go horribly wrong if not used properly. In fact, the false sense of security of having your data persist can make the shock even bigger if you can not access your data for whatever reason due to improper use.
File Systems
Why are they needed?
A thing to note is that a user would be better off printing out everything compared to managing the data on a storage device manually. Why? Here's an example to help you grasp that situation:
Let's say there are 16 Bytes (16 B) of storage on a device, and you want to store 4 pieces of information, each of 4 B. If we split it into blocks of 1 B each, we can give 4 consecutive blocks to each piece and write it down somewhere, say on a piece of paper, where each piece is. Say, after some time, the 4th piece is no longer required, but your 2nd piece also suddenly needs 3 B instead of the original 4. So you will clear the 4th piece's data from the device and clear the last block given to your 2nd piece, so it only has 3 B. Now the 8th, 13th, 14th, 15th, and 16th blocks are clear of data. Now, if another new piece (5th piece) wants 2 B of storage, you give 8th and 13th place to it, and so on and on.
If you got confused while reading the example, that just proves my point.
This is very confusing to think about and, more importantly, difficult to remember. And here we are, barely talking about 16 B of information space and 4 pieces of information. Reality is harsher. We need to store SOOOOO many things. Your high-definition (HD) photos, your movies, your applications, your games, and so on. A normal user just wants to store things without pain. The above-mentioned method is very much an alternative definition of pain.
Pain always exists.
If you can't feel it,
it's probably being shouldered
by someone else.
- Me, 2k24
So, some developers take that pain away from you onto their shoulders, and bring you file systems which are programs that manage your storage without you having to think about them too much.
File System components
File systems have a lot of components. Like a car, some of them are visible to a user, and they are aware of their presence, while the others are kept under the hood. These components are called file system objects (FS objects).
Also like the parts of a car, these fs objects have different names depending on whom you ask. We'll stick to Linux (/ Unix / POSIX) terminology here, but at the end, it is just a name.
We'll go through some of the very common ones below, while we keep others for later exploration as we delve deeper into file systems.
File
In a traditional sense, any collective piece of information that is part of one entity is one file. Like a photo. One entire entity. It is a very loose definition. What if you cropped an image into 4 pieces and kept them separately? You do you; no judgment.
Take it like a collection of information that you want to keep together, such that programs can work on the entire thing together. An image viewer can show you an entire image in one go if you keep all of it together. A video player can play your entire movie in one go if kept in one piece. And so on.
They are analogous to a single paper or multiple papers that are kept together in something called a file (usually seen in the old photos of offices from a time close to the ice age) as they belong to the same topic.
Folders / Directories / Drawers
Again, work terminology. They are called many different things. We'll stick to directories, as Linux calls them. They are just a group of files and/or other directories as well. Why are they even needed? Multiple reasons. The right answer is that it depends on the user. But, here are some of the common reasons:
-
Organization. We like to be viewed as organized intellectual beings. So we organized multiple files in a hierarchical system so that topics become more specific the further down in the hierarchy you go.
-
Searching. Suppose you have 500 files in your view. If you want to look for a specific file, you need to go through, in the worst case, all of them. If organized, the search may be way faster, as you would know which directory and the same inside it, and so on.
etc.
Symbolic Links (symlinks) / Shortcuts
Suppose you have a file in a directory that is very deep in a hierarchical system of directories. But suppose there's this folder called "My Desktop" in which, if files are kept, they will be shown on...your desktop (🤯). Now, keeping everything on your Desktop would not make sense, and that file is happy where it is. And you're mostly happy with where it is as well. But you need it on your Desktop as well.
Well, copy it, duh! But updating one does not update the other. Thus came symlinks. They are a type of file that points to the location of another file. There are two types of symlinks, but...technicalities.
Symlinks might not have universal support as they might not be considered essential for the target audience of the OS, but most major general purpose OSes do support them.
Inodes
Files usually have two types of data associated with them. Their content, and their metadata. Their content is what we usually refer to as files. Their metadata usually refers to information about the file. If you go back to the work terminology, it might be analogous to the information written on the cover of the physical file.
These contain information like the file's name, their owner (who created it, or who owns it, etc. in a multi-user environment), their creation date, their last modification date, their last access date, their size, and so on. These are stored as another FS object called an inode.
Inodes are very much optional too, and it depends on the file system on how they want to manage the metadata.
Conclusion
So, we glanced over the basic knowledge required to start learning about file systems. The fun and games end here, and in Part II, it's time to dive into the depths of file systems, and file system development.
⌂ Home
MIT 2024 © resyfer.