A History of Storage: Files
From Multics to NFS and beyond
This is a series exploring the history of persistent storage systems in the world of computing. Previously, we covered block storage.
Today we go into the world of files.
Files are an interesting artifact of how the world of computing evolved: a useful abstraction on one level, yet simultaneously a notorious source of complexity for the technologists who interact with them.
The world is driven by files. Some have said “Everything is a file”.
We shall follow this journey.
The First Files
Like many digital-focused terms, files have their start in the tangible world.
In Latin, filum can refer to a thread or string, or more ominously a cord of fate. In the fifteenth century legal professionals would sometimes use filing strings to keep track of their documents, hanging them up for ease of reference.
The file was not merely attached to a document; it served as a sorting technology for preserving and maintaining the organization of information, particularly for future reference by the user or others.
As punch card computing was developed in the 1940s, the term “file” quickly came to be used in reference to how the punch cards themselves were organized as files inside filing cabinets.
Eckert, W. J., Punched Card Methods in Scientific Computation.
By 1956, documentation for the IBM 305 RAMAC referred to its hardware as containing “disk files”.
However, the advent of a computer file system did not come in full force until the development of Multics.
File System Pioneering: Multics EFS
Multics was an MIT-led project that began in the 1960s and was one of the first operating systems to take a serious stab at developing a general-purpose file system. Before its release, EPL developers were given documentation to begin building with Multics “logic”.
The 3,000-page Multics System Programmers’ Manual (MSPM), in section BE.10.00 “The Elementary File System (EFS)”, documents the first proposals for such a preliminary file system, superseded by revision BE.10.01 the following month.
Before this point, block storage on tapes and disks was the primary storage mechanism computers could work with: raw addressable chunks of storage with little abstraction.
Block addresses had to be managed manually and were coupled to the hardware, a fairly error-prone process.
No abstraction meant no file metadata, no I/O coordination, no lifecycle management, and no access controls.
EFS introduced several important innovations which would become staples in the world of files:
Hierarchical structure for files and file metadata via pointers.
Logical records and word-level addressing for random/sequential access and offset support, abstracting out the physical layout of the drive.
I/O mode flagging to declare files as permanent, foreign, or temporary using bit flags, an early form of file attributes.
A unified read/write model with Open/Write/Read/Close operations clearly defined.
Bit flags for EOF (end of file), EOR (end of record), and error reporting.
Here we begin to see file-type semantics that speak to lifecycle (temporary, permanent, foreign), along with the abstraction and interoperability that are central to the idea of a file.
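To make that concrete, here is a loose, toy sketch in C of the kind of unified read model the EFS proposal describes: word-level offsets, logical records, and status reported through bit flags. Every name in it is invented for illustration; this is not Multics or EFS code.

```c
/* A toy, in-memory illustration of an EFS-style unified read model:
 * word-level offsets, logical records, and EOF/EOR status bits.
 * All names here are invented for illustration; this is not Multics code. */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define FS_EOF (1u << 0)            /* end of file reached */
#define FS_EOR (1u << 1)            /* end of logical record reached */

typedef uint32_t word_t;            /* EFS addressed machine words; we fake it */

struct segment {
    const word_t *words;            /* file contents as an array of words */
    size_t        len;              /* total words in the file */
    size_t        record_len;       /* fixed logical record length */
};

/* Read up to `n` words starting at *offset; report status via bit flags. */
static unsigned fs_read(const struct segment *s, size_t *offset,
                        word_t *buf, size_t n, size_t *read)
{
    unsigned status = 0;
    size_t i = 0;
    while (i < n && *offset < s->len) {
        buf[i++] = s->words[(*offset)++];
        if (*offset % s->record_len == 0) {   /* crossed a record boundary */
            status |= FS_EOR;
            break;
        }
    }
    if (*offset >= s->len)
        status |= FS_EOF;
    *read = i;
    return status;
}

int main(void)
{
    const word_t data[] = {1, 2, 3, 4, 5, 6, 7};
    struct segment seg = { data, 7, 3 };      /* 7 words, records of 3 */
    size_t offset = 0, got = 0;
    word_t buf[8];
    unsigned status = 0;

    while (!(status & FS_EOF)) {              /* read record by record */
        status = fs_read(&seg, &offset, buf, 8, &got);
        printf("read %zu words, EOR=%d EOF=%d\n",
               got, !!(status & FS_EOR), !!(status & FS_EOF));
    }
    return 0;
}
```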
A Second Take: Multics Basic File System
In the end, the EFS was merely a prototype. Multics took these preliminary elements and built out a far more comprehensive file management design for the final system.
Section BG.0, “Overview of the Basic File System”, unpacks this new paradigm.
The Multics Basic File System consists of segments (files) and memory-resident page and segment tables. This is paired with a multi-level storage management system for infrequently accessed data or for backup data that would be sent to offline devices.
The segment is the heart of the novelty here. It is a linear array of data that can be accessed with implicit memory references (like block storage) or explicitly with read/write system calls (like file addressing).
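There is no exact modern equivalent of a Multics segment, but POSIX offers a rough analogue of its two access paths: the same bytes can be reached implicitly through memory with mmap() or explicitly with read(). A minimal sketch, assuming a readable file named example.txt in the working directory:

```c
/* Rough modern analogue of a Multics segment's two access paths:
 * implicit memory references (mmap) vs. explicit read() calls.
 * Assumes a readable, non-empty file named example.txt. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("example.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

    /* Path 1: explicit I/O, like calling the file system's read entry point. */
    char first;
    if (pread(fd, &first, 1, 0) == 1)
        printf("read():  first byte = %c\n", first);

    /* Path 2: implicit memory reference, like touching a mapped segment. */
    char *seg = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (seg != MAP_FAILED) {
        printf("mmap():  first byte = %c\n", seg[0]);  /* plain memory load */
        munmap(seg, st.st_size);
    }

    close(fd);
    return 0;
}
```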
It also added a number of control modules: segment control, directory control (symbolic naming), access control, page control, core (memory) control, and device interface modules (DIMs) that abstract I/O interactions.
But from a long-term perspective we see two other key ideas implemented here.
First, concurrency and fault handling. For the first time, an abstracted layer of locking mechanisms, page faults, and segment faults was implemented to protect the file system from concurrent writes and to handle other potential errors.
Second, we see user-based Access Control Lists (ACLs), a model we will dig into a bit further here.
As specified in BG.9.00, when a user attempts to access a directory branch, the Access Control “determines the [effective access] mode of the user with respect to this branch and returns this mode and the ring brackets” to the Directory Control.
There is an apparent mode and an effective mode.
The apparent mode is concerned with the read, write, or execute permissions available to the user or group (generic user).
The effective mode governs access itself and is produced through the ring brackets.
The ring brackets define the range of privilege levels under which the segment may be accessed or called. It is composed of three integers:
The first two integers are the low and high bounds of the access bracket, specifying the rings allowed to directly access the segment according to the user’s effective mode. They are named access_low and access_high.
The third integer is the high bound of the call bracket, identifying the highest ring that may attempt to execute the segment via a controlled gate crossing. It is named call_high.
These are hardware-enforced privilege rings, but they do introduce a mechanism for ring crossing via a gate, referred to as the call bracket. In other words, the call bracket covers the case where a segment cannot be directly accessed from the user’s ring, but a ring crossing through a gate mechanism can still allow the user to execute the segment.
Put more simply, this allows for fine-grained privilege escalation, and it is an ancestor of hardware-enforced kernel protection and of Unix’s setuid.
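A simplified sketch of that decision in C, paraphrasing the rule as BG.9.00 describes it rather than reproducing Multics source; the type and function names are invented for illustration, and the real rules carry more nuance:

```c
/* A simplified sketch of the ring-bracket check described in BG.9.00.
 * Names and structure are paraphrased for illustration only. */
#include <stdio.h>

enum effective { ACCESS_DENIED, ACCESS_FULL, ACCESS_GATE_CALL_ONLY };

struct ring_brackets {
    int access_low;   /* low bound of the access bracket  */
    int access_high;  /* high bound of the access bracket */
    int call_high;    /* high bound of the call bracket   */
};

/* Decide what a user executing in `ring` may do with the segment. */
static enum effective effective_access(int ring, struct ring_brackets b)
{
    if (ring >= b.access_low && ring <= b.access_high)
        return ACCESS_FULL;           /* apparent mode (r/w/e) applies directly */
    if (ring > b.access_high && ring <= b.call_high)
        return ACCESS_GATE_CALL_ONLY; /* may only execute via a gate crossing */
    return ACCESS_DENIED;
}

int main(void)
{
    struct ring_brackets b = { 1, 4, 6 };   /* example brackets */
    for (int ring = 0; ring <= 7; ring++)
        printf("ring %d -> %s\n", ring,
               effective_access(ring, b) == ACCESS_FULL ? "full access" :
               effective_access(ring, b) == ACCESS_GATE_CALL_ONLY ? "gate call only"
                                                                  : "denied");
    return 0;
}
```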
Unix: Everything is a File
Multics was theoretically impressive and delivered groundbreaking work on secure, fault-tolerant systems even beyond its innovations around files, but we are still one step away from the foundation of today’s operating systems.
The all-too-famous Ken Thompson and Dennis Ritchie were both contributors to Multics at Bell Laboratories, which withdrew from the project in 1969. The editors at Wikipedia claim Thompson pressed on in part to build an OS that could support Space Travel, a video game he was developing. The result of this effort was Unix.
Whatever its origins, Unix was where files came into their own. Extending Multics’ directory tree model, Unix implemented a hierarchical file system that carried not only the content users stored and referenced but a whole new philosophy.
“Everything is a file.”
Now, for the first time, devices, directories, and eventually sockets and even processes are presented as files (at least in a heuristic sense).
This offers enormous advantages in interoperability, simplicity, and composability, as there is now a consistent I/O model for essentially everything a developer could need from the operating system. There are caveats, naturally, but that is the gist of Thompson and Ritchie’s design, and it is impressively effective given that Thompson worked largely alone on the first version of Unix.
Even remote device mounts can now be treated as part of a unified directory structure, and this opened up brand new horizons in the world of file storage.
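The practical upshot is that the same handful of calls works no matter what kind of “file” sits behind a path. A minimal example (the paths are illustrative and Linux-flavored):

```c
/* The same I/O calls work on a regular file, a device node, and a
 * kernel-generated file; a path under an NFS mount would behave
 * identically. Paths here are illustrative. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

static void read_a_little(const char *path)
{
    int fd = open(path, O_RDONLY);          /* same call for every kind of "file" */
    if (fd < 0) { perror(path); return; }

    unsigned char buf[8];
    ssize_t n = read(fd, buf, sizeof buf);  /* same call again */
    printf("%-16s -> read %zd bytes\n", path, n);
    close(fd);
}

int main(void)
{
    read_a_little("/etc/hostname");  /* a regular file            */
    read_a_little("/dev/urandom");   /* a character device        */
    read_a_little("/proc/version");  /* a kernel-generated "file" */
    return 0;
}
```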
Unix additionally stripped down much of the complexity around ACLs and privilege rings, decoupling access control from the hardware and making it far simpler for developers to build against.
By reducing the learning curve and creating radically extensible systems, Unix paved the way for the macOS and Linux systems that remain with us to this day.
But how did file storage itself fit into all this?
Up to this point, all files ultimately lived as blocks on magnetic disk and tape; Multics and Unix layered hierarchical structure, segmented memory, and access controls on top of those raw blocks.
By 1975, Unix V6 had launched, with V7 shortly to follow, cementing the inode at the center of the design. Each file is associated with an inode, which stores its metadata and points to its data blocks; directories in turn simply map names to inode numbers.
This decouples file metadata, such as ownership, and the on-disk layout from the file’s name itself, with everything accessed through a single file API.
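You can see the name-to-inode split directly with stat(2), which returns the inode’s metadata for whichever name you hand it:

```c
/* Print the inode number and a few metadata fields for a given path,
 * showing that the name is just a lookup key into inode metadata. */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    struct stat st;

    if (stat(path, &st) != 0) { perror("stat"); return 1; }

    printf("path:   %s\n", path);
    printf("inode:  %llu\n", (unsigned long long)st.st_ino);   /* the inode number   */
    printf("links:  %lu\n",  (unsigned long)st.st_nlink);      /* names pointing here */
    printf("size:   %lld bytes\n", (long long)st.st_size);
    printf("mode:   %o\n", (unsigned)(st.st_mode & 07777));    /* permission bits     */
    return 0;
}
```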
Unix Evolved: BSD and UFS
So where did files go from there? As the Unix ecosystem continued to evolve so did the systems undergirding files.
One major child of Unix is BSD (Berkeley Software Distribution), an open source project with a somewhat contentious relationship to Unix’s proprietary licensing, but one that still produced a number of innovations that would be incorporated downstream into modern file systems.
With the release of 4.2BSD in 1983 came a far more robust file system, the Berkeley Fast File System, later widely known as UFS (Unix File System), which included:
Block size increases from 512 or 1024 bytes to 4096 and 8192 bytes.
Partially filled blocks chunked into block fragments to save space.
Metadata distributed across cylinder groups for better performance (lower seek times).
Directory lookups (still linear scans) whose performance improved over time.
A tunable inode density, allowing the inode count to be adjusted to balance storage needs against performance.
Across the board, UFS represented a matured flavor of the Unix file system, one that would become standard not just on the BSDs but also on SunOS, Solaris, and others.
It would also set the stage for Linux.
ext: Linux and Modern File Systems
Aside from Unix offshoots like BSD, MINIX launched in the late eighties for academic purposes, with its source code available for study.
Linus Torvalds picked up on this teaching kernel and launched the Linux kernel in 1991, developing it on MINIX using the GNU C Compiler.
While the earliest versions of the kernel used the MINIX file system, Remy Card implemented the first generation of ext (extended file system) in 1992 with Linux 0.96c. Unfortunately, this first generation did not offer much more than a Linux-native approach to file management.
However, this changed quickly when ext2 launched in 1993 with separate inode and data block bitmaps, fast symbolic links, and file attributes. This generation still suffered from problems around unclean shutdowns, which required fsck to check and repair the file system after a wide variety of failures and errors.
This was solved in ext3 with journaling. Journaling is the extended file system’s version of a write-ahead log: it records metadata operations before applying them, so transactions can be completed or rolled back cleanly after a crash. This, combined with backwards compatibility with ext2, created a clean upgrade path.
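The core idea fits in a few lines. Here is a toy, conceptual illustration of the write-ahead pattern in C, using ordinary files named journal.log and metadata.db purely for demonstration; it is not how ext3 structures its journal:

```c
/* A toy illustration of the journaling idea: record the intended change
 * in a journal and make it durable before touching the "real" data, so a
 * crash between the two steps can be recovered on the next mount.
 * Conceptual sketch only, not the ext3 journal layout. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

static int append_and_sync(const char *path, const char *rec)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;
    if (write(fd, rec, strlen(rec)) < 0) { close(fd); return -1; }
    fsync(fd);                       /* the journal entry must be durable first */
    close(fd);
    return 0;
}

int main(void)
{
    const char *intent = "set-owner file=report.txt uid=1000\n";

    /* 1. Log the metadata operation in the journal and make it durable. */
    if (append_and_sync("journal.log", intent) != 0) { perror("journal"); return 1; }

    /* 2. Only then apply the operation to the main structures. */
    if (append_and_sync("metadata.db", intent) != 0) { perror("apply"); return 1; }

    /* 3. Mark the transaction committed so recovery can skip it. */
    append_and_sync("journal.log", "commit\n");

    puts("transaction journaled, applied, and committed");
    return 0;
}
```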
Yet this generation was limited by scale and performance. This led to ext4 which remains the default file system in the Linux kernel to this day.
ext4 brought major new features: extents, which replace indirect block maps for more efficient handling of large files; delayed allocation, which batches writes; 64-bit block addressing; journal checksums to protect the integrity of the journal itself; and much larger maximum file and volume sizes.
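Conceptually, an extent simply says “this run of logical blocks lives contiguously starting at this physical block”, so one entry can stand in for thousands of per-block pointers. A simplified sketch of the idea (not the on-disk ext4 structures):

```c
/* A simplified picture of why extents beat per-block maps for large,
 * mostly contiguous files. Not the real on-disk ext4 structures. */
#include <stdio.h>
#include <stdint.h>

struct extent {
    uint32_t logical_start;   /* first logical block this extent covers */
    uint32_t length;          /* number of contiguous blocks            */
    uint64_t physical_start;  /* where they live on disk                */
};

/* Map a logical block to a physical block through an extent list. */
static uint64_t map_block(const struct extent *ext, int n, uint32_t logical)
{
    for (int i = 0; i < n; i++)
        if (logical >= ext[i].logical_start &&
            logical <  ext[i].logical_start + ext[i].length)
            return ext[i].physical_start + (logical - ext[i].logical_start);
    return 0;  /* hole / unmapped */
}

int main(void)
{
    /* One entry describes 32768 contiguous blocks (128 MiB at 4 KiB blocks);
     * a classic block map would need one pointer per block instead. */
    struct extent ext[] = { { 0, 32768, 1048576 } };

    printf("logical block 100   -> physical %llu\n",
           (unsigned long long)map_block(ext, 1, 100));
    printf("logical block 40000 -> physical %llu (unmapped)\n",
           (unsigned long long)map_block(ext, 1, 40000));
    return 0;
}
```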
Networked Files for Distributed Systems
While we’ve arrived at the modern implementation of file systems in Linux systems, we have bypassed many other offshoots along the way, such as Windows and SMB.
Let us refocus on another historical strand relevant to understanding files in the cloud: files and networking.
Much like with block storage where DAS evolved into NAS and SAN, a growing demand emerged for sharing files between UNIX systems across a LAN.
Sun Microsystems built NFS (Network File System) in 1984 to make remote files behave as if they were local. The protocol was later formalized in RFC 1094.
NFSv2 was the first public release of the Network File System and was built on the principle of mountable remote directories: a client mounts a remote directory and can read and write those files as if they were local.
Initially, communication was carried over the stateless UDP protocol. This worked relatively well over a LAN but poorly over WANs, and server failures introduced issues of their own.
These problems were mitigated to some extent with NFSv3 in 1995, which introduced TCP support, 64-bit file sizes and offsets, asynchronous writes, and file attribute caching to reduce round trips, yet it remained stateless; clients and servers were forced to rely on idempotent operations.
Nevertheless, by this time NFSv3 was becoming a fairly standard component of Linux deployments, particularly in HPC clusters and data centers running distributed systems.
NFSv4 finally introduced a stateful protocol over TCP with open state, file locks, and delegations, alongside compound operations to increase performance.
Even more importantly, it introduced ACLs for fine-grained permissions across distributed systems, and authentication integration with Kerberos (and, more recently, TLS).
For the first time all exports also appeared under a single root directory for cleaner client side management of a global namespace.
NFSv4 remains the standard, with minor versions such as NFSv4.1 and NFSv4.2 adding stability, parallelism, and other cloud-friendly features, but the architecture has largely been set in stone and appears to have matured fully.
The Migration: Files in the Cloud
A slight tangent before exploring the evolution of the cloud service providers’ file offerings.
The Unix model discussed above was not merely a de facto standard for operating systems; it was in some sense codified as the standard with the publication of POSIX, IEEE 1003.1.
Essentially, POSIX is a designation that identifies whether a system such as Linux is sufficiently interoperable with other operating systems by meeting the criteria of Unix API compatibility, including file and directory calls such as read(), write(), and chmod().
POSIX compliance has thus become a criterion for judging whether a service can be deemed sufficiently file-server-like.
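To make “Unix API compatibility” concrete, here is a small, portable sequence using only calls specified by IEEE 1003.1; the file name is illustrative, and the program behaves the same on any POSIX-compliant system or mounted file service:

```c
/* A small, portable POSIX sequence: create a file, write to it, tighten its
 * permissions with chmod(), and read it back. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    const char *path = "notes.txt";          /* illustrative file name */
    const char *msg  = "hello, POSIX\n";

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, msg, strlen(msg));             /* POSIX write()               */
    close(fd);

    chmod(path, 0600);                       /* POSIX chmod(): owner-only rw */

    char buf[64];
    fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    ssize_t n = read(fd, buf, sizeof buf - 1);  /* POSIX read()             */
    close(fd);
    if (n > 0) { buf[n] = '\0'; fputs(buf, stdout); }
    return 0;
}
```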
This is a primary point of distinction from object storage, which more or less came to the forefront of persistent storage paradigms with the release of Amazon S3 in 2006.
We will cover object storage in more depth later, but it is crucial to note that S3 and object storage in general are not POSIX-compliant, even though they offer file- and folder-like semantics.
Many companies have since migrated from file storage to object storage, but in time the cloud service providers have each released their own POSIX-compliant NFS offerings: Amazon EFS, Azure Files (which also serves Windows SMB), and Google Cloud Filestore.
To this day there are several primary advantages to leveraging these POSIX-compliant cloud services over object storage:
Minimal replatforming: a suitable “lift and shift” option
Mountable storage that does not need to be accessed through an HTTP API
Much lower latency particularly on premium SKU offerings
The Future of Files and File Architecture
All the same, the cloud computing push toward “serverless” and managed-service implementations has increasingly led to the deprecation of file servers in cloud-native engineering.
File content repositories like Microsoft’s SharePoint or OneDrive have increasingly usurped the role that local or self-hosted file storage solutions used to play. This is not to say that cloud-hosted file storage is now universal (the statistics on this are difficult to pinpoint with accuracy) but it is fair to say that it is at least fairly ubiquitous across SMBs and enterprise companies.
Where could file storage advance from here?
A number of potential paths:
Further performance improvements through integration with NVMe-oF
The extension of Zero Trust Architecture to file system access controls
DAOS (Distributed Asynchronous Object Storage) for high-performance computing with POSIX compatibility.
IPFS (InterPlanetary File System), designed for truly global file identification and access.
Semantic file systems that emphasize accessing files based on content rather than hierarchy. This is an old proposal, but the advent of generative AI has freshened interest in rethinking how data can be accessed.
If one thing is certain, it is that files are no longer an innovation focus in themselves, but they are likely to keep piggybacking on advances in object storage and AI-driven semantic search as those continue to open new possibilities for how we use data, and consequently for how we store it.

