BOS: A Block Operating System

natecull · 24 September 2023 04:31

Disclaimer: This is potentially a very stupid idea.

Context:

Before Disk Operating Systems, and specifically before Unix and CP/M, it was not always expected that storage media would be formatted as a hierarchical file system.
Storage media are, however, almost always formatted as a sequence of fixed-size blocks. “Files” are an abstraction at a much higher level than what actually exists on the media.
Some systems, such as traditional Forth environments, do away with the concept of a filesystem completely and just address the raw storage media at the block level.
Databases often end up doing the same thing: either taking a filesystem and implementing a set of blocks over some files, or (as Oracle can) bypassing the filesystem entirely and just using the raw storage media.
In the Cloud world of virtual machines, where a “filesystem” might just be a single VM file, it starts to look increasingly silly to have layer upon layer of fake filesystems (storage pool, virtual volume, container virtual filesystem). Each layer carefully implements a block system underneath its fake filesystem so that it can deduplicate, and yet the whole stack might, in the end, just be used by a database to implement yet another system of raw blocks.
A major problem at the moment is package management, which is essentially just the safe and secure distribution of sets of read-only files across the Internet.
Network transmissions have to break all “files” into a set of blocks anyway, so there’s less and less that looks particularly primal about the concept of a “file”.
Hashing and cryptography seem to revolve more and more around fixed-size blocks, particularly if we want to make full use of deduplication.
Minimalism and virtual machines are having a moment right now, with the desire to remove no-longer-relevant middleware layers.
The Cloud and Mobile increasingly just don’t provide any hierarchical filesystem-based APIs at all.

THEREFORE:

Why not investigate creating a “Block Operating System”, which would be an extremely minimal OS / language runtime which divides its storage entirely into blocks, no filesystem required? A “Forth but with hashes”.

There would need to be at least two types of blocks: ROM and RAM.

ROM blocks would be hash-addressed (e.g. SHA-256/SHA-512 or similar). A fundamental BOS primitive would be fetching hash-addressed blocks from the network and caching them on the local storage media. A very lightweight addressing layer (doing the job of a virtual memory manager and dynamic-link loader) would map the full hash address of a block to its local media block address.
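To make the idea a bit more concrete, here is a minimal Python sketch of such an addressing layer. Everything in it is an assumption made for illustration: the `RomStore` name, the 1K block size, the choice of SHA-256, zero-padding of short blocks, and the `fetch_remote` callback standing in for whatever network protocol would actually be used.

```python
import hashlib

BLOCK_SIZE = 1024  # assumed; the right block size is an open question


class RomStore:
    """Toy ROM block cache: maps a full SHA-256 hash to a local block address."""

    def __init__(self, fetch_remote):
        self.media = []   # stands in for the raw, block-addressed storage media
        self.index = {}   # full hash -> local media block address
        self.fetch_remote = fetch_remote  # hypothetical network fetch: hash -> bytes

    def _store(self, digest: bytes, block: bytes) -> None:
        self.index[digest] = len(self.media)
        self.media.append(block)

    def put(self, data: bytes) -> bytes:
        """Store data as one fixed-size block and return its hash address."""
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        block = data.ljust(BLOCK_SIZE, b"\0")  # pad to the fixed block size
        digest = hashlib.sha256(block).digest()
        if digest not in self.index:  # identical blocks are stored only once
            self._store(digest, block)
        return digest

    def get(self, digest: bytes) -> bytes:
        """Return the block for a hash, fetching and caching it if missing."""
        if digest not in self.index:
            block = self.fetch_remote(digest)
            # Never trust the network: check size and hash before caching.
            if len(block) != BLOCK_SIZE:
                raise ValueError("fetched block has the wrong size")
            if hashlib.sha256(block).digest() != digest:
                raise ValueError("fetched block does not match its hash")
            self._store(digest, block)
        return self.media[self.index[digest]]
```

Because the address is the hash of the contents, deduplication and integrity checking fall out of the same lookup, which is most of the appeal.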

Some kind of block manifest structure would be defined which would allow blocks to contain locally-defined external block IDs which can be resolved in the manifest to their actual globally-unique hash IDs. This would mean that groups of blocks could be imported and exported from one system to another with close to wire speed, while remaining fully hash-checked.
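One possible shape for such a manifest, sketched in Python. The names (`Manifest`, `export_group`, `import_group`) and the use of small integers as local block IDs are assumptions for illustration only; no actual on-disk format is being proposed here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Maps short local IDs (list positions) to full SHA-256 hash IDs."""
    hashes: list = field(default_factory=list)

    def local_id(self, digest: bytes) -> int:
        """Return the local ID for a hash, adding it to the manifest if new."""
        if digest not in self.hashes:
            self.hashes.append(digest)
        return self.hashes.index(digest)

    def resolve(self, local_id: int) -> bytes:
        """Turn a local ID found inside a block back into the global hash ID."""
        return self.hashes[local_id]


def export_group(blocks):
    """Build a manifest plus the unique blocks, in manifest order."""
    manifest, payload = Manifest(), []
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in manifest.hashes:
            manifest.hashes.append(digest)
            payload.append(block)
    return manifest, payload


def import_group(manifest, payload):
    """Hash-check every incoming block against the manifest before accepting it."""
    if len(payload) != len(manifest.hashes):
        raise ValueError("manifest and payload lengths differ")
    store = {}
    for expected, block in zip(manifest.hashes, payload):
        if hashlib.sha256(block).digest() != expected:
            raise ValueError("block does not match its manifest hash")
        store[expected] = block
    return store
```

The point of the indirection is that a block can refer to another block with a small local ID rather than embedding a full 32-byte hash, while the importer still only has to hash each block once as it streams in.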

RAM blocks would be a bit more problematic but would use something like a randomly generated GUID rather than a hash. They’d use the same kind of block manifest.
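A rough Python sketch of what a RAM block might look like, including one guess at how it could later be "sealed" into a ROM block. The `RamBlock` class, the 1K size, and the `seal` operation are all hypothetical; how sealing should really behave is one of the open questions further down.

```python
import hashlib
import uuid


class RamBlock:
    """Toy mutable block, identified by a random GUID instead of a hash."""

    def __init__(self, size: int = 1024):  # assumed block size
        self.guid = uuid.uuid4()            # random, so it stays stable as contents change
        self.data = bytearray(size)

    def write(self, offset: int, payload: bytes) -> None:
        """Overwrite part of the block in place."""
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write falls outside the block")
        self.data[offset:offset + len(payload)] = payload

    def seal(self):
        """Snapshot the current contents as an immutable, hash-addressed block."""
        frozen = bytes(self.data)
        return hashlib.sha256(frozen).digest(), frozen
```

Two RAM blocks with identical contents keep different GUIDs but seal to the same hash, which is roughly the RAM/ROM distinction in miniature.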

On top of this system would be layered whatever kind of persistent file or object system is required by whatever high-level programming environments we wish to support. But the underlying block primitives would be exposed as the official baseline OS API, as in Forth.

Advantages:

Radically simplifies the baseline needed to get an OS up and running.
Creates a massive unified global address space for shared Internet applications, that isn’t controlled by any one entity.
Application programmers don’t need to be constantly reformatting data as they shift between levels, so everything’s fast and yet secure even on low-power devices.
ROM blocks are very cache-friendly, so we should take advantage of that and prioritize their use.

Downsides and questions:

How do we safely copy data off a block-formatted storage device for transfer and backup? Such devices appear unformatted to most OSes and get flagged as an error condition. It might be necessary to have a filesystem just to let our data be readable in the modern age. In which case we’ve only invented a rather cumbersome binary database or zipfile format.
We have to reinvent hierarchical filesystems and/or databases and/or object-persistence systems, a non-trivial problem. (Although we often end up needing to reinvent these anyway, layer upon layer of them, because the OS-provided abstractions are almost always just slightly wrong for any particular task.)
How would the interface between RAM blocks and ROM blocks work out? Presumably you’d turn a block into ROM as soon as you could, i.e. for the equivalent of “saving to disk”, but you’d also only want to do it when you had filled it up with enough immutable data. And in-RAM objects vary in size. How would that work with objects going out of scope and being deleted? What about blocks that cross object capability boundaries, and also secure deletion of privacy-critical data (such as passwords or keys) if it gets accidentally swept up into an immutable block? There would be transience-vs-persistence semantics needing to be worked out, which are almost but not quite the same as “saving” but might be as annoying.
What’s the correct block size? 1K would be nice (a round number, and it would fit in a single IP packet on a typical 1500-byte Ethernet MTU), but modern hard drives mostly use 4K physical sectors, often while still presenting 512-byte logical sectors.
Routing, publishing and storing hash-addressed data blocks on the Internet probably isn’t nearly as much of a “solved problem” as the IPFS and Dat people like to believe it is.
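On the block-size question, one thing that can at least be put into numbers is the cost of the hashes themselves. The following back-of-envelope Python sketch assumes a 32-byte SHA-256 hash per block and one manifest entry per block; both function names are made up for this illustration.

```python
HASH_BYTES = 32  # size of a SHA-256 digest


def manifest_overhead(block_size: int) -> float:
    """Fraction of extra space spent on one hash per block."""
    return HASH_BYTES / block_size


def blocks_needed(payload_bytes: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to hold a payload (ceiling division)."""
    return -(-payload_bytes // block_size)


for size in (1024, 4096):
    print(size, f"{manifest_overhead(size):.2%}", blocks_needed(1_000_000, size))
# prints:
# 1024 3.12% 977
# 4096 0.78% 245
```

So under these assumptions, 1K blocks spend about four times as much space on hashes as 4K blocks, and need about four times as many lookups for the same payload. Whether that matters next to the finer-grained deduplication is exactly the kind of thing that would need measuring.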