milliForth: Forth in 380 bytes (on x86)

natecull · 8 November 2023 00:10

Tool by fuzzballcat (2023)

A FORTH in 336 bytes — the smallest real programming language ever, as of yet.

milliForth is a new contestant in the “implement a real-ish programming language in an 8086 DOS boot sector” contest, narrowly beating Justine Tunney’s “SectorLisp”.

Additional Resources

Hacker News discussion

Metadata

suggesters: natecull
curators: natecull, akkartik

It starts with 13 builtin words: colon definitions (including compile/immediate mode), memory read/write, stack and return-stack pointers, a machine state pointer, binary NAND, numeric addition, “is not equal”, and character I/O.

From there (without even literals, which is a new twist for me), the Forth file bootstraps numbers, the Forth stack operators, control words, and finally gets to printing out a character string.
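To make the “no literals” trick concrete, here is a rough Python sketch of how constants and the usual bitwise and arithmetic operations can be derived from nothing but NAND and addition. This is not milliForth’s source: the 16-bit cell width, the function names, and the arbitrary seed value are all assumptions made for illustration.

```python
MASK = 0xFFFF  # assumption: 16-bit cells, as on an 8086


def nand(a, b):
    """Bitwise NAND on one cell."""
    return ~(a & b) & MASK


def add(a, b):
    """Addition that wraps around at the cell width."""
    return (a + b) & MASK


def invert(x):
    # x NAND x is NOT x
    return nand(x, x)


def minus_one(x):
    # x NAND (NOT x) is all ones, whatever x happens to be
    return nand(x, invert(x))


seed = 0x1234  # hypothetical: any value already lying on the stack will do
M1 = minus_one(seed)        # 0xFFFF, i.e. -1
ZERO = invert(M1)           # NOT -1 is 0
ONE = invert(add(M1, M1))   # -1 + -1 is -2, and NOT -2 is 1
TWO = add(ONE, ONE)


def negate(x):
    # two's complement: NOT x, plus 1
    return add(invert(x), ONE)


def sub(a, b):
    return add(a, negate(b))


def and_(a, b):
    return invert(nand(a, b))


def or_(a, b):
    # De Morgan: a OR b is (NOT a) NAND (NOT b)
    return nand(invert(a), invert(b))
```

The point is only that once -1, 0 and 1 exist, every other number and the rest of the bitwise operators follow from the same two primitives.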

While the set of builtins has been driven by minimalism rather than practicality, there’s still something really nice about seeing a Forth where even words like “dup” are defined in Forth itself. Almost everything you want to know, you can read in the language source, and if not, well, 380 bytes of assembly language isn’t that hard to study.

Of course, everything being based on raw memory read/write means Forth isn’t great for provability. It would be nice to find something that bootstraps as cleanly as this but also has capability-like security properties.

akkartik · 8 November 2023 01:10

@natecull Have you seen my paper on Mu? Not quite capability-like security but maybe halfway there. And it fits in KB not bytes.

natecull · 8 November 2023 02:35

Interesting! I guess SubX is a little like UXN in its way - a raw opcode assembler that still tries to be a little human-friendly - except for a physically-existing x86 CPU rather than a fantasy console.

Since existing assembler toolchains have far too many dependencies, I agree that writing your own assembler from scratch is a useful idea IF directly emitting x86 machine code is something you want to do.

But what I’m looking for at the moment is more along the lines of a tiny interpreter / virtual machine that I can know right from the beginning has the security properties I’m interested in. Then, I could use that VM to run tiny programs sent to me over the Internet - to calculate logical expressions or database-query-like statements - and be reasonably confident that I couldn’t get my hard disk wiped or infected by any possible combination of bytes sent to me. I want the language to be as tiny as possible, and I want the VM to be small enough to implement in an afternoon, so it has a chance of existing, but I’m willing to let the VM do both a lot more, and a lot less, than an x86 CPU.

One of those required security properties would be a hard separation between RAM addresses and integers, so that addresses can be used as capabilities. I think cons cells give us that (almost as an afterthought), and not many other mechanisms do.

But cons cells usually come with the entire garbage collection gorilla and jungle. I’m wondering if a cons cell heap allocator could possibly be combined with a Forthlike stack machine that still punts garbage collection back to the programmer. Haven’t got much further than that wondering, for now.
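As a very rough sketch of what that combination might look like, here is a toy stack machine in Python where cons-cell references are opaque values: the instruction set offers no way to turn an integer into a reference, and cells are released by an explicit `free` rather than a garbage collector. Every name here (`Ref`, `VM`, `lit`, `free` and so on) is made up for illustration, and the isolation only holds at the level of the VM’s own operations, not against the Python host.

```python
class Ref:
    """Opaque handle to a cons cell. No VM operation builds one from an integer."""
    __slots__ = ("cell",)

    def __init__(self, cell):
        self.cell = cell


class VM:
    def __init__(self):
        self.stack = []

    def _pop(self, kind):
        value = self.stack.pop()
        if type(value) is not kind:
            raise TypeError(f"expected {kind.__name__}, got {type(value).__name__}")
        return value

    def _live_cell(self):
        cell = self._pop(Ref).cell
        if not cell:
            raise RuntimeError("use after free")
        return cell

    def lit(self, n):
        self.stack.append(int(n))

    def dup(self):
        self.stack.append(self.stack[-1])

    def add(self):
        b = self._pop(int)
        a = self._pop(int)   # a Ref here is rejected, so no pointer arithmetic
        self.stack.append(a + b)

    def cons(self):
        cdr = self.stack.pop()
        car = self.stack.pop()
        self.stack.append(Ref([car, cdr]))

    def car(self):
        self.stack.append(self._live_cell()[0])

    def cdr(self):
        self.stack.append(self._live_cell()[1])

    def free(self):
        # Manual release: empty the cell so every alias sees it as dead.
        self._pop(Ref).cell.clear()


vm = VM()
vm.lit(1); vm.lit(2); vm.cons()   # stack: [Ref(1 . 2)]
vm.dup(); vm.car()                # stack: [Ref, 1]
vm.lit(5); vm.add()               # stack: [Ref, 6]
```

Trying `vm.add()` again at this point raises `TypeError`, because one operand is a `Ref`; that refusal is the whole of the address/integer separation in this sketch. Reading a cell after `free` raises an error instead of returning stale data, which is one possible policy, not the only one.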

akkartik · 8 November 2023 02:49

SubX is not what you want, but Mu in the second half has many of your properties. Pointers can be converted to integers but not vice versa. It’s klunky still, but I could imagine downloading a source script and then deciding to give it access to the disk or not when running it. I never got around to plumbing the network syscalls in.