Small Virtual Machines

Since @natecull is looking into brutally small targets for compilation, I thought it might be fun to have a dedicated topic. I think this is relevant to Malleable Systems because understanding the reduction of a complex program to a set of small primitives is part of understanding the foundation of the computation necessary.

This is something that I’ve spent quite some time into last year, and I’ve found a couple of useful links:

Philip J. Koopman’s Stack Computers: the new wave goes over the various stack machine designs at the time. The NOVIX is especially of interest. It makes use of bit in the opcode to encode the various behavior of a stack machine, this is also what the J1 did.

Kragen’s Archival with a universal virtual computer talks about using small move machines for data preservation.

  • Fractran, Conway’s OISC register machine using prime encoding. where the only opcode is MUL, and each cells are a fraction.
  • Subleq, OISC that subtracts the contents at address a from the contents at address b, stores the result at address b, and then, if the result is not positive, jumps to address c.
  • Brainfuck, 9 opcodes turing machine.
  • Chifir, register machine, 15 opcodes, designed to host smalltalk-72
  • Uxn, concat stack machine, 32 opcodes, local thread.
  • CHIP-8, an education register machine, 36 opcodes, graphics.
  • SECD Machine, abstract machine for functional programming, 4 stacks, 10 opcodes, local thread.
  • Warren Machine, abstract machine for logic programming.

The scheme we described in this essay is but one proposal, and is in no way, shape, or form a complete and definitive technical solution to the challenging and underexplored problem of ensuring the future of our digital heritage.


Awesome, thanks! And reposting your one from the other thread:

“A sketch of a minimalist bytecode for object-oriented languages” Kragen Javier Sitaker, 2017-03-20 (updated 2017-06-20)

You really only need two stack-bytecode operations for the object memory:

NEW ( class [ val0 val1 ... -- ref )
AT ( ref idx -- val )

Here [ is a PostScript-style stack mark for variadic function invocation.

AT fetches a field value, and NEW creates an object containing the given fields.

Burroughs B5000

Alan Kay talks a lot about the Burroughs B5000 being an inspiration to Smalltalk because it had “tagged memory”. Ie, machine words tagged as being an address or number, so that as in Lisp conses, programs can’t spoof addresses and they can be used as capabilities (security boundaries).

This seems like a very important property for a VM to have, but it might not be completely compatible with being a minimalist 8-bit inspired VM. But a “VM on top of a VM” I’m considering could implement memory safety semantics like this.

Obviously a Lisp could do this (as long as it didn’t expose an equivalent of PEEK and POKE). CAR and CDR are very effective memory-safety mechanisms, though you potentially lose half of your RAM which on a 64K machine might be a problem. But, the upside is that CONSes are very regular in size, so only need one memory pool, and that might work very well with paged RAM.


Tiny BASIC in 1975 was two nested VMs, which made it very small but also made it very slow. It was a mother of a thing to try to understand from the printout! Blew my mind as a kid and still does today.

The Design Note specified a virtual machine, in which the Tiny BASIC interpreter is itself run on a virtual machine interpreter. The designer’s idea to use an application virtual machine goes back to Val Schorre (with META II, 1964) and Glennie (Syntax Machine). The choice of a virtual machine approach economized on memory space and implementation effort, although the BASIC programs run thereon were executed somewhat slowly.[15]

META II is the first documented version of a metacompiler,[notes 1] as it compiles to machine code for one of the earliest instances of a virtual machine.

The paper itself is a wonderful gem which includes a number of excellent examples, including the bootstrapping of Meta II in itself (all this was done on an 8K (six bit byte) 1401!)."—Alan Kay


PicoLisp is a very interesting small VM based on the “absolutely everything is a CONS” concept. It’s still a bit more heavyweight than Uxn, and the current version is based on 64-bit words, meaning a minimum CONS cell is 16 bytes long. I had some concerns about security years ago due to how it used to handle built-ins (ie direct RAM pointer subroutine jumps). Meaning it was absolutely not suitable for game engines running downloadable code, but maybe this has improved now. It also I think didn’t have any mechanism for hiding the contents of a lambda at all: a lambda was literally just a list and given a reference to one, any code could unpick it and learn all its secrets, so a closure could not be a security boundary. But then it didn’t really have closures either; it’s a dynamically scoped Lisp in the style of EMACS Lisp. Which is probably fine but it is certainly a very non-mainstream design in these post-Scheme years.

Yes, this exactly! This is why I’ve been interested in tiny VMs for many years. Not necessarily because I need to have one to solve immediate programming needs, but because the systems I’m using feel far too complex and I need to understand how to make them simpler.

I’ve struggled many times in online programming forums to try to explain why I’m trying to do something difficult and vague, and that I’m not looking for a prebuilt chunk of 25 gigabytes of code to drop in, Solve My Client’s Problem, Ship A Product, and move on. Rather I’m looking for understanding, and building a machine from bytes is just a means to that end.

I’ve struggled for years using OO languages that just seem to be powered by “magic” at some crucial point, and I want to work through the rules to make the magic make sense. The problem is that since the dawn of Smalltalk, all of the wizards have been fighting each other over what magic even is. Then the renegade Haskell sorcerers waded in and now immutable reactive polymorphic monads are flying everywhere yet our spreadsheets still keep crashing.

Sometimes making the machine simpler makes the problem bigger (ie there is irreducible complexity and you’re just squeezing it elsewhere). But sometimes if you reduce the space to its primitives you get lucky and you reduce the complexity of both your problem and the implementation, and when you do, you win big.

And if nothing else, you learn something.


Playing with miniSqueak (SqueakJS) the thing that strikes me the most is that Smalltalk method notation is so very nearly almost stack machine notation, and yet not quite. But almost!

If we had ordinary Forth-like words but made them force a lookup against an object on the top of the stack, that probably gets us 95% of the way to native Smalltalk in a stack machine. Wondering about prefix runes (maybe a . or : ) on a word to indicate that they indirect through an object on the first or second place of the stack, to account for parameters?

Ie instead of

Pen new mandala: 35

we’d have maybe

.Pen .new 35 :mandala

If a single selector required multiple parameters, then just use multiple colons between words:

.Point .new 1 2 :x:y 

Then we could mix these method selectors with non-OOP words, that don’t have any rune, and it’s always clear what’s going on. Everything on the working stack at every point is data and all stack effects are caused directly by the selectors themselves.

Not sure but it’s worth thinking about. It’s not great in terms of human factors, but neither is Smaltalk notation for me, and this at least puts as thin a barrier as possible between the programmer and the machine. It’s in the spirit of Forth. Possibly, it’s also in the Brutalist spirit of “truth to materials”. A stack machine isn’t exactly a truth but I like to have as clear a picture as I can in my mind of what the machine is actually doing, and not what the compiler is pretending that the machine is doing.

Selectors would literally just be words. A . selector would be something like “self-address SEND” where SEND is “>R DUP GET-CLASS <R SWAP CALL” and a single-param : selector would be “SWAP self-address SEND”. Two parameters and up and the selector would do ROT or nastier juggling instead of SWAP, to bring the buried object to the top of the stack. Sorry about that. Maybe take it as a hint to avoid multi-parameter methods if you can. But you’re at least avoiding have to store the parameters in main memory.

The class is a word taking the selector and object and parameters as an argument and then using those to figure out how it wants to dispatch. Simplest would just be a set of comparisons with the most common selector first. Next simplest, and probably best, an array of selectors/methods to iterate through. Finally a hash table if you must but maybe don’t.

. and : as runes don’t play nicely with Unxtal, sadly.

Someone’s probably already done OOP like this in Forth several times over in the last 30 years, right? I’ve probably read it and forgotten it. If there’s anything I can claim as my own it might be the prefix rune syntax but that’s also probably something someone else has done.

1 Like

This is very interesting. I’ve been looking into mapping Smalltalk to Forth somehow but haven’t seen anything that fits as well as this before.

1 Like

Neat! I hope it’s helpful! It might not be completely done yet.

I’m wondering what words would work for doing parentheses. It would make sense that ( should push the current context to the stack, but ) doesn’t seem like it has anything to do. It might be important to have it even as a no-op just to keep visual sanity, but maybe it might even do the stack shuffling.

Say we have

.foo .bar ( .foo .baz ) :boz

Maybe the ( can first stash the top of stack to the return stack, then introduce the current context to start a new evaluation. Then ) can unstash the old object back. Now :boz can directly send itself to the object, it doesn’t need to shuffle.

.foo .bar1 ( .foo .bar2 ) ( .foo .bar3 ) :boz2:boz3
.Pen .new ( 35 ) :mandala
.Point .new ( 1 ) ( 2 ) :x:y

I feel like maybe that works? Partially. It means that to make the selectors always get the object at top of stack we would always need to use parentheses for parameters, which is a hit to conciseness. But if it improves readability as well as performance, that might be a win. It would be a bigger win if everything, including numbers (except for punctuation words like parens), were selectors, and that might be the case if this wasn’t an actual Forth but a minimal stack query language.

I’m also assuming that everything on the stack is an object or can be safely interpreted as one, even numbers. So if we accidentally send a number as a selector (and maybe numbers are valid selectors) or send to a number, it won’t break. Worst might be we end up with more or less on the stack than we expected.

Did you look at GitHub - chicoary/small-forth: Forth interpreter in Pharo Smalltalk (Forth implemented in Pharo)?

1 Like

I’d say one of the big issues with today’s tech is the wizardry spirit. Writing software to dominate a virtual world of one’s own creation. Then, by extension, dominate the users of your software. None of that need be conscious. But you see this spirit everywhere, in particular around Lisp-family languages and functional programming.

1 Like

Well, I’m going the other way; from Forth to Smalltalk. It’s not that I already have a Smalltalk and need a Forth; it’s that I want to get Smalltalk-like calling semantics inside a much smaller Forth-like thing. (Almost certainly not an actual ANSI Forth, which is very large, and certainly has OOP libraries available for it.)


Uxntal and other concatenative languages can pass lambdas on the stack, when the items being passed are too complex, in Uxntal, or another language like it, it would look like:

{ ( size ) 02 ( fn ) =brush ( color ) "blue $1 } STH2r do-draw

This is not quite a clojure but it encapsulates arguments into a single short. Factor and Joy work in a similar way, the function then unquotes the lambda, or reads the address as a pointer to the arguments. I’ve loved that way of passing large amount of data on the stack and limit the juggling.

1 Like

Yep I love lambdas/blocks/quotations, they were the most interesting thing about Retro Forth to me so I’m really happy that Uxntal now has them! And being able to use them as general data tuples makes them even more helpful. Btw, is there a description somewhere of exactly what they assemble into?

Eg } always assembles as a JMP2r I’m guessing?

Then { is a JSI to }+1, right? Leaving { address on the return stack so you fetch it with STH2r. I feel like it would be nicer if } after { automatically added the STH2r for you though; when are you ever going to define a lambda specifically to reference as data, yet not want it to be on the data stack after defining?

?{ must be a JCI ? I’m guessing also followed by a JMP to }+1 if the JCI fails?

While !{ is presumably just… nothing, since you’re falling straight through into the lambda?

Yup! you got it right :slight_smile: Except that !{ jumps over, and } doesn’t assembles to anything, it’s simply a label.

Yup! you got it right :slight_smile:

Thanks but I think I missed some things.

} does not assemble a JMP2r, it emits nothing.

There is a single JSI before a {, a single JCI before a ?{, and a single JMI before a !{. All immediate jumps are relative addressed and go past the lambda. The use case of !{ is therefore NOT for immediately executing a lambda, but for completely ignoring it. Also zero on the stack will execute a lambda conditional, nonzero will not execute it. So the lambda represents the “false” rather than “true” branch, the opposite sense to how it would normally appear in other languages.

It might be helpful to document this. I only figured this out with this code, which outputs 12357 and leaves 0103 on the return stack.

{ #30 prn } #31 prn
#00 ?{ #32 prn } #33 prn
#01 ?{ #34 prn } #35 prn
!{ #36 prn }  #37 prn
@prn #18 DEO JMP2r

I was running Uxn32 and while it has Beetbug, it’s not very helpful with relative addresses - every call to prn has different hex. This is certainly true to the circa 1980 6502 assembly debugging experience, but not actually as much fun as it sounds.

1 Like

I’ll improve the docs with your example :slight_smile:

1 Like

On the topic of long lasting virtual machine design, I was looking into versioning toward a frozen specification, for example Uxn hasn’t changed in a long time, varvara still changes occasionally, so how do I communicate through versioning, how close I am approaching a frozen state.

This is from [redacted], but I love the idea, so I’ll share it here, I think it might be interesting to others:

Kelvin versioning uses integers in degrees Kelvin, counting down toward a final specification, upon reaching absolute zero, it is frozen. Further updates are no longer possible. It contrasts with how typical software is designed to indefinitely increase in scope, and complexity.


The way I think about versioning these days is it’s silly to try to predict the future. I want to indicate my desire to not have more versions, without precluding the possibility of further versions for some important enough change like a security issue.

The best way I’ve come up with is a “___ days since last version” sign.


I’d go for no version numbers at all (as you do). Commit hashes are sufficient as identifiers nowadays. That leaves version numbers as labels of importance assigned by the developers, which is not so useful to users because their view of importance can be very different. For users, it’s much more interesting to know the developers’ intentions for the future than the developers’ view of the past.


I can’t edit the post, but I thought I’d add the following fun entries:

  • ilo, a 200 lines vm capable of driving RetroForth.
  • dis, the vm behind the Inferno operating system. Small-ish.
  • dreamcast vmu, the implementation code of the vmu runtime is tiny and could probably be used as a runtime for very specific things.
1 Like