Bootstrapable Software

neauoire · 10 January 2024 19:48

Software is bootstrappable when it does not depend on a binary seed that cannot be built from source. Software that is not bootstrappable—even if it is free software—is a serious security risk for a variety of reasons. The Bootstrappable Builds project aims to reduce the number and size of binary seeds to a bare minimum.

I think this is Malleable adjacent because software because of the two first principles of malleable software:

Software must be as easy to change as it is to use it
All layers, from the user interface through functionality to the data within, must support arbitrary recombination and reuse in new environments

GNU Mes is closely related to the Bootstrappable Builds project. Mes aims to create an entirely source-based bootstrapping path for the Guix System and other interested GNU/Linux distributions. The goal is to start from a minimal, easily inspectable binary (which should be readable as source) and bootstrap into something close to R6RS Scheme.

Currently, Mes consists of a mutual self-hosting scheme interpreter and C compiler. It also implements a C library. Mes, the scheme interpreter, is written in about 5,000 lines of code of simple C. MesCC, the C compiler, is written in scheme. Together, Mes and MesCC can compile a lightly patched TinyCC that is self-hosting. Using this TinyCC and the Mes C library, it is possible to bootstrap the entire Guix System for i686-linux and x86_64-linux.

akkartik · 10 January 2024 21:28

Thanks for the reminder, I forgot to mention in Mu’s project page that it is also deeply concerned with being bootstrappable. It’s much easier to see that Mu is bootstrappable: Compare the dependency diagram of Mu (figure 1) with that of MesCC. Of course, MessCC lets you program in C, where Mu requires you to program in my strange programming language A language designed to be easy to bootstrap can really streamline things here.

khinsen · 11 January 2024 09:29

What would be the characteristics of such a language, other than (probably) being small?

akkartik · 11 January 2024 10:03

It’s hard to put into words, but I was basically designing it from day 1 with an eye towards implementing it in machine code. Some things that come to mind:

Like Lisp, a very regular syntax that is easy to parse. But there’s no GC.
Like Forth, very transparent to the underlying machine. But there’s a type-checker (and it’s much more static than in Lisp).

khinsen · 11 January 2024 10:22

Sounds like SectorLISP or sectorForth, except that probably neither has a type checker.

natecull · 17 January 2024 02:54

Like Forth, very transparent to the underlying machine. But there’s a type-checker (and it’s much more static than in Lisp).

I wonder if the “typed stack” concept in Stephan Becher’s “StrongForth” would be useful here? It seems like it could be quite easy to add to a very low-level stack machine, and then suddenly you have a lot more safety and also polymorphism for free. Just not fully live runtime modification as in Smalltalk - but a slightly more Haskelly Forth.

I’m tempted to play with this in Uxn because it looks cheap-ish (ok it might take 2K of data storage) and taming Uxn’s 8 vs 16bit words might be a very easy win for this sort of type system.

https://www.stephan-becher.de/strongforth/

akkartik · 17 January 2024 04:09

I’ve bounced off the StrongForth docs multiple times. I can never follow how to define a new type. Does anyone know of a good tutorial for it?

neauoire · 17 January 2024 04:14

I’ve implemented something like StrongForth’s arity checking in Uxntal, it’s called Uxnbal(source). It does something similar to how Factor handles function types by using the forth notation to establish a function’s type, you might find it more approachable than the strongforth docs.

Docs

What are stack effects? In our case it is the ( n -- n! ). Stack effects are how you document the inputs from the stack and outputs to the stack for your word. You can use any identifier to name the stack elements, here we use n. Factor will perform a consistency check that the number of inputs and outputs you specify agrees with what the body does.

If you try to write

: fact ( m n -- n! ) [1,b] 1 [ * ] reduce ;

Factor will signal an error that the 2 inputs (m and n) are not consistent with the body of the word. To restore the previous correct definition press Ctrl+P two times to get back to the previous input and then enter it.

We can think at the stack effects in definitions both as a documentation tool and as a very simple type system, which nevertheless does catch a few errors.

In any case, you have succesfully defined your first word: if you write 10 fact in the listener you can prove it.

Notice that the 1 [ * ] reduce part of the definition sort of makes sense on its own, being the product of a sequence. The nice thing about a concatenative language is that we can just factor this part out and write

: prod ( {x1,...,xn} -- x1*...*xn ) 1 [ * ] reduce ;
: fact ( n -- n! ) [1,b] prod ;

Our definitions have become simpler and there was no need to pass parameters, rename local variables, or do anything else that would have been necessary to refactor our function in most languages.

Of course, Factor already has a word for the factorial (actually there is a whole math.factorials vocabulary, including many variants of the usual factorial) and a word for the product (product in the sequences vocabulary), but as it often happens introductory examples overlap with the standard library.

natecull · 17 January 2024 04:55

Yep, StrongForth types seem to be stack effects, but with type markers in each stack location. The SF type primitives seem to be something like “word or doubleword” and “RAM or ROM” (not relevant to Uxn) and also “number or address”. So each stack cell gets marked as eg “address of number” or “address of addrsss of number”. Then there’s arrays and records and typecasts I think, so a bit like a C type system. Then, words definitions must match their defined stack effects to the type markers in the stack, and the first word of that name that matches gets executed.

The SF syntax/semantics for defining types seems weird yes, and might be better off being rebuilt from scratch. What seems sensible to me is that we’d have a record type, then define new types as records of smaller types, then define stack effects as four types (the before and after for esch of working and return stacks)

sph · 13 July 2024 07:35

A lot of work by Prof. Ian Piumarta has been about language than can bootstrap themselves and free themselves from the shackles of their implementation.

See Maru, a small Lisp written in C, which executes a reimplementation of itself bundled with a x86 code generator, so it becomes fully freestanding.

And COLA, which is an exploration of this notion pushed to the extreme, for the creation of self-describing and self-compiling languages.

Kirtai · 14 July 2024 16:03

His Id is also quite interesting. It’s the object side of COLA.