erishell: replacing command-line flags and configuration with a DSL

I recently rewrote a utility to remove all command-line flags and configuration options and replace them with a concatenative language, and I think this qualifies as a change that increases malleability. I’ll tell a bit of a story and then give some examples.

I maintain the Go reference implementation of the Encoding for Robust Immutable Storage (ERIS). This is basically a minimalist content-addressed storage system with a better-than-nothing privacy and deniability mechanism.

Someone else wrote the initial implementation and I forked it to fill out the library with features that I needed for my own use-cases. In order to use the library, I needed a utility that exposed its features, so an ad-hoc eris-go utility was created. As more standards that layer on ERIS were drafted, I would add those features to the library and then add a Go-style sub-command to the eris-go utility.

Everyone who used eris-go complained about the tedious command-line options. I was fed up with using the flags, so I added a JSON config format, because I use Nix configuration modules and we like to write our configs in Nix and then reduce them to JSON.

The config file was only a minor improvement. The command-line flags shared a common model with the configuration files, so any weaknesses in the model were endemic to both.

I was tasked with adding support for serving static websites as virtual-host mappings to ERIS-FS snapshots. Fitting this into the JSON config file seemed stupid so I gave up on it completely.

I replaced the sub-command-style eris-go tool with an erishell utility that uses a concatenative language that can be specified entirely on the command line. The documentation of the language is here.

This is going to be hard to read and is very specific to the application, but try to follow the concatenative mechanism.

A command to decode a stream by fetching blocks using the CoAP protocol and then piping into mpv would be written like this:

$ eris-go get \
    -store coap://example.org urn:eris:B4A36J5... \
    | mpv -

It is now written like this:

erishell \
  null- coap- example.org coap-peer- \
  urn:eris:B4A36J5... decode- \
  1 fd- swap- copy \
  | mpv -

I flipped the command-line flags to take arguments from the left instead of the right. Arguments that are not recognized as “programs” are pushed onto a stack. Arguments recognized as programs take the stack as input and output a new stack. In the previous example the words urn:eris:B4A36J5... decode- are evaluated to a readable stream, 1 fd- is a stream over stdout, and swap- copy- swaps the top two items on the stack and then copies the stream at the top of the stack into the stream below it.
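To make the mechanism concrete, here is a toy sketch in Go of that evaluation loop (hypothetical words and behaviour, not the actual erishell code):

package main

import (
    "fmt"
    "os"
)

// A "program" takes the current stack and returns a new stack.
type program func(stack []string) []string

// A toy dictionary. Real erishell words like decode- or fd- would
// live here; by convention the names end in a dash.
var words = map[string]program{
    "swap-": func(s []string) []string {
        s[len(s)-1], s[len(s)-2] = s[len(s)-2], s[len(s)-1]
        return s
    },
    "drop-": func(s []string) []string {
        return s[:len(s)-1]
    },
}

func main() {
    var stack []string
    for _, arg := range os.Args[1:] {
        if p, ok := words[arg]; ok {
            stack = p(stack) // a recognized program transforms the stack
        } else {
            stack = append(stack, arg) // everything else is data
        }
    }
    fmt.Println(stack) // no underflow checks in this toy
}

Running it as toy a b swap- would print [b a].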

Obviously this is a convoluted way to write to stdout, but the justification is in composing a storage hierarchy in arbitrary ways, which I don’t think can be done reasonably with a configuration file.

To specify fetching blocks over CoAP, caching them in a git-style xx/xx/xxxx… file and directory hierarchy, caching that in memory, and serving a static website would be done something like this:

erishell \
  null- coap- eris.example.org:5683 coap-peer- \
  /tmp/eris get- put- or- dirstore- \
  cache- \
  8 memory- \
  cache- \
  null- http- \
  urn:eris:B4A6A… vhost.example.org/ http-bind-fs- \
  0.0.0.0:80 http-listen- \
  wait-

Difficult to read, but I think a configuration file with that level of specificity would be worse to reason about.

In the snippet get- put- or- I am pushing the predefined symbol sets [ get ] and [ put ] and then or’ing them to the union set [ get put ]. This becomes the set of options to enable for the dirstore- storage backend, which is to permit read and write operations. In null- coap- and null- http- I am instantiating stateful protocol objects with the null set [ ] because I want to be able to define options later. Defining options as members of sets seems to be a forward-compatible way of specifying boolean option switches. The 0.0.0.0:80 http-listen- program makes a stateful modification to the server object on the stack created with http-.
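The set mechanics are small enough to sketch. A minimal toy in Go of the semantics as described above (the representation is mine, not erishell’s):

package main

import "fmt"

// An option is just membership in a set, so new options can be
// added later without breaking old invocations.
type optset map[string]bool

// union is what an or- word would do with two sets on the stack,
// e.g. [ get ] or [ put ] -> [ get put ].
func union(a, b optset) optset {
    out := optset{}
    for k := range a {
        out[k] = true
    }
    for k := range b {
        out[k] = true
    }
    return out
}

func main() {
    opts := union(optset{"get": true}, optset{"put": true})
    fmt.Println(opts["get"], opts["put"], opts["halt"]) // true true false
}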

More examples here.

The “user experience” is unpleasant but I discount experience for outcome.

If I was inspired by anything it would be my transcendental experience of replacing a GUI radio automation system with a Liquidsoap script.

I also recently learned Tcl properly, and from that I borrowed the model that everything is a string or a collection of strings.

6 Likes

I’d say command line flags and configuration files are DSLs as well, so the question reduces to “what is the most convenient DSL for this application”. Which, for lack of familiarity with the application, I cannot answer. But a concatenative language implemented on the command line is definitely an interesting addition to DSL implementation techniques!

3 Likes

flipped the command-line flags to take arguments from the left instead of the right. Arguments that are not recognized as “programs” are pushed onto a stack. Arguments recognized as programs take the stack as input and output a new stack

It’s a unique approach to command-line arguments that reminds me of Forth, Unix pipes, even machine code instructions or Polish notation.

Applicative languages commonly have an issue with a seemingly “backward” flow of data:

baz(bar(foo(x)))

Where evaluation starts from x as an argument, processed in turn by the functions in reverse order. Expressed concatenatively:

x foo bar baz

I see that’s how erishell’s command-line mini-language works, except functions are identified by having the postfix - (dash).

It’s similar to pipeline operators like |> in F#, or -> in Clojure, also called a threading macro.

Threading macros, also known as arrow macros, convert nested function calls into a linear flow of function calls, improving readability.

Compared to how shell commands typically expect options as key/value pairs, --key value or -k v, erishell’s approach might feel foreign to users. But on closer study, it’s actually a perfect fit with Unix pipes, where data is processed through a chain of commands: erishell can sit in a pipe, with its DSL also piping data through functions, then on to another pipe or output.

Does Eris have a REPL? Occasionally I’m frustrated with existing REPLs in various languages, where translating my intention into code can get un-ergonomic as I often have to “backtrack” to add opening parentheses or functions that receive data, but what I really want is to pipe it through to the next processor. Now I understand a concatenative language or operator can solve it more intuitively.

2 Likes

It does have a REPL. The prompt for the REPL shows the contents of the stack so it’s somewhat intuitive to work with, and everything is in the same form as the command line.

The project was half developed on 9front, so the REPL was implemented first as the /srv/eris.cmd channel. Modern Plan 9 filesystem servers drop two pipes into /srv, one for the 9P protocol and the other for commands, and during shutdown there is a script that does echo halt >> /srv/*.cmd to the servers. The REPL is on the command channel, with a halt program to shut itself down.

I would like to reuse the REPL once more for IPC on Linux, but I haven’t figured out how to create the pipes, or how to safely share the working stack between clients with different intentions.
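For the record, one shape this might take on Linux (a sketch only; the socket path and the echo behaviour are made up, and this is not how erishell does it) is a line-oriented Unix domain socket:

package main

import (
    "bufio"
    "fmt"
    "log"
    "net"
    "os"
)

func main() {
    // Roughly the role /srv/eris.cmd plays on Plan 9, but as a
    // Unix domain socket; the path is invented for the sketch.
    const path = "/tmp/eris.cmd"
    os.Remove(path)
    l, err := net.Listen("unix", path)
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := l.Accept()
        if err != nil {
            log.Fatal(err)
        }
        go func(c net.Conn) {
            defer c.Close()
            in := bufio.NewScanner(c)
            for in.Scan() {
                // A real server would hand the line to the
                // interpreter; echoing keeps the sketch small.
                fmt.Fprintf(c, "ok: %s\n", in.Text())
            }
        }(conn)
    }
}

Giving each connection its own stack would sidestep the shared-stack question, at the cost of clients not being able to cooperate on one pipeline.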

1 Like

The postfix-dash convention (arg flag-) is brilliant in helping to grasp the flipped structure :clap:, yet it’s still unusual.

Thinking aloud here… the Uiua language has an interesting idea that reduces the weirdness for the majority used to prefix style:

Code runs from right to left, top to bottom

So they write ÷ [2 3] [6 9] but execute it in right-to-left order: “push [6 9]; push [2 3]; divide” (the result being [3 3]).
However, that requires parsing the whole program before execution, which conflicts with adding a REPL, and that’s where their second twist comes in:
Multiple lines do execute top-to-bottom, so this:

[2 3] [6 9]
÷

or even

[6 9]
[
  2
  3
]
÷

are equivalent!

So perhaps that way you could have a familiar-looking -combinator -flag arg -flag arg order, with full stack + REPL power?
Admittedly, the REPL order is still weird; it violates the “enter ~ space” axiom of regular text wrapping! I haven’t really played with Uiua, so I don’t know yet how that feels… But it kinda “postpones” the weirdness: one needs to grok it only when one graduates from the CLI to the REPL.

  • eliot makes good points: the natural order may depend on your domain. Unix CLI conventions are already mixed-order in a way! E.g. jq syntax deliberately adopted pipe operators so that jq 'foo | bar' would be equivalent to jq 'foo' | jq 'bar'. It’s hard for me to say where prefix/postfix fits your needs…

  • IIUC you picked concatenative not so much for programming (e.g. do you define any shorthand aliases for common chunks of options?) as for an easy way to implement a DSL that can express hierarchical structure?
    I must say it’s unclear to me why a flat syntax is better here than, say, JSON. Or almost any DSL syntax with parens of some kind; it feels like nesting would make most examples here clearer to read…
    And/or the whole thing would be clearer with diagrams.

P.S. if you haven’t seen “storage combinators”, e.g. metablog: Native-GUI distributed system in a tweet, that might interest you; e.g. cache- is in a similar spirit.

[P.S. using execline in one example made the doc even more obscure, but I’m glad to discover it — it’s really similar to ideas I hoped to explore :folded_hands:]

3 Likes

That is lovely. How tight the code and mysterious symbols.


code runs from right to left .. that requires parsing the whole program before execution

That’s interesting; the problem sounds similar to my having to “backtrack” while writing a linear program on a REPL, to apply a new function to a value or wrap an expression with parentheses. (I realize the latter is typically solved by a Lisp REPL with a keyboard shortcut.)

So I see there’s a category of syntaxes with the advantage of being interpretable as a stream from left to right as the words (symbols, values, operators) come in, compared to other syntaxes that require buffering and backtracking. Is it only concatenative syntax like Forth, or might there be other variations with the same property?

A somewhat pedestrian example that comes to mind: there are JSON syntax variants for streaming, like JSON Lines (JSONL, also known as NDJSON), where each line transmitted is a valid JSON value. If I recall, an array would be sent as a stream of values so the parser doesn’t have to buffer them while waiting to receive the closing bracket. Similarly with objects. Not exactly sure how that works, like:

1
2
3
{"key":"value"}
{"key2":"value2"}

One could probably come up with a syntax to support streaming the key/value pairs of an object, or values at deeper nested levels, as separate lines. And the receiving end would “reassemble” the object.
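Actually, Go’s encoding/json can already consume input this way: a json.Decoder reads one value at a time from a stream, so line-delimited or concatenated values arrive incrementally. A small sketch:

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "strings"
)

func main() {
    // The same stream as above: bare values and objects, one per line.
    input := "1\n2\n3\n{\"key\":\"value\"}\n{\"key2\":\"value2\"}\n"
    dec := json.NewDecoder(strings.NewReader(input))
    for {
        var v interface{}
        err := dec.Decode(&v) // reads exactly one JSON value
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }
        fmt.Printf("%T: %v\n", v, v)
    }
}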

There are streaming XML/HTML parsers which emit events like open/close tag, so they don’t have to buffer the entire inner content between an open tag and its matching closing tag (see the sketch after the example below). I imagine a Python-esque HTML variant syntax like Pug would be more suitable for streaming.

ul
  li Item A
  li Item B
  li Item C
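Go’s encoding/xml is one example of such an event-style parser: Token() hands back open tags, character data, and close tags one at a time, so nothing waits for a matching closing tag. A small sketch:

package main

import (
    "encoding/xml"
    "fmt"
    "io"
    "strings"
)

func main() {
    dec := xml.NewDecoder(strings.NewReader(
        "<ul><li>Item A</li><li>Item B</li><li>Item C</li></ul>"))
    for {
        tok, err := dec.Token() // one event at a time, no buffering
        if err == io.EOF {
            break
        }
        if err != nil {
            panic(err)
        }
        switch t := tok.(type) {
        case xml.StartElement:
            fmt.Println("open ", t.Name.Local)
        case xml.CharData:
            fmt.Println("text ", string(t))
        case xml.EndElement:
            fmt.Println("close", t.Name.Local)
        }
    }
}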

Multiple lines do execute top-to-bottom

So Uiua uses two dimensions (horizontal and vertical) to organize expressions, rather than the typical linear stream of code. That seems intuitive and fits with how a person thinks when they’re building up a larger expression in their mind. Push values and subexpressions onto the stack, operate on them, then take only the resulting value - emptying the mind of previous operations - to work with it further. Is there a word that describes this process, inductive method/reasoning..? Feels similar to how mathematical proofs are built up.

Deductive logic studies under what conditions an argument is valid.

Rules of inference are ways of deriving conclusions from premises. They are integral parts of formal logic, serving as norms of the logical structure of valid arguments.

What does it look like? Is it expressed in a concatenative manner by chance?

Symbols are not only used for naming mathematical objects. They can be used for operations (+, −, /, ⊕, …), for relations (=, <, ≤, ∼, ≡, …), for logical connectives (⟹, ∧, ∨, …), for quantifiers (∀, ∃), and for other purposes.

O-hoh, I discovered a feature in the forum: it can embed inline SVG.

![(+, −, /, ⊕, …)](https://wikimedia.org/api/rest_v1/media/math/render/svg/1838a740b7cd4f8586f00a1520699dcc093b5d0c)

Edit: But I can only see the SVG during preview, not in the published post.

Well, in my meandering manner of an eternal student, I did manage to tie it back to the original topic of this post. Those symbols in the Uiua language are logical operations (well, functions). Fascinating, I’ll have to dig deeper.


As a possibly relevant tangent, I’ve been learning how there’s a structure deeper than language syntax, or rather, syntax-agnostic logic. If I can recall the articles that connected the dots for me..

Landin explains how one can “compile” a programming language such as ALGOL to a minimal but canonical model language. Landin describes in detail and formally how each piece of syntax of ALGOL has an encoding in the language of applicative structures.

Correspondence between ALGOL 60 and Church’s Lambda-notation

McCarthy gives meaning to core ALGOL programs (call-by-value with mutation but no control operators) by compiling them to mutually recursive schemas. As a result he obtains a meta-circular LISP interpreter for core ALGOL.

– A Formal Description of a Subset of ALGOL

..Well, I’ll have to review my notes for references - it’s something simple, but for me it was an insight about the computational equivalence among notation systems.

The Lisp-likes with prefix, Polish notation:

+ 2 (* 3 4)

The description “Polish” refers to the nationality of logician Jan Łukasiewicz. “I came upon the idea of a parenthesis-free notation in 1924.”

Ironic the inventor called it parenthesis-free when Lisp is notorious for them.

The Forths with postfix, Reverse Polish style:

2 3 4 * +

Here I can see there’s no need for parentheses, thanks to the stack.
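A toy RPN evaluator makes the stack’s role explicit; a sketch in Go, integers and two operators only:

package main

import (
    "fmt"
    "strconv"
    "strings"
)

func rpn(src string) int {
    var stack []int
    for _, tok := range strings.Fields(src) {
        switch tok {
        case "+", "*":
            a, b := stack[len(stack)-2], stack[len(stack)-1]
            stack = stack[:len(stack)-2]
            if tok == "+" {
                stack = append(stack, a+b)
            } else {
                stack = append(stack, a*b)
            }
        default:
            n, _ := strconv.Atoi(tok) // anything else is a number in this toy
            stack = append(stack, n)
        }
    }
    return stack[0]
}

func main() {
    fmt.Println(rpn("2 3 4 * +")) // 14, with no parentheses needed
}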

And the Algols, infix style with implicit precedence rules that abbreviate the need for some parentheses.

2 + 3 * 4

Somehow this syntax is how we’re taught to think in mathematics, used in the majority of programming languages, and considered intuitive. It feels like Lisp should be the foundation, but the more familiar I get with Forth-likes, the more they grow on me as a natural way to think.

What impressed me was that the same algorithmic process can be described in different notational systems, but the underlying logic is equivalent. Practically, the implementation and evaluation can differ in terms of efficiency, like the need for a stack or grouping expressions. But there’s an invisible logical structure underneath the various notations used to express it.

I’m still wondering what that means or implies, it’s like the same melody written down as Western music score, or visualized as piano roll, or encoded as MIDI or MusicXML data structure. It’s also like the same human thought can be expressed in different languages, or communicated via Morse code, and so on. What is that underlying unexpressed “thing” that’s moving through it all. It seems I need to read more philosophy (maybe semiotics) to even know what to call it, much less begin to understand what I’m trying to understand.


it’s unclear to me why a flat syntax is better here than say JSON. Or almost any DSL syntax with parens of some kind; It feels like nesting would make most examples here clearer to read..

Hmm, so for nested expressions the Forth-like streamability of erishell command-line arguments can be a disadvantage: the human reader must keep track of the stack in their mind.

would be clearer with diagrams

For reading, or grokking the flow, a visual description might be easier than textual. But for writing/typing in the terminal..


Speaking of streams of thought, recently I learned about edbrowse, a linear command-line browser/editor created by a blind computer user.

The latter is a manifesto of sorts; it made me think about how differently people experience the world.

Word processors, Internet browsers, and email clients typically present information in two dimensions, spreading text and icons across the screen. This interface is remarkably efficient thanks to the parallel processing capabilities of the retina and the visual cortex. In fact this screen to brain interface is so efficient, there is no need to count the bits as they fly by.

..However, a blind user cannot assimilate this data at a glance, and separate the wheat from the chaff. He is forced to read every word on the page, using a voice synthesizer or braille display. These adapters are imperfect at best, as they ratchet the flow of information down to an agonizing crawl.

Is there a practical alternative? I believe there is, but certain critical applications must be rewritten from the ground up. To this end, I have developed a combination editor/browser called edbrowse, which is 100% text based.

Output is measured and conserved like a precious commodity as it passes through the narrow channel of speech or braille. Edbrowse gives the blind user exactly what he asks for, and nothing more. Sighted users also find its unique features helpful in certain situations.

2 Likes

One I know is gron (GitHub - tomnomnom/gron: Make JSON greppable!), which round-trips to/from one value per line, with the full path, which makes it friendly to Unix-ish filtering:

json = [];
json[0] = {};
json[0].author = {};
json[0].author.avatar_url = "https://avatars.githubusercontent.com/u/58276?v=4";
json[0].author.events_url = "https://api.github.com/users/tomnomnom/events{/privacy}";
...

or with --json:

[[],[]]
[[0],{}]
[[0,"author"],{}]
[[0,"author","avatar_url"],"https://avatars.githubusercontent.com/u/58276?v=4"]
[[0,"author","events_url"],"https://api.github.com/users/tomnomnom/events{/privacy}"]

I’m not sure whether gron itself streams output for in-progress input (?) but the notation looks well suited for that (in both directions)
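The flattening itself is just a small recursive walk. A toy sketch of the idea in Go (nothing like gron’s real implementation, which also sorts keys and handles quoting):

package main

import (
    "encoding/json"
    "fmt"
)

// walk prints one "path = value;" line per node, gron-style.
// Go map iteration order is random; gron sorts, this toy does not.
func walk(path string, v interface{}) {
    switch x := v.(type) {
    case map[string]interface{}:
        fmt.Printf("%s = {};\n", path)
        for k, val := range x {
            walk(fmt.Sprintf("%s.%s", path, k), val)
        }
    case []interface{}:
        fmt.Printf("%s = [];\n", path)
        for i, val := range x {
            walk(fmt.Sprintf("%s[%d]", path, i), val)
        }
    default:
        out, _ := json.Marshal(x) // leaf: string, number, bool, null
        fmt.Printf("%s = %s;\n", path, out)
    }
}

func main() {
    var v interface{}
    json.Unmarshal([]byte(`[{"author":{"name":"Tom"}}]`), &v)
    walk("json", v)
}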

2 Likes

Nice, I knew I’d seen that syntax before, so glad I mentioned it. (Cunningham’s Law: “The best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer.”)

It’s a unique take on the JSON-likes. There’s an ADVANCED.mkd with more examples demonstrating the advantage of this syntax for greppability.

▶ gron testdata/two.json | grep twitter
json.contact.twitter = "@TomNomNom";

In its simplest form it reminds me of INI format.

▶ diff <(gron two.json) <(gron two-b.json)
3c3
< json.contact.email = "mail@tomnomnom.com";
---
> json.contact.email = "contact@tomnomnom.com";
▶ curl -s http://headers.jsontest.com/ | gron
json = {};
json.Accept = "*/*";
json.Host = "headers.jsontest.com";
json["User-Agent"] = "curl/7.43.0";

So it can take stdin as an input stream. I wonder if gron emits each line as the input is processed. ..Yes it does: gron/main.go:319. It treats the input as one JSON object per line. Makes sense.

▶ gron testdata/two.json | grep likes | gron --ungron
{
  "likes": [
    "code",
    "cheese",
    "meat"
  ]
}

And it can ungron to reassemble the object, or other values, back to JSON.

I like how grep is in the middle of the pipeline, how well it fits conceptually.


Compared to jq:

gron’s primary purpose is to make it easy to find the path to a value in a deeply nested JSON blob when you don’t already know the structure; much of jq’s power is unlocked only once you know that structure.

Here’s the ungron grammar described in an unpublished HTML page.

Input ::= '--'* Statement (Statement | '--')*
Statement ::= Path Space* "=" Space* Value ";" "\n"
Path ::= (BareWord) ("." BareWord | ("[" Key "]"))*
Value ::= String | Number | "true" | "false" | "null" | "[]" | "{}"
BareWord ::= (UnicodeLu | UnicodeLl | UnicodeLm | UnicodeLo | UnicodeNl | '$' | '_') (UnicodeLu | UnicodeLl | UnicodeLm | UnicodeLo | UnicodeNl | UnicodeMn | UnicodeMc | UnicodeNd | UnicodePc | '$' | '_')*
Key ::= [0-9]+ | String
String ::= '"' (UnescapedRune | ("\" (["\/bfnrt] | ('u' Hex))))* '"'
UnescapedRune ::= [^#x0-#x1f"\]

It’s incomplete: there are missing definitions for Space and the Unicode character classes, and Value doesn’t fully describe arrays and objects. I think it’s an old draft of the spec.


Here’s one of the “advanced” examples rewritten for readability.

▶ gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page=5" \
  | fgrep "json[0]" \
  | egrep "(committer.name|commit.message)" \
  | sed -r "s/(commit|committer)\.//g" \
  | gron --ungron
[
  {
    "message": "Adds 0.1.7 to changelog",
    "name": "Tom Hudson"
  }
]

It works, but it’s not as elegant when each processor in the pipeline treats the value as a blob of string. Might be nice if they understood the value types as structured data.


The output of gron is valid JavaScript .. It’s also possible to obtain a JSON stream via the --json switch.

This “JSON stream” isn’t described any further, but from this example:

▶ curl -s http://headers.jsontest.com/ | gron --json
[[],{}]
[["Accept"],"*/*"]
[["Host"],"headers.jsontest.com"]
[["User-Agent"],"curl/7.43.0"]

Looks like an array/tuple with one or more keys and the value.


To bring it back to erishell, does it treat all values as strings or as structured data? From the examples, I think it’s designed as a UNIX pipe to work on standard I/O as byte streams.

erishell \
  null- coap- example.org coap-peer- \
  urn:eris:B4A36J5... decode- \
  1 fd- swap- copy \
  | mpv -

Right.. That’s quite beautiful how it works, I didn’t quite get it the first time I read it. That 1 fd- is a file descriptor for stdout being pushed onto the stack. And so is the decode- before it, pushing a readable stream onto the stack. The swap- and copy- operators naturally take two streams to work on, and the output is piped to mpv media player.

So mpv knows how to recognize (probably from a header chunk) the audio format and decode the binary data as it’s streaming in. It “reassembles” the audio signal. Technically audio can carry encoded structured data, like various implementations of data-over-sound (acoustic data transmission) using speaker and microphone, sometimes in the ultrasonic range.

[image: ggwave demo, from ggerganov/ggwave]

In that sense, air waves are like raw byte streams, as a medium it can be used to encode and decode any information. So can flashes of light off a mirror, or smoke signals.

Working on generic byte streams in a pipeline is powerful and flexible, but in practice I see it’s common to treat all value types as strings, where each processor is expected to do the decoding and encoding. I imagine there’s already a binary “meta format” with a header to identify what format its content is in, including JSON as text. I’ve heard of protobuf.

Protocol Buffers (Protobuf) is a cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data.

The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.

It’s one of those formats with code generation as a required step, to produce a custom parser to handle that schema. It’s like lex, yacc, or ANTLR with a kind of meta-language to describe languages.

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

I suppose what I’m getting at is how to build a pipeline to process structured data with possibly extensible value types, formats, schema(ta).

It could be that programming languages - larger than DSLs in the shell and usually worked in a code editor or IDE - are a more suitable medium for doing data processing in a pipeline and building more elaborate data flows. Or a visual environment to compose graphs of processor nodes. What about a “pipeline to process programs”? Sure, like compilers, bundlers, optimizers, minifiers - they treat programs as structured data, an abstract syntax tree. Then why aren’t we sending Lisp around the world, transmitting structured data and programs as “meta-data” that generates and processes data? Well, in a way we are, but that is another story and shall be told another time..


The ERIS-FS Index is described using CDDL (RFC 8610)

A notation to express binary objects and JSON data structures. Curious to read the specification.

1 Like

My opinion is that catlangs are better for what I want to do because it’s easier to stay locked into a minimalist mindset, if that makes sense. Being brutally simple is better than clever. At some point I had some pre-combined flag sets, but it felt like a step too far.

A few years back I followed a tutorial on creating Lisp interpreters and embedded a Lisp REPL in a utility that was similar in purpose to this one. The reason for doing that was that I needed a Python library and didn’t want to write Python, so the implementation of the library was writing and reading Lisp from the REPL running in a subprocess. I think the problem I had with that REPL was that I would inevitably try some clever Lisp tricks and then be confronted with the fact that it was just a janky REPL that barely worked and wasn’t worth the effort of improving.

Also, this talk convinced me that catlangs are a better substrate for computation than lambda calculus.

2 Likes

The generic types are something like string, stream, and ERIS store. This falls out of the core implementation being in Go and making use of Go interface types.

As you noticed with ERIS-FS, there is structured data in play in the form of CBOR. This manifests itself as … null- encode- pushing a string result onto the stack in the form of “urn:eris:…”, but … erislink- encode- pushes a buffer-stream onto the stack that contains a CBOR structured form of the URL. In this case there is structured data, but as unstringable bytes that need to be copied out (to a file descriptor).
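A rough sketch of the mechanics (hypothetical names, not the real code): the stack holds Go interface values, and words type-switch on what they pop, e.g. a copy-like word demanding a reader over a writer:

package main

import (
    "fmt"
    "io"
    "os"
    "strings"
)

// A copy-like word: pop a source stream off the top of the stack and
// copy it into the stream below. (What it leaves behind is a guess.)
func copyWord(s []interface{}) ([]interface{}, error) {
    n := len(s)
    src, ok1 := s[n-1].(io.Reader) // top of stack
    dst, ok2 := s[n-2].(io.Writer) // below it
    if !ok1 || !ok2 {
        return s, fmt.Errorf("copy: expected a reader over a writer")
    }
    _, err := io.Copy(dst, src)
    return s[:n-2], err
}

func main() {
    stack := []interface{}{os.Stdout, strings.NewReader("hello\n")}
    if _, err := copyWord(stack); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}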

1 Like

From the talk “Concatenative programming and stack-based languages”:

how small we can make our language while still being Turing-complete

This is similar to the idea of Kolmogorov complexity, about the smallest program that achieves a given object. (“The complexity of an object is the length of its shortest description.”) I find that question intriguing: getting to the bottom of the fewest, simplest possible building blocks of computation.

The talk covered some of the same territory I’ve been exploring in my own journey, including the discovery of uxn. It was insightful to hear a coherent narrative that ties things together.

lambda-8cc - x86 C compiler written in untyped lambda calculus

It has long been known in computer science that lambda calculus is Turing-complete. lambda-8cc demonstrates this in a rather straightforward way by showing that C programs can directly be compiled into lambda calculus terms.

Lambdas All the Way Down

lambda-8cc is written as a closed untyped lambda calculus term lambda8cc = λ x . ⋯ which takes an input string x representing a C program and outputs an x86 Linux ELF executable expressed as a list of bytes.

Here, even strings are encoded as lambda terms. Characters and bytes are encoded as lists of bits with 0 = λx.λy.x, 1 = λx.λy.y, and lists are encoded in the Scott encoding with cons = λx.λy.λf.(f x y), nil = λx.λy.y.

Therefore, everything in the computation process, even including integers, is closed in the world of pure lambda terms, without the need of introducing any non-lambda type object whatsoever. It doesn’t use any primitive types other than lambdas.

In addition to x86, lambda-8cc can compile C to lambda calculus as well. The output program runs on the same lambda calculus interpreter used to run lambda-8cc itself.

Monstrous, but beautiful with purity of concept.

lambda-8cc is a combination of 3 projects

  • LambdaVM - A Programmable Virtual CPU Written as an Untyped Lambda Calculus Term
  • ELVM - EsoLangVM Compiler Infrastructure
  • 8cc - A small compiler for the C programming language.

The compiler is able to compile itself. You can see its code both as an implementation of the C language and as an example of what this compiler is able to compile. The source code is carefully written to be as concise and easy-to-read as possible, so that the source code becomes good study material to learn about various techniques used in compilers.

LambdaVM was written by Hikaru Ikuta, the author of this repository (lambda-8cc). The running time and memory usage statistics were measured using a lambda calculus interpreter written by Melvin Zhang. lam2bin was written by Justine Tunney.

ELVM is a deep project: it’s a minimal cross-compiler that supports mainstream and esoteric languages. I’ll make a compact list to get an overview:

  • Acc!!, Aheui, Awk, Befunge, Binary Lambda Calculus, Brainfuck, C, C++14 constexpr, C++ Template Metaprogramming, C#, C-INTERCAL, CMake, CommonLisp, Conway’s Game of Life, Crystal, Emacs Lisp, F#, Forth, Fortran, Go, Go text/template, Grass, HeLL, J, Java, JavaScript, Kinx, Lambda calculus, Lazy K, LLVM IR, LOLCODE, Lua, Octave, Perl5, PHP, Piet, POSIX Shell, Python, Ruby, Scheme syntax-rules, Scratch3.0, SQLite3, SUBLEQ, Swift, Tcl, TeX, TensorFlow, Turing machine, Unlambda, Universal Lambda, Vim script, WebAssembly, WebAssembly System Interface, Whirl, W-Machine, Whitespace, arm-linux, i386-linux, sed

One of the interesting test cases ELVM has is a tiny Lisp interpreter. All the above language backends pass the test, which means you can run Lisp on the above languages.

Moreover, 8cc and ELVM themselves are written in C. So we can run a C compiler written in the above languages to compile ELVM’s compiler toolchain itself, though such compilation takes a long time in some esoteric languages.

So ELVM’s IR (intermediate representation) is like the lowest common denominator to translate among these languages.

  • 6 registers: A, B, C, D, SP, and BP
  • Ops: MOV, ADD, SUB, LOAD, STORE, PUTC, GETC, EXIT, JEQ/JNE/JLT/JGT/JLE/JGE, JMP, EQ/NE/LT/GT/LE/GE, DUMP

OK, it’s not as minimal as I thought, but practical.
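Small enough to interpret in a screenful, though. A toy dispatch loop for a few of those ops (not ELVM’s actual implementation, which also has memory, jumps, and a real program encoding):

package main

import "fmt"

// A toy instruction: only an opcode, a destination register, and an
// immediate operand. ELVM operands can also be registers.
type op struct {
    code string
    dst  string
    imm  int
}

func run(prog []op) {
    reg := map[string]int{"A": 0, "B": 0, "C": 0, "D": 0, "SP": 0, "BP": 0}
    for pc := 0; pc < len(prog); pc++ {
        in := prog[pc]
        switch in.code {
        case "MOV":
            reg[in.dst] = in.imm
        case "ADD":
            reg[in.dst] += in.imm
        case "SUB":
            reg[in.dst] -= in.imm
        case "PUTC":
            fmt.Printf("%c", rune(reg[in.dst]))
        case "EXIT":
            return
        }
    }
}

func main() {
    run([]op{
        {"MOV", "A", 'H'}, {"PUTC", "A", 0},
        {"MOV", "A", 'i'}, {"PUTC", "A", 0},
        {"MOV", "A", '\n'}, {"PUTC", "A", 0},
        {"EXIT", "", 0},
    })
}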


What was the name of the universal language in the legend of the Tower of Babel, before it broke up into a thousand mutually unintelligible dialects of the world? Apparently it was just called the “one language” or “a common tongue”. Ah, there’s a term, Adamic language, “the original universal proto-language from which all known human languages evolved”.

“Language invented by Adam with which he named all things..”

This relates to the smallest possible Turing machine.

When Alan Turing came up with the idea of a universal machine he had in mind the simplest computing model powerful enough to calculate all possible functions that can be calculated.

Claude Shannon first explicitly posed the question of finding the smallest possible universal Turing machine in 1956. He showed that two symbols were sufficient so long as enough states were used (or vice versa), and that it was always possible to exchange states for symbols.

John McCarthy wrote:

In the 1950s I thought that the smallest possible (symbol-state product) universal Turing machine would tell something about the nature of computation. Unfortunately, it didn’t.

I suspect that what was imagined at that time was that by finding the smallest universal machines one would discover some “spark of computation”—some critical ingredient in the rules necessary to make universal computation possible. At the time, it probably also still seemed that there might be a “spark of life” or a “spark of intelligence” that could be found in systems.

Simple Turing Machines, Universality, Encodings, etc.

This is it, a cute little thing. The smallest known universal Turing machine.


Perhaps one can even use them as raw material for creating useful computers out of components like molecules.

Molecular computers, good one. It’s like synthetic physics, or xenophysics.

We’ve come a long way since Alan Turing’s original 1936 universal Turing machine—taking four pages of dense notation to describe.

There are also theoretical computer science reasons to be interested in small universal machines. An example is understanding the tradeoffs between algorithmic complexity and computational complexity. ..One thing that’s come out of my efforts in this direction is what I call the principle of computational equivalence. It’s a fairly bold hypothesis about which there’s much to say (e.g. Chapter 12).

It leads to his book, a section on computational irreducibility.


One of the “aha” moments in the talk about concatenative languages is that not only are parentheses unnecessary, but names are too. That’s like a Zen koan, the gateless gate. What’s left after names are no longer needed.. That wide open place beyond words, I’ll meet you there.


From the references listed in the video description, I tracked down the articles’ links for further study. You can’t go wrong with “cat langs”. :cat_face:

1 Like

While developing Cardumem, I have been thinking the same about DSLs and command-line flags. In fact, I have been using them as a way to deal with the lack of powerful debuggers in Lua, in comparison with their Pharo counterparts, as my DSLs in Pharo usually evolve from debugging and interacting with the software. As I was getting tired of launching the server to click here and there, the command-line flag DSLs became the alternative way of using particular pieces of the software in the absence of the Pharo playgrounds that let you talk to and inspect running objects in an interactive fashion. Command-line flags are my less interactive CLI counterparts, but they evolve in a similar kind of design conversation, while developing the software and writing its source code.

BTW, and talking about prefix/postfix operators, syntaxes, and concatenative programs, Cardumem started as a “theoretic syntax” to put something like Elixir’s pipe operator in a TiddlyWiki-like wiki engine, so it can be extended in content and functionality by an “end user” (I really don’t like that term). Having a parsimonious “concatenative syntax” (in contrast with current TiddlyWiki’s) was part of that design, so simple and “self-documenting” expressions like the following one were possible:

items |> tagged("MalleableSytems") |> webExport(template.mustache.html)

A lot of us here share the concern on how notation affects thought, so it’s pretty good to see how this relation is explored in systems like concatenative ones, CLIs, and GUIs and their combinations in things like notebooks.

2 Likes

So Uiua uses two dimensions (horizontal and vertical) to organize expressions, rather than the typical linear stream of code. That seems intuitive and fits with how a person thinks when they’re building up a larger expression in their mind.

I am really loving you exploring this set of ideas, because this is the same rabbit hole I fell into about 20 years ago, when Concatenative Programming was having a moment due to Joy and Factor.

One of the things I love about postfix notation as opposed to Lisp’s prefix-parenthesis notation, is that it seems to map more directly to the idea of “doing things”. Take this value, apply this operation to it, you immediately get an answer and so you don’t have to hold a whole bunch of intermediate state in your head. That’s great for debugging, for mental modelling, and it’s also efficient for the machine.

The Forth stack style also seems to map fairly closely to the Smalltalk style of “sending messages to objects”… except not quite, and that “not quite” has annoyed me for those 20 years now. I still want a very easy to visualise / mentally-model way of “sending messages” which is fast like a stack machine but also provides the safety guarantees of object message sends (so you can construct your own DSLs without fear of accidentally performing operations from some other dictionary). The best way I can imagine so far is to have strictly typed Forth words (like in StrongForth), but I’ve been too lazy to try to implement that myself. I should try.

It’s also very, very easy to spin up a postfix notation as a user interface. Like Eliot, a few years ago I was writing a very basic/toy Chinese->English dictionary program for my own use, and the easiest way of doing the interface was to structure it as a bunch of postfix operations. It was very easy to follow along in my head with what the machine was doing.

Microsoft’s Kusto Query Language (KQL), used in their cloud “data lake” databases like Microsoft Defender, also works like this. Instead of SQL where the expression will be transformed by a database query optimiser in ways you don’t understand, KQL is executed exactly how you write it, in the order that you wrote it. Filter by this, filter by that, filter by the next thing. Easy to parse, easy to understand, and preserves that mapping between the programmer and the machine so for very large databases you know you can cut down the result set very fast.

One place where postfix seems like it doesn’t work quite so well is when dealing with infinite streams of data - where you can’t just “start evaluating from the right” because the right hand side isn’t there yet - and so that idea of right-to-left evaluation on the line vs top-to-bottom evaluation in the stream is interesting. Maybe there’s also some easy way of handling infinite streams in a push-down stack machine, I dunno.

(The simplest way might be single-value “read” and “write” operations, where, instead of accessing an external I/O system, a “read” causes the current process (just the data/return stacks) to be paused and transfers control to a process “to the right”, while a “write” operation transfers to a process “to the left”. Maybe some multiprocessing Forths already do this, but it might be a fun thing to experiment with and try to reduce to its ultimate simplest form. How many stacks would a desktop OS-sized machine need, and how small could each stack be? And can we get some better memory protection than “any process can overwrite any RAM cell and disk block”?)
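Go’s goroutines and channels happen to be a cheap way to play with that read/write-as-control-transfer idea. A sketch where each “process” blocks on read until the process to its left writes:

package main

import "fmt"

// Each stage is a little process: "read" is a channel receive
// (blocking until the process to the left writes), and "write" is a
// channel send (blocking until the process to the right reads).
func double(in <-chan int, out chan<- int) {
    for v := range in {
        out <- v * 2
    }
    close(out)
}

func main() {
    left := make(chan int)
    right := make(chan int)
    go double(left, right)
    go func() {
        for i := 1; i <= 3; i++ {
            left <- i // "write" transfers control rightward
        }
        close(left)
    }()
    for v := range right {
        fmt.Println(v) // 2 4 6
    }
}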

Re this:

I imagine a Python-esque HTML variant syntax like Pug would be more suitable for streaming.

I hadn’t heard of Pug until now! But that indented tag syntax seems to be almost exactly the “recursive text” model that I’ve been working on with SUSN. I think there’s a fair bit of mileage there. I keep going back and forth on whether indentation as a syntax is better than explicit open and close brackets. Indentation looks clean and pretty. But explicit brackets, and no indentation, work better if you’re on a small constrained device with few horizontal characters and a really low-powered text editor that doesn’t automatically hold the indent level. However, it can get a bit overwhelming to work out where in the nesting level you are.

Rules of inference are ways of deriving conclusions from premises. They are integral parts of formal logic, serving as norms of the logical structure of valid arguments.

Yes, there’s a whole almost discarded branch of logic which might be worth exploring, around the idea of purely concatenative (postfix or prefix) logical constructs. Where the things being “pushed to the stack” or operating on it aren’t necessarily functions that evaluate to a value, but predicates (like Prolog) or constraints (as in other constraint solvers). Ie, sort of like a function that if it returns true (or perhaps any non-false value), continues processing, or that backtracks to try a new value if false. The trickiness comes in how you define and manage the “call/return” stack for such a processor, and how you introduce or remove values. We’d like the constructs of such a thing (like Prolog) to be able to act as both “generators” and “filters”, ie, as both “types” and “functions”, without needing two separate definitions.

Combinatory Logic, which is a concatenative equivalent to Lambda Calculus, doesn’t quite get us there. It gets us maybe halfway there (it removes the concept of variables and all the complexity around name binding and variable name transformation), but it still basically just models function-application. The genius of Prolog and other constraint solvers is that they don’t just transform an input value into an output value - they will also generate the input value for you out of nothing if you don’t provide one. Ie, they’re like a function (a coroutine, actually, because it can have multiple output values) that instinctively “knows” how to “iterate itself” like a collection if called without a parameter. So all function and type definitions automatically become collections. This is a very, very powerful simplifying/organizing concept, but Prolog doesn’t fully develop it because it doesn’t really have an equivalent of “closures” (anonymous predicates) which would be necessary to really get the full benefit. In a concatenative context, how to map this idea of being a “generator” (mainly how to return an “input value” into an “output” position) onto the otherwise very sensible and easy to grasp concept of “term rewriting”, is what I think is the hard part.

Kanren tries hard to map Prolog onto Lisp/Scheme, and mostly works, but it does so in a clunky brute-force way by having all “relations” be functions that return essentially a “machine state” (a set of variable mappings plus functions that simulate infinite streams of values) rather than doing something cleaner and more intuitive. But what “cleaner” would be in this context, I don’t yet understand - only that I’ll know it when I see it, and I’m not seeing it yet.

(Perhaps it’s something like “you get something that looks for all purposes like a return value, but you can call a meta-function on that value to extract the input value”, combined with “if at any point an expression returns false, the VM automatically backtracks and tries again with the next value”)

But if we cracked that, we could essentially rewrite any program - or database, or configuration file - built out of “types and functions”, into another program built out of “just types, all the way down”, and I think reducing those two core concepts into just one would provide immense simplicity AND malleability benefits.

3 Likes

Interesting. The Kusto example documentation, transcribed at [1], seems pretty similar to what I’m trying to do with Cardumem operations at [2] and [3], this time by embedding YueScript into Djot.

  1.  StormEvents
     | where StartTime between (datetime(2007-11-01) .. datetime(2007-12-01))
     | where State == "FLORIDA"
     | count
    
  2. items |> tagged("MalleableSytems") |> webExport(template.mustache.html)
  3. From the Cardumem repository:
    tagged("member")
        |> ramdomize()
        |> stylize("MemberTemplate")
    

I have been thinking also about this idea of postfix notation as an easier interface to convey information operations, after teaching Pharo intensively, with its pretty easy and consistent syntax that famously fits on a postcard. And despite the existence of Pharo Functional, with its parrot (:>) operator, I want to experiment with whether a more minimal Lua-based stack can be packaged with the tools/practices/concepts that encourage convivial computing, with a more fluent transit between users and developers (and bridge that back to our Pharo-based computational notebooks).

Maybe it’s my personal bias regarding parenthesis overflow but, in my experience, I don’t picture Lisp-like syntax being actively adopted by non-techies/devs, and the way we embed computational notation into structured documentation notations matters for building that convivial computing.