A program as a self-replicating, self-contained structure / process

I want to connect some threads of inquiry.

First of all, we want to build Domain Specific Languages and be able to write code in any language. We want those languages to be interoperable.

We also want our code to keep functioning forever: if it worked in the first place, there is no reason for it to break after a few years.

Then we discussed lithification: our ability to take a piece of code and make it our own, to change it however we want. Isn’t that relevant to the previous discussion? If we want to alter the syntax of a program, don’t we have to change the compiler / interpreter?

Why is it that our source code fails to compile? Is it because we do not own the program that compiles it?
To extend what Alan Kay said about the internet: instead of having a single specification that we do not control, we can have multiple, user-definable syntaxes. We can do the same for programs.

A program should contain its source code, the binary of the compiler, and the source code of the compiler.
It should be able to replicate itself, change its syntax, and evolve independently from any other piece of software.
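As a rough sketch of what such a bundle could look like, here is a minimal Python illustration. The `Program` class and its fields are hypothetical, invented for this example; the point is only that replication copies the whole bundle, compiler included, so nothing outside the bundle is needed to rebuild the program.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Program:
    """A hypothetical self-contained program bundle: it carries
    its own source, plus the source and binary of its compiler."""
    source: str             # the program's own source code
    compiler_source: str    # source of the compiler that builds it
    compiler_binary: bytes  # a compiled compiler for some known architecture

    def replicate(self) -> "Program":
        # Replication is just copying the whole bundle; nothing
        # outside the bundle is required to rebuild the program.
        return Program(self.source, self.compiler_source, self.compiler_binary)

    def fingerprint(self) -> str:
        # Identity of the bundle as a whole, not just the source.
        h = hashlib.sha256()
        h.update(self.source.encode())
        h.update(self.compiler_source.encode())
        h.update(self.compiler_binary)
        return h.hexdigest()

p = Program("print('hi')", "<compiler source>", b"\x7fELF...")
clone = p.replicate()
assert clone.fingerprint() == p.fingerprint()
```

A real system would also have to pin the architecture the compiler binary targets, which is where the translation question below comes in.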

What about interoperability then? Well, the point is not so much a single universal protocol as a way to convert from one data structure to another, moving bidirectionally between them.
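A bidirectional translation can be as simple as a pair of functions that round-trip between two representations. The record shapes below are made up for illustration; what matters is that converting there and back recovers the original structure in either direction.

```python
# Two hypothetical record shapes for the same information:
# system A uses a dict, system B uses a tuple.

def a_to_b(rec_a: dict) -> tuple:
    """Translate system A's record shape into system B's."""
    return (rec_a["name"], rec_a["born"])

def b_to_a(rec_b: tuple) -> dict:
    """Translate system B's record shape back into system A's."""
    name, born = rec_b
    return {"name": name, "born": born}

# The pair forms a bidirectional translation: a round trip in
# either direction is the identity.
rec = {"name": "Ada", "born": 1815}
assert b_to_a(a_to_b(rec)) == rec
assert a_to_b(b_to_a(("Ada", 1815))) == ("Ada", 1815)
```

Scaling this up is exactly where the hard technical details live: keeping the two directions consistent as both formats evolve.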

Stephen Kell and Ink & Switch have discussed this.

I have also discussed this here:


This is a provocative thought that’s very core to the Object Oriented philosophy, but also one that I feel is deeply wrong. If it is right, then it’s very nontrivially so, because it implies a lot of things that seem to have been contradicted by our experience so far.

(Edit: I see that Stephen Kell’s article “Reversing abstractions” is making much the same point I make below, where he talks about “existential” vs “universal” abstractions. Yes, the Internet worked because its design, especially around TCP/IP, is “universal” in Kell’s sense, i.e. it defines concrete specifics of an implementation. And yes, this way of thinking seems currently alien to computer science and especially the philosophy of programming language design. And this mismatch seems very interesting.)

Surely the essence of data formats is and must be that they travel between systems. (If data doesn’t need to travel, it doesn’t need to have a defined format.)

And right there is the problem: if data in a defined format travels between systems (and it does), then which system should implement the “procedures” that operate on it? Surely all of them need to? Then how can all of those procedures (each very different, because written for different machine types, OSes, languages) possibly be defined in “the same place” as the data is defined? The two systems might not even be in the same continent - or the same time period. The data might have been written centuries ago for a machine long dead, and now we’re trying to make some sense of fragmentary pieces available to us, often with the surrounding context lost, and we’re trying to do things that its writers never intended to solve problems they couldn’t imagine. Much of the world’s cultural and scientific knowledge is this kind of data. The computing world we will increasingly face as the American-led Cloud centralises and then collapses will also be like this.

And indeed, Kay’s recommendation is not how we built the Internet! We defined data formats very strictly and in one well-known place (the IETF RFCs) as well as protocols (concretely specified) for how sequences or interactions of messages in those data formats should interconnect - down to exactly what bits must be set in which byte for a message to be valid… but left the procedures to be followed by machines when they processed those data formats and implemented those protocols, completely up to each individual system builder.

The only way I can understand Kay’s idea as making any possible kind of sense - in a world where data must travel between systems and into the future - is for transmittable objects that seek to combine data and methods to travel as some kind of centrally specified, rigidly defined, universally understood, and never changed bytecode that runs on a universal runtime. And we don’t really have that. We’ve at times come close to wanting and almost getting that, with Java and .NET, and with Javascript and WASM, but we’ve always pulled back from trying to specify a universal runtime as a decentralised protocol. Maybe with Squeak and Croquet? The story has always been “the runtime and its object model should stay local to the machine/process, and only data and not objects should be transmitted”. And with Java and Javascript, at least, the VM itself updates so rapidly, multiple times a year, that it can’t possibly be a survivable archival format. I imagine WASM is probably also churning as fast as web browsers do.

This kind of statement from Kay (he’s said similar things many times) is why I talked a few months ago about wanting a “universal runtime”. Because it seems to be what is required, as an absolute minimum, for anything close to Kay’s vision to even begin to occur.

But I’m still not even sure if this vision (if I’ve even understood it correctly) would be a good thing if we did get it.

This, however:

A program should contain its source code, the binary of the compiler, and the source code of the compiler.
It should be able to replicate itself, change its syntax, and evolve independently from any other piece of software.

I agree. I feel like a program should be made out of very small parts that can each be changed independently. This feeling is why I tend to dislike the process of compilation and how it separates software artifacts into “source” vs “object” forms, where one form is editable by humans and the other is runnable by one very specific kind of machine (but not any others). It seems like it ought to be possible for software to exist in one form that is sort of just “human-editable object code” - otherwise, it becomes increasingly hard to change anything. (Because any change to a compiled program can only occur at the level of a “compilation unit” and its built “artifacts”, which can be extremely large - these days, compilation often requires spinning up Internet cloud infrastructure, which may require permission from the cybersecurity teams of a hostile corporation or government.)

Very simple languages like Forth and Lisp seem to have this dynamically human-editable property, mostly, sorta. Usually by allowing complete programs to be as short as single words or short sequences of words, by making the “compiler” so small that it can be ignored for a naive implementation, and certainly by allowing “compilation” to be a thing that happens to small, dynamically generated, pieces of code at runtime.
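Python has a small version of this property too, which makes for a convenient sketch of the idea: compilation can be something that happens to a tiny, dynamically generated piece of code at runtime, rather than to a whole program ahead of time. The `make_adder` helper below is invented for illustration.

```python
def make_adder(n: int):
    """Generate a tiny program as text, then compile and run it
    on the spot - 'compilation' as a runtime act on a small piece
    of code, in the Forth/Lisp spirit."""
    src = f"lambda x: x + {n}"                   # generated source
    code = compile(src, "<generated>", "eval")   # compiled right now
    return eval(code)

add5 = make_adder(5)
assert add5(10) == 15
```

The unit of compilation here is a single expression, not a “compilation unit” with build artifacts, which is the property being praised above.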

Maybe this is the same as what Alan Kay wants, I don’t know. It is possible that I too want two separate and contradictory things. If it’s impossible for us to have a universal archival-quality runtime (needed in order to transmit objects whose methods would be defined as binary artifacts of that runtime), then we need to understand what we can feasibly have in a world where data needs to be transmitted between very different types of machines.

The simplest answer still seems to be what the 1970s Internet chose, “Just transmit data and define protocols, don’t even try to transmit procedures”. But at some point we still need to catch up with the 1990s Web which decided that to get anything done on the client side, it also needed to transmit at least the source code of a simple scripting language. (However, not a particularly extensible one - a Lisp would have been 1000x better, and then CSS and HTML could just have been S-expressions, syntax nesting issues would have evaporated, Markdown wouldn’t need to exist, the whole browser runtime could have been much tinier, etc, etc.)
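To make the S-expression aside concrete: a toy sketch in which nested lists play the role of S-expressions and one small function renders them to markup. The tag names are just examples; real HTML has attributes, escaping, and void elements that this ignores.

```python
def render(sexp) -> str:
    """Render a toy S-expression (nested lists of tag + children,
    strings as text nodes) to HTML-like markup."""
    if isinstance(sexp, str):
        return sexp
    tag, *children = sexp
    inner = "".join(render(c) for c in children)
    return f"<{tag}>{inner}</{tag}>"

doc = ["html",
       ["body",
        ["h1", "Hello"],
        ["p", "S-expressions nest without a separate syntax."]]]
print(render(doc))
```

The nesting problems mentioned above evaporate because the document, its styling, and any embedded code could all share this one uniform syntax.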


No, Stephen Kell proposes the opposite. He is against a single ABI, a single runtime. He proposes a method of going from one implementation to the other through some sort of translation.
Universality is achieved through translation, which is exactly what I propose as well; the difference is in the technical details.

No, data formats should not travel across all systems, only across those that want to interact with that format.
If you are inside a system that needs to interact with the specific format, you write the procedures, or use a library. If you are outside, you use the procedures that come with the data structures, and you do not have to learn anything about the data structure.

There is no difference between data and procedures. Both have an abstract syntax and a concrete realization: data formats and their binary encodings, source code and its binary code.
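The parallel can be shown with Python’s standard library: for data, an abstract structure versus its concrete text encoding; for procedures, an abstract syntax tree versus compiled bytecode.

```python
import ast
import json

# Data: abstract structure vs. a concrete realization as text.
value = {"x": 1}
encoded = json.dumps(value)             # concrete realization
assert json.loads(encoded) == value     # abstract structure recovered

# Procedures: abstract syntax vs. a concrete compiled realization.
source = "x + 1"
tree = ast.parse(source, mode="eval")   # abstract syntax tree
code = compile(tree, "<src>", "eval")   # concrete realization (bytecode)
assert eval(code, {"x": 1}) == 2
```

In both cases, translation to a new system means producing a new concrete realization from the abstraction that travels with the artifact.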

Consider, for example, the Commodore Datasette.
Accessing the data requires the specific machine.

We need translations here again: from cassettes to USB sticks, and from Commodore’s CPU architecture to the one we have in our computers.

In general, any installation of a Linux OS requires that the kernel and most programs be compiled in advance on a different computer, and we have something called cross-compilation: compiling on one architecture for another.

The reason we did that is that each system is different, and this is a translation from a common, unique protocol. This unique protocol is the minimum required to communicate; it does not specify all the protocols built on top of it. Your OS does not need to know the BitTorrent protocol, unless you decide you need it. If you just want to download a file, you don’t need to know the BitTorrent protocol either; you download the procedure that is cross-compiled for your architecture.
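“Download the procedure cross-compiled for your architecture” is roughly what package installers do: offer one prebuilt binary per architecture and pick the one matching the host. A toy sketch, with made-up artifact names (real machine strings vary, e.g. `x86_64` on Linux vs `arm64` on macOS):

```python
import platform

# Hypothetical per-architecture artifacts of one program.
ARTIFACTS = {
    "x86_64":  "tool-x86_64.bin",
    "aarch64": "tool-aarch64.bin",
}

def pick_artifact(machine: str) -> str:
    """Select the prebuilt binary for a given machine type,
    falling back to an error (build from source) otherwise."""
    try:
        return ARTIFACTS[machine]
    except KeyError:
        raise RuntimeError(f"no prebuilt binary for {machine}; build from source")

host = platform.machine()  # e.g. "x86_64" or "arm64" on this computer
print("this host reports:", host)
```

The common protocol is only the naming scheme for architectures; everything above it is the individual translation per system.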

He says exactly the opposite. The fact that the procedures are compiled seems to “confuse” you.
All procedures and data contain their abstractions, the source code. But any new system will need to perform a translation for the data and procedures to keep working.

I do not know how to do that because I do not work at this level of abstraction. There are others who could comment on this.

Again, we do not need a universal runtime, and we have multiple binary artifacts per architecture.

We transmit procedures all the time. All OSes have precompiled procedures that we need to download.


We seem to forget that TCP/IP is an abstraction.

If I went back in time to 2000 and brought my VDSL modem with me, would I be able to connect to the internet?

TCP/IP needs to be translated into physical network signals for it to work. We currently have people with a fiber connection, an ADSL connection, or possibly a 56 kbps modem, and they all communicate with each other, because of a translation that does not demand that we all have the same physical network.

Please educate me as appropriate, but architecturally I don’t see much more than triviality in this proposal.

Wouldn’t it be satisfied by supplying an interpreter along with the source code at each language/virtual-machine abstraction level? At the bottom, it still needs to be founded on a language specification that cannot “self-replicate”, because it is the language of the hardware. We also have the source code for compilers, which is equivalent to the interpreter for this purpose.

Of course I support the practice of having all source code, but I think it is as simple as that: having the source code.


It is more of an investigation than a proposal.

The idea is to have malleable compilers or not. In other words, each user having their own programming language that they adapt as they please. Most of us have heard @khinsen talk about Python changing in a way that was out of his control, breaking previously written code.

If not, then we need to think about community governance. If a compiler is part of the infrastructure that we all rely on, the only way to maintain agency is through community governance. What types of governance can we have, and which is appropriate for each case?


Governance is indeed a critical issue when it comes to tech infrastructure. In today’s Open Source universe, governance participation is limited to developers, and often strongly influenced by their employers or other funding sources. That’s what happened to Python: it can be described as a creeping corporate takeover, with more and more developers representing corporate interests year after year.

But beyond corporate influence, there is a tacit agreement in Open Source culture about not caring much about users, except by counting them. From the user’s point of view, the choice is take it or leave it.

Everything you said is true. At the same time, a programming language is usually different enough for us to justify a separate organization. (C++ is not an example.) What should we argue next?

I consider Python 3 as a completely new language.

I would also like code to not break with language updates, perhaps by means of automatic migration. Today, this is communicated to the programmer by linter squiggles, usually without telling them the newer way to write the code. I can imagine there being a small wizard in the linter popup for every migration in the language changelog. How feasible is this?
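A minimal sketch of what the detection half of such a wizard could look like, using Python’s standard `ast` module. The deprecated/replacement names and the `MIGRATIONS` table are entirely hypothetical; a real tool would draw them from the language changelog and also rewrite the code, not just flag it.

```python
import ast

# Hypothetical migration table: old spelling -> new spelling.
MIGRATIONS = {"old_api": "new_api"}

def suggest_migrations(source: str):
    """Scan source for calls to deprecated names and return
    (line, old name, suggested new name) tuples - the raw
    material a linter popup wizard could present."""
    suggestions = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in MIGRATIONS):
            suggestions.append(
                (node.lineno, node.func.id, MIGRATIONS[node.func.id]))
    return suggestions

src = "y = old_api(3)\n"
assert suggest_migrations(src) == [(1, "old_api", "new_api")]
```

Tools in this spirit exist (Python’s old `2to3`, various codemod frameworks), so the feasibility question is less about mechanism than about languages shipping machine-readable migrations with each release.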


There are many preferences in this space. I wouldn’t like to pin it down to a specific solution.

For example, some people prefer not to change their code.


There is no single universal answer because usage patterns vary so much.

If a piece of software is true infrastructure, meaning that people depend on it without even being aware of it, then there should never be breaking changes. But when we talk about malleable systems that users can modify and thus also fix, then breaking changes become acceptable.

What has happened to much of today’s software, including Python, is a slow change of status. In the 1990s, Python was a hacker’s language, a support for malleable software. And then it became mainstream, and bigger, and used by developers writing software for users. Today it’s effectively infrastructure, but without a corresponding change in the governance model, and that’s where I see the problem.
