The Data-First movement

Hi Duncan-Cragg!

I see I have confused you on several points, so I’ll try to explain.

I have a very similar feeling to you about “data-first”, but my sense is that you’re sketching out here what I’d call a “system in-the-large”, like the WWW. A thing made of protocols that operates over large numbers of computers and isn’t too concerned about efficiency. It also isn’t too concerned about computation and how exactly computation works; that’s something to be punted to other layers.

This is probably a valid way of approaching very large systems. But I’m also interested in very small systems, like personal machines running personal desktop-like OSes. Remember that in the 1980s, with the Amiga, say, we had very tiny machines by today’s standards that could run entire multitasking graphical OSes. I feel like we’ve missed that sense of smallness, in today’s web.

One of the things that drives me is the nagging feeling that we have far too much unnecessary complexity, and too many layers, in our software architectures, and that this has happened because we picked the wrong abstractions to build on, so we had to keep adding more and more. I feel that we should look for very tiny new abstractions that we don’t have to throw away as we move from very small, to very large, systems.

So my thinking is: consider three scales of use-cases. One, a system at the level of electronic circuits. Two, a system at the level of a single graphical process (application) or desktop (OS). Three, a system at the scale of the Web. If we have a valid model of computation, it should apply roughly equally to all three scales. If it only works at one scale and not others, it’s probably not quite a correct model of computation, and we should look at why not. For instance, the idea of “message-passing”, as in object-oriented programming, seems to be one of those ideas or models of computation that remains valid at all three of those scales. So if we’re looking for a new model of computation, it needs to be at least as useful as message-passing is.

That vision/requirement of not just being able to run “in the large”, as a protocol like the Web, but also being able to run “in the small”, down at the level of one process on one machine - or smaller - is perhaps the difference between our perspectives.

Now to the specifics.

Can’t say I fully get what you’re saying here, but in the Object Net all links or pointers to objects are unique string IDs (“UIDs”). What forging do you have in mind? Sketch out an attack!

So I’m talking here about whether “data-first” is a valid model of computation down at the second scale level: that of a single process in RAM, or an operating system on a machine.

“all links or pointers to objects are unique string IDs” is something that would work for a large-scale, Web-like, protocol, but it’s not something that’s going to work down at the level of a single process. At best, using strings everywhere would mean we’re talking something like HTTP or Tcl/Tk. It’s going to be very slow to run as a desktop. Yes, the trend right now is to literally reinvent all desktop apps as literal web browsers talking to literal web servers and slinging megabytes of text data around on every mouse click - but I don’t like that trend.

So I’m assuming that at the desktop or process level, there won’t be “string IDs”, but rather there’d be object IDs of some kind.
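To make that concrete, here’s a rough sketch - entirely my own assumption about how one *could* do it, not a claim about the Object Net - of interning wire-level string UIDs into small integer handles inside a process. The wire format stays textual, hot in-RAM paths compare integers, and a bounds check means a forged handle fails loudly instead of wandering off into arbitrary memory:

```python
# Sketch (my assumption, not the Object Net's design): intern wire-level
# string UIDs into small integer handles for in-process use, so hot paths
# compare ints, not strings, while the on-the-wire format stays textual.

class HandleTable:
    def __init__(self):
        self._by_uid = {}      # string UID -> integer handle
        self._by_handle = []   # integer handle -> string UID

    def intern(self, uid: str) -> int:
        """Return a stable small-integer handle for this UID."""
        h = self._by_uid.get(uid)
        if h is None:
            h = len(self._by_handle)
            self._by_handle.append(uid)
            self._by_uid[uid] = h
        return h

    def uid(self, handle: int) -> str:
        """Bounds-checked lookup: a forged handle raises, it can't roam RAM."""
        if not 0 <= handle < len(self._by_handle):
            raise ValueError("invalid handle")
        return self._by_handle[handle]

table = HandleTable()
a = table.intern("uid-1234-abcd")
b = table.intern("uid-1234-abcd")
assert a == b   # same UID, same handle: equality is a cheap int compare
```

The point of the sketch is only that “string IDs on the wire” and “machine-word IDs in RAM” don’t have to be in conflict; the table is the boundary between the two.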

What forging do you have in mind? Sketch out an attack!

The forging and the attack would be if you hadn’t thought about what kind of VM you wanted to run down at the single-process or machine level, and you figured you’d maybe use a Forth instead. Dusk OS, perhaps, because it’s super trendy right now in the indie scene which we’re adjacent to. And if it was a Forth, it would come with machine integers as its pointers, and if you didn’t think too hard about security implications, you might decide to just use raw Forth machine integers as the local version of your object identifiers, thinking they’d be fast, and that since it was your own machine, you wouldn’t need to worry about security.

And then you would download an object from a stranger, because that’s the whole purpose of sharing objects, and you would get rooted within seconds: the animation code of that object might compile down to raw machine Forth, which implements no memory safety, and the object identifiers would be raw integers on the local machine, because that’s the simple, natural (but wrong) thing to do in a Forth.

So I’m asking: have you thought about how your “Object Network” would run on a single machine, in a single process? Or is it a thing like the Web that you think of as being “too big to use on a single process” and you’d use a completely different technology down at process-scale?

Yeah, implementation detail, optimisation. I’m not doing premature security or optimisation!

That’s valid, but I’m here to remind you that you may need to at least think about security and optimisation even right at the beginning stage. Maybe don’t commit entirely to something like “all objects are text files and all object IDs are text strings” unless that’s how you’d be comfortable handling, say, individual mouse clicks or graphical buffer writes.

As an on-the-wire protocol, where efficiency isn’t a concern, sure, text strings are probably fine. But you’d like to run something like this on a single machine as well as on the wire, I think?

Order is determined by what I term the “application protocols” between objects - the type- or domain-determined concept of what peers are doing and the rules of interaction. This includes application or domain specific timeouts, so that the domain/type/application protocols will work over the wire. I don’t have CRDTs or stuff like that, or lockstep clocking, it’s all loose, best efforts.

I can understand fault tolerance being a good quality to have - after all a lot of electronic engineering is fault tolerance and turning noisy analog signals into digital - and I guess that’s a big part of what the object-oriented paradigm is trying to do. I guess my question is, can you see this “Object Network” paradigm scaling down to the level of individual function-calls or message-sends inside a single application? If so, then how exactly does this “loose, best effort” approach work for a case like, say, subtraction of two integers? Where it’s quite important for the correctness of the result for the two pieces of data to arrive in the correct order but you’d maybe not want to tag every 2 bytes of data with 64 or 128 bytes of timestamps and object identifiers and other metadata?

In big distributed systems with relatively few, relatively large messages/transactions, like say large corporate databases, it’s ok to pay a large cost of metadata overhead on each transaction, so that we can sync up those updates later. But we maybe don’t want to pay that cost inside a single process, on every mouse click event or every function call.

The potential danger of having a computing paradigm that only works well for very large systems (because we’ve “priced it out of the market” in terms of the per-transaction runtime resource cost on the desktop) is that programmers will just ignore it on the desktop, and that might lead right back to where we are now, with lots of competing “applications”. Or, equally, to where we are now in another sense: everyone shipping desktop apps as literal webapps running on a web server and a web browser, with “hello world” costing gigabytes to bring up and a keystroke taking seconds to process.

If you have the object’s UID, you can request it, but you won’t get it without the read permission. I’m struggling with your mental model of this, I think!

My mental model, as I said, was “capabilities”, as in knowing the object identifier itself being what grants the read permission. Eg Capability-based security - Wikipedia

From your response, I’m guessing that you’re not thinking in terms of capabilities? Then that complicates matters quite a bit. That’s why I said that I thought capabilities are the simplest way of granting permissions; they map naturally to the idea of “reference” in programming languages.
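To show what I mean by the identifier itself being the permission, here’s a minimal sketch - my framing of capabilities in general, not a description of the Object Net:

```python
# Sketch of capability-style access (my framing, not the Object Net's):
# the object ID is an unguessable random token, so *knowing* the ID is the
# read permission - there is no separate access-control list to consult.
import secrets

_store = {}

def publish(obj) -> str:
    """Mint an unguessable UID; handing it to someone grants read access."""
    uid = secrets.token_hex(16)   # 128 random bits: infeasible to guess
    _store[uid] = obj
    return uid

def read(uid: str):
    """No identity check, no ACL: possession of the UID is the permission."""
    return _store[uid]            # KeyError for guessed/forged UIDs

cap = publish({"temp": 21})
assert read(cap) == {"temp": 21}
```

The design choice this illustrates: permission-granting collapses into reference-passing, which is why capabilities map so naturally onto “reference” in programming languages.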

But also what I’m saying is that, at some point, even in order to implement the concept of a “read permission” which isn’t a capability (ie like a Unix access list), I think you’ll find that you’ll need some kind of component that hides some information necessary to grant that permission.

So committing hard to “objects don’t have hidden state”, and committing to this stance even inside a machine’s own RAM, may, I think, give you problems on how to implement the secret-keeping needed for read-permissions. The Object Network would then be a way of approaching computing which has limitations, and can’t be used for systems programming, or perhaps not even for application programming. Which seems a pity if we want to replace all applications and operating systems with this model.

I’m also thinking that a functional-reactive model, if it can be made simple enough (and “just cache the previous computed value of all functions and make it available as a magic variable” is possibly simple enough), could also describe the lower levels of a network.

Sorry, I don’t get this! I re-read it 5 times…

Think of my first scale level: the electronics in a computer.

We currently build circuits using programming languages ( Hardware description language - Wikipedia ). These languages use a variety of programming paradigms.

If we do indeed have a good new universal theory of computation (as is, say, lambda calculus or message-passing), then “could this theory work as an HDL” seems like it would be a good test case for it. An electronic circuit is very like a computer network, so it seems like message-passing is roughly in the right ballpark, but functional programming also feels like it kind of describes signals flowing along wires between components. (One programming paradigm that doesn’t describe either an electronic circuit sending signals, or a computer network sending packets, is the 1970s C-style imperative model of “do this then that”, but it’s still used.)

So could the Object Network or something like it be used as an HDL? If it can’t, what extensions might it need so that it can?

The reason I ask is that it does feel like the core of the Object Network idea is there in the idea of “circuit” or “data channel”. A component observes the current state of its input wires, and its own current state, then generates a new state that it puts out on its output wires.

on(trigger, initialstate, f)

Have to say you’ve fully bamboozled me with this post.

Yeah, this part is a little inside-baseball and specifically talking about my own unformed ideas about, specifically, the Functional-Reactive paradigm (as is used currently inside web browsers in libraries like React, and which I would like to see used much more widely, outside web browsers). And specifically, about how one might represent that “object references its own past state and uses that to update to a new state” thing, in something like functional programming. The specific two primitives that I’m suggesting here (“on” vs “start”) are 1) not well named at all, just on the spur of the moment, 2) not described very well, and 3) not well thought through. I do need to talk to Bosmon more about this guts-of-FRP stuff.
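With that caveat firmly in place, here is roughly the kind of thing I’m groping towards with “on”: a node that caches its previously computed value and feeds it back into a pure update function on each event - essentially a fold/scan over an event stream. The trigger wiring is omitted and every name here is provisional, a sketch of an unformed idea rather than a real design:

```python
# A rough sketch of the half-formed "on" primitive mentioned above (names
# and shape are provisional): each event produces a new state from the
# cached previous state via a pure function - a fold over an event stream.

def on(initial_state, f):
    """Return a node: each event yields new_state = f(prev_state, event)."""
    state = initial_state
    def fire(event):
        nonlocal state
        state = f(state, event)   # the previous value is the "magic variable"
        return state
    return fire

counter = on(0, lambda prev, ev: prev + ev)
counter(2)
counter(3)
assert counter(5) == 10   # 0 + 2 + 3 + 5
```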

Having said all that, I still want to say that I like the Data-First ideal.

I believe we should strive, as much as possible, to manage secrets not by sending each other complicated machines with hidden state, but by simply not sharing information in the first place. Keep computation over data as minimal and as opt-in as possible, allow computation to take place in steps. Keep data physically local in space and time as much as possible. When we do share data, it should be in a useful form for remixing on fine-grained scale. We should also share in a way that takes advantage of caching, and cache as much as possible and as locally as possible. This suggests we should use immutable data structures and content-based identifiers like SHA hashes, wherever we can. Balancing this, we must also have a way to permanently “forget” and erase information locally, on a system under one’s own control. But we should also not force other people’s systems to erase information. The rule should always be: if I control a computer, I get to say whether data on it is deleted or retained. If I publish data to someone else’s computer, I should do so only after giving my informed consent, and when I do publish, I lose the ability to remotely control or erase it.
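As a tiny sketch of the content-based-identifier part of that (the canonical-JSON encoding is just an assumption for illustration, not a proposed format):

```python
# Sketch of content-based identifiers as described above: immutable data is
# named by its SHA-256 hash, so identical content shares one cache entry and
# any fetched object can be verified against its own name.
import hashlib, json

def content_id(data) -> str:
    # Canonical (sorted-key) JSON so logically equal data hashes identically.
    blob = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

cache = {}

def put(data) -> str:
    cid = content_id(data)
    cache[cid] = data          # duplicates collapse into one cache entry
    return cid

def verified_get(cid: str):
    data = cache[cid]
    assert content_id(data) == cid   # tamper check: name must match content
    return data

cid = put({"b": 2, "a": 1})
assert cid == put({"a": 1, "b": 2})  # same content, same ID, same cache slot
```

Note how caching and integrity come out of the same mechanism: the name *is* the checksum.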

However, one case (other than credentials) where hidden state might be inescapable, could be large datasets (and ‘large’ can include just simple filesystems). It is probably impossibly inefficient to have something like a filesystem browser where the act of browsing causes an entire folder to be transmitted - even locally, from disk to memory. Eg a simple folder could be multiple terabytes in size on today’s hard drives. So to some extent, the state of a folder (or any data collection) often has to be “hidden” in the sense that it’s not all transmitted all at once, and is revealed only in pieces as it’s navigated, even though it’s not hidden via access control mechanisms.

It’s important to have simple and loose data formats, up to and including “completely arbitrary sequence of stuff” as in Lisp, but balancing this, within a trusted local system, I feel like units of data at any size should be able to be tagged with something like a type, that indicates that some known computation (or assertion - this gets into the difference between nominal vs structural types, and I believe we need nominal typing, we can’t get away just with structural) has already been run on this data and that data meets some quality/integrity criterion, so the validation computation doesn’t have to be rerun. Obviously such a tag can’t be trusted between untrusted systems (not even between RAM controlled by different processes, or RAM vs removable media), so re-parsing and re-validating has to be done at the edges. Parsing itself can be an extremely dangerous operation, and doubly so if parsing requires running Turing-complete validation computations (especially if the sender of the data can specify what computation is run - this was the Log4j vulnerability). But in my opinion Turing-Completeness is not actually what causes vulnerabilities (at worst TC can cause a computation to loop endlessly, and this can be detected and halted); rather, it’s what built-in operations the language allows, what system state they can access and change that is not related to the input values. It is very easy to imagine safe Turing-Complete languages, and incredibly unsafe but non-Turing-Complete languages (for example, any language with a “read arbitrary RAM address” operation but no Call or Goto, is unsafe but not Turing-Complete).
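A minimal sketch of that “already validated” tag idea (the names and the nominal-brand-as-class trick are mine, purely illustrative): inside one trusted process the type carries the proof that validation ran, and re-parsing happens only at the trust boundary:

```python
# Sketch of a nominal validation tag (illustrative names): within a trusted
# process, possessing the branded type means the validator already ran, so
# downstream code can skip re-validation; data crossing a trust boundary
# arrives as plain bytes and must be re-parsed and re-validated.

class ValidEmail:
    """Nominal brand: holding this type means validation already happened."""
    def __init__(self, raw: str):
        if "@" not in raw or raw.startswith("@"):
            raise ValueError("not an email")
        self.value = raw

def send_mail(to: ValidEmail):
    # No re-check needed here: inside this process the type is the proof.
    return f"sending to {to.value}"

def receive_from_wire(raw_bytes: bytes) -> ValidEmail:
    # At the trust boundary the tag can't be trusted: re-parse, re-validate.
    return ValidEmail(raw_bytes.decode())

msg = send_mail(receive_from_wire(b"alice@example.org"))
assert msg == "sending to alice@example.org"
```

This is the nominal-typing point: a structurally identical string without the brand can’t be passed where `ValidEmail` is expected, so the “validation already ran” fact travels with the data.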

Pure functions and event-driven Reactive programming techniques feel like they should fit reasonably well with a data-first approach, but Reactive is still such a blurry topic that there are many approaches that could work and no obvious winner yet.

One such conflict between approaches might be, again, with data collections or filesystems. For some purposes, it might make sense to express a folder/collection/sequence as an immutable data set and give it a SHA hashing identifier. This means that we have some very strong guarantees about what’s in it, for example that it hasn’t been tampered with in transit. This is great for read-only archives - and a lot of what we do with computers at the moment seems to be publishing and ingesting large, read-only archives. But expressing a folder as an immutable dataset doesn’t play nicely with also viewing it as a realtime updating collection of mutable subfolders/subsequences, which we may need for efficient viewing and browsing. So there are two ways of seeing an “update” to a folder-like collection: 1, a transaction of all folder subtrees added/removed, down to the individual files (and units inside files), 2, a transaction of just the immediate folders added/removed, ignoring any changes made to those and pushing change to the edges. One’s good for integrity, the other’s good for efficiency. Squaring this circle is… well, I don’t know quite how to approach it. But we can do better than what we have. I’d love for example if a filesystem did automatically recompute a contents hash for all files, and for all folders above it… but how much overhead is this going to cause, on every disk block write?
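The “recompute a contents hash for all folders above” wish is essentially a Merkle tree, which can be sketched like this (illustrative only - a real filesystem would have to face exactly the per-write overhead question I raised):

```python
# Sketch of the folder-hash idea above: a folder's ID is the hash of its
# children's IDs (a Merkle tree), so changing one file only forces rehashing
# the folders on the path up to the root, not the whole filesystem.
import hashlib

def file_hash(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def folder_hash(children: dict) -> str:
    # children: name -> hash of a file or subfolder; sorted for determinism
    h = hashlib.sha256()
    for name in sorted(children):
        h.update(name.encode() + b"\0" + children[name].encode())
    return h.hexdigest()

docs = {"a.txt": file_hash(b"hello"), "b.txt": file_hash(b"world")}
root = {"docs": folder_hash(docs)}
before = folder_hash(root)

docs["a.txt"] = file_hash(b"hello!")   # edit one file...
root = {"docs": folder_hash(docs)}     # ...rehash only its ancestors
assert folder_hash(root) != before     # the root hash reflects the change
```

The appeal is that both views of an “update” fall out of one structure: the root hash gives the whole-tree integrity guarantee, while the per-folder hashes let a browser detect change at the edges without transmitting subtrees.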

I have a very similar feeling to you about “data-first”, but my sense is that you’re sketching out here what I’d call a “system in-the-large”, like the WWW. A thing made of protocols that operates over large numbers of computers and isn’t too concerned about efficiency. It also isn’t too concerned about computation and how exactly computation works; that’s something to be punted to other layers.

Sorry to drop a big spanner in the works right at the top of this extensive reply, but that’s not really it!

This is probably a valid way of approaching very large systems. But I’m also interested in very small systems, like personal machines running personal desktop-like OSes. Remember that in the 1980s, with the Amiga, say, we had very tiny machines by today’s standards that could run entire multitasking graphical OSes. I feel like we’ve missed that sense of smallness, in today’s web.

I totally agree with this.

One of the things that drives me is the nagging feeling that we have far too much unnecessary complexity, and too many layers, in our software architectures, and that this has happened because we picked the wrong abstractions to build on, so we had to keep adding more and more. I feel that we should look for very tiny new abstractions that we don’t have to throw away as we move from very small, to very large, systems.

I totally agree with this.

So my thinking is: consider three scales of use-cases. One, a system at the level of electronic circuits. Two, a system at the level of a single graphical process (application) or desktop (OS). Three, a system at the scale of the Web. If we have a valid model of computation, it should apply roughly equally to all three scales. If it only works at one scale and not others, it’s probably not quite a correct model of computation, and we should look at why not. For instance, the idea of “message-passing”, as in object-oriented programming, seems to be one of those ideas or models of computation that remains valid at all three of those scales. So if we’re looking for a new model of computation, it needs to be at least as useful as message-passing is.

Not sure about the “electronic circuits” bit, would “embedded microcontrollers” be similar?

I agree about the need for a new system architecture to cover all scales from tiny tiny embedded chips upwards.

But I think state transmission (my own model as described before) is far far better than message passing as a programmable machine model.

That vision/requirement of not just being able to run “in the large”, as a protocol like the Web, but also being able to run “in the small”, down at the level of one process on one machine - or smaller - is perhaps the difference between our perspectives.

Nope - I totally agree!

Now to the specifics.

Can’t say I fully get what you’re saying here, but in the Object Net all links or pointers to objects are unique string IDs (“UIDs”). What forging do you have in mind? Sketch out an attack!

So I’m talking here about whether “data-first” is a valid model of computation down at the second scale level: that of a single process in RAM, or an operating system on a machine.

I do it all pretty much the same down to PCs, and single processes, and embedded devices. I think my own Data-First vision is especially good at scaling both up and down.

“all links or pointers to objects are unique string IDs” is something that would work for a large-scale, Web-like, protocol, but it’s not something that’s going to work down at the level of a single process. At best, using strings everywhere would mean we’re talking something like HTTP or Tcl/Tk. It’s going to be very slow to run as a desktop. Yes, the trend right now is to literally reinvent all desktop apps as literal web browsers talking to literal web servers and slinging megabytes of text data around on every mouse click - but I don’t like that trend. So I’m assuming that at the desktop or process level, there won’t be “string IDs”, but rather there’d be object IDs of some kind.

Well I can only speak about my own perfectly successful experiments using strings for everything everywhere, within a smartwatch running on an nRF52 CPU!

What forging do you have in mind? Sketch out an attack!

The forging and the attack would be if you hadn’t thought about what kind of VM you wanted to run down at the single-process or machine level, and you figured you’d maybe use a Forth instead. Dusk OS, perhaps, because it’s super trendy right now in the indie scene which we’re adjacent to. And if it was a Forth, it would come with machine integers as its pointers, and if you didn’t think too hard about security implications, you might decide to just use raw Forth machine integers as the local version of your object identifiers, thinking they’d be fast, and that since it was your own machine, you wouldn’t need to worry about security.

I don’t want someone else’s VM - I write everything in C.

And then you would download an object from a stranger, because that’s the whole purpose of sharing objects, and you would get rooted within seconds: the animation code of that object might compile down to raw machine Forth, which implements no memory safety, and the object identifiers would be raw integers on the local machine, because that’s the simple, natural (but wrong) thing to do in a Forth.

So that’s where I /do/ have a challenge, although I don’t compile anything anywhere, it’s all run-time rule interpretation. But even that has potential security vulnerabilities. So that’s a big job to put on my to-do list - ensuring it’s as safe to link to someone else’s rules as it is to install via Play Store or apt on Debian.

So I’m asking: have you thought about how your “Object Network” would run on a single machine, in a single process? Or is it a thing like the Web that you think of as being “too big to use on a single process” and you’d use a completely different technology down at process-scale?

Yes I have. Nothing changes, whether global scale or “radio tag attached to your bag” scale!

Yeah, implementation detail, optimisation. I’m not doing premature security or optimisation!

That’s valid, but I’m here to remind you that you may need to at least think about security and optimisation even right at the beginning stage. Maybe don’t commit entirely to something like “all objects are text files and all object IDs are text strings” unless that’s how you’d be comfortable handling, say, individual mouse clicks or graphical buffer writes.

I know I need to keep security and optimisation in mind at all stages, but I’ll take them on as I come across pressing need. And yes, I have happily sent live accelerometer data from an nRF52 device over 2.4GHz as string objects. It does need some more optimisation, and I’m looking forward to that engineering challenge, but it’s not in the least bit daunting.

As an on-the-wire protocol, where efficiency isn’t a concern, sure, text strings are probably fine. But you’d like to run something like this on a single machine as well as on the wire, I think?

Yes, as I say, it all runs fine everywhere so far. I’m sure I’ll hit bumps in the road as I scale up to P2P and mesh architectures, but I don’t fear them at all.

Order is determined by what I term the “application protocols” between objects - the type- or domain-determined concept of what peers are doing and the rules of interaction. This includes application or domain specific timeouts, so that the domain/type/application protocols will work over the wire. I don’t have CRDTs or stuff like that, or lockstep clocking, it’s all loose, best efforts.

I can understand fault tolerance being a good quality to have - after all a lot of electronic engineering is fault tolerance and turning noisy analog signals into digital - and I guess that’s a big part of what the object-oriented paradigm is trying to do. I guess my question is, can you see this “Object Network” paradigm scaling down to the level of individual function-calls or message-sends inside a single application? If so, then how exactly does this “loose, best effort” approach work for a case like, say, subtraction of two integers? Where it’s quite important for the correctness of the result for the two pieces of data to arrive in the correct order but you’d maybe not want to tag every 2 bytes of data with 64 or 128 bytes of timestamps and object identifiers and other metadata?

If you have an integer amount of cents in one account and a transfer object, you have what I call an “intention that puts the system in tension”. The transfer is the stated intention of “I want this amount deducted from here and added to here”. As long as this hasn’t happened the system is in tension. It can resolve that tension at its leisure, and as soon as the numbers add up correctly to future auditors, the tension is released. I talk about that a lot on my old REST-based blog - e.g.: WS-Are-You-Sure | The REST Dialogues | What Not How | http://duncan-cragg.org/blog/
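In the cents-and-transfer case, that tension model might look roughly like this (a loose sketch of the idea; the names and shapes are illustrative, not the actual Object Net format):

```python
# Sketch of the "intention that puts the system in tension" idea: a transfer
# object states an intent; the system stays "in tension" until a resolver,
# running at its leisure, makes the numbers add up for future auditors.

accounts = {"alice": 500, "bob": 100}   # balances in cents
transfer = {"from": "alice", "to": "bob", "amount": 120, "done": False}

def in_tension(t) -> bool:
    return not t["done"]

def resolve(t, accounts):
    """Run whenever convenient; applying the intent releases the tension."""
    if in_tension(t) and accounts[t["from"]] >= t["amount"]:
        accounts[t["from"]] -= t["amount"]
        accounts[t["to"]] += t["amount"]
        t["done"] = True

assert in_tension(transfer)
resolve(transfer, accounts)
assert not in_tension(transfer)
assert accounts == {"alice": 380, "bob": 220}   # numbers add up to auditors
```

Note there is no lockstep ordering here: the transfer can sit unresolved indefinitely (for example while funds are insufficient), and correctness is judged by the eventual audit, not by message arrival order.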

In big distributed systems with relatively few, relatively large messages/transactions, like say large corporate databases, it’s ok to pay a large cost of metadata overhead on each transaction, so that we can sync up those updates later. But we maybe don’t want to pay that cost inside a single process, on every mouse click event or every function call.

There’s quite an overhead, I’m happy to admit. But it’s not proven to be an issue so far in my work. When it becomes an issue, that’s when the optimisations are required, but I’m good at putting those off until they’re really needed, so I can focus on the principles and architecture, and their consequences as they’re driven through.

The potential danger of having a computing paradigm that only works well for very large systems (because we’ve “priced it out of the market” in terms of the per-transaction runtime resource cost on the desktop) is that programmers will just ignore it on the desktop, and that might lead right back to where we are now, with lots of competing “applications”. Or, equally, to where we are now in another sense: everyone shipping desktop apps as literal webapps running on a web server and a web browser, with “hello world” costing gigabytes to bring up and a keystroke taking seconds to process.

Again, not been an issue so far. And there are orders of magnitude of optimisation opportunities I’ve not taken yet. It’s very much about getting the design and principles right before the grunge work of fine-tuning.

If you have the object’s UID, you can request it, but you won’t get it without the read permission. I’m struggling with your mental model of this, I think!

My mental model, as I said, was “capabilities”, as in knowing the object identifier itself being what grants the read permission. Eg Capability-based security - Wikipedia From your response, I’m guessing that you’re not thinking in terms of capabilities? Then that complicates matters quite a bit. That’s why I said that I thought capabilities are the simplest way of granting permissions; they map naturally to the idea of “reference” in programming languages.

I’m more than happy to use capability-based security. Sounds good to me. I think that’s where I’ve been headed overall. In ThoughtWorks I remember having these long discussions about bearer tokens granting specific rights to read or write on an API, where you could encode to a fine grain exactly what the client was allowed to do. People were all about API keys back then, and may still be, I don’t know. A key difference - one that has cost me a lot of attention from techies whose eyes glaze over when I assert it - is that in my model, identity and keys are client-side, not managed by servers using traditional passwords checked on a server-by-server basis. You own your own identity and all its permissions. It’s the equivalent of the browser storing passwords, in a way, which was once considered a terrible thing to do.

But also what I’m saying is that, at some point, even in order to implement the concept of a “read permission” which isn’t a capability (ie like a Unix access list), I think you’ll find that you’ll need some kind of component that hides some information necessary to grant that permission. So committing hard to “objects don’t have hidden state”, and committing to this stance even inside a machine’s own RAM, may, I think, give you problems on how to implement the secret-keeping needed for read-permissions. The Object Network would then be a way of approaching computing which has limitations, and can’t be used for systems programming, or perhaps not even for application programming. Which seems a pity if we want to replace all applications and operating systems with this model.

OK, this is where once again I’m losing you. I don’t get what you’re saying about hidden stuff. Obviously you keep your private keys private, but that can’t be what you mean or you’d have said it.

I’m also thinking that a functional-reactive model, if it can be made simple enough (and “just cache the previous computed value of all functions and make it available as a magic variable” is possibly simple enough), could also describe the lower levels of a network.

Sorry, I don’t get this! I re-read it 5 times…

Think of my first scale level: the electronics in a computer. We currently build circuits using programming languages ( Hardware description language - Wikipedia ). These languages use a variety of programming paradigms. If we do indeed have a good new universal theory of computation (as is, say, lambda calculus or message-passing), then “could this theory work as an HDL” seems like it would be a good test case for it. An electronic circuit is very like a computer network, so it seems like message-passing is roughly in the right ballpark, but functional programming also feels like it kind of describes signals flowing along wires between components. (One programming paradigm that doesn’t describe either an electronic circuit sending signals, or a computer network sending packets, is the 1970s C-style imperative model of “do this then that”, but it’s still used.) So could the Object Network or something like it be used as an HDL? If it can’t, what extensions might it need so that it can? The reason I ask is that it does feel like the core of the Object Network idea is there in the idea of “circuit” or “data channel”. A component observes the current state of its input wires, and its own current state, then generates a new state that it puts out on its output wires.

Yes, although I haven’t gone down the HDL path before, I do think many of us with radical ideas in computing and programming have come from a physics or electronics background, where we were imbued with the concepts of independent components interacting freely without always needing lockstep clocks. So state driving state like a spreadsheet is very much in that model or paradigm.


Thanks for your reply!

Sorry to drop a big spanner in the works right at the top of this extensive reply

No, this is great - the spanners are the interesting part!

I agree about the need for a new system architecture to cover all scales from tiny tiny embedded chips upwards. But I think state transmission (my own model as described before) is far far better than message passing as a programmable machine model.

Yes, that’s my intuition too, or at least it was when the idea hit me 20 years ago, and I still feel like it ought to be true. I want it to be true.

20 years on, we've got a bit more data about this set of concepts, and the broad term for this sort of thing now seems to be "Reactive Programming" (having dropped the "Functional" part somewhere in the 2000s). I'll mention the Wikipedia page on the topic again because I think it's a good introduction to, if nothing else, the confusion and complexity that currently seem to bedevil the RP scene: Reactive programming - Wikipedia

The "data-first reactive" idea that seems to have hit both of us is perhaps even stronger than mainstream RP, since it puts an emphasis on sharing all, or almost all, of each component/object/node's state, rather than (as is usually the case) hiding most of it.

I think even web-frontend style 2010s-2020s RP still exposes more state than the oldschool Smalltalk object-oriented MVC concept did, and there seems to be growing agreement that yes, exposing lots of state is much more of a good thing than we were all taught back in the 1990s.

But RP in the web-frontend world still doesn’t seem to have coalesced into the smallest possible set of ideas that could possibly implement it. It’s got a lot of large chunky boilerplate around it. So it’s interesting to ask why that simplification hasn’t happened yet, despite 20 years of innovation on the idea.

Well I can only speak about my own perfectly successful experiments using strings for everything everywhere, within a smartwatch running on an nRF52 CPU!

That’s impressive!

I still feel that if you're, say, adding two integers, you probably want a machine integer data type at some point, rather than trying to implement integer maths entirely with strings?

I admit that I have a personal database project (SUSN) which is a small JavaScript program that runs entirely on strings, and it does work well. But I don't really consider it complete while it's only got the one string type.

I don’t want someone else’s VM - I write everything in C.

That's cool! But since you are defining a programming paradigm, at some point your C program is going to be a VM, and it will face the same issue other VMs face: needing to be fit for the purpose of running untrusted code that other people have sent you.

my model has identity and keys client side not managed by servers, using traditional passwords checked on a server-by-server basis. You own your own identity and all its permissions.

That definitely seems good, and I think the concept of capabilities will probably help you explain that idea.

OK, this is where once again I’m losing you. I don’t get what you’re saying about hidden stuff. Obviously you keep your private keys private, but that can’t be what you mean or you’d have said it.

I’m not at all talking about cryptography here but rather about in-RAM references/pointers between software components: objects or functions.

“Private keys” is an example: how do I keep my private keys private? What basic mechanisms does my model of computation need in order for keeping my private keys private to be a thing that could happen?

What I'm trying to get at is that an object that changes over time is going to be very much like a function with an environment. And the mechanism of "observe the state of another object", while not quite being like "message sending" in OOP, is still going to be very much like the concept of "function call".

If we don’t allow functions to be able to hide some part of their state (their “environment”, in functional programming terms, or what OOP would call “private members”), then I think we are not going to be able to implement anything like an operating system.

Consider a software component that provides a view of a much larger dataset, which changes over time, and would like to expose that view to other software components. But it would also like to not expose a reference to that actual large dataset, because not all of those other software components are 100% trusted with that data.

To be able to implement a component like that, we need hidden state.

I think we’ll find that in a modern computer, we need quite a few components like that.
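A concrete sketch of such a component (the directory, its fields, and the trust boundary are all invented for illustration): the closure hands out a view function, but never a reference to the underlying records, so the sensitive fields can't leak to callers.

```python
# Sketch: expose a *view* of a dataset without exposing the dataset itself.
# Callers hold a reference to the view function only; the full records are
# hidden state captured in the closure's environment.

def make_directory(records):
    def public_view(name):
        entry = records.get(name)
        if entry is None:
            return None
        # Only the non-sensitive fields cross the trust boundary.
        return {"name": name, "room": entry["room"]}
    return public_view

view = make_directory({
    "alice": {"room": "4.12", "salary": 90000},  # salary never leaves
})
print(view("alice"))  # {'name': 'alice', 'room': '4.12'}
```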

I know you’ve said that you do think that your concept involves “rules” on objects which have some kind of hidden state, but I’m trying to push you to be a little more upfront and clear about the implication of this: if we have ANY hidden state, at all, in a model of computation, we DO have hidden state. We can’t both have and not have the ability to hide state.

And that means we need to think about hidden state and HOW exactly it works in this computational model.

Edit: Personally I think that the individual component/object/function is the best place to put hidden state. It pairs well with the "capability" concept of the reference/pointer being the access right, and it means that hidden state is accounted for right down at the level of our basic computation model.

There are systems which try to put the boundary of hidden state at higher levels, but these always mean we have to introduce new security-related concepts which don't exist in our core computation model, and that's how security errors happen. Concepts like: memory segment, security ring, process, user group, container, domain, etc.

I feel like it's best to avoid pushing security and secret-hiding to a higher level in this way, because these abstractions often end up leaking, and they also require some kind of centralised control to administer, which is something else that we'd like to avoid.
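To make "the reference is the access right" concrete, here's a minimal sketch (names and the attenuation policy are hypothetical): holding the function is the capability, there's no ambient registry to look it up in, and a weaker capability can be derived from a stronger one and handed to a less-trusted component.

```python
# Sketch: capabilities as unforgeable references. Access = holding the
# reference; a weaker right is a new reference derived from a stronger one.

def make_file(contents):
    state = {"contents": contents}

    def read():                      # the full-read capability
        return state["contents"]

    def attenuate():
        # Derive a weaker capability: here, a short preview only.
        def read_preview():
            return state["contents"][:10]
        return read_preview

    return read, attenuate

read_cap, attenuate = make_file("top secret design notes")
preview_cap = attenuate()            # hand this to a less-trusted component
print(preview_cap())                 # 'top secret'
```

Note there is no "user", "process", or "ring" concept anywhere: the security boundary falls out of ordinary references and hidden state.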