The Data-First movement

Bosmon · 4 April 2025 11:10

Thanks folks for a super-interesting thread. I should say to start with that I subscribe to the Data-First principles that Duncan has advertised, and see them as a close match with the Substrate manifesto that is currently going around. As with Jonathan’s manifesto I see a bit of a spectrum to the points - some seem definitional and others seem more like “essential possibilities that should be allowed for”. And I feel a few dots should be joined up - when you say “app free” and “someone else may change it” I think we should be clear that we are talking about the same “it” here - that is, the stuff that we previously called “behaviour” that used to be packaged as an “app” is the same thing that might be changed by someone else - the “data” - this connects with Jonathan’s slogan that we have a “PL and a … document unified together”.

So to respond to a few points that leap out at me -

I feel a similar sense of unease but I wonder whether we can be clearer about what might be bothering us. How do we know when we’ve understood a fundamental abstraction? Can we give an example of one that we do understand that satisfies us?

This is called “glitch freedom” in reactive programming and is talked about sometimes but isn’t as well understood as the community would like to think. For example the paragraph I link there includes the text “Some reactive languages are glitch-free and prove this property” which was put there by Shriram Krishnamurthi following the 2016 Dagstuhl on Reactive Computing. As you can see, no citation has appeared in the 9 years since then, and when I challenged him by email as to whether he knew of any, none was forthcoming.

So whilst being glitch-free is something that any competent reactive system should supply, in my opinion, the odds of this actually happening seem about as good as tossing a coin - in the 2012 Bainomugisha et al. Survey on Reactive Programming about half of the reactive systems were found to be glitchy. Also, awkwardly, the methodology of this paper isn’t clear. I wrote to Tom van Cutsem who confirmed that what he can recall doing is similar to this rather unsatisfying StackOverflow answer: assembling a tiny 4-node graph and seeing whether it glitches.
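To make the “tiny 4-node graph” test concrete, here is a minimal sketch of what I understand such a check to look like. It assumes a deliberately naive push-based propagation, and the names Cell and derive are made up for illustration rather than taken from any of the libraries or papers mentioned. The graph is the usual diamond: one source, two derived values, and a fourth node that combines them.

```typescript
// Hypothetical, deliberately naive push-based cell: every write immediately
// notifies listeners depth-first, with no ordering or batching.
class Cell<T> {
  value: T;
  listeners: Array<() => void> = [];
  constructor(initial: T) {
    this.value = initial;
  }
  subscribe(fn: () => void): void {
    this.listeners.push(fn);
  }
  set(next: T): void {
    if (next === this.value) return;
    this.value = next;
    for (const fn of [...this.listeners]) fn();
  }
}

// Derived cell: recomputes eagerly whenever any one dependency changes.
function derive<T>(deps: Array<Cell<any>>, fn: () => T): Cell<T> {
  const out = new Cell<T>(fn());
  for (const dep of deps) dep.subscribe(() => out.set(fn()));
  return out;
}

// The diamond: a feeds b and c, and d combines b and c.
const a = new Cell(1);
const b = derive([a], () => a.value + 1);
const c = derive([a], () => a.value * 2);
const d = derive([b, c], () => b.value + c.value);

// Record every value of d that an observer gets to see.
const seen: number[] = [];
d.subscribe(() => seen.push(d.value));

a.set(2);
// seen is now [5, 7]
```

With a = 1 the consistent value of d is 4 and with a = 2 it is 7, so the 5 that the observer sees first corresponds to no state of the source at all: it is computed from the new b and the stale c. A glitch-free system would only ever show the 7. Passing a check like this on one small graph is of course only evidence, not a proof of the property.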

Being glitch-free is something I see as one of the fundamental responsibilities of a reactive system and so the method of trying to “bolt it on after the fact” as seen in this answer seems pretty ludicrous. But as it turns out the modern JS libraries underlying the current “signals boom” (rather than older cruft like RxJS) are naturally glitch-free, which I was able to confirm by porting the test cases from preact-signals (which has extremely good ones) to alien-signals. This is far from “proving the property” but at least it satisfies me.

A couple of things here - firstly, I think it’s helpful to be clear where the boundary of the discipline is. Like you until last year, I happily talked of “functional reactivity” because FRP is what got all the airtime last decade when these ideas were getting popularised. But I think the writeup shows that the tag “functional” somehow narrows the space of approaches we’re interested in without necessarily leaving in scope all the issues that we are interested in. For example I find that the FRP community seem less interested in talking about glitches, and state (the latter of which, as “data-first” people we are hugely interested in) than what you could call the “regular effing reactive programming community”.

And so secondly, to the “get the previous state of this object” operator, something also close to my heart. Is it even an operator, and if so what does it act on? There’s a highly interesting split in the community here, in terms of what the relevant API looks like. Under the currently dominant paradigm in the “signals boom”, signals are exhaustively categorised into two types, plain “signals” which are read/write, and derived “computed signals” which are more or less pure functions of plain and derived signals. The interesting split is what the API for computed signals looks like. The classic form in preact-signals simply accepts a notionally pure function.

However, Solid signals’ corresponding createMemo behaves as you are wanting, that is, it accepts a callback that receives the memo’s own previous value as an argument while reading the current signal values.

I’ve found that this appears to be ergonomically essential for writing certain kinds of applications, but it’s hard to say whether or not it is indeed “needed to implement (functional) reactivity”. But one helpful lens I find to put on this issue emerges from the “data-first” notion, and also is touched on in my 2017 Avatars paper. The question is, if you are looking at part of a graph of signal sources and sinks, how would you set about effectively transferring the part of the design that it represents from one site to another? Now under the “pure contract” for a computed/memo, the answer is obvious - you just need to transmit the values of all the plain signals, and you can be completely confident that since all of the computeds are pure functions of those, once the plain signal values arrive, the computeds will settle to the correct values.

Now you say, and I agree with you, that “access to the previous value” is ergonomically essential. But the moment we allow this, all bets are off for any straightforward way to transmit the design somewhere else - it feels like we have to snapshot the values of all the computeds as well, just on the off chance there is some kind of excess state in any of them, which in most cases there probably isn’t going to be. So this is the kind of thing that contributes to my idea (and probably yours), that our understanding of this contract, as expressed through the functional form of the API, is somehow inadequate. There is a kind of sense that “access to previous state” could be “used for good as well as for ill”. For example, you might be using it just to reduce the cost of some computations, or transfer a piece of plain signal value from the past to the future. On the other hand, you might just maliciously return the previous state or new state by flipping a coin!
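Here is a small sketch of the transfer problem as I picture it. Everything in it is hypothetical: Source, Memo and buildSite are toy names, not the preact-signals or Solid API, and the memo simply recomputes on every read rather than tracking dependencies. One memo is a pure function of the plain signal and the other consults its own previous value to keep a running peak.

```typescript
// Hypothetical toy primitives, for illustration only.
class Source<T> {
  value: T;
  constructor(initial: T) {
    this.value = initial;
  }
}

// A computed whose callback is handed its own previous result.
// Assumption: it recomputes on every read (no dependency tracking).
class Memo<T> {
  fn: (prev: T | undefined) => T;
  prev: T | undefined = undefined;
  constructor(fn: (prev: T | undefined) => T) {
    this.fn = fn;
  }
  read(): T {
    this.prev = this.fn(this.prev);
    return this.prev;
  }
}

// Build the same design at any site from the plain signal value alone.
function buildSite(initialPrice: number) {
  const price = new Source(initialPrice);
  // Pure contract: ignores prev, depends only on the plain signal.
  const doubled = new Memo<number>(() => price.value * 2);
  // Uses prev: carries hidden state that lives nowhere in the plain signals.
  const peak = new Memo<number>((prev) =>
    Math.max(prev ?? -Infinity, price.value),
  );
  return { price, doubled, peak };
}

// Site A lives through some history.
const siteA = buildSite(10);
siteA.peak.read();
siteA.price.value = 30;
siteA.peak.read();
siteA.price.value = 20;

// Transfer only the plain signal values to site B.
const snapshot = { price: siteA.price.value };
const siteB = buildSite(snapshot.price);

const doubledA = siteA.doubled.read(); // 40
const doubledB = siteB.doubled.read(); // 40: the pure memo settles correctly
const peakA = siteA.peak.read(); // 30: remembers the earlier high
const peakB = siteB.peak.read(); // 20: that history was never transmitted
```

The pure memo agrees at both sites, while the one using its previous value diverges, and nothing in the shape of the API tells the substrate which kind it is dealing with.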

So another thing that makes me feel there is a missing part of the contract is this other thing I find, especially in the Vue variant of signals, called Writable Computed. Much like “access to previous value” it is a violation of the pure contract that is ergonomically essential but could be used “for good or ill”. If you use it for good, you will make sure to write an upstream signal value that is consistent with the value that will be computed through the reaction, and then you will not trigger some kind of obnoxious cycle of updates. But otherwise …
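A sketch of the “good” and “ill” cases, using a toy getter/setter pair in the spirit of a writable computed. The Plain and WritableComputed classes here are hypothetical and are not Vue’s implementation; there is no reactive graph at all, just enough to show the consistency condition between what is written and what is then computed.

```typescript
// Hypothetical toy primitives, for illustration only.
class Plain<T> {
  value: T;
  constructor(initial: T) {
    this.value = initial;
  }
}

// A derived value that can also be written: the setter is expected to push
// a matching value back upstream.
class WritableComputed<T> {
  getter: () => T;
  setter: (next: T) => void;
  constructor(getter: () => T, setter: (next: T) => void) {
    this.getter = getter;
    this.setter = setter;
  }
  get value(): T {
    return this.getter();
  }
  set value(next: T) {
    this.setter(next);
  }
}

const celsius = new Plain(100);

// Used for good: the setter is the exact inverse of the getter.
const fahrenheit = new WritableComputed<number>(
  () => (celsius.value * 9) / 5 + 32,
  (f) => {
    celsius.value = ((f - 32) * 5) / 9;
  },
);
fahrenheit.value = 32;
const goodReadBack = fahrenheit.value; // reads back what was written

// Used for ill: the setter forgets the offset, so the value computed through
// the getter disagrees with the value that was just written.
const sloppy = new WritableComputed<number>(
  () => (celsius.value * 9) / 5 + 32,
  (f) => {
    celsius.value = (f * 5) / 9;
  },
);
sloppy.value = 32;
const badReadBack = sloppy.value; // roughly 64, not the 32 that was written
```

In a real reactive system the second case is where the trouble starts, since anything that reacts to the mismatch by writing again can keep the graph from settling.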

“Writable computed” I found so essential that I made a kind of a polypatch of preact-signals to make an automatically safe variant of it that handles a simple case, but in practice it is not powerful enough to handle all the cases I need.

We seem to find ourselves in a kind of Gödelian trap where the obviously safe functional contracts we can lay our hands on are incomplete, but all the attempts to extend them in necessary ways allow appalling footguns that would make it impossible to deliver reliable end-user programming if they were unrestricted. I feel that part of the solution to this has to involve a focus on data and state, rather than functional contracts, and hence my suspicion of the “functional” bit of functional reactive programming. And also, not to muddy the waters still further - the cases that seem most interesting centre on the possibility that at the “next steam engine tick” there is a different quantity of signals in the system than there were at the previous tick, something which a functional approach is always going to deal with in an unsatisfying way.

I have a view on this issue as well - that we make a split between what we call “computation per se” that happens at the nodes, and the overall update of state in the reactive graph managed by the reactive system/substrate. If we insist that the former is done by what I am coining good functions which are both easy to express and easy to execute, we can realistically expect that they can execute in something we consider “one clock tick”. Anything more ambitious than this needs to be broken up into smaller parts and spatialised across the substrate, which can then apply its natural idioms of synchronisation and glitch-free coordination. The end-users will also thank us since the progress of their execution will be naturally intelligible, trackable, resumable, all the rest of it.

Don’t know if any of this resonates and would be lovely to get your reflections!