Objects as Records vs Activities

Trying to articulate some more of my deeply complicated feelings toward the Object-Oriented model, one possible axis of difference appears:

Records vs Activities

That is, there are two (no doubt more, but at least two) different ways of thinking about how we use a modern computer (i.e., a data-storage, computation, and communication device).

One is the idea of Records, coming from database theory - which predated electronic computers, going back to card tabulators and, in fact, to writing itself.

In the Record view, a computer is primarily a data storage and retrieval device. Files or tables are collections of records or rows. Records may be mutable or immutable, though in practice immutability is often preferred. Records often correspond to paper documents or other permanent media, though they can be much smaller. Records may be required to conform to a standard format, and rejected if they do not.

Records relate to Knowledge. Although some may represent secret information, it is their inherent nature that they can be duplicated at will, and they are expected to outlast the context in which they were created.

Activities on the other hand are more like processes, communication channels, sessions, or sets of work to be done. They incorporate the ideas of active machinery (dangerous if interrupted) and social contexts (limited in time and space). They are intended to ensure that some task involving a limited number of participants is done, and done well (to some standard). They impose very strong rules on how that task is done and how those communication channels or dangerous pieces of machinery are used.

Since Activities relate not to Knowledge but to Tasks, they are very tightly bound to their context in time, space, participants, and machinery. It is not sensible to think about duplicating an Activity except in the sense of a pre-planned, standardised factory-like context, so it doesn’t matter if lots of complicated mechanism and setup cost is required to do so. It is also sensible that duplicating an Activity should require lots of permissions, because machinery is inherently dangerous.

The Object metaphor, imo, is based more around the idea of Activities than around Records. It’s about computation - all computation - as communication between multiple actors to solve a specific task. So the Object mode of design will always think in terms of “who are the participants here” and “what are they trying to achieve”, and then attempt to craft specific computational or social machinery (i.e., apps, platforms, startups) to enable this one task. Enabling is also seen in terms of controlling that task.

But I feel that this focus on Activities is increasingly dangerous socially. It empowers platform owners and disempowers platform users. If your task is ad-hoc (or if you’re disliked or overlooked by a powerful government or by powerful economic actors) there may not be a user story for your use-case, and so there may not be any relevant computational object or way of creating one.

The Records view of computing, on the other hand, is almost completely agnostic about who created records or what they will be used for. If you can read and write, you can participate, and create your own unexpected contexts and use-cases. A record is not quite task-agnostic, because the shape of the record will still reflect the needs of the particular use-case it was invented for. But records for one purpose can be rewritten for many new purposes, with no permissions required.

Streaming is activity-based: it’s an authenticated conversation that can’t be duplicated, and where consent can be withheld at any time by a powerful central party. Computing technology is used to control humans and limit their freedom.

Downloading is records-based: it’s a context-free duplication of knowledge that can be freely used and repurposed without permission. Computing technology is used to expand human freedom.

I think this split gets at why the Object model (as a universalising story about computing, which is how Alan Kay tells it) makes me fundamentally unhappy. I want computers - and especially the owners of large computers - to be blank slates as much as possible: not to impose policy over human actions, but rather to follow human orders. Objects give some ability to create valid Records - that’s useful - but they also make it far too easy to restrict the use of those Records in ways that are far too tempting to large platform-owners.

However, even many of the frameworks we have for thinking about Records are still adapted to large corporate data centres rather than personal desktops. We need new models of Records that are personal desktop-sized rather than corporate-sized. We need things midway between folders of documents, SQL tables, and JSON objects or Lisp cons cells. We need Records that can be either a few bytes long or a few terabytes, that can handle video archives or mouse-click messages, that can be either graphs of read-only hashed blocks, or function call stacks. And while class- or type-like enforcement of the shape of a Record is useful, we must be able to completely decouple the creation and use of Records from any online central authorities, including both commercial Cloud corporations like Microsoft / Amazon / Google, and powerful nonprofits like Let’s Encrypt.
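
To make that concrete, here’s a minimal sketch - very much a thought experiment, not a design - of one such Record substrate in Python: read-only, content-addressed blocks that can reference each other to form graphs. A block’s name is the hash of its bytes, so anyone can create or verify Records with no central authority involved. All names here are illustrative.

```python
import hashlib

store = {}  # in-memory stand-in for any dumb byte store

def put(data: bytes) -> str:
    """Store a block; its identity is derived from its content alone."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key

def put_node(child_keys: list[str], payload: bytes) -> str:
    """A Record can reference other Records by hash, forming a read-only graph."""
    body = b"\n".join(k.encode() for k in child_keys) + b"\n--\n" + payload
    return put(body)

leaf = put(b"a few bytes, or a few terabytes")
root = put_node([leaf], b"metadata about the leaf")

# Self-verifying: the name proves the content, with no authority to consult.
assert hashlib.sha256(store[root]).hexdigest() == root
```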

2 Likes

Part of the issue with object discourse is a complete lack of standardized definitions. William Cook and some of the PL community treat it as almost equivalent to dynamic dispatch. Kay’s version is closer to actors and independent agents. What you’re proposing seems closer to the distinction between computations and values in a CBPV (call-by-push-value) sense.

I’m not sure how much of that distinction comes from OO languages, since you have first-class functions in most languages. You do get some of it from the data-oriented programming community.

For malleability I think you need some form of dynamic dispatch, or at least interoperability between types, as Jonathan Aldrich argues in The Power of Interoperability.

2 Likes

Lack of agreement on what OO means is one part of the problem. Another one is lack of context. Alan Kay’s view of objects sounds very rigid at first sight: no access to an object’s data, you only get what some method provides you. But when you look at a Smalltalk system, it’s very open. You can look at the definition of any class, and you are free to add methods to any class as well. There’s also introspection that lets you work around every barrier. So in the end, Smalltalk’s implementation of Kay’s principles really says: Think twice before you do something beyond calling an object’s methods, but if you do need to go beyond, I’ll let you do it.
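
A rough analogue in Python (not Smalltalk, and the Account class is invented purely for illustration) shows the same kind of openness: you can inspect any class and add methods to it after the fact.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance  # "hidden" only by convention

    def balance(self):
        return self._balance     # the method the object chooses to provide

# Introspection: look at what the class defines.
print([name for name in vars(Account) if not name.startswith("__")])

# Openness: any class can be extended at runtime.
def deposit(self, amount):
    self._balance += amount

Account.deposit = deposit

acct = Account(10)
acct.deposit(5)
print(acct.balance())  # -> 15
```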

As for @natecull’s issues, which I also see all the time in my work (computational science), I think it matters to consider granularity as well. OO is typically discussed for small data units kept in memory, whereas Nate’s examples are about persistent storage and on-the-wire communication. In my experience, OO works best at the level of non-persistent domain abstractions. For lower-level data (numbers, strings, …) it’s overkill and adds useless baggage. For persistent data, it’s too constraining in terms of composition. Persistent data often outlives the software that created it, and it’s what is most valuable to users. You don’t want it locked into any software architecture. That’s also how I understand Nate’s reasoning.

3 Likes
  1. OO drove a lot of hype until 10 years ago, but it seems to have died down now. I don’t find it a useful term anymore. Objects and Classes are useful, OO not so. Speaking in terms of OO now seems counterproductive. There’s probably more timeless language available for framing these conversations.
  2. I’m skeptical that switching from OO to even the best Record-based paradigm in these terms is going to make any difference to how predatory our Big Tech overlords are. They can control your activities with any paradigm, and at any granularity. The only useful interventions are voting with our dollars against them and voting for govt. regulation. (Sadly the US is not going in that latter direction for the next few years, and ground gained in the last few years will likely be lost.)
4 Likes

I agree with this fully. One aspect of the original post I’d emphasize is that portability/export matters to reduce lock-in: implemented naively, records can be exported from an existing system into a new one, but objects cannot. I don’t think this escapes the fact that behavior/activities are still necessary things that need to be modeled somehow, and it would be nice if they were also portable.

2 Likes

Might this be another problem with EverythingIsA syndrome? Maybe objects are a high-level primitive being used to build small things, as opposed to small things being used to build big ones.

2 Likes

Just a hasty response:

A.

For me, data is embedded in a social or computational process; it is impossible to separate one from the other.
So, for me, the interesting thing is the process: social, physical, and computational.

@khinsen I believe you agree with this, but you do not consider it a reason to develop such a framework. In other words, the social and the technological live in different realms, with different methods and mechanisms.

I wrote about it here: Ryaki: Replacing the Web: The importance of social context to information.

B.

One problem that needs to be solved is abstraction. @khinsen, for example, has talked about the need for an object to have multiple types, which is also related. I have talked about the use of category theory. Haskell uses type classes. In Agda we have dependently typed records, which are more or less dependently typed classes.
Unfortunately, we do not have the development tools to use them efficiently, so most of the time we use concrete types instead of abstract ones. I am not sure we need dynamic dispatch, but I might be wrong.

C.

In my opinion, OO is about processes and actors. Since there is currently no formal system with the ability to abstract a group of primitive actors into an abstract actor, confusion results: we perceive the attempts at creating such a formal system as the real deal, when in fact they are only attempts.

Just my hasty response…

2 Likes

If we can contrast records and activities, then I think the contrast in hardware would be the Memex versus the video game console. Few people know what the Memex is, despite it being foundational to the personal computer and the Web. On the other hand, lots of people are content to waste their lives on video games, which one could argue are Games Without Play: task-oriented work simulations.

I wish I were more record-oriented, but unfortunately being task-oriented is somehow more gratifying and fulfills some need to “look busy”. There is some satisfaction in data entry, data curation, and finding some special piece of knowledge by intention or by accident, but I think we’ve been conditioned to regard this as unproductive.

1 Like

@akkartik I see an autonomy-generating value in keeping data as “dead” records rather than depending on an access API. The key is the word “depending”. Dead data has no dependencies, APIs do. Adopting dead data formats is of course not a panacea, and certainly not sufficient to escape from BigTech. But it’s one technique on the road towards a more robust future, with less dependency on whoever, be it BigTech or the XKCD-famous random person in Nebraska.

As @thekkid points out, stable and portable APIs are worth having as well, but I’d go for multiple layers of data survival, in which documented dead-data formats are the foundation.

@thekkid Thanks for the pointer to EverythingIsA. A nice list, and that’s indeed a frequent obsession, be it with OO, FP, or whatever other fashion of the day.

@Apostolis I do agree that technical and social processes are interdependent and should be designed in concert. But I have absolutely no idea how to do this.

4 Likes

How do you feel about documented data formats that encapsulate computation as well as “plain” data or records?

1 Like

Depends on the complexity and precision of the documentation. If it includes the C++ standard, the answer is no. Next, it depends on the reliability of the documentation. I’d probably want to see at least two independently developed yet interoperable implementations.

4 Likes

That makes sense. I also think the only truly durable system is something that could self-bootstrap onto something trivial like a simple Forth or the lambda calculus.
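
As a rough illustration of how small such a bootstrap target can be, here’s a sketch of an evaluator for the untyped lambda calculus in a few lines of Python (the term encoding is my own, purely illustrative):

```python
# Terms are tuples: ("var", name), ("lam", param, body), ("app", func, arg).

def evaluate(term, env):
    kind = term[0]
    if kind == "var":
        return env[term[1]]
    if kind == "lam":
        # A closure captures its defining environment.
        return ("closure", term[1], term[2], env)
    if kind == "app":
        func = evaluate(term[1], env)
        arg = evaluate(term[2], env)
        _, param, body, closure_env = func
        return evaluate(body, {**closure_env, param: arg})

# (\x. x) applied to (\y. y) reduces to the identity closure.
identity = ("lam", "x", ("var", "x"))
print(evaluate(("app", identity, identity), {}))
```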

4 Likes

An algorithm or interactive program can be encoded in a document, in programming-language source code, or in the resulting executable binary.

The last two might be more fragile than the first.
The second is still readable.
The third requires reverse-engineering skills to decipher.

Why don’t organisms have the same problem? 90% of our DNA :dna: is garbage, and we still do not have a bootstrapping problem. DNA doesn’t even have version control or branches or anything like that. I would certainly never hire Life :sauropod:.

2 Likes

Bootstrapping is an issue for long-term preservation. But even when you don’t care about that (accepting that future generations might regret your choice), you may well want your data to survive tech churn cycles of a few years. What matters then is ease of reimplementation, which is influenced by (1) complexity and (2) quality of documentation.

I am a big fan of the Rule of Least Power: use the least powerful language sufficient for a task. You gain in possibilities to reason about the data, and in ease of reimplementation.

Has anyone investigated intermediates between dead data and APIs in Turing-complete languages? For example, state machines: presenting the data as a vending machine with a number of buttons to press.
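
To sketch what I mean (states and buttons purely illustrative): the interface is a finite transition table rather than arbitrary code, so every possible interaction can be enumerated and reasoned about.

```python
TRANSITIONS = {
    # (state, button): (next state, output)
    ("locked",   "insert_coin"): ("unlocked", None),
    ("unlocked", "press_A"):     ("locked",   "record A"),
    ("unlocked", "press_B"):     ("locked",   "record B"),
}

def press(state, button):
    """Advance the machine; illegal presses are rejected, not undefined."""
    if (state, button) not in TRANSITIONS:
        raise ValueError(f"button {button!r} does nothing in state {state!r}")
    return TRANSITIONS[(state, button)]

state = "locked"
state, _ = press(state, "insert_coin")
state, output = press(state, "press_A")
print(output)  # -> record A
```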

2 Likes

Organisms don’t have a bootstrapping problem because the basic processes of life have never stopped. All cells in existence today are part of a lineage that goes back to one of the very first cells that ever formed. There has never been a “second implementation” of life.

3 Likes

These are all very thought-provoking responses! I’m myself not sure if this division between “Record” and “Activity” is useful, just that the idea came to me and seems to get at something that itches me.

I do think that activities (processes, interaction, systems, etc.) are actually an important thing we want computers to do, and that every piece of “dead data” actually requires some kind of interaction as well. Thinking back to the 1980s, as a kid encountering 8-bit computers: of course what I wanted wasn’t “data processing” but for the computer to do something fun. It wasn’t until I got much older that I started to get frustrated by how fast computers and OSes were changing, and how hard it was becoming to extract data from one system and move it to another - especially as that data was more and more becoming my personal memory record.

I think perhaps what I want is to understand more what an “activity” might be and how we can compose them. The concept of a Record feels somewhat well-understood (not entirely, e.g. caching, but almost-kinda understood). While Activities, which feel like the core of the Alan Kay kind of object… well, they’re just very hard to think about. The kinds we have don’t seem to compose well at the moment - which is what I think makes them tend toward centralization - and I don’t understand why this is.

Perhaps Activities, because they’re about getting some physically-existing system to do something, are inherently lower-level than Records? And so, being lower-level, they’re filled with a lot of irrelevant data (how to coordinate the system doing the work) that’s not really transportable beyond that system? I don’t know. This feels true-ish, but I’m not sure that it’s actually true.

An Object is a kind of state machine; that seems clear enough to me. An Object has a Record attached, or several (its class definition, its instance state, its activation records), so it seems like a slightly “larger” concept. There’s more to being an object/actor/process/server than just the instance/activation data it contains. But it seems a bit hard to capture what that “more” is. If it’s the low-levelness of being bound to a specific system (and Smalltalk objects in particular seem to fall into this state easily), then the “more” complexity/specificity makes the object “lesser”, as in less abstract, less transportable.

I feel like the act of coordinating an activity/process also limits/specialises an object in a similar way: by binding use-cases or access patterns to what would otherwise just be pure abstract knowledge. This doesn’t have to happen, of course - and no knowledge is actually abstract, all knowledge is about the specifics of something - but the complexity trap of becoming over-specific seems to happen more easily when thinking in terms of a system “doing” some action rather than “being” in some configuration. “Being” implies at least two actions, creating and reading (maybe updating and deleting), but “doing” can have an infinite number of actions. More actions implies more danger of a complexity spiral, and resulting platform lock-in: instead of having just one API to implement before you’re portable, there’s now an infinity of APIs.
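
A quick sketch of that contrast (both interfaces invented, purely to illustrate the shape of the problem):

```python
from typing import Any, Protocol

class RecordStore(Protocol):
    # "Being": a small, closed set of verbs. Implement these four
    # and your data is portable to any other implementation.
    def create(self, key: str, value: Any) -> None: ...
    def read(self, key: str) -> Any: ...
    def update(self, key: str, value: Any) -> None: ...
    def delete(self, key: str) -> None: ...

class DocumentSession(Protocol):
    # "Doing": an open-ended vocabulary of task-specific verbs,
    # every one of which a rival implementation must reproduce.
    def open(self, name: str) -> None: ...
    def share_with(self, user: str) -> None: ...
    def request_review(self, reviewer: str) -> None: ...
    def lock_region(self, start: int, end: int) -> None: ...
    # ...and each new feature adds another verb.
```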

Maybe that intuition is wrong; I’m not sure. An infinite space of nouns (as in strict typing) is probably just as infinite as an infinite space of verbs (as in objects).

Yes, the idea of Record as connecting with “persistence” (and system-independence, not being locked into architectures) is key, I think. I was thinking of a Record as just a very basic idea of a “structured frame of data” - an idea general enough to embrace Packets on the wire, Blocks on disk or in memory, Files/Images/Volumes in a filesystem, and Rows/Transactions in a database - but the reason these concepts work as fairly “static” abstractions is that they’re defined and implemented in multiple systems. They do in fact have a lot of moving parts and activities associated with them (caching, indexing, managing the storage system, etc.). It’s just that, at the moment, we can sort of escape local system lock-in by marshalling data into and out of these objects.

How we use more general Objects today definitely tends more towards temporary “in-memory” items than towards “persistence”… but the temporariness isn’t part of the original idea of Objects as I understand it. Smalltalk was all about orthogonal persistence of objects to storage. Object systems were supposed to never need to be rebooted; the whole system was supposed to be a universal paradigm of networked computing. But despite objects being inspired by the Internet, the actual Internet community was never particularly interested in the key part: transmission of live objects between computers. The CORBA and Java communities got very interested in sending messages and classes between machines - but not object instances. Which is quite odd to me. We’ve still had a need to send live objects between machines! So we’ve decided to do it by packaging them up into entire virtual Unix machines, operating system and all, and sending VM image files around, or slightly virtualised “Containers”. Which is nearly the worst possible solution, but we use it because it gets a job done.

That’s a very interesting idea! I suppose I’d say that the “Object-Oriented Analysis and Design” movement - as opposed to object-oriented programming languages - seems to have evaporated almost entirely in a puff of smoke. As has the idea that “Objects are the only truly universal programming paradigm, so we should teach computing students only that”. Both of those universalizing tendencies have faded away. But I’m not sure it’s possible to talk about “Objects and Classes” without talking about “object-orientation” at all.

Yes, it’s true that our tech overlords can use anything. But I think the kind of object-like “Activity” I dislike is one that has connections and processes extending over the Internet: the view that “The Network is the Computer”, as Sun Microsystems put it in the Java days, and that we now have with the Cloud. This idea also seems to me to be about statefulness in the network - which is the opposite of (one might say a betrayal of) the original Internet “End-To-End Principle”, where the net was seen as just a way to send dumb stateless data, and “putting policy in the network” was a Very Bad Thing. The Net being policy-less was talked up, in the 1990s, as a fundamental principle, like democracy itself. With the Cloud we finally have persistent stateful objects in the network (big clunky whole-operating-system VM images though they may be), and along with this capability has come the desire to put lots and lots of policy in the network, the more the better. Maybe the one thing didn’t lead to the other, but it feels like there’s some correlation there.

Also very true. Making truly system-portable objects-as-running-processes might bring control back to the user. It does seem like a thing we need in a personal operating system, but I really wonder (a) how we could get there, and (b) why the first wave of professionals inventing OOP - including the Smalltalk, CORBA, and Java teams - never seemed to manage this, or even to consider it a priority.

The history of OOP, from a data point of view, feels like someone invented a universal filesystem/database… but forgot to add a “copy” or “table load/extract” command, leaving each user to implement data import/export by hand. A very odd situation, and I don’t understand how it could have come about. Yet it did! Naturally a system like that, with an important function missing, will mostly be used for temporary computations, because that’s all it can be used for, and will interface with something else (an RDBMS or filesystem) for persistent long-term and inter-system data storage. But that isn’t at all what OOP hoped to be in the 1990s! It was going to be the one true paradigm for everything, and SQL, we were taught, was over because “object databases” were going to replace it. (Jade was one programming language I used that tried to do just that; the problem was that it performed terribly compared to SQL, and it also needed to exchange data with the rest of our SQL-based ecosystem, so the Object/Relational Mismatch was a huge issue.)

This is a really interesting idea! “Abstracting a group of primitive actors into an abstract actor” does seem to be a very important missing thing. Would this be different from just having one actor impersonating, or forwarding messages on to, a group of other actors, like a Smalltalk object using doesNotUnderstand:?

2 Likes

I wonder, instead of decoupling computation and data, if it’s more important to talk about purity and side effects. Pure computations can be closed over and transported over the network, providing portability, but you’re not going to be able to transfer a side effect itself: things like clocks, streams, and open sockets can’t be reasoned about in referentially transparent ways. Similar to Fearless extensibility: Capabilities and effects, maybe if we separate out the effectful parts of computation it might help break apart these systems. I think the OO/actor solution of just passing messages around kind of misses the point that those side effects may no longer be relevant.
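
A tiny illustration of that asymmetry (plain pickle here; shipping real closures would need something like cloudpickle, but the contrast is the same):

```python
import pickle
import socket

# A pure, value-level computation: nothing refers to local machinery,
# so its bytes mean the same thing on any machine.
pure_task = {"op": "sum", "args": [1, 2, 3]}
wire = pickle.dumps(pure_task)  # fine: transportable

# A side effect in progress is bound to this machine's kernel state.
sock = socket.socket()
try:
    pickle.dumps(sock)  # fails: you can't ship an open socket
except TypeError as err:
    print("not transportable:", err)
finally:
    sock.close()
```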

3 Likes

An actor is just state and behavior. Both can be moved wherever you want.

The registers or the cache of the CPU, or the open sockets, are just means by which the computer performs its task; they have nothing to do with the actor.

Let us, for a moment, create a type/model for an interaction with a Mastodon server. Then the open socket goes away. In fact, because we know that the server respects the specification, if we move the actor to a new computer, it will behave the same.

If we change the Mastodon server, the actor will behave the same. We have achieved portability.
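
A sketch of this idea (all names illustrative, not a real Mastodon client): the actor’s state is the model - the server plus the protocol-level position - and the transport is re-created wherever the actor runs.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class TimelineReader:
    server: str        # which server to talk to: the model, not the socket
    last_seen_id: str  # protocol-level state that survives a move

    def poll(self, fetch):
        # `fetch` stands in for the transport; it is re-created on each
        # host, so no socket ever needs to be serialized.
        posts = fetch(self.server, since=self.last_seen_id)
        if posts:
            self.last_seen_id = posts[-1]["id"]
        return posts

# Moving the actor: serialize the model, not the connection.
reader = TimelineReader(server="https://example.social", last_seen_id="42")
wire = json.dumps(asdict(reader))             # send this to the new machine
resumed = TimelineReader(**json.loads(wire))  # behaves the same there
```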

That’s where the notion of side effects misses the point: it encourages people to abandon modeling.

Models introduce equivalence classes - notions of equality - under which we can move things and get the same behavior.

The way to overcome our tech overlords is to be able to create those APIs/models/specifications so that there is no centralization, no single point of control. And that is much more difficult than having a central authority.

(A single actor that acts as a middleman)

We need to reduce the cost of distributed coordination and increase its effectiveness. It needs to be better and easier than whatever our tech overlords are offering.

This is our job as a community. :slight_smile:

(Just in case you are wondering, Blockchain tech does NOT achieve any of the above)

1 Like

I think this is only true if you have some sensible notion of equality that allows you to move things. If my side effect is to print something on my local printer, and the printer breaks, the computation is not going to be portable. You may be able to reconfigure it to run on a different printer, but that’s not an equivalent operation.

3 Likes

I had a similar reaction to @Apostolis’s comment. Every model is an approximation: it makes assumptions about which details to leave out. It’s easy to “achieve portability” under some assumption; it’s much harder to state your assumptions precisely. And if you fail to state them, misunderstandings multiply.

Formal methods are useful when they guarantee narrow properties of a system.

3 Likes