Application Lifecycles

Hello everyone,

I’ve been thinking a lot about the lifecycle of software applications. The cycle is as old as personal computing: a person buys/leases a piece of closed-source software, invests time in learning it and entering data, and in the end application support is dropped for a variety of business/technical reasons. Software as a Service has only exacerbated this problem.

The best-case scenario in these situations is a proprietary import/export feature that is almost never in a useful format for interoperability. The fundamental problem is that business interests never align with robust data takeout. And as long as data is trapped within a single piece of software, we won’t have malleable systems.

With that in mind, my team at Yorba has built a PROV-O (the W3C provenance ontology)/schema.org-based export/import system that tracks the most critical facts of our members’ application state. This decision has two important business consequences:

  1. Our main product is a SaaS offering. Building this makes it easy for someone to terminate their subscription and fully exercise their right to remove all their personal information from our system with no consequence, because down the line they could simply re-upload and pick up where they left off.
  2. Building this on an open schema means any competitor could copy features and trivially bootstrap their product.

It also opens up the possibility that others may build on this data, which makes the data even more valuable. That’s the happiest path for everyone.

So that’s the gambit. Regarding the technical details, this data provides an auditable, interoperable history of actions that our members have taken with organizations and businesses using Yorba. Every event on Yorba is represented as a transaction (a verb) that happens between two parties (nouns: you and an entity).

Core Principles

  • Every action is added to an immutable event stream represented using @type (e.g., AddAction, RegisterAction, DeleteAction) and is part of a targetCollection of similar events.
  • Every event has an agent (the party initiating the action) and a participant (the party affected by the action).
  • Events in Yorba are often based on data from outside systems. These events include the provenance of the claim being made using prov:wasDerivedFrom to trace external sources (e.g., an email’s Message-ID).

We did not invent any of this schema.
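As a concrete illustration of those principles, here is how a single event might serialize to JSON-LD. This is a hypothetical sketch, not Yorba’s actual export format: all the IDs, names, and the Message-ID are invented, while the vocabulary terms (`@type`, `agent`, `participant`, `targetCollection`, `prov:wasDerivedFrom`) come from schema.org and PROV-O as described above.

```python
import json

# Hypothetical sketch of one Yorba-style event as JSON-LD.
# Vocabulary: schema.org Actions + PROV-O. All values are invented.
event = {
    "@context": {
        "@vocab": "https://schema.org/",
        "prov": "http://www.w3.org/ns/prov#",
    },
    "@type": "RegisterAction",  # the verb (the transaction)
    "agent": {  # the party initiating the action
        "@type": "Person",
        "@id": "urn:example:member:42",
    },
    "participant": {  # the party affected by the action
        "@type": "Organization",
        "name": "Example Corp",
    },
    "startTime": "2024-05-01T12:00:00Z",
    "targetCollection": {"@id": "urn:example:collection:registrations"},
    "prov:wasDerivedFrom": {  # provenance: the external source of the claim
        "@id": "mid:0123abcd@mail.example.com"  # an email Message-ID URI
    },
}

print(json.dumps(event, indent=2))
```

Because nothing here is invented vocabulary, any JSON-LD processor that understands schema.org and PROV-O can interpret such an event without Yorba-specific tooling.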

This also puts Yorba in a good position to federate this data (local-first, Solid, etc…). While the metadata about this data is critical for the operation of our application, the actual facts (personal information) within are just a liability for us as we don’t sell user data nor do we use it to train LLMs. Furthermore, keeping it off our servers lightens our legal compliance load as a small startup that’s not taking VC. We’re still looking for the right substrate for federated/distributed storage. Would love to hear any success stories there.

Thoughts? Is anyone here aware of a precedent? Happy to share more details too.

Every action is added to an immutable event stream

So you’re non-negotiably persisting all data forever? Are you aware of the serious legal problems that this exposes you and your users to, and have you thought about how to handle legal requests to modify or delete that event stream?

For example, what happens when a customer of your product accidentally adds some legally proscribed information (eg: Personally Identifiable Information of someone who did not grant permission to process their data, or who subsequently has revoked that permission; or copyrighted or NDA’d information subject to a takedown request; or information which is illegal or objectionable under some country’s laws, such as links to CSAM)? That customer can now never remove that information, and both you and the customer are now potentially committing a serious crime, forever.

I’m not a software marketing expert, but perhaps consider not baking “commit crimes forever” into your value proposition.

Thanks @natecull - your question on legal liability demonstrates the importance of carefully choosing your words.

Hopefully the last paragraph in the post gives some indication of our general sensitivity to legal matters:

I was using the word “immutable” in the same way many in the industry are currently using it. For example, if we were to take this technical documentation on the Datomic database literally, it would be impossible to use Datomic anywhere that required GDPR compliance:

A database value is the set of all datoms ever added to the database. It only accrues new information (like a log or ledger), and only via transactions. A database is not a set of places that get updated. The word value emphasizes that databases, like datoms, are immutable.

But of course there are ways to modify this “immutable” information.

In the future, I’ll stress the word provenance and prefer the term “append-only log,” which has fewer implications.

Regarding your specific questions about customer control of their data: members have full control to modify the data if they download it and re-upload it. That sort of explicit modification could weaken provenance claims, because it opens the door to tampering, but I’d rather address that after we prove the value of data provenance. At the moment, we’re living in an emergent “post-truth” world where the internet is awash with malinformation and LLM forgeries that hold the same value as legitimately sourced information. But initiatives such as this one and Adobe’s Content Authenticity Initiative are betting on a different future.
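One common way to make that kind of post-download tampering evident, rather than merely possible, is to hash-chain the exported events so that editing any event changes every subsequent digest. This is a minimal sketch under my own assumptions (it is not Yorba’s actual scheme; the canonicalization and chaining choices here are illustrative):

```python
import hashlib
import json

def chain_events(events):
    """Return (event, digest) pairs where each digest commits to the
    event's canonical JSON plus the previous digest (a simple hash chain)."""
    prev = "0" * 64  # genesis value
    out = []
    for ev in events:
        payload = json.dumps(ev, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        out.append((ev, digest))
        prev = digest
    return out

events = [{"@type": "AddAction", "n": 1}, {"@type": "DeleteAction", "n": 2}]
chained = chain_events(events)

# Editing the first event changes its digest, and therefore every later one.
tampered = [{"@type": "AddAction", "n": 99}, events[1]]
assert chain_events(tampered)[0][1] != chained[0][1]
assert chain_events(tampered)[1][1] != chained[1][1]
```

A verifier who holds only the final digest can detect whether any earlier event was altered, which is the property tamper-evident provenance needs.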


non-negotiably persisting all data

In some countries, ISPs (Internet Service Providers) are required by law to retain all data for several years. There are commercial spyware companies whose sole focus is to non-negotiably persist unwilling targets’ personal data. Surely they’re not liable for other people’s potentially illegal data or accidental sequences of bytes? Yet social media platforms are expected to police their own servers; cloud data storage services are surveilled and censored; and mobile OSes are making their users’ phones run censorship on local files, even auto-reporting to authorities. Laws are being proposed, or already enacted, under which even small websites are legally liable for their users’ data.

It’s been theorized that the number Pi contains within it all possible data, including every book ever written, copyrighted movies or other legally prohibited sequences of numbers. It’s unenforceable. There are infinite ways to perform communication and computation, using any physical medium: Morse code on the airwaves, Shiitake memristors, Superman memory crystal (5D optical data storage), synthetic DNA, butterflies, cosmic rays, rocks in the desert.

That implies any sufficiently large number (in terms of digits) is potentially illegal. It highlights the need for encryption in transit and at rest, where all sensitive data is made impossible to inspect, illegible and incomprehensible by platform owners and legal authorities. Laws are only as good as the enforcement - in other words, they can’t catch what they can’t see.

Interesting angle to explore, the disadvantages of immutability, and in contrast, the value of ephemerality. ABDD: Always Be Destroying Data. But who knows what pieces haven’t been garbage collected, or have been unknowingly cached at a lower level of the machine: persistence as a personal risk and liability.

That reminds me I recently saw in a codebase someone named a folder liabilities for what’s typically named assets, which includes bundled code. A good reminder that any third-party data or code is a liability.

Content Credentials help you record and display the most important details about a piece of content at every step of its lifecycle.

This information is tamper-evident, persistent across editing iterations, and accessible to anyone, bringing a previously unattainable level of transparency to the digital content consumption experience.

..ProofMode taps into enhanced device sensor metadata, hardware fingerprinting, cryptographic signing, and third-party notaries to provide interoperable provenance.

Content Credentials embedded at the chip level.. support authenticity and reliability in photojournalism and the fact-checking process

The Coalition for Content Provenance and Authenticity.. Collaborators include Adobe, the Associated Press, BBC, Microsoft, The New York Times Co., Reuters, Leica, Nikon, Canon, Pixelstream, Truepic, Qualcomm, [Arm, Intel] and many others, including members of civil society.

It comes down to who you can trust. This technology could achieve verification of authenticity and identity, or it could be misused as a vector of invasive surveillance. Transparency is not always a good thing, if it removes your privacy and undermines security. Persistence is not always a good thing, if it’s non-negotiable (as it is here) and used to spy on your activities and thoughts, or becomes a legal liability.


@natecull once made a similar comment on my history post. It is very valid, and I have a response to it, and yours too.

Our understanding of accountability has been entirely oriented towards the mutable world, and not I/O. An example is:

which implies that the user input, and any derivative of it, must now be destroyed.

We may have high ethical standards, but we must not assume the same of corporations. The requirement of destroying information has one weakness: claims that information no longer exists are in general not enforceable. Do we not already know how that works? When we enforce, we enforce a weaker version: that corporations do not exploit the information they are required to destroy. This enforcement is as good as what we can have, and it is arguably good enough, and fair!

It is like how even if the police are unable to decrypt chat messages they can still find drug dealers, because the drugs have to be sold somehow.

If my I/O data can somehow be exploited by another party, and they do not listen to me, fine. (Therefore, Yorba is fine.) I make sure to also exploit my I/O data, and additionally also the I/O of that (Yorba, etc.) who exploited mine. At the same time I want to not exploit friendly people who do not exploit me. I will also respect the privacy of others, not by completely destroying information from them, but by only using that information for self-improvement (which we already do).

Is surveillance fine, then? Yes, but only distributed surveillance. I want the data to be in the hands of my neighbors and colleagues. Not a central agency.

It is time to redesign the ethics.

(I’m using “Yorba” here only as a stand-in for an institution yet to be trusted.)

This made it clearer for me about how to support deletion in my imaginary append-only operating system.

Append a delete command to the event stream, with the address of an input. Then the specified input will be deleted, and a hole will remain at the address. In functional-programming terms, delete is impure.

You can also delete the delete command in the history, but another delete command will remain. What? You want to make sure no delete command remains? Then you must be a shadow agency.
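The scheme above can be sketched as a toy append-only log, where deletion is itself just another appended command that masks an earlier address on replay. This is a minimal model of the idea, not any real system’s implementation:

```python
# Toy append-only log: entries are never removed; a "delete" is just
# another appended command that masks an earlier address on replay.
log = []

def append(value):
    log.append(("put", value))
    return len(log) - 1          # the entry's address in the log

def delete(address):
    log.append(("delete", address))

def replay():
    """Materialize current state: deleted addresses become holes (None)."""
    state = {}
    for addr, (op, arg) in enumerate(log):
        if op == "put":
            state[addr] = arg
        elif op == "delete":
            state[arg] = None    # a hole remains at the address
    return state

a = append("secret")
append("public")
delete(a)
state = replay()
assert state[a] is None          # the input is gone from the replayed state
assert ("delete", a) in log      # but the delete command itself remains
```

As the post says, you could append a further delete targeting the delete command’s own address, but that second delete would then remain in the log in its place.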
