SUSN: Simple Uniform Semantic Notation

SUSN (Simple Uniform Semantic Notation) is a weird little project I’ve been working on for a while. It started out as a way to scratch a very personal itch: creating a personal knowledge base / mind map / semantic map using plain text. Each line of the text file would be a single “assertion”, in a format inspired by RDF but not particularly limited by the “graph” mindset. After building out my text file, I would then be able to query it in various ways using JavaScript or any other small language (potentially including Lua, Retro Forth, UXN or other tiny systems).

A large part of the motivation for doing this myself, from scratch, is my failure to find any notetaking software that works the way I think (including multiple wikis and Markdown-based zettelkasten tools). That, and the fact that I want to be able to take notes on both notepads and mobile devices without using a cloud app.

The name is a play on JSON; it’s not really at all JSON-like, but it was inspired by JSON objects originally, and does run over JSON arrays (or any other data model that provides arrays or lists).

SUSN makes some very unusual design choices, informed by a very specific use case: line-based markup for a personal knowledge graph, hand-edited as plain text in an ordinary text editor. But I like what it has become so far. It fits a niche I haven’t found filled anywhere else, somewhere between JSON, RDF, XML and Markdown. I’m currently using it in a database / mind-mapping project, and hammering on it to try to make it the best version of itself that it can be.

SUSN currently exists as a very small Node.js script that can parse, write, and query arrays of SUSN lines. (For simplicity, I have chosen not to even deal with the Linux LF vs Windows CR/LF holy war, and I leave breaking/joining lines up to the user.)

GitLab is here:


I really like this. I’m also thinking that blocks could support Markdown by default, to help with rendering text nicely.

Interesting! I love experiments with textual human-computer interfaces. There is so much useful low-tech left to be discovered.

So here’s an interesting thing. SUSN appears to be almost the same as a concept by Steven Obua: “Recursive Text” (RX).

https://practal.com/recursivetext/

Recursive teXt (RX) is a new general-purpose text format.

It has a simple semantics:

RX = Block+
Block = Line (Line | Block)*
Line = Character*

In other words:

  • An RX document is a non-empty sequence of blocks.
  • A block starts with a line, followed by a sequence of lines and blocks.
  • A line is just a sequence of characters.

An RX document is saved as a plain text file, usually with the suffix .rx.

In plain text, the semantics of RX is encoded via indentation.

RX can be edited just as any other text via standard tools. But it is intended to be edited in a special editor that respects and exploits its semantics.

SUSN is essentially the same idea, except that it uses a visible indentation character and does a little more chunking of the text lines (because I think that’s useful). But yeah, SUSN could be defined as “Blocks, Lines, Characters” without doing any further chunking.

Obviously one unavoidable restriction of both RX and SUSN is that a Line may not start with the indent character. A Line is therefore not quite just a sequence of arbitrary Characters, as Obua defines it. This becomes easier to see if we use a non-space indent character. That’s one reason why I think it’s helpful to chunk the line into tokens and define that the first token is special, so the restricted character set gets limited to that one token and not all the others.
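To make that data model concrete, here’s a rough sketch of the kind of parsing involved. This is just an illustration, not the actual SUSN script; the “~” indent character and the example lines are made up for the purpose:

// Sketch only: parse indented lines into nested blocks.
// Assumes a hypothetical "~" indent character; each block is
// { head, rest, children }, where head is the first token of its line.
function parseBlocks(lines, indentChar = "~") {
  const root = { head: null, rest: [], children: [] };
  const stack = [root]; // stack[d] is the open block at depth d
  for (const line of lines) {
    let depth = 0;
    while (line[depth] === indentChar) depth++;
    if (depth > stack.length - 1) depth = stack.length - 1; // clamp over-indented lines
    const tokens = line.slice(depth).trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue; // skip blank lines
    const node = { head: tokens[0], rest: tokens.slice(1), children: [] };
    stack.length = depth + 1;          // close any deeper open blocks
    stack[depth].children.push(node);  // attach to the enclosing block
    stack.push(node);                  // this node may open a new block
  }
  return root.children;
}

// e.g. parseBlocks(["cat likes fish", "~cat colour black", "dog likes walks"])
// => two top-level blocks, the first with one child line.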

Obua’s RX is defined in the context of his “Practal” (Practical Logic) project, which seems along very similar lines to what I’ve been thinking about over the last 15 years or so. Basically, as soon as someone starts seriously poking at the use of logic and formal methods for programming, it becomes apparent that there’s a huge impedance gap between how logic systems are used in mathematical logic and how programming systems work. We start to wonder whether it would help to close that gap a little by bootstrapping formal logic with a little of what the programming world has learned in the hundred-plus years since Set Theory and First-Order Predicate Logic. There have been many, many attempts at this, but very few have achieved critical mass and been successfully built on by others. So we continue to try to develop 21st century theorem proving software using notation and techniques developed for 19th century blackboards, which feels like something that could be usefully improved.

I don’t know if Obua’s “Abstraction Logic” formalism is the next “slightly better FOPL” or not. But it seems especially important to me, and Obua seems to think so too, that we should look at logics that put everything into a single universe of terms and symbols - since a computer’s memory is such a single universe. We don’t very often have the luxury in computing of separating languages/documents/databases into neatly stratified, utterly separated, universes of discourse - even though compiler toolchains based on formal type theory often keep trying to do just this, I think for misguided reasons based on the restrictions baked into their 19th century formalisms. (Relational Databases, for example, try to do this - and the result is that the category of NoSQL Databases exists, because actual data produced by the actual universe consistently fails to obey the strict typing requirements of Relational Database Theory). We should certainly avoid recursive loops, but we should look for mechanisms that do that as and when we evaluate specific individual terms, and not by trying to divide the entire universe of expressible patterns of binary digits in RAM into “can be evaluated” and “can not be evaluated” chunks every morning before we turn the whole Internet on.

(Ideally I think we should probably also try to rebuild logic not on “sets” but on “sequences”, because physically-existing computing, communication and symbol-storage systems never provide us with abstract indeterminate sets but only with sequences: arrays of storage, numbered addresses, or even time-sequences of events. But that claim is a large one and probably quite a hard sell to the mathematics community.)

Anyway: whatever train of thought Obua is on, “RX” and “SUSN” seem to be expressions of the same idea. I came to SUSN because it was just the most practical way of entering semi-structured data that I cared about on limited devices, and I was annoyed (and still am) that it wasn’t quite the same data model as anything mainstream. But Obua seems to have come to this idea from the needs of expressing logical formulae, which presumably is a slightly more principled derivation.


Interesting stuff… I have only had a quick look at Steven Obua’s site, but I am sure I will be back!

As for structured notation, I wonder if you know about this one: https://treenotation.org/
It looks somewhat interesting, but my feeling is that I haven’t quite understood the point yet.

Being a mere amateur in formal logic, I cannot judge if Abstraction Logic lives up to its claims, but it looks like a serious attempt at bridging the gap between mathematics and CS. The paper is on my reading list.

I doubt that replacing sets by sequences would be a good move. Sets are the most basic collections, to which you can then add multiplicity (multisets) and order (sequences). Assuming order by default means that you have to push its absence into the operations acting on sets. Which is how for example Common Lisp handles sets, and it is very error-prone.
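The same trap exists in any language that models sets as ordered lists. A quick JavaScript illustration, with made-up data:

// Two arrays that represent the same set...
const a = [1, 2, 3];
const b = [3, 2, 1];
// ...compare as unequal, because the naive comparison is order-sensitive:
console.log(JSON.stringify(a) === JSON.stringify(b)); // false
// Every operation has to remember to push the order away itself:
const asSet = (xs) => [...new Set(xs)].sort();
console.log(JSON.stringify(asSet(a)) === JSON.stringify(asSet(b))); // true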

Tree Notation looks interesting, although most of the pages on the website seem long on hype and short on details. I’m guessing the FAQ (https://faq.treenotation.org/) is the page with the most meat to it, and if this pseudocode really is the core of the idea:

nodeBreakSymbol = "\n" // New lines separate nodes
edgeSymbol = " "       // Increasing indent to denote parent/child relationship
interface TreeNode {
  parent: &TreeNode
  children: TreeNode[]
  line: string
}

Then I guess yes, it’s very close to my current conception of SUSN. I prefer a physical “edge symbol” (because leading spaces often get mangled in today’s text transmission systems in a way that linebreak characters don’t), but the specific symbol used is not really essential to the data model. The author of Tree Notation also appears to be parsing lines into space-separated words, instead of just leaving them as character sequences as in Recursive Text, and I think that’s probably a good idea.
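As a tiny illustration of that last point (my own sketch, not either project’s code; the example line is made up), word-chunking changes what a parsed node looks like:

// Recursive Text keeps each line as a plain character sequence:
const rxNode = { line: "cat likes fish", children: [] };

// Word-chunking (Tree Notation / SUSN-style) splits the line into tokens,
// so a query can address the head word and the rest separately:
const words = "cat likes fish".split(/\s+/);
const chunkedNode = { head: words[0], rest: words.slice(1), children: [] };
// => { head: "cat", rest: ["likes", "fish"], children: [] }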

So that’s interesting. There are at least three of us, then, who are interested in this concept of basically “just indented lines”. It’s a weird data model by today’s standards, but I guess it does also have a somewhat respectable heritage: tracing back through Markdown and HTML “headings” to the “outliner” concept in word processors, and before that, NLS. And somewhere in there, before or after Markdown, Python.

Indented text is still a little frustrating to me because it’s not quite as universal a notation as, say, S-expressions are. Without adding some new syntax, you can’t, for example, close a block and then immediately open another - as you can in S-expressions or other systems with separate “open” and “close” marks. This limits how precisely it can match the contents of an in-memory sequence of nodes. So it’s a bit of an awkward, compromised notation. But the upside is that it’s really easy to read and write in our current text editors. It’s almost good enough to replace S-expressions. Almost. But not quite.
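To make that limitation concrete (a hand-rolled illustration, not from either project’s documentation; the “~” indent character is again just an assumption):

// In the RX/SUSN grammar, every block must begin with a line of its own,
// so a group whose first element is itself a group has no direct spelling.
// S-expressions (or nested JavaScript arrays) have no such restriction:
const sexpLike = [["a", "b"], ["c", "d"]];

// The closest indented equivalent has to invent header lines for the groups:
//
//   group1
//   ~a
//   ~b
//   group2
//   ~c
//   ~d
//
// which parses to something like:
const indentedLike = [
  { head: "group1", children: ["a", "b"] },
  { head: "group2", children: ["c", "d"] },
];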

Still, I’d like to see more exploration of this family of notations and their accompanying data model. It’s basically a tree, I guess, which isn’t that weird, but it’s not one that our current programming languages give us as a core primitive. It’s easy to construct from arrays or lists, but it’s not the same as them.

Yes, the Abstraction Logic papers are confusing, and after having read a few of them I still don’t grasp what the core insight is supposed to be. I feel like perhaps Metamath is on a clearer path - or at least one that I vaguely understand.

Assuming order by default means that you have to push its absence into the operations acting on sets.

Yes, exactly! It’s very annoying to model abstract sets using concrete, physically-existing sequences, if we’re already committed to thinking about sets. But that’s precisely why I think we should question our prior commitment to sets and ask ourselves whether they are really the most basic collections - or was it only Set Theory that told us this?

The real world, or at least real computing machinery, does not actually give us Sets in the 19th century Set Theory sense - it only gives us structured, ordered containers, the simplest of which are sequences. So, if we’re going to build computing machinery to do computing operations, maybe it would be better to start with the things the machine can natively represent, rather than things that it can’t natively represent and can only simulate with a great deal of complex labour.

The Turing Machine, after all, doesn’t have anything to do with Set Theory but is rather an abstraction of a pen moving across and writing symbols on a blackboard, in a very strictly sequenced order. It is widely understood that the Turing Machine can perform any computation that can be computed. If the Turing Machine doesn’t need sets, then how certain can we be that they are really such a fundamental abstraction to the act of computation?

However, I suspect thinking further along this line takes us to formal systems like Linear Logic and the Sequent Calculus, where introducing or removing symbols is a costly operation. These do give me quite a headache to think about, because they behave very differently from my intuition, and I can’t say that I understand either of them very well.

Well… the members of my family are a set, not a sequence. The collection of stuff I own is a multiset, not a sequence. That’s the real world for me. Today’s computing machinery is indeed sequence-based. I doubt that it has to remain so forever. I see it as an implementation choice. Convenient but not inevitable.

That’s a good counter-argument, yes. It’s true that if we think about just the bare existence of things in the real world, we do get something that looks like an abstract set (or, yes, a multiset, if we allow multiple identical things to exist).

Yet I could argue that physically existing things also always have a position (in space or time), and that position makes the real world more like a dictionary, function, or category structure than a set structure. That is, it seems to consist of things more like labelled boxes, or nodes joined by arrows, than featureless, placeless bags.

Whether fundamental mental or conceptual objects, i.e. mathematical objects and all their friends, are most usefully thought of as having something like an index/key/position/argument/arrow (my feeling), or whether they are more usefully modelled by just simple existence (the set theory model), is a good question.

The intuition I have is that attaching a position/key-like thing to conceptual objects just makes them easier to handle, both mentally and in physical computing machinery. Beyond that, I suspect we probably can’t really attach two mental objects together (in order to form a mental model) if they don’t have some two-part existence structure like this. And third, objects which only have “bare existence” perhaps don’t really exist at all: because in what way, or to what, would they exist? My intuition here is that existence is relationship, you see. And also that a single Boolean true/false “relationship to a set” is not quite enough of a relationship to be fully expressive of all we need to express. (Though it’s certainly one bit more than zero.)

I admit that this line of thinking is fairly strange and may be wrong.

I see some echoes of it, however, in Dave Childs’ “Extended Set Theory” (which preceded Codd’s relational algebra by a few years) and, more recently, in Jeremy Kepner’s badly-named theory of “Associative Arrays” (a revision of Codd’s relational tables, defined as tuples of tuples rather than sets of tuples, in order to unify graphs, matrices and relations for Big Data datasets; the key point being that Kepner has thrown the sets out of Codd’s relations, or rather replaced them with ordered sets of keys, so they have the properties of both sets and sequences).
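For what it’s worth, here is roughly how I understand the associative-array idea, as a sketch (my own illustration in JavaScript, not Kepner’s notation or any D4M API; the keys and data are made up): one map from pairs of keys to values, which reads as a table, a graph, or a sparse matrix depending on what you use as keys.

// Sketch only: an associative array as a map from (row key, column key) to value.
const assoc = new Map();
const put = (row, col, val) => assoc.set(row + "|" + col, val); // "|" assumed absent from keys
const get = (row, col) => assoc.get(row + "|" + col);

// Viewed as a relational table: row key = record id, column key = field name.
put("rec1", "name", "alice");
put("rec1", "species", "cat");

// Viewed as a graph: row key = source vertex, column key = target vertex, value = edge label.
put("alice", "bob", "likes");

// Viewed as a sparse matrix: row and column keys are just numeric coordinates.
put(3, 7, 42);

console.log(get("rec1", "species")); // "cat"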

Childs is perhaps a little on the crank side (he briefly made a bit of a splash at Microsoft, I believe, but most of his mathematical material is proprietary white papers). Some representative links:

Kepner has a book (although I haven’t read it, only a few of his papers).

and a representative paper:

I’m not really interested in either Big Data or Set Theory as such, but I am interested in ways of unifying data from multiple sources at the personal desktop scale… and I’m looking for the simplest abstraction which would do it. Set theory, by itself, seems to not quite encode enough information.

It’s possible that if we think in terms of membership in multiple sets at once, that piece of information comes out as something similar to the key in a key/value structure. I.e.: whenever we think about the relation of one entity with a second entity, we always get a third entity (of the same type) that describes this relationship… And so all of these different ways of thinking about the relation of one thing with another become just different views of the same underlying concept. That’s the sort of thing that it seems both Childs and Kepner are trying to do, at the theoretical level, in order to solve some very concrete problems in large datasets.


Attaching positions, keys, or indices to things makes sense, but the natural numbers are rarely a good choice for that. Positions in 3D space have no clear order. Neither do labels (as in dictionary keys). Dictionaries/hash maps are probably the best computational model for how our brains memorize things.


Maybe of relevance:
https://www.cs.bham.ac.uk/~mhe/HoTT-UF-in-Agda-Lecture-Notes/HoTT-UF-Agda.html#sip

The important thing is not the representation of the data, but the abstract mathematical properties that it possesses.
(For example: ordered or unordered, group or monoid, etc.)
