A programming environment with first class categories

I’m pretty dumb about category theory. From a distance, it feels like a category is almost exactly the same thing as a relation, except somehow not an actual relation because of infinities or somesuch. But apart from not being a relation, it looks and feels almost exactly like a relation.

I feel like if we had a programming language that natively did relations, we could express category theory in that fairly simply. But not necessarily the other way around, because category theory is Large and Confusing. For a programming language, I’d prefer to start small without all the Large and Confusing stuff and save thatstuff to add on later - in a library, rather than a language extension, if possible.

By relations, I mean something like Prolog’s predicates and/or Kanren’s relations: things a little like functions except that

a) they can have multiple return values instead of just one (it’s probably required to use a form of lazy evaluation so as to not return all of those values at once, because they will very often be a countably infinite number of them)

b) like a typed function and unlike an untyped function, not every argument/value actually exists in the relation (a side effect of there being possibly more than one value is that there could be zero values). Yet no actual concept of “typing” is required to express this; just some kind of computable rule to say which arguments/values are or aren’t in the relation - and this same kind of rule is exactly how we also do ordinary computing.

c) as a tricky edge case, you can also, potentially, have just a naked value that exists in the relation yet doesn’t have any argument attached - is not part of an argument/value pair. Naively you could think about it as “this value has an argument of a special otherwise-inexpressible guard value”, but even that will I think get you in trouble, because algebras don’t often like inexpressible special-case values. This is a tricky edge case and very hard to explain clearly except that it’s what you get if you consider the set of “pairs plus atoms”: ie, if you have a bunch of Lisp terms - some of which are pairs and some of which are just naked values - and shove them all together into a set. That set will contain both pairs and naked values. And if you remove the “naked values” and try to squeeze everything into r rigid “two column table” datatype, you break all of the algebraic properties of the Lisp term as a type. It MIGHT be possible to work around this, somehow, with a very clever selection of “guard value”… but it also might not. I feel like this edge case hasn’t been widely talked about, but solving it is quite important to getting really good support for relations as a native computing data structure that don’t subtly explode when we put real-world data into them.

d) finally, unlike lambda calculus lambdas, you can iterate relations (getting them to provide you a list of possible arguments) if you don’t know an argument, as well as just evaluating them (getting a value or set/sequence of values for an argument). And that means we maybe can’t compute with them exactly by rewriting one term into another, like we do for functions. Or, maybe we can and we just haven’t found a clear way to express this yet.

If we had a language built on such relations, I believe we could express ALL of: functions (transformations over data); datasets (table-like things) AND types (restrictions on or precomputed declarations about data), all as instances of one simple computed-collection entity.

But finding a decent, clear, idiomatic semantics for how you do the actual computation - the equivalent of a “function call” on a relation - such that you can both iterate or evaluate in the same call format, and do that soundly and efficiently in the presence of recursion and countable infinities - seems like a missing piece.

Prolog (or rather the “resolution via unification and backtracking” algorithm which Prolog implements) can do this! But to do it, it introduces the unusual concepts of “logic variable” and “unification environment” both of which are quite alien to normal functional programming, don’t really behave nicely with “rewrite” semantics, and I don’t think they’ve ever been parallelised entirely successfully.

Datalog gets a lot of traction in the data world, where it’s a slightly better SQL, but, it achieves a lot of its good qualities by throwing away a lot of the good parts of Prolog, particularly by being non-Turing complete. So Datalog itself cannot be the actual answer we’re looking for, as a general-purpose computing (rather than just a limited data-query) language.

Kanren, superficially, embeds the Turing-complete Prolog algorithm into a pure-functional form, and improves the Prolog search method (it’s not quite a depth-first search, so it can handle a bit more left-recursion, possibly even including handling integer arithmetic). But it cheats a bit to do this: it still lugs around a “variable environment” and although it calls its things relations, when computed they are actually not set-like but sequence-like “streams” which can have the same value occurring multiple times. To my way of thinking, that’s still not the 100% native “relational programming” algorithm that we really want and that could simplify a lot of things if we found it.