Love letter to catlangs

Korean as a Concatenative, Stack-Oriented Language

Korean is a subject-object-verb (SOV) language, which gives it a structure similar to the Reverse Polish Notation (RPN) used in concatenative, stack-oriented programming languages such as Forth and Factor. By this, I mean that SOV and RPN both put operators (i.e. verbs and adjectives, in the case of Korean) in the final position, where they take the preceding subject and object as arguments. In other words, SOV could be seen as (S,O)V where S and O are arguments to V. Similarly, nouns can take adjectives or even relative clauses as arguments, verbs take adverbs, particles take nouns, verb endings (어미) take verbs, etc. Taken further, this can be visualized as a concatenated list of values being pushed onto and popped off a stack, as stack-oriented languages do when executing programs. I believe this stack analogy may help some learners of Korean (and other SOV languages) mentally parse sentences while reading. This is surely not a perfect analogy, and I do not claim it is an original idea; it is simply something I noticed while learning about the Factor language, when I drew parallels to the way Korean is structured.

Brief Introduction to RPN and Stacks

Like SOV languages, RPN puts the operator in the postfix position. This means that instead of writing 2 + 3, one would write 2 3 +. Taken further, instead of writing (2 + 3) × 11 + 1, one would write 2 3 add 11 mul 1 add as a concatenated list of symbols and operators. Since this can become cumbersome to keep track of mentally, it can be represented on a stack: each input is pushed onto the stack, and operators consume the values already there as arguments:

Input  2  3  add  11  mul  1   add
Stack  2  3  5    11  55   1   56
          2       5        55

This is only meant to be a brief introduction, but Wikipedia goes into more depth with this example. Make sure you understand this concept before continuing onto the Korean examples below.
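The stack trace in the table above can be reproduced with a few lines of Python. This is only a minimal sketch: the `run` function and the `OPS` table are names made up for illustration, and only the two operators used in the example (`add` and `mul`) are supported.

```python
import operator

# Hypothetical operator table; only the words from the example are defined.
OPS = {"add": operator.add, "mul": operator.mul}

def run(program):
    """Evaluate a space-separated RPN program, printing the stack after each token."""
    stack = []
    for token in program.split():
        if token in OPS:
            right = stack.pop()   # operators consume the two topmost values
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))  # anything else is pushed as a number
        print(f"{token:>4}  {stack}")
    return stack

run("2 3 add 11 mul 1 add")  # final stack: [56]
```

Each printed line corresponds to one column of the table, with the top of the stack at the right end of the list.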

Basic Example

This is a basic example showing a simple sentence with a subject, object, and verb:

Original

내가 밥을 먹다. (I eat rice.)

Lexing

Breaking the original sentence into concatenated, parameterized morphemes:

나 (N)가 밥 (N)을 (S,O)먹 (V)다.

Notice how not only the verb has parameters, but also the particles and the verb ending. I’m viewing the particles as almost like optional, type-safe wrappers that mark the subject and object. Without them, the types can be inferred, but they add clarity and a sort of type safety. The verb endings are also parameterized in a similar way, but these take verbs and return new verbs for multiple levels of chaining (i.e. agglutination).

Parsing

Moving left-to-right, each morpheme is added to the stack. Some morphemes can consume morphemes already on the stack as arguments similar to the math example above.
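The left-to-right consumption described above can be sketched in Python. The `ARITY` table is an assumption made for this illustration: it hard-codes how many stack items each morpheme of this one sentence consumes, rather than modeling Korean grammar in general.

```python
# Hypothetical arities for the morphemes of 내가 밥을 먹다 only.
ARITY = {"나": 0, "가": 1, "밥": 0, "을": 1, "먹": 2, "다": 1}

def parse(morphemes):
    """Push each morpheme, letting it wrap the items it consumes from the stack."""
    stack = []
    for m in morphemes:
        n = ARITY[m]
        args = stack[len(stack) - n:]   # the n topmost items (empty when n == 0)
        del stack[len(stack) - n:]
        stack.append(f"({', '.join(args)}){m}" if args else m)
    return stack

print(parse(["나", "가", "밥", "을", "먹", "다"])[0])
# (((나)가, (밥)을)먹)다
```

The whole sentence reduces to a single item on the stack, with the verb ending as the outermost wrapper.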

Input  나  (N)가   밥      (N)을
Stack  나  (나)가  밥      (밥)을
                   (나)가  (나)가

Input  (S,O)먹             (V)다
Stack  ((나)가, (밥)을)먹  (((나)가, (밥)을)먹)다

Relative Clause Example

This is a slightly more complex example showing how a relative clause is treated, which is often one of the sources of difficulty for Korean learners, especially those coming from SVO languages like English. This also includes past tense and a polite verb ending.

Original

어제 먹은 음식이 맛있었어요. (The food I ate yesterday was delicious.)

Lexing

어제 먹 (V)은 음식 (N)이 (S)맛 (S)있 (V)었 (V)어요.

Parsing

Same as the example above, but some morphemes consume optional parameters. They are not part of the signature shown in the lexing because they are not required, and listing every possible optional parameter would be impractical. An example of this is a verb being described by an adverb. A verb can certainly take an adverb, but it is not required. A transitive verb on the other hand would be required to take an object, even if it might be an implied object.

Note how the relative clause 어제 먹은 simply acts as a descriptor to 음식. This is a simple relative clause for brevity, but it’s easy to see how this could be expanded.
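One way to picture optional parameters is to give every stack item a type and let a morpheme consume the top item only when the type fits. The sketch below covers just the subject phrase 어제 먹은 음식이; the `LEXICON` table and its type labels (`ADV`, `V`, `MOD`, `N`, `SUBJ`) are assumptions made for this illustration, not a real grammar.

```python
# word: (result type, required argument type, optional argument type)
# All entries and type labels are hypothetical.
LEXICON = {
    "어제": ("ADV", None, None),
    "먹": ("V", None, "ADV"),     # a verb may take an adverb
    "은": ("MOD", "V", None),     # the ending requires a verb
    "음식": ("N", None, "MOD"),   # a noun may take a modifying clause
    "이": ("SUBJ", "N", None),    # the particle requires a noun
}

def parse(morphemes):
    stack = []  # list of (type, text) pairs
    for m in morphemes:
        result, required, optional = LEXICON[m]
        text = m
        if required is not None:
            if not stack or stack[-1][0] != required:
                raise ValueError(f"{m} needs a {required} on the stack")
            text = f"({stack.pop()[1]}){m}"
        elif optional is not None and stack and stack[-1][0] == optional:
            text = f"({stack.pop()[1]}){m}"
        stack.append((result, text))
    return [text for _, text in stack]

print(parse(["어제", "먹", "은", "음식", "이"])[0])  # ((((어제)먹)은)음식)이
print(parse(["먹", "은", "음식", "이"])[0])          # (((먹)은)음식)이
```

Dropping the adverb still parses, because 먹 only consumes it when it is present, while a missing required argument raises an error.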

Input  어제  먹        (V)은         음식                (N)이
Stack  어제  (어제)먹  ((어제)먹)은  (((어제)먹)은)음식  ((((어제)먹)은)음식)이

Input  (S)맛                       (S)있                           (V)었                               (V)어요
Stack  (((((어제)먹)은)음식)이)맛  ((((((어제)먹)은)음식)이)맛)있  (((((((어제)먹)은)음식)이)맛)있)었  ((((((((어제)먹)은)음식)이)맛)있)었)어요

Conclusion

As mentioned above, this only scratches the surface. It could be taken further, although there are surely holes in places like conditionals, topics, and contextual information. For example, conditionals (e.g. -면) probably need to be treated as infix operators, and topics and contextual information could probably be envisioned as living in their own stack alongside the main sentence stacks shown above.

I also considered creating an extension to Factor (or perhaps a whole new programming language) that was based on Korean keywords following this principle. It’s not really a task I would like to take on now, but perhaps someone would like to run with the idea on their own.

By Ryan Brainard

Reminded me of a programming language that compiles ancient Chinese to JavaScript, Python, and Ruby.

From a code example titled 「九十九瓶啤酒」 (99 Bottles of Beer). It is indeed a cat lang, or at least stack-based.

吾有一言。          ; I have a word
曰「「春日宴。」」。 ; Is "A Spring Festival Banquet"
書之。            ; Write it (to console)

有數九。          ; Is number 9
名之曰「酒數」。   ; Name it "Wine Count"

Another program generates a tree with flowers, using a couple of methods defined as a chain of curried functions. That sure looks like a pipeline, or “train” as it’s called in APL.

; Draw tree method = Paper => East => South => Length => Roughness => Direction => Change Direction
畫樹法 = 紙 => 東 => 南 => 長 => 粗 => 向 => 向變 => { ... }

; Draw ground method = Paper => Left => Right => Bottom => Top => Step
畫地法 = 紙 => 左 => 右 => 底 => 高 => 步 => { ... }

Those who read French may profit from these slides from a (non-recorded) talk on programming languages that are not based on English, including a section on Wenyan, and/or from a recorded talk by the same author (Baptiste Mélès) on roughly the same topic.

I’ve been wondering this too, several times, as I explore the array languages, mainly J. Don’t think I am experienced enough yet to answer, as I’ve only done a very little bit on the cat side. The experience of “weaving” described and the geometric part does make me wonder…

Point-free programming (“tacit programming”, as they call it in J) feels somehow very nice. You feel like you’re manipulating the computational thing itself, doing the algorithmic thing directly, in some sense, rather than talking about the algorithmic act, or “setting it up”.
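To make the point-free idea concrete, here is a rough Python sketch of J’s classic tacit definition of the arithmetic mean, `+/ % #`, which is a fork: the two outer verbs are applied to the argument and the middle verb combines their results. The `fork` helper is a hypothetical name introduced for this example; Python has no built-in equivalent.

```python
import operator

def fork(f, g, h):
    """Return x -> g(f(x), h(x)), the shape of a monadic J fork (f g h)."""
    return lambda x: g(f(x), h(x))

# J: mean =: +/ % #   (sum divided by count); the argument is never named.
mean = fork(sum, operator.truediv, len)

print(mean([1, 2, 3, 4]))  # 2.5
```

Note that `mean` is defined without ever mentioning the list it operates on, which is the sense in which the definition is tacit.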

Conor Hoekstra (code_report, or something similar, is his yt channel) does very nice videos where he goes through the solution of some problem showing a load of solutions, including often in five or six array languages, as he’s into array languages. They’re very interesting. I don’t know if you’ll get that feeling I’m attempting to describe in the previous paragraph, but you might.

He made ArrayBox too, which I would recommend for seeing the glyphs alone. Do C-h for the keyboard shortcuts to pop up, and C-k for all the beautiful glyphs.

The array languages in general are very good for their interactive playgrounds with nice labs and tutorials. I know Dyalog has ones that look nice, but the ones I can personally vouch for are the J ones.

In the J Playground, if you go to Examples, click on “Plot”, then click “Run” at the top of the Edit pane, you’ll see some surprisingly excellent plots and can investigate the code. Or if you hover over “labs” you might very well see something that interests you there!

That J Playground with the Plot example, it’s a joy to see it in action. Now I get how computational notebooks like Jupyter and Clerk evolved from such conversational interfaces. From APL on the teletype machine to the terminal command line, REPLs, editors, and integrated environments like Emacs and Glamorous Toolkit. They enable building programs incrementally as a conversation between human and machine, through interactive live coding sessions.

I especially like how the text of the source code mixes freely with non-textual media like symbols, matrices, tables, images, graphs. It’s similar to live music programming systems like the TidalCycles pattern language, Strudel, Max, Pure Data, and a recent one I saw, loopmaster.

Every node in the syntax tree is interactive in some way, adjusting values with sliders and knobs, visualizing waves and notes, hover on names to show hints or inline documentation. “Semantic density” is a phrase I learned from APL, and I see how live coding benefits from a dense interface, fewer key strokes to express ideas and navigate the score as it’s being written and performed.

Apparently many of these live coding systems are from a family of languages called MUSIC-N.

MUSIC-N refers to a family of computer music programming languages descended from or influenced by MUSIC, a program written by Max Mathews in 1957 at Bell Labs.

Pure Data and Max are both examples of dataflow programming languages.

Dataflow languages model a program as a directed graph of the data flowing between operations. Functions or “objects” are linked or “patched” together in a graphical environment which models the flow of the control and audio.

With Lisp I think of programs as trees, lists of lists, but of course trees are a subset of graphs. I suppose that’s where a visual programming interface can have a unique advantage over a textual code editor, by being able to represent and directly manipulate a graph of nodes and links.
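As a rough illustration of a program as a directed graph, the Python sketch below evaluates a small “patch” by visiting its objects in dependency order. The names `run_patch`, `objects`, and `wires` are invented for this example and do not correspond to anything in Pd or Max; the ordering comes from the standard library’s `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

def run_patch(objects, wires):
    """objects: name -> function; wires: name -> upstream names, in argument order."""
    values = {}
    # static_order() yields every object after the objects it depends on.
    for name in TopologicalSorter(wires).static_order():
        args = [values[src] for src in wires.get(name, [])]
        values[name] = objects[name](*args)
    return values

# A hypothetical four-object patch: two sources feeding two operations.
objects = {
    "freq": lambda: 440,
    "gain": lambda: 0.5,
    "double": lambda f: f * 2,
    "scale": lambda f, g: f * g,
}
wires = {"double": ["freq"], "scale": ["double", "gain"]}

print(run_patch(objects, wires)["scale"])  # 440.0
```

Unlike a tree, nothing prevents one object’s output from being wired into several downstream objects, which is the extra freedom a graph gives.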

A key innovation in Pd has been the introduction of graphical data structures. These can be used in a large variety of ways, from composing musical scores, sequencing events, to creating visuals to accompany patches or even extending the GUI.

Pd is designed to offer an extremely unstructured environment for describing data structures and their graphical appearance. The underlying idea is to allow the user to display any kind of data he or she wants to, associating it in any way with the display.

To accomplish this Pd introduces a graphical data structure, somewhat like a data structure out of the C programming language, but with a facility for attaching shapes and colors to the data, so that the user can visualize and/or edit it. The data itself can be edited from scratch or can be imported from files, generated algorithmically, or derived from analyses of incoming sounds or other data streams.

— Miller Puckette


This relates to a paper I enjoyed, Tangible Values with Text: Explorations of Bimodal Programming and its influence, Direct Manipulation: A Step Beyond Programming Languages.

Bimodal programming is the interaction paradigm in which a programmer freely intermixes text edits with direct manipulation of output in order to craft a program.

As originally defined by Ben Shneiderman, direct manipulation is the workflow characterized by “visibility of the object of interest; rapid, reversible, incremental actions; and replacement of complex command language syntax by direct manipulation of the object of interest”.

..We answer the question above with the following thesis: Non-trivial vector graphics programs and functional data structure manipulation programs can be constructed by output-based interactions in a bimodal programming environment.

One of the chapters is about “Tiny Structure Editors”.

It’s an experiment in generating user interface elements automatically from algebraic data types. Editing the code changes the UI, and vice versa.

There’s an insight here I’m starting to understand about the value of having malleable views and interfaces into the same underlying data structure, including functions and programs, as well as file systems, databases, websites, operating systems..

The other day I saw a project with an idea to provide a single unified API to all running desktop applications and their user interfaces.

OculOS is a lightweight daemon that reads the OS accessibility tree and exposes every button, text field, checkbox, and menu item as a JSON endpoint. It works as a REST API for scripts, testing, and CI/CD — and as an MCP server for AI agents.

From an explainer in Web Incubator Community Group: Accessibility Object Model..

Accessibility tree is a way for an application to model its user interface, the controls and all of its public properties and states, for assistive technology users to interact with it. I’ve heard it called a parallel information architecture, since it doesn’t necessarily map one-to-one with what’s rendered visually.

Assistive technology, in this context, refers to a third party application which augments or replaces the existing UI for an application. A well-known example is a screen reader, which replaces the visual UI and pointer-based UI with an auditory output (speech and tones) and a keyboard and/or gesture-based input mechanism. Many assistive technologies interact with a web page via accessibility APIs, such as UIAutomation on Windows, or NSAccessibility on OS X.

These APIs allow an application to expose a tree of objects representing the application’s interface, typically with the root node representing the application window, with various levels of grouping node descendants down to individual interactive elements. This is referred to as the accessibility tree.
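The tree described here can be pictured with a small Python sketch. It does not call any real accessibility API; the `Node` class, the `INTERACTIVE` role set, and the path format are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    role: str
    name: str
    children: list["Node"] = field(default_factory=list)

# Hypothetical set of roles that count as interactive elements.
INTERACTIVE = {"button", "textfield", "checkbox", "menuitem"}

def interactive_elements(node, path=""):
    """Walk the tree depth-first and return a path for every interactive node."""
    here = f"{path}/{node.role}:{node.name}"
    found = [here] if node.role in INTERACTIVE else []
    for child in node.children:
        found.extend(interactive_elements(child, here))
    return found

tree = Node("window", "Editor", [
    Node("group", "Toolbar", [Node("button", "Save")]),
    Node("textfield", "Body"),
])

for element in interactive_elements(tree):
    print(element)
# /window:Editor/group:Toolbar/button:Save
# /window:Editor/textfield:Body
```

The root is the application window, grouping nodes sit in between, and the leaves are the individual controls a tool could address.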

If all desktop applications, including the browser itself, expose an accessibility tree to the OS; and all websites expose an accessibility tree automatically built from the DOM (document object model).. What would a world-wide tree of all accessibility trees look like? I suppose that’s where the semantic web was meant to go.

I have a dream for the Web in which computers become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A “Semantic Web”, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The “intelligent agents” people have touted for ages will finally materialize.

– Tim Berners-Lee in 1999, quoted in the book Weaving the Web


Live-coding music languages are typically dataflow-oriented, they say. The loopmaster language, for example, has a pipeline operator to express the flow of data.

The operator |> is the heart of this language. It chains operations left to right. You can chain as many operations as you want. .. Make a sawtooth wave, run it through a lowpass filter, send to output.

saw(440) |> lp($) |> tube($) |> out($)

It sure looks concatenative, though the language syntax itself isn’t.

[In the concatenative style of programming,] a function is defined as a pipeline, or a sequence of operations that take parameters from an implicit data structure on which all functions operate, and return the function results to that shared structure so that it will be used by the next operator.
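The definition quoted above can be sketched in Python by treating every word as a function from stack to stack, so that a program is just a list of such functions. The helper names (`push`, `binary`, `run`) are hypothetical, and the stack is a plain Python list standing in for the shared implicit structure.

```python
from functools import reduce

def push(value):
    """A word that pushes a literal onto the stack."""
    return lambda stack: stack + [value]

def binary(op):
    """Lift a two-argument function into a word that consumes two stack items."""
    def word(stack):
        *rest, a, b = stack
        return rest + [op(a, b)]
    return word

add = binary(lambda a, b: a + b)
mul = binary(lambda a, b: a * b)

def run(program, stack=()):
    """Apply each word in turn to the shared stack."""
    return reduce(lambda s, word: word(s), program, list(stack))

first = [push(2), push(3), add]
second = [push(11), mul]

# Concatenating two programs behaves like composing them.
print(run(first + second))         # [55]
print(run(second, run(first)))     # [55]
```

Running `first + second` gives the same stack as running `second` on the result of `first`, which is where the name “concatenative” comes from.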


ArrayBox is a nice simple interface to try out array languages. From the source repository, it says it runs BQN, Uiua, J, Kap, and TinyAPL in the browser using WebAssembly, and APL by running Dyalog on the server. I see what you mean about the author @code_report, a research scientist at Nvidia, whose YouTube channel has videos on interesting programming topics including 1 Problem, 7 Array Languages.

(From https://combinatorylogic.com/)

Formal Systems of Computation

  • Lambda Calculus (1930s) - Alonzo Church
  • Turing Machine (1930s) - Alan Turing
  • Recursive Functions (1930s) - Kurt Gödel
  • Combinatory Logic (1950s) - Moses Schönfinkel, Haskell Curry
  • Concatenative Calculus (~2000s) - Manfred von Thun, Brent Kirby

– Concatenative Programming: From Ivory to Metal - Stanford seminar by Jon Purdy and slides

My fascination with catlangs is their potential to be quick to learn for non-programmers:

  1. Strict left-to-right order of computation & data flow. (The reading orders for infix/prefix are something we tend to take for granted, but actually have to be taught at some point.)
    Addition in Forth language - the stack
  2. Very simple and explicit notional machine.
    Forth interpreter
  3. A clear understanding that YOU can grow the language by adding words.
    To me, what justifies calling a language “language” is neither its builtin vocabulary, nor its grammar, but only the fact you can grow it.
    Programming languages with more “syntax” tend to have a more constrained 2nd-class feel e.g. “you can define functions but not operators nor statements”. Now, actually adding control structures to say Forth involves deeper wizardry than I can expect from beginners ;-), but I consider the liberating “open-world” feeling of everything being on equal footing to be valuable in itself.
    • I’d rate Snap! and Tcl on a similar level of language malleability.
  4. Natural REPL use — can just press Enter at any point and inspect intermediate data.
    But it could and should be more than a REPL! Brief was a great refactoring demo. It should allow the user to go back and edit code. It should directly visualize the stack at intermediate points. @akkartik’s demo was quite neat. Here is my WIP prototype…
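Points 3 and 4 in the list above can be illustrated with a toy interpreter in Python. It is a sketch under simplifying assumptions, not a real Forth: the `run` function and `BUILTINS` table are made-up names, only integers are supported, and definitions use a Forth-like `: name body ;` form stored as plain text.

```python
# Hypothetical built-in words; each one mutates the stack in place.
BUILTINS = {
    "+": lambda s: s.append(s.pop() + s.pop()),
    "*": lambda s: s.append(s.pop() * s.pop()),
    "dup": lambda s: s.append(s[-1]),
}

def run(source, stack, words):
    """Execute source against a persistent stack and dictionary of user words."""
    tokens = iter(source.split())
    for token in tokens:
        if token == ":":                 # start of a definition
            name = next(tokens)
            body = []
            for t in tokens:
                if t == ";":
                    break
                body.append(t)
            words[name] = " ".join(body)
        elif token in words:             # user-defined words run like built-ins
            run(words[token], stack, words)
        elif token in BUILTINS:
            BUILTINS[token](stack)
        else:
            stack.append(int(token))
    return stack

words, stack = {}, []
run(": square dup * ;", stack, words)    # grow the language with a new word
print(run("7", stack, words))            # [7]   inspect the stack mid-way
print(run("square", stack, words))       # [49]
```

Because the stack and the word dictionary persist between calls, each call to `run` behaves like pressing Enter in a REPL, and `square` is used exactly like the built-in words.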

For these purposes, I don’t care for Forth’s machine sympathy and radical simplicity. I don’t want raw memory management and stack elements that are exactly one machine word.
I’d want something high-level, like Factor, only I don’t know, friendlier? I believe we’ve not seen peak concatenative user-friendliness, yet.
