Hey r/MachineLearning, I’m a literature professor who conducts research in global literature and the digital humanities, and I’m currently working on a project involving the alphabet, typography, and computation. In particular, I’ve spent the last few months working to design and program a recursive alphabet that I believe has important implications for AI and philosophy of mind/language. I could use your help in assessing the philosophical and theoretical implications of what I’m designing, and possibly in discussing where to take its development from here. If you find this promising, I could especially use technical help to formally implement it and explore its possibilities.
First, please find attached a rather basic but workable illustration of what I plan to implement more robustly in code (Proof of Concept model). I know it might not seem like much at first glance and I of course welcome critique, but I hope you’ll entertain a few thoughts I have about what I believe are its vast implications for computer vision, philosophy of mind/language, and the algorithmic implementation of artificial intelligence. Though I’m not a specialist in these fields, I am here as a literary scholar to offer what I can to help solve interesting + important problems, and I believe the solution to AI cognition might unexpectedly require “literary” forms of thinking/reading (though “literal” would be more accurate here). Relatedly, please bear in mind that the use of the Latin alphabet here is merely for ease of modeling; the real interest will entail swapping in other alphabetical sets, such as a programming language with self-replicating capabilities.
The recursive alphabet’s most distinctive feature is that each of its letters can be displayed either (1) as itself, as per normal, or (2) as a recursive, atomically structured homoglyph of itself composed of every letter of the alphabet. For example, the “single” “A” below can be read simultaneously as a normal A and as a figure constituted by the entire alphabet (among which exists another A):
A BCD E F GH IJ KLMNOPQR ST UV WX YZ
This model should make intuitive sense to us because it stands to reason that even in everyday contexts, the letter “A” presupposes the entire alphabet regardless of the absence of all other letters (one could even argue that the phrase “the letter A” is a tautology 10x over). Thus, what is represented by “A” holds for B and C, etc. as well as for each of the A-Z letter-atoms that constitute the larger A. And from there, for each of the micro-letters and macro-letters up and down the scale.
Thus, a second distinctive feature of this recursive alphabet is that each letter is recursively homoglyphic vertically (i.e., each letter “looks” like itself at higher and lower scales of observation) as well as recursively isomorphic horizontally (i.e., the entire alphabet embedded in one letter is identical in both order and substance to the alphabet embedded in a different letter on the same scalar plane). Basically, what we get is an alphabetical structure that is at once consistent, complete, and endlessly self-referential, and that has a lot of stability because each of the alphabet’s characters both constitutes and is constituted by all alphabetical characters.
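To make the two properties above concrete, here is a minimal Python sketch of the structure as I understand it. The names (`Glyph`, `atoms`, `zoom`) are hypothetical, invented purely for illustration, and the sketch assumes that every letter at every scale expands into the same ordered A-Z set. It models only the nesting, not the visual layout of the letterforms.

```python
import string
from dataclasses import dataclass

ALPHABET = string.ascii_uppercase  # stand-in set; any ordered symbol set would do


@dataclass(frozen=True)
class Glyph:
    """One letter at one scale (hypothetical model, not a rendering engine)."""
    letter: str
    depth: int = 0  # 0 = the scale we start from; larger = more zoomed in

    def atoms(self):
        """The letters composing this glyph one scale down: the whole alphabet."""
        return tuple(Glyph(ch, self.depth + 1) for ch in ALPHABET)

    def zoom(self, path):
        """Follow a path of letters downward, e.g. "AA" = the A inside the A."""
        glyph = self
        for ch in path:
            glyph = next(a for a in glyph.atoms() if a.letter == ch)
        return glyph


big_a = Glyph("A")
big_b = Glyph("B")

# Horizontal isomorphism: the alphabet inside A equals the alphabet inside B.
print(big_a.atoms() == big_b.atoms())

# Vertical homoglyphy: zooming into A along "AA" lands on another A, two scales down.
print(big_a.zoom("AA"))
```

Because the expansion is computed on demand, the structure is endless in principle but only ever as deep as the path you actually follow, which is also roughly how a zoomable viewer would have to treat it.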
I have much more to say about the alphabet’s formal implementation, including how we can visualize it using an interface that allows for recursive zoom in and out of the various scales of the alphabet, with attention paid to optimizing draw distance at the limits of human perception and pixel display technologies. But the more important application, as mentioned, is not just to visually represent the recursive structure of the English/Latin alphabet but to conceive of novel programming applications. These might take the form of self-replicating quines, which are programs that print out an exact copy (a simulacrum) of their own source code. Having spent a lot of time thinking about and modeling this recursive alphabet, it seems clear to me that if properly developed, a recursive Unicode alphabet could very well be the missing link to true artificial general intelligence, and from there, everything else.
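For anyone who hasn’t seen a quine before, here is a small Python sketch of the idea. It builds a classic two-line quine as a string, runs it, and checks that what it prints is character-for-character its own source. The helper name `run` and the variable names are just illustrative choices; this only demonstrates self-reproduction and makes no claim about how a quine would connect to the recursive alphabet.

```python
import contextlib
import io

# Template for a classic two-line Python quine. Filling the template with
# itself yields the program text: "s = <repr of template>" then "print(s % s)".
template = 's = %r\nprint(s %% s)'
quine_source = template % template + "\n"


def run(source: str) -> str:
    """Execute Python source in a fresh namespace and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


first_copy = run(quine_source)

# The program's output is identical to its own source code...
print(first_copy == quine_source)
# ...so running the copy produces yet another identical copy.
print(run(first_copy) == quine_source)
```

The trick is that the string `s` contains a description of the whole program with a hole in it (`%r`), and the program fills that hole with the string itself.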
While I haven’t been able to implement this myself due to the limits of my technical expertise, I submit for your consideration the possibility that forms of literary thinking might in fact be essential for conceptualizing how we will build true AGI. As understood via literary theory (re: structuralism, deconstruction, etc.), one could say our human ability to read (and therefore have intelligence) is founded on being able to relate everything to everything in terms of metaphor. The words you’re reading right now, for instance, are just metaphors for things that they aren’t (e.g. the word “tree” =/= actual tree). And these “words” are literally made of letters that are themselves metaphors for more fundamental “letters”, which include, for example, subatomic particles like Higgs bosons or Ideal Letterforms, etc. (e.g. a is not a, it’s just particles in the shape of a; also, a is not True A because this >a is not that >a). However, now that we’ve modeled a recursive alphabet, we have a means by which we can show the interchangeability of meanings between and among any letter or set of letters. Thus we might be able to teach computers to achieve the same nuance of reading.
Consider the following three letters: Α, А, and A. In most fonts they look identical, but they are in fact three distinct Unicode characters. Indeed, they do not even belong to the same alphabet; the first is Greek, the second is Cyrillic, and the third is Latin. Like the recursive As of my alphabet, these characters are homoglyphs of each other, which is why they are sometimes used in IDN homograph attacks. However, an important difference is that the Α, А, and A homoglyphs are only hard for humans to tell apart. For computers, which compare code points rather than shapes, they are as easy to distinguish as the micro-As and macro-As of my recursive alphabet model are for us.
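Here is a short Python sketch of how a computer “sees” those three characters, using the standard library’s `unicodedata` module. The letters are written as escape sequences so the code points are unambiguous; the helper name `describe` is my own. It also checks that NFKC normalization does not merge them, which is part of why homograph attacks need dedicated confusable-detection rather than plain normalization.

```python
import unicodedata

# The three look-alike capitals, as escapes: Greek Alpha, Cyrillic A, Latin A.
LOOKALIKES = ["\u0391", "\u0410", "\u0041"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LOOKALIKES:
    print(describe(ch))

# To the machine these are three different characters...
all_distinct = len(set(LOOKALIKES)) == 3
# ...and they stay different even after NFKC compatibility normalization.
survive_nfkc = len({unicodedata.normalize("NFKC", ch) for ch in LOOKALIKES}) == 3

print(all_distinct, survive_nfkc)
```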
What is happening is that there are perceptual scales at which humans cannot tell any difference between two given letter-defined objects, while our Unicode-OCD computers have trouble seeing/distinguishing anything except via hardcoded boolean logic (though deep learning can train them to just “go with it”). Hence the need for my recursive alphabet. For AI to develop and match human-level adaptive reading skills, I’d argue that it will need a well-implemented recursive Unicode alphabet. AI development up until now has been built on alphabet/Unicode standards that have hyper-rigid or else impossibly ambiguous definitions for what “A” means. However, the utility of my recursive alphabet applied to Unicode is that it would allow our Alphabet AI to understand that every “A” is in fact made up of the entire cast of other Unicode characters. This more sophisticated understanding of the recursive morphology of the alphabet will give AI the ability to observe letters at a much greater range of scales and interrelatedness, allowing it not just to “identify” glyphs in parallel isolation but indeed to read letters and their interrelation (and literal intertwinement) with other letters. And from there, words, actions, practices, humans, etc.
Thus, I am rather confident the first true AGI will become so by becoming fluent in reading Unicode as a recursive alphabet. As you might imagine, I have much, much more to say about this, but I’ve already gone on too long. If you have a free moment to respond with any thoughts or suggestions for next steps, I would very much appreciate it.