Constraints on the Formalization of AGI

Unexpected consequences of the self-study paradox

From Narrow To General AI
May 31, 2023

This is part 4 of a series on AGI. You can read part 3 here. See a list of all posts here.

Summary of parts 1, 2 and 3: To tackle the mountain of challenges related to developing Artificial General Intelligence (AGI), you should begin with a set of mental functions that are invariably present in every mind. These are usually invisible, automatic mechanisms which drive the mind forward. They include thinking, motives, sensing, introspection, attention, and emotions. To analyze the mind in granular detail requires stepping back from abstract structures like “concepts”, “plans”, or “memories” and focusing on individual thought-actions as well as their interconnections.

Any theory of mind that is necessary to the development of AGI must be formalized in a simple way. The world may be infinitely complex, but the mind is not the world itself. The constraints on human thinking require you to abstract and generalize away that complexity into neat diagrams and consistent models. These are necessarily interpretations, compressed to fit the needs and capacities of a limited mind, so that it can work with them. No researcher would accept an incomprehensible solution, not because it’s untrue, but because it’s unworkable and unpalatable.

That raises a question: what would a mind be like that could learn a model of itself? And what properties must that model have so that it could be handled by the mind? Questions like these may feel frustrating, since there appears to be no way out of the paradox: you want to know the truth, but by definition, you are looking for a version of that truth that works within the parameters and machinery of that “truth” itself (as in, the limits of the mind are part of that truth).

There is another constraint: you have to decide how the mind even figured out what it itself was doing. How did the model you create get into your mind in the first place? Perhaps introspection provided the initial material; then, through external tools like pen and paper, you formed theories that were fed back into your mind, back and forth in an iterative cycle. If so, how did the mind check that it had the correct answer? What do “knowing” and “checking” even mean? For that matter, what is a “mental model”? Is it a series of thoughts, the interactions of those thoughts, a set of preferences and mental tendencies, or some combination of all of these? When should the mind stop investigating because it has enough of a model to be satisfied? All of these must be answered by the model itself.

The first helpful clue to these riddles comes from asking how the introspective faculty of the mind can actually know about itself. Factual knowledge about the mind isn’t innately given to it; if it were, every belief about the mind would be considered true. Correct knowledge is subject to doubt and alteration, which means it must be acquired somehow and at some time. Humankind’s many theories about the mind have differed from one another and are in various stages of incompleteness. Building a theory of mind is an incremental process, with beliefs building on top of one another, self-correcting, etc. It is a dialectic process of discovery. Applying this limitation on self-learning is what gives us our first clue about one of the functions of the mind.

Knowledge, even introspective knowledge about the mind, can only be “gained” at some specific time, in a specific way, and through a specific process. If you assume (as many do) that knowledge is realized by the physical brain, then seeing your own mind and learning something about it is a very real, time-bound change. If you stick to mechanical explanations, then it stands to reason that whatever the thing is that is learning about the mind and “recording” its observations, it must, at least at that moment, be somehow different from what is being observed. You can’t simultaneously learn about an object and also be exactly that object.

When applied to introspection this presents a conundrum, since the mind appears to be studying itself, and the two are supposed to be the same thing. The only conclusion that can be drawn is that in the moment of introspection there are at least two distinct “parts” of the mind, one studying the other. If you observe a thought or feeling in your mind, the part of your mind generating the thought or feeling cannot be the same as the part doing the observing. The observer has its own processes going on, and perhaps even its own motivations¹.

This hypothesis unfortunately encounters a new problem, because you can also know that you are studying your own mind. That means you can study the part of your mind that is doing the studying. E.g. you have thoughts, you observe that you have thoughts, you observe that you observe that you have thoughts, etc. That would lead to an infinite regress of minds studying minds studying minds. The only way to stop this infinite series is to assume that the function of self-study is taken over by different parts of the mind at different times. To put it simply, one part of the mind studies the other for a moment, then the latter takes over the role of investigator and studies the former, and so on. It is also possible for them to both study each other simultaneously, perhaps during deep contemplation. Either way, a minimum of only two parts of the mind are needed, not an infinite number.

In studying the other part(s) of the mind, the observing part can learn what that other knows by studying its “records” so to speak; that includes what it learned about the current observer. The knowledge gained by the earlier investigator is implicitly passed back to the current one when that one takes on the role. The simplest way to do this is to show all records through one medium; e.g. the common medium of thoughts.
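The alternation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of any real mind: the `Part` class, the `observe` method, and the `shared_thoughts` list are all hypothetical names introduced here. The sketch shows two learners swapping the observer role each round, with every observation written both to the observer's own records and to one common medium, so that no third observer is ever needed.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One hypothetical learner; it changes (learns) whenever it observes."""
    name: str
    records: list = field(default_factory=list)

    def observe(self, other, shared_thoughts):
        # The observer inspects the other part's records, which include
        # whatever the other part earlier learned about this observer.
        seen = len(other.records)
        note = f"{self.name} saw {other.name} holding {seen} record(s)"
        self.records.append(note)      # learning is a persistent change
        shared_thoughts.append(note)   # common medium visible to both parts


def introspect(rounds):
    """Run `rounds` observations, swapping observer and observed each time."""
    a, b = Part("A"), Part("B")
    shared_thoughts = []
    observer, observed = a, b
    for _ in range(rounds):
        observer.observe(observed, shared_thoughts)
        observer, observed = observed, observer  # the roles trade places
    return a, b, shared_thoughts


a, b, thoughts = introspect(4)
for thought in thoughts:
    print(thought)
```

However many rounds are run, only the two `Part` objects exist; "observing the observer" is handled by trading roles rather than by adding a new layer.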

All in all, a minimum of two parts of the mind are necessary for introspection, though there may hypothetically be more. “Part” here is defined as some abstract logical function that has the ability to be changed by another part of the mind (i.e. it can observe and learn), and maybe also the external world. At the same time it can persistently record that observation. Because you can observe your own mind as it learns, i.e. as it changes, it is necessary that the act of learning itself be a feature of each of these parts. This means there are at least two separate learning components². Whether they are physically separate is not, as yet, decidable. We’ll investigate that possibility later.

All this may seem wildly counterintuitive. It suggests the mind is split into two or more generally independent learners, or even “souls”, each at arm’s length from one another³. This contradicts our intuition of the mind as a unified whole. I wouldn’t be surprised if you believed there must be a problem with the argument, even though you couldn’t say exactly what it was. The intuition of unity may be difficult to shake off. However, by considering this post alongside the previous one, you can see how the belief that the mind is “unified” is itself just another thought-action, a construction of beliefs, one that assigns properties to mental events after the fact. All that is needed to have such a belief is enough justifying precursor thoughts to trigger the conclusion that the mind is unified; it need not actually be so. Software developers are familiar with this idea of abstractions: an entity can be logically single, e.g. a single database, but in actuality split across multiple real database units. The unity of the mind may be a useful abstraction of introspection.
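For readers less familiar with the database analogy, here is a minimal Python sketch of it. The `ShardedStore` class and its methods are hypothetical names made up for this illustration, and the routing rule (summing the bytes of a string key) is an arbitrary choice, not how any particular database works. Callers see one store with `put` and `get`; internally the data lives in several separate dictionaries.

```python
class ShardedStore:
    """Presents one key-value store; actually holds several separate dicts."""

    def __init__(self, shard_count=3):
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Arbitrary deterministic routing for string keys (an assumption
        # of this sketch): sum the key's bytes, then take the remainder.
        index = sum(key.encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key)[key]

    def shard_sizes(self):
        # Only this method reveals that the store is not really one unit.
        return [len(shard) for shard in self._shards]


store = ShardedStore()
store.put("a", "a thought")
store.put("b", "a feeling")
store.put("c", "a motive")
print(store.get("b"))
print(store.shard_sizes())
```

From the caller's side nothing in `put` or `get` betrays the split; the unity is a property of the interface, not of the storage underneath.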

There is still the remaining question of just how many “parts” the mind has, as well as what exactly a “part” is. To address this, however, we have to make a detour into a few other topics, namely thinking, motives and learning. Afterwards, we can come back round and answer the question.

This post shows how understanding the basic limitations and constraints of what the mind can do, and carrying them through to their logical consequences, can give insight into otherwise invisible processes of the mind, even some things that the mind may otherwise mislead itself about. The mind is a complex, ever-changing machine. Conclusions drawn about it too often admit of myriad exceptions, so theories tend to be controversial and rooted in the theorist’s moral preferences. Every indisputable mechanical fact we can gain, no matter how simple, is precious.

Next post: Learning by awareness

¹ One other option is that it does not happen simultaneously, and that the mind is not seeing itself in the moment, but rather in the past. This would be like a computer process switching between two threads, both on the same chip and writing to the same memory heap. Such a model would be reasonable if the observed mind were restricted to thoughts. However, you can also see your feelings, which as we’ll see later are the mind interpreting a motive. For this theory to hold, you would have to be able to alternate between motives as well; keeping one motive “on ice” while you processed the other. This would boil down to the same split mind theory as is being proposed, since there are still two motives at play; only it is more awkward.

² Note, this idea of a “split mind” is not the same as when people say they are of “two minds” about something. The latter division tends to occur over different times, not simultaneously, as the mind ping-pongs back and forth between two conflicting ideas or motives. In this split mind, the two motives need not conflict.

³ Those familiar with split-brain studies may start to see where this is going.


