
Cracking the barrier between concrete perceptions and abstractions

How AI can build abstractions out of homogeneous sensory input

Sep 20, 2025


By Yervant Kulbashian. You can support me on Patreon here.

There is an unsolved epistemic challenge in AI research, which can be summarized as follows:

How can a mind bridge the gap between raw sensory perception and abstract concepts?

All human experience begins with concrete sensory stimuli: images with colours, sounds with tones, contact on the body, pains, etc. Even modern Neural Networks rely on concrete inputs and outputs (photos, video, text, etc.) to kickstart their pipelines. In addition to this basic sensory foundation human brains, and some AI, can also conjure up images and sounds that aren’t actually in the world around them through imagination and memories. You can overlay reality with a self-generated layer of inputs, like a dream placed overtop your waking life. This simple function opens up a wide range of options for what you can be aware of or respond to. It lets you plan, theorize, recall, invent, and generally respond to thoughts as if they were really happening. Yet even such thoughts, being echoes of actual experiences, remain in the space of concrete perceptions.

There is a fundamental difficulty, a step-change when moving away from concrete perceptions towards an abstract meta-understanding of those perceptions. For example, although you regularly have memories of experiences playing in your mind, to extract from those the concept of memory itself and identify it as a “type” of mental entity is surprisingly difficult to explain. Experiencing the content of memories cannot in itself tell you that they are memories — you must infer something about them from outside their content. For example, you might recognise that the experiences are not real, or that they represent the past, and so on; and experience, real, and past are also abstractions whose origins are difficult to explain.

As another example, given experiences A and B, a mind (or an AI) can respond in different ways when A comes before B vs. when B comes before A. But it is another thing entirely to extract from those the ideas of before or after. Just because you experience perceptions within time and space doesn’t mean it is a trivial matter to conceive of time and space themselves, nor any concepts related to those, like before or distance. A software program that follows steps in sequential order does not by virtue of that fact understand that it is proceeding in order. For the idea of order of operations to enter its awareness and become a factor in its decisions the software must be able to step outside its processes… somehow.
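To make this concrete, here is a minimal, purely illustrative Python sketch (not from any particular system): the program below executes its steps strictly in sequence, yet nothing in its data or state encodes “before” or “after”; the ordering exists only for us reading it.

```python
# A pipeline that runs strictly in sequence. Nowhere does the program
# represent "order" as data it could inspect or reason about; the
# sequencing lives only in the interpreter and in our reading of it.

def acquire():
    return [0.2, 0.5, 0.9]          # pretend sensor reading

def normalize(xs):
    hi = max(xs)
    return [x / hi for x in xs]

def threshold(xs):
    return [x > 0.5 for x in xs]

def run():
    data = acquire()                # step 1
    data = normalize(data)          # step 2
    return threshold(data)          # step 3

print(run())   # [False, True, True] -- produced "in order", but the
               # program has no concept of "step", "before", or "after"
```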

There are a myriad of examples where a system can work with concepts implicitly, but is unable to make them explicit, i.e. to identify and name them. An AI system can engage in a strategy without realising it is a strategy. An agent can have a goal or plan without being able to label either. A video generator can produce a clip without being aware that it is nearing the end of its allowed duration. A classifier can differentiate between types of objects without recognizing they have properties, or that some are the same or different, or even that objects are being represented instead of, say, events in time.


Raw data, which are the content of intelligence, can only beget more of the same. They cannot create meaningful abstractions of their own accord, even when manipulated through thought and reasoning. Something else must be added. How does that happen? If concepts can be likened, as they often are, to nodes embedded in a network of relations, how are abstractions like consciousness or object derived and situated within that network? How could a robot moving around a free-flowing environment of concrete perceptions conceive of object, or contradiction, or strategy? That is what this post will investigate.

The scope of the challenge

We regularly overlook this crucial, elementary step in cognition because the act of meta-understanding feels intuitive to us, almost inevitable. For example, you feel that just by looking at colours you can see that they are colours, or that by seeing objects in space you can know that objects exist. “Just look! It’s right there!” you think. It hits us like an undeniable truth.

Beyond that intuition, it is not clear how you can “look” at something like “generalized colour”, which is not actually a component of the sensory perceptions themselves. It’s like asking a piece of software to realize that it is a piece of software or that it is stored in memory; both require additional built-in tools for self-analysis to retrieve that meta-information.

And in contemporary AI whenever a system needs any functional meta-understanding to do its work, additional circuitry, routines, or architectural components are added to interpret, derive, or inject that information. For example, digital cameras can process pixels quite well, but will only distinguish shapes, locations, or faces if an AI routine is added to their software to allow them to perform that explicit function, and only to that limited capacity. In fact, “allow” is a misleading word here: the system is obliged to do so, and has no choice about it. This form of built-in constraint is what keeps narrow AI narrow; the system is unable to step outside the domain ontology and conceptual structures that have been pre-defined for it.
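As a hedged illustration of that built-in constraint (hypothetical function and field names, not any real camera SDK): the sketch below can only ever report the categories its author wired in, and nothing in it could decide to start noticing some new kind of entity.

```python
# Hypothetical camera pipeline: the ontology (which things count as
# detectable) is fixed by the author, not discovered by the system.
KNOWN_DETECTORS = {
    "face":  lambda frame: frame.get("skin_tone_blob", False),
    "shape": lambda frame: frame.get("high_contrast_edges", False),
}

def analyze(frame):
    # The system is *obliged* to interpret pixels through these
    # categories and cannot step outside them.
    return {name: det(frame) for name, det in KNOWN_DETECTORS.items()}

frame = {"skin_tone_blob": True, "high_contrast_edges": False}
print(analyze(frame))   # {'face': True, 'shape': False}
```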

A sample Theorem Prover (source) which makes use of LLMs. The strategy, components, steps, inputs, and outputs involved are all hardcoded into the pipeline.

Or consider how theorem provers have the concepts of math, statement, search tree and proof embedded into their very architecture, railroading them into carrying out such proofs:

Proof search seeks to systematically traverse the vast landscape of potential proof paths to construct a valid proof tree for a given theorem in formal systems. — A Survey on Deep Learning for Theorem Proving
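To see how much ontology is baked in, consider this deliberately toy proof-search sketch (an assumption-laden simplification, not the system in the figure or the survey’s method): “statement”, “rule”, “goal”, and the search itself are architectural givens before the program ever runs.

```python
# Toy backward-chaining proof search. "Statement", "rule", "goal", and
# "proof tree" are hardcoded architectural choices, not things the
# system could discover or question on its own.
RULES = {"q": ["p"], "r": ["q"]}   # conclusion -> premises
FACTS = {"p"}

def prove(goal):
    if goal in FACTS:
        return True
    premises = RULES.get(goal)
    return premises is not None and all(prove(p) for p in premises)

print(prove("r"))   # True: r <- q <- p, a search tree the program cannot
                    # conceive of *as* a search tree
```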

In contrast, humans must first explicitly learn that numbers and equations exist, that they can be related to one another, that a proof is even required, that search trees can be explored, and so on.

Manually adding functionalities piecemeal into AI may work in some narrow cases, but it is certainly unsustainable at scale. The scope of human abstractions — those which reflexively interpret the nature of our experiences rather than responding to them directly — is far too vast. For example, all of the following require you to step outside direct sensory experience:

  • time: e.g. before, after, duration, rewind, stop/end, future, repetition
  • space: e.g. distance, speed, direction, interval, change, left/right
  • existence: e.g. essence, negation, nothing, necessity, conditional, event
  • properties: e.g. general, specific, quality, change, same
  • psychology: e.g. memory, thought, concept, truth, desire, consciousness, choice
  • goals and tasks: e.g. difficult, complete, failure, plan, require
  • numeracy: e.g. infinite, greater, less than, majority, rational
  • identity: e.g. me, you, that, the, to be

And this is only a tiny sample. A look through any dictionary reveals that the majority of the words — and the majority in this post — involve a meta-understanding of concrete experiences. Some may argue that the above items in the list are facets of the structure or operation of the mind itself — as Kant suggested that time and space are. Even so, there must be some extra step to bring those functionalities into awareness, to turn them into “nouns”, to give them English names.

Someone else’s problem

Given the difficulty of explaining how the leap to abstraction can be accomplished, it is understandable that modern AI systems usually side-step this critical phase. They ignore questions of how a system could extract or generate their requisite units and operations, and deal rather with the pieces themselves as “givens”. Whatever the original inputs represent — e.g. pixels, concepts, images, words, etc. — these are generally also the ingredients the system uses during reasoning. Take for example the Predictive Processor (PP), a system outlined in Andreae’s book An AGI Brain for a Robot:

I avoid all the problems associated with segmenting a sound sequence into words by having the input to the robot already segmented. […]

Learning the sounds of one’s language, learning to segment speech, and learning to recognize patterns, such as faces, require statistical learning, like Deep Learning, so they have been side-stepped in this book.

The author defers what he considers to be the relatively inconsequential task of identifying entities, making it “someone else’s problem”. Such simplification is perhaps necessary to get any research done at all by carving out a small, narrow domain and working within its constraints. He also assumes that generating fundamental entities like words will eventually be possible through statistical learning. But will they? What about concepts like existence or thought or difference — how would a system learn to statistically associate them with a set of concrete sensory experiences?

When it comes to training his prototype agent Andreae makes use of another popular simplification: micro-worlds, i.e. “well-defined, limited domains, that enable a pared down version of the performance to be demonstrated” (source). Micro-worlds like grid worlds are ubiquitous in Reinforcement Learning and planning research, and have for decades been an easy way to side-step the complexity inherent in interpreting real-world dynamic experiences. E.g.:

While our simplified grid-worlds enabled precise manipulation of state transitions, they exclude challenges posed by open-ended worlds. — From Curiosity to Competence (diagram below from same paper)

An example grid world (source)
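For readers unfamiliar with the format, a grid world can be as small as the hypothetical sketch below; its appeal, and its limitation, is that states, actions, and transitions are all enumerated in advance by the researcher.

```python
# Minimal 3x3 grid world: states, actions, and transitions are all
# pre-defined, which is exactly what makes it tractable -- and narrow.
import random

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = (2, 2)

def step(state, action):
    dr, dc = ACTIONS[action]
    r = min(2, max(0, state[0] + dr))
    c = min(2, max(0, state[1] + dc))
    new_state = (r, c)
    reward = 1.0 if new_state == GOAL else 0.0
    return new_state, reward

state, reward = (0, 0), 0.0
for _ in range(10):                       # random-walk agent
    state, reward = step(state, random.choice(list(ACTIONS)))
print(state, reward)
```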

And it is not just AI research, but also its parent field of cognitive science that regularly engages in conceptual pre-simplification. All cognitive science experiments require that you start by introducing an abstraction at the level of the terms or functions under study. Consider this paper whose authors discuss how human beings could learn to improve their metacognitive skills:

Conceptually-driven metacognitive control is more amenable to intentional cultivation, as in cases of learning strategies for regulating one’s attention, emotion, or reasoning processes […]

Internal models represent causal relationships within a domain, allowing agents to predict outcomes and select appropriate actions.

In other words, the subject is expected to conceptualize mental functions like attention, emotion, and reasoning, then build a model to find causal relations between them, with the ultimate goal of self-mastery. The first part necessarily entails stepping outside the machinery of mind and perceiving attention, emotion, and reasoning as entities to be manipulated. How exactly the mind acquires these meta-representations so that it can then plan for and control them is left entirely unaddressed — it is “someone else’s problem”.

(NB: The practice of restricting one’s research to a select few domain concepts is so ubiquitous across both fields, that one doesn’t have to search hard to find examples. The papers cited in this section were simply the three most recent papers I’ve read.)

It is critical that we understand what this step entails. Most intelligent work — planning, creating — is done over higher abstractions. For example, when we look at the world and try to invent physical laws to explain our experiences, we must first enumerate the entities involved: change, time, matter, cause, etc. These ultimately become elements in a physics equation. None of them come free from the world itself; they must be invented.

Or consider the more quotidian act of arranging and editing a movie into clips and scenes. You must first conceive of time as a spatial line (e.g. from left to right), and discuss timestamps and clip durations in a spatio-numerical way. It would be impossible to produce a movie without this meta-understanding of time either being built into the system, as it currently is in editing software and generative AI, or produced spontaneously by the mind itself. No matter how many movies current AI video generators create, they cannot engage in any meta-behaviour regarding the structure and function of their outputs unless they somehow acquire these additional abstractions. Otherwise an agent can no more manipulate the timeline of a video than a human eye can see colours outside the visible spectrum.

Conceptual flexibility and inventiveness is what currently gives the human mind an enormous advantage over any software. It allows us to consider and arrange concepts like time in a myriad of unusual or physically impossible ways. For example, you can imagine what would happen if two people were moving in opposite directions in time, or write a story about jumping through time. Both of these are enabled by first representing time in explicit spatial terms. One could go so far as to say that the inability of software to naturally generate and manipulate abstractions is the single greatest hurdle preventing the development of Artificial General Intelligence.

Contemporary theories of conceptualization

Despite abstraction being a necessary starting point for intelligence, surprisingly little work has been done to probe this mystery. Many apparent candidate theories, such as embodied theories, Theory-theory, Conceptual Metaphor Theory, etc., work on the assumption that the necessary concepts are already present, and they only elaborate how they may be applied to particular experiences. Prototype/exemplar theories don’t explain how we know that a concept should exist in a given case (see this post), only what features potentially unite their instances. And all these theories, as well as Event Predictive Cognition, ignore abstractions that have no sensory content, like time, consciousness, or thinking.

Still, between them they broadly suggest a few core paradigms for how abstractions might arise in humans — and presumably how they could be achieved in AI. In this section we will describe the three most widely espoused approaches. They are:

  1. Abstraction is imparted by a set of innate, built-in brain functions.
  2. Abstractions arise through the incremental composition of lower-level elements, from simple to complex.
  3. Abstractions emerge out of some statistical grouping of stimuli across patterns of frequency or association.

The first and most natural impulse is to assume, as in the software examples above, that there are built-in meta-functions that generate abstract understanding. An internal clock would be an example of how we might perceive or interact with time. Or when faced with the difficult question of how the mind can recognize existence, it is easy to intuit that some “existence signal” must be built into the brain:

Biologically-based neural circuits may anticipate the conceptual structure of evolutionarily important concepts, such as agents, minds, animals, foods, and tools. — Cognitively Plausible Theories of Concept Composition, Barsalou

It is not so simple, however, to explain what it means for some understanding to be “innate”. Nor is it clear how innate concepts end up attached to concrete perceptions, which is where any inquiry into discovering them must begin. For example, nothingness cannot be attached to any perception, and existence should be attached to every single one, making them both useless for discrimination.

There are also an endless number and variety of abstractions one must build into a brain, many of which would not make sense from an evolutionary perspective. Why would the brain have evolved an intuition about time moving backwards, or about the square root of a negative number? We usually resolve this conundrum by supposing that some sort of composition must be at work to generate the more involved concepts. For example, your brain may not have a conception of before, after, duration, skip over, rewind, fast-forward, etc., but only a select few like time, out of which you compose more complex notions:

Complex concept learning [is a process] whereby simpler concepts are required to be organised systematically as part of the definition of a higher concept. — Neurosymbolic AI: The 3rd Wave

In conceptual combination or blending new concepts are generated by integrating multiple pre-existing conceptual spaces. — Informing Artificial Intelligence

People combine these concepts to construct infinite numbers of more complex concepts, as the open-ended phrases, sentences, and texts that humans produce effortlessly and ubiquitously illustrate. — Cognitively Plausible Theories of Concept Composition, Barsalou

We believe that deep networks excel because they exploit a particular form of compositionality in which features in one layer are combined in many different ways to create more abstract features in the next layer. — Deep Learning for AI
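The compositional picture is easy to state in code. Below is a minimal numpy sketch (illustrative only, with arbitrary random weights) of features in one layer being recombined into “more abstract” features in the next:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)              # raw "sensory" vector

W1 = rng.normal(size=(8, 16))        # layer 1: low-level features
W2 = rng.normal(size=(4, 8))         # layer 2: combinations of layer-1 features

h1 = np.maximum(0, W1 @ x)           # edges, textures, ...
h2 = np.maximum(0, W2 @ h1)          # "more abstract" combinations

print(h2.shape)                      # (4,) -- but each unit is still a
                                     # statistic over concrete inputs, not
                                     # a concept like "before" or "absence"
```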

However, you could only apply this approach to entities composed of parts or some figurative version thereof (e.g. nations are made of people). But most complex ideas cannot be put in such combinatorial terms. Is willpower just composed of person, action, and need? What about the temporal component, the rejection of conflicting desires, and the feeling of willpower? Or what is that special ingredient that is added to the complex concept before, which is absent from after? And some concepts, like reflection (mirror), contain inherent contradictions in their definition: that person is me and not me, both at the same time.

Composition furthermore assumes a bottom-up approach: only once you have the pieces can you compose them into an aggregate. But in most cases defining the pieces themselves requires you to reference the thing they are a part of. Consider the leg of a chair. It is impossible to define a chair leg — and distinguish it from any regular stick, or from an armrest — without indicating how it functions as part of a chair. It is a leg specifically because it holds something up from underneath, like a seat or a body. So chair cannot be defined as “composed of four legs, a seat, and a back”, since to define leg requires that you already have a concept of chair. It’s necessarily circular; otherwise it must be atomic — done all at once. It also entails connections to other difficult abstractions like support (holds up the seat), solidity (chairs should be solid), or intention (chairs are intended to be used by someone to rest).


And this is just for a mundane example of a chair. The confusion increases significantly when you consider truly incorporeal concepts like institution, number, or metaphysics. The intertwined circularity of these concepts means they cannot be invented or defined piecemeal or in isolation. You cannot define time without defining before or after, and vice versa. Their definitions seem to create a network of vicious loops where there is no starting point, no foundation — which is exactly why we intuit that they must be innate.

Certainly, many concepts that we name in language fail, on closer inspection, to have the kind of logically defined necessary and sufficient conditions that early AI researchers hoped to capture in axiomatic form. — AI: A Modern Approach, Ed 4

Compositional approaches are perhaps encouraged by mistaken analogies to dictionaries, which explicitly require every word to be composed of other words. Another source of confusion comes from the fact that children seem to learn concepts in a roughly similar order, e.g. they must learn about lying before they can learn about making promises, so we assume the latter are built upon the former. But if composition were truly how concepts arose, then the project of defining a hierarchy of concepts would be trivial — since known to everyone — and would have been accomplished centuries ago, instead of remaining in perpetual ambiguity and disagreement (e.g. compare ConceptNet to WordNet).

A third explanation for how abstractions arise has been popularized by modern connectionist (or more generally empiricist) approaches, namely that concepts arise when stimuli are naturally grouped by statistical patterns:

all our notions are just attempts at ordering patterns: we take sets of features, classify them according to mechanisms that are innate within our interpretational system and relate them to each other. This is how we construct our reality. […]

To perceive means on one hand to find order over patterns; these orderings are what we call objects. — Principles of PSI
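In machine-learning terms, this “ordering of patterns” usually means something like clustering. A toy sketch (hypothetical data, simplified k-means) makes the limitation visible: whatever comes out is another arrangement of the same concrete features that went in.

```python
# Grouping concrete stimuli (2-D feature vectors) by similarity.
# The output is clusters of the same concrete stuff as the input; no
# amount of regrouping yields "absence", "colour-in-general", or "time".
import numpy as np

rng = np.random.default_rng(1)
stimuli = np.vstack([rng.normal(0, 0.3, (20, 2)),
                     rng.normal(3, 0.3, (20, 2))])

centroids = stimuli[rng.choice(len(stimuli), 2, replace=False)]
for _ in range(10):                       # plain k-means iterations
    labels = np.argmin(((stimuli[:, None] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([stimuli[labels == k].mean(axis=0)
                          if np.any(labels == k) else centroids[k]
                          for k in range(2)])

print(centroids.round(2))   # two cluster centres: still just feature vectors
```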

Yet as mentioned before, generalising patterns over time doesn’t produce the concept of time. And no amount of grouping stimuli into statistical trends and extracting patterns from them can generate absence, or colour, or contradiction, or infinite. Nor is abstraction “the process of removing detail from a representation”, as Russell and Norvig put it. What detail would you remove, and from which experiences, to arrive at absence or contradiction? Finally, you cannot logically reason your way to abstractions like existence or contradiction, since any form of logic mandates that the terms — concepts and their relationships — already be present as ingredients to reason about. Even learning by simple association implies there are already “entities” present which can be “associated” with each other:

logical reasoning can only be implemented on the basis of clear, accurate, and generalized knowledge — Logic pre-training of language models

In general, we cannot simply mix perceptions together in a large bowl and hope new meaning emerges — any more than you can mix together water in various ways and hope that wine comes out.

And last but not least, you cannot defer to language to give you a way out of this tangle. To learn a word you must first have something in mind to attach it to. When the word comes along, you already understand the situation it addresses, and learning the word is simply the social action of expressing it. “Yes”, you think, “that’s the word for what I’ve been trying to achieve”:

One has already to know (or be able to do) something in order to be capable of asking a thing’s name. [… The] explanation again only tells him the use of the [object] because, as we might say, the place for it was already prepared. — Wittgenstein, Philosophical Investigations

Although language likely does play some role in abstraction (e.g. Sapir-Whorf Hypothesis) deferring to language to explain abstraction only begs the question of how language itself gets pulled in from the environment to guide the process. The brain must already know to interpret the sights and sounds it sees and hears as language. It must also understand which words perform a labelling function through the syntax of descriptive sentences and how, with all the subtlety that entails.

These are epistemic difficulties that Large Language Models (e.g. ChatGPT) simply skip over, since the latter are created in and already function through the space of words. They are not required to learn language as humans do, from exposure to sounds and sights which they must then segregate into letters and words. In fact, this process of segregating experiences into pieces is exactly what is at issue, since to generate abstractions essentially involves splitting up experience into determinate entities and properties.
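That skipped step is visible in the very first line of any LLM pipeline. In the hypothetical sketch below, the vocabulary, i.e. the segmentation of the stream into discrete units, is handed to the model as a given (real tokenizers are learned from text, but are still fixed prior to and outside the model’s own reasoning):

```python
# The vocabulary -- the segmentation of the stream into discrete units --
# is fixed before the model sees anything. The model never has to solve
# the problem of carving experience into "words" in the first place.
VOCAB = {"what": 0, "is": 1, "the": 2, "nature": 3, "of": 4, "existence": 5, "?": 6}

def tokenize(text):
    return [VOCAB[w] for w in text.lower().replace("?", " ?").split()]

print(tokenize("What is the nature of existence?"))
# [0, 1, 2, 3, 4, 5, 6]
```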

If you begin from an undifferentiated space of sensory perceptions it seems impossible to generate that extra dimension of meta-understanding, conceptualization, or abstraction that ultimately gives meaning to the flow of stimuli. One genuinely feels that abstractions come out of nowhere, by the addition of some magic ingredient. Where does this ingredient come from, and what is it made of if not concrete perceptions themselves?

Decomposition and recomposition

One of the most common foundational errors when trying to understand how abstractions arise in the mind is to assume that the act of thinking about them has no effect on the target content. In AI research the link from concepts to reasoning is always represented as a one-way street: first the system aggregates sensor inputs into concepts, and once those are in place it proceeds to reason with them. This procedural separation creates a handy abstraction layer that researchers use to isolate the work being done from the ingredients on which it is performed:

Cognition has traditionally been understood in terms of internal mental representations, and computational operations carried out on internal mental representations. — Reitveld, Scaling-up skilled intentionality to linguistic thought

The most common interpretation of reasoning across both cognitive science and AI.

The conceptual conveyor-belt approach overlooks the fact that your understanding of abstractions changes as you reason about them. Whenever you mull over whether some action is fair, your concept of fairness gets updated based on the effort and the outcome. You don’t simply access fairness ready-made from a psychic catalogue. Just as a painter who dips his paintbrush into a colour on his palette, hoping to pick up green paint, transfers prior pigments into the swatch with each attempt to “read” it, so you cannot so much as look at a concept in your mind without altering it at least a little. There is no singular, stable “node” designating fairness anywhere in your brain which is learned all at once or forever. The apparent discreteness and stability of concepts is itself an impermanent — i.e. constantly updating — illusion.
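One way to picture this, as a deliberately crude sketch rather than a model of the brain: a concept whose every “read” also writes, so that consulting fairness in a new context nudges what fairness is. All names here are hypothetical.

```python
# A concept whose every consultation updates it: reading is also writing,
# like the paintbrush that deposits pigment into the paint it samples.
class DriftingConcept:
    def __init__(self, features):
        self.features = dict(features)          # e.g. {"equal_share": 0.9}

    def consult(self, context):
        # Judging the current case against the concept...
        score = sum(self.features.get(k, 0.0) * v for k, v in context.items())
        # ...also blends the case back into the concept itself.
        for k, v in context.items():
            self.features[k] = 0.9 * self.features.get(k, 0.0) + 0.1 * v
        return score

fairness = DriftingConcept({"equal_share": 0.9, "need": 0.3})
print(fairness.consult({"equal_share": 1.0, "effort": 0.8}))  # a judgement
print(fairness.features)   # the concept is already slightly different
```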

Even the image on the right is a bit misleading. The different colours imply that an identifiable distinction inherently exists between concepts. Rather the act of separating them is part of the ongoing act of decision-making the brain engages in.

It is not necessary to have some centralized node in your brain to be able to address abstractions; it is enough to work with the flow of concrete thoughts and actions in which their meaning is embedded. As discussed in a previous post on video game ontologies, the mind can fragment even apparently monolithic concepts like distance, solidity, and object into their various interactions, each separately learnable and useful. By decomposing an abstraction into an ever-evolving array of momentary interactions we can start to get a handle on the challenge, since we need not try to land the whole multi-faceted, evanescent concept in one go.
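In code, the difference between a concept-node and a decomposed concept might look like the hypothetical sketch below, in the spirit of the video-game-ontology post: solidity is not a single entry but a growing bag of context-specific interaction rules, each learned and used separately.

```python
# "Solidity" as an open-ended collection of separately learned
# interaction rules, rather than a single concept-node.
solidity_interactions = {
    ("walk_into", "wall"):   "stop",          # learned from bumping
    ("place_cup", "table"):  "cup_stays",     # learned from setting things down
    ("throw", "window"):     "breaks",        # learned the hard way
}

def expect(action, thing):
    # The agent consults whatever fragments it has; there is no master
    # definition of "solid" anywhere, only accumulated interactions.
    return solidity_interactions.get((action, thing), "unknown")

print(expect("walk_into", "wall"))    # stop
print(expect("walk_into", "fog"))     # unknown -- not yet part of the cluster

# New fragments are added as new problems arise, one interaction at a time.
solidity_interactions[("lean_on", "railing")] = "holds"
```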

We should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. — Sutton, The Bitter Lesson

Abstraction is a dynamic experience of agents living within a flow of concrete interactions. All meaning entails abstraction; the proposed leap from concrete stimuli to abstract concepts is unjustified. You are always swimming in abstractions; and you are also always dealing with the concrete¹. In the same way you know of a magnetic field not by seeing it but by observing its action on surrounding objects, your knowledge of abstractions is a manifestation of the effect they have on your ongoing thoughts. Every time you think you have discovered a genuinely abstract “node” like paradox or future in your mind, you need only realize that the thought you had — i.e. “that is a paradox” or “I have created an abstraction of the future” — is still a specific thought with concrete content. We should focus on the form, and not be misled by the content².

This is one useful lesson to take away from Large Language Models. The latter stick to concrete language patterns and use them to simulate an understanding of abstractions in the context of a conversation. Speech is an intentional action — both real and imagined — and deriving an answer to a verbal question is no different from planning a set of activities that would get you to the top of a hill. Coming up with an answer to the question “what is the nature of existence?” is not an inherently abstract process, but a real attempt to determine what English words to say when encountering that sentence in a social setting.

Created from behind

The above paragraph may strike some readers as overly reductive and behaviourist. It invites accusations of the same lack of genuine understanding as are levied against modern LLMs. Nor does it explain how exactly those incremental responses are generated by the latent “magnetic” force of some abstraction³.

This is because we have yet to consider why the agent experienced that series of words (i.e. “what is the nature of existence?”) as something that requires an answer in the first place. Hearing the sentence is only a tension to be resolved if a mind experiences it as such. Perhaps the words, combined with the circumstances in which you encountered them, or the tone of voice, excited some confluence of thoughts that compelled you to plan for an answer. Should you be defensive? Should you appear analytical? Is the asker being facetious? Any response you come up with will match, in its form and content, your interpretation of the request.

As an example, look in front of you right now and ask yourself what you see — you suddenly notice colours, contrasts, letters, shapes, etc. of which you were only semi-conscious before the request was made. The request itself created the problem-context which reshaped your experiences in a way that was useful to it. To “perceive” is to ask a sort of question or make a demand. The answer — what you consciously perceive and remember — takes on a form that fits the question asked. You may have perceived patches of white, or more generally that “colours” were there, depending on how you framed the requirements. Before the question was raised, on the other hand, you were interpreting the symbol-images in front of you as sounds in your head — i.e. you were reading⁴.

To explain why any set of stimuli give rise to any particular interpretation or thought, you should begin by asking “what was I looking for in that moment?” Raw data, left alone, are an inert, meaningless flow, like a stream of pixels that no one is watching. Meaning must be injected into them from some other source. This is why when training Machine Learning models using supervised learning the necessary abstractions (their meaning) must be provided from outside, through the act of supervision — e.g. what categories a model is asked to classify its inputs into. For humans who have no such external aid, meaning is added to stimuli from the motives which try to structure and segment the stream in a way that is useful to them. The “why”, i.e. the request you put to it, determines what you get out of the stimuli — it injects its meaning.
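The point is visible in the very shape of a supervised-learning setup. In this minimal sketch (toy data, a nearest-centroid stand-in for a real model), the categories, i.e. the meaning, arrive through the labels y, supplied from outside the data:

```python
import numpy as np

# The pixels (X) carry no categories of their own; the abstraction
# "class 0 vs. class 1" enters entirely through the labels (y) that a
# human supervisor attaches from outside.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])  # "pixels"
y = np.array([0, 0, 1, 1])                                      # supervisor's ontology

centroids = np.array([X[y == k].mean(axis=0) for k in (0, 1)])

def classify(x):
    return int(np.argmin(((centroids - x) ** 2).sum(axis=1)))

print(classify(np.array([0.15, 0.15])))   # 0 -- but what "0" means lives
                                          # with the supervisor, not the model
```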

The meaning that you extract from stimuli comes “from behind” your awareness. It is not about what you are seeing, but what caused you to look. The mind only sees the results, the world post-segregation.

Thus you can look at the same picture and see at some times a river (naturalistic description), at others a beautiful landscape (aesthetic preference), and at others Pike’s Peak (identification). Or you may interpret an image of a dog as fluffy or carnivore depending on your momentary interests. The stimuli remain the same, what changes is the attitudinal context you go into them with. The same raucous sight next door may be a “party” or a “disturbance”. Even meta-reflective, abstract interpretations such as truth, memory, space, or colour that seem to step outside their contents are merely one of many possible useful interpretations of everyday sensory experiences that can be extracted as your needs demand. Just as the above prompt made you suddenly see colours instead of reading language, a concrete sensory input to your mind may be interpreted as a “memory”, or a “tree”, or “green”, or “qualia” depending on what you were looking for. Even the source of the stimuli — external vs. thoughts — doesn’t matter; external sensory experiences are sometimes interpreted as “memories”, an event we call déjà vu.

When you were prompted earlier to look at your surroundings the constellation of thoughts you became aware of was a functionally useful answer to the question posed (what is it? → shapes/colours). You didn’t see “beautiful prose”, or a means of improving your social standing, only pieces of stimuli that help with the act of visual identification. If immediately afterwards you asked yourself what you just saw, then a second volley of perceptions and interpretations would be generated suited to that request — another dip of the paintbrush.

There is no doubt a temptation to designate at least some perceptions, beliefs, or thoughts as “objective”, that is, unmotivated reflections of statistical truth. This happens because you rarely see the demand that generated a given instance of conscious perception, you only see and recall the thought that is its outcome. The visible workspace of your mind contains only the entities that were noticed, that were “put in front of it” and made accessible by virtue of being in memory or awareness. Yet the existence, form, and content of these thoughts and memories — their very shape and meaning — were determined from behind the viewer, prior to awareness, by whatever force it was that made you look in the first place and attracted one answer or another. The example of seeing colours and shapes earlier was a clear-cut example of an “objective” memory, yet its particular form, as well as its timing, were both determined by the request.

There is a lacuna in introspection that makes thoughts appear as if from out of nowhere, like magic. The cause of the act of awareness disappears, but the memory remains. This creates the illusion that it had no cause, i.e. it was “objective”.

For every apparently objective thought or fact there was always a demand that brought it into your mind, that gave it its meaning. All explicit knowledge is an attempt to create discrete, clear thoughts; it segregates the world into “answers”. A “fact” is an answer; and the mind only learns answers to questions it has asked. The seeming “objectivity” of some answers reflects a demand to frame the stimuli in a socially fitting way, perhaps by attaching a communicative label (word), or by deferring to aspects of experiences that could be common to everyone rather than personal. By contrast, when you are by yourself, trying to find your keys, your memories — and your awareness — will contain idiosyncratic configurations of experiences that aid in the search: e.g. visual thoughts of two corner cushions being moved, the table on which keys are usually kept.

Heidegger’s equipmentality

The principle elaborated in this post is similar in many ways to Heidegger’s concept of equipmentality. In his book Being and Time (1927) the existential philosopher argued that the mind interacts with the world in terms of the world’s utility to it, as a set of affordances:

The wood is a forest of timber, the mountain a quarry of rock; the river is water-power, the wind is wind ‘in the sails’ — Heidegger, Being and Time

Heidegger’s equipmentality makes the mind a landscape of useful abstractions through and through. Even when consciously perceiving the world, the image you build is not inherently a factual one, but an idealized image of how the world could be useful to you. All understanding is inherently practical — to understand an abstraction is to know how to interact usefully with it. For example, to understand time as a spatial line while editing a film necessarily means knowing how to segregate and rearrange pieces of it as needed:

The agent is not modeling the causal structure of the environment per se, but rather those aspects of the environment that are important within its specific niche. — Reitveld, Self-organisation

Statistical patterns in sensory experience still matter, of course, but only insofar as they allow the agent to solve its problems regularly. You can only interpret a set of stimuli as “white” if similar stimuli can usefully be attached to that word consistently. So reality (i.e. statistical distribution) is still at work, only through your motives, since motives require a consistent, repeatable reality to create any sort of affordance.

The magic ingredient that injects meaning into meaningless inputs is created from your motives, from how you want to interact with the world. Trying to instil a free-floating abstraction into an AI all at once, with all its consequent self-contradictions, is to start in the wrong place. Instead we should wait for the agent to have a practical need, to attempt to interact successfully with an aspect of its world based on a concrete problem it encounters. Through this process it gradually builds up a set of contextually useful interactions.


Fortunately, this also means an agent doesn’t have to explicitly or fully understand a concept before it can interact with it implicitly. It can work with, say, particular pieces of money in specific contexts without understanding currency in general; or learn to think and act in time without first understanding what time means. At this point the concept is latent but still effective. Over time more of the concept enters piecemeal through the back door. This gives us a way in, a crack in the impossible barrier between sensory input and abstraction. We start and end with perceptions, always working in the concrete, considering only immediate utility. From there we build up a cluster of thought-interpretations that resembles abstraction, like iron filings taking the shape of some underlying magnetic field.
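A rough sketch of that back door (hypothetical names, schematic only): the agent never stores a concept called currency; it only accumulates interactions that resolved particular tensions, and the resulting cluster is what an outside observer would name.

```python
# Affordances accumulate around the tensions they resolved. No entry in
# this structure is "the concept of money"; the cluster as a whole is
# what an outside observer would label that way.
from collections import defaultdict

affordances = defaultdict(list)   # tension -> interactions that relieved it

def record(tension, interaction):
    affordances[tension].append(interaction)

record("hunger",        "hand coin to vendor, receive bread")
record("bus_departing", "insert coin in fare box")
record("hunger",        "hand note to grocer, receive rice")

# The latent "concept" is just whatever these interactions have in common,
# and it grows one concrete, useful episode at a time.
print(affordances["hunger"])
```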

When I excite a motion in some part of my body, if it be free or without resistance, I say there is Space; but if I find a resistance, then I say there is Body […] it is not to be supposed that the word “space” stands for an idea distinct from or conceivable without body and motion — Berkeley, Principles of Human Knowledge

The starting point for such learning is always a problem or tension to be addressed, the subsequent action of which pulls together your awareness of the solution. For example, before you recognized how difficult it was to get across water, boat could not possibly have had any meaning for you. Nor, as discussed in this post, could you conceive of number unless you first cared about getting more or less. And you only began to engage with time because you experience problems that resolve to “being late” or running out of time. You might have been punished for “being late”, and subsequently discovered a set of activities related to alarm clocks, busses, etc. that protected you from being punished.

The possibilities for abstract thought — such as for example possibilities to think about geometry — must connect to the wider life of the speaker, and what is of vital and affective significance to them. — Reitveld, Scaling-up skilled intentionality to linguistic thought

A tension, an error, a lack brings a concept to your awareness. This is why the best way to explain a new concept to someone is always through a story, one that starts by presenting a problem, as in, a reason why they should care⁵. The moment a person begins to care about a problem the “magic” ingredient is injected, as their frame for perceiving the world begins to revolve around these new demands.

Consider how truth, as a meta-determination of your thoughts, initially takes root out of the experience of failure or being refuted. As a toddler your mind would not have inherently framed its thoughts as “true” or “false”. Only when someone disagreed with you or you messed up did a fear of refutation or error enter your psyche. Truth then became your push-back against, and resolution to, these antagonistic thoughts. You may later formalize truth in more complex theoretical ways — this is a social act — but until it was a problem truth would have been experienced as alien and of no interest, like some arcane philosophical term.

Some thoughts may appear to be inevitable logical conclusions from a person’s existing beliefs; nevertheless a motive must still be present for the mind to actually draw those conclusions. A child may first learn about death as the approaching end to their life, but it would be wrong to say that until that moment they believed they were immortal, even though that would be a natural conclusion given their present mindset. Rather, they didn’t think of the horizon of their life at all. Their only thoughts were of tomorrow and the next day, the pragmatic, upcoming expectations of play and school, not where that would lead in the future, because “where that leads in the future” was not a question, demand, or concern yet. Experiencing the problem of human finitude is what ultimately gives shape to both death and immortality.

Novelty ex nihilo

For some readers, thinking of abstractions like existence, truth, or consciousness as “useful ways of perceiving the world” feels odd or implausible. It also seems to beg the question — what is “useful”, and how is it discovered? What exactly is that magnetic force that draws useful experiences together under an abstract banner if not a latent abstraction somewhere in the brain? Doesn’t the latter also have to be explained? Haven’t we simply pushed the problem into a realm of magic yet again, and made it equally impossible to solve?

The answer is that the magnetic force does not itself need to contain the concept that it produces. In fact it can’t: every abstraction must be generated by something outside and prior to itself, as an answer to a seemingly unrelated question or problem. For example, the need to avoid pain can, through acts of practical problem solving, generate affordances related to safety — moving slowly, looking where you leap, wearing goggles, etc. Pain does not in itself contain safety. Pain is not even the opposite of safety; the problem and its solution are ontologically separate. A tension like pain only tries to get itself removed. The fact that acting safely, learning about safety, and even naming and communicating safe practices to others are all a consequence of trying to remove pain was “unknown” to pain itself, and could not have been predicted from it.

Pain, on interacting with the world, introduces a novel set of behaviours that can be categorized as “safety”.

Although a tension tries to capture solutions out of its own internal identity, the content of the solutions must still be found in external reality; they are not just pure fantasy. As the tension combines with a diverse and ever-changing environment, it produces an unpredictable panoply of novel abstractions. For example, the concept of social media arose when new technologies enabled human motives to be resolved in unforeseen ways.⁶

A creative solution to a problem contains a concept not present in the functions and predicates in terms of which the problem is posed — McCarthy, Concepts of Logical AI

Looking beyond physiological tensions (e.g. pain), those tensions that are learnt, like “social embarrassment” or “business failure”, also produce unforeseen concepts. The fear of romantic rejection leads one to discover notions of attractiveness. Seeking to allay interpersonal conflict manifests as politeness, courtesy, and popularity. Our mistrust or wariness of sloppy handiwork that results in embarrassing issues prompts us to explore systems of measurement. The struggle with privation and scarcity creates economics and its many principles. Personal suffering with no apparent real-world cause introduces one to psychology. Choice and free will are the result of approbations and regrets we have about our actions. “Cold” logic is paradoxically a child of an inflamed frustration, as we repeatedly fail to align with others’ discourse; we derogate others as “illogical” before we ever learn formal logic. Existence is the satisfaction of knowing that you could get something if need be, a solution to a searching tension; “it does exist”, your mind says in relief. And each of these terms is only an umbrella category for thousands of minute skills, activities, and interactions.

The birth of an abstraction

If there had been no such things as speech or universal signs there never had been any thought of abstraction. — Berkeley, Principles of Human Knowledge

All of this behaviour generally happens “under the radar”, as part of daily living. To bring it together into an “abstraction” involves explicitly designating it in your mind as such — i.e. symbolizing it as a meta-representation. All abstraction is the act of making something explicit. As such it involves creating and working with social signifiers — words and concrete symbols. Naming a concept using language, as well as declaring what experiences properly belong to that label, are both acts of utility. “How can I best communicate a shared problem?” your mind asks. A word or communicative symbol is just another affordance or tool, useful for alignment. As Reitveld argues:

linguistic thought can be conceptualised in terms of skills for engaging with enlanguaged affordances. […]

The thought doesn’t pre-exist the expressive bodily activity of speaking because it is only in the act of talking to ourselves or to others that the thought is articulated, and becomes a determinate thought.

We have reversed the usual approach to learning language here: instead of expecting the person to possess a concept in order to generate the word for it, they first use the word as a tool in specific situations, and as more and more situations benefit from using that word, the named concept functionally arises. There is nothing to be named until you decide to name it; only then do you, on a case-by-case basis, decide what belongs to that label. Is murder as self-defence justifiable? Is gambling entertainment? Is a hotdog a sandwich?

This act of trying to identify what is in your mind entails an ongoing effort to interpret experiences in a way that helps communicate them. All symbolization requires effort — like planning — it does not happen automatically. And effort implies an underlying motive. Namely, the very desire to identify an abstraction is the force that captures it and gives it a name. This means the desire to symbolize and communicate a thought is nearly always a different motive than that which created the thought in the first place. The latter, which is interpersonal, always takes a circuitous path through social demands before rebounding back onto the individual. For example, free will may appear to be an abstraction of some private mental property or behaviour, but it initially takes root as a means of judging others’ actions, and of evaluating yourself as seen through the eyes of others. The same is true of memory as detailed in this post.

The intricate path ahead

One consequence of creating common words or symbols is that we are often misled into believing that a universal, stable concept-entity exists somewhere in each person’s brain, e.g.:

Memory seems to change with experience from one person to another, while meaning must be more or less constant. — Oxford Handbook of Thinking and Reasoning

Naturally, then, when we try to define how, say, consciousness could be usefully aggregated through individual mental behaviours we assume that we must explain how this could be done for everyone. This leads us down the fruitless path of trying to discover how we can inject universal concept-nodes into AI.

In reality no single path is taken by everyone. How I come to engage with a word is different from how every other person does. For example, my notion of colour might have started out in early childhood as associated with a particular image of a colourful rainbow or crayon. I equated colour with that specific sight, then expanded it over time to new circumstances. Only the existence of a common word or symbol makes the concept seem common.

Although a general path can never be discovered, we can still explain how some given individual may have engaged with a concept, personally and socially, and then create systems that could simulate these in AI. This forces us to dive into the intricacies of moment-to-moment thinking in all its gory detail, without leaving it to the mysterious black box of Machine Learning models. Such a task may feel daunting: you must explain every individual instance of mental activity without the crutch of a unified “magic” concept-node guiding them — a staggering challenge unless you have an underlying framework by which to explain them as a whole. Ultimately, we need a universal framework for working with abstractions — both common and idiosyncratic ones — yet one that works through unique experiences, if we are to create an AI that can do so on its own.

Without this shift in paradigm, the nature and qualities of abstractions like before or time would be pure magic — they simply cannot be explained by looking at them as objective “nodes” with properties. Colour cannot be discovered by investigating colours, but by something outside experiences of colours — in a mode of living where the distinction between colour and non-colour is useful. The same can be said for time and number. Beginning here, we mark how the interaction of internal needs with the world attaches to sensory inputs the affordances for interacting with them. The “magic” ingredient is then generated by the link between a tension and its derived set of solutions, through the act of generating “something new out of something old”. From this new vantage point, we can observe abstractions arise easily enough of their own accord.

¹ Acts of transference and generalization may be a reflection of the dynamic behaviour of abstractions, but they must still happen over concrete instances.

² Introspection doesn’t grant perfect transparency; an effort combined with skill and knowledge is still required. Even if you can’t see what the specific detailed thought is, that only means you have yet to discover a way to interpret your mental events more clearly than just as vague feelings.

³ It also seems to ignore the feeling of the concept, which is addressed here.

⁴ There is a resemblance here to modern deep learning attention mechanisms, which form the foundation of popular generative AI models. These latter try to establish a context for prediction or interpretation out of a set of frequently recurring precursor stimuli. In our case the context is not some prior fact or event, but the agent’s motives, as framed through concrete experiences.

⁵ This explains why modern AI are consistently excluded from claims of sentience. You don’t wonder if your computer gets bored, or if it even understands boredom, because you don’t imagine the necessary motives exist for it to do so. A chatbot like ChatGPT may pretend to get bored by imitating bored speech, but because it does not experience any tension associated with the passage of time, it doesn’t understand boredom.

⁶ In the interest of completeness, let’s consider an example of how reasoning works backwards to introduce a novel concept. We may define a car as any self-propelled metal vehicle carrying passengers on approximately four wheels, and also accept that cars are allowed to drive on the highway. However, when we consider a Formula 1 car, which clearly fits the definition of a car, we suddenly recognize that there must be exceptions. From this we define new concepts like street-legal car vs. racing car (e.g. dragsters), and given enough time and public motivation these terms come into widespread usage. When we then consider that street-legal cars may become disallowed in certain circumstances, we again redefine the concept to be a temporary state of a car, not an immutable property, and so on.
