Your world is split up by your needs

How your mind slices up your flow of continuous sensory experiences into discrete logical entities

From Narrow To General AI
17 min read · Aug 31, 2024
The ability to distinguish identical twins is an example of a distinction not driven by appearance, but by the need to be socially considerate.

Whence comes then the irresistible tendency to set up a material universe that is discontinuous, composed of bodies which have clearly defined outlines and change their place, that is, their relation with each other?

Each [need] leads us to distinguish, besides our own body, bodies independent of it which we must seek or avoid. Our needs are, then, so many search-lights which, directed upon the continuity of sensible qualities, single out in it distinct bodies. — Bergson, Matter and Memory

Your senses — sight, sound, touch, etc. — create a continuous and fluid experience of the world for you, both in space and in time. Within this flow of colours and sounds, there are no objective delimiters where your mind is required to cut the stream into distinct objects and events; no clear markers to designate where any “thing” starts or stops. Your visual inputs can’t dictate, say, which span of the morning constitutes the dawn (or the morning, for that matter). And although it is possible to come up with ad hoc definitions by which you might recognize an object or event, there will always be exceptional cases which reveal how bespoke, ambiguous, and contrived such definitions are.

There is nothing in the optical flow of visual stimuli that obliges your mind to separate experiences into buildings, roads, intersections, traffic lights, or even people and cars; much less “left turns” or “sunsets”.

Despite this, you can and do in fact slice up your experiences into identifiable objects and events. Life, moreover, demands that you think and act discretely, especially when you must decide to take some definite action — e.g. do you go to work or stay home? From this necessity of acting discretely we can work backwards to the requirement to think discretely. If you are planning a party, and each guest needs two or three slices of pizza, you must decide how many guests will be present, how many are allergic to gluten, how many are children, etc. and multiply the results by the appropriate number of slices.
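
To make the arithmetic concrete, here is a minimal sketch of that party-planning calculation in Python. All the numbers and serving rules are hypothetical; the point is only that every input must already be a discrete count before the multiplication can even be stated.

```python
# Hypothetical party-planning arithmetic: every quantity below is the
# result of an all-or-nothing decision (who counts as a guest, a child,
# or gluten-intolerant) made before any calculation can begin.
guests = 12
children = 3          # assumed, for simplicity, none are gluten-intolerant
gluten_free = 2

slices_per_adult = 3
slices_per_child = 2

regular = (guests - children - gluten_free) * slices_per_adult \
          + children * slices_per_child
gluten_free_slices = gluten_free * slices_per_adult

print(regular, gluten_free_slices)  # 27 6
```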

Nor can we dodge this requirement for discreteness when calculating probabilities, since these too require that you first define discrete variables or terms over which you perform your calculations. Calculating the probability of contracting a disease is only possible, and only makes sense, if you can recognize what does and does not qualify as having contracted a disease (when was the onset of the disease?), as well as what counts as an individual person (do conjoined twins count as one or two people?). So before you can begin, you must make all-or-nothing decisions to define the parameters of the activity. The question is: how can your mind make these decisions, given only a continuous stream of inputs where stimuli blend into one another, and where any subset of inputs may be part of any object?
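
As a toy illustration (with invented data and an arbitrary cut-off), even the simplest frequency estimate presupposes a discrete, all-or-nothing predicate:

```python
# Invented records; the threshold below is the hidden all-or-nothing
# decision about where "having contracted the disease" begins.
records = [{"symptom_days": 0}, {"symptom_days": 2}, {"symptom_days": 9}]

def has_disease(record):
    return record["symptom_days"] >= 3  # arbitrary onset criterion

p = sum(has_disease(r) for r in records) / len(records)
print(p)  # 0.33..., meaningful only relative to the chosen cut-off
```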

In the classic paradox of the Ship of Theseus, individual planks of wood in a ship are replaced one by one, until none of the original wood is left. The precise moment when the ship stops being the “Ship of Theseus” cannot be determined by your experiences of the ship itself. The boundaries of objects or events must always be constructed and enforced by the mind that perceives them. This is not to say that the decision is a random one, unconnected to what you see, but rather that such information is not contained within the sensory experiences themselves. It must come from outside them. Sensory experiences are like an ever-flowing river, and a river cannot divide itself into pieces — it has to be cut up by some external force, like a ridge, boulder, or gravity.

So to split up any continuous space of inputs into chunks, there must be a reason, an epistemological “knife”, a trigger that “detects” some feature in space or time and decides to cut the experiences at that point, in that manner. In the example of the Ship of Theseus, such a knife may detect, say, a shift in the proportions of colours or contrasts of the ship’s wood, or a change of captain, etc., and from this make a distinction between the old and new ship.

The problem with this approach is it leads to an infinite regress, since the trigger itself is based on a discrete event, and you must now decide why that trigger was defined the way it was, and so on. At some point, you must end up at a “hard-coded” starting point for the chain. But how could that exist if you don’t know in advance what the determining factors should be? How could a single hard-coded starting point be universally appropriate to all instances of identification, from sunsets and ships, to families and forests?

How we address this in AI

We might look to instructive examples in modern AI for an answer. Dividing datasets into entities or samples for training and inference is a critical step in all Machine Learning models. So far, however, the question has produced no general solution. Rather, it is decided on an agent-by-agent, task-by-task basis, guided by the problem each model is intended to solve.

For example, an image classifier divides its input space into individual images along with their labels. The software architecture and pipeline automatically separate the data into these (image, label) pairs and perform regressions over them. Even when the images appear in sequence, there is assumed to be no relationship between two consecutive images. A (simple) video classifier, on the other hand, draws the dividing line between whole videos, each composed of multiple constituent frames, which are assumed to be related within a given video.
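
A hedged sketch of how this unitization is baked in before training ever starts; the class names here are illustrative, not taken from any real library:

```python
class ImageClassificationDataset:
    """Unit of data = one (image, label) pair; consecutive samples are
    treated as unrelated."""
    def __init__(self, images, labels):
        self.samples = list(zip(images, labels))

    def __getitem__(self, i):
        return self.samples[i]  # each index is an independent entity


class VideoClassificationDataset:
    """Unit of data = one whole video; frames within a video are assumed
    to be related, while the videos themselves are not."""
    def __init__(self, videos, labels):  # videos: list of frame-lists
        self.samples = list(zip(videos, labels))

    def __getitem__(self, i):
        frames, label = self.samples[i]  # the boundary is the video
        return frames, label
```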

Or consider per-pixel segmentation models, which subdivide images into the objects they contain. These require training on hand-drawn discrete cut-outs (segments) to learn how to identify the boundaries of objects. As for language models, they ingest their input data split by words, lemmas, and sentences. They are not exposed to language through undifferentiated streams of colour images mixed in with the rest of life’s experiences, as is the case for humans. Multi-modal language models combine all the above (images, videos, words), and use the word separation inherent in the language stream to cut up the images into sub-segments.
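
The same point holds for language pipelines: the split into discrete tokens is decided before the model sees anything. A deliberately simplified stand-in for a real tokenizer:

```python
text = "the ship of theseus"
tokens = text.split()  # the whitespace rule is an inaugural decision
print(tokens)          # ['the', 'ship', 'of', 'theseus']
```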

Multi-modal models pass the distinctions between modalities.

Reinforcement Learning (RL) models usually distinguish only between individual video frames, and consequently can only predict entire frames, not the objects within them. Finally, all classical logic models (GOFAI) require the data to arrive pre-categorized into logical terms and operators.
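
In RL, the unit of experience is likewise fixed in advance by the transition tuple. The sketch below is generic rather than tied to any particular framework; the agent receives whole frames as opaque states and can never subdivide them further:

```python
from collections import namedtuple

# The (state, action, reward, next_state) tuple pre-cuts experience into
# discrete transitions; "state" is an entire frame, never its contents.
Transition = namedtuple("Transition",
                        ["state", "action", "reward", "next_state"])

t = Transition(state="frame_41", action="LEFT",
               reward=0.0, next_state="frame_42")
```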

In none of these cases are data presented as a continuous stream of inputs. Buried within any ML paper you can find the initial assumptions for how the data must be segregated to serve the task at hand. The data, the task, the regression schedule, and even the pipeline architecture define the units or entities involved, and all subsequent separation happens as a consequence of these inaugural decisions. This is part of the definition of narrow AI: narrow AI defines its samples, variables, and units of data based on what it deems to be necessary to its specific task.

The real world, of course, does not tell you where a word, either written or spoken, starts or ends. Nor can experiences themselves inform you which subset of your visual inputs counts as a living room and which counts as a kitchen. Your mind must make these distinctions itself from its standpoint within the empirical flow. Of course, once you’ve reached a certain level of maturity, identification comes so naturally, so easily, that it is tempting to project the decision onto the world, as though the world itself were telling you that such-and-such is where a kitchen starts and ends.

So when you identify any instance of an entity, it is tempting to look back at what caused you to recognize it and assume that was the trigger: e.g. “I identified a car by its shape, so it must be the shape of the car that caused me to learn to identify it”.¹ But this only applies in cases when you already know how to identify the item, that is, once the work is already done. Effortless acts of recognition are the product of prior learning and decision-making, which is now producing its habitual response so automatically that the mind appears to be a follower and not a leader in this process.

Only when you encounter exceptional or ambiguous cases where the decision is not clear (e.g. on what criteria would you divide dogs from wolves?) do you start to suspect that it is yourself that you are following. And to be clear: this is not about what the correct definition or criteria are, it is about how the mind arrives at its tentative conclusions, correct or incorrect, and what leads it to make that distinction. That is the core challenge of this post.

The distinction/unification dilemma

The most intuitive explanation for how a mind splits up the world, one that is implicit in nearly every cognitive scientific theory of recognition, is similarity. It is the repetition of certain inputs in space and time that, through overlap, lets us know that a common item or event has been presented to the mind. Let’s put aside for now that this approach fails when applied to abstractions like government or existence; even for concrete objects like chair the decision of how to split the world cannot be caused by similarity in appearance alone, as we’ll see in the example of identical twins.

Imagine an AI that has to distinguish between identical twins. At some point it must make a discriminating decision; that is, it must decide on what basis it will determine that there are two humans and not just one. Unfortunately, they look quite similar. All humans look slightly different from day to day, so the presence of small identifying marks is not enough to force the agent to decide that there are two separate entities involved. Even if you see the twins side by side this may simply be an illusion — perhaps the second person is a reflection in a mirror, or a video of the first person playing on a screen beside them. The agent may not even witness for itself that there are two people, but may only be told by others that two experiences of a person, one on Monday and one on Tuesday, were actually of two different people. In short, there are no sensory criteria under which a mind would be obliged to separate the twins into two separate entities.

To address this problem, your next intuition may be to look to associated, nearby correlations, such as names or identifiers you hear others say, as being the reason the mind distinguishes the people involved. For example, if the appearance of one of them is followed by the name “Kevin”, and the other by the name “Kent”, this might be enough for the mind to decide they are distinct entities. However, this only pushes the question back, since the sounds of those names, as you hear them, are not themselves discrete entities, but rather appear within a flow of other sounds. Discreteness can only be signalled by other discrete triggers; so distinguishing the twins by association with nearby stimuli is like trying to cut a river with another river.

If their names are “Kevin” and “Kent”, one alternative may be that your mind extracts the common sound “Ke” from the continuous space of sounds and uses it as their joint identifier. Presumably, this is how you identify any other person’s name based on the overlap of sounds you hear associated with them. So why not assume there is only one person named “Ke”? The agent is therefore left with a dilemma: either it can distinguish the twins based on the sound of their full names, by incorporating the additional unique syllables — Ke-vin, Ke-nt — or it can select the common element within that flow — Ke.
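
The dilemma can be stated in a few lines of code. Given only the two name-sounds, a similarity-based learner can just as well unify on the shared prefix as discriminate on the remainders; nothing in the data forces either choice:

```python
import os

names = ["kevin", "kent"]
shared = os.path.commonprefix(names)           # 'ke'  -> one person, "Ke"?
remainders = [n[len(shared):] for n in names]  # ['vin', 'nt'] -> two people?
print(shared, remainders)
```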

Any theory which tries to rely on similarity or association to extract entities from the flow of experiences faces this irreconcilable dilemma. Reasoning using only the stimuli themselves, or trying to rearrange the stimuli into some general, logical principle of distinction, will always fail². In the case of the identical twins, it is obvious there must be something outside the immediate stimuli to give you a foothold into the process, to push you one way or another: i.e. to separate two entities based on discriminating inputs, or unite them under what is common.

Errors give us a clue

And even though, as with any distinction, you may make errors, you can still determine, in the main, that there are two people and not one. When you are wrong, such as when you punish the wrong twin for the acts of his brother, you can realise it and correct yourself. Of course, you may still be making an error — what if the twin lied?

As mentioned, the fact that you can correctly distinguish between twins is important in cases where clear logical reasoning must be performed. A teacher of twins would have to unambiguously separate them when planning a field trip, to know how many children should board the bus, and how many must return with it. The math is discrete; no vagueness is permitted. And that last word — “permitted” — gives us a clue as to how to resolve this riddle. The reason your mind distinguishes otherwise identical twins always comes back to the fact that not distinguishing them creates problems for you.

If the difference didn’t matter you would have no reason to make a distinction. Consider, for example, a web browser running on your computer, which on a certain day updates to a completely different version, but which appears and functions largely as before. By default your mind will continue to respond to similar experiences in similar ways. You would not care that the browser was swapped with its “twin”, nor identify it as different, because the change causes you no problems. Any distinction will always be on the basis of whether it is necessary; and “necessary” implies that it is relative to an immediate motive. You would only start to conceive of the new software version as a different entity when it causes you issues the old one did not (or resolves issues), and you want either to revert back to the old version or praise the new one. There must always be a motivated reason, even if the reason is a simple desire to be pedantic.

And the same is true of the twins. Treating two separate humans as a single person based on appearances will lead to problems. There are cases, however, in which no distinction between the twins needs to be made. If they share a taste in clothes, their mother may purchase clothing for both while treating them as a single unit. Similarly, a callous bully may call the twins by the same name without any twinge of concern. Finally, a person unaware that they were twins will only recognize this fact when it causes them confusion or embarrassment. The need to distinguish them by finding some discriminating input only arises if there is a problem context that drives it. Nothing in the stream of inputs itself can trigger this distinction for you. In some cases minute differences must be carefully taken into account; in others enormous differences can be safely ignored. The question always remains: does the person care?

Of course the stream of experiences itself still matters, otherwise your thoughts and actions would have no purchase on reality. So it is, for example, that on seeing the same person at two different times, your mind will automatically trigger similar thoughts, which you have previously learned for that input. And so even for twins — your response will in most cases reflexively unify the two as the same person; that is, you respond to their similar appearance with similar thoughts and actions.

However, there is more to a response than just this first reflex; there may be subsequent effortful acts of interpretation that happen for each new experience of the twins. When you need to call a missing twin to come down for breakfast, you may have to make some effort to determine which of the two is present right now — your first instinct is not always sufficient. In contrast to calling other acquaintances for whom this decision would be straightforward, the desire to call the twin’s name will trigger a special tension (learned from past slip-ups) to take extra precautions. The mind now looks for some confirming input or thought, which either allows it to select the first name that comes to mind, or infer the alternative. This will be a “new learning” in that moment. The tension must first attach the correct name to the experience of the twin currently present, then subsequently use that newly connected thought to logically infer the name of the one absent (e.g. “Kevin” → the other one → “Kent”). This process is conscious, meaning that it leaves a lasting memory (the attached thought) of the event.

Since the connection you make is based on what you are trying to achieve, the success of your endeavour always determines which interpretation (i.e. the thought of “Kevin” or “Kent”) ends up being attached. When dealing with identical twins, any number of options are available. If only one twin is present you may decide to call him and his brother by the same nickname to avoid slipping up. Thus the interpretive thought, i.e. the intention of what to say, will be driven by its presumed utility.

Underlying the above analysis is a hidden implication, namely that there are no stable mental entities or concepts anywhere in the mind allocated for each of the two people. Distinctions and designations are made in the moment, as part of the flow of responding to immediate experiences, by transferring past learning onto new instances. Recognition is always a momentary action. By default your mind treats the twins, at least by appearance, as a common entity. The only distinction you make between them is in response to the immediate need arising in your thoughts; such as what name to give the one you’re looking at, or which extracurricular lessons to drive the twin currently in your car to.

If you already learned to identify Kevin by a mole on his hand, or his mispronunciation of “adage”, then on hearing him say that word again you need not make the effort anew. The thought will already be there in response to the inputs; like a math calculation you don’t need to carry out a second time once you’ve figured out the answer. If no immediate thought is present, or if a doubt arises about the validity of your initial thoughts, your mind must put some heuristic to the task, to discover an interpretation that gives greater confidence — that is ultimately what a “doubt” is³. Either way, such a “concept” of that twin does not extend beyond these momentary thoughts into deeper mental structures.

The popular belief in mental “entities” or concepts — which remain stable beyond the immediate, specific thoughts that pass through your mind — is perhaps the single greatest error in the field of cognitive psychology. Once accepted, it leads to innumerable confusions, paradoxes (e.g. the Ship of Theseus), and irreconcilable contradictions. The discrete separation of entities, when it is performed, happens only in the concrete space of the senses, not in some abstract dimension of concepts.

So what is the epistemological knife?

Since, as shown above, discreteness is determined by momentary practical tension-resolutions, it is reasonable to infer that the tensions and resolutions themselves must therefore be discretely defined. This is in fact the case. Sensations may be continuous, but the experience of a tension and its resolution happens in a discrete moment of time. This is the last piece of the puzzle which connects the act of moment-to-moment learning to the flow of experiences.

Treating tension-resolution as time-delimited is perhaps an uncommon perspective in AI research. Although Reinforcement Learning performs regression at distinct moments based on rewards, in order to resolve the question of this post we must push discreteness to its limit. Whether or not you solved a math problem — or believe you solved it — must be either true or false at a given time; logic can only be built on discrete foundations. When viewed in the aggregate, such responses may seem probabilistic; e.g. if you give the same math problem to many people you may get a distribution of answers; but that does not mean that a given person’s answer is also a distribution. An individual’s action response is discrete, and so is the underlying decision.

What remains is to tie the timing of what gets learned to the tension-resolution cycle. Going back to the twins, when trying to determine what to call the individual in front of you, you are triggered by a momentary need (tension) — you don’t know what name to say aloud. This happens right before you speak, as a cautionary response learned from previously confusing their names. Your mind now looks for indicators that hint at who the one before you may actually be. An article of clothing, a name tag, anything that, based on previous experience, triggers the interpretive thought containing their name; this now gets elicited and immediately captured as a solution to the current tension. The auditory thought of the name is locked in as a “new learning”: a novel thought-response to the immediate appearance of the man standing in front of you.

Although this is a shorthand description, its purpose is to show how minor on-and-off triggers of tensions and resolutions create the discreteness of your responses. The earliest segregation of units occurs in time, not in space: there is a span of time between experiencing a tension and experiencing its resolution, where everything in between may be captured as part of the correct response. This is the first epistemological knife.

Any spatial separation (e.g. defining the outlines of the object) arises out of the temporal one. The same tension, as it is resolved multiple times, can select out of the mass of sensory inputs — both in thought and in external experience — that subset that recurs as part of the solution. It discards the rest as noise, which helps generalise the response outside the immediate, specific circumstances. Background noises and extraneous sights are removed, leaving only what is essential⁴. This process comprises the “unification” half of the dilemma above.

Unification: keep only the common inputs that occur between the tension and its resolution.

On the other hand, if a resolution is not achieved in any previously learned way, then instead of pruning away what is extraneous, additional inputs are included by which your mind discriminates the current experience as a special case. This is the other half of the dilemma: discrimination. Both unification and discrimination are automatic processes; they occur before you are consciously aware of them.

Distinction/discrimination: distinguish based on non-common inputs.
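
To make the two operations concrete, here is a minimal sketch under strong simplifying assumptions: an “episode” is just the set of symbolic inputs observed between a tension and its resolution, and all the input names are invented for illustration.

```python
def unify(learned, episode):
    """Keep only the inputs that recur across resolutions of the same
    tension; everything else is discarded as noise."""
    return learned & episode

def discriminate(learned, episode):
    """Resolution failed in the learned way: include the non-common
    inputs as the marks of a special case."""
    return episode - learned

# Two successful resolutions of the "which name do I call?" tension:
ep1 = {"mole_on_hand", "red_shirt", "kitchen"}
ep2 = {"mole_on_hand", "blue_shirt", "garden"}
cue = unify(ep1, ep2)              # {'mole_on_hand'}: the stable cue

# A new episode where the learned cue is absent and the response fails:
ep3 = {"red_shirt", "garden"}
special = discriminate(cue, ep3)   # inputs that mark the exception
print(cue, special)
```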

The goal, ultimately, of this approach is to make the process of problem-solving concrete and practical for AI implementations. We are no longer relying on vague notions like “object” or “need” to define the problems a mind may be faced with, but rather we base them on specific inputs, and the particular timing of their appearance. We are also removing the need for inductive biases or hard-coded prior knowledge, and allowing the system to learn to separate entities autonomously, based only on its own stream of experiences — that is, to move from narrow to general. This allows us to turn a continuous space of inputs into discrete mental objects and events, sufficient for performing logic.

The key, however, is to base these discrete “entities” not on abstract concepts, but in the immediate process of transferring between concrete sensory thoughts. We must reformulate “abstraction” and “generalisation” into a dynamic cycle that migrates what was learned for one concrete experience to another one. The symbol “x” in mathematical logic is no longer an abstract variable, but rather the actual appearance of the character “x” in thought and in reality. Once you make this assumption, you find you can unite problem-solving, object recognition, concept manipulation, and generalization/transference into a single harmonious model.

¹ This assumption has been carried into AI research.

² One approach is to use not just statistical correlations, but causal relations — e.g. does saying the name cause the right person to respond? This, of course, requires active intervention, not just observation. The theory presented in this post is in fact an extension of such a causal explanation to a more general and practically feasible one.

³ Doubt is a circumstantial tension caused by prior bad experiences, which now looks for its own validation. By default you don’t doubt, you follow your automatic thoughts. This often leads to Freudian slips.

⁴ This implies that all resolutions must be connected to tensions. Thus an experience has no absolute value, as suggested in Q-learning, but only a relative one based on what is currently driving you — the tension.
