On The Impossibility of Teaching Basic Numeracy to AI
A case study of why numbers must be affordances
This is the eighteenth post in a series on AGI. You can read the previous post here. You can also see a list of all posts here.
In 2021, an article in Scientific American announced that the debate over whether numbers are innate in the human mind or learned from experience had been definitively answered, with little room for doubt. The evidence revolves around a study showing that babies, when presented with images of various arrangements of items, displayed more interest when two consecutive images contained different numbers of objects.
The conclusion the article arrived at, namely that humans have an innate sense of number, may have overstated the case. We tend to think of number as a unified concept: you either have the concept or you don’t. But numbers have many aspects and functions; you may be familiar with some but not others. Even if an infant could innately differentiate small, positive numbers, somewhere past the number seven their intuitive notions would start to break down. The infant wouldn’t grasp that numbers, being consecutive, can increase indefinitely, a critical feature of the concept. Their “concept” includes only a few functions or fragments; it will have greatly expanded by the time they finish high school.
Few people would argue that babies are born understanding calculus, algebra, base-10 counting, the infinite positive and negative numbers, rational and irrational numbers, imaginary numbers, exponentiation, multiplication, etc. Clearly the brain is born with the ability to do such things eventually, but that they are all present from the start is neither evident nor probable. There is likely a threshold of skills and capabilities below which the innate “understanding” of numbers resides, and on top of which other capabilities are eventually built. This can be as simple as receptivity, or some Kantian a priori. Whatever you call it, it must be something you could also build into an AI, which is where this series becomes interested in the topic.
For clarity, the abstract concept of number is different from the ability to perform math. A computer can perform math, but it has no understanding of number, any more than a newborn understands muscle movement; both merely do those things. As discussed in a prior post, to understand a concept means to be able to adjust and evolve it to your changing needs. Concepts, in fact, are made up entirely of such useful adaptations, which we labelled affordances.
Whatever the true reality of “number” is, we humans have only learned about it in pieces, as it has solved problems for us. Our concept of number has evolved over time to incorporate notions like negative and irrational numbers, base-16 counting, exponentiation and so on. This has been the source of humanity’s creative strength in mathematics. Such inventions would not be possible if we were fixed with the mathematical functions we already have, any more than a graphics card that can only do Euclidean geometry could spontaneously discover non-Euclidean spaces.
So if you want an AI to be as creative as humans are with the notion of number you must first establish which aspects should be innate, and which should be learned. Giving the AI freedom to play with the latter, and even to make errors, is necessary for innovation. All you have to do is lay the basic foundation for those possibilities. But where exactly is the boundary between what is learned and what is innate?
Given the complexity of the subject matter, and to keep this post to a manageable length, let’s focus on one of the most fundamental aspects of numbers, namely the notions of “more than” and “less than”. Without these, the concept of number would break down into mere random symbols. You may therefore want to imbue an AI with these notions as built-in behaviours or as learned ones; as representations or maybe as functions. But as we’ll see in this post, these apparently simple concepts are fiendishly difficult to define objectively; in fact, the task is impossible.
Ask yourself: how would you get an AI to know if something is “more than” or “less than” something else? Your first impulse may be to establish some special-purpose circuitry in its hardware. Given any two inputs, it would output a signal of “more than” or “less than”. This, of course, requires that you first define what those inputs will be. They can’t be words or symbols — like “8” or “five” — since those are learned, and differ between languages. They must be in some native input type. The AI could, like a simple calculator, take a binary input, but that raises the problem of how the agent converts its experiences into those inputs. To do this conversion it must already have an abstract concept of number in play somehow; one that transforms, say, the sight of three apples, or the numeral “3” into an input, three.
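To make the difficulty concrete, here is a minimal Python sketch of what such special-purpose circuitry might look like. The function names and the choice of integers as the native input type are assumptions made purely for illustration; the comparator itself is trivial, and everything difficult hides in the conversion step it takes for granted.

```python
# A sketch of the "special-purpose circuitry" idea. The names and the choice
# of int as the native input type are illustrative assumptions, not a design.

def compare(a: int, b: int) -> str:
    """Hard-wired comparator: trivial once both inputs are already numbers.
    Note that even this borrows the host language's pre-existing ordering (>)."""
    if a > b:
        return "more than"
    if a < b:
        return "less than"
    return "equal"

def perceive_quantity(experience) -> int:
    """The unsolved part: turning the sight of three apples, or the numeral
    "3", into the comparator's native input. Nothing here comes for free."""
    raise NotImplementedError("Requires an abstract concept of number already in play.")
```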
At some point this will involve counting. But the act of counting is itself tricky, and can’t be hard-coded. Consider how difficult it is for the AI to make sure it hasn’t counted the same object twice. Counting has many prerequisites (a toy sketch after this list illustrates the point), including:
- The agent must know how to separate its experiences into “things”. The visual system cannot by itself determine which group of sensory inputs is the unit being counted; the agent may be counting the number of hands it sees, or the number of fingers on those hands, or the number of freckles, each resulting in a different answer.
- It must know when to stop counting, and note the number it ended on. This means it must have previously determined what group of things it is counting, identified its members, and then applied that decision consistently.
- It must pick a visual path to follow to ensure it doesn’t count the same object twice. This becomes more difficult if the objects don’t stay still, or the agent itself is moving.
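Here is the toy Python sketch mentioned above. The scene format, the notion of a “unit”, and the stable identities are all invented for this example; the counting loop itself is easy, and the point is that the answer depends entirely on decisions the loop cannot make for itself.

```python
# A toy illustration; the scene format and the notion of a "unit" are invented
# for this example. The loop itself is easy; the answer depends entirely on a
# choice of unit, and on stable identities, that the loop cannot supply itself.

scene = [{"kind": "hand", "id": "left"}, {"kind": "hand", "id": "right"}]
scene += [{"kind": "finger", "id": f"finger-{i}"} for i in range(10)]

def count(scene, unit):
    """Count the items of the chosen unit, visiting each identity only once."""
    seen = set()
    for thing in scene:                # the "visual path" here is just list order
        if thing["kind"] == unit and thing["id"] not in seen:
            seen.add(thing["id"])      # relies on objects keeping stable identities
    return len(seen)

print(count(scene, "hand"))    # 2
print(count(scene, "finger"))  # 10: same scene, different unit, different answer
```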
All this is necessary just to count. Each activity is complex and nuanced, and the agent must learn how to do it correctly; e.g. if you asked it to count the number of animals in a zoo, it should clarify whether humans are included, at what time the count should be done, how much error is acceptable, etc. Once it has decided on a final answer, it must associate that answer with an abstract input, to be able to compare it to other numbers. I say “abstract”, since it must also be able to associate a written numeral like “31” with the same input type and be able to compare those as well.
Since numbers are infinite, the agent can’t have a separate concept for every possible number, let alone for the relative value of every pair of them. It should be able to tell you, for example, that the symbol “1012” is greater than “84”, and it seems unlikely that it will have an innate concept of either of those. Clearly some simplification must take place. In such cases we humans tend to look at the number of digits, noting that the first numeral has four digits and the second has two. Sometimes we don’t even count them; if the numbers are long we estimate their length by eye. These heuristics are learned, and depend on how we happen to write numbers. We must teach the AI to do the same when reading numerals.
The difficulty increases when you add in symbols denoting negative or fractional numbers, since these are qualitatively different. For negative numbers, the agent must sometimes invert the result of the comparison (-1012 is now less than 84), but in other cases not (-84 is still less than 1012). For irrational or fractional numbers (e.g. 𝜋 or 41.77) it must sometimes ignore what it sees after the decimal symbol, and other times not. It is unlikely that the human mind is born with such complex machinery, let alone with recognition of what the symbols mean, so you must teach this to the AI without assuming any of it is hard-coded. What exactly would the function “more than” take as inputs then? Digital calculators only work with precise numbers, not vague estimations. And we haven’t even discussed how the agent would apply such comparisons abstractly, e.g. if someone says that “Plato is more important than Aristotle”.
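As a rough illustration of how these learned reading shortcuts pile up, here is one possible Python sketch. The rules and their ordering are my guesses for illustration, not a claim about how people actually read numerals.

```python
# A rough sketch of the kind of numeral-reading shortcuts described above. The
# rules and their ordering are guesses for illustration, not a claim about how
# people actually read numbers.

def looks_greater(a: str, b: str) -> bool:
    """Return True if the written numeral a reads as greater than numeral b."""
    neg_a, neg_b = a.startswith("-"), b.startswith("-")
    if neg_a != neg_b:
        return neg_b                          # any positive beats any negative
    int_a, _, frac_a = a.lstrip("-").partition(".")
    int_b, _, frac_b = b.lstrip("-").partition(".")
    if (int_a, frac_a.rstrip("0")) == (int_b, frac_b.rstrip("0")):
        return False                          # equal numerals: neither is greater
    if len(int_a) != len(int_b):              # the digit-length shortcut
        result = len(int_a) > len(int_b)
    elif int_a != int_b:
        result = int_a > int_b                # same length: compare digit by digit
    else:                                     # equal whole parts: now the decimals matter
        width = max(len(frac_a), len(frac_b))
        result = frac_a.ljust(width, "0") > frac_b.ljust(width, "0")
    return not result if neg_a else result    # a shared minus sign flips the verdict

print(looks_greater("1012", "84"))     # True
print(looks_greater("-1012", "84"))    # False: the digit-length rule gets overridden
print(looks_greater("41.77", "41.8"))  # False: trailing digits can mislead
```

Notice that even this sketch leans on the host language’s built-in comparisons between lengths and digit strings; the ordering has to come from somewhere, which is exactly the circularity the rest of this post runs into.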
If you pare away all the symbols and word-sounds from numbers, you are left with an abstract notion of “magnitude”, that is unrelated to any specific number. This represents that general “feeling” of comparative size. For example, ⅗ feels bigger than ¼, although you may not be sure by how much. Using the vague notion of magnitude could help with fractional or negative numbers, and could even be applied to abstractions — e.g. if I say that Alice is stronger than Bob, and Bob is stronger than Charlie, then the agent may conclude that Alice is stronger than Charlie, despite no numbers being given.
This would seem to work, except for two problems: first, the agent has to apply it inconsistently — I may say that rock is stronger than scissors, scissors is stronger than paper, but paper is stronger than rock. Second, and more importantly, it is a circular argument. The AI can’t understand the notion of magnitude without already having a notion of comparison. What does a “large” magnitude mean, except that it is “larger” than a “small” one?
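A small sketch makes both problems visible. Suppose, purely for illustration, that the agent stores observed “stronger than” pairs and chains them together; the representation and the chaining rule below are invented for this example.

```python
# A small sketch of magnitude-as-stored-relations (the representation is
# invented for illustration). The agent records observed "stronger than" pairs
# and chains them together.

from itertools import product

def chain(pairs):
    """Naively add every inference of the form a>b, b>c, therefore a>c."""
    inferred = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(inferred), repeat=2):
            if b == c and (a, d) not in inferred:
                inferred.add((a, d))
                changed = True
    return inferred

people = {("Alice", "Bob"), ("Bob", "Charlie")}
print(("Alice", "Charlie") in chain(people))   # True: the useful inference

game = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
print(("rock", "paper") in chain(game))        # True, even though paper beats rock
```

The first print shows the useful inference; the second shows the same rule happily deriving the opposite of something the agent was directly told. And nothing in the representation explains what a “large” magnitude means in the first place.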
In fact, as the example of comparing ⅗ to ¼ showed, the feeling of relative size is not actually dependent on any hard-wired calculation. It may often be incorrect: compare 1/5 and 36/163 — which is greater? Your answer may be right or it may be wrong. That feeling of “more than” or “less than” only comes after the decision is made. It is an effect, not a cause. Once you decide that something is greater than something else, whether erroneously or not, only then does the feeling of it being “more” get attached. It is chosen by the agent; i.e. learned.
Based on the above arguments, you might be thinking that perhaps the concepts “more than” and “less than” aren’t hard-coded; they are learned. Shockingly, the more you dig into it, the more it seems like there is no way to even define “more than” and “less than” that would work in all relevant cases.
Take a minute and try to think of how you would teach an AI the difference between “more than” and “less than”. Come up with a definition that does not already presume the AI understands what both mean; in other words, make sure the word “more” is not included in your definition of more. Your thought process might go something like the following:
You: When a number is “more than” another number, it’s bigger.
AI: You mean it’s larger in size, and takes up more space?
You: No, it… has more numbers before it… hmm. I mean it takes longer to count up to it.
AI: It takes more time to count to it?
You: Yes.. ah. It is… there are more of them, you know. Like if it were apples, they would take up a larger space.
AI: That’s what you said earlier, it would take up more space…
You: No, wait. Imagine you’re counting. So you count 2 and *then* you count to 3 after that.
AI: I do that. But I go to 3 then to 2. What’s the difference?
To cut this short, there is absolutely no way for an AI to distinguish or define “more than” and “less than” by relating them to other objective concepts. Every such definition is necessarily circular — even those that resort to using set theory (see footnote ¹). Neither can the agent derive these concepts empirically from its experiences, since before it could separate out the common aspect in the examples — rather than, say, their colour or shape — it must first be able to count or measure the objects in those experiences and compare them; and to do that it must be familiar with numbers and quantities in the first place². Finally, as we showed earlier, there is no way to hard-code this definition into an AI either — since there is no format of inputs that the AI can use for the function.
This conclusion, that there is no way for an AI to define “more than” and “less than” in a way that works in all cases, should be unsettling. We are so used to intuitively understanding what “more than” means that we never question how or when we learned these concepts. You just know how they “feel”. You may even be waving your arms and saying, “you know… it’s more!” The concept seems so obvious, yet when faced with having to inject it into an AI, where the rubber meets the road, we’re at a loss.
Clearly, we have these ideas though, and we seem to be able to use them effectively, even in anomalous cases like rock, paper, scissors. We can contemplate them as conscious concepts, as well as use them to compare things. So how could an AI derive them if this seems impossible both as a built-in mechanism and as an objectively learned concept?
Fortunately, we introduced an alternative approach in the previous article: affordances. Since we’re having difficulty defining the objective origin of these concepts, perhaps we should look instead at how the AI will use the words “more” and “less” to its benefit and work back from there; maybe new possibilities will present themselves. As it is, every word or concept, if it is learned, is learned from a set of experiences with which it gets associated. Understanding “more than” and “less than” may not require you to find a singular, essential meaning that the agent learns all at once. Instead the concept may simply be the aggregate of all the use cases to which the agent relates the word and its synonyms.
What are those use cases? I can think of one obvious one: if the agent likes something (e.g. money), then in general “more” of it is something it would also like, and “less” is something it would dislike³. The inverse is true of something it dislikes; in that case “less” of it is something it would strive for. For example, if you don’t want to spend money on building materials, “more” building materials is a bad thing, and “less” is good. This seems to be a major practical use case for these words. One interesting aspect of this use case is that it is not circular: it does not rely on knowing what “more” and “less” mean beforehand. For the moment, all we are saying is that if the agent likes or wants something, then using the word “more” will also lead to what it likes, regardless of what “more” means. We are making no other assumptions about the word except this usage.
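To see that nothing numerical is being smuggled in, here is a deliberately crude Python sketch of that use case. The preferences, the world’s response, and the update rule are all invented for illustration; the point is only that the agent tracks whether uttering each word, about each topic, tends to leave it better off.

```python
# A deliberately crude sketch of that use case. The preferences, the world's
# response, and the update rule are all invented for illustration; nothing
# numerical about "more" is represented anywhere.

import random

likes = {"money": +1.0, "chores": -1.0}       # given preferences, not learned here
usefulness = {"more": {}, "less": {}}         # how well each word has worked, per topic

def world_responds(word, topic):
    """Stand-in for the environment: "more" tends to bring more of the topic,
    "less" tends to bring less, with some noise."""
    direction = 1 if word == "more" else -1
    return direction * random.choice([0, 1, 1])

for _ in range(500):                          # crude trial and error
    topic = random.choice(list(likes))
    word = random.choice(["more", "less"])
    payoff = likes[topic] * world_responds(word, topic)
    old = usefulness[word].get(topic, 0.0)
    usefulness[word][topic] = old + 0.1 * (payoff - old)

print(usefulness["more"]["money"] > usefulness["less"]["money"])    # usually True
print(usefulness["less"]["chores"] > usefulness["more"]["chores"])  # usually True
```

The agent ends up preferring to say “more” about things it likes and “less” about things it dislikes, without ever comparing two quantities.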
Although this way of learning about concepts may at first feel unsatisfactory and sorely lacking, it gives us a foot in the door — the word is at least grounded, because we came at it from the opposite side: from the side of utility, rather than of strict definitions. It also matches practical experience; toddlers first learn to use the word “more” to get things they want, such as food. If a toddler is hungry, the parent may ask the child “more?” The parent then brings food, and the child’s problem is solved. “More” is thus learned as a magic word, a tool that solves problems. Initially it may simply be synonymous with “food!” A toddler can learn to use the word without having a complex numerical notion of what “more” means.
Assuming the toddler learns the word contextually, after he has already eaten something, then by default “more” will entail “more than what I already had”. But these implications will be unknown to the child, who has merely learned a contextual reflex, both as thought (the thought of the word) and as action. This vague and misaligned usage of the word becomes obvious when the toddler employs it outside the original context. For example, he may look at a toy car where one wheel is missing and say “more!” to get you to bring the missing wheel. To him it simply means “I see something I like, yet I am unsatisfied; bring me an equivalent of what I like”.
Learning the word “less” would not mirror or occur at the same time as “more”, even though ontologically they define each other. “Less” can be learned later through other examples. A child who is sad about the distribution of toys between siblings may be asked “is it because you got less?”⁴ Now the word can be used as a tool for getting what he wants, by complaining to his parents “I got less!” Or perhaps, in his selfishness, he wants to keep “more” toys for himself; looking at another child, and being asked what that child should get, the solution he has learned that works is the word “less”. Again, the child does not need to know that “more” and “less” relate to numbers, or that they are opposites, etc. He can learn each of those properties individually, through solving new problems.
Learning through use cases gives concepts a flexibility that strict definitions don’t. Edge cases can be accounted for as they arise. It also explains the ambiguity inherent in concepts like beauty, truth or game. And, as we saw in the last post, it even resolves the symbol grounding problem in one fell swoop. All it requires is that we give up our attachment to objective definitions of concepts as the source of truth about them. Objective reality does play a role, in that it provides the patterns of experiences people have, but we don’t learn about concepts through objective reality. We learn them through our subjective interactions, based on how they satisfy our needs.
In your own life you may have noticed that when teaching someone a new concept they are unfamiliar with (e.g. stock market derivatives or Jacobian matrices), appealing to its dictionary definition is fruitless. You must rather show how the concept relates to some underlying motives that drive the word’s usage. The best definition of a word is a story; only then will things “click” with your audience.
Of course we can’t ignore the fact that the concepts more and less still have an objective numerical reality behind them, and that the above use cases may only be incidentally solving the toddler’s problems. This is why we referred to this usage as an affordance, instead of a “need” or an “action”. Using affordances lets you sketch the outline of the objective truth of a concept through grounded use cases. The child will first start to learn about “more” through simple interactions using the word. As it is applied to new situations and solves new problems, it will eventually form a fully-featured, robust set of practical applications. Similarly, an AI can gradually build up its concepts as combinations of various solutions.
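One way to picture this, purely illustratively and not as a proposed architecture, is a concept stored as nothing more than an accumulating record of grounded use cases, whose current “meaning” is just the trend across them. The situations, actions and scores below are invented for the example.

```python
# One possible picture (purely illustrative, not a proposed architecture) of a
# concept as an accumulating record of grounded use cases, whose current
# "meaning" is just the trend across them.

from dataclasses import dataclass, field

@dataclass
class UseCase:
    situation: str      # e.g. "hungry, food in sight"
    action: str         # e.g. "say 'more'"
    outcome: float      # +1.0 if it helped, -1.0 if it backfired

@dataclass
class Concept:
    word: str
    use_cases: list = field(default_factory=list)

    def record(self, situation, action, outcome):
        self.use_cases.append(UseCase(situation, action, outcome))

    def usefulness_in(self, situation):
        """The concept's 'definition' at any moment is only this trend,
        revised every time a new use case is added."""
        relevant = [u.outcome for u in self.use_cases if u.situation == situation]
        return sum(relevant) / len(relevant) if relevant else 0.0

more = Concept("more")
more.record("hungry, food in sight", "say 'more'", +1.0)
more.record("toy wheel missing", "say 'more'", +1.0)
more.record("already full", "say 'more'", -1.0)     # an exception reshapes the aggregate
print(more.usefulness_in("hungry, food in sight"))  # 1.0
```

Each new use case, including the exceptions mentioned in footnote ³, simply reshapes the aggregate rather than breaking a fixed definition.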
Any attempt to subsequently come up with a canonical dictionary definition for a concept is an attempt to bring all these observed applications under one symbolic umbrella. This is generally done for the purposes of clear communication and collaboration. But as we saw above, sometimes it is not possible to clearly define a term, because the objective definition is not how the word was learned in the first place. This is how we end up with the symbol-grounding problem: by attempting to take the end product — the dictionary definition — and assuming it as the starting point.
What this post is proposing is a general reversal of how an AI should learn about objective reality. It may strike some readers as completely backwards: the suggestion that an AI should learn how to apply a concept before understanding what it means. However, “meaning” can only be grounded through applications, including applications that help humans collaborate, such as dictionary definitions. And as we’ve shown with the examples of more and less, there is no viable alternative. These concepts, and many others, can neither be injected into the architecture of the AI without creating major lacunae, nor learned through objective definitions while still being grounded.
In the next post we’ll look at why most words in the dictionary, like “self” or “existence”, can’t be learned by association, and how problem-solving provides another route to language acquisition.
Next post: How to learn words that can’t be associated with explicit experiences
¹ Set theory presumes the agent can understand sets as collections of “multiple” things, and therefore it cannot be comprehended by anyone who does not already know what numbers and counting are. Although empty sets exist, to compare two sets in a way that results in “more” or “less”, you need at least one non-empty set, and you must be able to count how many members overlap between them.
² Kant made this point in his Critique. He concluded therefore that magnitudes were part of the a priori form of our receptivity. But Kant’s transcendental aesthetic makes it impossible, for instance, to conceive of non-Euclidean space, because it cannot be derived from the a priori form of natural space. He never realized these issues because he never had to dig deep enough into the implementation to figure out how it would work; everything was shrouded behind the a priori.
³ There are exceptions, such as when you have too much of a good thing, but as we showed in the previous post, learning exceptions is something affordances are well-equipped to do. Every new experience may reshape the aggregate definition, and a concept is simply the trend that is reflexively observed among them.
⁴ They may not know they have actually gotten less, only that they are unhappy.