The Green Swan
Although modern logical techniques do account with some success for the reasoning involved in verifying mathematical proofs and logic puzzles, they do not explain other cases of technical or common sense reasoning with much detail or plausibility… the utility of the results is controversial. — Logic and AI, Stanford Encyclopedia
All modern logical A.I. systems treat symbols and logic¹ as hard-wired processes that automatically construct an A.I.’s beliefs. In humans, by contrast, the ability to use logic is gradually learned throughout life. Moreover, symbols and logic are both artifacts of communication. They emerge, are validated, and are refined through interactions with other people.
Two important consequences of this for A.I. are that
- the use of logic is always optional; it does not happen automatically, and
- the choice of when and how to use formal logic is socially motivated.
All this is to say that in order to adaptively use and successfully contextualize its use of logic, an A.I. must learn logical concepts and axioms from scratch.
To see why, let’s start with an example: the green swan.
The Predicate-Scoping Problem
Many AI texts assume that the information situation is bounded — without even mentioning the assumption explicitly. — Concepts of Logical AI, McCarthy
Imagine you’re tasked with populating an A.I.’s knowledge base. Your goal is to derive new information from existing facts using logical inference. As part of its understanding of the world, you teach the A.I. that
All swans are white
This is a simple logical statement called a predicate. For this example, assume that all the words in the predicate are well-grounded in the A.I.’s experiences. The A.I. can recognize a swan, and the colour white within its flow of sensory inputs².
Given the above, your A.I. can deduce new information:
Greg is a swan
Therefore Greg is white
Any logical A.I. that could understand the words in the first two statements could, in principle, come up with and evaluate the truth of the last one. Its confidence in the conclusion is directly proportional to its belief in the first two assertions.
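The deduction above can be sketched as a toy forward-chaining loop. This is only an illustrative sketch, not a real inference engine: facts are (predicate, subject) pairs, and a rule like ("swan", "white") stands in for “All swans are white”.

```python
# A toy forward-chaining loop for the swan example. Facts are
# (predicate, subject) pairs; rules are (premise, conclusion) pairs
# standing in for "All <premise> are <conclusion>".

def forward_chain(facts, rules):
    """Keep applying rules to known facts until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                new_fact = (conclusion, subject)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived

facts = {("swan", "Greg")}    # Greg is a swan
rules = [("swan", "white")]   # all swans are white

print(("white", "Greg") in forward_chain(facts, rules))  # → True
```

Mechanically, nothing more than this is needed to derive the conclusion; the interesting problems all live in what the symbols are taken to mean.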
But what about the original assertion, that all swans are white? Given the way it’s phrased, in what ways is it true?
Imagine I took a swan, and spray-painted it green. I then presented it to the A.I.
“This swan is spray-painted green, which means it isn’t white,” I’d say. “It disproves the original predicate.” As the programmer, you might think “no, it doesn’t. When I wrote that all swans are white, I meant that
All swans are naturally white”
The A.I., however, sees a green swan, and since the original predicate didn’t specify that it should be “naturally white”, it incorrectly changes its beliefs to “most swans are white”.
In response, you reprogram the A.I. to include the condition that they be naturally white. Next I take a swan and hold it under a purple light. “I have again disproved the statement,” I say.
You rephrase it once more: “When I said that all swans are white, I meant that
All swans are naturally white, under common lighting conditions”
The A.I. is not aware of this nuance either. Its automated inference engine is not adaptive enough to account for such tricks. So again it needs to be re-trained.
I try a few more shenanigans, chemically altering a particular swan until it glows blue, using yellow-tinted glasses, describing fantastical swans from literature, etc. In each case, you refine the original assertion to specify what it actually meant. Finally, I deliver a genuine black swan.
“Now it’s been disproved” you admit, and you let the A.I. adjust its beliefs to “most swans are white”.
What happened? You began with what seemed a straightforward statement — “all swans are white” — yet by the end you’d revealed a whole world of assumptions and hidden conditions. The statement wasn’t truly complete without these. And the caveats are more extensive than I have described. Purple light and chemical modification are recent inventions. Imagine how many ways there could be to subvert this statement in the future.
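The treadmill of revisions can be caricatured in code. In the purely illustrative sketch below, each round of shenanigans bolts another guard onto the test for whether an observation actually disproves “all swans are white”; the field names are hypothetical stand-ins for whatever perceptual tests the A.I. has learned.

```python
# Each revision adds another excuse clause, so that only a "genuine"
# non-white swan counts as a counterexample. The dictionary fields are
# hypothetical stand-ins for learned perceptual tests.

def disproves_all_swans_are_white(swan):
    if swan["colour"] == "white":
        return False   # not a counterexample at all
    if swan["spray_painted"]:
        return False   # revision 1: "naturally white"
    if swan["lighting"] != "common":
        return False   # revision 2: "under common lighting conditions"
    if swan["chemically_altered"]:
        return False   # revision 3: no glowing-blue chemistry
    return True        # a genuine black swan finally gets through

painted = {"colour": "green", "spray_painted": True,
           "lighting": "common", "chemically_altered": False}
black = {"colour": "black", "spray_painted": False,
         "lighting": "common", "chemically_altered": False}

print(disproves_all_swans_are_white(painted))  # → False
print(disproves_all_swans_are_white(black))    # → True
```

The uncomfortable part is the open-ended tail: every future trick demands another clause, which is why the predicate as written can never be complete.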
Such incompleteness is a feature of all predicates; for example “all men are mortal” (what about fictional men?), “the capital of France is Paris” (not in my game of Civilization V), and “Mary doesn’t have a beard” (we all have fine, imperceptible hair covering our bodies). The explicit terms of the predicate are only a fraction of the information contained in it. Words are the outward, and by no means complete, expression of a complex of internal assumptions and intentions.
Throughout my interactions with the A.I. above, the meaning of the words “swan”, “are” or “white” didn’t change; they remained stable and well-grounded. The predicate as written was always clear. The issue therefore goes beyond the classic symbol-grounding problem, and indeed assumes that problem has been resolved. You could call this new problem the “predicate-scoping problem”³.
For an A.I. to effectively apply even basic first order logic to any situation, it must be attuned to such nuances and hidden intentions, and adjust how it interprets predicates as new cases arise. Otherwise its manipulation of symbols will be superficial and naive. It will act like a socially stunted person who takes everything literally.
Given that logical predicates are so fluid and ambiguous in their interpretation, what, if anything, is the value to an A.I. of using them?
Logical Predicates Are Usually Worthless
[Symbolic] representation precedes learning as well as reasoning …In a localist representation the relevant concepts have an associated identifier. This is typically a discrete representation. — Neurosymbolic A.I., The 3rd Wave
Formal logic isn’t something an A.I. ‘physically’ applies in its interactions with the world. It can only use logic on a mental model of the world, on ‘ideas’ that it has refined from its experiences. Its network must first pin down a wildly varying influx of sights and sounds into useful patterns that represent an object, concept, event, or probability distribution of events. It creates a model to interpret a series of colours as, say, a “chair”, or a longer series of them as a “war”. As it learns new representations of the world it starts to perceive new concepts, such as “taxes” or “relativity”, and applies logic to those.
Logical reasoning can only be implemented on the basis of clear, accurate, and generalized knowledge — Logical Pre-Training of Language Models
Since formal logic can’t work with vague feelings or intertwined notions, its “objects” must be clearly defined, like numbers, individuals’ names, common images, even a vector in latent space. Once an A.I. has defined these terms, the function of formal logic is to maintain a consistent set of mental relationships between them by dictating what new information it can and can’t infer. For instance, when abiding by the rules of first order logic, you aren’t allowed⁴ to assert A and then also assert the opposite of A.
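That consistency rule is easy to state in code. Here is a minimal, illustrative knowledge base that enforces it by refusing any assertion whose negation is already held; the class and method names are hypothetical.

```python
# A minimal knowledge base enforcing the first-order consistency rule
# described above: you may not assert A and also assert not-A.

class KnowledgeBase:
    def __init__(self):
        self.beliefs = set()   # pairs of (proposition, truth_value)

    def tell(self, proposition, truth=True):
        if (proposition, not truth) in self.beliefs:
            raise ValueError(
                f"inconsistent: already believe '{proposition}' is {not truth}")
        self.beliefs.add((proposition, truth))

kb = KnowledgeBase()
kb.tell("all swans are white")
try:
    kb.tell("all swans are white", truth=False)
except ValueError as err:
    print(err)   # the contradiction is rejected automatically
```

Note that the rejection here is hard-wired and automatic, which is exactly the design choice this post goes on to question.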
So for formal logic to be able to do its job, its predicates must be concrete symbols. But as you saw above, although predicates are written as simple words, they entail complex networks of contextual assumptions. This means that whenever inconsistencies pop up in the A.I.’s explicit knowledge base, it’s always possible that the terms used don’t actually represent the same thing, e.g.
“Less is more”
“Time is money”
These statements, and in fact most statements in your head (such as metaphors), would be difficult to logically resolve given their explicit terms. So “logically” they should either be changed or made obsolete. But you and I know that many such apparently vague and nonsensical statements do make sense. For an A.I. to be robust it should, like you and me, always have the option to choose whether to resolve an explicit contradiction, or leave it alone. The decision can’t be an automatic hardwired function; it must be motivated by context.
the ideas with which we deal in our apparently disciplined waking life are by no means as precise as we like to believe. On the contrary, their meaning (and their emotional significance for us) becomes more imprecise the more closely we examine them. — Carl Jung, Man and his Symbols
All this is to say that an A.I. wouldn’t necessarily understand its environment better, or be more successful if it stringently applied formal logic to what are, after all, mercurial and intertwined statements; no more successful than if it were to hold some number of apparently contradictory beliefs.
Which brings us back to the original question: what exactly is the benefit to you — or to an A.I. — of using formal logic?
The answer is hidden in an earlier sentence, where I wrote: you aren’t allowed to assert A and its opposite. That is, you aren’t allowed to by other people.
Until now, this post assumed that formal logic would be directly useful to an A.I. alone, regardless of whether there are other agents around, and that logical rules would develop as a consequence of this utility. All current A.I. development presupposes that an agent can benefit from using logic in any context, even in a social vacuum. This is a misunderstanding, and the confusion it has created has negatively impacted the historical development of logic in A.I.
The purpose of this series is to show that formal logic, in humans at least, has its roots in social communication. The so-called rules of logic are a set of constraints you learn to place on your natural, freer (non-logical) modes of thinking. Most importantly, you adopt these rules in response to pressure from other people.
We consider it to be a bad thing to be inconsistent. Similarly, we criticize others for failing to appreciate … logical consequences of their beliefs. In both cases there is a failure to conform one’s attitudes to logical strictures. — The Normative Status of Logic, Stanford Encyclopedia
Your friends — or scientific peers — look for coherence in what you say, as well as in the statements you all agree to as a group; statements like “we’ll all meet in the parking lot”, or “species evolve through natural selection”. So although you may permit your private thoughts to be haphazard and inconsistent, speaking such inconsistencies out loud exposes you to a charge of lying, or at best, not knowing what you’re talking about. And since you often want to convince others — to do something, to believe something, to think a certain way — some amount of self-policing is required for your words to be taken seriously.⁵
In everyday life one thinks out what one wants to say … and tries to make one’s remarks logically coherent. — Carl Jung, Man and his Symbols
In the above exchange over the green swan, for example, the programmer felt compelled to change the words of the predicate to clarify it. She could just as easily have said, “my thoughts on this topic are consistent, and that’s all that matters.” But others can’t read your thoughts, they can only interpret your words. For the sake of cooperation, the world requires you put your free-form thoughts into neat boxes, and draw straight lines connecting those boxes.
The purpose of logic is to characterize the difference between valid and invalid arguments. A logical system for a language is a set of axioms and rules designed to prove exactly the valid arguments **statable in the language.** [emphasis mine] — Modal Logic, Stanford Encyclopedia
To be clear, I’m not saying that society teaches you how to use logic in the same way that it teaches you how to tie a rope — something you could have discovered by yourself. I’m saying that words like “coherence” or “consistency” only have meaning in the context of social interactions⁶ — you are coherent to others, you are consistent in your speech. And although you may later internalize these rules and constrain your thoughts as well, their origin and valid domain is always in interpersonal communication.
A Self-Made Toolkit
I’m not suggesting that logic has no “reality”, or is “all made up”, or “subjective”. Whether or not logic is part of the fabric of reality is beyond the scope of this series. In fact I believe in some way logic is part of reality. But your brain doesn’t traffic in reality directly, it deals only in its perceptions of reality.
Nor is it relevant, when it comes to developing A.I., why it is that others insist that you speak logically, as though their insistence were driven by some pervasive truth about the world⁷.
The question I’m addressing is how the human brain — and by extension an A.I. — becomes acquainted with logic, creates or acquires the relevant concepts, then applies them. My goal here is not to prescribe how an A.I. should think, but rather predict what it will think on the topic of logic, and why; whether or not the concepts have any analogue in reality. It is a matter of the epistemology of logic.
Making computers learn presents two problems — epistemological and heuristic. The epistemological problem is to define the space of concepts that the program can learn. The heuristic problem is the actual learning algorithm. The heuristic problem of algorithms for learning has been much studied and the epistemological mostly ignored. — Concepts of Logical AI, McCarthy
Until now the majority — possibly all — of A.I. research in logic has been focused on two goals:
- discovering what the ‘correct’ logic to encode is, and
- determining how an A.I. can effectively apply that logic in its deliberations.
Almost no attention has been given to how an A.I. can acquire logic in the first place, because the assumption is that it never does. Logic is considered a part of the innate infrastructure of the mind, which is supposed to explain why it feels so natural and obvious. So it gets built into each A.I. as a native faculty. Any subsequent testing the A.I. is subjected to happens within a restricted space, one that can be addressed with these built-in tools.
In the real world, however, logical rules must be constantly adapting and changing alongside the agent’s interaction with the world. And that means they must to a large degree be learned, even invented. For example, consider a hypothetical A.I. that is pre-programmed to naively carry out syllogisms:
All A are B
All B are C
Therefore all A are C
As we saw above, such an A.I. would soon run into obstacles related to the meaning of words, when faced with an equivocation, e.g.:
All stars are in outer space
Rhianna is a star
Therefore Rhianna is in outer space
This highlights one shortcoming of using symbols: a concrete symbol can mean many things. You may suppose that the way to solve this problem is to focus on the meaning of the words rather than the words themselves. So you might try to separate a homonym like “star” into different symbols based on their meaning — star_1 (celestial body) and star_2 (celebrity) — the assumption being that if you could clearly nail down the scope of the terms, the syllogism could once again do its work. Unfortunately, this moves the problem into a new, even less tractable space: that of separating meanings. For example:
All stars shine brightly
Rhianna is a star
Therefore Rhianna shines brightly
No amount of term-disambiguation can clear up this example. The conclusion appears to be valid, and it seems to follow from the first two premises. It’s also clear that two entangled meanings of the word “star” are being used. Rhianna doesn’t literally shine, so the truth of the last sentence comes from the connection of “star”, as a metaphor for celebrity, to stars as burning balls of gas. You can’t easily separate the terms into their own symbols without undermining the truth of the conclusion.
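The term-splitting strategy can be made concrete with a toy facts-and-rules representation (purely illustrative, with the hypothetical symbols star_1 and star_2 from above): the split blocks the bad conclusion, but it blocks the good one too.

```python
# After splitting "star" into star_1 (celestial body) and star_2
# (celebrity), Rhianna is only a star_2. A toy one-step entailment
# check shows what the split buys, and what it costs.

def entails(facts, rules, goal):
    """Facts are (predicate, subject); rules are (premise, conclusion)
    meaning 'all <premise> are <conclusion>'."""
    derived = set(facts)
    for premise, conclusion in rules:
        for predicate, subject in facts:
            if predicate == premise:
                derived.add((conclusion, subject))
    return goal in derived

rules = [("star_1", "in_outer_space"),   # all stars (bodies) are in space
         ("star_1", "shines_brightly")]  # all stars (bodies) shine
facts = {("star_2", "Rhianna")}          # Rhianna is a star (celebrity)

print(entails(facts, rules, ("in_outer_space", "Rhianna")))   # → False, as desired
print(entails(facts, rules, ("shines_brightly", "Rhianna")))  # → False, but we wanted True
```

The conclusion we intuitively accept depends on the two meanings staying entangled, so no partition of the symbol recovers it.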
This problem is present in Prolog, the world’s most popular logic programming language. Although it models first order logic, it has no way to tackle the kind of disambiguation that real life requires. For example, the Prolog rule
animal(X) :- cat(X).
is read as “anything that is a cat is also an animal”. This breaks down in cases of toy cats, or the fact that in some contexts a picture of a cat should be interpreted as a cat, and in other contexts it should not. The language is too rigid to encode such intentionality into its rules.
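One way to see the gap is to re-sketch the rule in Python with an explicit context argument, a slot that Prolog’s rule simply doesn’t have. The contexts and cases below are hypothetical.

```python
# The rule animal(X) :- cat(X), re-sketched with the missing ingredient
# made explicit: whether something "is a cat" depends on context.
# The example contexts are hypothetical.

def cat(x, context):
    if x == "toy cat":
        # a toy counts as a cat in a child's game, not at the vet
        return context == "children's game"
    if x == "picture of a cat":
        return context == "pointing at photos"
    return x == "cat"

def animal(x, context):
    return cat(x, context)   # the Prolog rule, now context-dependent

print(animal("toy cat", "children's game"))  # → True
print(animal("toy cat", "vet visit"))        # → False
```

Even this only pushes the problem back one level, since the set of relevant contexts is itself open-ended.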
Are there ways to resolve the ambiguities above? Perhaps. You might add an additional sub-system for relationships involving metaphors. The point is that your A.I. must continually be updating its logical formalisms to account for emergent complications. This is a never-ending process. The history of formal logic is a compendium of the development of new logics — fuzzy logic, paraconsistent logic, non-monotonic logic — as limitations in the old ones are discovered. This is something you couldn’t do if your mind were simply hard-coded to adhere to one logical formalism, and treated that paradigm as an absolute necessity.
Necessity — the sense of being driven or forced to conclusions — is the hallmark of “logic” in Western philosophy. — Hegel (Dialectics), Stanford Encyclopedia
In reality you rarely accept a logical conclusion as correct simply because you followed a sound logical ‘algorithm’ to get there. More often you retroactively judge the soundness of the algorithm based on whether you feel the conclusion is correct, as you did in the above examples. Since Rhianna was not in outer space, you immediately intuited an error in the logic; but since she does “shine brightly”, your intuition leaned the other way.
Special considerations that arise from the actual content of a syllogism will drive how you apply logic, and by extension how you might later abstract this activity into a more general logical formalism. Like a computer’s antivirus software, you refine and update your methods as they fall short of what is needed in a given case. And no logic has yet been invented that is universally valid.
This self-made-tools approach to logic is rarely addressed in A.I. because no one has yet explored how an A.I. could create concepts like “consistency”, “implication”, or “conjunction” from scratch without having them in some way already built into the architecture, which would make them compulsory. The very prospect itself seems impossible.
The rest of this series describes how such a thing happens.
Learning that 9 equals 9
To understand how an A.I. could develop a robust formal logic⁸, one that can handle the green swan situation, we can look for hints in how these same techniques are gradually learned by humans.
And they are learned. Logical coherence is not something children innately strive for; nor is it a natural mode of thinking. It is an acquired skill. Granted, the ability for any of your thoughts to lead to other, arbitrary thoughts is indeed innate, but the connections between thoughts are by default non-logical and unstructured, based on little more than happenstance, association and personal preference. You see this in children’s seemingly irrational statements; you may even confuse their chaotic output with lying.
Consider the following exchange over video chat between an elementary school teacher I know and a 6 year old student:
Teacher: Now write two ‘nines’ at the bottom of the screen.
Student: OK. (He writes two ‘9’s as asked)
Teacher: So what do we do with nines when we add them? Write the answer next to them.
Student: I don’t see a nine.
Teacher: You don’t see any nines?
Student: Where are they?
Teacher: What did you just write?
Student: I don’t know?
Teacher: What do you see at the bottom of the screen?
Student: A nine, and another one.
Though this may sound like deception or mockery, the student was being sincere. He simply hadn’t equated the nines he wrote with the nines that needed to be added together. What you’re seeing is the default state of the human mind. Something as simple as “numbers at different times and in different contexts are the same” (i.e. 9 = 9) must be learned.
If you’re a parent you may be acquainted with such bewildering irrationality in your little one. No child is born with an innate or even rudimentary notion of “validity”, “opposite”, “consequence”, or “truth”⁹. Her mind only deals in “I like” and “I dislike”. Even when it comes to forming beliefs, you’ll find her mind filled with an overabundance of wishful thinking. It takes time, and others’ feedback, for her to begin associating positive feelings with “truth” or “valid”, and negative feelings with “inconsistency” or “lies”.
For their statements and thoughts to follow sound patterns, they must first have informal and formal logic drummed in through years of education, at which point these act as socially-sanctioned guides to thinking¹⁰. And since teaching a child requires communicating with her, language is where formal logic first gains a foothold in the mind. All logical training begins when others place restrictions on what you say out loud. Before you ever learn to think logically, you are taught to speak “logically”.
This may seem like a reversal of how logic is supposed to work — logical statements are presumed to be the outward expression of logical thoughts. But as you’ll see in the next post, from a child’s perspective logic is initially only about correcting specific verbalizations.
Continued in part 2.
¹ For the purpose of this series, “formal logic” is any type of formalized logical system, e.g. modal logic, fuzzy logic, deontic logic, even paraconsistent logic. Since most A.I. tends to be designed around first order and probabilistic logic, I’ll focus this analysis on those. However, everything in this series also applies to any rules-based systematization of knowledge; for instance partonomic/meronomic relations (e.g. the hand is part of the body, therefore the body contains a hand), taxonomic relations (e.g. a human is a type of mammal, mammals are types of animals, therefore all humans are animals), mathematics, set theory, and so on.
² This assumes a level of A.I. that currently doesn’t exist, but may in the future. It is, at least, a research goal for the field.
³ Though there is a distant similarity between this problem and the frame problem, the frame problem addresses the unspoken consequences of multiple logical operations, whereas the “predicate-scoping problem” is concerned with the semantic validity of a single predicate. By coincidence, the same argument outlined in this series can also be made about the frame problem; and the same solution prescribed. In fact, nearly all pervasive problems in logical A.I., such as its failures at common sense reasoning, are rooted in the same misunderstanding described in this article.
⁴ Or, in the case of fuzzy logic, to assign probabilities to A and its opposite that don’t add up to 1.0.
⁵ Logic is generally referred to as being “normative”, in that it tells you how you ought to think. This designation doesn’t fully appreciate how socially determined it is. Generally, by “logic is normative”, we mean that others may guide you to discover your best thinking. This series proposes that your best thinking, from symbols to logical operations, is created, and only has any meaning, in a shared social context. It has no significance to a human mind in isolation.
⁶ You may object to this assertion. You may imagine a scenario where an individual, born in the wilderness, and without society, uses symbols and logic despite his isolation; say, to keep track of food or animal behaviour. Such a thought exercise highlights how difficult it is to disentangle one’s own private modes of thinking from the society that gave rise to them. Any problem that can be solved using logic, a.k.a. a logic puzzle, requires that you first create an abstracted, idealized conception of the entities involved, and represent those as symbols. For example, knights and knaves logic puzzles require you to imagine ideal, fictional entities, who always lie or always tell the truth. Such entities must obey strict rules, even if those are probabilistic rules. But real life is subtle, variable and uncertain when compared to the contrived scenarios of logic puzzles. A mind that automatically thinks about all concepts, without exception, in terms of strict and formally defined relationships would have a hard time dealing with nuance. So the use of formal logic must be optional. What this means is that the so-called wild-person must have a reason to think of some of his mental objects as following strict rules, and others as not following such rules. What decision criteria would he use? The answer the A.I. community has currently rallied around is “frequency of occurrence”, i.e. probability. Under this assumption, an A.I. is expected to model the world based on how likely events are to occur, given its prior experiences. But is this how humans arrive at their idealized logical concepts? Have you ever met a knight, or anyone, who always tells the truth? Statistical likelihood is just another optional rule that you can choose to disregard; it is not the underlying mechanism of reasoning itself. This is further evidenced by the historically pervasive belief in unlikely miracles, ghosts, shaman magic, and spirits.
Nor do you or I regularly check that the sum of mutually exclusive probabilities is 1.0, or follow Bayes’ Theorem when updating our beliefs; that’s why Bayes had to exposit the theorem in the first place. Such perfect thinking is more an aspirational goal than a default mode of cognition. The only remaining reason, then, to abstract idealized, formal concepts out of fluid experiences is to use them as a touchstone for communication, for shared understanding and communal problem-solving. You translate your personal experiences into a common symbolic space (numbers, words) so that you and I can address them together. It’s no coincidence that logic is inseparable from language, which is a social tool. As for the wild-person, having no language community he would be completely cut off from any reason or method for, and therefore any possibility of, structured thinking. Alone, and lacking a shared conceptual framework rooted in society, his actions and thoughts would appear emotional, superstitious, and free-associative, like those of a desperate criminal, or a madman, or a child; in other words, “wild”. Such a mental state is nearly incomprehensible to you or me, which is why it seems so implausible.
⁷ It may be that the human species’ entire body of formal and informal logic is complete bunk; and that we can’t see this since we use that same logic to judge its own validity. When trying to create human-level A.I. you must face such an uncomfortable possibility — what if humans are actually unaware idiots? In any case it is not advisable to put too much focus on whether logic reflects reality.
⁸ This series uses the word “logic”, and not “reasoning”. This is to distinguish the formal, regulated modes of thinking denoted by the word “logic” from the natural and fluid act of jumping from one idea to another that every human mind follows. The latter I’ve called “reasoning”, though you might just call it “thinking”. The difference between logic and reasoning is that reasoning doesn’t need to be consistent, valid, sound or even good — it’s just the set of thoughts you had before you arrived at the one currently in your head.
⁹ Nor would it be helpful to say that a basic logical template exists apriori in the mind of every child, and that the child merely has to be taught how to apply it to individual cases. Given the number of exceptional situations and radical transgressions, such a line of argument would simply beg the question of why anyone would believe it exists in the first place, and how it would be different from the child simply learning the template later in life; and we have ample evidence of the latter.
¹⁰ This process is never complete. Wishful thinking and motivated reasoning dominate most of our adult lives. As such, logical consistency is more an aspirational goal that humans may strive towards, with great effort, than a natural default.