The Green Swan: Part 3
A Thin Layer of Symbols
Read part 1 and part 2 of the series.
Let’s finally look at formal logic: the abstract rules, symbols and operations that most of us only become acquainted with after high school, if at all.
Generally, when you use logic you use it informally, as a specific reaction to the current situation. If a friend told you his head ached all day, you’d know not to ask if he had had a good day. You could call this ‘practical’ or ‘implicit’ logic, since it’s not based on any formalized rule. You confirm that it was valid when it saves you from embarrassment. This is similar to your everyday understanding of morality: you may know roughly on a case-by-case basis what is and is not immoral, without having any overarching theory of morality.
To develop your thinking from this implicit stage to formal logic, you must first go through a stage of conceptualization or abstraction. In this step, you come up with words for logical concepts, and learn how to apply them to multiple similar examples.
This step incidentally brings us face to face with perhaps the most interesting and difficult challenge in all of A.I., namely “how do abstract concepts arise in a mind?” Logical concepts like “complete”, “incorrect”, “valid”, “opposite”, “truth” etc. have until now been assumed to be ‘givens’, and as such they are implicitly built into each A.I. that tries to use them. This is largely because having the A.I. derive them on its own seemed out of the question. As you’ll see below, an A.I. can in fact produce such concepts from scratch, based solely on its motives and its collaboration with other people.
Words as Concepts
When an infant says the word “mama”, her goal is not to merely identify or describe her mother. Rather she wants help, or attention, or interaction. It may be short for “look at what I did!” She may later transfer the word to other adults, and even to objects if doing so helps her get attention. In the same way, every word you use is a tool: you only learn it when you discover it is part of solving a problem.
Consider the word “truth”. You can trace its origin in your mind to a desire to settle debates, or to ease the doubts of people you care about. For instance, as a child witnessing a tense back-and-forth between family members disagreeing on an uncertain topic, you may have heard someone say that they should find out the “truth” about the issue. The word seemed to change the tone of the conversation and may have eased the tension in the air, though you didn’t know why. To you the words might equally well have meant “stop debating” or “calm down”.
Later another debate arose, this time between your friends, and with it a new tension. At some point one of your friends spoke in a tone, or mentioned something, that reminded you of the first conflict, and the words “find out the truth” popped into your head. So you said them aloud, hoping they would get your friends to “calm down”.
Being slightly older, but still young and impressionable, your friends deferred to the cultural authority they had been taught to grant the words “the truth”. So, as before, the phrase seemed to ease the tension. Your mind now transferred the phrase, and it became a response to both situations. From then on, similar situations would elicit it in your thoughts.
As yet you had no deep understanding of the word’s meaning or its implications. Indeed your first attempt at defining the word would likely have been wrong¹. As with most new words that you picked up in conversation, you first knew it by its feeling — that inarticulable context in which the word seems appropriate. That feeling is actually a gestalt or composite of all the problem contexts for which the word was found to be a solution.
You may have noticed that, unlike the more popular ‘didactic’ model of concept acquisition, the process of learning concepts described here works in the opposite direction. You didn’t start by learning the objective dictionary definition of the concept “truth” and then use the concept to solve problems. Rather, you learned methods of solving interpersonal problems using the word as a set of sounds, then gradually elaborated on this initial sketch through self-reflection and dialectic to create a more and more refined picture of “truth”. In short, first you did the concept, and only later did you understand it².
All linguistic concepts begin their lives this way, as solutions to social problems. This is the indefinable bond that connects diverse and seemingly incompatible experiences into a single abstraction. Take any abstract concept and ask yourself what all its instances have in common. This should include instances where the word is used as a metaphor; for instance the word “window” as used in “the eyes are the windows to the soul”. You will find that the only connective tissue is the motivation that drove you to learn that word.
Looked at this way, there’s no need to posit a built-in psychological mechanism that drives the creation of concepts, or even of symbols. You don’t need to assume the existence of a native neural structure which automatically pulls together similar examples under one common “concept” node³. The simple process of triggering conditioned memories is enough to emulate both.
To be sure, similarity still matters. In the example above, there was a similarity in the two debates which triggered a memory of the word “truth”. You could consider this a case of learning “by association”. However, similarity is not enough; the phrase stuck in the new context because it also addressed the problem at hand. This is a critical and often neglected ingredient in psychological models which explain how humans generalize concepts. The social context, the problem, and the linguistic term all play a necessary role in transferring a word from one situation to another.
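The two ingredients in the previous paragraph, similarity that triggers a memory and a solved problem that makes the word stick, can be illustrated with a deliberately crude Python toy. It is not the mechanism this series proposes, only a sketch under stated assumptions: situations are described as sets of hypothetical feature labels, similarity is Jaccard overlap, and a fixed threshold (here 0.3, an arbitrary choice) decides whether a word comes to mind at all.

```python
from typing import Optional


def jaccard(a: set, b: set) -> float:
    """Overlap between two situations described as sets of feature labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class WordMemory:
    """Hypothetical toy: words remembered with the situations they helped resolve."""

    def __init__(self, threshold: float = 0.3):
        self.contexts: dict = {}  # word -> list of situations where it worked
        self.threshold = threshold

    def learn(self, word: str, situation: set) -> None:
        self.contexts.setdefault(word, []).append(set(situation))

    def recall(self, situation: set) -> Optional[str]:
        """Similarity alone decides which word pops into your head, if any."""
        best_word, best_score = None, 0.0
        for word, contexts in self.contexts.items():
            score = max(jaccard(situation, c) for c in contexts)
            if score > best_score:
                best_word, best_score = word, score
        return best_word if best_score >= self.threshold else None

    def reinforce(self, word: Optional[str], situation: set, solved: bool) -> None:
        """The word only attaches to the new situation if saying it helped."""
        if word is not None and solved:
            self.learn(word, situation)


family_debate = {"debate", "raised_voices", "uncertain_topic", "family"}
friends_debate = {"debate", "raised_voices", "uncertain_topic", "friends"}
sports_argument = {"debate", "raised_voices", "sports"}

memory = WordMemory()
memory.learn("find out the truth", family_debate)

# Similar enough to be recalled, and it eased the tension: the word sticks.
first = memory.recall(friends_debate)
memory.reinforce(first, friends_debate, solved=True)

# Also recalled by similarity, but it did not help: no new link is stored.
second = memory.recall(sports_argument)
memory.reinforce(second, sports_argument, solved=False)

print(first, second, len(memory.contexts["find out the truth"]))
```

The point of the design is the split between `recall` and `reinforce`: association proposes the word, but only the outcome decides whether the new context joins the word's repertoire.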
A Rallying Banner
Over the span of many such experiences you attached the same words to more and more situations. As when you were a child, social motives continued to drive learning into your youth. The difference now was that you were dealing with more nuanced situations, involving mature adults.
The words that were now flying around in your head acted like glue that you could use to connect diverse and otherwise unrelated experiences, including memories. As you saw in part two, there is only a minor difference to the mind whether an experience happens in thought or in reality. This resulted in a bootstrapping process by which you solved problems using only your memories of conversations and their relations to value-laden words.
For example, one day as a teen you may have been moping, upset at constantly being refuted. You introspected on a memory of a bad experience. You added to it memories of others that involved similar words or sights, along with your interpretations of those memories. Then a peer with analogous problems helped you discover a word you could use to voice your issue out loud, such as “validity”.
Over time, you grew an informal logical vocabulary that you could put to use in everyday conversations. For instance:
- You learned the word “consensus”: a way to avoid unending arguments that waste everyone’s time.
- You learned the word “validity”: a set of procedures for defending an assertion when others express doubt.
- You learned the word “conditional”: the option to readjust and specify an assertion when confronted with obviously disproving cases.
- And you learned the word “truth”: a way of addressing our shared doubts and uncertainties.
One major difference between youth and infancy, however, is that you had since developed a theory of mind: the ability to imagine what others are thinking, and by virtue of that to sympathize with their problems. When you were a child, saying the ‘right words’ merely got you out of trouble or impressed adults. Now, your newfound empathy meant you could learn words which solved problems not just for you, but for a group of people around you. Giving a problem a name and saying it aloud helped focus others’ attention on the common difficulty. The act of learning and speaking the word itself thus became a type of solution⁴.
That’s what you did with the word “truth” above: it served as a banner around which people who were similarly perturbed rallied to solve an underlying problem. This holds even though it may have solved a different problem for each of you. The word had become intersubjective: it enabled other people to partake in and collaborate on a common issue.
Intersubjectivity is a handy trick, but it doesn’t remove misunderstanding completely. You soon find that words like “truth” or “inconsistency” are still ambiguous. Because of your friends’ idiosyncratic experiences, the word “truth” carries different associations for them than it does for you, so using it doesn’t always have the intended effect.
If you still find yourself frustrated by misunderstandings, you may decide to go further down this path and begin to learn about formal logic. Formal logic is a set of shared, fixed templates you can use to pare away the messy nuance, individual variation, and subjectivity that can scuttle everyday conversations. This is a second meta-level of problem solving. The problem now is the ambiguity inherent in communication. Words themselves have become the obstacle, and still a social one at that.
Where informal words may have multiple incompatible interpretations, formalization makes the common language concrete and removes confusion when discussing an issue with others. Shared symbols such as “all” (∀) or “not” (¬) establish a restricted workspace for clarifying disagreements to help you reach a consensus — at least that’s what they’re intended to do. Usually this only works in very narrow domains, and only as long as you all agree on what the terms mean and how they interact.
The development of any such symbolic system always entails a long struggle over shared ideas, give-and-take, misunderstanding and clarification⁵. The history of 20th century logic, starting from Frege’s formalization, and its refutations by Quine and Dennett, has been a living record of such exchanges. Only when everyone involved achieves a tentative consensus in a particular domain — like fuzzy logic, or logic programming — does it crystallize into symbolic formulas and equations. The same has been true of any field of study where symbols are used, such as math or chemistry. Symbols represent the apotheosis of mutual comprehension.
The goal, for you or anyone interested in the study of symbolic logic, is to move the entire conversation into an abstracted space of shared symbols. “Abstraction” here doesn’t mean a set of latent or generalized mental representations as it does in A.I.; it means an adjacent space of real words and images that you (and others) elicit in your mind in response to the original inputs. They are called “abstract” because they do not directly reference the images and sounds in the subject matter itself; you replace those with another set of commonly understood symbols⁶.
The end result is rigid, concrete, and formulaic. It is designed this way so that people can communicate clearly in critical fields such as engineering. And it can keep these qualities, and recover from misunderstandings, thanks to a shared agreement about scope: when I say “f(x) = 2x + 3”, you bring to mind only the specific contexts in which that expression is useful.
You can see what a high-level, niche, and esoteric activity formal logic is. It is a paper-thin, silvery fringe at the edges of a much larger body of human thought, which is informal, arbitrary, and full of haphazard free-association. Formalization is a voluntary endeavour you undertake to rein in your thinking, to fit your thoughts into socially agreed upon patterns of concrete symbols. It is an “unnatural” concession you make to live well in a society.
When natural consciousness entrusts itself straightway to Science, it makes an attempt, induced by it knows not what, to walk on its head too, just this once; the compulsion to assume this unwonted posture and to go about in it is a violence it is expected to do to itself, all unprepared and seemingly without necessity. — Hegel, Phenomenology of Spirit, Preface
[This] suggests that immediate, intuitive analogical reasoning is our primary mode of reasoning, with logical sequential reasoning being a much later development — Deep Learning for AI, Yoshua Bengio, Yann LeCun, Geoffrey Hinton
Motivated, non-logical thinking is your default way of addressing the world. Out of these roots formal logic grows like a well-pruned branch. Formal logic is just a flavour or type of motivated thinking, one that tries to conform language and symbols to a communally agreed-upon set of rules. A perfect and comprehensive logical system, one that also aligns with the real world, is an academic dream. It’s an imagined ideal at the end of a long road that has, to date, delivered only a string of imperfect formalizations.
For logical-symbolic A.I. to reverse this hierarchy and propose that formal logic is a necessary building block of human cognition is optimistic, to say the least. A handful of static, built-in operators could never communicate the living history of shared intentions that went into every predicate and logical operation. To expect a computer to reverse-engineer all that organic legacy from pre-compiled operators is to expect it to be capable of telepathy.
You may object to that last paragraph. “Why not program formal logic into an A.I.? Why ask each A.I. to reinvent the wheel for itself? We have ready-made shortcuts. We’ve already derived sound formulas. We should design a system that can apply them mechanically or probabilistically — tasks at which computers excel. This would make the A.I. simpler to train, and the results easy to measure. Compared to that, social interactions are messy, arduous to implement, and ultimately unnecessary.”
Or perhaps your intuition is persuading you that formal logic in some way mirrors the way the objective world works, and that constraining an A.I.’s beliefs to such rules is an efficient way for it to achieve its goals. From this standpoint, subjective, motivated, or otherwise irrational thinking represents an intrusion on your unfiltered, true understanding: a failure to see the world as it is. A good A.I. would do well to leapfrog such all-too-human foibles. This is the assumption around which most logical A.I.s are designed.
A case study of this doomed approach is the Logic Tensor Network (LTN). It’s a type of neurosymbolic A.I. whose goal is to allow a neural network to automatically determine how likely a statement is to be true. It does so by embedding logical axioms as constraints on vector space so that a model can be trained to fit them. Logic Tensor Networks — like all neurosymbolic A.I. — build a layer of symbolic abstraction into their infrastructure, and a layer of logic on top. Once symbols are grounded, determining the truth of predicates is an exercise in weight optimization.
In neural-symbolic computation, logic can be seen as a language with which to compile a neural network…
Once symbols emerge (which may happen at different levels of abstraction, ideally within a modular network architecture), it may be more productive from a computational perspective to refer to such symbols and manipulate (i.e. compute) them symbolically rather than numerically. — Neurosymbolic A.I., The 3rd Wave
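The constraint-fitting idea described above can be illustrated with a minimal pure-Python sketch. It does not use any LTN library or its API; it only assumes, in the spirit of the fuzzy-logic semantics LTNs are built on, that `not a` is `1 - a`, `a and b` is the product `a * b`, and a universal quantifier is approximated by a plain mean. The `Smokes` predicate, its one-layer grounding, and the three people's feature vectors are hypothetical. Training raises the knowledge base's overall truth value by gradient ascent (with numerical gradients, to stay dependency-free), so the axiom that nobody is both a smoker and a non-smoker acts on every belief the weights encode.

```python
import math

# Hypothetical grounding: each person is a made-up 2-d feature vector.
people = {
    "alice": [1.0, 0.2],
    "bob": [0.1, 0.9],
    "carol": [0.6, 0.5],
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def smokes(params, x):
    """Fuzzy truth value in [0, 1] of Smokes(x), from a one-layer model."""
    w1, w2, b = params
    return sigmoid(w1 * x[0] + w2 * x[1] + b)


def f_not(a):
    return 1.0 - a  # standard fuzzy negation


def f_and(a, b):
    return a * b  # product t-norm


def forall(values):
    return sum(values) / len(values)  # mean as a soft universal quantifier


def satisfaction(params):
    """Overall truth of the knowledge base under the current weights."""
    axioms = [
        smokes(params, people["alice"]),  # Smokes(alice)
        f_not(smokes(params, people["bob"])),  # not Smokes(bob)
        # forall x: not (Smokes(x) and not Smokes(x))
        forall([f_not(f_and(smokes(params, x), f_not(smokes(params, x))))
                for x in people.values()]),
    ]
    return forall(axioms)


def train(params, steps=500, lr=0.5, eps=1e-5):
    """Gradient ascent on satisfaction, using forward-difference gradients."""
    params = list(params)
    for _ in range(steps):
        base = satisfaction(params)
        grads = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            grads.append((satisfaction(bumped) - base) / eps)
        params = [p + lr * g for p, g in zip(params, grads)]
    return params


initial = [0.0, 0.0, 0.0]
learned = train(initial)
print(f"satisfaction: {satisfaction(initial):.3f} -> {satisfaction(learned):.3f}")
```

Note that under these operators the contradiction axiom is best satisfied when each `Smokes` value sits near 0 or 1, so it pushes every belief toward being crisp, whether or not that suits the case at hand.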
As a consequence of their design, Logic Tensor Networks build into the network itself the axioms that you and I adopt voluntarily; for example, that a person can’t be both a smoker and a non-smoker at the same time. To the A.I., such constraints are universal and mandatory across its entire space of beliefs. Although this may sound like a good thing, superhuman even, LTNs face two critical handicaps. First, they can only ever constrain the network to these few hard-coded rules. Unlike you, they are unable to discover and add new axioms or constraints in response to social pressures.
[Logic Tensor Network] is a framework for learning in the presence of logical constraints. LTNs share with [another paper] the idea that logical constraints and training examples can be treated uniformly as supervisions of a learning algorithm — Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge
Worse still, the LTN can’t know when it would make sense to apply its transformations and when they are not useful. Being yoked by default to mechanical, logical thinking is itself a crippling limitation. It closes off free-form, metaphorical thinking, which comprises the lion’s share of everyday cognition (get it? “lion’s share” is a metaphor). So the use of LTNs must be restricted to concrete concepts and idealized relationships — like “smoker” or “parent” — in prepared, sandbox environments. Throw some poetry at them and their effectiveness is severely diminished.
Logic is a Choice
Human brains don’t apply logic mechanically, unbidden, and heedless of context. Being logical is always a choice. You may, for instance, believe that “people who give to charity are good”, and also that “Bob gives to charity”. But you are in no way obliged to believe that “Bob is good” if you don’t like the guy.
Nor are you forced to change either of your first two beliefs, even if your friends find such behaviour objectionable. Tough for them. It’s your mind, and you’ll only put in the effort to clean it up when you feel you need to. There may even be some interpretation of the statements that justifies leaving them as they are. Of course, until you discover this, you may prudently decide not to express such contradictions aloud… and here again we see the purpose of logic — to regulate what you say to others, not what you think to yourself.
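To make the contrast concrete, here is a hypothetical Python sketch of the mechanical alternative: a naive forward chainer that applies modus ponens to every matching fact until nothing new follows. The predicate names, the `should_apply` hook, and the `disliked` set are illustrative assumptions, not part of any particular system; the hook stands in, very crudely, for the choice a human reasoner exercises over whether to draw a conclusion at all.

```python
from typing import Callable

Fact = tuple[str, str]  # (predicate, subject), e.g. ("gives_to_charity", "bob")
Rule = tuple[str, str]  # (if_predicate, then_predicate): for all x, P(x) -> Q(x)


def forward_chain(
    facts: set[Fact],
    rules: list[Rule],
    should_apply: Callable[[Rule, str], bool] = lambda rule, subject: True,
) -> set[Fact]:
    """Apply modus ponens until no new facts appear.

    `should_apply` is a hypothetical hook; by default every rule fires unbidden.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if_pred, then_pred = rule
            for pred, subject in list(known):
                conclusion = (then_pred, subject)
                if (pred == if_pred and conclusion not in known
                        and should_apply(rule, subject)):
                    known.add(conclusion)
                    changed = True
    return known


facts = {("gives_to_charity", "bob")}
rules = [("gives_to_charity", "is_good")]

# The mechanical reasoner has no say: it concludes that Bob is good.
mechanical = forward_chain(facts, rules)

# A reasoner that declines to draw conclusions about people it dislikes.
disliked = {"bob"}
selective = forward_chain(
    facts, rules, should_apply=lambda rule, subject: subject not in disliked
)

print(sorted(mechanical))
print(sorted(selective))
```

Both reasoners hold the same two beliefs; only the first is compelled to add the third.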
If your goal is to build an adaptive logical A.I., one that can apply rules to more than just quarantined toy problems, then hard-coding axioms and operators is clearly insufficient. The system must treat logical consistency not as a Procrustean bed into which it fits all types of experiences and tasks, but as a living repertoire of self-imposed constraints on thinking, each one added on a case-by-case basis and anchored in context.
And since logic is a constraint on language and symbols, the A.I. needs a medium of communication through which its peers can correct impractical assertions⁷. They’ll tailor their corrections based on what they deem necessary, so that idiomatic metaphors aren’t held to the same level of scrutiny as mathematical proofs. To expect an A.I. to derive all this nuance working in isolation neglects that, as you saw in the first post, formal logic has no utility for an isolated individual (see footnote 6 in that post).
Training an A.I. in this way is a long process, but it’s doable⁸. Moreover, it’s not optional. Every person, and by extension every A.I., that wants to overcome the predicate scoping problem and firmly ground its use of formal logic must go through the preceding developmental stages. To skip to the endpoint, or to strong-arm logical axioms into the network itself, without the antecedent communication struggles and formative moments, is to cut the head off the body that sustains it. If the shared social motivations that gave rise to them are neglected, the predicates and operations of formal logic, though they can be computationally aped, will have a narrow, brittle scope. The A.I. will be like I was in the example that began this series: full of misunderstanding and apparently stubborn in its thinking, like an evil genie trying to subvert your aims.
Thanks to Graham Toppin for reading the first draft of this series.
If you found this series interesting, I’d be glad to discuss some of the finer details of this process which were left out for brevity.
¹ Analyzing a concept is not about discovering what the concept already contains in itself (e.g. bachelor contains unmarried and man). Rather, you add such details by tacking on information you later discover and deem reasonable to add. When learning bachelor, one could (and frequently does) first learn to identify a bachelor based on apparently ancillary features, such as being carefree or slovenly, all without knowing about unmarried or man. This is a reasonable first approximation, and only when it fails do you add corrections. Concept definition is therefore always synthetic in the Kantian sense, never analytic.
² In their initial stages, all concept-words are little more than intuitive mental responses. In some cases they never stop being so. Looking at it this way resolves many issues around how abstract concepts are created. It explains why some concepts resist being defined objectively, as seen in ongoing debates over the definition of “art”, “beauty” or “game”. In these cases the unconscious, implicit definition is the only one available.
³ It should be clear from this description that a concept, like a symbol, is not a new type of psychic object. It is a concrete set of memories, usually of word sounds, that are attached to other concrete memories. This means it is not an “abstraction” in the sense of a latent mental representation, any more than a symbol is. Its unifying properties arise from one or more underlying motives that stitch these memories together. These motives work like a magnet, drawing together problem and solution memories, and a word is one such solution. You can read more about the mechanical details of language as an interactive linguistic tool in Language and Motivation in Chatbots.
⁴ There are parallels here to ancient cultural traditions which believed that having a word for an idea gave you power over it. Even today, giving a problem a name (like “depression” or “uncertainty”) makes it seem more controllable, more familiar. This is because if it has a name, people can understand and rally around the problem, and you can conscript their help. To have a word for a problem is therefore to have something like a viable option or plan to solve it. Thus you count the act of finding a name for a problem as a type of solution.
⁵ In other words, dialectics.
⁶ The claim here is that symbolic abstraction is an activity that happens in the real world through shared discourse, not in a layer built into the network of a brain. Both conceptualization and formalization are exercises in abstraction.
⁷ We are so used to construing our abstract thoughts as direct, self-created representations of reality, that we often don’t realize how socially conditioned they are. For instance, even if you were a pure research scientist, you could never communicate any truly innovative idea except by using concepts others are familiar with and understand; so the space of new discoveries is more restricted than you might hope.
⁸ Throughout this article, I’ve used terms like “problem”, “solution”, “thought”, and “learn”. It goes without saying that the exact mechanisms behind these terms beg for further elaboration. You can read more in other posts about how thoughts can connect to other thoughts, how problems and their solutions are derived, and how they drive learning. You can also see examples of thinking visualized as code demos. Together with this post, they comprise a framework and blueprint for an agent to develop logical concepts, beginning from free-form association to sound, formal thinking. If any of this interests you, let me know, and I’d be glad to discuss where this project is going.