Cognitive Science and A.I. research study language, not intelligence
The medium is the message
By Yervant Kulbashian. You can support me on Patreon here.
The tools we have for studying human cognition are surprisingly limited and coarse, consisting mostly of surveys, benchmarked tasks, and self-report. When researchers wish to delve into the deepest recesses of the mind they must rely heavily on verbal communication with the subject to analyse what’s going on under the hood. Even MRI studies are useless without knowing what a particular brain activation pattern is supposed to correlate with, information that can only be gained by asking the subject to do a task or answer a question. One unintended consequence of this reliance on communication is that we see in experimental subjects features and structures that we ourselves projected into them via the act of experimentation.
This fundamental hurdle to research was brought to my attention anew as I read Keith Holyoak’s book, The Human Edge: Analogy and the Roots of Creative Intelligence. The book is the culmination of the author’s lifelong research, and explores how a mind can understand and complete analogies. As a brief refresher, an analogy is an arrangement of four elements split into two pairs, whose pairwise relationships are said to be isomorphic. For example, hand is to finger as foot is to toe, which is formally written as hand:finger::foot:toe. Analogies are central to two broadly unsolved problems in AI research — creative problem-solving and generalization — so understanding how they are represented in the human mind would be a boon to AI development. The ultimate goal of Holyoak’s book is to describe the software architecture for implementing an “analogy solver”.
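To make the format concrete, here is a minimal illustrative sketch of that four-term structure; it is not Holyoak’s formalism, and the relation label below is invented for the example.

```python
# Illustrative only: a toy representation of an a:b::c:d analogy.
# The relation label ("has_part") is invented for this example and is not
# drawn from The Human Edge or any particular analogy model.
from dataclasses import dataclass

@dataclass
class Pair:
    whole: str
    part: str
    relation: str  # the relationship claimed to hold within the pair

@dataclass
class Analogy:
    source: Pair  # e.g. hand : finger
    target: Pair  # e.g. foot : toe

    def is_isomorphic(self) -> bool:
        # The analogy "holds" when both pairs instantiate the same relation.
        return self.source.relation == self.target.relation

example = Analogy(
    source=Pair("hand", "finger", "has_part"),
    target=Pair("foot", "toe", "has_part"),
)
print(example.is_isomorphic())  # True: hand:finger::foot:toe
```

The sketch, of course, simply assumes that the terms and the relation already exist; where they come from is exactly the question taken up below.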
As is so often the case when studying the mind, we encounter our first hurdle right out of the gate. The theory outlined in The Human Edge assumes that the four terms (e.g. hand, finger, foot, and toe) and their relationships are already present in the mind before being rearranged into an analogy:
The human mind comes with a repertoire of types of concepts it can think about. […]
The process [of analogy] requires holding source and target analogs (including the relations involved in each) in working memory while performing systematic comparisons.
The book doesn’t spend much time investigating how the mind slices up and encodes the world into the requisite entities. They have presumably come into existence at some prior time, and the individual now has free rein to mix and match them at will until they find the right combination. So when the book begins its exposition by saying:
The first step [in analogy] is simply to retrieve the source analog from memory and notice its relevance.
It omits the critical step of generating the source elements to be retrieved. Any evaluation of similarity between source and target pairs, or any manipulation of their pairwise relationships, will depend entirely on these entities and their properties as generated beforehand. Once those are in place the majority of the creative work is already done; the rest is more or less an administrative task of comparison.
With few exceptions (to be noted below), the author falls back on a foundation of English words like “hand” or “toe” as the convenient, basic units of analogies. Unfortunately, this is where the tools and methods of cognitive science research begin to interfere with the outcome and confound its interpretation. To explain why, we must first consider what exactly language encodes into its semantics and structure, and how a given mind comes to incorporate it through discourse.
“Our” words, “our” world
Every natural language is a mode of communication, an attempt at aligning understanding across a group of people. Since it performs a shared function, it must do so in a way that makes sense in a shared medium. We can only contrive words that refer to things the world makes manifest to all involved — both in the physical and phenomenological sense — and about which we want to have conversations. It makes no sense to have a word for my unique, momentary viewpoint of the world, only for “universalized” views of it detached from my immediate and unreplicable experiences. Even words like “individual”, “subjective”, or “perspective” refer to everyone’s understanding of individuality, subjectivity, and perspective, which is why everyone is able to use them. Thus language, from the start, effaces any uniqueness in individual mental experiences; people can only express their thoughts by translating them into common terms.
A second, related feature of language is that it is forced to divide the world into recurring experiences. Unlike a momentary thought, any word we invent will exist for longer than a single instant. And so it must refer to something that can be experienced more than once, even if only in memory (e.g. “apple” or “yesterday”). This repeated experience is what gives “object-ness” to our ideas — they exist in a stable sense:
It seems as if humans have extended the notion of an object to encompass all kinds of things, which we think of as if they were somehow object-like. — Holyoak, The Human Edge
Words are also discrete symbols; there is no language so mercurial that it lacks fixed, primitive units of meaning such as words or morphemes. These are defined and codified before they can be used, via mutual agreement. The mind itself may be fluid and ever-changing, but language cannot be so — otherwise no “thing” could be discussed. Therefore, language by its very nature splits the world of experience into discrete, commonly understood, recurring entities and events. Putting something into words or symbols is a means to gaining clarity and consistency about it; its content must be perceived and framed as a consistent “thing”.
Now those who intend to join in discussion must understand one another to some extent; for without this how can there be any common discussion between them? — Aristotle, Metaphysics
Moving beyond individual words, the structures entailed in the relationships between words are also the result of trying to unify understanding across people and experiences. For example, the use of category hierarchies and taxonomies (e.g. “animal” → “mammal” → “bear” → “polar bear”) is an artifact of our shared need for clear, common understanding. Arranging the world into structured relationships is not an obligatory function of cognition; most of your thoughts are not a part of any clearly defined hierarchy until you attempt to fit them into one. There is nothing inherently problematic with treating the majority of your experiences as special cases.
Hierarchical relationships in particular are learned as the result of trying to resolve an apparent contradiction in discourse. For example, when a child points to a fish and calls it “fish”, but someone else tells her it is a “salmon”, she experiences this as a problem, since the two have used different English labels which are not synonymous. To resolve this, children learn that an item can have two labels, and that one term simply encompasses more things than the other. Similarly, when a child is told to say that they live in “Reykjavik”, and also that they live in “Iceland”, and also on “Earth”, they may at first be confused until it is explained that the latter can include the former.
The internal consistency we see in object hierarchies is a result of trying to come to an agreement on how to organize our joint reality; they are a peculiar effect of structuring our world in a coherent way. The human mind only demands structure and consistency in thinking after an incongruity in its explicit formulations becomes a problem — and even then, exceptions can be allowed if one so desires.
As for reality itself, it exerts an influence on language through spatial and temporal consistency. For example, most (probably all) natural languages have words for “inside” and “outside” (e.g. the pen is “inside” the box). But no language has a word for when the thing on the outside is also inside the thing on the inside (e.g. the pen is inside the box, and the box is also inside the pen). This is because the physical world rarely allows for that situation to happen. When we say that “inside” and “outside” are opposites, our semantics has accommodated itself to the objective patterns observed in the world. Similarly, if A comes before B, then we know that B comes after A — a reflection of how time really works.
The above are only a few examples of how language, as a formalized abstraction layer, reflects the regularity of the world and our need to coherently discuss it. Note that thinking itself is more fluid and expansive than the structures in language, and is not confined to the latter’s rules; otherwise it would have been impossible for me to suggest the self-contradictory arrangement of pen and box earlier. Nor could I invent convoluted interpretations of time, such as the one described in the movie Tenet, if my mind were beholden only to the true structure of reality.
Beneath the surface
It is all too easy for someone to conclude from the observed regularity in our tools of communication that the function of the mind itself is to structure the world according to such systems, rather than that the functions of communication and mutual alignment in our world necessitate that we do so. This is such a common misunderstanding that very few people question it. They assume that the structures of language reflect the structures of thinking — rational, coherent thinking to be precise. Even many philosophers and scientists (e.g. Aristotle, Fodor, Chomsky) have confused the systematic regularities in how we discuss the world with an innate Language of Thought (LoT):
Philosophers, logicians, and psychologists alike have equated the formal rules of logic with laws of thought — Winstanley, Johnson-Laird on Reasoning and Logic
Much of human cognition can be understood in terms of constraint satisfaction as coherence, and many of the central problems of philosophy can be given coherence-based solutions. — Thagard, Coherence in Thought and Action
Signs (or spoken words) for mental states make it possible to form complex syntactic constructions that capture the relational structure of mental states. — Holyoak, The Human Edge
The mistake arises quite simply because in order to express anything at all, a person has no choice but to reinterpret their flow of vague, unstructured thoughts into a set of commonly understood symbols, and the relational patterns of those symbols. How can you tell me what’s on your mind unless you turn your thoughts into words and sentences that everyone already understands? Individual instances of unique thoughts will never be known, even by you, because to “know” or to “understand” necessarily entails that you take what is specific and categorize it under something communal, repeatable, and coherent:
truth properly belongs only to propositions […] Truth is the marking down in words the agreement or disagreement of ideas as it is. — Locke, Essay Concerning Human Understanding
Language and symbols have historically appeared to us to be the path to truth because we can only conceive of truth within their structures — structures such as reference, consistency, and repeatability. But truth here is an inter-subjective attempt to achieve stability, maintain objectivity, and enable communication. These are intentional activities, not natural or automatic ones. They require effort and inventiveness, for example, to find the correct words, syntax, and grammar to express a thought, or to discover the right set of symbolic statements and formulae to clarify it in one’s mind and to persuade others. Self-contradictory expressions, like “square circle” or “halo the batch intend”, are labeled “meaningless”, or lacking any stable sense, because they cannot be clearly expressed or conceived. Yet the fact that individuals try to attain coherence in their verbal expressions does not mean that the mind itself aims for coherence, as Holyoak (and others) argue:
[…] the human aspiration for coherence — to understand the world as “making sense.” We want our beliefs and attitudes to fit together, rather than blatantly contradicting each other. — Holyoak
Rather, coherence is a social imperative embedded in the very use of language and symbolic communication. Humans aren’t truth-producing machines; only societies try to approximate truth through their shared artifacts, e.g. books and discourse.
In contrast to society’s artifacts and symbols, your private thoughts at any moment may be fluid, inconsistent, and nonsensical. There is nothing stopping you from thinking about inverted containers (and many a computer game has been built on such unreal realities). The reason you don’t speak such thoughts in common conversation is either because you simply can’t — you have no shared words for them — or because they make you sound incoherent and are generally useless to everyone else. But just because people can only communicate in coherent terms doesn’t mean that words, their meanings, and the formal logic we apply to them reflect the inherent structures of thinking itself. That would be like confusing an actor with their on-stage character.
The human-researcher interface
This confusion has consequences for cognitive science research, which relies on language to obtain and record its results — e.g. when testing subjects on their recall, requesting they perform a task, or asking about their preferences and opinions. The goal of such research, ostensibly, is to study individual cognition. Yet since there is no way to do so except through language, researchers must rely on everyday shared linguistic structures to explore what should be a subject’s private, idiosyncratic thoughts.
This is like trying to study a computer by observing the windowed interface on its screen. You would be forgiven for believing that the CPU itself also worked with windows, tabs, etc., because that is what you see in the user interface. But windowed interfaces were produced specifically because they make the contents of the computer understandable to us; they are a useful way for the computer to communicate its processes. The arrangement of data in actual computer memory is radically different, opaque, and incomprehensible.
In the same way, language constructs a narrow, simplified medium or abstraction layer that makes human minds “understandable” to each other, one that makes consistent sense in our shared world. Whenever you describe what you are thinking, a listener — especially if they are a researcher — will generally assume your words represent the universal case, and that your thoughts are similar to others’ thoughts when they use those same words.
Semantics is not the study of individual meaning (which after all may not exist), but the study of extrinsic social meaning, of dictionaries and common cultural artifacts. It is an analysis of our shared abstraction layer, our facade or interface to one another. When we explore how minds acquire and structure individual “meaning”, we are really analyzing those terms that we’ve already established within the medium of communication. We are looking for explanations for “categories”, or “thoughts”, or “consciousness”, or “qualia”, etc., all of which are shared, universalized inventions. Any study of cognition built on top of shared symbols must be recognized first of all to be an artificial, effortful construct.
Unfortunately, the more that researchers try to gain precision and clarity on a working hypothesis, the more they must force the subject to narrowly formulate his or her output into an artificial regularity and universality suited to the experiment. Even beyond common language, every experiment devises its own schema — set of terms, symbols, and their operational relationships — which defines the theory under study. Test subjects are expected to give answers that are coherent with respect to this schema, and the experiment frames its questions in such a way to force these “truths” to emerge.
For example, if an experimenter asks you whether you prefer X to Y, the term “prefer” implies that you must place your complex experience of desires on a one-dimensional gradient where entries are lower than, greater than, or equal to one another, and where there are no internal contradictions (e.g. you cannot say A > B, B > C, and C > A). Giving an answer that fits this predefined schema is necessary for you to speak with any “sense” and for research to continue. Your feelings on the topic may be diverse, fluid, uncertain, even self-contradictory; yet such idiosyncrasy must be suppressed, since it adds confusion rather than contributing to any clear, universal theory. And so you come up with an interpretation of the contents of your mind that matches what is asked for, and that fits the goal of the study.
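To see how restrictive that schema is, consider a hypothetical sketch of the consistency check it implies: the code below flags exactly the intransitive pattern (A > B, B > C, C > A) that the “prefer” framing disallows. The pairwise judgments are invented for illustration.

```python
# Hypothetical illustration (not from any cited study): the "prefer" schema
# demands a consistent ordering, so a cycle of pairwise judgments is treated
# as an error rather than as data about the subject.
from itertools import permutations

def has_preference_cycle(prefs: set[tuple[str, str]]) -> bool:
    """prefs contains pairs (x, y) meaning 'x is preferred to y'."""
    items = {item for pair in prefs for item in pair}
    return any(
        (a, b) in prefs and (b, c) in prefs and (c, a) in prefs
        for a, b, c in permutations(items, 3)
    )

# A subject's raw pairwise judgments, which the schema would reject as incoherent:
answers = {("A", "B"), ("B", "C"), ("C", "A")}
print(has_preference_cycle(answers))  # True
```

Anything the subject reports that trips this check is discarded as noise or error, not treated as a finding about how they actually feel.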
We can interpret the word “language” here more generally as the mode by which we understand and clarify experiences to one another in a consistent and coherent manner. Researchers work at the level of abstraction of the particular “language” they have settled upon for the experiment.
This confusion between language and cognition has consequences that are far more significant than you may at first realize. It casts doubt on nearly everything we know about cognition: the language of thought (LoT), reasoning and logic, cause and effect, the very notion of thinking coherently. There is no aspect of cognitive science and AI research that is not heavily undermined and warped by having to work within this domain of truth through language. Even designations of aesthetic and moral preferences are shaped by the terms involved in researching them. For example, a test subject may be asked to categorize their complex thoughts into simple categories like “good” and “bad”, “right” or “wrong”, etc. Or they may be asked to explain why they prefer one option over another in a way that makes sense and is not socially embarrassing.
Narrow-task AI
The Human Edge repeatedly employs words like “correct” and “successful” regarding its test subjects’ outputs, emphasizing that the subject should answer a particular way, and that any idiosyncrasy is a type of error:
the students usually failed to set up the correct equation for a target in which caddies were assigned to carts […]
The correct choice is option 4. […]
if the two icons in the sample are the same as each other, then the correct response is to select the option in which the icons are also the same
Six-year-olds achieved robust success on two-icon RMTS problems (as do adults).
And so on. The book refers to subjects who can or can’t complete analogies, and to the brain functions that support their production, implying that some part of the brain itself is innately designed to support the capabilities required to succeed at that test.
Granted, there is a kernel of truth in this analysis, in that we very often have thoughts that complete analogies “correctly”. If we didn’t, or more importantly if we didn’t know what the correct answer even was, we would not be able to complete them at all. How would I know that toe is to foot as finger is to hand, if at some point I hadn’t acquired the knowledge that this was the “right” answer? A test subject may have any number of intervening, unrelated, “incorrect” thoughts, but in the end they can answer the question. What’s missing from these experiments is an appreciation of the complexity of human thinking that ultimately arrives at that answer: a process that could potentially derive a better answer, one the tester hadn’t even considered.
This is not exclusive to The Human Edge. You will rarely find a cognitive science experiment that doesn’t define its own narrow criteria of success, and in doing so embed a schema by which it frames its outcome. When a researcher asks a test subject to complete a task, like piling up blocks, or finding the odd one out from a set, and gauges their competence, they are automatically assuming that the subject has made it their goal to do exactly that, rather than, say, to have fun or create art. In fact, the pressures of the very same social approbation that labels an answer “right” or “wrong”, combined with the preexisting artifacts of language and communication, are what generate the observed test results.
This is fine when testing for competence in a narrowly defined skill, but researchers often assume that the brain has evolved to automatically match the values of the test, rather than that those values have been imposed upon the subject and learned through socialization. The approach hits a wall when we then try to translate those same mechanisms into Artificial Intelligence systems. When we program an AI to innately complete tasks at the same level of abstraction at which we conducted the test, we are projecting our schemas and their underlying aims into its hard-coded architecture. This embeds into the AI our implicit expectations of what constitutes intelligent behaviour, constraining its thinking to whatever artificial abstraction layer our test was conducted at. The result, of course, is a spate of AI systems narrowly designed to succeed only at one predefined task.
The distinction between these two ways of framing cognition — as a natural, general ability vs. a narrow, task-oriented one — reflects a broader distinction between two definitions of intelligence. In the first case “intelligence” is defined as the neural foundation for all learning and thinking, which even children and the uneducated possess, though they lack an array of acquired skills. In the second case “intelligence” is a set of socially successful outputs, what we expect from a university graduate, which would lead us to say “that person is intelligent”. AI developers tend to jump to implementing the latter due to the financial demand to be productive. Fortunately for them, our collected data — our corpus of mathematical theorems and proofs, our historiography, our scientific literature, our poetry, etc. — are an adequate mirror of the overall social consensus on what qualifies as intelligent.
The resulting systems, however, don’t think as humans do, but rather think in ways that researchers, societies, and the public consensus deem to be “correct” or “sensible”, as inscribed within our combined linguistic outputs (e.g. the content on the internet). Is it any wonder that AI systems appear to lack creativity?
The tip of the iceberg
To its credit, The Human Edge recognizes that past work on analogies has required an unnatural formalization of the world into schematized concepts and their properties. The book notes that the initial setup process for software analogy systems like Copycat, SME, LISA and DORA involves encoding a large body of terms and their features into a bespoke formalism, and injecting that before the system can begin searching for similarities. The author laments this artificial starting point:
In some sense, all the models assume their inputs have been suitably “tailored” to fit the form the model requires.
He notes that humans, in contrast, gain their understanding of the world through fluid, unstructured experiences:
What’s still missing, in all of the analogy models so far discussed, is the ability to draw analogies using naturalistic inputs — as people clearly do.
Yet when he tries to move away from formally structured inputs to more “natural” ones he defers, once again, to language, as though our civilization’s combined text corpora are an accurate mirror of the structures of human thought:
Generally, the [software] network may start to encode abstract properties of language — and of human thinking — that help to predict missing words in texts.
Note the ease with which the quote above slips between “language” and “human thinking”. But natural language is not “natural” in a psychological sense — it is a sum total of socially amalgamated symbols (words as concepts) useful for aligning disparate thinking on shared tasks. It’s true that people also use language in their private, personal thoughts, but that is no more “natural” than the fact that I think in English and you think in Punjabi — both reflect a social utility that lies outside our heads.
The true diversity of human cognition that produces outwardly observed behaviour is far richer and deeper than can be expressed in any language or symbolic system; and this diversity is very often obscured by the need to explicitly represent its contents. Imagine, for example, that I were being tested on what was common between the two images below:
Any number of subjective, idiosyncratic, personal thoughts may flit through my mind in the moment, most of which remain unspoken. I may at first think that I see two instances of clothes that I find stylish, or that depict locations I would like to visit before I die. I may recognize that these are AI-generated images, and perhaps some distaste may linger in my thoughts regarding their aesthetics. But these thoughts are all personal, and if my tastes are esoteric enough I will likely not express them, perhaps for fear of judgment. Only some answers are considered appropriate replies to experimental questions. Those that are idiosyncratic or subjective are generally excluded or marked as “incorrect”.
I will also have thoughts that have no linguistic correlates at all, vague feelings and peculiar, inexpressible memories. These will remain hidden because they have no medium by which to exit my brain. In the end, when pushed for an answer, I will try to say what I believe the tester thinks is the correct answer, and for which I have some English words ready at hand: I will say that both images are of “women eating”. The tester, hearing only this last utterance, may well assume that that is the sum total of my contemplations. And if the same answer is observed across many test subjects, they will conclude that the mind is naturally inclined to draw such an inference. The truth, of course, is that being socially mature adults we’ve all gauged what the right answer to say is, and we say it out of a mild peer pressure to get it “correct” while not embarrassing ourselves.
Even before research began, language had embedded the right answers in the very medium of communication. For example, if a researcher asks subjects whether cars are different from trucks, the widespread inter-subject agreement they will observe echoes the same social consensus that created two separate words in the first place: “car” and “truck”. Any discussion people have must be restricted to predefined English terms, and moreover the relations between those terms must also have been established previously by a shared semantics (I didn’t decide on my own what properties define a “truck”). So in a sense the answer was a foregone conclusion. The author of The Human Edge does note that there is often widespread inter-subject consistency in experiments, e.g.:
Research has shown that people largely agree in their assessments of the “goodness” of meaningful car components, such as headlights and doors.
But he doesn’t mention that the reason words like “headlight” and “door” exist in the first place is that we all agreed these are useful affordances on a car. The same is true regarding the pictures of the women above. We have expectations about how one should discuss their contents, which aspects are socially relevant, appropriate, or important, and our language codifies this agreement — for example, the clear separation engendered in the words “man” and “woman” is quite intentional. Language highlights those distinctions its users consider pertinent, at the expense of other nuances they didn’t care about.
The result is that we end up conducting research using shared linguistic abstractions, then marvel that the mind apparently thinks according to the very structures which frame our questions. We paint a world with reds and greens, and then stare in astonishment when the outcome is red and green, as though we were not the ones who had painted it there in the first place.
An invasion of micro-features
As if we didn’t have enough troubles, there is a second pervasive illusion — ironically, one based on a false analogy — that leads Holyoak to believe the mind perceives the world using the same formal structures as are instilled in language. He makes the all too common leap from the hierarchical composition of objects as observed in visual/auditory perception (i.e. parts make up a whole) to the same for abstract ones (i.e. parts of abstract concepts make up the concept).
He describes how subjects completing the Raven’s Progressive Matrices suite of tests look for analogous relations between parts of the images, and then suggests that since our eyes perceive the images as objects with parts and find such correlations, the same decomposition and recomposition must be at work when dealing with abstract entities — even for opaque concepts like time or nature:
many abstract relations are derived (perhaps ultimately by analogy) from spatial relations, and furthermore continue to depend on some of the same neural machinery. […]
Such evidence for parietal involvement in the generation and manipulation of active representations of abstract relations supports a general principle articulated by Barbara Tversky: “Spatial thinking is the foundation of abstract thought.” […]
If a musical melody is transposed from the key of C to the key of E major, the notes change but the intervals between them remain the same. The ear can detect this “analogy” — a set of systematic relational correspondences.
The book easily switches between visual analogies and abstract, conceptual ones, and ultimately concludes that the process of analogy involves decomposing, analysing, and comparing component features of concepts:
The meaning of a word is not indivisible. Rather, words are interrelated by shared features (perhaps microfeatures) […]
The representations of concepts are distributed in that each concept corresponds to a pattern over multiple features, and different concepts may share some of the same features. This type of representation provides a simple explanation for why similarity of concepts is a matter of degree, rather than all or none.
This is both an easy and a common mistake. We perceive how the parts of a given image make up that image, and we can analyse these in a straightforward way, because the physical/spatial world allows for this. We then forget that our abstract thoughts are not so well-structured or consistent. They are only organized into structures through a subsequent act of retrospection and analysis — we invent their structure as we go looking for it.
For example, when trying to explain word-concepts like “beauty” or “justice” to someone, the functions of communication require that you decompose the concept into objective (inter-subjective) identifying features. You must find shared, communicable “things” that seem consistent and common enough to reliably distinguish the word. Such an explicit definition is always conceived of after the fact — you did not originally learn the meaning of concepts like justice by being given their dry definition.
Any resulting definition is always a social compromise: it doesn’t reflect any one individual’s experience perfectly. In my mind, justice is not its dictionary entry, it is my personal experience of desires and the give-and-take of getting what I want in a social context. And even if the old adage “might makes right” is not actually the case, it is a fact that many people believe that that is “justice”. Of course, neither of these is how we communally agree to define justice, because for one member to claim special priority for themselves would not do when the goal is group alignment.
In general, decomposing abstract notions into their clear component parts is imperfect, and often impossible. What is nothing composed of? What are the features by which you can recognize time? You can’t point to either of these in the environment around you; nor can you explain a concept like time to a friend unless they already have a sense of it in their mind, for which you are simply providing a communicative label.
Falling back on language
This leaves Holyoak at a sort of impasse — he believes that all terms involved in an analogy must be compared for similarity by their features, but he is unable to actually discover the necessary ones to insert into his algorithm. So, with no other option, he falls back on a deus ex machina. He doesn’t try to find the features himself, since they may after all not be expressible in English; rather, he lets an AI system generate them for him. He uses natural language models like Word2Vec that transform words into vectors based on how they are commonly used in text — a linguistic paradigm called distributional semantics:
You shall know a word by the company it keeps — Firth, Studies in Linguistic Analysis
The abstraction mechanisms used to obtain distributional models are such that similar contexts of use result in similar vectors […]
Words are points in a space determined by the values in the dimensions of their vectors — Distributional Semantics and Linguistic Theory
Holyoak proposes that the vector representing any word can be interpreted as a numerical list of its ineffable features, where each dimension of the vector encodes one feature. From this starting point The Human Edge begins to build its own software model for completing analogies, called Probabilistic Analogical Mapping (PAM). By comparing distances within the vector space, words and their relationships can be assessed for similarity:
What we would like is a method of finding microfeatures that can each be expressed by a numerical value, so that each feature corresponds to a continuous dimension. The meaning of a word would then be a list of its feature values — in mathematical terms, a vector. […]
By building a reasoning model on top of learning mechanisms grounded in distributional semantics, PAM has drawn closer to the goal of automating analogical reasoning for natural-language inputs.
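To make the mechanics concrete, here is a minimal sketch of the distributional substrate being described: words become vectors, similarity becomes the cosine of the angle between them, and an a:b::c:? analogy is completed by a vector offset. The toy vectors are invented rather than learned from text, and this is a generic Word2Vec-style illustration, not the PAM model itself.

```python
# Toy illustration of the distributional substrate: the vectors below are
# invented, not learned from text, and this is not the PAM model itself.
import numpy as np

vocab = {
    "hand":   np.array([0.9, 0.1, 0.3]),
    "finger": np.array([0.8, 0.2, 0.9]),
    "foot":   np.array([0.1, 0.9, 0.3]),
    "toe":    np.array([0.0, 1.0, 0.9]),
}

def cosine(u, v):
    # Similarity of "meaning" is reduced to the angle between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def complete_analogy(a, b, c):
    """Pick the word d that best fits a:b::c:d by the vector-offset heuristic."""
    target = vocab[b] - vocab[a] + vocab[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(complete_analogy("hand", "finger", "foot"))  # "toe", with these toy vectors
```

In a real system the vectors would be learned from large text corpora; the point to notice is that everything the model “knows” is a pattern of word usage.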
This is a desperate, Hail-Mary pass. The original mistake — confusing the abstractions of language with the substance of thinking — has in no way been corrected. Only now The Human Edge does a sort of “trust-fall” into the arms of language models which supposedly explore and uncover semantic similarity on the author’s behalf.
It is perhaps to the credit of modern language models that they try to find patterns in word usage in natural text without the need for artificial, hand-coded datasets that may bias the outcome. Still, in the end what we see in their output are the structures of discourse and the abstractions we create through social symbolization; they are not the structures of thinking itself. When The Human Edge quotes Firth as saying “you shall know a word by the company it keeps”, its author neglects that it is a word, not a thought, that can be known in this manner. The thoughts that run through your brain are unique, unpredictable, inscrutable, and not conducive to easy labeling.
Analyzing thoughts by analyzing words is like studying the movements of cars to understand the desired movements of their human drivers. Whereas cars must abide by certain restrictions for effective transportation and productive road cooperation, their drivers likely would not follow those rules were they outside the vehicle. Not many of us stroll into a gas station and stand around for three minutes; and fewer still allow their Toyota into their private living rooms and bedchambers.
Unfortunately, until someone proposes another mechanism or approach to studying minds, the only option a researcher has is to fall back on a simplified abstraction laid overtop the baffling complexity; some systematic formalism that covers up all that idiosyncratic nuance. They can then work within that abstraction like an API to the brain. And language has incidentally been a readily accessible abstraction since it is always available within the context of research.
The messy business of thinking clearly
If we want to better understand the actual complexity of how the mind generates analogies, let us look at what happens if an analogy cannot be readily completed. For example, fish is to laptop as boat is to…? When no easy answer is available, my mind starts to grope around the feelings involved, bringing in disconnected personal images (e.g. fish makes me think of a movie involving fish, perhaps a fishing net I saw, and so on). I abandon the search for what I presume to be a ready answer, and retreat into private considerations to invent something new.
I might try to identify and extract the objects’ individual associations (fish are shiny, and so are boats) and keep those in memory. I may then bring in the other parts of the analogy, and hope some interpretation clicks by chance — just like solving a puzzle. Fish and boats are both seafaring, but boats and laptops are human technologies. Upon this consideration, I may now wish that the items were rearranged, since that would give me an easier answer. Boat is to fish as laptop is to…hamster? The latter are both found on a desk. This possibility of cheating shows that my goal is to give the tester a plausible answer through whatever means I can. I feel challenged; a social pressure to succeed has been placed upon me. Unlike the PAM system, I am trying to prevail, not just to produce an automatic, best-effort answer. If the adjudicator rejects my “creative” approach, I may simply give up — again, in contrast to software systems which cannot quit (hence the epidemic of hallucinations in Large Language Models).
Indeed, what is there to stop me from intentionally answering incorrectly, perhaps saying the first thought that flits through my brain? Nothing except some mild embarrassment, or consideration for the tester. If I were careless or spiteful I might very well do so. All in all, some concerted effort is involved in answering an analogy question correctly; some care and motivation must be driving it. It does not come naturally or automatically. Unlike PAM, I must work hard to move my thoughts into a socio-linguistic space, and make them coherent. PAM, on the other hand, can never leave the space of language.
If we are to design systems that think like humans do, instead of looking to language we should start at the other end: we must try to design systems that could potentially think according to linguistic abstractions — if so motivated — but that also have the option to think in whatever way appears useful to them. The purpose of this website and the posts within it has always been to describe such a model, one that pushes the abstraction layer lower down the stack. A key point we have repeatedly come back to is that each and every mental action is triggered by an underlying motive, and that the latter shapes the content of the resulting thought. You generally do not see the motivated forces that inject individual thoughts into your mind. Even during introspection, you only see the results, the concrete symbols and words that are recorded. So you might fairly assume that cognition occurs across these explicit symbols, and trace their patterns when trying to understand it. This makes your job easier, since that space is also where expressive language runs its course. But it neglects the source of those very thoughts.
In the case of analogies, the motive for generating a given analogy matters quite a lot — it shapes the “interest” or “significance” aspect of the analogy. Why this analogy, why now? How does the subject know that a request to complete an analogy has even been made? Such a thing is not hardcoded in humans as it is in software systems like PAM, where the decision to generate an answer is built in by the programmer. There must always be a specific reason you decided to engage with the content of an analogy; each is a linguistic, communicative construct suited to a particular set of circumstances and demands.
Most analogies aren’t generated as part of some contrived lab experiment; they arise naturally during discourse, such as when you are trying to explain or teach a concept to someone. An analogy is a didactic hammer. When an author defines a particular analogy, it is because he wants to satisfy a specific underlying motive; he is not arbitrarily matching on features and similarities for fun. His goal is to get you to arrive at some conclusion, to convince you. Even if the terms of the analogy have nothing else in common, as long as the essential piece that achieves the underlying aim is present, the analogy is considered successful.
Superficial similarity of features is still useful, but it doesn’t tell you which similarities are the important ones and how they should be interpreted. Some relations seem to fit better than others, even though it’s hard to explain why. Hand is to glove as computer is to operating system seems awkward, but why? Both are used by their respective entities, yet it feels like antivirus would be a better match for computer, because we focus on the significance of protection (compare with the phrase “no glove, no love”). Protection, however, is a motivated abstraction; it is connected to our interests, and its features can only be decoded in the context of what the person wants to be protected against.
Holyoak’s third dimension for deconstructing analogies, namely “relevance”, implies there is a particular purpose to which a given feature is relevant. Thus the motive driving an analogy cannot be disentangled from the rest of the content. The core issue with the PAM system is that it attempts to generalize across all analogies using a generic, impartial formalism:
The researchers who developed the models sketched in this chapter have agreed on a basic premise: analogy is a domain-general process that operates on representations of explicit relations.
Trying to explain all analogies requires us to explain how they work in all examples and across all people. As a result any uniqueness must be effaced, and we must look instead for some handy common unit of generalization. This is even more important if you want to implement your system in software. And so we fall back on language or some other symbolic system to enable universalization, a convenience that leads us to reflect back into our research the very structures that we ourselves put into it via the medium of analysis.
Summary
Language is a communally invented and structured abstraction layer we build overtop thinking. It is an achieved consensus regarding how we should communicate in our shared world, the result of an ongoing interpersonal negotiation. It is not, however, the substance of thinking itself. The term “language of thought” and theories surrounding LoT are misapprehensions; they arise from the fact that we have no other way to know or understand anything about cognition except through symbolic expression; and so we confuse the medium with the message.
Unfortunately, since all cognitive science and AI research relies on tests which use commonly understood symbols or units, the theories we extract from a given study arise from a prior social act that originally made those units “common”. Much of our understanding of human intelligence is actually an understanding of the requirements for communicating in a shared, objective world. Coherence, spatio-temporal modelling, planning, logic, cause and effect, ethics, and aesthetic preference all take shape in our discussions about, and interactions within, our shared reality. They may be descriptions of objective truth, but they do not represent cognition itself. The latter is idiosyncratic, whereas concepts are necessarily universal. Ultimately, the AI systems we build based on this research end up living in a narrow space of linguistic codifications. They echo and are beholden to the social needs that generated those codifications, and as such remain creatively constrained to them.
