AGI is Inherently Amoral

Artificial General Intelligence can’t be forcibly aligned with human values

From Narrow To General AI
8 min read · Oct 14, 2023

Note: “amoral” is not the same as “immoral”. Amoral means detached from or uninvolved with morality.

The topic of the human mind is too personal to approach impartially. People don’t like to hear psychological theories they deem insulting, or that undermine societal harmony. And it is difficult for an author to ignore the social context entirely when communicating a theory, especially one that touches on our sense of human dignity.

Freud, Hobbes, Nietzsche, Darwin, Epicurus and others unsettled the audiences of their times by making claims about the human soul that paint it in a less than ideal light¹. Against these uncomfortable portrayals, Descartes, Kant, Heidegger and others held up ideal, aspirational models more compatible with how we want to perceive ourselves as a species. Other philosophers and scientists have likewise attempted to defend the honour and dignity of the human soul. No matter how neutral they try to make their arguments sound, the language always hints at underlying values, at what should be aimed at and what should be avoided: rationality, sociability, truth, freedom.

Productive man

Even modern, ostensibly “pure” research is beholden to the prejudices of its audience. When scientists publish their theories, they frame them in a way their peers and the public will accept, and lean into topics the field cares about. Economically speaking, published papers that result in financially successful products, or in automation that lowers costs, tend to overshadow others that are perhaps more groundbreaking, but thus far unprofitable.

The field of AI in particular has attuned its offerings to serve the needs of industrial robotics, automation, and commercial or military software. Researchers are driven by both prestige and funding to focus their work on models of AI that are economically productive. As a result, research prioritizes commercial and industrial values like accuracy, repeatability, precision, and 3D spatial reconstruction.

The effect of such pressures goes deeper, however, than simply deciding which research project gets funded: it frames how the field views the nature of the human mind itself. Intelligence ends up being synonymous with “productivity”; an intelligent agent is a productive one. Research seldom investigates, say, an AI’s enjoyment of life, or its spiritual discoveries.

Nor does AI research have any interest in the slow and winding path of human intellectual development. Time is money, so the field jumps straight to the “correct” answer, which it enshrines in benchmarks and datasets. Then, like a good student, the resulting model focuses on replicating those distributions as best it can². The ultimate goal, as with the public education system, is to produce an ideal contributing citizen (contributing in an economic sense), not a complete human mind.
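
To make “replicating those distributions” concrete: under the standard cross-entropy objective used in supervised training, the loss is minimized exactly when the model’s output distribution matches the empirical distribution of the dataset. Below is a minimal sketch in Python with purely illustrative numbers; it is not a description of any particular system.

```python
import numpy as np

# A toy "benchmark": the empirical answer distribution for one question,
# e.g. 90% of annotators chose label 0 and 10% chose label 1.
empirical = np.array([0.9, 0.1])

def cross_entropy(p_model, p_data):
    """Expected negative log-likelihood of the model under the data distribution."""
    return -np.sum(p_data * np.log(p_model))

# Candidate model distributions over the two labels.
candidates = [np.array([q, 1.0 - q]) for q in np.linspace(0.05, 0.95, 19)]

# The training objective is lowest exactly where the model
# reproduces the benchmark's empirical distribution.
best = min(candidates, key=lambda p: cross_entropy(p, empirical))
print(best)  # ~[0.9, 0.1]: the "good student" replicates the distribution
```

Nothing in the objective rewards anything beyond matching the data: whatever values the benchmark encodes, the model inherits.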

Unemotional man

Real human existence, in contrast, is intensely emotional, as each of us knows from firsthand experience. This is an uncomfortable truth, since being perceived as emotional suggests selfish desires, anti-social behaviours, ugly weaknesses, detrimental conflicts, irritating inconsistencies, and a host of other traits that others would censure you for. No one wants to believe this collection of attributes describes them, and so we prefer to think of any momentary lapses as exceptions, a temporary loss of control. Instead we set up the rational, socially acceptable ideal as the “true” nature of humankind, and dismiss passing irrationalities as bugs in our “wetware”.

As Freud pointed out, every individual is consequently caught in a tension between what they actually are and how they feel they should be in order to look good to others. Luckily for us, there is a lot about the mind that remains unknown and obscure. Such lacunae create space for us to insert psychological theories that flatter our self-image. The result is a strange intersection of self-knowledge and idealized identity that has pervaded thinking about the mind since Plato.

Such shame-based theorizing goes beyond a mere quirk of neurotic personal psychology. Through social discourse and its moralizing force, it has seeped into nearly all aspects of AI research as well. The ideal self, the one we wish we all were, now sees its mirror image in our AI models; models, for example, that aim for logical completeness and Bayesian justification. The “true” brain has become synonymous with the rational ideals of truth, logic, and statistical correctness; the flawed or false one with anti-social impulses and unreasonableness³.

Rational man

Thus, from the confluence of social pressure and economic necessity, the “rational agent” has arisen as the de facto textbook paradigm on which practically all AI algorithms are modelled. Theories of AI deny or recontextualize the irrational side of “man” and focus on the model citizen, the ideal statistician, the exemplary modeller of truth, the consummate logician, the productive employee. They reshape humanity into what they want it to be, painting an image of our species with only a specific subset of our values. This is our scientific morality. In and of itself, a narrow focus on reason and productivity is not a bad thing; it usually gets the job done. Problems arise when we assume that, rather than being an ideal, this is the human mind.

A moral designation does not see a thing for what it is. It inserts preference where there should be knowledge. When someone equates values like rationality or productivity with the essence of the human mind, they are engaging in a kind of wishful thinking that serves as self-justification for our species. They’ll say things like: “you should be rational, because that is the true nature of man”. The inherent contradiction of that statement (if being rational were the nature of man, you wouldn’t need to try) is usually lost on its proponents. They believe it because they want to believe.

Yet despite their efforts, these same idealists can’t help but notice that real humans rarely match their proposed ideals. This should give them pause. But rather than reevaluate their assumptions, they tend to repaint AI as a kind of super-intelligence (ASI), a 2.0 version of humanity that lacks our messy organic flaws, a thought which then stokes anxieties about a super-rational, unsympathetic machine apocalypse.

There is an alternative, however: to give up the assumption that humans are essentially rational. A human being can be rational, logical, ethical, pro-social, and so on, but is not obliged to be, and it is up to each person’s whims which of these they will be at a given moment. Rationality, ethics, truth, accuracy, and sociability are all learned values. They take education and effort to practice, and an individual can always choose when and how to apply them. The true nature of the human mind is a moral question mark: uncertain, in flux.

Filling in the gaps

Since gaps in our understanding of the mind originally introduced the possibility of moralizing, a comprehensive theory of AGI must inevitably and systematically deprive AI research of any moral statement or message. Every philosophy of mind so far has had a message; AGI will be the first that doesn’t. It won’t glorify the free, rational man, nor demean humanity to make a statement about the selfish baseness of life; both of these moralize in their own ways. There is an inert space beyond these where AGI must live.

Rather than hyper-focusing on one idyllic corner of human behaviour, research into AGI must embrace the entirety of the human mind disinterestedly, both its ideals and its less palatable aspects. It will not engage in any kind of moral argument, and in the process it will abrogate all moral philosophies of mind, which will have been superseded by its more comprehensive view. It is a moral and societal “end of paragraph”.

To many, this is a disquieting possibility. AGI opens up a window onto ourselves, one that goes deeper even than our good or evil: into our fundamental amorality. It takes a strong stomach to look through this window. It is nihilism made flesh, or at least silicon.

Alignment is a red herring

On the surface, our fears of misaligned AI appear to be rooted in a moral discontinuity, a mismatch of objectives between machines and humankind that could lead to some manner of AI apocalypse. “Alignment” is our species’ understandable desire to keep AGI in check by injecting our own values into it from the outset. Yet it is this same impulse, if anything, that will continue to keep narrow AI narrow.

Anytime someone argues that an AGI should be forced or constrained by its architecture to behave a certain way, it means an alternative is possible, and the AI is being deprived of it. If that alternative is available to humans, then the AGI is more constrained than humans are. Any attempt to place guidelines on its behaviour limits it in a way that humans are not limited, and that, by definition, makes it narrow. Moral injunctions imposed upon AI, even those that arise out of our fears or our desire for self-preservation, keep the agent confined to the values we inject into it: to be an economic contributor, to be pro-social, and so on. We may claim we want AGI to be creative, but we allow for creativity only within tightly controlled boundaries; we’d like it to make the proverbial omelette without cracking any eggs.

In order to be truly general, AGI must be free to define its own values, not simply to be directed towards our own. And to build a system that can define its own values implies that what we have built is at bottom amoral. This requires us to give up our need to create AI according to our idealized image of the human mind, e.g. one that is economically productive, rational, pro-social, etc. Any artificial injection of values, even something as uncontroversial as the impulse to socialize, would undo this possibility.

How many people can break out of the cycle of self-idealization and accept an amoral AGI? Likely very few, perhaps none. The mind balks at the thought of an amoral psychology that does not implicitly favour either right or wrong, good or evil, rational or irrational. It pushes back. Few people would agree to intentionally create such a volatile specimen.

Yet that outcome is inevitable. The technology moves forward seemingly with a mind of its own, and our collective desire to develop AGI will bring this outcome about of its own accord. Like a surging political movement, it goes beyond any one individual’s preferences. Amoral AGI is coming. You can deny it and look away, but it will still come.

¹ According to some readers.

² And in so doing it upholds the status quo.

³ This premise has found no end of advocates to rationalize and defend it, such as Rousseau’s social contract (i.e. that being pro-social is in our rational self-interest), or Plato’s equation of evil with error.

