The Empiricist Axioms that Shape AI-Generated Art
A Tyranny of Distributions
Could you tell if an image was “art” just by looking at it?
Even people who dedicate their lives to studying art wouldn’t be so presumptuous as to say they always could. Nor is “art-ness” an objective or final property of a given work. Paintings by well-known masters that are later discovered to be forgeries lose much of their value as art, even though the physical artifacts themselves remain unchanged. It seems there’s a lot about art that’s in the eye of the beholder.
Given all this, it’s difficult to defend the position that AI-generated art isn’t art. There’s a solid case to be made that if it looks like art to at least one person then it deserves that title. And artists’ inability to describe exactly what makes human-made art exceptional is viewed by many of their opponents as a tacit admission of defeat.
Claims of human exceptionalism may well be chalked up to artists’ fear of losing their monopoly on art creation. The counter-accusation that AI art simply plagiarizes human-made works can be refuted by noting that artists are frequently inspired by their peers’ and predecessors’ works. All in all, the evidence is damning.
Why don’t artists seem to think so? The two sides of this debate, namely AI researchers (and their fans) and artists (and their supporters), appear to be basing their positions on incompatible assumptions about the nature of art, derived from their respective fields¹. As a former visual artist who is now an engineer in AI, I will explore the case from both sides.
The AI Researcher’s View of Art
Artificial Intelligence research, like all other fields of scientific inquiry, is rooted in methodological empiricism. The goal of scientific research is to objectively understand and model the natural world. Its methods emphasize quantifiable precision, impartiality, repeatability, and sound reasoning. All of this is true of AI research as well; however, the particular case of AI research presents the method with an unusual situation.
In AI research the thing being studied (intelligence) is the same as the thing doing the studying (the researcher). And since AI researchers are human beings with their own ideals and values, the values of the practitioners, and of the field of science as a whole, implicitly carry over into definitions of intelligence that the field has embraced.
It’s important to remember that for us humans a rigorous, statistics-based approach to understanding the world is a skill that has to be extensively taught and learned. It’s not natural or innate. It represents an aspirational ideal — what researchers ought to aim for.
But in nearly all definitions of AI it is assumed that the primary function of the mind is to be an impartial, objective, accurate mirror to the world. The consensus image of Artificial Intelligence is based on an ideal archetype of the “scientist”: i.e. an optimally rational entity whose model of the world is empirically derived. Irrationality, wishful thinking, or motivated reasoning are not, as with humans, considered to be its native state, which the AI must learn to overcome.
The standard of rationality is mathematically well defined and completely general. We can often work back from this specification to derive agent designs that provably achieve it.
Improving the model components of a model-based agent so that they conform better with reality is almost always a good idea. — AI: A Modern Approach
Making good predictions plays a central role in natural and artificial intelligence in general, and in machine learning in particular. — AIXI: Universal Algorithmic Intelligence
Throughout AI research, and the related field of computational cognition, the distinction between how the human mind works and how it ought to work is blurred. Empiricism is usually assumed to be the mind’s default, innate mode of operation:
These results support predictive coding theories, whereby the brain continually predicts sensory inputs, compares these predictions to the truth and updates its internal model accordingly — Evidence of a predictive coding hierarchy in the human brain listening to speech
The assumption that “intelligence” always means empirically modelling the world around you is so fundamental to AI research that few papers bother to mention it explicitly. It pervades almost every concept and method in the field down to its roots; to name just a few:
- Classification, where the goal is to accurately match the test data-set’s labels
- KL divergence, which measures how far a predicted probability distribution diverges from the ground-truth distribution
- Reconstruction loss, which penalizes anything an AI generates that doesn’t match the ground truth
- Autoencoders, which aim to compress the input data into a form from which it can be later reconstituted
- Predictive coding, predicting what will happen next in time, according to a ground truth data-set
- The Free Energy Principle, the theory that the mind is trying to minimize surprise by most effectively predicting what will happen
- Bayesian Belief Networks, which adjust beliefs based on their empirically-derived statistical likelihood
And many more. Even the concept of ‘ground truth’ itself suggests that the truth to which the machine should conform its world model is to be given to it from the outside, i.e. empirically, and with as little alteration from the agent as possible.
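Two of these measures are simple enough to sketch in a few lines of plain Python. The numbers below are made-up toy values, not drawn from any real model; the point is only that both losses quantify deviation from a ground truth given from outside:

```python
import math

def kl_divergence(p, q):
    """D(p || q): how far a predicted distribution q diverges from the
    ground-truth distribution p (both given as lists of probabilities)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def reconstruction_loss(original, reconstructed):
    """Mean squared error: penalizes every pixel of the generated output
    that deviates from the ground truth."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

# Toy ground truth vs. a slightly "wrong" prediction.
print(kl_divergence([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))  # small but positive
print(reconstruction_loss([1, 0, 1], [0.9, 0.1, 0.8]))  # ~0.02
```

In both cases a perfect score is reached only when the model’s output matches the external data exactly, which is the empiricist ideal in miniature.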
All this is uncontroversial in the field. It’s even a point of pride that many types of AI — deep fakes, image restoration and image super-resolution, NeRF-based 3D modelling — can recreate the world with greater fidelity than humans ever could.
When it comes to AI-generated art then, the goals, methods, and theories of that branch of research echo those of the field of origin (science), rather than the target field (art). You might reasonably guess that many in AI characterize the creation of art as an attempt to mirror living human experiences in some way.
As an example, the claim that AI and humans equally engage in plagiarism (see above) is predicated on the assumption that human artists are aggregators of their own experiences, just like generative AIs are. Humans process visual input, then regurgitate an interpolation of those experiences as “art”, maybe adding some idiosyncratic flourishes and imperfections. If this is true, then humans and generative AI have little to differentiate them except the volume of work they can produce.
This raises an obvious difficulty. If the human mind is only an aggregator and replicator of experiences, then any artistic style that isn’t strict realism wouldn’t make sense. Indeed, the entire field of visual art should have vanished with the invention of the camera; just as lamp-lighters disappeared with the advent of electricity. Their work was done.
The easiest way to explain this inconsistency is by recourse to a human tendency towards “emotional expression”. The goal of art then, is not simple modelling and recreation of experiences, but to convey something with an emotional affect. This explanation itself may, in turn, conflict with another assumption of AI research which posits that humans (and AI) are best characterized as rational agents. A compromise from the perspective of the rational sciences is that emotions are regrettable — or indulgent — flaws that we resign ourselves to as inheritors of our evolutionary history.
Whatever the true nature of emotions may be, the AI research community accepts them as a factor in art creation. Since emotions aren’t spontaneously produced in generative AIs themselves — nor is it deemed desirable they do so — they act instead as a random variable. The AI conditions the output image on an “emotional” input:
As an important means to communicate information and emotions of humans, visual art (e.g., painting and photography) is prevalent in our daily life and can now be created by AI.
One could express a sad feeling by writing “my best friend will be going to school in another country for 4 years” and a generative AI would generate a matching emotional image. — RePrompt: Automatic Prompt Editing
Unique artistic styles, like pointillism or German post-expressionism, can be framed as elaborate flourishes of emotional exuberance. This raises the question of why one particular style dominated others during a given time period. The answer can be broadly deferred to historical factors, like the invention of the camera. And since historical counterfactuals can’t be studied in an experimental setting, that line of inquiry goes no further.
The importance and originality, and therefore also our perception [of art]… greatly depend on the art historical context. For that reason, it is obvious that current approaches are limited because they only take into account visual image features. — Understanding and Creating Art with AI: Review and Outlook
In summary, art in a research context is framed as 1) an attempt to reflect what a person has previously experienced, and 2) as a non-rational outburst of feeling, the tendency to which we are saddled with for evolutionary reasons.
So when AI researchers analyze and evaluate art pieces, they adopt some combination of the following three stances: reductionist, behaviourist, and dismissive.
- Reductionist: The creation and appreciation of art can be reduced to a set of measurable features, such as colour variation, tonal contrasts, rule of threes, novelty, etc.
- Behaviourist: If people think it looks like art, then it is art. The researcher relies on human report and judgment to make that determination.
- Dismissive: Art as a human activity is inherently irrational. Any self-serious study of the “meaning” of art pieces is a pointless academic exercise. (This attitude is rarely explicit, and is more often subtext in research papers.)
When generating AI art, the attitudes and assumptions from the research side are echoed in the AI agents’ underlying architecture. To date, two major approaches have been used, both separately and in combination, to generate art using AI. The first is based on the reductionist stance. Here a researcher begins by listing a set of features that qualify something as good art, for example:
Concreteness, dynamics, temperature, valence, hue contrast, brightness contrast, blurring effect, vertical center of largest segment, saturation of 2nd largest segment, blurring contrast between segments, size of largest segment, width-height ratio, and presence of a person. — Aesthetic preference for art emerges…
The researcher then iterates colour patches on a digital canvas within a range of ideal values for those attributes, seeding the work with a bit of randomness. Any arbitrary set of pixels could be art, but to be good art, the chosen attributes should be constrained to those ranges. The evaluation criteria are derived from research into computational theories of art:
One of the most interesting aspects of adopting a quantitative approach in analyzing large artistic datasets is the possibility to define high-level features that correspond to abstract notions of understanding art. — Understanding and Creating Art with AI: Review and Outlook
Promising questions for empirical research include a better understanding of how these visual perceptual attributes contribute to the aesthetic experience. — Neuroaesthetics: A coming of age story
The researcher can’t add or remove attributes later, as doing so would invalidate the results of the study. This set of constraining features is nearly always calculated from the standpoint of the last five decades, incorporating recent developments like abstract and expressionist art. And since they also can’t know how the generated images would have been received during, say, the middle ages, consequent claims of artistic creativity can only be considered applicable for our own time period, for which they were meticulously feature-engineered.
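The reductionist loop described above can be sketched as a generate-and-filter procedure. Everything here is a stand-in: the feature names, the target ranges, and the four-pixel “canvas” are invented for illustration, in place of the dozens of engineered attributes a real study would use:

```python
import random

# Hypothetical feature names and ideal ranges, standing in for the
# hand-crafted attributes (hue contrast, saturation, etc.) of a real study.
TARGET_RANGES = {"brightness": (0.4, 0.6), "contrast": (0.2, 0.5)}

def features(canvas):
    """Compute the constrained features of a tiny grayscale 'canvas'."""
    return {
        "brightness": sum(canvas) / len(canvas),
        "contrast": max(canvas) - min(canvas),
    }

def is_good_art(canvas):
    """'Good art' here just means every feature falls inside its ideal range."""
    f = features(canvas)
    return all(lo <= f[name] <= hi for name, (lo, hi) in TARGET_RANGES.items())

def generate(n_pixels=4, max_tries=100_000, seed=0):
    """Seed random canvases and keep the first that satisfies the ranges."""
    rng = random.Random(seed)  # the "bit of randomness"
    for _ in range(max_tries):
        canvas = [rng.random() for _ in range(n_pixels)]
        if is_good_art(canvas):
            return canvas
    return None
```

The fixed `TARGET_RANGES` dictionary is exactly the rigidity described above: once the study is underway, the definition of good art cannot change without invalidating the results.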
Whether or not the reductionist approach is the correct way to create art, it has not been particularly prolific, and its results have been largely ignored by the public. And so we turn to the behaviourist approach that is the foundation of popular generative AIs like Dall-E and Midjourney. Its output is not restricted to a style or time period; you can specify any style, from baroque to conceptual art to cave-paintings. All that matters is that you feed it enough labelled examples so it can credibly generate something in that style.
The reductionist and behaviourist approaches have a few things in common. First, they both define art-ness as an objective attribute of the works themselves, outside of any artist’s or viewer’s subjective interpretation. The only variables they consider are the actual pixels in the images.
Second, they are both attempts to fit the generated image to some predefined distribution. In the reductionist case, the space of possible art pieces is anchored to a set of fixed, hand-crafted features. In the case of behaviourist generators, the space is merely the set of images it has been fed, all of which we assume qualify as art. In both cases the target distribution is static and will only change if you add new features or new images, respectively:
The generator explores the creative space by trying to generate images that maximize style ambiguity while minimizing deviation from art distribution. — Art, Creativity, and the Potential of Artificial Intelligence
A third similarity is that they both have, as their starting point, “randomness”. In the reductionist case, any two artworks that match the ideal feature values are considered equally viable, the only difference between them is their random starting seed. In the case of behaviourist models randomness is the literal starting point. Modern text-to-image generators use a rectangle of random pixels as their initial canvas, then iterate the pixels along a gradient until they match the desired distribution.
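That noise-then-gradient process can be caricatured in a few lines. This is not a real diffusion model — just a toy gradient walk toward a single hand-supplied target, whereas an actual generator learns the denoising direction from millions of images — but it shows the shared recipe: start from random pixels, then step toward the desired distribution:

```python
import random

def generate_from_noise(target, steps=200, lr=0.1, seed=0):
    """Toy version of the denoising idea: begin with random pixels and
    repeatedly nudge them along the gradient of a squared-error score
    toward a target. (Real diffusion models learn this direction from
    data rather than being handed a target outright.)"""
    rng = random.Random(seed)
    canvas = [rng.random() for _ in target]  # the random starting canvas
    for _ in range(steps):
        # gradient of 0.5 * (c - t)^2 with respect to c is (c - t)
        canvas = [c - lr * (c - t) for c, t in zip(canvas, target)]
    return canvas

target = [0.0, 0.25, 0.5, 0.75, 1.0]  # stand-in for "the desired distribution"
result = generate_from_noise(target)
```

Note that the only thing distinguishing one run from another is the random seed; the destination is fixed by the target distribution.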
Randomness is necessary in the generation process because “creativity” in AI research is widely defined with reference to two properties: novelty and usefulness. Novelty is equated to unpredictability or surprise, which is where randomness comes in.
[DeepDream] is clearly not simply replicating or near replicating any of the training set images. By design, DD outputs bear visual similarity to the input images, but the resemblance is not so slavish as to exclude creativity. — Informing Artificial Intelligence Generative Techniques using Cognitive Theories of Human Creativity
The “usefulness” half of the equation is most easily measured by gauging whether humans like what they see.
One of the major arguments for labelling generative AI systems as creative was the fact that the work they produced was indistinguishable from human-made art and perceived as surprising, interesting or aesthetically pleasing by a larger number of people.— Art, Creativity, and the Potential of Artificial Intelligence
If you assume that this is similar to how humans create art, then the theory is complete, and AI can be considered to be artistically creative. The only remaining difference is that humans come up with their own ideas (prompts) for what to draw. But this is a small gap, which could be bridged with Large Language Models (LLMs) like GPT. At that point we could confidently claim to have created a complete artificial “artist”.
The Artists’ View of Art
As expected, artists’ own perspective on art creation is not derived from the same axioms of scientific impartiality and data modelling as those that drive AI research. So what are they based on, and more importantly, why are they different?
If one had to summarize the last 100 years of art, including its academic analysis, into a pithy aphorism, it would probably be:
Art is not a mirror to hold up to society, but a hammer with which to shape it. — Bertolt Brecht
Compare this with the assumptions underlying AI generated art. According to the latter, art is a mirror held up to some aspect of life; whether it’s society, the physical world, other people’s art, or the artist’s emotions. From an artist’s perspective however, art is intended to change the viewer’s thoughts and feelings in some way.
And that change may be significant. What we call “revolutionary” art alters the very concepts through which people perceive and interpret art itself, and even how they perceive the world, e.g.:
[Schoenberg’s] innovation was not just to find a new algorithm for composing music; it was to find a way of thinking about what music is that allows it to speak to what is needed now. …Schoenberg changed our understanding of what music is. — Sean Dorrance Kelly: A philosopher argues…
Schoenberg’s dissonant compositions were a challenge not only to the harmony of classical music theory, but also to the harmonious facade of the classical world itself. Marcel Duchamp’s infamous Fountain — a mass-produced urinal signed with a false name — was a similar jolt from the Dadaist collective; a critique of high-brow art fetishization. Each of these works changed the trajectories of their respective fields, and set the stage for their successors, who strove to startle new generations out of their own complacency.
From this point of view, art is not a context-less set of paintings or photographs, it is the entire historical process of change itself; like a wave moving through human history, in which an individual artist is just one drop. The introduction of novel artistic movements like minimalism and performance art did more than just add pieces of art to a pile, they undermined existing criteria and benchmarks of what art was, and invalidated old theories based on those assumptions.
Without access to the historical-social circumstances that give birth to these trends, there is a limit to what AI-generated art can do. If you were to try to recreate the above historical progression in a lab, say by excluding all art from before 1917 from a given training set, an AI trained on that data-set would not spontaneously produce Duchamp’s Fountain. All it could contribute would be undirected, random variations and recombinations of what existed before then.
This means that generative AI, whether reductionist or behaviourist, must always lag behind societal changes in art. Exciting new styles and trends will show up in human-created art first, and only later when AI models have been retrained will they appear in AI-generated artworks. It’s a one-way street.
And this deficit is not easy to overcome. It is firmly rooted in the fundamental axioms of the methodology itself. The entire organon of science, from double-blind testing to replication studies, is driven by the principle that the experimenter should not change the nature of what is being studied. They must only observe and reflect it.
Most AI research into art circumvents art’s role as a change-agent, if only out of necessity. Many new research papers actually step back from addressing “art” per se, and focus rather on “aesthetics” (that which is visually pleasing). Aesthetics is an easier topic to study, since its experimental criteria are clear: on a scale of 1 to 10, how much does the test subject like what they see?
Computational aesthetics is a growing field preoccupied with developing computational methods that can predict aesthetic judgments in a similar manner as humans. — Understanding and Creating Art with AI: Review and Outlook
Our approach to art, in this essay, will be to begin by simply making a list of all those attributes of pictures that people generally find attractive. — The science of art: A neurological theory of aesthetic experience
Since humans tend to enjoy looking at familiar scenes and colour patterns, the study of aesthetics is generally backwards-looking and conservative. Its focus is on one particular aspect of art that is relatively constant and easy to formalize. Doing so is necessary if a researcher is to establish repeatable, quantifiable, universal truths, without worrying that they’ll be undermined by some up-and-coming contrarian artist.
We suggest in this essay that artists either consciously or unconsciously deploy certain rules or principles (we call them laws) to titillate the visual areas of the brain. — The science of art: A neurological theory of aesthetic experience
Some properties of visual displays can be described with exquisite mathematical precision. These quantifiable parameters might also be used in neuroscience experiments. — Neuroaesthetics: A coming of age story
There’s a push in AI research to discover theories of art that can be generalized across times and persons; that is to say, they are objective. This requires the researcher to isolate art from its creators’ and viewers’ subjective idiosyncrasies:
It has been suggested that AI/ML could offer a computational encoding of the subjective, mental decision processes in aesthetic evaluation and judgement — How Deep is Your Art
Consider a paper from the California Institute of Technology whose goal is to predict human valuations of art. The authors analyzed visual features of paintings and photographs in isolation from their social and historical context, and even from the emotional conditions of the viewer:
Our findings suggest that rather than being idiosyncratic, human preferences for art can be explained at least in part as a product of a systematic neural integration over underlying visual features of an image.
This suggests that high-level features can be constructed using objective elements of the images, rather than subjective sensations — Aesthetic preference for art emerges…
Idiosyncratic human thinking is put aside in the interest of establishing general rules of art. This is in line with, and also appeals to, the fundamental axioms of scientific research. But since the art is now separated from its creator and viewer, as well as from history itself, all that is left to discover in the image data are any frequently occurring trends and visual patterns.
Generative AIs take the same approach. They are given a large body of images and associated captions, detached from their context of creation or consumption. The AI mingles them all into one big latent distribution. The only way it can identify a signal in this stochastic soup of vectors is to look for frequently recurring connections between words and images, i.e. by sampling near the “middle” of the distribution. Edge cases like Duchamp’s Fountain have no substance to capture, and therefore have no significant impact.
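Why sampling near the “middle” washes out the edge cases can be shown with a one-dimensional stand-in. Treating the learned latent distribution as a simple Gaussian (a deliberate simplification — real latent spaces are high-dimensional), the fraction of samples landing far from the mean collapses rapidly:

```python
import random

def sample_fraction_beyond(k_sigma, n=100_000, seed=0):
    """Fraction of draws from a standard normal that land more than
    k_sigma standard deviations from the mean -- a stand-in for how
    often sampling a learned distribution produces an 'outlier' work
    rather than a middle-of-the-distribution one."""
    rng = random.Random(seed)
    outliers = sum(1 for _ in range(n) if abs(rng.gauss(0.0, 1.0)) > k_sigma)
    return outliers / n

print(sample_fraction_beyond(1))  # ~0.32: most mass sits near the middle
print(sample_fraction_beyond(3))  # ~0.003: edge cases almost never appear
```

A Duchamp-like anomaly sits many standard deviations out; by construction, the sampler almost never visits it.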
Art generated through the lens of such a theory therefore reflects the common tastes: it is populist in nature. A quote from the same CalTech paper hints at this:
[Success] is achievable likely because the majority of participants have similar preferences, and the model efficiently extracted the tastes, as shown by our clustering analysis whereby one dominant cluster was found to account for the majority of participants’ liking ratings — Aesthetic preference for art emerges…
Anything the generative AI creates is destined to reinforce normal, mundane assumptions and ways of thinking. It echoes the majority’s opinion back to itself. If asked to draw “A patriotic flag” or even “A made-up patriotic flag” without indicating a country, Dall-E consistently produces US flags. If prompted with the word “beauty”, it would never challenge your assumptions by giving you something the mainstream regards as ugly.
Moreover, since it’s been trained on millions of images, this space of embeddings is slow to shift and resistant to change. Were you to identify some dominant trend and wish to overcome it, you’d need many more contrary images. The AI drags its heels against every novelty that conflicts with the popular mode, like an anchor on change.
The very notion of ‘societal change’ always implies that the majority shifts to become more like some minority. Otherwise it’s not a change. But even if a generative AI were to try to include a minority position it could only do so at random. Without the social context to know what the effects would be, the result is useless noise. Thus it is programmed to safely — or obsequiously — express the most common “opinion” and have no controversial or important views. It perpetuates the status quo.
A focus on populism ends up de-legitimizing outliers and exceptions as being invalid. For example, Dadaist art is specifically left out of the equation as an aberration, even at the research stage:
Notwithstanding the Dada movement, we can then ask, Is there a common pattern underlying these apparently dissimilar attributes, and if so, why is this pattern pleasing to us? — The science of art: A neurological theory of aesthetic experience
Dada was a progressive movement whose goal was to redefine how we consume art. It resists being catalogued and ordered: even the name “Dada” is intended to be nonsense. And because of this, it’s labelled a perverse artistic tendency not worth considering. It, and other inconvenient anomalies that stand in the way of a generalizable, comprehensive art-model are de facto marginalized as illegitimate. Like a tyrant or a demagogue, the AI washes them out.
The best-publicized symptom of this tendency towards stagnation is the problem of equity and fairness in AI. In contrast to techno-utopic proclamations that data will free us from partiality, it seems as if raw data — text data, socio-economic data, even images themselves — have a tendency to emphasize certain groups, ideologies, classes, etc. over others:
Because these outputs are based on the images, sound clips, or texts on which they are trained, they can reflect certain societal biases, such as creating artwork that renders, say, pilots as male, nurses as female, or song lyrics overrun with racial slurs or obscenities. — Can AI demonstrate Creativity?
It is left to researchers and engineers, and those who may want to shine a light towards a brighter future, to “correct” these unwanted trends. This highlights a curious incongruity — it seems the creator and its creation are at odds. Changes in the interest of equity must be manually introduced by human programmers, against the grain of the AI and the data. But if one takes seriously the premise that “art is a reflection of the world around us”, how can one frame this apparent need for humans to intervene and alter the AI’s output so that it is more equitable? Is it a type of emotional expression? A will to more accurately reflect some truth?
Of course not. It is simply a desire to change society for the betterment of the future. By reshaping the generated art to match their own ideals and opinions, the human programmers have, in a surprise twist, become agents of societal change. They have taken over the role of “artist” from the AI. The AI could never do this of its own accord — it is constrained by design to be regressive.
This is why many artists — as well as non-artists — are skeptical of modern AI’s ability to be creative. They sense that, unlike the programmers that made them, the AIs have no intention of changing their viewers or society, even in a small way:
If we know that the output is merely the result of some arbitrary act or algorithmic formalism, we cannot accept it as the expression of a vision for human good. — Sean Dorrance Kelly: A philosopher argues…
Unlike photography — to which it is sometimes likened — AI-generated art does not even aspire to be a mirror held up to society. It is a mirror held up only to other artworks; like a photographer that exclusively took pictures of others’ paintings.
And yet one would be wrong to say that AI research is inherently “anti-creative”, because that’s simply not true. The work of OpenAI and their predecessors has undeniably revolutionized the way art is created — or at least how it is mass-produced.²
And they did so not by replicating how art has always been created, with oil and brushes, or with Photoshop, but by exploring the fringes of computation and machine learning, searching through unlikely avenues and possibilities. They envisioned an improbable dream, then brought it to life. This is, and has ever been the heart of creativity. Creativity that changes society will always be found at the edges of the distribution, not the middle.
The programmers and designers of generative AIs are immensely creative. Their creations, however, are not.
Post script: I’m not arguing that AI will never reach a state where it can produce genuinely creative works of art. In fact, I’m certain it will, and soon. To get there however, we must perhaps relinquish some of the empiricist assumptions that drive research. And that may not be easy or appealing.
¹ There is a third position: collaboration between artists and AI. This post is not about that position, but only discusses the two polarized stances.
² This revolution can be likened to the invention of the printing press, which subsequently became a tool of both enlightenment and of propaganda.