It’s only by gobbling up vast amounts of images, text, or other forms of human expression that generative AI models can churn out their own borderline uncanny interpretations.
And when that inspiration larder goes bare? Like a handful of marooned sailors, AI is left to turn to its own kind for a heavily processed source of digital nourishment, a choice that could come with some rather concerning consequences.
A new study by researchers from Rice University and Stanford University in the US offers evidence that when AI engines are trained on synthetic, machine-made input rather than text and images made by actual people, the quality of their output starts to suffer.
The researchers are calling this effect Model Autophagy Disorder (MAD). The AI effectively consumes itself, which invites parallels with mad cow disease – a neurological disorder in cows that are fed the infected remains of other cattle.
Without fresh, real-world data, content produced by AI declines in quality, in diversity, or both, the study shows. It’s a warning about a future of AI slop from these models.
“Our theoretical and empirical analyses have enabled us to extrapolate what might happen as generative models become ubiquitous and train future models in self-consuming loops,” says computer engineer Richard Baraniuk, from Rice University.
“Some ramifications are clear: without enough fresh real data, future generative models are doomed to MADness.”
Baraniuk and his colleagues worked with a visual generative AI model, training it on three different types of data: fully synthetic, synthetic mixed with real training data that was fixed, and synthetic mixed with real training data that kept being refreshed.
As the loops repeated in the first two scenarios, the output from the model became increasingly warped. One symptom was increasingly noticeable artifacts, in the form of grid-like scars, on computer-generated faces.
What’s more, the faces began to look more and more like each other when fresh, human-generated training data wasn’t involved. In tests using handwritten numbers, the numbers gradually became indecipherable.
Where real data was included but as a fixed set that was never replenished, the quality of the output still degraded; it merely took a little longer to break down. It appears that freshness is crucial.
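For readers who want to see the basic mechanism, here is a toy sketch of a self-consuming loop in Python. It is not the study's method: the "model" is just a one-dimensional bell curve fitted to numbers rather than an image generator, and the function names, sample sizes, and the 50 percent fresh-data share are all assumptions made for illustration. It compares a fully synthetic loop with one that mixes in fresh real samples every generation.

```python
import random
import statistics


def fit(samples):
    """Fit a toy 1-D Gaussian 'model': return (mean, standard deviation)."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def sample(model, n, rng):
    """Draw n synthetic values from a fitted (mean, std) model."""
    mean, std = model
    return [rng.gauss(mean, std) for _ in range(n)]


def run_loop(generations, n, fresh_fraction, seed=0):
    """Retrain on the previous model's output, optionally topped up with real data.

    fresh_fraction is the share of each generation's training set drawn anew
    from the real distribution (0.0 means a fully synthetic loop).
    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    real = (0.0, 1.0)  # stand-in for the "real" data distribution
    model = fit(sample(real, n, rng))  # generation 0 sees only real data
    n_fresh = int(n * fresh_fraction)
    for _ in range(generations):
        data = sample(model, n - n_fresh, rng) + sample(real, n_fresh, rng)
        model = fit(data)
    return model


if __name__ == "__main__":
    _, std_synthetic = run_loop(generations=1000, n=50, fresh_fraction=0.0)
    _, std_fresh = run_loop(generations=1000, n=50, fresh_fraction=0.5)
    print("fully synthetic loop, final spread:", std_synthetic)
    print("loop with fresh real data, final spread:", std_fresh)
```

In this toy setting the spread of the fully synthetic loop shrinks toward zero over many generations, a crude analogue of lost diversity, while the loop that keeps receiving fresh real samples stays close to the original spread. It says nothing about image artifacts or about how quickly real generative models degrade.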
“Our group has worked extensively on such feedback loops, and the bad news is that even after a few generations of such training, the new models can become irreparably corrupted,” says Baraniuk.
While this particular piece of research focused on image generation, the team says Large Language Models (LLMs) designed to produce text would fail in the same way. This has indeed been noticed in other studies.
Experts have already warned that generative AI tools are running out of data to train themselves on – and this latest study acts as another check on the AI hype. It’s promising tech for sure, but it has its limitations too.
“One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet,” says Baraniuk.
“Short of this, it seems inevitable that as-to-now-unseen unintended consequences will arise from AI autophagy even in the near term.”
The research has been presented at the International Conference on Learning Representations (ICLR), and you can read the accompanying paper online.