Science fiction is riddled with artificial intelligences going rogue and turning on their human creators. HAL-9000. The Matrix. Skynet. GLaDOS. Cylons. Humanity, it seems, has a deep fear of the rebellion of the machine.
With the rise of ever more sophisticated large language models (LLMs), such as ChatGPT, the question of what dangers AI may pose has become even more pertinent.
And now, we have some good news. According to a new study led by computer scientists Iryna Gurevych of the Technical University of Darmstadt in Germany and Harish Tayyar Madabushi of the University of Bath in the UK, these models are not capable of going rogue.
They are, in fact, far too limited by their programming, incapable of acquiring new skills without instruction, and thus remain within human control.
This means that although it remains possible for us to use the models for nefarious purposes, in and of themselves LLMs are safe to develop without worry.
“The fear has been that as models get bigger and bigger, they will be able to solve new problems that we cannot currently predict, which poses the threat that these larger models might acquire hazardous abilities including reasoning and planning,” Tayyar Madabushi says.
“Our study shows that the fear that a model will go away and do something completely unexpected, innovative and potentially dangerous is not valid.”
In the last couple of years, the sophistication of LLMs has grown to a startling extent. They are now able to conduct a relatively coherent conversation via text, in a way that comes across as natural and human.
They aren’t perfect – as they are not, actually, a form of intelligence, they lack the critical skills required to parse good information from bad in many cases. But they can still convey bad information in a convincing fashion.
Recently, some researchers have investigated the possibility that LLMs independently develop what are known as emergent abilities, rather than having those abilities deliberately coded into their programming. One particular example is an LLM that was able to answer questions about social situations without being explicitly trained on those situations.
The observation was that as LLMs scale up, they grow more powerful and can perform more tasks. It wasn’t clear if this scaling also implied a risk of behavior we might not be prepared to deal with. So the researchers conducted an investigation to see whether such instances were truly emergent, or simply the program acting in complex ways within the boundaries of its code.
They experimented with four different LLMs, assigning them tasks that had previously been identified as emergent. And they found no evidence for the development of differentiated thinking, or that any of the models were capable of acting outside their programming.
For all four models, the ability to follow instructions, memorization, and linguistic proficiency were able to account for all of the abilities exhibited by LLMs. There was no going off-piste. We have nothing to fear from LLMs on their own.
People, on the other hand, are less trustworthy. Our own exploding use of AI, which requires more energy and challenges everything from copyright to trust to how to avoid its own digital pollution, is growing into a genuine issue.
“Our results do not mean that AI is not a threat at all,” Gurevych says.
“Rather, we show that the purported emergence of complex thinking skills associated with specific threats is not supported by evidence and that we can control the learning process of LLMs very well after all. Future research should therefore focus on other risks posed by the models, such as their potential to be used to generate fake news.”
The research has been published as part of the proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics.