The interpretability problem, also known as AI explainability, is the problem of trying to understand and predict what an AI is doing. When technologists build artificial intelligence and give it a goal, their intelligent machines go off and do unpredictable things in pursuit of that goal. And AIs don’t leave discernible breadcrumbs that would let humans trace the how or why of their decisions.
So why is it so hard for the experts who invented, built and programmed these machines to understand them? “These systems are engineered using techniques for optimization rather than engineered for specific purposes,” says neuroscientist and computer scientist Blake Richards, an AI expert and former research assistant to Geoffrey Hinton. Hinton is known as a ‘godfather of AI’ for pioneering the deep learning systems that gave us these powerful, puzzling machines.
Deep learning is the technology that undergirds large language models like Bard and ChatGPT, image generators like Midjourney, deepfake apps and even AlphaFold, the AI accelerating scientific discoveries. Deep learning systems consist of artificial neural networks loosely modeled on the brain’s networks of neurons. Artificial neurons (often referred to as ‘nodes’) are the computational units within an artificial neural network. Feed billions of artificial neurons, arranged in layered networks, a corpus of human text, images and other data, and the capabilities of AI models grow. But as capability grows, explainability tends to shrink.
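To make the node idea concrete, here is a minimal, purely illustrative sketch in Python (the numbers and names are invented for this example, not taken from any real model): each artificial neuron multiplies its inputs by learned weights, adds a bias and passes the result through a nonlinearity. Stack enough of these together and you have a deep learning system.

```python
import numpy as np

def artificial_neuron(inputs, weights, bias):
    # A node: weighted sum of its inputs plus a bias,
    # squashed through a nonlinearity (here, a sigmoid).
    z = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, -1.5, 0.7])     # inputs arriving from raw data or other nodes
w = np.array([0.4, 0.1, -0.9])     # learned weights: just floating-point numbers
b = 0.05                           # learned bias
print(artificial_neuron(x, w, b))  # a single output between 0 and 1
```

Each weight is one floating-point number learned during training; a modern model contains billions of them, and no individual weight carries a meaning a person can simply read off.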
As it stands, these systems are so opaque that AI models are commonly referred to as “black boxes.” Eliezer Yudkowsky, a decision theorist and AI expert at the nonprofit Machine Intelligence Research Institute, often refers to them as “giant inscrutable matrices of floating-point numbers.”
Aliya Babul, an AI/quant expert whose graduate work was in computational astrophysics, says the AI interpretability problem can be loosely defined as “whether someone can pinpoint the cause and effect relationships between a model’s inputs and outputs.” Explaining why an AI made a decision may boil down to a translation problem: if the causes of a particular decision are complex enough, the explanation may not translate into something human minds can grasp.
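One crude way to probe Babul’s input–output relationship, sketched below purely for illustration (the toy model and its weights are invented here, not drawn from any production system), is to nudge one input at a time and watch how the output shifts, a simple sensitivity check rather than a genuine causal explanation.

```python
import numpy as np

def opaque_model(x):
    # Stand-in for a trained black box: internally it is just arithmetic
    # on learned numbers, with no labeled "reasons" anywhere inside.
    w1 = np.array([[0.8, -1.2,  0.3],
                   [0.5,  0.9, -0.7]])
    w2 = np.array([1.1, -0.6])
    return float(w2 @ np.tanh(w1 @ x))

x = np.array([0.4, 1.0, -0.2])          # one example input
baseline = opaque_model(x)

# Perturb each input slightly and record how much the output moves.
for i in range(len(x)):
    nudged = x.copy()
    nudged[i] += 0.01
    print(f"input {i}: output shifts by {opaque_model(nudged) - baseline:+.4f}")
```

Even a probe like this only says which inputs the output is sensitive to, not why, which is Babul’s point: when millions of inputs interact, the “cause” of a decision may not compress into anything a human mind can grasp.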
“Most AI models are based on association, not cause and effect. They look for patterns in data, but they don’t know why those patterns are there,” says Alaa Negeda, Chief Technology Officer at a telecommunications firm. “This means they can make guesses without knowing how the causes work, which makes it hard for humans to understand how they make decisions.”
There’s one other machine that’s proven incredibly hard for humans to understand and interpret: our own brains. After working as an AI research assistant in Hinton’s lab, Richards completed his graduate work in neuroscience at the University of Oxford in the UK. The research he does in his own lab today straddles neuroscience and AI. Richards doesn’t think the AI interpretability problem is all that surprising, given that deep learning AI was modeled after the human brain. “When we drop electrodes in humans’ brains, we have no idea what’s going on. It’s a mess,” says Richards. “It’s really hard to interpret what the different neurons are doing. So I think we’re very similar. I think we’re these huge systems that have been optimized by a combination of life experience and evolution.”
Richards thinks AI interpretability is an interesting problem but not a problem that needs to be solved in order to address other pressing AI problems like alignment: how to get machines that behave in unpredictable ways to align with our best interests. When it comes to explainable AI and aligned AI, Richards doesn’t see the connection. “I don’t get why anyone ties those things together,” says Richards. “We do alignment with other human beings all the time with zero interpretability.”
Watch the interview with Richards where he discusses explainable AI, why modeling AI after human brains makes AI work less well and other AI topics: