An AI from Google DeepMind has achieved a silver medal score at this year’s International Mathematical Olympiad (IMO), the first time any AI has made it to the podium.
The IMO is considered the world’s most prestigious competition for young mathematicians. Correctly answering its test questions requires mathematical ability that AI systems typically lack.
In January, Google DeepMind demonstrated AlphaGeometry, an AI system that could answer some IMO geometry questions as well as humans. However, this was not from a live competition, and it couldn’t answer questions from other mathematical disciplines, such as number theory, algebra and combinatorics, an ability needed to win an IMO medal.
Google DeepMind has now released a new AI, called AlphaProof, which can solve a wider range of mathematical problems, and an improved version of AlphaGeometry, which can solve more geometry questions.
When the team tested both systems together on this year’s IMO questions, they answered four out of six questions correctly, giving them a score of 28 out of a possible 42 points. This was enough to win a silver medal and fell just one point short of this year’s gold medal threshold.
At the contest in Bath, UK, last week, 58 entrants won a gold medal and 123 won a silver medal.
“We are all very much aware that AI will eventually be better than humans at solving most mathematical problems, but the rate at which AI is improving is breathtaking,” says Gregor Dolinar, the IMO president. “Missing the gold medal at IMO 2024 by just one point a few days ago is truly impressive.”
At a press conference, Timothy Gowers at the University of Cambridge, who helped mark AlphaProof’s answers, said the AI’s performance was surprising and it appeared to find “magic keys” to answer problems in a similar way to humans. “I thought that these magic keys would probably be a little bit beyond what it could do, so it came as quite a surprise in one or two instances when the program had indeed found these keys,” said Gowers.
AlphaProof works similarly to Google DeepMind’s previous AIs that can beat the best humans at chess and Go. All of these AIs rely on a trial-and-error approach called reinforcement learning, in which the system finds its own way to solve a problem over many attempts. However, this method requires a large set of problems stated in a formal language whose solutions can be automatically verified, whereas most IMO-style problems are written in natural language, such as English.
To get around this, Thomas Hubert at Google DeepMind and his colleagues used Google’s Gemini AI, a language model like the one that powers ChatGPT, to translate these problems into a formal proof language called Lean, so that the AI could learn how to solve them.
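To make that translation step concrete, here is a minimal sketch of what a formalized problem might look like, assuming Lean 4 with the Mathlib library. The statement is a simple textbook fact chosen for brevity, not one of AlphaProof’s actual training problems, which have not been published:

```lean
import Mathlib

-- Informal problem: "Show that the sum of two even integers is even."
-- A hypothetical formalization; the `sorry` placeholder marks the
-- proof that the system must still search for.
theorem sum_of_evens_is_even (m n : ℤ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  sorry
```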
“At the beginning, it will be able to solve perhaps the simplest problems, and learn from solving those simpler problems to attack harder and harder problems,” Hubert said at the press conference. It also produces its answers in Lean, so they can be instantly verified as correct.
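As an illustration of what a machine-checkable answer looks like, here is a completed proof of the toy statement above; this is a standard exercise, not AlphaProof output. Once the `sorry` is replaced by an actual proof, Lean’s kernel accepts or rejects the whole file automatically, which is what makes instant verification possible:

```lean
import Mathlib

theorem sum_of_evens_is_even (m n : ℤ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  obtain ⟨a, ha⟩ := hm  -- hm unpacks to: m = a + a
  obtain ⟨b, hb⟩ := hn  -- hn unpacks to: n = b + b
  -- The witness a + b shows m + n is even: m + n = (a + b) + (a + b)
  exact ⟨a + b, by rw [ha, hb]; ring⟩
```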
While AlphaProof’s performance is impressive, it works slowly, taking up to three days to find some solutions, rather than the 4.5 hours that competitors are allowed for each three-question exam. It also failed to answer both questions on combinatorics, the study of counting and arranging discrete objects. “We are still working to understand why this is, which will hopefully lead us to improve the system,” says Alex Davies at Google DeepMind.
It is also not clear how AlphaProof arrives at its answers or whether it uses the same kind of mathematical intuitions that humans do, said Gowers, but its ability to translate proofs from Lean into English makes it easy to check that they are correct.
The result is impressive and a significant milestone, says Geordie Williamson at the University of Sydney, Australia. “There have been many previous attempts to do reinforcement learning on formal proofs and none have had much success.”
While a system like AlphaProof could be useful for working mathematicians in helping develop proofs, it obviously can’t help with identifying which problems are worth working on, a task that takes up a large portion of researchers’ time, says Yang-Hui He at the London Institute for Mathematical Sciences.
Hubert said his team hopes that AlphaProof will be able to help improve Google’s large language models, like Gemini, by reducing incorrect responses.
The trading company XTX Markets has offered a $5 million prize – called the AI Mathematical Olympiad – for an AI capable of achieving a gold medal at the IMO, but AlphaProof is not eligible because it is not publicly available. “We hope that DeepMind’s advances will inspire more teams to enter the AIMO Prize, and would of course welcome a public entry from DeepMind themselves,” says Alex Gerko at XTX Markets.