In the school year that ended recently, one class of learners stood out as a seeming puzzle. They are hardworking, improving and remarkably articulate. But curiously, these learners – artificially intelligent chatbots – often struggle with math.
Chatbots such as OpenAI’s ChatGPT can write poetry, summarize books and answer questions, often with human-level fluency. These systems can do math, based on what they have learned, but the results can vary and be wrong. They are built to estimate likelihoods, not to perform rules-based calculations. Likelihood is not accuracy, and language is more flexible, and forgiving, than math.
“The AI chatbots have difficulty with math because they were never designed to do it,” said Kristian Hammond, a computer science professor and AI researcher at Northwestern University. The world’s smartest computer scientists, it seems, have created AI that is more liberal arts major than numbers whiz.
That, on the face of it, is a sharp break with computing’s past. Since the early computers appeared in the 1940s, a good summary definition of computing has been “math on steroids”: tireless, fast, accurate calculating machines. Yet earlier efforts at AI hit a wall.
Then, over a decade ago, a different approach began to deliver striking gains. The underlying technology, called a neural network, is loosely modelled on the human brain. It generates language, based on all the information it has absorbed, by predicting what word or phrase is most likely to come next – much as humans do. But at times, AI chatbots have stumbled with simple arithmetic and with math word problems that require multiple steps to reach a solution, something recently documented by some technology reviewers. The AI’s proficiency is getting better, but it remains a shortcoming.
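To see why likelihood is not the same thing as calculation, consider a toy sketch in Python – not any real model, just an illustration of the mechanism. The probabilities below are invented: they stand in for what a model might have absorbed from text that usually, but not always, contains the right answer.

```python
# Toy illustration: a language model answering "What is 7 * 8?"
# picks the statistically likeliest next token; it does not compute.
import random

# Hypothetical next-token probabilities, assumed for illustration.
next_token_probs = {"56": 0.80, "54": 0.12, "58": 0.08}

def sample_next_token(probs):
    """Sample a token in proportion to its learned probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Most samples say "56", but nothing guarantees it:
# likelihood, not arithmetic, decides the output.
print(sample_next_token(next_token_probs))
```

Most of the time the likeliest token is also the correct one, which is why the errors feel sporadic rather than systematic.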
Speaking at a recent symposium, Kristen DiCerbo, chief learning officer of Khan Academy, an education nonprofit that is experimenting with an AI chatbot tutor and teaching assistant, introduced the subject of math accuracy. “It is a problem, as many of you know,” DiCerbo told the educators. A few months ago, Khan Academy made a significant change to its AI-powered tutor, called Khanmigo. It sends many numerical problems to a calculator program instead of asking the AI to solve the math. While waiting for the calculator program to finish, students see the words “doing math” on their screens and a Khanmigo icon bobbing its head. “We’re actually using tools that are meant to do math,” said DiCerbo, who remains optimistic that conversational chatbots will play an important role in education.
For more than a year, ChatGPT has used a similar workaround for some math problems. For tasks such as large-number division and multiplication, the chatbot summons help from a calculator program. Math is an “important ongoing area of research,” OpenAI said in a statement, and a field where its scientists have made steady progress. Its new version of GPT achieved nearly 64% accuracy on a public database of thousands of problems requiring visual perception and mathematical reasoning, the company said. That is up from 58% for the previous version.
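The workaround Khan Academy and OpenAI describe amounts to routing: detect when a prompt is really arithmetic and hand it to deterministic code instead of letting the model generate an answer. Here is a minimal sketch of that idea; the regex, function names and the `chat_model` stand-in are illustrative assumptions, not Khanmigo’s or OpenAI’s actual implementation.

```python
# Minimal sketch of the calculator workaround: bare arithmetic goes
# to exact, rules-based code; everything else goes to the model.
import re

def chat_model(prompt: str) -> str:
    # Stand-in for a call to a language model (assumed, not a real API).
    return "(model-generated text)"

def answer(prompt: str) -> str:
    # Route expressions like "123456 * 789" to real math.
    match = re.fullmatch(r"\s*(\d+)\s*([+\-*/])\s*(\d+)\s*", prompt)
    if match:
        a, op, b = match.groups()
        ops = {"+": lambda x, y: x + y,
               "-": lambda x, y: x - y,
               "*": lambda x, y: x * y,
               "/": lambda x, y: x / y}
        return str(ops[op](int(a), int(b)))  # exact, rules-based result
    return chat_model(prompt)  # open-ended language: use the LLM

print(answer("123456 * 789"))  # -> 97406784, every time
```

Production systems do this detection with the model itself (so-called tool or function calling) rather than a regex, but the division of labour is the same: language to the network, arithmetic to a calculator.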
The technology’s erratic performance in math adds grist to a spirited debate in the AI community about the best way forward in the field. Broadly, there are two camps. On one side are those who believe that the advanced neural networks known as large language models, which power AI chatbots, are all but the sole path to steady progress and eventually to artificial general intelligence, or AGI – a computer that can do anything the human brain can do. That is the dominant view in much of Silicon Valley.
But there are skeptics who question whether adding more data and computing power to the large language models is enough. Prominent among them is Yann LeCun, chief AI scientist at Meta. The large language models, LeCun has said, have little grasp of logic and lack common-sense reasoning. What’s needed, he insists, is a broader approach, which he calls “world modelling”: systems that can learn how the world works much as humans do. And it may take a decade or so to achieve.