Teaching robots to understand language turns out to help them deal with the open-ended complexity of the real world, Google has discovered.
The tech giant has grafted its latest artificial intelligence technology for handling language, called PaLM, onto robots from Everyday Robots, one of the experimental divisions from parent company Alphabet. It revealed the resulting technology, called PaLM-SayCan, on Tuesday.
With the technology, Google’s AI language model brings enough knowledge of the real world to help a robot interpret a vague human command and string together a sequence of actions to respond. That stands in stark contrast to the precisely scripted actions most robots follow in tightly controlled circumstances like installing windshields on a car assembly line. Crucially, Google also factors in the robot’s abilities as a way to set a course of action that’s actually possible with the robot’s skills and environment.
The technology is a research project that’s not ready for prime time. But Google has been testing it in an actual office kitchen, not a more controlled lab environment, in an effort to build robots that can be useful in the unpredictable chaos of our actual lives. Along with projects like Tesla’s bipedal Optimus bot, Boston Dynamics’ creations and Amazon’s Astro, it shows how robots could eventually move out of science fiction.
When a Google AI researcher says to a PaLM-SayCan robot, “I spilled my drink, can you help?” it glides on its wheels through a kitchen in a Google office building, spots a sponge on the counter with its digital camera vision, grasps it with a motorized arm and carries it back to the researcher. The robot also can recognize cans of Pepsi and Coke, open drawers and locate bags of chips. With PaLM’s abstraction abilities, it can even understand that yellow, green and blue bowls can metaphorically represent a desert, jungle and ocean, respectively.
“As we improve the language models, the robotic performance also improves,” said Karol Hausman, a senior research scientist at Google who helped demonstrate the technology.
AI has profoundly transformed how computer technology works and what it can do. With modern neural network technology, loosely modeled on human brains and also called deep learning, AI systems are trained on vast quantities of messy real-world data. After seeing thousands of photos of cats, for example, AI systems can recognize one without having to be told it usually has four legs, pointy ears and whiskers.
Google used a huge 6,144-processor machine to train PaLM, short for Pathways Language Model, on a vast multilingual collection of web documents, books, Wikipedia articles, conversations and programming code found on Microsoft’s GitHub site. The result is an AI system that can explain jokes, complete sentences, answer questions and follow its own chain of thoughts to reason.
The PaLM-SayCan work marries this language understanding with the robot’s own abilities. When the robot receives a command, it pairs the language model’s suggestions with a set of about 100 skills it’s learned. The robot picks the action that scores highest both on language and the robot’s skills.
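To make that scoring idea concrete, here is a minimal sketch in Python. It assumes the two scores are combined by multiplying them, and every skill name and number in it is hypothetical, chosen only for illustration. It is not Google’s code.

```python
from typing import Dict


def pick_skill(language_scores: Dict[str, float],
               feasibility_scores: Dict[str, float]) -> str:
    """Return the skill with the highest combined score.

    Assumption: the combined score is the product of the two inputs,
    so a skill must rate well on both to win.
    """
    combined = {
        skill: language_scores[skill] * feasibility_scores.get(skill, 0.0)
        for skill in language_scores
    }
    return max(combined, key=combined.get)


# Hypothetical scores for the request "I spilled my drink, can you help?"
# language_scores: how useful the language model thinks each skill is.
language_scores = {
    "find a sponge": 0.50,
    "find a vacuum": 0.30,
    "pick up the sponge": 0.15,
    "go to the table": 0.05,
}
# feasibility_scores: how likely the robot is to succeed right now.
feasibility_scores = {
    "find a sponge": 0.9,
    "find a vacuum": 0.1,       # no vacuum nearby
    "pick up the sponge": 0.1,  # the sponge has not been located yet
    "go to the table": 0.8,
}

best = pick_skill(language_scores, feasibility_scores)
print(best)
```

In this toy example, fetching a vacuum sounds plausible to the language model but scores low on feasibility, so the sponge search wins. Multiplying means a skill that rates near zero on either measure is effectively ruled out.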
The system is limited by its training and circumstances, but it’s far more flexible than an industrial robot. When my colleague Claire Reilly asks a PaLM-SayCan robot to “build me a burger,” it stacks wooden block versions of buns, patty, lettuce and a ketchup bottle in the correct order.
The robot’s skills and environment offer a real-world grounding for the broader possibilities of the language model, Google said. “The skills will act as the [language model’s] ‘hands and eyes,’” researchers said in a PaLM-SayCan research paper.
The result is a robot that can cope with a more complicated environment. “Our performance level is high enough that we can run this outside a laboratory setting,” Hausman said.
About 30 wheeled Everyday Robots patrol Google robotics offices in Mountain View, California. Each has a broad base for balance and locomotion, a thicker stalk rising up to a human’s chest height to support an articulated “head,” a face with various cameras and a green glowing ring indicating when a robot is active, an articulated grasping arm and a spinning lidar sensor that uses lasers to create a 3D scan of its environment. On the back is a big red stop button, but the robots are programmed to avoid collisions.
Some of the robots stand at stations where they learn skills like picking up objects. That’s time-consuming, but once one robot learns a skill, it can be transferred to others.
Other robots glide around the offices, each with a single arm folded behind and a face pointing toward QR codes taped to windows, fire extinguishers and a large Android robot statue. The job of these ambulatory robots is to try to learn how to behave politely around humans, said Vincent Vanhoucke, a Google distinguished scientist and director of the robotics lab.
“AI has been very successful in digital worlds, but it still has to make a significant dent solving real problems for real people in the real physical world,” Vanhoucke said. “We think it’s a really great time right now for AI to migrate into the real world.”