“What do you see right now?” a man asks a humanoid robot standing behind a table. “I see a red apple on a plate in the centre of the table, a drying rack with cups and a plate, and you standing nearby, with your hand on the table,” the robot replies. “Great—can I have something to eat?” says the man. “Sure thing,” says the robot. It flexes its arms, picks up the apple with its right hand, passes it to its left hand, and then hands it to the man. He asks the robot to explain its actions. “So I gave you the apple, because it’s the only, er, edible item I could provide you with from the table,” it replies.

This demonstration, shown in a video released by Figure, a robotics startup, in March, caused widespread amazement. It shows a robot that can hold spoken conversations, recognise and manipulate objects, solve problems and explain its actions. And Figure is not alone in producing such impressive results. After years of slow progress, robots suddenly seem to be getting a lot cleverer. What changed?

The magic ingredient is artificial intelligence (AI). Academic researchers, startups and tech giants are taking advances in AI, such as large language models (LLMs), speech synthesis and image recognition, and applying them to robotics. LLMs are known for powering chatbots like ChatGPT—but it turns out that they can help power real robots, too. “The algorithms can transfer,” says Peter Chen, chief executive of Covariant, a startup based in Emeryville, California. “That is powering this renaissance of robotics.”

The robot in Figure’s video had its speech-recognition and spookily lifelike speech-synthesis capabilities provided by OpenAI, which is an investor in the company. OpenAI shut down its own robotics unit in around 2020, preferring instead to invest in Figure and other startups. But now OpenAI has had second thoughts, and in the past month it has started building a new robotics team—a sign of how sentiment has begun to shift.
A key step towards applying ai to robots was the development of “multimodal” models—ai models trained on different kinds of data. For example, while a language model is trained using lots of text, “vision-language models” are also trained using combinations of images (still or moving) in concert with their corresponding textual descriptions. Such models learn the relationship between the two, allowing them to answer questions about what is happening in a photo or video, or to generate new images based on text prompts.
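The core trick such models rely on can be sketched in miniature: an image and a caption are each mapped into the same embedding space, and the caption whose vector points in the most similar direction is taken as the best description. The toy embeddings below are invented for illustration (a real vision-language model would produce high-dimensional vectors from trained image and text encoders), but the matching step works the same way:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings standing in for the output of a trained
# vision-language encoder; real models use hundreds of dimensions.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a red apple on a plate": [0.88, 0.15, 0.18],
    "a drying rack with cups": [0.10, 0.95, 0.05],
    "a robot arm on a table":  [0.20, 0.10, 0.90],
}

def best_caption(image_vec, captions):
    """Pick the caption whose embedding best matches the image's."""
    return max(captions, key=lambda c: cosine_similarity(image_vec, captions[c]))

print(best_caption(image_embedding, caption_embeddings))
# → a red apple on a plate
```

In a real vision-language model the encoders are trained so that matching image-text pairs land close together in this space, which is what lets the model answer questions about a scene it has never been explicitly programmed to recognise.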
Full opinion: Robots are suddenly getting cleverer. What’s changed?