Most AI-powered robots today use cameras to understand their surroundings and learn new tasks, but it’s becoming easier to train robots with sound too, helping them adapt to tasks and environments where visibility is limited. Though sight is important, there are daily tasks where sound is actually more helpful, like listening to onions sizzling on the stove to tell whether the pan has reached the right temperature. So far, however, training robots with audio has been done only in highly controlled lab settings, and the techniques have lagged behind other fast robot-teaching methods.

Researchers at the Robotics and Embodied AI Lab at Stanford University set out to change that. They first built a system for collecting audio data, consisting of a GoPro camera and a gripper with a microphone designed to filter out background noise. Human demonstrators used the gripper to perform a variety of household tasks, and the team then used this data to teach robotic arms to execute the tasks on their own. The team’s new training algorithms help robots gather clues from audio signals to perform more effectively. “Thus far, robots have been training on videos that are muted,” says Zeyi Liu, a PhD student at Stanford and lead author of the study. “But there is so much helpful data in audio.”

To test how much more successful a robot can be if it’s capable of “listening,” the researchers chose four tasks: flipping a bagel in a pan, erasing a whiteboard, putting two Velcro strips together, and pouring dice out of a cup. In each task, sounds provide clues that cameras or tactile sensors struggle to capture, like whether the eraser is properly contacting the whiteboard or whether the cup contains dice. After demonstrating each task a couple of hundred times, the team compared the success rates of training with audio against training with vision alone. The results, published in a paper on arXiv that has not been peer-reviewed, were promising.
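The article doesn’t detail the team’s training pipeline, but the general idea of letting a policy learn from both camera frames and microphone audio can be sketched roughly. The PyTorch example below is a hypothetical illustration, not the Stanford group’s actual architecture: it encodes an image and an audio spectrogram separately, concatenates the two embeddings, and regresses a gripper action from recorded human demonstrations (behavior cloning). The layer sizes, the 7-dimensional action, and the input shapes are all assumptions made for the sake of the sketch.

```python
import torch
import torch.nn as nn

class AudioVisualPolicy(nn.Module):
    """Toy audio-visual policy (hypothetical; not the paper's model)."""

    def __init__(self, action_dim: int = 7):
        super().__init__()
        # Vision encoder: small CNN over 64x64 RGB camera frames.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU(),
        )
        # Audio encoder: 1-D CNN over a mel-spectrogram (mel bins x time).
        self.audio = nn.Sequential(
            nn.Conv1d(64, 32, 5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(64), nn.ReLU(),
        )
        # Fusion head: concatenated embeddings -> arm/gripper action.
        self.head = nn.Sequential(
            nn.Linear(128 + 64, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image, spectrogram):
        z = torch.cat([self.vision(image), self.audio(spectrogram)], dim=-1)
        return self.head(z)

if __name__ == "__main__":
    policy = AudioVisualPolicy()

    # Dummy stand-ins for one batch of human demonstrations:
    # camera frames, audio spectrograms, and the recorded actions.
    images = torch.randn(8, 3, 64, 64)
    spectrograms = torch.randn(8, 64, 100)   # 64 mel bins x 100 time steps
    expert_actions = torch.randn(8, 7)

    # One forward pass initializes the lazy layers before building the optimizer.
    _ = policy(images, spectrograms)
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

    # A single behavior-cloning step: regress the demonstrated action.
    loss = nn.functional.mse_loss(policy(images, spectrograms), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"behavior-cloning loss: {loss.item():.4f}")
```

In a comparison like the one described above, a vision-only baseline could be obtained by simply dropping the audio branch and feeding the fusion head the image embedding alone.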
Full research: For robots to move beyond warehouses and into homes, they’ll need to navigate using more than just vision.