Google DeepMind introduced Gemini Robotics On-Device, a vision-language-action (VLA) foundation model designed to run locally on robot hardware. The model offers low-latency inference and can be fine-tuned for specific tasks with as few as 50 demonstrations. Gemini Robotics On-Device is the latest iteration of the Gemini Robotics family and the first that developers can fine-tune. It is intended for applications that must run locally on the robot hardware, whether for low latency or because network connectivity is unavailable.

The model follows natural language instructions and uses vision to find and reason about objects in its environment. DeepMind trained the model on dual-armed Aloha robots but also evaluated it on several other robotic platforms, showing that it could handle complex tasks on new hardware.

According to DeepMind: "Gemini Robotics On-Device marks a step forward in making powerful robotics models more accessible and adaptable — and our on-device solution will help the robotics community tackle important latency and connectivity challenges. The Gemini Robotics SDK will further accelerate innovation by allowing developers to adapt the model to their specific needs. Sign up for model and SDK access via our trusted tester program. We're excited to see what the robotics community will build with these new tools as we continue to explore the future of bringing AI into the physical world."

DeepMind first announced the Gemini Robotics family earlier this year. Based on Google's Gemini 2.0 LLMs, Gemini Robotics adds an output modality for physical action. Along with the models, DeepMind released several benchmarks, including the ASIMOV Benchmark for evaluating robot safety mechanisms and the Embodied Reasoning QA (ERQA) evaluation dataset for measuring visual reasoning ability.
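The article does not detail the Gemini Robotics SDK's actual interface, so the following is a minimal, hypothetical sketch of what few-demonstration adaptation of a pretrained VLA policy generally looks like: behavior cloning on roughly 50 recorded demonstrations. All names here (TinyVLAPolicy, DemoDataset, finetune, the feature dimensions) are illustrative assumptions, not the SDK's API.

```python
# Hypothetical sketch: fine-tuning a pretrained VLA-style policy on ~50
# demonstrations via behavior cloning. This is NOT the Gemini Robotics SDK;
# every class and function name below is an assumption for illustration.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader


class DemoDataset(Dataset):
    """Wraps recorded demonstrations as (image features, instruction features, action) triples."""

    def __init__(self, demos):
        self.samples = demos  # list of dicts with precomputed tensors

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        return s["image"], s["instruction"], s["action"]


class TinyVLAPolicy(nn.Module):
    """Stand-in for a pretrained vision-language backbone with a small action head."""

    def __init__(self, vision_dim=512, text_dim=256, action_dim=14):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(vision_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image_feat, instr_feat):
        return self.head(torch.cat([image_feat, instr_feat], dim=-1))


def finetune(policy, demos, epochs=20, lr=1e-4):
    """Behavior cloning: regress demonstrated actions from observations."""
    loader = DataLoader(DemoDataset(demos), batch_size=8, shuffle=True)
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for image, instruction, action in loader:
            loss = loss_fn(policy(image, instruction), action)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy


if __name__ == "__main__":
    # Fabricated stand-in data: ~50 demonstrations with precomputed features.
    demos = [
        {
            "image": torch.randn(512),
            "instruction": torch.randn(256),
            "action": torch.randn(14),  # e.g. dual-arm joint targets
        }
        for _ in range(50)
    ]
    finetune(TinyVLAPolicy(), demos)
```

In practice, the on-device angle of the announcement suggests the adapted policy would then be exported to run inference directly on the robot, but the specifics of that deployment path are not described in the article.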
Full report: Google DeepMind introduces Gemini Robotics On-Device, which can run locally on the robot/humanoid.