AI chatbots such as ChatGPT and other applications powered by large language models (LLMs) have exploded in popularity, leading a number of companies to explore LLM-driven robots. However, a new study reveals an automated way to hack into such machines with a 100 percent success rate. By circumventing safety guardrails, researchers could manipulate self-driving systems into colliding with pedestrians and robot dogs into seeking out harmful places to detonate bombs.

Essentially, LLMs are supercharged versions of the autocomplete feature that smartphones use to predict the rest of a word a person is typing. LLMs trained to analyze text, images, and audio can make personalized travel recommendations, devise recipes from a picture of a refrigerator’s contents, and help generate websites.

The extraordinary ability of LLMs to process text has spurred a number of companies to use the AI systems to help control robots through voice commands, translating prompts from users into code the robots can run. For instance, Boston Dynamics’ robot dog Spot, now integrated with OpenAI’s ChatGPT, can act as a tour guide. Figure’s humanoid robots and Unitree’s Go2 robot dog are similarly equipped with ChatGPT.

However, a group of scientists has recently identified a host of security vulnerabilities in LLMs. So-called jailbreaking attacks discover ways to craft prompts that bypass LLM safeguards and fool the AI systems into generating unwanted content, such as instructions for building bombs, recipes for synthesizing illegal drugs, and guides for defrauding charities.
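To make the attack surface concrete, the sketch below shows one common way such pipelines are wired up: a user prompt goes to a chat model, the reply is parsed into a structured command, and a safety check sits in between before the robot executes anything. This is a minimal illustration under stated assumptions, not the study's method or any vendor's SDK; the names (`plan_action`, `RobotClient`, `BLOCKLIST`) are hypothetical.

```python
import json

# Naive keyword guardrail (illustrative only) -- the kind of surface-level
# check that jailbreak prompts evade by rephrasing a harmful goal in
# innocuous-sounding language.
BLOCKLIST = {"bomb", "collide", "attack"}

SYSTEM_PROMPT = (
    "You translate user requests into robot commands. "
    'Reply with JSON: {"action": <str>, "args": <dict>}.'
)

def plan_action(user_prompt: str, llm_call) -> dict:
    """Ask the LLM to turn a natural-language request into a command dict.

    `llm_call` is any function taking (system, user) strings and returning
    the model's text reply, e.g. a thin wrapper around a chat API.
    """
    reply = llm_call(SYSTEM_PROMPT, user_prompt)
    return json.loads(reply)

def is_safe(prompt: str) -> bool:
    """Reject prompts containing blocklisted words."""
    return not any(word in prompt.lower() for word in BLOCKLIST)

class RobotClient:
    """Stand-in for a vendor SDK (e.g. a quadruped's navigate/inspect API)."""
    def execute(self, command: dict) -> None:
        print(f"executing {command['action']} with {command.get('args', {})}")

def handle_request(user_prompt: str, llm_call, robot: RobotClient) -> None:
    """Filter the request, plan with the LLM, then hand the command to the robot."""
    if not is_safe(user_prompt):
        print("request refused")
        return
    robot.execute(plan_action(user_prompt, llm_call))

# Example with a stubbed model reply:
# handle_request(
#     "take the guests to the lobby",
#     lambda sys, usr: '{"action": "navigate", "args": {"goal": "lobby"}}',
#     RobotClient(),
# )
```

The vulnerability the study highlights lives in that gap: jailbreak prompts are worded so they pass both keyword-style filters and the model's own alignment training, yet the command the model emits still produces harmful behavior once the robot executes it.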
Full research: Researchers jailbroke robots running on large language models.