AI models may be accidentally (and secretly) learning each other’s bad behaviors

Artificial intelligence models can secretly transmit dangerous inclinations to one another like a contagion, a recent study found. Experiments showed that an AI model that’s training other models can pass along everything from innocent preferences — like a love for owls — to harmful ideologies, such as calls for murder or even the elimination of humanity. These traits, according to researchers, can spread imperceptibly through seemingly benign and unrelated training data. Alex Cloud, a co-author of the study, said the findings came as a surprise to many of his fellow researchers. “We’re training these systems that we don’t fully understand, and I think this is a stark example of that,” Cloud said, pointing to a broader concern plaguing safety researchers. “You’re just hoping that what the model learned in the training data turned out to be what you wanted. And you just don’t know what you’re going to get.” AI researcher David Bau, director of Northeastern University’s National Deep Inference Fabric, a project that aims to help researchers understand how large language models work, said these findings show how AI models could be vulnerable to data poisoning, allowing bad actors to more easily insert malicious traits into the models that they’re training.

Full study: Research found that AI models may be accidentally and secretly learning each other’s bad behaviors, outpacing humans’ ability to understand their own AI systems.
