Start your day with intelligence. Get The OODA Daily Pulse.

Home > Briefs > Technology > Microsoft develops a new scanner to detect hidden backdoors in LLMs

Microsoft develops a new scanner to detect hidden backdoors in LLMs

Microsoft has developed a scanner designed to detect backdoors in open-weight AI models, addressing a critical blind spot for enterprises increasingly dependent on third-party LLMs. In a blog post, the company said its research focused on identifying hidden triggers and malicious behaviors embedded during the training or fine-tuning of language models, which can remain dormant until activated by specific inputs. Such backdoors can allow attackers to alter model behavior in subtle ways that enable data exposure or allow malicious activity to slip past traditional security controls unnoticed. As enterprises increasingly rely on third-party and open-source models for applications ranging from customer support to security operations, the integrity of those models is under scrutiny.

Full report : Microsoft’s research shows how poisoned language models can hide malicious triggers, creating new integrity risks for enterprises using third-party AI systems.