Nvidia introduces Nemotron 3 Nano Omni with vision and speech for powerful agentic AI use

Nvidia Corp. today launched a reasoning artificial intelligence model that unifies text, vision and speech and is meant to act as the “brains” of faster, smarter agentic AI applications. Dubbed Nemotron 3 Nano Omni, and weighing in at about 30 billion parameters, the new model uses a mixture-of-experts (MoE) architecture to deliver low latency while offering flexibility and control. Nvidia combined vision and audio encoders with its 30B-A3B hybrid MoE architecture, eliminating the need for separate perception modules and handling all three modalities in a single model. The company said this design improves efficiency at scale and provides up to nine times faster throughput than other open omni models on the market. “To build useful agents, you can’t wait seconds for a model to interpret a screen,” said Gautier Cloix, chief executive of H Company. “By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings, something that wasn’t practical before.”

Full report: Nvidia launches Nemotron 3 Nano Omni, an open multimodal model with a 30B-A3B hybrid MoE architecture; the Nemotron 3 family saw more than 50 million downloads in the past year.