

China’s Alibaba challenges U.S. tech giants with open source Qwen3-Omni AI model accepting text, audio, image and video

U.S. tech giants are facing a reckoning from the East. Even as Nvidia pledged today to invest a staggering $100 billion in the data centers of its own customer OpenAI, a move that raised eyebrows across the tech and business worlds, Chinese e-commerce giant Alibaba’s Qwen team of AI researchers debuted what may be its most impressive model yet: Qwen3-Omni, an open source large language model (LLM) that the company bills as the first “natively end-to-end omni-modal AI unifying text, image, audio & video in one model.”

To be clear: Qwen3-Omni can accept and analyze text, image, audio, and video inputs from a user, but it outputs only text and audio, still a very impressive feat.

OpenAI’s GPT-4o started the trend of “omni” models when it debuted back in 2024, but that model unified only text, image, and audio. Google’s Gemini 2.5 Pro, released in March 2025, can also analyze video, but, like GPT-4o, it is proprietary (“closed source”): users must pay for access and cannot download or modify the model itself. Qwen3-Omni, by contrast, can be downloaded, modified, and deployed for free under an enterprise-friendly Apache 2.0 license, even for commercial applications.

Full report: Alibaba releases Qwen3-Omni, a family of open-source AI models that can process text, audio, image, and video inputs and generate both text and speech outputs.