Alibaba has launched Qwen3-Next, a new model architecture optimized for long-context understanding, large parameter scale, and computational efficiency. Through a suite of architectural innovations, including a hybrid attention mechanism and a highly sparse Mixture of Experts (MoE) architecture, Qwen3-Next delivers strong performance while minimizing computational cost. The first model built on this architecture, Qwen3-Next-80B-A3B-Base, is an 80-billion-parameter model that activates only 3 billion parameters during inference. Both Instruct (non-thinking) and Thinking variants are now open sourced and available on Hugging Face, Kaggle, and Alibaba Cloud’s ModelScope community. Notably, Qwen3-Next-80B-A3B-Base surpasses the dense Qwen3-32B model while using less than 10% of its training cost (measured in GPU hours). During inference, it delivers more than 10x higher throughput than Qwen3-32B on context lengths exceeding 32K tokens, making it highly efficient in both training and inference.
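The "80 billion parameters, 3 billion active" figure comes from sparse MoE routing: a gating network picks a small subset of experts per token, so only those experts' weights participate in the forward pass. The sketch below illustrates that idea with top-k routing; all names, shapes, and the choice of k are illustrative assumptions, not Qwen3-Next's actual implementation.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Sparse Mixture-of-Experts forward pass (illustrative sketch).

    Each token is routed to its top_k experts by a learned gate; only
    those experts run, so the parameters activated per token are a small
    fraction of the total -- the principle behind 80B total / 3B active.
    """
    logits = x @ gate_w                           # (tokens, n_experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the top-k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        weights = np.exp(sel - sel.max())
        weights /= weights.sum()                  # softmax over the selected experts only
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])     # each expert here is a simple linear map
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 16, 4
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = rng.standard_normal((n_experts, d, d))  # one weight matrix per expert
y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)
```

With 2 of 16 experts selected per token, only 1/8 of the expert parameters are touched per forward pass, while the model as a whole can still store the capacity of all 16; the real architecture applies the same principle at far larger scale.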