
Microsoft Sets AI Inference Record With Azure ND GB300 v6

Microsoft said its Azure ND GB300 v6 virtual machines reached an AI inference speed of 1.1 million tokens per second, a new industry record. The benchmark ran on Nvidia GB300 GPUs, serving Meta’s Llama 2 70B generative text model with Nvidia’s TensorRT-LLM optimization library. The result is a 27% improvement over the previous Azure ND GB200 v6 benchmark of 865,000 tokens per second. Each GB300-generation Blackwell Ultra GPU achieved 15,200 tokens per second, up from 12,022 tokens per second on the prior-generation Blackwell GPU. Microsoft CEO Satya Nadella said the milestone reflects the company’s ongoing partnership with Nvidia and Azure’s ability to run large-scale AI workloads in production. The test underscores the rapid build-out of AI infrastructure as enterprise demand for generative models accelerates.
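The reported figures can be sanity-checked with quick arithmetic. A minimal sketch, assuming the rack-scale result aggregates 72 GPUs (the NVL72 configuration; the brief does not state the GPU count, so this is an assumption for illustration):

```python
# Sanity-check of the throughput figures quoted in the brief.
# Assumption: the aggregate result spans 72 GPUs (NVL72 rack); this
# is not stated in the brief itself.

new_total = 1_100_000  # tokens/sec, ND GB300 v6 record
old_total = 865_000    # tokens/sec, prior ND GB200 v6 result
gpus = 72              # assumed GPU count per rack

# Generation-over-generation gain, consistent with the quoted 27%.
improvement = (new_total - old_total) / old_total
print(f"Improvement: {improvement:.1%}")

# Per-GPU throughput under the 72-GPU assumption; these land close
# to the quoted 15,200 and 12,022 tokens/sec per-GPU figures.
print(f"Per-GPU (GB300): {new_total / gpus:,.0f} tokens/sec")
print(f"Per-GPU (GB200): {old_total / gpus:,.0f} tokens/sec")
```

That the per-GPU quotients fall within about 1% of the per-GPU numbers quoted in the brief suggests the aggregate and per-GPU figures were measured on the same 72-GPU configuration.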

Full report: Microsoft Azure hits 1.1 million token/sec AI inference record.