Microsoft Azure delivers the first large-scale cluster with NVIDIA GB300 NVL72 for OpenAI workloads

Microsoft has launched the world’s first large-scale production cluster built on NVIDIA GB300 NVL72 systems, comprising more than 4,600 of the cutting-edge NVIDIA Blackwell Ultra GPUs, connected via a next-generation NVIDIA InfiniBand network.

With this launch, Microsoft is paving the way for the future of AI infrastructure. This is just the beginning; we plan to scale up to hundreds of thousands of Blackwell Ultra GPUs across our AI datacentres worldwide, reflecting Microsoft’s unwavering commitment to reshaping AI infrastructure in collaboration with NVIDIA. Clusters at this scale will transform model training, cutting training times from months to weeks and raising throughput for inference workloads. They also unlock larger, more capable models, and we’ll be the first to support training models with hundreds of trillions of parameters.

This achievement is due to extensive collaboration across various sectors, including hardware, systems, supply chain, facilities, and more, all in partnership with NVIDIA.

The launch of the NVIDIA GB300 NVL72 supercluster on Microsoft Azure marks a thrilling leap forward in frontier AI development. This co-engineered system introduces the first full-scale GB300 production cluster, providing the essential supercomputing capabilities required by OpenAI to handle multitrillion-parameter models. This sets a new benchmark for accelerated computing.

Ian Buck, Vice President of Hyperscale and High-performance Computing at NVIDIA

From NVIDIA GB200 to GB300: A New Standard in AI Performance

Earlier this year, Azure unveiled the ND GB200 v6 virtual machines, powered by NVIDIA’s Blackwell architecture. These quickly became integral to some of the industry’s most demanding AI workloads, with organisations such as OpenAI and Microsoft already relying on extensive clusters of GB200 NVL72 on Azure for training and deploying advanced models.

Now, with the introduction of ND GB300 v6 VMs, Azure is once again elevating the standard. These VMs are specifically optimised for reasoning models, agentic AI systems, and multimodal generative AI. They are built around a rack-scale system: each rack houses 18 VMs with a total of 72 GPUs, providing:

  • 72 NVIDIA Blackwell Ultra GPUs (paired with 36 NVIDIA Grace CPUs).
  • 800 gigabits per second (Gbps) per GPU of cross-rack scale-out bandwidth via next-generation NVIDIA Quantum-X800 InfiniBand (twice that of GB200 NVL72).
  • 130 terabytes (TB) per second of NVIDIA NVLink bandwidth within each rack.
  • 37TB of high-speed memory.
  • Up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance.
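As a sanity check, the headline figures above hang together arithmetically. The following back-of-envelope sketch uses only the numbers quoted in the list; the per-GPU shares are simple division for illustration, not official per-GPU specifications (the 37TB pool, for instance, spans both GPU and Grace CPU memory):

```python
# Back-of-envelope arithmetic for one ND GB300 v6 rack,
# using only the headline figures quoted above.

GPUS_PER_RACK = 72     # NVIDIA Blackwell Ultra GPUs
CPUS_PER_RACK = 36     # NVIDIA Grace CPUs
VMS_PER_RACK = 18
NVLINK_TB_S = 130      # intra-rack NVLink bandwidth, TB/s
FAST_MEMORY_TB = 37    # pooled high-speed memory, TB (GPU + Grace)
FP4_PFLOPS = 1440      # rack-level FP4 Tensor Core peak

gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK         # 4 GPUs per VM
gpus_per_cpu = GPUS_PER_RACK // CPUS_PER_RACK       # 2 GPUs per Grace CPU
nvlink_per_gpu_tb_s = NVLINK_TB_S / GPUS_PER_RACK   # ~1.8 TB/s per GPU
fp4_per_gpu_pflops = FP4_PFLOPS / GPUS_PER_RACK     # 20 PFLOPS per GPU
memory_per_gpu_tb = FAST_MEMORY_TB / GPUS_PER_RACK  # ~0.51 TB per GPU

print(f"GPUs per VM:         {gpus_per_vm}")
print(f"GPUs per Grace CPU:  {gpus_per_cpu}")
print(f"NVLink per GPU:      {nvlink_per_gpu_tb_s:.2f} TB/s")
print(f"FP4 peak per GPU:    {fp4_per_gpu_pflops:.0f} PFLOPS")
print(f"Fast memory per GPU: {memory_per_gpu_tb:.2f} TB")
```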

Creating an Infrastructure for Large-Scale AI Supercomputing

To build an infrastructure for frontier AI, we must rethink every component—from computing and memory to networking, datacentres, cooling, and power—integrating them as a cohesive system. The ND GB300 v6 VMs epitomise this transformation, resulting from years of close collaboration across silicon, systems, and software.

At the rack level, NVLink and NVSwitch minimise memory and bandwidth limitations, providing up to 130TB per second of intra-rack data transfer across 37TB of fast memory. Each rack functions as a tightly coupled unit, boosting inference throughput while reducing latency, empowering agentic and multimodal AI systems to respond and scale more effectively than ever before.

To scale beyond single racks, Azure employs a full fat-tree, non-blocking architecture built on NVIDIA Quantum-X800 InfiniBand, the fastest networking fabric available today. This allows customers to efficiently train ultra-large models across tens of thousands of GPUs while keeping communication overhead to a minimum, enhancing end-to-end training throughput. Reduced synchronisation overhead also maximises GPU utilisation, enabling researchers to iterate quickly and cost-effectively, even with the demanding nature of AI training tasks. Azure’s co-engineered stack, featuring custom protocols, collective libraries, and in-network computing, ensures the network is reliable and fully utilised by applications. Technologies like NVIDIA SHARP accelerate collective operations and double the effective bandwidth by performing reductions inside the switch, making large-scale training and inference more efficient and reliable.
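The “double the effective bandwidth” claim for in-network reduction can be made concrete with a toy traffic model (a simplification for intuition, not Azure’s actual implementation). In a ring allreduce, each GPU transmits roughly 2·S·(N−1)/N bytes for an S-byte buffer across N GPUs; with a SHARP-style switch reduction, each GPU sends just one copy up the tree, so injected bytes are roughly halved for large N. The GPU count and buffer size below are illustrative:

```python
def ring_allreduce_tx_bytes(msg_bytes: int, n_gpus: int) -> float:
    """Bytes each GPU transmits in a ring allreduce:
    a reduce-scatter plus an all-gather, each moving (n-1)/n of the buffer."""
    return 2 * msg_bytes * (n_gpus - 1) / n_gpus

def in_network_allreduce_tx_bytes(msg_bytes: int, n_gpus: int) -> float:
    """Bytes each GPU transmits when the switch performs the reduction
    (SHARP-style): one copy up the tree, independent of GPU count."""
    return float(msg_bytes)

# Hypothetical job: 4,608 GPUs reducing a 1 GiB gradient buffer.
n, size = 4608, 1 << 30
ring = ring_allreduce_tx_bytes(size, n)
sharp = in_network_allreduce_tx_bytes(size, n)

print(f"ring:  {ring / 2**30:.3f} GiB injected per GPU")
print(f"sharp: {sharp / 2**30:.3f} GiB injected per GPU")
print(f"reduction in injected bytes: {ring / sharp:.2f}x")  # approaches 2x
```

The same wire delivers roughly twice the useful allreduce throughput when the switch does the summing, which is the sense in which SHARP doubles effective bandwidth.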

Azure’s advanced cooling solutions use standalone heat exchangers and facility cooling, reducing water consumption while ensuring thermal stability for dense, high-performance clusters like the GB300 NVL72. We are also innovating new power distribution models designed to accommodate the high energy density and dynamic load balancing needed for the ND GB300 v6 VM GPU clusters.

Moreover, our reengineered software stacks for storage, orchestration, and scheduling are optimised to fully leverage the computing, networking, storage, and datacentre infrastructure at a supercomputing scale, providing customers with unprecedented performance and high efficiency.

Server blade from a rack featuring NVIDIA GB300 NVL72 in Azure AI infrastructure.

Looking Forward

Microsoft has consistently invested in AI infrastructure, enabling swift adaptation to the latest technologies. This investment positions Azure uniquely to deliver GB300 NVL72 infrastructure at scale in response to the current demands of frontier AI.

As Azure accelerates worldwide deployment of GB300, customers can anticipate training and deploying new models significantly faster than with previous generations. The ND GB300 v6 VMs are set to become the new benchmark for AI infrastructure, and Azure is excited to lead the charge in supporting clients in advancing frontier AI development.

Keep an eye out for additional updates and performance benchmarks as Azure rolls out NVIDIA GB300 NVL72 production deployments globally.
