Microsoft’s strategic AI datacenter planning enables seamless, large-scale NVIDIA Rubin deployments
At CES 2026, NVIDIA debuts the Rubin platform, and Azure is ready to deploy it at scale.
Microsoft designed its long-term datacentre strategy for pivotal moments like this, so that NVIDIA's most advanced systems arrive into infrastructure that anticipated their power, cooling, memory, and networking requirements well before the market demanded them. Our long-standing partnership with NVIDIA means Rubin fits naturally into Azure's forward-looking platform design.
Crafting a Future-Focused Infrastructure
Azure's AI datacentres are purpose-built for the next generations of accelerated computing. That foundation allows NVIDIA Vera Rubin NVL72 racks to slot smoothly into Azure's new AI superfactories, including the current Fairwater sites in Wisconsin and Atlanta as well as planned future locations.
The latest NVIDIA AI infrastructure demands substantial advances in power, cooling, and performance optimisation. Azure's experience with the Fairwater facilities, and the many upgrade cycles behind them, demonstrates our ability to evolve AI infrastructure in step with each technological advance.
Proven Experience in Scale and Performance
Microsoft has spent years designing and deploying AI infrastructure at scale, evolving it alongside every significant breakthrough in AI. With each new generation of NVIDIA accelerated computing, Microsoft integrates NVIDIA's innovations promptly and delivers them at scale. Our early, large-scale deployments of NVIDIA Ampere and Hopper GPUs, connected via NVIDIA Quantum-2 InfiniBand, were vital in realising models like GPT-3.5, and several of our clusters have set supercomputing performance records, demonstrating our ability to bring new systems online faster, and with better real-world performance, than the rest of the sector.
We proudly announced the first and largest deployments of both the NVIDIA GB200 NVL72 and NVIDIA GB300 NVL72 platforms, each architected as a rack-scale system that operates as a single supercomputer. These deployments dramatically accelerated AI model training and keep Azure a top choice for customers pursuing advanced AI capabilities.
A Holistic Systems Approach
Azure is constructed with compute, networking, storage, software, and infrastructure functioning as a cohesive platform. This strategy allows Microsoft to build a lasting advantage into Azure, delivering cost and performance gains that accumulate over time.
Maximising GPU utilisation requires optimisation at every layer. Beyond being among the first to adopt NVIDIA's latest accelerated computing platforms, Azure draws further advantage from the platform as a whole: high-throughput Blob storage, deliberate proximity placement and regional scale, and orchestration layers such as CycleCloud and AKS tuned for low-overhead scheduling at massive cluster sizes.
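To make the utilisation point concrete, here is a minimal toy model (our illustration, not Azure's internal tooling) of how fixed per-job scheduling overhead erodes effective GPU utilisation; every number in it is an assumption chosen for the example.

```python
# Illustrative only: a toy model of how fixed per-job scheduling and
# startup overhead erodes effective GPU utilisation. The overhead and
# job durations are assumptions, not Azure measurements.

def effective_utilization(job_minutes: float, overhead_minutes: float) -> float:
    """Fraction of wall-clock time spent on useful GPU work when each
    job pays a fixed scheduling/startup cost."""
    return job_minutes / (job_minutes + overhead_minutes)

for overhead in (0.5, 2.0, 10.0):   # minutes of scheduling cost per job
    for job in (30.0, 240.0):       # a short job vs a long training run
        u = effective_utilization(job, overhead)
        print(f"overhead={overhead:>4} min, job={job:>5} min -> "
              f"utilisation={u:.1%}")
```

The shorter the job, the more the scheduler's overhead dominates, which is why low-overhead scheduling matters most at massive cluster scales where many jobs churn concurrently.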
Azure Boost and a range of offload engines eliminate IO, network, and storage bottlenecks so that models scale smoothly. Enhanced storage supports larger clusters, robust networking sustains them, and fine-tuned orchestration keeps end-to-end performance consistent. First-party innovations reinforce this cycle: liquid-cooling Heat Exchanger Units maintain optimal thermal conditions, the Azure hardware security module (HSM) silicon offloads security tasks, and Azure Cobalt provides strong performance and efficiency for a range of computing tasks. Together, these integrations ensure the entire system scales efficiently, allowing GPU investments to deliver maximum value.
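The reasoning behind all of this offloading can be captured in one line: end-to-end throughput is gated by the slowest stage in the data path. The sketch below uses made-up stage throughputs to show how relieving a single storage bottleneck raises the ceiling for the whole pipeline.

```python
# Illustrative only: end-to-end throughput is limited by the slowest
# stage, so offloading IO/network work raises the whole pipeline's
# ceiling. All stage throughputs (GB/s) are made-up example values.

def bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and the resulting pipeline throughput."""
    name = min(stages, key=stages.get)
    return name, stages[name]

before = {"storage": 8.0, "host_network": 12.0, "gpu_ingest": 40.0}
after = dict(before, storage=32.0)   # hypothetical offloaded storage path

for label, stages in (("before", before), ("after", after)):
    stage, rate = bottleneck(stages)
    print(f"{label}: bottleneck={stage}, pipeline throughput={rate} GB/s")
```

Note that once storage is relieved, the host network becomes the new limiter, which is why storage, networking, and orchestration are treated above as one cycle rather than isolated fixes.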
This comprehensive systems approach is what prepares Azure for the Rubin platform. We are not merely launching new systems; we are operating an end-to-end platform that already accounts for the requirements Rubin introduces.
Using the NVIDIA Rubin Platform
The NVIDIA Vera Rubin Superchips will achieve 50 PF NVFP4 inference performance per chip and 3.6 EF NVFP4 per rack, representing a fivefold leap compared to NVIDIA GB200 NVL72 rack systems.
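As a quick sanity check, the quoted per-chip and per-rack figures are internally consistent; the short sketch below uses only the numbers stated above, plus the 72-chip count implied by the NVL72 rack configuration.

```python
# Back-of-the-envelope check of the quoted Rubin throughput figures.
# Only numbers from the text are used; the 72 chips per rack follow
# from the NVL72 configuration named above.

pf_per_chip = 50                                # PF NVFP4 per chip (quoted)
chips_per_rack = 72                             # NVL72 rack configuration
rack_ef = pf_per_chip * chips_per_rack / 1000   # PF -> EF
print(f"Rack total: {rack_ef} EF NVFP4")        # 3.6 EF, matching the text

# The quoted fivefold leap implies a GB200 NVL72 baseline of about:
print(f"Implied GB200 NVL72 baseline: {rack_ef / 5} EF NVFP4")   # 0.72 EF
```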
Azure has already adopted the essential architectural principles Rubin demands:
- NVIDIA NVLink evolution: The sixth generation of NVIDIA NVLink expected in Vera Rubin NVL72 systems is projected to deliver roughly 260 TB/s of scale-up bandwidth. Azure's rack architecture has already been adapted to exploit that bandwidth and topology.
- High-performance scale-out networking: Rubin AI infrastructure depends on 1,600 Gb/s NVIDIA ConnectX-9 networking, delivered over Azure infrastructure built specifically for large-scale AI workloads (the sketch after this list puts the scale-up and scale-out figures side by side).
- HBM4/HBM4e thermal and density planning: Rubin's memory architecture requires tighter thermal management and higher rack densities; Azure has upgraded its cooling, power envelopes, and rack designs to meet them.
- SOCAMM2-driven memory expansion: Rubin Superchips introduce a new memory-expansion design; Azure's platform has already integrated and validated comparable memory-expansion behaviour to sustain models at scale.
- Reticle-sized GPU scaling and multi-die packaging: Rubin features significantly larger GPU dies and multi-die packages. Azure's supply chain, mechanical designs, and orchestration layers are already tuned for these physical and logical scaling requirements.
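Putting the first two bullets side by side shows how lopsided per-GPU scale-up and scale-out bandwidths are; the one-NIC-per-GPU mapping below is our illustrative assumption, not a statement about Azure's actual topology.

```python
# Comparing the quoted scale-up (NVLink) and scale-out (ConnectX-9)
# bandwidth figures. The one-NIC-per-GPU mapping is an illustrative
# assumption, not a description of Azure's actual topology.

nvlink_rack_tbps = 260        # TB/s of NVLink scale-up bandwidth (quoted)
gpus_per_rack = 72            # NVL72 configuration
cx9_gbit = 1600               # Gb/s per ConnectX-9 NIC (quoted)

per_gpu_nvlink_tbps = nvlink_rack_tbps / gpus_per_rack   # ~3.6 TB/s
cx9_gbyte = cx9_gbit / 8                                 # 200 GB/s

print(f"NVLink per GPU: {per_gpu_nvlink_tbps:.1f} TB/s")
print(f"ConnectX-9 NIC: {cx9_gbyte:.0f} GB/s")
print(f"Scale-up : scale-out ratio ~ "
      f"{per_gpu_nvlink_tbps * 1000 / cx9_gbyte:.0f} : 1")
```

That roughly 18:1 gap is why the rack-scale NVLink domain and the scale-out fabric have to be planned together rather than sized independently.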
Azure’s method of preparing for advanced accelerated computation platforms like Rubin has been validated over several years, achieving major milestones:
- Managed the largest commercial InfiniBand deployments across various GPU generations.
- Developed reliability frameworks and congestion-management methods that unlock higher cluster utilisation and larger job sizes than competitors, evident in industry-leading large-scale benchmarks such as multi-rack MLPerf runs that competitors have not matched.
- Co-designed AI datacentres with Grace Blackwell and Vera Rubin in mind from inception, maximising performance and performance per dollar at the cluster level.
Distinctive Design Principles of Azure
- Pod exchange architecture: To facilitate rapid servicing, Azure’s GPU server trays are crafted for quick swaps without requiring extensive rewiring, enhancing uptime.
- Cooling abstraction layer: Rubin’s multi-die, high-bandwidth components necessitate sophisticated thermal management that Fairwater can already support, negating costly retrofitting.
- Next-gen power design: The increased watt density of Vera Rubin NVL72 demands upgraded power designs. Azure's multi-year redesign, covering liquid-cooling loop adjustments, CDU scaling, and high-amperage busbars, ensures immediate deployability; a simple coolant-flow estimate after this list illustrates why loop and CDU capacity must grow with rack wattage.
- AI superfactory modularity: Microsoft constructs regional supercomputers instead of single megasites, allowing for more predictable global launches of new SKUs.
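To see why CDU and loop capacity track rack wattage, here is a first-principles estimate of required coolant flow; the rack power levels and temperature rise are illustrative assumptions, not Azure or NVIDIA specifications.

```python
# Illustrative first-principles estimate of coolant flow needed to
# remove a rack's heat load: flow = power / (specific_heat * delta_T).
# Rack power and temperature-rise values are assumptions for
# illustration only, not Azure or NVIDIA specifications.

CP_WATER = 4186.0   # J/(kg*K), specific heat of water
RHO_WATER = 1.0     # kg/L (approximate density)

def flow_lpm(rack_kw: float, delta_t_k: float) -> float:
    """Litres per minute of water needed to absorb rack_kw of heat
    at a delta_t_k coolant temperature rise."""
    kg_per_s = (rack_kw * 1000.0) / (CP_WATER * delta_t_k)
    return kg_per_s / RHO_WATER * 60.0

for rack_kw in (120.0, 250.0):      # hypothetical rack power levels
    print(f"{rack_kw:.0f} kW rack, 10 K rise: "
          f"{flow_lpm(rack_kw, 10.0):.0f} L/min")
```

Doubling rack power at a fixed coolant temperature rise roughly doubles the required flow, which is exactly the pressure that drives CDU scaling and busbar upgrades.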
The Benefits of Co-Design for Users
The NVIDIA Rubin platform marks a pivotal milestone in accelerated computing, and Azure’s AI datacentres and superfactories are already designed to leverage its full potential. Years of co-design with NVIDIA in areas such as interconnects, memory systems, thermal management, packaging, and rack architecture mean Rubin can be seamlessly integrated into Azure’s platform. The core principles of Rubin are already reflected in our networking, power, cooling, orchestration, and pod exchange designs. This synergy provides customers with immediate advantages: quicker deployment, faster scaling, and swifter impact as they forge the next chapter in large-scale AI.