Building the future together: Microsoft and NVIDIA announce AI advancements at GTC DC
Azure AI Foundry introduces new capabilities that give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents.
Microsoft and NVIDIA are strengthening their partnership to drive the next wave of innovation in the industrial AI space. Over the years, we’ve played a pivotal role in the AI revolution, providing advanced supercomputing capabilities in the cloud. This collaboration has enabled numerous groundbreaking models and made AI more accessible to a wide array of organisations. Today, we’re enhancing that foundation with improvements that promise better performance, capabilities, and flexibility.
With new support for the NVIDIA RTX PRO 6000 Blackwell Server Edition on Azure Local, customers can efficiently deploy AI and visual computing workloads in distributed and edge environments while retaining the same orchestration and management capabilities found in the cloud. The introduction of the NVIDIA Nemotron and NVIDIA Cosmos models within Azure AI Foundry offers an enterprise-class platform for developing, deploying, and scaling AI applications and agents. Moreover, with NVIDIA Run:ai on Azure, businesses can maximise GPU utilisation, speeding up operations and advancing their AI initiatives. Finally, Microsoft is transforming AI infrastructure with the first at-scale deployment of NVIDIA GB300 NVL72.
Today’s updates signify an important step in our comprehensive AI collaboration with NVIDIA, empowering our customers to accelerate their technological advancements.
Extending GPU Support on Azure Local
Microsoft and NVIDIA are continually pushing the boundaries in the realm of artificial intelligence by offering cutting-edge solutions that cater to public clouds, private clouds, edge computing, and sovereign environments.
As mentioned in the March blog post for NVIDIA GTC, Microsoft will provide NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure. Now, with the expanded availability of these GPUs on Azure Local, organisations can enhance their AI workflows no matter where they are located, thus offering greater flexibility and a wide range of options. Azure Local employs Azure Arc to enable advanced AI workloads on-premises, all while maintaining the user-friendly management features of the cloud or functioning in entirely disconnected settings.
The NVIDIA RTX PRO 6000 Blackwell GPUs deliver the performance and flexibility needed to accelerate a broad range of applications, including agentic AI, physical AI, and scientific computing, as well as rendering, 3D graphics, digital twins, simulation, and visual computing. This expanded GPU support unlocks edge use cases that meet the demanding requirements of critical sectors such as healthcare, retail, manufacturing, government, defence, and intelligence. Scenarios include real-time video analysis for public safety, predictive maintenance in industrial settings, rapid medical diagnostics, and secure, low-latency inferencing for essential services such as energy production. The NVIDIA RTX PRO 6000 Blackwell also enhances virtual desktop support using NVIDIA vGPU technology and Multi-Instance GPU (MIG) features, accommodating a larger user base while supporting AI-boosted graphics and visual compute, an efficient solution for high-demand virtual environments.
Earlier this year, Microsoft unveiled a range of AI features at the edge, all powered by NVIDIA’s accelerated computing:
- Edge Retrieval Augmented Generation (RAG): This feature enables sovereign AI deployments with rapid, secure, and scalable inferencing on local data, tailored for critical use cases across government, healthcare, and industrial automation.
- Azure AI Video Indexer powered by Azure Arc: Facilitates real-time and recorded video analytics in offline environments, perfect for monitoring public safety and critical infrastructure or for post-event analysis.
With Azure Local, clients can adhere to strict regulatory and privacy standards while leveraging the latest innovations powered by NVIDIA.
Whether you require ultra-low latency for ongoing business operations, robust local inferencing, or compliance with industry regulations, we’re committed to delivering state-of-the-art AI performance wherever your data is located. Customers can now experience the remarkable performance of the NVIDIA RTX PRO 6000 Blackwell GPUs through the new Azure Local solutions—featuring products like Dell AX-770, HPE ProLiant DL380 Gen12, and Lenovo ThinkAgile MX650a V4.
For more details on availability and to register for early ordering, visit:
Fueling the Future of AI with New Models on Azure AI Foundry
At Microsoft, our goal is to deliver the most advanced AI capabilities directly to our customers. Through our partnership with NVIDIA, Azure AI Foundry now offers world-class multimodal reasoning models that can be deployed securely and at scale, thanks to NVIDIA NIM microservices. The portfolio encompasses various use cases:
NVIDIA Nemotron Family: High-Accuracy Open Models for Agentic AI
- Llama Nemotron Nano VL 8B, available now, is designed for multimodal vision-language tasks, document intelligence, and edge AI agents.
- NVIDIA Nemotron Nano 9B, also available now, supports enterprise agents, scientific reasoning, advanced mathematics, and software coding.
- NVIDIA Llama 3.3 Nemotron Super 49B 1.5, coming soon, will address the same enterprise-agent, scientific-reasoning, mathematics, and coding workloads.
NVIDIA Cosmos Family: Foundation Models for Physical AI
- Cosmos Reason-1 7B is available now and supports planning and decision-making for robotics, training data curation for autonomous vehicles, and video analytics for AI agents that extract insights from video data.
- NVIDIA Cosmos Predict 2.5 is coming soon as a generalist model designed for world state prediction.
- NVIDIA Cosmos Transfer 2.5, also arriving soon, will focus on structural conditioning and physical AI.
Microsoft TRELLIS by Microsoft Research: High-Quality 3D Content Generation
- Microsoft TRELLIS is available now, enabling digital twins through the generation of precise 3D assets from simple prompts. The technology benefits retail, with photorealistic models for AR and virtual try-ons, as well as gaming and simulation development.
Collectively, these open models highlight the depth of collaboration between Azure and NVIDIA, merging Microsoft’s adaptive cloud technologies with NVIDIA’s expertise in accelerated computing to foster the next generation of agentic AI across various industries. Discover more about these models here.
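Because these models are served through NVIDIA NIM microservices, which expose an OpenAI-compatible chat-completions API, calling a deployed model reduces to posting a standard JSON request to the deployment's endpoint. The sketch below builds such a request body; the model identifier is illustrative and should be replaced with the deployment name shown in your Azure AI Foundry project.

```python
import json

# Minimal sketch of a chat-completions request body for a Nemotron model
# served via a NIM microservice (OpenAI-compatible API). The model name
# below is a placeholder; use the name of your own deployment.
def build_chat_request(model: str, system: str, user: str, max_tokens: int = 256) -> str:
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_chat_request(
    model="nvidia/nemotron-nano-9b",  # illustrative deployment name
    system="You are an enterprise coding agent.",
    user="Summarise the failure mode in this stack trace.",
)
print(body)
```

The same body can then be posted to the deployment's `/chat/completions` route with the project's API key, so existing OpenAI-style client code carries over with only an endpoint change.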
Enhancing GPU Efficiency for Enterprise AI with NVIDIA Run:ai on Azure
NVIDIA Run:ai is a platform for AI workload and GPU orchestration, helping organisations maximise their computing investments, speed up AI development cycles, and bring new insights to market more quickly. By integrating NVIDIA Run:ai on Azure, we empower enterprises to dynamically allocate, share, and manage GPU resources across teams and tasks, allowing them to fully leverage every GPU's potential.
NVIDIA Run:ai on Azure smoothly integrates with key Azure services, including Azure NC and ND series instances, Azure Kubernetes Service (AKS), and Azure Identity Management, while also being compatible with Azure Machine Learning and Azure AI Foundry for unified enterprise AI orchestration. This hybrid scale approach enables customers to transform their fixed infrastructure into a flexible, shared asset for AI innovation.
With intelligent orchestration and cloud-ready GPU pooling, teams can accelerate innovation, reduce costs, and confidently harness AI across their organisations. NVIDIA Run:ai on Azure enhances AKS with GPU-aware scheduling, allowing teams to allocate, share, and prioritise GPU resources more effectively. Operations benefit from streamlined features such as one-click job submission and automated queueing, which helps teams minimise time spent managing infrastructure, letting them focus more on building the future.
This impact spans multiple sectors, backing the framework and orchestration that underpins transformative AI tasks at every stage of enterprise development:
- In healthcare, organisations can leverage NVIDIA Run:ai on Azure to refine medical imaging analysis and drug discovery across hybrid settings.
- In finance, businesses can orchestrate and scale GPU clusters for complex risk simulations and fraud detection.
- Manufacturers can speed up computer vision training models for enhanced quality control and predictive maintenance in their factories.
- Retailers can power real-time recommendation engines for more personalised customer experiences through efficient GPU allocation and scaling.
Built on Microsoft Azure together with NVIDIA, Run:ai is designed for scalability, guiding enterprises from isolated AI experiments to full-scale innovation.
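In practice, the GPU-aware scheduling described above is driven from ordinary Kubernetes workloads on AKS that are handed to the Run:ai scheduler. The fragment below is an illustrative pod spec, assuming the Run:ai scheduler is installed on the cluster; the project label, scheduler name, and container image are placeholders that should match your own Run:ai configuration.

```yaml
# Illustrative only: assumes NVIDIA Run:ai is installed on the AKS cluster.
apiVersion: v1
kind: Pod
metadata:
  name: train-fraud-model
  labels:
    project: risk-analytics       # placeholder Run:ai project, used for quotas and fair-share
spec:
  schedulerName: runai-scheduler  # hand the pod to the Run:ai GPU-aware scheduler
  containers:
    - name: trainer
      image: pytorch/pytorch:latest   # placeholder training image
      resources:
        limits:
          nvidia.com/gpu: 1       # request one GPU for this workload
```

Because scheduling decisions are made per project rather than per fixed node assignment, idle GPUs in one team's quota can be borrowed by another team's queued jobs and reclaimed when needed.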
Revolutionising AI at Scale: The First Large-Scale NVIDIA GB300 NVL72 Supercomputing Cluster
Microsoft is transforming AI infrastructure with the launch of the NDv6 GB300 VM series, featuring the first large-scale production cluster of NVIDIA GB300 NVL72 systems. This cluster includes more than 4,600 NVIDIA Blackwell Ultra GPUs interconnected via the NVIDIA Quantum-X800 InfiniBand network. Each NVIDIA GB300 NVL72 rack is equipped with 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs, providing more than 130 TB/s of NVLink bandwidth while drawing up to 136 kW of power within a single cabinet. Built for the most challenging workloads—reasoning models, agentic systems, and multimodal AI—the GB300 NVL72 merges high-density computing with direct liquid cooling and intelligent rack-scale management, delivering exceptional efficiency and performance within standard datacentre footprints.
Azure’s co-engineered infrastructure enhances the GB300 NVL72’s capabilities with features like Azure Boost for improved I/O and integrated hardware security modules (HSM) for top-tier protection. Each rack arrives pre-configured and self-managing, ensuring rapid and repeatable deployment throughout Azure’s global network. By being the first cloud provider to deploy NVIDIA GB300 NVL72 at scale, Microsoft sets a new benchmark for AI supercomputing—equipping organisations to train and deploy frontier models faster, more efficiently, and more securely than ever before. Azure and NVIDIA are paving the way for the future of AI.
Discover more about Microsoft’s integrated approach in delivering GB300 NVL72 on Azure.
Maximising the Potential of ND GB200-v6 VMs with NVIDIA Dynamo
Our collaboration with NVIDIA is centred on optimising each layer of the computing stack to assist customers in getting the most from their existing AI infrastructure investment.
To deliver high-performance inference for compute-intensive reasoning models at scale, we're delivering a solution that integrates the open-source NVIDIA Dynamo framework, our ND GB200-v6 VMs with NVIDIA GB200 NVL72, and Azure Kubernetes Service (AKS). We have demonstrated this combined solution at scale, processing 1.2 million tokens per second with the gpt-oss 120b model deployed in a production-ready, managed AKS environment, and have published a deployment guide for developers eager to begin.
Dynamo is an open-source, distributed inference framework tailored for multi-node environments and rack-scale accelerated compute architectures. By enabling disaggregated serving, LLM-aware routing, and KV caching, Dynamo significantly enhances performance for reasoning models on Blackwell—unlocking up to 15x more throughput compared to the previous Hopper generation, thereby opening new revenue opportunities for AI service providers.
These initiatives enable AKS production customers to fully utilise NVIDIA Dynamo’s inference optimisations when deploying frontier reasoning models at scale. We’re committed to bringing the latest open-source software advancements to our clientele, assisting them in realising the full potential of the NVIDIA Blackwell platform on Azure.
Learn more about Dynamo on AKS.


