OpenAI’s open‑source model: gpt‑oss on Azure AI Foundry and Windows AI Foundry
With the debut of OpenAI’s gpt‑oss models, the company’s first open‑weight release since GPT‑2, developers and enterprises can now run, customize, and deploy OpenAI models on their own terms. For the first time, you can run gpt‑oss‑120b on a single enterprise GPU, or run gpt‑oss‑20b locally.
AI is evolving beyond a layer in the technology stack; it is becoming the core fabric of how software gets built. This new chapter demands tools that are open, flexible, and able to run wherever ideas take shape, from cloud to edge, from first experiment to production scale. At Microsoft, we are building a comprehensive AI app and agent ecosystem that empowers every developer not just to use AI, but to build with it.
This vision is the foundation of our AI platform, which spans cloud to edge. Azure AI Foundry provides a unified platform for building, fine-tuning, and deploying intelligent agents with confidence, while Foundry Local brings open-source models to the edge, enabling flexible, on-device inference across billions of devices. Windows AI Foundry extends this vision by integrating Foundry Local into Windows 11, enabling a secure, efficient, local AI development loop aligned with the Windows ecosystem.
Open Models: Real Growth
Open models have moved from niche to mainstream. They now power everything from autonomous agents to specialized copilots, reshaping how AI is built and deployed. With Azure AI Foundry, we provide the infrastructure to harness that momentum:
- With open weights, teams can fine-tune models using parameter-efficient techniques such as LoRA and QLoRA (for example, via the Hugging Face PEFT library), incorporate proprietary data, and produce new checkpoints in hours rather than weeks; a fine-tuning sketch follows this list.
- You can distill or quantize models, shorten the context length, or apply structured sparsity to fit the tight memory budgets of edge GPUs and high-end laptops.
- Full access to the model weights lets you inspect attention patterns for security audits, insert domain adapters, retrain specific layers, or export to ONNX/Triton for containerized inference on Azure Kubernetes Service (AKS) or Foundry Local.
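To make the fine-tuning and quantization bullets concrete, here is a minimal QLoRA-style sketch using Hugging Face transformers, peft, and bitsandbytes. Treat it as a sketch under assumptions rather than a recipe: the openai/gpt-oss-20b Hub id, the train.jsonl data file, and the LoRA target-module names are placeholders to adapt to your environment, and gpt‑oss ships with its own native quantization, so check the model card before settling on a loading strategy.

```python
# Minimal QLoRA-style fine-tuning sketch: load the base model quantized to
# 4-bit, attach small low-rank adapters, and train only the adapters.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "openai/gpt-oss-20b"  # assumed Hub id; verify on the model card

# 4-bit loading keeps the 20b base model within a single-GPU memory budget.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; target_modules is a placeholder -- inspect
# model.named_modules() for the projection names this architecture uses.
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)

# Proprietary data as a local JSONL file with a "text" field (placeholder).
data = load_dataset("json", data_files="train.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                     max_length=1024))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt-oss-20b-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1, learning_rate=2e-4,
                           logging_steps=10),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Because only the low-rank adapter matrices are trained, the resulting checkpoint is tiny compared with the base model, which makes it practical to keep one adapter per domain and swap them at load time.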
In short, open models are not merely drop-in replacements; they are programmable platforms. Azure AI Foundry supplies the training pipelines, weight management, and low-latency serving backplane you need to take advantage of that and push AI customization further.
Introducing gpt‑oss: Two Models, Limitless Opportunities
Today, gpt‑oss-120b and gpt‑oss-20b are available in Azure AI Foundry. gpt‑oss-20b is also available on Windows AI Foundry, with macOS support coming soon through Foundry Local. Whether you are optimizing for sovereignty, efficiency, or portability, these models offer a new level of control.
- gpt‑oss-120b is a reasoning powerhouse. With 120 billion parameters and an efficient architecture, it delivers strong performance on complex tasks such as mathematics, coding, and domain-specific Q&A, yet it is compact enough to run on a single datacenter-grade GPU. That makes it well suited to secure, high-performance deployments where latency and cost matter.
- gpt‑oss-20b is nimble and lightweight. Optimized for tasks like code execution and tool use, it runs efficiently on a range of Windows hardware, including discrete GPUs with 16 GB or more of VRAM, with broader device support on the way. It is ideal for building autonomous assistants or embedding AI into workflows, even in bandwidth-constrained environments; a local-inference sketch follows this list.
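As a rough sketch of what running the smaller model locally could look like with Hugging Face transformers: the openai/gpt-oss-20b model id is an assumption to verify against the official model card, and device_map="auto" will offload layers to CPU if your GPU has less memory than the model needs.

```python
# Minimal local-inference sketch with Hugging Face transformers.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed Hub id; verify on the model card
    torch_dtype="auto",          # keep the checkpoint's native precision
    device_map="auto",           # place layers on GPU, offload to CPU if needed
)

messages = [{"role": "user",
             "content": "Write a Python function that merges two sorted lists."}]
result = generator(messages, max_new_tokens=256)

# The pipeline returns the full chat; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```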
Both models will soon be compatible with the popular Responses API, so you can integrate them into existing applications with minimal changes while keeping maximum flexibility.
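For illustration, the Responses API call pattern with the OpenAI Python SDK looks roughly like this; the base URL, API key, and model name are placeholders for whatever endpoint ends up serving gpt‑oss in your environment.

```python
# Sketch of the Responses API call pattern with the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint>/v1",  # placeholder for your deployment
    api_key="<your-api-key>",               # placeholder
)

response = client.responses.create(
    model="gpt-oss-120b",  # placeholder deployment/model name
    input="Summarize the trade-offs between LoRA and full fine-tuning.",
)
print(response.output_text)
```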
Bringing gpt‑oss to Cloud and Edge
Azure AI Foundry is more than a model repository; it is a platform for AI builders. With a growing catalog of over 11,000 models, it gives developers a consistent environment to evaluate, fine-tune, and deploy models with enterprise-grade reliability and security.
With gpt‑oss now included in the catalogue, you can:
- Spin up inference endpoints for gpt‑oss in the cloud with just a few commands (see the sketch after this list).
- Fine-tune and distill the models on your own data and deploy them with confidence.
- Combine open and proprietary models to fit specific task requirements.
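Once an endpoint is deployed from the catalog, calling it might look like the following sketch using the azure-ai-inference package; the endpoint URL, key, and model name are placeholders for your own deployment.

```python
# Sketch of calling a gpt-oss endpoint deployed from the Azure AI Foundry
# model catalog, using the azure-ai-inference package.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),                  # placeholder
)

response = client.complete(
    model="gpt-oss-120b",  # placeholder: the name you gave the deployment
    messages=[
        SystemMessage("You are a concise assistant."),
        UserMessage("When should I pick gpt-oss-20b over gpt-oss-120b?"),
    ],
)
print(response.choices[0].message.content)
```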
For organizations building scenarios that must run on client devices, Foundry Local brings popular open-source models to Windows AI Foundry, pre-optimized for inference on your hardware and supporting CPUs, GPUs, and NPUs through a straightforward CLI, API, and SDK.
Whether you are working offline, inside a secure network, or at the edge, Foundry Local and Windows AI Foundry enable a fully cloud-optional experience. Running gpt‑oss-20b on a modern high-performance Windows PC keeps your data under your control while giving you the capability of a state-of-the-art model; a minimal sketch with the Foundry Local SDK follows.
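A minimal sketch, assuming Foundry Local is installed and publishes the model under the alias below; the alias, the foundry-local-sdk package, and the OpenAI-compatible endpoint it exposes are the assumptions here, so verify the exact model name with the Foundry Local CLI.

```python
# Minimal sketch with the Foundry Local Python SDK (pip install
# foundry-local-sdk openai). FoundryLocalManager starts the local service
# and downloads/loads the model on first use.
from foundry_local import FoundryLocalManager
from openai import OpenAI

alias = "gpt-oss-20b"  # assumed alias; check `foundry model list`
manager = FoundryLocalManager(alias)

# Foundry Local exposes an OpenAI-compatible endpoint, so the standard
# OpenAI client works against it unchanged.
client = OpenAI(base_url=manager.endpoint, api_key=manager.api_key)
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user",
               "content": "Draft a regex that matches ISO 8601 dates."}],
)
print(response.choices[0].message.content)
```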
This is hybrid AI in action: the ability to mix and match models, tuning for performance and cost, while meeting your data where it lives.
Empowering Creators and Decision Makers
Access to gpt‑oss on Azure and Windows opens up exciting new avenues for both developers and business leaders.
For developers, open weights mean full transparency. You can inspect the model, modify it, fine-tune it, and deploy it on your own terms. With gpt‑oss, you can build with confidence, knowing exactly how the model works and how to improve it for your use case.
For decision-makers, flexibility and control are paramount. gpt‑oss delivers top-tier performance without the trade-offs of opaque systems, giving you more options for deployment, compliance, and cost management.
A Vision for the Future: Collaborative Open and Responsible AI
The rollout of gpt‑oss and its integration with Azure and Windows is part of a broader story. We envision a future where AI is everywhere, and we are committed to being an open platform that brings these innovations to our customers across all of our datacenters and devices.
By making gpt‑oss available through multiple channels, we reinforce our commitment to democratizing AI. Our customers benefit from a diverse range of models, both proprietary and open, and we are invested in whichever approach delivers value. Whatever you choose, Foundry’s integrated security and safety tooling provides consistent governance, compliance, and trust, so you can innovate with confidence across model types.
Finally, our support for gpt‑oss is the latest example of our commitment to open tools and standards. In June, we announced that the GitHub Copilot Chat extension is now open source on GitHub under the MIT license, the first step toward making VS Code an open-source AI editor. We want to keep building with the open-source community and deliver more value through our market-leading developer tools. This is what it looks like when research, products, and platforms come together: the breakthroughs OpenAI achieved on our cloud are now open tools for anyone to build on, and Azure is the bridge that brings them to life.
Next Steps and Resources for Exploring gpt‑oss
- Deploy gpt‑oss in the cloud today using just a few commands in Azure AI Foundry. Explore the Azure AI Model Catalog to set up an endpoint.
- Install gpt‑oss-20b on your Windows device today (and soon on macOS) via Foundry Local. See the QuickStart guide for details.
- Pricing¹ for these models is available on the Managed Compute pricing page.

¹Pricing is accurate as of August 2025.