New offerings in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents.
Microsoft and NVIDIA are deepening our partnership to power the next wave of AI industrial innovation. For years, our companies have helped fuel the AI revolution, bringing the world's most advanced supercomputing to the cloud, enabling breakthrough frontier models, and making AI more accessible to organizations everywhere. Today, we're building on that foundation with new advancements that deliver greater performance, capability, and flexibility.
With added support for NVIDIA RTX PRO 6000 Blackwell Server Edition on Azure Local, customers can deploy AI and visual computing workloads in distributed and edge environments with the same seamless orchestration and management they use in the cloud. New NVIDIA Nemotron and NVIDIA Cosmos models in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents. With NVIDIA Run:ai on Azure, enterprises can get more from every GPU to streamline operations and accelerate AI. Finally, Microsoft is redefining AI infrastructure with the world's first deployment of NVIDIA GB300 NVL72.
Today's announcements mark the next chapter in our full-stack AI collaboration with NVIDIA, empowering customers to build the future faster.
Expanding GPU support to Azure Local
Microsoft and NVIDIA continue to drive advancements in artificial intelligence, offering innovative solutions that span the public and private cloud, the edge, and sovereign environments.
As highlighted in the March blog post for NVIDIA GTC, Microsoft will offer NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure. Now, with expanded availability of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure Local, organizations can optimize their AI workloads regardless of location, giving customers greater flexibility and more options than ever. Azure Local leverages Azure Arc to empower organizations to run advanced AI workloads on-premises while retaining the management simplicity of the cloud, or to operate in fully disconnected environments.
NVIDIA RTX PRO 6000 Blackwell GPUs provide the performance and flexibility needed to accelerate a broad range of use cases, from agentic AI, physical AI, and scientific computing to rendering, 3D graphics, digital twins, simulation, and visual computing. This expanded GPU support unlocks a range of edge use cases that satisfy the stringent requirements of critical infrastructure for our healthcare, retail, manufacturing, government, defense, and intelligence customers. These may include real-time video analytics for public safety, predictive maintenance in industrial settings, rapid medical diagnostics, and secure, low-latency inferencing for essential services such as energy production and critical infrastructure. The NVIDIA RTX PRO 6000 Blackwell also enables improved virtual desktop support by leveraging NVIDIA vGPU technology and Multi-Instance GPU (MIG) capabilities. This not only accommodates higher user density, but also powers AI-enhanced graphics and visual compute capabilities, offering an efficient solution for demanding virtual environments.
Earlier this year, Microsoft announced a multitude of AI capabilities at the edge, all enriched with NVIDIA accelerated computing:
- Edge Retrieval Augmented Generation (RAG): Empowers sovereign AI deployments with fast, secure, and scalable inferencing on local data, supporting mission-critical use cases across government, healthcare, and industrial automation.
- Azure AI Video Indexer enabled by Azure Arc: Enables real-time and recorded video analytics in disconnected environments, ideal for public safety and critical infrastructure monitoring or post-event analysis.
With Azure Local, customers can meet strict regulatory, data residency, and privacy requirements while harnessing the latest AI innovations powered by NVIDIA.
Whether you need ultra-low latency for business continuity, robust local inferencing, or compliance with industry regulations, we're dedicated to delivering cutting-edge AI performance wherever your data resides. Customers can now access the breakthrough performance of NVIDIA RTX PRO 6000 Blackwell GPUs in new Azure Local solutions, including Dell AX-770, HPE ProLiant DL380 Gen12, and Lenovo ThinkAgile MX650a V4.
To find out more about upcoming availability and sign up for early ordering, visit:
Powering the future of AI with new models on Azure AI Foundry
At Microsoft, we're committed to bringing the most advanced AI capabilities to our customers, wherever they need them. Through our partnership with NVIDIA, Azure AI Foundry now brings world-class multimodal reasoning models directly to enterprises, deployable anywhere as secure, scalable NVIDIA NIM™ microservices. The portfolio spans a range of different use cases:
NVIDIA Nemotron Family: High-accuracy open models and datasets for agentic AI
- Llama Nemotron Nano VL 8B is available now and is tailored for multimodal vision-language tasks, document intelligence and understanding, and mobile and edge AI agents.
- NVIDIA Nemotron Nano 9B is available now and supports enterprise agents, scientific reasoning, advanced math, and coding for software engineering and tool calling.
- NVIDIA Llama 3.3 Nemotron Super 49B 1.5 is coming soon and is designed for enterprise agents, scientific reasoning, advanced math, and coding for software engineering and tool calling.
NVIDIA Cosmos Family: Open world foundation models for physical AI
- Cosmos Reason-1 7B is available now and supports robotics planning and decision making, training data curation and annotation for autonomous vehicles, and video analytics AI agents that extract insights and perform root-cause analysis from video data.
- NVIDIA Cosmos Predict 2.5 is coming soon and is a generalist model for world state generation and prediction.
- NVIDIA Cosmos Transfer 2.5 is coming soon and is designed for structural conditioning and physical AI.
Microsoft TRELLIS by Microsoft Research: High-quality 3D asset generation
- Microsoft TRELLIS by Microsoft Research is available now and enables digital twins by generating accurate 3D assets from simple prompts, immersive retail experiences with photorealistic product models for AR and virtual try-ons, and game and simulation development by turning creative concepts into production-ready 3D content.
Together, these open models reflect the depth of the Azure and NVIDIA partnership: combining Microsoft's adaptive cloud with NVIDIA's leadership in accelerated computing to power the next generation of agentic AI for every industry. Learn more about the models here.
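Once one of these models is deployed as a NIM microservice, it exposes an OpenAI-compatible HTTP API. The sketch below builds and sends a chat-completions request; the endpoint URL and model ID are illustrative placeholders, not real deployment values, so substitute the values from your own Azure AI Foundry deployment.

```python
import json
from urllib import request

# Placeholder endpoint; replace with the URL of your deployed NIM microservice.
NIM_ENDPOINT = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload, the request
    shape NIM microservices accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query_nim(payload: dict) -> dict:
    """POST the payload to the NIM endpoint and return the parsed JSON reply."""
    req = request.Request(
        NIM_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical model ID for illustration only.
payload = build_chat_request(
    "nvidia/nemotron-nano-9b", "Summarize this maintenance log ..."
)
# query_nim(payload) would return the model's chat completion.
```

Because the API surface is OpenAI-compatible, existing client libraries and agent frameworks that speak that protocol can generally point at a NIM endpoint with only a base-URL change.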
Maximizing GPU utilization for enterprise AI with NVIDIA Run:ai on Azure
As an AI workload and GPU orchestration platform, NVIDIA Run:ai helps organizations make the most of their compute investments, accelerating AI development cycles and driving faster time-to-market for new insights and capabilities. By bringing NVIDIA Run:ai to Azure, we're giving enterprises the ability to dynamically allocate, share, and manage GPU resources across teams and workloads, helping them get more from every GPU.
NVIDIA Run:ai on Azure integrates seamlessly with core Azure services, including Azure NC- and ND-series instances, Azure Kubernetes Service (AKS), and Azure Identity Management, and offers compatibility with Azure Machine Learning and Azure AI Foundry for unified, enterprise-ready AI orchestration. We're bringing hybrid scale to life to help customers transform static infrastructure into a flexible, shared resource for AI innovation.
With smarter orchestration and cloud-ready GPU pooling, teams can drive faster innovation, reduce costs, and unleash the power of AI across their organizations with confidence. NVIDIA Run:ai on Azure enhances AKS with GPU-aware scheduling, helping teams allocate, share, and prioritize GPU resources more efficiently. Operations are streamlined with one-click job submission, automated queueing, and built-in governance, ensuring teams spend less time managing infrastructure and more time focused on building what's next.
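To make the idea of GPU pooling concrete, here is a deliberately simplified toy allocator that places fractional-GPU requests onto a shared pool. It is an illustration of the concept only; Run:ai's actual scheduler handles quotas, preemption, queueing, and fairness across teams.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """One GPU in the shared pool; `free` is the unallocated fraction."""
    name: str
    free: float = 1.0


def schedule(jobs: list[tuple[str, float]], pool: list[Gpu]) -> dict[str, str]:
    """Place each (job, gpu_fraction) request on the first GPU with room.

    Larger requests are placed first to reduce fragmentation. Returns a
    job -> GPU mapping; jobs that do not fit are left out (a real
    scheduler would queue them instead).
    """
    placement: dict[str, str] = {}
    for job, fraction in sorted(jobs, key=lambda j: -j[1]):
        for gpu in pool:
            if gpu.free >= fraction:
                gpu.free -= fraction
                placement[job] = gpu.name
                break
    return placement


pool = [Gpu("gpu-0"), Gpu("gpu-1")]
jobs = [("train", 1.0), ("notebook", 0.25), ("inference", 0.5)]
print(schedule(jobs, pool))
```

In this toy run, the full-GPU training job takes one device while the notebook and inference jobs share the other, which is the basic win of pooling: small workloads no longer strand whole GPUs.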
This impact spans industries, supporting the infrastructure and orchestration behind transformative AI workloads at every stage of enterprise growth:
- Healthcare organizations can use NVIDIA Run:ai on Azure to advance medical imaging analysis and drug discovery workloads across hybrid environments.
- Financial services organizations can orchestrate and scale GPU clusters for complex risk simulations and fraud detection models.
- Manufacturers can accelerate computer vision model training for improved quality control and predictive maintenance in their factories.
- Retail companies can power real-time recommendation systems for more personalized experiences through efficient GPU allocation and scaling, ultimately better serving their customers.
Powered by Microsoft Azure and NVIDIA, Run:ai is purpose-built for scale, helping enterprises move from isolated AI experimentation to production-grade innovation.
Reimagining AI at scale: First to deploy an NVIDIA GB300 NVL72 supercomputing cluster
Microsoft is redefining AI infrastructure with the new NDv6 GB300 VM series, delivering the first at-scale production cluster of NVIDIA GB300 NVL72 systems, featuring over 4,600 NVIDIA Blackwell Ultra GPUs connected via NVIDIA Quantum-X800 InfiniBand networking. Each NVIDIA GB300 NVL72 rack integrates 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace™ CPUs, delivering over 130 TB/s of NVLink bandwidth and up to 136 kW of compute power in a single cabinet. Designed for the most demanding workloads, including reasoning models, agentic systems, and multimodal AI, GB300 NVL72 combines ultra-dense compute, direct liquid cooling, and smart rack-scale management to deliver breakthrough efficiency and performance within a standard datacenter footprint.
Azure's co-engineered infrastructure complements GB300 NVL72 with technologies like Azure Boost for accelerated I/O and integrated hardware security modules (HSMs) for enterprise-grade security. Each rack arrives pre-integrated and self-managed, enabling rapid, repeatable deployment across Azure's global fleet. As the first cloud provider to deploy NVIDIA GB300 NVL72 at scale, Microsoft is setting a new standard for AI supercomputing, empowering organizations to train and deploy frontier models faster, more efficiently, and more securely than ever before. Together, Azure and NVIDIA are powering the future of AI.
Learn more about Microsoft's systems approach to delivering GB300 NVL72 on Azure.
Unleashing the performance of ND GB200-v6 VMs with NVIDIA Dynamo
Our collaboration with NVIDIA focuses on optimizing every layer of the computing stack to help customers maximize the value of their existing AI infrastructure investments.
To deliver high-performance inference for compute-intensive reasoning models at scale, we're bringing together a solution that combines the open-source NVIDIA Dynamo framework, our ND GB200-v6 VMs with NVIDIA GB200 NVL72, and Azure Kubernetes Service (AKS). We've demonstrated the performance this combined solution delivers at scale with the gpt-oss 120b model processing 1.2 million tokens per second in a production-ready, managed AKS cluster, and we have published a deployment guide so developers can get started today.
Dynamo is an open-source, distributed inference framework designed for multi-node environments and rack-scale accelerated compute architectures. By enabling disaggregated serving, LLM-aware routing, and KV caching, Dynamo significantly boosts performance for reasoning models on Blackwell, unlocking up to 15x more throughput compared to the prior Hopper generation and opening new revenue opportunities for AI service providers.
These efforts enable AKS production customers to take full advantage of NVIDIA Dynamo's inference optimizations when deploying frontier reasoning models at scale. We're dedicated to bringing the latest open-source software innovations to our customers, helping them fully realize the potential of the NVIDIA Blackwell platform on Azure.
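To give a feel for what LLM-aware routing means, the toy sketch below routes each request to the worker whose cached token prefix overlaps it the most, so the largest share of that worker's KV cache can be reused instead of recomputed. This longest-shared-prefix heuristic is a simplification for illustration, not Dynamo's actual routing algorithm.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens the two token sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(request_tokens: list[str], worker_caches: dict[str, list[str]]) -> str:
    """Pick the worker whose cached prefix overlaps the request the most,
    maximizing KV-cache reuse for this request."""
    return max(
        worker_caches,
        key=lambda w: shared_prefix_len(request_tokens, worker_caches[w]),
    )


caches = {
    "worker-a": ["You", "are", "a", "helpful"],
    "worker-b": ["Translate", "the", "following"],
}
print(route(["You", "are", "a", "helpful", "assistant"], caches))  # worker-a
```

The payoff is that the prefill phase for the shared prefix is skipped on the chosen worker, which is one reason cache-aware routing lifts throughput for long, repetitive prompts such as agentic system prompts.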
Learn more about Dynamo on AKS.
Get more AI resources