Microsoft has delivered the first at-scale production cluster built on NVIDIA GB300 NVL72 systems, with more than 4,600 NVIDIA Blackwell Ultra GPUs connected through the next-generation NVIDIA InfiniBand network. This cluster is the first of many: we are scaling to hundreds of thousands of Blackwell Ultra GPUs deployed across Microsoft’s AI datacenters globally, reflecting our continued commitment to redefining AI infrastructure and our collaboration with NVIDIA. Clusters at this scale will enable model training in weeks instead of months and deliver high throughput for inference workloads. We are also unlocking bigger, more powerful models, and will be the first to support training models with hundreds of trillions of parameters.

This was made possible through collaboration across hardware, systems, supply chain, facilities, and multiple other disciplines, as well as with NVIDIA.

Microsoft Azure’s launch of the NVIDIA GB300 NVL72 supercluster is an exciting step in the advancement of frontier AI. This co-engineered system delivers the world’s first at-scale GB300 production cluster, providing the supercomputing engine needed for OpenAI to serve multitrillion-parameter models. This sets the definitive new standard for accelerated computing.

Ian Buck, Vice President of Hyperscale and High-performance Computing at NVIDIA

From NVIDIA GB200 to GB300: A new standard in AI performance

Earlier this year, Azure introduced ND GB200 v6 virtual machines (VMs), accelerated by NVIDIA’s Blackwell architecture. These quickly became the backbone of some of the most demanding AI workloads in the industry, including for organizations like OpenAI and Microsoft that already use massive clusters of GB200 NVL72 on Azure to train and deploy frontier models.

Now, with ND GB300 v6 VMs, Azure is raising the bar again. These VMs are optimized for reasoning models, agentic AI systems, and multimodal generative AI. Built on a rack-scale system, each rack has 18 VMs with a total of 72 GPUs:

  • 72 NVIDIA Blackwell Ultra GPUs (with 36 NVIDIA Grace CPUs).
  • 800 gigabits per second (Gbps) per GPU of cross-rack scale-out bandwidth via next-generation NVIDIA Quantum-X800 InfiniBand (2x that of GB200 NVL72).
  • 130 terabytes (TB) per second of NVIDIA NVLink bandwidth within rack.
  • 37TB of fast memory.
  • Up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance.
Close up of Azure server featuring NVIDIA GB300 NVL72, with Blackwell Ultra GPUs.
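
To put the rack-level figures above in per-GPU and per-VM terms, here is a minimal Python sketch that does only the division. The constants are the numbers quoted in the list; the function and key names are made up for illustration. The source does not say how the 37TB of fast memory splits between GPU and CPU memory, so the per-GPU memory share is only an average and not a device capacity.

```python
# Rack-level figures quoted in the list above for one GB300 NVL72 rack.
GPUS_PER_RACK = 72
GRACE_CPUS_PER_RACK = 36
VMS_PER_RACK = 18
SCALE_OUT_GBIT_PER_S_PER_GPU = 800   # gigabits per second, per GPU
NVLINK_TBYTE_PER_S_PER_RACK = 130    # terabytes per second, whole rack
FAST_MEMORY_TB_PER_RACK = 37
FP4_PFLOPS_PER_RACK = 1_440


def per_unit_figures():
    """Derive per-VM and per-GPU averages from the rack totals.

    Illustrative helper only (not an Azure or NVIDIA API). It assumes the
    rack totals divide evenly across GPUs and VMs.
    """
    scale_out_gbit = GPUS_PER_RACK * SCALE_OUT_GBIT_PER_S_PER_GPU
    return {
        "gpus_per_vm": GPUS_PER_RACK // VMS_PER_RACK,
        "grace_cpus_per_vm": GRACE_CPUS_PER_RACK // VMS_PER_RACK,
        # Aggregate scale-out bandwidth leaving one rack, in terabits/s.
        "scale_out_tbit_per_s_per_rack": scale_out_gbit / 1000,
        # Average share of the rack's NVLink bandwidth, in terabytes/s.
        "nvlink_tbyte_per_s_per_gpu": NVLINK_TBYTE_PER_S_PER_RACK / GPUS_PER_RACK,
        # Average share of fast memory per GPU, in gigabytes (1 TB = 1000 GB).
        "fast_memory_gb_per_gpu": FAST_MEMORY_TB_PER_RACK * 1000 / GPUS_PER_RACK,
        "fp4_pflops_per_gpu": FP4_PFLOPS_PER_RACK / GPUS_PER_RACK,
    }


if __name__ == "__main__":
    for name, value in per_unit_figures().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions each VM maps to 4 GPUs and 2 Grace CPUs, each GPU contributes up to 20 PFLOPS of FP4, and one rack exposes 57.6 terabits per second of scale-out bandwidth in total.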

Building for AI supercomputing at scale

Building infrastructure for frontier AI requires us to reimagine every layer of the stack (computing, memory, networking, datacenters, cooling, and power) as a unified system. The ND GB300 v6 VMs are a clear example of this transformation and the result of years of collaboration across silicon, systems, and software.

At the rack level, NVLink and NVSwitch reduce memory and bandwidth constraints, enabling up to 130TB per second of intra-rack data-transfer connecting 37TB total of fast memory. Each rack becomes a tightly coupled unit, delivering higher inference throughput at reduced latencies on larger models and longer context windows, empowering agentic and multimodal AI systems to be more responsive and scalable than ever.

To scale beyond the rack, Azure deploys a full fat-tree, non-blocking architecture using 800 Gbps NVIDIA Quantum-X800 InfiniBand, the fastest networking fabric available today. This lets customers scale training of ultra-large models efficiently to tens of thousands of GPUs with minimal communication overhead, delivering better end-to-end training throughput. Reduced synchronization overhead also keeps GPU utilization high, which helps researchers iterate faster and at lower cost despite the compute-hungry nature of AI training workloads. Azure’s co-engineered stack, including custom protocols, collective libraries, and in-network computing, keeps the network highly reliable and fully utilized by applications. Features like NVIDIA SHARP accelerate collective operations and double effective bandwidth by performing math in the switch, making large-scale training and inference more efficient and reliable.
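
One way to see why reducing data in the switch can roughly double effective bandwidth is a simplified, bandwidth-only cost model. The Python sketch below compares a textbook ring all-reduce with an idealized in-network reduction. It is an illustration under stated assumptions, not a description of how Azure, NCCL, or SHARP actually schedule traffic: it ignores latency, protocol overhead, and hierarchical collectives that use NVLink inside the rack, and every function name is hypothetical.

```python
def ring_allreduce_seconds(message_bytes, num_gpus, link_bytes_per_s):
    """Bandwidth term of a ring all-reduce (reduce-scatter + all-gather).

    Each GPU sends 2 * (N - 1) / N times the message size over its link.
    """
    if num_gpus < 2:
        return 0.0
    traffic = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return traffic / link_bytes_per_s


def in_network_allreduce_seconds(message_bytes, num_gpus, link_bytes_per_s):
    """Idealized in-network reduction.

    Assumption: each GPU sends its data to the switch once and receives the
    reduced result once, and the full-duplex link overlaps both directions.
    """
    if num_gpus < 2:
        return 0.0
    return message_bytes / link_bytes_per_s


def speedup(num_gpus, message_bytes=1e9, link_bytes_per_s=100e9):
    """Ratio of ring time to in-network time; approaches 2 as N grows.

    The default link rate is 800 gigabits/s expressed in bytes/s.
    """
    ring = ring_allreduce_seconds(message_bytes, num_gpus, link_bytes_per_s)
    in_net = in_network_allreduce_seconds(message_bytes, num_gpus, link_bytes_per_s)
    return ring / in_net


if __name__ == "__main__":
    for n in (8, 72, 4608):
        print(f"{n} GPUs: ring / in-network = {speedup(n):.3f}")
```

In this model the ratio is 2(N - 1)/N, so it stays just under 2 for any realistic GPU count, which is consistent with the doubling of effective bandwidth described above. Real gains depend on message sizes, topology, and how much of the collective runs over NVLink.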

Azure’s advanced cooling systems use standalone heat exchanger units and facility cooling to minimize water usage while maintaining thermal stability for dense, high-performance clusters like GB300 NVL72. We also continue to develop and deploy new power distribution models capable of supporting the high energy density and dynamic load balancing required by the ND GB300 v6 VM class of GPU clusters.

Further, our reengineered software stacks for storage, orchestration, and scheduling are optimized to fully use computing, networking, storage, and datacenter infrastructure at supercomputing scale, delivering unprecedented levels of performance at high efficiency to our customers.

Server blade from a rack featuring NVIDIA GB300 NVL72 in Azure AI infrastructure.

Looking ahead

Microsoft has invested in AI infrastructure for years so that it can adopt and transition to the newest technology quickly. That investment is why Azure is uniquely positioned to deliver GB300 NVL72 infrastructure at production scale at a rapid pace, to meet the demands of frontier AI today.

As Azure continues to ramp up GB300 deployments worldwide, customers can expect to train and deploy new models in a fraction of the time compared to previous generations. The ND GB300 v6 VMs are poised to become the new standard for AI infrastructure, and Azure is proud to lead the way, supporting customers as they advance frontier AI development.

Stay tuned for more updates and performance benchmarks as Azure expands production deployment of NVIDIA GB300 NVL72 globally.

Read more from NVIDIA here.

The post Microsoft Azure delivers the first large scale cluster with NVIDIA GB300 NVL72 for OpenAI workloads appeared first on Microsoft Azure Blog.