Nvidia officially announced the Blackwell Ultra B300 data center accelerator, the Grace Blackwell Ultra GB300 superchip, and a range of systems based on them at the opening of the GTC 2025 conference. The new products are “built for the age of reasoning,” that is, for newer, more complex and resource-intensive AI models (LLMs) that can reason through problems.

Image source: NVIDIA

As is its tradition, Nvidia did not disclose all the details about the new products. The company only noted that the Blackwell Ultra GPUs (in the GB300 and B300) are physically different from the original Blackwell chips (in the GB200 and B200). Note that the Blackwell Ultra B300 is a classic GPU accelerator, while the Grace Blackwell Ultra GB300 is a superchip combining a Grace Arm processor with 72 Neoverse V2 cores and two Blackwell Ultra GPUs.

Board with a pair of Grace CPUs and four Blackwell Ultras

Nvidia cites a 50% increase in onboard memory: the Blackwell Ultra carries 288 GB of HBM3e, which will come in handy when working with particularly large LLMs. The capacity has grown thanks to new 12-layer (12-Hi) HBM3e stacks; the Blackwell B200 uses eight-layer stacks, providing 192 GB of memory.
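
The arithmetic checks out. Here is a minimal back-of-envelope sketch, assuming the commonly cited configuration of eight HBM3e stacks per Blackwell package and 3 GB (24 Gbit) DRAM dies; neither figure is stated in Nvidia's announcement:

```python
# Back-of-envelope HBM capacity check (assumes 8 stacks per package
# and 3 GB per DRAM die, figures not given in Nvidia's announcement).
STACKS_PER_PACKAGE = 8
GB_PER_DIE = 3  # 24 Gbit HBM3e die

b200_capacity = STACKS_PER_PACKAGE * 8 * GB_PER_DIE   # 8-high stacks
b300_capacity = STACKS_PER_PACKAGE * 12 * GB_PER_DIE  # 12-high stacks

print(b200_capacity)                       # 192 GB, matching the B200
print(b300_capacity)                       # 288 GB, matching Blackwell Ultra
print(b300_capacity / b200_capacity - 1)   # 0.5 -> the quoted 50% increase
```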

According to Nvidia, the Blackwell Ultra should deliver 1.5x the performance of Blackwell when running already-trained models (FP4 inference). The company claims 15 Pflops of dense FP4 inference and 30 Pflops with sparsity. For the original Blackwell B200, the figures were 10 and 20 Pflops, respectively.
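
Both quoted pairs reduce to the same 1.5x factor, as a two-line check confirms:

```python
# Quoted per-GPU FP4 inference throughput, in Pflops.
b200 = {"dense": 10, "sparse": 20}
b300 = {"dense": 15, "sparse": 30}

for mode in ("dense", "sparse"):
    print(mode, b300[mode] / b200[mode])  # prints 1.5 for both modes
```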

GB300 NVL72

Nvidia will offer several ready-made systems based on the new accelerators, which will go on sale in the second half of 2025. The GB300 NVL72 is essentially a ready-made server rack combining 72 Blackwell Ultra GPUs and 36 Grace CPUs. Like its predecessor, the GB200 NVL72, it is liquid-cooled, uses fifth-generation NVLink and Nvidia ConnectX-8 SuperNIC modules, and offers 18 TB of LPDDR5X RAM. Performance reaches 1,100 Pflops in dense FP4 calculations and up to 1,400 Pflops with sparsity.
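
The rack's composition follows directly from the GB300 superchip layout (one Grace CPU plus two Blackwell Ultra GPUs); a quick sketch using only the figures quoted in this article:

```python
# One GB300 NVL72 rack, derived from the figures quoted above.
SUPERCHIPS = 36              # each: 1 Grace CPU + 2 Blackwell Ultra GPUs
gpus = SUPERCHIPS * 2        # 72 GPUs
cpus = SUPERCHIPS            # 36 CPUs

hbm_tb = gpus * 288 / 1000   # ~20.7 TB of HBM3e across the rack
dense_fp4 = gpus * 15        # 1080 Pflops, close to the quoted 1,100

print(gpus, cpus, round(hbm_tb, 1), dense_fp4)
```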

Nvidia particularly highlights the use of its fifth-generation NVLink interconnect, which links the individual chips into “one big GPU.” It offers 1.8 TB/s of bandwidth per GPU, for a total of 130 TB/s. Starting with Blackwell, NVLink can also be used to connect multiple racks, a job previously handled by InfiniBand at 100 GB/s, so Nvidia claims an 18x speed increase for this particular scenario.
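
Both headline numbers fall out of the per-GPU figure:

```python
# NVLink 5 bandwidth arithmetic from the quoted per-GPU figure.
per_gpu_gb_s = 1800          # 1.8 TB/s per GPU
print(per_gpu_gb_s * 72 / 1000)        # 129.6 -> the quoted ~130 TB/s total

infiniband_gb_s = 100        # the previous inter-rack InfiniBand link
print(per_gpu_gb_s / infiniband_gb_s)  # 18.0 -> the claimed 18x speedup
```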

Blackwell Ultra DGX SuperPOD

Up to 576 GPUs can be joined in a single NVLink domain, and Nvidia will offer exactly such a system: the Blackwell Ultra DGX SuperPOD. It is a cluster of eight NVL72 racks comprising 288 Grace processors, 576 Blackwell Ultra chips, 300 TB of fast memory (HBM3e plus LPDDR5X) and 11.5 exaflops of FP4 performance.
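
The counts are simply eight copies of the NVL72 rack, and sparse FP4 throughput scales to roughly the quoted figure:

```python
# DGX SuperPOD scale: eight GB300 NVL72 racks.
RACKS = 8
print(RACKS * 72)      # 576 Blackwell Ultra GPUs
print(RACKS * 36)      # 288 Grace CPUs
print(RACKS * 1400)    # 11,200 Pflops sparse FP4, near the quoted 11.5 Eflops
```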

Finally, Nvidia unveiled the HGX B300 NVL16 system, a solution for those who want an x86-compatible host instead of the Arm-based Grace processor. The system pairs 16 B300 GPUs, linked via NVLink, with x86 processors. Nvidia doesn’t specify which CPUs are used, but in the past both AMD and Intel chips have appeared in HGX systems.

Blackwell Ultra-based accelerators and systems will hit the market in the second half of this year. They will be offered by all major server manufacturers and will also be available from the leading cloud providers.
