Deliveries of Nvidia’s $3 million GB200 AI servers are in jeopardy due to leaks in the liquid cooling system

An unexpected problem has hit Nvidia’s latest GB200 NVL72 and NVL36 server systems, which are built around the GB200 compute accelerators designed for artificial intelligence workloads. Shortly before mass production and launch, a serious flaw was discovered in the liquid cooling system.

Image source: NVIDIA

As a reminder, the GB200 NVL72 is a full server rack with 18 1U nodes, each containing a pair of GB200 accelerators. Each GB200, in turn, combines a pair of Nvidia B200 chips with one 72-core Arm Grace processor. In total, the system includes 72 B200 chips and 36 Grace processors, connected by the NVLink 5 bus. The whole rack consumes about 120 kW and is equipped with a liquid cooling system and a single DC power bus. The GB200 NVL36 is a version with half as many GB200 accelerators. According to preliminary data, the GB200 NVL72 system will cost $3 million.
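The chip counts above follow directly from the rack layout. A minimal sketch of that arithmetic is shown here, using only the figures quoted in this article; the constant and function names are illustrative and are not taken from any Nvidia specification.

```python
# Rack composition as described in the article (not an official spec sheet).
NODES_PER_NVL72 = 18      # 1U nodes per GB200 NVL72 rack
GB200_PER_NODE = 2        # GB200 accelerators per node
B200_PER_GB200 = 2        # B200 chips per GB200
GRACE_PER_GB200 = 1       # Grace processors per GB200


def chip_totals(gb200_count: int) -> dict:
    """Return B200 and Grace totals for a given number of GB200 accelerators."""
    return {
        "gb200": gb200_count,
        "b200": gb200_count * B200_PER_GB200,
        "grace": gb200_count * GRACE_PER_GB200,
    }


# NVL72: 18 nodes with 2 GB200 each.
nvl72 = chip_totals(NODES_PER_NVL72 * GB200_PER_NODE)
# NVL36: half the number of GB200 accelerators, per the article.
nvl36 = chip_totals(nvl72["gb200"] // 2)

print("NVL72:", nvl72)
print("NVL36:", nvl36)
```

Under these assumptions the NVL72 comes out at 36 GB200 accelerators, 72 B200 chips and 36 Grace processors, which matches the totals given above, and the NVL36 at half of each.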

As TweakTown reports, citing the Taiwanese publication UDN, leaks have been detected in the GB200 NVL72 liquid cooling systems. According to preliminary data, they are linked to components from third-party manufacturers. Nvidia had previously outsourced the production of some cooling system components, such as pipes, quick connectors and hoses, to its partners, which are large international manufacturers.

Image Source: TheRegister.com

The leaks were discovered before mass production of the NVL36 and NVL72 AI systems began, which gives manufacturers time to fix the problems. Despite the difficulties and the threat of missed delivery dates for key customers, the product is expected to be delivered on time.

However, the incident has raised concerns among major cloud service providers about the reliability of Nvidia’s new servers. In response, Taiwanese manufacturers such as Shuanghong and Qihong have begun to ramp up production of liquid cooling components to provide Nvidia with alternative options.

Certification of pipes, quick-release couplings and hoses is a complex process that requires specialized knowledge and experience. Taiwanese companies did not previously specialize in producing such components, but Nvidia’s decision to use liquid cooling for its AI chips pushed them to develop new technologies. Work to eliminate the problem is currently underway. Server cabinets with GB200 processors and the corrected cooling system are expected to begin shipping to customers in the near future.

Published by
admin