Deliveries of Nvidia’s $3 million GB200 AI servers are in jeopardy due to leaks in the liquid cooling system

An unexpected problem has hit Nvidia’s latest GB200 NVL72 and NVL36 server systems, which are built around the advanced GB200 compute accelerators designed for artificial intelligence applications. Shortly before mass production and the product launch, a serious fault was discovered in the liquid cooling system.


As a reminder, the GB200 NVL72 is a complete server rack containing 18 1U nodes, each with a pair of GB200 accelerators. Each GB200, in turn, combines a pair of Nvidia B200 chips with one 72-core Arm Grace processor. In total, the system includes 72 B200 chips and 36 Grace processors, connected by the NVLink 5 bus. The whole rack draws about 120 kW and is equipped with a liquid cooling system and a single DC power bus. The GB200 NVL36 is a version with half the number of GB200 accelerators. According to preliminary data, the GB200 NVL72 system will cost $3 million.
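The chip counts above follow from simple multiplication, which the short Python sketch below reproduces. The per-node and per-accelerator figures come from the description above. The function name `rack_totals` and the constant names are made up for illustration, and the NVL36 figures are only what "half the number of GB200" implies, not a confirmed specification.

```python
# Figures taken from the article's description of the GB200 NVL72 rack.
NODES_PER_NVL72 = 18   # 1U compute nodes per rack
GB200_PER_NODE = 2     # GB200 accelerators per node
B200_PER_GB200 = 2     # B200 chips per GB200 accelerator
GRACE_PER_GB200 = 1    # Grace processors per GB200 accelerator


def rack_totals(gb200_count):
    """Return chip totals for a rack holding `gb200_count` GB200 accelerators."""
    return {
        "gb200": gb200_count,
        "b200": gb200_count * B200_PER_GB200,
        "grace": gb200_count * GRACE_PER_GB200,
    }


# NVL72: 18 nodes with 2 GB200 accelerators each.
nvl72 = rack_totals(NODES_PER_NVL72 * GB200_PER_NODE)

# NVL36: assumed to hold exactly half as many GB200 accelerators.
nvl36 = rack_totals(nvl72["gb200"] // 2)

print("NVL72:", nvl72)
print("NVL36:", nvl36)
```

Under these assumptions, the NVL72 totals match the 72 B200 chips and 36 Grace processors quoted above, and the NVL36 comes out at 36 B200 chips and 18 Grace processors.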

As TweakTown reports, citing the Taiwanese publication UDN, leaks have been detected in the GB200 NVL72 liquid cooling systems. According to preliminary data, they are linked to components from third-party manufacturers. Nvidia had earlier outsourced the production of some cooling system components, such as pipes, quick connectors and hoses, to its partners: large international manufacturers.


The leaks were discovered before mass production of the NVL36 and NVL72 AI systems began, giving manufacturers time to fix the problems. Despite the difficulties and the threat of missed delivery dates for key customers, the product is expected to be delivered on time.

However, the incident has raised concerns among major cloud service providers, who are worried about the reliability of Nvidia’s new servers. In response, Taiwanese manufacturers such as Shuanghong and Qihong have begun to ramp up production of liquid cooling components to give Nvidia alternative suppliers.

Certifying pipes, quick-release couplings and hoses is a complex process that requires specialized knowledge and experience. Taiwanese companies did not previously specialize in such components, but Nvidia’s decision to use liquid cooling for its AI chips pushed them to develop new technologies. Work is now underway to fix the problem. Server racks with GB200 accelerators and the corrected cooling system are expected to begin shipping to customers in the near future.

