Deliveries of Nvidia's $3 million GB200 AI servers are in jeopardy due to liquid cooling system leaks

An unexpected problem has hit Nvidia's latest GB200 NVL72 and NVL36 server systems, which are built around the GB200 compute accelerators designed for artificial intelligence applications. Shortly before mass production and launch, a serious flaw was discovered in the liquid cooling system.

Image source: NVIDIA

Recall that a GB200 NVL72 system is an entire server rack containing 18 1U compute nodes, each with a pair of GB200 accelerators; each GB200, in turn, combines two Nvidia B200 chips and one 72-core Arm Grace processor. In total, the rack holds 72 B200 chips and 36 Grace processors, connected by the NVLink 5 interconnect. The whole system draws about 120 kW and is equipped with liquid cooling and a single DC power bus. The GB200 NVL36, in turn, carries half as many GB200 accelerators. According to preliminary data, a GB200 NVL72 system will cost around $3 million.
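To make the rack arithmetic explicit, here is a minimal back-of-the-envelope sketch based only on the figures quoted above; the variable names are illustrative, not Nvidia's, and the per-GPU power figure is a rough whole-rack average rather than an official specification.

```python
# Illustrative tally of a GB200 NVL72 rack, using only the numbers in the article.

NODES_PER_RACK = 18      # 1U compute nodes per rack
GB200_PER_NODE = 2       # GB200 accelerators per node
B200_PER_GB200 = 2       # B200 chips in each GB200
GRACE_PER_GB200 = 1      # Grace CPUs in each GB200
RACK_POWER_KW = 120      # approximate power draw of the whole rack

b200_total = NODES_PER_RACK * GB200_PER_NODE * B200_PER_GB200    # 72
grace_total = NODES_PER_RACK * GB200_PER_NODE * GRACE_PER_GB200  # 36

print(f"B200 chips per rack:  {b200_total}")
print(f"Grace CPUs per rack:  {grace_total}")
# Rough average, since the 120 kW also covers CPUs, networking, etc.
print(f"Whole-rack power per B200: ~{RACK_POWER_KW * 1000 / b200_total:.0f} W")
```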

As TweakTown reports, citing the Taiwanese publication UDN, leaks have been detected in the liquid cooling systems of GB200 NVL72 racks, and preliminary data links them to components from third-party manufacturers. Nvidia had earlier outsourced production of some cooling system components, such as pipes, quick connectors and hoses, to its partners, large international manufacturers.

Image source: TheRegister.com

The leaks were discovered before mass production of the NVL36 and NVL72 AI systems began, giving manufacturers time to fix the problems. Despite the difficulties and the threat of missed delivery dates to key customers, the product is still expected to ship on time.

However, the incident has raised concerns among major cloud service providers, who now question the reliability of Nvidia's new servers. In response, Taiwanese manufacturers such as Shuanghong and Qihong have begun ramping up production of liquid cooling components to give Nvidia alternative suppliers.

Certifying pipes, quick-release couplings and hoses is a complex process that requires specialized knowledge and experience. Taiwanese companies did not previously specialize in producing such components, but Nvidia's decision to use liquid cooling for its AI chips pushed them to develop the necessary technologies. Work to eliminate the problem is now underway, and server racks with GB200 accelerators and the corrected cooling system are expected to begin shipping to customers in the near future.
