Meta presented its own version of the NVIDIA GB200 NVL72 rack-scale AI system

Meta shared its hardware infrastructure innovations and explained how it sees the future of open AI platforms. In its presentation, Meta discussed a new AI platform and new rack designs, including options with increased power delivery, as well as innovations in network infrastructure.

Image source: Meta

The company currently relies on the Llama 3.1 405B model. Its context window reaches 128 thousand tokens, and it was trained on more than 15 trillion tokens in total. Training models of this scale requires very serious resources and deep optimization of the entire software and hardware stack. The base Llama 3.1 405B model was trained on a cluster of 16 thousand NVIDIA H100 accelerators, one of the first of this scale, and Meta already uses two clusters of 24 thousand accelerators each to train its AI models.
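The scale of those numbers can be sanity-checked with the widely used C ≈ 6·N·D rule of thumb for dense transformer training compute. This is a minimal sketch using the common approximation, not a figure Meta itself reported:

```python
# Back-of-envelope training-compute estimate via the common 6*N*D
# approximation for dense transformers (a rule of thumb, not Meta's
# own figure). N = parameters, D = training tokens.
params = 405e9   # Llama 3.1 405B parameters
tokens = 15e12   # over 15 trillion training tokens (per the article)

flops = 6 * params * tokens
print(f"~{flops:.2e} FLOPs")  # on the order of 3.6e25 FLOPs
```

The result, a few times 10^25 FLOPs, is why clusters of tens of thousands of H100-class accelerators are needed for a single training run.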

Projects of this scale depend on more than just accelerators: power delivery, cooling and, above all, the interconnect become the main challenges. Over the next few years, Meta expects per-accelerator bandwidth on the order of 1 TB/s. All of this will require a new, even denser architecture which, in Meta's view, should be based on open hardware standards.

One of the new products is the Catalina platform, an Open Rack v3 (Orv3) rack built around NVIDIA GB200 Grace Blackwell superchips. The rack belongs to the HPR (High Power Rack) class and is rated for 140 kW. Meta and Microsoft are jointly developing the modular, scalable Mount Diablo power system; Microsoft also has its own variant of the GB200 NVL72. In addition, Meta updated its Grand Teton AI servers, first introduced in 2022. They remain monolithic systems, but now support not only NVIDIA accelerators but also AMD Instinct MI300X and the upcoming MI325X.

The interconnect for future platforms will be the DSF (Disaggregated Scheduled Fabric) network. By moving to open standards, the company aims to avoid the scaling limits, hardware vendor lock-in and power-density constraints of proprietary fabrics. DSF is built on the OCP SAI standard and Meta's FBOSS network OS for switches, while the hardware uses a standard Ethernet/RoCE interface.

Meta has already developed and manufactured new 51.2 Tb/s-class switches based on Broadcom and Cisco silicon, as well as FBNIC network adapters created with support from Marvell. FBNIC offers up to four 100GbE ports, uses a PCIe 5.0 interface and can operate as four separate slices. The new adapter complies with the open OCP NIC 3.0 v1.2.0 standard.
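A quick check shows why PCIe 5.0 is a sensible host interface for such an adapter. This is an illustrative sketch assuming a full x16 link, which the article does not specify:

```python
# Does the host link keep up with four 100GbE ports?
# Assumption (not stated in the article): a PCIe 5.0 x16 link.
nic_gbps = 4 * 100                      # 4 x 100GbE = 400 Gb/s of network bandwidth
pcie5_x16_gbps = 32 * 16 * (128 / 130)  # 32 GT/s/lane, 16 lanes, 128b/130b encoding

print(nic_gbps, round(pcie5_x16_gbps))  # 400 vs ~504 Gb/s raw
```

Under that assumption, the raw PCIe 5.0 x16 bandwidth (~504 Gb/s before protocol overhead) comfortably exceeds the NIC's 400 Gb/s of aggregate port bandwidth.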
