Cerebras Systems, in collaboration with the US Department of Energy's (DOE) Sandia National Laboratories (SNL), conducted a successful experiment: training a 1-trillion-parameter AI model on a single CS-3 system equipped with a WSE-3 wafer-scale accelerator and 55 TB of MemoryX external memory.
Training a model of this scale typically requires thousands of GPU-based accelerators consuming megawatts of power, dozens of specialists, and weeks of hardware and software tuning, Cerebras says. SNL scientists, however, trained the model on a single system without modifying either the model or the infrastructure software. They also achieved near-linear scaling: 16 CS-3 systems delivered a 15.3-fold increase in training speed.
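For context, the reported numbers work out to roughly 96% scaling efficiency. A minimal sketch of that arithmetic (the speedup and system count come from the article; the efficiency formula is the standard one):

```python
# Scaling efficiency implied by the figures in the article:
# 16 CS-3 systems gave a 15.3x speedup over a single system.
systems = 16
speedup = 15.3

efficiency = speedup / systems  # perfect linear scaling would be 16x
print(f"Scaling efficiency: {efficiency:.1%}")  # -> 95.6%
```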
A model of this scale requires terabytes of memory, thousands of times more than is available on a single GPU, which is why conventional clusters of thousands of accelerators must be carefully partitioned and interconnected before training can begin. Cerebras systems instead store model weights at scale in external MemoryX memory, built from 1U nodes filled with commodity DDR5, making it as easy to train a trillion-parameter model as a small one on a single accelerator, the company says.
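To see why external memory matters at this scale, a rough back-of-the-envelope estimate helps. The sketch below assumes fp16 weights with fp32 Adam optimizer state, a common training configuration that the article does not actually specify:

```python
# Back-of-the-envelope memory estimate for a 1-trillion-parameter model.
# ASSUMPTION: fp16 weights (2 bytes each) plus Adam optimizer state in
# fp32 (master weights + momentum + variance = 12 bytes each), a common
# training setup; the article does not specify the actual configuration.
params = 1e12  # one trillion parameters

weights_tb = params * 2 / 1e12     # fp16 weight copy
optimizer_tb = params * 12 / 1e12  # fp32 master weights + Adam moments

print(f"Weights:         {weights_tb:.0f} TB")    # ~2 TB
print(f"Optimizer state: {optimizer_tb:.0f} TB")  # ~12 TB
print(f"Total:           {weights_tb + optimizer_tb:.0f} TB")  # ~14 TB
```

Even under these conservative assumptions, the footprint lands in the tens of terabytes, far beyond any single GPU but comfortably within the 55 TB MemoryX capacity used in the experiment.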
Previously, SNL and Cerebras deployed the Kingfisher cluster based on CS-3 systems, which will serve as a test platform for developing AI technologies for national security.