Cerebras WSE-3 wafer-scale accelerator single-handedly trains an AI model with 1 trillion parameters

Cerebras Systems, in collaboration with Sandia National Laboratories (SNL) of the US Department of Energy (DOE), successfully trained an AI model with 1 trillion parameters on a single CS-3 system equipped with a WSE-3 wafer-scale accelerator and 55 TB of external MemoryX memory.

Training models of this scale typically requires thousands of GPU-based accelerators that consume megawatts of power, dozens of experts, and weeks of hardware and software tuning, Cerebras says. SNL scientists, however, were able to train the model on a single system without changing either the model or the infrastructure software. They also achieved almost linear scaling: 16 CS-3 systems delivered a 15.3-fold increase in training speed.
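To put the "almost linear" claim in numbers, the reported speedup can be divided by the number of systems. The short Python sketch below does only that arithmetic; the function name `scaling_efficiency` is illustrative and not part of any Cerebras tooling, and the only inputs are the two figures quoted above.

```python
def scaling_efficiency(speedup: float, num_systems: int) -> float:
    """Return the fraction of ideal linear speedup that was achieved."""
    if num_systems <= 0:
        raise ValueError("num_systems must be positive")
    return speedup / num_systems


# Figures quoted in the article: 16 CS-3 systems, 15.3x faster training.
efficiency = scaling_efficiency(15.3, 16)
print(f"Scaling efficiency: {efficiency:.1%}")  # about 95.6%
```

Perfectly linear scaling would give 100%, so the reported result loses a little over 4% to overhead across 16 systems.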


A model of this scale requires terabytes of memory, thousands of times more than is available on a single GPU. In a classical cluster, this means thousands of accelerators must be correctly interconnected before training can begin. Cerebras systems instead store the model weights in external MemoryX memory built from 1U nodes with commodity DDR5, which makes training a trillion-parameter model as easy as training a small model on a single accelerator, the company says.
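A rough back-of-the-envelope estimate shows why the memory requirement lands in the terabyte range. The article does not say which numeric precision or optimizer SNL used, so the bytes-per-parameter values below are assumptions for illustration only: 2 bytes for 16-bit weights alone, and 16 bytes as a commonly cited figure for mixed-precision training with an Adam-style optimizer (weights, gradients, and optimizer state together).

```python
TB = 10**12  # one decimal terabyte, in bytes


def training_memory_tb(num_params: int, bytes_per_param: int) -> float:
    """Estimate memory in terabytes for a given per-parameter footprint."""
    return num_params * bytes_per_param / TB


params = 10**12  # 1 trillion parameters, as in the article

# Assumption: 16-bit weights only, 2 bytes per parameter.
weights_only = training_memory_tb(params, 2)

# Assumption: weights + gradients + optimizer state, 16 bytes per parameter.
full_state = training_memory_tb(params, 16)

print(f"Weights only: {weights_only:.0f} TB")
print(f"Full training state: {full_state:.0f} TB")
```

Under these assumed footprints, both estimates fit within the 55 TB of MemoryX capacity mentioned above. The real footprint depends on details the article does not give.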

Previously, SNL and Cerebras deployed the Kingfisher cluster based on CS-3 systems, which will be used as a test platform for the development of AI technologies for national security.

Published by
admin
