Amazon Web Services (AWS) announced at its re:Invent conference that cloud customers can now use systems powered by its Trainium2 accelerators, which are designed to train and run large AI language models.

Image source: aws.amazon.com

The chips, introduced last year, are four times faster than their predecessors: a single EC2 Trn2 instance with 16 Trainium2 accelerators delivers up to 20.8 Pflops. AWS says that when the 405B-scale Meta Llama model is deployed on the Amazon Bedrock platform, customers get a “3x increase in token generation speed compared to other available offerings from major cloud providers.” Alternatively, there is the EC2 Trn2 UltraServer with 64 Trainium2 accelerators and 83.2 Pflops of performance. Note that the 20.8 Pflops figure refers to dense models at FP8 precision, while the 83.2 Pflops figure refers to sparse models at FP8. Accelerators within UltraServer systems communicate over the NeuronLink interconnect.
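As a rough consistency check on these figures (an illustrative sketch using only the numbers quoted above; the per-chip value is derived, not an AWS-published specification, and the quoted instance and UltraServer figures carry different dense/sparse bases, so this is indicative only):

```python
# Arithmetic derived from the figures quoted in this article.
TRN2_INSTANCE_PFLOPS = 20.8   # EC2 Trn2 instance, 16 accelerators (dense FP8)
CHIPS_PER_INSTANCE = 16
CHIPS_PER_ULTRASERVER = 64    # EC2 Trn2 UltraServer configuration

# Implied per-accelerator throughput (derived, not an official spec).
per_chip = TRN2_INSTANCE_PFLOPS / CHIPS_PER_INSTANCE   # 1.3 Pflops

# Scaling linearly to 64 chips reproduces the quoted UltraServer figure.
ultraserver = per_chip * CHIPS_PER_ULTRASERVER         # 83.2 Pflops

print(f"Per chip: {per_chip:.1f} Pflops; UltraServer: {ultraserver:.1f} Pflops")
```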

Together with its partner Anthropic, OpenAI’s main competitor in the field of large language models, AWS intends to build a large cluster of UltraServer systems with “hundreds of thousands of Trainium2 chips” on which the startup can train its models. It will be five times more powerful than the cluster Anthropic used to train its current generation of models; AWS expects it to “be the world’s largest AI compute cluster reported to date.” The project should help AWS challenge current Nvidia accelerators, which remain in high demand and in short supply, although early next year Nvidia is preparing to launch its new generation of Blackwell accelerators, which, with 72 chips per rack, will offer up to 720 Pflops at FP8.
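For a sense of scale, a back-of-the-envelope comparison of the two rack-level figures quoted in this article (the vendors may be counting different precision and sparsity bases, so this is indicative only):

```python
# Figures as quoted in the article; bases may not be directly comparable.
NVL72_RACK_PFLOPS = 720.0   # 72 Blackwell chips per rack, FP8
ULTRASERVER_PFLOPS = 83.2   # Trn2 UltraServer, 64 Trainium2 chips, FP8

# How many UltraServers roughly match one NVL72 rack on paper.
ratio = NVL72_RACK_PFLOPS / ULTRASERVER_PFLOPS
print(f"One NVL72 rack ≈ {ratio:.1f} Trn2 UltraServers")  # ≈ 8.7
```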

Perhaps that is why AWS has already announced the next generation of accelerators, Trainium3, which promises another fourfold increase in UltraServer performance. The accelerators will be manufactured on a 3 nm process technology, and their deployment will begin in late 2025. The company justifies the need for the new generation by noting that modern AI models are approaching trillions of parameters in scale. Trn2 instances are currently available only in the US East region of the AWS infrastructure but will soon appear in others; UltraServer systems are currently available in preview.
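Taking the stated “fourfold increase” at face value gives a simple projection (an assumption-laden sketch: it presumes the same 64-chip UltraServer configuration and the same FP8/sparsity basis, neither of which AWS has confirmed here):

```python
# Projection from the quoted gen-over-gen factor; not an AWS-published spec.
TRN2_ULTRASERVER_PFLOPS = 83.2   # current Trn2 UltraServer figure
GEN_OVER_GEN_FACTOR = 4          # "another fourfold increase" for Trainium3

projected = TRN2_ULTRASERVER_PFLOPS * GEN_OVER_GEN_FACTOR
print(f"Projected Trainium3 UltraServer: {projected:.1f} Pflops")  # 332.8
```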
