
AI will do without Nvidia: Amazon has released systems based on Trainium2 chips, and Trainium3 will follow a year later

Amazon’s Amazon Web Services (AWS) division announced at its re:Invent conference that customers of its cloud platform can now use systems powered by Trainium2 accelerators, designed to train and run large artificial intelligence language models.


The chips, introduced last year, are four times faster than their predecessors: one EC2 instance with 16 Trainium2 accelerators offers performance of up to 20.8 Pflops. This means that when deploying a model on the scale of Meta Llama 405B on the Amazon Bedrock platform, the customer will receive a “3x increase in token generation speed compared to other available offerings from major cloud providers.” Customers can also choose the EC2 Trn2 UltraServer system, with 64 Trainium2 accelerators and 83.2 Pflops of performance. It is noted that the figure of 20.8 Pflops refers to dense models at FP8 precision, and 83.2 Pflops refers to sparse models at FP8. The accelerators in UltraServer systems communicate over the NeuronLink interconnect.
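As a quick sanity check on the quoted figures, the short Python sketch below divides each system's peak throughput by its accelerator count. It uses only the numbers given in the article; the names (`TRN2_INSTANCE`, `pflops_per_accelerator`, and so on) are illustrative, and since the article attributes the two figures to different model types, the result is arithmetic on the quoted numbers rather than a like-for-like benchmark.

```python
# Peak FP8 throughput figures as quoted in the article, in Pflops.
# The dictionary and function names are illustrative, not AWS terminology.
TRN2_INSTANCE = {"accelerators": 16, "pflops": 20.8}
TRN2_ULTRASERVER = {"accelerators": 64, "pflops": 83.2}


def pflops_per_accelerator(system):
    """Return the quoted peak throughput divided by the accelerator count."""
    return system["pflops"] / system["accelerators"]


per_chip_instance = pflops_per_accelerator(TRN2_INSTANCE)
per_chip_ultraserver = pflops_per_accelerator(TRN2_ULTRASERVER)

# Ratio of the two quoted system-level figures.
system_ratio = TRN2_ULTRASERVER["pflops"] / TRN2_INSTANCE["pflops"]

print(f"Trn2 instance:   {per_chip_instance:.2f} Pflops per accelerator")
print(f"Trn2 UltraServer: {per_chip_ultraserver:.2f} Pflops per accelerator")
print(f"UltraServer / instance: {system_ratio:.1f}x")
```

Both quoted figures work out to roughly 1.3 Pflops per accelerator, so the UltraServer number is the instance number scaled by the fourfold increase in chip count.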

Together with its partner Anthropic, OpenAI’s main competitor in the field of large language models, AWS intends to build a large cluster of UltraServer systems with “hundreds of thousands of Trainium2 chips” where the startup can train its models. It will be five times more powerful than the cluster on which Anthropic trained its current generation of models; AWS estimates it will “be the world’s largest AI compute cluster reported to date.” The project will help the company surpass the performance of current Nvidia accelerators, which are still in high demand and remain in short supply. Early next year, however, Nvidia is preparing to launch a new generation of Blackwell accelerators, which, with 72 chips per rack, will offer up to 720 Pflops for FP8.

Perhaps that’s why AWS has already announced the next generation of accelerators, Trainium3, which promises another fourfold increase in performance for UltraServer systems. The accelerators will be manufactured on a 3 nm process, and their deployment will begin in late 2025. The company justified the need for next-generation systems by noting that modern AI models are approaching trillions of parameters in scale. Trn2 instances are currently available only in the US East region of the AWS infrastructure but will soon appear in others; UltraServer systems are currently available in preview access mode.
