Work for us: AWS will give researchers clusters of up to 40,000 Trainium AI accelerators

AWS wants to attract more people to develop AI applications and frameworks for Amazon’s Trainium family of accelerators. Under the new Build on Trainium initiative, backed by $110 million in funding, academic researchers will get access to UltraCluster clusters of up to 40,000 accelerators, The Register reports.

The Build on Trainium program is meant to give cluster access to university researchers who are developing new AI algorithms that could make accelerators more efficient and improve how computation scales in large distributed systems. It is not specified which generation of chips, Trainium1 or Trainium2, the clusters will be built on.

Image source: AWS

As the AWS blog itself explains, researchers may come up with new AI model architectures or new performance optimization techniques but lack access to the HPC resources needed for large experiments. Equally important, the results are expected to be released as open source, so the entire machine learning ecosystem stands to benefit.

However, there is little altruism on AWS’s part. First, the $110 million will go to selected projects in the form of cloud credits, and this is not the first time the company has done so. Second, AWS is effectively trying to shift some of its own work onto other people. Its custom chips, including AI accelerators for training and inference, were originally developed to make the company’s internal workloads more efficient. However, the low-level frameworks and other software for them were not designed to be freely used by a wide audience, as is the case with NVIDIA CUDA, for example.

In other words, to popularize Trainium, AWS needs software that is easier to learn and, better still, ready-made solutions for applied problems. It is no coincidence that Intel and AMD tend to offer developers ready-made frameworks like PyTorch and TensorFlow optimized for their accelerators, rather than trying to force them into fairly low-level programming. AWS does the same thing with products like SageMaker.

The project is possible largely thanks to the new Neuron Kernel Interface (NKI) for AWS Trainium and Inferentia, which provides direct access to the chip’s instruction set and lets researchers build optimized compute kernels for new models, performance optimization and innovation in general. And scientists, unlike ordinary developers, are often interested in working with low-level systems.
