Poor-quality AMD software keeps Instinct MI300X AI accelerators from reaching their potential and overtaking Nvidia, experts find

A five-month investigation by SemiAnalysis found that AMD's Instinct MI300X AI accelerators are not reaching their full potential because of serious software problems. This largely undermines the company's efforts to compete seriously with Nvidia, which dominates the AI hardware market.

Image source: The Decoder

The study found that AMD's software is riddled with bugs that make training AI models nearly impossible without extensive debugging. While AMD struggles to make its accelerators reliable and easy to use, Nvidia keeps widening the gap by rolling out new features and libraries and improving the performance of its products.

After extensive testing, including GEMM (general matrix multiplication) benchmarks and single-node training runs, the researchers concluded that AMD is unable to overcome what they call the "impregnable CUDA moat": the entrenched software advantage that Nvidia accelerators enjoy.
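A GEMM benchmark boils down to timing a matrix multiply and converting the runtime into achieved TFLOP/s, which can then be compared with the advertised peak. The sketch below illustrates only that arithmetic; it is not SemiAnalysis's test harness. The function names are hypothetical, and the slow pure-Python `naive_gemm` stands in for the vendor BLAS library a real GPU benchmark would call.

```python
import time
from typing import Callable, List

Matrix = List[List[float]]


def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A of shape (m, k) and B of shape (k, n).

    Each of the m * n outputs needs k multiplies and k adds.
    """
    return 2 * m * n * k


def naive_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Reference triple-loop matrix multiply (slow; illustration only)."""
    k = len(b)
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert a measured runtime into achieved TFLOP/s."""
    return gemm_flops(m, n, k) / seconds / 1e12


def utilization(achieved: float, peak: float) -> float:
    """Fraction of the advertised peak actually reached (e.g. peak=1307 for MI300X FP16)."""
    return achieved / peak


def benchmark(gemm: Callable[[Matrix, Matrix], Matrix],
              m: int, n: int, k: int, repeats: int = 3) -> float:
    """Time `gemm` on (m x k) @ (k x n) inputs and return achieved TFLOP/s."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    gemm(a, b)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        gemm(a, b)
    elapsed = (time.perf_counter() - start) / repeats
    return achieved_tflops(m, n, k, elapsed)


if __name__ == "__main__":
    print(f"{benchmark(naive_gemm, 64, 64, 64):.2e} TFLOP/s")
```

On a GPU, kernel launches are asynchronous, so a real benchmark must synchronize the device before stopping the timer; otherwise it measures launch overhead rather than the multiply itself.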

Image source: SemiAnalysis

On paper, the AMD MI300X looks impressive: 1,307 teraflops of FP16 compute and 192 GB of HBM3 memory. For comparison, the Nvidia H100 delivers 989 teraflops and has only 80 GB of memory. Nvidia's newer H200, however, offers configurations with up to 141 GB of memory, narrowing the memory gap. AMD-based systems also promise a lower total cost of ownership thanks to cheaper systems and more affordable networking infrastructure.
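The paper advantage can be expressed as simple ratios of the headline figures quoted above. This is a minimal sketch: the `SPECS` layout and the `paper_ratio` helper are illustrative, and the ratios describe spec sheets only, not measured throughput.

```python
from typing import Dict

# Headline figures quoted in the article: FP16 TFLOPS and memory capacity in GB.
# The article gives no FP16 figure for the H200, so only its memory is listed.
SPECS: Dict[str, Dict[str, float]] = {
    "MI300X": {"fp16_tflops": 1307, "memory_gb": 192},
    "H100": {"fp16_tflops": 989, "memory_gb": 80},
    "H200": {"memory_gb": 141},
}


def paper_ratio(ours: str, theirs: str, metric: str) -> float:
    """Ratio of one accelerator's spec-sheet figure to another's (paper only)."""
    return SPECS[ours][metric] / SPECS[theirs][metric]


if __name__ == "__main__":
    print(round(paper_ratio("MI300X", "H100", "fp16_tflops"), 2))  # 1.32
    print(round(paper_ratio("MI300X", "H100", "memory_gb"), 2))    # 2.4
    print(round(paper_ratio("MI300X", "H200", "memory_gb"), 2))    # 1.36
```

On these numbers the MI300X has about 1.32x the H100's FP16 peak and 2.4x its memory, while its memory lead over the H200 shrinks to about 1.36x.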

Image source: SemiAnalysis

However, these advantages mean little in practice. According to SemiAnalysis, comparing bare specs is like "comparing cameras by simply checking the megapixel count of one versus the other." In the analysts' view, AMD is "just playing with numbers," while its products fail to deliver comparable performance on real workloads.

The researchers note that they had to work directly with AMD engineers to fix numerous software bugs before they could obtain meaningful benchmark results. Nvidia-based systems, by contrast, worked smoothly out of the box, with no additional tuning.

A particularly telling case for SemiAnalysis was TensorWave, the largest provider of cloud services built on AMD GPUs, which had to give AMD's engineering team free access to its GPUs (the very hardware TensorWave had bought from AMD) just so AMD could debug its own software.

To fix the problems, SemiAnalysis recommends that AMD CEO Lisa Su invest more heavily in software development and testing. Specifically, the analysts propose dedicating thousands of MI300X chips to automated testing (similar to the approach Nvidia takes with its accelerators), simplifying the complex set of environment variables, and shipping better-performing default settings. "Make the out-of-the-box experience usable!" the analysts urge.
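One way to read the advice on environment variables and defaults is a "fast by default, override only if needed" configuration loader. The sketch below is a generic pattern, not AMD's software: the `DEMO_*` variable names, the `RuntimeSettings` fields and their default values are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeSettings:
    # Hypothetical knobs for illustration only; not real ROCm settings.
    use_tuned_gemm: bool = True   # fast path enabled by default
    max_workspace_mb: int = 256   # hypothetical default value


_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _parse_bool(raw: str) -> bool:
    """Accept common spellings of true/false and reject anything else."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def load_settings(env: Optional[Mapping[str, str]] = None) -> RuntimeSettings:
    """Start from good defaults; environment variables are optional overrides."""
    env = os.environ if env is None else env
    defaults = RuntimeSettings()
    tuned = env.get("DEMO_USE_TUNED_GEMM")
    workspace = env.get("DEMO_MAX_WORKSPACE_MB")
    return RuntimeSettings(
        use_tuned_gemm=defaults.use_tuned_gemm if tuned is None else _parse_bool(tuned),
        max_workspace_mb=defaults.max_workspace_mb if workspace is None else int(workspace),
    )


if __name__ == "__main__":
    print(load_settings({}))                              # defaults only
    print(load_settings({"DEMO_USE_TUNED_GEMM": "off"}))  # one explicit override
```

The design choice is that a user who sets nothing gets the fast configuration; overrides exist for debugging, not as a prerequisite for good performance.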

In its report, SemiAnalysis says it wants AMD to succeed in competing with Nvidia, but notes that "unfortunately, much remains to be done." Without significant software improvements, AMD risks falling further behind as Nvidia prepares to ship its next-generation Blackwell accelerators in volume. That said, reports suggest the Blackwell rollout is not going entirely smoothly for Nvidia either.

admin
