A new pastime has emerged among American IT giants: a competition over who has the biggest cluster and the loudest confidence in its ability to train large AI language models. Just recently, Tesla chief Elon Musk boasted that assembly of the xAI Colossus supercomputer, with its 100 thousand Nvidia H100 accelerators for AI training, was complete. Now Meta✴ chief Mark Zuckerberg has announced that his company is using more than 100 thousand of the same AI accelerators.
Zuckerberg noted that the system is being used to train Llama 4, Meta's✴ next-generation large language model. The LLM is being trained "on a cluster that uses more than 100,000 H100 AI GPUs, which is bigger than anything I have seen reported for what others are doing," Zuckerberg said. He did not share details about what exactly Llama 4 can already do. However, as Wired writes, citing the head of Meta✴, the model has gained "new modalities," become "stronger in reasoning," and is "much faster."
With this comment, Zuckerberg was clearly taking a jab at Musk, who had earlier stated that his xAI Colossus supercluster uses 100 thousand Nvidia H100 accelerators to train the Grok AI model. Musk later said that the number of accelerators in xAI Colossus would be tripled in the future. Meta✴, for its part, previously stated that it plans to have computing power equivalent to more than half a million H100s by the end of this year. In other words, Zuckerberg's company already has substantial hardware for training its AI models, and more is on the way.
Meta✴ takes a distinctive approach to distributing its Llama models: it makes them freely available, allowing other researchers, companies, and organizations to build new products on top of them. This sets them apart from models such as OpenAI's GPT-4o and Google's Gemini, which are accessible only through an API. Meta✴ does, however, place some restrictions in the Llama license, for instance on commercial use, and the company does not disclose exactly how its models are trained. In other respects, Llama models are "open source" in spirit.
Given the stated number of accelerators used to train AI models, the question arises: how much electricity does all this require? A single specialized accelerator can consume up to 3.7 MWh of energy per year, which means that 100 thousand of them will draw at least 370 GWh of electricity annually – roughly as much as over 34 thousand average American households use in a year. Where do companies get all this energy? According to Zuckerberg himself, the AI field will in time run up against the limits of available generating capacity.
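A quick back-of-envelope sketch of that arithmetic, assuming an average US household consumption of roughly 10.8 MWh per year (a commonly cited EIA-style figure; the exact value is an assumption here, as is the article's 3.7 MWh per-GPU estimate):

```python
# Back-of-envelope check of the energy figures above. The per-GPU figure is
# the article's estimate; the household figure is an assumed US average.

ACCELERATORS = 100_000          # H100-class GPUs in the cluster
MWH_PER_GPU_PER_YEAR = 3.7      # article's upper-bound annual draw per accelerator
HOUSEHOLD_MWH_PER_YEAR = 10.8   # assumed average US household consumption

total_mwh = ACCELERATORS * MWH_PER_GPU_PER_YEAR
print(f"Cluster consumption: {total_mwh / 1_000:.0f} GWh/year")              # 370 GWh/year
print(f"Equivalent households: {total_mwh / HOUSEHOLD_MWH_PER_YEAR:,.0f}")   # ~34,259
```

The result, about 34 thousand households rather than millions, is why the comparison above is stated in thousands.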
Elon Musk's company, for example, powers its supercluster of 100,000 accelerators, housed in a building of more than 7,000 m² in Memphis, Tennessee, with several huge mobile generators. Google, meanwhile, may miss its carbon targets, having increased greenhouse gas emissions from its data centers by 48% since 2019. Against this backdrop, Google's former CEO even suggested that the US abandon its climate goals, letting AI companies run at full capacity and then use the resulting AI technologies to solve the climate crisis.
Meta✴ declined to answer how it managed to power such a giant computing cluster. The need to supply ever more energy for AI has pushed technology giants such as Amazon, Oracle, Microsoft, and Google toward nuclear power: some are investing in the development of small modular reactors, while others have signed contracts to restart old nuclear power plants to meet their growing energy needs.