Ant Group has unveiled a new method for training AI models on Chinese-made semiconductors, including chips from Huawei and Alibaba. The approach relies on the Mixture of Experts architecture and has already achieved results comparable to those obtained with Nvidia H800 graphics processing units (GPUs), strengthening China’s position amid restrictions imposed by the United States.

Image source: Ant Group Co.

The achievement marks a major milestone in the tech battle between Chinese and American companies, which has escalated dramatically since DeepSeek showed that modern large language models (LLMs) can be built without the billions of dollars invested by OpenAI and Google. While Ant Group still uses Nvidia hardware for some projects, it has increasingly turned to alternative suppliers for new developments, including AMD and local Chinese chipmakers, especially as pressure from U.S. export restrictions mounts. This allows Chinese companies to maintain the pace of technological progress and reduce their dependence on foreign vendors, Nvidia above all.

According to a research paper published in March, Ant Group claims that its AI models outperformed Meta’s in certain tests, although these claims have yet to be independently verified. It is important to note that the H800, while not Nvidia’s top-of-the-line accelerator, is still a powerful chip capable of handling resource-intensive AI training tasks. Ant Group’s optimized training strategy reduced the cost of training an AI model on 1 trillion tokens from 6.35 million yuan (about $880,000) to 5.1 million yuan (about $707,000). In this context, tokens are the smallest units of text that LLMs are trained on to generate meaningful responses to user queries.
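To make the notion of a token concrete, the short sketch below runs an off-the-shelf tokenizer (OpenAI’s open-source tiktoken library, used here purely as an illustration; the article does not describe Ling’s own tokenizer) over an English sentence and prints the pieces a model would actually be trained on.

```python
# Illustrative only: tiktoken is OpenAI's tokenizer, not Ant Group's.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Large language models learn from tokens."
ids = enc.encode(text)                      # list of integer token IDs
pieces = [enc.decode([i]) for i in ids]     # the text fragment behind each ID
print(ids)
print(pieces)
# Training "on 1 trillion tokens" means the model processes roughly 10^12 such pieces.
```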

The company has announced plans to deploy its new language models, Ling-Plus and Ling-Lite, in solutions aimed at industry applications, including healthcare and finance. Ant Group has already acquired the Chinese online medical services platform Haodf.com to expand the capabilities of its AI infrastructure in healthcare. The company is also developing Zhixiaobao, a mobile app positioned as an AI assistant for everyday life, and Maxiaocai, an AI-powered service that provides financial recommendations.

The research paper notes that the Ling-Lite model outperformed one of the versions of Meta’s Llama in a key English-language benchmark, while both models, Ling-Lite and Ling-Plus, beat their DeepSeek counterparts in Chinese-language benchmarks. Ling-Lite contains 16.8 billion parameters, the tunable values that determine the model’s behavior when it generates text. The Ling-Plus model has 290 billion parameters and, in terms of scale, belongs to the category of large language systems. Both models were released to the developer community as open-source solutions. According to MIT Technology Review, OpenAI’s GPT-4.5 contains about 1.8 trillion parameters, and DeepSeek-R1 about 671 billion.
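For readers unfamiliar with the term, a parameter is simply one trainable number (a weight or bias) inside the network. The tiny sketch below uses PyTorch and a stand-in two-layer network rather than anything from the Ling family; Ling-Lite’s 16.8 billion figure is the same kind of tally, just at a vastly larger scale.

```python
# Count the trainable parameters of a small stand-in network (not a Ling model).
import torch.nn as nn

tiny_model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
num_params = sum(p.numel() for p in tiny_model.parameters())
print(f"{num_params:,} parameters")  # 1024*4096 + 4096 + 4096*1024 + 1024 = 8,393,728
```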

The Mixture of Experts architecture used in the Ling models activates only selected subnetworks within the model depending on the task at hand, which keeps computing resources distributed efficiently. The system resembles a team of specialists, in which each component of the AI model is responsible for a strictly defined, highly specialized function. Training such models proved tricky, however: as the paper reports, even minor changes in the hardware configuration or in the model’s structure led to a sharp increase in the error rate. Such instability makes the training process sensitive to its environment and requires additional adaptation at each stage.
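As a rough illustration of how such expert routing works, here is a minimal PyTorch sketch of a Mixture of Experts layer: a small router scores the experts for every token, and only the top-scoring subnetworks are executed. The layer sizes, expert count, and top-k value are arbitrary assumptions for the example and do not reflect the Ling models’ actual configuration.

```python
# Minimal Mixture of Experts (MoE) layer sketch; illustrative, not Ant Group's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward subnetwork.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        # The router (gating network) scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):                                # x: (batch, seq, d_model)
        scores = self.router(x)                          # (batch, seq, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so most parameters stay idle.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: a batch of 4 sequences, 16 tokens each, hidden size 512.
layer = MoELayer()
y = layer(torch.randn(4, 16, 512))
print(y.shape)  # torch.Size([4, 16, 512])
```

The key property this sketch demonstrates is the one the article describes: the total parameter count grows with the number of experts, but each token only pays the compute cost of its top-k selected experts.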
