Chinese billionaire and hedge fund owner Liang Wenfeng has launched an artificial intelligence startup called DeepSeek, which develops cutting-edge models with limited budgets and technical resources and explains how this can be done. In doing so, the company got ahead of the American market leaders and caused a real stir, the Financial Times writes.
This week, DeepSeek released its R1 reasoning AI model and published instructions on how to cost-effectively build a large language model that can learn and improve on its own without human supervision. The pioneers in developing “reasoning” models that mimic human cognitive abilities are OpenAI and Google DeepMind. In December, OpenAI released the full version of its o1 neural network but did not disclose how it was developed. The release of DeepSeek R1 has raised questions about whether well-resourced US companies with AI projects, including Meta and Anthropic, will be able to maintain their technological advantage.
Back in 2021, Liang Wenfeng began buying thousands of NVIDIA graphics processors for his side AI project; his main job was at the hedge fund High-Flyer. At the time, his actions were seen as the eccentric behavior of a billionaire looking for a new hobby. He was not taken seriously when he talked about launching a cluster of 10,000 NVIDIA accelerators, and he himself could not clearly formulate his goals. He simply said: “I want to build it, and this would change the rules of the game.” It was believed that only giants on the scale of ByteDance and Alibaba could do this. He earned his billions at High-Flyer by using AI and algorithms to identify patterns that can affect stock prices. His team achieved its success by using NVIDIA chips in stock trading. In 2023, Liang Wenfeng launched the DeepSeek startup and announced his intention to create human-level AI.
American sanctions that limited Chinese companies’ access to AI accelerators did not hinder the company’s work: its engineers already knew “how to unlock the potential of these GPUs, even if they are not the latest.” What makes DeepSeek especially dangerous is that it is willing to share its achievements rather than hide them for commercial gain. The company has not raised funds from external sources and has not taken significant steps to monetize its models; its focus is research and engineering work, which makes it similar to early DeepMind. Liang characterizes DeepSeek as a “local” company, staffed by PhDs from top Chinese universities rather than American institutions. Last year he said in an interview that there were no people in the core team who had returned from abroad.
To train one of its models with 671 billion parameters, DeepSeek used just 2,048 NVIDIA H800 AI accelerators and spent $5.6 million, a fraction of what OpenAI and Google spend on training systems of comparable size. Experts acknowledge that China does have many specialists who know how to train and run AI models with limited computing resources, but they add that there is no guarantee DeepSeek will be able to remain competitive as the industry evolves. At the same time, the profitability of High-Flyer, which provides most of DeepSeek’s funding, decreased at the end of 2024 because its head is now more interested in AI technologies.