Alibaba Group Holding continues to push ahead in artificial intelligence. This week the e-commerce giant released a family of large language models (LLMs) under the collective name Qwen2-Math, which are focused on solving complex mathematical problems and, according to the developers, do so better than AI models from other companies.
In total, three large language models were presented, differing in the number of parameters, which affects the accuracy of their answers. According to the developers, the largest of them, Qwen2-Math-72B-Instruct, outperforms many other AI models at solving mathematical problems, including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5 Pro, and Meta Platforms' Llama-3.1-405B.
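For readers who want to try the flagship model themselves, below is a minimal sketch of querying Qwen2-Math-72B-Instruct through the Hugging Face transformers library. The repository id "Qwen/Qwen2-Math-72B-Instruct" and the sample math prompt are assumptions for illustration; running the 72B model also requires substantial GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id for the flagship model described in the article
model_id = "Qwen/Qwen2-Math-72B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the dtype stored in the checkpoint
    device_map="auto",    # spread the weights across available GPUs (needs `accelerate`)
)

# A hypothetical math question, formatted with the model's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant that solves math problems step by step."},
    {"role": "user", "content": "Find all integer solutions of x^2 - 5x + 6 = 0."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate an answer and decode only the newly produced tokens
output_ids = model.generate(**inputs, max_new_tokens=512)
answer = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```

The smaller models in the family can be loaded the same way by swapping the repository id, which makes them a more practical choice for experimentation on a single GPU.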
"Over the past year, we have done a lot of work exploring and expanding the logical capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems. We hope that Qwen2-Math will contribute to the community's efforts to solve complex mathematical problems," the developers said in their announcement.
The Qwen2-Math models were tested against a variety of benchmarks, including GSM8K (a set of roughly 8,500 varied grade-school-level math word problems), OlympiadBench (a bilingual, multimodal benchmark of olympiad-level science problems), and Gaokao (one of the toughest university entrance math exams). The developers note that the new models have a key limitation in that they currently support English only; in the future, they plan to create bilingual and multilingual LLMs.