Microsoft has released three new small language models (SLMs) under an open license: Phi-4-mini-reasoning, Phi-4-reasoning, and Phi-4-reasoning-plus. All three are reasoning models, a class of systems that spend extra inference-time computation checking their intermediate steps in order to solve complex problems more reliably. They continue Microsoft's push toward compact AI systems: the Phi family was first introduced a year ago as a foundation for applications running on devices with limited computing power.
The most capable of the new models is Phi-4-reasoning-plus, an adaptation of the previously released Phi-4 for logical inference tasks. According to Microsoft, its answer quality approaches that of DeepSeek R1 despite a large gap in scale: DeepSeek R1 has 671 billion parameters, while Phi-4-reasoning-plus has just 14 billion. In Microsoft's internal testing, the model also matched OpenAI's o3-mini on the OmniMath benchmark, which evaluates mathematical ability.
The Phi-4-reasoning and Phi-4-reasoning-plus models (14 billion parameters) outperform the baseline Phi-4 and compete with much larger systems, including DeepSeek-R1 Distill (70 billion parameters) and OpenAI o3-mini, on mathematical and logical reasoning benchmarks (AIME, HMMT, OmniMath, GPQA). Image source: Microsoft
The Phi-4-reasoning model contains 14 billion parameters and was trained on curated "quality" data from the Internet, as well as selected demonstration examples from o3-mini. It is optimized for tasks in mathematics, the natural sciences, and programming. As a result, Phi-4-reasoning targets high-precision calculation and analytical interpretation of data while remaining compact enough to run on local computing platforms.
On general-purpose benchmarks including FlenQA, IFEval, HumanEvalPlus, MMLUPro, ToxiGen, and PhiBench, Phi-4-reasoning-plus achieves accuracy comparable to GPT-4o and o3-mini despite its far smaller size (14 billion parameters), particularly in programming, logic, and safety tasks. Image source: Microsoft
Phi-4-mini-reasoning is the smallest of the new SLMs at roughly 3.8 billion parameters. It was trained on approximately one million synthetic mathematics problems generated by R1, the reasoning model from Chinese startup DeepSeek. Microsoft positions it for educational scenarios, including embedded tutoring on low-power and mobile devices. Thanks to its compactness and accuracy, the model can be used in interactive learning systems where responsiveness matters and computing resources are limited.
Phi-4-mini-reasoning (3.8 billion parameters) significantly outperforms its base model and models twice its size on AIME 24, MATH-500, and GPQA Diamond, and matches or exceeds OpenAI o1-mini in accuracy on long-form math answer generation. Image source: Microsoft
All three models are open source and available on the Hugging Face platform. According to Microsoft, they were trained using distillation, reinforcement learning, and high-quality training data, techniques that balance the models' small size against computational performance. The models are small enough for low-latency environments, yet can handle problems that demand rigorous logic and reliable results, work that was previously reserved for much larger models.
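For readers who want to try one of the checkpoints, a minimal sketch of loading it with the Hugging Face transformers library might look like the following. The repository id microsoft/Phi-4-mini-reasoning, the bfloat16 setting, and the example prompt are illustrative assumptions, not details from the announcement.

```python
# Minimal sketch: load an assumed Phi-4 reasoning checkpoint and ask a math question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"  # assumed repo id; the 14B variants follow the same pattern

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 3.8B model within common GPU memory
    device_map="auto",
)

# A simple math question, formatted with the model's chat template.
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show your reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```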