Microsoft Releases Two Efficient Open-Source Phi-4 AI Models, One Trained with a New Method

Microsoft has expanded its Phi-4 family of large language models with two new models that have relatively modest system requirements. One of them is multimodal, meaning it works with several data formats.


Microsoft’s Phi-4-mini is a text-only model, while Phi-4-multimodal is an extended version that can also handle visual and audio queries. Both models, the developer claims, significantly outperform comparably sized alternatives in certain tasks.

Phi-4-mini has 3.8 billion parameters, which makes it compact enough to run on mobile devices. The model is based on a variant of the Transformer architecture. In the standard version, transformer models analyze the text before and after each word to understand its meaning. For Phi-4-mini, Microsoft used a decoder-only Transformer, which analyzes only the text preceding each word; this reduces the load on computing resources and speeds up data processing.
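The "only the preceding text" rule is commonly implemented as a causal mask. The sketch below is a minimal illustration in plain Python, not Microsoft's code; the function names and the sample sentence are made up for this example, and it assumes the usual convention that each position may see itself and everything before it.

```python
def causal_mask(seq_len):
    """Build a seq_len x seq_len table where mask[i][j] is True
    when position i is allowed to look at position j."""
    # Position i may see itself and earlier positions, never later ones.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def visible_context(tokens, position):
    """Return the tokens a decoder-only model can use at tokens[position]."""
    mask_row = causal_mask(len(tokens))[position]
    return [tok for tok, allowed in zip(tokens, mask_row) if allowed]


# Hypothetical sample input, chosen only for illustration.
tokens = ["The", "model", "runs", "on", "phones"]
print(visible_context(tokens, 2))  # ['The', 'model', 'runs']
```

Because later words are never consulted, results computed for earlier positions can be reused as each new word is generated, which is one reason this design is cheaper to run.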

For additional optimization, the model uses grouped-query attention (GQA). Attention is the mechanism that helps a model determine which pieces of data are most relevant to the current task; GQA lets groups of query heads share the same key and value heads, which reduces the memory the mechanism requires. Phi-4-mini can generate text, translate documents, and control external applications. According to its developers, the model excels at solving math problems and writing computer code, even when “complex reasoning” is required. By Microsoft’s own estimate, the accuracy of Phi-4-mini’s answers “significantly” exceeds that of several other similarly sized models.
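To make the saving from grouped-query attention concrete, here is a rough sketch of how query heads can be mapped onto a smaller set of shared key/value heads, and how that shrinks the cache of keys and values kept during generation. The layer count, head counts, head size, and sequence length are illustrative assumptions, not figures from the article or a confirmed Phi-4-mini configuration.

```python
def kv_head_for_query_head(query_head, num_query_heads, num_kv_heads):
    """Return the index of the shared key/value head used by a query head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate the key/value cache size for one sequence."""
    # Factor 2: every layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative values only (assumptions, not taken from the article).
NUM_LAYERS, NUM_QUERY_HEADS, NUM_KV_HEADS = 32, 24, 8
HEAD_DIM, SEQ_LEN = 128, 4096

mapping = [
    kv_head_for_query_head(h, NUM_QUERY_HEADS, NUM_KV_HEADS)
    for h in range(NUM_QUERY_HEADS)
]
# Without grouping, every query head would need its own key/value head.
full = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
grouped = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(mapping[:6])     # [0, 0, 0, 1, 1, 1]
print(full / grouped)  # 3.0
```

With these assumed numbers, three query heads share each key/value head, so the cache is one third the size it would be if every query head had its own.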

Phi-4-multimodal is an extended version of Phi-4-mini with 5.6 billion parameters; it accepts not only text but also images, audio, and video as queries. To further train the model, Microsoft used a new method called Mixture of LoRAs. Adapting an AI model to a new task usually requires changing its weights, the parameters that determine how it processes data. LoRA (Low-Rank Adaptation) makes this easier: instead of changing the existing weights, it adds a small number of new weights optimized for the unfamiliar task. Mixture of LoRAs applies this mechanism to multimodal data processing: when developing Phi-4-multimodal, Microsoft supplemented the original Phi-4-mini with weights optimized for working with audio and video. As a result, Microsoft said, it was able to mitigate some of the trade-offs associated with other approaches to building multimodal models.
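The idea of adding small task-specific weights to a frozen model can be sketched in a few lines. This is a toy illustration of the general LoRA technique with one adapter per input type; the class names, the tiny 4x4 weight matrix, and the rule of picking an adapter by a modality label are assumptions made for this example and should not be read as Microsoft's implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRAAdapter:
    """Low-rank update: delta(x) = scale * up @ (down @ x)."""

    def __init__(self, down, up, scale=1.0):
        self.down = down    # rank x d_in matrix
        self.up = up        # d_out x rank matrix
        self.scale = scale

    def delta(self, x):
        return [self.scale * v for v in matvec(self.up, matvec(self.down, x))]

    def num_params(self):
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)


class AdaptedLinear:
    """A frozen base layer plus optional per-modality adapters."""

    def __init__(self, base_weight, adapters):
        self.base_weight = base_weight  # never modified
        self.adapters = adapters        # hypothetical: modality name -> adapter

    def forward(self, x, modality="text"):
        out = matvec(self.base_weight, x)
        adapter = self.adapters.get(modality)
        if adapter is None:
            return out  # no adapter: the base model's behavior is unchanged
        return [o + d for o, d in zip(out, adapter.delta(x))]


# Toy 4x4 identity weights standing in for a frozen base layer.
base = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
video = LoRAAdapter(down=[[1.0, 1.0, 0.0, 0.0]], up=[[0.5], [0.0], [0.0], [0.0]])
audio = LoRAAdapter(down=[[0.0, 0.0, 1.0, 1.0]], up=[[0.0], [0.0], [0.0], [2.0]])
layer = AdaptedLinear(base, {"video": video, "audio": audio})

x = [1.0, 2.0, 3.0, 4.0]
print(layer.forward(x))           # [1.0, 2.0, 3.0, 4.0]
print(layer.forward(x, "video"))  # [2.5, 2.0, 3.0, 4.0]
print(layer.forward(x, "audio"))  # [1.0, 2.0, 3.0, 18.0]
print(video.num_params(), "adapter weights vs", 16, "base weights")
```

In this toy setup the text path leaves the base weights untouched, and each rank-1 adapter needs 8 weights against the layer's 16; at realistic layer sizes the adapter's share is far smaller.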

In visual processing tests, Phi-4-multimodal scored 72 points, slightly behind leading models from OpenAI and Google. In simultaneous video and audio processing, it “far outperformed” Google’s Gemini-2.0 Flash and the open-source InternOmni. Phi-4-mini and Phi-4-multimodal are available on the Hugging Face platform under an MIT license, which allows commercial use.
