Meta Platforms has announced the launch of the Llama 4 family of open-source AI models. The family includes Llama 4 Scout, Maverick, and Behemoth, all of which are multimodal: they can respond not only to text queries but also process images, video, and other input. The models were trained on “large amounts of unlabeled text, image, and video data” to give them “broad visual understanding.”

Image source: Steve Johnson / Unsplash

The success of Chinese company DeepSeek’s AI models, which perform on par with or better than Meta’s previous flagship Llama models, has prompted Meta to accelerate its work in this area. The company’s employees are reportedly working hard to understand how DeepSeek managed to cut the cost of developing and running models such as R1 and V3.

Image source: Meta

Llama 4 Scout has 17 billion active parameters, 16 “experts,” and 109 billion parameters in total. According to Meta, the model outperforms Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a range of tasks. One of its headline features is support for a context window of 10 million tokens.

Llama 4 Maverick has 17 billion active parameters and 128 “experts” (400 billion parameters in total). According to the developers, it outperforms GPT-4o and Gemini 2.0 Flash on various benchmarks and delivers results comparable to DeepSeek V3 on reasoning and coding tasks. Scout can run on a single Nvidia H100 GPU, while Maverick requires an Nvidia H100 DGX system or equivalent.
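To put the “active versus total” parameter figures in perspective, the short Python snippet below computes what share of each model’s weights is actually used for a single token, using the numbers quoted above; the ratio is only a rough indicator, since routing details vary by layer.

```python
# Share of total parameters that are "active" per token, using the figures
# quoted in the announcement (rough indicator only; routing varies by layer).
models = {
    "Llama 4 Scout":    {"active_b": 17, "total_b": 109},
    "Llama 4 Maverick": {"active_b": 17, "total_b": 400},
}
for name, p in models.items():
    share = p["active_b"] / p["total_b"]
    print(f"{name}: {p['active_b']}B of {p['total_b']}B parameters active ≈ {share:.1%}")
```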

Llama 4 Behemoth has 288 billion active parameters and 16 “experts” (around 2 trillion parameters in total) and, according to Meta, outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on various benchmarks. Behemoth is still in training and is not yet publicly available, while Scout and Maverick can be downloaded from Llama.com and Hugging Face. In addition, Meta AI, the company’s assistant built into apps such as WhatsApp, Messenger, and Instagram, has been updated to run on Llama 4 in 40 countries. Multimodal queries are currently limited to English and available only in the US.
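For those who want to try the openly released checkpoints, a minimal sketch of loading one from Hugging Face with the transformers library is shown below. The model identifier is an assumption for illustration; check the official meta-llama organization page for the exact names, and note that even Scout needs data-center-class GPU memory.

```python
# Minimal sketch: running a Llama 4 checkpoint downloaded from Hugging Face.
# The model ID below is assumed for illustration; verify the exact identifier
# on the meta-llama organization page before use.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed model ID
    device_map="auto",  # Scout is stated to fit on a single Nvidia H100
)

result = generator("Summarize the Llama 4 announcement in one sentence.",
                   max_new_tokens=100)
print(result[0]["generated_text"])
```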

Image source: Meta

“The Llama 4 models mark the beginning of a new era for the Llama ecosystem. This is just the beginning for the Llama 4 family,” Meta said in a blog post. The company says Llama 4 is its first group of models to use a “mixture of experts” (MoE) architecture, which is more efficient for both training and inference. The MoE approach breaks a task into subtasks and delegates their processing to smaller, more specialized “expert” subnetworks, so only part of the model’s parameters is used for any given input.
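As a rough illustration of how a mixture-of-experts layer works, the toy sketch below routes each token to its top-scoring experts, so only those experts’ weights are used for that token. This is a simplified example for intuition only; the layer sizes, expert count, and top-k routing value are illustrative and do not reflect Meta’s actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy mixture-of-experts layer: a router picks the top-k experts per token,
# so only those experts' parameters are "active" for that token.
# Illustrative only; sizes and routing do not match Llama 4's real design.
class ToyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e           # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([8, 64])
```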

It’s worth noting that none of the Llama 4 models is a proper “reasoning” model like OpenAI’s o1 or o3-mini. Reasoning models verify their own answers and tend to be more reliable, but they take longer to produce a response than traditional “non-reasoning” models.
