NVIDIA announced a new family of Llama Nemotron AI models with advanced reasoning capabilities. Based on Meta Platforms' open-source Llama models, NVIDIA's models are designed to give developers a foundation for building advanced AI agents that can work independently, or with minimal supervision in connected teams, on behalf of their users to solve complex problems.
“Agents are autonomous software systems designed to reason, plan, act, and reflect on their own work,” Kari Briski, vice president of generative AI software product management at NVIDIA, said in a press briefing, according to VentureBeat. “Like humans, agents need to understand context to break down complex queries, understand user intent, and adapt in real time,” she added. Using Llama as a foundation, Briski said, NVIDIA optimized the model for its compute requirements while maintaining the accuracy of its responses.
Image source: NVIDIA
NVIDIA said it has enhanced its new family of reasoning models through retraining to improve multi-step math, coding, reasoning, and complex decision-making. This retraining improved the models' answer accuracy by up to 20% compared to the baseline model and increased inference speed by up to five times compared to other leading open-source reasoning models. The improved inference performance means the models can handle more complex reasoning tasks, have enhanced decision-making capabilities, and can reduce operational costs for enterprises, the company said.
Llama Nemotron models are available in NVIDIA NIM microservices in Nano, Super, and Ultra editions. They are optimized for different deployment scenarios: Nano for PCs and edge devices while maintaining high reasoning accuracy, Super for optimal throughput and accuracy on a single accelerator, and Ultra for maximum “agent accuracy” in multi-accelerator data center environments.
According to NVIDIA, extensive retraining was conducted on the NVIDIA DGX Cloud using high-quality curated synthetic data generated by NVIDIA Nemotron and other open-source models, as well as additional curated datasets co-created by NVIDIA. The training included 360,000 hours of inference on H100 accelerators and 45,000 hours of human annotation to improve reasoning capabilities. The tools, datasets, and optimization methods used to develop the models will be open source, the company said, giving enterprises the flexibility to build their own custom reasoning models.
One of the key features of NVIDIA Llama Nemotron is the ability to toggle reasoning on and off, a capability the company says is new to the AI market. Anthropic's Claude 3.7 offers somewhat similar functionality, although it is a closed, proprietary model. Among open-source models, IBM Granite 3.2 also has a reasoning switch, which IBM calls “conditional reasoning.”
The beauty of hybrid or conditional reasoning is that it allows systems to eliminate computationally expensive reasoning steps for simple queries. NVIDIA demonstrated how a model can engage in complex reasoning when solving a combinatorial problem, but switch to a direct answer mode for simple factual queries.
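In practice, NVIDIA's published usage notes describe controlling this toggle through the system prompt. The sketch below illustrates the idea for a model served behind an OpenAI-compatible endpoint such as a NIM microservice; the exact control strings ("detailed thinking on" / "detailed thinking off") and the model identifier are assumptions to verify against the model card for your specific model.

```python
# Sketch: toggling reasoning for a Llama Nemotron model behind an
# OpenAI-compatible chat endpoint (e.g. a NIM microservice).
# The system-prompt strings below follow NVIDIA's published usage
# notes but should be checked against the model card (assumption).

def build_request(user_query: str, reasoning: bool) -> dict:
    """Build a chat-completion payload with reasoning toggled on or off."""
    system_prompt = "detailed thinking on" if reasoning else "detailed thinking off"
    return {
        # Hypothetical model id; substitute the one your deployment serves.
        "model": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query},
        ],
    }

# Combinatorial problem: allow the expensive multi-step reasoning pass.
hard = build_request(
    "In how many ways can 8 non-attacking rooks be placed on a chessboard?",
    reasoning=True,
)

# Simple factual query: answer directly and skip the reasoning overhead.
easy = build_request("What is the capital of France?", reasoning=False)
```

The payload would then be sent as the JSON body of a standard chat-completions request; the only thing that changes between the two modes is the system message, so a router can decide per query whether the reasoning cost is warranted.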
NVIDIA said that a number of partners are already using Llama Nemotron models to build powerful new AI agents. For example, Microsoft has added Llama Nemotron and NIM microservices to Microsoft Azure AI Foundry. SAP SE is using Llama Nemotron models to improve the capabilities of its Joule AI assistant and the SAP Business AI portfolio. SAP is also using NVIDIA NIM and NVIDIA NeMo microservices to improve the accuracy of code completion for the ABAP language.
ServiceNow uses Llama Nemotron models to create AI agents that improve the performance and accuracy of tasks for enterprises across industries. Accenture has made NVIDIA Llama Nemotron reasoning models available in its AI Refinery platform. Deloitte plans to include Llama Nemotron models in its recently announced Zora AI agent platform. Atlassian and Box are also working with NVIDIA to ensure their customers have access to Llama Nemotron models.