The AI market is currently seeing a trend towards small language models (SLMs), which have far fewer parameters than large language models (LLMs) and are tailored to a narrower range of tasks, reports Wired magazine.
The latest LLMs from OpenAI, Meta, and DeepSeek have hundreds of billions of parameters, which makes them better at identifying patterns and relationships, and therefore more powerful and accurate. However, training and running them requires enormous computational and financial resources. For example, training the Gemini 1.0 Ultra model cost Google $191 million. According to the Electric Power Research Institute, a single query to ChatGPT consumes about 10 times more energy than a single Google search.
IBM, Google, Microsoft, and OpenAI have all recently released SLMs with just a few billion parameters. They’re not general-purpose tools like LLMs, but they’re great at more narrowly defined tasks, like summarizing conversations, answering patient questions as a health chatbot, and collecting data on smart devices. “They can also run on a laptop or a mobile phone, rather than in a huge data center,” said Zico Kolter, a computer scientist at Carnegie Mellon University.
To train small models, researchers use several techniques. One is knowledge distillation, in which a large model generates a high-quality training dataset and passes its knowledge on to the small model, much as a teacher gives lessons to a student. Small models can also be carved out of larger ones by "pruning" — removing unnecessary or ineffective parts of the neural network.
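To make these two ideas concrete, here is a minimal sketch in PyTorch (the article does not name any framework, so this is an assumption). The `teacher`, `student`, `batch`, and `optimizer` objects are hypothetical placeholders standing in for a large frozen model, a small trainable model, a batch of inputs, and an optimizer; the distillation loss follows the common soft-target formulation, and the pruning helper simply zeroes out the smallest-magnitude weights.

```python
# Minimal sketch of knowledge distillation and magnitude pruning.
# Assumes PyTorch; "teacher", "student", "batch", and "optimizer" are
# hypothetical placeholders, not objects described in the article.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Train the student to match the teacher's softened output distribution."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the two distributions, scaled by T^2 as is conventional.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

def distillation_step(student, teacher, batch, optimizer):
    """One training step: the teacher 'gives a lesson', the student learns from it."""
    with torch.no_grad():               # the teacher only produces targets
        teacher_logits = teacher(batch)
    loss = distillation_loss(student(batch), teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def magnitude_prune(weight, sparsity=0.5):
    """Zero out the fraction of weights with the smallest absolute values."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold).float()
```

In a real pipeline the pruned weights would be written back into the layer and the smaller network fine-tuned afterwards; the sketch above is only meant to illustrate the two techniques the article names.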
Because SLMs have fewer parameters than larger models, their reasoning can be more transparent. A small, targeted model can perform just as well as a large one on a specific task while being easier to develop and train. "These efficient models can save money, time, and computing resources," said Leshem Choshen, a research scientist at the MIT-IBM Watson AI Lab.