
Not just NVIDIA accelerators: data labeling is becoming one of the main reasons for the rising cost of AI models

Building and training powerful AI models can cost companies anywhere from hundreds of millions to billions of dollars a year. OpenAI, for example, intends to spend up to $7 billion on these purposes in 2024. Most of that money goes to hardware, including expensive NVIDIA accelerators. But as Fortune reports, there is another major expense that is often overlooked: high-quality data labeling. And it is this work that demands ever larger financial investments.

Data labeling (or annotation) is the process of identifying raw data (images, text files, videos, etc.) and attaching one or more meaningful, informative labels that give it context. This is what allows an AI model to learn from such large volumes of information. Labeled data is needed for a wide range of use cases, including computer vision, natural language processing, and speech recognition.

Labeling has long been used, for example, in developing AI models for self-driving cars. Cameras capture images of people, street signs, vehicles, and traffic lights, and human annotators mark them with labels such as “pedestrian,” “truck,” or “stop sign.” This is a labor-intensive, painstaking process that takes a lot of time and requires significant financial investment. After the release of ChatGPT in 2022, OpenAI was widely criticized for outsourcing such work: the company used Kenyan workers who were paid less than $2 an hour.

Today's general-purpose large language models (LLMs) go through reinforcement learning from human feedback (RLHF). In this procedure, people give qualitative feedback on the AI model's output or rank its responses, which adds significantly to costs. Another reason labeling costs are rising is that companies want to include their own data, such as customer information or internal documents, in the training process.

In addition, labeling expert-level data in fields such as law, finance, and healthcare requires highly qualified specialists, who command high salaries. That is why some developers outsource data labeling to third-party companies such as Scale AI, which recently raised $1 billion in funding.

Alex Ratner, CEO of the data labeling startup Snorkel AI, says enterprise clients can spend millions of dollars labeling and processing data. In some cases, this work consumes up to 80% of the time and budget of an AI project. Moreover, to stay relevant over time, the data has to be periodically supplemented and reprocessed.

Labeling, along with the need for expensive hardware, is thus becoming one of the main cost items in training AI models. Some companies cut costs by using synthetic data, that is, data generated by AI itself. Recent advances in AI have made generating synthetic data efficient and fast, which in some cases makes it possible to do without large sets of real-world data. However, this approach carries a risk of “self-repetition,” with models effectively learning from their own output.

admin
