AI Data Miners Found to Be Causing Massive Slowdowns Across the Internet

Open source Git hosting platform SourceHut said its services were slowed by web crawlers run by AI companies. Similar complaints are increasingly coming from other resource owners.

Image source: Kai Wenzel / unsplash.com

To limit traffic from AI bots, SourceHut had to deploy Nepenthes, a defense against rogue web crawlers that collect data to train AI models. The platform’s administration unilaterally blocked entire address ranges of several cloud providers, including Google Cloud and Microsoft Azure, due to excessive traffic from bots deployed on their networks. Owners of legitimate services on these infrastructures were advised to contact SourceHut individually to add them to the exceptions. In 2022, SourceHut also suffered due to excessive requests to its resources from the Google Go Module Mirror service.

In 2023, OpenAI pledged that its bots would follow robots.txt directives, which specify how web crawlers should process data from websites. Other AI developers have made similar commitments, but complaints of abuse continue to pour in. Last summer, the iFixit website, in particular, was invaded by the Anthropic Claudebot. In December, the Vercel host reported a significant presence of AI crawlers in its infrastructure: OpenAI GPTbot sent 569 million requests to its network, while Anthropic Claude sent 370 million. Together, they accounted for about 20% of the 4.5 billion Googlebot requests, which are used to index resources on Google.

Image source: Kai Wenzel / unsplash.com

At the same time, Dennis Schubert, a developer of the distributed social network Diaspora, complained that over the previous 60 days, AI bots accounted for 70% of the traffic to his server. The publication went viral, and the activity of AI scanners dropped sharply; however, online hooligans organized a massive invasion of requests from clients with a user-agent string value that matched OpenAI GPTbot. However, the real OpenAI AI bot sends requests from the Microsoft Azure infrastructure, and in the case of the Diaspora server, they came from AWS addresses and even from American Internet providers.

Sometimes the situation is complicated by the fact that some bots have multiple purposes. For example, Meta✴ AI bot and AppleBot collect data exclusively for AI training, while GoogleBot serves both AI and search indexing. To avoid confusion, Google added a separate Google-Extended value for AI training tools in 2023.

admin

Share
Published by
admin

Recent Posts

HP OmniBook 5 AI PC 16-inch with Core Ultra and Ryzen AI chips announced

HP today introduced the HP OmniBook 5 16 AI PC, available with powerful Intel and…

1 hour ago

HP Unveils 14- and 16-inch OmniBook 7 Laptops with Intel Core Ultra Chips

HP has announced the OmniBook 7 notebooks with a 14-inch and 16-inch display. Both devices…

1 hour ago

HP Updates Affordable Victus 15 Gaming Laptop With 14th Gen Intel Core, AMD Ryzen AI 300

HP has unveiled an updated version of the Victus 15 gaming laptop. The 2025 model…

1 hour ago

HP Unveils OmniBook 7 Aero Laptop That Weighs Less Than a Kilogram

HP has announced the compact and ultra-light 13-inch OmniBook 7 Aero laptop. It will be…

2 hours ago

HP Unveils Omen 16 Slim Gaming Laptop with Intel Arrow Lake-H and GeForce RTX 5070

HP has expanded its Omen gaming laptop range with a new model, the Omen 16…

2 hours ago

HP Unveils Thin 17-Inch OmniBook X Laptop with Core Ultra 200V Chips and GeForce RTX 4050 Graphics

HP has announced the OmniBook X, which will be available in various configurations with the…

2 hours ago