Open source Git hosting platform SourceHut said its services were slowed by web crawlers run by AI companies. Similar complaints are increasingly coming from other resource owners.
Image source: Kai Wenzel / unsplash.com
To limit traffic from AI bots, SourceHut had to deploy Nepenthes, a defense against rogue web crawlers that collect data to train AI models. The platform’s administration unilaterally blocked entire address ranges of several cloud providers, including Google Cloud and Microsoft Azure, due to excessive traffic from bots deployed on their networks. Owners of legitimate services on these infrastructures were advised to contact SourceHut individually to add them to the exceptions. In 2022, SourceHut also suffered due to excessive requests to its resources from the Google Go Module Mirror service.
In 2023, OpenAI pledged that its bots would follow robots.txt directives, which specify how web crawlers should process data from websites. Other AI developers have made similar commitments, but complaints of abuse continue to pour in. Last summer, the iFixit website, in particular, was invaded by the Anthropic Claudebot. In December, the Vercel host reported a significant presence of AI crawlers in its infrastructure: OpenAI GPTbot sent 569 million requests to its network, while Anthropic Claude sent 370 million. Together, they accounted for about 20% of the 4.5 billion Googlebot requests, which are used to index resources on Google.
Image source: Kai Wenzel / unsplash.com
At the same time, Dennis Schubert, a developer of the distributed social network Diaspora, complained that over the previous 60 days, AI bots accounted for 70% of the traffic to his server. The publication went viral, and the activity of AI scanners dropped sharply; however, online hooligans organized a massive invasion of requests from clients with a user-agent string value that matched OpenAI GPTbot. However, the real OpenAI AI bot sends requests from the Microsoft Azure infrastructure, and in the case of the Diaspora server, they came from AWS addresses and even from American Internet providers.
Sometimes the situation is complicated by the fact that some bots have multiple purposes. For example, Meta✴ AI bot and AppleBot collect data exclusively for AI training, while GoogleBot serves both AI and search indexing. To avoid confusion, Google added a separate Google-Extended value for AI training tools in 2023.
HP today introduced the HP OmniBook 5 16 AI PC, available with powerful Intel and…
HP has announced the OmniBook 7 notebooks with a 14-inch and 16-inch display. Both devices…
HP has unveiled an updated version of the Victus 15 gaming laptop. The 2025 model…
HP has announced the compact and ultra-light 13-inch OmniBook 7 Aero laptop. It will be…
HP has expanded its Omen gaming laptop range with a new model, the Omen 16…
HP has announced the OmniBook X, which will be available in various configurations with the…