Several tech giants, including Apple, Anthropic, Nvidia and Salesforce, trained their artificial intelligence models on YouTube videos without the consent of platform owner Google and the authors of the videos, a Proof News investigative report found.

The alleged copyright infringer is EleutherAI, a non-profit organization that, by its own account, helps developers train AI models. Its target audience is small developers and researchers rather than tech giants. EleutherAI released the Pile dataset, a significant part of which is openly available to anyone on the Internet; all you need is the resources to download, store and process it.

The Pile dataset includes subtitles for 173,536 YouTube videos downloaded from more than 48,000 channels. Subtitle files are effectively transcripts of the videos, and YouTube's platform rules prohibit downloading its materials without permission. Nevertheless, Apple, Nvidia and Salesforce, companies with market capitalizations ranging from hundreds of billions to trillions of dollars, have themselves acknowledged in their research papers that they used the Pile to train AI. Apple, in particular, used the Pile to train its OpenELM models, introduced in April, and in June announced new AI features coming to the iPhone and Mac.

If copyright was indeed infringed in this case, the non-profit EleutherAI was the one that infringed it in the first place, and the tech giants may have been good-faith users of a publicly available dataset. The case once again shows that AI training remains legally unsettled.
