Mark Zuckerberg personally approved training Llama AI models on pirated materials

Meta CEO Mark Zuckerberg personally authorized the Meta division that develops the Llama artificial intelligence models to train them on a dataset containing illegally obtained books and articles. This came to light in documents filed as part of writer Richard Kadrey's lawsuit against Meta.


The case is one of several in which tech giants developing AI systems are accused of training models on copyrighted material without the authors' permission. Defendants have typically argued that their actions qualify as fair use, a doctrine that permits the use of copyrighted material without permission in certain cases, such as creating new works and products that differ substantially from the original. Many copyright holders reject this position.

A new batch of unsealed documents includes testimony from Meta representatives, which shows that Mark Zuckerberg personally approved the company's use of the LibGen dataset to train Llama. LibGen, which bills itself as a link aggregator, in fact provides access to copyrighted works owned by major publishers. It has been sued repeatedly, fined tens of millions of dollars for copyright infringement, and ordered to shut down. According to the documents, Zuckerberg approved the use of LibGen to train at least one Llama model despite concerns raised by Meta employees and managers. The filing cites an internal memo noting that the use of LibGen was approved after "escalation to MZ," initials that apparently refer to the head of the company.


On January 8, the plaintiffs filed a statement with the court containing new allegations. In particular, they allege that Meta tried to conceal its use of LibGen materials: Meta engineer Nikolay Bashlykov allegedly wrote a script that stripped copyright information from books in the training dataset. Meta also allegedly removed copyright notices and related metadata from scientific journal articles in the dataset. The plaintiffs further claim that Meta infringed copyright by downloading the LibGen dataset over the BitTorrent protocol, because in doing so the company not only downloaded the data but also simultaneously "seeded" it, effectively distributing pirated materials. Ahmad Al-Dahle, the head of generative AI at Meta, gave permission to download the LibGen data via BitTorrent, even though Bashlykov had warned that this "may not be legally permissible."

The case is still far from over. For now, it concerns only early Llama models, not the latest releases. If Meta convinces the court that its use of the materials was fair use, the court may side with the company. In 2023, several plaintiffs were unable to prove copyright infringement, and their claims against Meta were dismissed.

