Comedian and author Sarah Silverman, together with writers Christopher Golden and Richard Kadrey, has filed lawsuits against OpenAI and Meta, as reported by Gizmodo on Friday. The plaintiffs accuse the two companies of using copyrighted works, including their own published books, to train large language models without obtaining the necessary permissions.
The complaints focus on the datasets purportedly used by OpenAI and Meta to train their models, ChatGPT and LLaMA. In OpenAI's case, the “Books1” dataset is roughly equal in size to Project Gutenberg, a widely recognized repository of copyright-free books. However, the lawyers representing Silverman and the other plaintiffs argue that the “Books2” dataset is so extensive that it likely originates from illegal “shadow libraries” hosting copyrighted materials, such as Library Genesis and Sci-Hub. These shadow libraries let ordinary users download individual works directly, but they also offer large volumes of written content as bulk torrent packages, which could be useful to anyone building a large language model. The lawsuit includes one exhibit in which Silverman’s lawyers asked ChatGPT to summarize ‘The Bedwetter,’ a memoir by Silverman released in 2010. The chatbot not only gave a detailed summary of the book but also appeared to reproduce some passages word for word.
Silverman, Golden, and Kadrey join a growing list of authors who have filed copyright infringement suits against OpenAI. The company faces numerous legal challenges over the methods it used to train ChatGPT. In June alone, OpenAI was hit with two separate lawsuits, including a broad class action accusing the company of violating federal and state privacy laws by using scraped data to train ChatGPT and DALL-E.
Frequently Asked Questions (FAQs) about the Copyright Infringement Lawsuits
Who has filed lawsuits against OpenAI and Meta?
Comedian and author Sarah Silverman, along with novelists Christopher Golden and Richard Kadrey, has filed lawsuits against OpenAI and Meta.
What is the main accusation against OpenAI and Meta?
The companies are accused of training their large language models on copyrighted materials, including works published by the plaintiffs, without obtaining their consent.
What is the argument around the datasets used by OpenAI and Meta?
The plaintiffs' lawyers argue that the “Books2” dataset used by OpenAI is so large that it likely originates from illegal “shadow libraries” of copyrighted material, such as Library Genesis and Sci-Hub.
Have there been previous lawsuits against OpenAI for similar issues?
Yes. Sarah Silverman, Christopher Golden, and Richard Kadrey are not the first authors to sue OpenAI over copyright infringement. The company has faced multiple legal challenges over its training methods for ChatGPT.
What specific incident is presented as evidence in Silverman’s lawsuit?
Silverman’s legal team asked ChatGPT to summarize ‘The Bedwetter,’ a memoir by Silverman published in 2010. Not only could the chatbot provide a detailed summary, but it also appeared to reproduce some parts of the book verbatim.