OpenAI Copyright Class Action Lawsuit Claims Authors’ Books Were Used Without Permission

Yes, there is an active class action lawsuit against OpenAI alleging that the company downloaded and used copyrighted books without permission or compensation to train ChatGPT. The lawsuit claims OpenAI’s large language models were built by reproducing copyrighted works at scale—books by recognized authors—and that ChatGPT now generates outputs that infringe those copyrights by producing comprehensive summaries, outlines, and other content substantially similar to the original works. In October 2025, a federal court rejected OpenAI’s attempt to dismiss the case, finding that plaintiffs had sufficiently alleged outputs that a reasonable jury could find are substantially similar to copyrighted works, clearing the way for the lawsuit to proceed toward discovery and potentially trial. The litigation has grown significantly since its inception.

In April 2025, the U.S. Judicial Panel on Multidistrict Litigation consolidated twelve separate cases against OpenAI that had been filed in the Southern District of New York and the Northern District of California. The consolidated proceeding, known as an MDL (multidistrict litigation), includes class actions filed by authors and news organizations, along with suits focused on alleged Digital Millennium Copyright Act (DMCA) violations. The case represents a critical moment at the intersection of artificial intelligence and copyright law, as courts begin evaluating whether AI companies can legally use copyrighted works to train commercial products without permission.

How Did the Copyright Infringement Allegations Against OpenAI Develop?
What Specific Copyright Infringements Are Alleged in the Lawsuit?
How Does the OpenAI Lawsuit Relate to Other AI Copyright Cases?
What Are the Potential Consequences for Authors, Publishers, and AI Developers?
What Are the Technical and Evidentiary Challenges in Proving AI Copyright Infringement?
OpenAI’s Competitive and Business Model Defense
What Comes Next in the Litigation and What Does It Mean for the Future of AI Copyright?
Conclusion

How Did the Copyright Infringement Allegations Against OpenAI Develop?

The core allegation in the lawsuit is straightforward: OpenAI obtained copyrighted books and other literary works without authorization, reproduced them during the training process for ChatGPT, and continues to profit from a system built on those infringing foundations. Plaintiffs argue that OpenAI’s training data included their full copyrighted works or substantial portions thereof, and that the resulting AI model generates outputs that are substantially similar to passages from those books. When users prompt ChatGPT to summarize a book or outline its themes, the company earns revenue—either through ChatGPT Plus subscriptions or enterprise licensing—without ever compensating the original authors whose works formed the foundation of the model’s knowledge.

OpenAI has countered with a defense strategy centered on the idea that courts cannot meaningfully assess whether outputs are substantially similar to copyrighted works unless specific examples are attached to the complaint. In July 2025, the company argued that plaintiffs must provide detailed comparisons showing exactly which ChatGPT outputs match which passages from the copyrighted books. However, the October 27, 2025 ruling rejected this argument, with the court finding that plaintiffs had made sufficient factual allegations to proceed. The distinction matters: OpenAI wanted to avoid discovery (the period when both sides exchange evidence), but the court decided the allegations are plausible enough for the case to move forward. A ruling on a motion to dismiss does not decide whether the claims are ultimately correct.

What Specific Copyright Infringements Are Alleged in the Lawsuit?

The plaintiffs claim that ChatGPT can generate comprehensive summaries and detailed outlines of copyrighted books with striking accuracy, which they say suggests the model was trained on the full texts or near-complete versions of those works. This is not a claim about a single infringing use; it is a systemic allegation that the training process itself was built on copyright infringement. For example, a plaintiff author might show that the prompt “Summarize Chapter 3 of my book [Title]” produces a summary that closely mirrors the actual chapter’s themes, structure, and even specific examples. Plaintiffs argue that output this close would be difficult to produce unless the model had access to the full text during training.

The limitation of the current litigation, as of early 2026, is that the case is still in the pre-discovery phase. Plaintiffs have not yet obtained access to OpenAI’s training data pipelines, documentation, or detailed records of which copyrighted works were included in training datasets. The court’s October 2025 ruling allows the case to proceed, but it does not mean OpenAI’s liability has been established. The company may yet argue that its use of copyrighted works falls under fair use, a legal doctrine that permits limited copying of copyrighted material for purposes like criticism, commentary, or education. OpenAI has hinted at this defense, though the interaction between AI training and fair use remains unsettled in case law.

How Does the OpenAI Lawsuit Relate to Other AI Copyright Cases?

The OpenAI litigation is one piece of a broader wave of copyright infringement suits against AI companies. In a related but separate lawsuit, multiple writers, including Pulitzer Prize-winning journalist John Carreyrou, filed suit against six AI companies simultaneously: Anthropic, Google, OpenAI, Meta Platforms, xAI, and Perplexity. These writers have characterized the use of their books and articles for AI training as a “deliberate act of theft,” far starker language than the more measured tone of the complaints in the consolidated MDL. This broader suit suggests that copyright concerns in AI training are not limited to OpenAI but reflect a systemic industry practice.

Notably, in September 2025, Anthropic became the first AI company to settle a copyright infringement lawsuit with authors. While details of the Anthropic settlement remain partially confidential, its existence signals that at least one major AI company determined settlement was preferable to protracted litigation. The OpenAI case, by contrast, has not settled and remains in active litigation as of early 2026. Whether the Anthropic settlement will pressure OpenAI to negotiate, or instead will harden the company’s resolve to fight the case and establish favorable precedent, remains to be seen. The outcomes of these parallel proceedings will shape how AI companies approach copyright in the future.

What Are the Potential Consequences for Authors, Publishers, and AI Developers?

If plaintiffs prevail in the OpenAI lawsuit, the consequences could be substantial for both the AI industry and creative professionals. Authors could recover either actual damages or statutory damages, which under 17 U.S.C. § 504(c) range from $750 to $30,000 per infringed work and can rise to $150,000 per work for willful infringement. They could also recover attorneys’ fees if a class is certified and prevails. More broadly, a finding against OpenAI could establish that AI companies cannot legally use copyrighted works for commercial model training without permission, fundamentally altering the economics of AI development. Smaller AI startups that assumed they could freely use internet-sourced data to build models might face immediate legal jeopardy.

Conversely, if OpenAI successfully argues that its use of copyrighted works qualifies as fair use, or if the company wins on other legal grounds, the precedent would permit AI companies to continue training on copyrighted material. This would represent a major victory for the AI industry but a potential defeat for authors and publishers seeking to control how their intellectual property is used and monetized. The tradeoff is stark: creators’ rights to control and profit from their work versus the tech industry’s ability to build AI systems at scale. Courts will ultimately have to balance the public interest in AI innovation against the rights of authors under copyright law.

What Are the Technical and Evidentiary Challenges in Proving AI Copyright Infringement?

One of the most significant obstacles plaintiffs face is demonstrating that ChatGPT’s outputs are “substantially similar” to copyrighted works in the way copyright law defines that term. For a claim based on outputs, evidence that a copyrighted work was used during training is not enough; plaintiffs must show that the allegedly infringing output copies protectable expression from the original. That showing is not straightforward. A copyrighted book and a ChatGPT summary of that book may convey the same ideas, but copyright law protects expression, not ideas. If ChatGPT learned the facts or narrative structure from a book but expressed those facts in different language, copyright law may not protect against that use.
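The gap between shared ideas and shared expression can be illustrated with a rough textual measure. The Python sketch below counts how many five-word sequences in an output also appear verbatim in a source text. It is only an illustration under stated assumptions: the function names, the choice of five-word sequences, and the sample sentences are all hypothetical, and this metric is not the legal test for substantial similarity, which courts apply qualitatively.

```python
import re


def word_ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, source: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that also appear in the source."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & word_ngrams(source, n)) / len(out_grams)


if __name__ == "__main__":
    # Invented sample sentences, not taken from any real book or model output.
    source = ("The detective walked slowly along the river and found "
              "the missing letter under a stone.")
    near_copy = ("The detective walked slowly along the river and found "
                 "the missing letter.")
    paraphrase = ("An investigator discovers a lost note hidden beneath "
                  "a rock near the water.")
    print("near copy: ", verbatim_overlap(near_copy, source))
    print("paraphrase:", verbatim_overlap(paraphrase, source))
```

A near-verbatim excerpt scores high on this measure, while a paraphrase that conveys the same idea in different words scores at or near zero. That is why a faithful summary does not by itself establish copying of protectable expression.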

Additionally, as of early 2026, discovery in the consolidated MDL has not yet fully begun. This means neither side has yet accessed the evidence needed to prove its case. Plaintiffs will eventually seek to obtain OpenAI’s training datasets, documentation about which copyrighted works were included, and records of how the company obtained or licensed data. OpenAI will likely claim portions of this information are trade secrets, triggering additional disputes over what must be disclosed. The pace and scope of discovery—how much of OpenAI’s internal documentation courts will force the company to produce—will significantly shape the litigation’s trajectory.

OpenAI’s Competitive and Business Model Defense

Beyond its legal arguments, OpenAI has implicitly relied on a business-model defense: the company argues that its use of copyrighted works to train large language models is necessary to create competitive AI products that serve the public interest. This is not a formal legal defense to copyright infringement, but it reflects OpenAI’s broader position that restrictive interpretations of copyright would chill AI innovation. The company has not publicly settled with authors (as Anthropic did), suggesting it believes it can prevail in litigation and establish favorable precedent.

However, this confidence carries risk: if the company loses on summary judgment or at trial, the damages award could be substantial. An example of the stakes: if the court ultimately certifies a class covering all authors whose works were used in ChatGPT training, the class could be enormous. Statutory damages under 17 U.S.C. § 504(c) run from $750 to $30,000 per work, and up to $150,000 per work for willful infringement. At the statutory minimum, tens of thousands of infringed works would mean tens of millions of dollars; at the upper end of the range, total exposure could reach into the billions. OpenAI’s decision to fight rather than settle, so far, suggests the company believes either that its legal position is strong, or that the publicity and precedent of a settlement would be more damaging than the cost of litigation.
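The exposure arithmetic can be sketched in a few lines of Python. The per-work figures are the statutory damages tiers in 17 U.S.C. § 504(c): $750 minimum, $30,000 ordinary maximum, and $150,000 for willful infringement. The count of 50,000 works is a hypothetical chosen only to stand in for “tens of thousands”; no class size has been established in the case, and the function name is illustrative.

```python
# Statutory damages per work under 17 U.S.C. § 504(c), in US dollars.
STATUTORY_MIN = 750
STATUTORY_MAX = 30_000
WILLFUL_MAX = 150_000


def exposure_range(num_works: int) -> dict[str, int]:
    """Return total statutory exposure at each per-work tier."""
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    return {
        "minimum": num_works * STATUTORY_MIN,
        "ordinary_maximum": num_works * STATUTORY_MAX,
        "willful_maximum": num_works * WILLFUL_MAX,
    }


if __name__ == "__main__":
    # Hypothetical number of infringed works, for illustration only.
    hypothetical_works = 50_000
    for tier, total in exposure_range(hypothetical_works).items():
        print(f"{tier:>17}: ${total:,}")
```

Under that assumed count, the totals are $37.5 million at the minimum, $1.5 billion at the ordinary maximum, and $7.5 billion at the willful ceiling. The estimate therefore depends heavily on both the number of works in any certified class and the per-work amount a court or jury would award.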

What Comes Next in the Litigation and What Does It Mean for the Future of AI Copyright?

As of early 2026, the OpenAI case remains active in consolidated MDL proceedings. The court’s rejection of OpenAI’s motion to dismiss in October 2025 was a significant win for plaintiffs, but it is just one step in a potentially years-long process. The next critical junctures will be: whether the court certifies a class of authors (opening the door to class action damages), whether OpenAI’s fair-use defense survives summary judgment, and whether the parties ultimately settle or proceed to trial. Given the complexity and stakes involved, settlement negotiations could intensify as the case progresses, particularly if plaintiffs win additional pre-trial rulings in their favor.

The broader implications extend beyond OpenAI. Courts and Congress are increasingly focused on AI copyright questions. If the OpenAI litigation establishes that AI companies cannot legally train on copyrighted material without permission, licensing of copyrighted works to AI companies could become a major new revenue stream for authors and publishers—similar to how music licensing has evolved. Alternatively, if courts uphold fair use for AI training, the copyright framework for AI will diverge sharply from copyright law in other contexts, creating new questions about what “fair use” means in the age of generative AI.

Conclusion

The OpenAI copyright class action lawsuit represents a pivotal moment in the ongoing tension between AI innovation and intellectual property rights. The core claim—that OpenAI trained ChatGPT on copyrighted books without permission or compensation—has survived the company’s motion to dismiss, clearing the way for discovery, potential class certification, and possibly trial. The October 2025 ruling was a significant procedural victory for authors, but it establishes only that their allegations are plausible, not that OpenAI is liable.

If you are an author whose work may have been used to train ChatGPT, monitoring this litigation is important. The consolidated MDL may eventually create a mechanism for affected authors to seek damages and compensation. If you are considering publishing a book or licensing your work to AI companies, the outcome of this case will substantially influence your negotiating position and the market value of such licenses. The coming years will clarify whether AI companies can freely use copyrighted works to build commercial products, or whether creators retain enforceable rights to control and profit from how their intellectual property is used in the age of artificial intelligence.