In the largest copyright payout in history to date, Anthropic, a leading artificial intelligence (AI) company, has agreed to a record-breaking $1.5 billion settlement in a class action suit brought by a group of authors. The case has not only sent reverberations through the tech industry but also carries ramifications for the entertainment industry and copyright holders more broadly. More than 40 copyright lawsuits involving AI are currently pending in the US, a significant number of them related to the music industry (Reisner, 2025). This pivotal moment will reshape how AI companies collect data to train generative AI, and it will redefine the fair use doctrine, copyright, licensing, and the way AI companies operate.
On August 19, 2024, three authors — Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson — filed a class action lawsuit against Anthropic, a San Francisco-based AI company founded in 2021. The suit alleged that Anthropic “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books” to train its AI models (Bartz v. Anthropic, 2025). In June 2025, a federal judge ruled partly in Anthropic’s favor, holding that the company was within its legal rights to train its AI on published books it had legally purchased. According to court documents, Tom Turvey, the Anthropic executive tasked with acquiring training data, contacted major publishers about licensing their books and then purchased physical books in bulk from distributors and retailers. He hired outside organizations to disassemble and scan the books, creating digital copies that could be used to train the company’s AI technologies. However, the ruling also held that the fair use doctrine does not cover the pirated books the company downloaded and used to train its AI platform. In September 2025, Anthropic agreed to settle the class action for $1.5 billion. The settlement covers approximately 500,000 works, or roughly $3,000 per work, and the final amount could rise by $3,000 for each additional work added to the class list. The settlement does not grant Anthropic a license to use the pirated material for future AI training, and the company must destroy the original downloaded files and any copies. The outcome has significant implications for how AI models are trained on copyrighted material, as it may force companies to rely on synthetic data sets or to obtain licenses for the restricted use of intellectual property.
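For readers who want to check the per-work figure, here is a minimal worked calculation using only the numbers reported above (the $1.5 billion settlement amount and the roughly 500,000 covered works); the variable N, a hypothetical final count of covered works, is introduced purely for illustration:

$$
\frac{\$1{,}500{,}000{,}000}{500{,}000 \text{ works}} = \$3{,}000 \text{ per work},
\qquad
\text{Total}(N) \approx \$1{,}500{,}000{,}000 + \$3{,}000 \times (N - 500{,}000) \quad \text{for } N > 500{,}000.
$$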
The settlement is especially noteworthy at a time of mounting lawsuits against AI companies over copyright infringement. From media outlets and online platforms to artistic works across the internet, AI companies have a voracious appetite for the data needed to build a baseline for generative AI development. This modern-day gold rush has numerous companies, both tech firms and media outlets, vying for a piece of the future AI-driven environment, and many organizations have turned to the courts to protect proprietary intellectual property after partnerships or licensing agreements have failed. In 2004, Google launched the Google Books project (originally Google Print) to digitize books without forming partnerships or obtaining direct licenses. Several copyright infringement lawsuits against Google Books were filed in federal courts, resulting in settlements and eventual victories for Google that led the company to change its business practices. In 2025, Reddit filed a lawsuit against Anthropic, claiming that Anthropic’s bots accessed Reddit’s site without authorization more than 100,000 times after Reddit rejected Anthropic’s request for a license (Reddit, Inc. v. Anthropic PBC, 2025). Notably, Reddit sued Anthropic for unfair competition rather than copyright infringement. Other similar cases are pending in courts across the country, awaiting rulings or possible settlements.
The music industry is not immune to AI companies scraping music data to train their models. In 2023, Universal Music and other publishers sued Anthropic over the “systematic and widespread infringement of their copyrighted song lyrics” used to train the company’s chatbot, Claude. The settlement in that case (Concord Music Group v. Anthropic, 2025) required Anthropic to maintain the guardrails (safety measures) it had already implemented so that Claude would not generate song lyrics from copyrighted works owned by the publishers. Similarly, UMG is currently suing Uncharted Labs, the maker of the AI music generator Udio, for copyright infringement. Filed in June 2024 in the Southern District of New York, the suit alleges that Udio generates music files that strongly resemble recordings owned by Universal (UMG Recordings v. Uncharted Labs, 2024). Like similar services, Udio allows users to create audio files from text prompts (specifying, for example, the decade of release, topic, genre, and a description of the artist) or from uploaded audio. The case is ongoing. Finally, GEMA, the German performance rights association, has filed a lawsuit in a German court accusing OpenAI of reproducing its members’ song lyrics without a license. Unlike many other plaintiffs, GEMA aims to establish a licensing model that compensates music creators for works used to train AI models, rather than simply barring AI companies from using proprietary IP (Stasjuka, 2025).
Although AI continues to develop at an exponential rate, many believe that the traditional methods of training generative AI are no longer sustainable. In an interview, Elon Musk stated that AI companies have run out of data for training their models, thereby “exhausting” the sum of human knowledge: “The cumulative sum of human knowledge has been exhausted in AI training. That happened basically last year.” As a result, xAI plans to rely on synthetic data, that is, training material generated by AI models themselves. According to Musk, using synthetic data is akin to an AI model writing an essay and then grading that essay itself. Companies such as Microsoft, Google, and Meta are likewise moving away from copyrighted material and relying increasingly on synthetic data. According to the company’s website, Google DeepMind generated a pool of 100 million unique synthetic examples to train its AlphaGeometry system to solve complex math problems, “sidestepping the data bottleneck” of human-generated information. A key drawback of synthetic data is that it increases the likelihood of hallucinations, nonsensical content that the AI treats as true. For the many companies still relying on traditional data sets to train their generative AI systems, obtaining licenses from media and entertainment organizations and other copyright holders is the only legally permissible path. OpenAI, for example, has signed licensing deals with prominent news organizations, including Axel Springer, Condé Nast, News Corp, and The Washington Post, to train its AI platforms. Similarly, in May 2025, Amazon signed a licensing agreement with The New York Times. These developments will ultimately reshape the fair use debate and the future of creation within the entertainment industry as AI begins its next chapter of development.
References
- Bartz v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. 2025)
- Concord Music Group, Inc. v. Anthropic PBC, No. 5:24-cv-03811 (N.D. Cal. 2025)
- Reddit, Inc. v. Anthropic PBC, No. 3:25-cv-05643 (N.D. Cal. 2025)
- Reisner, A. (2025). “Judges Don’t Know What AI’s Book Piracy Means.” The Atlantic.
- Stasjuka, T. (2025). “Legal recognition, control and monetization of AI-generated instrumental music content: EU and US approaches.”
- UMG Recordings, Inc. v. Uncharted Labs, Inc., No. 1:24-cv-04777 (S.D.N.Y. 2024)