The case could reshape how artificial intelligence companies are allowed to obtain the massive data sets they need to build their systems.
(CN) – When novelists Brian Keene, Abdi Nazemian and Stewart O’Nan sued Nvidia more than two years ago, they accused the company of training its artificial intelligence systems on pirated copies of their books.
On Tuesday, a federal judge largely agreed they had a case.
US District Judge Jon Tigar denied most of Nvidia’s motion to dismiss a proposed class action in the Northern District of California, allowing direct and contributory copyright infringement claims to proceed.
The authors say Nvidia trained some of its large language models on datasets and so-called shadow libraries — online repositories that housed pirated books and other copyrighted works — that contained their copyrighted books without permission.
A key focus of the lawsuit is a dataset known as “The Pile,” which included a subcollection of about 200,000 pirated books called Books3, itself sourced from the shadow library Bibliotik. The authors say Nvidia used The Pile to train multiple models in its Megatron line, including the Megatron 345M, NeMo GPT-3 10B, InstructRetro-48B, Retro-48B, and Nemotron-4 15B.
Nvidia pushed back. The company asked the court to take judicial notice of a screenshot of a model card from its website that suggested Megatron 345M was trained only on parts of The Pile that did not include Books3.
Tigar, an appointee of Barack Obama, was not convinced. He warned that reviewing extraneous documents at the pleadings stage could lead courts to dismiss potentially valid claims before plaintiffs have a chance to obtain evidence through discovery, and declined to consider the company’s model card.
Regardless of the model card, Tigar found that the authors had plausibly linked their works to the training data: Books3 made up 12% of The Pile, the authors’ works appeared in Books3, and Megatron 345M was trained on The Pile.
The contributory infringement claim proved equally viable.
The authors alleged that Nvidia provided customers, including Writer, Persimmon AI Labs, and Amazon, with specially designed scripts to automatically download and process The Pile for use in their AI development.
Nvidia argued that its broader NeMo Megatron Framework had substantial non-infringing uses and that the company never marketed it as a tool for copyright infringement.
Tigar drew a sharp distinction. The question was not whether the platform as a whole could be used legitimately, but whether the specific scripts had some other purpose.
“The scripts are supposed to have no other purpose than to speed up the infringement process,” he wrote.
On the question of whether Nvidia knew what its customers were doing with those tools, Tigar again sided with the plaintiffs. Their complaint did not rest on suspicion but identified concrete instances of infringement by named customers, the judge found.
“Plaintiffs have alleged that NVIDIA knew that its scripts and other aids were directly contributing to infringement by third parties,” he wrote.
One claim that did not survive was vicarious infringement, which requires a showing that the defendant had the right to control the infringing conduct and a direct financial interest in it.
The authors’ claim that Nvidia had the right and ability to control customers’ direct infringements was, according to Tigar, too vague. They did not explain how Nvidia could actually exercise that control after a customer independently chose to access The Pile.
The financial benefit theory fared no better. The court found that the authors failed to establish that access to the infringing material served as a draw for customers.
The key question, Tigar wrote, is whether the infringing activity constitutes a draw, not merely an additional benefit.
The judge dismissed the vicarious infringement claim with leave to amend within 21 days.
The plaintiffs are represented by the Joseph Saveri Law Firm, which also represents authors who are suing OpenAI over similar AI training data practices.
Representatives for Nvidia and the Joseph Saveri Law Firm did not immediately respond to requests for comment.