A potential settlement in a closely watched artificial intelligence copyright case sounds a warning for governance professionals who have yet to establish strong data provenance protocols for their products.

The proposed agreement parties in the Bartz v. Anthropic class-action lawsuit reached in August includes a USD1.5 billion payout for authors who alleged the developer used their works to train the AI chatbot Claude without permission. As part of the remedy, Anthropic promised to delete the original works it obtained from two pirated libraries, along with any copies created from those works, within 30 days of a final judgment.

Plaintiffs deemed this provision among the "critical victories" secured through a settlement, according to a court filing outlining the agreement. And while a settlement would avoid creating any kind of legal precedent, including a remedy that is common in copyright infringement cases sends a message about the challenges developers might face down the road if they are found to have violated copyright law, and sets a tone for several other AI copyright cases that remain unresolved.

"It's hugely important in the broader context" of the settlement, said Mary Rasenberger, the CEO of the Author's Guild, a professional organization for writers. "They are also prohibited from using those datasets again, so they have to be deleted so they're not mistakenly used again."

The settlement's fate is uncertain after U.S. District Judge William Alsup accused the parties of rushing the deal in a 7 Sept. order. They will have to resolve questions about which works will be included in the deal and how claimants will be notified and paid out before the judge makes a final decision. If an agreement is not reached, the case could still go to trial.

The destruction of infringing materials is a well-established remedy in U.S. copyright law, which gives courts the ability, under 17 U.S.C. Section 503(b), to "order the destruction or other reasonable disposition of all copies or phonorecords found to have been made or used in violation of the copyright owner's exclusive rights, and of all plates, molds, matrices, masters, tapes, film negatives, or other articles by means of which such copies or phonorecords may be reproduced."

Rasenberger said she hopes requiring deletion spurs AI developers to think twice in the future before taking creatives' works without permission. The end goal, she said, is to push developers to work with authors before training on their works, with contracts governing how the works are used and for how long they can be trained on.

"If somebody robs your house, you're not going to say, 'Oh just keep it and pay me,'" she said.

Deletion can be complicated by how AI is trained: data becomes the foundation of how chatbots reason and respond to queries, and the process of disentangling information from a model, known as disgorgement, can be challenging depending on how the model is built.
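To see why, consider a deliberately tiny sketch, which is hypothetical and in no way Anthropic's architecture: a linear model fit by gradient descent, where each training example's contribution is folded into shared weights rather than stored as a record that can later be deleted.

```python
# Toy illustration of why training data cannot simply be "deleted" out of a
# finished model: every example nudges the shared weights, so its influence
# is mixed into w and b rather than stored as a retrievable record.

def train_linear(data, lr=0.01, epochs=200):
    """Fit y = w*x + b with stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x  # this example's influence folds into w ...
            b -= lr * err      # ... and into b
    return w, b

licensed = [(x, 2 * x + 1.0) for x in range(10)]  # hypothetical clean set
pirated = [(x, 2 * x + 1.5) for x in range(10)]   # hypothetical tainted set

w, b = train_linear(licensed + pirated)
print(f"trained weights: w={w:.3f}, b={b:.3f}")

# Deleting the `pirated` list now does not change w or b. Removing its
# influence requires retraining from scratch on `licensed` alone, which is
# why disgorgement can mean destroying the model itself.
w_clean, b_clean = train_linear(licensed)
print(f"retrained weights: w={w_clean:.3f}, b={b_clean:.3f}")
```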

Disgorgement may not be on the table in this case, however. If approved, the settlement would include Anthropic certifying it never used the data from the pirated libraries in any commercially released model.

In June, Alsup determined Anthropic was allowed to train on authors' works but violated their rights by saving pirated material to a library not necessarily used for training. He denied Anthropic's request for summary judgment on whether copies used for training should be viewed differently from the library, saying "the central library copies were retained even when no longer serving as sources for training copies, 'hundreds of engineers' could access them to make copies for other uses, and engineers did make other copies," and that the record on their use was too poorly developed.

Anthropic did not respond to a request for comment about the settlement.

But the U.S. Federal Trade Commission has not been shy about requiring disgorgement in other cases, particularly those in which a person's privacy was violated. Cambridge Analytica was required to delete or destroy all information collected from Facebook users through its GSRApp product. The home camera company Ring was similarly required to delete all face recordings used to train algorithms without users' consent, along with any affected work products, unless doing so was technically infeasible.

Santa Clara University School of Law Professor Edward Lee said it is too early to say what the Anthropic settlement could mean, given that its legal approach diverges from other major AI copyright cases being litigated. The judge in the closely followed Meta case disagreed with Alsup that downloading creative works for training constitutes fair use but dismissed the case on other grounds.

There could be some precedent in other library-building cases, such as American Geophysical Union v. Texaco, Inc., in which courts ruled a private company's copying of a published journal for its employees was not fair use, Lee said.

Lee also cautioned that arguments for destroying training data quickly after use could have unintended consequences; some technologies, such as anti-plagiarism software, need copies of works to detect fraud. And creatives might have a harder time proving their works were used unfairly if datasets are quickly deleted.

"It's fair to say the future is uncertain," Lee said.

Maslon Counsel Eran Kahana said the Anthropic case should put AI governance professionals on alert about the importance of data traceability, should an AI product they use or deploy ever be required to destroy or disgorge data.

"You think you just signed a deal on a new AI app for your business, spent all this money building it out, and then lo and behold, the vendor gets hit with a complaint, boom, they're out and their algorithm is disgorged," said Kahana, who helped develop the AI Data Stewardship Framework in his research fellowship at Stanford Law School.

Kahana added that protocols for data provenance should be established while AI models are being built, in the event litigation occurs. If the information is not extractable, a company might be forced to destroy a model entirely, he said.

And then, "You're left with nothing," he said.

Caitlin Andrews is a staff writer for the IAPP.