The European Data Protection Board released its anticipated opinion regarding the lawful use of personal data for the development and deployment of artificial intelligence models. The board's recommendations leave case-by-case determinations about lawful data use to EU data protection authorities while clarifying how legitimate interest fits into AI training.

The EDPB issued the opinion 18 Dec. in response to a formal request from Ireland's Data Protection Commission calling for clarity over how the EU General Data Protection Regulation treats AI training, with specific focus on personal data used for large language models.

The AI industry requires troves of data to train and improve its models over time. How it obtains that data, from scraping public data off the internet to first-party data use — sometimes without offering notice or opt-outs to users — has been the subject of several legal challenges since OpenAI debuted ChatGPT in 2022.

The EDPB specifically sought to explore the anonymization of AI models, the application of the legitimate interest legal basis for processing personal data when developing and deploying those models, and model training based on data processed in violation of the GDPR.

"AI technologies may bring many opportunities and benefits to different industries and areas of life. We need to make sure these innovations are done ethically, safely, and in a way that benefits everyone," EDPB Chair Anu Talus said in a statement. "The EDPB wants to support responsible AI innovation by ensuring personal data are protected and in full respect of the General Data Protection Regulation."

The opinion was welcomed by the Irish DPC as a way to provide clarity and guidance to the industry, with Commissioner Dale Sunderland saying it would provide "proactive, effective and consistent regulation across the EU/EEA."

"It will also support the DPC’s engagement with companies developing new AI models before they launch on the EU market as well as the handling of the many AI-related complaints that have been submitted to the DPC," he said.

Highlights of the opinion: Anonymity, legitimate interests and unlawful processing

The EDPB advised DPAs to consider anonymity status on a case-by-case basis while noting regulators can review any risk assessments and documentation associated with the training, test for vulnerabilities and verify what, if any, privacy-preserving techniques were used. The EDPB further explains that, "For a model to be anonymous, it should be very unlikely (1) to directly or indirectly identify individuals whose data was used to create the model, and (2) to extract such personal data from the model through queries." It also provides a "non-exhaustive list" of methods to demonstrate anonymity.

AI developers can use legitimate interest as a legal basis for model training; however, the EDPB opinion indicates relevant authorities should apply a three-step test to determine whether developers are claiming it lawfully. Those steps include identifying whether there is a legitimate interest of the controller or a third party; whether the processing is necessary; and whether the interests or fundamental rights and freedoms of data subjects override the legitimate interests. Additionally, in its press release, the EDPB provided examples that include "a conversational agent to assist users, and the use of AI to improve cybersecurity" as beneficial for users "but only if the processing is shown to be strictly necessary and the balancing of rights is respected."

Expectations around how people's data will be handled are a critical component of the balancing test, the board stated.

"This can be important due to the complexity of the technologies used in AI models and the fact that it may be difficult for data subjects to understand the variety of their potential uses, as well as the different processing activities involved," the opinion reads. "In this regard, both the information provided to data subjects and the context of the processing may be among the elements to be considered to assess whether data subjects can reasonably expect their personal data to be processed."

In cases where an AI model was developed using unlawfully processed personal data, the EDPB states that it "could have an impact on the lawfulness of its deployment, unless the model has been duly anonymised." The board also notes that the diversity of AI models and rapid evolution of the technology prompted it "to give guidance on various elements that can be used for conducting a case-by-case analysis."

According to the EDPB, guidelines "covering more specific questions" are being drafted. The board also said it is in the process of drafting guidance on web scraping.

Early reaction

Hogan Lovells Partner Eduardo Ustaran, AIGP, CIPP/E, said the opinion is one of the "most consequential" views of the GDPR globally given the role data has in AI development. He said it strikes a pragmatic view in trying to protect innovation, noting it aligns with other regulators' views on the subject, such as the U.K. Information Commissioner's Office.

Ustaran said understanding the legitimate interest part of the opinion and how it's done in practice will be crucial going forward.

"While the EDPB's approach to this aspect of the GDPR is not new, their granular description of how to approach specific AI data uses 'Legitimate Interests Assessment' is very revealing," he said.

Computer & Communications Industry Association Europe Senior Policy Manager Claudia Canelles Quaroni said, "The EDPB’s confirmation that ‘legitimate interest’ is a lawful basis for processing personal data in the context of AI model development and deployment marks an important step towards more legal certainty. It means that AI models can be properly trained using personal data. Indeed, access to quality data is necessary to ensure that AI output is accurate, to mitigate biases, and to reflect the diversity of European society."

However, Canelles Quaroni also said that "greater legal clarity and a practical framework are needed to reconcile EU privacy principles with technological progress. This is essential for Europe to remain competitive and unlock AI-driven innovation. Otherwise European consumers and businesses risk missing out on more cutting-edge technologies powered by AI and data."

Caitlin Andrews is a staff writer for the IAPP.