Editor's note: The IAPP is policy neutral. We publish contributed opinion and analysis pieces to enable our members to hear a broad spectrum of views in our domains.

The adoption of the EU Artificial Intelligence Act introduced new concepts and definitions for the various participants involved in the development, use and dissemination of AI technologies. 

Among the key definitions introduced is the concept of a provider: "a natural or legal person, public authority, agency or other body that develops an AI system or a general-purpose AI model or that has an AI system or a general-purpose AI model developed and places it on the market or puts the AI system into service under its own name or trademark." 

Under the AI Act, a provider may be the developer of an AI model or system that places it on the market or puts it into service. A provider may also be an entity that is not the developer but that has an AI model or system developed and places it on the market or puts it into service under its own name or trademark.

The most common scenario is when an entity develops an AI system or a general-purpose AI model and places it on the market under its own name or trademark. Examples include OpenAI's ChatGPT, Anthropic's Claude 3, Google's Bard, Perplexity, Synthesia and others.

In the context of personal data processing, and in line with the EU General Data Protection Regulation, providers carry out at least two basic processing activities, depending on the specific purpose for which the data is processed: the creation of an AI model or system, and service provision. This division is conditional, as the typical life cycle of developing and deploying an AI model or system includes, among other things, the creation, development, training, updating, fine-tuning, operation and post-training of AI models.

The European Data Protection Board recognized in Opinion 28/2024 that depending on the circumstances, these "stages may take place in the development and deployment of AI models and may include the processing of personal data for various processing purposes." According to the EDPB, "the development of an AI model covers all stages before any deployment of the AI model, and includes, inter alia, code development, collection of training personal data, pre-processing of training personal data, and training." It further explained "the deployment of an AI model covers all stages relating to the use of an AI model and may include any operations conducted after the development phase." This approach may be valuable not only in the context of the opinion itself, but also when conducting a data protection impact assessment.

Risk identification 

Each stage may constitute a different type of processing activity and may involve different data controllers, as well as different roles for the company developing and launching the product. At the same time, given the specifics of the training process and use of AI, the question arises as to the application of Article 35 of the GDPR: "A single assessment may address a set of similar processing operations that present similar high risks."

When an organization conducts a DPIA, the following approach is typically used: defining and identifying the risk, assessing it, managing or accepting it, documenting and communicating it for review by stakeholders, and finally implementing and monitoring the system with periodic evaluation to ensure sustainability.
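
As an illustration only, the sketch below models these lifecycle stages in Python. The stage names, the `Risk` record and the `advance` helper are assumptions made for this example, not a prescribed format.

```python
from dataclasses import dataclass
from enum import Enum, auto

class RiskStage(Enum):
    """Lifecycle stages of a DPIA risk, mirroring the steps above."""
    DEFINED = auto()
    IDENTIFIED = auto()
    ASSESSED = auto()
    MANAGED_OR_ACCEPTED = auto()
    DOCUMENTED_AND_COMMUNICATED = auto()
    IMPLEMENTED_AND_MONITORED = auto()

@dataclass
class Risk:
    description: str
    likelihood: str        # e.g. "low", "medium", "high"
    severity: str          # impact on data subjects' rights and freedoms
    mitigation: str = ""   # safeguard chosen if the risk is managed
    stage: RiskStage = RiskStage.DEFINED

def advance(risk: Risk) -> Risk:
    """Move a risk to the next lifecycle stage; the final stage repeats
    periodically rather than ending, per the monitoring step above."""
    stages = list(RiskStage)
    i = stages.index(risk.stage)
    risk.stage = stages[min(i + 1, len(stages) - 1)]
    return risk
```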

Organizations use different approaches at the initial stage of risk identification. For some, there is constant interaction between the development team and compliance and information security specialists; for others, there are stakeholder and management meetings, or the involvement of privacy teams in operational processes. Often, new data processing activities are identified during the maintenance or development of preliminary privacy documentation, e.g., through the management of records of processing activities or the development of data-flow maps. Within these stages, potential privacy risks are also detected, particularly when an organization plans to develop a new product, such as an AI model or an AI-based system. At that point, the need to conduct a privacy audit arises as an important consideration.

This is largely due to the nature of AI itself and the specifics of AI model training, which often require access to large datasets that may include personal data. Therefore, when referring to the Guidelines on Data Protection Impact Assessment and determining whether processing is "likely to result in a high risk" under Regulation (EU) 2016/679, it is important to recognize that many AI-related activities can trigger this requirement. According to the Article 29 Working Party, a data controller should generally consider that if a processing operation meets two or more of the specified risk criteria, a DPIA is required. The more criteria that are met, the higher the likelihood that the processing presents a high risk to individuals' rights and freedoms, regardless of any safeguards the controller plans to implement. In some circumstances, meeting even one criterion may be sufficient to necessitate a DPIA.
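
As a rough illustration of that counting rule, the sketch below encodes the nine criteria listed in the WP29 guidelines (WP248 rev.01) and the two-or-more threshold. The function name and the simplified boolean output are assumptions for this example; the guidelines call for case-by-case judgment rather than mechanical scoring.

```python
# The nine indicators from the Article 29 Working Party DPIA guidelines (WP248 rev.01).
WP29_CRITERIA = [
    "evaluation or scoring, including profiling and predicting",
    "automated decision-making with legal or similar significant effect",
    "systematic monitoring",
    "sensitive data or data of a highly personal nature",
    "data processed on a large scale",
    "matching or combining datasets",
    "data concerning vulnerable data subjects",
    "innovative use or new technological or organisational solutions",
    "processing that prevents data subjects from exercising a right "
    "or using a service or contract",
]

def dpia_likely_required(criteria_met: set[str]) -> bool:
    """Rule of thumb from the guidelines: two or more criteria usually mean
    a DPIA is required. In some cases one criterion alone is enough, so a
    single match should still prompt a closer look by the privacy team."""
    return len(criteria_met & set(WP29_CRITERIA)) >= 2

# Example: a general-purpose model trained on large-scale scraped data.
met = {
    "data processed on a large scale",
    "innovative use or new technological or organisational solutions",
}
print(dpia_likely_required(met))  # True
```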

The decision to conduct an audit should be based on the standard provisions of the GDPR and the DPIA guidelines on determining whether processing is "likely to result in a high risk" under Regulation 2016/679. Where applicable, it should also take into account the data protection aspects related to the processing of personal data in the context of AI models addressed in Opinion 28/2024.

A broad spectrum of scenarios may arise in practice, raising complex questions about the appropriate approach to conducting a DPIA, and indeed whether such an assessment is required at all. It is worth noting that several such assessments may be needed, while companies frequently operate within the constraints of limited time, financial resources and technical capacity. Accordingly, it is essential for deploying entities to know whether to conduct an audit, how to conduct it, and to what extent and in what number.

There is no one-size-fits-all approach to the DPIA for all AI deployers. However, some elements are, or could be, common across companies: the presence of AI model cards, which are short documents provided with machine learning models that explain the context in which the models are intended to be used, details of the performance evaluation procedures and other relevant information; the two main personal data processing operations that can be distinguished within the AI life cycle; and the fact that AI models can be anonymous.
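
A minimal sketch of a model card as a data structure follows; the field names are illustrative assumptions rather than any standardized schema, chosen to show what a privacy team is likely to mine when scoping a DPIA.

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    """Minimal model card; field names are illustrative, not a standard."""
    model_name: str
    intended_use: str              # contexts the model is meant for
    out_of_scope_use: str          # uses the developer advises against
    training_data_summary: str     # provenance; whether personal data is included
    evaluation_procedures: str     # how performance was measured
    known_limitations: str = ""
    personal_data_in_training: bool = False  # flag feeding the DPIA decision
```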

The EDPB considers that, for an AI model to be regarded as anonymous, both the likelihood of direct, including probabilistic, extraction of personal data concerning individuals whose data were used to train the model, and the likelihood of obtaining such personal data from queries, whether intentionally or unintentionally, should be insignificant for any data subject, taking into account the means reasonably likely to be used.
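
Expressed as a condition, the EDPB's two-pronged test could be sketched as below. The likelihood inputs would in practice come from technical evaluation, such as data-extraction or membership-inference testing, and the numeric bar is purely a placeholder assumption, since the opinion does not set a quantitative definition of "insignificant."

```python
def model_is_anonymous(p_direct_extraction: float,
                       p_extraction_via_queries: float,
                       insignificance_bar: float) -> bool:
    """Two-pronged EDPB test (Opinion 28/2024): both likelihoods must be
    insignificant for any data subject, so the inputs should be the worst
    case (maximum) over all individuals in the training data. The bar
    itself must be justified case by case; no numeric value is defined."""
    return (p_direct_extraction < insignificance_bar
            and p_extraction_via_queries < insignificance_bar)
```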

Given these initial considerations, the following approach could be taken to conducting a DPIA when AI developers intend to place models or systems on the market. First, conduct a preliminary identification of possible processing operations, defining the potential use of personal data for AI training and for AI service provision. Next, develop the model card. Finally, identify the processing operations and determine whether a DPIA is necessary. If personal data processing is planned, analyze the model card, check relevant factors and identify possible risks.
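
Tying the steps together, a provider-side screening could be orchestrated as in the sketch below, which reuses the `ModelCard` and `dpia_likely_required` sketches above. The routing logic and return messages are hypothetical, not a mandated procedure.

```python
def pre_dpia_screening(planned_operations: list[str],
                       card: ModelCard,
                       criteria_met: set[str]) -> str:
    """Screening before placing a model on the market: list the planned
    processing operations (AI training, AI service provision), read the
    model card, then apply the two-or-more-criteria rule."""
    if not planned_operations and not card.personal_data_in_training:
        return "No personal data processing identified; document the conclusion."
    if dpia_likely_required(criteria_met):
        return "DPIA necessary: analyze the model card and identify possible risks."
    return "DPIA not clearly required; record the screening and re-check on changes."
```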

Within the training and deployment contexts at this stage, it is important to determine whether the model is anonymous. The EDPB considers that, "even when an AI model has not been intentionally designed to produce information relating to an identified or identifiable natural person from the training data, information from the training dataset, including personal data, may still remain 'absorbed' in the parameters of the model, namely represented through mathematical objects. They may differ from the original training data points, but may still retain the original information of those data, which may ultimately be extractable or otherwise obtained, directly or indirectly, from the model. Whenever information relating to identified or identifiable individuals whose personal data was used to train the model may be obtained from an AI model with means reasonably likely to be used, it may be concluded that such a model is not anonymous."

A different approach should be taken in the deployment phase. The deployer should assess the processing of user data for the purpose of providing services; here, the existence of an anonymized model may not matter, as the processing concerns the data subjects who will use the model rather than data that was collected for training purposes.

The deployer should develop the DPIA based on the previously identified data processing activities and their purposes. This includes assessing risks related to AI training, both during training and in the anticipated deployment context, as well as addressing risks associated with AI service provision, focusing on user-facing deployment and incorporating earlier findings.

Since the criterion for conducting a DPIA is not the technology itself but the data processing operation, companies may need several such assessments. However, the results of the risk assessment on the use of AI for training purposes can be reused in a DPIA on the use of AI for service provision. These results relate to the risks to data subjects whose data was used for training, in the context of potential direct or probabilistic extraction of their personal data from the model. Strategically, for example, the DPIA results may be combined into one general "DPIA regarding the development and implementation of the AI model by Company X."
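
As a sketch of that reuse, the two risk registers could be merged into a single combined report. The document structure below, building on the `Risk` record from the earlier sketch, is an assumption about how a team might organize it, not a required format.

```python
def combine_dpias(training_risks: list[Risk],
                  service_risks: list[Risk],
                  company: str = "Company X") -> dict:
    """Merge the two assessments into one general DPIA document, carrying
    the training-phase findings (extraction risks for data subjects whose
    data trained the model) into the service-provision section as well."""
    return {
        "title": ("DPIA regarding the development and implementation "
                  f"of the AI model by {company}"),
        "sections": {
            "AI training": training_risks,
            "AI service provision": service_risks + training_risks,
        },
    }
```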

The final step involves publishing a DPIA results report and conducting periodic follow-up audits to ensure transparency, accountability and ongoing compliance. This includes sharing DPIA outcomes with stakeholders when possible and scheduling audits aligned with model updates or changes in use.

AI model development is a complex, sequential process that requires collaboration between company management and privacy specialists to analyze potential risks by design, starting from the planning of the processing initiative. Sometimes, specialists are brought in at later stages and encounter complex technical documentation within large projects. These recommendations aim to provide a holistic view of DPIA development as part of the broader privacy documentation process for AI providers that serve a dual role as developer and deployer.

Rostyslav Prystai, CIPP/E, is a PhD candidate at Lviv Ivan Franko National University.