Artificial intelligence adoption inside organizations is accelerating faster than many governance frameworks can adapt. Human resources platforms deploy predictive analytics, productivity tools learn from behavioral patterns, and customer-service systems are trained on historical interactions.
As these systems mature, a recurring legal question surfaces: when special-category data is involved, on what basis can AI models be trained, particularly where training is carried out by a processor?
Recent case law and regulatory guidance, culminating in the Court of Justice of the European Union's judgment in the Single Resolution Board (SRB) case, clarify that this question cannot be answered through abstractions. Instead, a practical fork has emerged, one that depends on how identifiability actually operates in the hands of the processor.
In practice, AI training scenarios involving special-category data now tend to fall into one of two lawful pathways. In the first, the data is rendered non-identifying for the processor, such that the processor cannot realistically attribute it to individuals. In the second, the data remains personal data for the processor, but with a materially reduced risk profile. Understanding which pathway applies, and for what reasons, is now the foundation of defensible AI governance.
Pseudonymization is a safeguard, not a separate legal basis
Before addressing AI training itself, one preliminary issue must be settled. Confusion often arises as to whether a processor requires its own lawful basis, particularly under Article 9(2) of the EU General Data Protection Regulation, to pseudonymize special-category data received from a controller.
It does not.
Under Recital 50 and Article 32 of the GDPR, pseudonymization is recognized as a technical and organizational safeguard, not a new purpose of processing. The CJEU confirmed this logic in its September 2025 decision in the SRB case, making clear that pseudonymization does not alter the controller's original legal basis. It merely reduces risk within it.
Accordingly, where a controller already relies on a valid Article 9(2) condition — for example, Article 9(2)(b) for employment-related obligations — a processor acting under the controller's instructions does not need an independent Article 9 justification to pseudonymize that data.
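The safeguard character of pseudonymization can be illustrated with a minimal sketch: direct identifiers are replaced with keyed tokens, while the secret key remains with the controller, so the processor receives only the tokenized records. All field names, the key-handling arrangement and the token format are hypothetical, illustrative choices, not a reference implementation.

```python
import hmac
import hashlib

# Hypothetical sketch: keyed pseudonymization where the controller holds the
# secret key and the processor only ever receives the resulting tokens.
# Field names and truncation length are illustrative choices.

def pseudonymize(record: dict, key: bytes,
                 id_fields: tuple = ("employee_id", "email")) -> dict:
    """Replace direct identifiers with HMAC-based tokens; keep other fields."""
    out = {}
    for field, value in record.items():
        if field in id_fields:
            # Deterministic keyed token: same input + same key -> same token,
            # but irreversible without the controller-held key.
            out[field] = hmac.new(key, str(value).encode(),
                                  hashlib.sha256).hexdigest()[:16]
        else:
            out[field] = value
    return out

controller_key = b"held-by-controller-only"  # never shared with the processor
record = {"employee_id": "E-1042", "email": "a@example.com", "absence_days": 3}
safe = pseudonymize(record, controller_key)
```

Because the tokens are deterministic under a given key, the controller can still re-link records if needed, while a processor without the key cannot, which is precisely the relational asymmetry discussed below.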
The real legal analysis begins after pseudonymization.
The post-SRB framework: Identifiability is relational, not absolute
The SRB judgment rejects a simplistic binary between "personal" and "anonymous" data. The CJEU makes clear that pseudonymized data is not automatically personal data in all cases and for every recipient, nor does pseudonymization render data anonymous in all cases simply because additional information exists somewhere in the ecosystem.
Instead, identifiability must be assessed relationally, by reference to the specific actor holding the data, the technical, organizational and legal measures surrounding it, and the means reasonably likely to be used to identify individuals.
This has led many commentators to describe a functional middle state, sometimes labeled "impersonal data," where data is not fully anonymous in the abstract, but is rendered non-identifying for a specific processor due to effective safeguards.
This relational approach frames the two lawful pathways that follow.
Pathway 1: Data rendered non-identifying for the processor — identifiability after SRB
Following the SRB case, the decisive question is whether the processor can, in practice, attribute the dataset to identifiable individuals using means reasonably available.
This assessment is contextual and concrete. It turns on factors such as: the absence of direct identifiers; the absence of access to re-identification keys; contractual and organizational prohibitions on re-linkage; technical measures such as aggregation, noise injection, access controls and data loss prevention; and the processor's inability to lift those measures or combine the data with auxiliary datasets.
Where these conditions are met, the data — while not "anonymous in all cases" — is non-identifying for that processor.
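Two of the technical measures mentioned above, aggregation and noise injection, can be sketched together: the processor works on perturbed per-group statistics rather than row-level records. The epsilon parameter, the grouping field and the noise mechanism are hypothetical, simplified choices for illustration only.

```python
import random

# Illustrative sketch only: aggregate per-group counts, then add
# Laplace-distributed noise so the processor sees perturbed statistics
# rather than row-level data. Epsilon and grouping are hypothetical choices.

def noisy_group_counts(rows, group_field, epsilon=1.0, seed=0):
    """Count rows per group and perturb each count with Laplace(0, 1/epsilon)
    noise, generated as the difference of two exponential draws."""
    rng = random.Random(seed)
    counts = {}
    for row in rows:
        counts[row[group_field]] = counts.get(row[group_field], 0) + 1
    return {group: count + rng.expovariate(epsilon) - rng.expovariate(epsilon)
            for group, count in counts.items()}

rows = [{"dept": "HR"}, {"dept": "HR"}, {"dept": "IT"}]
stats = noisy_group_counts(rows, "dept", epsilon=1.0)
```

The point of the sketch is structural: the output preserves aggregate patterns useful for model training while no individual row survives in the processor-facing result.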
Consequences for AI training
In this scenario, the processor does not process data relating to identifiable individuals, Article 9 does not play a practical role in the processor's hands, and downstream AI training focuses on patterns rather than persons.
This does not eliminate all governance obligations. Purpose limitation, onward-transfer controls, and contractual safeguards remain critical, precisely because the same dataset may become personal again for later recipients. But for the processor's own AI training activities, the GDPR's core risk rationale is neutralized.
Pathway 2: Data that remains personal data for the processor — when the GDPR still applies
In many realistic AI pipelines, pseudonymization significantly reduces risk but does not eliminate identifiability altogether. The processor may still be able, alone or via reasonably accessible means, to attribute data to individuals.
In that case, the GDPR continues to apply to the processor's activities, including purpose limitation under Article 5(1)(b), the requirement for a lawful basis under Article 6, and safeguards reflecting the reduced identifiability.
The question then becomes: which lawful basis fits this transformed processing?
Legitimate interest as the appropriate basis for post-pseudonymization AI training
Recent regulatory signals, including the European Data Protection Board's 2024 Opinion on AI models, recognize that legitimate interest may serve as a lawful basis for certain AI training activities where the objective is legitimate — such as model safety, robustness, bias mitigation or accuracy — the processing is necessary to achieve that objective, and the impact on individuals is minimal due to strong safeguards.
This reflects a core GDPR insight: not all personal data presents the same level of risk, and pseudonymization materially alters the balancing exercise.
Addressing Article 9 concerns
Legitimate interest does not, by itself, justify the processing of special-category data as such. But the SRB case clarifies why this does not end the analysis.
In post-pseudonymization environments, the controller relies on Article 9(2) for the original collection and use, the processor operates on data that no longer functions as special-category data in a meaningful sense, and the downstream activity targets statistical patterns rather than individual attributes.
Where identifiability is materially reduced, the processor's AI training no longer engages the core mischief Article 9 is designed to prevent. Legitimate interest, therefore, becomes available for that downstream processing — subject to a rigorous balancing test and strong safeguards.
What makes legitimate interest credible in practice
Three elements are essential.
First, necessity. AI training must be genuinely necessary to achieve the stated objective.
Second, safeguards. These include key separation, access controls, governance segregation, limited retention and strict prohibitions on re-identification.
Third, accountability. Organizations must document their assessment, reflect the reduced risk realistically and communicate transparently.
When these conditions are met, legitimate interest aligns with both the structure and the spirit of the GDPR.
Processors, dual roles and governance reality
A persistent myth is that any processor training AI models automatically becomes a controller. That is incorrect.
A processor remains a processor where it trains models strictly under the controller's instructions and for the controller's benefit. Where it trains models for its own improvement purposes, it may act as a controller for that narrow activity — often relying on legitimate interest — while remaining a processor for customer-specific deployments.
What matters is clarity, not denial.
A practical decision framework
Responsible AI governance now follows a clear sequence.
First, the controller identifies the applicable Article 9(2) basis.
Next, pseudonymization is applied as a safeguard under the controller's instructions. Identifiability is then assessed from the processor's perspective.
Where the data is non-identifying for the processor, it is governed as such, with strict onward transfer controls. Where the data remains personal, legitimate interest is relied upon for narrowly defined AI training activities.
Throughout, the analysis is documented and safeguards are enforced rigorously. This is precisely the disciplined, relational governance the GDPR now requires.
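The sequence above can be sketched as a simple decision helper. The category names, inputs and outcomes are hypothetical simplifications for illustration; the real assessment is contextual and documented, not a three-flag function.

```python
from enum import Enum

# Hypothetical sketch of the decision sequence described above.
# Inputs and outcome labels are illustrative, not legal advice.

class Pathway(Enum):
    BLOCKED = "no valid Article 9(2) basis; training cannot proceed"
    NON_IDENTIFYING = "govern as non-identifying; enforce onward-transfer controls"
    PERSONAL_DATA = "rely on legitimate interest for narrowly defined AI training"

def assess_training(has_art9_basis: bool,
                    pseudonymized: bool,
                    processor_can_reidentify: bool) -> Pathway:
    """Walk the decision sequence: controller basis first, then
    identifiability assessed from the processor's perspective."""
    if not has_art9_basis:
        return Pathway.BLOCKED
    if pseudonymized and not processor_can_reidentify:
        return Pathway.NON_IDENTIFYING
    return Pathway.PERSONAL_DATA
```

Note that the identifiability question is asked from the processor's perspective, reflecting the relational test the SRB judgment requires.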
From binary thinking to relational lawfulness
The SRB judgment does not eliminate the GDPR's protections. It refines them.
AI training with special-category data now turns on a relational assessment of identifiability, not labels. Either data is rendered non-identifying for the processor, or it is not. Where it is not, legitimate interest provides a lawful and realistic path forward when risk is genuinely low.
Pseudonymization is a safeguard. Identifiability is contextual. Legitimate interest works when governance is real.
The GDPR already contains the tools. The challenge is using them with precision.
Noemie Weinbaum, AIGP, CIPP/A, CIPP/C, CIPP/E, CIPP/US, CIPM, CIPT, CDPO/FR, FIP, is privacy lead at UKG and managing director at PS Expertise.
Roy Kamp, AIGP, CIPP/A, CIPP/E, CIPP/US, CIPM, CIPT, FIP, is legal director at UKG.

