Assuming the European Parliament votes in favor of the Artificial Intelligence Act as anticipated this spring, the European Union's lawyers will need some time to fine-tune and translate the legislation's long, detailed and complicated text. If the AI Act is published in the Official Journal of the European Union in, say, September 2024, it would apply roughly two years later. The AI Act includes a debiasing exception to the General Data Protection Regulation's ban on using sensitive data.
Organizations can use AI for many purposes, including to make decisions about people — such as a bank assessing the creditworthiness of a customer who wants to obtain a mortgage. But AI can lead to accidental discrimination. For example, the bank's AI system could deny mortgages to people of a certain ethnicity, even if the bank never intended such discrimination.
Suppose an organization wants to test whether its AI system leads to indirect discrimination of people with certain ethnicities. It needs to know the ethnicity of individuals about whom its AI system makes decisions. This is a problem in Europe, as the organization typically does not know the ethnicity of its applicants, let alone register it. In principle, Article 9 of the GDPR prohibits the use of "special categories of personal data," sometimes called "sensitive data," including data about ethnicity, religion, health and sexual preference.
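To illustrate what such a test involves, the sketch below shows, in Python, one common first-pass check: comparing approval rates across ethnicity groups and computing their ratio, sometimes called a disparate impact or demographic parity ratio. The column names, sample data and the 0.8 threshold (the informal "four-fifths rule") are our own illustrative assumptions, not requirements of the AI Act or the GDPR, and the point is simply that the check cannot be run at all without ethnicity labels.

```python
# Illustrative only: a first-pass check for indirect discrimination by comparing
# approval rates across self-reported ethnicity groups. Column names, the sample
# data and the 0.8 threshold are assumptions made for this sketch.
import pandas as pd

decisions = pd.DataFrame({
    "ethnicity": ["A", "A", "A", "B", "B", "B", "B", "A"],
    "approved":  [1,   1,   0,   1,   0,   0,   0,   1],
})

# Approval rate per ethnicity group.
rates = decisions.groupby("ethnicity")["approved"].mean()

# Ratio of the lowest to the highest approval rate; values well below 1.0
# suggest outcomes differ markedly between groups and warrant closer scrutiny
# (statistical testing, controlling for legitimate factors, and so on).
disparate_impact = rates.min() / rates.max()

print(rates)
print(f"Disparate impact ratio: {disparate_impact:.2f}")
if disparate_impact < 0.8:
    print("Potential indirect discrimination; investigate further.")
```

A real audit would go well beyond this single ratio, but even this minimal check requires knowing each applicant's ethnicity, which is exactly the data the GDPR's ban covers.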
The GDPR includes exceptions to that ban, but none for AI debiasing. Individuals can override the ban by explicitly consenting to the organization's use of their special category data. However, in many situations, people cannot give valid consent. Consent must be genuinely voluntary, or "freely given," to be valid, but an individual who applies for a mortgage may feel obliged or pressured to consent. Such consent would therefore generally not be sufficiently voluntary to be valid.
In a previously published paper, we map out various arguments for and against adopting an exception to the GDPR's ban on using special categories of data. The main argument in support of an exception is that, without one, organizations have difficulty detecting and correcting discrimination by AI systems, because they are not allowed to use special category data.
The main arguments against an exception are: simply storing sensitive personal data threatens our privacy or can lead to further stigmatization and discrimination; storing data always carries risks, especially sensitive data; organizations could abuse an exception; and developing non-discriminatory AI is, in a technical sense, still in its infancy. Even if using personal data on, say, ethnicity is necessary to develop a non-discriminatory AI system, access to such data does not guarantee that developing non-discriminatory AI is always possible.
The AI Act's exception allowing the processing of special categories of personal data to detect and correct bias in AI systems applies to providers of AI systems. Under the act, a provider is "a natural or legal person, public authority, agency or other body that develops an AI system or a general purpose AI model or that has an AI system developed and places them on the market or puts the system into service."
Roughly speaking, the AI Act's exception applies to developers and entities outsourcing the development of AI systems, for non-private use. The exception does not seem to apply to organizations renting a fully developed AI system as a service, for example.
The scope of the exception within the AI Act is limited, which may be justified by previously highlighted arguments against such an exception. The exception only applies to high-risk AI systems, which include systems used as safety components of a product, biometrics, critical infrastructures, education, employment, access to essential private or public services (including credit scoring), law enforcement and migration, and administration of justice and democratic processes.
While many AI systems fall under the high-risk categories, those categories probably do not cover some socially relevant AI systems. Suppose, for example, that the company behind a local dating app in the Netherlands wants to audit and correct its AI system to ensure the system does not accidentally discriminate against non-white users. Currently, the company cannot rely on an appropriate exception to the GDPR that would enable it to use data about users' ethnicity for debiasing its AI system. The company could ask users for consent to use ethnicity data, but many might refuse, so the company would probably end up with a non-representative sample, which is unusable for debiasing. Even once the AI Act applies, the company could not rely on its exception, because dating apps do not fall under the act's high-risk categories.
The bank in our hypothetical examples could rely on the exception. A bank that develops AI to decide whether to give a mortgage loan would presumably fall under the high-risk category of evaluating the creditworthiness of an individual.
Under the AI Act, organizations may only process special category data for debiasing as far as is "strictly necessary." This is a high legal hurdle in Europe. Organizations must constantly check whether they still need the data for debiasing.
In Article 10, the AI Act states the exception only applies when organizations cannot use other data, such as anonymous or synthetic data. The act follows the definition of personal data in the GDPR: anonymous data are not considered personal data, and as such, the ban under Article 9 does not apply anyway. Synthetic data are a type of fake data that represent the same, or a similar, distribution of individuals but can no longer be linked to the individuals. While anonymization does not remove all risks regarding datasets, it does help to prevent specific individuals from being targeted. It, therefore, makes sense that the exception only applies when using anonymous or synthetic data is not feasible.
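To make the idea of synthetic data a little more concrete, the sketch below shows a deliberately naive approach in Python: sampling each attribute independently from its observed marginal distribution. The dataset, column names and record count are our own illustrative assumptions. This toy approach preserves per-column distributions but not the correlations between columns, and it offers no formal privacy guarantee; real synthetic-data generators model the joint distribution and add dedicated privacy safeguards.

```python
# Illustrative only: a naive "synthetic data" sketch that samples each attribute
# independently from its observed marginal distribution. Real synthetic-data
# tools model the joint distribution and add formal privacy protections.
import pandas as pd

real = pd.DataFrame({
    "age_band":  ["18-25", "26-40", "26-40", "41-65", "41-65", "18-25"],
    "ethnicity": ["A", "B", "A", "B", "A", "B"],
    "approved":  [1, 0, 1, 1, 0, 0],
})

n = 1000  # number of synthetic records to generate
synthetic = pd.DataFrame({
    # Resample each column with replacement, so no synthetic row corresponds
    # to any single real individual.
    col: real[col].sample(n=n, replace=True, random_state=i).reset_index(drop=True)
    for i, col in enumerate(real.columns)
})

print(synthetic.head())
# The synthetic marginal distribution roughly mirrors the real one.
print(synthetic["ethnicity"].value_counts(normalize=True))
```

Whether such data are genuinely anonymous, and whether they remain useful for debiasing, depends heavily on how they are generated; that is precisely why the act treats anonymous and synthetic data as the preferred option and personal data as the fallback.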
The exception applies to possible biases that are likely to affect individuals' health and safety, negatively impact fundamental rights or lead to discrimination prohibited under EU law. Furthermore, Recital 44c states the AI Act's debiasing exception is intended to "protect the right of (individuals) from the discrimination that might result from the bias in AI systems."
We think lawmakers did not intend the exception to apply to all possible biases. For example, lawmakers probably did not mean to allow processing data on whether someone is a member of a trade union in order to remove every possibly unfair bias, except perhaps in very specific employment contexts. Lawmakers were likely thinking of health and safety problems, fundamental rights, or discrimination related to characteristics protected by EU law, such as ethnicity. Additional guidance from regulators on this point would be welcome.
The final text of Article 10 includes several safeguards, added by the European Parliament, consisting of mandatory technical and organizational measures that aim to limit risks when organizations use special category data for debiasing. For instance, organizations must apply state-of-the-art security and privacy-preserving measures, ensure that only authorized people can access the data, and take other organizational measures. The organization must delete the special category data once the bias is corrected, or earlier if possible. Moreover, all usual GDPR requirements regarding, for instance, data security and data minimization continue to apply to personal data.
Finally, some caveats. These are first impressions, as a full analysis of the long and complicated AI Act is still underway and the precise text of the act may still change. Overall, EU lawmakers considerably improved the text of the exception, compared to the first draft by the European Commission, narrowing its scope and adding several safeguards for processing sensitive data.