Privacy regulators worldwide are examining how existing privacy laws apply to companies that create and use AI systems.
One key question regulators confront is why companies process personal data for AI-related activities, and whether they do so for a reason recognized in their country's privacy law, such as consent, the necessity of performing a contract, or the company's legitimate interest in the activity.
Policymakers should answer that question by recognizing legitimate interest as an important legal basis for processing personal data to train AI models and systems, which take many forms and serve a variety of purposes. When companies rely on legitimate interest to process data, they must examine the potential risks of their activities to data subjects — and then identify and implement appropriate safeguards. This approach can establish important guardrails for companies that create and use AI systems across industry sectors.
Where policymakers are moving
There is already significant global discussion about when companies may rely on a legitimate interest in processing personal data for AI training, deployment or other activities.
In the EU, the European Data Protection Board is due to give an opinion by the end of the year on how companies processing personal data for certain AI-related activities may do so using legitimate interest as a legal basis under the EU General Data Protection Regulation. Under the GDPR, legitimate interest can only be used when three conditions are met: there is a legitimate interest of the controller or a third party, the processing is necessary to pursue that interest and the rights and freedoms of the data subjects do not outweigh that interest, taking into account potential mitigating measures.
Other privacy regulators are asking the same questions. Earlier this year, France's data protection authority, the Commission nationale de l'informatique et des libertés, issued guidance recognizing that legitimate interest can support AI-related processing. The U.K. Information Commissioner's Office devoted the first chapter of its consultation series on generative AI to legitimate interest. In November, Brazil's DPA, the Autoridade Nacional de Proteção de Dados, asked stakeholders for their views on how companies can rely on legitimate interest to process data for AI-related activities.
Policymakers looking at these issues should not lump all types of AI models and the AI systems built atop those models together. We urge them to consider three important points.
Relying on legitimate interest has significant benefits
Training AI models requires processing a broad set of representative data. Companies can create guardrails to handle large amounts of personal data when their activities are based on legitimate interest. For example, the EDPB noted companies relying on legitimate interest should adopt measures that mitigate potential risks to the rights and freedoms of data subjects.
These safeguards should go beyond legal requirements and may include actions like deleting personal data shortly after use or extending transparency measures and data subject rights to situations where they are not required by law.
For many AI activities, relying on legitimate interest is simply a better fit than relying on other reasons for processing recognized in privacy law, especially consent. Requiring companies to obtain consent for all AI-related processing would impose requirements that are often impractical or impossible to meet.
Sometimes, asking for consent does not make sense. One example is when a company trains an AI model to detect cybersecurity threats. That AI system will work better when trained on data about prior cybersecurity attacks and known bad actors. But cybercriminals are unlikely to consent to having their data used for such purposes — and privacy laws should not require asking for it.
In other cases, consent requirements can discourage socially beneficial research. For example, if researchers studying the COVID-19 pandemic created an AI system to use publicly available data to detect patterns across individuals who had COVID-19, they might want to train it on data from many regions. Requiring consent would discourage that broad research and may prompt researchers to focus only on individuals in a single neighborhood or city, rather than conducting a more globally relevant study.
Even when consent can be obtained, requiring it can skew the data used to train an AI system, increasing the likelihood of the system producing biased results. If a city wanted to develop an AI system to route emergency calls to first responders more quickly, but all individuals of a specific race or gender withheld consent, the AI system would be trained on a skewed dataset and would be more likely to produce skewed results.
In short, legitimate interest can help companies process broader datasets and encourage them to implement appropriate guardrails and mitigations.
A company's interest is not in training AI, but in training AI for a particular use or purpose
Regulators examining whether a company has a legitimate interest in processing data to train an AI system should recognize that the company's interest is not in training an AI model or system as such. Rather, its interest lies in the underlying purpose or set of purposes for which the AI model or AI system is being trained.
For example, scientists may train an AI model to detect cancer cells in X-ray images. They are processing personal information to improve the detection of cancer, not to train a model for its own sake. Similarly, a bank may process personal data to train an AI system to detect fraudulent transactions; it processes that data to prevent fraud, not to train an AI system. In other words, the legitimate interest at issue is the underlying purpose or purposes for which an AI system is created, not the mere creation of an AI system.
In the EU, the EDPB asked how to apply legitimate interest when companies process personal data for "the creation and training of AI models." That seems to lump all AI training together — and ignores the different purposes for which different AI models and AI systems are developed. Regulators should not treat these different interests the same just because they involve AI training.
Using an AI model does not upend the legitimate interest test
Companies are using AI models as new tools to achieve many of the same purposes for which they already process personal data. Even though the tool has changed, the purpose has not, and the legal analysis should not be upended.
For example, a company offering cybersecurity services may have relied on legitimate interest to process data from publicly available websites about bad actors. If it has processed data without using AI but now wants to use an AI system to achieve the same purpose, its legal basis should not fundamentally change. Instead, the company should apply the legitimate interests test considering its new tool.
While its interest in identifying bad actors remains the same, the company may identify new safeguards or mitigating measures that can be applied in the new context. These safeguards may include defining the precise collection criteria, excluding certain sources or categories of data that may be sensitive, and deleting or anonymizing personal data at regular intervals, as recognized in an EDPB report released in May.
As more policymakers focus on how privacy laws apply to AI-related activities, they must dig into these important details and avoid treating all AI models and AI systems alike. Recognizing the different interests at stake can help ensure companies can process data responsibly, subject to appropriate guardrails and mitigation measures, including those that reliance on legitimate interest requires.
Kate Goodloe is the managing director of policy for BSA | The Software Alliance.