Looking to comply with GDPR? Here's a primer on anonymization and pseudonymization

Published 25 April 2017

Contributors:

Matt Wes

LinkedIn Corporation

Anonymization and pseudonymization are two terms that have been the topic of much discussion since the introduction of the General Data Protection Regulation. This is for good reason, too. The GDPR recognizes the privacy-enhancing effect of these techniques by providing exceptions to many of the most burdensome provisions of the regulation when steps are taken to de-identify personal data. By making it impossible or impractical to connect personal data to an identifiable person, data controllers and processors are permitted to use, process and publish personal information in just about any way that they choose.

This article provides a brief introduction to the concepts of anonymization and pseudonymization, and explains how these techniques may be an important aspect of GDPR compliance. Given the well-publicized limitations of current de-identification techniques, though, data controllers that choose to use pseudonymization and anonymization may run the risk of becoming the subject of a future enforcement action. This article should therefore serve as a cautionary tale about the benefits and limitations of adopting de-identification techniques early as a central aspect of privacy compliance.

Anonymization v. pseudonymization

Although similar, anonymization and pseudonymization are two distinct techniques that permit data controllers and processors to use de-identified data. The difference between the two techniques rests on whether the data can be re-identified. Recital 26 of the GDPR defines anonymized data as “data rendered anonymous in such a way that the data subject is not or no longer identifiable.” Although circular, this definition emphasizes that anonymized data must be stripped of any identifiable information, making it impossible to derive insights on a discrete individual, even by the party that is responsible for the anonymization. When done properly, anonymization places the processing and storage of personal data outside the scope of the GDPR. The Article 29 Working Party has made it clear, though, that true data anonymization is an extremely high bar, and data controllers often fall short of actually anonymizing data.

In contrast to anonymization, Article 4(5) of the GDPR defines pseudonymization as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.” When the de-identified data is held separately from the “additional information,” the GDPR permits data handlers to use personal data more liberally without fear of infringing the rights of data subjects. This is because the data only becomes identifiable when both elements are held together.
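The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a technique the GDPR or any regulator prescribes: it replaces a direct identifier with a keyed hash (HMAC-SHA256 from the standard library) and returns the lookup table as a separate object, standing in for the “additional information” that would be stored apart from the working data. The function name `pseudonymize`, the `email` field, and the sample records are all hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(records, id_field, secret_key):
    """Split records into a pseudonymized working set and a separate lookup table.

    The lookup table (and the key) play the role of the "additional
    information": without them, a pseudonym cannot be traced back to a person.
    """
    working_set = []
    lookup_table = {}
    for record in records:
        identifier = record[id_field]
        # Keyed hash: the same identifier always maps to the same pseudonym,
        # but the mapping cannot be recomputed without the secret key.
        digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
        pseudonym = digest.hexdigest()[:16]
        lookup_table[pseudonym] = identifier
        # Copy every field except the direct identifier.
        stripped = {k: v for k, v in record.items() if k != id_field}
        stripped["pseudonym"] = pseudonym
        working_set.append(stripped)
    return working_set, lookup_table


# Hypothetical sample data, for illustration only.
secret_key = secrets.token_bytes(32)
records = [
    {"email": "alice@example.com", "plan": "premium"},
    {"email": "bob@example.com", "plan": "free"},
]
working_set, lookup_table = pseudonymize(records, "email", secret_key)
```

In this sketch, `working_set` could be handed to an analytics team while `secret_key` and `lookup_table` stay under separate access controls. Whether that separation is enough in a given case remains a legal question that the code does not answer.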

By rendering data pseudonymous, controllers can benefit from new, relaxed standards under the GDPR. For instance, Article 6(4)(e) lists pseudonymization as a safeguard that weighs in favor of processing data for uses beyond the purpose for which the data was originally collected. Additionally, the GDPR envisions the possibility that pseudonymization will take on an important role in demonstrating compliance. Both Recital 78 and Article 25 list pseudonymization as a method to show compliance with requirements such as Privacy by Design. These benefits will make the pseudonymization of personal data an attractive opportunity to simultaneously achieve GDPR compliance and expand the uses of collected data.

Ultimately, the hallmark of both anonymization and pseudonymization is that the data should be nearly impossible to re-identify. This theory, however, has its practical and mathematical limits. As a well-known study shows, it’s possible to personally identify 87 percent of the U.S. population based on just three data points: five-digit ZIP code, gender, and date of birth. So, even though each of these data points on its own would be non-identifiable, storing them together makes it possible to uniquely identify an individual. This presents a major concern for data controllers that seek to anonymize or pseudonymize data.
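The effect of combining data points can be illustrated with a small Python sketch. It is not the methodology of the study mentioned above. It simply counts, for a made-up set of four records, what share of records is unique on a given combination of fields. The function name `unique_fraction` and the sample data are assumptions introduced for illustration.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Return the fraction of records whose combination of `fields` is unique."""
    if not records:
        return 0.0
    keys = [tuple(record[f] for f in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Hypothetical records with no names or other direct identifiers.
records = [
    {"zip": "02138", "gender": "F", "dob": "1961-07-15"},
    {"zip": "02138", "gender": "F", "dob": "1972-03-02"},
    {"zip": "02138", "gender": "M", "dob": "1961-07-15"},
    {"zip": "02139", "gender": "F", "dob": "1961-07-15"},
]

for fields in (["zip"], ["gender"], ["dob"], ["zip", "gender", "dob"]):
    print(fields, unique_fraction(records, fields))
```

In this toy data set, each field on its own singles out only one record in four, while the three fields together single out every record. A controller could run a check of this kind on its own data before release to see how much of it is exposed to singling out.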

The risk of re-identification

The effectiveness (and legality) of both anonymization and pseudonymization hinges on their ability to protect data subjects from re-identification. In Recital 26, the GDPR limits the ability of a data handler to benefit from pseudonymized data if re-identification techniques are “reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.” Whether pseudonymized data is “reasonably likely” to be re-identified is a question of fact that depends on a number of factors, such as the technique used to pseudonymize the data, where the additional identifiable data is stored in relation to the de-identified data, and the likelihood that non-identifiable data elements may be used together to identify an individual.
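One common way to reduce the chance that non-identifiable elements combine to single someone out is to generalize them, that is, to store coarser values. The Python sketch below is an illustration only: the choice to keep three ZIP digits and the birth year, the function names `generalize` and `smallest_group`, and the sample records are all assumptions, and nothing here implies that a particular group size meets the “reasonably likely” standard.

```python
from collections import Counter


def generalize(record):
    """Coarsen the record: keep 3 ZIP digits and the birth year only."""
    return {
        "zip": record["zip"][:3] + "**",
        "gender": record["gender"],
        "dob": record["dob"][:4],
    }


def smallest_group(records, fields):
    """Size of the smallest group of records sharing the same field values."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return min(counts.values())


# Hypothetical records, for illustration only.
records = [
    {"zip": "02138", "gender": "F", "dob": "1961-07-15"},
    {"zip": "02139", "gender": "F", "dob": "1961-01-30"},
    {"zip": "02141", "gender": "M", "dob": "1972-03-02"},
    {"zip": "02142", "gender": "M", "dob": "1972-11-20"},
]
fields = ["zip", "gender", "dob"]

before = smallest_group(records, fields)
after = smallest_group([generalize(r) for r in records], fields)
print(before, after)
```

Before generalization, every record in this toy set stands alone; afterward, each record shares its values with one other record. The trade-off is that coarser data is also less useful for analysis, which is part of why the factual question described above is hard to settle in advance.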

Unfortunately, the Article 29 Working Party has not yet released guidance on pseudonymization and what techniques may be appropriate to use. Additionally, because the GDPR does not go into effect until 2018, pseudonymization has not been the subject of enforcement actions by Data Protection Authorities. This puts data controllers who want to implement pseudonymization as an element of their GDPR compliance in a very difficult position.

Without knowing what it means for data to be “reasonably likely” to be re-identified, prospective adopters of pseudonymization put themselves at risk of being the target of an enforcement action for failing to properly de-identify personal data. This risk is further complicated by the fact that many controllers operate across many jurisdictions with different data standards. For instance, in the U.S., the Health Insurance Portability and Accountability Act has provided clear guidance for de-identifying data. Under its Safe Harbor method, HIPAA treats data as de-identified if 18 specific identifiers are removed. The removal of these same 18 elements, however, may not be enough to achieve anonymization or even pseudonymization in the EU.

The EU is not the only place where the benefits of using pseudonymization and anonymization may be outweighed by their risks. In the U.S., many company privacy notices express an intent to use anonymization and de-identification techniques on personal data in order to skirt U.S. privacy laws and make use of personal data for statistical and analytic purposes. In some cases, these de-identified data sets may even be published for public consumption. As explained above, though, de-identification techniques have significant drawbacks relating to a bad actor’s ability to re-identify underlying personal data.

As companies seek to standardize their data practices across jurisdictions, the use of pseudonymized data may become commonplace in the U.S. Given the complete lack of guidance for how to effectively pseudonymize data, companies may find themselves in the uncomfortable position of being the target of an FTC action seeking to enforce data safeguards. Depending on how the FTC chooses to treat claims in privacy policies, it may find that inadequate anonymization or pseudonymization techniques are grounds for a Section 5 enforcement action.

The GDPR’s introduction of pseudonymization and its greater emphasis on anonymization will provide opportunities for data controllers to use personal data in more innovative ways. As companies seek to become GDPR compliant, though, the lack of Article 29 Working Party guidance will act as an ongoing barrier to the adoption of pseudonymization techniques. Until companies receive guidance about when data is “reasonably likely” to be re-identified, early adopters of pseudonymization will face an uncertain regulatory environment.

Tags:

Program management, Law and regulation
