
The Privacy Advisor | Top 10 operational impacts of the GDPR: Part 8 – Pseudonymization


The General Data Protection Regulation (GDPR) is set to replace Data Protection Directive 95/46/EC effective May 25, 2018. The GDPR is directly applicable in each member state and will lead to a greater degree of data protection harmonization across EU nations.

Although many companies have already adopted privacy processes and procedures consistent with the Directive, the GDPR contains a number of new protections for EU data subjects and threatens significant fines and penalties for non-compliant data controllers and processors once it comes into force in the spring of 2018.

With new obligations on such matters as data subject consent, data anonymization, breach notification, trans-border data transfers, and appointment of data protection officers, to name a few, the GDPR requires companies handling EU citizens’ data to undertake major operational reform.

This is the eighth in a series of articles addressing the top 10 operational impacts of the GDPR.

GDPR encourages “pseudonymization” of personal data

The concept of personally identifying information lies at the core of the GDPR. Any “personal data,” which is defined as “information relating to an identified or identifiable natural person (‘data subject’),” falls within the scope of the Regulation. The Regulation does not apply, however, to data that “does not relate to an identified or identifiable natural person or to data rendered anonymous in such a way that the data subject is no longer identifiable.”

The GDPR introduces a new concept in European data protection law – “pseudonymization” – for a process rendering data neither anonymous nor directly identifying. Pseudonymization is the separation of data from direct identifiers so that linkage to an identity is not possible without additional information that is held separately. Pseudonymization, therefore, may significantly reduce the risks associated with data processing, while also maintaining the data’s utility. For this reason, the GDPR creates incentives for controllers to pseudonymize the data that they collect. Although pseudonymous data is not exempt from the Regulation altogether, the GDPR relaxes several requirements on controllers that use the technique.

What is pseudonymous data?

The GDPR defines pseudonymization as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.” To pseudonymize a data set, the “additional information” must be “kept separately and subject to technical and organizational measures to ensure non-attribution to an identified or identifiable person.” In sum, it is a privacy-enhancing technique where directly identifying data is held separately and securely from processed data to ensure non-attribution.
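The separation the Regulation describes can be sketched in code. This is a minimal illustration, not a compliance recipe: the records, field names, and storage layout are hypothetical, and in practice the key table would live in a separate, access-controlled system.

```python
import uuid

# Hypothetical records with direct identifiers (name, email) mixed in.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "diagnosis": "A12"},
    {"name": "Alan Turing", "email": "alan@example.com", "diagnosis": "B34"},
]

DIRECT_IDENTIFIERS = {"name", "email"}

def pseudonymize(records, direct_identifiers):
    """Split each record into a pseudonymized row and a separately held key entry."""
    pseudonymized = []  # may be processed for analysis
    key_table = {}      # the "additional information": held separately and secured
    for record in records:
        pseudonym = uuid.uuid4().hex
        key_table[pseudonym] = {k: record[k] for k in direct_identifiers}
        kept = {k: v for k, v in record.items() if k not in direct_identifiers}
        kept["pseudonym"] = pseudonym
        pseudonymized.append(kept)
    return pseudonymized, key_table

data, keys = pseudonymize(records, DIRECT_IDENTIFIERS)
# `data` now carries only the diagnosis and a random pseudonym;
# re-attributing a row to a person requires access to `keys`.
```

Deleting `keys` outright would move the scenario toward the Article 11 situation discussed below, where the controller can no longer identify data subjects at all.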

Although Recital 28 recognizes that pseudonymization “can reduce risks to the data subjects,” it is not alone a sufficient technique to exempt data from the scope of the Regulation. Indeed, Recital 26 states that “[p]ersonal data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person” (i.e., personal data). Thus, pseudonymization is “not intended to preclude any other measures of data protection” (Recital 28).

GDPR creates incentives for controllers to pseudonymize data

The Regulation recognizes the ability of pseudonymization to help protect the rights of individuals while also enabling data utility. Recital 29 emphasizes the GDPR’s aim “to create incentives to apply pseudonymization when processing personal data” and finds that “measures of pseudonymization should, whilst allowing general analysis, be possible” (emphasis added). These incentives appear in five separate sections of the Regulation.

  1. Pseudonymization may facilitate processing personal data beyond original collection purposes.

The GDPR requires controllers to collect data only for “specific, explicit and legitimate purposes.” Article 5 provides an exception to the purpose limitation principle, however, where data is further processed in a way that is “compatible” with the initial purposes for collection. Whether further processing is compatible depends on several factors outlined in Article 6(4), including the link between the processing activities, the context of the collection, the nature of the data, and the possible consequences for the data subject. An additional factor to consider is “the existence of appropriate safeguards, which may include encryption or pseudonymization” (Article 6(4)(e)). Thus, the GDPR allows controllers who pseudonymize personal data more leeway to process the data for a different purpose than the one for which they were collected.

  2. Pseudonymization is an important safeguard for processing personal data for scientific, historical and statistical purposes.

The GDPR also provides an exception to the purpose limitation principle for data processing for scientific, historical and statistical research. However, Article 89(1) requires controllers that process data for these purposes to implement “appropriate safeguards, in accordance with this Regulation, for the rights and freedoms of the data subject.” Specifically, controllers must adopt “technical and organizational measures” to adhere to the data minimization principle. The only example the Regulation provides is for controllers to use pseudonymization so that the processing “does not permit or no longer permits the identification of data subjects.”

  3. Pseudonymization is a central feature of “data protection by design.”

The GDPR for the first time introduces the concept of “data protection by design” into formal legislation. At the conceptual level, data protection by design means that privacy should be a feature of the development of a product, rather than something that is tacked on later. Thus, Article 25(1) requires controllers to implement appropriate safeguards “both at the time of the determination of the means for processing and at the time of the processing itself.” One way that controllers can do this is by pseudonymizing personal data.

  4. Controllers can use pseudonymization to help meet the GDPR’s data security requirements.

Under Article 32, controllers are required to implement risk-based measures for protecting data security. One such measure is the “pseudonymization and encryption of personal data” (Article 32(1)(a)). The use of pseudonymization potentially has profound implications under this provision. Controllers are required to notify a data protection authority any time there is a security incident that presents “a risk to the rights and freedoms of natural persons” (Article 33(1)). They must, moreover, notify the concerned individuals anytime that risk is “high” (Article 34(1)). Since pseudonymization reduces the risk of harm to data subjects, controllers that use it may be able to avoid notification of security incidents.

  5. Controllers do not need to provide data subjects with access, rectification, erasure or data portability if they can no longer identify a data subject.

A controller may employ methods of pseudonymization that prevent it from being able to reidentify a data subject. For example, if a controller deletes the directly identifying data rather than holding it separately, it may not be capable of reidentifying the data without collecting additional information. Article 11 acknowledges this situation and provides an exemption from the rights to access, rectification, erasure and data portability outlined in Articles 15 through 20. The exemption applies only if "the controller is able to demonstrate that it is not in a position to identify the data subject" and, if possible, it provides notice of these practices to data subjects. The GDPR does not require a controller to hold additional information "for the sole purpose of complying with this Regulation." If, however, a data subject provides the controller with additional information that allows her to be identified in the data set, she must be permitted to exercise her rights under Articles 15 through 20.

  6. The GDPR encourages controllers to adopt codes of conduct that promote pseudonymization.

The GDPR encourages controllers to adopt codes of conduct that are approved by the Member States, the supervisory authorities, the European Data Protection Board or the Commission. Among other provisions outlined in Article 40, these codes of conduct should promote the use of pseudonymization as a way to comply with the Regulation (Article 40(2)(d)). As will be explored in a later article in this series, using codes of conduct allows controllers and processors to demonstrate adherence to the principles of the Regulation, and they may even be used as a mechanism for transferring personal data to third countries.

Pseudonymous data is not anonymous

Much debate surrounds the extent to which pseudonymized data can be reidentified. This issue is of critical importance because it determines whether a processing operation will be subject to the provisions of the Regulation. The GDPR adopts a more flexible approach than the traditional binary of the Data Protection Directive, focusing on the risk that data will reveal identifiable individuals. Thus, the key distinction between pseudonymous data, which is regulated by the GDPR, and anonymous data, which is not, is whether the data can be reidentified with reasonable effort.

To illustrate the concept of reidentification risk, it is important to distinguish between direct and indirect identifiers. The International Organization for Standardization (ISO) defines direct identifiers as “data that can be used to identify a person without additional information or with cross-linking through other information that is in the public domain.” They are data points that correspond directly to a person’s identity, such as a name, social security number or contact information.

Indirect identifiers are data that do not identify an individual in isolation but may reveal individual identities if combined with additional data points. For example, one frequently cited study found that 87 percent of Americans can be uniquely identified by combining three indirect identifiers: date of birth, gender and ZIP code. In other words, while no individual can be singled out based on a date of birth alone, when it is combined with gender and ZIP code, the lens focuses on a specific identity.
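The combining effect of indirect identifiers can be made concrete with a small uniqueness check. The toy dataset below is hypothetical; the point is only that no single field singles anyone out, while the combination does.

```python
from collections import Counter

# Hypothetical records: each field repeats across people,
# but the triple (dob, gender, zip) is unique per record.
people = [
    {"dob": "1980-01-01", "gender": "F", "zip": "02139"},
    {"dob": "1980-01-01", "gender": "M", "zip": "02139"},
    {"dob": "1975-06-30", "gender": "F", "zip": "02139"},
    {"dob": "1980-01-01", "gender": "F", "zip": "94105"},
]

def uniqueness(records, fields):
    """Fraction of records uniquely identified by the given combination of fields."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    unique = sum(1 for r in records if counts[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)
```

Here `uniqueness(people, ["dob"])` is low, but `uniqueness(people, ["dob", "gender", "zip"])` is 1.0: every record is singled out once the quasi-identifiers are combined.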

Pseudonymization involves removing or obscuring direct identifiers and, in some cases, certain indirect identifiers that could combine to reveal a person’s identity. These data points are then held in a separate database that could be linked to the de-identified database through the use of a key, such as a random identification number or some other pseudonym.

As a result of this process, pseudonymized data, unlike anonymous data, faces the risk of reidentification in two ways. First, a data breach may permit an attacker to obtain the key or otherwise link the pseudonymized data set to individual identities. Alternatively, even if the key is not revealed, a malicious actor may be able to identify individuals by combining indirect identifiers in the pseudonymous database with other available information.

The GDPR addresses the first concern in Recital 75, which instructs controllers to implement appropriate safeguards to prevent the “unauthorized reversal of pseudonymization.” To mitigate the risk, controllers should have in place appropriate technical (e.g., encryption, hashing or tokenization) and organizational (e.g., agreements, policies, privacy by design) measures separating pseudonymous data from an identification key.
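One common technical measure of this kind is keyed hashing: deriving pseudonyms with an HMAC so that, unlike a plain hash of a name or email, linking a pseudonym back to an identity requires a secret key. The sketch below is illustrative only; the key value and its handling are placeholders, and in practice the key would sit in a separate key-management system, playing the role of the separately held "additional information."

```python
import hashlib
import hmac

# Hypothetical key: in practice, store and rotate this in a separate,
# access-controlled key-management system, apart from the pseudonymized data.
SECRET_KEY = b"held-separately-under-organizational-controls"

def pseudonym(direct_identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from a direct identifier using HMAC-SHA256."""
    return hmac.new(key, direct_identifier.encode("utf-8"), hashlib.sha256).hexdigest()

p1 = pseudonym("ada@example.com")
p2 = pseudonym("ada@example.com")
# Deterministic: the same identifier yields the same pseudonym under one key,
# so records about one person can still be linked for analysis; destroying or
# rotating the key severs that link.
```

The design trade-off against the random-ID approach is determinism: keyed hashing supports linkage across datasets without a lookup table, at the cost of a single secret whose compromise would enable the "unauthorized reversal of pseudonymization" that Recital 75 warns about.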

In Recital 26, the GDPR recognizes the second type of reidentification risk by considering whether a method of reidentification is “reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.” Such an analysis is necessarily contextual and “account should be taken of all the objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.”

The GDPR acknowledges that reidentification must be “reasonably likely”

Under the Directive, the Article 29 Working Party found that “pseudonymization is not a method of anonymization” because some risks of reidentification remained, even if those risks were very small. Thus, even when controllers deleted all identifying information and could not themselves reidentify a data set, the Working Party found that the data was still covered by the Directive if any third party could conceivably reidentify the data sometime in the future. A controller could escape regulation only by not collecting identifying information in the first place.

In contrast, by focusing on whether reidentification is “reasonably likely,” the GDPR may provide greater flexibility than the Directive. For example, where the controller deletes the identification key and the remaining indirect identifiers pose little risk of identifying an individual, the controller may be able to argue that there is no reasonable risk of reidentification. Recital 57 addresses this situation in relation to the data subject’s right to access personal data held by the controller. In cases where “the personal data processed by the controller do not permit the controller to identify a natural person, the data controller should not be obliged to acquire additional information in order to identify the data subject for the sole purposes of complying with any provision of this Regulation.” 

Conclusion

The GDPR introduces a novel concept into European data protection law – pseudonymization – as a means of protecting the rights of individuals while also allowing controllers to benefit from the data’s utility. Although pseudonymized data still falls within the scope of the Regulation, some provisions are relaxed to encourage controllers to use the technique. Thus, controllers that pseudonymize their data sets will have an easier time using personal data for secondary purposes and for scientific and historical research, as well as meeting the Regulation’s data security and data protection by design requirements.

Photo credit: Carnevale a venezia 2011 via photopin (license)

5 Comments


  • Justin Weiss • Feb 18, 2016
    This is a useful analysis. It seems like Article 10 is also implicated, insofar as the processes used to achieve pseudonymization may satisfy the requirement of removing identification, thereby relaxing certain requirements that are dependent on it - for example, articles 15-18.  Would you agree?
  • Jovan Stevovic • Feb 19, 2016
    Very useful article, thanks.  Imagine a scenario in which you have health records (disease, observation) and personal identifiable info (name, surname, zip). Of course when you keep all the data together you are dealing with sensitive health data and you must for example encrypt everything.
    What happens if we apply pseudonymization to this scenario and make health data look like this (disease, observation, unique-random-user-id) and personal info (unique-random-user-id, name, surname, zip) and we store them in two different physical locations where we for example can encrypt one of the two (e.g. personal info).
    
    Is the health data still sensitive or is it somehow "downgraded" to personal? In this way health records wouldn't need encryption, but just standard protection.
    
    This approach could have significant practical applications and, in my opinion, could highly facilitate security and data protection. However it's not confirmed by GDPR text or interpretations.
    
    Thanks
  • Gabriel Maldoff • Feb 19, 2016
    Justin, I believe your analysis is correct. I would add one thing for the sake of clarity: Article 10(2) exempts a controller from the Article 15-18 requirements (rights to access, rectification, erasure, restriction and data portability) only "in such cases the controller is able to demonstrate that it is not in a position to identify the data subject." In other words, pseudonymization alone probably will not exempt the controller from these requirements. Instead, this is the situation I described in the final section of the article -- where the controller pseudonymizes the data set such that it can no longer reidentify it with "reasonable effort." In those cases, where the data cannot be reidentified -- even by the controller -- the controller may not be required to grant the data subject rights outlined in Articles 15-18.
  • Gabriel Maldoff • Feb 19, 2016
    Jovan, glad you found it useful! The Reg makes clear that pseudonymous data is still treated as personal data (unless it cannot be reidentified). So, in the situation you describe, the data set is still personal data. It seems to me it would also still qualify as health data given that the data relates to health, regardless of whether it is pseudonymous. You're right that health data is a "special category of data," which is subject to some heightened requirements (especially around the consent required to process it). I haven't come across anything in the Reg that suggests pseudonymization would reduce special categories of data to regular old personal data. Rather, pseudonymization is presented as a useful tool for assuring adequate data security. Take a look at Article 9(2)(h) and Recital (42a). A controller can process health data, even though it's a special category, as long as it's "necessary" for the health-related purpose. I don't see any requirement to encrypt the data, but I do agree that both pseudonymization and encryption are useful practices for protecting such sensitive information. The Reg recognizes those benefits too, which is why it encourages the technique in the five ways I outlined above.
  • Terence Savage • Sep 16, 2017
    I'm confused by the concept of pseudonymization.  If I communicate with my database under the GDPR and use a third-party processor, I have to have new security agreements with that third party.  I'm unlikely to contract with a party I don't trust, so any benefit from pseudonymization is minimal.  If I process in-house, the risk is even lower.
    My question is, therefore, what is the value to my company or those individuals?  In fact, isn't it more of a handicap as it adds to the challenge of fulfilling an SAR?