The Privacy Advisor | Top 10 operational impacts of the GDPR: Part 8 - Pseudonymization

The General Data Protection Regulation (GDPR) is set to replace the Data Protection Directive 95/46/EC effective May 25, 2018. The GDPR is directly applicable in each member state and will lead to a greater degree of data protection harmonization across EU nations.

Although many companies have already adopted privacy processes and procedures consistent with the Directive, the GDPR contains a number of new protections for EU data subjects and threatens significant fines and penalties for non-compliant data controllers and processors once it comes into force in the spring of 2018.

With new obligations on such matters as data subject consent, data anonymization, breach notification, trans-border data transfers, and appointment of data protection officers, to name a few, the GDPR requires companies handling EU citizens’ data to undertake major operational reform.

This is the eighth in a series of articles addressing the top 10 operational impacts of the GDPR.

GDPR encourages “pseudonymization” of personal data

The concept of personally identifying information lies at the core of the GDPR. Any “personal data,” which is defined as “information relating to an identified or identifiable natural person ‘data subject’,” falls within the scope of the Regulation. The Regulation does not apply, however, to data that “does not relate to an identified or identifiable natural person or to data rendered anonymous in such a way that the data subject is no longer identifiable.”

The GDPR introduces a new concept in European data protection law – “pseudonymization” – for a process rendering data neither anonymous nor directly identifying. Pseudonymization is the separation of data from direct identifiers so that linkage to an identity is not possible without additional information that is held separately. Pseudonymization, therefore, may significantly reduce the risks associated with data processing, while also maintaining the data’s utility. For this reason, the GDPR creates incentives for controllers to pseudonymize the data that they collect. Although pseudonymous data is not exempt from the Regulation altogether, the GDPR relaxes several requirements on controllers that use the technique.

What is pseudonymous data?

The GDPR defines pseudonymization as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.” To pseudonymize a data set, the “additional information” must be “kept separately and subject to technical and organizational measures to ensure non-attribution to an identified or identifiable person.” In sum, it is a privacy-enhancing technique where directly identifying data is held separately and securely from processed data to ensure non-attribution.

Although Recital 28 recognizes that pseudonymization “can reduce risks to the data subjects,” it is not alone a sufficient technique to exempt data from the scope of the Regulation. Indeed, Recital 26 states that “[p]ersonal data which have undergone pseudonymization, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person” (i.e., personal data). Thus, pseudonymization is “not intended to preclude any other measures of data protection” (Recital 28).

GDPR creates incentives for controllers to pseudonymize data

The Regulation recognizes the ability of pseudonymization to help protect the rights of individuals while also enabling data utility. Recital 29 emphasizes the GDPR’s aim “to create incentives to apply pseudonymization when processing personal data” and finds that “measures of pseudonymization should, whilst allowing general analysis, be possible” (emphasis added). These incentives appear in six separate sections of the Regulation.

  1. Pseudonymization may facilitate processing personal data beyond original collection purposes.

The GDPR requires controllers to collect data only for “specific, explicit and legitimate purposes.” Article 5 provides an exception to the purpose limitation principle, however, where data is further processed in a way that is “compatible” with the initial purposes for collection. Whether further processing is compatible depends on several factors outlined in Article 6(4), including the link between the processing activities, the context of the collection, the nature of the data, and the possible consequences for the data subject. An additional factor to consider is “the existence of appropriate safeguards, which may include encryption or pseudonymization” (Article 6(4)(e)). Thus, the GDPR allows controllers who pseudonymize personal data more leeway to process the data for a purpose different from the one for which it was collected.

  2. Pseudonymization is an important safeguard for processing personal data for scientific, historical and statistical purposes.

The GDPR also provides an exception to the purpose limitation principle for data processing for scientific, historical and statistical research. However, Article 89(1) requires controllers that process data for these purposes to implement “appropriate safeguards, in accordance with this Regulation, for the rights and freedoms of the data subject.” Specifically, controllers must adopt “technical and organizational measures” to adhere to the data minimization principle. The only example the Regulation provides is for controllers to use pseudonymization so that the processing “does not permit or no longer permits the identification of data subjects.”

  3. Pseudonymization is a central feature of “data protection by design.”

The GDPR for the first time introduces the concept of “data protection by design” into formal legislation. At the conceptual level, data protection by design means that privacy should be a feature of the development of a product, rather than something that is tacked on later. Thus, Article 25(1) requires controllers to implement appropriate safeguards “both at the time of the determination of the means for processing and at the time of the processing itself.” One way that controllers can do this is by pseudonymizing personal data.

  4. Controllers can use pseudonymization to help meet the GDPR’s data security requirements.

Under Article 32, controllers are required to implement risk-based measures for protecting data security. One such measure is the “pseudonymization and encryption of personal data” (Article 32(1)(a)). The use of pseudonymization potentially has profound implications under this provision. Controllers are required to notify a data protection authority any time there is a security incident that presents “a risk to the rights and freedoms of natural persons” (Article 33(1)). They must, moreover, notify the concerned individuals any time that risk is “high” (Article 34(1)). Since pseudonymization reduces the risk of harm to data subjects, controllers that use it may be able to avoid notification of security incidents.

  5. Controllers do not need to provide data subjects with access, rectification, erasure or data portability if they can no longer identify a data subject.

A controller may employ methods of pseudonymization that prevent it from being able to re-identify a data subject. For example, if a controller deletes the directly identifying data rather than holding it separately, it may not be capable of re-identifying the data without collecting additional information. Article 11 acknowledges this situation and provides an exemption from the rights to access, rectification, erasure and data portability outlined in Articles 15 through 20. The exemption applies only if "the controller is able to demonstrate that it is not in a position to identify the data subject" and, if possible, it provides notice of these practices to data subjects. The GDPR does not require a controller to hold additional information "for the sole purpose of complying with this Regulation." If, however, a data subject provides the controller with additional information that allows her to be identified in the data set, she must be permitted to exercise her rights under Articles 15 through 20.

  6. The GDPR encourages controllers to adopt codes of conduct that promote pseudonymization.

The GDPR encourages controllers to adopt codes of conduct that are approved by the Member States, the supervisory authorities, the European Data Protection Board or the Commission. Among other provisions outlined in Article 40, these codes of conduct should promote the use of pseudonymization as a way to comply with the Regulation (Article 40(2)(d)). As will be explored in a later article in this series, using codes of conduct allows controllers and processors to demonstrate adherence to the principles of the Regulation, and they may even be used as a mechanism for transferring personal data to third countries.

Pseudonymous data is not anonymous

Much debate surrounds the extent to which pseudonymized data can be reidentified. This issue is of critical importance because it determines whether a processing operation will be subject to the provisions of the Regulation. The GDPR adopts a more flexible approach than the traditional binary of the Data Protection Directive, focusing on the risk that data will reveal identifiable individuals. Thus, the key distinction between pseudonymous data, which is regulated by the GDPR, and anonymous data, which is not, is whether the data can be reidentified with reasonable effort.

To illustrate the concept of reidentification risk, it is important to distinguish between direct and indirect identifiers. The International Organization for Standardization (ISO) defines direct identifiers as “data that can be used to identify a person without additional information or with cross-linking through other information that is in the public domain.” They are data points that correspond directly to a person’s identity, such as a name, social security number or contact information.

Indirect identifiers are data that do not identify an individual in isolation but may reveal individual identities if combined with additional data points. For example, one frequently-cited study found that 87 percent of Americans can be uniquely identified by combining three indirect identifiers: date of birth, gender and ZIP code. In other words, while no individual can be singled out based on just a date of birth, when combined with gender and ZIP code, the lens focuses on a specific identity.
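
The way indirect identifiers combine can be illustrated with a minimal sketch. The records, field names and the `unique_fraction` helper below are hypothetical and invented for illustration only; they are not drawn from the study cited above. The function measures what share of records becomes unique once several indirect identifiers are considered together.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Return the fraction of records whose combination of values
    for the given indirect identifiers appears exactly once."""
    combos = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(records)

# Hypothetical toy data: no direct identifiers are present at all.
people = [
    {"dob": "1980-01-01", "gender": "F", "zip": "02139"},
    {"dob": "1980-01-01", "gender": "M", "zip": "02139"},
    {"dob": "1980-01-01", "gender": "F", "zip": "10001"},
    {"dob": "1975-06-30", "gender": "F", "zip": "02139"},
    {"dob": "1975-06-30", "gender": "M", "zip": "02139"},
]

print(unique_fraction(people, ["dob"]))                   # 0.0
print(unique_fraction(people, ["dob", "gender", "zip"]))  # 1.0
```

In this toy data set nobody is unique on date of birth alone, yet every record is unique once gender and ZIP code are added, which is the effect the paragraph above describes.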

Pseudonymization involves removing or obscuring direct identifiers and, in some cases, certain indirect identifiers that could combine to reveal a person’s identity. These data points are then held in a separate database that could be linked to the de-identified database through the use of a key, such as a random identification number or some other pseudonym.
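
The separation described above can be sketched in a few lines. This is an illustration under stated assumptions, not a method prescribed by the GDPR: the field names, the `pseudonymize` and `reidentify` helpers, and the choice of a random hexadecimal token as the pseudonym are all hypothetical.

```python
import secrets

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = ("name", "email")

def pseudonymize(records):
    """Split records into a pseudonymized data set and a separate key table."""
    pseudonymized = []
    key_table = {}
    for record in records:
        # Random pseudonym; it is not derived from the data itself.
        token = secrets.token_hex(16)
        key_table[token] = {f: record[f] for f in DIRECT_IDENTIFIERS}
        stripped = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        stripped["pseudonym"] = token
        pseudonymized.append(stripped)
    return pseudonymized, key_table

def reidentify(row, key_table):
    """Re-link one pseudonymized row; this only works with the key table."""
    rest = {k: v for k, v in row.items() if k != "pseudonym"}
    return {**key_table[row["pseudonym"]], **rest}

records = [
    {"name": "Alice Example", "email": "alice@example.com", "zip": "02139", "diagnosis": "asthma"},
    {"name": "Bob Example", "email": "bob@example.com", "zip": "10001", "diagnosis": "diabetes"},
]
data, keys = pseudonymize(records)
```

Here `data` is the set that would be used for analysis, while `keys` stands in for the "additional information" that would need to be stored separately under its own access controls. Note that indirect identifiers such as the ZIP code remain in `data`, so the sketch addresses only direct identifiers.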

As a result of this process, pseudonymized data, unlike anonymous data, faces the risk of reidentification in two ways. First, a data breach may permit an attacker to obtain the key or otherwise link the pseudonymized data set to individual identities. Alternatively, even if the key is not revealed, a malicious actor may be able to identify individuals by combining indirect identifiers in the pseudonymous database with other available information.

The GDPR addresses the first concern in Recital 75, which instructs controllers to implement appropriate safeguards to prevent the “unauthorized reversal of pseudonymization.” To mitigate the risk, controllers should have in place appropriate technical (e.g., encryption, hashing or tokenization) and organizational (e.g., agreements, policies, privacy by design) measures separating pseudonymous data from an identification key.
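
As one possible illustration of the technical measures mentioned above, the sketch below derives pseudonyms with a keyed hash (HMAC-SHA256) from Python's standard library. The GDPR does not mandate this or any other algorithm; the `make_pseudonym` helper and the sample identifier are hypothetical. A keyed hash is used here rather than a plain hash because an unkeyed hash of a guessable value, such as an email address, can be reversed by hashing candidate values and comparing the results.

```python
import hashlib
import hmac
import secrets

def make_pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a secret key."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Assumption: in practice the key would be stored apart from the
# pseudonymized data, with its own access controls.
secret_key = secrets.token_bytes(32)

p1 = make_pseudonym("alice@example.com", secret_key)
p2 = make_pseudonym("alice@example.com", secret_key)
other = make_pseudonym("alice@example.com", secrets.token_bytes(32))

print(p1 == p2)     # True: same key and identifier give the same pseudonym
print(p1 == other)  # False: a different key gives an unrelated pseudonym
```

Under this design the secret key plays the role of the identification key: whoever holds it can recompute the pseudonym for a known identifier, so keeping it separate from the data is what the organizational measures would need to protect.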

In Recital 26, the GDPR recognizes the second type of reidentification risk by considering whether a method of reidentification is “reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.” Such an analysis is necessarily contextual and “account should be taken of all the objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.”

The GDPR acknowledges that reidentification must be “reasonably likely”

Under the Directive, the Article 29 Working Party found that “pseudonymization is not a method of anonymization” because some risks of reidentification remained, even if those risks were very small. Thus, even when controllers deleted all identifying information and could not themselves reidentify a data set, the Working Party found that the data was still covered by the Directive if any third party could conceivably reidentify the data sometime in the future. A controller could escape regulation only by not collecting identifying information in the first place.

In contrast, by focusing on whether reidentification is “reasonably likely,” the GDPR may provide greater flexibility than the Directive. For example, where the controller deletes the identification key and the remaining indirect identifiers pose little risk of identifying an individual, the controller may be able to argue that there is no reasonable risk of reidentification. Recital 57 addresses this situation in relation to the data subject’s right to access personal data held by the controller. In cases where “the personal data processed by the controller do not permit the controller to identify a natural person, the data controller should not be obliged to acquire additional information in order to identify the data subject for the sole purposes of complying with any provision of this Regulation.” 

Conclusion

The GDPR introduces a novel concept into European data protection law: pseudonymization as a means of protecting the rights of individuals while also allowing controllers to benefit from the data’s utility. Although pseudonymized data still falls within the scope of the Regulation, some provisions are relaxed to encourage controllers to use the technique. Thus, controllers that pseudonymize their data sets will have an easier time using personal data for secondary purposes and for scientific and historical research, as well as meeting the Regulation’s data security and data protection by design requirements.

Where to find the rules

Looking to dive deeper into the General Data Protection Regulation to read the text regarding pseudonymization for yourself? Find the full text of the Regulation here in our Resource Center.

You’ll want to focus on these portions:

Recitals

(26)   The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.

(28) The application of pseudonymisation to personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations. The explicit introduction of ‘pseudonymisation’ in this Regulation is not intended to preclude any other measures of data protection.

(29) In order to create incentives to apply pseudonymisation when processing personal data, measures of pseudonymisation should, whilst allowing general analysis, be possible within the same controller when that controller has taken technical and organisational measures necessary to ensure, for the processing concerned, that this Regulation is implemented, and that additional information for attributing the personal data to a specific data subject is kept separately. The controller processing the personal data should indicate the authorised persons within the same controller.

(57) If the personal data processed by a controller do not permit the controller to identify a natural person, the data controller should not be obliged to acquire additional information in order to identify the data subject for the sole purpose of complying with any provision of this Regulation. However, the controller should not refuse to take additional information provided by the data subject in order to support the exercise of his or her rights. Identification should include the digital identification of a data subject, for example through authentication mechanism such as the same credentials, used by the data subject to log-in to the on-line service offered by the data controller.

(74) The responsibility and liability of the controller for any processing of personal data carried out by the controller or on the controller’s behalf should be established. In particular, the controller should be obliged to implement appropriate and effective measures and be able to demonstrate the compliance of processing activities with this Regulation, including the effectiveness of the measures. Those measures should take into account the nature, scope, context and purposes of the processing and the risk to the rights and freedoms of natural persons.

(75) The risk to the rights and freedoms of natural persons, of varying likelihood and severity, may result from personal data processing which could lead to physical, material or non-material damage, in particular: where the processing may give rise to discrimination, identity theft or fraud, financial loss, damage to the reputation, loss of confidentiality of personal data protected by professional secrecy, unauthorised reversal of pseudonymisation, or any other significant economic or social disadvantage; where data subjects might be deprived of their rights and freedoms or prevented from exercising control over their personal data; where personal data are processed which reveal racial or ethnic origin, political opinions, religion or philosophical beliefs, trade-union membership, and the processing of genetic data, data concerning health or data concerning sex life or criminal convictions and offences or related security measures; where personal aspects are evaluated, in particular analysing or predicting aspects concerning performance at work, economic situation, health, personal preferences or interests, reliability or behaviour, location or movements, in order to create or use personal profiles; where personal data of vulnerable natural persons, in particular of children, are processed; or where processing involves a large amount of personal data and affects a large number of data subjects.

(78) The protection of the rights and freedoms of natural persons with regard to the processing of personal data require that appropriate technical and organisational measures be taken to ensure that the requirements of this Regulation are met. In order to be able to demonstrate compliance with this Regulation, the controller should adopt internal policies and implement measures which meet in particular the principles of data protection by design and data protection by default. Such measures could consist, inter alia, of minimising the processing of personal data, pseudonymising personal data as soon as possible, transparency with regard to the functions and processing of personal data, enabling the data subject to monitor the data processing, enabling the controller to create and improve security features. When developing, designing, selecting and using applications, services and products that are based on the processing of personal data or process personal data to fulfil their task, producers of the products, services and applications should be encouraged to take into account the right to data protection when developing and designing such products, services and applications and, with due regard to the state of the art, to make sure that controllers and processors are able to fulfil their data protection obligations. The principles of data protection by design and by default should also be taken into consideration in the context of public tenders.

(85) A personal data breach may, if not addressed in an appropriate and timely manner, result in physical, material or non-material damage to natural persons such as loss of control over their personal data or limitation of their rights, discrimination, identity theft or fraud, financial loss, unauthorised reversal of pseudonymisation, damage to reputation, loss of confidentiality of personal data protected by professional secrecy or any other significant economic or social disadvantage to the natural person concerned. Therefore, as soon as the controller becomes aware that a personal data breach has occurred, the controller should notify the personal data breach to the supervisory authority without undue delay and, where feasible, not later than 72 hours after having become aware of it, unless the controller is able to demonstrate, in accordance with the accountability principle, that the personal data breach is unlikely to result in a risk to the rights and freedoms of natural persons. Where such notification cannot be achieved within 72 hours, the reasons for the delay should accompany the notification and information may be provided in phases without undue further delay.

(156) The processing of personal data for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes should be subject to appropriate safeguards for the rights and freedoms of the data subject pursuant to this Regulation. Those safeguards should ensure that technical and organisational measures are in place in order to ensure, in particular, the principle of data minimisation. The further processing of personal data for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes is to be carried out when the controller has assessed the feasibility to fulfil those purposes by processing data which do not permit or no longer permit the identification of data subjects, provided that appropriate safeguards exist (such as, for instance, pseudonymisation of the data). Member States should provide for appropriate safeguards for the processing of personal data for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes. Member States should be authorised to provide, under specific conditions and subject to appropriate safeguards for data subjects, specifications and derogations with regard to the information requirements and rights to rectification, to erasure, to be forgotten, to restriction of processing, to data portability, and to object when processing personal data for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes. The conditions and safeguards in question may entail specific procedures for data subjects to exercise those rights if this is appropriate in the light of the purposes sought by the specific processing along with technical and organisational measures aimed at minimising the processing of personal data in pursuance of the proportionality and necessity principles. 
The processing of personal data for scientific purposes should also comply with other relevant legislation such as on clinical trials.

Articles

Article 4: Definitions

(1) personal data

(5) pseudonymization

Article 5: Principles relating to processing of personal data

Article 6: Lawfulness of processing

Article 11: Processing which does not require identification

Article 25: Data protection by design and by default

Article 32: Security of processing

Article 40: Codes of conduct

Article 89: Safeguards and derogations relating to processing for archiving purposes in the public interest, scientific or historical purposes or statistical purposes

5 Comments

  • Justin Weiss • Feb 18, 2016
    This is a useful analysis. It seems like Article 10 is also implicated, insofar as the processes used to achieve pseudonymization may satisfy the requirement of removing identification, thereby relaxing certain requirements that are dependent on it - for example, articles 15-18.  Would you agree?
  • Jovan Stevovic • Feb 19, 2016
    Very useful article, thanks.  Imagine a scenario in which you have health records (disease, observation) and personal identifiable info (name, surname, zip). Of course when you keep all the data together you are dealing with sensitive health data and you must for example encrypt everything.
    What happens if we apply pseudonymization to this scenario and make health data look like this (disease, observation, unique-random-user-id) and personal info (unique-random-user-id, name, surname, zip) and we store them in two different physical locations where we for example can encrypt one of the two (e.g. personal info).
    
    Is the health data still sensitive or is it somehow "downgraded" to personal? In this way health records wouldn't need encryption, but just standard protection.
    
    This approach could have significant practical applications and, in my opinion, could highly facilitate security and data protection. However it's not confirmed by GDPR text or interpretations.
    
    Thanks
  • Gabriel Maldoff • Feb 19, 2016
    Justin, I believe your analysis is correct. I would add one thing for the sake of clarity: Article 10(2) exempts a controller from the Article 15-18 requirements (rights to access, rectification, erasure, restriction and data portability) only "in such cases the controller is able to demonstrate that it is not in a position to identify the data subject." In other words, pseudonymization alone probably will not exempt the controller from these requirements. Instead, this is the situation I described in the final section of the article -- where the controller pseudonymizes the data set such that it can no longer reidentify it with "reasonable effort." In those cases, where the data cannot be reidentified -- even by the controller -- the controller may not be required to grant the data subject rights outlined in Articles 15-18.
  • Gabriel Maldoff • Feb 19, 2016
    Jovan, glad you found it useful! The Reg makes clear that pseudonymous data is still treated as personal data (unless it cannot be reidentified). So, in the situation you describe, the data set is still personal data. It seems to me it would also still qualify as health data given that the data relates to health, regardless of whether it is pseudonymous. You're right that health data is a "special category of data," which is subject to some heightened requirements (especially around the consent required to process it). I haven't come across anything in the Reg that suggests pseudonymization would reduce special categories of data to regular old personal data. Rather, pseudonymization is presented as a useful tool for assuring adequate data security. Take a look at Article 9(2)(h) and Recital (42a). A controller can process health data, even though it's a special category, as long as it's "necessary" for the health-related purpose. I don't see any requirement to encrypt the data, but I do agree that both pseudonymization and encryption are useful practices for protecting such sensitive information. The Reg recognizes those benefits too, which is why it encourages the technique in the five ways I outlined above.
  • Terence Savage • Sep 16, 2017
    I'm confused by the concept of pseudonymization.  If I communicate with my database under the GDPR and use a third-party processor I have to have new security agreements with that third party.  I'm unlikely to contract with a party I don't trust and any benefit from pseudonymization is minimal.  If I process in-house the risk is even lower.
    My question is, therefore, what is the value to my company or those individuals?  In fact, isn't it more of a handicap as it adds to the challenge of fulfilling an SAR?