The Privacy Advisor | Does anonymization or de-identification require consent under the GDPR?

Data de-identification has many benefits in the context of the EU General Data Protection Regulation. One of the recurring questions is whether consent is required to anonymize or de-identify data. In this article, we make the case that no consent is required for anonymization or other forms of de-identification.

For the purposes of this discussion, we use “de-identification” as a general term that includes the full spectrum of methods, from simple pseudonymization to full anonymization. 
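
As a rough illustration of the two ends of that spectrum, the Python sketch below contrasts a keyed pseudonymization step with a generalization step that drops the direct identifier and coarsens the remaining fields. The record fields, the key, and the function names are hypothetical and chosen only for illustration. Whether the generalized output would actually qualify as anonymized depends on the re-identification risk across the whole dataset, which this sketch does not assess.

```python
import hashlib
import hmac

# Hypothetical key, for illustration only. In practice it would be generated
# securely and held apart from the people working with the output.
SECRET_KEY = b"example-key-held-separately"


def pseudonymize(record, key):
    """Replace the direct identifier with a keyed token (HMAC-SHA256).

    Anyone holding the key can recompute the token for a known name,
    so the output can still be linked back to the person.
    """
    token = hmac.new(key, record["name"].encode("utf-8"), hashlib.sha256).hexdigest()
    out = {k: v for k, v in record.items() if k != "name"}
    out["subject_token"] = token
    return out


def generalize(record):
    """Drop the direct identifier and coarsen the quasi-identifiers."""
    decade = (record["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",
        "postcode_area": record["postcode"][:2],
        "diagnosis": record["diagnosis"],
    }


# Hypothetical example record.
record = {"name": "Alice Example", "age": 34, "postcode": "75011", "diagnosis": "asthma"}
pseudo = pseudonymize(record, SECRET_KEY)
coarse = generalize(record)
```

The pseudonymized record keeps every field at full precision and remains attributable to the individual by anyone who holds the key. The generalized record carries no token at all and trades precision for a lower risk of singling someone out.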

Article 4(2) of the GDPR defines processing to mean any operation performed on personal data, including "adaptation or alteration." Any form of de-identification will invariably involve some form of adaptation or alteration of the data.

The GDPR requires there to be a legal basis to process personal data. The most well-known basis is the explicit consent of the data subject. However, under the GDPR, obtaining explicit consent can be difficult; in some scenarios, such as research, big data analytics and machine learning, obtaining explicit consent may be impractical or impossible. Furthermore, there is compelling evidence that obtaining consent can result in bias, which, in certain circumstances, can affect the outcome of the analysis. Introducing bias into data would not be in the interest of any of the stakeholders.

In Opinion 05/2014 of the Article 29 Working Party on Anonymisation Techniques, the Working Party stated:

“The Working Party considers that anonymisation as an instance of further processing of personal data can be considered to be compatible with the original purposes of the processing but only on condition the anonymisation process is such as to reliably produce anonymised information in the sense described in this paper.”

In other words, the processing of personal data in order to fully anonymize it is “compatible with the purpose for which the personal data are initially collected” and therefore does not require an additional legal basis, such as consent, specifically for the act of anonymizing.  

As used in the Article 29 Working Party opinion, the term “anonymization” reflects the highest or strongest level of de-identification. But lesser forms of de-identification, such as pseudonymization, are recognized in the GDPR as privacy-protective measures that reduce risk to the data subject. And to the extent an additional legal basis may be needed for the data-processing activity of de-identifying personal data, companies have an extremely strong case for relying on a legal basis other than consent, in particular “legitimate interests.”

After all, the legitimate interests basis involves a balancing test: the legitimate interests of the controller or a third party are weighed against “the interests or fundamental rights and freedoms of the data subject which require protection of personal data.” De-identifying data is in the interests of both the data controller (because it reduces the risk of handling identified data) and the data subject (because it’s a means of protecting the data subject’s fundamental rights and freedoms). Thus, in nearly every conceivable case, data controllers should be able to de-identify data based on legitimate interests.

This conclusion will inevitably lead to better data protection practices. As a practical matter, if there were a requirement to obtain consent from individuals to anonymize or de-identify data, that would discourage the use of these data-protective measures, which would increase risks for data subjects and controllers.

Once personal data has been fully anonymized, it is no longer personal data, and subsequent uses of the data are no longer regulated by the GDPR. Once personal data is de-identified to a level that falls short of full anonymization, subsequent uses of the de-identified data still must be compatible with the original purpose and may require an additional legal basis. But on both those counts, the de-identification helps support the secondary use of the data.

Irrespective of the de-identification method used, it is good practice to inform data subjects that their data will be de-identified and may be processed for additional purposes. The mechanism to inform the data subjects will depend on the circumstances (e.g., physical poster versus online).

7 Comments

  • David Turton • Jan 30, 2019
    Unfortunately bundling anonymized data in the same discussion as pseudonymization suggests that these concepts are equal under the law. 
    Pseudonymized data is still classified as personal data, therefore the rights and responsibilities remain for that data.
  • Colm Callanan • Feb 15, 2019
    In my opinion, pseudonymizing data does not seem to be a great data protection measure in light of AI and algorithm processing.  These technologies are sufficiently complex to re-identify de-identified personal information in a second processing scenario.  The other elephant in the room, I suspect, is that most profit-making algorithm processing will find anonymous data less useful for its purposes.  Their legitimate interest reason and data retention period are the policy areas to focus on.
  • Valentin Conrad • Feb 15, 2019
    The fact that anonymisation for further processing of personal data may be considered to be compatible with the original purposes of the processing is one thing. It means that you can rely on the former legal justification. In this situation, the concerned data subjects should have been informed of a likely anonymization of their data, unless exceptions apply. Furthermore, in the field of clinical trials or human research, the anonymization process is strictly regulated.
  • Jean Pierre Mistral • Feb 18, 2019
    The problem you will be facing is the usage of the de-identified data: is it compatible with the original collection purpose of the data? For instance, if you collect ID documents for an anti-fraud service, are you then allowed to hash the ID data to develop a data lake that will improve your system and increase the adoption of your services? For sure, you have to inform the end users of such additional usage, and in my opinion express consent will be required. Then the ultimate question: who will collect the consent, the entity facing the end users (e.g., the bank) or the service provider?
  • Luca Isnardi • Feb 19, 2019
    I wonder if considering "de-identification" a form of processing with a specific purpose is the right perspective. To me, for example, "de-identification" is more a security measure, like encryption.
  • Brian Martin • Feb 25, 2019
    An interesting and helpful article, thank you.  I'd just like to add that for pseudonymization to effectively protect data subjects' rights, it is good practice to ensure through governance that the pseudonymization key is not, and cannot be, held by the team/s using the pseudonymized data sets.
  • Jussi Leppälä • Feb 28, 2019
    Great article, and I agree with the general conclusion.  However, using the expressions "simple pseudonymization" and "full anonymization" in the same sentence may not do full justice to pseudonymization as it is defined in the GDPR: "‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information..."  Pseudonymization therefore creates two datasets: a pseudonymized dataset and the "additional information".  If the latter no longer exists, the first can "no longer be attributed to a specific data subject" - it is essentially anonymous. In this sense, pseudonymization algorithms are a special class of anonymization algorithms.  Normal encryption, where the data and the key are separated, would fulfill the pseudonymization definition.  However, from other parts of the Regulation, it becomes clear that a pseudonymized dataset is expected to have some utility on its own without the additional information.