
Privacy Perspectives | Op-ed: Encrypted data may still be personal under GDPR



In Josh Gresham’s recent piece for the IAPP, he opined that encrypted data should not be considered personal data under the EU General Data Protection Regulation. Encryption of data cannot, however, be determinative of whether that information is personal. As Josh correctly discusses, the GDPR provides existing factors for controllers to make that determination.

First and foremost, a controller must decide whether the data relates to an “identified or identifiable natural person.” Putting aside the question of identified or identifiable, because this is where encryption may mask or eliminate identifiability, data at its core must relate to a natural person. A part number for an appliance does not relate to a person and is therefore not personal data. A part number in the inventory of Joe the plumber’s van out to make a repair relates to Joe and, therefore, is personal data.

In this case, the data at issue (the part number) is distinct from the data that identifies the person to whom the information relates: Joe. The more interesting case, of course, revolves around the data that identifies Joe: his name. Recital 26 provides the GDPR’s factors for determining identifiability: “To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.”

Under that same recital, anonymized data is no longer personal data because it has been rendered unidentifiable. Pseudonymized data, however, should be considered identifiable where it could be attributable to a natural person.

Returning to Joe the plumber, that moniker could be Joe’s real name. It could also be anonymous, either by virtue of the company only hiring people named Joe and employing 50 plumbers, or by light-heartedly referring to all their plumbers as Joe. It could also be a pseudonym, where Joe is actually Mikhail the plumber, but everyone calls him Joe.

Now we turn to the issue of encryption.

In the foregoing example, if we were to encrypt the part number, the data still relates to Joe, and he is still identifiable (assuming Joe is not an anonymizing moniker as described above). In this instance, the encrypted data is still personal data because it relates to an identified person. Once again, the more interesting question revolves around encrypting the identifiable portion of the information we have: “Joe.” The problem with blanket statements about whether encrypted data is personal data is that they ignore the concept of threat actors and the threats they pose.

Consider a software cloud-based service and three different threat actors. (Note: I’m talking software-as-a-service provider, not platform as a service or infrastructure as a service.) Assume that the data is encrypted at three levels: disk-level encryption, database-level encryption and field-level encryption. For a thief running off with the hard drive, disk-level encryption will no doubt foil their efforts. They cannot distinguish any data on an individualized identifiable basis without the key.

However, such encryption would not cover an administrator of the server who has access to the system. For that, we turn to database-level encryption. Again, they can see a bunch of encrypted files on the system, but they cannot distinguish individuals within that data.

Finally, let's turn to the cloud service itself. Assuming the customer holds the encryption key, the software cloud-based service can see the data structure it has developed, but it cannot see the field-level data. So not personal data for them, correct?

Unfortunately, it probably is.

Even if we assume every field is encrypted, and the customer holds the key, the structure can still reveal information. While they may not be able to decrypt 8ff32489f92f33416694be8fdc2d4c22 as Joe, they can still distinguish Joe from Mikhail, who is represented by 37c09709af3da468f7a1bc723e943ec7 in the database. This still meets the test of identifiability, namely the ability to single out an individual from among a group.
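The singling-out point can be illustrated with a minimal sketch. It assumes the name field is protected deterministically, so the same name always yields the same opaque value; the article does not say how its example values were produced, so the keyed HMAC below is only a stand-in for deterministic field-level encryption, and the key, field names and part counts are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical key held only by the customer; the provider never sees it.
CUSTOMER_KEY = b"customer-held-secret"


def deterministic_token(name: str) -> str:
    """Stand-in for deterministic field-level encryption (an assumption):
    the same input always produces the same opaque value."""
    digest = hmac.new(CUSTOMER_KEY, name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


# Customer side: the name field is protected before upload.
uploads = [
    {"plumber": deterministic_token("Joe"), "parts_used": 4},
    {"plumber": deterministic_token("Mikhail"), "parts_used": 2},
    {"plumber": deterministic_token("Joe"), "parts_used": 7},
]


def build_profiles(records):
    """Provider side: without the key, records still group by opaque value."""
    profiles = defaultdict(list)
    for record in records:
        profiles[record["plumber"]].append(record["parts_used"])
    return dict(profiles)


profiles = build_profiles(uploads)
for token, parts in profiles.items():
    # The provider cannot read the name, but it can follow one individual.
    print(token[:3], "records:", len(parts), "total parts:", sum(parts))
```

The provider never learns the names, yet it ends up with one profile per plumber, which is the ability to single out an individual that the paragraph above describes.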

The software cloud-based service provider could still commit a host of privacy violations using this knowledge. The provider could aggregate data from various systems or over time create a dossier of user 8ff. It could try to use any unencrypted data (geolocation, etcetera) to reidentify user 8ff’s legal identity. It could sell 8ff’s activities (average parts per day, average customer visits, etcetera) to competitors. It could use 8ff’s data to prevent access to services (“You have too many parts in your inventory, you’re not allowed to check out additional parts from the warehouse.” See Article 22). It could delete or alter data to make it look like 8ff was using more or fewer parts than it actually was. Use of homomorphic encryption could result in even more fine-grained manipulation of data without actually seeing the data.

Encryption only affects the confidentiality of information. It doesn’t, necessarily, affect the other security elements of integrity or availability, nor does it automatically render data non-identifiable. One must examine the implementation and the full landscape of threats to make that determination.

photo credit: Security via photopin (license)



  • comment Richard Santalesa • Mar 12, 2019
    I'm sorry guys, but this is pure madness. If data is sufficiently encrypted, meaning that the contents of the file cannot be read or viewed, then it simply CANNOT be personal data. Count me out of this world view...
  • comment Terry Chapman • Mar 12, 2019
    There is nothing inherently bad about the collection and storage of personal data provided it is required for the transaction / relationship. When looking at this question you have to look at the bigger picture. Provided that the data is needed, and is purged when not needed. The value of encryption comes if there is a loss of data. In this case we have to evaluate if the data is "likely to result in a risk to the rights and freedoms of affected individuals". If the data is properly encrypted the risk of this is likely to be greatly reduced. If the encrypted part information cannot be tied to something that impacts the person's rights and freedoms there is little issue. BUT, you do have to take into account the larger context. For instance if a cancer center were to have a breach of patient data, the fact that the diagnosis is encrypted would not reduce the impact of the loss as long as the encrypted data could be tied to an identifiable person because you could infer that there is a high probability that "joe" visited the center related to a medical issue associated with cancer.
  • comment Jay Exum • Mar 12, 2019
    I do not follow how we get from "Even if we assume every field is encrypted, and the customer holds the key, the structure can still reveal information" to "It could try to use any unencrypted data (geolocation, etcetera) to reidentify user 8ff’s legal identity."  If you have unencrypted data of this sort, then every field is NOT encrypted, by definition.
  • comment Angus Chan • Mar 12, 2019
    If this scenario is saying "I encrypted all data and therefore I do not collect or possess personal information", then I agree the statement is false.  Encrypting data does not mean you are not collecting or holding personal information.  It only means that in the event of a data breach, you have safeguards in place to make it unlikely that personal information can be abused/determined and you *may* not need to notify individuals.
    In your day to day operations, you still need to have procedural and technological controls that govern when that data can be decrypted and used.  Even if you only permit certain elements to be decrypted, you can still use the encrypted information to infer some personal attributes - but that is for internal operations and data governance to handle those situations.
    The article could be a bit clearer in the scenario as it assumes that the service provider is able to decrypt some information such as inventory and IP address.
  • comment Michael Masterson • Mar 12, 2019
    It seems that different readers found different suggestions by Josh Gresham in his article. 
    I read a suggestion that the EDPB could, under certain conditions, treat encrypted data like anonymized data. A lawyer friend thought he was suggesting that even now controllers and processors might be allowed to treat encrypted data like anonymized data. 
    Gresham was clear. He said the EDPB could specify certain encryption control algorithms and key control standards that controllers and processors alike could use to make personal data effectively inaccessible to unauthorized parties, and they'd then waive some or all of the statutory data protection burdens. He also made a reasonable case, I thought, that doing so is worth considering.
  • comment Gary LaFever • Mar 12, 2019
    Encryption does not protect personal data in use because when decrypted the data is exposed and vulnerable to misuse. Similarly, Differential Privacy, Static Tokenisation and data masking do not protect personal data from unauthorized re-identification when data sets are combined and used for multiple use purposes via the "Mosaic Effect." In contrast, Pseudonymization has gained attention with its explicit codification in the GDPR. Legal experts have highlighted the potential for new Pseudonymization technologies to address the unique privacy issues raised for legal possession and processing of personal data.
    Article 4(5) of the GDPR now specifically defines Pseudonymization as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”
    Static Tokenisation (where a common token is used to replace different occurrences of the same value – e.g., replacing all occurrences “James Smith” with “ABCD”) fails to satisfy GDPR definitional requirements since unauthorized re-identification is “trivial between records using the same pseudonymised attribute to refer to the same individual.”
    As a result, Static Tokenisation does not satisfy the “Balancing of Interest” test necessary to satisfy Article 6(1)(f) requirements for Legitimate Purpose processing nor is it included in the technical safeguards listed in Article 6(4) to help ensure that secondary processing like Analytics, AI & ML is a lawful compatible purpose. The Article 29 Working Party has highlighted “the special role that safeguards play in reducing the undue impact on the data subjects thereby changing the balance of rights and interests to the extent that the data controller’s legitimate interests will not be overridden” and “safeguards may include technical and organizational measures to ensure functional separation” and “Pseudonymization…will play a role with regard to the evaluation of the potential impact of the processing on the data subject, and thus, may in some cases play a role in tipping the balance in favour of the controller.”
    The Article 29 Working Party further highlights that “functional separation includes secure key-coding personal data transferred outside of an organization and prohibiting outsiders from re-identifying data subject” by using “rotating salts” or “randomly allocated” dynamic versus static, persistent or recurring tokens.
    GDPR-compliant Pseudonymization represents a unique means to help support the actual use of data in the form of lawful secondary processing like Analytics, AI & ML by technically enforcing functional separation protection.
  • comment Ken Mortensen • Mar 12, 2019
    "The medieval doctors of divinity who did not pretend to settle how many angels could dance on the point of a needle cut a very poor figure as far as romantic credulity is concerned beside the modern physicists who have settled to the billionth of a millimetre every movement and position in the dance of the electrons."
    George Bernard Shaw's opening in his play, Saint Joan
    I feel as though the GDPR has caused us to enter an existential crisis as privacy professionals and we forget the practical upshots of what we are doing and begin to count the angels.
  • comment Michael Timms • Mar 13, 2019
    I think Jason is correct in his understanding, to my understanding as good as we’re going to get in this world. The classification of the data is separate and different from the controls protecting it.
    As the GDPR stands encrypted data is clearly personal data, protected by an organisational and technical measure. Inference is possible, as is theft of key material, advances in cryptanalysis, social engineering, stealing other datasets to use as cribs.
    I do think supervisory authorities will look on well-implemented cryptography as both a good mitigation and a good harm prevention measure, so do hold faith with pretty good privacy technology, just be cognizant of the fact that it’s not anonymization. It is really clear: unless the law changes, if you lost encrypted personal data you’d need the supervisory authority to consider how your use of encryption reduced harm. This will probably require that your data protection program has enough accountability in it that this is demonstrable.
    The EDPB has a tricky path to tread in terms of providing decent protection from some extremely capable entities and not shutting down the digital economy of the EU. It would be great for them to agree on something like NIST SP 800-175B as an example use, and I think that the IAPP's privacy engineering forum is a decent place to have IAPP members consider this.
    Just as a disclaimer I work for a security company making amongst other things encryption software. I’d be happy if encryption did provide a magic wand, unfortunately even the best controls can be circumvented.
  • comment Dustin Berger • Mar 13, 2019
    Ordinarily, a few random salt bits added to the original data prior to encryption (these bits would be discarded after decryption) would resolve the problem of a given person's ID being identifiable. So, although I agree that in the hypothetical presented here there is some risk of identification, a simple technical change would seem to eliminate this possibility (as long as the encryption algorithm remains uncompromised).
  • comment Jussi Leppälä • Mar 13, 2019
    Thank you, Jason, for the analysis.  While it is an interesting and relevant question whether encrypted data could reveal some personal information without using the key, I do not believe that it is necessarily decisive when determining whether that data is personal data or not.  The definition of personal data does not appear to take a position on who can identify a person, "directly or indirectly".  It is enough that somebody has the capability to do it.  Therefore, it may not be correct to say that some data would "not be personal data for them" but that it would be personal data for somebody else.  If it is personal data, it is personal data for everybody even if there were parties who would not be able to do the identification themselves.  Other parts of the Regulation recognize situations where a controller processes personal data but does not hold information to "identify data subject" (Article 11).
  • comment David Draycott • Mar 13, 2019
    I think the simple question has to be can you process the data in relation to an individual? If yes then you would have to consider it personal data. The CJEU found an IP address (Case C-582/14 Patrick Breyer v Germany) can be considered personal data in some circumstances; even if the person cannot be identified their behaviour can be tracked. I would therefore assume that if "8ff's" behaviour/activities etc. can be tracked, the data would be personal. I'm not certain the WP29 ever suggested encrypted data would not be personal unlike truly anonymised data.....and we may even trip over that at some point......the lawyers will have a field day......
  • comment Mark Chick • Mar 13, 2019
    Fascinating articles. I'm left seeing both perspectives. Taking the perspective that encrypted data is not personal data, a scenario that occurs to me is one where I am a data centre, storing personal data on behalf of my customer (controller). I take comfort that I am not a processor subject to GDPR for the service I provide to my customer because the data I store is encrypted and I do not have access to the keys. (1) Commercially, how would my customer react to me being cheaper although possibly less secure, not being bound by GDPR? By less secure, I'm referring to the availability of my customer's data. Let's ignore compliance vs. risk for the purposes of this example. (2) Also, consider if my client's keys become compromised and the encryption is no longer able to be relied upon. Should I be worried that I am now processing personal data under GDPR? Although I think that encrypted data is personal data, a compliant approach would make sense to me even if I thought otherwise.
  • comment David Draycott • Mar 15, 2019
    If you are storing the data, you at least have the ability to delete it, and you would fall in scope as a processor.
  • comment Jeroen Terstegge • Mar 15, 2019
    You're making it far too difficult. Recital 26 GDPR is the answer here. A person is identifiable from the data if, under the circumstances, it is reasonably likely that the controller (or a third person) has the means to identify the data subject. So, if data is encrypted and the controller has the key, the encrypted data are personal data (and the hosting provider is a processor). If a thief runs off with the encrypted file, it constitutes a data breach (although it is up for debate whether that breach must be reported). However, if the thief didn't steal the key too, the stolen data are normally NOT personal data for him. Theoretical decryption is not the decisive factor here, it's contextual.
  • comment Jeroen Terstegge • Mar 15, 2019
    There is also a history to the part of recital 26 about anonymous data you quote in your article. At some point in the negotiations a definition of anonymous data was floated, similar to what Germany had in its pre-GDPR Data Protection Act. However, it was rightfully determined that such a definition made no sense; if data don’t meet the definition of personal data in art. 4.1, the data is not personal data and the GDPR does not apply. Inserting a definition of anonymous data next to a definition of personal data would mean that there could be an unintended grey zone in between. So the definition was deleted. But, as often happened to such deletions from the body of the GDPR, the topic was moved to the recitals (which under EU law aren’t law) and it was clarified in the second part of recital 26 that anonymized data are not personal data, which, of course, given the first part of recital 26, completely depends on the method of anonymization and the circumstances of the case. So, the GDPR does not require an interpretation of the term ‘anonymous data’; only an interpretation of the term ‘personal data’.