In Josh Gresham’s recent piece for the IAPP, he opined that encrypted data should not be considered personal data under the EU General Data Protection Regulation. Encryption alone cannot, however, determine whether information is personal. As Josh correctly discusses, the GDPR provides existing factors for controllers to make that determination.

First and foremost, a controller must decide whether the data relates to an “identified or identifiable natural person.” Setting aside for the moment the question of identified or identifiable (that is where encryption may mask or eliminate identifiability), data at its core must relate to a natural person. A part number for an appliance does not relate to a person and is therefore not personal data. A part number in the inventory of Joe the plumber’s van out to make a repair relates to Joe and, therefore, is personal data.

In this case, the data at issue (the part number) is distinct from the data that identifies the person to whom the information relates: Joe. The more interesting case, of course, revolves around that data that identifies Joe: his name. Recital 26 provides the GDPR’s factors for determining identifiability: “To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.”

Under that same recital, anonymized data is no longer personal data because it has been rendered unidentifiable. Pseudonymized data, however, should be considered identifiable where it could be attributed to a natural person.

Returning to Joe the plumber, that moniker could be Joe’s real name. It could also be anonymous, either because the company employs 50 plumbers and only hires people named Joe, or because it light-heartedly refers to all of its plumbers as Joe. It could also be a pseudonym, where Joe is actually Mikhail the plumber, but everyone calls him Joe.
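The distinction between pseudonymization and anonymization can be made concrete with a minimal sketch. The lookup table and record below are hypothetical; the point is that pseudonymized data stays attributable to a person as long as the mapping exists, while discarding the mapping moves the data toward anonymization.

```python
# Hypothetical sketch: a pseudonym table held by the controller.
pseudonyms = {"Mikhail": "Joe"}  # real name -> moniker used in records

record = {"worker": pseudonyms["Mikhail"], "part": "P-1138"}

# While the table exists, "Joe" is attributable to Mikhail,
# so the record remains personal data under Recital 26.
real_name = next(k for k, v in pseudonyms.items() if v == record["worker"])
assert real_name == "Mikhail"

# Destroying the table (pseudonyms.clear()) would remove the means of
# attribution -- a step toward, though not a guarantee of, anonymization.
```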

Now we turn to the issue of encryption.

In the foregoing example, if we were to encrypt the part number, the data still relates to Joe, and he is still identifiable (assuming Joe is not an anonymizing moniker as described above). In this instance, the encrypted data is still personal data because it relates to an identified person. Once again, the more interesting question revolves around encrypting the identifiable portion of the information we have: “Joe.” The problem with blanket statements about whether encrypted data is personal data is that they ignore the concept of threat actors and the threats they pose.

Consider a cloud-based software service and three different threat actors. (Note: I’m talking about a software-as-a-service provider, not platform as a service or infrastructure as a service.) Assume the data is encrypted at three levels: disk-level encryption, database-level encryption and field-level encryption. For a thief running off with the hard drive, disk-level encryption will no doubt foil their efforts: without the key, they cannot distinguish any data on an individualized, identifiable basis.

Such encryption would not, however, stop an administrator with access to the running system. For that, we turn to database-level encryption. Again, the administrator can see a collection of encrypted files on the system, but they cannot distinguish individuals within that data.

Finally, let's turn to the cloud service itself. Assuming the customer holds the encryption key, the service provider can see the data structure it has developed, but it cannot see the field-level data. So, not personal data for the provider, correct?

Unfortunately, it probably is.

Even if we assume every field is encrypted and the customer holds the key, the structure can still reveal information. While the provider may not be able to decrypt 8ff32489f92f33416694be8fdc2d4c22 as Joe, it can still distinguish Joe from Mikhail, who is represented by 37c09709af3da468f7a1bc723e943ec7 in the database. This still meets the test of identifiability: the ability to single out an individual from among a group.
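This singling-out property follows from any deterministic transformation: the same input always produces the same output, so records can be linked without ever recovering the name. The sketch below uses an MD5 digest purely as a stand-in for deterministic field-level encryption; the specific tokens are illustrative, not necessarily the values quoted above.

```python
import hashlib

def tokenize(name: str) -> str:
    # Deterministic: the same name always yields the same token.
    # (MD5 here is only a stand-in for any deterministic scheme.)
    return hashlib.md5(name.encode()).hexdigest()

records = [tokenize("Joe"), tokenize("Mikhail"), tokenize("Joe")]

# The provider cannot read the names, but it can still tell one
# individual's records apart from another's -- i.e., single them out.
assert records[0] == records[2]  # same person, linkable
assert records[0] != records[1]  # different people, distinguishable
```

Randomized encryption (a fresh ciphertext for each occurrence of “Joe”) would defeat this linkage, but it would also break the equality lookups most databases rely on, which is why deterministic schemes remain common in practice.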

The service provider could still commit a host of privacy violations using this knowledge. It could aggregate data from various systems, or over time, to create a dossier of user 8ff. It could try to use any unencrypted data (geolocation, etc.) to reidentify user 8ff’s legal identity. It could sell 8ff’s activities (average parts per day, average customer visits, etc.) to competitors. It could use 8ff’s data to deny access to services (“You have too many parts in your inventory; you’re not allowed to check out additional parts from the warehouse.” See Article 22). It could delete or alter data to make it look like 8ff was using more or fewer parts than he actually was. Homomorphic encryption could enable even more fine-grained manipulation of data without ever seeing the data.
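The dossier-building risk in particular needs nothing beyond the opaque tokens themselves. A minimal sketch, using a hypothetical event log keyed only by token:

```python
from collections import Counter

# Hypothetical event log: the provider sees only opaque tokens,
# never the decrypted names behind them.
events = [
    ("8ff", "checkout"),
    ("37c", "checkout"),
    ("8ff", "return"),
    ("8ff", "checkout"),
]

# Profiling token "8ff" requires no decryption at all:
dossier = Counter(user for user, _ in events)
assert dossier["8ff"] == 3
assert dossier["37c"] == 1
```

Every behavioral inference here (activity volume, patterns over time) attaches to a singled-out individual, which is exactly why the tokens remain personal data for the provider.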

Encryption only affects the confidentiality of information. It does not necessarily affect the other security elements of integrity or availability, nor does it automatically render data non-identifiable. One must examine the implementation and the full landscape of threats to make that determination.
