
Privacy Perspectives | Is encrypted data personal data under the GDPR?


""

As businesses across the world have begun adjusting to life under the EU General Data Protection Regulation, an important question continues to crop up: Should encrypted data be treated as personal data? The answer to this question has significant ramifications for the modern e-commerce world.

At its most basic, encryption is a way of protecting the privacy of your data, and the practice spans much of human history: Cryptography dates back to the ancient Greeks, who used simple ciphers to encode messages.

Conceptually, modern encryption is not too complicated an idea, although the technology behind it is complex. It works like this: The party doing the encryption takes data and uses an “encryption key” to encode it so that it appears unintelligible. The recipient uses the corresponding key to make it readable again. The encryption key itself is not an algorithm but a unique string of secret data that the encryption algorithm consumes, and without the key, the data cannot be accessed. As long as the algorithm is sound and the key is kept secret, the encrypted data is safe.
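To make the mechanics concrete, here is a minimal sketch of symmetric encryption in Python using the widely used cryptography package; the email address and variable names are illustrative only, not taken from any particular system:

    # Minimal symmetric-encryption sketch (pip install cryptography).
    # Illustrative only; the email address is a made-up example value.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()        # the key is random secret data, not an algorithm
    cipher = Fernet(key)

    ciphertext = cipher.encrypt(b"jane.doe@example.com")
    print(ciphertext)                  # unintelligible without the key

    plaintext = cipher.decrypt(ciphertext)
    print(plaintext)                   # b'jane.doe@example.com' -- readable again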

The GDPR generally follows a binary approach to data in that it’s either personal or it’s not. If data is considered to be personal data, the full weight of the GDPR’s regulatory regime applies to any entity processing that information.

Whether data is considered personal depends on whether it relates to a person who “can be identified, directly or indirectly[.]” The crucial question then becomes: How much effort would a potential data controller have to expend to identify a person so that the data would be considered personal?

The more logical answer is that there is some sort of “reasonableness” limitation, and that appears to be the generally accepted view. This approach lines up with passages within the GDPR itself, such as Recital 26.

Where this “reasonableness” line is drawn has huge implications for encryption and, by extension, the future of online commerce. If, under the GDPR, encrypted data is regarded as personal data, thus subjecting any business that processes the data to regulation and potential liability, it will hamper the growth of the digital economy.

Today, the question of how encrypted data would be viewed under the GDPR is an open one.

The GDPR is clearly in favor of encryption. For example, Article 34, Section 3(a), frees data controllers from having to notify affected individuals about a personal data breach if the controller has implemented protection measures, “in particular those that render the personal data unintelligible to any person who is not authorised to access it, such as encryption.”

However, the issue of how encrypted data should be treated gets knotty due to the GDPR’s inclusion of two other concepts: anonymization and pseudonymization.

In short, “anonymized” data is that which has been irreversibly stripped of any way of identifying the underlying individual, even by the organization that did the anonymizing. It is like locking the data up and throwing away the key.

Pseudonymization, on the other hand, involves “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.” While it does provide additional safeguards, pseudonymized data, unlike anonymized data, is still unequivocally considered personal data under the GDPR, as noted in Recital 26.

Commentators have often concluded that because encryption is more conceptually similar to pseudonymization, encrypted data would also be considered personal data under the GDPR.

The problem with that approach is that encryption is neither pseudonymization nor anonymization. The fundamental difference is: Who holds the metaphorical key?

In anonymization, no one does. The key is gone. In pseudonymization, the same party who pseudonymizes the data usually does. However, with encryption, many of the parties who are processing the data, such as cloud storage providers, do not have the encryption key to unscramble that data. The encryption key stays with the generator or the end user of the data.
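That division of roles can be sketched in code. The following is a hedged illustration, again in Python; upload_to_cloud is a hypothetical placeholder standing in for any storage provider’s API, not a real call:

    # Client-side encryption: the provider stores ciphertext but never holds the key.
    from cryptography.fernet import Fernet

    def upload_to_cloud(blob: bytes) -> None:
        """Hypothetical stand-in for a cloud storage API; receives only ciphertext."""
        ...

    key = Fernet.generate_key()        # remains with the data's generator or end user
    cipher = Fernet(key)

    record = b"name=Jane Doe; dob=1970-01-01"
    upload_to_cloud(cipher.encrypt(record))
    # Without the key, neither the provider nor anyone breaching its servers
    # can recover the underlying personal data.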

This difference is crucial. For all intents and purposes, a cloud provider who hosts encrypted data is not processing personal data. The cloud provider cannot access that data, and even if its servers were breached, data subjects would be at little risk from a privacy standpoint, since the data would also be unintelligible to the wrongdoers. The GDPR itself recognizes this concept, although in a slightly different context, since it frees organizations from having to notify data subjects of a breach if the data was encrypted. It therefore also makes sense that a cloud provider should not have to comply with all the GDPR’s requirements if it isn’t really processing personal data.

This argument rests, of course, on the assumption that the encryption algorithm is well designed and that the encryption is properly carried out. One possible solution for balancing data protection against the growth of the digital economy would be for the European Data Protection Board to maintain an up-to-date list of proven encryption technologies. Any personal data encrypted with a technology vetted by the EDPB would be considered “not personal” for any party that did not have the encryption key.

While this is one possible solution, there are certainly others that could be explored. However, one thing is certain: As more of everyday life and business shifts to the digital realm, encryption will continue to grow in importance. If encrypted data is regarded as personal data under the GDPR, thus subjecting any business that processes the data to regulation and potential liability, it will hamper both the growth of the digital economy and the motivation for companies to encrypt their data. Let’s hope that the supervisory authorities and the European Data Protection Board see the light and officially conclude that, when processed by parties that do not have access to the encryption key, encrypted data should not be considered personal data under the GDPR.

photo credit: marcoverch Nahaufnahme von alten Schlüsseln via photopin (license)

8 Comments


  • Gary LaFever • Mar 7, 2019
    Josh, thank you for the great summary of issues related to encryption as a means to protect data when not in use. The issues change, however, when a data controller wants to actually make use of the data. As soon as the data is decrypted, it is then indisputably personal data and (as decrypted data) will not be protected against misuse.
    
    In contrast, pseudonymization has gained attention recently with its explicit codification in the GDPR. Legal experts have highlighted the potential for new pseudonymization technologies to address the unique privacy issues raised by legal possession and processing of personal data. [https://www.lexology.com/library/detail.aspx?g=c0f2f119-57be-42b6-baea-7329bb0d330e]. Article 4(5) of the GDPR now specifically defines pseudonymization as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.” Static tokenisation (where a common token is used to replace different occurrences of the same value – e.g., replacing all occurrences of “James Smith” with “ABCD”) fails to satisfy the GDPR’s definitional requirements since unauthorized re-identification is “trivial between records using the same pseudonymised attribute to refer to the same individual.” [https://www.pdpjournals.com/docs/88197.pdf] As a result, static tokenisation does not satisfy the “Balancing of Interest” test necessary to satisfy Article 6(1)(f) requirements for Legitimate Interest processing, nor is it included in the technical safeguards listed in Article 6(4) to help ensure that secondary processing like Analytics, AI & ML is a lawful compatible purpose.
    
    The Article 29 Working Party has highlighted “the special role that safeguards play in reducing the undue impact on the data subjects thereby changing the balance of rights and interests to the extent that the data controller’s legitimate interests will not be overridden” and “safeguards may include technical and organizational measures to ensure functional separation” and “Pseudonymization…will play a role with regard to the evaluation of the potential impact of the processing on the data subject, and thus, may in some cases play a role in tipping the balance in favour of the controller.” [https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp217_en.pdf] The Article 29 Working Party further highlights that “functional separation includes secure key-coding personal data transferred outside of an organization and prohibiting outsiders from re-identifying data subject” by using “rotating salts” or “randomly allocated” dynamic versus static, persistent or recurring tokens. [https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2013/wp203_en.pdf]
    
    GDPR compliant Pseudonymization, therefore, represents a unique means to help support the actual use of data in the form of lawful secondary processing like Analytics, AI & ML by technically enforcing functional separation protection.
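    A minimal Python sketch of that linkability difference between static and dynamic tokens (the token values and names are invented for illustration):

        # Static tokenisation: one common token per value, reused everywhere,
        # so records stay trivially linkable even without the lookup table.
        import secrets

        table = {}

        def static_token(value: str) -> str:
            if value not in table:
                table[value] = secrets.token_hex(4)
            return table[value]

        print(static_token("James Smith") == static_token("James Smith"))   # True: linkable

        def dynamic_token(value: str) -> str:
            # A fresh, randomly allocated token per occurrence; re-identification
            # then requires the separately kept mapping, as Art. 4(5) envisages.
            return secrets.token_hex(4)

        print(dynamic_token("James Smith") == dynamic_token("James Smith"))  # False: not linkable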
  • Jussi Leppälä • Mar 8, 2019
    Thank you Josh for the analysis.  You write that "...encryption is neither pseudonymization nor anonymization."  However, by reading the definition of pseudonymization in Article 4(5) we see that encryption generally fulfills that definition.  The encrypted data can "no longer be attributed to a specific data subject without the use of additional information".  Using such "additional information" - encryption key - that data can again "be attributed to a specific individual".  From other parts of the Regulation, it becomes clear that a pseudonymized dataset is expected to have some utility on its own without the additional information.  Even so, the Article 4 definition embraces a wide set of transformations.  Applying homomorphic encryption on an attribute level would satisfy the definition and allow some computations on the “pseudonymized” dataset.
  • Wojciech Trelak • Mar 8, 2019
    Recommendations concerning encryption from the EDPB would be a great thing. For the time being, however, those who wish to rely on some recommendations may refer to the NIST Cybersecurity Framework.
  • Jay Exum • Mar 8, 2019
    @Josh Gresham:  I agree with the logic of this view completely.  But I would make a stronger version of the argument.  I would not concede that pseudonymous data is "unequivocally" personal data based upon Recital 26.  In order to be personal data, it is not enough that the identifiability test be met.  The "related to" test must ALSO be met.  And as outlined in ICO guidance (https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/what-is-personal-data/what-happens-when-different-organisations-process-the-same-data-for-different-purposes/),  that question depends on who the processor is.
    
    If you are a holder of data in an encrypted format and do not have meaningful access to the encryption key, it is difficult to see how the data can pass the "related to" test in your hands, because you neither know nor care anything about the contents of the data.  As such, even if we assume that the data is personal data with respect to the keyholder, it may not be with respect to the party serving as a repository without access to the data.
  • Peter Tiarks • Mar 8, 2019
    Hi Josh, it's an interesting argument, but I think it's the wrong approach.
    
    I don't understand that distinction between pseudonymisation and encryption that you're making here:
    
    "The fundamental difference is: Who holds the metaphorical key?
    
    "... In pseudonymization, the same party who pseudonymizes the data usually does. However, with encryption, many of the parties who are processing the data, such as cloud storage providers, do not have the encryption key to unscramble that data. The encryption key stays with the generator or the end user of the data."
    
    As Jussi Leppälä pointed out, the idea that keys should be kept securely is fundamental to the GDPR definition of pseudonymisation. This was certainly the view in the 2014 WP29 guidance on anonymisation techniques, which stated that "encryption with a secret key" was a form of pseudonymisation, and it seems as though the GDPR was drafted in line with that view.
    
    I'm also sceptical of how much of a burden the GDPR imposes on companies processing encrypted data. Article 11 already provides incentives for controllers to encrypt by limiting the applicability of the regulation. There's certainly a valid concern for processors who only process encrypted data, but taking encrypted data entirely out of the scope of GDPR seems like an extreme solution. A better way to address the problem would be guidance, and perhaps Standard Contractual Clauses, from the EDPB setting out how Article 28 obligations apply to encrypted data where the processor does not have a key.
  • Michiel Benda • Mar 9, 2019
    Thanks for provoking an argument on the topic, Josh. In my view, however, we are trying to make encryption needlessly complex. Encryption is an access-control mechanism, and as such is a technical measure to protect data. It doesn't change the type of data, it just requires you to have a key to the data in order to access it. In that sense, a user-ID/password construction is conceptually no different from encryption: if someone does not have credentials to access the data, they won't be able to access it; if they don't have the key to decrypt, they won't be able to access it.
    I am fully aware that cloud providers would love to say that they don't process personal data because it is encrypted. The fact is, though, that the cloud providers in the vast majority of cases are the keyholders. There are solutions in which the controller can control the keys, but that is a service that is not actively sold by the cloud providers (although many have a solution available if the client insists). In my view, where the cloud provider has the key to the encryption, they are processing personal data.
    I don't think that, because of the arguments above, the digital economy is going to be hampered by the GDPR, though. It just means that control of the keys should be placed where it belongs - with the controller. If the controller wishes to outsource this to a third party (i.e., the cloud provider), then that third party becomes a processor.
    Where I do see a difference between basic access credentials and encryption is in the risk of data breach. This has nothing to do with the concepts themselves (although you can argue that encryption allows for data portability, contrary to credentials), but with the effort it takes to unlock access to the data. Good encryption that is maintained at an industry appropriate complexity level greatly reduces the risk that data can be accessed. This is where I believe GDPR could be greatly clarified (and a first attempt was made by the WP29 and incorporated by the EDPB): What is risk? Is there a risk when data is protected with modern encryption? While formally there is always a residual risk, I think that, under GDPR, it could be argued to be no risk, provided the keys are controlled.
    Encrypted data remains personal data, but the risk to the fundamental rights and freedoms of natural persons is negligible and should be considered no risk under GDPR.
  • Marcus Mueller • Mar 11, 2019
    Josh, thank you for your interesting thoughts - but please don't mind if I want to challenge them: I'm afraid your logic doesn't follow the logic of the GDPR.
    Key is your following statement: "For all intents and purposes, a cloud provider who hosts encrypted data is not processing personal data. The cloud provider cannot access that data, and even if its servers were breached, data subjects would be at little risk from a privacy standpoint since the data would also be unintelligible to the wrongdoers. The GDPR itself recognizes this concept, although in a slightly different context ...".
    I think it's rather the opposite: according to the GDPR, the hosting - or any other kind of processing - of encrypted personal data is still processing within the meaning of Art. 4 (2) GDPR.
    1. The term "encryption" is mentioned in recital #83, furthermore in Art. 6 (4) (e), Art. 32 (1) (a) and - as already mentioned by you - Art. 34 (3) (a). In all these cases it's about encrypting personal data, i.e., as defined in Art. 4 (1). Therefore any kind of processing of such data is necessarily processing within the meaning of Art. 4 (2), including its "storage".
    2. The more so as the hosting/storing of personal data, be it encrypted or not, does not require any kind of understanding of such data. It's not necessary that the cloud provider has insight into the content or meaning of encrypted data. Such insight is simply not a requirement under Art. 4 regarding "personal data" (par. 1), "processing" (par. 2), "controller" (par. 7) or "processor" (par. 8).
    3. And of course the hosting cloud provider, be it a processor (presumably in most cases) or (even) a controller, certainly has "access" to the data it hosts, be it encrypted or not, given that it deals with it directly.
    4. In spite of the encryption, the full range of the principles in Art. 5 applies to encrypted data, including "integrity and confidentiality" (par. 1 (f)). This concept is not new; please note the references to encryption in the "Opinion 05/2012 on Cloud Computing" (WP 196) of the Art. 29 Working Party from July 2012, in particular Sec. 3.4.3.3 about "Confidentiality" and footnote 27: "In the same line, the technical data fragmentation processes that may be used in the framework of the provision of CC services will not lead to irreversible anonymisation and thus does not imply that data protection obligations do not apply."
    5. Which means that the GDPR acknowledges the value of an encryption of personal data. But apart from that, it remains personal data, and to host it is still a processing, each in the meaning of the GDPR.
  • Jean Pierre Mistral • Mar 11, 2019
    Josh, thank you for your analysis. This is an extract of an internal memo on security techniques and the GDPR:
    Pseudonymisation aims at protecting personal data by hiding the identity of individuals in a dataset, by replacing one or more personal data identifiers with pseudonyms and appropriately protecting the link between the pseudonyms and the initial identifiers. An identifier is a specific piece of information, holding a privileged and close relationship with an individual, which allows for the identification, direct or indirect, of this individual (e.g., name, email address, picture of an individual, MAC address, IP address etc.). What is important to understand is that pseudonymisation in fact separates the original dataset into two parts, where each part has a meaning with regard to specific individuals only in combination with the other.
    Here are some examples of pseudonymisation techniques:
    •	Hashing without key: A cryptographic hash function h is a function with specific properties that transforms any input message m of arbitrary length into a fixed-size output h(m) (e.g., of size 256 bits, that is, 32 bytes), called a hash value or message digest.
    •	Hashing with key: A more robust approach generates pseudonyms using keyed hash functions – i.e., hash functions whose output depends not only on the input but on a secret key too; in cryptography, such primitives are called message authentication codes. The controller shall keep the secret key securely stored and separate from other data, as it constitutes the additional information, i.e., it provides the means for associating the individuals – i.e., the original identifiers – with the derived pseudonyms.
    •	Hashing with key and salt: The input to the hash function is augmented with auxiliary random data called a “salt”.
    •	Encryption, symmetric or asymmetric (i.e., using public keys): Aims at ensuring, via proper use of mathematical techniques, that the encrypted dataset is unintelligible to anyone but authorized users who are allowed to reverse/decrypt it. To this end, encryption is a main instrument for achieving confidentiality of personal data by hiding the whole dataset and making it unintelligible to any unauthorized party (as long as state-of-the-art algorithms and key lengths are used and the encryption key is appropriately protected).
    •	Tokenisation: The process by which data subjects’ identifiers are replaced with randomly generated values, known as tokens, that have no mathematical relationship with the original identifiers. Hence, knowledge of a token is of no use to any third party other than the controller or processor.
    •	Other well-known techniques, such as masking, scrambling and blurring, all of which mainly focus on pseudonymising data at rest (i.e., data stored in a file or database).
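    As a rough illustration of the hashing variants above, here is a Python sketch using only the standard library; the identifier is an invented example value:

        # Sketch of unkeyed, keyed, and salted hashing as pseudonym generators.
        import hashlib, hmac, os

        identifier = b"jane.doe@example.com"   # invented example identifier

        # Hashing without key: anyone who can guess the input can recompute the digest.
        digest = hashlib.sha256(identifier).hexdigest()

        # Hashing with key (a message authentication code): recomputing the pseudonym
        # requires the secret key, which the controller stores separately.
        secret_key = os.urandom(32)
        pseudonym = hmac.new(secret_key, identifier, hashlib.sha256).hexdigest()

        # Hashing with key and salt: auxiliary random data is mixed into the input.
        salt = os.urandom(16)
        salted = hmac.new(secret_key, salt + identifier, hashlib.sha256).hexdigest()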