Those of you who follow the IAPP’s Privacy List may have recently seen a fascinating exchange among privacy experts opining on whether employee IDs and employee names should be classified as personally identifiable information (PII).
As I followed the exchange, it became clear that I was seeing a near-perfect illustration of an enormous problem at the heart of our profession. Simply stated, as privacy professionals, we generally believe our jobs revolve around maintaining controls for the appropriate use and disclosure of either PII or personal data, but we can’t agree on what those terms mean.
Making matters even more complicated, there seems to be little agreement on the basis of what the terms set out to define. Though the terms “personal data”, “personal information” and “personally identifiable information” are often used interchangeably, it’s apparent they could easily be read to speak to fundamentally different things.
This definitional problem is leading to monumental uncertainty at the core of our profession.
On one hand, both personal data and personal information suggest data about an individual, without necessarily requiring that the data actually single that person out. Indeed, one well-known regulation defines personal information as simply “information about an identifiable individual” without requiring that the data play a role in identifying the individual. This definition makes some sense since an identity isn’t really something that can be protected.
Isn’t it the information about the identity that we seek to protect, not simply that the identity exists?
That said, absent more guidance about the linkability criteria that must be met before something counts as personal data, we don’t have much of a workable definition. Confusing this further is the general tendency to categorize more sensitive data as personal data even where it is less linkable, or contributes less to singling an individual out, than another data point that may not be considered personal. While “IP Address 18.104.22.168 viewed a page on Alzheimer’s symptoms” may be potentially more sensitive than “LeBron James is 6 ft 8 inches tall”, shouldn’t we be clear that the latter is more likely “personal” data than the former, even if the former would be more sensitive were it to be identified?
On the other hand, singling out and referencing individuals seems to be what’s implied by the competing term, personally identifiable information. In this bucket, I put the identifiers themselves: things like names, Social Security numbers, driver’s license numbers, credit card numbers, frequent flyer numbers or customer IDs. This approach starts from the notion that the identifier is what needs to be protected, but common definitions often leave open the hard-but-crucial questions of what context or discreteness the identifier must have, or what needs to be associated with the identifier, before we should protect it.
My favorite example, where all these problems manifest in one place, is the question of whether an IP address should be considered PII. In practical terms, an IPv4 address is really just the dot-decimal representation of an integer between 0 and 4,294,967,295.
But are we really considering classifying the number 1,656,735,000 as PII? What if I told you it was my IP address (1,656,735,000 equals 98.191.197.24)? Does it matter if I express it in hex as 62BFC518, in binary as 01100010101111111100010100011000, or in base 7 as 56,025,005,124? Does it change the classification if I told you that “my” IP address was actually shared by 2,000+ people in my company going to the web through the same proxy server? If we don’t mean to turn the first 4 billion integers into PII, or to offer changing the numerical base as a means of de-identification, where does this leave us?
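For readers who want to check the arithmetic, here is a minimal Python sketch of the point above: one 32-bit integer rendered in several notations, each of which converts back to the same value. It relies only on the standard library's `ipaddress` module; the helper `to_base` is a hypothetical name introduced here for illustration and is not part of any library.

```python
import ipaddress


def to_base(n: int, base: int) -> str:
    """Render a non-negative integer as a digit string in a base from 2 to 10.

    Hypothetical helper for illustration only.
    """
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))


# The integer discussed in the text.
value = 1_656_735_000

# An IPv4 address is a 32-bit integer; IPv4Address accepts the integer directly.
addr = ipaddress.IPv4Address(value)

representations = {
    "dot-decimal": str(addr),
    "hex": format(value, "08X"),
    "binary": format(value, "032b"),
    "base 7": to_base(value, 7),
}

for name, text in representations.items():
    print(f"{name:>11}: {text}")

# Every notation converts back to the same integer.
assert int(addr) == value
assert int(representations["hex"], 16) == value
assert int(representations["binary"], 2) == value
assert int(representations["base 7"], 7) == value
```

Changing the notation changes nothing about the underlying value, which is why re-expressing an identifier in another base cannot plausibly count as de-identification.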
We are staring down the potential for new regulation across Europe with fines as high as five percent of global turnover for the breach of a thing experts cannot even agree upon. Think about what this could mean in practical, day-to-day terms. Is it a data breach when a company throws out a disk containing a webserver log file that recorded the IP addresses of all browsers seeing a “Coming Soon” page for the new Super Widget product by Xmart? Does it matter if Super Widget is a pregnancy test?
If an e-mail address is considered personal data, is it a breach when the sender of an e-mail to five recipients hasn’t received consent from each of the five subjects to share their personal information with the other four? Do the contents of the e-mail play into the determination even if you are just a recipient? What if the number is not five but 10, 100 or 1,000? Where’s the line? If this is a breach, how many hours do I have to report it to a data protection authority, and who in my company needs to be making these incredibly subjective calls?
I don’t have the answers here, and I do understand that this is never going to be a perfectly black-and-white topic. I would, however, remind us all that a wise man once said that you are never going to solve a problem unless you first start by defining it. If the problem privacy professionals set out to solve is governing personal [insert name here], we should ask how well we can achieve that objective before we agree, more broadly than we do today, on what it is we’ve set out to do.
I do think there are models for success.
Various U.S. state data breach laws and the HIPAA rules are successes because they carefully define what they set out to protect. Businesses understand that name in conjunction with Social Security number is covered by most state data breach laws with certainty. Covered entities understand that name in conjunction with health condition is covered by HIPAA with certainty.
Privacy professionals are debating whether your name and where you work are at the core of what they do because they must consider the question in terms of disparate definitions.
We need to go back to the drawing board and look at our definitions and the prejudices from which they start. It is time for more certainty. It’s time to harmonize the two approaches to better define what we, as privacy pros, set out to govern.
Not having a clear definition of personal [whatever] has been a well-understood problem for a long time, but it is getting increasingly hard to just blindly accept. I enjoy a good debate as much as the next guy, but I think it’s time we find more agreement on the fundamentals.