As with so many things in this world, there is rarely, if ever, a silver-bullet solution to a complex problem in privacy. Perhaps the most glaring example of this is in defining the identifiability of an individual.
Countless privacy laws and regulations around the world define personal information or personally identifiable information differently, using varying key terms and thresholds. One jurisdiction may consider an IP address PII while another may not. The U.S. Federal Trade Commission uses language like "reasonably linkable," while the EU's General Data Protection Regulation defines "personal data" as any information relating to an identified or identifiable natural person, the "data subject."
Though not new, de-identification — or, if you're in Europe, anonymization — is an increasingly popular and useful tool for practitioners to employ when masking datasets and handling personal information while remaining in compliance with privacy laws. Traditionally, it's been thought of as a binary: Either a dataset is "de-identified," or it's not. Increasingly, though, that binary way of thinking is falling by the wayside, and for privacy pros, that's a positive development.
"Identifiability is relative and contextual," Hintze Law Partner Mike Hintze pointed out to a room full of privacy pros during an Active Learning session Tuesday at the IAPP Global Privacy Summit. Take an IP address, for example, he explained. A website might not know who a given visitor is; it sees the IP address in isolation. Her internet service provider, however, can tie that same IP address to her subscriber and billing information. Contextually speaking, then, the same IP address carries different identifiability in each setting.
"The binary approach to de-identification is the wrong approach," Hintze argued. He said it needs to be viewed on a spectrum as a way of mitigating risk.
And regulators seem to be catching on. Privacy Analytics CEO Khaled El-Emam said many government organizations support the risk-based approach to de-identification, and Hintze said he is encouraged by how the GDPR essentially provides incentives for those employing strong de-identification protocols.
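To make the risk-based approach concrete, here is a simplified sketch of one common way to measure re-identification risk (it is not any particular regulator's or vendor's methodology, and the function name and sample records are hypothetical): estimate worst-case risk as 1/k, where k is the size of the smallest group of records sharing the same quasi-identifiers.

```python
from collections import Counter

def max_reidentification_risk(records, quasi_identifiers):
    """Estimate worst-case re-identification risk as 1/k, where k is the
    size of the smallest group of records sharing the same values for the
    chosen quasi-identifiers (a simplified, k-anonymity-style measure)."""
    groups = Counter(
        tuple(record[qi] for qi in quasi_identifiers) for record in records
    )
    smallest_group = min(groups.values())
    return 1.0 / smallest_group

# Hypothetical dataset: ZIP code, age band, and gender are quasi-identifiers.
records = [
    {"zip": "98101", "age_band": "30-39", "gender": "F"},
    {"zip": "98101", "age_band": "30-39", "gender": "F"},
    {"zip": "98101", "age_band": "30-39", "gender": "F"},
    {"zip": "98102", "age_band": "40-49", "gender": "M"},
]
risk = max_reidentification_risk(records, ["zip", "age_band", "gender"])
# The lone 98102 record forms a group of one, so worst-case risk here is 1.0.
```

Under a risk-based regime, an organization would generalize or suppress quasi-identifiers until such a measure falls below an acceptable threshold, rather than asking a yes-or-no question about whether the data is "de-identified."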
For those who are visual learners, the Future of Privacy Forum's Kelsey Finch provided a visual guide for data de-identification. This primer helps differentiate between degrees of identifiability, pseudonymous data, de-identified data, and anonymous data, together with clues on direct identifiers and indirect identifiers, as well as safeguards and controls.
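The distinction the primer draws between pseudonymous and de-identified data can be sketched in a few lines of code (all field names, values, and the key below are hypothetical): pseudonymization replaces direct identifiers with tokens that remain re-linkable by whoever holds the key, while de-identification also generalizes the indirect identifiers.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be stored separately and access-controlled.
SECRET_KEY = b"example-key"

def pseudonymize(value):
    """Replace a direct identifier with a keyed hash. The result is
    pseudonymous, not anonymous: the key holder can re-link records."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "jane@example.com", "zip": "98101", "age": 34}

pseudonymous = {
    "user": pseudonymize(record["email"]),  # direct identifier tokenized
    "zip": record["zip"],                   # indirect identifiers left intact
    "age": record["age"],
}

de_identified = {
    "zip": record["zip"][:3] + "**",  # indirect identifiers generalized
    "age_band": "30-39",
}
```

The safeguards and controls the primer mentions sit on top of this: who holds the key, and what contractual or technical limits prevent re-linking, are what move a dataset along the identifiability spectrum.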
For those in marketing tech, Colin O'Malley, of the Lucid Privacy Group and co-founder of Ghostery, laid out the complex ad tech ecosystem. To get anything done, he said, data routinely passes through 10, 12, even 15 different third-party providers. And even a basic visit to a website reveals potentially valuable data, like IP address, referrer, the device's operating system, and language preference.
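The kinds of data a plain page load exposes can be seen directly in standard request metadata. Below is a minimal, illustrative WSGI application (the app itself is an assumption, not something from the talk; the field names follow the WSGI/CGI conventions) showing what any server observes before a single third-party tag fires.

```python
# Minimal WSGI app (sketch): collects the request metadata any server sees
# on an ordinary page load and echoes it back as plain text.
def app(environ, start_response):
    visitor_data = {
        "ip_address": environ.get("REMOTE_ADDR"),
        "referrer": environ.get("HTTP_REFERER"),
        "user_agent": environ.get("HTTP_USER_AGENT"),    # reveals OS and browser
        "language": environ.get("HTTP_ACCEPT_LANGUAGE"), # language preference
    }
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [repr(visitor_data).encode()]
```

In the ecosystem O'Malley described, each of the 10 to 15 downstream providers may receive some subset of exactly this metadata, which is why the contextual view of identifiability matters so much in ad tech.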
With technology evolving to include voice and facial recognition and other biometric identifiers, and with marketers' tracking capabilities growing, maintaining compliance will remain difficult and complex. Just as there is no perfect security, there is no useful dataset that is completely anonymized. A risk-based approach to de-identification, though, can help organizations take more practical, nuanced steps to keep data both de-identified and useful.