Editor's note: The IAPP is policy neutral. We publish contributed opinion and analysis pieces to enable our members to hear a broad spectrum of views in our domains.
If Ella Fitzgerald and Louis Armstrong were singing about data privacy instead of tomatoes and potatoes, they might croon something like: "You say anonymization and I say deidentification. Let's call the whole thing off."
And honestly, who could blame them? The privacy world is full of terms that sound interchangeable but aren't — and that are often misunderstood and misused. Depending on which side of the Atlantic you sit, anonymization, pseudonymization and deidentification might all sound like variations of the same jazzy refrain.
But under the laws, and in practice, they hit very different notes. And judging by the EU Digital Omnibus and the community's reactions to it, even within the EU not everyone agrees on the rhymes.
'You like tomato and I like tomahto,' but the law doesn't
In everyday conversation with engineers, developers, and particularly those in technology sales and marketing, anonymized often just means "we deleted the name fields and nothing more." In the land of the EU General Data Protection Regulation, that's not even the first bar of the song.
The better versed know that quasi-identifiers, such as an address or a university degree, are also in scope; they are exactly the indirect identifiers that data loss prevention tools are deployed to catch.
Anonymization means the data has been transformed so completely that no one (not you, not a data broker, not the U.S. National Security Agency, not the U.K.'s MI5) could reasonably reidentify an individual, even with all the auxiliary data in the world, analytics and metadata included. Once anonymized, the data is out of the legal orchestra entirely; it has ceased to be "personal data" at all.
Pseudonymization — or deidentification — is more of a remix, with all the possible riffs of one. Obvious identifiers are swapped out for codes or tokens, but the melody is still traceable. There's still a key, literal or metaphorical, that could bring back the original tune. Under the GDPR, pseudonymized data is still very much considered personal data; you just have better sheet music management.
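To make the remix concrete: in day-to-day engineering, pseudonymization most often means keyed tokenization. Here is a minimal Python sketch of that pattern; the field names, key handling and HMAC choice are illustrative assumptions, not a reference implementation.

```python
import hashlib
import hmac

# Illustrative key only; in practice it would live in a key management
# system, stored separately from the pseudonymized dataset.
SECRET_KEY = b"rotate-me-and-keep-me-elsewhere"

def pseudonymize(value: str) -> str:
    """Swap a direct identifier for a keyed token (HMAC-SHA256).

    The mapping is deterministic, so records can still be joined, and
    anyone holding the key can recompute tokens for candidate identities
    and match them back. That lingering link is why the output remains
    personal data under the GDPR.
    """
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"name": "Ella Fitzgerald", "email": "ella@example.com", "zip": "10027"}
tokenized = {
    "name_token": pseudonymize(record["name"]),
    "email_token": pseudonymize(record["email"]),
    "zip": record["zip"],  # quasi-identifiers often survive untouched
}
print(tokenized)
```

Note that even destroying the key would not make this record anonymous: the quasi-identifiers left behind can still carry the melody, as the next sections show.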
To put it plainly: Anonymization erases the connection to an actual individual's identity. Pseudonymization just hides it behind the curtain or a series of curtains, or maybe in a vault behind the curtain.
And the moment there is any way to lift that curtain — even theoretically — you're not in anonymization territory, you're back in pseudonymization, and thus, GDPR land.
The high note of anonymization and why so few can reach it
True anonymization is like Fitzgerald hitting that effortless high note: rare, dazzling and almost impossible for mortals to replicate.
Recital 26 of the GDPR sets the tempo: data counts as anonymous only if there are no means reasonably likely to be used to reidentify a person. That "reasonably likely" test pulls real weight. It doesn't just cover what one individual could do with the data, but what anyone could, given time, accessible tools, technology, resources and motive.
The European Data Protection Board has doubled down on this: anonymization must withstand not only today's technology but tomorrow's foreseeable advances. It's not enough that reidentification would be hard; it must be functionally impossible.
But given these regulatory expectations, and given artificial intelligence and ever-growing computing power, can we ensure that today's truly anonymized datasets will remain truly anonymized tomorrow?
The technical harmony: Everything is linkable
Even if names and emails are stripped, combinations of traits — birthdate, postal code, gender, favorite band — can uniquely identify people. Researchers have shown again and again that "anonymized" can carry a very loud melody:
- The Netflix Prize dataset was "anonymized" until someone harmonized it with IMDb reviews.
- Location data, even when fuzzed, can map back to homes and workplaces with frightening ease.
- And the famous study by Latanya Sweeney showed 87% of Americans could be uniquely identified with just three quasi-identifiers: ZIP code, birth date and gender (a pattern easy to reproduce in miniature, as sketched after this list).
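The sketch below does exactly that in miniature, over made-up records: it groups a table by its quasi-identifier combination and counts the group sizes. Any group of size one singles out exactly one person, and the smallest group size is the table's k-anonymity level.

```python
from collections import Counter

# Hypothetical "anonymized" table: names stripped, quasi-identifiers intact.
rows = [
    {"birthdate": "1917-04-25", "zip": "10027", "gender": "F"},
    {"birthdate": "1917-04-25", "zip": "10027", "gender": "F"},
    {"birthdate": "1901-08-04", "zip": "70112", "gender": "M"},
    {"birthdate": "1899-04-29", "zip": "20001", "gender": "M"},
]

QUASI_IDENTIFIERS = ("birthdate", "zip", "gender")

def equivalence_classes(records, keys):
    """Count how many records share each quasi-identifier combination."""
    return Counter(tuple(r[k] for k in keys) for r in records)

classes = equivalence_classes(rows, QUASI_IDENTIFIERS)
unique = sum(1 for size in classes.values() if size == 1)
k = min(classes.values())  # the table's k-anonymity level

print(f"{unique} of {len(classes)} combinations single out one person; k = {k}")
```

Every unique combination is a latch waiting for the right auxiliary dataset, which is exactly the mechanism behind the Netflix and location examples above.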
That's why anonymization is not just about removing notes. It's about rewriting the score so the original can't be reconstructed. Techniques like differential privacy and k-anonymity help, but even they require constant tuning, context awareness and a full four-pillar governance structure: good data quality; data accountability, responsibility and stewardship; data protection, security and compliance; and data management, including well-thought-out architecture.
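For a flavor of what rewriting the score looks like, here is a minimal sketch of the Laplace mechanism at the heart of differential privacy; the epsilon value and the query are assumptions chosen for illustration. A count is published with calibrated noise so that no single individual's presence or absence detectably moves the answer.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise by inverting its CDF."""
    u = random.random()
    while u == 0.0:  # avoid log(0) at the boundary
        u = random.random()
    u -= 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    """Answer a counting query with differential privacy.

    A count has sensitivity 1 (adding or removing one person changes it
    by at most 1), so Laplace noise with scale 1/epsilon suffices.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Illustrative use: the published number is noisy by design, so it cannot
# be decomposed back into any individual's row.
people = [{"zip": "10027", "fan": "Ella"}, {"zip": "70112", "fan": "Louis"}]
print(dp_count(people, lambda r: r["fan"] == "Ella", epsilon=0.5))
```

Even this guarantee holds only if the noise budget is tracked across every query ever answered, which is precisely where the constant tuning and governance come in.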
Here is the kicker: under the GDPR, anonymization is, so far, binary. Either data has been rendered truly anonymous, or it is personal data. There's no middle ground, no glissando, no "mostly anonymous" chorus. Most datasets that claim to be anonymized are really just pseudonymized: playing jazz but calling it classical.
A simpler arrangement: Default to deidentified
Rather than pretending every dataset is a masterpiece of anonymization, maybe it is time for a remix.
Let's aim for anonymization as the platinum record, the absolute ideal. But operationally, let's assume we're working with deidentified — meaning pseudonymized — data, unless proven otherwise.
Ask an engineer, "Is this data anonymized?" and they will likely say, "Sure, we stripped out the names."
Ask a privacy lawyer the same question, and there will be a long pause, a deep sigh, and a quote from Recital 26.
And if we harmonize the two:
- Aim for anonymization. Encourage teams to design with irreversible transformations, aggregation and noise injection. Ask lawyers to help implement the four-pillar governance structure.
- Test the melody. If there's any plausible route to reidentification — even a theoretical one — it's deidentified, not anonymous.
- Play by the privacy rules. Deidentified data still counts as personal data, so keep applying all the GDPR harmonics: purpose limitation, lawful basis, access control and minimization.
This approach is not defeatist; it's realistic. It sets a clear operational standard: if the data cannot be proven to be anonymized, treat it as if it is not.
Why the conservative beat wins
This "prove it or regulate it" mindset may sound strict, but it keeps organizations in tune with regulators, Court of Justice of the European Union case law and reality alike.
It's legally defensible because any risk of under-classifying data is avoided, and it's ethically sound since it respects the fact that real people sit behind the data points. It's also technically honest, acknowledging that anonymity is hard to achieve — rare, but not impossible. And it's future-proof, recognizing that what counts as anonymous today may not tomorrow. Better to stay on the safe side of history.
As the U.K. Information Commissioner's Office puts it, the test is whether "the risk of identification is sufficiently remote," an acknowledgment that the risk can never be eliminated entirely. In other words: anonymization is a dynamic, not static, state. Data can slip out of anonymity as easily as a catchy tune slips back into your head.
Lately, the chorus has grown louder. The EU's draft Digital Omnibus package suggests Brussels might retune the very notes of the GDPR, softening what counts as "personal data" and opening new exceptions for data reuse and AI training. If those proposals gain tempo, the already-blurred line between anonymization and deidentification could fade to silence.
But that shift should not happen any time soon. Lowering the anonymization bar won't create harmony; it risks turning the concept into background noise. Privacy professionals should hold the line: anonymization must remain a high, binary threshold, not a compliance shortcut. Because once we start rewriting the score, the whole privacy symphony risks falling out of tune.
Let's not call the whole thing off — let's just call it what it is
The confusion between anonymization and deidentification is not going away anytime soon, and, to quote Fitzgerald and Armstrong, "our romance is growing flat." But instead of arguing over pronunciation, privacy professionals can agree on the principle and make sure we use the correct terms.
If we are not positive it's anonymous, then the data is personal.
That is not pessimism; it is precision. It turns a fuzzy concept into a clear rule. It helps privacy teams and engineers sing from the same sheet and simplifies the discussions around what it would, or could, take to reidentify data.
So, next time someone insists they've "anonymized" the dataset, channel your inner Fitzgerald and Armstrong: "You say anonymization, I say deidentification."
And then gently suggest: "Let's not call the whole thing off — let's just call it what it is. And document it."
Because in the privacy world, words, like melodies, matter.
Noemie Weinbaum, AIGP, CIPP/C, CIPP/E, CIPP/US, CIPM, CIPT, CDPO/FR, FIP, is privacy lead at UKG and managing director at PS Expertise.
Flora Garcia, CIPP/E, CIPP/US, CIPT, FIP, is a cybersecurity and privacy attorney and chief privacy officer.
Roy Kamp, AIGP, CIPP/E, CIPP/US, CIPM, CIPT, FIP, is legal director at UKG.