The data economy is facing a paradox. The exponential increase in the processing of personal data has created a wide array of unprecedented possibilities to gain useful insights via artificial intelligence and machine learning. At the same time, these developments expose individuals to new privacy threats.

Against this background, most conferences on privacy trends broach the issue of how emerging privacy-enhancing technologies can support privacy protection in the context of AI and ML. AI headlined as a keynote topic at the IAPP Privacy. Security. Risk. 2022 conference and dominated conversation at the IAPP Data Protection Congress 2022. At the annual conference for data protection authorities – the Global Privacy Assembly – the Future of Privacy Forum hosted three sessions on PETs as side events, with distinguished panelists sharing views and priorities.

The opportunities emerging PETs can provide have not gone unnoticed by policymakers and regulators worldwide. The sheer number of global policy initiatives around PETs underscores the rapid development of the space, even just within the past couple of years.

What are privacy-enhancing technologies?

Emerging PETs, several of which are also known as privacy-preserving machine learning, encompass a growing variety of approaches that attempt to combine data mining and data utilization with privacy and ethics, including federated learning, differential privacy, trusted execution environments, multiparty computation, homomorphic encryption, zero-knowledge proofs and synthetic data. They share a similar vision: preserving the security and privacy of personal information when training or using machine learning models in collaborative settings, while maintaining the utility of the analyzed information.

PETs are not a new concept. Early examinations of PETs can be traced back to a report titled “Privacy Enhancing Technologies (PETs): The Path to Anonymity,” first published in 1995 by Canadian and Dutch privacy authorities. This piece used the term “privacy-enhancing” to refer to a variety of technologies that safeguard personal privacy by minimizing or eliminating the collection of identifiable data. Another early definition stems from “Inventory of privacy-enhancing technologies,” published by the Organisation for Economic Co-operation and Development in 2002, which describes PETs as “a wide range of technologies that help protect personal privacy.”

It is telling that definitions of PETs in the private sector emphasize their unique opportunities for data collaboration.

Aspiring to tap into previously inaccessible data siloes and gain new insights, a German early seed investor fund published a comprehensive report on PETs in 2021 titled “The privacy infrastructure of tomorrow is being built today.” The report defines PETs as “a set of cryptographic methods, architectural designs, data science workflows, and systems of hardware and software that enable adversarial parties to collaborate on sensitive data without needing to rely on mutual trust.” The report predicts that by 2030, “data marketplaces enabled by PETs, (…), will be the second largest information communications technology market after the Cloud.”

Maintaining data utility

Emerging PETs are regularly positioned as helping solve the “privacy-utility tradeoff”: they aim to provide privacy protection while the personal data being processed remains valuable as an analytical resource. In general, mitigating disclosure risks can adversely affect data utility, compromising the data set’s analytical completeness and validity.

For example, the European Union Agency for Cybersecurity’s 2022 report on data protection engineering included a 2001 definition that described PETs as “a coherent system of ICT measures that protects privacy by eliminating or reducing personal data or by preventing unnecessary and/or undesired processing of personal data; all without losing the functionality of the data system.” One such emerging PET is multiparty computation. MPC allows multiple parties to compute common results based on individual data without revealing their respective data inputs to each other. The computation is based on cryptographic protocols and does not impact the utility of the data.
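To make the idea concrete, here is a minimal sketch of one classic MPC building block, additive secret sharing, written in Python. The party names and values are hypothetical, and real MPC deployments layer secure channels and protections against dishonest participants on top of this primitive.

```python
import random

PRIME = 2**61 - 1  # all share arithmetic happens modulo this field prime

def share(value, n_parties=3):
    """Split a secret into n additive shares that sum to the value mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Hypothetical scenario: three hospitals each hold a private patient count.
private_inputs = {"hospital_a": 120, "hospital_b": 340, "hospital_c": 75}

# Each party splits its input and hands one share to every participant.
all_shares = [share(v) for v in private_inputs.values()]

# Each party locally sums the shares it received (one column per party)...
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# ...and publishes only that partial sum. Recombining the partial sums
# reveals the joint total, but never any individual input.
joint_total = sum(partial_sums) % PRIME
print(joint_total)  # 535
```

Each share in isolation is a uniformly random field element, which is why no single party learns anything about another party’s input; only the agreed-upon aggregate is ever reconstructed.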

Similarly, differential privacy is a method for privacy-preserving data analysis and query-based systems, based on a mathematical definition of privacy. Its goal is to learn as much as possible about a dataset while maintaining plausible deniability of any outcome, meaning answers cannot be traced back with certainty to any specific respondent. This is typically achieved by adding calibrated random noise, either to individual responses or to query results, so that aggregate answers remain statistically useful while no single person’s contribution can be isolated.
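A minimal sketch of the Laplace mechanism, one standard way to realize differential privacy for counting queries, follows in Python. The dataset, predicate and epsilon value are illustrative.

```python
import math
import random

def dp_count(records, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A count has sensitivity 1 (one person joining or leaving the data
    changes it by at most 1), so Laplace noise of scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace(0, 1/epsilon) noise via inverse transform sampling.
    u = random.random() - 0.5
    noise = (1 / epsilon) * math.copysign(math.log(1 - 2 * abs(u)), u)
    return true_count + noise

# Hypothetical ages; the analyst only ever sees the noisy answer.
ages = [34, 45, 29, 61, 52, 38, 70, 44]
print(dp_count(ages, lambda age: age > 40, epsilon=0.5))  # true count is 5
```

Smaller epsilon values add more noise and therefore more privacy, which is exactly the privacy-utility tradeoff described above.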

On the other hand, synthetic data is generated by an algorithm trained on a real data set. The artificial data created mimics real data, thus replacing the original data while reproducing the statistical properties and patterns of the original set.
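As a toy illustration of that idea, the sketch below fits a simple Gaussian model to a small fabricated “real” dataset and samples synthetic records from it. Production generators are usually learned models such as GANs or variational autoencoders; the Gaussian fit stands in for them here.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Fabricated "real" data: rows of (age, systolic blood pressure).
real = np.array([[34, 118], [45, 130], [29, 115], [61, 142],
                 [52, 135], [38, 122], [70, 150], [44, 128]], dtype=float)

# Fit a simple parametric model: the empirical mean and covariance.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample synthetic records from the fitted distribution. No original row
# is reused, yet statistical structure such as the age/blood-pressure
# correlation carries over.
synthetic = rng.multivariate_normal(mean, cov, size=100)

print(np.corrcoef(real.T)[0, 1])       # correlation in the real data
print(np.corrcoef(synthetic.T)[0, 1])  # approximately preserved
```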

These approaches can be very useful, for instance in the health sector, where sharing data under privacy regulations like the U.S. Health Insurance Portability and Accountability Act means stripping data of specific identifiers. This deidentification method is meant to reduce identity disclosure risk, but it can also result in information losses that make datasets no longer useful for research purposes. Furthermore, deidentified health data can still regularly be reidentified. In comparison, emerging PETs can improve disclosure and reidentification risk mitigation while maintaining the data’s information value.
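For contrast, here is a minimal Python sketch of the identifier-stripping style of deidentification described above. The field names and record are hypothetical; HIPAA’s Safe Harbor method actually enumerates 18 identifier categories, including names, most geographic detail and any date more specific than a year.

```python
# Hypothetical direct identifiers in the spirit of HIPAA Safe Harbor.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email",
                      "ssn", "medical_record_number"}

def deidentify(record):
    """Drop direct identifiers and coarsen the birth date to a year."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in clean:                      # e.g., "1961-03-14"
        clean["birth_year"] = clean.pop("birth_date")[:4]
    return clean

patient = {"name": "Jane Doe", "ssn": "000-00-0000",
           "birth_date": "1961-03-14", "street_address": "1 Main St",
           "diagnosis": "hypertension"}
print(deidentify(patient))
# {'diagnosis': 'hypertension', 'birth_year': '1961'}
```

Note what is lost: a researcher studying seasonal effects can no longer use the birth month, and the remaining quasi-identifiers may still permit reidentification when combined with other datasets, which is precisely the gap emerging PETs aim to close.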

PETs in the context of privacy regulation

Because privacy regulation is technology-agnostic, few PET solutions are called out explicitly by name in privacy regulations. Nevertheless, their link to privacy by default and privacy by design is apparent.

The framework of privacy by design was first established by former Privacy Commissioner of Ontario, Canada, Ann Cavoukian in 2010, in the form of seven foundational principles. PbD has slowly become embedded into privacy and data protection laws across the world.

The most prominent example is Article 25 of the EU General Data Protection Regulation (the U.K. GDPR contains identical wording), which refers to the data controller’s obligation to take “into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing.” Further, Article 25 demands “appropriate technical and organisational measures (…), which are designed to implement data-protection principles, such as data minimisation,” be implemented in an effective manner “both at the time of the determination of the means for processing and at the time of the processing itself.”

In the U.S., the Federal Trade Commission recognized privacy-preserving design as a best practice more than a decade ago. In a 2012 report, the FTC stated the baseline principle that “companies should promote consumer privacy throughout their organizations and at every stage of the development of their products and services.”

We find similar provisions in privacy laws around the world. Article 46 of Brazil’s General Data Protection Law states “agents shall adopt security, technical and administrative measures able to protect personal data.” Chapter 9.3 of India’s proposed Digital Personal Data Protection Bill, 2022, states “a Data Fiduciary shall implement appropriate technical and organizational measures.”

Implementing PbD is not defined in detail by laws and regulations; therefore, the guiding principle for appropriate measures is whatever is currently characterized as “state of the art” in practice. The meaning of this is not set in stone. Instead, it relies on technological progress and a certain amount of subjectivity.

ENISA, together with the German TeleTrust, recently defined state of the art as the “best performance of an IT security measure available on the market to achieve the legal IT security objective.” This is usually the case when “existing scientific knowledge and research” either reaches market maturity or is launched on the market and references international standards where possible.

Appropriate technical and organizational measures can mean different things at different times in different contexts. What was good enough years ago might not be in the best interest of today’s end users and data handlers. A classic example is the evolution of online security. The former industry standard, visiting websites over an unencrypted HTTP connection, is no longer acceptable. Current “state of the art” security requires HTTPS with a TLS certificate, meaning an encrypted connection to a web server when visiting a website.
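For a concrete sense of that baseline, the short Python sketch below uses only the standard library to open a certificate-verified TLS connection, the behavior modern clients enforce by default; the hostname is illustrative.

```python
import socket
import ssl

# The default context verifies the server certificate against the system
# trust store and checks that it matches the requested hostname.
context = ssl.create_default_context()

with socket.create_connection(("example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="example.com") as tls:
        print(tls.version())                 # e.g., 'TLSv1.3'
        print(tls.getpeercert()["subject"])  # the verified certificate
```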

Thus, data controllers must consider the current progress in available technology and stay up to date on technological advances to choose the least invasive system design for their specific functionality, while maintaining compliance with the appropriate privacy regulations. This is also one of the main reasons for privacy professionals to investigate emerging PETs.

How are emerging PETs categorized?

Several organizations and initiatives have taken on the challenge of categorizing and classifying emerging PETs, according to their underlying technologies, applications or capabilities. Examples include:

  • The U.K. Royal Society’s new report on the role of privacy-enhancing technologies in data governance and collaborative analysis.
  • Mobey Forum’s report, “The Digital Banking Blindspot.”
  • A project hosted by The Computer Security and Industrial Cryptography research group at the Department of Electrical Engineering of KU Leuven.
  • The Federal Reserve Bank of San Francisco’s report on PETs.
  • The use-case-based Adoption Guide for PETs by the U.K.’s Centre for Data Ethics and Innovation.
  • ENISA’s references to specific PETs in its report on pseudonymization techniques and in its report on data protection engineering.

A recent categorization stems from the U.K. Information Commissioner’s Office, which published a draft guidance on PETs in September 2022. The ICO distinguishes:

  • PETs that reduce the identifiability of individuals and help to fulfill the principle of data minimization, e.g., differential privacy and synthetic data generation.
  • PETs that focus on hiding and shielding data to achieve better security, e.g., homomorphic encryption, zero-knowledge proofs, and trusted execution environments.
  • PETs that split or control access to personal data, meeting both data minimization and stronger security principles, e.g., federated learning and MPC (a minimal federated learning sketch follows this list).
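To make the last category concrete, below is a minimal sketch of federated averaging for a simple linear model in Python. The client data, learning rate and round counts are fabricated for illustration, and production systems add secure aggregation and other safeguards on top of this loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=10):
    """A few steps of local gradient descent on a linear model."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

# Hypothetical setup: three clients hold private data drawn from the same
# underlying relationship; raw data never leaves a client.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

# Federated averaging: each round, clients train locally and the server
# averages only the resulting model parameters.
global_w = np.zeros(2)
for _ in range(5):
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)

print(global_w)  # converges toward [2.0, -1.0] without pooling raw data
```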

Global trends and policy support for PETs

The rapid development of the PET space in the past couple of years has sparked a considerable amount of discourse in the privacy engineering and data science community. Due to their enhanced capabilities for securing, or anonymizing, data and for data minimization while maintaining data utility, PETs are also seeing increased attention from legislators and public authorities.

In Europe, ENISA highlighted several emerging PETs as new techniques for data protection engineering and emphasized MPC and zero-knowledge proofs specifically as advanced pseudonymization techniques. The European Data Protection Board also recognized MPC as a supplementary technical measure for international personal data transfers. The European Commission’s Joint Research Centre published an analysis on the usefulness of synthetic data for research.

In early 2021, the U.S. Senate introduced the “Promoting Digital Privacy Technologies Act,” with plans to support the research, deployment and standardization of privacy technology. The U.S. Department of Homeland Security also expressed interest in defining privacy in technical terms and hosted a workshop showcasing use cases for emerging PETs. In Canada, the Office of the Privacy Commissioner recently published considerations around various aspects of synthetic data.

In July 2022, Singapore’s Infocomm Media Development Authority began a six-month sandbox program to support businesses interested in adopting emerging PETs. In May 2022, South Korea’s Personal Information Protection Commission spearheaded the development of 11 core PETs, an effort that will continue over the next four years.

Development is not happening only at the national level. In 2022, the U.N. launched the PETs Lab initiative, a global hackathon aimed at tackling challenges surrounding the safe and responsible use of PETs. In 2021, the U.S. and U.K. sponsored a bilateral prize challenge to facilitate the adoption of PETs. Singapore’s IMDA and the International Centre of Expertise of Montreal for the Advancement of Artificial Intelligence signed a memorandum of understanding for one of the world’s first cross-border collaborations on PETs in June 2022. South Korean and French data protection authorities soon followed with an agreement to jointly research emerging PETs.

Challenges and outlook

Unsurprisingly, as with every new technology, challenges will become apparent as more PETs are developed and implemented. The relative infancy of PETs will eventually lead to the need for more technology subject matter experts, especially as regulators continue to develop firmer guidelines. Similarly, there are few use-case examples or off-the-shelf solutions, which makes it difficult for privacy engineers to determine the suitability of emerging PETs in day-to-day operations. Both factors can lead to mistakes in implementation, which may cause critical privacy issues.

It is also important to remember PETs are not “silver bullet” solutions for the protection and safeguarding of personal information. Of course, PbD cannot be reduced to the implementation of specific technologies. As ENISA said, PbD “is a process involving various technological and organizational components, which implement privacy principles by deploying technical and organization measures that include also PETs.”

The lack of regulatory guidelines around emerging PETs could place PET-processed data in a precarious state of limbo: could results be considered anonymized, deidentified or pseudonymized? This question becomes even harder to answer when the data processing stretches over multiple jurisdictions. Ideally, regulators and data protection authorities will continue to foster discussion and standardization around these technologies, making them easier to adopt and use globally.