The data economy is facing a paradox. The exponential increase in the processing of personal data has created a wide array of unprecedented possibilities to gain useful insights via artificial intelligence and machine learning. At the same time, these developments expose individuals to new privacy threats.
Against this background, most conferences on privacy trends broach the issue of how emerging privacy-enhancing technologies can support privacy protection in the context of AI and ML. AI was a headline keynote topic at the IAPP Privacy. Security. Risk. 2022 conference and dominated conversation at the IAPP Data Protection Congress 2022. At the annual conference for data protection authorities, the Global Privacy Assembly, the Future of Privacy Forum hosted three sessions on PETs as side events, with distinguished panelists sharing views and priorities.
The opportunities emerging PETs can provide have not gone unnoticed by policymakers and regulators worldwide. The sheer number of global policy initiatives around PETs underscores the rapid development of the space, even just within the past couple of years.
What are privacy-enhancing technologies?
Emerging PETs, several of which are also known as privacy-preserving machine learning, attempt to combine data mining and data utilization with privacy and ethics. They encompass a growing variety of new approaches, including federated learning, differential privacy, trusted execution environments, multiparty computation, homomorphic encryption, zero-knowledge proofs and synthetic data. These approaches share a similar vision: preserving the security and privacy of personal information when training or using machine learning models in collaborative settings, while maintaining the utility of the analyzed information.
PETs are not a new concept. Early examinations of PETs can be traced back to a report titled “Privacy Enhancing Technologies (PETs): The Path to Anonymity,” first published in 1995 by Canadian and Dutch privacy authorities. This piece used the term “privacy-enhancing” to refer to a “variety of technologies that safeguard personal privacy by minimizing or eliminating the collection of identifiable data.” Another early definition stems from “Inventory of privacy-enhancing technologies,” published by the Organisation for Economic Co-operation and Development in 2002, which describes PETs as “a wide range of technologies that help protect personal privacy.”
It is telling that definitions of PETs in the private sector emphasize their unique opportunities for data collaboration.
Aspiring to tap into previously inaccessible data silos and gain new insights, a German early-seed investor fund published a comprehensive report on PETs in 2021 titled “The privacy infrastructure of tomorrow is being built today.” The report defines PETs as: “a set of cryptographic methods, architectural designs, data science workflows, and systems of hardware and software that enable adversarial parties to collaborate on sensitive data without needing to rely on mutual trust.” The report predicts that by 2030, “data marketplaces enabled by PETs, (…), will be the second largest information communications technology market after the Cloud.”
Maintaining data utility
Emerging PETs are regularly positioned as helping solve the “privacy-utility tradeoff.” This notion refers to emerging PETs providing privacy protection while the personal data that is processed remains valuable as an analytical resource. In general, mitigating disclosure risks can adversely affect data utility, compromising the data set’s analytical completeness and validity.
For example, the European Union Agency for Cybersecurity’s 2022 report on data protection engineering included a 2001 definition that described PETs as “a coherent system of ICT measures that protects privacy by eliminating or reducing personal data or by preventing unnecessary and/or undesired processing of personal data; all without losing the functionality of the data system.” One such emerging PET is multiparty computation, which allows multiple parties to compute common results based on their individual data without revealing their respective inputs to each other. The computation relies on cryptographic protocols and does not impact the utility of the data.
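To make the mechanics concrete, below is a minimal sketch of additive secret sharing, the building block behind many MPC protocols. The three-hospital scenario, the party inputs and the field size are illustrative assumptions, not a production protocol:

```python
import secrets

# Toy additive secret sharing: each value is split into random shares
# that sum to the value modulo a large prime.
PRIME = 2**61 - 1  # all arithmetic is done modulo this prime

def share(value, n_parties):
    """Split a value into n_parties random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three hospitals want their total patient count without revealing
# any individual count (hypothetical inputs).
inputs = {"hospital_a": 120, "hospital_b": 340, "hospital_c": 95}

# Each party splits its input and sends one share to every party.
all_shares = [share(v, 3) for v in inputs.values()]

# Each party locally sums the shares it received (one column each)...
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# ...and only these partial sums are published and combined.
total = sum(partial_sums) % PRIME
print(total)  # 555; no party learned another party's input
```

Each hospital only ever sees random-looking shares, yet the published partial sums combine to the exact total, which is why the computation loses no utility.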
Similarly, differential privacy is a method for privacy-preserving data analysis in query-based systems, grounded in a mathematical definition of privacy. Its goal is to learn as much as possible about a dataset while maintaining “plausible deniability” for every respondent, meaning answers cannot be traced back with certainty to any specific individual. This is achieved by injecting calibrated randomness, for example via randomized responses, which protects individuals’ privacy without significantly distorting aggregate query results.
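A minimal sketch of the randomized-response variant mentioned above; the survey scenario, the coin-flip probabilities and the true rate are illustrative assumptions:

```python
import random

def randomized_response(truthful_answer):
    """Answer a sensitive yes/no question with plausible deniability.

    With probability 1/2 the respondent answers truthfully; otherwise
    they answer uniformly at random, so any single 'yes' proves nothing.
    """
    if random.random() < 0.5:
        return truthful_answer
    return random.random() < 0.5

# Simulate a sensitive survey where the true 'yes' rate is 30%.
true_answers = [random.random() < 0.3 for _ in range(100_000)]
noisy_answers = [randomized_response(a) for a in true_answers]

# Debias the aggregate: observed = 0.5 * true + 0.25, so invert it.
observed = sum(noisy_answers) / len(noisy_answers)
estimate = 2 * observed - 0.5
print(f"Estimated yes-rate: {estimate:.3f}")  # close to 0.30
```

No individual answer can be trusted, yet the debiased aggregate stays close to the true rate, which is exactly the privacy-utility balance differential privacy formalizes.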
Synthetic data takes a different approach: it is generated by an algorithm trained on a real data set. The artificial data mimics real data, replacing the original while reproducing the statistical properties and patterns of the original set.
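As a toy illustration of the principle, the sketch below fits a deliberately simple Gaussian model to hypothetical “real” data and samples artificial records from it; production synthetic data generators typically rely on far more sophisticated generative models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real data set: (age, systolic blood pressure) pairs.
real = rng.multivariate_normal(mean=[50, 130],
                               cov=[[100, 40], [40, 80]], size=1_000)

# "Train" a trivially simple generator: estimate the statistical
# properties of the real data (here, its mean and covariance)...
mean_est = real.mean(axis=0)
cov_est = np.cov(real, rowvar=False)

# ...then sample entirely artificial records from the fitted model.
synthetic = rng.multivariate_normal(mean_est, cov_est, size=1_000)

# The synthetic set mimics the original's statistics without
# containing any real individual's record.
print(np.round(real.mean(axis=0), 1), np.round(synthetic.mean(axis=0), 1))
```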
These approaches can be very useful, for instance in the health sector, where sharing data under privacy regulations like the U.S. Health Insurance Portability and Accountability Act means stripping data of specific identifiers. This deidentification method is meant to reduce identity disclosure risk, but it can also cause information losses that make datasets no longer useful for research purposes. Furthermore, deidentified health data can still regularly be reidentified. In comparison, emerging PETs can better mitigate disclosure and reidentification risks while maintaining the data’s information value.
PETs in the context of privacy regulation
Because privacy regulation is technology-agnostic, few PET solutions are called out explicitly in privacy laws. Nevertheless, their link to privacy by default and privacy by design is apparent.
The framework of privacy by design was first established in 2010 by Ann Cavoukian, then Information and Privacy Commissioner of Ontario, Canada, in the form of seven foundational principles. PbD has slowly become embedded in privacy and data protection laws across the world.
The most prominent example is Article 25 of the EU General Data Protection Regulation — the U.K. GDPR contains identical wording — that refers to the data controller’s obligation to take “into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing.” Further, Article 25 demands “appropriate technical and organisational measures (…), which are designed to implement data-protection principles, such as data minimisation,” are implemented in an effective manner, “both at the time of the determination of the means for processing and at the time of the processing itself.”
In the U.S., the Federal Trade Commission recognized privacy-preserving design as a best practice more than a decade ago. In a 2012 report, the FTC stated the baseline principle that “companies should promote consumer privacy throughout their organizations and at every stage of the development of their products and services.”
We find similar provisions in privacy laws around the world. Article 46 of Brazil’s General Data Protection Law states “agents shall adopt security, technical and administrative measures able to protect personal data.” Chapter 9.3 of India’s proposed Digital Personal Data Protection Bill, 2022, states “a Data Fiduciary shall implement appropriate technical and organizational measures.”
How to implement PbD is not defined in detail by laws and regulations; the guiding principle for appropriate measures is therefore whatever currently qualifies as “state of the art” in practice. The meaning of this term is not set in stone. Instead, it depends on technological progress and involves a certain amount of subjectivity.
ENISA, together with Germany’s IT security association TeleTrusT, recently defined state of the art as the “best performance of an IT security measure available on the market to achieve the legal IT security objective.” This is usually the case when “existing scientific knowledge and research” either reaches market maturity or is launched on the market; the definition references international standards where possible.
Appropriate technical and organizational measures can mean different things at different times in different contexts. What was good enough years ago might no longer serve the interests of today’s end users and data handlers. A classic example is the evolution of online security. The former industry standard, visiting websites over an unencrypted HTTP connection, is no longer acceptable. Current “state of the art” security requires HTTPS with a TLS certificate, meaning an encrypted connection to the web server when visiting a website.
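As a minimal illustration, the sketch below serves content over TLS using Python’s standard library; the certificate and key file names are placeholders for a certificate issued by a trusted certificate authority:

```python
import http.server
import ssl

# Minimal HTTPS server using only the standard library. The certificate
# and key paths below are placeholders for a CA-issued certificate.
httpd = http.server.HTTPServer(
    ("0.0.0.0", 8443), http.server.SimpleHTTPRequestHandler)

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.pem", keyfile="server.key")

# Wrap the listening socket so every connection is TLS-encrypted.
httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
httpd.serve_forever()
```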
Thus, data controllers must consider the current progress in available technology and stay up to date on technological advances to choose the least invasive system design for their specific functionality while maintaining compliance with the appropriate privacy regulations. This is also one of the main reasons for privacy professionals to investigate emerging PETs.
How are emerging PETs categorized?
Several organizations and initiatives have taken on the challenge of categorizing and classifying emerging PETs, according to their underlying technologies, applications or capabilities. Examples include:
- The U.K. Royal Society’s new report on the role of privacy-enhancing technologies in data governance and collaborative analysis.
- Mobey Forum’s report, “The Digital Banking Blindspot.”
- A project hosted by The Computer Security and Industrial Cryptography research group at the Department of Electrical Engineering of KU Leuven.
- The Federal Reserve Bank of San Francisco’s report on PETs.
- The use-case-based Adoption Guide for PETs by the U.K.'s Centre for Data Ethics and Innovation.
- ENISA’s references to specific PETs in its report on pseudonymization techniques and in its report on data protection engineering.
A recent categorization stems from the U.K. Information Commissioner’s Office, which published draft guidance on PETs in September 2022. The ICO distinguishes:
- PETs that reduce the identifiability of individuals and help to fulfill the principle of data minimization, e.g., differential privacy and synthetic data generation.
- PETs that focus on hiding and shielding data to achieve better security, e.g., homomorphic encryption, zero-knowledge proofs, and trusted execution environments.
- PETs that split or control access to personal data, meeting both data minimization and stronger security principles, e.g., federated learning and MPC (a minimal federated learning sketch follows this list).
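To illustrate the last category, below is a minimal sketch of federated averaging, in which clients train locally and share only model parameters, never raw data. The one-parameter linear model, the client data and the hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def local_update(w, x, y, lr=0.01, steps=100):
    """Run a few steps of gradient descent on one client's private data."""
    for _ in range(steps):
        grad = 2 * np.mean((w * x - y) * x)  # gradient of mean squared error
        w -= lr * grad
    return w

# Each of three clients holds private data generated around a true
# weight of 3.0. The raw (x, y) pairs never leave the client.
clients = []
for _ in range(3):
    x = rng.uniform(0, 1, 50)
    clients.append((x, 3 * x + rng.normal(0, 0.1, 50)))

w_global = 0.0
for _ in range(10):  # federated training rounds
    # Clients train locally on data the server never sees...
    local_weights = [local_update(w_global, x, y) for x, y in clients]
    # ...and the server aggregates only the resulting model weights.
    w_global = float(np.mean(local_weights))

print(round(w_global, 2))  # converges toward the true weight, 3.0
```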
Global trends and policy support for PETs
The rapid development of the PET space over the past couple of years has sparked considerable discourse in the privacy engineering and data science communities. Because of their enhanced capabilities for securing or anonymizing data and for minimizing data use while maintaining data utility, PETs are also seeing increased attention from legislators and public authorities.
In Europe, ENISA highlighted several emerging PETs as new techniques for data protection engineering and specifically emphasized MPC and zero-knowledge proofs as advanced pseudonymization techniques. The European Data Protection Board also recognized MPC as a supplementary technical measure for international personal data transfers. The European Commission’s Joint Research Centre published an analysis of the usefulness of synthetic data in research.
In early 2021, the U.S. Senate introduced the “Promoting Digital Privacy Technologies Act,” which plans to support the research, deployment and standardization of privacy technology. The U.S. Department of Homeland Security also expressed interest in defining privacy in technical terms and hosted a workshop showcasing use cases for emerging PETs. In Canada, the Office of the Privacy Commissioner recently published considerations around various aspects of synthetic data.
In July 2022, Singapore’s Infocomm Media Development Authority began a six-month sandbox program to support businesses interested in adopting emerging PETs. In May 2022, South Korea’s Personal Information Protection Commission began spearheading the development of 11 core PETs, an effort that will continue over the next four years.
Development is not happening only within national borders. In 2022, the U.N. launched the PETs Lab initiative, a global hackathon aimed at tackling challenges surrounding the safe and responsible use of PETs. In 2021, the U.S. and U.K. sponsored a bilateral prize challenge to facilitate the adoption of PETs. Singapore’s IMDA and the International Centre of Expertise of Montreal for the Advancement of Artificial Intelligence signed a memorandum of understanding for one of the world’s first cross-border collaborations on PETs in June 2022. South Korean and French data protection authorities soon followed with an agreement to jointly research emerging PETs.
Challenges and outlook
Unsurprisingly, as with every new technology, challenges will become apparent as more PETs are developed and implemented. The relative infancy of PETs will eventually create a need for more technology subject matter experts, especially as regulators continue to develop firmer guidelines. Similarly, there are few use case examples or off-the-shelf solutions, which makes it difficult for privacy engineers to determine the suitability of emerging PETs for day-to-day operations. Both factors can lead to implementation mistakes that may cause critical privacy issues.
It is also important to remember PETs are not “silver bullet” solutions for the protection and safeguarding of personal information. Of course, PbD cannot be reduced to the implementation of specific technologies. As ENISA said, PbD “is a process involving various technological and organizational components, which implement privacy principles by deploying technical and organization measures that include also PETs.”
The lack of regulatory guidelines around emerging PETs could place PET-processed data in a precarious state of limbo: could results be considered anonymized, deidentified or pseudonymized? This question becomes even harder to answer when the data processing stretches across multiple jurisdictions. Ideally, regulators and data protection authorities will continue to foster discussion and standardization around these technologies, making them easier to adopt and use globally.