27 Oct. 2021

Multiparty computation as supplementary measure and potential data anonymization tool

Privacy-enhancing technologies address the difficult trade-off between data privacy and data utility across a wide variety of use cases by embedding privacy-by-design principles in the data life cycle. They aim to enable more collaborative information-sharing while mitigating privacy and security risks in ways that were not previously possible.

This is particularly true for secure multiparty computation (MPC), widely regarded as one of the most influential achievements of modern cryptography.

Previously unimaginable, MPC allows for sharing data insights while keeping the data itself private. Two or more parties can receive the output of a computation based on their combined data without revealing their own data to the other parties. All inputs remain private, and the data remains protected by encryption while in use. Participants' input data does not need to be transferred to a central location but can be processed locally. The trusted third party, which is usually needed for processing data from different sources, is emulated by the cryptographic protocol used for MPC.
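To make the idea of a joint result from private inputs more tangible, here is a minimal Python sketch of a secure sum based on additive secret sharing. It is one simple technique among many, not a production protocol: the modulus `PRIME`, the function names and the salary figures are illustrative assumptions, and the sketch assumes that all parties follow the protocol honestly.

```python
import secrets

# Illustrative modulus (an assumption); it only needs to exceed the largest possible sum.
PRIME = 2**61 - 1

def share(value, n_parties):
    """Split value into n_parties random-looking shares that add up to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_inputs):
    """Simulate all parties in one process; in practice each party runs on its own machine."""
    n = len(private_inputs)
    # Each party splits its own input and hands one share to every party.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j adds up the j-th share of every input. On its own, each share is a random number.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published; combined, they yield the total.
    return sum(partial_sums) % PRIME

salaries = [52_000, 61_500, 48_250]  # hypothetical private inputs
total = secure_sum(salaries)
print(total)  # 161750
```

No party ever sees another party's salary, only random shares and the final total. Note that the total itself can still leak information: with only two participants, each can subtract its own input from the result.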

The resulting ability to break through data silos in a private and secure way is increasingly recognized by the market. According to Gartner, half of large organizations will have implemented privacy-enhancing computation for processing data in untrusted environments and multiparty data analytics use cases by 2025.

Multiparty computation can be deployed in many different use cases, ranging from distributed signatures, key management, privacy-preserving machine learning, blockchain and financial fraud detection to digital advertising and medical research.

The first large-scale practical application of MPC took place in Denmark in 2008. In a commodity trading exchange with only one buyer, MPC allowed for an anonymous price setting that was trusted by farmers. Another great example is the annual study by the Boston Women’s Workforce Council on the gender wage gap.

Regulators are catching up

These developments didn't go unnoticed by regulators.

In Europe, increasing legal clarity is laying the groundwork for the adoption of multiparty computation.

In June 2021, the European Data Protection Board (EDPB) recognized multiparty computation (or “split processing,” as it calls it) as a supplementary technical measure for international personal data transfers from Europe to states that do not offer an adequate level of data protection.

This acknowledgment was not accidental. The European Research Council has been funding the development of practical MPC systems for some time. Correspondingly, the European Union Agency for Cybersecurity (ENISA) lists MPC as an advanced pseudonymization technique in its report from January 2021.

In the U.S., the recognition of privacy-enhancing technologies (PETs) and MPC is gaining momentum as well.

In February 2021, the Promoting Digital Privacy Technologies Act was introduced in the U.S. Senate. It would provide support for research, broader deployment and standardization of PETs, defining them as “any software solution, technical processes, or other technological means of enhancing the privacy and confidentiality of an individual’s personal data in data or sets of data” which “includes anonymization and pseudonymization techniques, filtering tools, anti-tracking technology, differential privacy tools, synthetic data, and secure multi-party computation.”

The use of MPC had already been suggested explicitly in the Student Right to Know Before You Go Act, designed to protect privacy in a federal student record database. In a report prepared for the U.S. Census Bureau, MPC was identified as the secure computation technology with the highest potential. Several projects by the U.S. Defense Advanced Research Projects Agency focus on MPC as well (e.g. “Brandeis”).

Solid technology with many protocols

MPC originated from solving a classic computer science problem called the “millionaires' problem,” introduced by Andrew Yao in 1982: Two millionaires would like to know who has more money without revealing how much they each have. The “secure two-party computation” Yao came up with is still the foundation for many of the most efficient cryptographic protocols for multiparty computation known to date.

In general, MPC works on the basis of either garbled Boolean circuits, introduced by Yao, or Shamir's secret sharing and arithmetic circuits over a large finite field. Additionally, oblivious transfer protocols are used in both cases.

Numerous protocols have been developed and are in use for each approach, and the schemes can also be combined. In summary:

Garbled Boolean circuits are encrypted versions of digital logic circuits, consisting of hardware or programmed wires and logic gates that follow a prescribed logic when computing a function. To “garble” the circuit means encrypting the possible input combinations and possible outputs, described in the so-called truth tables at the logic gates. Each logic gate then outputs a cryptographic key used to unlock the output of the next gate, a process that continues until the final result is reached.
With Shamir's secret sharing, data (for example, personal data or a machine learning model) is split up into fragments, which in themselves do not contain any usable information. The secret shares are distributed amongst a set of parties to perform secure computation over the shares, releasing output to a designated party once done.
Oblivious transfer protocols allow a sender to transfer two encrypted messages to a receiver in such a way that the receiver can open only the one message it chooses, while the sender does not learn which of the messages the receiver chose to open.
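The garbling idea described above can be sketched for a single AND gate in Python. This is a simplified illustration under stated assumptions, not Yao's full protocol: the helper names, the SHA-256-based pad and the all-zero tag used to recognize the correct row are choices made for this example, and the oblivious transfer step that would deliver the evaluator's input labels is left out.

```python
import hashlib
import random
import secrets

LABEL_LEN = 16   # bytes per wire label (illustrative choice)
TAG = bytes(16)  # all-zero padding that marks a correctly decrypted row

def pad(label_a, label_b):
    """Derive a 32-byte one-time pad from the two input labels."""
    return hashlib.sha256(label_a + label_b).digest()

def xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))

def garble_and_gate():
    """Garbler: pick two random labels per wire and encrypt the AND truth table."""
    labels = {
        wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            key = pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(xor(key, out_label + TAG))
    # Shuffle so that a row's position does not reveal the input combination.
    random.SystemRandom().shuffle(table)
    return labels, table

def evaluate(table, label_a, label_b):
    """Evaluator: holds one label per input wire and can unlock exactly one row."""
    key = pad(label_a, label_b)
    for row in table:
        plain = xor(key, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row could be decrypted with these labels")

labels, table = garble_and_gate()
for bit_a in (0, 1):
    for bit_b in (0, 1):
        out = evaluate(table, labels["a"][bit_a], labels["b"][bit_b])
        # Only the garbler, who knows both output labels, can decode the result bit.
        print(bit_a, bit_b, "->", labels["out"].index(out))
```

The evaluator works only with random-looking labels and never sees the bits they stand for. In a full protocol, the output label of one gate serves as an input label of the next, and the evaluator would obtain the labels for its own input bits through oblivious transfer.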

MPC protocols vary in their efficiency, security and robustness. They can be set up for different scenarios, depending on how many adversaries operating in the system the solution should be secure against. Within that bound, the definition of security tolerates no adversarial success.
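Shamir's secret sharing, mentioned above, makes this threshold notion concrete: a secret is split into n shares so that any t of them reconstruct it, while fewer than t reveal nothing about it. The following Python sketch illustrates the scheme under stated assumptions. The field size `PRIME`, the function names and the example values are chosen for illustration only, and `pow(den, -1, PRIME)` requires Python 3.8 or later.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative assumption

def split(secret, threshold, n_shares):
    """Hide the secret as the constant term of a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        # Horner's rule, evaluated in the finite field.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    # Share i is the point (i, poly(i)); x = 0 is never handed out because poly(0) is the secret.
    return [(x, poly(x)) for x in range(1, n_shares + 1)]

def reconstruct(shares):
    """Recover poly(0) by Lagrange interpolation over the given points."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

secret = 123456789  # hypothetical secret value
shares = split(secret, threshold=3, n_shares=5)
print(reconstruct(shares[:3]) == secret)  # any three shares suffice
print(reconstruct(shares[:2]) == secret)  # two shares: almost certainly False
```

With a threshold of three out of five, the scheme stays secure as long as no more than two share holders collude, which is the kind of trade-off a protocol designer chooses when deciding how many adversaries to tolerate.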

To elevate the privacy posture and cover more use cases, MPC is often combined with federated learning, homomorphic encryption and differential privacy.

Increased legal clarity, but open questions remain

Within the scope of the EU General Data Protection Regulation, MPC can be considered a technical safeguard and pseudonymization technique for the processing of personal data. The recognition of MPC as a supplementary measure by the EDPB and its presentation as a pseudonymization technique by ENISA underline its capacity as a Data Protection by Design and by Default tool in accordance with Article 25 of the GDPR.

Article 25 states that appropriate technical and organizational measures have to be implemented before and while processing personal data. In doing so, the “state-of-the-art” has to be taken into account. Data controllers have to consider the current progress in technology available in the market and stay up-to-date on technological advances.

The requirement of “state of the art” is stressed further in Article 32 of the GDPR, which specifies the technical and organizational measures. In this context, the guidelines about “State-of-the-art technical and organizational measures” by the European Union Agency for Network and Information Security and the German TeleTrusT from Feb. 2021 refer to state of the art as the “best performance available” of an IT security measure “on the market to achieve” an IT security objective.

In general, this is when “existing scientific knowledge and research” reaches market maturity or is at least launched on the market. For evaluating the state of the technology, one has to look at the degree of recognition and proof in practice. Insofar as MPC meets those criteria, it will have to be considered for privacy and security risk mitigation.

But MPC might offer more than being a technical safeguard and technology for pseudonymization in the context of the GDPR.

MPC is also discussed as a tool for anonymizing personal data, which as a consequence would fall outside the scope of the GDPR. This is a very practical concern, particularly in the context of international health research impeded by the GDPR's strict requirements.

In general, the GDPR takes a risk-based approach for determining whether data should be considered anonymized or pseudonymized. All means reasonably likely to be used, by the controller or a third party, to identify an individual must be taken into account.

In reality, views on where to draw the line between pseudonymization and anonymization differ widely among EU institutions and member states. The absolute approach insists that to classify data as anonymous, no remaining risk of reidentification is acceptable. The relative approach accepts that there is always a remaining risk of reidentification. Under this approach, only attempts to reidentify data by the controller themselves with the legitimate help of a known third party should be taken into consideration.

The legal view that MPC leads to the anonymization of personal data can hold true particularly under the relative approach. This view stresses that the private inputs or data fragments exchanged during MPC cannot identify any individual on their own. Also, as long as the participants in the MPC do not have lawful access to the decryption keys, collusion is highly improbable.

Support for MPC as a means to anonymize personal data also comes from SODA, a project funded by the EU's Horizon 2020 research and innovation program that examined the legal aspects of MPC in the context of the GDPR in depth. Working on practical privacy-preserving analytics for big data using MPC, SODA concludes in its legal assessment: “Cryptographic solutions such as multi-party computation have the potential to fulfil the requirements for computational anonymity by creating anonymized data in a way that does not allow the data subjects to be identified with means reasonably likely to be used.”

The definition of MPC the EDPB uses in its recommendations on supplementary measures does not seem to preclude this interpretation. It states that the final result of a computation conducted with MPC “may” constitute personal data:

The data exporter wishes personal data to be processed jointly by two or more independent processors located in different jurisdictions without disclosing the content of the data to them. Prior to transmission, it splits the data in such a way that no part an individual processor receives suffices to reconstruct the personal data in whole or in part. The data exporter receives the result of the processing from each of the processors independently, and merges the pieces received to arrive at the final result which may constitute personal or aggregated data.

The EDPB further specifies that for MPC to be considered an effective supplementary measure, the data inputs need to be processed in a way that no information can be revealed about specific data subjects, even when cross-referenced, and no input information is leaked to other participants in the MPC protocol. The data being processed should be located in different jurisdictions and public authorities should not have the legal means of accessing all the necessary shares or input data.

Under all circumstances, it is important to keep in mind that encrypting or anonymizing personal data itself qualifies as “processing” and can only be done on a clear legal basis according to Article 6 of the GDPR.

In conclusion, MPC is rightly considered by many as a privacy-enhancing technology that could transform private and secure information-sharing. The acknowledgment of MPC as an admissible technical safeguard for international data transfers by the EDPB highlights MPC's potential as a state-of-the-art privacy-by-design tool. This is encouraging for organizations that want to rely on mathematical models for joint processing that go beyond written agreements of private and secure collaboration.

To unlock the full value of multiparty computation and privacy-enhancing technologies, it would be crucial for EU regulators to provide more legal clarity on pseudonymization and anonymization. Scientific research in particular could benefit tremendously from a granular framework for the assessment of identifiability in the context of ready-to-use privacy-enhancing technologies.
