11 Nov. 2021

Privacy as code: A new taxonomy for privacy

“Privacy by design” means putting privacy into practice in system architectures and software development from the very beginning and throughout the system lifecycle. It is required by Article 25 of the EU General Data Protection Regulation. In the U.S., the Federal Trade Commission included an entire section on privacy by design in its 2012 report on recommendations for businesses and policymakers. Privacy by design is also covered by India’s PDP Bill and by Australia’s Privacy Management Framework, to name just a few. Privacy by design has come a long way since its original presentation in 2009 by Ann Cavoukian, former information and privacy commissioner of Ontario, Canada.

While privacy by design is conceptually simple, its reduction to practice is not. System developers and privacy engineers responsible for it face simple but hard-to-answer questions: Where is the actual data in the organization? What types of information fall under personal data? How does one set up a data deletion process for structured as well as unstructured data?

Three years ago, Cillian Kieran and his team at Ethyca embarked on a quest to develop a unified solution to those questions. Their vision? Nothing less than privacy-as-code – privacy built into the code itself. This revolutionary approach classifies data in such a way that its privacy attributes are obvious within the code structure.

Their efforts were led by some big questions: How can systems developers describe the types and purposes of personal data in a consistent manner? Is there a way privacy rules and policies can get defined and enforced throughout the software development process? What if configurable tools could uphold data subject rights such as data access, erasure, portability, and retention as a system feature?

Last week, Ethyca celebrated an additional $7.5 million in funding and announced the first release of Fides. Fides is named after the Roman goddess of trust.

Fides is an open-source, human-readable description language based on the data-serialization language YAML. Fides allows one to write code with privacy designed in. It is based on common definitions of types, categories and purposes of personal data. Developers who use this language can easily see where privacy-related information is at any point in the software development process. For any given system, engineers should be able to understand at a glance whose data is in the system and what it is being used for.

Ethyca’s goal is to establish consistent standards around personal data processing which can describe the privacy characteristics of applications, datasets and broader tech stacks, and identify privacy risks. For the first time, this creates a standardized and interoperable approach towards privacy engineering from the ground up.

The privacy-related characteristics and behaviors of code and databases are derived from a new privacy taxonomy. While many of us are familiar with Daniel Solove’s pioneering taxonomy of privacy problems in the context of privacy-risk modelling, Fides’ taxonomy is different.

Fides’ privacy taxonomy is used to label and classify data so that one can quickly understand what data is processed or shared, whose data it is, and why.

The taxonomy distinguishes four classification groups: data categories, data uses, data subject categories and data qualifiers. Each of them can be broken down hierarchically into a variety of subclasses of annotations that allow for the needed granularity.

For example, data categories classify different types of data: account, system or user data. Data uses describe the purpose for which the data is used; example labels include personalize, third_party_sharing and train_ai_system. Data subject categories cover labels such as anonymous_user, customer and employee. Finally, data qualifiers express the degree of identification, such as identified, pseudonymized or anonymized data.
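To make the four groups concrete, here is a minimal Python sketch of a single annotation that combines one label from each group. The class name, its field names and the describe() helper are assumptions made for illustration only; they are not part of Fides or its file format. Only the label values are taken from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyDeclaration:
    """Hypothetical annotation holding one label from each taxonomy group."""

    data_category: str   # what kind of data, e.g. "user"
    data_use: str        # why it is processed, e.g. "personalize"
    data_subject: str    # whose data it is, e.g. "customer"
    data_qualifier: str  # degree of identification, e.g. "pseudonymized"

    def describe(self) -> str:
        # Assemble a short, readable statement from the four labels.
        return (
            f"{self.data_qualifier} {self.data_category} data of "
            f"{self.data_subject} subjects, used to {self.data_use}"
        )


declaration = PrivacyDeclaration(
    data_category="user",
    data_use="personalize",
    data_subject="customer",
    data_qualifier="pseudonymized",
)
print(declaration.describe())
```

Keeping the four labels as separate fields mirrors the idea that what, why, whose and how identifiable are answered independently for each piece of data.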

Ultimately, the description of the data results in a written statement that is concise and easy to understand. For example, the hierarchical notation to label cookie IDs would look like this: user.derived.identifiable.device.cookie_id.

While this is an example of a fully qualified category, spelling out every level of the hierarchy, one could also describe the data at a higher level. An example could be the description for all the data that is shared with third parties for the purpose of personalized advertising: third_party_sharing.personalized_advertising. The different degrees of granularity allow for a flexible description of the data that can be adapted to match the specific needs of a project (try for yourself on the Privacy Taxonomy Explorer on GitHub).
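The dotted notation lends itself to simple tooling. The following Python sketch shows one way a label could be checked against a broader label, so that a rule written at a high level also covers everything nested beneath it. The function names are hypothetical and the label user.provided is used purely as a contrasting example; this is not how Fides itself is implemented.

```python
from __future__ import annotations


def falls_under(label: str, ancestor: str) -> bool:
    """Return True if `label` equals `ancestor` or is nested beneath it.

    Labels are compared segment by segment, so "username" is not
    mistaken for a child of "user".
    """
    label_parts = label.split(".")
    ancestor_parts = ancestor.split(".")
    return label_parts[: len(ancestor_parts)] == ancestor_parts


def ancestors(label: str) -> list[str]:
    """List every level of the hierarchy, from broadest to the label itself."""
    parts = label.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


cookie_id = "user.derived.identifiable.device.cookie_id"

print(falls_under(cookie_id, "user.derived"))   # True
print(falls_under(cookie_id, "user.provided"))  # False
print(ancestors("third_party_sharing.personalized_advertising"))
```

Comparing whole segments rather than raw string prefixes is the key design choice here, since two unrelated labels can share leading characters.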

Now, how is this done in practice?

The Fides description language uses YAML declaration files to store the data characterization as privacy metadata directly in a project’s Git repositories (Git being an open-source version control system for managing source code history). In this phase, the management tool Fides Control (Fidesctl) is used to implement privacy requirements in the code before moving into production.

But this isn’t all. Many times, privacy engineers get involved only after data collection has already started, when the source and use of the data are unclear and data management must be done manually, requiring the joint efforts of legal and data engineering teams. Therefore, in one further step, Fides expands beyond a mere taxonomy into a privacy ontology focusing on the production environment. Fides’ privacy ontology describes roles and relationships in a runtime environment, allowing for a variety of trailblazing applications.

Imagine these two examples:

Evaluating risk against policy while code is being written: Predefined privacy policies, which formalize business decisions and regulatory compliance requirements in accordance with the standardized privacy ontology, are stored on the Fides Control server. Using Fidesctl, the YAML declarations (based on the privacy taxonomy) are compared against those policies on the Fides server. Any inconsistencies between the privacy declarations and the policies are flagged as risks automatically. If the conditions of the policies on file aren’t met, the necessary changes can be investigated in the code base and made accordingly. Once the evaluation is passed, the changes can be committed to the code. In this way, Fides can make sure any code that is shipped or merged is compliant with the policies of the organization.
Automated data subject rights requests: Once these YAML declarations are approved in the above step, they form part of a metadata view of where information is located across all business systems. Using Fides Operations (Fidesops), developers and legal teams can write policies for data subject requests to automate access, deletion and other complex procedures consistently across all databases and connected systems. In this way, Fides turns a typically complex and heavily manual process into a fully automated feature of the system.
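The first example, comparing declarations against policies and flagging inconsistencies, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the Declaration and PolicyRule classes, the evaluate() function, the system names and the rule itself are hypothetical and do not reproduce Fidesctl's actual interface or policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    """Hypothetical privacy declaration for one system."""

    system: str
    data_category: str
    data_use: str
    data_qualifier: str


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical rule forbidding one combination of labels."""

    name: str
    data_category: str  # applies to this category and everything beneath it
    data_use: str       # applies to this use and everything beneath it
    data_qualifier: str


def is_under(label: str, ancestor: str) -> bool:
    # Segment-wise comparison of dotted labels.
    parts, ancestor_parts = label.split("."), ancestor.split(".")
    return parts[: len(ancestor_parts)] == ancestor_parts


def evaluate(declarations, rules):
    """Return (system, rule name) pairs for every forbidden declaration."""
    violations = []
    for declaration in declarations:
        for rule in rules:
            if (
                is_under(declaration.data_category, rule.data_category)
                and is_under(declaration.data_use, rule.data_use)
                and declaration.data_qualifier == rule.data_qualifier
            ):
                violations.append((declaration.system, rule.name))
    return violations


rules = [
    PolicyRule(
        name="no_identified_data_to_third_parties",
        data_category="user",
        data_use="third_party_sharing",
        data_qualifier="identified",
    )
]
declarations = [
    Declaration(
        system="ad_service",
        data_category="user.derived.identifiable.device.cookie_id",
        data_use="third_party_sharing.personalized_advertising",
        data_qualifier="identified",
    ),
    Declaration(
        system="recommendations",
        data_category="user.derived",
        data_use="personalize",
        data_qualifier="pseudonymized",
    ),
]

for system, rule_name in evaluate(declarations, rules):
    print(f"Risk flagged: {system} violates {rule_name}")
```

In a setup like this, a non-empty list of violations could be used to block a merge until the declaration or the code is changed, which is the spirit of the workflow described above.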

This emerging ontology and its many applications might be among the most interesting aspects for both privacy engineers and legal privacy professionals. Complex legal and regulatory requirements can be synthesized in the predefined policies. Once those policies are defined and deployed, developers can rely on the automated control of the data processing against the requirements laid out in the policies. While developers can already compile such policies with Fidesops on an open-source basis, it is on Ethyca’s roadmap to provide an accessible user interface for everyone to write such policies.

Checking code against semantic privacy policies has the potential to be a true game changer. It would make it possible not only to identify and manage privacy risks in the development phase, before the code reaches production, but also to generate privacy reports that document the compliance of the codebase at the touch of a button, or to handle privacy requests such as access and deletion automatically.

The prospects that Ethyca is offering with Fides are astonishing. With Fides, Ethyca seems to have taken the first step toward building privacy into the very language of the code and toward a practical and standardized approach to privacy management in system and software development.

Now it is up to the community to use the tools offered on GitHub and to contribute to their further development, so that Fides can become the tool for interoperability its creators have envisioned. All our feedback is crucial to achieving privacy by design at the very core of development processes.

Photo by Sai Kiran Anagani on Unsplash
