During a keynote session at the IAPP Global Privacy Summit 2023, U.S. Federal Trade Commissioner Alvaro Bedoya made a powerful suggestion: that the existing regulatory framework in the U.S. already sufficiently covers key issues posed by artificial intelligence systems.
But can the same conclusion be reached regarding privacy risks related to AI systems?
Attacks that expose the data used to train AI systems have been known for a long time. Only more recently, however, have the types of risks that should be evaluated in AI systems, especially those based on machine learning, been classified and consolidated.
This systematization of risks is important because it helps privacy professionals account for them in their privacy frameworks, or even create specific governance frameworks for AI systems.
The October 2023 paper "Deepfakes, Phrenology, Surveillance and More! A Taxonomy of AI Privacy Risks" built an assessment focused on more traditional privacy risk models, based on George Washington University Law School professor Daniel Solove's privacy risk taxonomy.
Now, "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations," published by the U.S. National Institute of Standards and Technology, describes how data aggregated in AI systems is being used for improper purposes.
The publication discusses privacy attacks related to data reconstruction, membership inference, model extraction and property inference, and presents some ideas for risk mitigation strategies. Interestingly, many of these risks have been known for over a decade, but they may not have been systematized in the way NIST has done.
Data reconstruction
In a data reconstruction attack, the attacker reverse engineers confidential information, such as an individual user record or sensitive critical infrastructure data, from disclosed aggregate information.
How does it work? The ability to reconstruct training samples is partly explained by the tendency of neural networks to memorize their training data. Generative AI systems are given carefully crafted instructions along with context-sensitive information, which they use to summarize or complete other tasks. When that input is also used for training, a lot of it can end up memorized.
Taking advantage of this, attackers can simply ask the system to repeat confidential information that exists in the context of a conversation or an instruction. In other words, the more access the attacker has to the user's history, the more information they can extract. For example, rows from a database or text from a PDF document, which the AI is meant to summarize only generically, can be extracted in detail by simply requesting them through a direct prompt injection.
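The core idea of recovering an individual's data from aggregate information can be illustrated with a classic differencing example. This is only a minimal sketch under stated assumptions: the records, the `aggregate_sum` service and its minimum group size are all hypothetical, and it illustrates reconstruction from disclosed totals rather than the memorization-based attacks on neural networks described above.

```python
# Hypothetical records; the names and values are invented for illustration.
records = {"alice": 52000, "bob": 61000, "carol": 48000, "dave": 75000}

def aggregate_sum(names):
    """Stand-in for a service that only discloses totals over groups of three or more people."""
    if len(names) < 3:
        raise ValueError("group too small to disclose")
    return sum(records[name] for name in names)

def reconstruct(target):
    """Recover one person's value by differencing two permitted aggregate queries."""
    everyone = list(records)
    everyone_but_target = [name for name in everyone if name != target]
    return aggregate_sum(everyone) - aggregate_sum(everyone_but_target)

recovered = reconstruct("dave")
print(recovered)
```

Neither query discloses an individual value on its own, yet subtracting one total from the other returns the target's exact record.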
Membership inference
Membership inference attacks happen when the attacker can determine whether a data record was part of the dataset used to train the model. When a record is fully known to the adversary, learning that it was used to train a specific model is an indication of information leakage through the model. In some cases, it can directly lead to a privacy breach.
Why? Suppose a patient's clinical record was used to train a model associated with a disease, for example to determine the appropriate dosage of a medication or to discover the genetic basis of the disease. Confirming that the record was in the training set could reveal to the attacker that this individual has that disease.
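One common intuition behind membership inference is that an overfit model makes much smaller errors on records it was trained on than on records it has never seen. The sketch below illustrates that loss-threshold heuristic with a toy setup. Everything in it is an assumption made for illustration: the synthetic records, the nearest-neighbour stand-in model (chosen because it memorizes its training data completely) and the threshold value.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_records(n):
    """Hypothetical (feature, label) records: label = 2 * feature + noise."""
    records = []
    for _ in range(n):
        x = random.uniform(0.0, 10.0)
        records.append((x, 2.0 * x + random.gauss(0.0, 1.0)))
    return records

members = sample_records(50)      # used to "train" the model
non_members = sample_records(50)  # same distribution, never seen by the model

def predict(x):
    """Deliberately overfit stand-in model: returns the label of the nearest training record."""
    nearest = min(members, key=lambda record: abs(record[0] - x))
    return nearest[1]

def guess_is_member(record, threshold=1e-12):
    """Loss-threshold heuristic: a near-zero squared error suggests a training record."""
    x, y = record
    return (predict(x) - y) ** 2 < threshold

correct = sum(guess_is_member(r) for r in members)
correct += sum(not guess_is_member(r) for r in non_members)
accuracy = correct / (len(members) + len(non_members))
print(f"attack accuracy: {accuracy:.2f}")
```

Real models rarely memorize this completely, so practical attacks are noisier, but the signal they look for is the same gap between errors on seen and unseen records.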
Model extraction
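In a model extraction attack, the attacker queries a deployed model to recover information about its architecture and parameters. Extraction is often not the end goal, but rather a springboard for other attacks. Once the model's parameters and architecture are known, attackers can launch more powerful attacks against the system.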
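For the simplest possible case, a linear model exposed through a prediction interface, extraction can be sketched in a few lines. This is a minimal sketch under stated assumptions: `query_model`, the secret weights and the bias are hypothetical, and real models are far harder to recover exactly than this toy example suggests.

```python
# Hypothetical secret parameters held by the model owner; the attacker never reads them directly.
_SECRET_WEIGHTS = [0.5, -2.0, 3.0]
_SECRET_BIAS = 1.5

def query_model(features):
    """Stand-in for a prediction API that returns only the model's output."""
    return sum(w * x for w, x in zip(_SECRET_WEIGHTS, features)) + _SECRET_BIAS

def extract_linear_model(query, n_features):
    """Recover a linear model's weights and bias using n_features + 1 queries."""
    bias = query([0.0] * n_features)  # an all-zero input returns the bias alone
    weights = []
    for i in range(n_features):
        probe = [0.0] * n_features
        probe[i] = 1.0                # a unit input isolates one weight (plus the bias)
        weights.append(query(probe) - bias)
    return weights, bias

stolen_weights, stolen_bias = extract_linear_model(query_model, 3)
print(stolen_weights, stolen_bias)
```

With a copy of the parameters in hand, the attacker can study the model offline, which is what makes extraction a springboard for further attacks.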
Property inference
Property inference attacks exploit information that is unrelated to the model's output but was present in the data used to train it. Machine learning models are meant to make correct choices for specific tasks by learning important properties and patterns from data. In doing so, a model may also store patterns that are not related to its main task. This type of attack seeks to infer from a given model, called the target model, information about the training data set that is unrelated to the model's main objective. If the training data is confidential, the attack can be considered a privacy breach.
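A toy version of this idea uses "shadow" models: the attacker trains models on data with a known property, then checks which of them the target model resembles. The sketch below is illustrative only and rests on several assumptions: the synthetic data, the hidden "group B" property, the nearest-centroid classifier and the attacker's ability to read the target model's parameters are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def make_training_set(share_group_b, n=400):
    """Hypothetical (feature, label) records. Group B shifts the feature by +1.0,
    but group membership is not what the model is trained to predict."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        in_group_b = rng.random() < share_group_b
        feature = 3.0 * label + (1.0 if in_group_b else 0.0) + rng.gauss(0.0, 1.0)
        data.append((feature, label))
    return data

def train(data):
    """Toy nearest-centroid classifier: the 'model' is one feature centroid per label."""
    centroids = []
    for label in (0, 1):
        values = [f for f, l in data if l == label]
        centroids.append(sum(values) / len(values))
    return centroids

def average_shadow_model(share_group_b, runs=20):
    """Average the parameters of shadow models trained on data with a known group mix."""
    models = [train(make_training_set(share_group_b)) for _ in range(runs)]
    return [sum(m[i] for m in models) / runs for i in (0, 1)]

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def infer_mostly_group_b(target_model, shadow_low, shadow_high):
    """Guess the hidden property from which shadow world the target's parameters resemble."""
    return distance(target_model, shadow_high) < distance(target_model, shadow_low)

shadow_low = average_shadow_model(0.2)
shadow_high = average_shadow_model(0.8)

target_a = train(make_training_set(0.8))  # the mix is unknown to the attacker
target_b = train(make_training_set(0.2))
print(infer_mostly_group_b(target_a, shadow_low, shadow_high))
print(infer_mostly_group_b(target_b, shadow_low, shadow_high))
```

The classifier was only ever asked to predict the label, yet its parameters still reveal the makeup of the population it was trained on.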
Leveraging identified risks
As we continue to see, the privacy risks already mapped in AI systems, together with the existing regulatory framework that Commissioner Bedoya pointed to, serve to guide the governance of AI systems.
Yet, the act of categorizing, organizing and incorporating these risks into governance frameworks marks a critical step toward effective management. NIST's publication represents a significant advancement in this direction.
Organizations should leverage these identified risks within their risk assessment models and AI governance frameworks. Doing so not only aligns with regulatory expectations but also enhances the resilience of AI systems against privacy threats. This proactive engagement in risk management underscores the importance of evolving governance strategies to keep pace with the rapid advancements and complex challenges presented by AI technologies.
Beyond that, governments, organizations and individuals must stay aware of these risks and continue to play their respective roles in ensuring personal data is protected and used only when it is legal and appropriate.
Only this coordinated work can guarantee that the use of this technology is beneficial to everyone and not yet another tool for spreading insecurity and harm to individuals.
