Editor's note: The IAPP is policy neutral. We publish contributed opinion and analysis pieces to enable our members to hear a broad spectrum of views in our domains.
As artificial intelligence tools permeate more aspects of daily life, they often draw conclusions about people who have never directly interacted with them. Imagine a smart home assistant that notices someone's spouse arriving late each night and, without ever "knowing" them, starts recommending health supplements for insomnia.
This kind of "personalization by proxy" raises thorny questions: whose data is this and do the usual privacy rules even apply? In a world driven by ambient and relational data, privacy may no longer be solely an individual concern.
Traditional laws like the EU General Data Protection Regulation, California Consumer Privacy Act, and Colorado Privacy Act are built on individual rights, but AI inferences about nonusers strain these frameworks. Existing legal definitions of personal data and consent often struggle to account for inferences generated by AI, raising the question of whether privacy is evolving from an individual right toward a more collective understanding.
Inferred identities and expanding definitions of personal data
The GDPR expansively defines personal data as "any information relating to an identified or identifiable natural person." In practice, this covers everything from names and email addresses to health data and religion, even when only inferred.
For example, regulators emphasize that if an AI system predicts someone's health condition from their shopping habits, those predictions must be treated as health data under the GDPR. The CCPA likewise defines personal information broadly and explicitly includes "inferences drawn from any of the information identified … to create a profile about a consumer reflecting the consumer's preferences, characteristics…"
These broad definitions suggest that data about an individual generated through context or proxies could fall within the scope of privacy laws. But in practice, rights like access or deletion attach only to an identifiable data subject. If an individual has never given their information to a company, how can the company know to notify them or honor their requests?
For instance, neither the GDPR nor U.S. laws clearly address someone who is essentially "discovered" via third-party data. If an AI algorithm infers something about an individual from a friend's profile or a passing glance in a video, has that made them "identifiable"? Courts have interpreted personal data very broadly, but the key consideration is whether a controller can tie information back to an individual.
The problem of consent without contact
The Colorado Privacy Act shows how regulators are beginning to grapple with this issue. Colorado's law creates a category of "sensitive data inferences," such as deducing someone's religion from GPS pings showing visits to a mosque, which must be deleted within 24 hours unless explicit consent is obtained.
However, even in that narrow exception, the Colorado Privacy Act requires the controller to disclose those sensitive inferences to the person involved. That obligation exposes the limits of the framework because disclosure presumes identifiability. A controller cannot meaningfully notify or obtain consent from someone whose data was never directly collected and whose identity the system does not know. The rule recognizes that inferred data should be treated like other personal data, but it stops short of addressing how those obligations can function when individuals remain invisible to the data controller.
In practice, even well-intentioned rules like these struggle to operate in the real world since AI systems often indirectly learn about people through relational signals and background context. A recent Stanford report warns that generative AI models can "memorize personal information about people, as well as relational data about their family and friends." These models might link an individual's social media habits to predict attributes about them or about people close to them. Yet, because those individuals never directly consented, they may not even be aware it is happening.
Privacy regimes struggle with this because consent and notice rely on person-specific interaction, where someone signs up, clicks a checkbox, and is told how data is used. But when the only signal associated with an individual comes through ambient data, such as a Bluetooth beacon captured by another application, traditional notice and consent frameworks no longer apply.
Rights like access, correction or objection only kick in once the controller recognizes someone as a data subject. Controllers generally have no obligation to guess who might be affected by their AI's inferences unless those individuals proactively complain or are identified by other means.
From individual to collective privacy
This predicament raises the broader question: Is privacy still just an individual right or does it start to look collective when AI is involved?
Academics have begun to frame advanced analytics as a collective issue. One study argues that predictive algorithms can produce what it calls "collective privacy" problems, creating asymmetries of power and generating information that affects whole communities or groups, not just one person.
Similarly, experts distinguish group privacy from the traditional model: While individual privacy protects data about a particular person, group privacy concerns information patterns about communities, such as ethnic groups or neighborhoods.
AI inferences often blur these lines. For example, if an AI system infers that people in a certain ZIP code face an elevated health risk, even residents who never interacted with the system could be stigmatized or targeted. No existing law grants a "neighborhood" or a family an independent privacy right, but the effects feel collective.
So far, regulatory frameworks mainly center on the individual. Article 22 of the GDPR does give individuals a say when fully automated decisions with significant effects are made about them, but that presumes the system has already identified them. Under current rules, if AI "acts on" data that originates with someone else, the law generally treats the inference as belonging to whoever is identified after the fact.
In California, authorities have clarified that even inferences derived from public information must be disclosed to the consumer once identified. However, the first hurdle remains: identifying the consumer at all. For instance, if Alice consents to share health data through a wearable device and that data contributes to an AI model identifying elevated cancer risks in her ZIP code, her neighbor Bob could be affected. He might be flagged for outreach, higher insurance screening, or targeted advertising. Yet Bob would not receive a privacy notice or be able to exercise access rights under most laws because he never interacted with the service and remains unidentified.
Rethinking legal obligations for AI systems
These gaps complicate enforcement. When regulators begin to suspect profiling by proxy or other forms of indirect data exploitation, they are often forced to retrofit existing statutory concepts to an entirely new technological reality that privacy law never anticipated.
The inference provisions of the Colorado Privacy Act illustrate one early attempt to close that gap, treating certain algorithmic conclusions as sensitive data. They impose obligations of deletion, disclosure, and purpose limitation similar to those applied to more traditional categories of personal information.
The CCPA similarly recognizes sensitive attributes such as health status and sexual orientation. Yet the law's logic still depends on a model of direct data collection and informed consent, a model that fails to capture the way AI systems can generate personal insights about someone who never shared data in the first place. When inferences arise through contextual or relational patterns, the traditional legal assumption that privacy protection begins with a data transaction between the subject and the controller begins to collapse, exposing the limits of a framework that ties rights and obligations to direct collection rather than to the insights systems can generate.
Looking forward, one emerging theme is that privacy by proxy is not an anomaly but a signal that consent, context, and control must be redefined. Organizations using sophisticated AI systems will increasingly need to anticipate when their models generate inferences about individuals who never engaged with them. They must also implement preventative design choices that either minimize that possibility or mitigate its effects through anonymization, aggregation or validation across groups rather than individuals.
Transparency expectations are evolving as well. Notices might one day include language acknowledging that predictions could be made about a person based on the activities of others within their social or geographic network. Although such disclosures remain hypothetical, they express a broader truth that data ecosystems are porous and that accountability cannot be confined to direct user relationships. To meet that challenge, controllers may need to maintain detailed data inventories, conduct recurring impact assessments, and treat cross-linkages between data subjects as a potential source of risk in themselves, not merely as a technical artifact of processing.
Contemporary privacy statutes still rest on the premise that if an organization collects an individual's data, it must handle it responsibly. However, that premise no longer captures the dynamics of an AI environment in which information is constantly generated, inferred and recombined in ways that transcend any individual's explicit input.
Once privacy depends on ambient or relational signals rather than direct exchanges, advocates and regulators alike must begin to conceive of protection as a collective and contextual right, not an exclusively individual one.
Whether that evolution occurs through legislative reform or the gradual formation of new professional norms, the trajectory points toward a world in which every inference about a person carries the same legal and ethical gravity as direct collection.
In the interim, organizations developing or deploying AI systems should operate with the understanding that these inferences constitute personal data and should be governed as such. Regulators are only beginning to articulate what it means to protect privacy by proxy in a world increasingly shaped by machine perception and predictive logic.
Jennifer Dickey, AIGP, CIPP/E, CIPP/US, CIPM, CIPT, FIP, is a data privacy and AI associate attorney at Dykema.
