9 March 2015

Pseudonymity and Context Dependency: The Implications for Privacy Engineering

Dr. Ian Oliver, in his excellent post "The Semantics of PII," mentions the inherent difficulty in ascertaining whether a particular network or hardware address is PII, given the number of different contexts in which this kind of information is transmitted or collected, some of which may raise privacy concerns and some of which may not. One could therefore say, correctly, that whether or not a particular piece of information is personally identifying depends on the context in which it is collected.

An important corollary of this principle, however, carries significant consequences for privacy engineering: common safeguards deployed to achieve pseudonymity (the state of being non-PII) can be easily defeated once one considers all the contexts in which a particular piece of information is collected. A real-world example illustrates the point.

Consider the mobile context, where the major platforms have taken the salutary step of creating device-wide IDs that are neither hardware-based nor network-based (e.g., Apple's Identifier for Advertising and Google's Advertising ID). Entities in the advertising ecosystem widely use these IDs to pseudonymously target and serve ads, track analytics and perform other functions. A large amount of third-party code, including widely deployed social network and advertising SDKs, uses these IDs. App developers often deploy this third-party code in the belief that the transmitted IDs are not PII.

The trade press, for its part, in various fits of exuberance, has touted the "anonymous" nature of these IDs. However, the extent to which these IDs afford any privacy protection at all turns in large part on low-level developer decisions regarding their use. For example, if a developer chooses third-party code that passively transmits one of these advertising IDs to a social network, the developer has placed the social network in a position to trivially discover the personal identity of the individual using the developer's mobile app, even if the end-user never uses the social sharing functions embedded in the developer's app.

How is this possible? The answer is context.

Part of the context of the mobile ecosystem is that social networks have their own apps, which are widely installed by end-users. Social networks often collect the same advertising ID from their own apps that they collect through their widely installed third-party code. In such instances, the developer has put the social network in a position to use the advertising ID as a primary key linking two tables: one consisting of advertising IDs collected from the waterfront of apps that use the social network's third-party code, and a second consisting of rich PII about individual end-users that the social network collects from its own app (where it also collects advertising IDs). With these two tables in a relational database, the social network could run queries to determine many of the apps that any particular member of the social network has installed.
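To make the linkage concrete, here is a minimal Python sketch using an in-memory SQLite database. The table names (`sdk_events`, `member_profiles`), column names, advertising IDs and app bundle identifiers are all hypothetical placeholders, not any social network's actual schema; the point is only that a single equality join on the advertising ID is enough to attach a named profile to a list of installed apps.

```python
import sqlite3
from collections import defaultdict


def build_demo_db() -> sqlite3.Connection:
    """Build two hypothetical tables keyed by the same advertising ID."""
    conn = sqlite3.connect(":memory:")
    # Table 1 (hypothetical): ad IDs passively reported by the SDK embedded in third-party apps.
    conn.execute("CREATE TABLE sdk_events (ad_id TEXT, app_bundle TEXT)")
    # Table 2 (hypothetical): PII collected by the social network's own app, which also reads the ad ID.
    conn.execute("CREATE TABLE member_profiles (ad_id TEXT PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO sdk_events VALUES (?, ?)",
        [
            ("AAAA-1111", "com.example.bank"),
            ("AAAA-1111", "com.example.fitness"),
            ("BBBB-2222", "com.example.kidsgame"),
            ("CCCC-3333", "com.example.weather"),  # no matching profile
        ],
    )
    conn.executemany(
        "INSERT INTO member_profiles VALUES (?, ?, ?)",
        [
            ("AAAA-1111", "Alice Example", "alice@example.com"),
            ("BBBB-2222", "Bob Example", "bob@example.com"),
        ],
    )
    return conn


def apps_by_member(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Join the two tables on the advertising ID and group installed apps by member name."""
    rows = conn.execute(
        """
        SELECT p.name, e.app_bundle
        FROM sdk_events AS e
        JOIN member_profiles AS p ON p.ad_id = e.ad_id
        ORDER BY p.name, e.app_bundle
        """
    ).fetchall()
    result: dict[str, list[str]] = defaultdict(list)
    for name, app in rows:
        result[name].append(app)
    return dict(result)


if __name__ == "__main__":
    print(apps_by_member(build_demo_db()))
```

Running the script prints each member's name alongside the apps whose embedded SDK reported that member's advertising ID; the ID `CCCC-3333`, which has no matching profile, simply drops out of the inner join.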

It goes without saying that an ill-advised decision to use the wrong social sharing SDK in a banking app, a health and wellness app, or a child-directed app can easily disclose enough information for the social network to trivially identify the app's users from their advertising IDs.

So one might assume that pseudonymity is preserved as long as developers avoid the social sharing SDK in those apps and instead use, for example, an analytics SDK or a third-party advertising SDK whose code passively transmits the advertising ID, right? The answer is: maybe not.

Any party that collects one of these mobile advertising IDs in its own apps, whether a social network, a data broker or a consumer-facing company, is in a position to link the advertising ID to its member, user or customer profiles, and to sell or share access to that linked data with third-party advertising entities or others that collect the advertising ID in contexts where personal identity is unavailable.

Thus, many parties, not just social networks, have the technical ability to unmask individuals based on mobile advertising IDs. Whether they do so may be constrained by law, best practices or internal policy, but beyond those constraints nothing prevents it.

Moreover, the onus cannot be placed on the end-user to block the use of advertising IDs, because neither the "Limit Ad Tracking" setting in iOS nor the "Opt out of interest-based ads" setting on Android actually stops the advertising IDs from being transmitted. Instead, those settings merely set a flag indicating the end-user's preference.
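Because the opt-out setting, as described above, travels as a flag alongside the ID rather than suppressing it, honoring it is entirely up to the receiving party. The following minimal Python sketch assumes a hypothetical payload shape (`AdPayload` with `ad_id`, `app_bundle` and `limit_ad_tracking` fields) and a hypothetical `identify` function; real SDK wire formats differ. It shows that the same lookup unmasks the user whether or not the flag is set, unless the receiver chooses to check it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdPayload:
    # Hypothetical payload shape; real SDK wire formats differ.
    ad_id: str
    app_bundle: str
    limit_ad_tracking: bool  # the user's preference, sent alongside (not instead of) the ID


def identify(
    payloads: list[AdPayload], profiles: dict[str, str], honor_flag: bool
) -> list[tuple[str, str]]:
    """Return (identity, app) pairs for payloads whose ad ID matches a known profile.

    The opt-out flag only has an effect if the receiving party chooses to check it.
    """
    matches = []
    for p in payloads:
        if honor_flag and p.limit_ad_tracking:
            continue  # a receiver that honors the preference skips flagged payloads
        identity = profiles.get(p.ad_id)
        if identity is not None:
            matches.append((identity, p.app_bundle))
    return matches


if __name__ == "__main__":
    profiles = {"AAAA-1111": "alice@example.com"}  # hypothetical ad ID -> identity table
    payloads = [
        AdPayload("AAAA-1111", "com.example.bank", limit_ad_tracking=True),
        AdPayload("AAAA-1111", "com.example.news", limit_ad_tracking=False),
    ]
    print(identify(payloads, profiles, honor_flag=False))  # both apps linked to Alice
    print(identify(payloads, profiles, honor_flag=True))   # only the unflagged app
```

The design choice worth noting is that the filter lives on the receiver's side: nothing in the payload itself prevents the join, which is why the flag is a policy signal rather than a technical safeguard.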

The bottom line is that the degree to which any particular piece of information, such as an advertising ID, is personally identifiable depends not only on the particular use case under development, but also on all the relevant contexts in which the information or ID is collected.

There is a link between the squishy semantics of PII and the fragility of privacy "enhancing" technologies. In a future post, I will discuss how that fragility is, ironically, not an engineering impediment to detecting the transmission of a wide array of potentially sensitive information, including advertising IDs, and I will describe methods for enforcing internal policy constraints during development.