A case currently making its way through the Supreme Court’s docket may have far-reaching implications for the future of privacy litigation. The case, Frank v. Gaos, concerns cy pres class action settlements, and the core issue (for which the Court granted certiorari) regards the appropriateness of the cy pres arrangement in the case.
During oral arguments, however, another issue captured the Court’s attention: Article III standing, and, specifically, whether any of the plaintiffs in the case pleaded sufficient concrete harm. The parties submitted supplemental briefs on the issue Nov. 30 and reply briefs on Dec. 12. The Court’s decision — should it choose to engage with the standing issue — could have far-reaching implications for future claims based on privacy harms.
In their briefs, the parties disagree about whether judicial precedent and Congressional history support their stance on the Article III standing issue. Buried in the nuanced discussions about current standing quandaries — whether Congress elevated the harm at issue to a concrete harm and whether the plaintiffs needed to allege more specific facts to establish standing — is an important question: Are search terms plus an IP address individually identifying and is their mere disclosure a concrete harm? If the answer is yes, the implications for privacy litigation are immense.
Spokeo and the Concrete Hurdle for Privacy Harms
In Spokeo v. Robins, 136 S. Ct. 1540 (2016), the Court considered whether class-action plaintiffs had standing to claim that a company willfully failed to comply with the Fair Credit Reporting Act. The Court held that to establish Article III standing, a plaintiff must show “(1) an injury in fact, (2) [that is] fairly traceable to the challenged conduct of the defendant, and (3) likely to be redressed by a favorable judicial decision.” The Court stated that injury-in-fact is the “[f]irst and foremost” of the elements. An injury must be “actual or imminent” and “concrete and particularized” for injury-in-fact to be satisfied. Additionally, a concrete injury need not be “tangible,” and an intangible injury may be an injury-in-fact; “history and the judgment of Congress are instructive” for such a determination.
Much ink has been spilt parsing the implications of Spokeo for privacy-harm cases in light of existing precedent, but suffice it to say that the jurisprudence is still developing. “Courts appear to prefer harms that are ‘visceral and vested’ — harms they can physically feel, that are measurable, and have already occurred.” But Spokeo declared that “a ‘risk of real harm’ can satisfy the concreteness requirement.” The bottom line for privacy claims: Plaintiffs must show that their privacy harms are concrete if they are to have their day in court.
The concreteness of the alleged harms at issue in Frank is where the case carries profound implications.
Google and referrer headers
The claims in Frank arise from a common component of internet communication: the HTTP referrer header. This snippet of information shares the URL of a webpage making a request with a target website. At issue in Frank is the inclusion of a user’s search query in the referrer header. For example, the referrer header may contain a text string similar to the following:
The referrer header provides a target website various pieces of information, but most notably, the fact that a user came to the website from Google, and specifically, via a search for “flowers.” The plaintiffs claim that their search queries may contain “highly-sensitive and personally-identifiable information” — including medical information, racial and ethnic information, political and religious beliefs, and information about an individual’s sexuality. For example, one of the named plaintiffs conducted search queries including his name plus various terms related to his ongoing divorce proceedings. The plaintiffs say that though in most instances the information disclosed in a search query does not directly identify a user, the risk of re-identification poses a threat to their privacy.
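To make the mechanics concrete, here is a minimal Python sketch of how a target website could read a search term out of a referrer header. The header value is a hypothetical stand-in, not the string quoted in the briefs, and the sketch assumes the query travels in a URL parameter named q.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical header value for illustration; not taken from the briefs.
referer = "http://www.google.com/search?q=flowers"

def search_terms_from_referer(value):
    """Return (referring host, list of search terms) from a Referer value."""
    parts = urlsplit(value)
    # parse_qs maps each parameter name to a list of decoded values.
    terms = parse_qs(parts.query).get("q", [])
    return parts.hostname, terms

host, terms = search_terms_from_referer(referer)
print(host, terms)
```

Nothing in this step requires special access: the header arrives with the page request, and the standard library does the parsing.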
The parties disagree over whether the disclosure of search terms in the referrer header to a target website constitutes a concrete harm. If the Court determines that it is a concrete harm, the plaintiffs will have standing; if the Court determines that it is not a concrete harm, an Article III standing analysis will fail and the cy pres arrangement may be vacated.
Search terms + IP addresses = identifiable information and a concrete privacy harm?
The Court requested additional briefing because, as Orin Kerr describes, the privacy-focused Article III standing question “raises some difficult issues.” Kerr suggests a way for the Court to find standing without grappling with the privacy harms questions (by understanding the allegations as an “intangible version of the tort of conversion” under § 2702 of the Stored Communications Act, or SCA), but the difficult issues are the most interesting. Reviewing the parties’ briefs, two issues underlie the dispute: the concept of identifiability and the risk of re-identification.
Paloma Gaos (a named party respondent) and Theodore Frank (the petitioner and a class member) argue in favor of standing and describe the ways in which the “history and judgment of Congress are instructive” in support of their position. They also draw comparisons from the privacy harms alleged in the case to privacy torts and previously protected forms of communication. Frank emphasizes that any doubt about Gaos’s ability to succeed on the merits of her claims should be independent of the determination of standing. Gaos agrees: “The scope of a cause of action is a merits issue, not a standing issue.”
The U.S. government (as amicus curiae) and Google (a respondent and the defendant) argue the opposite: that the named plaintiffs fail to allege facts that rise to the level of harm sufficient for Article III’s concreteness requirement. Both say the disclosure of search terms without identifying information is either not harmful or does not pose a great enough risk of harm to be concrete. Google extends that argument further and asserts that even with identifying information, the harm, or risk of harm, would not be enough for standing. Both argue that the risk of re-identification is so remote as to not be concrete. The government and Google add that Congress did not elevate the disclosure of search terms to a concrete harm in the SCA.
This analysis avoids the minutiae of the Congressional-judgment debate and focuses on two of the privacy-specific questions at issue: What type of identifiable information, if any, is necessary to establish sufficient concreteness for a privacy harm? And, what role does the risk of re-identification play in assessing the actual or imminent threat of harm?
The parties disagree about whether the disclosed information at issue is identifiable, and whether alleged identifiability is required for the disclosure to be considered concrete.
Direct and Indirect Identifiability
Google argues that the plaintiffs do not have standing because they failed to allege an “already-recognized tangible harm or a certainly impending risk of such harm.” It states that because no actual harm resulted from the disclosures, and because the plaintiffs’ claims of impending harm were mere naked assertions without alleged facts linking search terms to the searcher’s identity, the plaintiffs’ claims fail to rise to the level of concrete harm. To illustrate its point, Google included the following hypothetical scenario:
Imagine that a person enters a store and asks for “brown lace-up shoes” without giving his name to the salesperson, and the salesperson subsequently reports to the store manager that “a man came into our store and asked for ‘brown lace-up shoes.’” The reporting of that query by itself could not inflict real-world harm on the person who made it.
Google’s argument is that its hypothetical mirrors the facts as pleaded: the plaintiffs failed to include any mention of identifiable information in their complaint, and without identifiable information any potential harm is so remote as to be non-concrete. That may be a winning procedural argument against standing, but it buries the more interesting substantive issue: Does disclosure of search terms plus identifiable information equate to concrete harm? And if it does, what is identifiable?
Europe’s General Data Protection Regulation lends insight into what identifiable can mean. It introduces the concepts of direct and indirect identifiability. Directly identifiable data identifies an individual “directly from the information in question,” and indirectly identifiable data identifies an individual “in combination with other information.” It is helpful to borrow this categorization of identifiable data from the European jurisdiction and apply it to Google’s argument to clarify the issue at hand. Google advocates for the position that information disclosed must be at least indirectly identifiable — “terms used in searches by itself[,] without any link to the identity of the individual ... is insufficient to inflict concrete harm” — to constitute concrete harm, if sufficient even then. The plaintiffs remain largely silent on the issue of identifiability.
Google’s argument is based on the legal scenario created by the plaintiffs’ complaint, which fails to explicitly tie search terms and IP addresses together, resulting in a lack of even indirect identifiability. But the real-world scenario includes the key piece of information that Google emphasizes is missing from the complaint: IP addresses. It is worth digging into the details of identifiability and the technology at issue even if the pleaded facts give the Court a procedural out that lets it avoid engaging with the issue. The substantive questions about harm this case raises have far-reaching implications.
Is an IP address identifying?
In its briefs, Google emphasizes that the named plaintiffs failed to allege that any plaintiff’s “search by itself revealed the searcher’s identity.” But the company acknowledges that an IP address is sent along with a referrer header in the HTTP exchange: “In addition to the referrer header, the server hosting the user-requested webpage also receives the user’s Internet Protocol address.” The pleaded facts may not include search terms plus IP addresses, but an IP address is certainly shared in the communication. This raises the question: Does the combination of search term and IP address make an individual searcher identifiable?
The identifiability of IP addresses is a hotly debated issue. The FTC says that static IP addresses can be “reasonably linked to a particular person” and should be regarded as “personally identifiable.” The California Consumer Privacy Act includes IP address as an “identifier” under the definition of personal information. The CJEU found that dynamic IP addresses are personally identifiable information in some circumstances. And Recital 30 of the GDPR clarifies that “online identifier” (under the definition of personal data) includes IP addresses. What seems to be relatively consistent is that an IP address plus some other information is at least indirectly identifying, which, applying Google’s argument, leads to the conclusion that the combination of IP address and search term may equate to a concrete harm. Whether the Court will weigh in on this specific issue is anyone’s guess.
What role does the risk of re-identification play in assessing the actual or imminent threat of harm?
The issue of re-identification plays an important role in the parties’ arguments. The plaintiffs claim that any disclosure without their consent heightens the risk of re-identification — connecting the search term to the searcher — and constitutes a privacy harm. Google and the government argue that any such risk is too speculative to rise to the level of a concrete harm.
Google goes to great lengths to diminish the risk of re-identification of a user from the user’s search terms alone. It may be correct that the risk is minimal, but that is not to say it is impossible, or even difficult, to connect the data points, especially if the search terms are associated with an indirectly identifiable IP address. That raises the following fundamental question about privacy harms and standing: To what extent does the risk of re-identification of disclosed data elevate the concreteness of the harm?
The parties disagree about the risk posed by re-identification. In the consolidated complaint, Gaos alleges that the “Science of Re-identification ... creates and amplifies privacy harms by connecting the dots of ‘anonymous’ data and tracing it back to a specific individual.” Google argues that the plaintiffs failed to allege that the “Science of Re-identification” was “actually used to link their searches to any plaintiff’s identity” and calls their contention of an impending risk “purely speculative” (or as the government phrases it, “too speculative to create standing”). Additionally, according to Google, for re-identification to occur, a web operator would need to link multiple search queries together. The company portrays such linking as unlikely and argues that even if the searches can be linked, a website operator “would have to take at least five additional steps” to achieve a malicious goal:
- Retrieve and combine these multiple anonymized search queries.
- Identify “data fingerprints” in those queries.
- Combine those fingerprints with unspecified other data such as cookies, presumably from other websites but not from referrer headers.
- Discern individuals’ identities and their personal information from this combined, unspecified data.
- Exploit individuals’ discovered identities to their detriment.
Google implies that these steps are burdensome, but steps one through four are within reach of any hobbyist programmer working in R (or any number of other languages) using common data-cleaning techniques and an IP geolocation library such as R’s rgeolocate. Frank concisely summarizes the merits of Google’s argument as it pertains to the real-world (as opposed to the pleaded) facts, where search terms and IP addresses are exchanged: “Google alleges multiple searches are required for re-identification, but [its] analysis ignores that IP addresses frequently disclose location and are readily cross-referenced with other collections of aggregated user data.”
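As a rough illustration of why the first of those steps is not demanding, the Python sketch below groups log entries by IP address. Everything in it is assumed for illustration: the log records are synthetic, the addresses come from ranges reserved for documentation, and the function name is hypothetical. It performs no geolocation and identifies no one; it shows only that queries sharing an address can be collected with a few lines of code.

```python
from collections import defaultdict

# Synthetic (ip, search term) pairs such as a target site's log might hold.
log = [
    ("192.0.2.10", "flowers"),
    ("198.51.100.7", "brown lace-up shoes"),
    ("192.0.2.10", "divorce attorney"),
]

def group_queries_by_ip(records):
    """Collect search terms under the IP address that sent them."""
    grouped = defaultdict(list)
    for ip, term in records:
        grouped[ip].append(term)
    return dict(grouped)

profiles = group_queries_by_ip(log)
print(profiles["192.0.2.10"])
```

Whether such a grouping actually points to a person depends on what other data the operator holds, which is the part of the dispute the briefs leave largely unexplored.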
The risk of re-identification is inherent in any disclosed data — or as one researcher put it: “It’s convenient to pretend it’s hard to re-identify people, but it’s easy ... any first-year data-science student could do [it].” Some argue that the risk of re-identification is “overblown,” but concede that it depends on “proper de-identification techniques” and “re-identification risk measurement procedures.”
To support its assertion that the risk of re-identification is low, Google also makes certain statements about the extent of the disclosure at issue. The size of a disclosure, both in terms of the amount of information disclosed and the number of entities to which disclosure occurs, bears on the risk of re-identification. At the beginning of its brief, Google includes a “Relevant Technology” section that describes the purpose of referrer headers. It states that “[a] referrer header is not disclosed to the general public; in the examples [provided in its brief], only the server hosting [the target webpage] would receive the referrer header.” That is not entirely accurate. During the relevant period, this exchange was accomplished via unencrypted HTTP (not encrypted HTTPS), so “not disclosed to the general public” is an overstatement; the information was available for anyone positioned on the network path to intercept with a common packet sniffer and could be easily parsed into a readable format. Google continues: “The referrer header identifies only the immediately preceding website; it does not contain any other information about the search history of the user requesting the webpage.” While that is technically true, the referrer header travels with other information in the HTTP exchange that can be linked to a user’s search history: an IP address.
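The point about unencrypted HTTP can be illustrated with a short Python sketch. It assumes a hypothetical request (the host, path, and header values are made up) and shows only that an HTTP/1.1 request is ordinary text, so any party able to read the bytes in transit can read the referrer header without decrypting anything.

```python
# Hypothetical bytes of an unencrypted HTTP/1.1 request as sent on the wire.
raw_request = (
    b"GET /bouquets HTTP/1.1\r\n"
    b"Host: florist.example\r\n"
    b"Referer: http://www.google.com/search?q=flowers\r\n"
    b"\r\n"
)

def read_headers(raw):
    """Parse the header lines of a plaintext HTTP request into a dict."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

headers = read_headers(raw_request)
print(headers["referer"])
```

The IP address is not one of these headers; it comes from the network connection that carries the request, which is why the two pieces of information arrive together.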
This is a challenging issue — and one that does not appear to have been comprehensively briefed by all parties. If the Court is to set precedent regarding the amount of re-identification risk necessary for a privacy harm to become sufficiently concrete, it may do so without the benefit of input from various interested parties that are not part of the litigation. Will Baude is wary of this scenario:
“This makes me a little nervous, since I do not think the Court does its best work on tricky federal courts questions when they are noticed at the last minute in the middle of another merits case. The posture also means that the issue may not get as much public attention (and I am not even sure whether amicus briefs are permitted on this issue — though surely there would have been many if the issue were granted in another case).”
It is unclear whether the Supreme Court will address any of the privacy harms issues raised in the parties’ supplemental briefs (it may instead choose to dispose of the case on other grounds or remand it to the 9th Circuit), but if it does, the decision may have broad implications for privacy litigation in the future. Google and the government may effectively downplay the risks of re-identification and the richness of information contained in referrer headers, but the issue raises fascinating questions with which the Court must grapple in the future, if not now.