Beyond the Law: A Common Rule for Data Research

The European Data Protection Supervisor’s announcement last week of the establishment of an external, interdisciplinary, six-member advisory group on the ethical dimension of data protection is the latest in a line of initiatives recognizing that the law isn’t enough for balancing big data benefits against privacy and civil liberties risks.

Last year, for example, to operationalize the European Court of Justice decision on the right to be forgotten, Google created an advisory council comprising senior officers and external experts, including a philosopher, a civil-rights activist and a United Nations representative, to arbitrate delisting decisions. In response to a public outcry about its “emotional contagion” experiment, Facebook established ethical guidelines, structured review processes, training and enhanced transparency for data research projects, marking another milestone in the emergence of data ethics as a crucial component of corporate governance programs. And the Obama administration issued a report about big data, called for the establishment of privacy review boards in its Consumer Privacy Bill of Rights and is currently looking at data ethics issues.

To take a concrete example, the recent debate over the attacks reportedly launched in the wild by Carnegie Mellon University researchers against users of Tor demonstrates that even without a focus on—or arguably collection of—personal data, research can have profound implications for individual privacy and safety. Piercing the privacy veil of Tor users—some of whom may be terrorists or pedophiles, while many others may be human rights activists in oppressive regimes or intelligence agency operatives—raises serious ethical concerns.

In today’s data-rich environment, everyone has become a researcher, with data labs springing up like mushrooms after the rain, not only in academic and government institutions, but also in companies, non-profit entities and even individuals’ homes. Researchers conduct analysis of our everyday data exhaust, from massive commercial or government databases to individual tweets or publicly available Facebook postings. Their goals range from curing cancer and designing emergency-response services to product improvement and increased ROI.

In many cases, individuals’ personal information—itemized or aggregated, exposed or de-identified—is used as raw material to test hypotheses, discover hidden correlations or validate theories that in the past were lost in the noise. At the same time, much ambiguity remains about the application of existing laws and ethical codes to this brave new environment.

The ethical framework governing human-subject research in the biomedical and behavioral research fields dates back to the Belmont Report, which was drafted in 1976 and adopted by the U.S. government in 1991 as the “Common Rule.” The Belmont principles—respect for persons, beneficence and justice—require obtaining individuals’ informed consent, conducting cost-benefit analysis and distributing research results equitably. Yet they are limited in scope to government-funded research and are geared towards controlled scientific experiments, such as clinical tests, with a limited population of human subjects interacting directly with researchers.

In contrast, big data research often takes place outside the remit of the federal government and on databases so large and diffuse as to render solicitation of individuals’ consent unworkable.

Attempting to fit such activity into the strictures of the Common Rule meets many challenges. To begin with, it isn’t clear that data-centered research constitutes human-subject research. As Michael Zimmer notes, “the perception of a human subject becomes diluted through increased technological mediation.” Moreover, the existence of identifiable private information in a dataset, one of the triggers for the Common Rule, has become a source of great contention, with de-identification critics casting doubt on whether data can be anonymized at all. For companies, meanwhile, it isn’t clear where the line falls between ethically bound research and run-of-the-mill A/B testing, business analytics and product improvement.

No doubt, privacy and data protection laws provide a backstop against abusive commercial uses of data, with boundaries like consent and avoidance of harms. But in many cases where informed consent is not feasible and where data uses create both benefits and risks, legal boundaries are ambiguous and rest on blurry-edged concepts such as “unfairness” or the “legitimate interests of the controller.”

Misgivings over data ethics could diminish collaboration between researchers and private-sector entities, restrict funding opportunities and lock research projects in corporate coffers, contributing to the development of new products without furthering generalizable knowledge.

To address these issues, the Future of Privacy Forum, of which I’m a senior fellow, hosts a workshop tomorrow supported by the National Science Foundation and Sloan Foundation and attended by leading computer scientists, ethicists, lawyers and business leaders. The debate will focus on a suite of paper submissions, to be published by Washington & Lee, by thought leaders including Urs Gasser, Arvind Narayanan, Neil Richards, Woody Hartzog, Simson Garfinkel and more. It will aim to adapt existing ethical standards to the realities of data research, so that society can benefit from the tremendous opportunities of scientific progress without sacrificing individual rights and fundamental freedoms in the process.

