The Internet has developed into a vastly complex system of interactions among publishers, data brokers and ad networks, and understanding how that system works is front and center for an emerging group of computer scientists attempting to uncover some of the web's deepest secrets.
But this is no small task. How do you measure personalization? How can businesses be transparent about their algorithms without compromising their “secret sauce”? How can companies differentiate among their consumers without discriminating against them? And how can regulators possibly enforce any of this?
These were just some of the difficult questions raised last Friday at Princeton University's Center for Information Technology Policy (CITP). Computer scientists, technologists, academics and industry representatives took a deep dive into making the web more transparent by connecting the work of this emerging field of computer science with businesses, regulators and others in public policy.
“Whenever we do measurement, there’s a surprising level of effectiveness,” said Arvind Narayanan of Princeton. The research, he added, helps fix “information asymmetry” between businesses and consumers, helps consumers make more informed choices and provides regulators with important data for possible enforcement.
Narayanan discussed his work detecting tracking technology such as canvas fingerprinting. “Measurement study does tend to have impact on online tracking,” he said, noting that one study he helped conduct revealed the extent to which sites were employing the technique. Since its publication, he said, the largest provider of canvas fingerprinting code has stopped offering it.
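How does a measurement study spot canvas fingerprinting in the wild? The tell is a script that draws text to a canvas element and then reads the pixels back out. Below is a minimal, hypothetical sketch of that detection idea using Selenium; the hooked calls and the example.com target are illustrative placeholders, not the researchers' actual pipeline.

    # Minimal sketch of canvas fingerprinting detection (hypothetical,
    # not the researchers' pipeline): instrument a browser so that any
    # script drawing text to a canvas and reading the pixels back gets
    # logged, then flag that combination as likely fingerprinting.
    from selenium import webdriver

    # JavaScript evaluated before each page's own scripts run; it wraps
    # the two canvas APIs whose pairing is the fingerprinting tell.
    HOOK = """
    window.__canvasCalls = [];
    const origToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function (...args) {
        window.__canvasCalls.push('toDataURL');
        return origToDataURL.apply(this, args);
    };
    const origFillText = CanvasRenderingContext2D.prototype.fillText;
    CanvasRenderingContext2D.prototype.fillText = function (...args) {
        window.__canvasCalls.push('fillText');
        return origFillText.apply(this, args);
    };
    """

    driver = webdriver.Chrome()
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
                           {"source": HOOK})
    driver.get("https://example.com")  # site under measurement (placeholder)

    calls = driver.execute_script("return window.__canvasCalls;")
    # Text drawn to a canvas followed by a pixel readback suggests
    # fingerprinting rather than ordinary graphics use.
    if "fillText" in calls and "toDataURL" in calls:
        print("possible canvas fingerprinting detected:", calls)
    driver.quit()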
“There are different tools and methodologies coming out of the computer science field to increase transparency and shed light on businesses' collection and personalization practices,” said CITP Director Ed Felten. “This community is maturing and getting data that can be useful in policy-making, policy decisions, application of policy and enforcement.”
Several examples of web measurement research were on display throughout the day.
Anupam Datta of Carnegie Mellon University discussed AdFisher, an automated tool developed to work with Google Ad Settings to explore “how user behaviors, Google’s ads and Ad Settings interact.” He noted that users have some choice over the ads they see but cited two examples that caused concern. One involved a job search that varied by gender: For male users, the top two ads were for services helping them find $200,000-plus jobs; those ads did not appear for female users. In a second example, a search involved substance abuse, and the three top results “were clearly targeting victims of substance abuse,” he said.
Both examples, Datta explained, require deeper investigation and deeper conversation among researchers, regulators and businesses.
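The experimental logic behind a tool like AdFisher can be illustrated with a short sketch. The code below is a hypothetical simplification, not the CMU implementation: Two groups of fresh browser profiles are given different simulated traits, the ads served to each group are collected, and a permutation test asks whether the observed difference could plausibly be chance.

    # Simplified sketch of randomized ad-measurement experiments
    # (hypothetical code, not AdFisher itself): compare the ads served
    # to two differently treated groups of profiles and estimate how
    # likely the gap is under random relabeling.
    import random

    def permutation_test(group_a, group_b, statistic, trials=10000):
        """p-value for the observed statistic under random relabeling."""
        observed = statistic(group_a, group_b)
        pooled = group_a + group_b
        hits = 0
        for _ in range(trials):
            random.shuffle(pooled)
            a, b = pooled[:len(group_a)], pooled[len(group_a):]
            if statistic(a, b) >= observed:
                hits += 1
        return hits / trials

    def ad_difference(a, b):
        """Toy statistic: gap in how often one ad keyword appears."""
        rate = lambda ads: sum("executive job" in ad for ad in ads) / len(ads)
        return abs(rate(a) - rate(b))

    # These lists would come from instrumented browser sessions in a real
    # study; hard-coded toy data here keeps the sketch self-contained.
    ads_group_a = ["executive job coach", "career site", "executive job ads"] * 10
    ads_group_b = ["career site", "resume service", "job board"] * 10

    p = permutation_test(ads_group_a, ads_group_b, ad_difference)
    print(f"probability the difference is chance: p = {p:.4f}")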
Georgia Tech’s Nick Feamster shared his work on personalization pollution. He demonstrated how cross-site request forgery can pollute a user’s personalized recommendations: An attacker quietly causes the victim’s browser to issue requests, and the service folds those requests into the victim’s profile. “Personalization can easily be exploited,” he warned, giving rise to a new generation of web security issues. In an attempt to expose some of the “Filter Bubble,” he and a team of researchers helped create Bobble, a tool designed to expose how Google search results vary among users.
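On the measurement side, the core of a Bobble-style comparison can be sketched in a few lines. This is hypothetical code, not the Georgia Tech tool: It assumes result lists have already been collected from two different profiles for the same query and simply quantifies how far they diverge.

    # Rough sketch of measuring personalization (hypothetical, not
    # Bobble itself): issue the same query from different profiles or
    # vantage points and score how much the result lists overlap.
    def jaccard(a, b):
        """Overlap between two sets of result URLs (1.0 = identical)."""
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 1.0

    # In a real measurement these would be scraped from live search
    # sessions; toy data keeps the sketch self-contained.
    results_profile_1 = ["example.com/a", "example.com/b", "example.com/c"]
    results_profile_2 = ["example.com/a", "example.com/d", "example.com/e"]

    overlap = jaccard(results_profile_1, results_profile_2)
    print(f"result overlap: {overlap:.2f}")  # low overlap => heavy personalization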
Then there is XRay. Developed in part by Roxana Geambasu of Columbia University, XRay correlates user inputs, such as Gmail messages, with outputs, such as the ads Google serves. In two examples, she demonstrated ads shown to a user: one targeted on the basis of homosexuality, the other on pregnancy. “It is generic, can be applied to several different cases and is accurate,” she said, adding that it currently can be used on Gmail, Amazon and YouTube.
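XRay's underlying idea, which its authors call differential correlation, can be sketched simply. The code below is a toy illustration, not the Columbia system: Shadow accounts each hold a different subset of the real account's inputs, and an ad is attributed to the input whose presence best predicts where the ad appeared.

    # Toy version of differential correlation (hypothetical code, not
    # the XRay implementation): infer which input an ad targets by
    # checking which input's presence lines up with the ad's appearance
    # across shadow accounts.
    inputs = ["email_pregnancy", "email_travel", "email_cars"]

    # Which inputs each shadow account holds (subsets of the real account).
    accounts = {
        "acct1": {"email_pregnancy", "email_travel"},
        "acct2": {"email_pregnancy", "email_cars"},
        "acct3": {"email_travel", "email_cars"},
    }

    # Accounts observed seeing the ad under study (toy data; a real
    # measurement would collect this from live sessions).
    saw_ad = {"acct1", "acct2"}

    def score(candidate):
        """Fraction of accounts where the input's presence matches the ad."""
        matches = sum(
            (candidate in held) == (acct in saw_ad)
            for acct, held in accounts.items()
        )
        return matches / len(accounts)

    best = max(inputs, key=score)
    print(f"ad most likely targets: {best} (score {score(best):.2f})")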
Ashkan Soltani, recently appointed chief technologist of the Federal Trade Commission (FTC), said the work of these researchers shows the technical sophistication required to understand ad targeting and the difficulty of mapping those complexities onto public policy. “Where are the lines? Traditionally, a lot of inferences could be made from subpoenas, reading emails,” but now technology is imbued with algorithms that make it “much more difficult to find causation, liability or wrongdoing.”
Genie Barton, representing the Better Business Bureau’s enforcement wing, said, “Without the technology and research of everyone in this room, my enforcement would not be possible.”
Throughout the day, privacy was a major topic, but the web’s technology, from supercookies and fingerprinting to personalized services and big data analysis, is pushing beyond privacy issues into civil rights and discrimination concerns.
Paul Ohm, who recently worked on public policy with the FTC, said, “Privacy is a concern, a large concern, but it has been joined by discrimination—in a broad and interesting way. It’s similar to privacy but different in other ways, and policy-makers haven’t even begun to talk about that.”
Barton agreed, saying “there is a blurred line between differentiation and discrimination.” Self-regulation, she noted, “is not without teeth”: Businesses cannot use data collected across sites to make eligibility decisions. “Based on what I’ve seen here today, there is a potential for doing that.”
Soltani offered one possible solution, something he called a unit test or test harness, which could be used in self-regulation. “Companies making algorithms could employ statistical methods and tests to prevent unfairness,” he said, “to ensure their software doesn’t cross those racial, discriminatory lines.” He suggested those principles should not be too specific but should “have guidance such that companies making an algorithm could unit test that they don’t cross certain lines or make poor representations.”
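What might such a unit test look like in practice? One hypothetical form, sketched below with made-up data and a made-up threshold, is a demographic-parity check that fails the build if an algorithm's approval rates diverge too far across groups.

    # Minimal sketch of a fairness "unit test" (hypothetical; no
    # specific standard is implied): before shipping a scoring model,
    # assert that its positive-outcome rate does not diverge too far
    # across protected groups.
    def positive_rate(decisions):
        """Share of people in a group who received a positive outcome."""
        return sum(decisions) / len(decisions)

    def test_demographic_parity(decisions_by_group, max_gap=0.1):
        """Fail if any two groups' approval rates differ by more than max_gap."""
        rates = {g: positive_rate(d) for g, d in decisions_by_group.items()}
        gap = max(rates.values()) - min(rates.values())
        assert gap <= max_gap, f"disparity {gap:.2f} exceeds {max_gap}: {rates}"

    # Toy model outputs (1 = approved); in practice these would be the
    # algorithm's decisions on a held-out audit set.
    test_demographic_parity({
        "group_a": [1, 1, 0, 1, 0, 1, 1, 0],
        "group_b": [1, 0, 1, 1, 0, 1, 0, 1],
    })
    print("parity check passed")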
Princeton University’s Solon Barocas at one point asked, “How do we translate these concerns into practical insights? It’s not just about what’s fair—not just the decision-making procedure itself—but the outcomes.” He added, “There’s a shift going on here.”
Tracking and adjusting to that shift will be a challenge for the entire privacy profession and community.