Start-Up Looks To Capitalize on Differential Privacy

Editor's Note: The company profiled in this article, Shroudbase, now operates as LeapYear Technologies and has substantially changed its business model, focusing on a different type of differential privacy. You should therefore understand that the information in this article is no longer valid in terms of the company and its operations.

We have left the article up in modified format, however, as it receives a fair amount of traffic from people looking to learn about differential privacy and we believe it still offers value on that front.

In some corners of the privacy world, de-identification has become something akin to a white whale: always just out of reach. Just this past summer, Princeton’s Arvind Narayanan and Edward Felten made waves with their paper, “No Silver Bullet: De-Identification Still Doesn’t Work.”

And while Privacy Analytics’ Khaled El Emam and Luk Arbuckle countered swiftly that “de-identification is a key solution for sharing data responsibly,” there remains an unease among those looking to use big data analytics for any number of purposes.

Thus, into the breach steps the start-up Shroudbase, a firm based in Philadelphia that is currently pitching “differential privacy” as a service and soon hopes to offer it as a software package.

How does it work? While de-identification involves removing information from a data set, or replacing entries with numbers or hashes, differential privacy is “completely different,” said Shroudbase CEO Ishaan Nerurkar, who studied at the University of Pennsylvania in the Singh Program in Networked and Social Systems Engineering. “We keep almost all the information in the data, but we actually distort the database” via algorithms that add randomly distributed noise. “So, if I ask a question of the database … that answer is going to be slightly wrong, but slightly wrong in such a way that isn’t statistically meaningful. I can tell the difference between two individuals, but in a way that’s not going to matter.”
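For readers who want to see the idea concretely, one textbook way to produce answers that are “slightly wrong” in a controlled way is the Laplace mechanism, which adds random noise to a query result. The Python sketch below is an illustration only, not Shroudbase's actual method, which the article does not describe in detail. The function names, the made-up age data and the privacy parameter epsilon are all assumptions introduced for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer "how many records match?" with Laplace noise added.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. A smaller epsilon (an assumed, tunable
    parameter) means more noise and stronger protection.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 50,000 synthetic ages, invented for this example.
rng = random.Random(0)
ages = [20 + (i % 60) for i in range(50_000)]

noisy = private_count(ages, lambda age: age >= 65, epsilon=0.5, rng=rng)
print(f"Noisy count of people aged 65 or older: {noisy:.1f}")
```

Each run with a different random seed returns a slightly different answer, yet with tens of thousands of records the noise is tiny next to the count itself, which matches the “slightly wrong” behavior described in the quote.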

Obviously, this doesn’t work with a data set of 10 people or 20, but Nerurkar said the technique works well starting at about 50,000 entries in a structured database, either SQL or Excel.
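A rough way to see why size matters, assuming the standard Laplace mechanism on a simple counting query (an assumption, since the article does not specify the algorithm): the noise has the same expected magnitude regardless of how many rows the database holds, so its share of the answer shrinks as the count grows. The epsilon value of 0.1 and the function name below are hypothetical choices for illustration.

```python
def expected_relative_error(true_count: int, epsilon: float) -> float:
    """Expected |noise| divided by the true count for a Laplace-noised count.

    Laplace noise with scale b has expected absolute value b, and a
    counting query uses b = 1 / epsilon.
    """
    if true_count <= 0 or epsilon <= 0:
        raise ValueError("true_count and epsilon must be positive")
    return (1.0 / epsilon) / true_count


# Compare a query that matches 20 people with one that matches 50,000.
for count in (20, 50_000):
    error = expected_relative_error(count, epsilon=0.1)
    print(f"count={count}: expected relative error is about {error:.4%}")
```

Under these assumptions the noise is about half the size of a 20-person answer but a small fraction of a percent of a 50,000-person answer. The 50,000-entry threshold itself is Nerurkar's figure, not something this sketch derives.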

“The algorithms involved in [differential privacy] make it mathematically impossible to identify any individual in a privatized database and also ensure that aggregate analysis of a dataset is virtually unaffected by this protection,” he said. The company has consulted with lawyers and other HIPAA experts to make sure the technique complies with current regulations, and it is now ready to start working in the commercial sphere.

“The aim of differential privacy is future-proofing,” Nerurkar said. “There are a lot of cases where datasets were exploited through legitimate querying, not hacking, and then looking at other data sets with different information to gain insights via a linkage attack.” He noted that Netflix’s data release resulted in this kind of breach, as did Massachusetts’ health info release, when Latanya Sweeney famously re-identified the governor.
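The linkage attack Nerurkar describes can be sketched in a few lines. The Python example below joins a “de-identified” table to a public one on shared attributes (ZIP code, birth date and sex, the kind of quasi-identifiers used in the Massachusetts case). Every record, name and field here is invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def link_records(deidentified, public, keys):
    """Return (name, record) pairs where the shared keys match exactly one person."""
    # Index the public records by their quasi-identifier values.
    index = defaultdict(list)
    for row in public:
        index[tuple(row[k] for k in keys)].append(row)

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # A unique match re-identifies the "anonymous" record.
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], row))
    return matches


# Hypothetical toy data: names were removed from the health table,
# but ZIP code, birth date and sex were left in place.
health = [
    {"zip": "00001", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition X"},
    {"zip": "00002", "birth_date": "1960-02-02", "sex": "F", "diagnosis": "condition Y"},
]
voters = [
    {"name": "Person A", "zip": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Person B", "zip": "00003", "birth_date": "1970-03-03", "sex": "F"},
]

for name, record in link_records(health, voters, ["zip", "birth_date", "sex"]):
    print(f"{name} is linked to {record['diagnosis']}")
```

No hacking is involved: both tables are queried legitimately, and the harm comes from combining them. That is the scenario differential privacy aims to guard against by bounding what any query result can reveal about one person.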

“We want to make sure regardless of what information they might get in the future, it’s impossible to identify the individual with high probability,” he said.

Author

Sam Pfeifle Nonmember Contributor

The Privacy Advisor