What’s the best way to protect people’s personal health information (PHI) while using the data to benefit society? That’s a crucial question for physicians and their patients, as well as for the epidemiologists, health researchers and public officials who rely on high-quality data to improve the delivery of healthcare, cure diseases and stop pandemics.
To mitigate privacy concerns while enabling researchers to analyze such data, one solution is to keep it encrypted even when it’s being analyzed. That’s possible now thanks to software developed by Khaled El Emam, an associate professor at the University of Ottawa and a Canada Research Chair in Electronic Health Information.
His software, named “Smart Platform,” uses a cryptographic technique known as secure multi-party computation to split encrypted data between multiple semi-trusted third parties (in this case, located at academic institutions across Canada) that serve as computing engines. Think SETI@home, only for encrypted, aggregate health data.
Analyzing Encrypted Health Data
As a result, researchers can see summaries of results without having to view full data sets.
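Smart Platform’s internals aren’t public, but the core idea behind secure multi-party computation can be illustrated with additive secret sharing: each data provider splits a value into random shares, so no single computing party ever sees a real number, yet the shares still combine into the correct aggregate. The clinic counts and party setup below are invented for illustration:

```python
import secrets

PRIME = 2**61 - 1  # all share arithmetic is done modulo a large prime

def split_into_shares(value, n_parties):
    """Split an integer into n random additive shares (mod PRIME)."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; only their sum reveals anything."""
    return sum(shares) % PRIME

# Hypothetical example: three clinics each report a case count,
# split across three semi-trusted computing parties.
clinic_case_counts = [12, 7, 30]
party_running_totals = [0, 0, 0]

for count in clinic_case_counts:
    for party, share in enumerate(split_into_shares(count, 3)):
        # Each party adds its share to a running total. Any individual
        # share is a uniformly random number and leaks nothing on its own.
        party_running_totals[party] = (party_running_totals[party] + share) % PRIME

# Only when the parties pool their totals does the aggregate emerge.
total_cases = reconstruct(party_running_totals)
print(total_cases)  # 49: the sum, with no clinic's count ever exposed
```

The summary statistic emerges without any party, or the researcher, viewing an individual clinic’s contribution, which matches the “summaries, not full data sets” model described above.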
“With privacy laws, you have this minimal necessary requirement, which basically says, only collect the data you need for the purpose—and this meets the concept,” said El Emam. On the other hand, if researchers spot and want to track an outbreak, such as a new H1N1 pandemic, the software includes an automated “break the glass” capability to reveal the identity of information providers, although medical practices must cooperate for it to work. “The practices that provide data retain some control over their data that way and are always aware of more detailed data being requested,” he said.
The stored data, while encrypted, would be susceptible to cracking via collusion, in which multiple parties combine their separate data shares to reconstruct the underlying records. Accordingly, El Emam said that data sets for the pilot projects will be located in multiple jurisdictions. “We’re setting this up so one is in Ontario and the other is in Quebec, so if collusion is forced, a case has to be made that it’s justified.”
Two organizations are currently in negotiations to pilot the software (for free): a hospital infection surveillance program tracking “superbugs,” such as MRSA, and a surveillance program for sexually transmitted diseases. In Canada, for both types of data, “reporting is mandatory, but poor,” said El Emam. Accordingly, he hopes to build his software into electronic medical records (EMR) software or to provide a standalone hardware appliance. Either approach would encrypt collected PHI before it leaves the premises, to maximize privacy while facilitating research.
Sharing De-Identified Patient Data
El Emam is no stranger to data encryption, or de-identifying health data to remove personal information. He also heads Ottawa-based Privacy Analytics, which he founded in 1997 to commercialize his Privacy Analytics Risk Assessment Tool (PARAT), which de-identifies data and assesses re-identification risk. For public health purposes, the tool is used to share entire sets of data, rather than just predetermined data points.
The software provides “a different kind of solution to the same problem” addressed by secure multi-party computation, he said. “If you go back to the minimal necessary issue, in some situations you really do need access to the full data set, because you need to run sophisticated algorithms” rather than just seeing data summaries.
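PARAT’s actual metrics aren’t detailed here, but one standard way to quantify re-identification risk in a full data set is k-anonymity: every record must share its quasi-identifier values (age band, postal prefix, and so on) with at least k−1 other records. A minimal sketch, using made-up, already-generalized records:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that are
    indistinguishable on the given quasi-identifier columns."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Invented records in which exact ages and full postal codes have
# already been coarsened into bands and three-character prefixes.
records = [
    {"age_band": "30-39", "postal_prefix": "K1A", "diagnosis": "influenza"},
    {"age_band": "30-39", "postal_prefix": "K1A", "diagnosis": "asthma"},
    {"age_band": "40-49", "postal_prefix": "M5V", "diagnosis": "influenza"},
    {"age_band": "40-49", "postal_prefix": "M5V", "diagnosis": "diabetes"},
]

k = k_anonymity(records, ["age_band", "postal_prefix"])
print(k)  # 2: each record is indistinguishable from at least one other
```

A higher k means an adversary who knows a patient’s age band and neighborhood still cannot single out that patient’s row, while researchers keep every record for their algorithms.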
For example, the Heritage Provider Network Health Prize used PARAT to de-identify the data set it’s provided as part of its $3 million competition “to develop a predictive algorithm that can identify patients who will be admitted to the hospital within the next year using historical claims data.”
Sharing Health Data: Promise And Peril
Another PARAT user is oncologist Dr. Craig Earle, a senior scientist at the Institute for Clinical Evaluative Sciences (ICES) and the Sunnybrook Health Sciences Centre in Toronto. He’s linked a cancer registry managed at Cancer Care Ontario (CCO) with administrative data from ICES. Called “cd-link,” his program shares the linked, de-identified data directly with researchers.
But even with the data de-identification tool that allowed him to launch cd-link, he still faces legal hurdles. “There are very conservative interpretations of the privacy laws in Canada,” he said. For example, he’s not allowed to share the data with researchers outside of Ontario, even in Alberta or British Columbia. “The reason for this restriction is not obvious,” he said.
Data de-identification, of course, is a hot-button issue, with critics highlighting failures that led to the re-identification of individuals in AOL search data, Netflix Prize movie ratings and pre-HIPAA Massachusetts insurance data. But El Emam, a de-identification proponent, said that in those cases, the data wasn’t de-identified, at least not correctly.
In a health context, furthermore, collecting PHI is a “when, not if” proposition. Laws in numerous countries, including Canada and the United States, permit—and sometimes require—that physicians and laboratories share certain types of health data. Accordingly, why not better secure it and ensure it’s made as anonymous as possible?
Consider The Public Good
Data de-identification, however, isn’t bulletproof. Rather, it’s a risk-management exercise, meaning that if done correctly, it can make data quite difficult to re-identify—if anyone should go to the trouble. “Does anyone really want to spend the time and money required to try to re-identify a kidney cancer patient? And even if they did, how much real harm would occur?” said Earle at ICES.
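Earle’s risk-management framing can be made quantitative. Under a common model, the worst-case re-identification probability is one divided by the size of the smallest group of records sharing the same quasi-identifiers, so generalizing those fields drives the risk down. A sketch with invented data:

```python
from collections import Counter

def worst_case_risk(records, quasi_ids):
    """Worst-case probability that an adversary who knows a target's
    quasi-identifiers picks out the right record:
    1 / (size of the smallest indistinguishable group)."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return 1.0 / min(groups.values())

raw = [
    {"birth_year": 1971, "postal": "K1A 0B1"},
    {"birth_year": 1974, "postal": "K1A 0B2"},
    {"birth_year": 1985, "postal": "M5V 1J1"},
    {"birth_year": 1988, "postal": "M5V 2K2"},
]
# With exact values, every record is unique: worst-case risk is 100%.
risk_raw = worst_case_risk(raw, ["birth_year", "postal"])
print(risk_raw)  # 1.0

# Generalize to birth decade and the first three postal characters.
generalized = [{"decade": r["birth_year"] // 10 * 10,
                "postal3": r["postal"][:3]} for r in raw]
risk_generalized = worst_case_risk(generalized, ["decade", "postal3"])
print(risk_generalized)  # 0.5
```

Done well, coarsening a few fields can cut the worst-case odds sharply while leaving the data useful, which is exactly the trade-off a risk-management exercise is meant to tune.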
On the flip side, consider the potential good that can come from better health data collecting and sharing. “Looking at the balance of all of this (the possibility of learning new things about how to treat the disease or ways to better manage our healthcare system), most of the time the likelihood of benefit will greatly outweigh the likelihood of harm,” he said.