Medical researchers often contend with a requirement that competes with progress: protecting privacy. Under the Health Insurance Portability and Accountability Act (HIPAA), healthcare providers, health insurers and healthcare clearinghouses often have to obtain additional documentation before releasing health information to outside parties, making researchers’ data collection practices cumbersome.


Thanks to a National Institutes of Health (NIH) grant and two researchers at the University of Massachusetts Lowell’s Manning School of Business, some of that difficulty may soon be resolved.


The $700,000 NIH grant will help Profs. Xiaobai Li and Luvai Motiwalla and two assistants create technology that will resolve the tension between adequate healthcare privacy provisions and research needs.


Called data-masking technology, it will allow researchers to access more meaningful data without having to comply with rigid consent requirements under HIPAA. Healthcare providers must remove 18 points of sensitive data (such as names, Social Security numbers and dates of birth) before releasing information, unless an established legal agreement precludes such requirements. But with data-masking technology, only five or six data points must be removed.


“So it improves data quality, because more variables are released,” said Motiwalla. “It’s a win-win for both the people sharing (the data) as well as people who are using the data for analysis, because the people releasing it don’t have to be afraid of data compliance, and identity cannot be revealed.”


While data that’s been de-identified according to HIPAA’s standards can lawfully be shared without specific authorizations, such data often proves less useful in conducting meaningful, comprehensive research, the professors said.


“To analyze the data, you need to provide information,” said Li. “But at the same time, you want to protect the individuals—like patients or even doctors. That’s the kind of tension that we try to address.”


The entire process starts with the data controller.


“So if a researcher wants data from a hospital, the hospital will run through the query and remove data from the database,” Motiwalla explains. “Then, our software would be applied to the data, which would then mask it, and the masked data would be released to the third party.”
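The workflow Motiwalla describes can be sketched in a few lines. This is only an illustration with hypothetical function and field names, not the team’s actual software:

```python
def release_to_third_party(database, query, mask):
    """Sketch of the release workflow:
    1. the data holder runs the query against its database,
    2. the masking software transforms each matching record,
    3. only the masked records leave the organization.
    """
    raw = [row for row in database if query(row)]   # step 1: select matching records
    masked = [mask(row) for row in raw]             # step 2: apply the mask
    return masked                                   # step 3: release to the researcher
```

In this sketch the unmasked records never leave the data holder; the third party only ever sees the output of `mask`.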


Data masking is different from encryption, however, and addresses a problem thus far unresolved.


“If you encrypt the data, you cannot do any statistical analysis,” said Li.


Data masking can work in a couple of different ways. One is called statistical perturbation. By applying statistical noise to the data or, for example, adding the same number to each data subject’s age, the released value is no longer the true value, though its statistical properties are preserved. Alternatively, data swapping can be used, in which one person’s age, for example, is swapped with another’s in the database.
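A minimal sketch of the two approaches, assuming a toy record layout with an `age` field (the helper names are hypothetical, not the project’s code):

```python
import random

def perturb_ages(ages, shift=None, noise=2):
    """Perturbation: add a constant shift, or small random noise, to each age.

    The released values are no longer true, but aggregate properties
    (differences, variance under a constant shift) are preserved or
    approximately preserved.
    """
    if shift is not None:
        return [a + shift for a in ages]                     # same offset for everyone
    return [a + random.randint(-noise, noise) for a in ages]  # bounded random noise

def swap_ages(records):
    """Swapping: permute the 'age' value across records, so no record
    keeps its true age but the overall age distribution is unchanged."""
    ages = [r["age"] for r in records]
    random.shuffle(ages)
    return [dict(r, age=a) for r, a in zip(records, ages)]
```

With a constant shift, `perturb_ages([30, 40], shift=5)` yields `[35, 45]`; after `swap_ages`, the multiset of ages in the dataset is identical to the original, only their assignment to records changes.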


“So what is released of the data, in terms of age or date of birth, will not be a true value, but in terms of statistical properties, it’s still the same,” Motiwalla said. Data can also be generalized so that instead of a full street address, a zip code with its final digits truncated is used.
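Generalization of this kind can be sketched as simple truncation. The helper below is a hypothetical illustration, keeping only a configurable number of leading digits:

```python
def generalize_zip(zip_code, keep=3):
    """Generalize a zip code by keeping only its first `keep` digits
    and masking the rest, reducing how precisely it locates a person."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)
```

For example, `generalize_zip("01854")` returns `"018**"`, identifying a region rather than a neighborhood.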


Motiwalla and Li are now collecting patient data available under HIPAA in order to identify vulnerabilities. They hope to collect data from organizations outside healthcare as well, since data masking could potentially serve myriad applications across sectors, including non-research uses such as keeping databases safe from hackers or data loss.


Motiwalla and Li encourage organizations interested in getting involved to contact them.


In the end, say Motiwalla and Li, their purpose is to find a way for research to thrive while still maintaining appropriate privacy protections for individuals.


“With data masking, that balance is possible,” said Li.