Cities around the world are getting smarter.
For municipalities, becoming a smart city requires utilizing numerous data sets containing residents’ and visitors’ personal information in order to provide better services.
However, municipal IT systems are often targets for cyberattacks. Compounding the issue, some cities may rely on outdated cybersecurity infrastructure to safeguard personal data.
In Canada, a new effort has launched to help cities transition from using personal data to using synthetic data: Toronto-based nonprofit Innovate Cities and synthetic data generation provider Replica Analytics have joined forces. The partnership will provide municipalities with synthetic data based on real-life data points to achieve smarter solutions for each city.
As part of the partnership, municipalities will provide Innovate Cities with a data set for a specific aspect of their municipal operations they seek to incorporate into a smart city endeavor. Innovate Cities then transfers the data set to Replica Analytics, which will apply its synthetic data generation (SDG) technology, training machine-learning models on the municipal data containing personal information to produce a synthetic replica of it for Innovate Cities’ data trust, CityShield.
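To make the idea concrete, the general pattern of fitting a model to real records and then sampling new ones can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not Replica Analytics’ method: the field names (`rider_id`, `route`, `minutes`), the sample records and the functions `fit_model` and `sample_synthetic` are all hypothetical, and a per-route normal distribution stands in for the machine-learning models a real SDG product would train.

```python
import random
import statistics
from collections import Counter, defaultdict


def fit_model(records):
    """Learn a tiny generative model from real trip records (toy stand-in
    for a trained ML model): how often each route occurs, plus the mean
    and spread of trip length within each route."""
    route_counts = Counter(r["route"] for r in records)
    durations = defaultdict(list)
    for r in records:
        durations[r["route"]].append(r["minutes"])
    stats = {
        route: (statistics.mean(vals), statistics.pstdev(vals))
        for route, vals in durations.items()
    }
    return route_counts, stats


def sample_synthetic(model, n, seed=0):
    """Draw n brand-new rows from the fitted model. The rows carry no
    rider identifier because the model never stored one."""
    route_counts, stats = model
    rng = random.Random(seed)
    routes = list(route_counts)
    weights = [route_counts[r] for r in routes]
    rows = []
    for _ in range(n):
        route = rng.choices(routes, weights=weights)[0]
        mean, spread = stats[route]
        minutes = max(1.0, rng.gauss(mean, spread))  # keep durations positive
        rows.append({"route": route, "minutes": round(minutes, 1)})
    return rows


# Hypothetical "real" data containing a personal identifier.
real = [
    {"rider_id": "A17", "route": "504", "minutes": 22},
    {"rider_id": "B02", "route": "504", "minutes": 31},
    {"rider_id": "A17", "route": "29", "minutes": 14},
    {"rider_id": "C44", "route": "29", "minutes": 18},
    {"rider_id": "D90", "route": "504", "minutes": 27},
]

model = fit_model(real)
synthetic = sample_synthetic(model, 1000)
```

The synthetic rows roughly preserve how trips are split across routes and how long trips on each route tend to last, but none of them corresponds to an actual rider. A toy model like this offers no formal privacy guarantee; assessing re-identification risk in a real data set takes considerably more care.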
Borrowing from his prior experience with Google’s urban planning and infrastructure subsidiary, Sidewalk Labs, Innovate Cities Executive Director Hugh O’Reilly said he and his team sought to create a data trust for citizens that “treats privacy as a fundamental right, of citizens and residents, and people in cities.” He said Innovate Cities views the EU General Data Protection Regulation as the “gold standard” for personal data protection and security laws. His organization aims to protect municipal data up to the EU legal standard, beyond Canada’s existing privacy laws.
“When we started to look at the governance principles, we thought the most important principle to abide by was earning the trust of people who live, work and play in urban areas where that data trust is going to be operating,” O’Reilly said. “We want to demonstrate to citizens, residents and people who come to visit cities where CityShield operates, as well as to set an example that you can have a data-sharing platform, you can have a data trust and it can also abide by the highest standards. So, we set those highest standards for ourselves.”
Ryerson University’s Privacy and Big Data Institute Executive Director Dr. Ann Cavoukian is Innovate Cities’ independent chief privacy officer. Cavoukian, a former information and privacy commissioner of Ontario, previously signed on to be a privacy advisor for Sidewalk Labs in Toronto. However, she said she resigned from the post after the organization did not require partnering businesses to employ privacy-by-design practices.
She accepted the position at Innovate Cities because she saw a strong commitment to privacy by design and an interest in using synthetic data, which she considers a “highly efficient, effective and accurate” means of replicating data.
“I made it clear to (them) I would be happy to work with them if they walk the walk and deidentify the data source and if (we) used a really reputable company like Replica Analytics. Khaled El Emam is a leader in this area,” Cavoukian said. “(Synthetic data) doesn't present any of the risks associated with the real data, which have the personal identifiers linked to it. In my view, it’s going to be the future in terms of privacy and data utility.”
El Emam, senior vice president and general manager of Replica Analytics, said synthetic data achieves greater individual privacy where traditional deidentification and anonymization techniques can fall short in part because of a worldwide deficit of technical expertise in those areas.
“The privacy risks with synthetic data tend to be lower than deidentification and anonymization, and the quality of the data will be a bit higher,” El Emam said. “It’s largely automated, so you don't need as much skill to create the deidentified data sets. One of the big challenges, for example, with deidentified data is that there's not enough people around who have that skill set to do it.”
While the Innovate Cities-Replica Analytics partnership is in its early stages, O’Reilly said he hopes the initiative “demonstrates the best behaviors” around privacy for the further modernization of cities. He said Canadian municipalities currently span a wide “continuum” in deploying smart city technologies, ranging from minimal integration to several implemented smart initiatives.
“Some cities are further along than others, and a common issue that is faced by cities in Canada, but I suspect cities across North America and even around the world, is municipal governments are really overburdened,” O’Reilly said. “So that level of busyness creates issues ... if you are a civic government trying to do things, you might have 20 different innovators come forward, or 20 different pieces of software come to your attention to solve one problem.”
O’Reilly said one potential example of CityShield’s synthetic data sets in a smart city context could be making public transit more efficient. Hypothetically, he said, a CityShield data set could combine the ridership rates of a city bus line with cellular location data to get a better sense of how frequently individual riders take the bus.
“When the data is available on the data trust, you won't be able to tell who was on the bus, so to speak, and you won't be able to re-identify them very easily,” O’Reilly said. “When we talk about the innovation economy, the biggest barrier for innovators and others to get access to data, and to make the best decisions, is to do it in a way that respects the privacy rights of individuals. What we've endeavored to do is to create a data sharing platform that not only is world-leading in terms of the technology we use, but to combine it with world-leading privacy practices as well.”
El Emam said urban planning is not an entirely new purpose for synthetic data, as several U.S. cities have employed it in various contexts. He said Replica Analytics’ SDG technology has been successfully implemented in the healthcare industry to protect patients’ medical data while maintaining its statistical properties.
“There is no inherent limitation (to) what (data sets) can be synthesized and what cannot be synthesized, … in terms of using synthetic data as a mechanism for data-sharing in cities, I think that's more recent,” El Emam said. “Part of it is figuring out which tool in the toolbox can be used, or should be used, for a particular data set to give you reasonable, privacy-protected results, so that you still maintain the utility of the data. We’ll have to see what data sets (from cities) come through and decide which machine learning tools are best suited for it.”