Rafae Bhatti is a panelist in a session titled “Analytics and Privacy Can Go Together: Engineering the Analytics Platform with Data Protection in Mind” that will be held at the Privacy Engineering Section Forum Sept. 23 in Las Vegas.
If data is the new oil, then analytics is the new refinery: without it, a modern business cannot make informed decisions. However, data analytics and privacy are seldom assumed to go together. If media reports and regulatory actions are any indication, services and platforms that use or enable analytics have been under consistent scrutiny over whether they meet reasonable privacy expectations.
Conflict between data analytics and privacy
Legitimate needs for data analytics
Not all analytics are created equal. On the one hand, organizations may analyze non-identifying website traffic to better understand the trends affecting their business. On the other, they may build aggregate (non-identifying) clusters based on user preferences and engagement patterns to inform product-improvement decisions. Both use cases are legitimate under the law when carried out with appropriate protections built into the process.
Consequences of ignoring privacy
Exposure to legal, policy, technical and organizational risks is among the consequences of ignoring privacy considerations in the design phase. When sensitive data is not appropriately identified, sanitized and isolated, it becomes difficult to conduct privacy impact assessments and honor the obligations associated with that data. Ignoring privacy also creates a reactive organizational culture, which not only reflects poorly on a company's commitment to its customers but can also erode employee confidence should the company consciously avoid making prudent privacy decisions. Finally, making design changes to respect privacy considerations after the fact is a significant burden: technical dependencies get in the way, multiple applications end up with inconsistent processes, and the business is forced into undesirable trade-offs.
Undesirable privacy trade-offs
When companies are forced to settle for trade-offs, these often include limiting features to certain audiences (take geography-based restrictions as an example: if you walled off European users to “deal with” the EU General Data Protection Regulation, are California users next now that the California Consumer Privacy Act has come along? And what happens when other states considering such measures join the bandwagon?), creating a separate tier and charging extra, or making a superficial change. However, trade-offs, as the name suggests, are just that: a compromise. They don’t scale well. And when the compromises prove hard to live with, companies become the subject of media stories or regulatory actions, and distrust among users rises.
Consequences of ignoring analytics
Increased scrutiny of a business’s privacy practices does not mean it should ignore or clamp down on analytics; doing so has significant downsides. Identifying the trends affecting the business is essential to setting strategy, and companies that don’t track those trends lose sight of what matters. Understanding what works and what doesn’t helps a business grow, so failing to tap into users’ preferences reduces the likelihood of growth. Lastly, ignoring the value of analytics is a missed opportunity to solve real problems: good analytics surfaces actual business problems, and the ability to ask the right questions enables solutions that address them.
Unlocking value of analytics while navigating privacy
Companies can get ahead of this conundrum by building the right data-protection foundations into the platform. Privacy should be an important design consideration from the start, or at least incorporated later through an adaptable approach. Additionally, privacy must not only be baked into the engineering process; the culture itself must encourage it as a prudent course of action. This is how we have designed our data analytics platform at Mode.
Design considerations for privacy
Sanitizing data feeds
It is important to know when not to keep personal data. Applications not used for the purpose of analyzing sensitive personal data should not have access to it. The most obvious risk is the accumulation of data in various places, some of which may be unknown. There is also the risk of applications breaking when this data is suddenly removed as an afterthought.
Good business practice: Keep sensitive customer data out of analytics databases. This can be achieved by programmatically controlling application access to sensitive data.
How it’s done at Mode: The design decision we made at Mode was to keep sensitive customer data out of specific analytics databases that were not designed for that purpose. To do that, we excluded tables and columns that have sensitive fields, as well as fields that may contain sensitive data. A dedicated script sanitizes the data feed going into the analytics databases. In doing so, we make sure that applications not designed for the purpose of analyzing sensitive personal data will not have access to it.
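A feed-sanitization step of this kind can be sketched as a simple denylist filter. This is an illustrative sketch only, not Mode's actual script; the table and column names are invented:

```python
# Hypothetical denylist of sensitive tables and columns. In practice
# this would be maintained alongside the schema, not hardcoded.
SENSITIVE_TABLES = {"payment_methods", "auth_tokens"}
SENSITIVE_COLUMNS = {
    "users": {"email", "full_name", "ip_address"},
}

def sanitize_feed(table, rows):
    """Drop sensitive tables entirely and strip sensitive columns
    before the feed reaches the analytics database."""
    if table in SENSITIVE_TABLES:
        return []  # the whole table stays out of analytics
    blocked = SENSITIVE_COLUMNS.get(table, set())
    return [
        {col: val for col, val in row.items() if col not in blocked}
        for row in rows
    ]

clean = sanitize_feed("users", [
    {"id": 1, "email": "a@example.com", "plan": "pro"},
])
# 'email' never reaches the analytics database; 'id' and 'plan' do
```

Running the filter at the feed boundary, rather than trusting each downstream application to ignore sensitive fields, is what makes the access control programmatic.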
Splitting up data workflows
Another important design consideration is the ability to identify where personal information is present in the data processing pipeline. Otherwise, analytics becomes challenging, and businesses run the risk of violating data privacy obligations.
Good business practice: Adopt a walled-garden strategy, and split up the processes to trace sensitive data and prevent it from spreading into unintended or unknown places.
How it’s done at Mode: We decided to split the data-processing pipeline into projects that include sensitive customer data and those that do not. Doing so sets distinct levels of scrutiny depending on the sensitivity of each pipeline. We can cast a much wider net on marketing pages in terms of what third-party scripts are allowed to run than on other pages in the application that process sensitive customer data.
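One way to picture the walled-garden split is to tag each project by sensitivity and derive policy from the tag. The project names, script names and policies below are invented for illustration:

```python
# Each pipeline project carries a sensitivity tag.
PROJECTS = {
    "marketing_site": {"sensitive": False},
    "customer_query_etl": {"sensitive": True},
}

# Policy is keyed off the tag: non-sensitive projects can cast a wider
# net on third-party scripts, sensitive ones get a strict allowlist.
ALLOWED_SCRIPTS = {
    False: {"analytics.js", "ads.js", "chat-widget.js"},
    True: {"analytics.js"},
}

def script_allowed(project, script):
    """Check whether a third-party script may run in a given project."""
    sensitive = PROJECTS[project]["sensitive"]
    return script in ALLOWED_SCRIPTS[sensitive]
```

Because the policy follows the tag rather than individual pages, adding a new sensitive pipeline automatically inherits the stricter scrutiny.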
Isolating sensitive data storage
Having a plan to isolate data allows a business to keep things manageable. Separating sensitive data gets challenging when storage is spread across several places, and too many data repositories with multiple dependencies introduce inconsistencies when requirements are applied.
Good business practice: Separate the storage of sensitive data from non-sensitive data. Adding friction to, but not blocking, access to sensitive data can serve as a signal that it merits a higher degree of scrutiny.
How it’s done at Mode: We separated the storage of customer query results from the queries that were run and their metadata. Because sensitive information is found in the results of a query and typically not in the query itself, storing the results separately allows us to isolate the one repository that needs the most restrictive policies, such as encryption at rest and data retention rules. Applying those policies to the entire storage would be technically challenging and cost-prohibitive.
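The metadata/results split can be sketched as two stores with different retention rules, so the restrictive policy applies only to the results side. This is a minimal sketch under assumed names; the stores, retention window and functions are hypothetical, and real systems would use an encrypted bucket rather than in-memory dictionaries:

```python
from datetime import datetime, timedelta

RESULTS_RETENTION = timedelta(days=90)  # assumed retention window

metadata_store = {}  # queries + run metadata: long-lived, broadly accessible
results_store = {}   # sensitive results: restrictive policies apply here only

def save_query(query_id, sql, rows):
    """Store the query text and its results in separate repositories."""
    now = datetime.utcnow()
    metadata_store[query_id] = {"sql": sql, "run_at": now}
    results_store[query_id] = {"rows": rows, "expires": now + RESULTS_RETENTION}

def purge_expired(now=None):
    """Apply the retention rule to the results store alone."""
    now = now or datetime.utcnow()
    for qid in [q for q, r in results_store.items() if r["expires"] <= now]:
        del results_store[qid]  # results age out; query metadata is kept
```

Keeping the retention and encryption burden on a single, small repository is what makes the restrictive policies affordable.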
Privacy culture to support prudent decisions
These privacy considerations will not be effective if they are merely coincidental or mechanically observed. They must reflect the right level of encouragement as part of a company's culture. Employees should be encouraged and empowered to make the right decisions for the right purpose. Leadership must prioritize the protection of customer data over short-term gains. The true test of whether your culture values privacy is asking yourself this simple question: What are you willing to give up for it?
Increased scrutiny should be viewed as an opportunity to build better privacy protections into analytics services, not as a reason to provide less value. Making prudent design decisions and adopting a culture that encourages them helps a business unlock the potential of data analytics while avoiding the negative consequences of ignoring privacy considerations.