Data has become a four-letter word. We even hesitate to apply data to estimating disease spread or planning health care, for fear that once it is collected and shared, it could be misused for an altogether different purpose.

Consumers have been conditioned to look for data misuse. Accidental data leaks (Virgin Media), concerns about national election security or simply poor data protection practices (British Airways) are fueling suspicions. Rapid technological innovation has increased complexity. Even sophisticated users struggle to define exactly what data is being collected, when it is being collected, by what devices or applications, and who it is shared with.

For most businesses, the ultimate value lies not in the data itself but in its insights and applications. Enterprises at the forefront of digital transformation and artificial intelligence have already made progress in defining responsible data use. Data use is responsible when it achieves the purpose of the application and that purpose is well crafted and narrowly constructed, limiting the need to identify any individual consumer.

Establishing privacy-first principles and proactively adopting responsible use standards as the rule will not only build consumer trust, but also help scale digital transformation and AI, thus accelerating consumer value. As consumers recognize the benefits, their confidence in businesses’ ability to respect data privacy will increase, creating a virtuous cycle.

From PII to PI to responsible use

Historically, data privacy centered on personally identifiable information (PII): data that could identify a particular person or be used to commit fraud (e.g., name, Social Security number, driver's license number, bank account number). In recent years, the focus has broadened to persistent information (PI), such as advertising IDs, IP addresses and cookies, because this pseudonymous data reflects an individual’s behavior and may be used to identify them indirectly when combined with other information.

With the rise of 5G and the Internet of Things, the volume and variety of data interactions will explode, creating further gray areas. According to Strategy + Business, the average user produces 4,900 data interactions per day, and the amount of data stored per internet user is expected to double by 2025. Because legislation lags behind technology, it falls to businesses to lead standards for the “responsible use of data.” These standards address “should we do this?” rather than “can we do this?”

Making responsible use the rule 

Responsible data use starts at the conceptualization of any application or platform. The problem statement should be based on aggregated and deidentified data, and the application or platform should only process privacy-aware data. This may sound like complex legal jargon, but it is based on four straightforward principles.

Aggregate vs. individual insight

The platform's goal should not require identifying any specific individual, and the platform should require a minimum threshold of scale for inclusion (e.g., a certain number of records). Data that does not meet this threshold should be discarded so that isolated cases are not retained.

For example, expanding public transit requires population counts segmented by demographic (to optimize for specific groups, such as the elderly). Identifying a specific individual's commute pattern will not improve these efforts. There are countless similar use cases that require the application of AI to large, aggregated, high-quality datasets.
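To make the threshold rule concrete, here is a minimal sketch of aggregation with a minimum group size, using the transit example. The field name `age_band` and the threshold of 10 are illustrative assumptions; the principle above does not fix a number. Only per-group counts leave the function, and any group below the threshold is dropped rather than kept as an isolated case.

```python
from collections import Counter

# Hypothetical minimum group size; the right value depends on the use case.
MIN_GROUP_SIZE = 10


def aggregate_with_threshold(records, key, min_size=MIN_GROUP_SIZE):
    """Count records per group and discard groups smaller than min_size.

    `records` is an iterable of dicts and `key` names the (hypothetical)
    field to group by. Only group-level counts are returned; individual
    records are not retained.
    """
    counts = Counter(r[key] for r in records)
    return {group: n for group, n in counts.items() if n >= min_size}


# Example: 12 riders aged 65+ and 3 aged 18-34 (illustrative data only).
riders = [{"age_band": "65+"}] * 12 + [{"age_band": "18-34"}] * 3
summary = aggregate_with_threshold(riders, "age_band")
```

With the default threshold, the 18-34 group is discarded because it has only three records, so a planner sees a count for the elderly segment without any small group that could single out individuals.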

First-party consent and deidentification

It is not enough to confirm that the first-party data source collected consumer consent. Organizations need to verify that the data was obtained with consumers’ rights and benefits in mind. Consumers should not only consent to sharing data but also be able to change their participation at any time and understand the purposes and benefits for which the data is being collected.

Once consent is validated, all personal or user-level identifiers should be discarded, either by the first-party provider or within the product itself (some advanced platforms, for example, filter out user-level identifiers and use only aggregated or deidentified data).
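A minimal sketch of that filtering step might look like the following. The field names, including the `consented` flag and the set of identifier fields, are hypothetical; which fields count as user-level identifiers depends on the data source. Note that dropping direct identifiers is only a first step, since remaining fields such as location can still be combined with other sources, which is why aggregation thresholds still matter.

```python
# Hypothetical identifier fields; a real list depends on the data source.
USER_LEVEL_IDENTIFIERS = {"name", "email", "device_id", "ip_address", "ad_id"}


def strip_identifiers(record, identifiers=USER_LEVEL_IDENTIFIERS):
    """Return a copy of `record` without any user-level identifier fields."""
    return {k: v for k, v in record.items() if k not in identifiers}


def deidentify(records, identifiers=USER_LEVEL_IDENTIFIERS):
    """Keep only records with recorded consent, then drop identifier fields.

    Assumes each record carries a hypothetical boolean `consented` field
    set by the first-party provider.
    """
    return [
        strip_identifiers(r, identifiers)
        for r in records
        if r.get("consented") is True
    ]


rows = [
    {"name": "A", "device_id": "x1", "age_band": "65+", "consented": True},
    {"name": "B", "device_id": "x2", "age_band": "18-34", "consented": False},
]
clean = deidentify(rows)
```

Records without consent never reach the platform, and those that do arrive without name or device ID.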

Privacy-aware vs. personal data

Privacy-aware refers to using the outputs of deidentification or aggregation rather than original data that can identify a unique individual directly or indirectly. For example, a privacy-aware output could be an obfuscated (hashed) device ID, so that what the platform sees is a random string that is not directly identifiable to any individual. Rather than handling PII or PI, these systems consume only privacy-aware data. Further controls can be implemented to ensure that no individual user is identifiable, guarding against both human error and malicious action.
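As an illustration of the hashed device ID, here is a minimal sketch using a keyed hash (HMAC-SHA256 from Python's standard library) rather than a plain hash. That substitution is a deliberate assumption: device IDs follow predictable formats, so an unkeyed hash could be reversed by hashing candidate IDs and comparing. The function name is hypothetical, and the sketch assumes the key is held by the first-party provider and never shared with the consuming platform.

```python
import hashlib
import hmac
import secrets


def pseudonymize_device_id(device_id: str, key: bytes) -> str:
    """Map a raw device ID to a stable, opaque token with HMAC-SHA256.

    The same (device_id, key) pair always yields the same 64-character hex
    token, so records can still be counted or joined downstream, but the
    raw ID itself is not exposed.
    """
    return hmac.new(key, device_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: generated and kept by the first-party provider only.
SECRET_KEY = secrets.token_bytes(32)

token = pseudonymize_device_id("example-device-123", SECRET_KEY)
```

Because the token is stable, it is still pseudonymous rather than anonymous; it should feed into aggregation with a minimum threshold rather than be treated as safe on its own.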

Transparency and simplicity

Recent research emphasizes the importance of honest, friendly communication that helps consumers understand how the data they provide serves the stated purpose for which it is needed. Similar to the shifts seen after the Credit Card Act of 2009, companies should make their data practices more accessible and describe the complete data lifecycle, including the data source, the types of data interactions and the data retained in the platform.

Responsibility will rebuild trust

There will always be questions about the nature of deidentification and aggregation, with some claiming that “extrapolation” is still possible by mining location data and matching it with other public sources. However, without user-level identifiers, misuse becomes nearly impossible. In fact, the probability of privacy violations under responsible use should be no greater than that of the hundreds of manual, error-prone processes used to gather the same information today. This ongoing consternation reflects just how much consumer trust has eroded.

Businesses can and should rebuild consumer trust through their own leadership and adoption of responsible use standards. The standards for responsible use of data provide a framework for utilizing the power of data, without compromising fundamental rights of privacy.

Photo by Markus Spiske on Unsplash