25 Oct. 2023

Data quality, privacy joined at the hip

Data, when collected and deployed smartly and responsibly, can provide a substantial competitive edge, and artificial intelligence can be thought of as the catalyst that will take this data-driven economy to the next level. How data gets applied to software, and how businesses manage risk and capture value from AI initiatives, will be the differentiator.

Within this context, the future looks bright for business. The convergence of data and AI will offer unprecedented opportunities for enterprises — in everything from customer insights and enhanced operations, to market analysis and product innovation.

But there is one issue that threatens not only this future but AI's very existence: data quality.

All of us want to live in a world where we can build and scale disruptive new AI technologies, the kind of innovations that have the potential to make enterprise — and our lives — so much better. But for AI to flourish in this way, the technology needs access to sensitive data and many people are understandably cautious about handing their data over to machines.

This has created a tension between AI's potentially transformative value and very real concerns about data trust. If people don't trust AI, they won't willingly share their data with AI developers to make it better. And if that lack of trust takes hold at the regulatory level, it could stop the technology's development in its tracks.

Part of the problem is that we currently have no trust mechanism for AI models. There is no fiduciary relationship with AI comparable to the one we have with medical professionals, for example, whom we trust to store and manage our sensitive personal data. Concerns about trusting AI center mainly on sensitive data in the public sphere and on the fidelity of the results that models deliver.

And there will soon come a time when the data from individuals is a drop in the ocean compared with the vast outpouring of data generated by machines and embedded in machine learning algorithms. Because these algorithms are trained with data and may one day themselves provide training data for future programs, the conversation about data quality must be broadened to include them.

If we get it right, individual trust in these models will mean they can be trained with increasingly diverse data, which in turn will produce higher quality data outputs. Better data quality will translate to greater value from AI systems. With increased trust and confidence in the model from its users, the technology might get a chance to meet its transformative, almost utopian, world-changing potential.

Data engineers will be central to building the infrastructure that enables the growth and adoption of AI. But how should they approach this new data frontier?

Data discovery will play larger, more important role in conversations about data

The greatest challenge will be making a comprehensive, nonintrusive system of data observability the global business standard.

Data discovery should be thought of as a helicopter view of data infrastructure, and it will need to incorporate every element of a system. Observing data stacks is crucial, but that means more than tables: logs, trained models, executed pipelines and any upstream change that affects business users all count. The entire life cycle of data in systems, and the systems themselves, must be observed.
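
One small piece of that picture, catching an upstream change before it reaches business users, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `schema_drift` function, the table columns and the type names are hypothetical, and a real observability system would cover far more than schemas.

```python
from typing import Dict, List


def schema_drift(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two column-name -> type snapshots of the same table (hypothetical helper)."""
    shared = set(previous) & set(current)
    return {
        # Columns that appeared since the last snapshot
        "added": sorted(set(current) - set(previous)),
        # Columns that disappeared since the last snapshot
        "removed": sorted(set(previous) - set(current)),
        # Columns present in both snapshots whose declared type changed
        "retyped": sorted(c for c in shared if previous[c] != current[c]),
    }


# Made-up snapshots of one table, taken a day apart
yesterday = {"customer_id": "INT64", "email": "STRING", "signup_date": "DATE"}
today = {"customer_id": "STRING", "email": "STRING", "country": "STRING"}

drift = schema_drift(yesterday, today)
print(drift)
```

The same comparison could in principle be applied to snapshots of logs, pipeline definitions or model metadata, which is where the helicopter view comes from.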

Data governance should fall into this new observability framework, too. Ask yourself: is your data privacy-aware?

Data governance needs holistic approach

Data governance frameworks have traditionally been difficult to implement, with organizations struggling to strike the right balance between a system that is nonintrusive and one that effectively delivers quality insights and addresses issues. Where there's a lack of data governance, data can become inconsistent and laden with errors. This will only become more problematic in an AI-driven world.

Corporate executives should be able to access data governance dashboards, where every data point is checked. They need to know that there are full consent and permissions across systems and that their models meet regulatory requirements. There's a whole new layer of observability here, incorporating trust, compliance and respect, which will need to be built into the future data framework.
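
As a rough sketch of the kind of check that could sit behind such a dashboard, the Python below computes how many records carry consent for a given purpose and lists the ones that do not. Everything here is an assumption for illustration: the `Record` fields, the system names and the `"model_training"` purpose label are hypothetical, not a description of any particular product.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Record:
    """Hypothetical data point with the purposes its owner has consented to."""
    subject_id: str
    system: str
    consented_purposes: FrozenSet[str]


def consent_violations(records: Sequence[Record], purpose: str) -> List[Record]:
    """Return every record that lacks consent for the given purpose."""
    return [r for r in records if purpose not in r.consented_purposes]


def consent_coverage(records: Sequence[Record], purpose: str) -> float:
    """Share of records (0.0 to 1.0) that carry consent for the given purpose."""
    if not records:
        return 1.0  # nothing to check, so nothing is in violation
    return 1 - len(consent_violations(records, purpose)) / len(records)


# Made-up records spread across two systems
records = [
    Record("user-1", "crm", frozenset({"marketing", "model_training"})),
    Record("user-2", "crm", frozenset({"marketing"})),
    Record("user-3", "warehouse", frozenset({"model_training"})),
    Record("user-4", "warehouse", frozenset({"model_training"})),
]

coverage = consent_coverage(records, "model_training")
blocked = [r.subject_id for r in consent_violations(records, "model_training")]
print(f"coverage={coverage:.0%}, blocked={blocked}")
```

A dashboard would presumably surface the coverage figure per system and per model, and the blocked records would be held out of training until consent is in place.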

Improving models means improving quality of data

This will mean checking for trust at every single point of the data journey — from responsible collection, to use, to retention. Many businesses currently understand that when an individual asks for any personal data held on them to be deleted, they must comply.

With AI, this question of data consent is more difficult. Deleting data won't be enough. AI models may have already been trained on this data, and as far as the model is concerned, it may already be baked in. So solutions and systems will need to be designed that remove all traces of the data from the machine, including from its training mechanisms. Only by doing so can we truly say that we are respecting data and that we can be trusted to handle and use it.
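
A first step toward that is simply knowing which models a person's data went into. The Python sketch below assumes a hypothetical lineage registry (`TrainingLineage`) that records which data subjects contributed to each model version, so that a deletion request can flag the models that still embed the data. It does not remove anything from a trained model; retraining or unlearning would still have to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TrainingLineage:
    """Hypothetical registry of which data subjects each model version was trained on."""
    trained_on: Dict[str, Set[str]] = field(default_factory=dict)

    def record_training(self, model_version: str, subject_ids: Set[str]) -> None:
        # Remember every subject whose records were in this model's training set
        self.trained_on.setdefault(model_version, set()).update(subject_ids)

    def models_affected_by(self, subject_id: str) -> List[str]:
        # Model versions whose training set included this subject
        return sorted(m for m, subjects in self.trained_on.items() if subject_id in subjects)


def handle_deletion_request(subject_id: str, datastore: Dict[str, dict],
                            lineage: TrainingLineage) -> List[str]:
    """Delete the stored records, then report the models that still embed the data."""
    datastore.pop(subject_id, None)
    return lineage.models_affected_by(subject_id)


# Made-up model versions and stored records
lineage = TrainingLineage()
lineage.record_training("churn-v1", {"user-1", "user-2"})
lineage.record_training("churn-v2", {"user-2", "user-3"})
datastore = {"user-1": {"email": "a@example.com"}, "user-2": {"email": "b@example.com"}}

to_retrain = handle_deletion_request("user-2", datastore, lineage)
print(to_retrain)
```

Keeping the lineage entry until the affected models are actually retrained is a deliberate choice in this sketch, since the registry is the only evidence that the data is still baked in.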

Every aspect of the AI process — from how it's trained to how it's deployed — requires the highest level of attention from CEOs, tech leaders, and engineers to ensure the data it processes is sufficiently protected and respected. The regulators will catch up to this, too.

Privacy by design

Data quality must be built internally, from the ground up. Third-party vendors won't know what your data will look like, which specific metrics you want to observe, or which aspects are crucial to your business. Only you and your business can build a system with data trust at its core.

And it's not just the business leaders or compliance officers who need to think about data integrity anymore — it's now the role of business analysts and data engineers, too. By combining the principles of data governance, observability and trust, data engineers have the foundation to build technological infrastructures that manage, secure and protect the data fed into AI models.

By enshrining true data stewardship as a core value across all business operations, we can ensure a future where privacy and protection are offered de facto for all users and their data. This will also ensure a future where AI technology can grow and succeed.

Jonathan Joseph is head of solutions at Ketch and Yuliia Tkachova is co-founder and CEO at Masthead Data.

This article is eligible for Continuing Professional Education credits. Please self-submit according to CPE policy guidelines.
