Data has become the lifeblood of businesses, and in turn, it is becoming critical for effective artificial intelligence applications. Akin to a living organism, data is generated, enhanced, changed, harvested and reused to build and operate AI systems. As adoption of generative AI and new use cases for large language models grow, organizations must explore pragmatic approaches to navigating their data-related requirements in the pursuit of innovation.
AI risk is multifaceted, but the risks associated with poor data governance controls in particular can have widespread negative business impacts. While each company's AI risk profile is unique, many of the data governance challenges they encounter along the way will be similar. Therefore, for most organizations, establishing a strong foundation of data governance attuned to every stage of the AI system life cycle requires a refresh of approaches across data privacy, intellectual property management, contracting, regulatory compliance, information governance and security.
AI-related data risks often result from a limited understanding of the value and sensitivity of the data the company holds, which in turn impacts its ability to protect and appropriately manage that data. While this isn't necessarily a net-new data challenge, applying AI tools to mismanaged data will inevitably exacerbate the issue. For example, in an environment that lacks data classification labels and granular access controls, an AI assistant may inadvertently provide some users with unauthorized access to highly confidential data.
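As a minimal sketch of the control this example implies, the snippet below filters retrieved documents against a user's clearance before anything reaches the assistant's context. The classification levels, record fields and clearance model are illustrative assumptions, not any specific product's API.

```python
from dataclasses import dataclass

# Illustrative classification levels, ordered least to most sensitive (assumed).
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

@dataclass
class Document:
    doc_id: str
    text: str
    classification: str = "restricted"  # default-deny: unlabeled data treated as most sensitive

@dataclass
class User:
    user_id: str
    clearance: str = "internal"

def filter_for_user(candidates: list[Document], user: User) -> list[Document]:
    """Drop documents above the user's clearance BEFORE they reach the model context."""
    max_rank = CLASSIFICATION_RANK[user.clearance]
    return [d for d in candidates if CLASSIFICATION_RANK.get(d.classification, 3) <= max_rank]

# Without labels, or with this filter skipped, the assistant would happily
# summarize the restricted document for an internal-clearance user.
docs = [
    Document("d1", "Published press release.", "public"),
    Document("d2", "Unannounced acquisition terms.", "restricted"),
]
visible = filter_for_user(docs, User("u1", clearance="internal"))
print([d.doc_id for d in visible])  # ['d1']
```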
The quality and accuracy of data used to train or operate AI models pose another common challenge. Bias can creep into AI systems through human cognitive biases and engineering decisions, as well as from the data itself. Having data labeling and quality protocols in place can help mitigate the risk of discriminatory or inaccurate AI outputs.
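One hedged illustration of such a protocol: a pre-training audit that flags records with missing labels and heavy skew in how a sensitive attribute is represented. The field names and skew threshold below are assumptions made for the example.

```python
from collections import Counter

def audit_training_data(records, label_key="label", sensitive_key="region", skew_threshold=0.7):
    """Return simple quality flags: unlabeled rows and over-represented sensitive groups."""
    unlabeled = [r for r in records if not r.get(label_key)]
    groups = Counter(r.get(sensitive_key, "unknown") for r in records)
    total = sum(groups.values())
    skewed = {g: n / total for g, n in groups.items() if n / total > skew_threshold}
    return {"unlabeled_count": len(unlabeled), "over_represented": skewed}

records = [
    {"label": "approve", "region": "EU"},
    {"label": None, "region": "EU"},        # missing label -> quality flag
    {"label": "deny", "region": "EU"},
    {"label": "approve", "region": "APAC"},
]
print(audit_training_data(records))
# {'unlabeled_count': 1, 'over_represented': {'EU': 0.75}}
```

An audit like this will not catch every source of bias, but it makes the labeling-and-quality protocol a repeatable gate rather than an ad hoc review.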
AI's fundamental dependency on large volumes of data can also lead to compliance risk if the system has access to or interacts with regulated datasets. Categories of data that may be subject to restrictions, whether regulatory, statutory or contractual, include personal data, copyrighted content, client data or sector-specific data such as health or financial information.
Additional high-stakes data risks amplified by the use of AI
Data privacy
Indiscriminate collection and over-retention of data within an AI system can increase an organization's data breach risk and operational storage costs. When personal data is involved, it can also create tension with fundamental data protection principles such as data minimization and purpose specification.
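To make the minimization and purpose-specification point concrete, the sketch below keeps only the fields a declared purpose actually needs and excludes records past a retention window. The purpose names, field lists and 365-day window are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical allow-lists: which fields each declared purpose may ingest.
PURPOSE_FIELDS = {
    "support_chatbot": {"ticket_id", "product", "issue_text"},
    "fraud_model": {"ticket_id", "amount", "country"},
}
RETENTION = timedelta(days=365)  # assumed retention window

def minimize(records, purpose, now=None):
    """Apply purpose limitation (field allow-list) and retention (age cutoff)."""
    now = now or datetime.now(timezone.utc)
    allowed = PURPOSE_FIELDS[purpose]
    kept = []
    for r in records:
        if now - r["created_at"] > RETENTION:
            continue  # past retention: exclude from the AI pipeline entirely
        kept.append({k: v for k, v in r.items() if k in allowed})
    return kept

recent = datetime.now(timezone.utc) - timedelta(days=10)
rows = [{"ticket_id": "T1", "product": "app", "issue_text": "crash",
         "email": "user@example.com", "created_at": recent}]
print(minimize(rows, "support_chatbot"))
# [{'ticket_id': 'T1', 'product': 'app', 'issue_text': 'crash'}] -- email stripped
```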
Intellectual property
Web scraping is a common practice for sourcing publicly available content to develop and operate AI systems. Some of this content may be subject to IP protections, including copyright, that restrict reuse for commercial purposes, which can extend to training LLMs or combining the content with other information to produce outputs from a generative AI system.
Contractual restrictions
Data scientists commonly reuse the existing datasets available to them to test the boundaries of what AI can do. That may include stored customer or partner data that is typically restricted to the purposes of the client relationship, creating potential legal exposure for violating contract terms and conditions.
Sectoral restrictions
Data held by companies in highly regulated sectors such as financial services and health care is subject to additional sectoral requirements governing how it can be reused for purposes such as AI development. It is therefore essential to conduct detailed AI risk assessments to ensure compliance with both regulatory and industry requirements and to de-risk the use of personal data.
Information security
Compromised security of AI systems can lead to a range of harms depending on the use case, from producing incorrect outputs to compromising a company's wider IT infrastructure. For this reason, an organization's existing IT risk and controls framework must be retrofitted to address AI security risks.
Proactive approaches to streamline compliance
To address these data challenges, organizations can and should take a proactive, streamlined approach to implementing existing and emerging regulatory frameworks applicable to their data handling operations, such as the EU AI Act and recent guidance from the U.S. Department of Justice. With regulatory complexity and enforcement on the rise, particularly in relation to data and technology, organizations may be experiencing compliance fatigue. Yet failure to adapt can expose companies to significant financial penalties as well as wider organizational harms, including damage to brand reputation, customer trust and business revenue.
A good strategy companies can adopt to navigate these regulatory challenges is process integration. Enhancing the company's data governance standards by injecting AI risk management controls into existing corporate governance frameworks can help reduce the burden of compliance on the business and enable responsible innovation. This can be supplemented with additional steps to adapt the organization's approach to evolving requirements. These steps can include:
- Simplifying and automating compliance processes.
- Fostering a culture of compliance through open communication.
- Integrating compliance into daily business operations, supported by leadership.
- Conducting regular reviews and necessary updates to policies and procedures.
- Providing adequate resources and expert knowledge enhancement opportunities through continuous training.
- Facilitating external subject matter support where necessary.
At the enterprise level, an important first step is defining the organization's strategy for investing in AI technology, while also establishing clear roles and responsibilities for managing AI risk across corporate structures. For example, updating data privacy policies and procedures and revising or developing a copyright policy, particularly when general-purpose AI is involved, can help data science teams clearly understand how to use regulated datasets for AI development without introducing new compliance risks.
Operationally, organizations can incorporate AI-specific risk management controls into processes such as vendor due diligence, employee training, acceptable use policies and impact assessments.
In technical and product development, businesses should implement data quality management procedures and update their IT risk and control frameworks to align with AI-related standards from the International Organization for Standardization and the U.S. National Institute of Standards and Technology. Implementing data classification and retention labels, along with associated disposal rules at the system level, can also help businesses strengthen their information governance posture and minimize data risk.
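A hedged sketch of what system-level retention labels and disposal rules might look like follows; the retention schedule here is invented for illustration and would, in practice, come from the organization's own records retention schedule.

```python
from datetime import date, timedelta

# Illustrative retention schedule keyed by classification label (assumed values).
RETENTION_DAYS = {"public": None, "internal": 730, "confidential": 365}

def disposal_due(created: date, label: str, today: date | None = None) -> bool:
    """True when a record's retention period under its label has elapsed."""
    days = RETENTION_DAYS.get(label)
    if days is None:
        return False  # no scheduled disposal for this label
    return (today or date.today()) - created > timedelta(days=days)

inventory = [
    {"id": "rec-1", "label": "confidential", "created": date(2022, 1, 15)},
    {"id": "rec-2", "label": "internal", "created": date(2024, 6, 1)},
]
to_dispose = [r["id"] for r in inventory
              if disposal_due(r["created"], r["label"], today=date(2025, 1, 1))]
print(to_dispose)  # ['rec-1']
```

Running a rule like this on a schedule turns the retention label from passive metadata into an enforceable control.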
Tailoring data governance practices to an organization's specific AI risks, business needs and strategic objectives is critical to establishing a strong foundation for future AI innovation. Doing so will help to drive AI adoption in a compliant manner, in addition to providing a sustainable framework for upholding trust and responsible use over the long term.
Nina Bryant is a senior managing director and Luisa Resmerita, CIPP/A, CIPP/E, CIPM, CIPT, FIP, is a senior director, information governance, privacy and security with FTI Consulting.