Editor's note: The IAPP is policy neutral. We publish contributed opinion and analysis pieces to enable our members to hear a broad spectrum of views in our domains.
Clients have recently started asking a version of the same question: "If our system isn't 'high risk' under the EU Artificial Intelligence Act or the Colorado AI Act, why do I need to do anything about it?"
The assumption is that risk is whatever statutes say it is — and only that. But that framing is narrow, unsafe for consumers and bad for business.
Long, long ago, at least in modern machine learning years, companies were beginning to integrate increasingly complex predictive AI models into their business operations and services. Companies struggled to understand these systems — how to assess the risks involved and what to do to manage them in ways that would allow them to benefit from their business value while also preventing harm to customers, public embarrassment or legal liability.
There was no clear path, no AI-specific regulation or law, and not even any real best practices to follow.
In that uncertainty, many lawyers and data scientists, as well as academics and other professionals, waded in to define what "responsible AI governance" might look like. Some of them ended up working on a grant with the U.S. National Institute of Standards and Technology — including the founders of the law firm that is now the AI Division of ZwillGen. Thus, the NIST AI Risk Management Framework was born.
While not a compliance standard, this framework, which was later expanded to include generative AI, has become the default in AI management, at least within the U.S. A similar, risk-based framework was developed in Singapore; the International Organization for Standardization developed an AI-specific standard for responsible governance.
All these frameworks give companies ways to assess the risk to themselves of using any given AI system. While all acknowledge that legal requirements should be one factor in that assessment, none start or end there.
Through a greatly simplified lens, this historical sequence provided companies with a way to handle the risks of AI in a structured and comprehensive fashion. As the NIST AI RMF acknowledges, "Responsible AI practices can help align the decisions about AI system design, development, and uses with intended aim and values. … AI risk management can drive responsible uses and practices by prompting organizations and their internal teams who design, develop, and deploy AI to think more critically about context and potential or unexpected negative and positive impacts."
Risk is defined as the likelihood that an identified harm will occur multiplied by the severity of that harm if it does occur. While legal impacts are part of the "severity" measure, they are not the whole of it. Every company must assess the risk of every AI system independently based on its own industry, values, business obligations and jurisdictional or regulatory requirements.
Risk tolerance is specific to each company and so are risk assessment techniques and preferred risk management strategies. Even when an AI system doesn't meet a particular statutory "high-risk" definition, businesses should determine that the system falls within their level of risk tolerance — that it works as expected, that it is fair enough for its context, and that it won't create avoidable harm or reputational damage. This is standard risk management.
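To make that concrete, here is a minimal sketch of the likelihood-times-severity calculation in Python. The 1-to-5 scales, the example harm and the tolerance value are illustrative assumptions, not values drawn from any statute or framework; each company would substitute its own.

```python
# Minimal sketch of risk = likelihood x severity, with a company-specific tolerance.
# The scales, example scores and tolerance below are illustrative assumptions.

LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "almost_certain": 5}
SEVERITY = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "severe": 5}

def risk_score(likelihood: str, severity: str) -> int:
    """Risk = likelihood of the identified harm occurring x severity if it occurs."""
    return LIKELIHOOD[likelihood] * SEVERITY[severity]

def within_tolerance(score: int, tolerance: int) -> bool:
    """Tolerance is company-specific: the same score can be acceptable to one
    business and unacceptable to another."""
    return score <= tolerance

# Example: a harm judged "possible" and "moderate" scores 9, which exceeds a
# hypothetical tolerance of 8 and would call for mitigation or redesign.
score = risk_score("possible", "moderate")
print(score, within_tolerance(score, tolerance=8))
```

The arithmetic is trivial; the value is in forcing the likelihood, severity and tolerance judgments to be made explicitly and recorded.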
'Just follow the law'
Two key laws, the EU AI Act and the Colorado AI Act, follow this risk-based approach. Because they are statutes and therefore must have defined scope, these laws draw clear lines around certain "high-risk" use cases. Those lines are useful, but they are not exhaustive.
The laws focus attention on categories that regulators prioritize and enforce. They do not claim that those categories are the full extent of the risks businesses may face when using other AI systems. Sources of business exposure go further: bad decisions, overlooked failure modes, biased impacts, misleading claims, avoidable security incidents, lost customer trust and other litigation risk that doesn't depend on "AI" being in the statute.
The spike in "compliance only" thinking is understandable. Thresholds are comforting. But thresholds are not the full landscape. If the only question is "Does this trigger AI law X?," businesses will miss risks that are obvious to their users, employees, business partners and the press.
The risk-based AI frameworks that pre-date various AI laws all assume what organizations with good risk management systems already know: assess context, review data, consult stakeholders, and consider failure modes; test and monitor; manage and respond to issues; and document what was done.
Even if there were no AI laws — as was the case until very recently — important risks still exist that should define responsible governance approaches. Consider operational risk: If sufficient and appropriate testing has not shown that a system performs its expected function in a reliable, understandable and predictable way, why embed it into business operations?
Or evaluate reputational and ethical risk: Perception can sometimes be more significant than complex interpretations of data. If people experience a system as unfair or unreliable, legal thresholds won't save the day.
AI systems are also subject to other laws and regulations including, at minimum, employment anti-discrimination laws, consumer protection rules, privacy frameworks, product liability standards and advertising regulations. In the years before AI laws were passed, the first conversation with clients started with "how does your current regulatory environment apply, now that you are using AI?"
Use cases that may not hit statutory 'high-risk,' but still deserve oversight
Risk tolerance is inherently subjective and determined by individual companies, even those in the same industry. So is risk mitigation: Can they transfer the risk, insure against it, or otherwise revise the context in some way?
The point of the risk-management approach is for companies to determine the potential harms of any given AI system using their own selected, structured and established criteria — and thus assess the risks of deploying that system. This may or may not align with the "high-risk" categories that AI statutes define.
We see examples of these kinds of systems every day with clients.
Employee feedback, performance and coaching tools. These may not factor into a significant employment decision in a way that directly triggers AI or employment laws, but they do shape evaluations and opportunities. Businesses can be sure employees will be watching them for poor, unfair or unreliable outcomes.
Customer sentiment and quality assurance scoring. These tools may influence employee compensation, drive staffing decisions, decide escalation for certain cases, and more broadly inform business strategy. If they are inaccurate or inconsistent, they can lead to unhappy employees, poor business decisions and reduced customer trust.
Customer-facing chatbots. While most chatbots are unlikely to be considered high-risk under AI laws, a chatbot that spews toxic or unsafe content will frustrate customers and grab unwanted attention, such as viral screenshots. Uncorroborated false statements by a chatbot about its own company or disparaging statements about competitors can even lead to regulatory scrutiny and litigation risk.
The bottom line: If a tool shapes decisions, such as who gets attention, how people are judged, or how customers are responded to, then zero oversight is rarely the right answer. Businesses should validate such systems even if the program never makes a final decision on its own.
A practical, risk-informed governance pattern
It is certainly true that not every system needs "all" the testing, oversight and documentation. This would not be feasible for any business. To develop a simple governance process that goes beyond legal high-risk categories, either dive into one of the mentioned frameworks or start with the basics.
First, assess the risk level for the system. Define the full use case for context and determine what is at stake by identifying who is affected, what decisions are influenced, and what the worst-case outcome might look like.
Consider regulatory exposure if it already falls into a statutory high-risk category but also look at business exposure. Could it cause material harm, bias, reputational damage or operational disruption? This initial risk triage allows companies to assign resources appropriately.
In a nutshell:
- Low-risk systems require basic functional testing and documentation outlining their intended use and limitations.
- Medium-risk systems should undergo performance testing and monitoring, include targeted fairness checks, and have a rollback plan in place.
- High-risk systems demand extensive pre-deployment validation, red teaming exercises, human-in-the-loop controls, regular post-deployment testing and continuous monitoring, an established incident response plan, and independent review mechanisms.
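As an illustration only, the tiered triage above can be written down as a simple lookup so that every assessed system is matched to a minimum control set. The tier names and controls mirror the bullets above; how a company assigns a tier remains its own judgment.

```python
# Illustrative mapping of the three tiers above to minimum controls.
# Tier assignment criteria are assumed to come from the company's own triage.

REQUIRED_CONTROLS = {
    "low": [
        "basic functional testing",
        "documented intended use and limitations",
    ],
    "medium": [
        "performance testing and monitoring",
        "targeted fairness checks",
        "rollback plan",
    ],
    "high": [
        "pre-deployment validation",
        "red teaming",
        "human-in-the-loop controls",
        "post-deployment testing and continuous monitoring",
        "incident response plan",
        "independent review",
    ],
}

def controls_for(tier: str) -> list[str]:
    """Return the minimum control set for a triaged system."""
    return REQUIRED_CONTROLS[tier]

print(controls_for("medium"))
```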
Next, prove that it works. There are model cards and other approaches available, but they largely cover the same content. Document the datasets used, test protocols, metrics, cohort-level results, known limitations, and mitigations accepted or deferred, with the rationale. Establish acceptable performance drift thresholds and incident triggers.
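For the drift thresholds and incident triggers, a sketch might look like the following, assuming a baseline metric was recorded in the system's documentation at sign-off. The metric, the baseline value and the three-point threshold are illustrative assumptions.

```python
# Hedged sketch of a drift threshold and incident trigger. The metric, baseline
# and threshold are illustrative assumptions, not recommended values.

BASELINE_ACCURACY = 0.91   # documented at pre-deployment validation
DRIFT_THRESHOLD = 0.03     # acceptable drop before the incident plan is invoked

def check_drift(current_accuracy: float) -> str:
    """Compare live performance against the documented baseline."""
    drop = BASELINE_ACCURACY - current_accuracy
    if drop > DRIFT_THRESHOLD:
        return f"incident: accuracy down {drop:.1%} from baseline; invoke response plan"
    return "within tolerance: continue routine monitoring"

print(check_drift(0.86))  # triggers the incident branch
```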
Communicate honestly. Disclose capabilities and limits to users and decision-makers; avoid overstated claims; and align marketing with technical reality. Reassess at every major change or upgrade. A new model, new data source, or new context means a new risk review.
Does it work "fairly"? This is not only a legal discrimination question. Inequitable outcomes can occur across any factor or characteristic, not just those protected by civil rights law. In either case, operational and reputational priorities are affected just as much as compliance.
Define the relevant groups for the particular system's context — for example, geography, tenure, language, device type, and protected classes when appropriate and lawful to assess. Compare error and outcome distributions across these groups using metrics matched to the task, like precision/recall, false-positive/negative rates, and calibration.
Clearly document tradeoffs. If improving recall raises false positives, explain why the business accepts that trade-off in this use case and how it will be monitored.
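A minimal sketch of the group-level comparison described above, in plain Python rather than any particular fairness library, might look like this. The group labels and toy outcomes are illustrative assumptions; in practice the groups come from the context analysis described earlier.

```python
# Compute false-positive and false-negative rates per group from labeled records.
# Group names and the toy sample below are hypothetical.

from collections import defaultdict

def rates_by_group(records):
    """records: iterable of (group, predicted, actual) tuples with 0/1 labels.
    Returns false-positive and false-negative rates per group."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual == 1:
            c["pos"] += 1
            c["fn"] += int(predicted == 0)
        else:
            c["neg"] += 1
            c["fp"] += int(predicted == 1)
    return {
        group: {
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }

# Toy example: two hypothetical groups, two records each.
sample = [
    ("group_a", 1, 1), ("group_a", 1, 0),
    ("group_b", 0, 1), ("group_b", 0, 0),
]
print(rates_by_group(sample))
# Large, persistent gaps between groups are the trade-offs to document and monitor.
```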
And finally, add human review where appropriate. This means assigning clear responsibility and making the reviewer accountable, not a rubber stamp.
Conclusion
Artificial intelligence governance laws create certain categories where assessment, testing, and documentation are mandatory. They certainly matter. But these thresholds cannot define the total risk posture for a company building, buying or operating AI in ways integral to its operations.
Best practice for any consequential system remains, for AI models as for anything else, to show that it works for its intended use, to monitor it in the real world, and to fix issues before they become harms.
The need to understand where and how these systems impact their overall operations hasn't changed. Although there are now laws that apply more directly, these should be seen as adding to governance requirements, not replacing them.
Business executives, now as then, prefer to address the cost of delay, the cost of error and the cost of remediation through prevention. Governance isn't unnecessary bureaucracy; it's how to move faster without breaking the things that matter.
Brenda Leong, AIGP, CIPP/US, is director and Jey Kumarasamy is legal director of the AI Division at ZwillGen.