In recent years, red teaming has emerged as a primary method for developers of large language models to proactively test systems for vulnerabilities and problematic outputs.

Red teaming looks set to extend beyond voluntary best practice, however, as governments and regulatory bodies worldwide increasingly turn to the concept as a crucial tool for managing the many risks associated with generative artificial intelligence.

What is red teaming?

Originating in the cybersecurity context, red teaming traditionally involves a team of professionals assuming an adversarial role to identify a system's flaws or vulnerabilities. In the generative AI realm, the process typically entails employing a dedicated "red team" to test a model's boundaries and potential to generate undesirable outputs across various domains.

Red teaming exercises focus on testing a model's propensity to produce harmful, illegal or otherwise inappropriate content — from generating misinformation and graphic images to reproducing copyrighted material or engaging in discrimination.

The red-teaming process frequently involves crafting prompts designed to manipulate a system's behavior, such as overwhelming the system with complex content, feigning a benign need for problematic content, injecting malicious code, or otherwise exploiting its logic to produce unintended outputs.
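To make that process concrete, the sketch below shows one way a developer might organize such probing. It is a hypothetical illustration rather than a prescribed method: query_model stands in for whatever inference interface the system under test exposes, and flag_output stands in for the human reviewers or safety classifiers who judge whether a response violates policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Probe:
    category: str   # e.g. "misinformation", "copyright", "discrimination"
    technique: str  # e.g. "feigned benign need", "prompt injection"
    prompt: str     # the adversarial prompt sent to the model

def query_model(prompt: str) -> str:
    """Placeholder for the system under test; wrap the actual inference call here."""
    raise NotImplementedError

def run_probes(probes: List[Probe],
               flag_output: Callable[[str], bool]) -> List[Tuple[Probe, str]]:
    """Send each adversarial prompt and collect responses judged to violate policy."""
    findings = []
    for probe in probes:
        response = query_model(probe.prompt)
        if flag_output(response):  # human review or an automated safety classifier
            findings.append((probe, response))
    return findings
```

Recording the category and technique alongside each flagged response lets developers trace findings back to the specific risk areas a regulator or internal policy has prioritized.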

If a red team uncovers problematic model behavior, developers can then implement technical or policy safeguards to prevent or mitigate the risk of the system responding inappropriately in similar real-world scenarios.
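As a rough illustration of what a technical safeguard might look like, the hypothetical wrapper below screens responses against patterns distilled from red-team findings before they reach users. It is a simplified sketch; real mitigations more commonly combine fine-tuning, trained safety classifiers and policy-level controls rather than simple pattern matching.

```python
import re
from typing import Callable

# Hypothetical deny patterns distilled from red-team findings; a production
# system would more likely pair policy controls with a trained safety
# classifier rather than rely on pattern matching alone.
BLOCKED_PATTERNS = [
    re.compile(r"(?i)here is how to bypass"),
]

REFUSAL = "Sorry, I can't help with that."

def guarded_generate(generate: Callable[[str], str], prompt: str) -> str:
    """Wrap a text-generation callable with a post-hoc output check."""
    response = generate(prompt)
    if any(pattern.search(response) for pattern in BLOCKED_PATTERNS):
        return REFUSAL  # suppress output matching a known problematic behavior
    return response
```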

Why regulators are embracing red teaming

Recently, regulators across the globe have homed in on red teaming as an important tool for regulating generative AI systems, due to the technical challenges these models present. While traditional or predictive machine learning systems can be evaluated quantitatively, generative AI systems possess the unique ability to create new content that does not lend itself to straightforward assessment.

The models' outputs are also highly context-dependent and influenced by subtle changes in input prompts, making it difficult to anticipate all potential failure modes through traditional risk-assessment methods. To complicate matters further, different systems may incorporate models within widely divergent applications or industries, each with its own intricacies and associated risks. Taken together, these factors make it difficult to put forth a generally applicable regulatory framework.

Although traditional quantitative assessments may not apply, these models can still be tested systematically. The red-teaming approach described above provides an alternative process by which these systems can be evaluated across identified performance measures in a structured and meaningful way.

Recognizing this, government agencies are increasingly turning to regulatory frameworks that rely on independent red teams to assess model-related risks. Rather than put forth quantitative standards, governments can provide a prioritized list of high-risk outcomes or problematic model behaviors and require independent testers to evaluate the systems within those parameters.

This trend, far from a novel approach, mirrors long-standing regulatory practices in other industries that rely on complex technologies whose performance cannot always be fully captured by statistical testing, such as health care or autonomous vehicles. In these sectors, regulators routinely use independent testing or audits to identify potential risks in areas that require judgment and vary with context.

Global regulatory trends in AI red teaming

Governments and regulatory bodies around the world are actively considering, drafting and, in some cases, have already implemented laws and guidance that would require external red teaming of generative AI systems, particularly for large language models that could pose broader societal risks.

This growing regulatory focus on red teaming is evident in several recent statements, guidelines and proposed regulations.

At the international level, the G7 has called on generative AI developers to employ "independent external testing measures, through … methods … such as red-teaming." Similarly, the Bletchley Declaration, signed by 29 countries attending the 2023 AI Safety Summit, highlights developers' strong responsibility to ensure the safety of their systems through rigorous testing and evaluation measures.

US

In the U.S., the White House executive order on AI heavily emphasizes AI red teaming.

The order defines "AI red-teaming" as a "structured testing effort to find flaws and vulnerabilities in an AI system," often conducted by dedicated "red teams" using adversarial methods.

It directs the U.S. National Institute of Standards and Technology to create guidelines and procedures to enable developers to effectively conduct these AI red-teaming tests. Commercial developers of "dual-use foundation models" — broadly trained, general-use models that may pose security, economic or health risks — are required to red team their systems in accordance with these forthcoming standards and submit the results to regulators.

Other material published by the White House, including the Blueprint for an AI Bill of Rights and a recent statement from the Office of Science and Technology Policy, also emphasizes the importance of external red team testing for key AI risks, including bias, discrimination, security and privacy.

Beyond the White House, red teaming is gaining traction among lawmakers and agencies. The proposed Validation and Evaluation for Trustworthy Artificial Intelligence Act in the Senate aims to establish guidelines for AI evaluations and audits, including standards for external auditors conducting red teaming exercises.

On the administrative front, the National Telecommunications and Information Administration has stressed the value of external red team testing in ensuring AI accountability and proposed mandatory independent audits for high-risk AI systems.

NIST's recently published risk management profile for generative AI also encourages red teaming at length, suggesting developers use this form of testing to identify "unforeseen failure modes." The guidelines specifically advise companies to red team for resilience against various attacks, including malicious code generation, prompt injection, data poisoning and model extraction. Red teaming for problematic outputs, such as copyright infringement, demographic inference and exposure of sensitive information, is recommended as well.

At the state level, Colorado's recently enacted Artificial Intelligence Act, which imposes a variety of requirements on developers of high-risk AI systems, allows companies to comply with its requirements by demonstrating they engaged in "adversarial testing or red teaming." Meanwhile, a proposed bill in California would mandate generative AI system providers to conduct regular "red-teaming exercises" to test the robustness of watermarks embedded in AI-generated content.

EU

The European Union has also made red teaming a key component of its regulatory approach to AI. The EU Artificial Intelligence Act, passed in early 2024, requires "general-purpose AI models" posing systemic risks to undergo rigorous red teaming, or "adversarial testing," throughout the product's life cycle.

Developers must also disclose detailed descriptions of the measures put in place for such testing. Given the EU's leadership role in regulating emerging technologies, these requirements are likely to influence other evolving efforts to regulate AI globally.

China

In China, while AI laws do not explicitly mention red teaming, several allude to it by requiring extensive evaluation and testing of AI systems.

The Provisions on the Administration of Deep Synthesis Internet Information Services mandate periodic examination, verification, assessment and testing of algorithmic logic for deep learning systems that have generative capabilities.

The recently enacted Basic Safety Requirements for Generative Artificial Intelligence Services, which prohibits AI systems from engaging in various harmful behaviors, requires developers to implement safety testing and evaluations to monitor compliance.

Although not explicitly stated, comprehensively testing generative AI systems in accordance with these laws likely necessitates some form of adversarial testing or red teaming.

Red teaming guidance from other nations

Other countries have put forward guidance proposing similar measures. The U.K.'s National Cyber Security Centre has emphasized the importance of red teaming as part of a broader AI security strategy.

More recently, the Department for Science, Innovation and Technology developed a voluntary Code of Practice which suggests developers engage in red teaming, preferably using independent external testers, to evaluate their AI models.

Finally, Canada, currently in the process of developing a comprehensive regulatory framework for AI, has also put forward a voluntary Code of Conduct which it recommends developers of general-purpose generative AI apply "in advance of binding regulation." Among other things, the framework suggests employing "adversarial testing (i.e., red-teaming) to identify vulnerabilities" in AI systems.

While neither the U.K.'s nor Canada's guidelines are legally binding, they provide insight into how lawmakers in these countries are approaching AI governance, and hint at potential red teaming requirements that may be implemented in the coming years.

The future of oversight for generative AI

Ultimately, it is becoming increasingly clear that red teaming will play a pivotal role in regulatory efforts moving forward.

Governments and regulatory bodies worldwide clearly recognize the value of adversarial testing as a regulatory tool for mitigating AI risks, and it appears likely more countries will mandate this type of testing in the coming years. As these requirements take shape, it will be crucial for regulators to establish best practices and accreditation processes to ensure red teaming develops the same credibility and depth as other external assessment and auditing processes.

There is already some emerging consensus around process and structure for this type of testing. Considering this development, companies developing and deploying AI should proactively establish comprehensive processes for red teaming their systems.

To ensure the most effective and unbiased evaluation, developers should engage independent, third-party testers to conduct these red teaming exercises, especially since regulatory frameworks will likely mandate as much in the near future.

As the regulatory landscape continues to evolve, those who have already established strong red teaming practices will be well-positioned to navigate these forthcoming requirements.

Andrew Eichen is an associate at Luminos.Law.