Editor's note: The IAPP is policy neutral. We publish contributed opinion and analysis pieces to enable our members to hear a broad spectrum of views in our domains.

We've come a long way in understanding large language models — but beneath the surface, they're still black boxes wrapped in probabilities and good intentions. The simplified notion that these systems are merely statistical engines predicting the next token masks a complex mechanism with characteristics that are still not fully understood.

Hallucination is one such feature — even the term itself is the subject of ongoing debate.

Regardless of the controversies, hallucination is commonly used to describe instances where an artificial intelligence system generates false, misleading or fabricated information that diverges from the original source content. More than that, it's often seen as a digital typo — something to patch, not ponder.

Not just bugs, signals of computation

Recent studies suggest hallucinations may not be mere bugs, but signatures of how these machines "think." This assertion comes from the idea that LLMs, as computable functions, cannot learn all computable ground truth functions. This inherent limitation ensures that some inputs will inevitably result in imperfect or inaccurate outputs, leading to deviations or "hallucinations."

Furthermore, the very architecture of transformers can be manipulated to produce specific, predefined — and potentially false — tokens by perturbing the input sequence, even with nonsensical prompts. This suggests that the capacity to generate divergent or fabricated information is not just a failure of learning specific data, but a characteristic tied to the model's operational mechanics and its inherent limits in perfectly mapping the vast space of language and knowledge.

Implications for AI governance

This challenges the traditional assumption that hallucinations can be entirely "fixed" or "trained away," and brings important considerations for AI governance programs. If LLMs can be actively prompted to produce incorrect or fabricated outputs, governance shifts from a focus solely on data quality or algorithmic robustness to a more nuanced challenge in security and risk management.

Instead of focusing solely on refining training datasets or tweaking model architectures post-deployment, organizations must embed a security-by-design philosophy from the inception of any LLM-powered system. This involves not just technical safeguards and robust input sanitization, but also a clear understanding of the attack surface this "hallucination as a feature" might expose.

Such vulnerabilities become particularly acute when LLMs interact with external data sources or are chained with other automated processes where a manipulated output could trigger cascading failures with significant operational or reputational impact.

It becomes essential to not only anticipate and mitigate the potential for deliberate or adversarial failure, but also to implement prompt validation, ongoing output filtering and human-in-the-loop oversight in critical processes.

AI governance frameworks, such as those developed by the U.S. National Institute of Standards and Technology or shaped by regulations like the EU's AI Act, must be implemented with this characteristic in mind. Algorithmic impact assessments, for example, should evaluate not only bias and accuracy under normal conditions, but also the model's resilience against attempts to induce hallucinations.

Regulatory transparency requirements also take on new significance. It is not enough to explain how the model is expected to behave; organizations must clearly communicate its limitations and the potential for unexpected or induced behaviors, enabling users and operators to make informed decisions.

Furthermore, the role of red teaming and adversarial testing becomes even more critical. These exercises, which simulate real-world attack scenarios by attempting to exploit known and novel vulnerabilities, should specifically aim to probe the system's susceptibility to induced hallucinations, moving beyond standard accuracy and bias checks.

The insights gained from such targeted testing are invaluable for understanding the practical limits of the deployed LLM, informing proactive risk mitigation strategies and ensuring human oversight mechanisms are designed to catch these more nuanced and potentially deceptive failure modes.

Liability for harm caused by hallucinated outputs adds further complexity, challenging existing legal frameworks for product safety and negligence. If hallucination is an intrinsic feature of LLMs, it is harder to calibrate the balance between what should be considered a product "defect" for which a developer might be strictly liable and what is an "expected behavior" within the current technological limitations.

Beyond technical and regulatory frameworks, addressing the "hallucination as a feature" challenge also necessitates a profound shift in organizational culture and user education. Teams developing and deploying LLM-based solutions must cultivate a healthy skepticism and understanding of these inherent limitations, moving away from a plug-and-play mentality.

Culture, context and curiosity

Similarly, end-users need to be educated about the probabilistic nature of LLM outputs and the potential for even highly coherent responses to be fabricated. This literacy is crucial for fostering responsible interaction and mitigating the risks of over-reliance on unverified AI-generated content, especially as these systems become more pervasive.

Recognizing hallucination as a feature of LLMs compels us to design more resilient systems, establish clear operational boundaries and strengthen human-machine collaboration. In this new paradigm, effective AI governance is that which anticipates, safeguards and adapts continuously — aligning innovation with responsibility and safety.

Henrique Fabretti Moraes, CIPP/E, CIPM, CIPT, CDPO/BR, FIP, is a partner at Opice Blum.