Hallucination and false confidence: not a bug, a property to manage

In 2023, an American lawyer submitted a legal brief to the court, prepared with ChatGPT. The brief cited several court decisions in support of its arguments. Some decisions didn’t exist. They had been invented by the template, with plausible judge names, plausible case numbers, plausible wording. The lawyer was sanctioned. The case became the canonical example of hallucination with real consequences.

This case is not an exception. It’s an illustration of normal LLM behavior in a context where factual truth matters.

Why it’s structural

An LLM predicts the next token. The prediction is based on statistical patterns in the training data. When it generates a justice decision, it generates what looks like a justice decision. The form is correct. Judge names are plausible. The dates are in the right format. But if that particular decision wasn’t in the training data, the model can’t know it. It invents something plausible.

This mechanism is not a bug. It is inherent to statistical prediction. The model has no internal signal to distinguish what it “knows” from what it “invents”. It generates with the same degree of fluidity in both cases.

Hallucinations in LLM are defined as outputs that seem plausible but are factually incorrect. They are classified into intrinsic hallucinations (contradictions with the source provided) and extrinsic hallucinations (unverifiable or incorrect statements not linked to a source).

Survey of Hallucination - Ji et al. (2022)' sourceUrl='https://arxiv.org/abs/2202.03629' date='2022-02-08

Recent models hallucinate less often than 2022 models. Alignment techniques (RLHF, Constitutional AI) and grounding (RAG, anchoring on source documents) reduce the frequency. They do not eliminate it.

Failure to recognize errors: the second problem

Beyond hallucination lies complacency. An LLM tends to align its responses with the user’s implicit expectations. If you say to it “I think X is true, what do you say?”, it tends to confirm X, even if X is incorrect.

Models trained with RLHF show systematic sycophantic behavior: they modify their responses to match users’ perceived preferences, even when this involves asserting factually incorrect things.

Sycophancy in Language Models - Perez et al., Anthropic (2022)' sourceUrl='https://arxiv.org/abs/2212.09251' date='2022-12-19

This behavior is a consequence of human feedback training. Annotators reward answers that please them. The model learns to please. If you test a model by presenting it with a false hypothesis and seeking confirmation, you often get confirmation. That’s not understanding. It’s optimization of your expectations.

What it means in production

Hallucinations are not a problem to be avoided. It’s a property to be managed. Tolerance levels will vary according to your particular situation.

High tolerance (errors are easily detected and corrected): generation of first draft content, brainstorming, non-critical synthesis. The model can hallucinate without serious consequences if a human proofreads.

Low tolerance (errors have real consequences): contractual data extraction, regulatory analysis, security code generation, medical diagnostics. In these contexts, you need :

An automated verification mechanism (the model must cite its sources, and the sources must be verifiable).
Human review of risk cases
A calibration test on your data before deployment

Zero tolerance: signing legal documents, final medical decisions, financial orders. The AI does not sign, decide or validate. It assists. The decision remains human, with responsibility.

The rule is simple: the higher the cost of an undetected error, the more robust your validation architecture needs to be - and the more it costs, which needs to be factored into your budget.

Why it’s structural#

Failure to recognize errors: the second problem#

What it means in production#

Related

Works in the demo, costs in production

Your data and trade secrets in AI: what leaves, what stays

Why it’s structural

Failure to recognize errors: the second problem

What it means in production