Why Do Large Language Models Hallucinate? OpenAI's Latest Research Uncovers the Reasons

OpenAI has taken a clear position on the issue of large language model (LLM) hallucinations: in a recently published research paper, it analyzes the roots of the problem and argues that today's mainstream training and evaluation practices are among its core drivers.

Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf

The research argues that current evaluation benchmarks implicitly reward models for guessing rather than for acknowledging the limits of their knowledge when they are uncertain. Hallucinations originate in pre-training, as a by-product of next-word prediction, and they are not a mysterious phenomenon: both their statistical mechanism and the way existing evaluation systems reward them can be understood.


The Nature of Hallucinations

Hallucinations refer to statements generated by language models that appear plausible but are factually incorrect. This phenomenon can occur even when dealing with simple questions.

The paper provides an example: when a widely used chatbot was asked about the PhD thesis title of Adam Tauman Kalai, one of the paper's authors, it confidently provided three completely different and incorrect answers. Similarly, when asked about his birthday, it also gave three different incorrect dates.

The "Test-Taking Trap" of Evaluation Systems

The study points out that hallucinations are difficult to eradicate largely because evaluation methods set the wrong incentives. Most benchmarks use accuracy as the core metric, which encourages models to guess rather than honestly express uncertainty.

This can be likened to a multiple-choice exam: if a student encounters a question they don't know, guessing might earn them a lucky point, but leaving it blank guarantees zero. Similarly, when models are scored solely on the percentage of questions they answer correctly, they are trained to prefer guessing over saying "I don't know."

For example, if a model is asked for a birthday it does not know, guessing "September 10th" has a 1-in-365 chance of being correct, so its expected score is small but positive, whereas answering "I don't know" is guaranteed to score zero. Across thousands of test questions, a model that habitually guesses will therefore climb higher on leaderboards than one that cautiously admits uncertainty.
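A minimal sketch of this incentive, assuming the accuracy-only scoring described above (the helper function and numbers are illustrative, not from the paper):

```python
# Toy comparison of "guess" vs. "I don't know" under accuracy-only grading:
# 1 point for a correct answer, 0 otherwise, with no penalty for being wrong.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected points under accuracy-only scoring."""
    return 0.0 if abstain else p_correct

p_birthday = 1 / 365  # chance a blind guess at an unknown birthday is right

print(f"guess:          {expected_score(p_birthday, abstain=False):.4f}")  # ~0.0027
print(f"'I don't know': {expected_score(p_birthday, abstain=True):.4f}")   # 0.0000

# Any nonzero chance of being right beats a guaranteed zero, so a model
# optimized for accuracy alone learns to guess rather than abstain.
```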

Data Substantiation: High Accuracy Does Not Mean Low Error Rate

To illustrate this point, the paper cites SimpleQA evaluation data from the GPT-5 system card, comparing two models:

Metric Comparison

Metric | gpt-5-thinking-mini | OpenAI o4-mini (older model)
--- | --- | ---
Refusal rate (no specific answer) | 52% | 1%
Accuracy (correct answers) | 22% | 24%
Error rate (incorrect answers, i.e., hallucinations) | 26% | 75%
Total | 100% | 100%

The data shows that the older OpenAI o4-mini model had a slight edge in accuracy (24% vs 22%). However, this came at the cost of a staggering 75% error rate (hallucination rate). This clearly demonstrates that while strategic guessing can marginally improve accuracy, it leads to a catastrophic increase in the error rate.

Nevertheless, industry leaderboards, which are generally accuracy-driven, incentivize developers to build models that are more prone to risky guessing. This explains why the problem of model hallucinations persists even as technology advances.
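To make the leaderboard effect concrete, here is a small sketch that ranks the two models from the table above under plain accuracy and under a hypothetical rule that subtracts points for confident errors (the penalty weight is an illustrative assumption, not a metric proposed in the paper):

```python
# SimpleQA breakdown from the table above, in percentage points.
models = {
    "gpt-5-thinking-mini": {"correct": 22, "wrong": 26, "abstain": 52},
    "OpenAI o4-mini":      {"correct": 24, "wrong": 75, "abstain": 1},
}

def accuracy_only(m: dict) -> float:
    # Typical leaderboard metric: only correct answers count.
    return m["correct"]

def error_penalized(m: dict, penalty: float = 1.0) -> float:
    # Hypothetical alternative: confident errors subtract points,
    # abstentions neither gain nor lose. The weight 1.0 is illustrative.
    return m["correct"] - penalty * m["wrong"]

for name, m in models.items():
    print(f"{name:22s} accuracy={accuracy_only(m):5.1f} penalized={error_penalized(m):6.1f}")

# Accuracy-only favors o4-mini (24 vs. 22), but the penalized score flips
# the ranking (-51 vs. -4), rewarding the model that abstains instead of
# hallucinating -- exactly the incentive shift the paper argues for.
```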

The Origin of Hallucinations: From "Next Word Prediction"

Where do these highly specific factual errors come from in the first place? The research traces the root to the model's pre-training: language models learn by predicting the next word in vast amounts of text, and because that data carries no "true/false" labels, the model can only learn the fluent patterns of language.

Spelling, grammar, etc., follow strong, consistent patterns, so as model scale increases, these types of errors decrease. However, low-frequency, arbitrary facts, such as someone's birthday, lack predictable patterns in text. Models cannot infer such facts solely from context, so when asked, they can only generate based on statistical probability, leading to hallucinations.
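A toy illustration of this frequency argument (the prompts and counts below are invented): a purely count-based next-word predictor looks equally "confident" about a pattern seen a thousand times and a fact seen once, even though only the former is statistically well supported.

```python
from collections import Counter

# Invented continuation counts for two prompts in a toy corpus: a
# common-knowledge pattern repeats many times, while an arbitrary fact
# such as a birthday appears exactly once.
capital_counts  = Counter({"Paris": 1000})   # "The capital of France is ..."
birthday_counts = Counter({"March 3": 1})    # "Person X's birthday is ..."

def best_continuation(counts: Counter) -> tuple:
    """Most frequent continuation, its empirical probability, and its support."""
    token, n = counts.most_common(1)[0]
    return token, n / sum(counts.values()), n

print(best_continuation(capital_counts))   # ('Paris', 1.0, 1000)
print(best_continuation(birthday_counts))  # ('March 3', 1.0, 1)

# Both estimates look maximally confident, but the second rests on a single
# observation: counts alone cannot distinguish a robust pattern from an
# arbitrary one-off fact, which is where plausible-but-wrong output arises.
```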

Five Common Misconceptions About Hallucinations

Based on the above analysis, the paper clarifies five common misconceptions about hallucinations:

Misconception One: Hallucinations are unavoidable.

Research Findings: This is not the case. Language models can choose to refuse to answer when uncertain, thereby avoiding hallucinations.

Misconception Two: Hallucinations can be eliminated as long as accuracy reaches 100%.

Research Findings: Accuracy can never reach 100%. This is because in the real world, there are always some questions that are inherently unanswerable or have insufficient information.

Misconception Three: Avoiding hallucinations requires extremely high intelligence, achievable only by large models.

Research Findings: For smaller models, recognizing their own limitations is actually easier. A model unfamiliar with a certain domain can easily say "I don't know," whereas a model with some knowledge requires more complex computations to assess the confidence of its answer.

Misconception Four: Hallucinations are a mysterious technical glitch in language models.

Research Findings: Hallucinations are not a mysterious phenomenon. Their statistical mechanisms and how they are rewarded within existing evaluation systems can all be understood.

Misconception Five: The problem can be solved simply by having a good hallucination evaluation standard.

Research Findings: Dedicated hallucination evaluations help little on their own; their influence is swamped by the hundreds of traditional, accuracy-based evaluations that continue to reward lucky guessing.

Future Direction: Reforming the Evaluation System

The ultimate conclusion of the research report is that the key to solving the hallucination problem lies in a fundamental reform of existing evaluation systems. Researchers advocate that new evaluation standards should impose a heavier penalty for confidently incorrect answers than for admitting uncertainty. Only when the "scoring rules" of the entire industry change can developers truly be incentivized to adopt technologies and strategies that reduce hallucinations.
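As a sketch of what such a rule could look like, the snippet below uses a confidence-threshold scheme in the spirit of this recommendation (the exact parameterization and the threshold value are assumptions for illustration): a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs t/(1−t) points, so answering only pays off when the model's confidence exceeds t.

```python
# Hypothetical confidence-thresholded scoring: +1 for a correct answer,
# 0 for "I don't know", and -t/(1-t) for a wrong answer. Under this rule,
# answering has positive expected value only when confidence exceeds t.
# The threshold t = 0.75 below is an illustrative choice.

def expected_score(confidence: float, t: float, answer: bool) -> float:
    if not answer:
        return 0.0                      # abstaining scores zero
    penalty = t / (1 - t)               # cost of a confident error
    return confidence - (1 - confidence) * penalty

t = 0.75
for confidence in (0.50, 0.75, 0.90):
    gain = expected_score(confidence, t, answer=True)
    print(f"confidence {confidence:.2f}: answer {gain:+.2f} vs. abstain +0.00")

# confidence 0.50: answer -1.00  -> better to say "I don't know"
# confidence 0.75: answer +0.00  -> break-even exactly at the threshold
# confidence 0.90: answer +0.60  -> better to answer
```

Under such a rule, a model that guesses blindly loses points on every confident error, while one that abstains when unsure is no longer punished for honesty.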

Reference: https://openai.com/index/why-language-models-hallucinate/


