Why ChatGPT makes up facts and writes nonsense — Scientists explain

A new study by OpenAI explains why ChatGPT and other large language models sometimes make up facts, a phenomenon known as hallucination. The paper argues that this is not merely a training flaw but a mathematically inevitable consequence of how text is generated: even with perfect training data, models make mistakes because of the way they predict words and accumulate errors. The Conversation breaks down why AI chatbots invent facts and mislead users.
Mathematical nature of hallucinations
The study shows that the hallucination problem is not just a side effect of current training algorithms but a mathematically inevitable phenomenon. Even perfect training data cannot fully eliminate it.
The main reason lies in how language models generate text: they predict one word at a time based on probabilities.
This means errors accumulate over the course of an answer, so the error rate for long, open-ended answers can be at least twice as high as for simple yes/no questions, as the sketch below illustrates.
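To make the accumulation argument concrete, here is a minimal sketch in Python. It assumes a fixed, independent error probability for each generated token, which is a simplification for illustration rather than the exact model the study uses, and the numbers are made up.

```python
# Minimal sketch of error accumulation in word-by-word generation.
# Assumption: each token has a fixed, independent chance of being wrong.
# This is an illustration of the intuition, not the study's actual math.

def prob_flawed_answer(p_token_error: float, n_tokens: int) -> float:
    """Probability that at least one of n_tokens comes out wrong."""
    return 1 - (1 - p_token_error) ** n_tokens

p = 0.02  # hypothetical 2% chance of a wrong choice per token
print(f"yes/no question (1 decision): {prob_flawed_answer(p, 1):.1%}")
print(f"short answer (20 tokens):     {prob_flawed_answer(p, 20):.1%}")
print(f"long answer (100 tokens):     {prob_flawed_answer(p, 100):.1%}")
```

The longer the answer, the more chances there are for at least one mistake to slip in, which is why open-ended generation is harder to keep error-free than a single binary decision.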
Impact of training data
The less often a piece of information appears in the training data, the more likely the model is to get it wrong.
For example, if 20% of celebrities' birthdays appear only once in the training data, the model can be expected to get at least 20% of birthday questions wrong.
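The bound rests on how many facts show up exactly once in the training data. Below is a minimal sketch of that counting idea with made-up data; the logic is illustrative only and is not the study's actual procedure.

```python
from collections import Counter

# Sketch: facts that appear exactly once in the training data set a floor
# on how often the model can be expected to get them wrong.
# The "training mentions" below are invented for illustration.

training_mentions = [
    "Ada Lovelace: 1815-12-10", "Ada Lovelace: 1815-12-10",
    "Alan Turing: 1912-06-23",
    "Grace Hopper: 1906-12-09", "Grace Hopper: 1906-12-09",
    "Claude Shannon: 1916-04-30",
]

counts = Counter(training_mentions)
once_only = sum(1 for c in counts.values() if c == 1)
rate = once_only / len(counts)
print(f"share of facts seen only once: {rate:.0%}")  # 50% here
# By the study's logic, expect at least that share of errors on such queries.
```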
A real example: DeepSeek-V3 gave three different incorrect birthdates for one of the article's authors across several attempts.
Evaluation trap
Researchers show that the leading AI benchmarks, including those used by Google and OpenAI, encourage models to guess rather than admit uncertainty.
When a model answers "I don't know," it receives the same score as a wrong answer. Under such scoring, the optimal strategy is always to guess, and that fuels hallucinations.
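A quick sketch of the incentive problem: under a right-or-wrong rubric where "I don't know" and an incorrect answer both score zero, any nonzero confidence makes guessing the better bet. The scoring numbers below are illustrative assumptions, not taken from any particular benchmark.

```python
# Why binary (right-or-wrong) scoring rewards guessing:
# a correct answer earns 1 point; a wrong answer and "I don't know" both earn 0.

def expected_score_binary(p_correct: float, guess: bool) -> float:
    """Expected score under a right-or-wrong rubric."""
    return p_correct * 1.0 if guess else 0.0

for p in (0.1, 0.3, 0.5):
    print(f"confidence {p:.0%}: guess={expected_score_binary(p, True):.2f}, "
          f"abstain={expected_score_binary(p, False):.2f}")
# Guessing always scores at least as well as abstaining, however unsure the model is.
```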
Possible solution
OpenAI suggests that models should take their own confidence into account before giving a response: for example, answer only when confidence exceeds 75%, under a scoring scheme in which a wrong answer costs more points than a correct one earns.
The paper's mathematical analysis shows that this would lead models to express uncertainty instead of guessing, reducing hallucinations.
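As a worked illustration of where a 75% threshold could come from: if a wrong answer costs three points while a correct one earns one and "I don't know" earns zero, guessing only pays off above 75% confidence. The rubric below is a hypothetical example consistent with that figure, not OpenAI's exact proposal.

```python
# Confidence-thresholded answering under a penalized scoring rule.
# Assumed (illustrative) rubric: +1 for correct, -3 for wrong, 0 for "I don't know".
# Break-even: p*1 - (1-p)*3 = 0  =>  p = 0.75.

def expected_score(p_correct: float, penalty: float = 3.0) -> float:
    """Expected score of answering with the given confidence."""
    return p_correct * 1.0 - (1 - p_correct) * penalty

def should_answer(p_correct: float, penalty: float = 3.0) -> bool:
    """Answer only when guessing beats saying 'I don't know'."""
    return expected_score(p_correct, penalty) > 0

for p in (0.6, 0.75, 0.8, 0.9):
    print(f"confidence {p:.0%}: expected score {expected_score(p):+.2f}, "
          f"answer={should_answer(p)}")
```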
However, users accustomed to confident replies may find this behavior inconvenient: if ChatGPT started answering "I don't know" to even 30% of queries, people could quickly become frustrated.
Computational economics
Implementing uncertainty-aware approaches requires significantly more computation. For systems handling millions of queries daily, this means much higher operating costs.
Active learning, where the model asks clarifying questions, can reduce mistakes but puts even more strain on computing resources.
In critical fields such as finance, supply chains, and medicine, the added computational costs are justified because hallucinations are costly.
In consumer apps, where people expect instant replies, business incentives drive models to sound overly confident, and that keeps the problem alive.
Uncomfortable truth
The OpenAI article emphasizes a fundamental contradiction: business incentives reward speed and confidence, not accuracy. Until those incentives change, hallucinations in language models will remain inevitable.