OpenAI says GPT-5 hallucinates less — what does the data say?


Web access is key here.
By Cecily Mauran on August 7, 2025
GPT-5 hallucinates less than previous models, but by how much? Credit: CFOTO / Future Publishing / Getty Images

OpenAI has officially launched GPT-5, promising a faster and more capable AI model to power ChatGPT.

The AI company boasts state-of-the-art performance across math, coding, writing, and health advice. OpenAI proudly shared that GPT-5's hallucination rates have decreased compared to earlier models.

Specifically, GPT-5 makes incorrect claims 9.6 percent of the time, compared to 12.9 percent for GPT-4o. And according to the GPT-5 system card, the new model’s hallucination rate is 26 percent lower than GPT-4o’s. GPT-5 also produced 44 percent fewer responses with “at least one major factual error.”

While that's definite progress, it also means roughly one in 10 responses from GPT-5 could contain a hallucination. That's concerning, especially since OpenAI touted healthcare as a promising use case for the new model.


How GPT-5 reduces hallucinations

Hallucinations are a pesky problem for AI researchers. Large language models (LLMs) are trained to generate the next most probable word, based on patterns in the massive amounts of data they're trained on. This means LLMs can sometimes confidently generate a sentence that is inaccurate or pure gibberish. One might assume that as models improve through factors like better data, training, and computing power, the hallucination rate would decrease. But OpenAI's launch of its reasoning models o3 and o4-mini showed a troubling trend that even its own researchers couldn't entirely explain: they hallucinated more than the previous models o1, GPT-4o, and GPT-4.5. Some researchers argue that hallucinations are an inherent feature of LLMs, rather than a bug that can be resolved.
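
To see why, consider the generation step itself: a model scores every token in its vocabulary and samples from the resulting probability distribution, and nothing in that loop verifies facts. Here's a minimal, hypothetical sketch (toy vocabulary and made-up scores, not any real model's internals):

```python
import numpy as np

# Toy next-token step: convert the model's raw scores (logits) into
# probabilities with a softmax, then sample. (Illustrative only --
# the vocabulary and logits below are invented for this example.)
vocab = ["Paris", "London", "Rome", "Berlin"]
logits = np.array([2.1, 1.9, 0.3, 0.1])

probs = np.exp(logits) / np.exp(logits).sum()  # softmax

rng = np.random.default_rng(0)
next_token = rng.choice(vocab, p=probs)

print({w: round(float(p), 2) for w, p in zip(vocab, probs)})
print("sampled:", next_token)
# Even if "Paris" is the factually correct continuation, "London" still
# holds roughly 39 percent of the probability mass here, so the sampler
# can emit it with the same fluent confidence.
```

Whether the sampled word is true never enters the computation, which is part of why scale alone hasn't eliminated hallucinations.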


That said, according to its system card, GPT-5 hallucinates less than previous models. OpenAI evaluated GPT-5 and a version with additional reasoning power, called GPT-5-thinking, against its reasoning model o3 and its more traditional model GPT-4o. A significant part of evaluating hallucination rates is giving models access to the web. Generally speaking, models are more accurate when they can source their answers from reliable data online rather than relying solely on their training data (more on that below). Here are the hallucination rates when the models are given web-browsing access:

  • GPT-5: 9.6 percent

  • GPT-5-thinking: 4.5 percent

  • o3: 12.7 percent

  • GPT-4o: 12.9 percent
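
Those browsing-enabled figures also line up with the "26 percent lower" claim quoted earlier: it matches the relative drop from GPT-4o's 12.9 percent to GPT-5's 9.6 percent. A quick back-of-the-envelope check (our arithmetic, not a calculation from the system card):

```python
gpt5, gpt4o = 9.6, 12.9  # hallucination rates with web access, in percent

relative_reduction = (gpt4o - gpt5) / gpt4o
print(f"{relative_reduction:.1%}")  # 25.6%, which rounds to the quoted ~26 percent
```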

In the system card, OpenAI also evaluated various versions of GPT-5 on more open-ended and complex prompts. Here, GPT-5 with reasoning power hallucinated significantly less than the previous reasoning models o3 and o4-mini. Reasoning models are said to be more accurate and less prone to hallucination because they apply more computing power to solving a question, which is why o3 and o4-mini's hallucination rates were somewhat baffling.

Overall, GPT-5 does pretty well when it's connected to the web. But the results of another evaluation tell a different story. OpenAI tested GPT-5 on its in-house benchmark SimpleQA, a collection of "fact-seeking questions with short answers that measures model accuracy for attempted answers," per the system card's description. For this evaluation, GPT-5 didn't have web access, and it shows: the hallucination rates were far higher.

  • GPT-5 main: 47 percent

  • GPT-5-thinking: 40 percent

  • o3: 46 percent

  • GPT-4o: 52 percent

GPT-5-thinking came in six percentage points below o3, while the standard GPT-5 was one percentage point higher than o3 and five points lower than GPT-4o. To be fair, hallucination rates on the SimpleQA evaluation are high across all models, but that's not much consolation. Users without web search face a much higher risk of hallucinations and inaccuracies. So if you're using ChatGPT for something really important, make sure it's searching the web. Or you could just search the web yourself.
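
For context on what a number like 47 percent measures: per the system card's description, SimpleQA grades the model's attempted short answers against known correct ones, so the hallucination rate is the share of attempted answers that were wrong. A minimal sketch of that bookkeeping (hypothetical records and field names, not OpenAI's grading code):

```python
# Each record: one graded benchmark question. "abstained" marks questions
# the model declined to answer; those don't count as attempts.
# (Invented data for illustration -- not actual SimpleQA results.)
results = [
    {"answer": "1969",    "correct": True,  "abstained": False},
    {"answer": "Jupiter", "correct": False, "abstained": False},
    {"answer": None,      "correct": False, "abstained": True},
    {"answer": "Tokyo",   "correct": True,  "abstained": False},
]

attempted = [r for r in results if not r["abstained"]]
wrong = sum(1 for r in attempted if not r["correct"])

print(f"hallucination rate on attempts: {wrong / len(attempted):.0%}")  # 33%
```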

It didn't take long for users to find GPT-5 hallucinations

Despite the reported drop in overall inaccuracy rates, one of the launch demos contained an embarrassing blunder. Beth Barnes, founder and CEO of the AI research nonprofit METR, spotted an inaccuracy in GPT-5's demo explanation of how planes work: the model repeated a common misconception about the Bernoulli Effect, which describes how air flows around airplane wings, Barnes said. Without getting into the technicalities of aerodynamics, GPT-5's explanation is wrong.

Cecily Mauran
Tech Reporter

Cecily is a tech reporter at Mashable who covers AI, Apple, and emerging tech trends. Before getting her master's degree at Columbia Journalism School, she spent several years working with startups and social impact businesses for Unreasonable Group and B Lab. Before that, she co-founded a startup consulting business for emerging entrepreneurial hubs in South America, Europe, and Asia. You can find her on X at @cecily_mauran.
