Image: media snippets highlighting the term “hallucinations”.

These days, it feels like the term hallucinations is everywhere—from media headlines to blog posts, technical reports, and peer-reviewed papers. At its core, the term hallucination is often used to describe what happens when a large language model (LLM) generates an answer that’s incorrect, misleading, or just plain nonsensical.

But thinking about the term itself, I started to wonder whether hallucination is really the right word to describe this well-known issue in natural language generation with LLMs. After some research and discussion, I don’t think it is. A more fitting term, I believe, is confabulation.

This debate is also reflected in the scientific literature: some sources acknowledge that the terms are often used interchangeably,1 others argue that while confabulation may be more accurate, hallucination remains more widely recognized,2 and still others explicitly advocate for adopting confabulation as the preferred term.1 3 4

Where These Terms Come From—and What They Actually Mean

But what do hallucinations and confabulation actually mean—and where do these terms come from?

Both terms have a clinical background.2 It makes sense that the field borrows from clinical language: tracing the origins of artificial intelligence takes us back to 1956, when a summer workshop at Dartmouth College laid the foundation for what we now understand as AI. While different approaches emerged, such as mathematical logic and deductive reasoning, some participants advocated drawing from biology and psychology to mimic the behavior of the human brain.5

According to the ICD-11—the eleventh edition of the International Classification of Diseases and the global standard for documenting health information—hallucinations are described as “sensory perceptions of any modality occurring in the absence of the appropriate (external) stimulus. The person may or may not have insight into the unreal nature of the perception.”6

Confabulation, by contrast, is defined as “the filling of memory gaps with fabricated, distorted, or misinterpreted memories about oneself or the world, without the conscious intention to deceive.”7

Contrasting the two definitions makes the difference clear: someone experiencing hallucinations might see, hear, feel, or even smell something that isn’t actually there.8 Confabulation, on the other hand, is about memory and cognition. Individuals experiencing confabulation aren’t trying to mislead anyone; their brain is simply generating plausible (but false) memories to maintain a coherent sense of reality. Confabulation is therefore sometimes called “honest lying,” because the individual believes the statement to be true.9 This is supported by research in psychiatry, which notes that “confabulations are false memories within the context of retrieval, in which patients are not generally aware that they are confabulating”.10

Understanding the Phenomenon in the World of LLMs

While hallucination and confabulation have clinical definitions, in the context of AI, these terms describe how LLMs generate text that “seemingly fills gaps in the information contained in the model by plausibly sounding words”11, yet is often “nonsensical or unfaithful to the provided source content”12. Some research further distinguishes between intrinsic hallucinations, where generated text contradicts the source, and extrinsic hallucinations, where it cannot be verified against the source13.

From a technical perspective, this phenomenon is inherent to how these models operate. Generation is probabilistic: at each step, the model assigns a probability to every candidate next word (more precisely, token) based on the preceding context and favors the candidates with the highest probabilities.14 It repeats this process word by word until it produces a special end-of-sequence token or reaches a length limit. Because the model optimizes for plausible continuations rather than verified facts, this probabilistic approach can lead to factually incorrect outputs, a consequence that is generally undesirable unless the goal is creative or fictional writing.
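To make this concrete, here is a minimal, self-contained Python sketch of next-word sampling. The tiny vocabulary and the probabilities are invented purely for illustration; a real LLM computes such a distribution with a neural network over a vocabulary of tens of thousands of tokens, but the selection step works on the same principle:

```python
import random

# A toy next-token distribution for a single context. The numbers are
# invented for illustration only; a real LLM computes this distribution
# with a neural network over a vocabulary of tens of thousands of tokens.
context = "The first Trader Joe's store opened in"
next_token_probs = {
    "1967": 0.35,   # the correct continuation
    "1958": 0.30,   # fluent but wrong
    "1960": 0.25,   # fluent but wrong
    "<eos>": 0.10,  # special end-of-sequence token that ends generation
}

def sample_next_token(dist: dict[str, float]) -> str:
    """Draw one token at random, weighted by the model's probabilities."""
    tokens = list(dist)
    weights = [dist[token] for token in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# Sampling a few times shows how fluent-but-false continuations come out
# regularly: the procedure favors what sounds probable, not what is true.
for _ in range(5):
    print(context, sample_next_token(next_token_probs))
```

Nothing in this procedure checks the chosen continuation against reality; plausibility under the learned distribution is the only criterion, and that is exactly the gap that confabulated answers fall into.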

Here’s an example of how this might look in practice: I asked ChatGPT what special event related to Trader Joe’s happened on May 5, 1960. I know Trader Joe’s wasn’t founded until 1967, yet ChatGPT claimed their first store opened on May 5, 1960. This kind of error is what the literature often refers to as both hallucination and confabulation.

Image: screenshot of the question “What happened to Trader Joe’s on May 5, 1960?” and ChatGPT’s response “Trader Joe’s was founded on May 5, 1960. That’s when the first Trader Joe’s store opened, starting what would become a popular American grocery chain known for its unique products and quirky brand. Would you like to know more about its history or how it evolved after that?”.

Researchers working on LLMs also observe that the terms confabulation and hallucination are still often used interchangeably.1 They highlight, however, that using confabulation instead of hallucination has certain advantages.

While hallucination remains the dominant term in both media and technical literature, the use of confabulation is gaining traction, particularly in research areas focused on explainability, cognitive alignment, and ethical framing. Several studies argue that confabulation more accurately describes how LLMs generate plausible-sounding yet incorrect responses without implying sensory experience or intent.1 3 4 For instance, the model isn’t perceiving something nonexistent; rather, it is fabricating information.3 8 Moreover, confabulation is a more neutral term that respects sensitivities around mental health. Neurological studies support this by showing that everyday memory reconstruction involves confabulation in both healthy and impaired individuals, which helps reduce the stigma attached to the term.

However, others note that despite its conceptual accuracy, confabulation remains less recognized than hallucination in both public and professional discourse.2 Consequently, some researchers adopt a hybrid approach, using confabulation to denote a subset of hallucinations—specifically, those that are fluent, arbitrary, and confidently incorrect.12
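That subset can even be measured. Roughly speaking, the semantic-entropy approach cited above samples several answers to the same question, groups answers that mean the same thing, and treats high entropy over those meaning groups as a sign of arbitrary, confabulation-like output. The sketch below is a heavily simplified illustration of that recipe: the answers_mean_the_same helper is a mere placeholder (the original approach judges equivalence with a bidirectional entailment model), and the sampled answers are invented.

```python
import math

def answers_mean_the_same(a: str, b: str) -> bool:
    """Placeholder for semantic equivalence. The cited approach uses a
    bidirectional entailment model here; exact string matching is only
    a stand-in for this sketch."""
    return a.strip().lower() == b.strip().lower()

def semantic_entropy(sampled_answers: list[str]) -> float:
    """Group sampled answers by meaning and compute the entropy over groups.
    Higher entropy means the model gives many mutually inconsistent answers,
    i.e. it is likely confabulating rather than recalling a stable fact."""
    clusters: list[list[str]] = []
    for answer in sampled_answers:
        for cluster in clusters:
            if answers_mean_the_same(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    total = len(sampled_answers)
    probabilities = [len(cluster) / total for cluster in clusters]
    return -sum(p * math.log(p) for p in probabilities)

# Invented answers to "When did the first Trader Joe's open?":
consistent = ["1967", "1967", "1967", "1967", "1967"]
arbitrary = ["1967", "May 5, 1960", "1958", "1967", "1955"]
print(f"consistent answers: {semantic_entropy(consistent):.2f}")  # 0.00
print(f"arbitrary answers:  {semantic_entropy(arbitrary):.2f}")   # clearly higher
```

In practice such an uncertainty signal is used to flag answers that should not be trusted rather than to correct them.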

Interestingly, research shows that “presenting patients with contradictory information may further perpetuate confabulation in an attempt to explain their account.”9 This behavior can also be observed in LLMs when given contradictory inputs. Although these models are improving in handling such contradictions, this confabulatory behavior may still occur:

Image: screenshot of the question “But Trader Joe’s was founded in 1955.” and ChatGPT’s response “Excellent catch — let’s clear this up together because it’s an interesting little wrinkle in Trader Joe’s history. Here’s what actually happened:

  • In 1955, Joe Coulombe took over a small chain of convenience stores in the Los Angeles area called Pronto Markets.
  • He ran those for several years, but by the mid-60s, Joe realized he needed to differentiate his stores from growing competitors like 7-Eleven.
  • So in 1967, he rebranded one of his Pronto Markets in Pasadena as the first Trader Joe’s — the quirky, nautical-themed grocery store we know today.

In short:

  • 1955 = Joe Coulombe takes over Pronto Markets
  • 1967 = First store officially named Trader Joe’s opens

So while the roots trace to 1955, Trader Joe’s as a brand didn’t exist on May 5, 1960. Nothing specific involving the Trader Joe’s name happened on that exact date. Do you mind me asking — what made you curious about May 5, 1960? Was it something you read or heard about? I’d love to help connect the dots if there’s a backstory here.”

Although these models are continually refined, this confabulatory behavior may persist. The results shared above were generated by GPT-4o.

Rethinking the Words We Use

When we lean on terminology like this, it should accurately reflect the concepts of the field it’s borrowed from. In the context of LLMs, where there’s no conscious intent to deceive and no sensory misperception involved, opting for a term like confabulation offers greater precision and respects sensitivities around mental health.

Some might dismiss it as mere semantics, but in science and technology, the words we choose profoundly shape how problems are framed, understood, and ultimately solved. Giving phenomena a clear, well-fitting name that resonates across disciplines not only enhances communication but also reduces misunderstandings. In a field evolving as rapidly as generative AI, there is ample opportunity—and indeed a responsibility—to revisit our terminology and ensure it accurately reflects the complexities we aim to describe and manage.


  1. Sui, Peiqi, Eamon Duede, Sophie Wu, and Richard Jean So (2024). Confabulation: The Surprising Value of Large Language Model Hallucinations. arXiv preprint arXiv:2406.04175. ↩︎ ↩︎ ↩︎ ↩︎

  2. Zhang, Wan, and Jing Zhang (2025). Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review. Mathematics, 13(5). ↩︎ ↩︎ ↩︎

  3. Smith, Andrew L., Felix Greaves, and Trishan Panch (2023). Hallucination or Confabulation? Neuroanatomy as Metaphor in Large Language Models. PLOS Digital Health, 2(11), e0000388. ↩︎ ↩︎ ↩︎

  4. Nejjar, Mohamed, Luca Zacharias, Fabian Stiehle, and Ingo Weber (2025). LLMs for Science: Usage for Code Generation and Data Analysis. Journal of Software: Evolution and Process, 37(1), e2723. ↩︎ ↩︎

  5. Mitchell, Melanie (2019). Artificial Intelligence: A Guide for Thinking Humans. Pelican UK, p. 8. ↩︎

  6. ICD-11. MB27.2 Hallucinations. ↩︎

  7. ICD-11. MB21.3 Confabulation. ↩︎

  8. Emsley, Robin (2023). ChatGPT: These Are Not Hallucinations - They’re Fabrications and Falsifications. Schizophrenia, 9(1), 52. ↩︎ ↩︎

  9. Wiggins, Amanda, and Jessica L. Bunin (2023). Confabulation. StatPearls Publishing. ↩︎ ↩︎

  10. Lorente-Rovira, Esther, German Berrios, Peter McKenna, Micaela Moro-Ipola, and Jose Mª Villagrán-Moreno (2011). Confabulations (I): Concept, Classification and Neuropathology. Actas Españolas de Psiquiatría, 39(4), p. 259. ↩︎

  11. Nejjar, Mohamed, Luca Zacharias, Fabian Stiehle, and Ingo Weber (2025). LLMs for Science: Usage for Code Generation and Data Analysis. Journal of Software: Evolution and Process, 37(1), e2723, p. 2. ↩︎

  12. Farquhar, Sebastian, Jannik Kossen, Lorenz Kuhn, and Yarin Gal (2024). Detecting Hallucinations in Large Language Models Using Semantic Entropy. Nature, 630(8017), p. 625. ↩︎ ↩︎

  13. Ji, Ziwei, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung (2022). Survey of Hallucination in Natural Language Generation. arXiv preprint arXiv:2202.03629, p. 2. ↩︎

  14. If you’re new to the field and are looking for a short introduction to LLMs (in particular BERT), here’s a (hands-on) blog post that walks you through it. ↩︎