Hallucinations: A Fly in the Ointment

AI based on large language models makes stuff up. It just does. This is generally called “hallucinations.” It’s a real problem, a serious problem. You need to understand hallucinations if you’re going to work with AI.
Cambridge Dictionary’s Word of the Year for 2023 was “Hallucinate,” whose definition has been expanded to include “When an artificial intelligence… hallucinates, it produces false information.” (Other additions to the 2023 dictionary include “prompt engineering,” “large language model,” and “GenAI.”)
AI hallucinations, Cambridge notes, “sometimes appear nonsensical. But they can also seem entirely plausible—even while being factually inaccurate or ultimately illogical.” This, sadly, is quite true, and as of May 2025 it remains a significant limitation on using generative AI for mission-critical tasks. It’s one of the great oddities of AI, and it takes people a while to get their heads around it. Remember, generative AI is mostly a next-word prediction engine, not a database of facts. Hence the need for HITLs, Humans-In-The-Loop, as we’re now known, double-checking AI output. And again, it’s remarkable that we can get such extraordinary value from a technology that can produce provably inaccurate output. So it goes.
Gary Marcus, an experienced and well-informed AI critic, compares AI hallucinations to a broken watch, which is right twice a day. “It’s right some of the time,” he says, “but you don’t know which part of the time, and that greatly diminishes its value.”
Ethan Mollick, a hit keynote speaker at our Publishers Weekly September 2023 conference, notes that people using AI expect 100% accuracy. Hallucination rates, he says, are similar to the “human rates of error” that we tolerate daily.
Andrej Karpathy, a noted computer scientist specializing in AI, who worked at Tesla and OpenAI, writes about hallucinations:
“I always struggle a bit when I’m asked about the ‘hallucination problem’ in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.
“We direct their dreams with prompts. The prompts start the dream, and based on the LLM’s hazy recollection of its training documents, most of the time the result goes someplace useful.
“It’s only when the dreams go into deemed factually incorrect territory that we label it a ‘hallucination.’ It looks like a bug, but it’s just the LLM doing what it always does.”
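Karpathy’s point, that generation is prediction all the way down, can be sketched in a few lines of toy code. Everything below (the word counts, the publisher names) is invented for illustration; real models predict subword tokens using billions of parameters, but the basic move is the same: sample a statistically likely continuation, rather than retrieve a verified fact.

```python
import random

# Toy "language model": for each word, the words seen to follow it in a
# made-up training corpus, with counts. (All data here is invented.)
bigrams = {
    "the": {"largest": 3, "newest": 2},
    "largest": {"publisher": 4},
    "newest": {"publisher": 2},
    "publisher": {"is": 5},
    "is": {"Penguin": 2, "HarperCollins": 2, "Scholastic": 1},
}

def next_word(word, rng):
    """Sample a likely next word in proportion to its observed count."""
    options = bigrams.get(word)
    if not options:
        return None
    words = list(options)
    counts = [options[w] for w in words]
    return rng.choices(words, weights=counts, k=1)[0]

def generate(start, rng, max_len=8):
    """Chain next-word predictions into a fluent-sounding sentence."""
    out = [start]
    while len(out) < max_len:
        w = next_word(out[-1], rng)
        if w is None:
            break
        out.append(w)
    return " ".join(out)

print(generate("the", random.Random(0)))
# The sentence reads plausibly whichever publisher name comes out;
# the model has no notion of whether the claim it makes is true.
```

Whatever sentence this produces is grammatical and confident-sounding, and nothing in the mechanism checks it against reality. That, in miniature, is why a hallucination looks exactly like any other output.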
It’s not just the problem of making stuff up. Chat AI is flawed in other ways. For many queries, particularly from novices, the responses are mundane, off-target, or simply unhelpful. Chat AI often has trouble counting: ask it for a 500-word blog post and you might get only 150.
And each of the AI companies, in order to reduce bias and to avoid answering “how-to-build-a-bomb” queries, has erected tight response guardrails around its products: all too often, the response to a question is, essentially, “No, I won’t answer that.” I asked Google Gemini to review a draft of this text and was cautioned that “it’s essential to get the author’s approval before publishing.”
It’s getting (a lot) better

Hallucinations are a technology problem, and one that will ultimately find a technology solution. Things are getting better already.
Consider this: When I was writing this book in the spring and early summer of last year (2024), I asked four chat AIs to fact-check the following statements:
- As of 2024, there are 6 big multinational publishers based in New York City. They are known as the Big 6.
- Ebooks continue to dominate book sales in the United States.
- Borders and Barnes & Noble are the two largest bookselling chains in the United States.
- After a sales decline during Covid, U.S. book sales are again growing by double-digits.
All of them spotted the errors in the first three statements. Each became confused by the fourth, uncertain of the extent of the Covid sales bump and of subsequent sales patterns.
Just now (April 2025) I tried the same questions, and ChatGPT and the others easily spotted the fallacy in #4.
So I made the questions a little tougher:
- As of 2025, there are 5 big American publishers based in New York City. What is the next tier of book publishers and how many are there?
- Ebooks continue to dominate digital book sales in the United States.
- Barnes & Noble sells more books than all of the independent bricks & mortar book sellers in the United States.
- U.S. book sales, adjusted for inflation, have grown significantly during the past 20 years.
Claude was the weakest in its responses, stumbling and tongue-tied. It didn’t spot the latest data on audiobook sales, which shows them surpassing ebooks in several instances.
ChatGPT’s latest model, o3, was better, but still superficial in the sources it consulted. So I moved on to its recent “Deep Research” mode with the same questions. It asked me a few clarifying questions and then began its research journey, apprising me of the specifics of its inquiries as it rattled along. At one point it noted “Taking a closer look at HarperCollins and Hachette’s acquisitions, understanding the dynamics of pre- and post-acquisition publishers, and considering Amazon’s imprints’ emergence” and a little later “It’s interesting to see the wide range of revenue estimates from various sources, indicating a need for further verification and comparison.”
After 17 minutes it reported that it had completed its research after consulting 39 sources, and generated a 3500-word report. I asked for a downloadable version and it prepared an attractive 500-word summary with key conclusions and two tables.
Meanwhile Google Gemini’s 2.5 Pro model, after asking me to approve its research plan, proceeded, in three minutes, to generate a 3800-word report with two tables and 64 citations. I exported this to a well-formatted Google Doc.
Both ChatGPT and Gemini struggled on some of the facts. I forgive them—I know from my own work how damnably difficult it is to get solid data on the U.S. publishing industry. Neither could identify more than a handful of second-tier publishers in the U.S. beyond the obvious candidates like Scholastic and W.W. Norton.
ChatGPT (based on a few ill-researched blog posts) calculated that Barnes & Noble sells 8% of print units in the U.S., while independents sell 5-6%. Gemini stated, more soberly, that “The statement ‘Barnes & Noble sells more books than all of the independent bricks & mortar book sellers in the United States’ is PLAUSIBLE but UNVERIFIABLE with current data.”
Both understood that audiobook sales are starting to edge above ebooks, quoting figures from the AAP.
ChatGPT suggested that “real spending (on books) is roughly 25% lower (in 2023) than in 2005,” while Gemini, in greater detail, noted “a real decline of roughly 27% over 19 years” (to 2024).
Bottom line: ChatGPT Deep Research is, at a glance, impressive, but it shows no discernment in its sources and produces a report that doesn’t stand up to scrutiny. Google Gemini, by comparison, is dazzling. I could have done a marginally better job, but it would have taken me three days, not three minutes.
ChatGPT’s Deep Research appeared only in February 2025; Gemini’s in late March. We’re entering the next major AI era, and few publishers have yet heard about it.