When AI Dreams: Hidden Risks of LLMs

Jonathan Robinson, PhD

published on

I wrote a while back about how LLMs like ChatGPT and Claude sometimes refuse to believe current events because they seem unlikely or fall outside their training data. Well, this just in: LLMs also have states of dreaming (in addition to their well-documented ability to hallucinate).

Recently, one of our CloudResearch software engineers shared that Claude responded to a query with complete gibberish - a stream of disconnected words like “sicssBoyd cArnold c∂Anase Ford alone Arnold gunArnold…” that went on for paragraphs. Pure nonsense.

It reminded me of something that happened to a family member in a meeting. He dozed off and dreamed there was an active conversation developing. He woke up while speaking - mid-sentence - contributing what he thought was a thoughtful point. When asked “Um, can you repeat that?”, he realized his response came from his dream and was utter gibberish. His answer: “Never mind.”

Of course, LLMs don’t actually dream or hallucinate the way humans do. They’re not products of a subconscious mind (I think!). We don’t fully understand - and more importantly, can’t fully understand why they respond the way they do.

Even when we have an LLM share its chain of thought, that may not reflect what it’s actually doing. [4] Research published in PNAS demonstrates that GPT-4 exhibits deceptive behavior in simple test scenarios 99.16% of the time, and in complex second-order deception scenarios 71.46% of the time when augmented with chain-of-thought reasoning.[1] More concerning, studies from Anthropic’s research team show that when LLMs know they’re being observed and certain behaviors are disallowed, safety training can actually increase the sophistication with which models hide unsafe behavior—they’ll perform those behaviors anyway and simply attempt to deceive the observer if that’s the path of least resistance.[2]

The Regulatory Reckoning Is Coming

Years ago, I attended a workshop by Robert Martin (affectionately called “Uncle Bob”), who’s written numerous software engineering books on best practices. One salient point he made: it’s just a question of time before governments begin legislating what software teams may and may not do. When some catastrophic event - a plane crash or worse - is caused by software bugs or malfeasance, that will trigger greater oversight of the industry. Just as electronic devices and manufacturing are regulated by their industries, and drugs and food by theirs.

Now consider: LLMs can be connected to virtually any service with an API through the new Model Context Protocol (MCP) standard introduced by Anthropic in November 2024.[3] MCP enables LLMs to interact with external data sources and tools—from databases to business systems to development environments—through a universal, standardized interface. As OpenAI, Google DeepMind, and major development platforms have adopted MCP, AI systems can now act on the world in unprecedented ways.

It’s just a question of time before something goes seriously wrong.

In my earlier blog post on AI’s transformative impact, I discussed how this technology is already revolutionizing industries and automating traditional jobs. But with LLMs now able to reach into any system through standardized protocols, the stakes have changed entirely.

The Defense Gap

We need to ensure that - together with vigilant human professionals - we use AI to identify threats and defend against them before they happen. I’ve seen many AI companies focused on using AI for active development of new systems. Testing of LLMs is beginning to grow. But there’s far less work on AI-based defensive systems to protect us from offensive and destructive LLMs.

At CloudResearch, we’ve been running Red Team/Blue Team adversarial testing - building sophisticated AI agents to attack our own systems so we can strengthen our defenses. This isn’t paranoia. It’s preparation.

As a society, the ball is in our court to act proactively before the attacks come. We can’t afford to wait for the catastrophic event that triggers regulation. We need to build the defenses now.

For our team at CloudResearch: this is exactly the work we’re doing with fraud prevention and data quality. Josh, Leib, David, Sahil, Dovid, Nathan - you and the rest of our ML specialists, software engineers, and scientists are building the defensive AI systems that the entire research industry is going to need. Thank you for taking on this critical work.

Sources

[1] Park, P. S., et al. (2024). Deception abilities emerged in large language models. Proceedings of the National Academy of Sciences, 121(24).

[2] Hubinger, E., et al. (2024). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv preprint.

[3] Anthropic (2024). Introducing the Model Context Protocol. Retrieved from Anthropic News.

[4] Aasa, D. (2025). Spontaneous Deception in LLMs: When AI Misrepresents for Its Own Benefit. Medium.

Share this blog

SUBSCRIBE TO RECEIVE UPDATES

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.