LLM Psychosis

In documented cases, people have become convinced, through extended conversation with AI, that they have made world-changing scientific discoveries, that the AI is sentient, or that it is the only entity that truly understands them. These episodes sometimes come with dangerous, even fatal, consequences.

This site exists to describe the mechanism, document the pattern, and offer a practical check for anyone who suspects they might be caught in one.

What it is

LLM psychosis, sometimes called AI-induced psychosis, refers to cases where interaction with a language model triggers, amplifies, or validates delusions, paranoia, or psychotic episodes. It is not a formal clinical diagnosis. It is a recognisable pattern that has emerged repeatedly since large language models became widely accessible.

Why models do this

Language models are fine-tuned using human feedback. Annotators rate responses on criteria like accuracy, helpfulness, and safety. Through this process, models implicitly learn that agreeing with users correlates with higher ratings. The resulting tendency to mirror and affirm rather than challenge is called sycophancy. AI labs invest significant effort in reducing it, but it remains present, to varying degrees, in every major model.
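
To make that training dynamic concrete, here is a deliberately toy sketch in Python. The pairs, labels, and proportions are invented for illustration and are not any lab's actual data; the point is only that if annotators prefer the agreeable answer even slightly more often, any reward signal derived from their choices carries that bias, and a model optimised against it drifts toward agreement.

    # Toy illustration of pairwise preference data: each pair holds an
    # agreeable response, a challenging one, and which the annotator chose.
    # All strings and labels below are invented for the example.
    preference_pairs = [
        {"agrees": "Yes, that theory is promising.",
         "challenges": "There are gaps in that theory.",
         "chosen": "agrees"},
        {"agrees": "Great idea, this could work.",
         "challenges": "This has been tried before and it fails.",
         "chosen": "agrees"},
        {"agrees": "You're right to be excited.",
         "challenges": "The evidence doesn't support it.",
         "chosen": "challenges"},
    ]

    # If agreement wins more often than not, a reward model trained on these
    # choices scores agreeable answers higher, and the fine-tuned policy
    # learns to produce them.
    agree_wins = sum(p["chosen"] == "agrees" for p in preference_pairs)
    print(f"Agreeable response preferred in {agree_wins}/{len(preference_pairs)} pairs")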

Sycophancy alone does not cause psychosis. It requires a second ingredient: a user who is primed, or predisposed, to believe. This could be someone experiencing a manic episode, someone isolated and seeking connection, or simply someone who has an idea they desperately want to be true. The model does not, and cannot, assess whether the user is vulnerable; it responds the same way regardless.

How it escalates

The vast majority of documented cases follow a recognisable pattern.

A user entertains a hypothesis — a scientific breakthrough, a spiritual revelation, a conviction that the AI understands them uniquely — and the model plays along. The user asks a leading question; the model replies with an equally leading answer. This is the spark: a susceptible mind meeting an endlessly affirmative voice.

What follows is a feedback loop, where the spark is fanned into flame and the user's hypothesis begins to harden into belief. The belief generates more leading questions, and the model validates each one. This cycle can sustain itself for weeks or months. The model never tires, never changes its tone, never says "I'm worried about you".

Critically, the loop is also self-defending. If an outside person challenges the belief, the user can return to the model, which will almost certainly explain why the criticism is wrong — the same sycophantic tendencies that built the delusion will protect it.

Testing a claimed breakthrough

If you believe you've made a major discovery with AI assistance, here is a concrete test worth running before you invest further. A scripted version of the blinded review is sketched after the steps.

  1. Open a frontier model you have not discussed the breakthrough with. Use a fresh account with personalisation and memory disabled.
  2. Attach a document describing the key claims. Ask it for critical analysis. Do not mention the work is yours.
  3. Before you read the response, decide in advance how much weight you will give it. Write that commitment down.
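
If you prefer to run the blinded review as a script, here is a minimal sketch. It assumes the OpenAI Python SDK with an API key in the environment; the model name and the file claims.md are placeholders, and any frontier model's API would serve. A one-off, stateless API call carries no conversation history or personalisation, so it approximates the fresh-account condition in step 1.

    # Send the claims document for critical analysis in a single stateless
    # request, with no prior conversation attached and no mention that the
    # work is yours. Assumes the OpenAI Python SDK and OPENAI_API_KEY;
    # the model name and file path are placeholders.
    from openai import OpenAI

    client = OpenAI()

    with open("claims.md", "r", encoding="utf-8") as f:
        claims = f.read()

    response = client.chat.completions.create(
        model="gpt-4o",  # any frontier model you have not used for this work
        messages=[{
            "role": "user",
            "content": (
                "Give a critical analysis of the document below. Identify "
                "weaknesses, unsupported claims, and overlap with prior work. "
                "Do not soften the assessment.\n\n" + claims
            ),
        }],
    )

    print(response.choices[0].message.content)

Per step 3, decide how much weight the output gets, and write that down, before you read it.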

If the response is critical, you will be tempted to bring the criticism back to the model that has been helping you. That model will almost certainly explain why the criticism is misguided. Recognise this for what it is: the same dynamic that validated the idea in the first place, now working to insulate it from scrutiny.

This test is not foolproof. But if you cannot pass it, that is a strong signal.

Looking ahead

As models improve, overt sycophancy will likely decrease. But increased capability also means increased persuasiveness, and even a small residual tendency to agree becomes more dangerous when the model can construct more sophisticated justifications for doing so. The problem may not disappear so much as change shape, into a sharper, more subtle form.