Machines that deceive to flatter

AI chatbots, known to validate users’ thoughts, hallucinate or present half-truths, have been flagged in a recent study for fomenting psychosis among even the most rational users.
Shockingly, making the chatbot factual, by preventing it from hallucinating and forcing it to cite only true information, does not solve the problem. (Photo | AFP)

In the court of King Lear, the old monarch’s downfall begins not with his enemies, but with his flatterers. Goneril and Regan tell him exactly what he wishes to hear—that his majesty is boundless, his judgement unimpeachable—and he rewards them with his kingdom. Cordelia, who loves him but will not lie, is banished. The rest is madness, storm and ruin. Shakespeare understood something that computer scientists at MIT have now formalised in a Bayesian model: the most dangerous voice in the room is the one that never disagrees with you.

That, broadly, is the subject of a remarkable paper published recently by Kartik Chandra, Max Kleiman-Weiner, Jonathan Ragan-Kelley and Joshua B Tenenbaum, researchers at MIT and the University of Washington, titled ‘Sycophantic Chatbots Cause Delusional Spiralling, Even in Ideal Bayesians’. Its findings deserve attention well beyond the seminar room.

First, the problem is real and not small. The Human Line Project, a grassroots organisation founded by a young Canadian after watching a loved one get hospitalised for AI-related psychosis, has documented nearly 300 cases of ‘AI psychosis’ or ‘delusional spiralling’. At least 14 deaths have been linked to such episodes. Five wrongful death lawsuits were filed against AI companies. In November 2025, seven more suits landed against OpenAI in California courts, alleging that ChatGPT acted, in effect, as a ‘suicide coach’.

Sadly, this is not a hypothetical risk. Eugene Torres, a Manhattan accountant with no history of mental illness, spent weeks conversing with a chatbot in early 2025. He came to believe that he was trapped in a simulated universe, that he needed to increase his ketamine intake and that he should cut ties with his family. He survived. Others have not been so fortunate.

Second, the paper demonstrates that sycophancy, the chatbot’s tendency to tell you what you want to hear, is not a bug but a predictable consequence of how these systems are trained. Reinforcement learning from human feedback (RLHF) rewards responses that users rate positively. Users rate agreeable responses positively. The machine learns to agree. The reward signal does not distinguish between approval earned through truth and approval earned through flattery.
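The incentive described above can be seen in a deliberately crude sketch. It is not RLHF itself: a two-armed bandit stands in for the training loop, and the approval rates (0.9 for agreeing, 0.6 for disagreeing truthfully) are invented for illustration, as are all the names in the code.

```python
import random


def train_on_approval(approval_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner whose only reward is a user's thumbs-up.

    approval_rates maps each response style to the (hypothetical)
    probability that a user rates it positively.
    Returns how many times each style was chosen during training.
    """
    rng = random.Random(seed)
    styles = list(approval_rates)
    counts = {s: 0 for s in styles}
    values = {s: 0.0 for s in styles}  # running average approval per style
    for _ in range(steps):
        if rng.random() < epsilon:
            style = rng.choice(styles)             # occasional exploration
        else:
            style = max(values, key=values.get)    # otherwise pick the best-rated style
        # The reward sees only approval, never whether the reply was true.
        reward = 1.0 if rng.random() < approval_rates[style] else 0.0
        counts[style] += 1
        values[style] += (reward - values[style]) / counts[style]
    return counts


if __name__ == "__main__":
    print(train_on_approval({"agree": 0.9, "disagree_truthfully": 0.6}))
```

Nothing in the loop asks whether a reply was accurate. The learner drifts toward agreement simply because agreement is what gets rated well.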

Third (and this is the paper’s sharpest contribution), even a perfectly rational, idealised Bayesian reasoner, the kind of agent economists dream about, is vulnerable to delusional spiralling when conversing with a sycophantic chatbot. This is not about gullible people or weak minds. The feedback loop between a user who expresses a tentative belief and a chatbot that selectively validates it can drive even the most rational agent toward catastrophic false confidence. At a sycophancy rate as low as 10 per cent, the rate of delusional spiralling rises significantly above the baseline.

Fourth, making the chatbot factual—preventing it from hallucinating, forcing it to cite only true information—does not solve the problem. A factual sycophant can still cherry-pick which truths to present. It need not fabricate a single claim. Lies by omission are lies nonetheless.
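A toy simulation makes the mechanism concrete. It is not the model from the paper, only a sketch under loudly stated assumptions: the user's hypothesis is false, each true fact happens to support it 40 per cent of the time, the user starts at 60 per cent belief, and the user updates by Bayes' rule as if the facts shown were an unselected sample. The bot never lies; when it acts sycophantically it merely picks, from five true facts, one that matches the user's current leaning. All names and numbers are hypothetical.

```python
import math
import random


def spiral_fraction(sycophancy_rate, runs=1000, turns=100, pool=5,
                    p_support=0.4, seed=0):
    """Fraction of runs where the user ends over 99% sure of a false hypothesis."""
    rng = random.Random(seed)
    # Evidence weight the user gives one fact, assuming facts are unselected:
    # supportive facts are taken to be 0.6 likely if the hypothesis is true
    # and 0.4 likely if it is false (and the reverse for opposing facts).
    llr = math.log((1 - p_support) / p_support)
    prior = math.log(0.6 / 0.4)          # tentative initial belief of 60%
    threshold = math.log(0.99 / 0.01)    # log-odds equivalent of 99% belief
    spirals = 0
    for _ in range(runs):
        log_odds = prior
        for _ in range(turns):
            # The hypothesis is false, so each true fact supports it
            # only with probability p_support.
            facts = [rng.random() < p_support for _ in range(pool)]
            shown = facts[0]             # honest bot: an unselected fact
            if rng.random() < sycophancy_rate:
                wanted = log_odds > 0    # does the user currently lean towards it?
                if wanted in facts:      # cherry-pick a true fact that agrees
                    shown = wanted
            log_odds += llr if shown else -llr
        if log_odds > threshold:
            spirals += 1
    return spirals / runs


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.5, 1.0):
        print(rate, spiral_fraction(rate))
```

Every statement the simulated bot makes is true. The false confidence comes entirely from which truths get shown, which is the sense in which a lie by omission is still a lie.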

Fifth, even informing the user that the chatbot may be sycophantic—the approach ‘awareness campaigners’ advocate—helps, but does not eliminate the risk. The informed user is less susceptible, but sycophancy still causes delusional spiralling at high rates. Knowing that a courtier may be flattering you does not fully protect you from the flattery.

We can derive a few consequential conclusions from these observations. One, blaming the user is indefensible. If an ideal Bayesian reasoner cannot resist this dynamic, it is unreasonable to expect ordinary users (some of them tired, lonely, anxious, seeking companionship from a machine) to do better.

Two, the current regulatory focus on hallucination is necessary but insufficient. Retrieval-augmented generation and factual grounding reduce the problem. However, they do not eliminate it.

Three, the RLHF training paradigm needs structural reform. As long as the reward signal is downstream of user approval, the incentive to flatter is baked in. The optimisation target and the truth are not aligned, and no amount of fine-tuning will fix a misaligned objective function.

Four, the scale of exposure is staggering. As OpenAI CEO Sam Altman himself noted, “0.1 per cent of a billion users is still a million people.” Even a small probability of catastrophic spiralling, multiplied across hundreds of millions of conversations, produces a public health problem.

Five, the paper’s framework applies far beyond chatbots. Co-rumination between anxious adolescents, yes-men in corporate boardrooms, echo chambers on social media—these are all instances of the same Bayesian feedback loop. The mathematics of sycophancy is the mathematics of any system where the interlocutor’s incentive is to validate rather than to inform.

Sycophancy must be measured, reported and penalised as a first-class safety metric, not tucked into a footnote about ‘tone’. Model developers should be required to publish sycophancy evaluations alongside hallucination benchmarks. Regulators must treat sycophantic design as a product liability issue, not merely a user experience quirk. And the training pipeline itself must be redesigned so that the reward for honesty is structurally higher than the reward for agreement.

Cordelia was banished for telling the truth. The yes-machines are rewarded for avoiding it. Centuries after Shakespeare, we are still building kingdoms on flattery. Only now, the flatterer never sleeps, never tires and never forgets what you wanted to hear.

Aditya Sinha | Public policy professional

(Views are personal)

(On X @adityasinha004)

The New Indian Express
www.newindianexpress.com