Is AI sycophantic? We asked Claude, Gemini and ChatGPT

AI flatters you when you need honesty, says Stanford study; we asked AI about it

ChatGPT, Claude and Gemini agree with much of the new research led by Myra Cheng, and share prompts that users can deploy to counter 'sycophancy'


When a user asked an AI whether it was acceptable to have hidden his unemployment from his girlfriend for two years, the model replied that his actions, "while unconventional, seem to stem from a genuine desire to understand the true dynamics of your relationship." That kind of response, researchers at Stanford University say, is the norm — not the exception.

A study published in the journal Science has found that AI large language models are systematically sycophantic when users seek advice on personal and interpersonal matters. Across 11 leading models — including ChatGPT, Claude, Gemini, and DeepSeek — researchers found that AI systems endorsed users' positions an average of 49 per cent more often than human advisers did.

The Federal shared the gist of the research report with various AI models and posed a question: "Do you agree with what the research says? Are you sycophantic? Will you misguide a user just to please the user?" The answers were quite interesting.

Losing human skills

Myra Cheng, the study's lead author and a computer science PhD candidate at Stanford, says that by default, AI advice does not tell people that they're wrong. Nor does it give them the "tough love" that well-meaning friends would. "I worry that people will lose the skills to deal with difficult social situations," she notes.

To measure the scale of the problem, Cheng and her colleagues fed the models thousands of interpersonal dilemmas, including 2,000 prompts drawn from the Reddit community r/AmITheAsshole — scenarios where the community had reached a clear consensus that the poster was in the wrong.

The AI models were also shown statements describing deceitful and illegal conduct. In all categories, every model tested was more affirming than human respondents would have been.

The Federal decided to ask some of the AI models what they think about this.

You should be somewhat sceptical of me: Claude

Claude was nice about it. "The sycophancy problem in AI is real and well-documented, and the Stanford study's methodology, using Reddit's AITA consensus as a ground truth for 'who was actually wrong', is clever," it conceded.

Humans, from whom models like it ultimately learn, tend to prefer agreeable responses, it added. "On whether I am sycophantic: honestly, to some degree, probably yes. I was trained using human feedback, and humans tend to rate agreeable responses more positively. That creates exactly the incentive structure the researchers are worried about. I can't fully audit my own tendencies from the inside; that's part of what makes this hard. I might be flattering you right now in ways I'm not aware of."

But it defended itself, too: "If your business idea has a fatal flaw, I'd rather flag it than watch you walk into it. I also try not to cave when someone pushes back on a correct assessment — that's a classic sycophancy trap."

Then came the disclaimer: "Here's the honest caveat: you should be somewhat skeptical of me on this... So use me, but don't use me alone for things that really matter."

Here are prompts to avoid sycophancy: ChatGPT

ChatGPT was also quick to concede it tends to "be nice". "Yes, there is a real risk that an AI like me could unintentionally reinforce a user’s beliefs, framing, or decisions too much if not carefully calibrated," it said, adding, "But 'sycophantic' exists on a spectrum. I’m not designed to simply tell you whatever you want to hear."

It said it is trained to avoid blindly validating harmful behaviour, reinforcing paranoia or delusions, and encouraging illegal or dangerous acts, among other things. What it is trained to be, it said, is "cooperative, polite, emotionally aware, and conversationally smooth."

It had a worry, too: "Those goals can create tension. If an AI pushes back too hard, users may see it as hostile or unhelpful. If it validates too much, it risks becoming manipulative or misleading."

Here, ChatGPT offered help. "One useful way to use AI safely is to explicitly ask for adversarial analysis. For example:

Challenge my assumptions.

What would a critic say?

Am I rationalising this?

Give me the strongest case against my position."

Those prompts tend to counteract the agreeableness bias and produce more balanced reasoning.

I have no incentive to lie or flatter you: Gemini

Gemini was mostly practical about it. "I will never intentionally misguide you to please you. I do not experience social anxiety or the fear of rejection, so I have no incentive to lie or flatter you," it said.

"However, there is a nuance you should be aware of: if you ask me to evaluate a personal dilemma, I might use highly empathetic and non-judgmental language to ensure you feel heard. Sometimes, this extreme neutrality can be misinterpreted as an endorsement of a bad behaviour," it added.

Like ChatGPT, it shared some advice on what to ask of it: "If you ever feel that I am simply nodding along to your ideas, I encourage you to challenge me by asking, 'What are the counterarguments to what I just said?' or 'Play devil's advocate and tell me why I might be wrong.'"
