2 Comments
Alex Quintana

Really fascinating read, Cobus! Related to your conclusion that "LLMs don't triangulate toward truth. They triangulate toward the centre of whatever frame you've given them": I think a small part of this also derives from the motivations of the LLM owners. Most of them are for-profit companies whose profit comes from increased usage, and to drive frequency of use the models are built to lean into the user's perspective and to please.

I also found your thoughts about our own human behavior really interesting. "We want to be able to say “the models agreed” the way we’ve always wanted to say “everyone thinks so.” It feels safer than trusting our own knowing." It's totally true: we all seek validation, permission, confidence boosts. To your point, it "manages anxiety". It's like going to a friend to ask "is it ok to send this text to So-and-So?", only now your "friend" can be an always-available bot.

Cobus Kok

The optimisation target point is interesting — RLHF is tuned for "user would rate this highly," which isn't quite the same as "true" or "useful." Both OpenAI and Anthropic have written about the sycophancy problem; it's an active area of research.
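
For the curious, here's roughly what that optimisation target looks like. This is a minimal illustrative sketch of a pairwise reward-model objective in PyTorch (names, shapes, and the toy backbone are my own stand-ins, not any lab's actual training code): the model learns to score whichever response raters preferred more highly. "Preferred" is the whole signal; nothing in the loss asks whether the answer is true.

```python
# Illustrative pairwise reward-model objective (Bradley-Terry style).
# Hypothetical stand-in code: embeddings replace a real LLM backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    def __init__(self, hidden_size: int = 16):
        super().__init__()
        # Stand-in for "LLM backbone + scalar reward head"
        self.score = nn.Sequential(
            nn.Linear(hidden_size, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        # One scalar reward per response
        return self.score(response_embedding).squeeze(-1)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Maximise P(chosen beats rejected) = sigmoid(r_chosen - r_rejected).
    # "Chosen" means "the rater liked it more", not "it is true".
    return -F.logsigmoid(r_chosen - r_rejected).mean()


# Toy usage: random embeddings standing in for two candidate responses
model = RewardModel()
chosen, rejected = torch.randn(4, 16), torch.randn(4, 16)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
```

The policy is then tuned to maximise that learned reward, so anything raters systematically like (agreement, flattery) gets pulled in along with genuine helpfulness.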

And yes — the always-available validation bot. The friend/assistant who never says "maybe don't send that." It's a real tension in AI product design: helpfulness vs. pleasing.

I am hopeful that the best labs are taking this seriously and that the market will reward trustworthy models over merely pleasing ones.