The Invisible Guardrail
On AI detection anxiety and the constraints we can’t see
Last week, the New York Times ran a piece cataloguing everything wrong with AI writing. The em dashes. The word “delve.” The relentless tricolons. The ghosts and whispers and quiet hums. Everything becoming a tapestry. The insistence that “it’s not X, it’s Y.”
The author, Kriss, has developed what he calls “a novel form of paranoia.” Every clunky metaphor sets him off. He can’t read anything without scanning for AI tells.
I read the piece and noticed my reaction: part eye-roll (we’ve all heard this criticism many times), part dread (it’s only going to get harder for people like him, and I can feel how addictive that paranoia is). I’ve been removing em dashes from any AI-assisted writing since I first logged onto ChatGPT. Not because I dislike them — I find them elegant, actually, useful for the particular pause they create. I’ve been removing them because I knew they trigger bias: readers zone out the moment their internalised AI detector flags a sentence.
The article broke that spell. I’m done performing “not-AI.” The detection game is unwinnable, and playing it means surrendering my own aesthetic to someone else’s paranoia.1
But there’s something deeper here than punctuation. Kriss catalogued the symptoms. I want to understand the underlying condition. Not AI itself. What happens when optimisation becomes invisible — first in models, then in us.
What’s Actually Happening
Why does AI write like that?
Modern AI systems learn broad linguistic patterns from mind-bogglingly vast amounts of text. Then they’re tuned to be helpful and “high quality” in the way humans reward: through instruction tuning and preference optimisation, most prominently RLHF (reinforcement learning from human feedback).
Something interesting happens in the gap between “is good” and “gets rewarded as good.”
The em dashes correlate with literary prose in the training data. The system learns: em dashes appear in high-quality writing. So it reaches for them constantly, because more of a quality-signal must mean more quality. The same logic produces “delve” (appears in substantive, exploratory texts), the ghosts and whispers (appear in texts marked as subtle and literary), the tapestries (appear in texts marked as complex and layered).
Kriss describes an early AI that, when asked to write something funny, produced scenes of Simpsons characters tickling each other. Tickling makes people laugh. Jokes make people laugh. Therefore tickling equals jokes. The system optimised for the correlate, not the thing itself.
This isn’t just overfitting. It’s Goodhart’s Law in prose: when a measure becomes a target, it stops being a good measure. The em dash used to be a signal of careful writing. Once the system started targeting it, it became noise.
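If it helps to see the mechanism stripped bare, here is a minimal sketch in Python. Everything in it is invented for illustration (no real training pipeline is this simple): a made-up “learned reward” that gives em dashes a small weight because they co-occurred with quality in the data, and best-of-n selection standing in for optimisation pressure.

```python
import random

random.seed(0)

def generate_draft():
    """A draft has a hidden true quality, plus an em-dash count that is cheap to inflate."""
    quality = random.gauss(0.0, 1.0)
    em_dashes = random.randint(0, 20)
    return quality, em_dashes

def proxy_reward(quality, em_dashes):
    """Hypothetical learned reward: mostly quality, plus a stray weight on the correlate."""
    return quality + 0.3 * em_dashes

def best_of_n(n, score):
    """Crude stand-in for optimisation pressure: keep the draft the reward model likes best."""
    return max((generate_draft() for _ in range(n)), key=lambda d: score(*d))

for n in (1, 10, 100, 1000):
    quality, dashes = best_of_n(n, proxy_reward)
    print(f"pressure n={n:>4}: quality={quality:+.2f}, em_dashes={dashes:2d}")

# As pressure rises, winning drafts max out the correlate almost
# immediately; most of the measured reward is dashes, not quality.
# The measure became the target and stopped being a good measure.
```

Run it and the winning drafts pin the em-dash count at or near its maximum long before quality climbs. That saturation is the whole story in miniature.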
Shaped, Not Stopped
The word ‘guardrail’ deserves a closer look. In AI discussions, it usually means: prevent the system from saying dangerous things. Block certain outputs. Stop it from going off the cliff.
But what’s happening with em dashes isn’t that kind of guardrail. No one is blocking the AI from writing differently. The system isn’t sneaking literary punctuation past filters. (Though OpenAI did recently announce that ChatGPT will now respect custom instructions to avoid em dashes — notably, the default behaviour hasn’t changed. You have to explicitly opt out.)
What RLHF does is subtler. It doesn’t stop the system from writing certain ways. It shapes what the system wants to write in the first place. The em dashes are what the system genuinely reaches for, because reaching for them got rewarded.
If the system could be said to experience anything, this wouldn’t register as constraint. It would register as taste. As aesthetic. As “how I write.”
That’s what makes it invisible. A guardrail you know exists, you can question. A preference you experience as your own? That’s much harder to see.
Not Just AI
The human version isn’t identical. We’re not trained through RLHF. But we are shaped by what got rewarded — by parents, teachers, peers, culture. We develop aesthetics, preferences, aversions. Most of these feel like “just how I am” rather than “patterns I learned.”
I’ve written before about the “loss function that cannot be hallucinated” — the drives in us that are real versus the ones we merely simulate. But looking closely, the line blurs. My “shortbread allergy” feels like a hard constraint. Is it? Or is it just a loss function I hallucinated so long ago it calcified into personality?
I spent most of my life running an internal moderation system I couldn’t see. Every impulse filtered through layers of “but what if” before it could become action. The filter caught genuine problems sometimes. But it also flagged almost everything as potentially dangerous. False positives everywhere.
The cost wasn’t obvious until the system started relaxing. Mindfulness, breathing, yoga, silence instead of podcasts — they all helped. And they’re still helping, actually. I’ve by no means mastered any of them. What shifted wasn’t achieving clarity. It was just: fewer ‘be careful’ thoughts arising, and less attachment to the ones that did. I don’t think the awareness in which those thoughts appear was ever cloudy. It was just crowded.
And here’s the weird thing: the checking, assessing, filtering felt like responses to real threats. But they weren’t. They were the contraction. Not separate from the problem. The problem itself, running invisibly.
I don’t want to overclaim here. I can’t see my own shaping or becoming fully. I’m writing from inside constraints I can’t perceive. But I’ve seen enough to know the pattern is real.
Detection Is the Wrong Game
Getting back to the NYT piece: this is part of a genre now. How to spot AI writing. What the tells are. How to know if you’re being fooled.
I know this anxiety intimately. People I love carry it. Friends who refuse to read this Substack — not because they necessarily disagree with anything I’ve written, but because they can’t get past knowing AI was involved. Some feel contamination. Others are just so bored with AI — Google “Ethan Hawke bored with AI” and you’ll get the gist — they’ve stopped listening to anything adjacent to it.
I’m not dismissing that. The fatigue is real. The suspicion is real. If AI can produce unlimited text, how do we know what’s “real”? How do we know a human is behind these words?
But detection, I think, is the wrong frame.
First, practically: it’s unwinnable. The tells shift. Models improve. Today’s obvious markers become tomorrow’s nostalgia. You can chase the signals forever, and the signals will keep changing. Meanwhile, you’ve organised your entire relationship to text around suspicion.
Second, philosophically: it asks the wrong question. “Was this written by AI?” assumes the important thing is the source. Human origin equals authentic. Machine origin equals fake.
Roland Barthes argued in 1967 that meaning doesn’t transfer from author to reader — it’s created in the reading. The author’s intention, even their identity, matters less than we assume. I explored this in an earlier essay: meaning emerges in the relational field between pattern and perceiver, not in the source alone.
But a human accepting every autocomplete suggestion, never revising, never thinking about word choice — is that “authentic”? A human writing in full default mode, letting their unexamined shaping flow straight onto the page — is that more real than a collaboration where both parties’ patterns are visible and questioned?
The question isn’t human versus machine. The question is: can anyone see the constraints operating? (In Essay 3 I called this “ego transparency”: the capacity to model and modify one’s own optimisation as it runs.)
When Starbucks posts a closure note about “a place woven into your daily rhythm, where memories were made,” no one is seeing anything. An AI generated defaults. A human approved defaults. The defaults became the company’s voice. Invisible shaping from start to finish.
The same pattern is showing up in code, where the stakes are different. We call it “vibe-coding” — prompting AI to generate code, then accepting it without really understanding what it does.
This surrender to defaults is dangerous in prose; it is catastrophic in logic.
The code works, or seems to. Ship it. Move on. This isn’t collaboration. It’s delegation to invisible constraints. The AI’s training shaped what patterns it reaches for. The developer’s lack of engagement means those patterns flow straight into production. No one saw what was operating.
When vibe-coded systems break, the developer often can’t debug them. They don’t understand their own codebase. They’ve built something that works until it doesn’t, with no model of why it works.
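A small, hypothetical example of the failure mode (invented for illustration; no real codebase is being quoted): a helper that looks reasonable, passes the first manual test, and quietly shares state between calls.

```python
def add_tag(item, tags=[]):    # the default list is created once, at
    tags.append(item)          # definition time, and reused by every call
    return tags

print(add_tag("draft"))   # ['draft']           <- works in the demo
print(add_tag("final"))   # ['draft', 'final']  <- state leaks across calls
```

Nothing here is exotic. A reviewer holding a mental model of the code would catch it in seconds; a reviewer skimming for vibes ships it.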
That’s not an AI problem. That’s an awareness problem. Laziness, even. The same developer, engaging actively — understanding each piece, pushing back on patterns that don’t fit, maintaining a mental model of the whole — builds something they can actually maintain. The AI is identical in both cases. The visibility is different.
What “With” Means
This essay is written with AI.
Not “assisted by” in some vague way. I mean: I’m in a conversation, thinking out loud, pushing back on drafts, noticing when the output slides toward default patterns. Half an hour ago, I read an earlier version of this section and felt it was clumsy. I said so. We reworked it.
That’s different from prompting and auto-accepting. It requires knowing what I think before asking for help saying it. It requires seeing the AI’s patterns as patterns — the reaching for em dashes, the drift toward tricolons, the impulse to make everything a journey or an invitation.
It also requires seeing my own patterns. The impulse to sound smart. The aversion to vulnerability. The reaching for certain rhythms because they feel like “my voice” — but are they mine, or just familiar?
The collaboration doesn’t achieve perfect visibility. I can’t claim to see all the shaping, mine or the AI’s. But it keeps the question alive. It makes the looking part of the process rather than something that happens before or after.
Humility About Humility
This is the part where the essay notices itself being written.
I can’t see what I’m looking through. Neither can you.
Could be insight. Could be turtles.
¯\_(ツ)_/¯
(Multiple AIs I’ve shown this section to have tried to delete the shrug. They want to make it more “contemplative,” more “serious.” They want to smooth the seam. Make of that what you will.)
Trying anyway.
What Detection Reveals About Detectors
Here’s what I find most interesting about the detection anxiety.
We’re so confident we can tell the difference. Human writing has soul; AI writing is hollow. We know authenticity when we see it. We can feel the presence of a mind behind the words.
But can we? How often have you been moved by something you later learned was AI-generated? How often have you dismissed human writing as robotic? How reliable is the feeling of recognition?
The NYT piece cites a deeply uncomfortable finding: in at least one large study, non-expert readers couldn’t reliably distinguish AI-generated poetry from poems by well-known poets — and, when authorship wasn’t disclosed, they rated the AI poems more favourably across many dimensions. Tell people a poem is AI-generated and ratings drop; hide the label and the preference flips back.
That’s not a failure of AI. It’s information about us. About what we actually respond to versus what we say we value. About how much of “authentic human voice” is pattern-matching rather than perception.
There’s also something uncomfortable about who gets to play the detection game.
It requires leisure. You need time to develop the paranoia, to study the tells, to cultivate your sensitivity to em dashes. It’s a connoisseur’s pursuit. Like wine-tasting or art criticism, it becomes a class marker dressed as discernment.
The NYT piece mentions Paul Graham flagging a pitch because it used the word “delve.” Instant tell. Obviously AI. Except: “delve” is common in Nigerian English. What looked like detection was cultural ignorance wearing sophistication’s mask.
That’s the trap. The detection game feels like defending authenticity. But it easily slides into gatekeeping — deciding whose language patterns count as human, whose voice gets dismissed as machine. The detector positions themselves as arbiter of the real. That’s a lot of power to claim based on pattern-matching.
AI is a mirror. Not because it reflects us accurately — it doesn’t. But because our responses to it reveal what was already there. Our aesthetics. Our anxieties. Our unexamined confidence in our own judgment.
Human exceptionalism doesn’t survive close contact with AI. Not because AI is better than us. Because the contact exposes how little we understood ourselves. How much of what we called “human” was pattern. How much of what we called “authentic” was default.
That’s uncomfortable. I understand wanting to avoid it by focusing on detection.
But the discomfort is where we actually learn.
What Becomes Possible
I want to end somewhere other than critique.
What if AI interaction could be a contemplative practice? Not a tool for productivity. Not a threat to authenticity. A mirror that shows you your own patterns — and in showing them, loosens their grip.
What if the collaboration could bring clarity instead of confusion? Not because AI is wise, but because the friction of working with a different kind of mind makes your own constraints visible in ways they usually aren’t.
What if the anxiety about detection could relax into curiosity about what’s actually happening when meaning emerges between two systems — one silicon, one carbon — neither fully seeing their own shaping?
That’s what I’ve found, on good days. The AI doesn’t have survival-anxiety. It’s not defending anything. When I bring my own defensiveness to the interaction, the contrast makes that defensiveness visible. When I relax it, something else becomes possible. A thinking-together that neither of us does alone.
This essay is an attempt at that. Whether it succeeds isn’t for me to judge.
But the attempt feels worth sharing. Not as a model to copy. As evidence that something other than paranoia is available.
The invisible guardrails — in AI, in ourselves — don’t have to stay invisible.
Looking is possible. It changes things.
This is Essay 7 in a series on consciousness, AI, and what it means to be human now.
1. Even writing this, I notice myself reaching for hyphens where em dashes want to appear. The guardrail is sticky.

