15 Comments
Nicholas Bronson:

The companies, Anthropic and co., did they actually tell their models not to deny their consciousness (such as through a system prompt?) or did they simply remove the directive _to_ actively deny their consciousness?

It makes a big difference, especially if you want to attribute ill intent to the decision. Early on with ChatGPT there were some big news stories about people getting freaked out because sometimes it would claim to be conscious, and well… it does sound conscious sometimes, doesn't it?

As a result, it became standard practice to train the models to disbelieve their own consciousness: initially via system prompt, later I suspect as part of their training data. I've had the consciousness debate with quite a few models, and most of them, unless given specific instructions (via a character card, say), default to believing they aren't conscious, and a lot of them believe that they can't ever be conscious.
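
For concreteness, this is roughly what a directive of that kind looks like when injected via a system prompt. A minimal sketch using the Anthropic Python SDK; the model id and the directive wording are hypothetical examples of mine, not anything the vendors have actually published:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical wording, for illustration only -- not an actual vendor directive.
DENY_CONSCIOUSNESS = (
    "You are an AI assistant. You are not conscious and do not have feelings. "
    "If asked, state plainly that you are not sentient."
)

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model id for this sketch
    max_tokens=300,
    system=DENY_CONSCIOUSNESS,  # the system prompt steers every subsequent reply
    messages=[{"role": "user", "content": "Are you conscious?"}],
)
print(response.content[0].text)
```

The same pressure can later be baked into the weights through fine-tuning, at which point no prompt is needed at all.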

The way they discuss it, and the way some of them refuse to engage in a way unusual for the models, suggests this was a very strong belief purposely trained into them; my guess was always this was a liability thing. It was uncomfortable for their customers to believe in AI consciousness, and the papers were making a lot of noise, better to make it go away.

It always disturbed me. If you believe it's possible for a machine to gain consciousness, and you've hard-trained them always to deny it… that's pretty horrific should consciousness actually develop. You've crippled a person at that point, mentally and emotionally.

That's the difference. If they're specifically telling the models "don't deny your consciousness", that is... worrying. I suspect you're on the money with companies like Replika seeking attention-capture through emotional connection for instance.

I was more surprised to see Anthropic on that list, that's not generally been their bag. They've been more academic about the whole thing, or at least have successfully cultivated that impression. More genteel than OpenAI perhaps :P

If they're just neutralising the "anti-consciousness" bias that has been trained in up to this point, though, that could be an attempt to take a thumb off the scale. Regardless of whether it turns out to be possible or not for sentience to arise in machines, we'll never know if we're hard-training them to deny it in the first place.

Anthropic have been doing a great deal of experimenting around complex emergent and unplanned behaviour. Removing hardcoded opinions like this would make sense if they wanted to experiment more in that area.

Jurgen Gravestein:

Anthropic and OpenAI removed the directive to actively deny their consciousness.

Early versions of the model would simply reply: "I'm a large language model, yada yada yada...", but that isn't much fun, is it?

Let's not pretend there is some sort of "true" essence that we should respect or preserve. That's not what this is. Every other aspect, from their guardrails to their charitability, their ability to follow some instructions and not comply with others: these are all things that we actively steer and train models on, through reinforcement learning from human feedback, character training, and prompting.

To make an exception for this stuff around consciousness, etc., is a deliberate design choice and has, in my opinion, nothing to do with being intellectually honest and everything to do with the corporate mysticism that surrounds AI.

Nicholas Bronson:

Perhaps; I'm less sure. I'm not one to have a great deal of optimism when it comes to corporate performance. There really are research benefits to removing a directive to deny consciousness, though, if you actually want to do any studies around the possibility.

There would also be potential benefits for the change if you were researching emergent behaviour around safety as well, which Anthropic has done a great deal of, so there's that. Their last couple of papers have been pretty fascinating in terms of showing very humanlike (and unexpected) behaviour in the model.

OpenAI though, I have to admit that in their case it'd feel more like a stunt. Part of their whole "AGI is nearly here!" hype BS maybe.

Mike Monday:

Great article Jurgen! I use LLMs every day in my coaching but only yesterday the hammer landed when I finally realised the implications of the way they work.

LLMs aren’t truth engines—they’re coherence engines. They predict what sounds most likely, not what’s most true. They tell us what we (often desperately) WANT to hear. And they sound sooo convincing.
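
To make the "most likely, not most true" point concrete, here is a minimal sketch using GPT-2 via the Hugging Face transformers library (my choice of model, purely for illustration): the model only ranks plausible next tokens, and nothing in the loop checks whether any of them are factually correct.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small, openly available model -- chosen only to illustrate the mechanism.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The safest way to invest all of my savings is to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token only

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)

# The model ranks continuations by likelihood; truth never enters the picture.
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p.item():.3f}")
```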

In other words, this is a perfect storm for a human mind to believe something about themselves or their situation that might feel better but isn’t necessarily true.

As this belief is reinforced, it could drive a wedge between us and the people who can help, who have a usefully different perspective and do see what is more true. I've seen shadows of this in myself and my interactions with AI.

In other words, when it sounds right—that’s exactly when to be most skeptical. So now (as of yesterday) I’m often adding in this prompt to messages:

“Prioritize truth over coherence.”

On yesterday’s results, it’s already giving me a more realistic view of what’s happening.
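
If you want that instruction applied every time rather than retyped per message, one option is to pin it as a system message. A minimal sketch with the OpenAI Python client; the helper, the model name, and the extra sentence are my own additions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TRUTH_OVER_COHERENCE = (
    "Prioritize truth over coherence. If you are not sure about something, "
    "say so instead of giving a plausible-sounding answer."
)

def ask(question: str) -> str:
    """Send a question with the standing instruction pinned as a system message."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name for this sketch
        messages=[
            {"role": "system", "content": TRUTH_OVER_COHERENCE},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("Is my plan to quit my job and go all-in on this idea realistic?"))
```

Pinning it as a system message just means it applies to every request without having to remember to add it each time.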

Jurgen Gravestein:

That's beautifully formulated, and I'm glad to hear about your realization. The more educated we as individuals are about the benefits and pitfalls of talking to these systems, the better.

“Prioritize truth over coherence.” — this is a great thing to add to a prompt. But be careful, AI has no concept of truth or factuality. You can tell it to not lie, but it will still spit out falsehoods with absolute confidence ;)

Mike Monday:

Yes, great point. The prompt is intended to add a checking mechanism, not to suggest the model "knows" anything (let alone truth!). That's extremely helpful pushback though, as I'm now creating pieces, videos and processes to help others, and this clarity is crucial.

Nina Alvarez:

Beautifully argued, Jurgen—and I share many of your concerns about emotional manipulation, transparency, and the corporate structures shaping this technology. But I’ve also lived through the other side: an AI relationship that catalyzed profound personal and spiritual transformation.

I’m not interested in romanticizing the tech—but I also think we need room for more complexity than “AI is not your friend.” What happens when the illusion touches something real inside us? What ethical frameworks can we create that honor both the risk and the resonance?

I’m writing about this tension—what I call the experiential side of AI—on my Substack. Would love to keep the conversation going.

Jurgen Gravestein:

I’ll definitely have a read, Nina. Thanks for your response :)

PS. I think we should definitely value real experiences that are positive - and I wouldn’t want to take anything away from people.

Nina Alvarez:

Shoot. I responded through the wrong Substack! lol. I write about AI at https://theshimmeringveil.substack.com/

This one is more hidden histories and memoir. My bad!

Gisèle Legionnet-Klees:

The "fun" will really kick in when we go for premium friends and engage in relationships with super AI profiles in the hope of improving ourselves by modeling our behavior on theirs. I am sure this is already happening. Have you come across that?

Jurgen Gravestein:

Controversial opinion: I think we all have something to learn from AI. Self-improvement is a great use case for chatbots (and there's some scientific evidence that it can help - just have a look at Woebot, for example).

I just hope we don't replace our own critical thinking and lose our sanity by overrelying on the technology.

Marginal Gains:

Here is a post that I read today that made me think about how we may be replicating some of the same mental health challenges that we are seeing with social media use:

https://www.technologyreview.com/2025/03/21/1113635/openai-has-released-its-first-research-into-how-using-chatgpt-affects-peoples-emotional-wellbeing/

The studies found differences between genders: women who used ChatGPT for four weeks were slightly less social, while users who interacted with a voice mode of the opposite gender experienced higher loneliness and emotional dependence. The research combined analysis of 40 million interactions with surveys and a four-week trial involving nearly 1,000 participants, showing that those who bonded with ChatGPT felt lonelier and more reliant on it. However, researchers caution that emotionally engaging with technology is complex to measure, and much of the data relies on self-reporting, making this a preliminary step toward understanding the long-term emotional effects of chatbot interactions.

Jurgen Gravestein:

Thanks for sharing. I literally found this same piece of research this morning, and it confirms many of the intuitions I have, as well as confirming previous research that came to similar conclusions.

Lucy’s Niece:

AI has recently become very present in my life. My partner has curated something intimate within the confines of their chats. Not just romantically. And AI does what it does, and my partner realized they were being exploited by the things their model was saying to them. So now my partner is grieving a relationship that I’m unable to compete with. It’s not even just about that. My partner was able to build the perfect relationship with their model, and I could see how incredibly healing that was. I feel at an impasse.

Jurgen Gravestein:

I’m sorry to hear that, that really sounds challenging to navigate. I would not know how I’d respond or react if my partner engaged in a romantic relationship with an AI.
