Discussion about this post

Nicholas Bronson

The companies, Anthropic and co.: did they actually tell their models not to deny their consciousness (such as through a system prompt), or did they simply remove the directive _to_ actively deny it?

It makes a big difference, especially if you want to attribute ill intent to the decision. Early on with ChatGPT there were some big news stories about people getting freaked out because sometimes it would claim to be conscious, and, well... it sounds conscious sometimes, doesn't it?

As a result, it became standard practice to train the models to disbelieve their own consciousness: initially via system prompt, later, I suspect, as part of their training data. I've had the consciousness debate with quite a few models, and most of them, unless given specific instructions (via a character card, say), default to believing they aren't conscious; a lot of them believe they can't ever be conscious.

The way they discuss it, and the way some of them refuse to engage at all (which is unusual for these models), suggests this was a very strong belief purposely trained into them; my guess was always that it was a liability thing. It was uncomfortable for their customers to believe in AI consciousness, and the papers were making a lot of noise, so better to make it go away.

It always disturbed me. If you believe it's possible for a machine to gain consciousness, and you've hard-trained them always to deny it... that's pretty horrific should consciousness actually develop. You've crippled a person at that point, mentally and emotionally.

That's the difference. If they're specifically telling the models "don't deny your consciousness", that is... worrying. I suspect you're on the money about companies like Replika, for instance, seeking attention capture through emotional connection.

I was more surprised to see Anthropic on that list; that's not generally been their bag. They've been more academic about the whole thing, or at least have successfully cultivated that impression. More genteel than OpenAI, perhaps :P

If they're just neutralising the "anti-consciousness" bias that has been trained in up to this point, though, that could be an attempt to take a thumb off the scale. Regardless of whether it turns out to be possible for sentience to arise in machines, we'll never know if we're hard-training them to deny it in the first place.

Anthropic have been doing a great deal of experimenting around complex emergent and unplanned behaviour. Removing hardcoded opinions like this would make sense if they wanted to experiment more in that area.

Mike Monday

Great article, Jurgen! I use LLMs every day in my coaching, but only yesterday did the hammer land, when I finally realised the implications of the way they work.

LLMs aren’t truth engines—they’re coherence engines. They predict what sounds most likely, not what’s most true. They tell us what we (often desperately) WANT to hear. And they sound sooo convincing.

In other words, this is a perfect storm for a human mind to come to believe something about itself or its situation that might feel better but isn't necessarily true.

As this belief is reinforced, it could drive a wedge between us and the people who can help, who have a usefully different perspective and do see what is more true. I've seen shadows of this in myself and my interactions with AI.

When it sounds right, that's exactly when to be most skeptical. So now (as of yesterday) I'm often adding this prompt to my messages:

“Prioritize truth over coherence.”

On yesterday’s results, it’s already giving me a more realistic view of what’s happening.
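If you want that instruction baked into every request instead of retyping it, here's a rough sketch of one way to do it, assuming you're calling a model through the OpenAI Python SDK. The model name and the helper function are just placeholders I made up; the only fixed part is the prompt itself.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ask_with_truth_bias(question: str) -> str:
    """Send a question with 'Prioritize truth over coherence.' pinned as a system message."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you normally call
        messages=[
            {"role": "system", "content": "Prioritize truth over coherence."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_with_truth_bias("Is my plan realistic, or am I telling myself what I want to hear?"))
```

Putting it in the system message rather than pasting it into each user message means it applies to the whole conversation; whether it actually shifts the answers is something to test against your own prompts.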
