Here’s what keeps me up at night. The lines are getting blurrier every day. It’s becoming harder to tell whether the images we see and the articles we read online are human-made or created with the help of AI. This newsletter is and always will be written without the help of text-generation tools, but I also realize, as I say that, there’s no way for you to know if I’m telling the truth. There’s no way for me to prove it either, because AI detectors don’t work and OpenAI has quietly discontinued theirs.
For those unfamiliar with it: it was a tool specifically designed to detect whether a piece of text had been generated by a large language model. Embraced by school teachers en masse, it led to some students being rightly accused of using ChatGPT to write their college essays, and many more being wrongly accused, because it was wildly inaccurate. It would happily flag human-written content as AI-generated and vice versa.
Many of these detection tools, including the one developed by OpenAI, perform worse than a coin flip. Some claim higher accuracy, but that doesn’t really change anything. A calculator that gives you the right answer about 75% of the time is still a useless calculator.
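To make that concrete, here’s a quick back-of-the-envelope sketch (the numbers are purely hypothetical, not taken from any real detector): imagine a detector that is right 75% of the time grading a class of 30 essays, only a few of which were actually written with ChatGPT.

```python
import random

random.seed(42)

# Hypothetical classroom: 30 essays, 3 of them actually written with AI.
essays = ["ai"] * 3 + ["human"] * 27

def detect(true_label: str, accuracy: float = 0.75) -> str:
    """A toy detector that labels an essay correctly 75% of the time
    and gives the opposite label the remaining 25% of the time."""
    if random.random() < accuracy:
        return true_label
    return "ai" if true_label == "human" else "human"

wrongly_accused = sum(
    1 for label in essays if label == "human" and detect(label) == "ai"
)
print(f"Human-written essays flagged as AI: {wrongly_accused} out of 27")
# In expectation, 27 * 0.25 ≈ 7 innocent students get flagged in a single class.
```

Even at an accuracy that sounds respectable on paper, a sizeable chunk of every class ends up wrongly accused.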
You might be wondering why OpenAI deployed it in the first place, and I wonder the same thing. It’s not like the inaccuracy came as a surprise to them. Sad to say, the tool did more harm than good.
The deeper issue, this problem of not being able to tell the difference anymore, is more systemic and, I feel, not talked about enough. It’s easy to get distracted by the sheer number of news stories and papers and product releases coming out every single week. Even if you follow excellent newsletters that collect and report on these new developments with admirable dedication, it’s hard to keep up. One loses sight of the bigger picture, of what it all leads up to, of how it all ties together. And if you’re wondering what that image of a bear has to do with it, I’ll come back to that in a second.
We can’t tell the difference anymore — and neither can AI
Let me share another example. Researchers at University College London (UCL) tested people’s ability to detect synthetic voices, and the results were pretty depressing. Sound samples were played for 529 participants to see if they could tell a real voice from a fake one, and it turns out they were able to do so correctly only 73% of the time.
At first glance that looks okay, but it’s not. It’s a horrible score. It means that roughly 1 out of every 4 times, people get fooled into thinking they’re hearing a real voice.
And we’re not talking about your hard-of-hearing grandma or gullible grandpa here; any of us can be tricked. It’s a testament to the speed at which voice cloning has progressed, especially over the past 6 to 12 months, producing frighteningly convincing copies of human voices from sometimes as little as 15 seconds of source material. You can now mimic vocal disfluencies like ums and ahs, adjust the talking speed, leave a pause … and put emphasis on certain words or parts of words.
To make matters worse, it turns out automated voice detectors aren’t reliable either; not only do humans suck at reliably detecting synthetic voices, AI sucks at it too.
This means that we currently have no meaningful way to detect whether text is AI-generated and no meaningful way to detect whether a voice was synthesized. Something tells me there’s a good chance we never will, and that ever more sophisticated AI systems will continue to outclass the available detection tools indefinitely. If you’re a scammer, this is very good news; for the rest of us, it’s not.
From here, it’s only a small step to another phenomenon I’ve been observing lately that I feel is connected, because all of this is interconnected and slowly building up to a critical mass.
I’ve noticed that as AI capabilities advance, it’s becoming increasingly difficult for people to differentiate between human and machine intelligence. I see it happening not just with regular people, but especially with researchers studying these systems, people who have been working in the field for decades. Remember “Sparks of Artificial General Intelligence” by the Microsoft AI researchers? Or the more recent paper suggesting GPT-4 had developed a theory of mind? I covered the latter in depth in a previous newsletter.
The paper showed GPT-4 scored well on the majority of the social reasoning tasks it was tested on, which, at first glance, looks impressive. But apply a more binary interpretation and the only logical conclusion is that if GPT-4 wasn’t able to reason consistently, it wasn’t reasoning at all, because if it were truly reasoning, it would be consistent at it.
Then there’s the stubborn idea that exhibiting a certain behavior is automatically proof of an underlying quality. Just because you can say smart things doesn’t mean you’re a smart person, just as mimicking an emotion isn’t the same as feeling an emotion. It’s a thinking trap that couldn’t be better illustrated than by several eyebrow-raising comments made by Geoffrey Hinton during a Q&A at King’s College in Cambridge, less than a week ago, where he said AI could “well have feelings” like anger and frustration, and that it probably already does. He offered no proof to back up his claims, but did say he holds an unorthodox philosophical view of what constitutes a feeling.
Geoffrey Hinton reminds me a lot of Blake Lemoine, the Google engineer who claimed that LaMDA had become sentient. The first newsletter I ever wrote was about him.
I fear that Blake Lemoine and Geoffrey Hinton both suffer from the same illness, a terrible condition known as the Gepetto Syndrome. The woodcarver’s deepest desire, as the story goes, is for his wooden puppet to be a real boy. To me, it’s the only explanation for why they so desperately want to see a ghost in the machine.
Machines made in our own image
Don’t get me wrong. We’re the most intelligent species on this planet and it makes sense to build machines in our own image. Our worries, however, are far more mundane and frankly more immediate.
We created systems trained on human writing and human imagery, and now they can produce human-like writing and imagery in stunning fashion. As a result, political deepfakes now pose a threat to our democratic processes (that’s worth a separate newsletter), influencers are creating AI replicas of themselves that hook fans 24/7, and others are getting their hearts broken by AI companions.
All of these stories are just stories on their own, but together they give us a glimpse of the world we’re headed for: a confusing world in which we can no longer trust our eyes and ears, where the real and the artificial slowly begin to merge.
Remember the image of the bear? I haven’t forgotten about it. Captured on video last week, a bear standing upright at the Hangzhou Zoo in China gathered worldwide media attention: it was rumored the bear was actually a zookeeper dressed in a bear suit.
People online pointed out the animal’s upright posture, as well as the folds of loose fur on its behind, which made the bear look somewhat odd and fuelled speculation that an imposter might be masquerading in its place. As a matter of fact, sun bears are a unique species, the smallest of their kind. They can stand on two legs, like most bears, to investigate their surroundings. Female sun bears hold their cubs in both arms and walk on their hind legs, very much like a human would. And the bear’s loose, saggy skin serves an important function in the wild, protecting it from bites and injuries.
The bear was just a bear in the end. The reason people weren’t able to tell the difference was simple ignorance.
This is what keeps me up at night. Because we get to communicate with AI in our language and it communicates back to us in our language, we can’t help but humanize it. It’s not a matter of choice. By designing it in our own likeness, it becomes more like us, and as it becomes more like us, we struggle to tell the two apart. I believe we can address that not only by gaining a better understanding of the artificial, but also by gaining a better understanding of the real. We need to learn about ourselves, about how our minds work, and about what sets us apart from those gargantuan neural networks trained on 10,000+ A100 GPUs.
If we fail to do so, and we might, I envision a particularly dark dystopia, a world in which we can’t tell the difference because somewhere along the way we’ve forgotten what it means to be human.
Jurgen Gravestein is a writer, consultant, and conversation designer. Roughly 4 years ago, he stumbled into the world of chatbots and voice assistants. He was employee no. 1 at the Conversation Design Institute and now works for its strategy and delivery branch, CDI Services, helping companies drive business value with conversational AI.
Reach out if you’d like him as a guest on your panel or podcast.
Appreciate the content? Leave a like, comment or share it with a friend or colleague.