Key insights of today’s newsletter:
AI voice assistants are getting super realistic, with features like laughter and sighs, much like the AI in the movie Her (2013).
It shows how far we’ve come: the present is catching up with the future and science fiction is quickly becoming reality.
This is awesome for things like learning and tutoring, but we also have to be careful. What happens if we spend too much time with AI?
↓ Go deeper (8 min read)
For years, the opening slide of our company presentation featured a still from the Spike Jonze movie Her. The movie's cinematic vision of AI spoke to global audiences: it was bright, exhilarating, and uncanny, something anyone could picture and that felt possible in the future, some day.
The movie was referenced a lot after OpenAI's spring update, which demoed the improved voice capabilities of GPT-4o. To be honest, OpenAI leaned into the comparison themselves, too:
Her came out in 2013, a little over a decade ago. But today it is no longer science fiction. The present has caught up with the future, and the technology is readily available to everyone. Not just in the form of ChatGPT, but in the many different AI assistants that are out there, waiting for us, eager to talk.
I’d like to take a moment and reflect on how we got here and where we’re headed.
Magical, not magic
"Any sufficiently advanced technology is indistinguishable from magic." You might've heard this quote from Arthur C. Clarke before. Whenever a new technology comes along, it feels magical to us. At least for a while.
In the 1940s and 1950s, programming was done by punching holes in cards that could then be fed into early computers. Those computers were magical, until something better came along. The first iPhone was a magical device, and its launch was a defining moment for the mobile space; now, smartphones are everywhere.
Then came the launch of ChatGPT. Another pivotal moment that expanded our possibility space, a step change in how we interact with technology. Far more humanlike and conversational than anything that came before. We taught computers how to talk.
The newest generation of models is multimodal, which means they work seamlessly across audio, vision, and text. They also display surprisingly advanced voice capabilities: AI can now laugh, sigh, sing, whisper and even … flirt?
With response times as quick as 232 milliseconds, GPT-4o matches human response times in conversation, making the interactions feel extremely natural. OpenAI wasn't the first to achieve this; if you've interacted with the voice assistant Pi, from Inflection, you already knew this kind of thing had been possible for a while, but it's impressive nonetheless.
There's something else, too. When you look at some of the other videos OpenAI put out, they all have one thing in common: people are having fun. And I think that's really cool to see; we can't have enough of people having fun with technology.
A taste of what’s yet to come
So now that we know where we're at, it's time to see where we're headed (or what's coming for us). Here's a thought experiment that you can do at home: take a deep breath, consider the current state of the technology, and put on your future-vision goggles. What do you see?
I personally see students and kids learning and exploring new subjects with AI (and totally trying to cheat in novel ways). I see blind people using their phones and letting AI describe the world to them, helping them navigate public spaces or read signs, papers, or anything else in front of them. I see AI offering translation, helping us cross language barriers wherever we go.
There are also bleaker futures we can imagine. One particularly bad one is this:
For those who don't know, Scarlett Johansson voiced Samantha, the AI in the movie Her. Theodore, the main character, falls in love with her and eventually gets his heart broken.
The movie shows how terribly tempting it can be to form a bond with a joyful, understanding, forever-patient being that lives in the cloud. A being that will laugh at any joke, stroke your ego, and have none of the relationship expectations that exist in the real world.
We have no idea what the effects of this are on young people's brains. Early research suggests that AI companion apps can be addictive, and The Verge reported that Character.AI, which is mostly free to use, attracts roughly 3.5 million daily users who spend an average of two hours a day talking to their AI. It could quickly become the new infinite scroll.
And as we confide in our newfound friends, who are non-judgmental and always there for us, we share our innermost private thoughts with them. That will turn these apps into treasure troves of intimate personal data about who we are, what we like, and how we think, which can then be sold to the highest bidder. Because if we know one thing to be true about the Internet, it's that if a service is free, you are the product.
Join the conversation 🗣
Leave a comment or like this article if it resonated with you.
Get in touch 📥
Shoot me an email at jurgen@cdisglobal.com.
The next step would seem to be to have AI connect with us via a human face interface. We have focused on faces when we communicate for millions of years, since before we were even human. Faces are a big deal; they could be the next big turning point.
It's long been possible to animate a face image with audio. The challenge seems to be doing it at scale over the Internet.
More discussion of this would be most welcome. As best I can tell, this seems to be a somewhat neglected topic.