An earlier version of this article was published as a guest post as part of the AGI Series.

Many tests have been proposed to check whether a machine has reached human-level intelligence. The Turing test is probably the most (in)famous, but I bet you’ve never heard of the Coffee test: a machine is required to enter an average American home and figure out how to make coffee.
It would need to find the coffee, a mug, maybe grind the coffee if it’s just beans, boil water, and brew the coffee. I think we can all agree this requires more coordination and intelligence than we’re currently able to conjure up in our machines — yet, for humans, it doesn’t get more mundane than that.
The test was introduced by Steve Wozniak, co-founder of Apple, and it makes an interesting point about the way we should think and talk about artificial general intelligence (AGI).
AGI is Silicon Valley’s favorite three-letter acronym nowadays; it’s all anyone can talk about: the modern holy grail. But what is it really, and are we as close to achieving it as we are led to believe?
Playing fast and loose with AGI definitions
AGI is a hypothetical type of intelligent agent that doesn’t exist yet, hence the word ‘hypothetical’. If realized, such an agent could learn to accomplish any task that human beings can perform, as well as or better than we can. Achieving AGI has been the openly stated goal of companies like OpenAI, Google DeepMind, Inflection, and Anthropic.
As of today, AGI remains entirely speculative. It hasn’t been demonstrated that such a system can be built, nor is there agreement on what exactly would constitute one. With no broadly accepted definition among industry experts, people are happy to play fast and loose with the term and move the goalposts, depending on whether they want to invoke fear, appeal to goodwill, inspire, or secure their next venture capital injection.
There’s no consensus on whether or when AGI will arrive. Some say we’re terribly close. Dario Amodei, CEO of Anthropic, for example, said in a recent podcast that he thinks we will achieve AGI within 2-3 years.
AI researcher Geoffrey Hinton, who until very recently worked for Google, said:

“(…) I thought it was 30 to 50 years or even longer away. Obviously, I no longer think that.”

Many others say we are much further away, and yet others say we will never achieve it.
It’s worth noting that we should probably take the opinions of experts employed by some of the aforementioned AI companies with a grain of salt. From a business standpoint, it’s better to slightly overstate your AI capabilities than to downplay them, because downplaying gives the impression of lagging behind.
Launchpad or off-ramp?
Nevertheless, progress has been impressive. GPT-1 was released in 2018 and had 117 million parameters, whereas its fourth generation, GPT-4, released in March this year, is rumored to consist of eight models with a whopping 220 billion parameters each. Meanwhile billions of dollars are being poured into the generative AI space and it’s safe to say that the technology has ushered in the first new UI paradigm shift in 60 years.
Whether it has brought us any closer to AGI remains up for debate. Sam Altman, CEO of OpenAI, is optimistic but has acknowledged publicly that further progress will not necessarily come from making models bigger.
A recent piece in WIRED included a telling quote from Greg Brockman, OpenAI’s president:
“The biggest thing we’re missing is coming up with new ideas. It’s nice to have something that could be a virtual assistant. But that’s not the dream. The dream is to help us solve problems we can’t.”
Besides the candor, which I find refreshing, I can’t help but sense a subtle undertone of disappointment. OpenAI is clearly grappling with how to make another exponential leap in capability: GPT-4 isn’t, and never was, their final destination.
Yann LeCun, Chief AI Scientist at Meta, has been outspoken, too. On the road to AGI, he has referred to generative AI as an “off-ramp”.
One prominent critic has suggested generative AI could turn out to be a dud and, from a scientific standpoint, a dead end. He argues that the technology we have today is built on completion and not on factuality, and that there’s a good chance the hallucination problem cannot be solved.

Houston, we have a problem
Since we don’t agree on a definition, there’s no objective measurement that tells us a system has reached or surpassed human-level intelligence. That doesn’t keep us from trying, though.
We love to subject AI systems to all sorts of benchmarks and tests originally designed for humans. OpenAI’s GPT-4 reportedly scored in the top percentiles on the Uniform Bar Exam, the Graduate Record Exam, and the US medical licensing exam, and aced several high-school Advanced Placement tests. However, a recent article in Science explains why we should be cautious in interpreting this as evidence for human-level intelligence.

One of the biggest issues is data contamination. This is when a system has already seen the answers to the questions before. On one of the coding benchmarks, for example, GPT-4’s performance on problems published before 2021 was significantly better than on problems published after 2021, which coincides with GPT-4’s training cutoff. An article from AI Snake Oil fleshes it out in more detail, so I won’t.
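To make the contamination check concrete, here’s a rough sketch in Python, using made-up numbers rather than GPT-4’s actual results, of how you’d compare solve rates on problems published before and after a model’s training cutoff:

```python
# Rough sketch with hypothetical data (not GPT-4's actual results):
# compare a model's solve rate on benchmark problems published before
# vs. after its training cutoff. A big gap hints at contamination.
from datetime import date

# (publication date, solved?) for a hypothetical benchmark
results = [
    (date(2020, 5, 1), True),
    (date(2020, 11, 3), True),
    (date(2021, 2, 14), True),
    (date(2021, 12, 9), False),
    (date(2022, 6, 20), False),
    (date(2022, 9, 4), True),
]

cutoff = date(2021, 9, 1)  # assumed training cutoff

def solve_rate(rows):
    return sum(solved for _, solved in rows) / len(rows)

before = [r for r in results if r[0] < cutoff]
after = [r for r in results if r[0] >= cutoff]

print(f"Solve rate before cutoff: {solve_rate(before):.0%}")
print(f"Solve rate after cutoff:  {solve_rate(after):.0%}")
```

A gap like that doesn’t prove memorization on its own, but it’s exactly the kind of signal that should make us suspicious.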
What makes this so problematic is that, because of the lack of transparency from AI companies like OpenAI, contamination is impossible to prove with certainty. Transparency isn’t in their best interest, not just for competitive reasons, but also because they would much rather report on their successes than publicly acknowledge their flaws.
Even if a language model hasn’t literally seen the exact problem in its training set, it might have seen examples that are close, allowing it to get away with a shallower level of reasoning (i.e. pattern matching). And whether current AI systems can reason at all is a hot topic of discussion in and of itself. A paper published last month even went so far as to claim that GPT-4 is utterly incapable of reasoning.
Online, people jumped to the system’s defense in an attempt to refute the claim, showing that with the right prompting they were able to get the right answers out of it.
All this shows is that the system produces different results under different circumstances. It demonstrates precisely the unreliability (i.e. lack of robustness) of the system and suggests that something else is happening under the hood: something that might look like reasoning, but in fact isn’t reasoning at all.
Machine intelligence vs. human intelligence
The problem is that we desperately want AGI to be like us, so much so that we’re training AI systems to speak and act like us. Yet we couldn’t be more different from one another.
In many ways GPT-4 is already ‘smarter’ than the average person. It can access information and formulate answers more quickly than any of us on a vast array of topics. At the same time, it’s also much more ‘stupid’. It can’t plan ahead or reason very well (even though it gives off the impression it can). It can express words of empathy, but doesn’t feel anything (even though it gives off the impression it does). And it has no will or thoughts of its own (it only moves when prompted).
This means we’ve built a system that may appear to be smart or empathetic or cognizant, but could just be posing as such.
Yes, these systems are able to perform tasks that typically require human intelligence, but they don’t arrive at their results the same way we do. A calculator is superior to a human at performing math, just like a chess computer is superior to a human at playing chess, but we’d never call these machines intelligent. What’s so different about GPT-4?
Well, this machine speaks our language! This machine is more fluent than any machine that came before it, and that messes with our heads. As humans, we tend to project intelligence and agency onto systems that provide even the smallest hint of linguistic competence, something that’s commonly referred to as the ELIZA effect.
We’re tempted to see the ghost in the machine, but in reality chatbots like ChatGPT aren’t much more than glorified tape recorders and, according to futurologist and theoretical physicist Michio Kaku, the public anxiety over this technology is misguided.
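For context: the original ELIZA from the 1960s was little more than a handful of pattern-matching rules. The toy sketch below is my own simplification, not Weizenbaum’s actual script, but it shows how little machinery it takes to produce something people will read empathy into:

```python
import re

# A toy, ELIZA-style responder: a few regex rules that reflect the
# user's own words back at them. No understanding involved.
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"i feel (.*)", "Why do you feel {0}?"),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = re.match(pattern, utterance.strip(), re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Tell me more."

print(respond("I feel nobody ever listens to me"))
# -> Why do you feel nobody ever listens to me?
```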
These systems don’t learn from first principles and experience, like we do, but by crunching as much human-generated content as possible, a process that requires warehouses full of GPUs and can hardly be called efficient. The models are then fine-tuned through a process called reinforcement learning from human feedback (RLHF) to make them more accurate, coherent, and aligned with human values. In a way, we’re trying to brute-force intelligence by throwing as much compute at it as possible and then tinkering with the models to optimize for human preferences.
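As a rough illustration of what the RLHF step optimizes, here’s a simplified, Bradley-Terry-style preference loss (a sketch, not OpenAI’s actual training code): the reward model is nudged to score the response humans preferred above the one they rejected, and that reward signal is then used to steer the language model.

```python
import math

# Pairwise preference loss used in reward modelling (simplified):
# -log sigmoid(r_chosen - r_rejected). It shrinks when the reward model
# already scores the human-preferred response higher than the rejected one.
def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

print(round(preference_loss(2.0, 0.5), 3))  # chosen outscores rejected -> low loss
print(round(preference_loss(0.5, 2.0), 3))  # chosen scores lower -> high loss
```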
What we end up with is not human-level intelligence, but a form of machine intelligence that appears human-like.
If that’s a definition of intelligence that you are comfortable with, then AGI might indeed be near, but it also means you must acknowledge that a machine is only as smart as the next person it fools into believing it’s smart.
So what’s next for AGI?
The lack of agreement on definitions and good measurements won’t keep AI companies from claiming they’ve reached AGI. I suspect OpenAI will claim AGI when they finish training the next generation of their model, GPT-5, and that’s alright. When the time comes, we’ll argue about it online, and in the meantime I’ll continue to brew my own coffee in the morning.
Something we do have to take into account is the possibility of another breakthrough. Today’s generative AI revolution was ignited by the invention of the transformer back in 2017. What if the next giant leap turns out to be the catalyst that gives us real artificial general intelligence?
We might find the answer in Mustafa Suleyman’s upcoming book ‘The Coming Wave’. For those who don’t know, Suleyman is the co-founder and CEO of Inflection AI and previously co-founded DeepMind.
An excerpt:
As technology proliferates, more people can use it, adapt it, shape it however they like, in chains of causality beyond any individual’s comprehension. One day someone is writing equations on a blackboard or fiddling with a prototype in the garage, work seemingly irrelevant to the wider world. Within decades, it has produced existential questions for humanity. As we have built systems of increasing power, this aspect of technology has felt more and more pressing to me.
Technology’s problem here is a containment problem. If this aspect cannot be eliminated, it might be curtailed. Containment is the overarching ability to control, limit, and, if need be, close down technologies at any stage of their development or deployment. It means, in some circumstances, the ability to stop a technology from proliferating in the first place, checking the ripple of unintended consequences (both good and bad).
The more powerful a technology, the more ingrained it is in every facet of life and society. Thus, technology’s problems have a tendency to escalate in parallel with its capabilities, and so the need for containment grows more acute over time.
Does any of this get technologists off the hook? Not at all; more than anyone else it is up to us to face it. We might not be able to control the final end points of our work or its long-term effects, but that is no reason to abdicate responsibility. Decisions technologists and societies make at the source can still shape outcomes. Just because consequences are difficult to predict doesn’t mean we shouldn’t try.
A message we can all get behind, I think.
When we reach AGI might be up for debate, but what isn’t up for debate is that in the coming decades humanity will be faced with a lot more uncertainty. The societal impact of increasingly advanced AI systems is going to be uniquely disruptive and I believe it’s up to all of us to face those challenges head on.
Join the conversation
Leave a comment with your thoughts. Please let me know what you think of the race to AGI, if you think computers will reach or surpass human intelligence, and in what timeframe. Is AGI close or further away than we think? 💬