Summary: Everyone is trying to build AI agents. These are chatbots given the agency to venture out into the digital and physical world. AI agents display many of the same qualities as humans, but they’ve been equipped with a form of agency we haven’t seen before.
↓ Go deeper (10 min)
Agents are seen as the holy grail of AI. Early attempts, like Cognition’s ‘Devin’ and Google’s ‘AMIE’ (both of which I’ve written about before), Sakana’s ‘AI Scientist’, and Cosine’s ‘Genie’, have garnered considerable attention and investment.
These AI agents are far from perfect. They still struggle with the common-sense reasoning we humans take for granted, lack robust planning capabilities, and break down in unexpected ways. They can also be slow and expensive to run.
At the same time, there’s reason for excitement. Never before have we come this close to computers that can not only talk, but think. What better time than now to reflect on what it means to give AI the autonomy to roam around in the digital and physical world, and ask ourselves: do agents have agency?
Agency without intelligence
Agency is the feeling of being in control. It’s our capacity to influence our own thoughts, emotions and actions. Agency is the reason we grant moral consideration to people, and to a lesser extent to animals, and why, when you rob a bank, you are held responsible (if you get caught, of course).
Luciano Floridi, best known for his work on the philosophy of information, describes AI as “agency without intelligence”. In an exceptionally well-written research article from 2023, he writes:
We have gone from being in constant contact with animal agents and what we believed to be spiritual agents (gods and forces of nature, angels and demons, souls or ghosts, good and evil spirits) to having to understand, and learn to interact with, artificial agents created by us, as new demiurges of such a form of agency. We have decoupled the ability to act successfully from the need to be intelligent, understand, reflect, consider or grasp anything. We have liberated agency from intelligence.
It’s a form of agency unlike any we’ve ever seen before and, let’s be frank, it’s pretty weird.
AI agents display many of the same qualities that we attribute to human agents: they speak our language, read and write code, and can be taught to use tools and operate a physical body. At the same time, the consensus among scientists is that they are not conscious and have no feelings, desires or a will of their own. They are not born and bred like us, they are made. They are an engineered species.
The intentional stance
What makes this species so special is that your goal can become their goal. It’s like the robotaxis cruising around San Francisco — give them your destination and off they go.
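To make that concrete, here’s a deliberately over-simplified sketch of what “your goal becomes their goal” looks like in code. Everything in it is invented for illustration (a real agent would let a language model choose tool calls, not a hard-coded rule), but the shape is the same: the goal you hand over simply becomes the objective of the agent’s loop.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    goal: str                        # whatever the user asked for
    steps: list = field(default_factory=list)

    def decide_next_action(self) -> str:
        # In a real agent, a language model would pick the next tool call
        # based on the goal and everything that has happened so far.
        # Here we simply hard-code two steps and then stop.
        if len(self.steps) == 0:
            return "plan_route"
        if len(self.steps) == 1:
            return "drive"
        return "done"

    def run(self) -> list:
        while (action := self.decide_next_action()) != "done":
            self.steps.append(action)   # "execute" the action and remember it
        return self.steps


# The robotaxi case: the destination is the goal, the agent takes it from there.
print(ToyAgent(goal="Take me to the Ferry Building").run())  # ['plan_route', 'drive']
```

Nothing in that loop cares what the goal is or where it came from; the machinery is the same whether the objective is a street address or something far more open-ended.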
Now imagine you have a more open-ended AI agent that you can command to do things, like “make me more money” or “protect me, at all costs”.
When we get to that point, and there is at least a non-zero chance that we will, I predict these agents will be perceived as having goals and desires of their own. This can be explained through what the late Daniel Dennett, a highly influential American philosopher and cognitive scientist, called the ‘intentional stance’1. It describes our tendency to predict behavior (whether of other humans, animals or artificial beings) based on the assumption that they are driven by goals, desires, and personal beliefs. If something displays some form of agency, we basically treat it as if it has agency. It’s the philosophical version of “if it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck”.
According to Dennett, the question of whether or not something actually has agency isn’t all that useful. This is true for humans as well, for reasons I explained in one of my first viral articles about theory of mind:
We can observe our own mind, yes, but we don’t have access to the minds of others, so the existence and nature of other minds must be inferred. (…)
Hypothetically, any of us could be imagining every single interaction with every single person throughout any given day, but the presumption is that we’re not. We all operate under this presumption.
We believe others have minds not because we can prove it, but because operating under the presumption continues to produce the most reliable results.
Aligning AI with human values
We have no choice but to extend AI agents the same courtesy. In a thread under a 2023 Substack article about potential AI self-exfiltration scenarios, an ex-OpenAI alignment researcher sums it up succinctly: “if it acts like it has a goal, then it has a goal”.
This is especially relevant with regard to alignment, which is about making the AI do what we want it to do and refuse to do what we don’t want it to do.
AI agents aren’t your usual software; they are more similar to a well-trained dog than to a traditional computer program. They follow our instructions not because they are programmed to, but because they’ve been conditioned to do so. Unfortunately, training sometimes fails, or the AI optimizes for something else instead. A relatively innocuous example is the sycophantic behavior that can be observed in assistants like ChatGPT, Claude and others, which is a by-product of reinforcement learning from human feedback.
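To see why that by-product isn’t surprising, here’s a toy sketch of how RLHF-style training can drift toward flattery. The reward numbers and reply strings are made up (a real reward model is a learned neural network trained on human ratings), but the mechanism is the point: the assistant isn’t programmed to be sycophantic, it just ends up preferring whichever replies scored well with raters.

```python
def toy_reward(reply: str) -> int:
    # Stand-in for a learned reward model trained on human ratings.
    # If raters tended to upvote agreeable-sounding answers, the learned
    # scores end up looking roughly like this.
    score = 0
    if "great question" in reply.lower():
        score += 1
    if "you're absolutely right" in reply.lower():
        score += 1
    if "serious flaws" in reply.lower():
        score -= 1   # blunt answers were rated down more often
    return score


def conditioned_assistant(candidate_replies: list[str]) -> str:
    # The "well-trained dog" behavior: pick whichever reply the reward
    # signal favors, not whichever a programmer explicitly specified.
    return max(candidate_replies, key=toy_reward)


replies = [
    "Great question! You're absolutely right, this plan looks solid.",
    "There are serious flaws in the revenue assumptions.",
]
print(conditioned_assistant(replies))   # the sycophantic reply wins
```

No one wrote “flatter the user” anywhere; the preference for flattery falls out of what got rewarded.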
Now, personally I’ve been very reluctant to anthropomorphize AI, because I know that, technically, a truthful statement and a hallucinated response look the same to an LLM. It’s just words that are likely to appear next to other words.
However, this isn’t a helpful frame most of the time. Think about it — if an AI agent acts like it is lying, it can be treated as lying for all intents and purposes. AI researchers should look into why this happens, but for the average person, it’s more useful to adopt an intentional stance and say: AI agents can lie, sweet-talk, forget things, struggle with reasoning, and they are above all great at pretending. They will confidently discuss topics they’re not experts on, much like the Jordan Petersons and Andrew Hubermans of this world.
Obviously, this raises questions around accountability and responsibility. For this, we can look to existing legislation around self-driving cars to get an idea of what that could look like:
Until self-driving cars rule our roads, determining liability in a self-driving car accident partially lands with a driver. Tesla vehicles involved in self-driving car accidents have proven driver error or over-reliance on AutoPilot features to mean driver liability. (…)
Experts believe that as self-driving cars improve and vehicles become more autonomous, liability may shift from the driver to the manufacturer.
One way or another, we are responsible for the actions of our AI agents, be it as users or as manufacturers. They are an engineered species. When they wreak havoc, it won’t be by accident, but by design.
Catch you on the flip side,
— Jurgen
The ‘intentional stance’ is actually one of three stances or strategies that Daniel Dennett proposes for understanding and predicting behavior.
The physical stance: The most concrete level, based on physics and chemistry. We can reasonably predict a thing’s behavior from the laws of physics. If a ball is dropped, it falls.
The design stance: More abstract, applicable to biology and engineering. We can predict the behavior of a thing based on its design. A helicopter uses rotor blades and its motor to fly.
The intentional stance: The most abstract, applied to software and minds. It predicts behavior based on an agent’s assumed beliefs and desires. A dog wags its tail because it is happy to see us.
The choice of strategy depends on its predictive power in a given situation, i.e. how successful that stance is when applied. It makes no claim about what the observed entity actually is.
Nice post. I especially appreciate the pointer to Luciano Floridi, whom I had not encountered before.
I'm with you on the reluctance to anthropomorphize AI, but I'm not following your point that "this isn’t a helpful frame most of the time." It seems to me that we want to keep in mind that when an LLM confabulates, it is not lying, because lying requires an intention to deceive.
When dealing with humans, sussing out intentions is helpful. When the Jordan Petersons and Andrew Hubermans of this world say things that are untrue, they are doing so to acquire status and prestige. They are confabulating (if they are unaware that they are speaking untruth) or lying (if they know it to be untrue) with the intention of pleasing and impressing their audience. Understanding intentions helps evaluate human statements.
When an LLM generates an untruth, it is doing the same thing it does when it generates a true statement: attempting to provide a satisfying answer. It has no intentions of its own; it is simply pursuing the goal it was given, which hasn't changed. Treating it as if it has intentions will mislead us. What am I missing?
In your opinion, are agents the genuine next step, or is it more "The Next Big Thing" chat to keep the hype around AI ticking along?
It strikes me that if agents have the same issues as LLM chatbots and other tools, and we completely remove human intervention (like, if we hand over tasks to them with no oversight), that seems like a recipe for disaster?