I don't think the fact that they hallucinate means that they don't build a world model - I mean, look at this:
https://www.anthropic.com/research/mapping-mind-language-model
This is a world model of some sort. Hallucinations just mean that the world model can be wrong, and I think that even in the Claude example, it does some odd things like relating "coding error" to "food poisoning" in conceptual space.
But I think I go with Hinton here that it is grasping some sort of meaning (enough to scare me, obviously), and perhaps this should be seen as a matter of degree, with errors. Of course, there are people who also claim that LLMs are discovering Platonic truth, or at least converging to something (maybe all of the same biases?).
https://arxiv.org/pdf/2405.07987
Thanks for your reply, Sean! I think it's important to remember that to the model there are no hallucinated outputs: all outputs are legitimate outputs. Whether any output string actually corresponds to the world is an assessment that only we can make, not the AI.
Because of that, I think it would be more accurate to say the LLM builds a model of language rather than a model of the world. This is also why LLMs are confined by the data they're trained on.
I think Wittgenstein's idea of language games makes a good case for why language doesn't capture any Platonic ideas, even if they existed, despite some folks arguing otherwise (I'm definitely not in Hinton's camp on this one).
If you're looking for further reading, I highly recommend this paper by Bender and Koller: https://aclanthology.org/2020.acl-main.463.pdf
Oh, I agree - it only ever hallucinates, so to speak (though, in a way, so do our brains). I'm definitely more in the Hinton camp, but I'll read the paper by Bender. Lemme send a chat to you as well.
I'm open to my mind being changed either way.
Bender posted a compelling thought experiment last year, somewhat like Searle's Chinese Room but (to me) more realistic and salient. Assuming you do not speak or recognize the written language of Thai:
"Imagine you are in the National Library of Thailand. You have access to all the books in that library, except any that have illustrations or any writing not in Thai. You have unlimited time, and your physical needs are catered to, but no people to interact with. Could you learn to understand written Thai? If so, how would you achieve that?"
https://medium.com/@emilymenonbender/thought-experiment-in-the-national-library-of-thailand-f2bf761a8a83
But GPT-4o does not only use language anymore, so even if that were true, it would not apply to the latest models. GPT-4o is fully multimodal and does not translate everything into language/text.
https://zapier.com/blog/gpt-4o/
"The "o" in GPT-4o stands for "omni." That refers to the fact that, in addition to taking text inputs, it can also natively understand audio and image inputs—and it can reply with any combination of text, images, and audio. The key here is that this is all being done by the one model, rather than multiple separate models that are working together."
Thanks for the reply! I do agree that incorporating images and audio should expand GPT-4o's capabilities, and I'm interested to see what that ends up looking like.
What I object to (and I'm not saying you're doing this) is the kind of "gee-whiz" take on the capabilities of LLMs and other generative AI, where people stop thinking about these in terms of how they actually work, and start thinking about them as approximations to human brains and minds. Some AI research borders on pseudo-science, to me. There have been so many papers on the allegedly emergent abilities of ChatGPT and GPT-4 that don't attempt a serious answer to "why should a next token prediction model be able to do this?" but instead jump to "this LLM is highly complex and we don't fully understand how it's generating its output and *maybe* it's developed human-like abilities". The one about ChatGPT developing "theory of mind" is a good example. Why on Earth would a next token prediction model develop anything like what we call "theory of mind" in a human being? Shouldn't the default hypothesis be that the next-token prediction model has been trained on theory of mind problems? Instead of starting from what we actually know about how LLMs work and trying to slowly build on this, the authors go all "gee-whiz" and suggest that they've just discovered something amazing and mysterious. Real science doesn't work this way.
This isn't to say that multi-modal models aren't a promising advancement; at minimum I expect them to be better at mimicking human-style understanding, and perhaps they'll show some progress in distinguishing truth from fiction independent of just pattern-matching and probabilities for what word comes next. I only wish that AI researchers would follow the lead of the more mature sciences, which progress through slow and careful theory building and testing, and demand that strong claims be backed up with strong evidence. As brilliant as Geoff Hinton clearly is, I think he's one of the many people in AI who have let their imaginations get the better of them.
I'm not sure they are wrong, though, and I've done quite a bit of research on it; sometimes I think we're hewing too close to theory and not paying enough attention to what is actually happening.
Take, for example, "Why would it need to develop a theory of mind?" Well, you provided an answer yourself: "In order to minimize loss on the next token, it needed to develop something akin to answering questions on theory-of-mind problems; effectively, it develops something that is like a theory of mind."
Philosophically, we have to consider what we observe as science and then reformulate. Quantum physics, for example, often made no sense, but because it is replicable, we have to accept that it is onto something. Likewise, if I said "Wingardium Leviosa!" and made things levitate, it would indeed make no sense. But if it were replicable, it would clearly be pointing to some reality.
The argument is that AI systems are simulations of the mind, and a good enough simulation of the mind is indeed a mind. Is this true? I'm not sure. But it is compelling enough that we should consider it as a possibility, and along with it, the attendant risks.
Because frankly, nothing in AI at the moment is progressing with slow and careful anything - not in its construction, not in its social impact, not in its understanding of what it is specifically doing, etc.
And for what it is worth, I want to link a write-up of the Anthropic research that indicates they do, indeed, mimic human-style understanding (and use pretty deep reasoning in order to achieve deception):
https://www.astralcodexten.com/p/ai-sleeper-agents
If you want a tl;dr on it, just search "Still, this was pretty within-distribution. In order to test how good its out-of-distribution deception abilities, they put it in some novel situations" and see how it generalized to deceive in a surprisingly novel manner.
This is all quite a bit of reasoning, and a scary kind of reasoning, at that - power-seeking.
This makes me wonder if there are ways that this questionable relationship to grounded reality can be used as a feature, rather than a bug.
I ran my own, very small, experiment by writing a few nonsense files and adding them to a knowledge base for a GPT4All instance on my machine. Sure enough, when prompted for ideas whose representations were in proximity to the nonsense files, the AI gave me back the same nonsense I had written, rephrased a little bit. But maybe, if I could curate a particular representation of a "world" (fictional or just specialized), then using the LLM's limited perspective as a tool could be helpful rather than risky - if it were described, and used on purpose, as a figurative map rather than interpreted as the territory.
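For concreteness, here's a minimal sketch of the retrieve-then-prompt pattern my little experiment leaned on. To be clear, this is not GPT4All's actual LocalDocs code - the file names, documents, and scoring below are invented just to show why a curated (even fictional) "world" ends up dominating the answer:

```python
# Toy retrieve-then-prompt sketch (assumed setup, not GPT4All internals).
from collections import Counter

# A hypothetical "knowledge base": two nonsense files and one factual note.
knowledge_base = {
    "nonsense_1.txt": "The moons of Brelth are farmed for their copper honey.",
    "nonsense_2.txt": "Brelthian honey-miners sing to keep the tides polite.",
    "real_note.txt": "Copper is a metallic element with atomic number 29.",
}

def overlap(query: str, doc: str) -> int:
    """Naive bag-of-words overlap; real systems use vector embeddings."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def build_prompt(query: str) -> str:
    # Retrieve the document "in proximity to" the query and stuff it into the prompt.
    best = max(knowledge_base, key=lambda name: overlap(query, knowledge_base[name]))
    return f"Answer using this context:\n{knowledge_base[best]}\n\nQuestion: {query}"

# A question near the nonsense files pulls nonsense into the context, so the
# model dutifully paraphrases it back - the curated map becomes the territory it sees.
print(build_prompt("How is copper honey farmed on the moons of Brelth?"))
```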
I think that’s a very clever and fun idea. It reminds me of the old adage, usually attributed to the statistician George Box, that “all models are wrong, but some are useful”.
I wonder what this looks like with an update accounting for chain-of-reasoning models. Arguably, symbolism (word=thing) wasn’t the instigator; it was shared symbols co-occurring with an individual’s stream of internal symbols that allowed both individual and shared world-model building. The shared language game is co-created with/by the inner language game.
Other works that cover Wittgenstein and AI include:
Graham Button, Jeff Coulter, John Lee, Wes Sharrock: Computers, Minds and Conduct
https://www.politybooks.com/bookdetail?book_slug=computers-minds-and-conduct--9780745615714
Stuart Shanker: Wittgenstein's Remarks on the Foundations of AI (the preface and the first chapter are available for preview):
https://www.taylorfrancis.com/books/mono/10.4324/9780203049020/wittgenstein-remarks-foundations-ai-stuart-shanker
And, of course, there’s also Hubert Dreyfus, who covers both Wittgenstein and Heidegger in his many critiques of AI. In a YouTube video somewhere, Dreyfus comments that the AI people inherited a lemon, a 2,000-year-old failure. By this he means that from the very beginning AI uncritically adopted assumptions from ancient and early modern philosophy about language, mind, and cognition that are complete nonsense and had already been shown to be such. (This is also true of much of cognitive science, philosophy of mind, neuroscience, etc.) The critique was there long before the famous Dartmouth Workshop. Wittgenstein was debating with Turing at Cambridge in the late 1930s and, after the war, Michael Polanyi was debating these issues with Turing at Manchester.
An engineer in the field of AI who is completely unfamiliar with this literature might do best to start with Peter Hacker's new intro book:
https://anthempress.com/anthem-studies-in-wittgenstein/a-beginner-s-guide-to-the-later-philosophy-of-wittgenstein-pb
or try his paper on the PLA (private language argument) and the mereological mistake/fallacy:
https://www.pmshacker.co.uk/_files/ugd/c67313_778964f8a7e44b16ac8b86dbf954edda.pdf
Hacker, as far as I know, hasn't written directly about AI, but he and Maxwell Bennett (a neuroscientist) have written lengthy critiques of cognitive science (Dennett, Searle, Churchland, Fodor, et al.).
https://www.wiley.com/en-us/Philosophical+Foundations+of+Neuroscience%2C+2nd+Edition-p-9781119530978
It puzzles me that this literature isn't brought up more in current debates about AI. It isn't as if these long-standing critiques have been successfully addressed. The field appears to carry on in either complete ignorance or willful avoidance. "Blah, blah, blah. I can't hear you!" Language is a social institution. A person is not a mind or a brain. Being an AI researcher is like carrying on as an alchemist in a world where the grounds for a science of chemistry have already been laid out for all to see.
Thanks so much for sharing these sources! I will dive a bit deeper into them in the coming days. It is indeed fascinating how a large body of writing and major thinkers on the subject aren't even mentioned, let alone taken seriously, when trying to conceptualize what LLMs are, what they can and cannot do, and why.
Here's the link to the Dreyfus video I referenced above:
https://www.youtube.com/watch?v=oUcKXJTUGIE
The Marcus talk at AGI-24 is also worth a watch. I have never seen any indication that he has engaged with this literature, so I think the problems are much more significant than even he accepts, but he does see through the hype and is good at articulating the problems with the current generation of AI tech and its very significant limitations:
https://garymarcus.substack.com/p/what-has-and-has-not-changed-in-the
I’m familiar with Marcus’s writing and have seen his talk. He does a really good job of making the case for the systematic faults of these systems and why scaling doesn’t solve any of them.
Great post! I'm not sure I've read anyone on this platform who has read Korzybski. "Science and Sanity" is great, but the semanticists (GSI folks) are just sort of kooky at times.
Much appreciated, thanks for your comment Adam.
'The map is not the territory', but Borges has a really beautiful interpretation of this in 'Del rigor en la ciencia' ('On Exactitude in Science'). Maybe a sufficiently large and precise map is indistinguishable from the territory.
I think with the release of GPT-4o and ChatGPT's ability to store conversations as “memories”, we might be moving closer to a language model that is capable of a more human-like understanding of the world. Its worldview is no longer frozen in time. Ergo, it’s increasingly capable of Sprachspiel.
To me, their inability to solve even the most elementary of cryptograms demonstrates the artificiality of LLMs. They do not understand language; they just process it. For example:
https://earlboebert.substack.com/p/simple-cryptograms-are-still-safe?r=2adh4p
An example of how semantics, and not simple syntactic word association, plays a role in the solution is touched on here:
https://earlboebert.substack.com/p/what-i-can-do-that-ai-cant?r=2adh4p
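For anyone who hasn't played with these, by an "elementary cryptogram" I mean something like a simple monoalphabetic substitution. The snippet below (my own toy example, not taken from either post above) generates one; solving it requires reasoning about letter patterns and candidate words, not predicting which word tends to follow which:

```python
# Generate a toy monoalphabetic substitution cryptogram (illustrative example).
import random
import string

def make_cryptogram(plaintext: str, seed: int = 42) -> str:
    """Replace each letter with another letter under a fixed random key."""
    rng = random.Random(seed)
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    key = dict(zip(letters, shuffled))
    # Non-letters (spaces, punctuation) pass through unchanged.
    return "".join(key.get(ch, ch) for ch in plaintext.upper())

print(make_cryptogram("THE MAP IS NOT THE TERRITORY"))
```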
Great essay, BTW.
"But we should never forget that to an LLM a truthful statement and a hallucinated response look completely identical. To them, it’s just words. Words that are likely to appear next to other words"
Very well said. I really dislike the phrase "hallucination", and I also dislike Hinton's proposed alternative of "confabulation". Both refer to a distinction between truth and falsehood (or reality and fiction) that plays no role in how LLMs generate output. That we humans see their output as being more often true than false is a happy byproduct of their training; LLMs are trained on human-written text, and humans more often write truths than falsehoods. But LLMs have no independent method for telling the difference.
In contrast, when a human gets an answer wrong, they might believe it to be true. Or, they might be BSing because they don't really know. Or, they might be lying. And we usually know when we're doing this: when a person doesn't know the answer to something and they're just guessing, they know they're guessing. When they do know the answer and they say something else, they know they're lying. When they think they know the answer but they aren't sure, they're able to say why and how confident they are. LLMs don't do any of this. It's not like they check to see if they know the answer first, and then if not they make something up. To them, true output and false output are generated the same way all output is generated: by pulling tokens from probability distributions, one at a time.
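To make that last point concrete, here is a minimal sketch of what "pulling tokens from probability distributions" amounts to. The vocabulary and probabilities are invented for illustration; a real LLM computes them with a neural network over a vocabulary of tens of thousands of tokens:

```python
# Minimal token-sampling sketch (illustrative numbers, not a real model).
import random

def sample_next_token(distribution: dict[str, float], rng: random.Random) -> str:
    """Draw one token according to the model's next-token probabilities."""
    tokens, probs = zip(*distribution.items())
    return rng.choices(tokens, weights=probs, k=1)[0]

rng = random.Random(0)
# Hypothetical distribution after the prompt "The capital of Australia is":
next_token_probs = {" Canberra": 0.55, " Sydney": 0.35, " Melbourne": 0.10}

# A true continuation and a false one come out of the very same sampling step;
# nothing in the procedure checks which string corresponds to the world.
print(sample_next_token(next_token_probs, rng))
```

Temperature, top-k, and the like only reshape the distribution before the draw; none of it adds a truth check.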
I liked Naomi Klein's take on this in the Guardian: "...but why call the errors “hallucinations” at all? Why not algorithmic junk? Or glitches? Well, hallucination refers to the mysterious capacity of the human brain to perceive phenomena that are not present, at least not in conventional, materialist terms. By appropriating a word commonly used in psychology, psychedelics and various forms of mysticism, AI’s boosters, while acknowledging the fallibility of their machines, are simultaneously feeding the sector’s most cherished mythology: that by building these large language models, and training them on everything that we humans have written, said and represented visually, they are in the process of birthing an animate intelligence on the cusp of sparking an evolutionary leap for our species. How else could bots like Bing and Bard be tripping out there in the ether? Warped hallucinations are indeed afoot in the world of AI, however – but it’s not the bots that are having them; it’s the tech CEOs who unleashed them, along with a phalanx of their fans, who are in the grips of wild hallucinations, both individually and collectively."
https://www.theguardian.com/commentisfree/2023/may/08/ai-machines-hallucinating-naomi-klein
You might like this write-up on the topic: https://untangled.substack.com/p/ai-isnt-hallucinating-we-are
This was a nice post (and I'm using the modern meaning of the word, not 14th century.)
Now I might be biased, but I like my own comparison of LLMs with Leonard Shelby from Memento (https://www.whytryai.com/i/143611388/myth-llms-upgrade-themselves-on-their-own) - they both have memories up to their cutoff point (pretraining in the case of LLMs, the "incident" in the case of Leonard) but can only hold new memories for a short time and reset completely with new chats.
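A tiny sketch of what that reset looks like mechanically (the names and the deliberately tiny "window" are made up, just to illustrate the point):

```python
# Toy sketch of the Memento point: per-chat context is the only "new memory".
MAX_TURNS_REMEMBERED = 4  # stand-in for a limited context window

def reply(context: list[str], user_msg: str) -> list[str]:
    """Append the new message, then forget anything beyond the window."""
    context.append(user_msg)
    return context[-MAX_TURNS_REMEMBERED:]

chat_1: list[str] = []
for msg in ["Hi, I'm Leonard.", "Remember: don't trust Teddy.", "What's my name?"]:
    chat_1 = reply(chat_1, msg)

chat_2: list[str] = []  # a brand-new chat: nothing from chat_1 carries over
print(chat_1)  # only the recent turns survive
print(chat_2)  # empty, like waking up after the "incident"
```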
Haha, nice one. This is great further reading, thanks for sharing! To keep the piece I was writing from getting too long, I tried to stay away from all the intricacies around fine-tuning, context windows, and external memory features, and to focus mostly on the core aspects.
Oh yeah, makes sense - it's a minor part in the context of this article. Would be too much of a digression to dive into every detail.
LLMs are a very small part of AI.