5 Comments
Sep 18 · edited Sep 18 · Liked by Jurgen Gravestein

I wish you were writing for a major media outlet; as always, most of them are regurgitating OpenAI's irresponsible hype. I'd have hoped more journalists would be taking what this company says with a grain of salt by now, but I can't find any stories in major outlets that describe "chain-of-thought" as plainly as you have here.

I just skimmed through the o1 "system card". It's part documentation, part press release, and very frustrating to read. OpenAI takes anthropomorphizing to a whole new level; they simply refuse to recognize a distinction between what a word means in AI and what it means in plain language. They discuss "chain of thought" as though GPT-o1 is *actually* reporting its inner thoughts, which *actually* describe how it chose the output text it ultimately produced. No caveats, no details on what "CoT" is, just pure conflation of what the chatbot does and what a human does.

So, I very much appreciate this paragraph of yours:

"The fact that o1 takes more time to work through a problem does fit the idea of a slower, more deliberative process. But although the concept of System 2 thinking serves as a great metaphor, we shouldn’t confuse how humans think with what the model is doing. Fundamentally, there isn’t much different going on under the hood. It is still processing and predicting tokens, but spending more of them (which also makes it exponentially more expensive)."

Nowhere in the 43-page "system card" does OpenAI acknowledge this. There is no plain statement of how the model's "chain-of-thought", which they reference over and over, is actually produced. In what is presented as a technical document, they skip past the technical documentation and jump straight to describing GPT-o1 like it's a person. Not only is it "intelligent", not only does it "reason", it also has "beliefs", "intentions", and - no joke - "self-awareness"! (page 10, section 3.3.1)

I think this fits in well with your recent post about AGI as religion. It's probably easier to conflate human thinking and LLM "thinking" for those who believe the LLM is a precursor to AGI. For me, reading documents from OpenAI's website is like reading theology - it's tough to know what to take at face value. Do the people who write these documents genuinely believe that GPT-o1 is sharing with them its inner thoughts? Or, are they just being very liberal with their metaphors? Do they genuinely believe that the "reasoning abilities" of LLMs can be measured using instruments designed to measure human reasoning? Because this is what they report in the system card, and they don't address any of the criticism they got for doing the same thing in the GPT-4 system card. Do they genuinely believe that when chain-of-thought text contradicts output text, they are observing "deception"? Considering they distinguish "intentional" from "unintentional" chatbot deception, it sounds like they do.

Anyway, thanks as always for sharing your insights, they are a breath of fresh air.

author

Thank you for the compliment, Ben. I think there are people inside the AI labs who hold pretty out-there views. Some of them probably genuinely believe they are inventing a new alien intelligence that can reason as well as humans, or better.

The leaders of these companies, I think, are much more deliberate and calculated. They know they are far from humanlike intelligence. Calling it ‘thinking’ instead of ‘processing’ in the case of o1 is a design choice. It’s marketing, and it bothers me as well.

Sep 18 · Liked by Jurgen Gravestein

Thanks for this really insightful piece. One thing I’ve been wondering: to the extent we can trust any of the vague statements OpenAI makes about its technology (i.e., not much), isn’t the log-linear relationship between compute and accuracy kind of a red flag? Without any details it’s hard to say, but that’s the scaling you would expect if it’s using chain of thought to do some sort of Monte Carlo tree search or CSP-solver-type method (in an abstract, general sense). Exponential scaling implies “rapidly gets impossible or astonishingly expensive” in most computing contexts, after all…
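
To make that concrete, here is a toy sketch of what a log-linear accuracy-compute curve would imply for cost. The constants A and B are made up purely for illustration; they are not anything OpenAI has published.

```python
import math

# Toy assumption (illustrative only, not OpenAI's actual numbers):
# accuracy grows log-linearly with inference compute,
#   accuracy(c) = A + B * ln(c)
A, B = 0.30, 0.05

def compute_needed(target_accuracy):
    # Inverting the log-linear curve: the compute required grows
    # exponentially with the accuracy you want to reach.
    return math.exp((target_accuracy - A) / B)

for target in (0.5, 0.6, 0.7, 0.8):
    print(f"target accuracy {target:.0%} -> ~{compute_needed(target):,.0f}x baseline compute")
```

Under that assumption, every additional ten points of accuracy multiplies the compute bill by the same constant factor, which is exactly the "rapidly gets astonishingly expensive" pattern.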

The gains are impressive, but how does that compare to how human reasoning scales? Obviously it depends on the person and the type of question being asked, but my day-to-day experience is that almost no problems in human thinking have an exponential time to solution, and those that (presumably) do are famously hard problems in math, science, philosophy and so on. If this intuition is right, that’s definitely a potential source of human uniqueness (within the broader context of computing being a pretty poor model for the human brain in general). Do you know if there’s much rigorous writing on the subject?

author

There are quite a few folks critically engaging with what o1 is and isn't doing. I linked to the ARC blog post in my piece, which offers great insights. Additionally, I can highly recommend this (more technical) conversation on the MLST podcast: https://youtu.be/nO6sDk6vO0g?si=yYElUww9ZojAU-Kw

I also found this tweet interesting; it dives deeper into benchmarking o1's 'reasoning' performance: https://x.com/max_zuo/status/1836090683737645545

There are quite a few other academics and AI researchers worth following as well, like Emily M. Bender, Gary Marcus, Walid Saba, Dagmar Monett, Suzi Travis and others.


This is why I love using LLMs. I live in my System 2 thinking and dip in and out of System 1. This is how the most creative stay healthy. This is why they are considered neurodivergent. I have already created a new model of cognition around this to explain how our somatic intelligence works like our cognitive intelligence, and then some. I have also rewritten how we see working memory at the somatic level. It is many times what cognitive working memory is. It is what drives extreme giftedness and genius. This is what makes my family prodigious savants.
