Key insights of today’s newsletter:
Cognition Labs, emerging from stealth last week, announced Devin, which it bills as the world’s first fully autonomous AI software engineer.
Devin is equipped with common developer tools including a shell, code editor, and browser within a sandboxed compute environment and can engage in end-to-end coding tasks.
It feels like an exciting leap forwards, but at the same time, it’s hard to predict the impact a tool like this may have when deployed at scale.
↓ Go deeper (5 min read)
What if you could build a fully automated software engineer? That’s exactly what the company Cognition Labs, emerging from stealth last week, claims to have succeeded at with their groundbreaking new product, Devin.
I say ‘claims’ because Devin hasn’t been released to the public yet. Only a handful of people have been granted early access, a collection of demo videos is circulating online, and preliminary benchmark data was published by the team.
I’ll go over the details and talk you through some of the potential implications, in case Devin turns out to be the real deal. (We’ve seen a lot of promising demos over the past year, so let’s not jump on the hype train just yet.)
What can Devin do?
As with previous attempts at building systems like this, such as AutoGPT, a user can give Devin a goal. From this goal, the system will try to extrapolate a high-level plan made up of sub-tasks that it will then execute one by one.
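To make the plan-then-execute idea concrete, here is a minimal Python sketch. It is not how Devin or AutoGPT is actually implemented; the names make_plan, execute_step, and run_agent are hypothetical, and the planner is a hard-coded stand-in for what would be a language-model call in a real system.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    goal: str
    plan: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def make_plan(goal: str) -> list[str]:
    # Hypothetical stand-in: a real agent would ask a language model
    # to break the goal into sub-tasks.
    return [
        f"Understand the goal: {goal}",
        "Write the code",
        "Run it and check the result",
    ]


def execute_step(step: str) -> str:
    # Hypothetical stand-in: a real agent would use its tools here
    # (shell, code editor, browser) and report back what happened.
    return f"done: {step}"


def run_agent(goal: str) -> AgentRun:
    run = AgentRun(goal=goal)
    run.plan = make_plan(goal)
    # Execute the sub-tasks one by one, in order.
    for step in run.plan:
        run.log.append(execute_step(step))
    return run


run = run_agent("build a Game of Life website")
for line in run.log:
    print(line)
```

The interesting part of a real system lives inside the two stand-in functions: how good the plan is, and what happens when a step fails.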
Devin is equipped with common developer tools including a shell, code editor, and browser within a sandboxed compute environment — everything a real developer would need to do their work. A conversational interface allows the user to communicate with Devin throughout.
To give you a feeling for what it looks like in action, I’ve included one of the clips below, in which Devin built an interactive website inspired by Conway's Game of Life.
How good is Devin really?
Now, with this kind of stuff, you really have to see it to believe it. But like I mentioned, only a handful of people got early access.
The company did publish benchmark data for Devin based on SWE-bench: a benchmark that tests a system’s ability to address real-world software issues found on GitHub. The idea is to see if an LLM can analyze the issue description and codebase, and then generate a patch that fixes the problem, similar to how a human programmer would.
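For a rough idea of how a score on a benchmark like this comes about, here is a toy Python sketch. It is a heavy simplification and not the actual SWE-bench harness: the Issue record, the resolve_rate function, and the toy_model are all hypothetical, and each "test suite" is reduced to a simple check on the patch text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Issue:
    # Hypothetical, simplified record; a real benchmark entry comes with
    # a full codebase and its tests.
    description: str
    tests_pass: Callable[[str], bool]  # does this patch make the tests pass?


def resolve_rate(issues: list[Issue], generate_patch: Callable[[str], str]) -> float:
    """Percentage of issues whose generated patch makes the tests pass."""
    if not issues:
        raise ValueError("need at least one issue")
    resolved = sum(
        1 for issue in issues if issue.tests_pass(generate_patch(issue.description))
    )
    return 100 * resolved / len(issues)


# Toy data: each "test suite" just looks for an expected snippet in the patch.
issues = [
    Issue("off-by-one in pagination", lambda patch: "page + 1" in patch),
    Issue("crash on empty input", lambda patch: "if not data" in patch),
    Issue("wrong timezone in report", lambda patch: "tz=UTC" in patch),
    Issue("typo in error message", lambda patch: "received" in patch),
]


def toy_model(description: str) -> str:
    # Hypothetical "model" that only knows how to fix one kind of issue.
    if "empty input" in description:
        return "if not data: return []"
    return "# no idea"


score = resolve_rate(issues, toy_model)
print(f"{score:.2f}% resolved")
```

The toy model fixes one of the four issues, so the script reports a 25.00% resolve rate. The percentages quoted for real systems are the same kind of number, computed over far harder problems.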
Here’s how Devin, allegedly, performed on that benchmark:
Devin correctly resolves 13.86%* of the issues end-to-end, far exceeding the previous state-of-the-art of 1.96%. Even when given the exact files to edit, the best previous models can only resolve 4.80% of issues.
I hear you thinking, 13.86% doesn’t sound like a lot. It means that 86.14% of the issues were too hard for Devin to solve on its own. But if you turn it around and say that roughly 1 in 7 of these coding problems can now be solved by an AI instead of a human software engineer, all of a sudden it sounds a lot more impressive.
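For anyone who wants to check the numbers, the quoted figures work out as follows. This is plain arithmetic on the three percentages above; the variable names are mine.

```python
devin = 13.86          # % of issues resolved end-to-end (as quoted)
previous_best = 1.96   # % previous state of the art, unassisted
assisted_best = 4.80   # % best previous model when given the exact files

unresolved = 100 - devin
one_in = 100 / devin
vs_unassisted = devin / previous_best
vs_assisted = devin / assisted_best

print(f"unresolved: {unresolved:.2f}%")                          # unresolved: 86.14%
print(f"about 1 in {one_in:.1f} issues resolved")                # about 1 in 7.2 issues resolved
print(f"{vs_unassisted:.1f}x the previous unassisted best")      # 7.1x the previous unassisted best
print(f"{vs_assisted:.1f}x the previous assisted best")          # 2.9x the previous assisted best
```

So the reported score is about one issue in seven, and roughly seven times the previous unassisted result.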
The Dutch podcast POKI compared Devin’s skill level to that of an intern or a junior programmer. In its current state, while imperfect, it looks like it could be a highly effective tool.
And not just that, it could be highly disruptive too. Because Devin has access to the web, it can act in the real world, which may have unintended consequences. To illustrate, someone took Devin for a spin and… well, this happened:
Is it time to freak out now?
All in all, Devin looks impressive enough to take seriously, but it’s difficult to say how seriously we should take it.
I mean, are all junior software engineers out of a job in 1-3 years? What unintentional harms are caused by thousands or even tens of thousands of Devins running wild on the web? And how will malicious actors leverage Devin (or the open-source version that will, inevitably, be built eventually), unconstrained by any guardrails?
I don’t know about you, but sometimes these AI releases feel like a game of what’s going to terrify us next. Writing about this stuff feels like a combination of reading tea leaves and seeing the writing on the wall. On the one hand, Devin feels like an exciting leap forwards and on the other hand, I can’t help but think of everything that can go wrong.
Readers have called me too pessimistic before, pointing out that I focus too much on the negative (really, I get it), but I just happen to have a nagging suspicion that Devin is going to create more problems than it solves.
Join the conversation 🗣
Leave a comment with your thoughts or like this article if it resonated.
Get in touch 📥
Shoot me an email at jurgen@cdisglobal.com.
"Devins running wild on the Web" maybe even with capacity for arbitrary "thought"... Yep. Scary. 😬
I’m a professional software engineer. I’ve been waiting for this since LLMs first appeared on the scene. I got into software because I love building things and exploring what can be invented with code and data.
A virtual programmer, for me, doesn’t need to be perfect. Just good enough to enable me to close the gap between idea/question and implementation a bit faster.
My only hope is it gets even better even faster so I can quit my job and build the things I’m already working on faster.
Junior devs who actually love understanding how digital systems are constructed will be fine. But yes, devs who mostly copy-paste are over.