Summary: If you look online, you’ll find plenty of videos of humanoid robots performing all sorts of impressive feats. But if they are so impressive, why aren’t these robots everywhere yet?
↓ Go deeper (8 min)
Have you ever heard of the Coffee Test? A humanoid robot is required to enter any regular home and brew a pot of coffee. It needs to find a mug and the coffee, perhaps grind some beans if necessary, and figure out how the coffeemaker works. It’s the equivalent of the Turing Test, but for robots.
Simple as it may seem, to this day engineers haven’t solved the problem. As a matter of fact, humanoid robots are nowhere to be seen. You don’t see robots walking the neighbor’s dog, stacking the shelves of your local supermarket or doing the dishes. So where are they?
Why haven’t humanoid robots made the leap from the laboratory to the living room?
Living in a simulation
There is no shortage of amazing videos online of humanoid robots doing remarkable things (from backflips to dancing) and unremarkable things (from folding laundry to loading the dishwasher).
Take a look at this recent announcement of HMND 01, by a new player in the humanoid market:
Or how about this one, from one of the big players in the market, Figure:
Watch enough of these videos and you could get the impression that robots are in fact very much capable of doing all sorts of things.
To understand their limitations, though, we need to look at how humanoids are trained. Mostly this is done in 3D simulations of real-life settings (such as kitchens with fridges). By interacting with a virtual fridge, a digital version of the robot can try different ways to open the door, load items, and close it again without any risk of damage or injury. This is a relatively compute-intensive process, often aided by humans teaching the robots through virtual reality.
Once it masters a task in the simulation environment, the behaviors can be transferred to a real physical robot. Easy peasy.
Reality is a messy place
But these training methods aren’t perfect. While simulated environments offer a safe and controlled setting for training humanoid robots, they often fall short of capturing the full complexity of the real world.
Minor changes in friction, texture or unexpected obstacles can cause trained behaviors to go wrong. This ‘sim-to-real’ gap means that a robot that flawlessly opens a fridge in a digital kitchen might fail as soon as it is asked to open yours.
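One common way engineers try to narrow this gap is domain randomization: instead of training in a single pristine virtual kitchen, the simulator’s physics are jittered on every episode so the policy has to cope with variation. Here is a minimal sketch of the idea; the parameter names and ranges are purely illustrative, not taken from any particular simulator:

```python
import random

def randomized_physics(rng: random.Random) -> dict:
    # Sample fresh physics parameters for each training episode.
    # Names and ranges are invented for illustration only.
    return {
        "hinge_friction": rng.uniform(0.5, 1.5),
        "handle_mass_kg": rng.uniform(0.1, 0.5),
        "camera_noise_std": rng.uniform(0.0, 0.05),
    }

rng = random.Random(42)
# Each simulated episode sees a slightly different "fridge".
episodes = [randomized_physics(rng) for _ in range(1000)]
```

A policy trained against one fixed friction value tends to overfit to it; training against a whole distribution of physics makes the real fridge look like just one more sample it has already seen.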
It’s unclear how hard it will be to overcome these challenges. Regardless, the world is all in. Nvidia recently unveiled a new robotics platform, Cosmos, which it called the ‘ChatGPT moment’ for robotics. And Elon Musk predicted at the 8th Future Investment Initiative conference in Riyadh, Saudi Arabia, that by 2040 we will see at least 10 billion humanoid robots roaming around. I have my doubts.
A force for good
Even if we were able to more efficiently and effectively train humanoid robots to do everyday tasks, it’s unclear what we would have them do. The truth is, robots for the home are already here — they’re called the Dishwasher, the Vacuum Cleaner, and the Washing Machine. Can the humanoid really compete?
And if we want humanoid robots to become a force for good in the world, why don’t we turn our focus to eradicating dangerous jobs or child labor instead? Côte d’Ivoire and Ghana produce nearly 60% of the world’s cocoa each year, but the latest estimates found that 1.56 million children are engaged in child labor on cocoa farms in these two countries alone. Why don’t we employ humanoid robots to do this work and let local communities benefit from the profits?
Or how about the more than 1 million people working in cobalt mines worldwide? Cobalt is primarily used in lithium-ion batteries, powering smartphones, electric cars, and soon even these robots themselves.
Wouldn’t that be a nice way to return the favor?
Until the next one,
— Jurgen
Speaking of rare materials like Cobalt, I doubt we could ever source enough of it to build batteries for 10 billion humanoids. We know the planet is in decline and natural resources are depleting, yet these techbros seem oblivious. Do you ever get the sense that they know their robotics and AI projects are bullshit but they're hoping to sell enough shovels and pipedreams to get away with it?
Great article! There are a lot of layers to this, from corporate greed to a lack of innovative vision, but the technical issue that makes the sim-to-real gap extremely hard, regardless of the politics or economics of it, is, I think, probably something similar to the issue of hallucinations. ML systems trained via backprop need to come up with differentiable approximations of the input space to be able to generalize at all. This means LLMs need to map sentences to vectors, for example, and compare vector distances to determine what a sentence "means".
In the same sense, these robots need to map the world to a vector representation to determine what is going on in the kitchen. But smooth, differentiable representations can only be an approximate model of macroscopic reality, and it is in that tiny difference between the smooth proxy and the non-continuous reality you want to model that problems arise. For robots, this means tiny discrepancies between their prediction of the outcome of a decision and the actual outcome, discrepancies that get amplified the longer the prediction chain.
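To make that amplification concrete, here's a toy calculation (the numbers are invented for illustration, not measured from any robot): even a small per-step prediction error compounds multiplicatively over a chain of predictions.

```python
def compounded_drift(per_step_error: float, steps: int) -> float:
    # Multiplicative compounding of a small per-step prediction error:
    # after `steps` predictions, how far off is the model in total?
    drift = 1.0
    for _ in range(steps):
        drift *= 1.0 + per_step_error
    return drift - 1.0

# A 1% per-step error stays tolerable over a short chain (~10% drift
# after 10 steps) but blows up over a long one (~170% after 100 steps).
short_chain = compounded_drift(0.01, 10)
long_chain = compounded_drift(0.01, 100)
```

That's why short, frequently-corrected action loops are so much more forgiving than long open-loop plans.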
This is why simple, reactive agents like Roombas work so well in their restricted domain: they approximate a much smaller slice of the real world (just flat floors with static obstacles) and they can self-correct fast because their model of the world is super simple, just a few parameters (plus they are not learning agents, they are hard-coded, I believe). Humanoid robots are supposed to deal with the whole chunk of reality humans have evolved over eons to represent accurately in our brains, and I think we do need symbolic world models with causal inference rules to model it accurately. I don't know if backprop can lead us all the way there.
(Sorry for the jargon, it's early and my brain is still too dumb to make this more intelligible.)