AI and the destructive force of human creativity
Humanity: a bunch of cowboys in the desert, aiming guns at each other.
I was re-reading Nick Bostrom’s essay The Vulnerable World Hypothesis, which poses an interesting thought experiment:
“One way of looking at human creativity is as a process of pulling balls out of a giant urn. The balls represent possible ideas, discoveries, technological inventions. Over the course of history, we have extracted a great many balls – mostly white (beneficial) but also various shades of gray (moderately harmful ones and mixed blessings). (...) What we haven’t extracted, so far, is a black ball: a technology that invariably or by default destroys the civilization that invents it. The reason is not that we have been particularly careful or wise in our technology policy. We have just been lucky.”
Bostrom then asks: what if there is a black ball in the urn?
Artificial intelligence as a potential black ball
AI could turn out to be that black ball. With rapid advances in the field of artificial intelligence, should we be worried? Are the necessary guardrails in place to prevent large-scale disruption in the event of a major breakthrough?
It might sound apocalyptic and dark and alarmist, but the reality is that human creativity will likely continue to produce inventions that are as beautiful as they are terrifying.
The nuke was a direct result of incredible scientific research into nuclear fission – which led to a global arms race, Hiroshima, and the current-day balancing act between superpowers on the premise of mutual assured destruction.
Is it really so hard to fathom that we’re close to pulling another gray or dark-gray ball out of Bostrom's hypothetical urn?
Aligning AI with human values
In the field of AI, this concept is better known as the ‘alignment’ problem. As technology becomes smarter and smarter, we’ll rely more and more on its decision-making, because that’s what AI is: the outsourcing of decision-making.
Knowing that, we'd better make AI make the right decisions. We'd better make it accurate, reliable, and safe.
To get an idea of how hard that is, all you need to do is read what OpenAI writes about the limitations of its models:
“Despite making significant progress, our InstructGPT models are far from fully aligned or fully safe; they still generate toxic or biased outputs, make up facts, and generate sexual and violent content without explicit prompting.”
Please note: These models are getting cheaper and more readily available. Their harm might (still) be limited, but so is the oversight.
Playing with fire
I feel our efforts should shift towards creating safer models, not bigger ones, because, to paraphrase Bostrom, we have the power to extract hypothetical black balls from the urn, but once they are out, we can't put them back.
We have hardly begun to understand the potential havoc these new technologies can cause.
Do we even know what these ‘human values’ are that we want AI to align with? What do we mean when we say ‘avoid harm’? The ways in which social media algorithms capitalize on and reward our proclivity for sensationalist, fringe content… isn’t that harmful AI?
The technocapitalists will have you believe that as long as we can monetize it, it’s safe. I say we’re playing with fire.