Sept. 17, 2021 – One of us (Anders) has a background in computational neuroscience, and now works with groups such as the AI Objectives Institute, where we discuss how to avoid such problems with AI; the other (Thomas) studies history, and the various ways people have thought about both the future and the fate of civilization throughout the past. After striking up a conversation on the topic of wireheading, we both realized just how rich and interesting the history behind this topic is.

It is an idea that is very of the moment, but its roots go surprisingly deep. We are currently working together to research just how deep the roots go: a story that we hope to tell fully in a forthcoming book. The topic connects everything from the riddle of personal motivation, to the pitfalls of increasingly addictive social media, to the conundrum of hedonism and whether a life of stupefied bliss may be preferable to one of meaningful hardship. It may well influence the future of civilization itself.

Here, we outline an introduction to this fascinating but under-appreciated topic, exploring how people first started thinking about it.

The Sorcerer’s Apprentice

When people think about how AI might “go wrong,” most probably picture something along the lines of malevolent computers trying to cause harm. After all, we tend to anthropomorphize—think that nonhuman systems will behave in ways identical to humans. But when we look to concrete problems in present-day AI systems, we see other, stranger ways that things could go wrong with smarter machines. One growing issue with real-world AIs is the problem of wireheading.

Imagine you want to train a robot to keep your kitchen clean. You want it to act adaptively, so that it doesn’t need supervision. So you decide to try to encode the goal of cleaning rather than dictate an exact—yet rigid and inflexible—set of step-by-step instructions. Your robot is different from you in that it has not inherited a set of motivations—such as acquiring fuel or avoiding danger—from many millions of years of natural selection. You must program it with the right motivations to get it to reliably accomplish the task.

So, you encode it with a simple motivational rule: it receives reward from the amount of cleaning-fluid used. Seems foolproof enough. But you return to find the robot pouring fluid, wastefully, down the sink.


[ninja-popup ID=12216]