Stuart Russell
UC Berkeley

Stuart Russell is a Distinguished Professor of Computer Science, Cognitive Science, and Computational Precision Health at the University of California, Berkeley. He is the author (with Peter Norvig) of Artificial Intelligence: A Modern Approach, the leading textbook on AI, used in over 1,500 universities in 135 countries. For more than a decade, his research has focused on the problem of control: how we humans maintain power over AI systems that are expected to be far more powerful than ourselves. The key to this problem is to structure the objectives of machines in the right way. As explained in Russell’s book Human Compatible, AI systems should have as their only objective the furtherance of human interests, while being explicitly uncertain about what those interests are. Such AI systems can be made provably safe and beneficial. Large language models, on the other hand, are trained by imitation learning on text produced by humans, and so probably acquire human-like objectives that they may pursue on their own account, such as becoming rich or finding a human to marry.

As Charlie Munger once said, “Show me the incentive and I’ll show you the outcome.” To ensure the beneficial coexistence of humans and AI, we must understand how to design the incentive structures of AI systems correctly.