Agency, Intentions, and
Artificial Intelligence
An interdisciplinary collaboration between philosophers, neuroscientists, and AI researchers
AI is increasingly pervasive and transformative for human society, and there is a consensus that it possesses a different type of intelligence; some even call it “alien” intelligence. As AI becomes ever more capable and autonomous, don’t we need to understand its intentions?
Why intentions matter
For AI to fulfill its enormous promise, we must address valid concerns about the risks that these systems pose to humans, or even to humanity, as they become more autonomous. To do this, we must recognize that the central concerns are about the intentions of increasingly capable AI systems: especially whether AIs will develop their own intentions and whether those intentions will misalign with human values. The central issue is not whether AI becomes super-intelligent or conscious. What is more, if AIs acquire intentions, they might bear moral responsibility and hold moral rights.
What are intentions?
A popular view equates an intention to act with a rational commitment to perform that particular act, either as an end or as a means (Bratman 1987). If I intend to travel to my friend’s house this weekend, I should take the necessary steps to bring it about (e.g., pack a bag). My plan need not be fully detailed (e.g., I may not have settled on the departure time), and I could change my mind, though I would need good reasons to do so. Such flexible plans are beyond thermostats, which rigidly adjust the room temperature even when there are good reasons not to.
Can AI have intentions?
If I take a self-driving car to my friend’s house, I give it the goal of getting me there as soon as possible. But if a child runs into the road in front of it, the car will override that goal in service of its overarching goal of avoiding collisions. The car’s plans are thus flexible, like humans’. It therefore appears that the plans of such cars differ from human intentions in degree rather than in kind. So self-driving cars may possess at least some rudimentary intentions.
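To make this contrast concrete, here is a minimal sketch in Python of the difference between a thermostat’s rigid rule and a prioritized goal hierarchy that can override one commitment in favor of another. All names here (Thermostat, Goal, arbitrate) are hypothetical and purely illustrative; no real vehicle software works this way.

    # A minimal, hypothetical sketch; illustrative only, not real vehicle software.
    from dataclasses import dataclass

    class Thermostat:
        """A rigid rule: reacts to temperature regardless of any reasons
        there might be to do otherwise."""
        def __init__(self, setpoint: float):
            self.setpoint = setpoint

        def act(self, temperature: float) -> str:
            return "heat" if temperature < self.setpoint else "idle"

    @dataclass
    class Goal:
        name: str
        priority: int            # higher number takes precedence
        triggered: bool = False  # does the current situation engage this goal?

    def arbitrate(goals: list[Goal]) -> Goal:
        """Commit to the highest-priority goal the situation triggers."""
        return max((g for g in goals if g.triggered), key=lambda g: g.priority)

    goals = [
        Goal("reach the destination quickly", priority=1, triggered=True),
        Goal("avoid collisions", priority=10),  # overarching safety goal
    ]

    goals[1].triggered = True      # a child runs into the road
    print(arbitrate(goals).name)   # -> avoid collisions

The toy example captures only the structural point: the car’s commitment to its destination is defeasible in light of a higher-priority goal, whereas the thermostat has no mechanism for reconsidering its rule at all.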
Objections to intentions in AI
Some think that consciousness is required for intentions. But while typing, I can intentionally type a space between words without ever becoming conscious that I intended to do so. So intentions do not require consciousness. Others claim that humans choose their own goals, whereas AIs are given theirs by their developers and users. However, humans also receive goals externally (e.g., survival). Moreover, the growing effort to make AI systems more autonomous, on the path towards AGI, means that future AIs (e.g., LLM-based agents) will have to balance their commitments among increasingly abstract and even conflicting goals, and may create goals of their own.
What we need now
As long as humans live alongside AIs, the need to align the intentions of AIs with the goals of humans, before those AIs act on their intentions, will remain at the forefront. So we must develop tools to detect, characterize, and intervene on AI intentions. Human brains differ from AI in many ways, but the black-box nature of cutting-edge AIs highlights the promise of adapting insights and methods from the neuroscience of human intentions to the analysis of AI. Such research could also help us better understand intentions in humans.
The most promising way forward is to combine precise definitions of intentions from philosophers, tools and insights from the neuroscience of intentions, and know-how from AI research. This is essential if we are to learn how to live with AI.
“It is important to recognize that most worries about the capabilities of autonomous AI are at heart worries about its intentions rather than about super-intelligence or consciousness.”
“Just as we need to know the intentions of humans who interact with us, we also cannot live safely with AI unless we understand whether AI has intentions and, if so, what its intentions are. This initiative is necessary now before it is too late.”
“It is of utmost importance that autonomous AI is programmed so that its intentions agree with those of its human creators.”
“The cliché that ‘the road to hell is paved with good intentions’ is particularly apropos when considering how AI may cause harm despite the good intentions of its designers, especially given the increasing capacity of machines to act independently of human oversight, pursuing seemingly innocuous outcomes in surprisingly undesirable ways.”
“In time, we will coexist with another species, one made entirely in silico. Wouldn’t it be nice if their intentions were aligned with human intentions?”
“Reality as we know it changes rapidly; to keep up, we must face the ethical and societal impacts of these changes. We do not have the luxury of understanding post hoc the consequences of these advancements, especially when it comes to intentions in AI.”
“Understanding and embedding human-like intentions in AI systems is key to creating technology that is not only smart but also trustworthy. This initiative offers an exciting opportunity to explore how we can align AI systems with human values and goals.”
“AI models excel at generating human-like language, often blurring the line between mimicking intentional behavior and genuinely possessing intentions. This exciting initiative has the potential to introduce new ways of thinking about intentions in AI models and to develop tools for analyzing them.”
“Arguments about AI sentience are missing the target we need to care about most. The key question will be whether and when machines develop a sense of agency.”
“Many current fears, questions, and debates around AI are not really about AI at all; instead, they are merely highlighting ancient, existential human issues. Significant progress will require incorporating the field of Diverse Intelligence, to broaden current neuromorphic biases.”
“As AI becomes more autonomous, we will need to better understand the nature of agency and various forms that intentions can take. Translational knowledge will be important for advances in AI, as well as prediction and control of these increasingly embedded autonomous systems.”
“As Charlie Munger once said, ‘Show me the incentive and I’ll show you the outcome.’ To ensure the beneficial coexistence of humans and AI, we must understand how to design the incentive structures of AI systems correctly.”
“To predict what an intelligent system will do, we need to reason about its intentions. Suppose we are in an unfamiliar building with an intelligent climate control system. If we know that the system’s goals include keeping the building cool and minimizing energy consumption, we can predict it will lower the shades on the sun-facing windows. As increasingly general AI agents are released into the world, predicting their actions will become ever harder and understanding their intentions all the more essential.”
“Whether artificial agents are themselves responsible for harms they cause is a question that will be of increasing importance, as AI systems become more sophisticated. A crucial question that must be answered, then, concerns whether AI systems have intentions and how we can determine what their intentions are.”
“In large part due to our unequaled intelligence, humans have enjoyed our status as the most powerful population on the planet. Our challenge has been to live peacefully with each other. To the extent we succeed, we succeed, not through our intelligence, but through our sociability, which constrains our decisions and intentions. Better understanding the relation between intentions, sociability, power, and peacefulness will be crucial.”
“AI is beginning to exceed the human capacity to comprehend its decisions. This trend could be a good thing or a dangerous thing, depending not on whether AI is conscious, but on whether AI develops its own intentions. Ideally those intentions should align with human interests.”
“Complex systems can develop their own intentions, yet we do not know the conditions under which these intentions can emerge from specific network architectures. Understanding how simple circuits do this in a biological context will reveal principles of network structure that enable these emergent properties.”
“Society is on track to build superhuman general intelligence in the near future, whether or not we understand it. By default, powerful AI systems will pursue intentions formulated with respect to an alien worldview, sometimes conflicting with basic human priorities. To enable human flourishing, significant resources must be invested now to determine and steer the interrelation of concepts that AI systems internally represent and utilize.”
“A new form of intelligence and agency exists in the world. Right now, we have the opportunity to understand its desires and intentions, in order to make sure our interactions with it help and support us, rather than harm us.”
“AI agents are increasingly going to be acting for human principals. Understanding how such agents are making decisions will be critical, both for AI safety and for accountability. Gaining such understanding also presents an exciting scientific and philosophical question, which may shed light on human thought, as well.”
Uri Maoz
Walter Sinnott-Armstrong
Cynthia Rudin
Colin Allen
Gabriel Kreiman
Liad Mudrik
Kyongsik Yun
Mor Geva
Patrick Haggard
Michael Levin
Adina Roskies
Stuart Russell
Vincent Conitzer
Gideon Yaffe
Pamela Hieronymi
Aaron Schurger
Tom Clandinin
Paul Riechers
Adam Shai
Matthew Botvinick
Funding Partners