Agency, Intentions, and
Artificial Intelligence

An interdisciplinary collaboration between philosophers, neuroscientists, and AI researchers

It is important to recognize that most worries about the capabilities of autonomous AI are at heart worries about its intentions rather than about super-intelligence or consciousness.

Just as we need to know the intentions of humans who interact with us, we also cannot live safely with AI unless we understand whether AI has intentions and, if so, what its intentions are. This initiative is necessary now before it is too late.

It is of utmost importance that autonomous AI is programmed so that its intentions agree with those of its human creators.

The cliché that ‘the road to hell is paved with good intentions’ is particularly apropos when considering how AI may cause harm despite the good intentions of its designers, especially given the increasing capacity of machines to act independently of human oversight, pursuing seemingly innocuous outcomes in surprisingly undesirable ways.

In time, we will coexist with another species, one made entirely in silico. Wouldn’t it be nice if their intentions were aligned with human intentions?

Reality as we know it changes rapidly; to keep up, we must face the ethical and societal impacts of these changes. We do not have the luxury of understanding post-hoc the consequences of these advancements, especially when it comes to intentions in AI.

Understanding and embedding human-like intentions in AI systems is key to creating technology that is not only smart but also trustworthy. This initiative offers an exciting opportunity to explore how we can align AI systems with human values and goals.

AI models excel at generating human-like language, often blurring the line between mimicking intentional behavior and genuinely possessing intentions. This exciting initiative has the potential to introduce new ways of thinking about intentions in AI models and to develop tools for analyzing them.

Arguments about AI sentience are missing the target we need to care about most. The key question will be whether and when machines develop a sense of agency.

Many current fears, questions, and debates around AI are not really about AI at all; instead, they merely highlight ancient, existential human issues. Significant progress will require incorporating the field of Diverse Intelligence to broaden current neuromorphic biases.

As AI becomes more autonomous, we will need to better understand the nature of agency and the various forms that intentions can take. Translational knowledge will be important for advances in AI, as well as for the prediction and control of these increasingly embedded autonomous systems.

As Charlie Munger once said, ‘Show me the incentive and I’ll show you the outcome.’ To ensure the beneficial coexistence of humans and AI, we must understand how to design the incentive structures of AI systems correctly.

To predict what an intelligent system will do, we need to reason about its intentions. Suppose we are in an unfamiliar building with an intelligent climate control system. If we know that the system’s goals include keeping the building cool and minimizing energy consumption, we can predict it will lower the shades on the sun-facing windows. As increasingly general AI agents are released into the world, predicting their actions will become ever harder and understanding their intentions all the more essential.

Whether artificial agents are themselves responsible for harms they cause is a question of increasing importance as AI systems become more sophisticated. A crucial question that must be answered, then, concerns whether AI systems have intentions and how we can determine what their intentions are.

In large part due to our unequaled intelligence, humans have enjoyed our status as the most powerful population on the planet. Our challenge has been to live peacefully with each other. To the extent we succeed, we succeed, not through our intelligence, but through our sociability, which constrains our decisions and intentions. Better understanding the relation between intentions, sociability, power, and peacefulness will be crucial.

AI is beginning to exceed the human capacity to comprehend its decisions. This trend could be a good thing or a dangerous thing, depending not on whether AI is conscious, but on whether AI develops its own intentions. Ideally those intentions should align with human interests.

Complex systems can develop their own intentions, yet we do not know the conditions under which these intentions can emerge from specific network architectures. Understanding how simple circuits do this in a biological context will reveal principles of network structure that enable these emergent properties.

Society is on track to build superhuman general intelligence in the near future, whether or not we understand it. By default, powerful AI systems will pursue intentions formulated with respect to an alien worldview, sometimes conflicting with basic human priorities. To enable human flourishing, significant resources must be invested now to determine and steer the interrelation of concepts that AI systems internally represent and utilize.

A new form of intelligence and agency exists in the world. Right now, we have the opportunity to understand its desires and intentions, in order to make sure our interactions with it help and support us, rather than harm us.

AI agents are increasingly going to be acting for human principals. Understanding how such agents are making decisions will be critical, both for AI safety and for accountability. Gaining such understanding also presents an exciting scientific and philosophical question, which may shed light on human thought, as well.

Uri Maoz

Walter Sinnott-Armstrong

Cynthia Rudin

Colin Allen

Gabriel Kreiman

Liad Mudrik

Kyongsik Yun

Mor Geva

Patrick Haggard

Michael Levin

Adina Roskies

Stuart Russell

Vincent Conitzer

Gideon Yaffe

Pamela Hieronymi

Aaron Schurger

Tom Clandinin

Paul Riechers

Adam Shai

Matthew Botvinick

A key challenge for AI is to ensure that the behavior of artificial algorithms aligns with what humans want. The problem is that so many different people want so many different things, and, paradoxically, even individuals often don’t know exactly what it is they want.

John-Dylan Haynes

Optimal blending of AI and intention, which can be estimated from brain-computer interfaces, has the potential to improve the reliability and performance of neuroprosthetics to restore function after injury or disease.

Jen Collinger

Alignment of AI models to human goals and intentions is crucial in a future where more sophisticated AI agents will be acting on our behalf. Gaining insight into how goals align with intentions, and into how we can detect intentions, is important for making future AI agents safe and useful.

Sagi Perel

The development of AI is likely to be the biggest transformation in the history of our species. It is incumbent on us to understand it, including its ability to generate its own complex intentions, and to study how to make it compatible with human mental health.

Michael Graziano

The power of modern AI lies in its ability to represent a universe of possibilities via complex high-order probability distributions. Committing to one possibility—the essence of forming an intention—is where AI is weakest and where we can be distinguished from agents with explicit goals and beliefs.

Michael Mozer

Desire, not computation, is the deepest wellspring of intention. What would be necessary for an AI to develop desire?

William Newsome

  • AI is increasingly pervasive and transformative for human society.
  • There is a consensus that it possesses a special, some even say “alien,” type of intelligence.
  • As AI becomes ever-more capable and autonomous, we need to understand its intentions.

Why intentions matter

For AI to fulfill its promise, we must first address concerns about the risks that these systems pose to humanity as they become more autonomous. The best way to understand these risks is to study the intentions of these systems: specifically, whether they have intentions of their own and, if so, whether those intentions align with human values. Additionally, if AIs acquire intentions, might they have moral responsibility or rights?

What are intentions?

To have an intention is to be rationally committed to performing a particular act, as either an end in itself or a means to some further end (Bratman 1987).

Can AI have intentions?

Take, for example, a self-driving car that, on its way to its destination, swerves to avoid a child running into the middle of the road. The car momentarily sets aside its immediate intention of reaching its destination in order to fulfill its overarching intention of not harming anyone or crashing. The car’s plans are thus flexible, like human plans, and so appear to differ from human intentions in degree rather than profoundly. Such self-driving cars may therefore possess intentions.
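The flexible, hierarchical structure of such plans can be sketched in code. The following toy example (all names, priorities, and outcome values are hypothetical illustrations, not drawn from any actual autonomous-vehicle system) shows an agent setting aside an immediate goal to satisfy an overarching one:

```python
from dataclasses import dataclass

@dataclass
class Intention:
    name: str
    priority: int  # higher number = more overarching intention

def choose_action(intentions, actions, outcomes):
    """Pick the action whose predicted outcome best satisfies the
    intentions, ranked lexicographically by priority."""
    ranked = sorted(intentions, key=lambda i: -i.priority)
    def score(action):
        # Tuple comparison makes higher-priority intentions dominate.
        return tuple(outcomes[action].get(i.name, 0) for i in ranked)
    return max(actions, key=score)

# Hypothetical scenario: a child runs into the road.
intentions = [Intention("avoid_harm", priority=2),
              Intention("reach_destination", priority=1)]
actions = ["continue", "swerve"]
outcomes = {  # 1 = intention satisfied, 0 = not satisfied
    "continue": {"avoid_harm": 0, "reach_destination": 1},
    "swerve":   {"avoid_harm": 1, "reach_destination": 0},
}
print(choose_action(intentions, actions, outcomes))  # swerve
```

The overarching intention (avoiding harm) outranks the immediate one (reaching the destination), so the agent swerves, just as in the example above.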

Objections to intentions in AI

Is consciousness required for intentions?

The self-driving car above illustrates our working definition of intentions: an entity has intentions if it is capable of affecting the world around it, altering outcomes, and weighing those outcomes against values. On this definition, consciousness is not required.

Are AI intentions just human intentions ‘built into’ the AI?

Potentially, but humans also acquire their goals externally (e.g., through survival pressures and upbringing). Moreover, the growing effort to make AI systems more autonomous, moving toward AGI, means that future AIs (e.g., LLM-based agents) will balance their commitments among increasingly abstract and even conflicting goals, and may eventually create goals of their own.

What we need now

As long as humans live alongside AIs, there will be a need to align their intentions with ours. As AIs become increasingly powerful, this need is time-sensitive: we must develop the tools to detect, characterize, and intervene on AI intentions before they become undetectable or unstoppable. Additionally, in developing tools to study the black boxes of AI, we may also find applications for these tools in studying the human brain.

The most promising way forward is to combine precise definitions of intentions from philosophers, tools and insights from the neuroscience of intentions, and know-how from AI research. This is essential if we are to learn how to live with AI.

Funding Partners