Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning

Giovanni Varricchione; Natasha Alechina; Mehdi Dastani; Brian Logan

doi:10.1613/jair.1.20151

Published: Apr 7, 2026

Keywords:

Reinforcement learning, temporal logics, temporal reasoning, multiagent learning

Giovanni Varricchione

Utrecht University

https://orcid.org/0000-0002-5466-9012

Natasha Alechina

https://orcid.org/0000-0003-3306-9891

Mehdi Dastani

https://orcid.org/0000-0002-4641-4087

Brian Logan

https://orcid.org/0000-0003-0648-7107

Abstract

Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. In this paper, we show how multi-agent reward machines for team tasks can be synthesised automatically from an abstraction of the environment in which the agents act and a high-level specification of the desired team behaviour expressed in a fragment of Alternating-time Temporal Logic. We present results on a number of benchmarks that suggest our automated approach performs as well as or better than reward machines from the literature.
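For readers unfamiliar with the objects the abstract refers to, the following is a minimal, illustrative Python sketch of a team reward machine and one projection-style decomposition into per-agent machines. It is not the synthesis method of this paper. All names (`RewardMachine`, `project`, the events `a_btn` and `b_goal`) are hypothetical. The decomposition shown here is a simplified assumption: states linked by events outside an agent's local event set are merged, and an agent is rewarded for reaching a merged class that contains a team terminal state. The sketch ignores the conditions under which such a decomposition is guaranteed to preserve the team task.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Minimal reward machine (hypothetical sketch).

    delta maps (state, event) -> (next_state, reward). Events with no entry
    leave the state unchanged and yield zero reward (a simplifying assumption).
    """
    initial: str
    delta: dict
    terminal: frozenset = frozenset()

    def run(self, events):
        """Feed a sequence of events; return (final_state, total_reward)."""
        state, total = self.initial, 0.0
        for e in events:
            state, r = self.delta.get((state, e), (state, 0.0))
            total += r
            if state in self.terminal:
                break
        return state, total


def project(rm, local_events):
    """Project a team machine onto one agent's local events (assumed scheme).

    States connected by non-local events are merged with union-find; local
    transitions are kept between merged classes and rewarded with 1.0 when
    they enter a class containing a team terminal state.
    """
    states = {rm.initial} | {u for (u, _) in rm.delta} | {v for (v, _) in rm.delta.values()}
    parent = {s: s for s in states}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for (u, e), (v, _) in rm.delta.items():
        if e not in local_events:
            parent[find(u)] = find(v)  # merge states the agent cannot tell apart

    members = {}
    for s in states:
        members.setdefault(find(s), []).append(s)
    name = {s: "{" + ",".join(sorted(members[find(s)])) + "}" for s in states}

    terminal = frozenset(name[s] for s in rm.terminal)
    delta = {}
    for (u, e), (v, _) in rm.delta.items():
        if e in local_events:
            delta[(name[u], e)] = (name[v], 1.0 if name[v] in terminal else 0.0)
    return RewardMachine(name[rm.initial], delta, terminal)


# Hypothetical team task: agent A presses a button, then agent B reaches a goal.
team = RewardMachine(
    "u0",
    {("u0", "a_btn"): ("u1", 0.0), ("u1", "b_goal"): ("u2", 1.0)},
    frozenset({"u2"}),
)
agent_a = project(team, {"a_btn"})
agent_b = project(team, {"b_goal"})
print(team.run(["a_btn", "b_goal"]))  # ('u2', 1.0)
print(agent_a.run(["a_btn"]))         # ('{u1,u2}', 1.0)
print(agent_b.run(["b_goal"]))        # ('{u2}', 1.0)
```

Because each projected machine only sees its own agent's events, agent B's machine here rewards reaching the goal regardless of whether A has pressed the button. This is exactly why a decomposition needs correctness conditions, which this sketch does not check.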

Issue

Vol. 85 (2026)

Section

Articles
