Safe Learning of Multi-Agent Action Models from Concurrent Joint Action Observations
Abstract
Background: Multi-Agent Planning (MAP) involves coordinating the actions of multiple autonomous agents to achieve shared objectives. A prevalent formalism for MAP is the Multi-Agent Planning Domain Definition Language (MA-PDDL). While effective, existing MA-PDDL solvers typically require complete access to agents’ action models—specifically their preconditions and effects. However, manually creating these models is often intractable, requiring extensive domain expertise.
Objectives: This work explores an alternative approach: automatically learning agents’ action models from observed transitions. Since learned models may be inaccurate, planning with them can yield invalid or non-executable sequences. To mitigate this, we formalize a safety requirement, ensuring that plans generated with the learned model remain sound with respect to the real, unknown action model.
Methods: Previous research introduced the Safe Action Model Learning (SAM) algorithm for single-agent domains. However, SAM is not suitable for MA-PDDL environments where observations include concurrently executed actions, since it cannot naturally disambiguate the individual contributions of the agents to the observed effects. To address this, we introduce Multi-Agent Safe Action Model Learning (MA-SAM), a safe action model learning algorithm designed to handle concurrent multi-agent observations. For scenarios where individual action effects remain ambiguous, we further propose MA-SAM+, which learns the preconditions and effects of macro-actions representing the concurrent execution of subsets of actions. We evaluate both algorithms on domains from the Competition of Distributed and Multi-Agent Planners (CoDMAP) benchmarks and a novel MAP domain inspired by the game Overcooked.
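To make the safe-learning idea concrete, the following is a minimal sketch, not the authors' implementation, of the conservative rule such algorithms build on in a simplified propositional, single-agent setting: keep as candidate preconditions only fluents present in every observed pre-state of an action, and record as effects the fluents that changed in observed transitions. All names here (`learn_safe_model`, the trace format) are illustrative assumptions.

```python
def learn_safe_model(traces):
    """Conservative ("safe") action-model learning sketch.

    traces: iterable of (pre_state, action, post_state) triples,
    where states are frozensets of ground fluents (strings).
    Hypothetical simplification of SAM-style learning.
    """
    pre, add_eff, del_eff = {}, {}, {}
    for s, a, s2 in traces:
        # Preconditions: intersection of all observed pre-states.
        # Over-approximating preconditions keeps the model safe:
        # the learned action is applicable only where the real one is.
        pre[a] = s if a not in pre else pre[a] & s
        # Effects: fluents added or deleted in this transition.
        add_eff.setdefault(a, set()).update(s2 - s)
        del_eff.setdefault(a, set()).update(s - s2)
    return pre, add_eff, del_eff


# Example: two observations of a "move" action.
traces = [
    (frozenset({"p", "q"}), "move", frozenset({"q", "r"})),
    (frozenset({"p", "q", "t"}), "move", frozenset({"q", "r", "t"})),
]
pre, add_eff, del_eff = learn_safe_model(traces)
```

With concurrent joint actions, the difficulty the abstract describes is that `s2 - s` mixes the effects of several simultaneously executed actions, which is what MA-SAM and MA-SAM+ are designed to handle.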
Results: We establish a theoretical lower bound on the sample complexity of learning safe action models in multi-agent settings. We prove that MA-SAM does not achieve this lower bound in all cases, identifying specific conditions under which its sample complexity may become unbounded. Empirically, both MA-SAM and MA-SAM+ significantly outperform SAM-based baselines in coverage and applicability rates. While their performance is comparable in many settings, MA-SAM+ substantially outperforms MA-SAM in some of the evaluated domains.
Conclusions: We present the first algorithms capable of learning safe MA-PDDL action models from concurrently executed actions, providing both theoretical foundations and empirical validation across diverse planning benchmarks.