Backward Monte Carlo Tree Search: Charting Unsafe Regions in the Belief-Space
Main Article Content
Abstract
Safety-critical systems often operate in partially observable environments, where assessing the safety of the underlying policy remains a fundamental challenge. This study focuses on evaluating policies by identifying regions of the belief-space that can lead the system’s policy to an undesirable state with a non-negligible probability. In this paper, we introduce Backward Monte Carlo Tree Search, the first Monte Carlo tree search framework that expands backward in time within the belief-space. The tree search begins from an undesired terminal belief and recursively explores its possible predecessors, constructing a tree of belief transitions that could lead to an unsafe outcome within a given horizon. Evaluations in gridworld and autonomous driving domains show that identifying beliefs from which failures may occur enables runtime risk forecasting and targeted policy retraining, marking a conceptual shift in how safety is validated under uncertainty.