Avoiding Negative Side Effects of Autonomous Systems in the Open World
Main Article Content
Autonomous systems that operate in the open world often use incomplete models of their environment. Model incompleteness is inevitable due to the practical limitations in precise model specification and data collection about open-world environments. Due to the limited fidelity of the model, agent actions may produce negative side effects (NSEs) when deployed. Negative side effects are undesirable, unmodeled effects of agent actions on the environment. NSEs are inherently challenging to identify at design time and may affect the reliability, usability and safety of the system. We present two complementary approaches to mitigate the NSE via: (1) learning from feedback, and (2) environment shaping. The solution approaches target settings with different assumptions and agent responsibilities. In learning from feedback, the agent learns a penalty function associated with a NSE. We investigate the efficiency of different feedback mechanisms, including human feedback and autonomous exploration. The problem is formulated as a multi-objective Markov decision process such that optimizing the agent’s assigned task is prioritized over mitigating NSE. A slack parameter denotes the maximum allowed deviation from the optimal expected reward for the agent’s task in order to mitigate NSE. In environment shaping, we examine how a human can assist an agent, beyond providing feedback, and utilize their broader scope of knowledge to mitigate the impacts of NSE. We formulate the problem as a human-agent collaboration with decoupled objectives. The agent optimizes its assigned task and may produce NSE during its operation. The human assists the agent by performing modest reconfigurations of the environment so as to mitigate the impacts of NSE, without affecting the agent’s ability to complete its assigned task. We present an algorithm for shaping and analyze its properties. Empirical evaluations demonstrate the trade-offs in the performance of different approaches in mitigating NSE in different settings.