Efficiently Adapt to New Dynamics via Meta-Model


Kaixin Huang
Chen Zhao
Chun Yuan

Abstract




We delve into offline meta-reinforcement learning (OMRL), a practical paradigm of reinforcement learning that leverages offline data to adapt to new tasks. Prior approaches have not explored the use of context-based dynamics models to tackle OMRL problems, and our work aims to fill this gap. Our investigation uncovers shortcomings in existing context-based methods, primarily distribution shift during offline learning and difficulty in establishing stable task representations. To address these issues, we formulate the problem as a Hidden-Parameter MDP and propose a framework for effective model adaptation that combines a meta-model with latent task variables inferred by a transformer-based system recognition module trained in an unsupervised fashion. Through extensive experiments on diverse simulated robotics and control tasks, we validate the efficacy of our approach, demonstrate its superior generalization ability over existing schemes, and explore multiple strategies for obtaining policies with the personalized models. Our method yields a model with lower prediction error, outperforms previous methods in policy performance, and adapts more efficiently than prior dynamics-model generalization methods and OMRL algorithms.
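To give a concrete picture of the components named above, the following is a minimal, illustrative sketch (not the authors' released code) of a latent-conditioned dynamics meta-model whose task latent is inferred from context transitions by a transformer-based recognition encoder. In a Hidden-Parameter MDP the transition function depends on an unobserved task parameter, so the sketch conditions next-state prediction on an inferred latent z; all module names, dimensions, and the joint prediction-error training objective shown here are assumptions rather than the paper's exact design.

# Illustrative sketch only: a transformer encoder infers a task latent z from
# context transitions, and a meta-model predicts next states conditioned on z.
import torch
import torch.nn as nn


class SystemRecognitionEncoder(nn.Module):
    """Transformer over context transitions (s, a, s') -> task latent z."""

    def __init__(self, state_dim, action_dim, latent_dim, d_model=128, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(2 * state_dim + action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_latent = nn.Linear(d_model, latent_dim)

    def forward(self, context):
        # context: (batch, n_transitions, 2 * state_dim + action_dim)
        h = self.encoder(self.embed(context))
        return self.to_latent(h.mean(dim=1))  # pool over the context transitions


class MetaDynamicsModel(nn.Module):
    """Predicts the next state from (s, a), conditioned on the inferred latent z."""

    def __init__(self, state_dim, action_dim, latent_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, z):
        return self.net(torch.cat([state, action, z], dim=-1))


def prediction_loss(encoder, model, context, state, action, next_state):
    # Assumed unsupervised training signal: encoder and meta-model are fit jointly
    # by minimizing next-state prediction error on offline transitions.
    z = encoder(context)
    return nn.functional.mse_loss(model(state, action, z), next_state)

At adaptation time, such a setup would encode a small batch of transitions from the new task to obtain z and roll the conditioned meta-model forward for planning or policy optimization; how the policy then exploits the personalized model is one of the design choices the paper explores.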



