The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models

Moschoula Pternea; Prerna Singh; Abir Chakraborty; Yagna Oruganti; Mirco Milletari; Sayli Bapat; Kebei Jiang

doi:10.1613/jair.1.15960

Published: Aug 26, 2024

Keywords:

reinforcement learning, natural language, planning, markov decision processes, large language models, robotics

Moschoula Pternea, Microsoft
Prerna Singh, Microsoft
Abir Chakraborty, Microsoft
Yagna Oruganti, Microsoft
Mirco Milletari, Microsoft
Sayli Bapat, Microsoft
Kebei Jiang, Microsoft

Abstract

In this work, we review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs), two areas that owe their momentum to the development of Deep Neural Networks (DNNs). We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other. The first class, RL4LLM, includes studies where RL is leveraged to improve the performance of LLMs on tasks related to Natural Language Processing (NLP). RL4LLM is divided into two sub-categories depending on whether RL is used to directly fine-tune an existing LLM or to improve the prompt of the LLM. In the second class, LLM4RL, an LLM assists the training of an RL model that performs a task that is not inherently related to natural language. We further break down LLM4RL based on the component of the RL training framework that the LLM assists or replaces, namely reward shaping, goal generation, and policy function. Finally, in the third class, RL+LLM, an LLM and an RL agent are embedded in a common planning framework without either of them contributing to training or fine-tuning of the other. We further branch this class to distinguish between studies with and without natural language feedback. We use this taxonomy to explore the motivations behind the synergy of LLMs and RL and explain the reasons for its success, while pinpointing potential shortcomings and areas where further research is needed, as well as alternative methodologies that serve the same goal.
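
The three classes and their sub-categories form a small tree, which can be sketched as a data structure. The following is only an illustrative sketch based on the abstract: the `TaxonomyNode` type, the `leaves` helper, the `TAXONOMY` constant, and the short leaf labels are hypothetical names chosen here, not identifiers or terminology taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One node of the taxonomy tree (hypothetical helper type)."""

    name: str
    description: str
    children: list[TaxonomyNode] = field(default_factory=list)

    def leaves(self) -> list[str]:
        """Return the names of all leaf nodes below (or at) this node."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


# Structure as described in the abstract; labels are shorthand, not the paper's.
TAXONOMY = TaxonomyNode(
    "RL/LLM",
    "Studies that combine RL and LLMs",
    [
        TaxonomyNode(
            "RL4LLM",
            "RL improves the performance of an LLM on NLP tasks",
            [
                TaxonomyNode("fine-tuning", "RL directly fine-tunes an existing LLM"),
                TaxonomyNode("prompt", "RL improves the prompt of the LLM"),
            ],
        ),
        TaxonomyNode(
            "LLM4RL",
            "An LLM assists the training of an RL model on a non-language task",
            [
                TaxonomyNode("reward shaping", "LLM assists or replaces reward shaping"),
                TaxonomyNode("goal generation", "LLM assists or replaces goal generation"),
                TaxonomyNode("policy function", "LLM assists or replaces the policy function"),
            ],
        ),
        TaxonomyNode(
            "RL+LLM",
            "LLM and RL agent share a planning framework; neither trains the other",
            [
                TaxonomyNode("with language feedback", "Natural language feedback is used"),
                TaxonomyNode("without language feedback", "No natural language feedback"),
            ],
        ),
    ],
)

if __name__ == "__main__":
    # Print each top-level class followed by its sub-categories.
    for top_level in TAXONOMY.children:
        print(top_level.name, "->", ", ".join(top_level.leaves()))
```

Representing the taxonomy as nested nodes keeps each branching criterion attached to its parent class, so a study can be placed by walking from the root to a single leaf.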

Issue

Vol. 80 (2024)

Section

Articles
