Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation | Journal of Artificial Intelligence Research

Published: Nov 28, 2022

DOI: https://doi.org/10.1613/jair.1.13794

Keywords:

constraint satisfaction, machine-learning, reinforcement-learning, real-time-systems

Enpeng Yuan

Wenbo Chen

Georgia Tech

Pascal Van Hentenryck

Georgia Tech

Abstract

Idle vehicle relocation is crucial for addressing the demand-supply imbalance that frequently arises in ride-hailing systems. The two mainstream methodologies, optimization and reinforcement learning (RL), both suffer from computational drawbacks. Optimization models must be solved in real time and often trade off model fidelity (and hence solution quality) for computational efficiency. Reinforcement learning is expensive to train and often struggles to coordinate a large fleet. This paper designs a hybrid approach that leverages the strengths of both while overcoming their drawbacks. Specifically, it trains an optimization proxy, i.e., a machine-learning model that approximates an optimization model, and then refines the proxy with reinforcement learning. This Reinforcement Learning from Optimization Proxy (RLOP) approach is computationally efficient to train and deploy, and achieves better results than RL or optimization alone. Numerical experiments on the New York City dataset show that the RLOP approach significantly reduces both relocation costs and computation time compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity.
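
The two-stage idea in the abstract (fit a proxy to an optimization model, then refine it with reinforcement learning) can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the scalar "imbalance" state, the linear proxy, the functions `toy_optimizer` and `toy_reward`, and all constants are hypothetical stand-ins chosen only to make the pattern visible. The refinement step uses a plain REINFORCE update with a mean-reward baseline, which may differ from the RL method used in the paper.

```python
import random

def toy_optimizer(imbalance):
    # Hypothetical low-fidelity "optimization model": relocate half the imbalance.
    return 0.5 * imbalance

def toy_reward(imbalance, action):
    # Hypothetical environment: the best action is actually 0.8 * imbalance,
    # so the optimizer above is cheap to imitate but suboptimal.
    return -(action - 0.8 * imbalance) ** 2

def fit_proxy(states):
    # Stage 1: supervised least-squares fit of a linear proxy a = w * s
    # to the optimizer's decisions.
    num = sum(s * toy_optimizer(s) for s in states)
    den = sum(s * s for s in states)
    return num / den

def refine_with_rl(w, rng, steps=300, batch=256, lr=0.1, sigma=0.1):
    # Stage 2: REINFORCE on a Gaussian policy with mean w * s,
    # warm-started from the proxy weight.
    for _ in range(steps):
        samples = []
        for _ in range(batch):
            s = rng.uniform(-1.0, 1.0)
            a = rng.gauss(w * s, sigma)
            samples.append((s, a, toy_reward(s, a)))
        baseline = sum(r for _, _, r in samples) / batch
        # Score-function gradient: (r - b) * d/dw log N(a; w*s, sigma^2).
        grad = sum((r - baseline) * (a - w * s) * s / sigma ** 2
                   for s, a, r in samples) / batch
        w += lr * grad
    return w

rng = random.Random(0)
train_states = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
w_proxy = fit_proxy(train_states)
w_rl = refine_with_rl(w_proxy, rng)
print(f"proxy weight: {w_proxy:.3f}, refined weight: {w_rl:.3f}")
```

In this toy setting the proxy reproduces the optimizer's weight of 0.5, and the RL stage moves it toward the environment's preferred 0.8. The warm start is the point of the design: the policy begins from a reasonable decision rule instead of a random one.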

Issue

Vol. 75 (2022)

Section

Articles
