Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity | Journal of Artificial Intelligence Research

PDF

Published: Nov 15, 2018

DOI: https://doi.org/10.1613/jair.1.11251

Bo Liu

Ian Gemp

Mohammad Ghavamzadeh

Ji Liu

Sridhar Mahadevan

Marek Petrik

Abstract

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal "mirror maps" to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.

Issue

Vol. 63 (2018)

Section

Articles

afiliatedsites

JAIR is published by AI Access Foundation, a nonprofit public charity whose purpose is to facilitate the dissemination of scientific results in artificial intelligence. JAIR, established in 1993, was one of the first open-access scientific journals on the Web, and has been a leading publication venue since its inception. We invite you to check out our other initiatives.

Learn more

Article Sidebar

Main Article Content

Abstract

Article Details