Generalizing the Regret: an Analysis of Lower and Upper Bounds

Main Article Content

Marco Mussi
Alberto Maria Metelli

Abstract

The (expected cumulative) regret is the customary index to judge the performance of online sequential decision-making algorithms. In the traditional form, it is defined as the expected sum over a learning horizon T of the sub-optimality gaps ∆It (i.e., expected instantaneous regret) the agent suffers when playing arm It at round t. In this paper, we propose and investigate a generalization of this notion, named g-(expected cumulative) regret, obtained by applying a transformation function g to the sub-optimality gaps, making the agent suffer g(∆It) instead of just ∆It . Intuitively, function g embeds the “perception” that the agent manifests when performing a sub-optimal decision. We first show that sublinear g-regret is not achievable for a generic transformation function g. Then, we introduce a mild condition on g and provide instance-dependent and worst-case (i.e., minimax) lower bounds for the g-regret. Finally, we show that state-of-the-art stochastic bandit algorithms with no modification surprisingly display optimal performances for the g-regret. Specifically, we prove that UCB1 matches (up to constant factors) the instance-dependent lower bound regardless of function g and that MOSS matches (up to constant factors) the minimax lower bound at least for a wide class of transformation functions.

Article Details

Section
Articles