Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

Christoph Dann*, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

17 Scopus citations

Abstract

Myopic exploration policies such as ε-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks, and yet they perform well in many others. In fact, in practice, they are often selected as the top choices due to their simplicity. But for what tasks do such policies succeed? Can we give theoretical guarantees for their favorable performance? These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. Our results apply to value-function-based algorithms in episodic MDPs with bounded Bellman Eluder dimension. We propose a new complexity measure called the myopic exploration gap, denoted by α, that captures a structural property of the MDP, the exploration policy expl, and the given value function class F. We show that the sample complexity of myopic exploration scales quadratically with the inverse of this quantity, 1/α². We further demonstrate through concrete examples that the myopic exploration gap is indeed favorable in several tasks where myopic exploration succeeds, due to the corresponding dynamics and reward structure.
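
For readers unfamiliar with the term, the ε-greedy rule named in the abstract can be sketched as follows. This is a generic textbook illustration, not the algorithm analyzed in the paper; the function name `epsilon_greedy_action`, the list `q_values` of estimated action values, and the optional `rng` argument are assumptions introduced here for illustration.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick an action index given estimated action values (hypothetical helper).

    With probability epsilon a uniformly random action is taken (exploration);
    otherwise the action with the highest estimated value is taken (exploitation).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        # Explore: ignore the estimates and sample uniformly over all actions.
        return rng.randrange(len(q_values))
    # Exploit: greedy choice; ties are broken in favor of the lowest index.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

The rule is myopic in the sense used in the abstract: the random perturbation is applied at the current step only, with no planning toward unexplored states.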

Original language: English
Pages (from-to): 4666-4689
Number of pages: 24
Journal: Proceedings of Machine Learning Research
Volume: 162
State: Published - 2022
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 → 23 Jul 2022

Funding

Funders (with funder number where listed)
European Research Council
European Union’s Horizon 2020 research and innovation program
Israel Science Foundation: 993/17
NSF
Tel Aviv University
Yandex Initiative for Machine Learning
National Science Foundation: 1750575
Horizon 2020: 882396
