Almost optimal exploration in multi-armed bandits

Zohar Karnin, Tomer Koren, Oren Somekh

Research output: Contribution to conference › Paper › peer-review

167 Scopus citations

Abstract

We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is significant in today's large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithms outperform the state-of-the-art and scale better as the size of the problem increases.

Original language: English
Pages: 2275-2283
Number of pages: 9
State: Published - 2013
Externally published: Yes
Event: 30th International Conference on Machine Learning, ICML 2013 - Atlanta, GA, United States
Duration: 16 Jun 2013 - 21 Jun 2013

Conference

Conference: 30th International Conference on Machine Learning, ICML 2013
Country/Territory: United States
City: Atlanta, GA
Period: 16/06/13 - 21/06/13
