Bandit convex optimization: √T regret in one dimension

Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

21 Scopus citations

Abstract

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{\Theta}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a "local-to-global" property of convex functions, that may be of independent interest.

Original language: English
Title of host publication: Proceedings of The 28th Conference on Learning Theory
Editors: Peter Grünwald, Elad Hazan, Satyen Kale
Publisher: PMLR
Volume: 40
Edition: 2015
State: Published - 2015
Externally published: Yes
Event: 28th Conference on Learning Theory, COLT 2015 - Paris, France
Duration: 2 Jul 2015 to 6 Jul 2015

Publication series

Name: Proceedings of Machine Learning Research
Publisher: PMLR
Volume: 40
ISSN (Electronic): 2640-3498

Conference

Conference: 28th Conference on Learning Theory, COLT 2015
Country/Territory: France
City: Paris
Period: 2/07/15 to 6/07/15
