Better Best of Both Worlds Bounds for Bandits with Switching Costs

Idan Amir, Guy Azov, Tomer Koren, Roi Livni

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

9 Scopus citations

Abstract

We study best-of-both-worlds algorithms for bandits with switching costs, recently addressed by Rouyer, Seldin, and Cesa-Bianchi [14]. We introduce a surprisingly simple and effective algorithm that simultaneously achieves a minimax-optimal regret bound (up to logarithmic factors) of $O(T^{2/3})$ in the oblivious adversarial setting and a bound of $O(\min\{\log(T)/\Delta^2, T^{2/3}\})$ in the stochastically-constrained regime, both with (unit) switching costs, where $\Delta$ is the gap between the arms. In the stochastically-constrained case, our bound improves over the previous result of [14], which achieved regret of $O(T^{1/3}/\Delta)$. We accompany our results with a lower bound showing that, in general, $\tilde{\Omega}(\min\{1/\Delta^2, T^{2/3}\})$ switching-cost regret is unavoidable in the stochastically-constrained case for algorithms with $O(T^{2/3})$ worst-case switching-cost regret.
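To give a feel for how the two stochastically-constrained rates quoted in the abstract scale, the sketch below evaluates the raw expressions for a few gaps. It is illustrative only: the function names and the chosen values of T and the gap are assumptions made here, and all constants and logarithmic factors hidden by the O-notation are dropped, so the numbers indicate growth rates rather than actual regret.

```python
import math


def new_bound(T: int, gap: float) -> float:
    """min{log(T)/gap^2, T^(2/3)} with all hidden constants dropped."""
    return min(math.log(T) / gap**2, T ** (2 / 3))


def previous_bound(T: int, gap: float) -> float:
    """T^(1/3)/gap with all hidden constants dropped."""
    return T ** (1 / 3) / gap


if __name__ == "__main__":
    T = 10**9  # hypothetical horizon, chosen only for illustration
    for gap in (0.5, 0.1, 1e-4):  # hypothetical gaps between the arms
        print(f"gap={gap:g}  new={new_bound(T, gap):.1f}  "
              f"previous={previous_bound(T, gap):.1f}")
```

For gaps of constant size the first expression grows like log(T) while the second grows like T^(1/3); for very small gaps the first is capped at T^(2/3). For gaps between T^(-1/3) and log(T)·T^(-1/3) the raw expressions differ by at most a log(T) factor, so a comparison there depends on the factors dropped in this sketch.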

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Editors: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher: Neural information processing systems foundation
ISBN (Electronic): 9781713871088
State: Published - 2022
Event: 36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: 28 Nov 2022 to 9 Dec 2022

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 35
ISSN (Print): 1049-5258

Conference

Conference: 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/Territory: United States
City: New Orleans
Period: 28/11/22 to 9/12/22
