Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Tal Lancewicki*, Shahar Segal*, Tomer Koren, Yishay Mansour

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards, and the reward-independent delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings, with an additional additive dependence on the quantiles of the delay distribution. Our results do not make any assumptions on the delay distributions: in particular, we do not assume they come from any parametric family of distributions and allow for unbounded support and expectation; we further allow for infinite delays where the algorithm might occasionally not observe any feedback.
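The feedback model described in the abstract can be illustrated with a small simulation. This is a minimal sketch of the problem setting only, not the algorithms from the paper: the names `simulate`, `independent_delay`, and `dependent_delay` are hypothetical, rewards are assumed to be Bernoulli, and arms are pulled round-robin as a placeholder policy. An infinite delay models feedback that is never observed.

```python
import random

INF = float("inf")  # an infinite delay: the feedback is never observed


def simulate(means, delay_sampler, horizon, seed=0):
    """Pull arms round-robin and collect the feedback that arrives in time.

    means: assumed Bernoulli mean reward of each arm.
    delay_sampler(rng, reward): delay in rounds (possibly INF) for one pull.
    Returns, per arm, the rewards observed by the end of the horizon.
    """
    rng = random.Random(seed)
    pending = []  # (arrival_round, arm, reward) not yet delivered
    observed = [[] for _ in means]
    for t in range(horizon):
        arm = t % len(means)  # placeholder policy, not a bandit algorithm
        reward = 1 if rng.random() < means[arm] else 0
        pending.append((t + delay_sampler(rng, reward), arm, reward))
        # Deliver every piece of feedback whose arrival round has passed.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, a, r in arrived:
            observed[a].append(r)
    return observed


def independent_delay(rng, reward):
    # Reward-independent: the delay ignores the reward value;
    # 10% of the feedback is lost forever (illustrative numbers).
    return INF if rng.random() < 0.1 else rng.randint(0, 20)


def dependent_delay(rng, reward):
    # Extreme reward-dependent case: rewards of 1 never report back.
    return INF if reward == 1 else 0


if __name__ == "__main__":
    for sampler in (independent_delay, dependent_delay):
        obs = simulate([0.5, 0.9], sampler, horizon=2000)
        # Empirical mean of the feedback actually observed for each arm.
        print(sampler.__name__, [sum(a) / max(len(a), 1) for a in obs])
```

In the reward-dependent run, every observed reward is 0 regardless of the true means, which shows why observed feedback can be a biased sample in that setting and why it is treated separately from the reward-independent one.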

Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher: ML Research Press
Pages: 5969-5978
Number of pages: 10
ISBN (Electronic): 9781713845065
State: Published - 2021
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 2021 → 24 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 18/07/21 → 24/07/21

Funding

Funder: Funder number
Yandex Initiative for Machine Learning
Yandex Initiative in Machine Learning
Horizon 2020 Framework Programme
Blavatnik Family Foundation
European Research Council
Israel Science Foundation: 2549/19, 993/17
Tel Aviv University
Horizon 2020: 882396
