Non-stochastic Bandits With Evolving Observations

Yogev Bar-On, Yishay Mansour

Research output: Contribution to journal › Conference article › peer-review

Abstract

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed losses are arbitrary and may not correlate with the true losses incurred, and previous observations may be adversarially updated in each round. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.
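To make the setting concrete, the sketch below shows an EXP3-style learner whose past observations can be revised by an adversary before each round; the learner rebuilds its importance-weighted loss estimates from the current (possibly altered) observation table. This is an illustrative toy under assumed interfaces (`revise`, the observation-tuple layout), not the paper's actual algorithm or guarantees.

```python
import math
import random

def exp3_evolving(losses, revise, eta=0.1, seed=0):
    """EXP3-style sketch with evolving observations (illustrative only).

    losses[t]      -- true loss vector at round t (hidden from the learner).
    revise(t, obs) -- adversary hook that may rewrite the observation table
                      `obs`, a list of (round, arm, observed_loss, prob)
                      entries, before round t begins.
    Returns the total true loss incurred by the learner.
    """
    rng = random.Random(seed)
    n = len(losses[0])
    obs = []          # the learner's current, evolving view of past feedback
    total_true = 0.0
    for t, loss_vec in enumerate(losses):
        revise(t, obs)  # earlier observations may change adversarially
        # Rebuild importance-weighted loss estimates from the current table.
        est = [0.0] * n
        for (_, arm, observed, prob) in obs:
            est[arm] += observed / prob
        # Exponential weights over the estimates.
        w = [math.exp(-eta * est[a]) for a in range(n)]
        z = sum(w)
        probs = [wi / z for wi in w]
        # Sample an arm from the distribution.
        u, chosen, acc = rng.random(), n - 1, 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                chosen = i
                break
        # Record the (bandit) observation for the chosen arm only.
        obs.append((t, chosen, loss_vec[chosen], probs[chosen]))
        total_true += loss_vec[chosen]
    return total_true
```

With a no-op `revise` this reduces to standard EXP3; the evolving-feedback twist is that the estimates are recomputed from scratch each round, so adversarial edits to old entries take effect immediately.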

Original language: English
Pages (from-to): 204-227
Number of pages: 24
Journal: Proceedings of Machine Learning Research
Volume: 272
State: Published - 2025
Event: 36th International Conference on Algorithmic Learning Theory, ALT 2025 - Milan, Italy
Duration: 24 Feb 2025 to 27 Feb 2025

Funding

Funders and funder numbers:

• Yandex Initiative for Machine Learning
• Israel Science Foundation
• Tel Aviv University
• European Research Council
• Horizon 2020: 882396

Keywords

• delayed feedback
• multi-armed bandits
• online learning
