Abstract
We introduce a novel online learning framework that unifies and generalizes previously studied models, such as delayed and corrupted feedback, to encompass adversarial environments in which the feedback for an action evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred; moreover, in each round, previously received observations may be adversarially revised. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average accuracy of the feedback relative to the true loss. Our algorithms match the known regret bounds in many special cases, while also establishing previously unknown bounds.
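As context for the full-information setting discussed in the abstract, the following is a minimal sketch of a classical regret-minimization baseline, the exponentially weighted forecaster (Hedge). It is illustrative only and is not the paper's algorithm; the learning rate `eta` and the assumption that losses lie in [0, 1] are choices made for this example.

```python
import math

def hedge(losses, eta):
    """Exponentially weighted forecaster (Hedge) for full-information
    online learning against an oblivious adversary.

    losses: list of rounds, each a list of per-action losses in [0, 1].
    eta: learning rate (> 0).
    Returns the distributions played each round and the cumulative
    expected loss of the forecaster.
    """
    n = len(losses[0])
    cum = [0.0] * n   # cumulative loss of each action so far
    total = 0.0       # cumulative expected loss of the forecaster
    plays = []
    for round_losses in losses:
        # Weights proportional to exp(-eta * cumulative loss);
        # subtract the minimum for numerical stability.
        m = min(cum)
        w = [math.exp(-eta * (c - m)) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]
        plays.append(p)
        total += sum(pi * li for pi, li in zip(p, round_losses))
        cum = [c + l for c, l in zip(cum, round_losses)]
    return plays, total
```

With bounded losses, Hedge guarantees regret at most ln(n)/eta + eta*T/8 against the best fixed action over T rounds; the paper's framework instead handles feedback that may be arbitrary and retroactively revised, where such clean guarantees no longer apply directly.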
| Original language | English |
|---|---|
| Pages (from-to) | 204-227 |
| Number of pages | 24 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 272 |
| State | Published - 2025 |
| Event | 36th International Conference on Algorithmic Learning Theory (ALT 2025), Milan, Italy, 24 Feb 2025 → 27 Feb 2025 |
Funding
| Funders | Funder number |
|---|---|
| Yandex Initiative for Machine Learning | |
| Israel Science Foundation | |
| Tel Aviv University | |
| European Research Council | |
| Horizon 2020 | 882396 |
Keywords
- delayed feedback
- multi-armed bandits
- online learning