The Real Price of Bandit Information in Multiclass Classification

Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

Research output: Contribution to journal › Conference article › peer-review

Abstract

We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz, and Tewari, 2008), where each input classifies to one of K possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels K, and whether T-step regret bounds in this setting can be improved beyond the √(KT) dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form Θ̃(min{|H| + √T, √(KT log|H|)}), where H is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret Õ(|H| + √T), improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
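
As a rough illustration of the rate min{|H| + √T, √(KT log|H|)} stated in the abstract, the sketch below evaluates both terms for given problem sizes and reports which one is smaller. It is not taken from the paper: the function names are hypothetical, all constants and logarithmic factors hidden by the Θ̃ notation are dropped, and the natural logarithm is assumed for log|H|.

```python
import math


def bound_terms(num_hypotheses: int, num_labels: int, horizon: int) -> tuple[float, float]:
    """Return the two terms inside the min, with constants and log factors dropped.

    Assumes num_hypotheses >= 2 so that log|H| is positive.
    """
    # |H| + sqrt(T): the term attributed in the abstract to the new algorithm.
    small_class_term = num_hypotheses + math.sqrt(horizon)
    # sqrt(K * T * log|H|): the term with the classical sqrt(KT) dependence.
    classical_term = math.sqrt(num_labels * horizon * math.log(num_hypotheses))
    return small_class_term, classical_term


def minimax_rate(num_hypotheses: int, num_labels: int, horizon: int) -> float:
    """Smaller of the two terms, i.e. the rate up to constants and log factors."""
    return min(bound_terms(num_hypotheses, num_labels, horizon))


if __name__ == "__main__":
    # Illustrative sizes only; these numbers do not come from the paper.
    for h, k, t in [(100, 1000, 10**6), (10**6, 10, 10**6)]:
        small, classical = bound_terms(h, k, t)
        winner = "|H| + sqrt(T)" if small <= classical else "sqrt(KT log|H|)"
        print(f"|H|={h}, K={k}, T={t}: smaller term is {winner}")
```

With a moderately sized class and many labels (the first setting), the |H| + √T term is the smaller one; with a very large class and few labels (the second setting), the √(KT log|H|) term takes over.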

Original language: English
Pages (from-to): 1573-1598
Number of pages: 26
Journal: Proceedings of Machine Learning Research
Volume: 247
State: Published - 2024
Event: 37th Annual Conference on Learning Theory, COLT 2024 - Edmonton, Canada
Duration: 30 Jun 2024 → 3 Jul 2024

Funding

Funders                                                        Funder number
European Research Council Executive Agency
Yandex Initiative for Machine Learning
Blavatnik Family Foundation
Technion Center for Machine Learning and Intelligent Systems
MLIS
European Commission
Tel Aviv University
European Research Council
United States-Israel Binational Science Foundation             2018385
Horizon 2020                                                   882396, 101078075
Aegis Foundation                                               1225/20
Israel Science Foundation                                      2549/19, 2250/22
GENERALIZATION                                                 101039692
