Abstract
We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz, and Tewari, 2008), where each input belongs to one of K possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary focus is the dependence on the number of labels K, and whether T-step regret bounds in this setting can be improved beyond the √(KT) dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form Θ̃(min{|H| + √T, √(KT log|H|)}), where H is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret Õ(|H| + √T), improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log factors) in all parameter regimes.
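To make the feedback model concrete, the following is a minimal sketch of one round of the bandit multiclass protocol described in the abstract. The `RandomGuesser` learner, the input encoding, and the data generation are illustrative placeholders only, not the paper's algorithm.

```python
import random

def bandit_multiclass_round(learner, x, y_true, K):
    """One round of the bandit multiclass protocol: the learner predicts a label
    in {0, ..., K-1} and observes only whether that prediction was correct."""
    y_pred = learner.predict(x)
    feedback = (y_pred == y_true)  # bandit feedback: correct / incorrect, never y_true itself
    learner.update(x, y_pred, feedback)
    return feedback

class RandomGuesser:
    """Illustrative baseline learner: guesses uniformly among the K labels."""
    def __init__(self, K):
        self.K = K
    def predict(self, x):
        return random.randrange(self.K)
    def update(self, x, y_pred, feedback):
        pass  # a real algorithm would exploit the binary feedback here

# Usage: run T rounds and count mistakes; regret compares this count to the
# number of mistakes made by the best hypothesis in the (finite) class H.
if __name__ == "__main__":
    K, T = 5, 1000
    learner = RandomGuesser(K)
    mistakes = 0
    for t in range(T):
        x = random.random()           # placeholder input
        y_true = random.randrange(K)  # placeholder true label
        if not bandit_multiclass_round(learner, x, y_true, K):
            mistakes += 1
    print(f"mistakes over {T} rounds: {mistakes}")
```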
Original language | English |
---|---|
Pages (from-to) | 1573-1598 |
Number of pages | 26 |
Journal | Proceedings of Machine Learning Research |
Volume | 247 |
State | Published - 2024 |
Event | 37th Annual Conference on Learning Theory, COLT 2024, Edmonton, Canada, 30 Jun 2024 → 3 Jul 2024 |
Funding
Funders | Funder number |
---|---|
European Research Council Executive Agency | |
Yandex Initiative for Machine Learning | |
Blavatnik Family Foundation | |
Technion Center for Machine Learning and Intelligent Systems | |
MLIS | |
European Commission | |
Tel Aviv University | |
European Research Council | |
United States-Israel Binational Science Foundation | 2018385 |
Horizon 2020 | 882396, 101078075 |
Aegis Foundation | 1225/20 |
Israel Science Foundation | 2549/19, 2250/22 |
GENERALIZATION | 101039692 |