Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback

Wonyoung Kim*, Garud Iyengar, Assaf Zeevi

*Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

Abstract

We consider the linear contextual multi-class multi-period packing problem (LMMP) where the goal is to pack items such that the total vector of consumption is below a given budget vector and the total value is as large as possible. We consider the setting where the reward and the consumption vector associated with each action is a class-dependent linear function of the context, and the decision-maker receives bandit feedback. LMMP includes linear contextual bandits with knapsacks and online revenue management as special cases. We establish a new estimator which guarantees a faster convergence rate, and consequently, a lower regret in LMMP. We propose a bandit policy that is a closed-form function of said estimated parameters. When the contexts are non-degenerate, the regret of the proposed policy is sublinear in the context dimension, the number of classes, and the time horizon T when the budget grows at least as T. We also resolve an open problem posed in Agrawal & Devanur (2016), and extend the result to a multi-class setting. Our numerical experiments clearly demonstrate that the performance of our policy is superior to other benchmarks in the literature.

Original languageEnglish
Pages (from-to)16458-16501
Number of pages44
JournalProceedings of Machine Learning Research
Volume202
StatePublished - 2023
Externally publishedYes
Event40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 202329 Jul 2023

Fingerprint

Dive into the research topics of 'Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback'. Together they form a unique fingerprint.

Cite this