Queue Up Your Regrets: Achieving the Dynamic Capacity Region of Multiplayer Bandits

Ilai Bistritz, Nicholas Bambos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Consider N cooperative agents such that for T turns, each agent n takes an action a_n and receives a stochastic reward r_n(a_1, ..., a_N). Agents cannot observe the actions of other agents and do not know even their own reward function. The agents can communicate with their neighbors on a connected graph G with diameter d(G). We want each agent n to achieve an expected average reward of at least λ_n over time, for a given quality of service (QoS) vector λ. A QoS vector λ is not necessarily achievable. By giving up on immediate reward, knowing that the other agents will compensate later, agents can improve their achievable capacity region. Our main observation is that the gap between λ_n t and the accumulated reward of agent n, which we call the QoS regret, behaves like a queue. Inspired by this observation, we propose a distributed algorithm that aims to learn a max-weight matching of agents to actions. In each epoch, the algorithm employs a consensus phase where the agents agree on a certain weighted sum of rewards by communicating only O(d(G)) numbers every turn. Then, the algorithm uses distributed successive elimination on a random subset of action profiles to approximately maximize this weighted sum of rewards. We prove a bound on the accumulated sum of expected QoS regrets of all agents, that holds if λ is a safety margin ε_T away from the boundary of the capacity region, where ε_T → 0 as T → ∞. This bound implies that, for large T, our algorithm can achieve any λ in the interior of the dynamic capacity region, while all agents are guaranteed an empirical average expected QoS regret of Õ(1) over t = 1, ..., T which never exceeds (Equation presented) for any t. We then extend our result to time-varying i.i.d. communication graphs.
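The queue view of the QoS regret can be illustrated with a small sketch. It is not the paper's algorithm: it is centralized, it assumes the mean rewards are known rather than learned, and the toy reward function, the clipping of negative regrets to zero weight, and all names (`toy_mean_reward`, `simulate`) are assumptions made for illustration. Each turn it picks the action profile that maximizes the regret-weighted sum of rewards, then updates every agent's gap between λ_n t and its accumulated reward.

```python
from itertools import product


def toy_mean_reward(profile):
    # Hypothetical reward: an agent earns 1 only if it alone plays action 1.
    return tuple(1.0 if a == 1 and sum(profile) == 1 else 0.0 for a in profile)


def simulate(lam, turns, mean_reward=toy_mean_reward):
    """Return the QoS regret of every agent after each turn."""
    n_agents = len(lam)
    profiles = list(product([0, 1], repeat=n_agents))
    totals = [0.0] * n_agents   # accumulated reward per agent
    regrets = [0.0] * n_agents  # lam[n] * t - totals[n]
    history = []
    for t in range(1, turns + 1):
        # Assumption: only positive regrets act as queue lengths (weights).
        weights = [max(q, 0.0) for q in regrets]
        # Max-weight choice over all action profiles (ties go to the first).
        best = max(
            profiles,
            key=lambda a: sum(w * r for w, r in zip(weights, mean_reward(a))),
        )
        for n, r in enumerate(mean_reward(best)):
            totals[n] += r
        regrets = [lam[n] * t - totals[n] for n in range(n_agents)]
        history.append(list(regrets))
    return history
```

In this toy setting no single action profile rewards both agents, so a vector such as λ = (0.4, 0.4) is reachable only by alternating between profiles over time. Calling `simulate([0.4, 0.4], 200)` returns the per-turn regrets, which can be inspected to see whether they stay bounded as the weighting shifts service toward whichever agent is furthest behind.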

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Editors: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher: Neural information processing systems foundation
ISBN (Electronic): 9781713871088
State: Published - 2022
Externally published: Yes
Event: 36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: 28 Nov 2022 to 9 Dec 2022

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 35
ISSN (Print): 1049-5258

Conference

Conference: 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/Territory: United States
City: New Orleans
Period: 28/11/22 to 9/12/22
