TY - GEN
T1 - Queue Up Your Regrets
T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
AU - Bistritz, Ilai
AU - Bambos, Nicholas
N1 - Publisher Copyright: © 2022 Neural information processing systems foundation. All rights reserved.
PY - 2022
Y1 - 2022
AB - Consider N cooperative agents such that for T turns, each agent n takes an action a_n and receives a stochastic reward r_n(a_1, ..., a_N). Agents cannot observe the actions of other agents and do not know even their own reward function. The agents can communicate with their neighbors on a connected graph G with diameter d(G). We want each agent n to achieve an expected average reward of at least λ_n over time, for a given quality of service (QoS) vector λ. A QoS vector λ is not necessarily achievable. By giving up on immediate reward, knowing that the other agents will compensate later, agents can improve their achievable capacity region. Our main observation is that the gap between λ_n t and the accumulated reward of agent n, which we call the QoS regret, behaves like a queue. Inspired by this observation, we propose a distributed algorithm that aims to learn a max-weight matching of agents to actions. In each epoch, the algorithm employs a consensus phase where the agents agree on a certain weighted sum of rewards by communicating only O(d(G)) numbers every turn. Then, the algorithm uses distributed successive elimination on a random subset of action profiles to approximately maximize this weighted sum of rewards. We prove a bound on the accumulated sum of expected QoS regrets of all agents, that holds if λ is a safety margin ε_T away from the boundary of the capacity region, where ε_T → 0 as T → ∞. This bound implies that, for large T, our algorithm can achieve any λ in the interior of the dynamic capacity region, while all agents are guaranteed an empirical average expected QoS regret of Õ(1) over t = 1, ..., T which never exceeds (Equation presented) for any t. We then extend our result to time-varying i.i.d. communication graphs.
UR - http://www.scopus.com/inward/record.url?scp=85163196132&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85163196132
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
A2 - Koyejo, S.
A2 - Mohamed, S.
A2 - Agarwal, A.
A2 - Belgrave, D.
A2 - Cho, K.
A2 - Oh, A.
PB - Neural information processing systems foundation
Y2 - 28 November 2022 through 9 December 2022
ER -