## Abstract

Consider N cooperative agents such that, for T turns, each agent n takes an action a_{n} and receives a stochastic reward r_{n}(a_{1}, ..., a_{N}). Agents cannot observe the actions of other agents and do not even know their own reward function. The agents can communicate with their neighbors on a connected graph G with diameter d(G). We want each agent n to achieve an expected average reward of at least λ_{n} over time, for a given quality of service (QoS) vector λ. A QoS vector λ is not necessarily achievable. By giving up on immediate reward, knowing that the other agents will compensate later, agents can improve their achievable capacity region. Our main observation is that the gap between λ_{n}t and the accumulated reward of agent n, which we call the QoS regret, behaves like a queue. Inspired by this observation, we propose a distributed algorithm that aims to learn a max-weight matching of agents to actions. In each epoch, the algorithm employs a consensus phase where the agents agree on a certain weighted sum of rewards by communicating only O(d(G)) numbers every turn. Then, the algorithm uses distributed successive elimination on a random subset of action profiles to approximately maximize this weighted sum of rewards. We prove a bound on the accumulated sum of expected QoS regrets of all agents, which holds if λ is a safety margin ε_{T} away from the boundary of the capacity region, where ε_{T} → 0 as T → ∞. This bound implies that, for large T, our algorithm can achieve any λ in the interior of the dynamic capacity region, while all agents are guaranteed an empirical average expected QoS regret of Õ(1) over t = 1, ..., T which never exceeds (Equation presented) for any t. We then extend our result to time-varying i.i.d. communication graphs.
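The queue-like behavior of the QoS regret can be illustrated with a toy sketch. This is not the algorithm of the paper: it is centralized, it assumes the mean reward of every action profile is known, and it skips the consensus and successive-elimination phases entirely. The function name `max_weight_schedule`, the use of the positive part of each regret as its weight, and the two-agent example are all assumptions made purely for illustration.

```python
from typing import List, Sequence


def max_weight_schedule(
    mean_rewards: Sequence[Sequence[float]],
    qos: Sequence[float],
    horizon: int,
) -> List[float]:
    """Return each agent's QoS regret after `horizon` turns.

    mean_rewards[p][n] is the (assumed known) mean reward of agent n
    under action profile p; qos[n] is the target lambda_n.
    """
    num_agents = len(qos)
    regret = [0.0] * num_agents  # regret[n] = lambda_n * t - accumulated reward
    for _ in range(horizon):
        # Assumption: weight each agent by the positive part of its backlog.
        weights = [max(q, 0.0) for q in regret]
        # Max-weight choice: profile with the largest weighted sum of rewards
        # (ties go to the lowest profile index).
        best = max(
            range(len(mean_rewards)),
            key=lambda p: sum(w * r for w, r in zip(weights, mean_rewards[p])),
        )
        # Queue-like update: lambda_n "arrives", the reward "departs".
        for n in range(num_agents):
            regret[n] += qos[n] - mean_rewards[best][n]
    return regret


# Hypothetical example: two profiles, each rewarding only one agent.
# No single profile gives both agents 0.4, but alternating between them does.
profiles = [(1.0, 0.0), (0.0, 1.0)]
shared = max_weight_schedule(profiles, qos=(0.4, 0.4), horizon=1000)
stuck = max_weight_schedule(profiles[:1], qos=(0.4, 0.4), horizon=1000)
```

In this toy setting, restricting play to the first profile lets the second agent's regret grow linearly in the horizon, while the max-weight rule keeps both regrets bounded by serving whichever agent is currently behind its target.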

Original language | English |
---|---|
Title of host publication | Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022 |
Editors | S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh |
Publisher | Neural information processing systems foundation |
ISBN (Electronic) | 9781713871088 |
State | Published - 2022 |
Externally published | Yes |
Event | 36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States; Duration: 28 Nov 2022 → 9 Dec 2022 |

### Publication series

Name | Advances in Neural Information Processing Systems |
---|---|
Volume | 35 |
ISSN (Print) | 1049-5258 |

### Conference

Conference | 36th Conference on Neural Information Processing Systems, NeurIPS 2022 |
---|---|
Country/Territory | United States |
City | New Orleans |
Period | 28/11/22 → 9/12/22 |