TY - JOUR
T1 - Gamekeeper: Online Learning for Admission Control of Networked Open Multiagent Systems
AU - Bistritz, Ilai
AU - Bambos, Nicholas
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - We consider open games where players arrive according to a Poisson process with rate λ and stay in the game for an exponential random duration with rate μ. The game evolves in continuous time, where each player n sets an exponential random clock and updates his/her action a_n ∈ {0, …, K} when it expires. The players take independent actions according to local decision rules that, uninterrupted, are designed to converge to an equilibrium. When λ is small, the game spends most of the time in a (time-varying) equilibrium. This equilibrium exhibits predictable behavior and can have performance guarantees by design. However, when λ is too small, the system is underutilized since not many players are in the game on average. Choosing the maximal λ that the game can support while still spending a target fraction 0 < ρ < 1 of the time at equilibrium requires knowing the reward functions. To overcome that, we propose an online learning algorithm that the gamekeeper uses to adjust the probability θ of admitting an incoming player. The gamekeeper only observes whether an action was changed, without observing the action or who played it. We prove that our algorithm learns, with probability 1, a θ such that the game is at equilibrium for at least a ρ fraction of the time, and no more than ρ + ε(μ, ρ) < 1, where we specify ε(μ, ρ). Our algorithm is a black-box method to transfer performance guarantees of distributed protocols from closed systems to open systems.
AB - We consider open games where players arrive according to a Poisson process with rate λ and stay in the game for an exponential random duration with rate μ. The game evolves in continuous time, where each player n sets an exponential random clock and updates his/her action a_n ∈ {0, …, K} when it expires. The players take independent actions according to local decision rules that, uninterrupted, are designed to converge to an equilibrium. When λ is small, the game spends most of the time in a (time-varying) equilibrium. This equilibrium exhibits predictable behavior and can have performance guarantees by design. However, when λ is too small, the system is underutilized since not many players are in the game on average. Choosing the maximal λ that the game can support while still spending a target fraction 0 < ρ < 1 of the time at equilibrium requires knowing the reward functions. To overcome that, we propose an online learning algorithm that the gamekeeper uses to adjust the probability θ of admitting an incoming player. The gamekeeper only observes whether an action was changed, without observing the action or who played it. We prove that our algorithm learns, with probability 1, a θ such that the game is at equilibrium for at least a ρ fraction of the time, and no more than ρ + ε(μ, ρ) < 1, where we specify ε(μ, ρ). Our algorithm is a black-box method to transfer performance guarantees of distributed protocols from closed systems to open systems.
KW - Game theory
KW - online learning
KW - open multiagent systems
KW - queuing theory
UR - http://www.scopus.com/inward/record.url?scp=85192956831&partnerID=8YFLogxK
U2 - 10.1109/TAC.2024.3398139
DO - 10.1109/TAC.2024.3398139
M3 - Article
AN - SCOPUS:85192956831
SN - 0018-9286
VL - 69
SP - 7694
EP - 7709
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 11
ER -