Abstract
We consider open games where players arrive according to a Poisson process with rate $\lambda$ and stay in the game for an exponential random duration with rate $\mu$. The game evolves in continuous time, where each player $n$ sets an exponential random clock and updates her action $a_{n}\in \lbrace 0,\ldots,K\rbrace$ when it expires. The players take independent actions according to local decision rules that, uninterrupted, are designed to converge to an equilibrium. This models open multiagent systems such as those in wireless networks, transportation, cloud computing, and online marketplaces. When $\lambda$ is small, the game spends most of the time in a (time-varying) equilibrium. This equilibrium exhibits predictable behavior and can have performance guarantees by design. However, when $\lambda$ is too small, the system is under-utilized since not many players are in the game on average. Choosing the maximal $\lambda$ that the game can support while still spending a target fraction $0< \rho < 1$ of the time at equilibrium requires knowing the reward functions. To overcome this, we propose an online learning algorithm that the gamekeeper uses to adjust the probability $\theta$ of admitting an incoming player. The gamekeeper only observes whether an action was changed, without observing the action or who played it.
We prove that our algorithm learns, with probability 1, a $\theta ^{*}$ such that the game is at equilibrium for at least a fraction $\rho$ of the time and for no more than a fraction $\rho +\varepsilon (\mu,\rho)< 1$, where we provide an analytic expression for $\varepsilon (\mu,\rho)$. Our algorithm is a black-box method to transfer performance guarantees of distributed protocols from closed systems to open systems.
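
To make the arrival and departure model concrete, the following is a minimal simulation sketch and not the paper's algorithm. It covers only the population process: Poisson arrivals at rate $\lambda$, each admitted independently with probability $\theta$, and exponential stays with rate $\mu$. It does not model the players' actions, the equilibrium dynamics, or the gamekeeper's learning rule. The function and parameter names (`simulate_population`, `lam`, `mu`, `theta`, `horizon`) are illustrative assumptions. Since thinning a Poisson process yields a Poisson process with rate $\theta\lambda$, the population behaves as an infinite-server queue whose long-run average size is $\theta\lambda/\mu$, which the simulation should approximately reproduce.

```python
import random


def simulate_population(lam, mu, theta, horizon, seed=0):
    """Time-averaged number of players over [0, horizon].

    Assumed model: Poisson arrivals at rate lam (lam > 0), each admitted
    with probability theta, and every admitted player leaves after an
    independent exponential stay with rate mu.
    """
    rng = random.Random(seed)
    t = 0.0      # current time
    n = 0        # players currently in the game
    area = 0.0   # integral of n over time
    while True:
        total_rate = lam + n * mu          # next event: arrival or a departure
        dt = rng.expovariate(total_rate)
        if t + dt >= horizon:
            area += n * (horizon - t)      # account for the final partial interval
            break
        area += n * dt
        t += dt
        if rng.random() < lam / total_rate:
            # An arrival occurred; the gamekeeper admits it with probability theta.
            if rng.random() < theta:
                n += 1
        else:
            # One of the n present players departs.
            n -= 1
    return area / horizon


if __name__ == "__main__":
    lam, mu, theta = 5.0, 1.0, 0.5
    avg = simulate_population(lam, mu, theta, horizon=20000.0)
    print("simulated average:", avg, "theory:", theta * lam / mu)
```

Lowering `theta` reduces the average population, which is the lever the gamekeeper adjusts. How that lever trades off against the fraction of time spent at equilibrium depends on the game dynamics, which this sketch leaves out.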
Original language | English |
---|---|
Pages (from-to) | 1-16 |
Number of pages | 16 |
Journal | IEEE Transactions on Automatic Control |
State | Accepted/In press - 2024 |
Keywords
- Admission control
- Convergence
- Game theory
- Games
- Multi-agent systems
- Online learning
- Open multiagent systems
- Protocols
- Queuing theory
- Servers
- Wireless networks