Abstract
We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer. We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type, which is drawn from a publicly known distribution but revealed privately to the learner. We address the following questions: (a) what is the bare minimum that the optimizer can guarantee to obtain regardless of the no-regret learning algorithm employed by the learner? (b) are there learning algorithms that cap the optimizer's payoff at this minimum? (c) can these algorithms be implemented efficiently? While building this theory of optimizer-learner interactions, we define a new combinatorial notion of regret called polytope swap regret, which could be of independent interest in other settings.
Original language | English |
---|---|
Pages (from-to) | 5221-5252 |
Number of pages | 32 |
Journal | Proceedings of Machine Learning Research |
Volume | 178 |
State | Published - 2022 |
Event | 35th Conference on Learning Theory, COLT 2022, London, United Kingdom (2 Jul 2022 → 5 Jul 2022) |
Funding
Funders | Funder number |
---|---|
Yandex Initiative for Machine Learning | |
Horizon 2020 Framework Programme | 882396 |
European Commission | |
Israel Science Foundation | 993/17 |
Tel Aviv University | |
Keywords
- Bayesian games
- Stackelberg value
- swap regret