We develop a general theory for optimizing frequentist regret in sequential learning problems, in which efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach that creates “algorithmic beliefs” at each round and uses Bayesian posteriors to make decisions. This is the first approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves “best-of-all-worlds” empirical performance in stochastic, adversarial, and non-stationary environments. We also illustrate how these principles can be used in linear bandits, convex bandits, and reinforcement learning.
Proceedings of Machine Learning Research
Published - 2023
|40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 → 29 Jul 2023