TY - JOUR

T1 - Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

AU - Levy, Orin

AU - Cohen, Alon

AU - Cassel, Asaf

AU - Mansour, Yishay

N1 - Publisher Copyright:
© 2023 Proceedings of Machine Learning Research. All rights reserved.

PY - 2023

Y1 - 2023

N2 - We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and access to online least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient online regression oracles), simple and robust to approximation errors. It enjoys an $\widetilde{O}\left(H^{2.5}\sqrt{T|S||A|\left(\mathcal{R}_{TH}(\mathcal{O}) + H\log(\delta^{-1})\right)}\right)$ regret guarantee, with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon and $\mathcal{R}_{TH}(\mathcal{O}) = \mathcal{R}_{TH}(\mathcal{O}_{\mathrm{sq}}^{\mathcal{F}}) + \mathcal{R}_{TH}(\mathcal{O}_{\log}^{\mathcal{P}})$ is the sum of the square and log-loss regression oracles' regret, used to approximate the context-dependent rewards and dynamics, respectively. To the best of our knowledge, our algorithm is the first efficient rate optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.

UR - http://www.scopus.com/inward/record.url?scp=85174407759&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85174407759

SN - 2640-3498

VL - 202

SP - 19287

EP - 19314

JO - Proceedings of Machine Learning Research

JF - Proceedings of Machine Learning Research

T2 - 40th International Conference on Machine Learning, ICML 2023

Y2 - 23 July 2023 through 29 July 2023

ER -