Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP

Orin Levy, Yishay Mansour

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

We present regret minimization algorithms for stochastic contextual MDPs under a minimum reachability assumption, using access to an offline least-squares regression oracle. We analyze three settings: where the dynamics are known; where the dynamics are unknown but independent of the context; and the most challenging setting, where the dynamics are unknown and context-dependent. For the latter, our algorithm obtains a regret bound of $\tilde{O}\big((H + 1/p_{\min}) H |S|^{3/2} \sqrt{|A| T \log(\max\{|G|, |P|\}/\delta)}\big)$ with probability $1 - \delta$, where $P$ and $G$ are finite and realizable function classes used to approximate the dynamics and rewards respectively, $p_{\min}$ is the minimum reachability parameter, $S$ is the set of states, $A$ the set of actions, $H$ the horizon, and $T$ the number of episodes. To our knowledge, ours is the first optimistic approach applied to contextual MDPs with general function approximation (i.e., without additional knowledge about the function class, such as its being linear). We present a lower bound of $\Omega\big(\sqrt{T H |S| |A| \ln(|G|)/\ln(|A|)}\big)$ on the expected regret, which holds even when the dynamics are known. Lastly, we discuss an extension of our results to CMDPs without minimum reachability that obtains $\tilde{O}(T^{3/4})$ regret.
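The algorithms above assume access to an offline least-squares regression oracle over finite, realizable function classes such as G (rewards) and P (dynamics). As a minimal illustrative sketch, assuming a finite class is represented as a Python list of callables, such an oracle can be realized by exhaustive empirical risk minimization over the class. The function `least_squares_oracle` and the example regressors are hypothetical names chosen for illustration; this is not the authors' implementation, and it does not show how the paper builds optimistic estimates from the oracle's output.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical alias: a regressor maps an input (e.g. a context, or a
# (context, state, action) tuple) to a real-valued prediction.
Regressor = Callable[[Any], float]


def least_squares_oracle(
    function_class: Sequence[Regressor],
    data: Sequence[Tuple[Any, float]],
) -> Regressor:
    """Return the member of a finite class with the smallest empirical squared loss."""
    if not function_class:
        raise ValueError("function class must be non-empty")

    def empirical_loss(f: Regressor) -> float:
        # Sum of squared residuals of f on the offline dataset.
        return sum((f(x) - y) ** 2 for x, y in data)

    # Exhaustive search is feasible only because the class is finite.
    return min(function_class, key=empirical_loss)


# Hypothetical example: a tiny reward class indexed by an integer context.
def constant_half(c: int) -> float:
    return 0.5


def linear_small(c: int) -> float:
    return 0.1 * c


def linear_offset(c: int) -> float:
    return 0.2 * c + 0.1


reward_class = [constant_half, linear_small, linear_offset]
samples = [(0, 0.1), (1, 0.3), (2, 0.5)]  # generated by linear_offset (realizable case)
best = least_squares_oracle(reward_class, samples)
print(best.__name__)  # prints "linear_offset"
```

Because the samples are generated by a member of the class, the minimizer recovers that member; with noisy labels the oracle would instead return whichever candidate fits the data best.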

Original language: English
Title of host publication: AAAI-23 Technical Tracks 7
Editors: Brian Williams, Yiling Chen, Jennifer Neville
Publisher: AAAI Press
Pages: 8510-8517
Number of pages: 8
ISBN (Electronic): 9781577358800
State: Published - 27 Jun 2023
Event: 37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: 7 Feb 2023 to 14 Feb 2023

Publication series

Name: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Volume: 37

Conference

Conference: 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/Territory: United States
City: Washington
Period: 7/02/23 to 14/02/23

Funding

Funders (funder number, where given):
Yandex Initiative for Machine Learning
Horizon 2020 Framework Programme: 882396
European Commission
Israel Science Foundation: 993/17
Tel Aviv University
