TY - GEN
T1 - Off-policy evaluation in partially observable environments
AU - Tennenholtz, Guy
AU - Mannor, Shie
AU - Shalit, Uri
N1 - Publisher Copyright:
Copyright 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020
Y1 - 2020
N2 - This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.
AB - This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.
UR - http://www.scopus.com/inward/record.url?scp=85098831037&partnerID=8YFLogxK
U2 - 10.1609/aaai.v34i06.6590
DO - 10.1609/aaai.v34i06.6590
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85098831037
T3 - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
SP - 10276
EP - 10283
BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PB - AAAI press
T2 - 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Y2 - 7 February 2020 through 12 February 2020
ER -