TY - JOUR
T1 - Relative entropy in sequential decision problems
AU - Lehrer, Ehud
AU - Smorodinsky, Rann
N1 - Funding Information:
The authors gratefully acknowledge BSF grants 96/00043 and 97/113, NSF grant SBR-9730385, and Technion MANLAM and research promotion grants.
PY - 2000/5
Y1 - 2000/5
N2 - Consider an agent who faces a sequential decision problem. At each stage the agent takes an action and observes a stochastic outcome (e.g., daily prices, weather conditions, opponents' actions in a repeated game, etc.). The agent's stage-utility depends on his action, the observed outcome and on previous outcomes. We assume the agent is Bayesian and is endowed with a subjective belief over the distribution of outcomes. The agent's initial belief is typically inaccurate. Therefore, his subjectively optimal strategy is initially suboptimal. As time passes, information about the true dynamics is accumulated and, depending on the compatibility of the belief with the truth, the agent may eventually learn to optimize. We introduce the notion of relative entropy, which is a natural adaptation of the entropy of a stochastic process to the subjective set-up. We present conditions, expressed in terms of relative entropy, that determine whether the agent will eventually learn to optimize. It is shown that low entropy yields asymptotically optimal behavior. In addition, we present a notion of pointwise merging and link it with relative entropy.
KW - Optimization
KW - Relative entropy
KW - Sequential decision problems
UR - http://www.scopus.com/inward/record.url?scp=0012338027&partnerID=8YFLogxK
U2 - 10.1016/S0304-4068(99)00027-0
DO - 10.1016/S0304-4068(99)00027-0
M3 - Article
AN - SCOPUS:0012338027
SN - 0304-4068
VL - 33
SP - 425
EP - 439
JO - Journal of Mathematical Economics
JF - Journal of Mathematical Economics
IS - 4
ER -