TY - JOUR
T1 - Evolution of reinforcement learning in uncertain environments
T2 - A simple explanation for complex foraging behaviors
AU - Niv, Yael
AU - Joel, Daphna
AU - Meilijson, Isaac
AU - Ruppin, Eytan
PY - 2002
N2 - Reinforcement learning is a fundamental process by which organisms learn to achieve goals from their interactions with the environment. Using evolutionary computation techniques we evolve (near-)optimal neuronal learning rules in a simple neural network model of reinforcement learning in bumblebees foraging for nectar. The resulting neural networks exhibit efficient reinforcement learning, allowing the bees to respond rapidly to changes in reward contingencies. The evolved synaptic plasticity dynamics give rise to varying exploration/exploitation levels and to the well-documented choice strategies of risk aversion and probability matching. Additionally, risk aversion is shown to emerge even when bees are evolved in a completely risk-less environment. In contrast to existing theories in economics and game theory, risk-averse behavior is shown to be a direct consequence of (near-)optimal reinforcement learning, without requiring additional assumptions such as the existence of a nonlinear subjective utility function for rewards. Our results are corroborated by a rigorous mathematical analysis, and their robustness in real-world situations is supported by experiments in a mobile robot. Thus we provide a biologically founded, parsimonious, and novel explanation for risk aversion and probability matching.
KW - Dopamine
KW - Evolutionary computation
KW - Heterosynaptic plasticity
KW - Neuromodulation
KW - Probability matching
KW - Reinforcement learning
KW - Risk aversion
UR - http://www.scopus.com/inward/record.url?scp=0036972336&partnerID=8YFLogxK
DO - 10.1177/10597123020101001
M3 - Article
AN - SCOPUS:0036972336
SN - 1059-7123
VL - 10
SP - 5
EP - 24
JO - Adaptive Behavior
JF - Adaptive Behavior
IS - 1
ER -