Reinforcement learning with feedback graphs

Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

Research output: Contribution to journalConference articlepeer-review

3 Scopus citations

Abstract

We study RL in the tabular MDP setting where the agent receives additional observations per step in the form of transitions samples. Such additional observations can be provided in many tasks by auxiliary sensors or by leveraging prior knowledge about the environment (e.g., when certain actions yield similar outcome). We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can incorporate additional observations for more sample-efficient learning. We give a regret bound that predominantly depends on the size of the maximum acyclic subgraph of the feedback graph, in contrast with a polynomial dependency on the number of states and actions in the absence of side observations. Finally, we highlight fundamental challenges for leveraging a small dominating set of the feedback graph, as compared to the well-studied bandit setting, and propose a new algorithm that can use such a dominating set to learn a near-optimal policy faster.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
Volume2020-December
StatePublished - 2020
Event34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 6 Dec 202012 Dec 2020

Funding

FundersFunder number
Google Research
National Science FoundationCCF-1535987, IIS-1618662
Google1750575
Horizon 2020 Framework Programme882396
European Research Council
Israel Science Foundation993/17

    Fingerprint

    Dive into the research topics of 'Reinforcement learning with feedback graphs'. Together they form a unique fingerprint.

    Cite this