Stochastic Shortest Path with Adversarially Changing Costs

Aviv Rosenberg, Yishay Mansour

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Scopus citations

Abstract

Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In this paper we present the adversarial SSP model that also accounts for adversarial changes in the costs over time, while the underlying transition function remains unchanged. Formally, an agent interacts with an SSP environment for K episodes, the cost function changes arbitrarily between episodes, and the transitions are unknown to the agent. We develop the first algorithms for adversarial SSPs and prove high probability regret bounds of square-root K assuming all costs are strictly positive, and sub-linear regret in the general case. We are the first to consider this natural setting of adversarial SSP and obtain sub-linear regret for it.

Original languageEnglish
Title of host publicationProceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
EditorsZhi-Hua Zhou
PublisherInternational Joint Conferences on Artificial Intelligence
Pages2936-2942
Number of pages7
ISBN (Electronic)9780999241196
DOIs
StatePublished - 2021
Event30th International Joint Conference on Artificial Intelligence, IJCAI 2021 - Virtual, Online, Canada
Duration: 19 Aug 202127 Aug 2021

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)1045-0823

Conference

Conference30th International Joint Conference on Artificial Intelligence, IJCAI 2021
Country/TerritoryCanada
CityVirtual, Online
Period19/08/2127/08/21

Funding

FundersFunder number
Yandex Initiative for Machine Learning
Horizon 2020 Framework Programme
European Research Council
Israel Science Foundation993/17
Tel Aviv University
Horizon 2020882396

    Fingerprint

    Dive into the research topics of 'Stochastic Shortest Path with Adversarially Changing Costs'. Together they form a unique fingerprint.

    Cite this