## Abstract

Given a policy of a Markov Decision Process, we define a SAFEZONE as a subset of states, such that most of the policy's trajectories are confined to this subset. The quality of a SAFEZONE is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. SAFEZONES are especially interesting when they have a small number of states and low escape probability. We study the complexity of finding optimal SAFEZONES, and show that in general, the problem is computationally hard. Our main result is a bi-criteria approximation learning algorithm that achieves an approximation factor of almost 2 for both the escape probability and the SAFEZONE size, with polynomial sample complexity.
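The two quality parameters named in the abstract, the number of states and the escape probability, can be illustrated with a small sketch. The code below is not the paper's learning algorithm; it only estimates the escape probability of a fixed candidate subset from sampled trajectories. The function name `empirical_escape_probability`, the state labels, and the toy trajectories are hypothetical, chosen for illustration only.

```python
from typing import Hashable, Iterable, Sequence


def empirical_escape_probability(
    trajectories: Iterable[Sequence[Hashable]],
    safezone: Iterable[Hashable],
) -> float:
    """Fraction of sampled trajectories that visit a state outside `safezone`.

    This is a plain Monte Carlo estimate of the escape probability for a
    fixed candidate subset (an illustrative assumption, not the paper's method).
    """
    zone = set(safezone)
    sampled = list(trajectories)
    if not sampled:
        raise ValueError("at least one trajectory is required")
    # A trajectory escapes if any of its states lies outside the subset.
    escaped = sum(1 for traj in sampled if any(s not in zone for s in traj))
    return escaped / len(sampled)


# Hypothetical toy data: four trajectories sampled from some policy.
toy_trajectories = [
    ["s0", "s1", "s2"],
    ["s0", "s3"],
    ["s0", "s1"],
    ["s0", "s1", "s4"],
]
candidate = {"s0", "s1", "s2"}

print(len(candidate))                                             # subset size
print(empirical_escape_probability(toy_trajectories, candidate))  # escape estimate
```

In this toy data, two of the four trajectories leave the three-state candidate subset, so the estimate is 0.5. Adding `s3` and `s4` to the subset would drive the estimate to zero at the cost of a larger subset, which is the trade-off the bi-criteria formulation captures.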

Original language | English |
---|---|
Journal | Advances in Neural Information Processing Systems |
Volume | 36 |
State | Published - 2023 |
Event | 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States. Duration: 10 Dec 2023 → 16 Dec 2023 |

### Funding

Funders | Funder number |
---|---|
Yandex Initiative for Machine Learning | |
Tel Aviv University | |
European Research Council | |
Horizon 2020 | 882396 |
Horizon 2020 | |
Israel Science Foundation | 993/17 |
Israel Science Foundation | |