## Abstract

We consider the subclass of linear programs that formulate Markov decision processes (MDPs). We show that the Simplex algorithm with the Gass-Saaty shadow-vertex pivoting rule is strongly polynomial for a subclass of MDPs called controlled random walks (CRWs); the running time is $O(|S|^{3} \cdot |U|^{2})$, where $|S|$ denotes the number of states and $|U|$ denotes the number of actions per state. This result improves on the running time of the algorithm of Zadorojniy et al. (Mathematics of Operations Research 34(4):992-1007, 2009) by a factor of $|S|$. In particular, the number of iterations needed by the Simplex algorithm for CRWs is linear in the number of states and does not depend on the discount factor.
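
For orientation, the linear program behind a discounted MDP can be sketched in its standard textbook (occupation-measure) form. This is an illustration, not the paper's own notation: the reward $r(s,u)$, transition probabilities $P(s' \mid s,u)$, discount factor $\gamma \in [0,1)$, and initial distribution $\alpha$ are symbols assumed here.

$$
\begin{aligned}
\max_{x}\quad & \sum_{s \in S} \sum_{u \in U} r(s,u)\, x(s,u) \\
\text{s.t.}\quad & \sum_{u \in U} x(s',u) \;-\; \gamma \sum_{s \in S} \sum_{u \in U} P(s' \mid s,u)\, x(s,u) \;=\; \alpha(s') \qquad \forall\, s' \in S, \\
& x(s,u) \ge 0 \qquad \forall\, s \in S,\ u \in U .
\end{aligned}
$$

In this form the program has $|S|$ equality constraints and $|S| \cdot |U|$ variables. When $\alpha(s) > 0$ for every state, each basic feasible solution corresponds to a deterministic policy, so a Simplex pivot amounts to changing the action chosen in one state.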

Original language | English |
---|---|
Pages (from-to) | 159-167 |
Number of pages | 9 |
Journal | Annals of Operations Research |
Volume | 201 |
Issue number | 1 |
State | Published - Dec 2012 |

## Keywords

- Controlled queues
- Controlled random walks
- Gass-Saaty shadow-vertex pivoting rule
- Markov decision process
- Simplex algorithm