Abstract
We consider the subclass of linear programs that formulate Markov decision processes (MDPs). We show that the Simplex algorithm with the Gass-Saaty shadow-vertex pivoting rule is strongly polynomial for a subclass of MDPs called controlled random walks (CRWs). The running time is $O(|S|^3 \cdot |U|^2)$, where $|S|$ denotes the number of states and $|U|$ denotes the number of actions per state. This result improves the running time of the algorithm of Zadorojniy et al. (Mathematics of Operations Research 34(4):992-1007, 2009) by a factor of $|S|$. In particular, the number of iterations the Simplex algorithm needs for CRWs is linear in the number of states and does not depend on the discount factor.
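To make "linear programs that formulate MDPs" concrete, here is a minimal sketch of the textbook dual (occupation-measure) LP of a discounted MDP. It has one equality row per state and one column per state-action pair, so the constraint matrix is $|S| \times (|S| \cdot |U|)$. The sketch assumes this standard formulation, which may differ in normalization from the one used in the paper. It does not implement the Gass-Saaty shadow-vertex rule. It only shows that a deterministic policy selects one column per state, which gives a basis of size $|S|$ whose objective equals $\sum_s \alpha(s) V^\pi(s)$. The function names and the 3-state left/right chain are hypothetical illustrations, not the paper's definition of a controlled random walk.

```python
def build_dual_lp(P, r, beta, alpha):
    """Standard dual (occupation-measure) LP of a discounted MDP (assumed form):
        maximize   sum_{s,u} r[s][u] * x[s,u]
        subject to sum_u x[t,u] - beta * sum_{s,u} P[s][u][t] * x[s,u] = alpha[t]
                   x >= 0
    P[s][u][t] is the probability of moving from s to t under action u.
    Returns the equality matrix A (|S| rows, |S|*|U| columns), the right-hand
    side b, the objective c, and the (state, action) pair labelling each column.
    """
    n_s, n_u = len(P), len(P[0])
    cols = [(s, u) for s in range(n_s) for u in range(n_u)]
    A = [[0.0] * len(cols) for _ in range(n_s)]
    for j, (s, u) in enumerate(cols):
        A[s][j] += 1.0                      # outflow term: x[s,u] counts toward row s
        for t in range(n_s):
            A[t][j] -= beta * P[s][u][t]    # discounted inflow into state t
    c = [r[s][u] for s, u in cols]
    return A, list(alpha), c, cols


def solve(M, y):
    """Solve the square system M z = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [y[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


def policy_basis_value(A, b, c, cols, policy):
    """Basic solution for a deterministic policy: one column (s, policy[s]) per state."""
    basis = [cols.index((s, policy[s])) for s in range(len(policy))]
    B = [[A[i][j] for j in basis] for i in range(len(A))]
    x_B = solve(B, b)
    return x_B, sum(c[j] * x for j, x in zip(basis, x_B))


# Hypothetical toy chain: states 0..2, action 0 moves left, action 1 moves right
# (blocked at the ends); reward 1 is earned in state 2 regardless of the action.
n = 3
P = [[[1.0 if t == max(s - 1, 0) else 0.0 for t in range(n)],
      [1.0 if t == min(s + 1, n - 1) else 0.0 for t in range(n)]] for s in range(n)]
r = [[1.0 if s == n - 1 else 0.0] * 2 for s in range(n)]
A, b, c, cols = build_dual_lp(P, r, beta=0.5, alpha=[1.0] * n)

x_right, obj_right = policy_basis_value(A, b, c, cols, [1, 1, 1])
x_left, obj_left = policy_basis_value(A, b, c, cols, [0, 0, 0])
print(obj_right)  # → 3.5
print(obj_left)   # → 1.0
```

In this formulation every basic solution built from a deterministic policy sums to $\sum_s \alpha(s) / (1 - \beta)$, which is 6 in the toy example. Simplex pivots move between such policy bases. The paper's result bounds how many of these pivots the shadow-vertex rule needs for CRWs.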
Original language | English |
---|---|
Pages (from-to) | 159-167 |
Number of pages | 9 |
Journal | Annals of Operations Research |
Volume | 201 |
Issue number | 1 |
DOIs | |
State | Published - Dec 2012 |
Keywords
- Controlled queues
- Controlled random walks
- Gass-Saaty shadow-vertex pivoting rule
- Markov decision process
- Simplex algorithm