TY - JOUR

T1 - A strongly polynomial algorithm for controlled queues

AU - Zadorojniy, Alexander

AU - Even, Guy

AU - Shwartz, Adam

PY - 2009/11

Y1 - 2009/11

N2 - We consider the problem of computing optimal policies of finite-state finite-action Markov decision processes (MDPs). A reduction to a continuum of constrained MDPs (CMDPs) is presented such that the optimal policies for these CMDPs constitute a path in a graph defined over the deterministic policies. This path contains, in particular, an optimal policy of the original MDP. We present an algorithm based on this new approach that finds this path, and thus an optimal policy. In the general case, this path might be exponentially long in the number of states and actions. We prove that the length of this path is polynomial if the MDP satisfies a coupling property. Thus we obtain a strongly polynomial algorithm for MDPs that satisfy the coupling property. We prove that discrete-time versions of controlled M/M/1 queues induce MDPs that satisfy the coupling property. The only previously known polynomial algorithm for controlled M/M/1 queues in the expected average cost model is based on linear programming (and is not known to be strongly polynomial). Our algorithm works for both the discounted and the expected average cost models, and the running time does not depend on the discount factor.

AB - We consider the problem of computing optimal policies of finite-state finite-action Markov decision processes (MDPs). A reduction to a continuum of constrained MDPs (CMDPs) is presented such that the optimal policies for these CMDPs constitute a path in a graph defined over the deterministic policies. This path contains, in particular, an optimal policy of the original MDP. We present an algorithm based on this new approach that finds this path, and thus an optimal policy. In the general case, this path might be exponentially long in the number of states and actions. We prove that the length of this path is polynomial if the MDP satisfies a coupling property. Thus we obtain a strongly polynomial algorithm for MDPs that satisfy the coupling property. We prove that discrete-time versions of controlled M/M/1 queues induce MDPs that satisfy the coupling property. The only previously known polynomial algorithm for controlled M/M/1 queues in the expected average cost model is based on linear programming (and is not known to be strongly polynomial). Our algorithm works for both the discounted and the expected average cost models, and the running time does not depend on the discount factor.

KW - Constrained Markov decision process

KW - Controlled queues

KW - Linear programming

KW - M/M/1 queue

KW - Markov decision process

KW - Optimization

UR - http://www.scopus.com/inward/record.url?scp=73249130548&partnerID=8YFLogxK

U2 - 10.1287/moor.1090.0415

DO - 10.1287/moor.1090.0415

M3 - Article

AN - SCOPUS:73249130548

SN - 0364-765X

VL - 34

SP - 992

EP - 1007

JO - Mathematics of Operations Research

JF - Mathematics of Operations Research

IS - 4

ER -