A sparse sampling algorithm for near-optimal planning in large Markov decision processes

Michael Kearns, Yishay Mansour, Andrew Y. Ng

Research output: Contribution to journal; Conference article; peer-reviewed


An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even infinite state spaces, traditional planning and reinforcement learning algorithms are often inapplicable, since their running time typically scales linearly with the state space size. In this paper we present a new algorithm that, given only a generative model (simulator) for an arbitrary MDP, performs near-optimal planning with a running time that has no dependence on the number of states. Although the running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy), our results establish for the first time that there are no theoretical barriers to computing near-optimal policies in arbitrarily large, unstructured MDPs. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs [KMN99].
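The sparse look-ahead idea can be illustrated with a short sketch. This is not the authors' code: it assumes a hypothetical generative model `model(state, action)` that returns one sampled `(next_state, reward)` pair, and the names `estimate_q`, `sparse_sampling_action`, `depth` (look-ahead horizon) and `width` (samples drawn per action) are illustrative. The paper's prescription for choosing `depth` and `width` to reach a given accuracy is not reproduced here. Each node samples `width` successors per action and recurses, so the tree has on the order of `(len(actions) * width) ** depth` nodes regardless of how many states the MDP has.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
# Hypothetical generative model: one sampled (next_state, reward) per call.
GenerativeModel = Callable[[State, Action], Tuple[State, float]]


def estimate_q(model: GenerativeModel, actions: Sequence[Action], state: State,
               depth: int, width: int, gamma: float) -> Dict[Action, float]:
    """Estimate Q-values at `state` from a sparse look-ahead tree of the given depth."""
    if depth == 0:
        # Leaves of the tree: value estimate is 0.
        return {a: 0.0 for a in actions}
    q: Dict[Action, float] = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            # Draw one successor from the simulator and recurse one level shallower.
            next_state, reward = model(state, a)
            child_q = estimate_q(model, actions, next_state, depth - 1, width, gamma)
            total += reward + gamma * max(child_q.values())
        # Average of the sampled one-step backups for this action.
        q[a] = total / width
    return q


def sparse_sampling_action(model: GenerativeModel, actions: Sequence[Action],
                           state: State, depth: int, width: int, gamma: float) -> Action:
    """Return the action with the highest estimated Q-value at the root."""
    q = estimate_q(model, actions, state, depth, width, gamma)
    return max(q, key=q.get)
```

With 2 actions, `width=3` and `depth=2`, one call to `estimate_q` queries the simulator 6 times at the root and 6 times under each of those 6 children, 42 calls in total, whatever the size of the state space.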

Original language: English
Pages (from-to): 1324-1331
Number of pages: 8
Journal: IJCAI International Joint Conference on Artificial Intelligence
State: Published - 1999
Event: 16th International Joint Conference on Artificial Intelligence, IJCAI 1999 - Stockholm, Sweden
Duration: 31 Jul 1999 to 6 Aug 1999

