Dynamic programming models with vector-valued returns are investigated. The sets of (Pareto) maximal returns and (Pareto) maximal policies are defined. Monotonicity conditions are shown to be sufficient for the set of maximal policies to include a stationary policy, and for the set of maximal returns to be in the convex hull of returns of stationary policies. In particular, it is shown that these results hold for Markov decision processes.
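The notions of Pareto-maximal returns and stationary policies can be illustrated on a toy example. The sketch below rests on several assumptions that are not taken from the paper. It uses a hypothetical finite Markov decision process with two states, two actions and two objectives. Returns are discounted with factor 9/10. Only deterministic stationary policies are enumerated, and the Pareto order is applied to returns from one fixed start state. The MDP data and every function name (`solve`, `evaluate`, `dominates`, `maximal_stationary`) are illustrative inventions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-state, two-action MDP with two objectives (illustrative only).
GAMMA = Fraction(9, 10)          # assumed discount factor
STATES = (0, 1)
ACTIONS = (0, 1)
# P[s][a][t]: probability of moving from state s to state t under action a.
P = {0: {0: (1, 0), 1: (0, 1)},
     1: {0: (1, 0), 1: (0, 1)}}
# R[s][a]: vector-valued one-step reward (objective 1, objective 2).
R = {0: {0: (1, 0), 1: (0, 0)},
     1: {0: (0, 0), 1: (0, 1)}}


def solve(A, b):
    """Exact Gauss-Jordan elimination for a nonsingular system A x = b."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(policy):
    """Discounted vector return V_pi(s) of a stationary policy, for each state s.

    `policy[s]` is the action taken in state s. Solves
    (I - GAMMA * P_pi) V = R_pi once per objective.
    """
    A = [[(1 if s == t else 0) - GAMMA * P[s][policy[s]][t] for t in STATES]
         for s in STATES]
    n_obj = len(R[0][0])
    per_obj = [solve(A, [R[s][policy[s]][k] for s in STATES])
               for k in range(n_obj)]
    return {s: tuple(per_obj[k][s] for k in range(n_obj)) for s in STATES}


def dominates(u, v):
    """True if u Pareto-dominates v: >= in every objective, > in at least one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def maximal_stationary(start):
    """Stationary policies whose return from `start` no other one dominates."""
    policies = list(product(ACTIONS, repeat=len(STATES)))
    returns = {pi: evaluate(pi)[start] for pi in policies}
    return {pi: v for pi, v in returns.items()
            if not any(dominates(w, v) for w in returns.values())}


if __name__ == "__main__":
    for pi, v in maximal_stationary(start=0).items():
        print(pi, tuple(str(x) for x in v))
```

From state 0, the filter keeps policies (0, 0) and (0, 1), both with return (10, 0), and policy (1, 1) with return (0, 9). The oscillating policy (1, 0) earns (0, 0) and is dominated. Exact `Fraction` arithmetic is used so that tied returns, such as the two (10, 0) vectors, are not split by rounding error. The sketch does not cover the paper's broader claims about non-stationary policies or convex hulls; it only shows the Pareto filter applied to stationary returns.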
| Property | Value |
|---|---|
| Number of pages | 10 |
| Journal | SIAM Journal on Control and Optimization |
| State | Published - 1983 |