VALUE ITERATION
Value iteration - This is an algorithm for infinite horizon [stochastic] dynamic programs that proceeds by successive approximation to satisfy the fundamental equation:
F(s) = Opt_x { r(x, s) + a Sum_s' P(x, s, s') F(s') }, where
a is a discount factor and Opt (max or min) is taken over the decisions x. The successive approximation becomes the DP forward equation. If 0 < a < 1, this is a fixed-point equation, and Banach's theorem yields convergence because the operator defined by the right-hand side is then a contraction map. Even when there is no discounting, policy iteration can apply.
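
As a minimal sketch of the successive approximation, assuming finite state and decision sets and Opt = max, the fundamental equation can be iterated until successive iterates of F agree to within a tolerance. The function name value_iteration, the nested-list layout r[x][s] and P[x][s][t], the stopping rule, and the two-state example are illustrative assumptions, not part of the definition above.

```python
def backup(r, P, a, F):
    """Return Q[s][x] = r(x, s) + a * Sum_t P(x, s, t) * F(t)."""
    n_actions, n_states = len(r), len(r[0])
    return [[r[x][s] + a * sum(P[x][s][t] * F[t] for t in range(n_states))
             for x in range(n_actions)]
            for s in range(n_states)]


def value_iteration(r, P, a, tol=1e-10, max_iter=10_000):
    """Successive approximation of F with Opt = max (an assumption).

    r[x][s]    : immediate return of decision x in state s
    P[x][s][t] : probability of moving from s to t under decision x
    a          : discount factor, assumed to satisfy 0 < a < 1
    """
    n_states = len(r[0])
    F = [0.0] * n_states  # arbitrary starting guess
    for _ in range(max_iter):
        F_new = [max(q_s) for q_s in backup(r, P, a, F)]
        gap = max(abs(F_new[s] - F[s]) for s in range(n_states))
        F = F_new
        if gap < tol:  # iterates have (numerically) stopped changing
            break
    # Decisions that attain the optimum for the final iterate.
    Q = backup(r, P, a, F)
    policy = [max(range(len(r)), key=lambda x: Q[s][x]) for s in range(n_states)]
    return F, policy


# Hypothetical two-state example: decision 0 stays put, decision 1 switches state.
r = [[1.0, 3.0],   # staying earns 1 in state 0 and 3 in state 1
     [0.0, 0.0]]   # switching earns nothing immediately
P = [[[1.0, 0.0], [0.0, 1.0]],   # decision 0: remain in the current state
     [[0.0, 1.0], [1.0, 0.0]]]   # decision 1: move to the other state

F, policy = value_iteration(r, P, a=0.5)
print(F, policy)
```

In this example the fixed point can be checked by hand: F(1) = 3 + 0.5 F(1) gives F(1) = 6, and F(0) = max(1 + 0.5 F(0), 0.5 F(1)) gives F(0) = 3, so the sketch should return values close to 3 and 6, with the decision to switch in state 0 and to stay in state 1.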