Abstract
The convergence of an approximation scheme known as policy iteration has been demonstrated for controlled diffusions by Fleming, Puterman, and Bismut. In this paper, we show that this approximation scheme is equivalent to the Newton-Kantorovich iteration for solving the optimality equation and exploit this equivalence to obtain a new proof of convergence. Estimates of the rate of convergence of this procedure are also obtained.
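The equivalence stated above can be illustrated on a finite-state, discounted analog, which is far simpler than the controlled-diffusion setting of the paper. The sketch below is not the paper's construction. The two-state, two-action problem, the discount factor `GAMMA`, and all function names are hypothetical choices made for illustration. Writing the optimality equation as B(v) = max_a (r_a + γ P_a v) − v = 0, and taking γ P_d − I as the derivative of B at v for the greedy policy d, the Newton step v − (γ P_d − I)^{-1} B(v) reduces algebraically to (I − γ P_d)^{-1} r_d, which is the policy-evaluation step of policy iteration.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

GAMMA = 0.9  # discount factor (assumed for illustration)

# Hypothetical problem data: P[a][s][t] is the transition probability
# from state s to state t under action a; R[a][s] is the one-step reward.
P = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.1, 0.9], [0.6, 0.4]],
]
R = [
    [1.0, 0.0],
    [0.5, 2.0],
]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def q_value(a: int, s: int, v: Vector) -> float:
    """One-step lookahead value of action a in state s."""
    return R[a][s] + GAMMA * sum(P[a][s][t] * v[t] for t in range(len(v)))


def greedy(v: Vector) -> List[int]:
    """Policy improvement: pick a maximizing action in each state."""
    return [max(range(len(P)), key=lambda a: q_value(a, s, v))
            for s in range(len(v))]


def policy_iteration_step(v: Vector) -> Vector:
    """Improve, then evaluate: solve (I - GAMMA * P_d) v' = r_d."""
    d = greedy(v)
    n = len(v)
    A = [[(1.0 if s == t else 0.0) - GAMMA * P[d[s]][s][t] for t in range(n)]
         for s in range(n)]
    b = [R[d[s]][s] for s in range(n)]
    return solve(A, b)


def newton_step(v: Vector) -> Vector:
    """Newton step on B(v) = max_a(r_a + GAMMA * P_a v) - v,
    using GAMMA * P_d - I as the derivative at the greedy policy d."""
    d = greedy(v)
    n = len(v)
    residual = [q_value(d[s], s, v) - v[s] for s in range(n)]
    J = [[GAMMA * P[d[s]][s][t] - (1.0 if s == t else 0.0) for t in range(n)]
         for s in range(n)]
    delta = solve(J, residual)
    return [v[s] - delta[s] for s in range(n)]


def iterate(step: Callable[[Vector], Vector], v: Vector, n_steps: int) -> Vector:
    """Apply the given update n_steps times."""
    for _ in range(n_steps):
        v = step(v)
    return v
```

Starting both updates from the same vector, for example `[0.0, 0.0]`, should give iterates that agree up to rounding error. This is the finite-dimensional counterpart of the equivalence the paper establishes for diffusions.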
References
Fleming, W. H., Some Markovian Optimization Problems, Journal of Mathematics and Mechanics, Vol. 12, pp. 131–140, 1963.
Puterman, M. L., Optimal Control of Diffusion Processes with Reflection, Journal of Optimization Theory and Applications, Vol. 22, pp. 103–116, 1977.
Bismut, J., An Approximation Method in Optimal Stochastic Control, SIAM Journal on Control and Optimization, Vol. 16, pp. 122–130, 1978.
Puterman, M. L., and Brumelle, S. L., On the Convergence of Policy Iteration in Stationary Dynamic Programming, Mathematics of Operations Research, Vol. 4, pp. 60–69, 1979.
Stroock, D. W., and Varadhan, S. R. S., Diffusion Processes with Boundary Conditions, Communications on Pure and Applied Mathematics, Vol. 26, pp. 147–226, 1971.
Kantorovich, L. V., and Akilov, G. P., Functional Analysis in Normed Spaces, The Macmillan Company, New York, New York, 1964.
Stroock, D. W., and Varadhan, S. R. S., Diffusion Processes with Continuous Coefficients, II, Communications on Pure and Applied Mathematics, Vol. 22, pp. 479–530, 1969.
Fleming, W. H., and Rishel, R., Deterministic and Stochastic Optimal Control, Springer-Verlag, New York, New York, 1975.
Mandl, P., Analytic Treatment of One-Dimensional Markov Processes, Springer-Verlag, New York, New York, 1968.
Additional information
Communicated by R. Rishel
This research was partially supported by NRC Grant No. A-3609.
Cite this article
Puterman, M.L. On the convergence of policy iteration for controlled diffusions. J Optim Theory Appl 33, 137–144 (1981). https://doi.org/10.1007/BF00935182
DOI: https://doi.org/10.1007/BF00935182