Journal of Optimization Theory and Applications, Volume 52, Issue 2, pp. 227–241

Adaptive control of Markov processes with incomplete state information and unknown parameters

  • O. Hernandez-Lerma
  • S. I. Marcus
Contributed Papers


Recent results for parameter-adaptive Markov decision processes (MDPs) are extended to partially observed MDPs that depend on unknown parameters. These results include approximations that converge uniformly to the optimal reward function, and asymptotically optimal adaptive policies.

Key Words

Partially observed Markov decision processes; unknown parameters; discounted reward criterion; adaptive I-policies; non-stationary value iteration; principle of estimation and control
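The key words name non-stationary value iteration (NVI) and the principle of estimation and control. What follows is a minimal sketch of the NVI recursion in the form common in the parameter-adaptive MDP literature, v_t = T_{θ_t} v_{t-1}, where T_θ is the discounted dynamic-programming operator for parameter θ and θ_t is the current parameter estimate. It is not the authors' algorithm. To keep it short it uses a fully observed, hypothetical two-state, two-action model. In the partially observed setting of the paper, the recursion would act on the information (belief) state, which this sketch does not model. The kernel `transition_kernel`, the rewards `R`, the discount factor `BETA = 0.9`, the true parameter `THETA_TRUE = 0.3`, and the estimate sequence θ_t = 0.3 + 0.5/(t+1) are all assumptions chosen for illustration.

```python
import numpy as np

BETA = 0.9         # discount factor (assumed for illustration)
THETA_TRUE = 0.3   # hypothetical "unknown" parameter value

# Hypothetical one-stage rewards R[s, a] for a 2-state, 2-action model.
R = np.array([[1.0, 0.0],
              [0.0, 0.5]])


def transition_kernel(theta):
    """Hypothetical kernel P[a, s, s'] depending on the parameter theta."""
    p0 = [theta, 1.0 - theta]  # action 0: next state is 0 with probability theta
    p1 = [0.5, 0.5]            # action 1: next state is 0 with probability 0.5
    return np.array([[p0, p0], [p1, p1]])


def bellman(v, theta):
    """Apply the discounted DP operator T_theta once.

    Returns (T_theta v, greedy action in each state).
    """
    P = transition_kernel(theta)
    # q[s, a] = R[s, a] + BETA * sum_{s'} P[a, s, s'] * v[s']
    q = R + BETA * np.einsum("ast,t->sa", P, v)
    return q.max(axis=1), q.argmax(axis=1)


def value_iteration(theta, tol=1e-12):
    """Standard value iteration when theta is known (reference solution)."""
    v = np.zeros(2)
    while True:
        v_new, policy = bellman(v, theta)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, policy
        v = v_new


def nonstationary_value_iteration(estimates):
    """NVI: v_t = T_{theta_t} v_{t-1} with v_0 = 0, one estimate per stage."""
    v = np.zeros(2)
    policy = None
    for theta_t in estimates:
        # The argmax returned here is the action an adaptive policy would use at stage t.
        v, policy = bellman(v, theta_t)
    return v, policy


# Hypothetical consistent estimates theta_t -> THETA_TRUE; in an adaptive
# scheme these would come from an estimator fed with the observations.
estimates = [THETA_TRUE + 0.5 / (t + 1) for t in range(5000)]

v_star, pi_star = value_iteration(THETA_TRUE)
v_nvi, pi_nvi = nonstationary_value_iteration(estimates)
print("sup-norm error:", np.max(np.abs(v_nvi - v_star)))
print("NVI greedy actions:", pi_nvi, "optimal actions:", pi_star)
```

Because T_θ is a β-contraction in the sup norm and, in this toy model, T_{θ_t} v differs from T_θ v by at most β|θ_t - θ| times the spread of v, the NVI error shrinks as θ_t approaches θ, and the greedy actions at late stages match the optimal ones for the true parameter.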





Copyright information

© Plenum Publishing Corporation 1987

Authors and Affiliations

  • O. Hernandez-Lerma (1)
  • S. I. Marcus (2)
  1. Departamento de Matemáticas, Centro de Investigación del IPN, México, DF, Mexico
  2. Department of Electrical and Computer Engineering, University of Texas at Austin, Austin
