Estimation and control in multichain processes

This paper considers Markovian decision processes in discrete time whose transition probabilities depend on an unknown parameter that may change from step to step. When this parameter sequence converges, a policy maximizing the average expected reward over an infinite future is sought. Under continuity conditions, a policy based on "estimation and control" is shown to be uniformly optimal for some multichain models.
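The "estimation and control" idea can be sketched as a certainty-equivalence loop: at each step the controller re-estimates the unknown parameter from the transitions it has observed, then acts greedily with respect to an average-reward-optimal policy for the estimated model. The Python sketch below is a minimal illustration only, assuming a hypothetical two-state, two-action model with a single unknown transition parameter theta; the model, rewards, and all function names are invented for this example and are not the paper's multichain construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_value_iteration(P, r, iters=200):
    """Greedy policy for the average-reward criterion via relative value
    iteration.  P has shape (nA, nS, nS); r has shape (nA, nS)."""
    h = np.zeros(r.shape[1])
    Q = r.copy()
    for _ in range(iters):
        Q = r + P @ h                # Q[a, s] = r(s, a) + E[h(next state)]
        h = Q.max(axis=0)
        h = h - h[0]                 # normalise so the iterates stay bounded
    return Q.argmax(axis=0)          # one action per state

def build_model(theta):
    """Hypothetical 2-state, 2-action model whose transition law depends on
    the unknown parameter theta (purely illustrative)."""
    P = np.array([
        [[1.0, 0.0], [1.0, 0.0]],                  # action 0: move to state 0
        [[1 - theta, theta], [1 - theta, theta]],  # action 1: reach state 1 w.p. theta
    ])
    r = np.array([
        [0.2, 0.2],   # action 0: small certain reward
        [0.0, 1.0],   # action 1: pays only while in state 1
    ])
    return P, r

def estimate_and_control(theta_true, steps=1000, eps=0.1):
    """Certainty-equivalence loop: re-estimate theta from the transitions
    observed under action 1, then act greedily for the estimated model.
    A little forced exploration keeps the estimator consistent."""
    successes, trials = 1, 2          # crude prior so the estimate is defined
    s = 0
    P_true, _ = build_model(theta_true)
    for _ in range(steps):
        P_hat, r_hat = build_model(successes / trials)
        policy = relative_value_iteration(P_hat, r_hat)
        a = int(rng.integers(2)) if rng.random() < eps else int(policy[s])
        s_next = int(rng.choice(2, p=P_true[a, s]))
        if a == 1:                    # only action 1 reveals theta
            trials += 1
            successes += s_next
        s = s_next
    return successes / trials, policy
```

On a known model this recovers the obvious policy: with theta = 0.8 the risky action is worth taking in both states, while for small theta the safe action dominates in state 0. The paper's contribution lies beyond this toy setting, in establishing uniform optimality when the parameter itself changes over time and the chain may have several recurrent classes.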



Cite this article

Girlich, H., Sokolichin, A.A. Estimation and control in multichain processes. Ann Oper Res 32, 23–33 (1991). https://doi.org/10.1007/BF02204826


  • Decision Process
  • Discrete Time
  • Unknown Parameter
  • Continuity Condition
  • Markovian Decision Process