Mapping discounted and undiscounted Markov Decision Problems onto Hopfield neural networks

  • Alexandru Murgu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 888)

Abstract

This paper presents a framework for mapping value iteration and related successive approximation methods for Markov Decision Problems onto Hopfield neural networks, for both the discounted and undiscounted versions with finite state and action spaces. We analyse the asymptotic behaviour of the control sets and give estimates of the convergence rate of the value-iteration scheme. We relate the convergence properties to an energy function, which is the key point in mapping Markov Decision Problems onto Hopfield networks. Finally, we consider an application to queueing systems in communication networks and present the results of computer simulations of a Hopfield network run on the equivalent Markov Decision Problem, together with some comments on possible developments.
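
The following is a minimal sketch, not the paper's construction, of the value-iteration scheme the abstract refers to: successive approximation of the Bellman equation for a discounted finite MDP. The function name, the toy transition data, and the discount factor are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper): value iteration for a
# discounted finite MDP with finite state and action spaces.
import numpy as np

def value_iteration(P, r, gamma=0.9, eps=1e-6, max_iter=100_000):
    """P: (A, S, S) transition tensor, r: (A, S) expected rewards,
    gamma: discount factor in (0, 1).

    The Bellman operator T is a gamma-contraction in the sup norm,
    ||Tv - Tw|| <= gamma * ||v - w||, so the iterates converge
    geometrically at rate gamma. Stopping when the sup-norm change
    falls below eps*(1-gamma)/(2*gamma) makes the greedy policy
    eps-optimal (the classical bound). The undiscounted (average-reward)
    case instead uses relative value iteration, subtracting a reference
    component each sweep; that variant is omitted here.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Bellman backup: q[a, s] = r[a, s] + gamma * sum_s' P[a, s, s'] * v[s']
        q = r + gamma * (P @ v)
        v_new = q.max(axis=0)
        done = np.max(np.abs(v_new - v)) < eps * (1 - gamma) / (2 * gamma)
        v = v_new
        if done:
            break
    # Greedy (eps-optimal) policy with respect to the final values.
    policy = (r + gamma * (P @ v)).argmax(axis=0)
    return v, policy

# Toy two-state, two-action example (hypothetical data).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.1, 0.9], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
v, policy = value_iteration(P, r)
print("optimal values:", v, "greedy policy:", policy)
```

The paper's contribution is to reach the same fixed point via a Hopfield network, whose dynamics descend a standard energy of the form E(s) = -(1/2) Σ_ij w_ij s_i s_j - Σ_i θ_i s_i; the specific choice of weights w_ij and thresholds θ_i that encodes the Bellman equation is the mapping developed in the paper and is not reproduced in this sketch.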

Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Alexandru Murgu
  1. Department of Mathematics, University of Jyväskylä, Jyväskylä, Finland