Experiment Design and Markov Decision Processes

Decision Making Under Uncertainty and Reinforcement Learning

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 223)

Abstract

This chapter introduces Markov decision processes (MDPs), a very general formalism for representing sequential decision-making problems. An MDP can be used to model stochastic path problems and stopping problems, as well as problems in reinforcement learning, experiment design, and control.

Notes

  1. Thus, the result is weakly polynomial complexity, due to the dependence on the input size description.

Author information

Corresponding author

Correspondence to Christos Dimitrakakis.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dimitrakakis, C., Ortner, R. (2022). Experiment Design and Markov Decision Processes. In: Decision Making Under Uncertainty and Reinforcement Learning. Intelligent Systems Reference Library, vol 223. Springer, Cham. https://doi.org/10.1007/978-3-031-07614-5_6
