Outcome Weighted Learning in Dynamic Treatment Regimes

Chapter in: Minimum Divergence Methods in Statistical Machine Learning

Abstract

This chapter discusses applications of information geometry to a reinforcement-learning paradigm, with emphasis on dynamic treatment regimes, for which learning algorithms based on outcome weighted learning have recently progressed. The probabilistic framework for a triple expressing state, action, and reward is formulated over multiple stages, in which a decision function defined on states and actions is estimated to yield an optimal policy for a given dataset. Decision consistency for a decision function is introduced via the state-value function on the space of all decision functions. We introduce the \(\varPsi \)-divergence on the decision function space with a generator function \(\varPsi \), and investigate statistical properties of the \(\varPsi \)-loss function induced by the \(\varPsi \)-divergence. An outcome-weighted learning algorithm for the decision function is developed in a boosting approach, in analogy with prediction in supervised learning.
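The outcome-weighted learning idea mentioned in the abstract can be illustrated with a minimal single-stage sketch. This is not the chapter's \(\varPsi \)-loss boosting algorithm; it is a simplified illustration under assumed conditions: simulated data, a randomized binary treatment with known propensity 0.5, a weighted logistic surrogate loss, and a linear decision function fitted by plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))              # state (patient covariates)
A = rng.choice([-1, 1], size=n)          # randomized treatment, propensity P(A=1|X) = 0.5
# Reward is larger when the treatment sign matches the sign of X[:, 0];
# the optimal rule in this simulation is therefore d(x) = sign(x_1).
R = 2.0 + A * X[:, 0] + 0.1 * rng.normal(size=n)
R = np.clip(R, 0.1, None)                # outcome weights must be nonnegative

w = R / 0.5                              # inverse-propensity weighting of rewards
beta = np.zeros(2)                       # linear decision function f(x) = x . beta
lr = 0.05
for _ in range(500):
    margin = A * (X @ beta)
    # Gradient of the weighted logistic surrogate: sum_i w_i * log(1 + exp(-A_i f(X_i)))
    grad = -(X * (A * w / (1.0 + np.exp(margin)))[:, None]).sum(axis=0) / n
    beta -= lr * grad

rule = np.sign(X @ beta)                 # estimated treatment rule d(x) = sign(f(x))
agree = np.mean(rule == np.sign(X[:, 0]))
```

Maximizing the weighted surrogate concentrates classification accuracy on subjects with large observed rewards, which is why the fitted rule approximately recovers sign(x_1) here; the chapter's framework replaces the logistic surrogate with the \(\varPsi \)-loss and extends this to multiple stages.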



Author information


Correspondence to Shinto Eguchi.


Copyright information

© 2022 Springer Japan KK, part of Springer Nature

About this chapter


Cite this chapter

Eguchi, S., Komori, O. (2022). Outcome Weighted Learning in Dynamic Treatment Regimes. In: Minimum Divergence Methods in Statistical Machine Learning. Springer, Tokyo. https://doi.org/10.1007/978-4-431-56922-0_8

