Abstract
This chapter discusses applications of information geometry to the reinforcement-learning paradigm, with emphasis on dynamic treatment regimes, where recent algorithmic progress has centered on outcome weighted learning. A probabilistic framework for the triple of state, action, and reward is formulated over multiple stages, in which a decision function defined on states and actions is estimated so as to yield an optimal policy for a given dataset. Decision consistency of a decision function is introduced via the state-value function on the space of all decision functions. We introduce the \(\varPsi \)-divergence on the decision-function space with a generator function \(\varPsi \), and investigate statistical properties of the \(\varPsi \)-loss function derived from the \(\varPsi \)-divergence. An outcome weighted learning algorithm for the decision function is developed in a boosting approach, in analogy with prediction in supervised learning.
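For orientation, the following is a minimal sketch of the single-stage outcome weighted learning criterion that is standard in this literature; the notation is assumed here rather than taken from the chapter, whose multi-stage \(\varPsi \)-loss formulation generalizes this idea. Let \(X\) denote the state, \(A \in \{-1, +1\}\) the action, \(R\) a positive reward, and \(\pi(a \mid x)\) the propensity of action \(a\) given state \(x\). Outcome weighted learning seeks a decision rule \(d\) maximizing the value
\[
\mathcal{V}(d) \;=\; \mathbb{E}\!\left[\frac{R \,\mathbf{1}\{A = d(X)\}}{\pi(A \mid X)}\right],
\]
and, because the indicator is discontinuous, replaces it in practice by a surrogate: writing \(d(x) = \operatorname{sign}\bigl(f(x)\bigr)\) for a decision function \(f\) and taking a surrogate loss \(\phi\), one minimizes the weighted risk
\[
L(f) \;=\; \mathbb{E}\!\left[\frac{R \,\phi\bigl(A f(X)\bigr)}{\pi(A \mid X)}\right].
\]
In the chapter's setting, a \(\varPsi \)-generated loss plays the role of \(\phi\), and \(f\) is fitted stagewise by boosting, paralleling the use of exponential-type losses in supervised classification.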
Cite this chapter
Eguchi, S., & Komori, O. (2022). Outcome weighted learning in dynamic treatment regimes. In Minimum divergence methods in statistical machine learning. Tokyo: Springer. https://doi.org/10.1007/978-4-431-56922-0_8