Abstract
This chapter discusses applications of information geometry to the reinforcement-learning paradigm, with emphasis on dynamic treatment regimes, where recent algorithmic progress has centered on outcome weighted learning. A probabilistic framework for the triple of state, action, and reward is formulated over multiple stages, in which a decision function defined on states and actions is estimated so as to yield an optimal policy for a given dataset. Decision consistency of a decision function is introduced via the state-value function on the space of all decision functions. We introduce the \(\varPsi \)-divergence on the decision-function space with a generator function \(\varPsi \), and investigate statistical properties of the \(\varPsi \)-loss function derived from the \(\varPsi \)-divergence. An outcome weighted learning algorithm for the decision function is developed in a boosting approach, in analogy with prediction in supervised learning.
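For orientation, the following is a minimal sketch of the single-stage outcome weighted learning criterion that is standard in this literature; the notation is assumed here rather than taken from the chapter, whose multi-stage \(\varPsi \)-loss formulation generalizes this idea. Let \(X\) denote the state, \(A \in \{-1, +1\}\) the action, \(R\) a positive reward, and \(\pi(a \mid x)\) the propensity of action \(a\) given state \(x\). Outcome weighted learning seeks a decision rule \(d\) maximizing the value
\[
\mathcal{V}(d) \;=\; \mathbb{E}\!\left[\frac{R \,\mathbf{1}\{A = d(X)\}}{\pi(A \mid X)}\right],
\]
and, because the indicator is discontinuous, replaces it in practice by a surrogate: writing \(d(x) = \operatorname{sign}\bigl(f(x)\bigr)\) for a decision function \(f\) and taking a surrogate loss \(\phi\), one minimizes the weighted risk
\[
L(f) \;=\; \mathbb{E}\!\left[\frac{R \,\phi\bigl(A f(X)\bigr)}{\pi(A \mid X)}\right].
\]
In the chapter's setting, a \(\varPsi \)-generated loss plays the role of \(\phi\), and \(f\) is fitted stagewise by boosting, paralleling the use of exponential-type losses in supervised classification.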
Cite this chapter
Eguchi, S., & Komori, O. (2022). Outcome weighted learning in dynamic treatment regimes. In Minimum divergence methods in statistical machine learning. Tokyo: Springer. https://doi.org/10.1007/978-4-431-56922-0_8