Informing sequential clinical decision-making through reinforcement learning: an empirical study

Abstract

This paper highlights the role that reinforcement learning can play in the optimization of treatment policies for chronic illnesses. Before applying any off-the-shelf reinforcement learning methods in this setting, we must first tackle a number of challenges. We outline some of these challenges and present methods for overcoming them. First, we describe a multiple imputation approach to overcome the problem of missing data. Second, we discuss the use of function approximation in the context of a highly variable observation set. Finally, we discuss approaches to summarizing the evidence in the data for recommending a particular action and quantifying the uncertainty around the Q-function of the recommended policy. We present the results of applying these methods to data from a real clinical trial of patients with schizophrenia.
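To make the approach concrete, below is a minimal, self-contained sketch of two of the techniques the abstract names: fitted Q-iteration with linear function approximation on batch (trial-style) data, and bootstrap voting to summarize the evidence in favor of each action at a given state. The two-stage structure, the synthetic data, and all names (`fit_q`, `fqi`, `x0`, and so on) are illustrative assumptions, not the authors' implementation; multiple imputation of missing data would happen upstream of this step.

```python
# A minimal sketch (not the paper's code): two-stage fitted Q-iteration with
# linear function approximation, plus bootstrap voting over the recommended
# stage-1 action. Synthetic data stand in for trial data; no discounting,
# matching a finite-horizon, two-stage treatment setting.
import numpy as np

rng = np.random.default_rng(0)

def fit_q(X, a, y, n_actions):
    """Fit one linear Q-function per action by least squares."""
    coefs = []
    for act in range(n_actions):
        mask = a == act
        Z = np.column_stack([np.ones(mask.sum()), X[mask]])  # intercept + features
        beta, *_ = np.linalg.lstsq(Z, y[mask], rcond=None)
        coefs.append(beta)
    return np.array(coefs)  # shape (n_actions, 1 + n_features)

def q_values(coefs, X):
    """Evaluate all per-action Q-functions at the states in X."""
    Z = np.column_stack([np.ones(len(X)), X])
    return Z @ coefs.T  # shape (n_samples, n_actions)

# Synthetic two-stage batch data: state, randomized action, reward, next state.
n, n_actions = 500, 2
X1 = rng.normal(size=(n, 1))
A1 = rng.integers(n_actions, size=n)
R1 = 0.5 * X1[:, 0] * (A1 == 1) + rng.normal(scale=0.5, size=n)
X2 = X1 + rng.normal(scale=0.3, size=(n, 1))
A2 = rng.integers(n_actions, size=n)
R2 = -0.5 * X2[:, 0] * (A2 == 1) + rng.normal(scale=0.5, size=n)

def fqi(idx):
    """Backward fitted Q-iteration on a (bootstrap) index set."""
    q2 = fit_q(X2[idx], A2[idx], R2[idx], n_actions)       # stage-2 Q-function
    v2 = q_values(q2, X2[idx]).max(axis=1)                 # plug-in stage-2 value
    return fit_q(X1[idx], A1[idx], R1[idx] + v2, n_actions)  # stage-1 target

# Bootstrap vote: how often is each stage-1 action recommended at state x0?
x0 = np.array([[1.0]])
votes = np.zeros(n_actions)
for _ in range(200):
    idx = rng.integers(n, size=n)                          # resample patients
    votes[q_values(fqi(idx), x0).argmax()] += 1
print("bootstrap support per action at x0:", votes / votes.sum())
```

Each bootstrap replication refits the stage-2 Q-function, plugs its maximum into the stage-1 regression target, and records which stage-1 action the refit recommends; the vote proportions give a rough, nonparametric summary of how strongly the data support each recommendation at that state.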


Author information

Corresponding author

Correspondence to Susan M. Shortreed.

Additional information

Editors: S. Whiteson and M. Littman.

Cite this article

Shortreed, S.M., Laber, E., Lizotte, D.J. et al. Informing sequential clinical decision-making through reinforcement learning: an empirical study. Mach Learn 84, 109–136 (2011). https://doi.org/10.1007/s10994-010-5229-0

Keywords

  • Optimal treatment policies
  • Fitted Q-iteration
  • Policy uncertainty