Statistics in Biosciences, Volume 6, Issue 2, pp 223–243

Q-Learning: Flexible Learning About Useful Utilities


DOI: 10.1007/s12561-013-9103-z

Cite this article as:
Moodie, E.E.M., Dean, N. & Sun, Y.R. Stat Biosci (2014) 6: 223. doi:10.1007/s12561-013-9103-z


Dynamic treatment regimes are fast becoming an important part of medicine, reflecting a shift in emphasis from treating the disease to treating the individual patient. Because few trials evaluate personally tailored treatment sequences, inferring optimal treatment regimes from observational data has grown in importance. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first attempt both to extend the framework to discrete utilities and to move covariate modelling from linear forms to more flexible specifications using the generalized additive model (GAM) framework. Simulated data results show that GAM-adapted Q-learning typically outperforms Q-learning with linear models, as well as other frequently used methods based on propensity scores, in terms of coverage and bias/MSE. This represents a promising step toward a more fully general Q-learning approach to estimating optimal dynamic treatment regimes.
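To make the idea concrete, here is a minimal single-stage sketch of Q-learning, not taken from the paper: a working model for Q(x, a) is fit by regression, and the estimated optimal rule treats a patient exactly when the fitted treatment "blip" is positive. The linear working model below stands in for the paper's GAM-based Q-functions; the simulated data, variable names, and coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 1, n)            # patient covariate (tailoring variable)
a = rng.integers(0, 2, n)           # randomized binary treatment
# Simulated truth (illustrative only): treatment helps only when x > 0.5
y = x + a * (x - 0.5) + rng.normal(0, 0.1, n)

# Working model Q(x, a) = b0 + b1*x + a*(b2 + b3*x), fit by least squares;
# the paper's GAM version would replace the linear terms with smooth functions.
design = np.column_stack([np.ones(n), x, a, a * x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

def optimal_rule(x_new, b=beta):
    """Treat when the estimated treatment blip b2 + b3*x is positive."""
    return int(b[2] + b[3] * x_new > 0)
```

With multiple treatment stages, the same regression step is applied backward in time, replacing each stage's outcome with the maximized fitted Q-value from the following stage.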


Keywords: Dynamic treatment regimes · Q-learning · Generalized additive models · Discrete data · Adaptive treatment strategies · Personalized medicine

Copyright information

© International Chinese Statistical Association 2013

Authors and Affiliations

  1. Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada
  2. School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
  3. Department of Mathematics and Statistics, School of Computer Science, McGill University, Montreal, Canada
