Q-Learning: Flexible Learning About Useful Utilities
- First Online:
- Cite this article as:
- Moodie, E.E.M., Dean, N. & Sun, Y.R. Stat Biosci (2014) 6: 223. doi:10.1007/s12561-013-9103-z
- 278 Downloads
Dynamic treatment regimes are fast becoming an important part of medicine, with the corresponding change in emphasis from treatment of the disease to treatment of the individual patient. Because of the limited number of trials to evaluate personally tailored treatment sequences, inferring optimal treatment regimes from observational data has increased importance. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first attempt at both extending the framework to discrete utilities and implementing the modelling of covariates from linear to more flexible modelling using the generalized additive model (GAM) framework. Simulated data results show that the GAM adapted Q-learning typically outperforms Q-learning with linear models and other frequently-used methods based on propensity scores in terms of coverage and bias/MSE. This represents a promising step toward a more fully general Q-learning approach to estimating optimal dynamic treatment regimes.