Abstract
In longitudinal studies, outcomes are measured repeatedly over time, and it is common that not all patients are measured throughout the study. For example, patients can be lost to follow-up (monotone missingness) or miss one or more visits (non-monotone missingness); in either case some outcomes are missing. In the longitudinal setting, missingness is often assumed to depend on the unobserved data, in which case it is non-ignorable. Pattern-mixture models (PMM) analyze the joint distribution of the outcome and the patterns of missingness in longitudinal data with non-ignorable non-monotone missingness. Existing methods employ a PMM and impute the unobserved outcomes using the distribution of the observed outcomes, conditional on the missingness pattern. We extend these methods using latent class analysis (LCA) and a shared-parameter PMM. The LCA groups patterns of missingness with similar features, and the shared-parameter PMM allows a subset of parameters to differ between latent classes when fitting the model. We also propose a method for imputation that uses the distribution of the observed data conditional on latent class. Our model improves on existing methods by accommodating data with small sample sizes. In a simulation study, our estimator had smaller mean squared error than existing methods. Our methodology is applied to data from a phase II clinical trial that studies the quality of life of patients with prostate cancer receiving radiation therapy.
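The two ideas in the abstract, grouping missingness patterns into latent classes and imputing conditional on class, can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the Bayesian shared-parameter PMM with a plain EM fit of a latent class model on the binary observed/missing indicators, and it imputes by drawing from a normal distribution fitted to the observed outcomes at the same visit and class. The function names `fit_lca` and `impute_by_class`, the toy data, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def fit_lca(R, n_classes=2, n_iter=200, seed=0):
    """EM for a latent class model on binary indicators R (n x T, 1 = observed)."""
    rng = np.random.default_rng(seed)
    n, T = R.shape
    pi = np.full(n_classes, 1.0 / n_classes)          # class weights
    theta = rng.uniform(0.25, 0.75, (n_classes, T))   # P(observed | class, visit)
    for _ in range(n_iter):
        # E-step: posterior class probabilities, computed on the log scale
        log_lik = R @ np.log(theta).T + (1 - R) @ np.log(1 - theta).T + np.log(pi)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        post = np.exp(log_lik)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class weights and per-class observation probabilities
        pi = np.clip(post.mean(axis=0), 1e-12, None)
        theta = np.clip((post.T @ R) / post.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return post.argmax(axis=1), pi, theta

def impute_by_class(Y, classes, seed=0):
    """Fill NaNs with draws from N(mean, sd) of observed values at the same visit and class."""
    rng = np.random.default_rng(seed)
    Y_imp = Y.copy()
    for c in np.unique(classes):
        in_class = classes == c
        for t in range(Y.shape[1]):
            miss = in_class & np.isnan(Y[:, t])
            if not miss.any():
                continue
            obs = Y[in_class & ~np.isnan(Y[:, t]), t]
            if obs.size < 2:
                obs = Y[~np.isnan(Y[:, t]), t]        # fallback: pool across classes
            if obs.size >= 2:
                Y_imp[miss, t] = rng.normal(obs.mean(), obs.std(ddof=1), miss.sum())
    return Y_imp

# Toy data (hypothetical): class 1 misses more visits and has a higher mean outcome.
rng = np.random.default_rng(1)
n, T = 200, 4
true_class = rng.integers(0, 2, n)
p_obs = np.where(true_class[:, None] == 0, 0.9, 0.4)
R = (rng.uniform(size=(n, T)) < p_obs).astype(float)  # non-monotone by construction
Y = rng.normal(50 + 5 * true_class[:, None], 10, (n, T))
Y[R == 0] = np.nan

classes, pi, theta = fit_lca(R, n_classes=2)
Y_imp = impute_by_class(Y, classes)
```

Grouping on the indicators rather than on each distinct pattern is what helps with small samples: four visits already allow up to 16 patterns, some of which may contain only a handful of patients, whereas two latent classes pool them. A single draw per missing value is shown here; a real analysis would repeat the draws and combine estimates across imputed data sets.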
Acknowledgements
Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under Award Number T32ES007334. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This research was also supported, in part, by NIH-NCI Cancer Center Support Grant P30 CA016059. The authors would like to acknowledge L. Alexis Hoeferlin for help with the language of the report.
About this article
Cite this article
Cao, Y., Mukhopadhyay, N.D. Statistical Modeling of Longitudinal Data with Non-Ignorable Non-Monotone Missingness with Semiparametric Bayesian and Machine Learning Components. Sankhya B 83, 152–169 (2021). https://doi.org/10.1007/s13571-019-00222-w
DOI: https://doi.org/10.1007/s13571-019-00222-w
Keywords
- Missing data
- Non-ignorable missingness
- Non-monotone missingness
- Bayesian nonparametric analysis
- Imputation