Statistical Modeling of Longitudinal Data with Non-Ignorable Non-Monotone Missingness with Semiparametric Bayesian and Machine Learning Components

Sankhya B

Abstract

In longitudinal studies, outcomes are measured repeatedly over time, and it is common that not all patients are measured throughout the study. For example, patients can be lost to follow-up (monotone missingness) or miss one or more visits (non-monotone missingness), so some outcomes are missing. In the longitudinal setting, it is often assumed that the missingness is related to the unobserved data, in which case the missingness is non-ignorable. Pattern-mixture models (PMM) analyze the joint distribution of the outcome and the patterns of missingness in longitudinal data with non-ignorable non-monotone missingness. Existing methods employ PMM and impute the unobserved outcomes using the distribution of the observed outcomes, conditional on the missingness pattern. We extend the existing methods using latent class analysis (LCA) and a shared-parameter PMM. The LCA groups patterns of missingness with similar features, and the shared-parameter PMM allows a subset of parameters to differ between latent classes when fitting a model. We also propose an imputation method that uses the distribution of the observed data conditional on latent class. Our model improves on existing methods by accommodating data with small sample sizes. In a simulation study, our estimator had smaller mean squared error than existing methods. Our methodology is applied to data from a phase II clinical trial that studies the quality of life of patients with prostate cancer receiving radiation therapy.
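To make the distinction between monotone and non-monotone missingness concrete, the following minimal Python sketch classifies a single patient's visit record. It is an illustration only and not the authors' method: the function name `classify_pattern` and the boolean encoding (True where the outcome was measured) are assumptions introduced here.

```python
import numpy as np


def classify_pattern(observed):
    """Classify one subject's visit record (hypothetical helper).

    observed: sequence of booleans, True where the outcome was measured.
    Returns "complete" if every visit was observed, "monotone" if every
    visit after the first missed one is also missing (dropout), and
    "non-monotone" if an observed visit follows a missed one.
    """
    r = np.asarray(observed, dtype=bool)
    if r.all():
        return "complete"
    # argmin on a boolean array gives the index of the first False entry.
    first_missing = int(np.argmin(r))
    if not r[first_missing:].any():
        return "monotone"
    return "non-monotone"


# Toy visit records, invented for illustration.
records = {
    "A": [True, True, True, True],    # measured at every visit
    "B": [True, True, False, False],  # lost to follow-up after visit 2
    "C": [True, False, True, False],  # missed visit 2, returned at visit 3
}
for subject, visits in records.items():
    print(subject, classify_pattern(visits))
```

In a pattern-mixture analysis, records such as these define the missingness patterns on which the outcome distribution is conditioned; with many visits the number of distinct non-monotone patterns grows quickly, which is the motivation for grouping similar patterns into latent classes.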

Acknowledgements

Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under Award Number T32ES007334. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This research was also supported, in part, with funding from NIH-NCI Cancer Center Support Grant P30 CA016059. The authors would like to acknowledge L. Alexis Hoeferlin for help with the language of the report.

Author information

Corresponding author

Correspondence to Yu Cao.

About this article

Cite this article

Cao, Y., Mukhopadhyay, N.D. Statistical Modeling of Longitudinal Data with Non-Ignorable Non-Monotone Missingness with Semiparametric Bayesian and Machine Learning Components. Sankhya B 83, 152–169 (2021). https://doi.org/10.1007/s13571-019-00222-w

  • DOI: https://doi.org/10.1007/s13571-019-00222-w
