Bayesian semiparametric modeling of response mechanism for nonignorable missing data

Abstract

Statistical inference with nonresponse is challenging, especially when the response mechanism is nonignorable. In this case, the validity of inference depends on correct specification of the response model, which cannot be tested from the observed data. To guard against such misspecification, we propose semiparametric Bayesian estimation in which the outcome model is parametric, but the response model is semiparametric in that we do not assume any parametric form for the nonresponse variable. We adopt penalized spline methods to estimate the unknown function. We also consider a fully nonparametric approach to modeling the response mechanism by using radial basis function methods. Using Pólya–gamma data augmentation, we develop an efficient posterior computation algorithm via Gibbs sampling in which most full conditional distributions can be obtained in familiar forms. The performance of the proposed method is demonstrated in simulation studies and an application to longitudinal data.
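
As a rough illustration of what a penalized-spline response model can look like, the sketch below writes the response probability as a logistic function of an unknown smooth function g(y), represented with a truncated linear basis. The basis choice, knot locations, coefficient values, and all function names are assumptions made for illustration; they are not taken from the article. The quadratic penalty on the knot coefficients plays the role that a Gaussian prior on those coefficients plays in a Bayesian formulation.

```python
import math

def truncated_linear_basis(y, knots):
    """Return [1, y, (y - k1)_+, ..., (y - kK)_+] for a scalar y."""
    return [1.0, y] + [max(y - k, 0.0) for k in knots]

def response_probability(y, coef, knots):
    """P(respond | y) = expit(g(y)), where g is a linear spline in y."""
    basis = truncated_linear_basis(y, knots)
    g = sum(b * c for b, c in zip(basis, coef))
    return 1.0 / (1.0 + math.exp(-g))

def spline_penalty(coef, lam):
    """Quadratic penalty on the knot coefficients only.

    The intercept and slope (first two entries) are left unpenalized.
    """
    return lam * sum(c * c for c in coef[2:])

# Hypothetical knots and coefficients, chosen only to exercise the functions.
knots = [-1.0, 0.0, 1.0]
coef = [0.5, -1.0, 0.8, 0.0, 0.4]  # intercept, slope, three knot coefficients

probs = [response_probability(y, coef, knots) for y in (-2.0, 0.0, 2.0)]
penalty = spline_penalty(coef, lam=2.0)
```

A larger `lam` shrinks the knot coefficients toward zero, which pulls g(y) toward a straight line and the response model toward an ordinary logistic regression in y.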

Acknowledgements

This work is supported by the Japan Society for the Promotion of Science (KAKENHI) grant numbers 18K12757 and 19K14592.

Author information

Corresponding author

Correspondence to Shonosuke Sugasawa.

About this article

Cite this article

Sugasawa, S., Morikawa, K. & Takahata, K. Bayesian semiparametric modeling of response mechanism for nonignorable missing data. TEST 31, 101–117 (2022). https://doi.org/10.1007/s11749-021-00774-y

  • DOI: https://doi.org/10.1007/s11749-021-00774-y

Keywords

  • Longitudinal data
  • Markov chain Monte Carlo
  • Multiple imputation
  • Pólya–gamma distribution
  • Penalized spline

Mathematics Subject Classification

  • 62D10
  • 62G08
  • 62F15