
BAGEL: A non-ignorable missing value estimation method for mixed attribute datasets

Published in Sādhanā

Abstract

Surveys are conducted mainly to obtain valuable information on specific criteria from a defined population. However, survey results often become biased when subjects do not respond on highly significant attributes. Such non-ignorable missingness needs to be treated, and the actual values should be recovered. Many methods have been proposed for handling missing values in either discrete or continuous attributes, but a large gap remains in handling non-ignorable missing values in datasets with mixed attributes. To address this gap, this paper proposes a methodology called Bayesian Genetic Algorithm (BAGEL), which hybridizes Bayesian and genetic algorithm principles. In BAGEL, the initial population is generated using a Bayesian model, and the fitness values of the chromosomes are evaluated using Bayesian principles. BAGEL is applied to real datasets to impute both discrete and continuous missing values, and its imputation accuracy is measured. The experimental results show that BAGEL performs better than other standard imputation techniques. Statistical tests conducted to validate the experimental results also confirm that BAGEL outperforms these techniques at all missing rates from 5% to 50%.
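The abstract describes BAGEL only at a high level: a genetic algorithm whose initial population comes from a Bayesian model and whose chromosome fitness is scored with Bayesian principles. The sketch below illustrates that general shape under stated assumptions and is not the authors' implementation. Every modelling choice is an assumption: a Dirichlet(1, ..., 1) prior for discrete columns, a plug-in Gaussian for continuous columns, truncation selection, uniform crossover, mutation by resampling, and the hypothetical function names `column_models`, `sample`, `log_score` and `bagel_like_impute`. Each chromosome holds one candidate value per missing cell (marked `None`).

```python
import math
import random
from collections import Counter


def column_models(rows, kinds):
    """Fit one simple predictive model per column (assumed, not BAGEL's own).

    Assumes every column has at least one observed (non-None) value.
    """
    models = []
    for j, kind in enumerate(kinds):
        observed = [r[j] for r in rows if r[j] is not None]
        if kind == "discrete":
            counts = Counter(observed)
            cats = sorted(counts)
            # Dirichlet(1, ..., 1) prior -> posterior predictive (count + 1) / (n + K)
            total = len(observed) + len(cats)
            models.append(("discrete", {c: (counts[c] + 1) / total for c in cats}))
        else:
            # Plug-in Gaussian (sample mean and standard deviation), a simplification
            mu = sum(observed) / len(observed)
            var = sum((x - mu) ** 2 for x in observed) / max(len(observed) - 1, 1)
            models.append(("continuous", (mu, math.sqrt(var) or 1.0)))
    return models


def sample(model, rng):
    """Draw one candidate value from a column model."""
    kind, params = model
    if kind == "discrete":
        cats = list(params)
        return rng.choices(cats, weights=[params[c] for c in cats])[0]
    mu, sd = params
    return rng.gauss(mu, sd)


def log_score(model, value):
    """Log predictive probability (discrete) or log Gaussian density (continuous)."""
    kind, params = model
    if kind == "discrete":
        return math.log(params[value])
    mu, sd = params
    return -0.5 * ((value - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def bagel_like_impute(rows, kinds, pop_size=30, generations=50,
                      mutation_rate=0.1, seed=0):
    """Hypothetical GA imputer: one gene per missing cell. Needs pop_size >= 2."""
    rng = random.Random(seed)
    models = column_models(rows, kinds)
    cells = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v is None]
    if not cells:
        return [list(r) for r in rows]

    def fitness(chrom):
        # Sum of per-column log scores of the imputed values
        return sum(log_score(models[j], v) for (_, j), v in zip(cells, chrom))

    # Initial population: every gene sampled from its column's predictive model
    pop = [[sample(models[j], rng) for _, j in cells] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: max(2, pop_size // 2)]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover, then mutation by resampling from the column model
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            for k, (_, j) in enumerate(cells):
                if rng.random() < mutation_rate:
                    child[k] = sample(models[j], rng)
            children.append(child)
        pop = elite + children

    best = max(pop, key=fitness)
    completed = [list(r) for r in rows]  # copy so the input is not mutated
    for (i, j), v in zip(cells, best):
        completed[i][j] = v
    return completed


if __name__ == "__main__":
    data = [["a", 1.0], ["a", 1.2], ["b", None], [None, 0.9], ["a", 1.1]]
    print(bagel_like_impute(data, ["discrete", "continuous"]))
```

Because this fitness scores each imputed value only against its own column's model, the sketch ignores dependencies between attributes and does not model the non-ignorable missingness mechanism itself; a faithful version would need both.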





Author information


Correspondence to R Devi Priya.


About this article


Cite this article

Devi Priya, R., Kuppuswami, S. & Sivaraj, R. BAGEL: A non-ignorable missing value estimation method for mixed attribute datasets. Sādhanā 41, 825–836 (2016). https://doi.org/10.1007/s12046-016-0526-3


