Abstract
Surveys are mainly conducted to obtain valuable information on particular criteria from a specified population. However, survey results often become biased when the subjects under study do not respond on highly significant attributes. Such non-ignorable missingness must be handled and the actual values recovered. Many methods have been proposed for handling missing values in either discrete or continuous attributes, but there is a large gap in handling non-ignorable missing values in datasets with mixed attributes. To address this gap, this paper proposes a methodology called Bayesian Genetic Algorithm (BAGEL), which hybridizes Bayesian and genetic algorithm principles. In BAGEL, the initial population is generated using a Bayesian model, and the fitness values of the chromosomes are evaluated using Bayesian principles. BAGEL is applied to real datasets to impute both discrete and continuous missing values, and its imputation accuracy is measured. The experimental results show that BAGEL outperforms other standard imputation techniques, and statistical tests conducted to validate these results confirm that BAGEL performs best at all missing rates from 5% to 50%.
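The abstract only outlines how BAGEL combines the two ideas. As a rough illustration of that structure, and not the authors' algorithm, the sketch below imputes a single discrete attribute `y` conditioned on a fully observed attribute `x`. Each chromosome holds candidate values for the missing cells, the initial population is drawn from a Dirichlet-multinomial posterior predictive fitted to the observed rows, and fitness is the log marginal likelihood of the completed data. The Dirichlet prior, the conditioning attribute, the tournament selection, uniform crossover and mutation operators, and all function names are assumptions made for this example; the sketch also does not model the non-ignorable response mechanism itself.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: NOT the published BAGEL algorithm. It only mimics the
# structure described in the abstract (Bayesian initial population + Bayesian
# fitness) for one discrete attribute y conditioned on an observed attribute x.
CATEGORIES = ["a", "b"]  # assumed category set for y
ALPHA = 1.0              # assumed symmetric Dirichlet prior concentration


def posterior_predictive(data, x):
    """P(y = c | x) for each category c, from observed (non-missing) rows only."""
    counts = Counter(y for xi, y in data if xi == x and y is not None)
    n = sum(counts.values())
    k = len(CATEGORIES)
    return [(counts[c] + ALPHA) / (n + k * ALPHA) for c in CATEGORIES]


def log_marginal_likelihood(rows):
    """Dirichlet-multinomial log evidence of the completed data, summed over x groups."""
    groups = {}
    for x, y in rows:
        groups.setdefault(x, Counter())[y] += 1
    k = len(CATEGORIES)
    total = 0.0
    for counts in groups.values():
        n = sum(counts.values())
        total += math.lgamma(k * ALPHA) - math.lgamma(n + k * ALPHA)
        total += sum(math.lgamma(counts[c] + ALPHA) - math.lgamma(ALPHA)
                     for c in CATEGORIES)
    return total


def bagel_like_impute(data, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Return (missing indices, best chromosome, its fitness, best initial fitness)."""
    rng = random.Random(seed)
    missing_idx = [i for i, (_, y) in enumerate(data) if y is None]
    preds = {i: posterior_predictive(data, data[i][0]) for i in missing_idx}

    def sample_gene(i):
        # Draw an imputed value from the posterior predictive for row i.
        return rng.choices(CATEGORIES, weights=preds[i])[0]

    def fitness(chrom):
        # Fill the missing cells with the chromosome's genes and score the result.
        rows = list(data)
        for i, value in zip(missing_idx, chrom):
            rows[i] = (rows[i][0], value)
        return log_marginal_likelihood(rows)

    # Bayesian initial population: every gene sampled from the posterior predictive.
    population = [[sample_gene(i) for i in missing_idx] for _ in range(pop_size)]
    initial_best = max(fitness(c) for c in population)

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0]]  # elitism: the best chromosome survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) picks each parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover, then mutation by resampling a gene.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [sample_gene(i) if rng.random() < mutation_rate else g
                     for i, g in zip(missing_idx, child)]
            next_gen.append(child)
        population = next_gen

    best = max(population, key=fitness)
    return missing_idx, best, fitness(best), initial_best


# Usage: y is mostly "a" when x == 0 and mostly "b" when x == 1.
data = ([(0, "a")] * 8 + [(0, "b")] * 2 + [(1, "b")] * 7 + [(1, "a")] * 3
        + [(0, None), (0, None), (1, None), (1, None)])
missing_idx, best, best_fit, initial_best = bagel_like_impute(data)
print(missing_idx, best)
```

Because the Dirichlet-multinomial evidence favours assigning each missing cell to the dominant category within its `x` group, the search should settle on imputing "a" for the `x == 0` rows and "b" for the `x == 1` rows in this toy example. Extending the idea to continuous attributes or to an explicit non-ignorable response model would require choices the abstract does not specify.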
Cite this article
Devi Priya, R., Kuppuswami, S. & Sivaraj, R. BAGEL: A non-ignorable missing value estimation method for mixed attribute datasets. Sādhanā 41, 825–836 (2016). https://doi.org/10.1007/s12046-016-0526-3