BAGEL: A non-ignorable missing value estimation method for mixed attribute datasets

Devi Priya, R; Kuppuswami, S; Sivaraj, R

doi:10.1007/s12046-016-0526-3

Published: 09 August 2016

Sādhanā, Volume 41, pages 825–836 (2016)

R Devi Priya¹,
S Kuppuswami² &
R Sivaraj³

Abstract

Surveys are conducted to obtain information on specified criteria from a target population. However, survey results often become biased when subjects do not respond on highly significant attributes. Such non-ignorable missingness needs to be treated, and the actual values should be recovered. Many methods have been proposed for handling missing values in either discrete or continuous attributes, but a large gap remains in handling non-ignorable missing values in datasets with mixed attributes. To address this gap, this paper proposes a methodology called Bayesian Genetic Algorithm (BAGEL), which hybridizes Bayesian and Genetic Algorithm principles. In BAGEL, the initial population is generated using a Bayesian model, and the fitness values of the chromosomes are evaluated using Bayesian principles. BAGEL is applied to real datasets to impute both discrete and continuous missing values, and the imputation accuracy is observed. The experimental results show that BAGEL performs better than other standard imputation techniques. Statistical tests conducted to validate the experimental results also indicate that BAGEL outperforms these techniques at all missing rates from 5% to 50%.
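
The abstract gives only the outline of the method, so the following is a minimal sketch of the general idea (a genetic algorithm whose initial population and fitness both come from a Bayesian model), not the authors' implementation. Everything specific is an assumption made for illustration: the toy data, the conjugate Normal and Dirichlet models, the hyperparameters `SIGMA`, `MU0`, `TAU0`, and the operators in `evolve`. The sketch also does not model the non-response mechanism itself, which a non-ignorable method would need to do.

```python
import math
import random

# Hypothetical observed values for two attributes (not data from the paper).
income_obs = [42.0, 38.5, 51.0, 45.2, 40.1, 47.3]    # continuous attribute
smoker_obs = ["no", "no", "yes", "no", "no", "yes"]  # discrete attribute

# Assumed models: Normal likelihood with known SIGMA and a Normal(MU0, TAU0^2)
# prior on the mean; Dirichlet(1, ..., 1) prior on the category probabilities.
SIGMA, MU0, TAU0 = 5.0, 40.0, 10.0
n = len(income_obs)
post_var = 1.0 / (1.0 / TAU0**2 + n / SIGMA**2)
post_mean = post_var * (MU0 / TAU0**2 + sum(income_obs) / SIGMA**2)
pred_sd = math.sqrt(post_var + SIGMA**2)  # posterior predictive std. dev.

cats = sorted(set(smoker_obs))
cat_prob = {c: (smoker_obs.count(c) + 1) / (len(smoker_obs) + len(cats))
            for c in cats}  # posterior predictive category probabilities


def sample_chromosome(rng):
    """Draw one candidate imputation (continuous, discrete) from the model."""
    value = rng.gauss(post_mean, pred_sd)
    label = rng.choices(cats, weights=[cat_prob[c] for c in cats])[0]
    return (value, label)


def fitness(chrom):
    """Log posterior predictive probability of a candidate imputation."""
    x, c = chrom
    log_norm = (-0.5 * ((x - post_mean) / pred_sd) ** 2
                - math.log(pred_sd * math.sqrt(2.0 * math.pi)))
    return log_norm + math.log(cat_prob[c])


def evolve(rng, pop_size=30, generations=40, mutation_rate=0.2):
    """Run a small GA; returns the best chromosome and the best initial fitness."""
    pop = [sample_chromosome(rng) for _ in range(pop_size)]
    best_initial = max(fitness(ch) for ch in pop)
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the current best
        while len(children) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            # Uniform crossover, gene by gene.
            child = (a[0] if rng.random() < 0.5 else b[0],
                     a[1] if rng.random() < 0.5 else b[1])
            # Mutation: perturb the continuous gene, resample the discrete one.
            if rng.random() < mutation_rate:
                child = (child[0] + rng.gauss(0.0, 1.0), child[1])
            if rng.random() < mutation_rate:
                child = (child[0], rng.choice(cats))
            children.append(child)
        pop = children
    return max(pop, key=fitness), best_initial


best, best_initial = evolve(random.Random(0))
print("imputed (income, smoker):", best, "log-fitness:", fitness(best))
```

Because the chromosome carries one continuous and one discrete gene, the same fitness function scores both attribute types together, which is the mixed-attribute setting the abstract refers to. Elitism keeps the best candidate from one generation to the next, so the best fitness never decreases.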

Author information

Authors and Affiliations

Department of Information Technology, Kongu Engineering College, Erode, Tamil Nadu, 638 052, India
R Devi Priya
Department of Computer Science and Engineering, Kongu Engineering College, Erode, Tamil Nadu, 638 052, India
S Kuppuswami
Department of Computer Science and Engineering, Velalar College of Engineering and Technology, Erode, Tamil Nadu, 638 012, India
R Sivaraj

Corresponding author

Correspondence to R Devi Priya.

About this article

Cite this article

Devi Priya, R., Kuppuswami, S. & Sivaraj, R. BAGEL: A non-ignorable missing value estimation method for mixed attribute datasets. Sādhanā 41, 825–836 (2016). https://doi.org/10.1007/s12046-016-0526-3

Received: 19 October 2015
Revised: 23 February 2016
Accepted: 12 March 2016
Published: 09 August 2016
Issue Date: August 2016
DOI: https://doi.org/10.1007/s12046-016-0526-3
