Missing value imputation: a review and analysis of the literature (2006–2017)
Missing value imputation (MVI) has been studied for several decades as the basic solution to incomplete dataset problems, specifically those in which some data samples contain one or more missing attribute values. This paper reviews and analyzes related studies carried out in recent decades from the perspective of experimental design. Altogether, 111 journal papers published from 2006 to 2017 are reviewed and analyzed. In addition, several technical issues encountered during the MVI process are discussed, such as the choice of datasets, the missing rates and missingness mechanisms, and the MVI techniques and evaluation metrics employed. The analysis of these issues identifies limitations in the existing body of literature, from which some directions for future research can be drawn.
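The experimental design discussed above can be illustrated with a small sketch. A common setup, assumed here rather than taken from any specific reviewed paper, starts from a complete dataset, removes values at a chosen missing rate, imputes them, and scores the imputed values against the held-out originals. This sketch assumes the missing completely at random (MCAR) mechanism, column-mean imputation as the MVI technique, and root-mean-square error (RMSE) on the removed cells as the evaluation metric. The function names `inject_mcar`, `mean_impute`, and `rmse_on_masked` are hypothetical. Simulating MAR or MNAR would require making the removal probability depend on observed or on the removed values themselves, which is not shown.

```python
import math
import random


def inject_mcar(data, missing_rate, seed=0):
    """Hypothetical helper: hide a fraction of cells uniformly at random (MCAR).

    Returns the incomplete copy (hidden cells set to None) and the set of
    (row, column) positions that were hidden.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    n_missing = round(missing_rate * len(cells))
    masked = set(rng.sample(cells, n_missing))
    incomplete = [
        [None if (i, j) in masked else v for j, v in enumerate(row)]
        for i, row in enumerate(data)
    ]
    return incomplete, masked


def mean_impute(incomplete):
    """Replace each None with the mean of the observed values in its column.

    Assumption: a column with no observed values is filled with 0.0.
    """
    means = []
    for j in range(len(incomplete[0])):
        observed = [row[j] for row in incomplete if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in incomplete]


def rmse_on_masked(original, imputed, masked):
    """RMSE computed only over the cells that were hidden and then imputed."""
    if not masked:
        return 0.0
    se = sum((original[i][j] - imputed[i][j]) ** 2 for i, j in masked)
    return math.sqrt(se / len(masked))


if __name__ == "__main__":
    complete = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    # Vary the missing rate, as many of the reviewed experimental designs do.
    for rate in (0.1, 0.25, 0.5):
        incomplete, masked = inject_mcar(complete, rate, seed=42)
        imputed = mean_impute(incomplete)
        print(rate, len(masked), rmse_on_masked(complete, imputed, masked))
```

Because the complete dataset is known, every imputed value can be compared with its true value; swapping `mean_impute` for another technique while keeping the masking and the metric fixed gives a like-for-like comparison across methods and missing rates.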
Keywords: Missing values; Imputation; Supervised learning; Incomplete dataset; Data mining
The work of the first author was supported in part by the Healthy Aging Research Center, Chang Gung University, from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan under Grants EMRPD1I0481 and EMRPD1I0501, and in part by Chang Gung Memorial Hospital, Linkou, under Grant CMRPD3I0031. The research of the second author was supported by the Ministry of Science and Technology of Taiwan (MOST 105-2410-H-008-043-MY3).