Abstract
Missing values are a common cause of poor data quality. When not handled properly, they can interrupt data pipelines and have a disastrous impact on data mining, machine learning (ML), and statistical applications. To draw reliable and accurate inferences from data, missing values must be treated correctly. Adopting any imputation technique requires a thorough understanding of the assumptions and rules the technique relies on. Most earlier reviews cover statistical and ML techniques, but none of them reviews and discusses single imputation (SI) and multiple imputation (MI) techniques in detail. This paper reviews the SI and MI techniques for handling missing values, giving researchers an overview and motivating them to use these techniques in their own studies.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sethia, K., Gosain, A., Singh, J. (2023). Review of Single Imputation and Multiple Imputation Techniques for Handling Missing Values. In: Noor, A., Saroha, K., Pricop, E., Sen, A., Trivedi, G. (eds) Proceedings of Third Emerging Trends and Technologies on Intelligent Systems. ETTIS 2023. Lecture Notes in Networks and Systems, vol 730. Springer, Singapore. https://doi.org/10.1007/978-981-99-3963-3_4
DOI: https://doi.org/10.1007/978-981-99-3963-3_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3962-6
Online ISBN: 978-981-99-3963-3
eBook Packages: Intelligent Technologies and Robotics (R0)