Review of Single Imputation and Multiple Imputation Techniques for Handling Missing Values

  • Conference paper
Proceedings of Third Emerging Trends and Technologies on Intelligent Systems (ETTIS 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 730)

Abstract

Missing values are a common cause of poor data quality. When not handled properly, they can interrupt data pipelines and have a disastrous impact on data mining, machine learning (ML), and statistical applications. To draw reliable and accurate inferences from data, missing values must be treated correctly. Adopting any imputation technique requires a thorough understanding of the assumptions and rules that the technique follows. Most earlier reviews focus on statistical and ML techniques, but none of them discusses single imputation (SI) and multiple imputation (MI) techniques in detail. This paper reviews the SI and MI techniques for handling missing values, giving researchers an overview and motivating them to use these techniques in their own research.
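The difference between SI and MI can be illustrated with a minimal sketch. It is not taken from the paper: the toy data, the function names, and the choice of a normal model for the draws are all assumptions made for illustration. Single imputation fills each gap once with the observed mean, while multiple imputation creates several completed datasets and pools the estimates with Rubin's rules, so the reported variance also reflects the uncertainty introduced by imputing. Note that the stochastic draw below ignores uncertainty in the fitted mean and standard deviation, so it is a simplification rather than a fully proper MI procedure.

```python
import random
import statistics


def mean_impute(values):
    """Single imputation: replace each missing value (None) with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def stochastic_impute(values, rng):
    """One completed dataset: draw each missing value from a normal distribution
    fitted to the observed values (an illustrative assumption, not a proper
    posterior draw)."""
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    return [rng.gauss(mu, sigma) if v is None else v for v in values]


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between    # total variance of the pooled estimate
    return q_bar, total


def multiple_impute_mean(values, m=20, seed=0):
    """Estimate the mean of `values` from m stochastically completed datasets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = stochastic_impute(values, rng)
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


# Hypothetical toy data; None marks a missing value.
data = [4.1, None, 5.3, 6.0, None, 4.8, 5.5, None, 5.1, 4.6]

si_completed = mean_impute(data)
si_mean = statistics.mean(si_completed)
si_var = statistics.variance(si_completed) / len(si_completed)

mi_mean, mi_var = multiple_impute_mean(data)

print(f"SI: mean={si_mean:.3f}, variance of mean={si_var:.4f}")
print(f"MI: mean={mi_mean:.3f}, variance of mean={mi_var:.4f}")
```

In this sketch the SI variance is computed as if the filled-in values had been observed, so it understates the uncertainty; the pooled MI variance is larger because it adds the between-imputation component.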

Corresponding author

Correspondence to Kavita Sethia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sethia, K., Gosain, A., Singh, J. (2023). Review of Single Imputation and Multiple Imputation Techniques for Handling Missing Values. In: Noor, A., Saroha, K., Pricop, E., Sen, A., Trivedi, G. (eds) Proceedings of Third Emerging Trends and Technologies on Intelligent Systems. ETTIS 2023. Lecture Notes in Networks and Systems, vol 730. Springer, Singapore. https://doi.org/10.1007/978-981-99-3963-3_4
