Diminishing Unclear Consequences of Missing Values in Data Mining

  • Conference paper
ICT: Innovation and Computing (ICTCS 2023)

Abstract

In data mining, missing values pose significant challenges that can undermine the accuracy and reliability of analytical outcomes. This study addresses the handling of missing values in order to reduce the risk of ambiguous results in data mining processes. Recognizing the pivotal role of complete and accurate data in generating meaningful insights, the article explores several approaches for handling missing values, including omission, imputation, interpolation, and model-based techniques, and offers guidance on selecting the most appropriate strategy based on contextual factors. The study also discusses the potential of model-based imputation and its variants, and highlights the nuanced process of model selection along with its pros and cons. By providing an accessible framework that integrates both traditional and innovative methodologies, the study contributes to a holistic understanding of how to mitigate the impact of missing values.
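
As a rough illustration of the four families of approaches named in the abstract, the following Python sketch applies each one to a small toy series. It is not taken from the paper: the data, the function names (`omit`, `impute_mean`, `interpolate_linear`, `impute_regression`), and the choice of mean imputation and simple least-squares regression as representative variants are all assumptions made here for illustration only.

```python
from statistics import mean

# Hypothetical toy data (not from the paper): None marks a missing value.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, None, 6.0, None, 10.0]

def omit(xs, ys):
    """Omission (listwise deletion): drop every pair whose y is missing."""
    return [(a, b) for a, b in zip(xs, ys) if b is not None]

def impute_mean(ys):
    """Imputation: replace each missing value with the mean of the observed values."""
    m = mean(v for v in ys if v is not None)
    return [m if v is None else v for v in ys]

def interpolate_linear(ys):
    """Interpolation: fill interior gaps on a straight line between the
    nearest observed neighbours (assumes evenly spaced observations)."""
    out = list(ys)
    known = [i for i, v in enumerate(ys) if v is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            frac = (i - lo) / (hi - lo)
            out[i] = ys[lo] + frac * (ys[hi] - ys[lo])
    return out  # leading or trailing gaps are left as None

def impute_regression(xs, ys):
    """Model-based imputation: fit y = intercept + slope * x by least squares
    on the complete pairs, then predict y where it is missing."""
    pairs = omit(xs, ys)
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * a if b is None else b for a, b in zip(xs, ys)]

print(omit(x, y))
print(impute_mean(y))
print(interpolate_linear(y))
print(impute_regression(x, y))
```

On this toy series, mean imputation flattens the two gaps to the overall mean, whereas interpolation and the regression model both follow the trend in the observed points. This is one simple way to see why the choice of strategy depends on context, such as whether the data are ordered or whether a correlated covariate is available.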

Author information


Corresponding author

Correspondence to Bhathawala Vaishnavi Pareshbhai.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pareshbhai, B.V., Buch, S.H. (2024). Diminishing Unclear Consequences of Missing Values in Data Mining. In: Joshi, A., Mahmud, M., Ragel, R.G., Karthik, S. (eds) ICT: Innovation and Computing. ICTCS 2023. Lecture Notes in Networks and Systems, vol 879. Springer, Singapore. https://doi.org/10.1007/978-981-99-9486-1_21

  • DOI: https://doi.org/10.1007/978-981-99-9486-1_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9485-4

  • Online ISBN: 978-981-99-9486-1

  • eBook Packages: Engineering, Engineering (R0)
