Abstract
In the realm of data mining, missing values pose significant challenges that can undermine the accuracy and reliability of analytical outcomes. This study addresses the critical task of handling missing values to reduce the risk of ambiguous results in data mining processes. Recognizing the pivotal role of complete and accurate data in generating meaningful insights, the article explores several approaches for handling missing values, including omission, imputation, interpolation, and model-based techniques, and offers guidance on selecting the most appropriate strategy based on contextual factors. The study also examines the potential of model-based imputation and its variants, and highlights the nuanced process of model selection along with its advantages and disadvantages. By providing an accessible framework that integrates both traditional and innovative methodologies, the study contributes to a holistic understanding of how to mitigate the impact of missing values.
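To make the four strategy families named above concrete, here is a minimal sketch on a toy numeric series, using only the Python standard library. It is an illustration under stated assumptions, not the authors' framework: the function names and the toy data are hypothetical, `None` is assumed to mark a missing value, mean substitution stands in for "imputation", and the "model-based" case is a simple least-squares regression imputation chosen for illustration. Filling leading or trailing gaps with the nearest observed value during interpolation is also an assumption.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# Hypothetical convention for this sketch: None marks a missing value.
Series = List[Optional[float]]


def omit(rows: Sequence[Tuple]) -> List[Tuple]:
    """Omission (listwise deletion): drop every record that has a missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(col: Series) -> List[float]:
    """Imputation: replace each missing value with the mean of the observed values."""
    m = mean(v for v in col if v is not None)
    return [m if v is None else v for v in col]


def linear_interpolate(col: Series) -> List[float]:
    """Interpolation: fill each gap on the straight line between its nearest
    observed neighbours. Assumption: gaps at either end copy the nearest
    observed value, because there is no second point to draw a line through."""
    known = [i for i, v in enumerate(col) if v is not None]
    out: List[float] = []
    for i, v in enumerate(col):
        if v is not None:
            out.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(col[right])
        elif right is None:
            out.append(col[left])
        else:
            t = (i - left) / (right - left)
            out.append(col[left] + t * (col[right] - col[left]))
    return out


def regression_impute(x: Sequence[float], y: Series) -> List[float]:
    """Model-based imputation (illustrative choice): fit y = a + b*x by ordinary
    least squares on the complete pairs, then predict each missing y from its x."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    b = sxy / sxx
    a = my - b * mx
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2.0, None, 6.0, None, 10.0]
    print(omit(list(zip(x, y))))     # [(1, 2.0), (3, 6.0), (5, 10.0)]
    print(mean_impute(y))            # [2.0, 6.0, 6.0, 6.0, 10.0]
    print(linear_interpolate(y))     # [2.0, 4.0, 6.0, 8.0, 10.0]
    print(regression_impute(x, y))   # [2.0, 4.0, 6.0, 8.0, 10.0]
```

The toy output hints at the trade-offs involved in choosing a strategy. Omission discards two of the five records. Mean imputation keeps every record but flattens the trend, which shrinks the variance. Interpolation assumes the values are ordered, as in a time series. Regression imputation relies on a related, fully observed variable.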
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pareshbhai, B.V., Buch, S.H. (2024). Diminishing Unclear Consequences of Missing Values in Data Mining. In: Joshi, A., Mahmud, M., Ragel, R.G., Karthik, S. (eds) ICT: Innovation and Computing. ICTCS 2023. Lecture Notes in Networks and Systems, vol 879. Springer, Singapore. https://doi.org/10.1007/978-981-99-9486-1_21
DOI: https://doi.org/10.1007/978-981-99-9486-1_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9485-4
Online ISBN: 978-981-99-9486-1
eBook Packages: Engineering, Engineering (R0)