Diminishing Unclear Consequences of Missing Values in Data Mining

Pareshbhai, Bhathawala Vaishnavi; Buch, Sanjay H.

doi:10.1007/978-981-99-9486-1_21

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 879)

Included in the following conference series:

International Conference on Information and Communication Technology for Competitive Strategies

Abstract

In data mining, missing values pose significant challenges that can undermine the accuracy and reliability of analytical outcomes. This study addresses the critical task of handling missing values so as to reduce the risk of ambiguous results in data mining processes. Recognizing the pivotal role of complete and accurate data in generating meaningful insights, the article explores approaches for handling missing values, including omission, imputation, interpolation, and model-based techniques, and offers guidance on selecting the most appropriate strategy based on contextual factors. The study also examines the potential of model-based imputation and its variants, and highlights the nuanced process of model selection together with its advantages and drawbacks. By providing a framework accessible to non-specialists that integrates both traditional and innovative methodologies, the study contributes to a holistic understanding of how to mitigate the impact of missing values.
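The four families of strategies named in the abstract (omission, imputation, interpolation, and model-based techniques) can be contrasted on a toy series. The following is a minimal Python sketch, not the authors' framework: the data, the function names (`omit`, `mean_impute`, `linear_interpolate`, `regression_impute`) and the use of a simple least-squares line as the model-based method are illustrative assumptions.

```python
from statistics import mean

# Toy data (hypothetical, not from the chapter): paired observations (x, y),
# where None marks a missing y value.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, None, 6.2, None, 10.1, 11.9]


def omit(xs, ys):
    """Omission (listwise deletion): keep only the complete (x, y) pairs."""
    return [(x, y) for x, y in zip(xs, ys) if y is not None]


def mean_impute(ys):
    """Imputation: replace every missing y with the mean of the observed y values."""
    observed_mean = mean(y for y in ys if y is not None)
    return [observed_mean if y is None else y for y in ys]


def linear_interpolate(xs, ys):
    """Interpolation: fill interior gaps on the straight line between the nearest
    observed neighbours. Gaps before the first or after the last observation
    have no neighbour on one side and are left as None."""
    filled = list(ys)
    known = [i for i, y in enumerate(ys) if y is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (xs[i] - xs[left]) / (xs[right] - xs[left])
            filled[i] = ys[left] + t * (ys[right] - ys[left])
    return filled


def regression_impute(xs, ys):
    """Model-based imputation: fit y = a + b*x by ordinary least squares on the
    complete pairs, then predict each missing y from its x.
    Assumes at least two complete pairs with distinct x values."""
    pairs = omit(xs, ys)
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    a = my - b * mx
    return [a + b * x if y is None else y for x, y in zip(xs, ys)]


if __name__ == "__main__":
    print("omission:     ", omit(xs, ys))
    print("mean:         ", mean_impute(ys))
    print("interpolation:", linear_interpolate(xs, ys))
    print("regression:   ", regression_impute(xs, ys))
```

Running the script prints the four results side by side. Omission shrinks the data to four rows, and mean imputation fills both gaps with the same value regardless of x. Interpolation and the regression model both use the relationship between x and y, so their gap estimates follow the trend; in practice the least-squares line would be replaced by whatever model suits the data.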

Author information

Authors and Affiliations

Department of Computer Science, Bhagwan Mahavir Center for Advance Research, Bhagwan Mahavir University, Surat, India
Bhathawala Vaishnavi Pareshbhai & Sanjay H. Buch

Corresponding author

Correspondence to Bhathawala Vaishnavi Pareshbhai.

Editor information

Editors and Affiliations

Global Knowledge Research Foundation, Ahmedabad, Gujarat, India
Amit Joshi
Nottingham Trent University, Nottingham, UK
Mufti Mahmud
University of Peradeniya, Kandy, Sri Lanka
Roshan G. Ragel
Department of Computer Science and Engineering, SNS College of Technology, Coimbatore, Tamil Nadu, India
S. Karthik

About this paper

Cite this paper

Pareshbhai, B.V., Buch, S.H. (2024). Diminishing Unclear Consequences of Missing Values in Data Mining. In: Joshi, A., Mahmud, M., Ragel, R.G., Karthik, S. (eds) ICT: Innovation and Computing. ICTCS 2023. Lecture Notes in Networks and Systems, vol 879. Springer, Singapore. https://doi.org/10.1007/978-981-99-9486-1_21

DOI: https://doi.org/10.1007/978-981-99-9486-1_21
Published: 18 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9485-4
Online ISBN: 978-981-99-9486-1
eBook Packages: Engineering, Engineering (R0)
