kDMI: A Novel Method for Missing Values Imputation Using Two Levels of Horizontal Partitioning in a Data set

Rahman, Md. Geaur; Islam, Md Zahidul

doi:10.1007/978-3-642-53917-6_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Included in the conference series: International Conference on Advanced Data Mining and Applications

Abstract

Imputation of missing values is an important data mining task for improving the quality of data mining results. Imputation based on similar records is generally more accurate than imputation based on all records of a data set. Therefore, in this paper we present a novel algorithm called kDMI that employs two levels of horizontal partitioning of a data set (based on a decision tree and the k-NN algorithm) in order to find the records that are very similar to the one with one or more missing values. Additionally, it uses a novel approach to automatically find the value of k for each record. We evaluate the performance of kDMI against three high-quality existing methods on two real data sets in terms of four evaluation criteria. Our initial experimental results, including 95% confidence interval analysis and statistical t-test analysis, indicate the superiority of kDMI over the existing methods.
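The two-level idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything beyond "partition by a tree, then by k-NN, with k chosen per record" is an assumption made for illustration: the tree is replaced by a single hypothetical threshold split (`leaf_of`), the imputed value is a plain mean over the neighbours, and k is chosen by hiding one known value of the record (`probe_attr`) and keeping the k that recovers it best. The abstract does not say how kDMI selects k or computes the final estimate, and all function names and the toy data are invented here.

```python
import math

def leaf_of(record, split_attr, threshold):
    """Level 1 stand-in (assumption): a one-split 'tree' assigning a leaf id."""
    return 0 if record[split_attr] <= threshold else 1

def distance(a, b, skip):
    """Euclidean distance over every attribute index not listed in `skip`."""
    return math.sqrt(sum((a[i] - b[i]) ** 2
                         for i in range(len(a)) if i not in skip))

def knn_mean(target, pool, attr, k, skip):
    """Mean of `attr` over the k records in `pool` closest to `target`."""
    nearest = sorted(pool, key=lambda r: distance(target, r, skip))[:k]
    return sum(r[attr] for r in nearest) / len(nearest)

def choose_k(target, pool, missing_attr, probe_attr):
    """Hypothetical per-record k selection: hide the known value at
    `probe_attr` and keep the k whose neighbour mean recovers it best."""
    skip = {missing_attr, probe_attr}
    best_k, best_err = 1, float("inf")
    for k in range(1, len(pool) + 1):
        estimate = knn_mean(target, pool, probe_attr, k, skip)
        err = abs(estimate - target[probe_attr])
        if err < best_err:
            best_k, best_err = k, err
    return best_k

def impute(target, complete, missing_attr, split_attr, threshold, probe_attr):
    """Level 1: keep only records in the target's leaf.
    Level 2: impute from the k nearest records inside that leaf."""
    leaf = leaf_of(target, split_attr, threshold)
    pool = [r for r in complete if leaf_of(r, split_attr, threshold) == leaf]
    k = choose_k(target, pool, missing_attr, probe_attr)
    return knn_mean(target, pool, missing_attr, k, {missing_attr})

# Toy data (invented): two clearly separated groups of complete records.
complete = [
    [1.0, 10.0, 100.0],
    [1.2, 11.0, 110.0],
    [0.9, 9.5, 95.0],
    [5.0, 50.0, 500.0],
    [5.5, 52.0, 520.0],
]
target = [1.1, 10.5, None]  # attribute 2 is missing

imputed = impute(target, complete, missing_attr=2,
                 split_attr=0, threshold=3.0, probe_attr=1)
print(imputed)
```

On this toy data the record falls into the leaf holding the first three records, so the two distant records never influence the estimate. That is the intuition the abstract gives for preferring similar records over the whole data set.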

Author information

Authors and Affiliations

Center for Research in Complex Systems, School of Computing and Mathematics, Charles Sturt University, Bathurst, NSW, 2795, Australia
Md. Geaur Rahman & Md Zahidul Islam

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, Edmonton, University of Alberta, T6G 2E8, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

Cite this paper

Rahman, M.G., Islam, M.Z. (2013). kDMI: A Novel Method for Missing Values Imputation Using Two Levels of Horizontal Partitioning in a Data set. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_23

DOI: https://doi.org/10.1007/978-3-642-53917-6_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science, Computer Science (R0)
