kDMI: A Novel Method for Missing Values Imputation Using Two Levels of Horizontal Partitioning in a Data set

  • Conference paper
Advanced Data Mining and Applications (ADMA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Abstract

Imputation of missing values is an important data mining task for improving the quality of data mining results. Imputation based on similar records is generally more accurate than imputation based on all records of a data set. Therefore, in this paper we present a novel algorithm called kDMI that employs two levels of horizontal partitioning of a data set (based on a decision tree and the k-NN algorithm) in order to find the records that are very similar to the one with missing values. Additionally, it uses a novel approach to automatically find the value of k for each record. We evaluate the performance of kDMI against three high-quality existing methods on two real data sets in terms of four evaluation criteria. Our initial experimental results, including 95% confidence interval analysis and statistical t-test analysis, indicate the superiority of kDMI over the existing methods.
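
The two-level idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the details, so several pieces here are assumptions. A single threshold split stands in for the decision tree leaves (level 1), a plain neighbour mean stands in for the final imputation step, and the per-record choice of k is an assumed heuristic that hides one known value of the record and keeps the k that recovers it best. All function and parameter names (`leaf_of`, `choose_k`, `impute`, `split_attr`, `threshold`, `max_k`) are hypothetical.

```python
import math


def leaf_of(record, split_attr, threshold):
    # Level 1 stand-in (assumption): a one-split "tree" assigns a leaf id.
    return 0 if record[split_attr] <= threshold else 1


def distance(a, b, attrs):
    # Euclidean distance restricted to the attributes listed in attrs.
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in attrs))


def knn_mean(target, pool, k, attrs, value_attr):
    # Level 2: mean of value_attr over the k records nearest to target.
    nearest = sorted(pool, key=lambda r: distance(target, r, attrs))[:k]
    return sum(r[value_attr] for r in nearest) / len(nearest)


def choose_k(target, pool, attrs, probe_attr, max_k):
    # Assumed heuristic: hide a known value of the target record and keep
    # the k whose neighbour mean comes closest to that known value.
    other = [j for j in attrs if j != probe_attr]
    best_k, best_err = 1, float("inf")
    for k in range(1, min(max_k, len(pool)) + 1):
        err = abs(knn_mean(target, pool, k, other, probe_attr) - target[probe_attr])
        if err < best_err:
            best_k, best_err = k, err
    return best_k


def impute(records, idx, missing_attr, split_attr, threshold, max_k=5):
    target = records[idx]
    known = [j for j in range(len(target)) if j != missing_attr]
    leaf = leaf_of(target, split_attr, threshold)
    # Keep only complete records that fall in the same leaf as the target.
    pool = [
        r for i, r in enumerate(records)
        if i != idx
        and all(v is not None for v in r)
        and leaf_of(r, split_attr, threshold) == leaf
    ]
    if not pool:
        raise ValueError("no complete records share the target's leaf")
    k = choose_k(target, pool, known, known[0], max_k)
    return knn_mean(target, pool, k, known, missing_attr)


# Toy data: two well-separated groups; the last record lacks attribute 2.
records = [
    [1.0, 1.0, 10.0],
    [1.1, 0.9, 11.0],
    [0.9, 1.2, 9.0],
    [5.0, 5.0, 50.0],
    [5.2, 4.8, 52.0],
    [1.0, 1.1, None],
]
value = impute(records, idx=5, missing_attr=2, split_attr=0, threshold=3.0)
print(value)
```

In the toy data, the level 1 split keeps the two records with values near 50 out of the candidate pool, so the imputed value is drawn only from records that resemble the incomplete one.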

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rahman, M.G., Islam, M.Z. (2013). kDMI: A Novel Method for Missing Values Imputation Using Two Levels of Horizontal Partitioning in a Data set. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_23

  • DOI: https://doi.org/10.1007/978-3-642-53917-6_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53916-9

  • Online ISBN: 978-3-642-53917-6

  • eBook Packages: Computer Science, Computer Science (R0)
