ITCI: An Information Theory Based Classification Algorithm for Incomplete Data

  • Conference paper
Web-Age Information Management (WAIM 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8485)

Abstract

Classification is a widely studied problem in data mining. However, most existing studies assume that the data to be classified is complete, whereas in practice much data contains missing values. When handling such data, deleting the incomplete instances reduces the available information, and filling in missing values may introduce skew and errors. To avoid these problems, it is important to study how to classify incomplete data directly. This paper proposes ITCI, an information theory based classification algorithm. In the training stage, ITCI calculates the initial uncertainty of each class and each attribute's contribution to decreasing class uncertainty; in the testing stage, an instance is assigned to the class whose uncertainty is minimal after all of the attributes have been taken into consideration. Extensive experiments demonstrate the effectiveness and feasibility of the proposed method.
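
The abstract does not give ITCI's formulas, so the following is only a minimal sketch of the two stages it describes, under assumptions that are not taken from the paper: the initial uncertainty of a class is modelled as its self-information, -log2 P(c); an attribute's contribution is modelled as the pointwise mutual information between the observed value and the class; and missing values (marked here with a hypothetical `MISSING = None`) are simply skipped. The function names `train` and `classify` are illustrative.

```python
import math
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing attribute value


def train(instances, labels):
    """Collect class priors and per-attribute counts, skipping missing values."""
    class_counts = Counter(labels)
    total = len(labels)
    # Assumed "initial uncertainty" of class c: self-information -log2 P(c).
    initial = {c: -math.log2(k / total) for c, k in class_counts.items()}
    counts = defaultdict(Counter)    # counts[(j, v)][c]: class-c rows with attribute j == v
    observed = defaultdict(Counter)  # observed[j][c]: class-c rows where attribute j is present
    values = defaultdict(set)        # values[j]: distinct observed values of attribute j
    for x, c in zip(instances, labels):
        for j, v in enumerate(x):
            if v is MISSING:
                continue
            counts[(j, v)][c] += 1
            observed[j][c] += 1
            values[j].add(v)
    return initial, counts, observed, values


def classify(x, model):
    """Return the class with the lowest remaining uncertainty for instance x."""
    initial, counts, observed, values = model
    uncertainty = dict(initial)
    for j, v in enumerate(x):
        if v is MISSING or v not in values[j]:
            continue  # a missing or unseen value contributes nothing
        k = len(values[j])
        # Laplace-smoothed marginal P(v) over rows where attribute j is observed.
        p_v = (sum(counts[(j, v)].values()) + 1) / (sum(observed[j].values()) + k)
        for c in uncertainty:
            p_v_given_c = (counts[(j, v)][c] + 1) / (observed[j][c] + k)
            # Assumed contribution: pointwise mutual information, in bits.
            uncertainty[c] -= math.log2(p_v_given_c / p_v)
    return min(uncertainty, key=uncertainty.get)


# Toy data with missing values in both training and test instances.
train_x = [("sunny", "hot"), ("sunny", None), ("rainy", "cool"), (None, "cool")]
train_y = ["no", "no", "yes", "yes"]
model = train(train_x, train_y)
print(classify(("sunny", None), model))
```

Under these particular assumptions, choosing the class with minimal remaining uncertainty gives the same decision as a naive Bayes classifier restricted to the observed attributes; the measures used in the paper itself may differ.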

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, Y., Li, J., Luo, J. (2014). ITCI: An Information Theory Based Classification Algorithm for Incomplete Data. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_19

  • DOI: https://doi.org/10.1007/978-3-319-08010-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08009-3

  • Online ISBN: 978-3-319-08010-9

  • eBook Packages: Computer Science, Computer Science (R0)
