Centroid-Based Classification of Categorical Data

  • Conference paper
Web-Age Information Management (WAIM 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8485)

Abstract

Traditional centroid-based classifiers cannot be applied directly to categorical data, for two reasons: the centroid of a categorical class is undefined, and an effective distance measure for categorical objects is lacking. This paper proposes two centroid-based classifiers for categorical data. A new formulation of the centroid of a categorical class addresses the first problem, and two weighted distance measures address the second. Experiments on real-world data sets show the effectiveness of the proposed methods.
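The paper's own centroid formulation and weighted distance measures are not given in this abstract, so the following is only a generic sketch of the idea, not the authors' method. It assumes that a class centroid can be represented, per attribute, by the relative frequency of each category, and that the distance is a weighted squared difference between an object's one-hot encoding and those frequencies. The function names (`fit_centroids`, `distance`, `predict`), the toy data, and the uniform weights are all hypothetical.

```python
from collections import Counter


def fit_centroids(rows, labels):
    """Build one frequency-based 'centroid' per class.

    Assumption (not from the paper): the centroid of a class is, for each
    attribute, the relative frequency of every category in that class.
    """
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    centroids = {}
    for label, members in by_class.items():
        n = len(members)
        centroids[label] = [
            {cat: count / n for cat, count in Counter(column).items()}
            for column in zip(*members)  # iterate attribute by attribute
        ]
    return centroids


def distance(row, centroid, weights):
    """Weighted squared distance between a one-hot object and a centroid."""
    total = 0.0
    for value, freqs, weight in zip(row, centroid, weights):
        # Categories the object does not take contribute (0 - f)^2.
        d = sum(f * f for cat, f in freqs.items() if cat != value)
        # The category the object does take contributes (1 - f)^2.
        d += (1.0 - freqs.get(value, 0.0)) ** 2
        total += weight * d
    return total


def predict(row, centroids, weights):
    """Assign the class whose centroid is nearest."""
    return min(centroids, key=lambda label: distance(row, centroids[label], weights))


# Toy data, invented for illustration only.
rows = [("red", "small"), ("red", "large"), ("blue", "large"), ("blue", "large")]
labels = ["a", "a", "b", "b"]
centroids = fit_centroids(rows, labels)
print(predict(("red", "large"), centroids, weights=[1.0, 1.0]))  # a
```

Uniform weights are used here purely for simplicity; the abstract indicates that the proposed classifiers rely on weighted distance measures, whose definitions would replace the `weights` argument.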

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, L., Guo, G. (2014). Centroid-Based Classification of Categorical Data. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_50

  • DOI: https://doi.org/10.1007/978-3-319-08010-9_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08009-3

  • Online ISBN: 978-3-319-08010-9

  • eBook Packages: Computer Science, Computer Science (R0)
