Data Reduction Method for Categorical Data Clustering

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 5290)

Abstract

Categorical data clustering is an important part of data mining and has recently drawn attention from several researchers. As a step in data mining, however, clustering faces the problem of processing large amounts of data. This article offers a solution for categorical clustering algorithms working with high volumes of data: a method that summarizes the database using a structure called the CM-tree. To test the method, the K-Modes and Click clustering algorithms were run on several databases. The experiments show that the proposed summarization method improves execution time without losing clustering quality.
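The general idea of clustering a summary instead of the raw records can be illustrated with a small sketch. The abstract does not describe how the CM-tree is built, so the code below is not the CM-tree: it substitutes a much simpler summary that collapses identical records into (record, count) pairs, then runs a count-weighted K-Modes on that summary. The function names (`summarize`, `mismatches`, `weighted_mode`, `k_modes`), the toy records, and the choice of initial modes are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def summarize(records):
    """Collapse identical records into (record, count) pairs.
    Assumption: a stand-in for a real summary structure, not the CM-tree."""
    counts = Counter(tuple(r) for r in records)
    return list(counts.items())


def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def weighted_mode(members):
    """Most frequent value per attribute, weighting each record by its count."""
    n_attrs = len(members[0][0])
    mode = []
    for j in range(n_attrs):
        tally = Counter()
        for rec, weight in members:
            tally[rec[j]] += weight
        # Sorting the candidate values first makes tie-breaking deterministic.
        mode.append(max(sorted(tally), key=tally.get))
    return tuple(mode)


def k_modes(summary, init_modes, max_iter=20):
    """K-Modes over (record, count) pairs, starting from the given modes."""
    modes = [tuple(m) for m in init_modes]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each summarized record goes to its nearest mode.
        clusters = [[] for _ in modes]
        for rec, weight in summary:
            nearest = min(range(len(modes)),
                          key=lambda i: mismatches(rec, modes[i]))
            clusters[nearest].append((rec, weight))
        # Update step: recompute each mode; an empty cluster keeps its mode.
        new_modes = [weighted_mode(c) if c else modes[i]
                     for i, c in enumerate(clusters)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, clusters


# Toy data (hypothetical): six records, four of them distinct.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "medium", "square"),
]
summary = summarize(records)
modes, clusters = k_modes(summary, init_modes=[records[0], records[3]])
```

Because each distance is computed once per distinct record rather than once per raw record, the assignment step shrinks with the amount of duplication in the data, while the counts keep the mode computation equivalent to running on the full database.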

Keywords

  • Categorical Attributes
  • K-modes Clustering Algorithm
  • Reduced database


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rendón, E., Sánchez, J.S., Garcia, R.A., Abundez, I., Gutierrez, C., Gasca, E. (2008). Data Reduction Method for Categorical Data Clustering. In: Geffner, H., Prada, R., Machado Alexandre, I., David, N. (eds) Advances in Artificial Intelligence – IBERAMIA 2008. IBERAMIA 2008. Lecture Notes in Computer Science, vol 5290. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88309-8_15

  • DOI: https://doi.org/10.1007/978-3-540-88309-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88308-1

  • Online ISBN: 978-3-540-88309-8

  • eBook Packages: Computer Science, Computer Science (R0)