Part of the book series: Studies in Computational Intelligence ((SCI,volume 126))

Summary

Cluster ensembles provide a solution to challenges inherent to clustering arising from its ill-posed nature. In fact, cluster ensembles can find robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this chapter we focus on the design of ensembles for categorical data. Our techniques build upon diverse input clusterings discovered in random subspaces, and reduce the problem of defining a consensus function to a graph partitioning problem. We experimentally demonstrate the efficacy of our approach in combination with the categorical clustering algorithm COOLCAT.
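
As a rough illustration of the pipeline outlined above, the following Python sketch builds an ensemble from random attribute subsets and derives a consensus by cutting a co-association graph. It is not the chapter's method: the base clusterer is a trivial stand-in for COOLCAT (records that agree on every sampled attribute share a cluster), and thresholded connected components stand in for a real graph partitioner. All function names, the `threshold` parameter, and the toy data are assumptions made for this example.

```python
import random
from itertools import combinations


def subspace_clustering(data, attrs):
    """Stand-in base clusterer (NOT COOLCAT): records with identical
    values on the sampled attributes `attrs` get the same label."""
    seen = {}
    return [seen.setdefault(tuple(row[a] for a in attrs), len(seen))
            for row in data]


def co_association(partitions, n):
    """co[i][j] = fraction of partitions that put records i and j together."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus(co, threshold):
    """Stand-in for graph partitioning: drop edges with weight <= threshold
    and label the connected components of what remains."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and co[u][v] > threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def random_subspace_ensemble(data, n_members, subspace_size,
                             threshold=0.5, seed=0):
    """Cluster `data` in `n_members` random subspaces, then combine."""
    rng = random.Random(seed)
    n_attrs = len(data[0])
    partitions = [
        subspace_clustering(data, rng.sample(range(n_attrs), subspace_size))
        for _ in range(n_members)
    ]
    return consensus(co_association(partitions, len(data)), threshold)


if __name__ == "__main__":
    # Hypothetical toy records with four categorical attributes each.
    toy = [("a", "x", "p", "m"), ("a", "x", "p", "n"),
           ("b", "y", "q", "m"), ("b", "y", "q", "n")]
    print(random_subspace_ensemble(toy, n_members=10, subspace_size=2))
```

A production version would replace `subspace_clustering` with an actual categorical clustering algorithm and `consensus` with a balanced graph partitioner; the threshold cut is used here only because it needs no external library.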

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Al-Razgan, M., Domeniconi, C., Barbará, D. (2008). Random Subspace Ensembles for Clustering Categorical Data. In: Okun, O., Valentini, G. (eds) Supervised and Unsupervised Ensemble Methods and their Applications. Studies in Computational Intelligence, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78981-9_2

  • DOI: https://doi.org/10.1007/978-3-540-78981-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78980-2

  • Online ISBN: 978-3-540-78981-9

  • eBook Packages: Engineering, Engineering (R0)
