Random Subspace Ensembles for Clustering Categorical Data

Al-Razgan, Muna; Domeniconi, Carlotta; Barbará, Daniel

doi:10.1007/978-3-540-78981-9_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 126)

Summary

Cluster ensembles provide a solution to challenges inherent to clustering arising from its ill-posed nature. In fact, cluster ensembles can find robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this chapter we focus on the design of ensembles for categorical data. Our techniques build upon diverse input clusterings discovered in random subspaces, and reduce the problem of defining a consensus function to a graph partitioning problem. We experimentally demonstrate the efficacy of our approach in combination with the categorical clustering algorithm COOLCAT.
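
The summary gives only the outline of the approach, so the following is a minimal illustrative sketch under stated assumptions, not the authors' implementation. It draws random attribute subspaces, clusters the projected categorical rows, accumulates a co-association graph over the instances, and cuts that graph to obtain a consensus. Every function name and the toy data are hypothetical. The base clusterer is a deliberately simple two-seed, Hamming-distance stand-in rather than COOLCAT, and thresholding the graph and taking connected components stands in for a proper graph partitioner.

```python
import random
from itertools import combinations

def hamming(a, b):
    """Number of positions where two categorical tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def base_clustering(rows, attrs):
    """Stand-in base clusterer (NOT COOLCAT): two seeds, nearest-seed assignment.

    Seed 0 is the first row; seed 1 is the first row farthest from it,
    both measured on the projected attributes only.
    """
    proj = [tuple(r[a] for a in attrs) for r in rows]
    seed0 = proj[0]
    seed1 = max(proj, key=lambda p: hamming(p, seed0))
    # Ties go to cluster 0.
    return [0 if hamming(p, seed0) <= hamming(p, seed1) else 1 for p in proj]

def co_association(rows, n_runs, subspace_size, seed=0):
    """Fraction of runs in which each pair of rows lands in the same cluster."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        attrs = rng.sample(range(d), subspace_size)  # one random subspace
        labels = base_clustering(rows, attrs)
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / n_runs for c in row] for row in counts]

def consensus_labels(weights, threshold=0.5):
    """Crude graph cut: keep edges above the threshold, label connected components."""
    n = len(weights)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and weights[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy data (hypothetical): two groups; the last attribute is noisy.
rows = [
    ("red", "round", "smooth", "sweet", "small"),
    ("red", "round", "smooth", "sweet", "small"),
    ("red", "round", "smooth", "sweet", "large"),
    ("blue", "square", "rough", "sour", "large"),
    ("blue", "square", "rough", "sour", "large"),
    ("blue", "square", "rough", "sour", "small"),
]

weights = co_association(rows, n_runs=25, subspace_size=2, seed=0)
print(consensus_labels(weights))
```

Individual runs that happen to include the noisy attribute can misplace the last row, while runs on clean subspaces do not; the co-association weights record how often each pairing occurs, which is the consensus information the graph cut then uses. A real setup would replace both stand-ins and would control the number of consensus clusters directly instead of relying on a fixed threshold.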

Author information

Authors and Affiliations

Department of Computer Science, George Mason University, Fairfax, Virginia, 22030, USA
Muna Al-Razgan, Carlotta Domeniconi & Daniel Barbará

Editor information

Editors and Affiliations

Machine Vision Group, Infotech Oulu, Finland
Oleg Okun
Department of Electrical and Information Engineering, University of Oulu, P.O. Box 4500, FI-90014, Oulu, Finland
Oleg Okun
Dipartimento di Scienze dell’Informazione, Universita degli Studi di Milano, Via Comelico 39, 20135, Milano, Italy
Giorgio Valentini

Cite this chapter

Al-Razgan, M., Domeniconi, C., Barbará, D. (2008). Random Subspace Ensembles for Clustering Categorical Data. In: Okun, O., Valentini, G. (eds) Supervised and Unsupervised Ensemble Methods and their Applications. Studies in Computational Intelligence, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78981-9_2

DOI: https://doi.org/10.1007/978-3-540-78981-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78980-2
Online ISBN: 978-3-540-78981-9
eBook Packages: Engineering, Engineering (R0)
