How to Control Clustering Results? Flexible Clustering Aggregation

Hahmann, Martin; Volk, Peter B.; Rosenthal, Frank; Habich, Dirk; Lehner, Wolfgang

doi:10.1007/978-3-642-03915-7_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5772)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

One of the most important and challenging questions in clustering is how to choose the best-fitting algorithm and parameterization to obtain an optimal clustering of the data at hand. Clustering aggregation tries to bypass this problem by generating a set of separate, heterogeneous partitionings of the same data set, from which an aggregate clustering is derived. So far, almost every existing aggregation approach combines given crisp clusterings on the basis of pairwise similarities. In this paper, we consider an input set of soft clusterings and show that it contains additional information that can be used efficiently for the aggregation. Our approach extends the aforementioned pairwise similarities, allowing control and adjustment of the aggregation process and its result. Our experiments show that this flexible approach offers adaptive results, improved identification of structures, and high usability.
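
The abstract does not spell out the aggregation procedure, so the following is only a minimal sketch of generic pairwise-similarity aggregation, not the authors' algorithm. It assumes a co-association matrix for crisp clusterings, a hypothetical soft variant that averages the dot products of membership vectors, and a simple thresholded grouping step. All function names and the `threshold` parameter are illustrative assumptions.

```python
from itertools import combinations


def co_association(clusterings):
    """Crisp case: fraction of clusterings that place each pair together."""
    n = len(clusterings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in clusterings if labels[i] == labels[j])
        matrix[i][j] = matrix[j][i] = together / len(clusterings)
    return matrix


def soft_co_association(soft_clusterings):
    """Soft case (assumed extension): mean dot product of membership vectors.

    Each clustering is a list of per-object membership vectors that sum to 1.
    Unlike the crisp case, diagonal entries are generally below 1.
    """
    n = len(soft_clusterings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for memberships in soft_clusterings:
                total += sum(a * b for a, b in zip(memberships[i], memberships[j]))
            matrix[i][j] = total / len(soft_clusterings)
    return matrix


def aggregate(matrix, threshold=0.5):
    """Group objects linked by similarities above the threshold.

    This is a connected-components pass, chosen for brevity; the threshold
    is an illustrative control knob, not a parameter from the paper.
    """
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Under these assumptions, a soft input carries graded pair similarities where a crisp input only records agreement or disagreement, which is one way to read the "additional information" the abstract mentions. Raising or lowering `threshold` changes how readily objects are merged in the aggregate.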

Author information

Authors and Affiliations

Database Technology Group, Dresden University of Technology, Germany (Email: dbinfo@mail.inf.tu-dresden.de)
Martin Hahmann, Peter B. Volk, Frank Rosenthal, Dirk Habich & Wolfgang Lehner

Editor information

Editors and Affiliations

Department of Mathematics, Imperial College London, South Kensington Campus, SW7 2PG, London, United Kingdom
Niall M. Adams
INSA Lyon, LIRIS CNRS UMR 5205, Bâtiment Blaise Pascal, University of Lyon, F-69621, Villeurbanne, France
Céline Robardet
Department of Information and Computer Science, Universiteit Utrecht, Utrecht, The Netherlands
Arno Siebes
INSA-Lyon, LIRIS CNRS UMR5205, University of Lyon, F-69621, Villeurbanne, France
Jean-François Boulicaut

About this paper

Cite this paper

Hahmann, M., Volk, P.B., Rosenthal, F., Habich, D., Lehner, W. (2009). How to Control Clustering Results? Flexible Clustering Aggregation. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, JF. (eds) Advances in Intelligent Data Analysis VIII. IDA 2009. Lecture Notes in Computer Science, vol 5772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03915-7_6

DOI: https://doi.org/10.1007/978-3-642-03915-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03914-0
Online ISBN: 978-3-642-03915-7
eBook Packages: Computer Science, Computer Science (R0)