Abstract
Ensemble approaches based on aggregated models have been successfully applied in supervised learning to increase the accuracy and stability of classification. Recently, analogous techniques have been proposed for cluster analysis, and research has shown that combining a set of different clusterings can yield an improved solution. In the traditional way of learning from a data set, classifiers are built in a feature space. An alternative is to construct decision rules on dissimilarity representations, in which each object is described by its similarities or distances to the remaining training samples. This research focuses on exploiting the additional information provided by a collection of diverse clusterings to generate a co-occurrence (co-association) matrix. Taking the co-occurrence of a pair of patterns in the same cluster as a vote for their association, the data partitions are mapped into a co-association matrix. This n × n matrix defines a new similarity measure between patterns, and the final data partition is obtained by clustering it. The experiments study the behavior of partitions built on co-occurrence data.
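The evidence-accumulation scheme the abstract describes can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's exact setup: the choice of k-means as the base clusterer, the number of base partitions, the range of cluster counts, and the average-link consensus step are all assumptions for the sake of a runnable example.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# Synthetic data: two well-separated Gaussian blobs (illustration only)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
n = len(X)

# Accumulate votes: each of m base k-means partitions votes for every
# pair of samples it places in the same cluster.
m = 20
C = np.zeros((n, n))
for i in range(m):
    np.random.seed(i)            # vary the base clusterings across runs
    k = int(rng.integers(2, 6))  # random number of clusters per run
    _, labels = kmeans2(X, k, minit='points')
    C += (labels[:, None] == labels[None, :])
C /= m  # C[i, j] = fraction of partitions grouping samples i and j together

# Final partition: average-link hierarchical clustering of the induced
# dissimilarity 1 - C, cut into two clusters.
D = 1.0 - C
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method="average")
final = fcluster(Z, t=2, criterion="maxclust")
```

Varying the number of clusters across base partitions is one common way to inject the diversity the ensemble needs; pairs of patterns that genuinely belong together accumulate votes regardless of how each individual run splits the data.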
© 2009 Springer-Verlag Berlin Heidelberg
Rozmus, D. (2009). Cluster Ensemble Based on Co-occurrence Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_16
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6