Abstract
The within-cluster summary similarity criterion is widely distrusted because it supposedly collects all the entities into a single cluster. This is not so if the similarity matrix is preprocessed by subtracting “noise”; two ways of doing this, the uniform one and the modularity one, are analyzed in the chapter. Another criterion under consideration is the semi-average within-cluster similarity, which has more versatile properties. Both types of criteria emerge from the least-squares data approximation approach to clustering, as shown in the chapter. A very simple local optimization algorithm, Add-and-Remove(S), leads to a suboptimal cluster satisfying some tightness conditions. Three versions of an iterative extraction approach are considered, leading to a portrayal of the cluster structure of the data. Of these, the most promising is probably what is referred to as the injunctive clustering approach. Applications to the analysis of semantics, to integrating different knowledge aspects, and to consensus clustering are considered.
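The criteria and the local search named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the abstract gives no formulas, so the sketch assumes the usual forms of the two "noise" subtractions (a constant threshold, and the modularity term a_i+ * a_+j / a_++), assumes that diagonal entries are excluded from the sums, and assumes that Add-and-Remove(S) flips the membership of one entity at a time. All function names and the `toy` matrix are hypothetical.

```python
def subtract_uniform(a, shift):
    """Subtract a constant 'noise' threshold from every similarity (assumed form)."""
    return [[x - shift for x in row] for row in a]


def subtract_modularity(a):
    """Subtract the random-interaction term a_i+ * a_+j / a_++ (assumed form)."""
    row = [sum(r) for r in a]
    col = [sum(c) for c in zip(*a)]
    total = sum(row)
    n = len(a)
    return [[a[i][j] - row[i] * col[j] / total for j in range(n)]
            for i in range(n)]


def summary(a, s):
    """Within-cluster summary similarity; the diagonal is skipped (an assumption)."""
    return sum(a[i][j] for i in s for j in s if i != j)


def semi_average(a, s):
    """Summary similarity divided by the number of entities in the cluster."""
    return summary(a, s) / len(s) if s else 0.0


def add_and_remove(a, seed, criterion=summary):
    """Greedy local search: flip the membership of the single entity whose
    flip increases the criterion most; stop when no flip improves it."""
    s = set(seed)
    best = criterion(a, s)
    while True:
        move, move_value = None, best
        for i in range(len(a)):
            candidate = s ^ {i}      # add i if absent, remove it if present
            if not candidate:
                continue             # never empty the cluster
            value = criterion(a, candidate)
            if value > move_value:
                move, move_value = i, value
        if move is None:
            return s, best
        s ^= {move}
        best = move_value


# Hypothetical toy similarities: entities 0-2 and 3-4 form two tight groups.
toy = [
    [0, 5, 4, 0, 0],
    [5, 0, 6, 1, 0],
    [4, 6, 0, 0, 1],
    [0, 1, 0, 0, 7],
    [0, 0, 1, 7, 0],
]
```

On this toy matrix, `add_and_remove(toy, {0})` with the raw summary criterion absorbs all five entities, which is the behaviour the criterion is blamed for. After `subtract_uniform(toy, 2)` the same search stops at the cluster {0, 1, 2}, and `add_and_remove(toy, {0}, semi_average)` reaches {0, 1, 2} without any preprocessing.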
Acknowledgements
I would like to express my gratitude for the partial support of this work by the International Laboratory of Decision Choice and Analysis at NRU HSE (headed by F. Aleskerov) and by the Laboratory of Algorithms and Technologies for Network Analysis, NRU HSE Nizhny Novgorod, by means of RF government grant ag. 11.G34.31.0057 (headed by V. Kalyagin).
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Mirkin, B. (2013). Summary and Semi-average Similarity Criteria for Individual Clusters. In: Goldengorin, B., Kalyagin, V., Pardalos, P. (eds) Models, Algorithms, and Technologies for Network Analysis. Springer Proceedings in Mathematics & Statistics, vol 59. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8588-9_8
DOI: https://doi.org/10.1007/978-1-4614-8588-9_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8587-2
Online ISBN: 978-1-4614-8588-9
eBook Packages: Mathematics and Statistics (R0)