Summary and Semi-average Similarity Criteria for Individual Clusters

Mirkin, Boris

doi:10.1007/978-1-4614-8588-9_8

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 59)

Abstract

There is much prejudice against the within-cluster summary similarity criterion, which supposedly leads to collecting all the entities in one cluster. This is not so if the similarity matrix is preprocessed by subtracting “noise”; two ways of doing so, the uniform and the modularity, are analyzed in the chapter. Another criterion under consideration is the semi-average within-cluster similarity, which shows more versatile properties. In fact, both types of criteria emerge from the least-squares data approximation approach to clustering, as shown in the chapter. A very simple local optimization algorithm, Add-and-Remove(S), leads to a suboptimal cluster satisfying some tightness conditions. Three versions of an iterative extraction approach are considered, each leading to a portrayal of the cluster structure of the data. Of these, probably the most promising is what is referred to as the injunctive clustering approach. Applications to the analysis of semantics, to integrating different knowledge aspects, and to consensus clustering are considered.
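
The abstract gives no formulas, so the following Python sketch is only an illustration under stated assumptions, not the chapter's own formulation. It assumes the summary criterion is the sum of off-diagonal similarities within a cluster S, the semi-average criterion is that sum divided by |S|, the uniform noise is a single constant subtracted from every entry, and the modularity noise is the product of row and column sums divided by the total. The function `add_and_remove` is a hypothetical greedy reading of Add-and-Remove(S): starting from a seed entity, it repeatedly flips the membership of whichever entity raises the criterion most and stops when no flip helps. The function names and the toy matrix are invented for the example.

```python
import numpy as np


def subtract_uniform(a, shift):
    """Uniform "noise" (assumed form): subtract one constant from every entry."""
    return a - shift


def subtract_modularity(a):
    """Modularity "noise" (assumed form): subtract row_sum * col_sum / total."""
    row = a.sum(axis=1, keepdims=True)
    col = a.sum(axis=0, keepdims=True)
    return a - row @ col / a.sum()


def summary(a, members):
    """Sum of within-cluster similarities; the diagonal is ignored (assumption)."""
    idx = np.flatnonzero(members)
    block = a[np.ix_(idx, idx)]
    return block.sum() - np.trace(block)


def semi_average(a, members):
    """Within-cluster summary similarity divided by the cluster size."""
    return summary(a, members) / members.sum()


def add_and_remove(a, seed, criterion):
    """Greedy local search: flip one entity at a time while the criterion grows."""
    members = np.zeros(len(a), dtype=bool)
    members[seed] = True
    best = criterion(a, members)
    while True:
        best_i, best_val = None, best
        for i in range(len(a)):
            members[i] = not members[i]      # try adding or removing entity i
            if members.any():                # never evaluate the empty cluster
                val = criterion(a, members)
                if val > best_val:
                    best_i, best_val = i, val
            members[i] = not members[i]      # undo the trial flip
        if best_i is None:                   # no single flip improves: stop
            return members, best
        members[best_i] = not members[best_i]
        best = best_val


# Toy similarity matrix: entities 0-2 form one tight group, 3-4 another.
a = np.array([
    [0, 5, 5, 1, 1],
    [5, 0, 5, 1, 1],
    [5, 5, 0, 1, 1],
    [1, 1, 1, 0, 4],
    [1, 1, 1, 4, 0],
], dtype=float)

for label, matrix, crit in [
    ("summary, raw", a, summary),
    ("summary, uniform shift 2", subtract_uniform(a, 2.0), summary),
    ("summary, modularity", subtract_modularity(a), summary),
    ("semi-average, raw", a, semi_average),
]:
    cluster, value = add_and_remove(matrix, 0, crit)
    print(label, np.flatnonzero(cluster).tolist(), round(float(value), 2))
```

On this toy matrix, the summary criterion applied to the raw non-negative similarities absorbs all five entities, which is the behavior the prejudice refers to. After either subtraction, and also with the semi-average criterion on the raw matrix, the search started from entity 0 stops at the group {0, 1, 2}.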

Acknowledgements

I am grateful for the partial support of this work by the International Laboratory of Decision Choice and Analysis at NRU HSE (headed by F. Aleskerov) and by the Laboratory of Algorithms and Technologies for Network Analysis, NRU HSE Nizhny Novgorod, through RF government grant ag. 11.G34.31.0057 (headed by V. Kalyagin).

Author information

Authors and Affiliations

Division of Applied Mathematics, Higher School of Economics, Moscow, Russian Federation
Boris Mirkin
Department of Computer Science and Information Systems, University of London, Birkbeck, UK
Boris Mirkin

Corresponding author

Correspondence to Boris Mirkin.

Editor information

Editors and Affiliations

Dept. of Industrial and Systems Engineer, University of Florida, Gainesville, Florida, USA
Boris I. Goldengorin
Department of Applied Mathematics, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Valery A. Kalyagin
Department of Industrial & Systems Engin, University of Florida, Gainesville, Florida, USA
Panos M. Pardalos

Cite this paper

Mirkin, B. (2013). Summary and Semi-average Similarity Criteria for Individual Clusters. In: Goldengorin, B., Kalyagin, V., Pardalos, P. (eds) Models, Algorithms, and Technologies for Network Analysis. Springer Proceedings in Mathematics & Statistics, vol 59. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8588-9_8

DOI: https://doi.org/10.1007/978-1-4614-8588-9_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8587-2
Online ISBN: 978-1-4614-8588-9
eBook Packages: Mathematics and Statistics (R0)