Summary and Semi-average Similarity Criteria for Individual Clusters

  • Conference paper
Models, Algorithms, and Technologies for Network Analysis

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 59)

Abstract

The within-cluster summary similarity criterion is widely distrusted because it supposedly collects all the entities into one cluster. This is not so if the similarity matrix is preprocessed by subtracting “noise”; two ways of doing so, the uniform and the modularity, are analyzed in the chapter. Another criterion under consideration is the semi-average within-cluster similarity, which has more versatile properties. Both types of criteria emerge from the least-squares data approximation approach to clustering, as shown in the chapter. A very simple local optimization algorithm, Add-and-Remove(S), leads to a suboptimal cluster satisfying some tightness conditions. Three versions of an iterative extraction approach are considered, each leading to a portrayal of the cluster structure of the data. Of these, the most promising is probably what is referred to as the injunctive clustering approach. Applications to the analysis of semantics, to integrating different knowledge aspects, and to consensus clustering are considered.
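The abstract gives no formulas, so the following Python sketch is only an illustration under stated assumptions, not the chapter's own definitions. It assumes the summary criterion is the sum of off-diagonal similarities within a cluster S, the semi-average criterion is that sum divided by |S|, uniform noise is a constant threshold `pi`, and modularity noise is the usual product of row and column totals divided by the grand total. The function `add_and_remove` is a hypothetical greedy one-entity-at-a-time local search standing in for Add-and-Remove(S); the toy matrix `A` is invented for the example.

```python
import numpy as np

def subtract_uniform(A, pi):
    """Assumed 'uniform' noise: subtract a constant threshold pi."""
    return A - pi

def subtract_modularity(A):
    """Assumed 'modularity' noise: subtract (row total * column total) / grand total."""
    return A - np.outer(A.sum(axis=1), A.sum(axis=0)) / A.sum()

def summary(B, S):
    """Sum of off-diagonal similarities within cluster S."""
    idx = sorted(S)
    block = B[np.ix_(idx, idx)]
    return block.sum() - np.trace(block)

def semi_average(B, S):
    """Summary similarity divided by the cluster size."""
    return summary(B, S) / len(S)

def add_and_remove(B, seed, criterion, tol=1e-12):
    """Hypothetical greedy local search: starting from {seed}, repeatedly
    toggle (add or remove) the single entity that increases the criterion
    most; stop when no toggle gives an improvement."""
    S = {seed}
    while True:
        best_val, best_S = criterion(B, S), None
        for i in range(B.shape[0]):
            T = S ^ {i}              # toggle membership of entity i
            if not T:                # never empty the cluster
                continue
            val = criterion(B, T)
            if val > best_val + tol:
                best_val, best_S = val, T
        if best_S is None:
            return S
        S = best_S

# Invented toy data: two groups {0,1,2} and {3,4}; similarity 1 within
# a group, 0.2 between groups, 0 on the diagonal.
A = np.full((5, 5), 0.2)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)

raw_cluster = add_and_remove(A, 0, summary)
uniform_cluster = add_and_remove(subtract_uniform(A, 0.5), 0, summary)
modularity_cluster = add_and_remove(subtract_modularity(A), 0, summary)
semi_avg_cluster = add_and_remove(A, 0, semi_average)
```

On this toy matrix the raw summary criterion absorbs all five entities, since every added nonnegative similarity increases the sum, whereas either form of noise subtraction, or the semi-average criterion on the raw matrix, stops at the group {0, 1, 2}.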

Acknowledgements

I gratefully acknowledge partial support of this work by the International Laboratory of Decision Choice and Analysis at NRU HSE (headed by F. Aleskerov) and by the Laboratory of Algorithms and Technologies for Network Analysis, NRU HSE Nizhny Novgorod, through RF government grant ag. 11.G34.31.0057 (headed by V. Kalyagin).

Corresponding author

Correspondence to Boris Mirkin.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Mirkin, B. (2013). Summary and Semi-average Similarity Criteria for Individual Clusters. In: Goldengorin, B., Kalyagin, V., Pardalos, P. (eds) Models, Algorithms, and Technologies for Network Analysis. Springer Proceedings in Mathematics & Statistics, vol 59. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8588-9_8
