Soft Computing, Volume 21, Issue 22, pp 6571–6592

Pseudo-centroid clustering

Foundations

Abstract

Pseudo-centroid clustering replaces the traditional concept of a centroid, expressed as a center of gravity, with the notion of a pseudo-centroid (or coordinate-free centroid). Pseudo-centroids have the advantage of applying to clustering problems where points do not have numerical coordinates (or have categorical coordinates that are translated into numerical form). Such problems, for which classical centroids do not exist, are particularly important in the social sciences, marketing, psychology and economics. In these fields, distances are not computed from vector coordinates but are instead expressed in terms of characteristics such as affinity relationships, psychological preferences, advertising responses, polling data and market interactions; broadly conceived, these distances measure the similarity (or dissimilarity) of characteristics, functions or structures. We formulate a K-PC algorithm analogous to a K-Means algorithm and focus on two key types of pseudo-centroids, MinMax-centroids and (weighted) MinSum-centroids, which respectively give rise to a K-MinMax algorithm and a K-MinSum algorithm. The K-PC algorithms can take advantage of problem structure to identify special diversity-based and intensity-based starting methods that generate initial pseudo-centroids and associated clusters. For the intensity-based methods, we give theorems establishing their ability to obtain best clusters of a selected size from the points available at each stage of construction. We also introduce a regret-threshold PC algorithm that modifies the K-PC algorithm, together with an associated diversification method and a new criterion for evaluating the quality of a collection of clusters.
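The abstract does not spell out the definitions, so the following is only a minimal Python sketch under stated assumptions. It assumes a MinMax-centroid is the cluster member whose largest distance to the other members is smallest, a (weighted) MinSum-centroid is the member whose (weighted) sum of distances to the members is smallest, and the K-PC loop alternates assignment and pseudo-centroid update in the manner of K-Means. Only a distance matrix is used, never coordinates. The names `minmax_centroid`, `minsum_centroid`, `assign`, `k_pc`, the per-point `weights`, and the example matrix `DIST` are hypothetical, not taken from the paper. The diversity- and intensity-based starting methods and the regret-threshold variant are not modeled; the caller supplies the initial pseudo-centroids.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def minmax_centroid(members: List[int], dist: Matrix) -> int:
    """Assumed MinMax rule: member with the smallest maximum distance to the cluster."""
    # Ties are broken by the lowest point index for determinism.
    return min(members, key=lambda i: (max(dist[i][j] for j in members), i))


def minsum_centroid(members: List[int], dist: Matrix,
                    weights: Optional[Sequence[float]] = None) -> int:
    """Assumed (weighted) MinSum rule: member with the smallest weighted distance sum."""
    # Hypothetical per-point weights; all 1.0 gives the unweighted MinSum rule.
    w = weights if weights is not None else [1.0] * len(dist)
    return min(members, key=lambda i: (sum(w[j] * dist[i][j] for j in members), i))


def assign(dist: Matrix, centroids: List[int]) -> List[List[int]]:
    """Put every point in the cluster of its nearest pseudo-centroid."""
    clusters: List[List[int]] = [[] for _ in centroids]
    for p in range(len(dist)):
        # Ties go to the lowest cluster index.
        best = min(range(len(centroids)), key=lambda c: dist[p][centroids[c]])
        clusters[best].append(p)
    return clusters


def k_pc(dist: Matrix, initial: List[int],
         centroid_rule: Callable[[List[int], Matrix], int],
         max_iter: int = 100) -> Tuple[List[int], List[List[int]]]:
    """K-Means-style alternation whose 'centroids' are always actual points."""
    centroids = list(initial)
    clusters = assign(dist, centroids)
    for _ in range(max_iter):
        # Update step: recompute each pseudo-centroid from its members only;
        # an empty cluster keeps its previous pseudo-centroid.
        new = [centroid_rule(m, dist) if m else c for m, c in zip(clusters, centroids)]
        if new == centroids:
            break  # pseudo-centroids stable, so assignments are stable too
        centroids = new
        clusters = assign(dist, centroids)
    return centroids, clusters


# Example dissimilarities between six items given directly, with no coordinates.
DIST = [
    [0, 1, 2, 9, 9, 9],
    [1, 0, 1, 9, 9, 9],
    [2, 1, 0, 9, 9, 9],
    [9, 9, 9, 0, 1, 2],
    [9, 9, 9, 1, 0, 1],
    [9, 9, 9, 2, 1, 0],
]

if __name__ == "__main__":
    print(k_pc(DIST, [0, 3], minmax_centroid))  # ([1, 4], [[0, 1, 2], [3, 4, 5]])
    print(k_pc(DIST, [0, 3], minsum_centroid))  # ([1, 4], [[0, 1, 2], [3, 4, 5]])
```

Because each pseudo-centroid in this sketch is an actual point, the update step reads only entries of `dist`. That is what would let such a loop run on affinity or preference data where a center of gravity is undefined.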

Keywords

Clustering; Centroids; K-Means; K-Medoids; Advanced starting methods; Metaheuristics


Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. School of Engineering and Applied Science, University of Colorado, Boulder, USA
