Pseudo-centroid clustering replaces the traditional centroid, defined as a center of gravity, with a pseudo-centroid (a coordinate-free centroid). Its advantage is that it applies to clustering problems in which points have no numerical coordinates (or no categorical coordinates that are translated into numerical form). Such problems, for which classical centroids do not exist, are particularly important in the social sciences, marketing, psychology and economics. There, distances are not computed from vector coordinates but are expressed in terms of characteristics such as affinity relationships, psychological preferences, advertising responses, polling data and market interactions, and distances, broadly conceived, measure the similarity (or dissimilarity) of characteristics, functions or structures. We formulate a K-PC algorithm analogous to the K-Means algorithm and focus on two key types of pseudo-centroids, MinMax-centroids and (weighted) MinSum-centroids, which give rise to a K-MinMax algorithm and a K-MinSum algorithm, respectively. The K-PC algorithms can exploit problem structure through special diversity-based and intensity-based starting methods that generate initial pseudo-centroids and associated clusters. Theorems for the intensity-based methods establish their ability to obtain best clusters of a selected size from the points available at each stage of construction. We also introduce a regret-threshold PC algorithm that modifies the K-PC algorithm, together with an associated diversification method and a new criterion for evaluating the quality of a collection of clusters.
Keywords: Clustering, Centroids, K-Means, K-Medoids, Advanced starting methods, Metaheuristics
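To make the K-Means analogy in the abstract concrete, the following is a minimal sketch of a K-PC style loop that works only from a pairwise distance matrix. It rests on assumptions the abstract does not spell out: a MinSum-centroid is taken to be the cluster member with the smallest sum of distances to the other members, and a MinMax-centroid the member with the smallest maximum distance. The names `pseudo_centroid`, `k_pc`, `seeds` and `score` are hypothetical, the weighted MinSum variant is omitted, and the seeds are supplied by the caller rather than by the diversity-based or intensity-based starting methods.

```python
from typing import Callable, Dict, List, Sequence

Matrix = Sequence[Sequence[float]]
Score = Callable[[List[float]], float]


def pseudo_centroid(cluster: List[int], dist: Matrix, score: Score) -> int:
    """Pick the cluster member whose distances to the other members score lowest.

    Assumption: score=sum gives a MinSum-centroid, score=max a MinMax-centroid.
    Ties are broken by the smaller point index.
    """
    return min(
        cluster,
        key=lambda c: (score([dist[c][j] for j in cluster if j != c] or [0.0]), c),
    )


def k_pc(dist: Matrix, seeds: List[int], score: Score = sum,
         max_iter: int = 100) -> Dict[int, List[int]]:
    """Alternate assignment and pseudo-centroid updates, as K-Means does with means.

    Returns a mapping from each pseudo-centroid used in the last assignment
    step to the indices of the points assigned to it.
    """
    centroids = list(seeds)
    clusters: Dict[int, List[int]] = {}
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest pseudo-centroid.
        clusters = {c: [] for c in centroids}
        for i in range(len(dist)):
            nearest = min(centroids, key=lambda c: (dist[i][c], c))
            clusters[nearest].append(i)
        # Update step: recompute a pseudo-centroid for every non-empty cluster.
        updated = [pseudo_centroid(m, dist, score) for m in clusters.values() if m]
        if updated == centroids:
            break  # no pseudo-centroid changed, so the clusters are stable
        centroids = updated
    return clusters
```

Calling `k_pc(dist, seeds)` runs the MinSum variant and `k_pc(dist, seeds, score=max)` the MinMax variant. Because only `dist` is consulted, the points need no coordinates; any symmetric dissimilarity table (for example, one built from preference or affinity data) can be passed in.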
This study was not funded.
Compliance with ethical standards
Conflict of interest
The author declares that he has no conflict of interest.
This article does not contain any studies with human or animal participants.