## Abstract

Pseudo-centroid clustering replaces the traditional concept of a centroid as a center of gravity with the notion of a *pseudo-centroid* (or *coordinate-free centroid*), which applies to clustering problems whose points have neither numerical coordinates nor categorical coordinates that are translated into numerical form. Such problems, for which classical centroids do not exist, are particularly important in the social sciences, marketing, psychology and economics. There, distances are not computed from vector coordinates but are expressed in terms of characteristics such as affinity relationships, psychological preferences, advertising responses, polling data and market interactions, and distances, broadly conceived, measure the similarity (or dissimilarity) of characteristics, functions or structures. We formulate a K-PC algorithm analogous to the K-Means algorithm and focus on two key types of pseudo-centroids, *MinMax-centroids* and (weighted) *MinSum-centroids*, which give rise to a K-MinMax algorithm and a K-MinSum algorithm, respectively. The K-PC algorithms can exploit problem structure through special *diversity-based* and *intensity-based* starting methods that generate initial pseudo-centroids and their associated clusters. Theorems for the intensity-based methods establish that they obtain best clusters of a selected size from the points available at each stage of construction. We also introduce a regret-threshold PC algorithm that modifies the K-PC algorithm, together with an associated diversification method and a new criterion for evaluating the quality of a collection of clusters.
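As an illustration of the K-PC idea, the following minimal Python sketch runs a K-Means-style loop directly on a pairwise distance matrix, so no point coordinates are needed. It rests on assumptions not stated in the abstract: a MinSum-centroid is taken to be the cluster member minimizing the (unweighted) sum of distances to the other members, and a MinMax-centroid the member minimizing the largest such distance. The names `pseudo_centroid`, `k_pc`, `seeds` and `score` are hypothetical, and the starting points are supplied by the caller rather than by the diversity-based or intensity-based methods of the paper.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def pseudo_centroid(cluster: List[int], dist: Matrix,
                    score: Callable[[Iterable[float]], float]) -> int:
    """Return the member of `cluster` with the smallest score.

    score=sum gives a MinSum-style centroid; score=max gives a
    MinMax-style centroid (both definitions are assumptions here).
    """
    return min(cluster, key=lambda i: score(dist[i][j] for j in cluster))


def k_pc(dist: Matrix, seeds: Sequence[int],
         score: Callable[[Iterable[float]], float] = sum,
         max_iter: int = 100) -> Tuple[List[int], List[List[int]]]:
    """K-Means-like alternation using only pairwise distances."""
    centroids = list(seeds)
    clusters: List[List[int]] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest pseudo-centroid.
        clusters = [[] for _ in centroids]
        for p in range(len(dist)):
            nearest = min(range(len(centroids)),
                          key=lambda c: dist[p][centroids[c]])
            clusters[nearest].append(p)
        # Update step: recompute each pseudo-centroid from its cluster;
        # an empty cluster keeps its previous pseudo-centroid.
        updated = [pseudo_centroid(members, dist, score) if members
                   else centroids[k]
                   for k, members in enumerate(clusters)]
        if updated == centroids:  # no change, so the loop has converged
            break
        centroids = updated
    return centroids, clusters
```

Calling `k_pc(dist, seeds)` gives a K-MinSum-style run, and `k_pc(dist, seeds, score=max)` a K-MinMax-style run. Ties are broken by lowest index, which is a choice of this sketch rather than something the abstract specifies.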

## Keywords

Clustering, Centroids, K-Means, K-Medoids, Advanced starting methods, Metaheuristics

## Notes

### Acknowledgments

This study was not funded.

### Compliance with ethical standards

### Conflict of interest

The author declares that he has no conflict of interest.

### Ethical approval

This article does not contain any studies with human or animal participants.
