Abstract
The clique partitioning problem (CPP) requires the establishment of an equivalence relation for the vertices of a graph such that the sum of the edge costs associated with the relation is minimized. The CPP has important applications in the social sciences because it provides a framework for clustering objects measured on a collection of nominal or ordinal attributes. In such instances, the CPP incorporates edge costs obtained from an aggregation of binary equivalence relations among the attributes. We review existing theory and methods for the CPP and propose two versions of a new neighborhood search algorithm for solving it efficiently. The first version (NS-R) uses a relocation algorithm in the search for improved solutions, whereas the second (NS-TS) uses an embedded tabu search routine. The new algorithms are compared to simulated annealing (SA) and tabu search (TS) algorithms from the CPP literature. Although the heuristics yielded comparable results for some test problems, the neighborhood search algorithms generally performed best on large and difficult instances of the CPP.
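To make the formulation concrete, the following is a minimal sketch of how edge costs might be aggregated from nominal attributes and how a simple relocation pass could search for a lower-cost partition. It is not the NS-R or NS-TS procedure from the article. The cost definition (disagreements minus agreements for each pair of objects), the function names `edge_costs`, `partition_cost`, and `relocate`, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def edge_costs(data):
    """Assumed cost: c[i][j] = (# attributes where i, j differ) - (# where they agree)."""
    n = len(data)
    c = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        agree = sum(a == b for a, b in zip(data[i], data[j]))
        c[i][j] = c[j][i] = (len(data[i]) - agree) - agree
    return c


def partition_cost(c, labels):
    """Sum of edge costs over all pairs placed in the same cluster."""
    return sum(
        c[i][j]
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    )


def relocate(c, labels):
    """Move one vertex at a time to the cluster (or a new singleton) that
    lowers the objective most; stop when no single move improves it."""
    labels = list(labels)
    n = len(labels)
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Total cost between v and the members of each existing cluster.
            link = {}
            for u in range(n):
                if u != v:
                    link[labels[u]] = link.get(labels[u], 0) + c[v][u]
            best_label, best = labels[v], link.get(labels[v], 0)
            singleton = max(labels) + 1  # unused label; contributes cost 0
            for lab, total in list(link.items()) + [(singleton, 0)]:
                if total < best:
                    best_label, best = lab, total
            if best_label != labels[v]:
                labels[v] = best_label
                improved = True
    return labels


# Hypothetical toy data: five objects measured on three nominal attributes.
data = [
    ("a", "x", "p"),
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("b", "y", "r"),
    ("b", "y", "r"),
]
costs = edge_costs(data)
start = list(range(len(data)))  # every object starts in its own cluster
final = relocate(costs, start)
print(final, partition_cost(costs, final))
```

Because every accepted move strictly lowers an integer-valued objective, the loop terminates at a partition that no single relocation can improve. Such a partition is only locally optimal, which is the kind of limitation that motivates embedding the move in a broader neighborhood search or tabu search.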
Cite this article
Brusco, M.J., Köhn, HF. Clustering Qualitative Data Based on Binary Equivalence Relations: Neighborhood Search Heuristics for the Clique Partitioning Problem. Psychometrika 74, 685–703 (2009). https://doi.org/10.1007/s11336-009-9126-z
DOI: https://doi.org/10.1007/s11336-009-9126-z