Clustering Qualitative Data Based on Binary Equivalence Relations: Neighborhood Search Heuristics for the Clique Partitioning Problem

Brusco, Michael J.; Köhn, Hans-Friedrich

doi:10.1007/s11336-009-9126-z

Theory and Methods
Published: 28 April 2009

Psychometrika, Volume 74, pages 685–703 (2009)

Michael J. Brusco¹ &
Hans-Friedrich Köhn²

Abstract

The clique partitioning problem (CPP) requires the establishment of an equivalence relation for the vertices of a graph such that the sum of the edge costs associated with the relation is minimized. The CPP has important applications for the social sciences because it provides a framework for clustering objects measured on a collection of nominal or ordinal attributes. In such instances, the CPP incorporates edge costs obtained from an aggregation of binary equivalence relations among the attributes. We review existing theory and methods for the CPP and propose two versions of a new neighborhood search algorithm for efficient solution. The first version (NS-R) uses a relocation algorithm in the search for improved solutions, whereas the second (NS-TS) uses an embedded tabu search routine. The new algorithms are compared to simulated annealing (SA) and tabu search (TS) algorithms from the CPP literature. Although the heuristics yielded comparable results for some test problems, the neighborhood search algorithms generally yielded the best performances for large and difficult instances of the CPP.
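
The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' NS-R or NS-TS algorithm: it is a plain single-object relocation local search, and the edge-cost rule used here (number of attributes on which two objects disagree minus the number on which they agree) is an assumption taken from a common convention in the CPP literature, not from the text above. The function names `edge_costs`, `partition_cost`, and `relocate`, and the toy data, are hypothetical.

```python
from itertools import combinations


def edge_costs(data):
    """Assumed cost rule: (# attributes that differ) - (# attributes that agree)."""
    n = len(data)
    c = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        agree = sum(a == b for a, b in zip(data[i], data[j]))
        c[i][j] = c[j][i] = (len(data[i]) - agree) - agree
    return c


def partition_cost(c, labels):
    """Sum of edge costs over all pairs placed in the same cluster."""
    return sum(c[i][j]
               for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])


def relocate(c, labels):
    """Move one object at a time to the cluster (or new singleton) that lowers cost."""
    labels = list(labels)
    n = len(labels)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            # Cost of linking object i to each cluster, excluding i itself.
            link = {}
            for j in range(n):
                if j != i:
                    link[labels[j]] = link.get(labels[j], 0) + c[i][j]
            current = link.get(labels[i], 0)
            # Opening a fresh singleton cluster always has linking cost 0.
            best_label, best = max(labels) + 1, 0
            for lab, val in link.items():
                if val < best:
                    best_label, best = lab, val
            # Accept only strict improvements, so the search terminates.
            if best < current:
                labels[i] = best_label
                improved = True
    return labels


# Toy data: five objects measured on three nominal attributes.
data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "q"),
        ("b", "y", "p"), ("a", "x", "p")]
costs = edge_costs(data)
labels = relocate(costs, list(range(len(data))))  # start from singletons
print(labels, partition_cost(costs, labels))
```

Because the number of clusters is not fixed in advance, the sketch lets an object open a new singleton cluster whenever every existing cluster would add positive cost. A local search of this kind stops at the first partition that no single relocation improves, which is the limitation that the annealing, tabu, and neighborhood search heuristics named in the abstract are meant to address.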

Author information

Authors and Affiliations

Department of Marketing, College of Business, Florida State University, Tallahassee, FL, 32306-1110, USA
Michael J. Brusco
University of Missouri-Columbia, Columbia, MO, USA
Hans-Friedrich Köhn

Corresponding author

Correspondence to Michael J. Brusco.

About this article

Cite this article

Brusco, M.J., & Köhn, H.-F. (2009). Clustering Qualitative Data Based on Binary Equivalence Relations: Neighborhood Search Heuristics for the Clique Partitioning Problem. Psychometrika, 74, 685–703. https://doi.org/10.1007/s11336-009-9126-z

Received: 23 January 2008
Revised: 29 January 2009
Accepted: 27 March 2009
Published: 28 April 2009
Issue Date: December 2009
DOI: https://doi.org/10.1007/s11336-009-9126-z
