Abstract
Consider N entities to be classified, with given weights, and a matrix of dissimilarities between pairs of them. The split of a cluster is the smallest dissimilarity between an entity in that cluster and an entity outside it. The single-linkage algorithm provides partitions into M clusters for which the smallest split is maximum. We consider the problems of finding maximum-split partitions with exactly M clusters and with at most M clusters, subject to the additional constraint that the sum of the weights of the entities in each cluster never exceeds a given bound. These two problems are shown to be NP-hard and reducible to a sequence of bin-packing problems. A Θ(N²) algorithm for the particular case M = N of the second problem is also presented. Computational experience is reported.
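The single-linkage result quoted above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: it assumes the dissimilarities arrive as a symmetric list-of-lists `d` with a zero diagonal, and the names `prim_mst`, `max_split_partition` and `smallest_split` are hypothetical. It builds a minimum spanning tree with Prim's algorithm in O(N²) time and deletes the M − 1 longest tree edges; by the usual minimum-spanning-tree argument, the M resulting clusters have the largest possible smallest split.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def prim_mst(d: Matrix) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on a dense dissimilarity matrix, O(N^2) time.

    Returns the N-1 tree edges as (dissimilarity, i, j) tuples.
    """
    n = len(d)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Linear scan for the closest outside vertex keeps the total at O(N^2).
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((d[parent[u]][u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u
    return edges


def max_split_partition(d: Matrix, m: int) -> List[int]:
    """Single-linkage partition into m clusters (cluster label per entity)."""
    n = len(d)
    edges = sorted(prim_mst(d))
    label = list(range(n))  # union-find parent array

    def find(x: int) -> int:
        while label[x] != x:
            label[x] = label[label[x]]  # path halving
            x = label[x]
        return x

    # Keeping the n-m shortest tree edges is the same as deleting the m-1 longest.
    for _, a, b in edges[: n - m]:
        label[find(a)] = find(b)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def smallest_split(d: Matrix, clusters: List[int]) -> float:
    """Smallest dissimilarity between two entities in different clusters."""
    n = len(d)
    return min(
        (d[i][j] for i in range(n) for j in range(i + 1, n)
         if clusters[i] != clusters[j]),
        default=float("inf"),  # a single cluster has no split
    )
```

For four points on a line at positions 0, 1, 5 and 6 (with d[i][j] = |x_i − x_j|), `max_split_partition(d, 2)` returns `[0, 0, 1, 1]`, and `smallest_split` on that labelling gives 4. Ties between equal dissimilarities are broken arbitrarily, so on other data the labelling need not be unique.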
Résumé (French abstract, translated)
Let N objects to be classified be given, with given weights, together with a matrix of dissimilarities between pairs of these objects. The split of a class is the smallest dissimilarity between an object of that class and an object outside it. The single-link algorithm provides partitions into M classes whose smallest split is maximum. We study how to obtain maximum-split partitions into exactly M classes or into at most M classes, under the additional constraint that the sum of the weights of the objects in each class never exceeds a given bound. We show that these two problems are NP-hard and reducible to a sequence of bin-packing problems. We also propose a Θ(N²) algorithm for the particular case M = N of the second problem. Finally, we present computational results.
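One way to read the reduction to bin packing, sketched under stated assumptions (this is not necessarily the procedure used in the paper): a partition has smallest split at least t exactly when every pair with d[i][j] < t lies in the same cluster, so its clusters must be unions of the connected components of the threshold graph with edges {i, j : d[i][j] < t}. Asking whether those components, weighted by their total entity weight, fit into at most M bins whose capacity is the weight bound is a bin-packing problem, and since raising t only merges components, the feasible thresholds form an initial segment of the sorted dissimilarity values. The Python below scans those values in increasing order. With `exact=True` it also requires at least M components, because a packing into fewer bins can then be split into exactly M clusters without violating the bound; otherwise it requires at least two components so that a split is defined. The names `threshold_components`, `fits_in_bins` and `weight_constrained_max_split` are hypothetical, and the exhaustive `fits_in_bins` search stands in for a dedicated bin-packing code and is only practical for small instances.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def threshold_components(d: Matrix, t: float) -> List[List[int]]:
    """Connected components of the graph joining i and j whenever d[i][j] < t."""
    n = len(d)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] < t:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def fits_in_bins(sizes: List[float], capacity: float, bins: int) -> bool:
    """Exact bin-packing feasibility by backtracking (small instances only)."""
    sizes = sorted(sizes, reverse=True)  # large items first prunes earlier
    loads: List[float] = []

    def place(k: int) -> bool:
        if k == len(sizes):
            return True
        tried = set()
        for b in range(len(loads)):
            # Bins with equal loads are interchangeable: try only one of them.
            if loads[b] in tried or loads[b] + sizes[k] > capacity:
                continue
            tried.add(loads[b])
            loads[b] += sizes[k]
            if place(k + 1):
                return True
            loads[b] -= sizes[k]
        if len(loads) < bins and sizes[k] <= capacity:  # open a new bin
            loads.append(sizes[k])
            if place(k + 1):
                return True
            loads.pop()
        return False

    return place(0)


def weight_constrained_max_split(
    d: Matrix, w: Sequence[float], cap: float, m: int, exact: bool = False
) -> Optional[Tuple[float, List[List[int]]]]:
    """Largest threshold t whose components pack into at most m bins of size cap.

    Returns (t, components) or None if no threshold is feasible. Any packing
    of the returned components yields a partition with smallest split >= t.
    """
    n = len(d)
    need = m if exact else 2  # minimum number of components required
    best = None
    for t in sorted({d[i][j] for i in range(n) for j in range(i + 1, n)}):
        comps = threshold_components(d, t)
        sizes = [sum(w[i] for i in c) for c in comps]
        # Feasibility is monotone: a larger t only merges components.
        if len(comps) < need or not fits_in_bins(sizes, cap, m):
            break
        best = (t, comps)
    return best
```

When m equals N, the bin-packing test degenerates to checking that every component weighs at most the bound, which is what makes that special case tractable; this sketch does not exploit it, and because feasibility is monotone in t, a binary search over the candidate values could replace the linear scan.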
Additional information
Acknowledgments: Work of the first author was supported in part by AFOSR grants 0271 and 0066 to Rutgers University and was done in part during a visit to GERAD, École Polytechnique de Montréal, whose support is gratefully acknowledged. Work of the second and third authors was supported by NSERC grant GP0036426 and by FCAR grant 89EQ4144. We are grateful to Silvano Martello and Paolo Toth for making their program MTP for the bin-packing problem available to us, and to three anonymous referees for comments that helped improve the presentation of the paper.
Cite this article
Hansen, P., Jaumard, B. & Musitu, K. Weight constrained maximum split clustering. Journal of Classification 7, 217–240 (1990). https://doi.org/10.1007/BF01908717