Weight constrained maximum split clustering

Journal of Classification

Abstract

Consider N entities to be classified, with given weights, and a matrix of dissimilarities between pairs of them. The split of a cluster is the smallest dissimilarity between an entity in that cluster and an entity outside it. The single-linkage algorithm provides partitions into M clusters for which the smallest split is maximum. We consider the problems of finding maximum split partitions with exactly M clusters and with at most M clusters subject to the additional constraint that the sum of the weights of the entities in each cluster never exceeds a given bound. These two problems are shown to be NP-hard and reducible to a sequence of bin-packing problems. A Θ(N²) algorithm for the particular case M = N of the second problem is also presented. Computational experience is reported.
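The definitions in the abstract can be made concrete with a small sketch. The code below is an illustration only, not the algorithm of the paper: the function names, the 4-entity dissimilarity matrix, the weights, and the bound are all invented for this example. It computes the split of a partition, checks the weight bound, and shows one way to see the link to bin packing: a partition has smallest split at least t exactly when every pair with dissimilarity below t shares a cluster, so clusters must be unions of connected components of the "dissimilarity < t" graph, and those components then have to be packed into at most M clusters of bounded weight. The packing step here uses first-fit decreasing, a heuristic that may fail to find a packing even when one exists.

```python
from itertools import combinations

def cluster_split(d, cluster):
    """Smallest dissimilarity between an entity in `cluster` and one outside it."""
    inside = set(cluster)
    outside = [j for j in range(len(d)) if j not in inside]
    if not outside:
        return float("inf")  # a cluster containing every entity has no outside
    return min(d[i][j] for i in inside for j in outside)

def min_split(d, partition):
    """Smallest split over all clusters of the partition."""
    return min(cluster_split(d, c) for c in partition)

def is_weight_feasible(weights, partition, bound):
    """True if no cluster's total weight exceeds the bound."""
    return all(sum(weights[i] for i in c) <= bound for c in partition)

def components_below(d, t):
    """Connected components of the graph joining pairs with dissimilarity < t."""
    parent = list(range(len(d)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(d)), 2):
        if d[i][j] < t:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(d)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def pack_components(components, weights, bound, max_clusters):
    """First-fit decreasing (heuristic): merge components into at most
    `max_clusters` clusters of weight <= bound; return None on failure."""
    loads, clusters = [], []
    ordered = sorted(components, key=lambda c: -sum(weights[i] for i in c))
    for comp in ordered:
        w = sum(weights[i] for i in comp)
        if w > bound:
            return None  # this component alone violates the bound
        for k in range(len(clusters)):
            if loads[k] + w <= bound:
                loads[k] += w
                clusters[k].extend(comp)
                break
        else:  # no existing cluster had room
            if len(clusters) == max_clusters:
                return None
            loads.append(w)
            clusters.append(list(comp))
    return clusters

# Hypothetical data: 4 entities, weight bound 4, at most M = 2 clusters.
d = [[0, 1, 5, 6],
     [1, 0, 4, 7],
     [5, 4, 0, 2],
     [6, 7, 2, 0]]
weights = [2, 2, 3, 1]
partition = pack_components(components_below(d, 4), weights, bound=4, max_clusters=2)
```

On this toy instance, threshold 4 yields two components that each fit the bound, while threshold 5 merges all entities into one component that is too heavy. Trying candidate thresholds in increasing order of the distinct dissimilarity values gives a sequence of packing questions, in the spirit of the reduction mentioned in the abstract.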

Résumé

Consider N objects to be classified, with given weights, and a matrix of dissimilarities between pairs of these objects. The split of a class is the smallest dissimilarity between an object in that class and an object outside it. The single-linkage algorithm yields partitions into M classes whose smallest split is maximum. We study how to obtain maximum split partitions into exactly M classes or into at most M classes under the additional constraint that the sum of the weights of the objects in each class never exceeds a given bound. We show that these two problems are NP-hard and reducible to a sequence of bin-packing problems. We also propose a Θ(N²) algorithm for the particular case M = N of the second problem. Finally, we present computational results.

Additional information

Acknowledgments: Work of the first author was supported in part by AFOSR grants 0271 and 0066 to Rutgers University and was done in part during a visit to GERAD, Ecole Polytechnique de Montréal, whose support is gratefully acknowledged. Work of the second and third authors was supported by NSERC grant GP0036426 and by FCAR grant 89EQ4144. We are grateful to Silvano Martello and Paolo Toth for making available to us their program MTP for the bin-packing problem and to three anonymous referees for comments which helped to improve the presentation of the paper.

About this article

Cite this article

Hansen, P., Jaumard, B. & Musitu, K. Weight constrained maximum split clustering. Journal of Classification 7, 217–240 (1990). https://doi.org/10.1007/BF01908717

  • DOI: https://doi.org/10.1007/BF01908717
