Journal of Global Optimization, Volume 71, Issue 3, pp 613–630

A sampling-based exact algorithm for the solution of the minimax diameter clustering problem

  • Daniel Aloise
  • Claudio Contardo

Abstract

We consider the problem of clustering a set of points so as to minimize the maximum intra-cluster dissimilarity, which is strongly NP-hard. Exact algorithms for this problem can handle datasets containing up to a few thousand observations, which falls well short of today's needs. The most popular heuristic for this problem, the complete-linkage hierarchical algorithm, provides feasible solutions that are usually far from optimal. We introduce a sampling-based exact algorithm aimed at solving large datasets. The algorithm alternates between an exact procedure applied to a small sample of points and a heuristic procedure that attempts to prove the optimality of the current solution. Our computational experiments show that the algorithm can solve to optimality problems containing more than 500,000 observations within moderate time limits, two orders of magnitude more than previous exact methods could handle.
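The alternation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed number of clusters `k`, the brute-force solver `solve_exact`, the greedy insertion in `try_extend`, and all function names are assumptions chosen for illustration. The sketch relies on one observation: the optimal diameter on a sample is a lower bound on the optimal diameter of the full dataset, so if the remaining points can be inserted without exceeding that value, the resulting clustering is provably optimal. Otherwise a point that could not be inserted joins the sample and the process repeats.

```python
from itertools import product


def diameter(clusters, dist):
    """Largest dissimilarity between two points that share a cluster."""
    return max(
        (dist(a, b) for c in clusters for i, a in enumerate(c) for b in c[i + 1:]),
        default=0.0,
    )


def solve_exact(sample, k, dist):
    """Brute force over all k^|sample| assignments (small samples only)."""
    best, best_val = None, float("inf")
    for labels in product(range(k), repeat=len(sample)):
        clusters = [[p for p, l in zip(sample, labels) if l == j] for j in range(k)]
        val = diameter(clusters, dist)
        if val < best_val:
            best, best_val = clusters, val
    return best, best_val


def try_extend(clusters, rest, bound, dist):
    """Greedily insert each remaining point into a cluster without exceeding
    `bound`. Returns (clusters, None) on success, (None, point) on failure."""
    clusters = [list(c) for c in clusters]
    for p in rest:
        for c in clusters:
            if all(dist(p, q) <= bound for q in c):
                c.append(p)
                break
        else:
            return None, p  # p fits nowhere: it will be added to the sample
    return clusters, None


def sampling_exact(points, k, d, sample_size=3):
    """Return (clusters of point indices, optimal diameter)."""
    n = len(points)

    def dist(i, j):
        return d(points[i], points[j])

    sample = list(range(min(sample_size, n)))
    while True:
        clusters, lower_bound = solve_exact(sample, k, dist)
        in_sample = set(sample)
        rest = [i for i in range(n) if i not in in_sample]
        full, bad = try_extend(clusters, rest, lower_bound, dist)
        if full is not None:
            # Feasible clustering whose value matches the lower bound.
            return [sorted(c) for c in full], lower_bound
        sample.append(bad)


def abs_diff(x, y):
    return abs(x - y)


clusters, value = sampling_exact([0, 1, 10, 11, 0.5, 10.5], 2, abs_diff)
```

The loop always terminates because the sample grows by one point per failed extension, and once the sample is the whole dataset the extension step has nothing left to insert. The brute-force solver is exponential in the sample size, so this sketch is only meant for tiny inputs; it says nothing about how the paper achieves its reported scale.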

Keywords

Clustering, Diameter, Large-scale optimization

Notes

Acknowledgements

This research was financed by the Fonds de recherche du Québec - Nature et technologies (FRQNT) under grant no. 181909 and by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grants 435824-2013 and 2017-05617. This support is gratefully acknowledged.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Département de génie informatique et génie logiciel, École Polytechnique de Montréal, Montreal, Canada
  2. Département de management et technologie, ESG UQÀM, Montreal, Canada
