Improved metaheuristics for the quartet method of hierarchical clustering

  • Sergio Consoli
  • Jan Korst
  • Steffen Pauws
  • Gijs Geleijnse
Abstract

The quartet method is a novel hierarchical clustering approach where, given a set of n data objects and their pairwise dissimilarities, the aim is to construct an optimal tree over all possible combinations of quartet topologies on the n objects, where optimality means that the sum of the dissimilarities of the embedded (or consistent) quartet topologies is minimal. This corresponds to an NP-hard combinatorial optimization problem, also referred to as the minimum quartet tree cost (MQTC) problem. We provide details and a formulation of this challenging problem, and propose a basic greedy heuristic characterized by several insights for speeding up and simplifying solution generation and evaluation: the use of adjacency-like matrices to represent the topology structures of candidate solutions; fast calculation of the coefficients and weights of the solution matrices; shortcuts in the enumeration of all solution permutations for a given configuration; and an iterative distance matrix reduction procedure that greedily merges highly connected objects, which may yield lower values of the quartet cost function in a given partial solution. We show that this basic greedy heuristic consistently improves the performance of popular quartet clustering algorithms in the literature, namely a reduced variable neighbourhood search and a simulated annealing metaheuristic, yielding novel, efficient solution approaches to the MQTC problem.
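
To make the objective concrete, the following minimal Python sketch evaluates the quartet cost of a given candidate tree. It assumes the usual convention from the quartet-clustering literature that the cost of a quartet topology ab|cd is d(a,b) + d(c,d), and that the tree is fully resolved (every internal node has degree three), so exactly one topology per quartet is embedded. The function names, the adjacency-dictionary representation and the toy data are illustrative assumptions and are not taken from the article.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def quartet_tree_cost(adj, leaves, d):
    """Sum d(a,b) + d(c,e) over the quartet topologies ab|ce embedded in the tree.

    adj    : dict mapping each node to its neighbours (hypothetical representation)
    leaves : list of leaf labels, used as indices into d
    d      : symmetric dissimilarity matrix (nested lists)
    """
    hops = {a: hop_distances(adj, a) for a in leaves}
    total = 0.0
    for a, b, c, e in combinations(leaves, 4):
        pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        # In a fully resolved tree, the embedded topology is the pairing
        # with the strictly smallest sum of within-pair path lengths.
        (p, q), (r, s) = min(
            pairings,
            key=lambda pr: hops[pr[0][0]][pr[0][1]] + hops[pr[1][0]][pr[1][1]],
        )
        total += d[p][q] + d[r][s]
    return total


# Toy instance (assumed data): four objects, two internal nodes "x" and "y".
d = [
    [0, 1, 5, 5],
    [1, 0, 5, 5],
    [5, 5, 0, 2],
    [5, 5, 2, 0],
]
tree_01_23 = {0: ["x"], 1: ["x"], 2: ["y"], 3: ["y"],
              "x": [0, 1, "y"], "y": [2, 3, "x"]}
tree_02_13 = {0: ["x"], 2: ["x"], 1: ["y"], 3: ["y"],
              "x": [0, 2, "y"], "y": [1, 3, "x"]}

cost_good = quartet_tree_cost(tree_01_23, [0, 1, 2, 3], d)
cost_bad = quartet_tree_cost(tree_02_13, [0, 1, 2, 3], d)
```

On this toy instance the tree embedding 01|23 has the lower cost, since it pairs the two most similar objects on each side. Enumerating all quartets in this way takes time on the order of n⁴, which is the kind of evaluation cost that a faster scheme such as the one outlined in the abstract would aim to reduce.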

Keywords

Combinatorial optimization; NP-hardness; Quartet trees; Hierarchical clustering; Data mining; Metaheuristics; Graphs; Minimum quartet tree cost

Notes

Acknowledgements

The author Dr. Sergio Consoli wants to dedicate this work with deepest respect to the memory of Professor Kenneth Darby-Dowman, a great scientist, an excellent manager, the best supervisor, a wonderful person, a real friend.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Philips Research, Eindhoven, The Netherlands
  2. TiCC, Tilburg University, Tilburg, The Netherlands
  3. Netherlands Comprehensive Cancer Organisation (IKNL), Eindhoven, The Netherlands