Improved metaheuristics for the quartet method of hierarchical clustering
The quartet method is a novel hierarchical clustering approach in which, given a set of n data objects and their pairwise dissimilarities, the aim is to construct, from all possible combinations of quartet topologies on the n objects, an optimal tree, where optimality means that the sum of the dissimilarities of the embedded (or consistent) quartet topologies is minimal. This corresponds to an NP-hard combinatorial optimization problem, also referred to as the minimum quartet tree cost (MQTC) problem. We provide details and a formulation of this challenging problem, and propose a basic greedy heuristic built on several insights for speeding up and simplifying solution generation and evaluation: the use of adjacency-like matrices to represent the topology structures of candidate solutions; fast calculation of the coefficients and weights of the solution matrices; shortcuts in the enumeration of all solution permutations for a given configuration; and an iterative distance matrix reduction procedure, which greedily merges highly connected objects whose merging may lower the quartet cost function of a given partial solution. We show that this basic greedy heuristic consistently improves the performance of popular quartet clustering algorithms from the literature, namely a reduced variable neighbourhood search and a simulated annealing metaheuristic, yielding novel, efficient solution approaches to the MQTC problem.
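To make the objective concrete, the following Python sketch computes the quartet tree cost of a given unrooted tree. It assumes the usual convention that the quartet topology uv|wx costs d(u,v) + d(w,x), and that a tree embeds uv|wx when the path from u to v and the path from w to x share no vertex; the tree is given as an adjacency dictionary whose internal nodes have degree 3. The helper names (`build_tree`, `path_vertices`, `consistent_topology`, `quartet_tree_cost`) are hypothetical, and this brute-force evaluation over all quartets is only an illustration, not the accelerated matrix-based evaluation proposed in the paper.

```python
from collections import deque
from itertools import combinations


def build_tree(edges):
    """Build an undirected adjacency dictionary from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def path_vertices(adj, a, b):
    """Return the set of vertices on the unique path from a to b (BFS)."""
    parent = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path, node = set(), b
    while node is not None:  # walk back from b to a
        path.add(node)
        node = parent[node]
    return path


def consistent_topology(adj, u, v, w, x):
    """Return the pairing ((p, q), (r, s)) that the tree embeds for {u, v, w, x}."""
    for (p, q), (r, s) in (((u, v), (w, x)), ((u, w), (v, x)), ((u, x), (v, w))):
        if path_vertices(adj, p, q).isdisjoint(path_vertices(adj, r, s)):
            return (p, q), (r, s)
    raise ValueError("no embedded topology: tree is not binary on these leaves")


def quartet_tree_cost(adj, leaves, d):
    """Sum the costs d(p,q) + d(r,s) of the embedded topology of every quartet."""
    total = 0.0
    for u, v, w, x in combinations(leaves, 4):
        (p, q), (r, s) = consistent_topology(adj, u, v, w, x)
        total += d[p][q] + d[r][s]
    return total


if __name__ == "__main__":
    leaves = ["a", "b", "c", "d"]
    # Illustrative dissimilarities: a-b and c-d are close, all other pairs far.
    dist = {p: {q: 0.9 for q in leaves if q != p} for p in leaves}
    dist["a"]["b"] = dist["b"]["a"] = 0.1
    dist["c"]["d"] = dist["d"]["c"] = 0.2
    good = build_tree([("a", 0), ("b", 0), (0, 1), (1, "c"), (1, "d")])  # ab|cd
    bad = build_tree([("a", 0), ("c", 0), (0, 1), (1, "b"), (1, "d")])   # ac|bd
    print(f"{quartet_tree_cost(good, leaves, dist):.2f}")  # prints 0.30
    print(f"{quartet_tree_cost(bad, leaves, dist):.2f}")   # prints 1.80
```

With four objects there is a single quartet, so the two trees differ only in which topology they embed; a heuristic for MQTC searches over such tree rearrangements for the one with the lowest total cost.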
Keywords: Combinatorial optimization; NP-hardness; Quartet trees; Hierarchical clustering; Data mining; Metaheuristics; Graphs; Minimum quartet tree cost
The author, Dr. Sergio Consoli, wishes to dedicate this work, with deepest respect, to the memory of Professor Kenneth Darby-Dowman: a great scientist, an excellent manager, the best supervisor, a wonderful person, a real friend.