Abstract
This paper deals with an NP-hard string problem from the bio-informatics field: the repetition-free longest common subsequence problem. This problem has attracted increasing interest in recent years, resulting in the application of several pure and hybrid metaheuristics. However, the literature lacks a comprehensive comparison of those approaches. Moreover, general-purpose integer linear programming solvers have been shown to be very efficient on many of the problem instances used so far in the literature. Therefore, in this work we extend the available benchmark set by adding larger instances to which integer linear programming solvers can no longer be applied. We also provide a comprehensive comparison of the approaches found in the literature. Based on the results, we propose a hybrid of two of the best methods, which turns out to inherit the complementary strengths of both.
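As a small illustration of the problem itself, the following is a minimal sketch that assumes the standard definition: given two input strings, find a longest common subsequence in which no letter occurs more than once. The function names are hypothetical and chosen for this example only. The exhaustive solver is a reference for tiny inputs and is not one of the metaheuristics compared in this paper.

```python
from itertools import combinations


def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    # 'ch in it' advances the iterator, so matches must appear in order.
    return all(ch in it for ch in s)


def is_repetition_free(s: str) -> bool:
    """Return True if every letter occurs at most once in s."""
    return len(set(s)) == len(s)


def is_feasible(s: str, x: str, y: str) -> bool:
    """A feasible solution is a repetition-free common subsequence of x and y."""
    return is_repetition_free(s) and is_subsequence(s, x) and is_subsequence(s, y)


def brute_force_rflcs(x: str, y: str) -> str:
    """Exhaustive reference solver (exponential time, tiny inputs only)."""
    # Enumerate subsequences of the shorter string, longest first.
    short, long_ = (x, y) if len(x) <= len(y) else (y, x)
    for k in range(len(short), 0, -1):
        for idx in combinations(range(len(short)), k):
            cand = "".join(short[i] for i in idx)
            if is_repetition_free(cand) and is_subsequence(cand, long_):
                return cand
    return ""
```

For the inputs "aabb" and "abab", the string "abb" is a common subsequence of length 3 but repeats the letter b, so it is infeasible; a longest repetition-free common subsequence has length 2, for example "ab".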
Notes
Default parameter values used for the experiments in Castelli et al. (2013): \(p_{\mathrm {size}} =100, c_{\mathrm {rate}} =0.9, m_{\mathrm {rate}} =0.05\).
According to the description as given in Castelli et al. (2013), this vector is re-computed inside the while-loop (lines 6–12). In our opinion, this is a description error, because re-computing V at every iteration would only introduce minor variations to V. Therefore, we decided to re-compute V only once per main iteration of the algorithm.
IBM ILOG CPLEX is an optimization software package which includes state-of-the-art exact techniques for solving integer linear programming models to optimality. It is available for free for academic purposes. For more information we refer the interested reader to http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/index.html.
References
Adi, S.S., Braga, M.D.V., Fernandes, C.G., Ferreira, C.E., Martinez, F.V., Sagot, M.F., Stefanes, M.A., Tjandraatmadja, C., Wakabayashi, Y.: Repetition-free longest common subsequence. In: Proceedings of the IV Latin-American Algorithms, Graphs, and Optimization Symposium. Electron. Notes Discret. Math. 30, 243–248 (2008)
Adi, S.S., Braga, M.D.V., Fernandes, C.G., Ferreira, C.E., Martinez, F.V., Sagot, M.F., Stefanes, M.A., Tjandraatmadja, C., Wakabayashi, Y.: Repetition-free longest common subsequence. Discret. Appl. Math. 158, 1315–1324 (2010)
Aho, A., Hopcroft, J., Ullman, J.: Data Structures and Algorithms. Addison-Wesley, Reading, MA (1983)
Blum, C., Blesa, M.J.: Construct, merge, solve and adapt: application to the repetition-free longest common subsequence problem. In: Chicano, F., Hu, B. (eds.) Proceedings of EvoCOP 2016—16th European Conference on Evolutionary Computation in Combinatorial Optimization. Lecture Notes in Computer Science, vol. 9595, pp. 46–57. Springer, Berlin (2016)
Blum, C., Blesa, M.J., Calvo, B.: Beam-ACO for the repetition-free longest common subsequence problem. In: Legrand, P., Corsini, M.M., Hao, J.K., Monmarché, N., Lutton, E., Schoenauer, M. (eds.) Proceedings of EA 2013—11th Conference on Artificial Evolution, Lecture Notes in Computer Science, vol. 8752, pp. 79–90. Springer, Berlin (2014)
Blum, C., Blesa, M.J., López-Ibáñez, M.: Beam search for the longest common subsequence problem. Comput. Oper. Res. 36(12), 3178–3186 (2009)
Blum, C., Dorigo, M.: The hyper-cube framework for ant colony optimization. IEEE Trans. Syst. Man Cybern. B 34(2), 1161–1172 (2004)
Bonizzoni, P., Della Vedova, G., Dondi, R., Fertin, G., Rizzi, R., Vialette, S.: Exemplar longest common subsequence. IEEE/ACM Trans. Comput. Biol. Bioinf. 4(4), 535–543 (2007)
Castelli, M., Beretta, S., Vanneschi, L.: A hybrid genetic algorithm for the repetition free longest common subsequence problem. Oper. Res. Lett. 41(6), 644–649 (2013)
Easton, T., Singireddy, A.: A large neighborhood search heuristic for the longest common subsequence problem. J. Heuristics 14(3), 271–283 (2008)
García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf. Sci. 180(10), 2044–2064 (2010)
García, S., Herrera, F.: An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694 (2008)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Jiang, T., Lin, G., Ma, B., Zhang, K.: A general edit distance between RNA structures. J. Comput. Biol. 9(2), 371–388 (2002)
López-Ibáñez, M., Dubois-Lacoste, J., Stützle, T., Birattari, M.: The irace package, iterated race for automatic algorithm configuration. Technical report TR/IRIDIA/2011-004, IRIDIA, Université libre de Bruxelles, Belgium (2011)
Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25, 322–336 (1978)
Ning, K.: Deposition and extension approach to find longest common subsequence for thousands of long sequences. Comput. Biol. Chem. 34(3), 149–157 (2010)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)
Storer, J.: Data Compression: Methods and Theory. Computer Science Press, Rockville, MD (1988)
Tabataba, F.S., Mousavi, S.R.: A hyper-heuristic for the longest common subsequence problem. Comput. Biol. Chem. 36, 42–54 (2012)
Wang, Q., Korkin, D., Shang, Y.: A fast multiple longest common subsequence (MLCS) algorithm. IEEE Trans. Knowl. Data Eng. 23(3), 321–334 (2011)
Wang, Q., Pan, M., Shang, Y., Korkin, D.: A fast heuristic search algorithm for finding the longest common subsequence of multiple strings. In: Proceedings of AAAI—Conference on Artificial Intelligence, pp. 1287–1292 (2010)
Acknowledgements
This work was supported by Project TIN2012-37930-C02-02 (Spanish Ministry for Economy and Competitiveness, FEDER funds from the European Union). Maria J. Blesa acknowledges support by funds from the AGAUR of the Government of Catalonia under Project Ref. SGR 2014:1034 (ALBCOM). Our experiments were executed in the High Performance Computing environment managed by the RDlab at the Technical University of Barcelona (http://rdlab.cs.upc.edu), and we would like to thank them for their support.
About this article
Cite this article
Blum, C., Blesa, M.J. A comprehensive comparison of metaheuristics for the repetition-free longest common subsequence problem. J Heuristics 24, 551–579 (2018). https://doi.org/10.1007/s10732-017-9329-x
DOI: https://doi.org/10.1007/s10732-017-9329-x