A comprehensive comparison of metaheuristics for the repetition-free longest common subsequence problem

Abstract

This paper deals with an NP-hard string problem from the field of bioinformatics: the repetition-free longest common subsequence problem. The problem has attracted increasing interest in recent years, which has led to the application of several pure as well as hybrid metaheuristics. However, the literature lacks a comprehensive comparison of these approaches. Moreover, it has been shown that general-purpose integer linear programming solvers are very efficient at solving many of the problem instances used so far in the literature. Therefore, in this work we extend the available benchmark set by adding larger instances to which integer linear programming solvers can no longer be applied, and we provide a comprehensive comparison of the approaches found in the literature. Based on the results, we propose a hybrid of two of the best methods, which turns out to inherit the complementary strengths of both.
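To make the problem statement concrete, the following is a minimal sketch of the repetition-free longest common subsequence problem for two input strings: find a longest string that is a subsequence of both inputs and in which no symbol occurs more than once. The function names `is_repetition_free_common_subsequence` and `rflcs_bruteforce` are hypothetical and chosen for illustration only. The exhaustive search shown here is not one of the metaheuristics compared in the paper; it is exponential in the alphabet size and only suitable for toy inputs.

```python
from functools import lru_cache


def is_repetition_free_common_subsequence(t, x, y):
    """Return True if t is a subsequence of both x and y with no repeated symbol."""
    def is_subsequence(s, full):
        it = iter(full)
        # 'ch in it' consumes the iterator, so symbols must appear in order.
        return all(ch in it for ch in s)

    return len(set(t)) == len(t) and is_subsequence(t, x) and is_subsequence(t, y)


def rflcs_bruteforce(x, y):
    """Exact solver by memoized exhaustive search (illustrative, toy inputs only)."""
    @lru_cache(maxsize=None)
    def best(i, j, used):
        # 'used' is the frozenset of symbols already placed in the solution.
        if i == len(x) or j == len(y):
            return ""
        # Option 1: do not use x[i].
        result = best(i + 1, j, used)
        # Option 2: use x[i], matched to its earliest occurrence in y[j:].
        c = x[i]
        if c not in used:
            k = y.find(c, j)
            if k != -1:
                candidate = c + best(i + 1, k + 1, used | {c})
                if len(candidate) > len(result):
                    result = candidate
        return result

    return best(0, 0, frozenset())


if __name__ == "__main__":
    solution = rflcs_bruteforce("abab", "baba")
    print(solution, len(solution))
```

For the inputs `"abab"` and `"baba"`, the ordinary longest common subsequence has length 3 (for example `"aba"`), whereas any repetition-free solution over the two-symbol alphabet has length at most 2.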

Notes

  1. Default parameter values used for the experiments in Castelli et al. (2013): \(p_{\mathrm{size}} = 100\), \(c_{\mathrm{rate}} = 0.9\), \(m_{\mathrm{rate}} = 0.05\).

  2. According to the description given in Castelli et al. (2013), this vector is re-computed inside the while-loop (lines 6–12). In our opinion, this is an error in the description, because re-computing V at every iteration would only introduce minor variations to V. Therefore, we decided to re-compute V only once per main iteration of the algorithm.

  3. IBM ILOG CPLEX is an optimization software package which includes state-of-the-art exact techniques for solving integer linear programming models to optimality. It is available for free for academic purposes. For more information we refer the interested reader to http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/index.html.

References

  1. Adi, S.S., Braga, M.D.V., Fernandes, C.G., Ferreira, C.E., Martinez, F.V., Sagot, M.F., Stefanes, M.A., Tjandraatmadja, C., Wakabayashi, Y.: Repetition-free longest common subsequence. In: Proceedings of the IV Latin-American Algorithms, Graphs, and Optimization Symposium. Electron. Notes Discret. Math. 30, 243–248 (2008)

  2. Adi, S.S., Braga, M.D.V., Fernandes, C.G., Ferreira, C.E., Martinez, F.V., Sagot, M.F., Stefanes, M.A., Tjandraatmadja, C., Wakabayashi, Y.: Repetition-free longest common subsequence. Discret. Appl. Math. 158, 1315–1324 (2010)

  3. Aho, A., Hopcroft, J., Ullman, J.: Data Structures and Algorithms. Addison-Wesley, Reading, MA (1983)

  4. Blum, C., Blesa, M.J.: Construct, merge, solve and adapt: application to the repetition-free longest common subsequence problem. In: Chicano, F., Hu, B. (eds.) Proceedings of EvoCOP 2016—16th European Conference on Evolutionary Computation in Combinatorial Optimization. Lecture Notes in Computer Science, vol. 9595, pp. 46–57. Springer, Berlin (2016)

  5. Blum, C., Blesa, M.J., Calvo, B.: Beam-ACO for the repetition-free longest common subsequence problem. In: Legrand, P., Corsini, M.M., Hao, J.K., Monmarché, N., Lutton, E., Schoenauer, M. (eds.) Proceedings of EA 2013—11th Conference on Artificial Evolution, Lecture Notes in Computer Science, vol. 8752, pp. 79–90. Springer, Berlin (2014)

  6. Blum, C., Blesa, M.J., López-Ibáñez, M.: Beam search for the longest common subsequence problem. Comput. Oper. Res. 36(12), 3178–3186 (2009)

  7. Blum, C., Dorigo, M.: The hyper-cube framework for ant colony optimization. IEEE Trans. Syst. Man Cybern. B 34(2), 1161–1172 (2004)

  8. Bonizzoni, P., Della Vedova, G., Dondi, R., Fertin, G., Rizzi, R., Vialette, S.: Exemplar longest common subsequence. IEEE/ACM Trans. Comput. Biol. Bioinf. 4(4), 535–543 (2007)

  9. Castelli, M., Beretta, S., Vanneschi, L.: A hybrid genetic algorithm for the repetition free longest common subsequence problem. Oper. Res. Lett. 41(6), 644–649 (2013)

  10. Easton, T., Singireddy, A.: A large neighborhood search heuristic for the longest common subsequence problem. J. Heuristics 14(3), 271–283 (2008)

  11. García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf. Sci. 180(10), 2044–2064 (2010)

  12. García, S., Herrera, F.: An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694 (2008)

  13. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)

  14. Jiang, T., Lin, G., Ma, B., Zhang, K.: A general edit distance between RNA structures. J. Comput. Biol. 9(2), 371–388 (2002)

  15. López-Ibáñez, M., Dubois-Lacoste, J., Stützle, T., Birattari, M.: The irace package, iterated race for automatic algorithm configuration. Technical report TR/IRIDIA/2011-004, IRIDIA, Université libre de Bruxelles, Belgium (2011)

  16. Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25, 322–336 (1978)

  17. Ning, K.: Deposition and extension approach to find longest common subsequence for thousands of long sequences. Comput. Biol. Chem. 34(3), 149–157 (2010)

  18. Smith, T., Waterman, M.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)

  19. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)

  20. Storer, J.: Data Compression: Methods and Theory. Computer Science Press, Rockville, MD (1988)

  21. Tabataba, F.S., Mousavi, S.R.: A hyper-heuristic for the longest common subsequence problem. Comput. Biol. Chem. 36, 42–54 (2012)

  22. Wang, Q., Korkin, D., Shang, Y.: A fast multiple longest common subsequence (MLCS) algorithm. IEEE Trans. Knowl. Data Eng. 23(3), 321–334 (2011)

  23. Wang, Q., Pan, M., Shang, Y., Korkin, D.: A fast heuristic search algorithm for finding the longest common subsequence of multiple strings. In: Proceedings of AAAI—Conference on Artificial Intelligence, pp. 1287–1292 (2010)

Acknowledgements

This work was supported by Project TIN2012-37930-C02-02 (Spanish Ministry for Economy and Competitiveness, FEDER funds from the European Union). Maria J. Blesa acknowledges support from the AGAUR of the Government of Catalonia under Project Ref. SGR 2014:1034 (ALBCOM). Our experiments were executed in the High Performance Computing environment managed by the RDlab at the Technical University of Barcelona (http://rdlab.cs.upc.edu), and we would like to thank them for their support.

Author information

Corresponding author

Correspondence to Christian Blum.

About this article

Cite this article

Blum, C., Blesa, M.J. A comprehensive comparison of metaheuristics for the repetition-free longest common subsequence problem. J Heuristics 24, 551–579 (2018). https://doi.org/10.1007/s10732-017-9329-x

Keywords

  • Repetition-free longest common subsequence
  • Hybrid metaheuristics
  • Matheuristic