Soft Computing, Volume 21, Issue 17, pp 4901–4915

Large neighborhood search for the most strings with few bad columns problem

  • Evelia Lizárraga
  • Maria J. Blesa
  • Christian Blum
  • Günther R. Raidl


In this work, we consider the following NP-hard combinatorial optimization problem from computational biology. Given a set of input strings of equal length, the goal is to identify a maximum cardinality subset of strings that differ in at most a pre-defined number of positions. First, we introduce an integer linear programming model for this problem. Second, we propose two variants of a rather simple greedy strategy. Finally, we present a large neighborhood search algorithm. A comprehensive experimental comparison among the proposed techniques shows, first, that large neighborhood search generally outperforms both greedy strategies. Second, while large neighborhood search is competitive with the stand-alone application of CPLEX for small- and medium-sized problem instances, it outperforms CPLEX on larger instances.
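The problem statement above can be made concrete with a small sketch. It is not one of the techniques from the paper (neither the ILP model, the greedy strategies, nor the large neighborhood search); it only evaluates the objective and solves tiny instances by exhaustive enumeration. It assumes the usual reading of the problem, in which a column is "bad" when the selected strings do not all carry the same character at that position. The function names and the example strings are hypothetical.

```python
from itertools import combinations


def bad_columns(strings):
    """Count positions (columns) at which the given strings do not all agree.

    Assumption: a column is 'bad' if at least two selected strings differ there.
    """
    return sum(1 for column in zip(*strings) if len(set(column)) > 1)


def largest_subset_brute_force(strings, k):
    """Return a largest subset with at most k bad columns.

    Exhaustive enumeration, exponential in the number of strings; intended
    only to illustrate the problem definition on toy inputs.
    """
    for size in range(len(strings), 0, -1):
        for subset in combinations(strings, size):
            if bad_columns(subset) <= k:
                return list(subset)
    return []


# Hypothetical toy instance: four strings of length four.
example = ["ACGT", "ACGA", "TCGT", "ACCT"]
best = largest_subset_brute_force(example, k=1)
print(best, bad_columns(best))
```

Because the search space grows exponentially with the number of input strings, this enumeration is useful only for checking small cases by hand, which is consistent with the abstract's motivation for heuristic methods on larger instances.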


Most strings with few bad columns · Integer linear programming · Large neighborhood search



All experiments were executed on the High Performance Cluster managed by the Research and Development Lab (RDlab) of the Computer Science Dept. at the Universitat Politècnica de Catalunya. We thank all the RDlab staff for their support. A preliminary version of this work appeared at the IEEE 2015 International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), September 2–4, 2015, Madrid, Spain. This work was supported by project TIN2012-37930-C02-02 (Spanish Ministry for Economy and Competitiveness, FEDER funds from the European Union) and project SGR 2014-1034 (AGAUR, Generalitat de Catalunya). Additionally, Christian Blum acknowledges support from IKERBASQUE. Evelia Lizárraga acknowledges support from the Mexican National Council for Science and Technology (CONACYT, Doctoral Grant Number 253787).

Compliance with ethical standards

Conflict of interest

Evelia Lizárraga, Maria J. Blesa, Christian Blum, and Günther R. Raidl declare that they have no conflict of interest.

Ethical standard

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Evelia Lizárraga (1)
  • Maria J. Blesa (1)
  • Christian Blum (2), corresponding author
  • Günther R. Raidl (3)

  1. Computer Science Department, Universitat Politècnica de Catalunya – BarcelonaTech, Barcelona, Spain
  2. Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Spain
  3. Institute of Computer Graphics and Algorithms, TU Wien, Vienna, Austria
