Abstract
The increasing amount of genomic data and the ability to synthesize artificial DNA constructs pose a series of challenging problems involving the identification and design of sequences with specific properties. We address the identification of such sequences; many of these problems present challenges at both the biological and the computational level. In this chapter, we introduce the main string selection problems and the theoretical and experimental results for the most important instances.
Notes
1. Informally, the goal of parameterized complexity is to study how different parameters of the input instance affect the running time of an algorithm.
2. ZPP is the Zero-error Probabilistic Polynomial time complexity class. It is the class of languages recognized by probabilistic Turing machines with polynomially bounded expected running time and zero error probability [30].
3. APX is the class of all NP-optimization problems P such that, for some r ≥ 1, there exists a polynomial-time r-approximation algorithm for P [3].
4. FPT denotes the class of fixed-parameter tractable problems, that is, parameterized problems that can be solved in time \(f(k)\,|x|^{\mathcal{O}(1)}\), where \(k\) is the parameter, \(|x|\) is the size of the input instance, and \(f\) is a computable function.
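Notes 3 and 4 can be illustrated on the Closest String problem under the Hamming distance. Given equal-length strings, the task is to find a string that minimizes the maximum distance (the radius) to all of them. The following Python sketch is illustrative only. It assumes non-empty, equal-length inputs, and all function names are hypothetical.

- `two_approx` returns an input string. By the triangle inequality, its radius is at most twice the optimum, which places Closest String in APX with r = 2.
- `closest_string_fpt` follows the bounded search tree idea of Gramm, Niedermeier and Rossmanith [34]. The tree has depth at most d and branching factor at most d + 1, so the running time has the form \(f(d)\,|x|^{\mathcal{O}(1)}\) required by note 4.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def radius(center: str, strings: List[str]) -> int:
    """Largest Hamming distance from center to any input string."""
    return max(hamming(center, s) for s in strings)


def two_approx(strings: List[str]) -> str:
    """Return an input string; by the triangle inequality its radius is
    at most twice the optimal radius (a 2-approximation)."""
    return strings[0]


def closest_string_fpt(strings: List[str], d: int) -> Optional[str]:
    """Bounded search tree: return a string within Hamming distance d of
    every input string, or None if no such string exists."""

    def search(candidate: str, budget: int) -> Optional[str]:
        # Find an input string that is still too far from the candidate.
        far = next((s for s in strings if hamming(candidate, s) > d), None)
        if far is None:
            return candidate  # within distance d of every input string
        # Prune: a valid center within `budget` of candidate would give
        # hamming(candidate, far) <= budget + d by the triangle inequality.
        if budget == 0 or hamming(candidate, far) > d + budget:
            return None
        # Any valid center differs from `far` in at most d positions, so it
        # agrees with `far` on at least one of any d + 1 positions where
        # candidate and far differ; branch on each of them.
        diff = [i for i, (x, y) in enumerate(zip(candidate, far)) if x != y]
        for i in diff[: d + 1]:
            found = search(candidate[:i] + far[i] + candidate[i + 1:], budget - 1)
            if found is not None:
                return found
        return None

    # A valid center, if any, is within distance d of the first input string.
    return search(strings[0], d)


if __name__ == "__main__":
    sample = ["BBAA", "AABB", "AAAA"]
    print(closest_string_fpt(sample, 2))  # AAAA
    print(closest_string_fpt(sample, 1))  # None
```

For the sample, radius 2 is achievable. Radius 1 is not, because `BBAA` and `AABB` differ in all four positions.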
References
Amir, A., Paryenty, H., Roditty, L.: Configurations and minority in the string consensus problem. In: String Processing and Information Retrieval, pp. 42–53. Springer, Berlin (2012)
Andoni, A., Indyk, P., Patrascu, M.: On the optimality of the dimensionality reduction method. In: 47th Annual IEEE Symposium on Foundations of Computer Science, 2006 (FOCS’06), pp. 449–458. IEEE, New York (2006)
Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Protasi, M.: Complexity and approximation: Combinatorial optimization problems and their approximability properties. Springer, Berlin (1999)
Babaie, M., Mousavi, S.: A memetic algorithm for closest string problem and farthest string problem. In: 18th Iranian Conference on Electrical Engineering (ICEE), pp. 570–575. IEEE, New York (2010)
Bahredar, F., Javadi, H., Moghadam, R., Erfani, H., Navidi, H.: A meta heuristic solution for closest substring problem using ant colony system. Adv. Stud. Biol. 2(4), 179–189 (2010)
Ben-Dor, A., Lancia, G., Ravi, R., Perone, J.: Banishing bias from consensus sequences. In: Combinatorial Pattern Matching, pp. 247–261. Springer, Berlin (1997)
Booker, L., Goldberg, D., Holland, J.: Classifier systems and genetic algorithms. In: Machine Learning: Paradigms and Methods, pp. 235–282 (1990)
Boucher, C., Ma, B.: Closest string with outliers. BMC Bioinformatics 12(Suppl 1), S55 (2011)
Boucher, C., Landau, G.M., Levy, A., Pritchard, D., Weimann, O.: On approximating string selection problems with outliers. In: Proceedings of the 23rd Annual Conference on Combinatorial Pattern Matching, pp. 427–438. Springer, Berlin (2012)
Calhoun, J., Graham, J., Jiang, H.: On using a graphics processing unit to solve the closest substring problem. In: International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA) (2011)
Casacuberta, F., de Antonio, M.: A greedy algorithm for computing approximate median strings. In: Proceedings of Spanish Symposium on Pattern Recognition and Image Analysis, pp. 193–198. AERFAI (1997)
Chen, Z.Z., Ma, B., Wang, L.: A three-string approach to the closest string problem. J. Comput. Syst. Sci. 78(1), 164–178 (2012)
Chimani, M., Woste, M., Böcker, S.: A closer look at the closest string and closest substring problem. In: Proceedings of the 13th Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 13–24 (2011)
Della Croce, F., Salassa, F.: Improved LP-based algorithms for the closest string problem. Comput. Oper. Res. 39(3), 746–749 (2012)
Deng, X., Li, G., Li, Z., Ma, B., Wang, L.: A PTAS for distinguishing (sub)string selection. In: Automata, Languages and Programming, pp. 788–788 (2002)
Deng, X., Li, G., Wang, L.: Center and distinguisher for strings with unbounded alphabet. J. Comb. Optim. 6(4), 383–400 (2002)
Deng, X., Li, G., Li, Z., Ma, B., Wang, L.: Genetic design of drugs without side-effects. SIAM J. Comput. 32(4), 1073–1090 (2003)
Dinu, L., Ionescu, R.: A genetic approximation of closest string via rank distance. In: 13th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 207–214. IEEE, New York (2011)
Dinu, L., Ionescu, R.: An efficient rank based approach for closest string and closest substring. PLoS ONE 7(6), e37576 (2012)
Dorigo, M.: Optimization, learning and natural algorithms. Ph.D. thesis, Dipartimento di Elettronica, Politecnico di Milano (1992)
Dorigo, M., Di Caro, G., Gambardella, L.: Ant algorithms for discrete optimization. Artif. Life 5(2), 137–172 (1999)
Evans, P., Smith, A.: Complexity of approximating closest substring problems. In: Fundamentals of Computation Theory, pp. 13–47. Springer, Berlin (2003)
Faro, S., Pappalardo, E.: Ant-CSP: An ant colony optimization algorithm for the closest string problem. In: SOFSEM 2010: Theory and Practice of Computer Science, pp. 370–381. Springer, Berlin (2010)
Fellows, M., Gramm, J., Niedermeier, R.: On the parameterized intractability of closest substring and related problems. In: STACS 2002, pp. 262–273. Springer, Berlin (2002)
Festa, P.: On some optimization problems in molecular biology. Math. Biosci. 207(2), 219–234 (2007)
Festa, P., Pardalos, P.M.: Efficient solutions for the far from most string problem. Ann. Oper. Res. 196(1), 663–682 (2012)
Frances, M., Litman, A.: On covering problems of codes. Theor. Comput. Syst. 30(2), 113–119 (1997)
Gąsieniec, L., Jansson, J., Lingas, A.: Efficient approximation algorithms for the Hamming center problem. In: Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 905–906. Society for Industrial and Applied Mathematics (1999)
Gilkerson, J., Jaromczyk, J.: The genetic algorithm scheme for consensus sequences. In: IEEE Congress on Evolutionary Computation, 2007 (CEC 2007), pp. 3870–3878. IEEE, New York (2007)
Gill, J.: Computational complexity of probabilistic Turing machines. SIAM J. Comput. 6(4), 675–695 (1977)
Goldberg, D., Holland, J.: Genetic algorithms and machine learning. Mach. Learn. 3(2), 95–99 (1988)
Gomes, F., Meneses, C., Pardalos, P., Viana, G.: A parallel multistart algorithm for the closest string problem. Comput. Oper. Res. 35(11), 3636–3643 (2008)
Gramm, J., Niedermeier, R., Rossmanith, P.: Exact solutions for closest string and related problems. In: Algorithms and Computation, pp. 441–453. Springer, Berlin (2001)
Gramm, J., Niedermeier, R., Rossmanith, P.: Fixed-parameter algorithms for closest string and related problems. Algorithmica 37(1), 25–42 (2003)
Gramm, J., Guo, J., Niedermeier, R.: On exact and approximation algorithms for distinguishing substring selection. In: Proceedings of Fundamentals of Computation Theory: 14th International Symposium (FCT 2003), Malmö, 12–15 August 2003, vol. 14, p. 195. Springer, Berlin (2003)
Gramm, J., Guo, J., Niedermeier, R.: Parameterized intractability of distinguishing substring selection. Theor. Comput. Syst. 39(4), 545–560 (2006)
Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Guyon, I., Schomaker, L., Plamondon, R., Liberman, M., Janet, S.: UNIPEN project of on-line data exchange and recognizer benchmarks. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, Conference B: Computer Vision & Image Processing, pp. 29–33. IEEE, New York (1994)
de la Higuera, C., Casacuberta, F.: Topology of strings: median string is NP-complete. Theor. Comput. Sci. 230(1), 39–48 (2000)
Holland, J.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge (1992)
Jiang, X., Abegglen, K., Bunke, H., Csirik, J.: Dynamic computation of generalised median strings. Pattern Anal. Appl. 6(3), 185–193 (2003)
Jiang, X., Bunke, H., Csirik, J.: Median strings: a review. In: Data Mining in Time Series Databases, pp. 173–192 (2004)
Jiang, X., Wentker, J., Ferrer, M.: Generalized median string computation by means of string embedding in vector spaces. Pattern Recognit. Lett. 33(7), 842–852 (2012)
Juan, A., Vidal, E.: Fast median search in metric spaces. In: Advances in Pattern Recognition, pp. 905–912. Springer, Berlin (1998)
Julstrom, B.: A data-based coding of candidate strings in the closest string problem. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, pp. 2053–2058. Association for Computing Machinery (2009)
Keith, J., Adams, P., Bryant, D., Kroese, D., Mitchelson, K., Cochran, D., Lala, G.: A simulated annealing algorithm for finding consensus sequences. Bioinformatics 18(11), 1494–1499 (2002)
Kelsey, T., Kotthoff, L.: The exact closest string problem as a constraint satisfaction problem. arXiv preprint arXiv:1005.0089 (2010)
Kohonen, T.: Median strings. Pattern Recognit. Lett. 3(5), 309–313 (1985)
Kruskal, J.B.: An overview of sequence comparison: time warps, string edits, and macromolecules. SIAM Rev. 25(2), 201–237 (1983)
Kruzslicz, F.: Improved greedy algorithm for computing approximate median strings. Acta Cybern. 14(2), 331–340 (1999)
Lanctot, J.K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. In: Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 633–642. Society for Industrial and Applied Mathematics (1999)
Li, M., Ma, B., Wang, L.: Finding similar regions in many strings. In: Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, pp. 473–482. Association for Computing Machinery (1999)
Li, M., Ma, B., Wang, L.: On the closest string and substring problems. J. ACM 49(2), 157–171 (2002)
Liu, X., He, H., Sýkora, O.: Parallel genetic algorithm and parallel simulated annealing algorithm for the closest string problem. In: Advanced Data Mining and Applications, pp. 591–597. Springer, Berlin (2005)
Liu, X., Mauch, H., Hao, Z., Wu, G.: A compounded genetic and simulated annealing algorithm for the closest string problem. In: The 2nd International Conference on Bioinformatics and Biomedical Engineering, 2008 (ICBBE 2008), pp. 702–705. IEEE, New York (2008)
Liu, X., Liu, S., Hao, Z., Mauch, H.: Exact algorithm and heuristic for the closest string problem. Comput. Oper. Res. 38(11), 1513–1520 (2011)
Lopresti, D., Zhou, J.: Using consensus sequence voting to correct OCR errors. Comput. Vis. Image Underst. 67(1), 39–47 (1997)
Ma, B.: A polynomial time approximation scheme for the closest substring problem. In: Combinatorial Pattern Matching, pp. 99–107. Springer, Berlin (2000)
Ma, B., Sun, X.: More efficient algorithms for closest string and substring problems. In: Research in Computational Molecular Biology, pp. 396–409. Springer, Berlin (2008)
Martínez-Hinarejos, C.D., Juan, A., Casacuberta, F.: Use of median string for classification. In: Proceedings of 15th International Conference on Pattern Recognition, vol. 2, pp. 903–906. IEEE, New York (2000)
Marx, D.: Closest substring problems with small distances. SIAM J. Comput. 38(4), 1382–1410 (2008)
Mauch, H.: Closest substring problem–results from an evolutionary algorithm. In: Neural Information Processing, pp. 205–211. Springer, Berlin (2004)
Mauch, H., Melzer, M., Hu, J.: Genetic algorithm approach for the closest string problem. In: Proceedings of the 2003 IEEE Bioinformatics Conference (CSB 2003), pp. 560–561 (2003)
McClure, M., Vasi, T., Fitch, W.: Comparative analysis of multiple protein-sequence alignment methods. Mol. Biol. Evol. 11(4), 571 (1994)
Meneses, C., Lu, Z., Oliveira, C., Pardalos, P., et al.: Optimal solutions for the closest-string problem via integer programming. INFORMS J. Comput. 16(4), 419–429 (2004)
Meneses, C., Pardalos, P., Resende, M., Vazacopoulos, A.: Modeling and solving string selection problems. In: Second International Symposium on Mathematical and Computational Biology, pp. 54–64 (2005)
Meneses, C., Oliveira, C., Pardalos, P.: Optimization techniques for string selection and comparison problems in genomics. IEEE Eng. Med. Biol. Mag. 24(3), 81–87 (2005)
Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44(247), 335–341 (1949)
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)
Micó, L., Oncina, J.: An approximate median search algorithm in non-metric spaces. Pattern Recognit. Lett. 22(10), 1145–1151 (2001)
Mousavi, S.R.: A hybridization of constructive beam search with local search for far from most strings problem. Int. J. Comput. Math. Sci. 4(7), 340–348 (2010)
Mousavi, S.R., Babaie, M., Montazerian, M.: An improved heuristic for the far from most strings problem. J. Heuristics 18(2), 239–262 (2012)
Nicolas, F., Rivals, E.: Complexities of the centre and median string problems. In: Combinatorial Pattern Matching, pp. 315–327. Springer, Berlin (2003)
Nicolas, F., Rivals, E.: Hardness results for the center and median string problems under the weighted and unweighted edit distances. J. Discrete Algorithms 3(2), 390–415 (2005)
Mousavi, S.R., Nasr Esfahani, N.: A GRASP algorithm for the closest string problem using a probability-based heuristic. Comput. Oper. Res. 39(2), 238–248 (2012)
Silva, R.M.A., Baleeiro, G., Pires, D., Resende, M., Festa, P., Valentim, F.: GRASP with path-relinking for the farthest substring problem. Technical Report, AT&T Labs Research (2008)
Sim, J.S., Park, K.: The consensus string problem for a metric is NP-complete. J. Discrete Algorithms 1(1), 111–117 (2003)
Smith, A.: Common approximate substrings. Ph.D. thesis, Citeseer (2004)
Stojanovic, N., Berman, P., Gumucio, D., Hardison, R., Miller, W.: A linear-time algorithm for the 1-mismatch problem. In: Algorithms and Data Structures, pp. 126–135. Springer, Berlin (1997)
Tanaka, S.: A heuristic algorithm based on Lagrangian relaxation for the closest string problem. Comput. Oper. Res. 39(3), 709–717 (2012)
Wang, J., Huang, M., Chen, J.: A lower bound on approximation algorithms for the closest substring problem. In: Combinatorial Optimization and Applications, pp. 291–300. Springer, Berlin (2007)
Wang, J., Chen, J., Huang, M.: An improved lower bound on approximation algorithms for the closest substring problem. Inf. Process. Lett. 107(1), 24–28 (2008)
Wang, L., Zhu, B.: Efficient algorithms for the closest string and distinguishing string selection problems. In: Frontiers in Algorithmics, pp. 261–270. Springer, Berlin (2009)
Copyright information
© 2013 Elisa Pappalardo, Panos M. Pardalos, Giovanni Stracquadanio
About this chapter
Cite this chapter
Pappalardo, E., Pardalos, P.M., Stracquadanio, G. (2013). String Selection Problems. In: Optimization Approaches for Solving String Selection Problems. SpringerBriefs in Optimization. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9053-1_4
DOI: https://doi.org/10.1007/978-1-4614-9053-1_4
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-9052-4
Online ISBN: 978-1-4614-9053-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)