Encyclopedia of Algorithms

2016 Edition
Editors: Ming-Yang Kao

Closest String and Substring Problems

  • Lusheng Wang
  • Ming Li
  • Bin Ma
Reference work entry
DOI: https://doi.org/10.1007/978-1-4939-2864-4_73

Years and Authors of Summarized Original Work

  • 2000; Li, Ma, Wang

  • 2003; Deng, et al.

  • 2008; Marx

  • 2009; Ma, Sun

  • 2011; Chen, Wang

  • 2012; Chen, Ma, Wang

Problem Definition

The problem of finding a center string that is “close” to every given string arises in computational molecular biology [4, 5, 9, 10, 11, 18, 19] and in coding theory [1, 6, 7].

The problem has two versions. The first comes from coding theory, where one looks for a code that is not too far away from a given set of codes.

Problem 1 (The closest string problem)

Input: a set of strings \(\mathcal{S} =\{ s_{1},s_{2},\ldots ,s_{n}\}\)
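
The rest of the problem statement is not available in this excerpt. As a rough illustration only, the sketch below assumes the usual formulation of the closest string problem: all input strings have the same length, closeness is measured by Hamming distance, and the goal is a center string whose largest distance to any input string is as small as possible. The function names `hamming` and `closest_string_bruteforce` and the explicit `alphabet` parameter are hypothetical, and the exhaustive search is not one of the algorithms summarized in this entry; it only makes the objective concrete.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_string_bruteforce(strings, alphabet):
    """Return (center, radius) minimizing the maximum Hamming distance.

    Assumption: every string in `strings` has the same length m.
    The search enumerates all |alphabet|^m candidates, so it is
    exponential in m and only suitable for tiny examples.
    """
    m = len(strings[0])
    best_center, best_radius = None, m + 1
    for candidate in product(alphabet, repeat=m):
        center = "".join(candidate)
        # Radius of this candidate: distance to the farthest input string.
        radius = max(hamming(center, s) for s in strings)
        if radius < best_radius:
            best_center, best_radius = center, radius
    return best_center, best_radius


if __name__ == "__main__":
    print(closest_string_bruteforce(["AAC", "ACA", "CAA"], "AC"))
```

For the three strings in the usage line, no center can be within distance 0 of all of them because the strings differ from one another, while `AAA` is within distance 1 of each.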

Keywords

Approximation algorithm; Fixed-parameter algorithms

Recommended Reading

  1. Ben-Dor A, Lancia G, Perone J, Ravi R (1997) Banishing bias from consensus sequences. In: Proceedings of the 8th annual symposium on combinatorial pattern matching conference, Aarhus, pp 247–261
  2. Chen Z, Wang L (2011) Fast exact algorithms for the closest string and substring problems with application to the planted (L, d)-motif model. IEEE/ACM Trans Comput Biol Bioinform 8(5):1400–1410
  3. Chen Z-Z, Ma B, Wang L (2012) A three-string approach to the closest string problem. J Comput Syst Sci 78(1):164–178
  4. Deng X, Li G, Li Z, Ma B, Wang L (2003) Genetic design of drugs without side-effects. SIAM J Comput 32(4):1073–1090
  5. Dopazo J, Rodríguez A, Sáiz JC, Sobrino F (1993) Design of primers for PCR amplification of highly variable genomes. CABIOS 9:123–125
  6. Frances M, Litman A (1997) On covering problems of codes. Theor Comput Syst 30:113–119
  7. Gasieniec L, Jansson J, Lingas A (1999) Efficient approximation algorithms for the Hamming center problem. In: Proceedings of the 10th ACM-SIAM symposium on discrete algorithms, Baltimore, pp S905–S906
  8. Gramm J, Niedermeier R, Rossmanith P (2003) Fixed-parameter algorithms for closest string and related problems. Algorithmica 37(1):25–42
  9. Hertz G, Stormo G (1995) Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. In: Proceedings of the 3rd international conference on bioinformatics and genome research, Tallahassee, pp 201–216
  10. Lanctot K, Li M, Ma B, Wang S, Zhang L (1999) Distinguishing string selection problems. In: Proceedings of the 10th ACM-SIAM symposium on discrete algorithms, Baltimore, pp 633–642
  11. Lawrence C, Reilly A (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7:41–51
  12. Li M, Ma B, Wang L (2002) Finding similar regions in many sequences. J Comput Syst Sci 65(1):73–96
  13. Li M, Ma B, Wang L (1999) Finding similar regions in many strings. In: Proceedings of the thirty-first annual ACM symposium on theory of computing, Atlanta, pp 473–482
  14. Li M, Ma B, Wang L (2002) On the closest string and substring problems. J ACM 49(2):157–171
  15. Ma B (2000) A polynomial time approximation scheme for the closest substring problem. In: Proceedings of the 11th annual symposium on combinatorial pattern matching, Montreal, pp 99–107
  16. Ma B, Sun X (2009) More efficient algorithms for closest string and substring problems. SIAM J Comput 39(4):1432–1443
  17. Marx D (2008) Closest substring problems with small distances. SIAM J Comput 38(4):1382–1410
  18. Stormo G (1990) Consensus patterns in DNA. In: Doolittle RF (ed) Molecular evolution: computer analysis of protein and nucleic acid sequences. Methods Enzymol 183:211–221
  19. Stormo G, Hartzell GW III (1991) Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci USA 88:5699–5703
  20. Wang L, Zhu B (2009) Efficient algorithms for the closest string and distinguishing string selection problems. In: Proceedings of 3rd international workshop on frontiers in algorithms, Hefei. Lecture notes in computer science, vol 5598, pp 261–270

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Lusheng Wang (1)
  • Ming Li (2)
  • Bin Ma (3, 4)
  1. Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong
  2. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
  3. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
  4. Department of Computer Science, University of Western Ontario, London, Canada