
More Efficient Algorithms for Closest String and Substring Problems

  • Bin Ma
  • Xiaoming Sun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4955)

Abstract

The closest string and closest substring problems find applications in PCR primer design, genetic probe design, motif finding, and antisense drug design. Because of their importance, the two problems have been studied extensively in computational biology in recent years. Unfortunately, both problems are NP-complete. Researchers have developed both fixed-parameter algorithms and approximation algorithms for the two problems.
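
For orientation, the closest string problem asks for a center string whose Hamming distance to every input string is at most a radius d. The sketch below is a naive exhaustive search written only to make that definition concrete; it is not the algorithm of this paper, and the function names `hamming` and `closest_string_bruteforce` are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string_bruteforce(strings, d, alphabet):
    """Return some center within Hamming distance d of every input, or None.

    Assumes all inputs have the same length L. Tries all len(alphabet)**L
    candidates, so it is only usable on toy inputs.
    """
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        center = "".join(candidate)
        if all(hamming(center, s) <= d for s in strings):
            return center
    return None


if __name__ == "__main__":
    # Any returned center is within distance 1 of all three inputs.
    print(closest_string_bruteforce(["ACGT", "ACGA", "TCGT"], 1, "ACGT"))
```

The exhaustive search is exponential in the string length, which is why parameterizing by the radius d, as the abstract goes on to do, is of interest.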

In terms of fixed-parameter algorithms, when the radius d is the parameter, the best-known fixed-parameter algorithm for closest string has time complexity \(O(n d^{d+1})\), which is still superpolynomial even if \(d = O(\log n)\). In this paper we provide an \(O\left(n |\Sigma|^{O(d)}\right)\) algorithm, where Σ is the alphabet. This gives a polynomial time algorithm when \(d = O(\log n)\) and Σ has constant size. Using the same technique, we additionally provide a more efficient subexponential time algorithm for the closest substring problem.
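
A short calculation may help show why the two bounds behave differently at logarithmic radius. Taking \(d = c \log_2 n\) for a constant \(c > 0\), and writing the hidden constant in the exponent \(O(d)\) as \(k\) (both \(c\) and \(k\) are symbols introduced here for illustration only):

\[
d^{d} = (c \log_2 n)^{c \log_2 n} = n^{\,c \log_2 (c \log_2 n)},
\qquad
|\Sigma|^{k d} = |\Sigma|^{k c \log_2 n} = n^{\,k c \log_2 |\Sigma|}.
\]

The first exponent grows with n, so the bound is superpolynomial, while the second exponent is a constant whenever \(|\Sigma|\) is constant.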

In terms of approximation, both closest string and closest substring problems admit polynomial time approximation schemes (PTAS). The best known time complexity of the PTAS is \(O(n^{O(\epsilon^{-2} \log \frac 1\epsilon)})\). In this paper we present a PTAS with time complexity \(O(n^{O(\epsilon^{-2})})\).

Finally, we prove that a restricted version of the closest substring problem has the same parameterized complexity as the closest substring problem, answering an open question in the literature.

Keywords

Time Complexity, Polynomial Time Algorithm, Close String, Substring Problem, Input String
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Bin Ma (1)
  • Xiaoming Sun (2)
  1. Department of Computer Science, University of Western Ontario, London, Canada
  2. Center for Advanced Study and Institute for Theoretical Computer Science, Tsinghua University, Beijing, China
