Journal of Clinical Monitoring and Computing, Volume 19, Issue 4–5, pp 319–328

High-Performance Exact Algorithms For Motif Search

  • Sanguthevar Rajasekaran
  • Sudha Balla
  • Chun-Hsi Huang
  • Vishal Thapar
  • Michael Gryk
  • Mark Maciejewski
  • Martin Schiller
Article

Abstract

Objective. The human genome project has generated voluminous biological data, and novel computational techniques are needed to extract useful information from it. One such technique is finding patterns that are repeated over many sequences (and possibly over many species). In this paper we study the motif search problem: identifying meaningful patterns (i.e., motifs) in biological data.

Methods. The general version of the motif search problem is NP-hard. Numerous algorithms have been proposed in the literature to solve it, many of which are heuristics. We concentrate on exact algorithms, and in particular on two different versions of the motif search problem.

Results. We present exact algorithms for both versions. All of our algorithms are elegant and use only simple data structures such as arrays. For the first version (Problem 1 in the paper), we present a simple sorting-based algorithm, SMS (Simple Motif Search), which has been coded and evaluated experimentally. For the second version (Problem 2 in the paper), we present two algorithms: a deterministic algorithm (called DMS) and a randomized (Monte Carlo) algorithm. We also show how these algorithms can be parallelized.

Conclusions. All the algorithms proposed in this paper improve on existing algorithms for these versions of motif search in biological sequence data, and they have the potential to perform well in practice.
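
The abstract does not define Problem 1 or give the details of SMS, so the following is only a minimal sketch of the general idea of a sorting-based exact search that uses nothing beyond arrays. It assumes a simplified problem that is not taken from the paper: report every exact substring of a fixed length that occurs in at least a given number of input sequences. The function name `common_substrings` and its parameters are hypothetical and chosen for illustration only.

```python
from itertools import groupby


def common_substrings(sequences, length, min_sequences):
    """Return every substring of the given length that occurs in at least
    min_sequences of the input sequences (simplified, assumed problem)."""
    # Collect (substring, sequence index) pairs from every sequence.
    pairs = []
    for index, seq in enumerate(sequences):
        for start in range(len(seq) - length + 1):
            pairs.append((seq[start:start + length], index))

    # Sorting brings identical substrings next to each other.
    pairs.sort()

    # One linear scan counts the distinct sequences behind each substring.
    motifs = []
    for substring, group in groupby(pairs, key=lambda pair: pair[0]):
        distinct = {index for _, index in group}
        if len(distinct) >= min_sequences:
            motifs.append(substring)
    return motifs


sequences = ["ACGTAC", "TTACGT", "GGACGA"]
print(common_substrings(sequences, 3, 3))  # prints ['ACG']
```

Sorting replaces any tree or hash-based index here: once equal substrings are adjacent, a single pass is enough to decide which of them appear in enough sequences. The versions studied in the paper may allow inexact matches, which this sketch does not attempt to handle.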

Keywords

motif search algorithm short sequence motifs 

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Sanguthevar Rajasekaran (1, 3)
  • Sudha Balla (1)
  • Chun-Hsi Huang (1)
  • Vishal Thapar (1)
  • Michael Gryk (2)
  • Mark Maciejewski (2)
  • Martin Schiller (2)

  1. Department of CSE, University of Connecticut, Storrs, USA
  2. Department of Neuroscience, University of Connecticut, Farmington, USA
  3. Professor of Computer Science and Engineering, University of Connecticut, Storrs, U.S.A.