
An Efficient Algorithm to Identify DNA Motifs

Mathematics in Computer Science

Abstract

We consider the problem of identifying motifs, which abstracts the task of finding short conserved sites in genomic DNA. The planted (l, d)-motif problem (PMP) is the mathematical abstraction of this task: find a substring of length l that occurs, with at most d substitutions, in every sequence s_i of a set of input sequences S = {s_1, s_2, ..., s_t}. Our proposed algorithm combines the voting algorithm with a pattern matching algorithm to find exact motifs. It first runs the voting algorithm on t′ sequences, t′ < t, and then applies pattern matching to the output of the voting algorithm and the remaining t − t′ sequences. Two values of t′ are calculated: the first makes the running time of the proposed algorithm less than that of the voting algorithm, and the second makes the running time of the proposed algorithm minimal. We show that the proposed algorithm is faster than the voting algorithm by testing both on simulated data for instances from (9, d ≤ 2) to (19, d ≤ 7). Finally, we test the performance of the combined algorithm on realistic biological data.
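
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no pseudocode, so the function names (`neighbors`, `vote`, `occurs`, `combined_search`) and the toy sequences are assumptions made for illustration. The voting phase is also simplified to a set intersection (a candidate survives only if every one of the first t′ sequences votes for it) instead of a vote table, and no attempt is made to choose t′ optimally.

```python
from itertools import combinations, product

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def neighbors(lmer, d):
    """All strings over ALPHABET within Hamming distance d of lmer."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            # At each chosen position, try every letter except the original.
            choices = [[c for c in ALPHABET if c != lmer[p]] for p in positions]
            for letters in product(*choices):
                chars = list(lmer)
                for p, c in zip(positions, letters):
                    chars[p] = c
                result.add("".join(chars))
    return result

def vote(sequences, l, d):
    """Simplified voting: keep candidates voted for by every sequence."""
    candidates = None
    for s in sequences:
        voted = set()
        for i in range(len(s) - l + 1):
            voted |= neighbors(s[i:i + l], d)
        candidates = voted if candidates is None else candidates & voted
    return candidates or set()

def occurs(candidate, s, d):
    """Pattern matching: does s contain an l-mer within distance d?"""
    l = len(candidate)
    return any(hamming(candidate, s[i:i + l]) <= d
               for i in range(len(s) - l + 1))

def combined_search(sequences, l, d, t_prime):
    """Vote on the first t_prime sequences, then verify on the rest."""
    candidates = vote(sequences[:t_prime], l, d)
    rest = sequences[t_prime:]
    return {c for c in candidates if all(occurs(c, s, d) for s in rest)}

# Hypothetical toy input: "ACGT" planted with at most one substitution.
seqs = ["TTACGTTT", "GGACCTGG", "CCAAGTCC", "GGGACGAG"]
motifs = combined_search(seqs, l=4, d=1, t_prime=2)
```

In this sketch the combined search returns the same candidate set as voting over all t sequences; only the work is split differently. The expensive neighborhood enumeration is confined to t′ sequences, and the cheaper scan handles the remaining t − t′.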




Corresponding author

Correspondence to Hazem M. Bahig.


Cite this article

Abbass, M.M., Bahig, H.M. An Efficient Algorithm to Identify DNA Motifs. Math.Comput.Sci. 7, 387–399 (2013). https://doi.org/10.1007/s11786-013-0165-6


  • DOI: https://doi.org/10.1007/s11786-013-0165-6
