An Efficient Algorithm to Identify DNA Motifs

Abbass, Mostafa M.; Bahig, Hazem M.

doi:10.1007/s11786-013-0165-6

Published: 22 November 2013

Volume 7, pages 387–399, (2013)

Mathematics in Computer Science

Mostafa M. Abbass^1,2 &
Hazem M. Bahig^3,4

Abstract

We consider the problem of identifying motifs, which abstracts the task of finding short conserved sites in genomic DNA. The planted (l, d)-motif problem (PMP) is the mathematical abstraction of this problem: find a substring of length l that occurs, with at most d substitutions, in each sequence s_i of a set of input sequences S = {s_1, s_2, ..., s_t}. Our proposed algorithm combines the voting algorithm with a pattern matching algorithm to find exact motifs. It first runs the voting algorithm on t′ sequences, t′ < t, and then applies pattern matching to the output of the voting algorithm and the remaining t − t′ sequences. Two values of t′ are calculated. The first makes the running time of the proposed algorithm less than that of the voting algorithm. The second minimizes the running time of the proposed algorithm. We show that the proposed algorithm is faster than the voting algorithm by testing both algorithms on simulated data for instances from (9, d ≤ 2) to (19, d ≤ 7). Finally, we test the performance of the combined algorithm on realistic biological data.
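The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`neighbors`, `voting`, `occurs`, `combined`) are hypothetical, the voting phase is a naive neighborhood enumeration, the pattern matching phase is a brute-force Hamming scan, and `t_prime` is passed in by hand rather than computed from the running-time analysis mentioned in the abstract.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: DNA sequences over the four-letter alphabet


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d):
    """All strings within Hamming distance d of lmer (naive enumeration)."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(ALPHABET, repeat=k):
                candidate = list(lmer)
                for pos, ch in zip(positions, letters):
                    candidate[pos] = ch
                result.add("".join(candidate))
    return result


def voting(sequences, l, d):
    """Keep the l-mers that receive a vote from every given sequence."""
    candidates = None
    for s in sequences:
        voted = set()
        for i in range(len(s) - l + 1):
            voted |= neighbors(s[i:i + l], d)
        candidates = voted if candidates is None else candidates & voted
    return candidates or set()


def occurs(motif, s, d):
    """True if motif matches some window of s with at most d substitutions."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def combined(sequences, l, d, t_prime):
    """Voting on the first t_prime sequences, then pattern matching on the rest."""
    candidates = voting(sequences[:t_prime], l, d)
    rest = sequences[t_prime:]
    return {m for m in candidates if all(occurs(m, s, d) for s in rest)}


# Toy input: "ACGT" occurs in each sequence with at most one substitution.
seqs = ["TTACGTTT", "GGACTTGG", "CCAGGTCC"]
motifs = combined(seqs, l=4, d=1, t_prime=2)
print("ACGT" in motifs)
```

In this sketch the combined procedure returns the same motif set as voting over all t sequences; only the split between the two phases, and therefore the running time, changes with `t_prime`.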

Author information

Authors and Affiliations

Department of Basic Science, Faculty of Engineering, Sinai University, Sinai, Egypt
Mostafa M. Abbass
KINDI Lab for Computing Research, College of Engineering, Qatar University, Doha, Qatar
Mostafa M. Abbass
Computer Science and Software Engineering Department, College of Computer Science and Engineering, Hail University, Hail, Kingdom of Saudi Arabia
Hazem M. Bahig
Computer Science Division, Department of Mathematics, Faculty of Science, Ain Shams University, Cairo, Egypt
Hazem M. Bahig

Corresponding author

Correspondence to Hazem M. Bahig.

About this article

Cite this article

Abbass, M.M., Bahig, H.M. An Efficient Algorithm to Identify DNA Motifs. Math. Comput. Sci. 7, 387–399 (2013). https://doi.org/10.1007/s11786-013-0165-6

Received: 04 July 2013
Revised: 16 September 2013
Accepted: 28 October 2013
Published: 22 November 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s11786-013-0165-6
