An Algorithm to Find All Identical Motifs in Multiple Biological Sequences

Bindal, Ashish Kishor; Sabarinathan, R.; Sridhar, J.; Sherlin, D.; Sekar, K.

doi:10.1007/978-3-642-16001-1_12


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6282)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics

Abstract

Sequence motifs are of great biological importance in nucleotide and protein sequences. The conserved occurrence of identical motifs indicates functional significance and helps to classify biological sequences. In this paper, a new algorithm is proposed to find all identical motifs in multiple nucleotide or protein sequences. The proposed algorithm is based on dynamic programming. Its applications include the identification of (a) conserved identical sequence motifs and (b) identical or direct repeat sequence motifs across multiple biological sequences (nucleotide or protein). Further, the algorithm facilitates the comparative analysis of internal sequence repeats for evolutionary studies, which helps to derive phylogenetic relationships from the distribution of repeats.
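
The abstract states only that the algorithm uses dynamic programming; the details are in the full chapter text, which is not reproduced here. As a rough illustration of how dynamic programming can locate identical motifs, the following Python sketch fills a common-suffix table for each pair of sequences and reports every maximal identical run. It is an assumption-laden sketch, not the authors' method: the function names, the `min_len` threshold, and the choice to compare each sequence with itself to catch direct repeats are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def identical_matches(a, b, min_len=3, same=False):
    """Return (motif, start_in_a, start_in_b) for each maximal identical run.

    Uses the classic common-suffix recurrence: cell (i, j) holds the length
    of the identical run ending at a[i-1] and b[j-1]. With same=True only
    cells with i < j are filled, so a sequence compared with itself does not
    match trivially along the main diagonal.
    """
    matches = []
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1] and not (same and i >= j):
                curr[j] = prev[j - 1] + 1
                # The run cannot be extended if either sequence ends here
                # or the next pair of characters differs.
                at_end = i == len(a) or j == len(b) or a[i] != b[j]
                if at_end and curr[j] >= min_len:
                    k = curr[j]
                    matches.append((a[i - k:i], i - k, j - k))
        prev = curr
    return matches


def find_identical_motifs(seqs, min_len=3):
    """Map each motif to the set of (sequence name, 0-based start) hits."""
    hits = defaultdict(set)
    names = list(seqs)
    # Motifs shared between two different sequences.
    for x, y in combinations(names, 2):
        for motif, i, j in identical_matches(seqs[x], seqs[y], min_len):
            hits[motif].update({(x, i), (y, j)})
    # Direct repeats inside a single sequence.
    for x in names:
        for motif, i, j in identical_matches(seqs[x], seqs[x], min_len, same=True):
            hits[motif].update({(x, i), (x, j)})
    return dict(hits)


if __name__ == "__main__":
    example = {"s1": "ACGTACGT", "s2": "TTACGTAA"}
    for motif, where in sorted(find_identical_motifs(example, min_len=4).items()):
        print(motif, sorted(where))
```

This sketch reports only maximal runs for each pairwise comparison, so a shorter motif contained in a longer shared run is not listed separately for that pair. Time is quadratic in sequence length per pair, which is workable for short inputs but would need a more economical scheme for genome-scale data.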

Author information

Authors and Affiliations

Bioinformatics Centre (Centre of excellence in Structural Biology and Bio-computing), Indian Institute of Science, Bangalore, 560012, India
Ashish Kishor Bindal, R. Sabarinathan, D. Sherlin & K. Sekar
Center of Excellence in Bioinformatics, School of Biotechnology, Madurai Kamaraj University, Madurai, 625021, Tamilnadu, India
J. Sridhar

Editor information

Editors and Affiliations

Institute for Computing and Information Sciences, Radboud University Nijmegen, Heyendaalseweg 135, 6525AJ, Nijmegen, The Netherlands
Tjeerd M. H. Dijkstra, Elena Marchiori & Tom Heskes
Institute for Computing and Information Sciences, Turku Centre for Computer Science, Radboud University Nijmegen, Heyendaalseweg 135, 6525AJ, Nijmegen, The Netherlands
Evgeni Tsivtsivadze

Cite this paper

Bindal, A.K., Sabarinathan, R., Sridhar, J., Sherlin, D., Sekar, K. (2010). An Algorithm to Find All Identical Motifs in Multiple Biological Sequences. In: Dijkstra, T.M.H., Tsivtsivadze, E., Marchiori, E., Heskes, T. (eds) Pattern Recognition in Bioinformatics. PRIB 2010. Lecture Notes in Computer Science, vol 6282. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16001-1_12

DOI: https://doi.org/10.1007/978-3-642-16001-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16000-4
Online ISBN: 978-3-642-16001-1
eBook Packages: Computer Science, Computer Science (R0)
