A Method to Find Sequentially Separated Motifs in Biological Sequences (SSMBS)

  • Chetan Kumar
  • Nishith Kumar
  • Sarani Rangarajan
  • Narayanaswamy Balakrishnan
  • Kanagaraj Sekar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)


Sequence motifs that occur in a particular order in proteins or DNA have been shown to be of biological interest. This paper proposes a new method to locate occurrences of up to five user-defined motifs, in a specified order, in large protein and nucleotide sequence databases. The method is designed around quantifiers in regular expressions and uses linked lists for data storage. Its applications include the extraction of relevant consensus regions from biological sequences, which may be useful for clustering protein families and for studying the correlation between the positions of motifs and their functional sites in DNA sequences.
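The idea of joining motifs with regular-expression quantifiers can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `find_ordered_motifs`, the `min_gap` and `max_gap` parameters, and the example sequence are hypothetical, and an ordinary Python list stands in for the linked lists mentioned in the abstract.

```python
import re


def find_ordered_motifs(sequence, motifs, min_gap=0, max_gap=50):
    """Find places where the motifs occur in the given order.

    Hypothetical helper for illustration only. Each motif is a regular
    expression fragment. Consecutive motifs may be separated by between
    min_gap and max_gap arbitrary residues, expressed with a bounded
    quantifier. Returns one list of (start, end) spans per hit.
    """
    if not 1 <= len(motifs) <= 5:
        raise ValueError("expected between one and five motifs")

    # Lazy bounded quantifier: prefer the nearest following motif.
    gap = ".{%d,%d}?" % (min_gap, max_gap)

    # Wrap each motif in a named group so its span can be read back.
    parts = ["(?P<m%d>%s)" % (i, m) for i, m in enumerate(motifs)]

    # A lookahead lets the scan test every start position, so hits that
    # overlap each other are still reported (one hit per start position).
    regex = re.compile("(?=%s)" % gap.join(parts))

    hits = []
    for match in regex.finditer(sequence):
        hits.append([match.span("m%d" % i) for i in range(len(motifs))])
    return hits


# Assumed toy protein fragment and motifs, not data from the paper.
example = "MKLDADGTLLSAAGDPRK"
print(find_ordered_motifs(example, ["D.D.[TV]", "GD"], max_gap=10))
# → [[(3, 8), (13, 15)]]
```

Reversing the motif order in the call above returns an empty list, since the search only accepts the motifs in the order supplied.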


Keywords: Regular expressions; protein and nucleotide sequences; sequence motifs



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Chetan Kumar (1)
  • Nishith Kumar (1)
  • Sarani Rangarajan (1)
  • Narayanaswamy Balakrishnan (2)
  • Kanagaraj Sekar (1, 2)
  1. Bioinformatics Centre (Centre of Excellence in Structural Biology and Bio-computing), India
  2. Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India
