SimpLiSMS: A Simple, Lightweight and Fast Approach for Structured Motifs Searching

  • Ali Alatabbi
  • Shuhana Azmin
  • Md. Kawser Habib
  • Costas S. Iliopoulos
  • M. Sohel Rahman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9044)

Abstract

A structured motif is a sequence of simple motifs with distance constraints between them. We present SimpLiSMS, a simple, lightweight and fast algorithm for searching for structured motifs. SimpLiSMS uses no sophisticated data structures, which keeps it simple and lightweight. Our experiments show excellent performance of SimpLiSMS. Furthermore, we introduce a parallel version of SimpLiSMS that runs even faster.
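To make the definition concrete, the following is a minimal sketch of a naive structured motif search. It is not SimpLiSMS itself, whose details are not given in this abstract. The function name `find_structured_motif`, the representation of a structured motif as a list of exact strings plus inclusive `(min_gap, max_gap)` pairs, and the sample sequence are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_structured_motif(
    text: str,
    motifs: List[str],
    gaps: List[Tuple[int, int]],
) -> List[Tuple[int, ...]]:
    """Naive search (illustrative only, not the SimpLiSMS algorithm).

    motifs: simple motifs, matched here as exact substrings (an assumption).
    gaps:   gaps[k] = (min_gap, max_gap), the inclusive number of characters
            allowed between the end of motifs[k] and the start of motifs[k+1].
    Returns one tuple of start positions per occurrence of the whole motif.
    """
    if not motifs:
        return []
    if len(gaps) != len(motifs) - 1:
        raise ValueError("need exactly one gap range between consecutive motifs")

    # Start positions of every simple motif, overlapping occurrences included.
    occurrences = [
        {i for i in range(len(text) - len(m) + 1) if text.startswith(m, i)}
        for m in motifs
    ]

    results: List[Tuple[int, ...]] = []

    def extend(k: int, chosen: List[int]) -> None:
        # All simple motifs placed: record the occurrence.
        if k == len(motifs):
            results.append(tuple(chosen))
            return
        prev_end = chosen[-1] + len(motifs[k - 1])
        min_gap, max_gap = gaps[k - 1]
        # Try every start position allowed by the distance constraint.
        for start in range(prev_end + min_gap, prev_end + max_gap + 1):
            if start in occurrences[k]:
                extend(k + 1, chosen + [start])

    for first in sorted(occurrences[0]):
        extend(1, [first])
    return results


# "ACG" followed by "GAC" with 2 to 4 characters in between.
print(find_structured_motif("ACGTTTGACA", ["ACG", "GAC"], [(2, 4)]))  # [(0, 6)]
```

The sketch precomputes the positions of each simple motif and then chains them under the gap constraints. It only illustrates the problem statement and makes no claim about how SimpLiSMS or its parallel version works.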

Keywords

Structure Motif, Character Class, Nucleic Acid Research, Fast Approach, PROSITE Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ali Alatabbi (1)
  • Shuhana Azmin (2)
  • Md. Kawser Habib (2)
  • Costas S. Iliopoulos (1)
  • M. Sohel Rahman (1, 2)
  1. Department of Informatics, King’s College London, UK
  2. AℓEDA Group, Department of CSE, BUET, Dhaka, Bangladesh