Self-overlapping Occurrences and Knuth-Morris-Pratt Algorithm for Weighted Matching

  • Aude Liefooghe
  • Hélène Touzet
  • Jean-Stéphane Varré
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5457)

Abstract

Position Weight Matrices are widely used probabilistic motif models. In this paper, we address the problem of identifying and characterizing potential overlaps between occurrences of such a motif. This characterization has useful applications to the statistics of the number of occurrences, and to weighted pattern matching through an extension of the well-known Knuth-Morris-Pratt algorithm.
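
As a rough illustration of what an overlap between occurrences means, the brute-force sketch below checks whether two occurrences of a matrix can start a given number of positions apart. It is not the method of the paper. It assumes the common convention that a word is an occurrence when its score reaches a threshold, and the names `score`, `can_overlap`, `toy_pwm` and `toy_threshold` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def score(pwm, word):
    """Sum the per-position scores of `word` under `pwm` (a list of dicts)."""
    return sum(column[letter] for column, letter in zip(pwm, word))


def can_overlap(pwm, threshold, shift):
    """Return True if two occurrences can start `shift` positions apart.

    Assumption: a word is an occurrence when its score is >= threshold.
    Brute force: enumerate every text of length m + shift and test both
    windows, so this is only usable for very short matrices.
    """
    m = len(pwm)
    for letters in product(ALPHABET, repeat=m + shift):
        text = "".join(letters)
        first = text[:m]
        second = text[shift:shift + m]
        if score(pwm, first) >= threshold and score(pwm, second) >= threshold:
            return True
    return False


# Hypothetical 3-column matrix that favours the word "ACA".
toy_pwm = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 0},
    {"A": 2, "C": 0, "G": 0, "T": 0},
]
toy_threshold = 6  # only "ACA" reaches this score

# Shifts (between 1 and m - 1) at which two occurrences can overlap.
overlapping_shifts = [
    k for k in range(1, len(toy_pwm)) if can_overlap(toy_pwm, toy_threshold, k)
]
```

With this toy matrix and threshold, the only occurrence is "ACA", which overlaps itself at shift 2 (as in "ACACA") but not at shift 1. Lowering the threshold admits more words and therefore more overlaps, which is the kind of information a Knuth-Morris-Pratt style shift table would need.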

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Aude Liefooghe (1, 2)
  • Hélène Touzet (1, 2)
  • Jean-Stéphane Varré (1, 2)
  1. LIFL - UMR CNRS 8022, Université des Sciences et Technologies de Lille, Villeneuve d’Ascq Cedex, France
  2. INRIA Lille Nord-Europe, Villeneuve d’Ascq, France
