Fast Search Algorithms for Position Specific Scoring Matrices

  • Cinzia Pizzi
  • Pasi Rastas
  • Esko Ukkonen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4414)

Abstract

Fast search algorithms for finding good instances of patterns given as position specific scoring matrices are developed, and empirical results on their performance on DNA sequences are reported. The algorithms generalize the Aho–Corasick, filtration, and superalphabet techniques of string matching to scoring matrix search. Compared with the naive search, our algorithms can be faster by a factor proportional to the length of the pattern. In our experimental comparison, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho–Corasick technique is the fastest for short patterns and high significance thresholds of the search. For longer patterns the filtration method is better, while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.
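The two baselines named in the abstract, the naive search and lookahead scoring, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the list-of-dicts matrix layout, and the toy scores are all assumptions made for the example. The naive search scores every window of length m in full, while the lookahead variant abandons a window as soon as the partial score plus the best achievable remainder falls below the threshold.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one dict of per-symbol scores for each pattern position.
Matrix = List[Dict[str, int]]


def naive_search(matrix: Matrix, text: str, threshold: int) -> List[Tuple[int, int]]:
    """Score every length-m window completely: m lookups per window."""
    m = len(matrix)
    hits = []
    for i in range(len(text) - m + 1):
        score = sum(matrix[j][text[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits


def lookahead_search(matrix: Matrix, text: str, threshold: int) -> List[Tuple[int, int]]:
    """Same hits as naive_search, but stop scoring a window early when
    even the best possible remaining columns cannot reach the threshold."""
    m = len(matrix)
    # max_rest[j] = maximum score obtainable from columns j..m-1.
    max_rest = [0] * (m + 1)
    for j in range(m - 1, -1, -1):
        max_rest[j] = max_rest[j + 1] + max(matrix[j].values())
    hits = []
    for i in range(len(text) - m + 1):
        score = 0
        for j in range(m):
            score += matrix[j][text[i + j]]
            if score + max_rest[j + 1] < threshold:
                break  # this window can no longer reach the threshold
        else:
            # Loop finished without pruning, so score >= threshold.
            hits.append((i, score))
    return hits


# Toy matrix (illustrative values only) that favours the word "ACG".
toy_matrix: Matrix = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": 0},
]
print(lookahead_search(toy_matrix, "ACGTACGA", 5))  # [(0, 6), (4, 6)]
```

Integer scores are used here so that the pruning test is exact; with floating-point scores the early-exit comparison would need some care about rounding.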

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Cinzia Pizzi (1)
  • Pasi Rastas (1)
  • Esko Ukkonen (1)
  1. Department of Computer Science and Helsinki Institute for Information Technology HIIT, P.O. Box 68, FIN-00014 University of Helsinki, Finland
