Large Scale Matching for Position Weight Matrices

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)


This paper addresses the problem of multiple pattern matching for motifs encoded by Position Weight Matrices. We first present an algorithm that uses a multi-index table to preprocess the set of motifs, allowing a dramatic decrease in computation time. We then show how to take advantage of similar motifs to avoid useless computations.
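To make the matching problem concrete, the sketch below scores every window of a DNA text against a single Position Weight Matrix and reports the windows that reach a score threshold. It then repeats the scan using precomputed lookup tables over short words. This is only an illustration of table-based preprocessing in general, not the multi-index algorithm of the paper. The function names, the block size k, and the example matrix are all assumptions introduced here.

```python
from itertools import product

ALPHABET = "ACGT"

def naive_scan(text, pwm, threshold):
    """Return start positions whose window score reaches the threshold.

    pwm is a list of columns; each column maps a nucleotide to a score.
    """
    m = len(pwm)
    hits = []
    for i in range(len(text) - m + 1):
        score = sum(pwm[j][text[i + j]] for j in range(m))
        if score >= threshold:
            hits.append(i)
    return hits

def build_block_tables(pwm, k):
    """Split the matrix into blocks of at most k columns and precompute,
    for each block, the score of every possible word of that length."""
    tables = []
    for start in range(0, len(pwm), k):
        block = pwm[start:start + k]
        table = {
            "".join(word): sum(col[c] for col, c in zip(block, word))
            for word in product(ALPHABET, repeat=len(block))
        }
        tables.append((start, len(block), table))
    return tables

def table_scan(text, pwm, threshold, k=2):
    """Same result as naive_scan, but each window costs one table lookup
    per block instead of one addition per column."""
    m = len(pwm)
    tables = build_block_tables(pwm, k)
    hits = []
    for i in range(len(text) - m + 1):
        score = sum(table[text[i + s:i + s + n]] for s, n, table in tables)
        if score >= threshold:
            hits.append(i)
    return hits

# Hypothetical 3-column matrix with integer scores, favouring the word "ACG".
EXAMPLE_PWM = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": -1, "C": 2, "G": 0, "T": -1},
    {"A": 0, "C": -1, "G": 2, "T": -1},
]

if __name__ == "__main__":
    print(naive_scan("TACGACGT", EXAMPLE_PWM, 5))
    print(table_scan("TACGACGT", EXAMPLE_PWM, 5, k=2))
```

Each table holds 4^k entries per block, so k trades memory for fewer operations per window. The sketch assumes the text contains only the letters A, C, G and T; any other character raises a KeyError.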


Transcription Factor Binding Site; Similar Matrice; Score Threshold; Position Weight Matrix; Weighted Pattern





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  1. LIFL – UMR CNRS 8022, Villeneuve d’Ascq, France
