Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs

  • Limor Leibovich
  • Zohar Yakhini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8126)


Statistical analysis of ranked lists is important for interpreting molecular biology measurement data, such as ChIP-seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists. More flexible models, such as position weight matrix (PWM) motifs, are not addressed in this context. To assess the enrichment of a PWM motif in a ranked list, we use a PWM-induced second ranking on the same set of elements. Possible orders of one ranked list relative to the other are modeled by permutations. Due to the complexity of the sample space, it is difficult to characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top of two uniformly and independently drawn permutations, and we demonstrate advantages of this approach using our software implementation, mmHG-Finder, to study PWMs in several datasets.
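To make the central quantity concrete, the following is a minimal Python sketch of the size of the intersection of the tops of two independently and uniformly drawn permutations. It assumes fixed cutoffs n1 and n2, in which case the intersection size follows a hypergeometric distribution. The paper's bounds, as the abstract describes them, concern tail distributions over the group of permutations; this sketch does not reproduce them, and it is not the mmHG-Finder implementation. All function names here are hypothetical.

```python
from math import comb
import random


def top_intersection_size(perm_a, perm_b, n1, n2):
    """Count elements in both the top n1 of perm_a and the top n2 of perm_b."""
    return len(set(perm_a[:n1]) & set(perm_b[:n2]))


def hypergeom_tail(N, n1, n2, b):
    """Exact P(X >= b) for fixed cutoffs n1 and n2.

    X is the intersection size of the top n1 and top n2 of two independent
    uniform permutations of N elements, so X ~ Hypergeometric(N, n1, n2).
    """
    lower = max(b, 0)
    upper = min(n1, n2)
    favourable = sum(
        comb(n1, k) * comb(N - n1, n2 - k) for k in range(lower, upper + 1)
    )
    return favourable / comb(N, n2)


def simulated_tail(N, n1, n2, b, trials=5000, seed=0):
    """Monte Carlo estimate of the same tail probability (illustrative only)."""
    rng = random.Random(seed)
    items = list(range(N))
    hits = 0
    for _ in range(trials):
        perm_a = rng.sample(items, N)  # uniform random permutation
        perm_b = rng.sample(items, N)  # drawn independently of perm_a
        if top_intersection_size(perm_a, perm_b, n1, n2) >= b:
            hits += 1
    return hits / trials
```

For example, hypergeom_tail(20, 5, 5, 2) gives the probability that the top five elements of two random rankings of 20 items share at least two elements, and simulated_tail(20, 5, 5, 2) should agree with it up to sampling noise. Minimizing such a tail over all pairs of cutoffs introduces a multiple-testing effect that this fixed-cutoff sketch does not correct for.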





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Limor Leibovich (1)
  • Zohar Yakhini (1, 2)
  1. Department of Computer Science, Technion – Israel Institute of Technology, Technion City, Israel
  2. Agilent Laboratories Israel, Petach-Tikva, Israel
