Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer

  • Martin Triska
  • Alexander Ivliev
  • Yuri Nikolsky
  • Tatiana V. Tatarinova (corresponding author)
Part of the Methods in Molecular Biology book series (MIMB, volume 1613)


Analysis of gene co-expression networks is a powerful “data-driven” tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise “meta-analysis” framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of “data-driven” co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson’s correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, implemented as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole-genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node.
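The co-expression step can be illustrated with a minimal sketch. It is not the WGCNA workflow itself: it keeps only the soft-thresholded Pearson adjacency, |r| raised to a power, and replaces WGCNA's topological overlap and hierarchical tree cutting with plain connected components. The function names, the power `beta=6`, the cutoff `min_adjacency=0.5`, and the toy expression values are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as uncorrelated.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpression_modules(expr, beta=6, min_adjacency=0.5):
    """Group genes into modules (simplified stand-in for WGCNA clustering).

    expr          -- dict: gene id -> list of expression values across samples
    beta          -- soft-thresholding power (assumed value)
    min_adjacency -- hypothetical cutoff on |r|**beta for linking two genes
    """
    genes = sorted(expr)
    neighbors = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adjacency = abs(pearson(expr[g], expr[h])) ** beta
            if adjacency >= min_adjacency:
                neighbors[g].add(h)
                neighbors[h].add(g)

    # Modules are the connected components of the thresholded network.
    modules, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        stack, component = [g], []
        seen.add(g)
        while stack:
            current = stack.pop()
            component.append(current)
            for nb in neighbors[current]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(component))
    return modules


# Toy data (invented): geneA/geneB rise together, geneC/geneD share a zigzag.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [4.9, 1.2, 4.1, 1.8, 3.1],
}
modules = coexpression_modules(expr)
print(modules)
```

Raising |r| to a power suppresses weak correlations far more than strong ones, so a single cutoff separates the two toy modules cleanly. Real data would call for the full WGCNA R package rather than this reduction.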

We discovered that although co-expression modules are populated with different sets of genes, they share distinct stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely with the cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top-scoring motifs are typically shared between several tissues; they define sets of target genes responsible for particular functions in carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available for further studies.
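The idea of detecting motifs from promoters of similarly expressed genes can be sketched as a k-mer over-representation test. This is not the cisExpress algorithm or its statistic: the sketch substitutes a plain hypergeometric test, applies no multiple-testing correction, and runs serially rather than with task farming. The function names, the motif `TTCGCG`, the sequences, and the threshold `alpha=0.01` are invented for illustration.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N promoters are drawn from M, n of which carry the k-mer."""
    upper = min(n, N)
    favorable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favorable / comb(M, N)


def enriched_kmers(promoters, cluster, k=6, alpha=0.01):
    """Return (kmer, cluster_hits, p_value) for k-mers over-represented in a cluster.

    promoters -- dict: gene id -> promoter sequence (all genes, the background)
    cluster   -- set of gene ids forming one co-expression cluster
                 (assumed to be a subset of the keys of `promoters`)
    """
    M, N = len(promoters), len(cluster)

    # Record which genes contain each k-mer at least once.
    containing = {}
    for gene, seq in promoters.items():
        for i in range(len(seq) - k + 1):
            containing.setdefault(seq[i:i + k], set()).add(gene)

    hits = []
    for kmer, genes in containing.items():
        in_cluster = len(genes & cluster)
        if in_cluster == 0:
            continue
        p = hypergeom_sf(in_cluster, M, len(genes), N)
        if p <= alpha:
            hits.append((kmer, in_cluster, p))
    # Most significant first; ties broken alphabetically for reproducibility.
    return sorted(hits, key=lambda h: (h[2], h[0]))


# Toy promoters (invented): four cluster genes share TTCGCG, eight do not.
promoters = {
    "g1": "ACAATTCGCGATGA",
    "g2": "GTACTTCGCGCAAT",
    "g3": "CATGTTCGCGGACT",
    "g4": "TGCTTTCGCGTCAG",
}
promoters.update({f"bg{i}": "AAGGCCTTAAGGCCTT" for i in range(8)})
cluster = {"g1", "g2", "g3", "g4"}

hits = enriched_kmers(promoters, cluster)
print(hits)
```

With 12 promoters and a cluster of 4, a k-mer present in exactly the four cluster genes has probability 1/495 under random sampling, which is why only the shared hexamer passes the cutoff here. On genome-scale data the per-k-mer tests are independent, which is what makes a task-farming layout over cores natural.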

Key words

Promoters, Motifs, Gene expression, Genome annotation, Co-expression clusters, Cancer



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  • Martin Triska (1)
  • Alexander Ivliev (2)
  • Yuri Nikolsky (3, 4)
  • Tatiana V. Tatarinova (1, 5, 6; corresponding author)

  1. Spatial Sciences Institute, University of Southern California, Los Angeles, USA
  2. Thomson Reuters, Boston, USA
  3. Prosapia Genetics, Solana Beach, USA
  4. School of Systems Biology, George Mason University, Fairfax, USA
  5. Center for Personalized Medicine, Children’s Hospital Los Angeles, Los Angeles, USA
  6. A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
