Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer

  • Protocol
Biological Networks and Pathway Analysis

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1613))

Abstract

Analysis of gene co-expression networks is a powerful “data-driven” tool, invaluable for understanding cancer biology and the mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise “meta-analysis” framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines the generation of “data-driven” co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson’s correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origins. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, implemented as the cisExpress software module, was specifically designed for the analysis of very large data sets such as those generated by publicly accessible whole-genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node.
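
As a rough illustration of the first step, the sketch below computes Pearson's correlation between gene expression profiles and converts it into an unsigned WGCNA-style adjacency, a_ij = |cor(x_i, x_j)|^beta. It is not the chapter's pipeline: the gene names, expression values, the helper names `pearson` and `soft_threshold_adjacency`, and the soft-threshold power beta = 6 are assumptions chosen for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |r|**beta for every unordered gene pair.

    expr maps a gene name to its expression values across samples.
    beta is the soft-threshold power; 6 is an assumed value here.
    """
    genes = sorted(expr)
    adjacency = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adjacency


# Toy, made-up expression matrix: three genes measured in four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
adjacency = soft_threshold_adjacency(expr)
for pair, weight in sorted(adjacency.items()):
    print(pair, round(weight, 6))
```

Raising the absolute correlation to a power keeps strongly correlated pairs close to 1 while pushing weak correlations toward 0, which is what makes the network "weighted" rather than hard-thresholded. In a full WGCNA workflow, modules would then be obtained by hierarchical clustering of a dissimilarity derived from this adjacency, a step omitted here.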

We discovered that although co-expression modules are populated with different sets of genes, they share distinct, stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely depending on the cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top-scoring motifs are typically shared between several tissues; they define sets of target genes responsible for specific functions in carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available and accessible for further studies.

Author information

Corresponding author

Correspondence to Tatiana V. Tatarinova.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Triska, M., Ivliev, A., Nikolsky, Y., Tatarinova, T.V. (2017). Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer. In: Tatarinova, T., Nikolsky, Y. (eds) Biological Networks and Pathway Analysis. Methods in Molecular Biology, vol 1613. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7027-8_11

  • DOI: https://doi.org/10.1007/978-1-4939-7027-8_11

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7025-4

  • Online ISBN: 978-1-4939-7027-8

  • eBook Packages: Springer Protocols
