Abstract
Analysis of gene co-expression networks is a powerful “data-driven” tool, invaluable for understanding cancer biology and the mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise “meta-analysis” framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines the generation of “data-driven” co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson’s correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, implemented as the cisExpress software module, was specifically designed for the analysis of very large data sets such as those generated by publicly accessible whole-genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node.
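As a rough illustration of the first step, the sketch below computes pairwise Pearson correlations between gene expression profiles and raises their absolute values to a power, which is the unsigned soft-thresholding adjacency used in WGCNA-style networks. It is not the chapter's pipeline: the gene names, the toy expression values, and the choice of beta = 6 are assumptions made only for this example, and a real analysis would use the WGCNA R package on normalized datasets.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair.

    expr maps a gene name to its expression values across samples.
    beta is the soft-thresholding power (6 is an assumed value here).
    """
    genes = sorted(expr)
    adjacency = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            adjacency[(g1, g2)] = abs(pearson(expr[g1], expr[g2])) ** beta
    return adjacency


# Hypothetical toy data: four genes measured in four samples.
toy_expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 4],
}
toy_adj = soft_threshold_adjacency(toy_expr)
```

Raising the correlation to a power keeps strong correlations near 1 while pushing weak ones toward 0, so the network stays weighted instead of being cut at a hard threshold. Co-expression clusters are then typically obtained by hierarchical clustering of a dissimilarity derived from this adjacency.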
We discovered that although co-expression modules are populated with different sets of genes, they share distinct, stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely with the cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top-scoring motifs are typically shared between several tissues; they define sets of target genes responsible for specific functions in carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available for further studies.
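The general idea of asking whether a motif is over-represented in the promoters of one co-expression cluster can be sketched with a simple hypergeometric test. This is a generic enrichment check and not the cisExpress algorithm; the function names, the toy promoter sequences, and the exact-substring matching are all assumptions made for illustration.

```python
from math import comb


def count_promoters_with_motif(promoters, motif):
    """Number of promoter sequences containing the motif as an exact substring."""
    motif = motif.upper()
    return sum(1 for seq in promoters.values() if motif in seq.upper())


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n promoters from N, of which K carry the motif."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def motif_enrichment(promoters, cluster_genes, motif):
    """Return (k, n, K, N, p_value) for a motif in a cluster vs. all promoters."""
    cluster = {g: promoters[g] for g in cluster_genes if g in promoters}
    N = len(promoters)                                  # background size
    K = count_promoters_with_motif(promoters, motif)    # background hits
    n = len(cluster)                                    # cluster size
    k = count_promoters_with_motif(cluster, motif)      # cluster hits
    return k, n, K, N, hypergeom_upper_tail(k, K, n, N)


# Hypothetical toy promoters; real promoters would be taken upstream of
# annotated transcription start sites.
toy_promoters = {
    "g1": "AAGGAAGTT",
    "g2": "CCGGAAGCC",
    "g3": "TTTTTTTT",
    "g4": "ACGTACGT",
    "g5": "GGAAGAAA",
    "g6": "CCCCCCCC",
}
result = motif_enrichment(toy_promoters, ["g1", "g2", "g5"], "GGAAG")
```

In a genome-wide setting the same test would be repeated for many candidate motifs and clusters, so the resulting p-values would need correction for multiple testing.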
Copyright information
© 2017 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Triska, M., Ivliev, A., Nikolsky, Y., Tatarinova, T.V. (2017). Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer. In: Tatarinova, T., Nikolsky, Y. (eds) Biological Networks and Pathway Analysis. Methods in Molecular Biology, vol 1613. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7027-8_11
DOI: https://doi.org/10.1007/978-1-4939-7027-8_11
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7025-4
Online ISBN: 978-1-4939-7027-8
eBook Packages: Springer Protocols