Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer

Triska, Martin; Ivliev, Alexander; Nikolsky, Yuri; Tatarinova, Tatiana V.

doi:10.1007/978-1-4939-7027-8_11

Part of the book series: Methods in Molecular Biology (MIMB, volume 1613)

Abstract

Analysis of gene co-expression networks is a powerful “data-driven” tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise “meta-analysis” framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of “data-driven” co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson’s correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, implemented as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole-genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node.

We discovered that although co-expression modules are populated with different sets of genes, they share distinct, stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely with the cancer’s tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top-scoring motifs are typically shared between several tissues; they define sets of target genes responsible for specific functions in carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available for further studies.
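The co-expression step named in the abstract (Pearson’s correlation followed by weighted network construction) can be illustrated with a minimal sketch. This is not the authors' pipeline: WGCNA itself is an R package that uses topological overlap and dynamic tree cutting, whereas the sketch below simply raises the absolute Pearson correlation to a power and groups genes by connected components. The function names, the gene names, the toy expression values, and the `beta` and `cutoff` settings are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_modules(expr, beta=6, cutoff=0.5):
    """Group genes whose soft-thresholded correlation reaches `cutoff`.

    `expr` maps a gene name to its expression values across samples.
    Adjacency is |r| ** beta, as in weighted co-expression networks.
    Modules are connected components (union-find), which is a
    simplification and NOT the clustering that WGCNA performs.
    """
    parent = {gene: gene for gene in expr}

    def find(gene):
        # Follow parent links to the component root, compressing the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for g1, g2 in combinations(expr, 2):
        adjacency = abs(pearson(expr[g1], expr[g2])) ** beta
        if adjacency >= cutoff:
            parent[find(g1)] = find(g2)

    groups = {}
    for gene in expr:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


# Toy data: hypothetical genes measured in five samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [4.9, 1.2, 4.1, 1.8, 3.1],
}
modules = coexpression_modules(toy)
print(modules)
```

On the toy data, geneA and geneB rise together and geneC and geneD share a zigzag profile, so the sketch should place them in two separate modules. Raising the correlation to a power suppresses weak correlations more strongly than a plain threshold on r would, which is the motivation usually given for weighted networks.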

Author information

Authors and Affiliations

Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
Martin Triska & Tatiana V. Tatarinova
Thomson Reuters, Boston, MA, USA
Alexander Ivliev
Prosapia Genetics, Solana Beach, CA, USA
Yuri Nikolsky
School of Systems Biology, George Mason University, Fairfax, VA, USA
Yuri Nikolsky
Center for Personalized Medicine, Children’s Hospital Los Angeles, 4640 Hollywood Blvd, Los Angeles, CA, 90027, USA
Tatiana V. Tatarinova
A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
Tatiana V. Tatarinova

Corresponding author

Correspondence to Tatiana V. Tatarinova.

Editor information

Editors and Affiliations

Keck School of Medicine, University of Southern California, Los Angeles, California, USA
Tatiana V. Tatarinova
Prosapia Genetics, Solana Beach, California, USA
Yuri Nikolsky

Cite this protocol

Triska, M., Ivliev, A., Nikolsky, Y., Tatarinova, T.V. (2017). Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer. In: Tatarinova, T., Nikolsky, Y. (eds) Biological Networks and Pathway Analysis. Methods in Molecular Biology, vol 1613. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7027-8_11

DOI: https://doi.org/10.1007/978-1-4939-7027-8_11
Published: 29 August 2017
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7025-4
Online ISBN: 978-1-4939-7027-8
eBook Packages: Springer Protocols
