Step-by-Step Construction of Gene Co-expression Networks from High-Throughput Arabidopsis RNA Sequencing Data

Part of the Methods in Molecular Biology book series (MIMB, volume 1761)

Abstract

The rapid increase in the availability of transcriptomics data generated by RNA sequencing represents both a challenge and an opportunity for biologists without bioinformatics training. The challenge is handling, integrating, and interpreting these data sets. The opportunity is to use this information to generate testable hypotheses about the molecular mechanisms controlling gene expression and biological processes (Fig. 1). A successful strategy to generate tractable hypotheses from transcriptomics data has been to build undirected network graphs based on patterns of gene co-expression. Many examples of new hypotheses derived from network analyses can be found in the literature, spanning different organisms including plants and specific fields such as root developmental biology.

To make the process of constructing a gene co-expression network more accessible to biologists, we provide step-by-step instructions using published RNA-seq experimental data obtained from a public database. Similar strategies have been used in previous studies to advance root developmental biology. This guide includes basic instructions for operating widely used open-source platforms such as Bio-Linux, R, and Cytoscape. Although the data used in this example were obtained from Arabidopsis thaliana, the workflow developed in this guide can be easily adapted to RNA-seq data from any organism.
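The core idea of an undirected co-expression graph can be illustrated with a minimal sketch. It is not the chapter's own procedure, which relies on R and Cytoscape. It assumes that expression values are already normalized, that Pearson correlation is the similarity measure, and that a gene pair becomes an edge when the absolute correlation reaches a cutoff. The gene names, expression values, the 0.9 cutoff, and the function names `pearson` and `coexpression_edges` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A gene with constant expression carries no co-expression signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold.

    `expression` maps a gene name to its expression values, one per sample.
    The threshold is an assumed cutoff, not a value taken from the chapter.
    """
    edges = []
    # Each unordered gene pair is visited once, so the graph is undirected.
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, round(r, 3)))
    return edges


# Hypothetical normalized expression values across four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}

# Print a tab-separated edge list: source gene, target gene, correlation.
for gene_a, gene_b, r in coexpression_edges(toy_expression):
    print(f"{gene_a}\t{gene_b}\t{r}")
```

A tab-separated edge list of this kind is a common input format for network viewers. In a real analysis the cutoff would normally be combined with a significance test on each correlation, which this sketch omits.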

Key words

  • RNA-seq
  • Gene co-expression network
  • Differential gene expression
  • DESeq2
  • Cytoscape
  • Bioinformatics
  • Network generation
  • Correlation
  • Bio-Linux
  • HISAT2
  • FastQC
  • Trimmomatic

Acknowledgments

Research in our group is funded by Fondo de Desarrollo de Areas Prioritarias (FONDAP) Center for Genome Regulation (15090007), MIISSB Iniciativa Científica Milenio-MINECON, Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT) 1141097, and EvoNet (DE-SC0014377).

Author information

Corresponding author

Correspondence to Rodrigo A. Gutiérrez.

Copyright information

© 2018 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Contreras-López, O., Moyano, T.C., Soto, D.C., Gutiérrez, R.A. (2018). Step-by-Step Construction of Gene Co-expression Networks from High-Throughput Arabidopsis RNA Sequencing Data. In: Ristova, D., Barbez, E. (eds) Root Development. Methods in Molecular Biology, vol 1761. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7747-5_21

  • DOI: https://doi.org/10.1007/978-1-4939-7747-5_21

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7746-8

  • Online ISBN: 978-1-4939-7747-5

  • eBook Packages: Springer Protocols