A Flexible Protocol for Targeted Gene Co-expression Network Analysis

Part of the Methods in Molecular Biology book series (MIMB, volume 1153)


Inferred gene co-expression networks are a valuable source of novel hypotheses for experimental research. Routine high-throughput microarray transcript profiling experiments and the rapid development of next-generation sequencing (NGS) technologies generate large amounts of publicly available data, enabling in silico reconstruction of regulatory networks. Analyses of the transcriptome under various experimental conditions have shown that genes with similar overall expression patterns often have similar functions. Consistent with this, genes involved in the same metabolic pathway are found in co-expressed modules. In this chapter, we describe a detailed workflow for analyzing gene co-expression networks using large-scale gene expression data and explain the critical steps, from design and data analysis to the prediction of functionally related modules. The protocol is platform independent and can be used for data generated by ATH1 arrays, tiling arrays, or RNA sequencing for any organism. Its most important feature is that it can infer statistically significant gene co-expression networks for any number of genes and transcriptome data sets, and it has no particular hardware requirements.

Key words

Co-expression; Isoprenoids; Circadian clock; Gene module; Network; Transcriptome



Acknowledgments

We thank Dr. Eva Vranová and Prof. Peter Bühlmann for helpful discussions and Philipp Ihmor for critically reading the manuscript. This work was supported by the Seventh Framework Program of the European Commission through the TiMet collaborative project (grant 245143) to W.G.


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Diana Coman (1)
  • Philipp Rütimann (2)
  • Wilhelm Gruissem (1, 3)

  1. Department of Biology, Plant Biotechnology, ETH Zurich, Zurich, Switzerland
  2. Seminar for Statistics, ETH Zurich, Zurich, Switzerland
  3. Functional Genomics Center Zurich, ETH Zurich, Zurich, Switzerland
