Pathway and Network Analysis of Differentially Expressed Genes in Transcriptomes

  • Qianli Huang
  • Ming-an Sun
  • Ping Yan
Part of the Methods in Molecular Biology book series (MIMB, volume 1751)


In recent years, transcriptome sequencing has become widely used, with applications ranging from simple mRNA profiling to discovery and analysis of the entire transcriptome. One of its most common aims is to identify genes that are differentially expressed (DE) between two or more biological conditions, and to infer the associated pathways and gene networks from expression profiles. Such analyses provide avenues for further systematic investigation of potential biological mechanisms. Gene set (GS) enrichment analysis is a popular approach for identifying pathways or sets of genes that are significantly enriched among differentially expressed genes. However, this approach treats a pathway as a simple collection of genes and disregards knowledge of gene or protein interactions. In contrast, topology-based methods integrate the topological structure of a pathway and gene network into the analysis. To provide a panoramic view of such approaches, this chapter demonstrates several recent computational workflows, including gene set enrichment and topology-based methods, for analyzing DE pathways and gene networks from transcriptome-wide sequencing data.
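
As a minimal illustration of the gene set enrichment idea summarized above, the following Python sketch tests whether a gene set is over-represented among DE genes using a one-sided hypergeometric test. It is not taken from the chapter's workflows; the function names (`hypergeom_enrichment_p`, `enrich`) and the toy gene identifiers are hypothetical, and the sketch ignores multiple-testing correction and pathway topology.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_de, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X counts how many of the n_de DE genes fall in a gene set of size
    n_set when genes are drawn without replacement from n_universe genes.
    """
    total = comb(n_universe, n_de)
    upper = min(n_set, n_de)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrich(de_genes, gene_sets, universe):
    """Return {set name: (overlap size, p-value)} for each gene set.

    Genes outside the universe (background) are dropped before testing.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    results = {}
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = de & members
        p = hypergeom_enrichment_p(
            len(universe), len(members), len(de), len(overlap)
        )
        results[name] = (len(overlap), p)
    return results


# Toy data (hypothetical identifiers, for illustration only).
universe = [f"g{i}" for i in range(10)]
gene_sets = {
    "A": ["g0", "g1", "g2", "g3"],  # contains all three DE genes
    "B": ["g6", "g7", "g8", "g9"],  # contains none of them
}
de_genes = ["g0", "g1", "g2"]

results = enrich(de_genes, gene_sets, universe)
for name, (k, p) in results.items():
    print(f"{name}: overlap={k}, p={p:.4f}")
```

Because the test only counts set membership, it treats each pathway as an unordered gene list, which is the limitation that topology-based methods aim to address.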

Key words

Transcriptome; RNA-Seq; Microarray; Pathway; Network; Topology; Enrichment analysis



This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. JZ2017YYPY0899). The authors are grateful to the editors and the anonymous reviewers for their valuable suggestions and comments facilitating the improvement of this chapter.


Copyright information

© Springer Science+Business Media, LLC 2018

Authors and Affiliations

  1. School of Biological and Medical Engineering, Hefei University of Technology, Hefei, China
  2. Epigenomics and Computational Biology Lab, Biocomplexity Institute of Virginia Tech, Blacksburg, USA
