
Extracting the Strongest Signals from Omics Data: Differentially Expressed Pathways and Beyond

Biological Networks and Pathway Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 1613)

Abstract

The analysis of gene sets (in the form of functionally related genes or pathways) has become the method of choice for extracting the strongest signals from omics data. The motivation for using gene sets instead of individual genes is twofold. First, this approach incorporates pre-existing biological knowledge into the analysis and facilitates the interpretation of experimental results. Second, it employs a statistical hypothesis-testing framework. Here, we briefly review the main Gene Set Analysis (GSA) approaches for testing differential expression of gene sets, together with several GSA approaches that test statistical hypotheses beyond differential expression and so extract additional biological information from the data. We distinguish three major types of GSA approaches, which test for (1) differential expression (DE), (2) differential variability (DV), and (3) differential co-expression (DC) of gene sets between two phenotypes. We also present a comparative analysis of power and Type I error rates for different approaches within each major type of GSA on simulated data. Our evaluation provides a concise guideline for selecting the GSA approaches that perform best under particular experimental settings. The value of the three major types of GSA approaches is illustrated with a real data example. When applied to the same data set, the three major types of GSA approaches yield complementary biological information.
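To make the three hypothesis types concrete, the following is a minimal Python sketch, assuming NumPy is available. It is not any of the specific tests evaluated in the chapter: the statistics `de_stat`, `dv_stat`, and `dc_stat`, the helper `permutation_pvalue`, and the simulated data are hypothetical stand-ins chosen only to show how a single gene set can be tested for differences in means (DE), variances (DV), and correlation structure (DC) between two phenotypes using sample permutations.

```python
import numpy as np


def permutation_pvalue(stat_fn, x, y, n_perm=500, seed=0):
    """Permutation p-value for a two-sample gene-set statistic.

    x, y: arrays of shape (samples, genes), one per phenotype.
    stat_fn: maps (x, y) to a non-negative scalar; larger = more different.
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(x, y)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Shuffle phenotype labels by permuting the pooled samples.
        idx = rng.permutation(pooled.shape[0])
        if stat_fn(pooled[idx[:n_x]], pooled[idx[n_x:]]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def de_stat(x, y):
    # DE (illustrative): squared distance between the two mean vectors.
    return float(np.sum((x.mean(axis=0) - y.mean(axis=0)) ** 2))


def dv_stat(x, y):
    # DV (illustrative): summed absolute log-ratio of per-gene variances.
    ratio = x.var(axis=0, ddof=1) / y.var(axis=0, ddof=1)
    return float(np.sum(np.abs(np.log(ratio))))


def dc_stat(x, y):
    # DC (illustrative): squared distance between gene-gene correlation matrices.
    diff = np.corrcoef(x, rowvar=False) - np.corrcoef(y, rowvar=False)
    return float(np.sum(diff ** 2))


# Simulated gene set of 10 genes, 40 samples per phenotype.
# The second phenotype differs from the first only by a mean shift.
rng = np.random.default_rng(1)
control = rng.normal(size=(40, 10))
shifted = rng.normal(loc=1.0, size=(40, 10))

pvals = {
    name: permutation_pvalue(fn, control, shifted)
    for name, fn in [("DE", de_stat), ("DV", dv_stat), ("DC", dc_stat)]
}
for name, p in pvals.items():
    print(name, round(p, 4))
```

Because the simulated phenotypes differ only in their means, the DE statistic should give a small p-value, while the DV and DC statistics have no true signal to detect. This mirrors the abstract's point that each type of test responds to a different kind of change in the same data.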



Acknowledgments

We would like to thank Bárbara Macías Solís for proofreading the manuscript. Support has been provided in part by the Arkansas INBRE program, with grants from the National Center for Research Resources (P20RR016460) and the National Institute of General Medical Sciences (P20 GM103429) from the National Institutes of Health. Large-scale computer simulations were run on the High Performance Computing (HPC) resources at the UALR Computational Research Center, supported by National Science Foundation grants CRI CNS-0855248, EPS-0701890, MRI CNS-0619069, and OISE-0729792.


Corresponding author

Correspondence to Galina Glazko.


Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Glazko, G., Rahmatallah, Y., Zybailov, B., Emmert-Streib, F. (2017). Extracting the Strongest Signals from Omics Data: Differentially Expressed Pathways and Beyond. In: Tatarinova, T., Nikolsky, Y. (eds) Biological Networks and Pathway Analysis. Methods in Molecular Biology, vol 1613. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7027-8_7


  • DOI: https://doi.org/10.1007/978-1-4939-7027-8_7

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7025-4

  • Online ISBN: 978-1-4939-7027-8

  • eBook Packages: Springer Protocols
