Extracting the Strongest Signals from Omics Data: Differentially Expressed Pathways and Beyond
The analysis of gene sets (in the form of functionally related genes or pathways) has become the method of choice for extracting the strongest signals from omics data. The motivation for using gene sets instead of individual genes is two-fold. First, this approach incorporates pre-existing biological knowledge into the analysis and facilitates the interpretation of experimental results. Second, it employs a statistical hypothesis testing framework. Here, we briefly review the main Gene Set Analysis (GSA) approaches for testing differential expression of gene sets, as well as several GSA approaches that test statistical hypotheses beyond differential expression and thereby extract additional biological information from the data. We distinguish three major types of GSA approaches, which test for (1) differential expression (DE), (2) differential variability (DV), and (3) differential co-expression (DC) of gene sets between two phenotypes. We also present a comparative analysis of power and Type I error rates on simulated data for different approaches within each major type of GSA. Our evaluation provides a concise guideline for selecting the GSA approaches that perform best under particular experimental settings. The value of the three major types of GSA approaches is illustrated with a real data example. When applied to the same data set, the three major types of GSA approaches yield complementary biological information.
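To make the three hypothesis types concrete, the following is a minimal sketch of one simple statistic for each (DE, DV, DC), with p-values obtained by permuting sample labels. The statistics and the function names `gene_set_stats` and `permutation_pvalues` are illustrative assumptions chosen for brevity; they are not the specific GSA methods reviewed or evaluated in the chapter.

```python
import numpy as np


def gene_set_stats(x, y):
    """Toy DE, DV and DC statistics for one gene set.

    x, y: arrays of shape (samples, genes), one per phenotype.
    These statistics are illustrative assumptions, not the chapter's methods.
    """
    # DE: squared Euclidean distance between the two mean vectors
    de = np.sum((x.mean(axis=0) - y.mean(axis=0)) ** 2)
    # DV: sum of absolute log ratios of per-gene sample variances
    dv = np.sum(np.abs(np.log(x.var(axis=0, ddof=1) / y.var(axis=0, ddof=1))))
    # DC: Frobenius norm of the difference between gene-gene correlation matrices
    dc = np.linalg.norm(np.corrcoef(x, rowvar=False) - np.corrcoef(y, rowvar=False))
    return np.array([de, dv, dc])


def permutation_pvalues(x, y, n_perm=200, seed=0):
    """Permutation p-values for the (DE, DV, DC) statistics, in that order."""
    rng = np.random.default_rng(seed)
    observed = gene_set_stats(x, y)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    exceed = np.zeros(3)
    for _ in range(n_perm):
        # Shuffle the phenotype labels by permuting the rows of the pooled data
        idx = rng.permutation(pooled.shape[0])
        permuted = gene_set_stats(pooled[idx[:n_x]], pooled[idx[n_x:]])
        exceed += permuted >= observed
    # Add-one correction keeps every p-value strictly positive
    return (exceed + 1) / (n_perm + 1)


# Simulated example: 10 genes, 30 samples per phenotype, mean shift only
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(30, 10))
y = rng.normal(1.5, 1.0, size=(30, 10))
p_de, p_dv, p_dc = permutation_pvalues(x, y)
```

Because all three tests here are self-contained (they permute sample labels rather than genes), the null hypothesis is that the two phenotypes are exchangeable. One consequence is that a pure mean shift can also influence the DV and DC p-values in this sketch; centering each phenotype on its own gene means before permuting is one way to reduce that effect.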
Key words: Omics data, Gene set analysis approaches, Hypotheses testing, Self-contained, Competitive, Differential expression, Differential co-expression, Differential variability
We would like to thank Bárbara Macías Solís for proofreading the manuscript. Support has been provided in part by the Arkansas INBRE program, with grants from the National Center for Research Resources (P20RR016460) and the National Institute of General Medical Sciences (P20 GM103429) from the National Institutes of Health. Large-scale computer simulations were run on the High Performance Computing (HPC) resources at the UALR Computational Research Center, supported by the following National Science Foundation grants: CRI CNS-0855248, EPS-0701890, MRI CNS-0619069, and OISE-0729792.