Gene Set Enrichment Analysis Using Non-parametric Scores

  • Ariel E. Bayá
  • Mónica G. Larese
  • Pablo M. Granitto
  • Juan Carlos Gómez
  • Elizabeth Tapia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4643)


Gene Set Enrichment Analysis (GSEA) is a well-known technique for studying groups of functionally related genes and their correlation with phenotype. The method builds a ranked list of genes, which is then used to calculate an enrichment score for each gene set. In this work, we introduce two alternative metrics for gene ranking in GSEA, based on the Wilcoxon and the Baumgartner-Weiß-Schindler tests. Their advantage is that they do not assume any particular distribution of the data. We compared them with the signal-to-noise ratio metric originally proposed by the developers of GSEA, using a type 2 diabetes mellitus (DM2) dataset. Statistical significance was evaluated by means of false discovery rate and p-value calculations. Results show that the Baumgartner-Weiß-Schindler test detects more statistically significant pathways. According to the literature, one of them could be related to DM2, but further research is needed.
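As an illustration of the three ranking metrics named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes two lists of expression values per gene (one per phenotype class), uses the normal approximation of the Wilcoxon rank-sum statistic without tie correction, and computes the unsigned Baumgartner-Weiß-Schindler statistic B as defined by Baumgartner et al. (1998). How the paper assigns a direction to B for ranking is not stated in the abstract, so that step is left out. The enrichment score is the unweighted running-sum variant. All function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def signal_to_noise(a, b):
    """Signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def midranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_z(a, b):
    """Standardized Wilcoxon rank-sum statistic (assumption: normal
    approximation, no tie correction). Positive when `a` tends to be larger."""
    n, m = len(a), len(b)
    w = sum(midranks(list(a) + list(b))[:n])
    mu = n * (n + m + 1) / 2
    sigma = sqrt(n * m * (n + m + 1) / 12)
    return (w - mu) / sigma


def bws_statistic(a, b):
    """Unsigned Baumgartner-Weiss-Schindler statistic B = (B_X + B_Y) / 2."""
    n, m = len(a), len(b)
    ranks = midranks(list(a) + list(b))

    def half(sample_ranks, k, other):
        total = 0.0
        for i, r in enumerate(sorted(sample_ranks), start=1):
            q = i / (k + 1)
            expected = (n + m) / k * i
            total += (r - expected) ** 2 / (q * (1 - q) * other * (n + m) / k)
        return total / k

    return (half(ranks[:n], n, m) + half(ranks[n:], m, n)) / 2


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score: step up at genes in the set,
    step down otherwise, and return the largest deviation from zero."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1 / n_hit if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy data: gene -> (values in class A, values in class B).
expression = {
    "g1": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "g2": ([4.0, 5.5, 6.5], [2.0, 3.5, 4.5]),
    "g3": ([2.0, 3.0, 4.0], [2.5, 3.5, 4.5]),
    "g4": ([1.0, 2.0, 3.0], [5.0, 6.0, 7.0]),
}
ranked = sorted(expression, key=lambda g: wilcoxon_z(*expression[g]), reverse=True)
score = enrichment_score(ranked, {"g1", "g2"})
```

Swapping `wilcoxon_z` for `signal_to_noise` in the `sorted` call changes only the ranking metric, which is the comparison the abstract describes; the enrichment step stays the same.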


GSEA; gene ranking; non-parametric statistical tests; statistical significance; DNA microarrays





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ariel E. Bayá (1)
  • Mónica G. Larese (1)
  • Pablo M. Granitto (1)
  • Juan Carlos Gómez (1, 2)
  • Elizabeth Tapia (1, 3)
  1. Intelligent Systems Group, Instituto de Física Rosario, CONICET, Bv. 27 de Febrero 210 Bis, 2000 Rosario, Argentina
  2. Laboratory for System Dynamics and Signal Processing, FCEIA, UNR, Riobamba 245 Bis, 2000 Rosario, Argentina
  3. Communications Department, FCEIA, UNR, Riobamba 245 Bis, 2000 Rosario, Argentina
