Gene Set Enrichment Analysis

  • Protocol
Protein Networks and Pathway Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 563)

Abstract

Set enrichment analysis methods have become commonplace tools for analyzing and interpreting biological data. These statistical techniques identify categorical biases within lists of genes, proteins, or metabolites, with the goal of discovering the functions or properties shared by the items in a list. They can yield considerable biological insight: for example, revealing that the listed items participate in the same biological activity or pathway, share interacting genes or regulators, localize to the same cellular compartment, or are associated with a disease. The methods require an ordered or unordered list of biological items as input, a clear understanding of the reference set from which the list was selected, categorical classifiers describing the items, and a statistical algorithm to assess the bias of each classifier. Because most algorithms are complex and involve a large number of calculations, software is almost always used both to execute the algorithm and to present the results.
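
For an unordered list, the bias of a single classifier is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2×2 table of (in list / not in list) × (annotated / not annotated). The following is a minimal, standard-library-only sketch, assuming that the selected list is a subset of a well-defined reference set and that each classifier is represented as a plain set of item identifiers. The function names are hypothetical and are not taken from any particular enrichment tool.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k), X ~ Hypergeometric(N, K, n).

    N: size of the reference set (every item that could have been selected)
    K: number of reference items annotated with the classifier
    n: size of the selected list
    k: number of selected items annotated with the classifier
    """
    if not (0 <= k <= n <= N and 0 <= K <= N and k <= K):
        raise ValueError("inconsistent counts")
    total = comb(N, n)  # number of ways to draw n items from the reference set
    # Sum the probabilities of seeing k or more annotated items in the list.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def enrichment_from_sets(selected: set, reference: set, annotated: set) -> float:
    """Hypothetical helper: derive the four counts from sets, then test."""
    if not selected <= reference:
        raise ValueError("the selected list must be drawn from the reference set")
    # Annotations outside the reference set cannot have been selected, so drop them.
    annotated_ref = annotated & reference
    k = len(selected & annotated_ref)
    return hypergeom_enrichment_p(k, len(selected), len(annotated_ref), len(reference))


if __name__ == "__main__":
    # Toy data: 20 reference genes, 5 annotated, 4 selected of which 3 are annotated.
    reference = {f"g{i}" for i in range(20)}
    annotated = {"g0", "g1", "g2", "g3", "g4", "not_in_reference"}
    selected = {"g0", "g1", "g2", "g10"}
    print(enrichment_from_sets(selected, reference, annotated))  # about 0.032
```

The helper discards annotated items that fall outside the reference set before counting. Including them, or testing against a reference set larger than the one the list was actually drawn from, changes N and K and therefore the p-value, which is why careful definition of the reference set matters.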

This chapter provides an overview of the statistical methods used to perform an enrichment analysis. Guidelines for assembling the requisite information are presented, with a focus on careful definition of the sets used by the statistical algorithms. We emphasize the need for multiple-test correction when working with large libraries of classifiers and outline several options for performing the corrections. Finally, we discuss how to interpret the results of such analyses, along with examples of recent research that uses these techniques.
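
Testing thousands of classifiers (for example, every Gene Ontology term) produces many small p-values by chance alone, so the raw values must be adjusted. Three widely used options are the Bonferroni and Holm procedures, which control the family-wise error rate, and the Benjamini–Hochberg procedure, which controls the false discovery rate. The sketch below computes adjusted p-values with the standard library only. The function names are illustrative choices, and this abstract does not state which corrections the chapter itself recommends.

```python
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals: List[float]) -> List[float]:
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)  # adjusted values never decrease
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):  # position 0 is the largest p-value
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # toy p-values for four classifiers
    print(bonferroni(raw))          # approximately [0.04, 0.16, 0.12, 0.02]
    print(holm(raw))                # approximately [0.03, 0.06, 0.06, 0.02]
    print(benjamini_hochberg(raw))  # approximately [0.02, 0.04, 0.04, 0.02]
```

With large classifier libraries, Bonferroni and Holm become very conservative, while Benjamini–Hochberg accepts a controlled proportion of false positives in exchange for more power. Tests on nested or overlapping classifiers, such as parent and child GO terms, are correlated, so the independence assumptions behind some procedures may hold only approximately.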

Acknowledgments

We would like to thank Roumyana Yordanova for her thoughts and advice on statistical matters.

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Tilford, C.A., Siemers, N.O. (2009). Gene Set Enrichment Analysis. In: Nikolsky, Y., Bryant, J. (eds) Protein Networks and Pathway Analysis. Methods in Molecular Biology, vol 563. Humana Press. https://doi.org/10.1007/978-1-60761-175-2_6

  • DOI: https://doi.org/10.1007/978-1-60761-175-2_6

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60761-174-5

  • Online ISBN: 978-1-60761-175-2

  • eBook Packages: Springer Protocols