A Graphical Interactive Tool For Comparative Analysis of Large Gene Sets in Gene Ontology™ Space


Abstract: The analysis of complex patterns of gene regulation is central to understanding the biology of cells, tissues and organisms. Patterns of gene regulation pertaining to specific biological processes can be revealed by a variety of experimental strategies, particularly microarrays and other highly parallel methods, which generate large datasets linking many genes. Although methods for detecting gene expression have improved substantially in recent years, understanding the physiological implications of complex patterns in gene expression data remains a major challenge. This article presents GoSurfer, an easy-to-use graphical exploration tool with built-in statistical features that allow a rapid assessment of the biological functions represented in large gene sets. GoSurfer takes one or two list(s) of gene identifiers (Affymetrix® probe set IDs) as input and retrieves all the Gene Ontology™ (GO) terms associated with the input genes. GoSurfer visualises these GO terms in a hierarchical tree format. With GoSurfer, users can perform statistical tests to identify the GO terms that are enriched in the annotations of the input genes, and these terms can be highlighted on the GO tree. Users can manipulate the GO tree in various ways and interactively query the genes associated with any GO term. The user-generated graphics can be saved as graphics files, and all the GO information related to the input genes can be exported as text files.
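The abstract does not spell out how GoSurfer collects the GO terms for an input list. The sketch below is a minimal illustration only, assuming that each gene's direct annotations are propagated to every ancestor term in the GO directed acyclic graph (the usual "true path" convention), so that a parent node lists all genes found under its children. The parent map, term names and probe set IDs are hypothetical toy data, not GoSurfer's input format or internal representation.

```python
from collections import deque


def ancestors_inclusive(term, parents):
    """Return `term` plus every ancestor reachable through the parent map (a DAG)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:  # a term can be reached by several paths
                seen.add(parent)
                queue.append(parent)
    return seen


def go_terms_for_genes(gene_ids, annotations, parents):
    """Map each GO term to the input genes annotated to it or to any descendant term."""
    term_to_genes = {}
    for gene in gene_ids:
        for term in annotations.get(gene, ()):
            for t in ancestors_inclusive(term, parents):
                term_to_genes.setdefault(t, set()).add(gene)
    return term_to_genes


# Hypothetical toy DAG: "root" has children "A" and "B"; "C" has two parents.
parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
# Hypothetical probe set annotations; probe_3 has no GO annotation.
annotations = {"probe_1": ["C"], "probe_2": ["A"], "probe_3": []}

result = go_terms_for_genes(["probe_1", "probe_2", "probe_3"], annotations, parents)
for term in sorted(result):
    print(term, sorted(result[term]))
```

Because "C" has two parents, probe_1 appears under both "A" and "B", which is why a GO "tree" display of a DAG may show the same gene under more than one branch.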

Availability: GoSurfer is a Windows®-based program freely available for noncommercial use and can be downloaded at Datasets used to construct the trees shown in the figures in this article are available at

Fig. 1
Fig. 2


  1. Raw image files of 102 oligonucleotide arrays[3] (U95Av2, Affymetrix) were obtained from the Whitehead Institute Center for Genomic Research ( ). Expression value computation and normalisation were carried out using dChip ( ) with all parameters set to default, except that negative values were truncated to zero. p-Values for differential expression were calculated for all probe sets using a two-sided Welch-modified two-sample t-test, assuming unequal variances, with the confidence interval set to 95%.
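The truncation and test described in this note can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, dChip's model-based expression computation and normalisation are not reproduced, and the two-sided p-value is taken from the t distribution via a continued-fraction evaluation of the regularised incomplete beta function (the Numerical Recipes approach). The 95% confidence-interval setting affects only the reported interval, not the p-value.

```python
import math
import statistics


def clip_negatives(values):
    """Truncate negative expression values to zero."""
    return [max(v, 0.0) for v in values]


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # symmetry for faster convergence


def welch_t_test(x, y):
    """Two-sided Welch t-test: returns (t, Welch-Satterthwaite df, p-value)."""
    n1, n2 = len(x), len(y)
    se1 = statistics.variance(x) / n1  # sample variance uses n - 1
    se2 = statistics.variance(y) / n2
    if se1 + se2 == 0:
        raise ValueError("both groups have zero variance")
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    # Two-sided tail probability of Student's t: I_{df/(df+t^2)}(df/2, 1/2).
    p = betainc_reg(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


# Hypothetical expression values for one probe set (not data from the study).
tumour = clip_negatives([812.0, 905.5, -12.0, 760.3, 880.1])
normal = clip_negatives([402.2, 455.0, 390.7, 510.9, 438.4])
t, df, p = welch_t_test(tumour, normal)
print(f"t = {t:.3f}, df = {df:.2f}, p = {p:.3g}")
```

In practice a statistics package would be used for this step; the stdlib version is shown only to make each quantity in the test explicit.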


  1. Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000; 25: 25–9

  2. Storch KF, Lipan O, Leykin I, et al. Extensive and divergent circadian gene expression in liver and heart. Nature 2002; 417: 78–83

  3. Singh D, Febbo PG, Ross K, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002; 1: 203–9

  4. Dhanasekaran SM, Barrette TR, Ghosh D, et al. Delineation of prognostic biomarkers in prostate cancer. Nature 2001; 412: 822–6

  5. Varambally S, Dhanasekaran SM, Zhou M, et al. The polycomb group protein EZH2 is involved in progression of prostate cancer. Nature 2002; 419: 624–9

  6. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000; 100: 57–70

  7. Zhong S, Li C, Wong WH. ChipInfo: software for extracting gene annotation and gene ontology information for microarray analysis. Nucleic Acids Res 2003; 31: 3483–6



We thank Dr Cheng Li for valuable discussions and suggestions. We thank the Editor, Professor Allen Rodrigo, for valuable suggestions on the manuscript revision. This work was supported by NIH grants CA95616 and HG02341. SZ created the algorithm and the software. WHW supervised the software development. KFS, OL and MCJK performed data analysis using the software. CJW and WHW supervised the data analysis. The authors have no conflicts of interest that are directly relevant to the content of this article.

Author information

Correspondence to Dr Wing H. Wong.

Appendix: Statistical Methods

To assess the significance of the association between group membership and each GO term, we used Pearson's chi-squared test. A 2×2 table (table AI) was constructed for every GO term (node). In figure 2, Group 1 consists of the 1261 genes upregulated in prostate tumours and Group 2 of the 1808 genes downregulated in prostate tumours.

Table AI

2×2 table for gene list—Gene Ontology™ (GO) term association test
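The cell contents of table AI are not reproduced here. Assuming that the rows correspond to Group 1 and Group 2 and the columns to genes that are or are not annotated to the GO term, one table per node could be built as in the minimal sketch below. The function name, the toy gene IDs and the term names are hypothetical.

```python
def go_term_table(group1, group2, term_genes):
    """Build a 2x2 count table for one GO term (assumed layout).

    Rows: Group 1, Group 2. Columns: annotated to the term, not annotated.
    """
    g1, g2, annotated = set(group1), set(group2), set(term_genes)
    in1 = len(g1 & annotated)
    in2 = len(g2 & annotated)
    return [[in1, len(g1) - in1],
            [in2, len(g2) - in2]]


# Hypothetical toy data: Group 1 = upregulated, Group 2 = downregulated.
up = ["g1", "g2", "g3", "g4"]
down = ["g5", "g6", "g7"]
table = go_term_table(up, down, {"g1", "g2", "g5", "g9"})  # g9 is in neither group
print(table)  # [[2, 2], [1, 2]]

# One table per GO term (node), from a hypothetical term-to-genes mapping.
term_to_genes = {"term_X": {"g1", "g2", "g5"}, "term_Y": {"g3", "g6", "g7"}}
tables = {term: go_term_table(up, down, genes) for term, genes in term_to_genes.items()}
```

Genes annotated to the term but absent from both lists (g9 above) do not enter the table, so each row total stays equal to the size of its gene list.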

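The chi-squared test statistic χ² is written as:

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(\mathrm{Obs}_{ij}-\mathrm{Exp}_{ij})^2}{\mathrm{Exp}_{ij}}$$

where i and j are the row and column indices, Obs_ij is the observed count in cell ij of the 2×2 table, and Exp_ij is the expected count in cell ij under the null hypothesis of no association, computed from the marginal totals as Exp_ij = (row i total × column j total)/N, where N is the total number of genes in the table.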
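For a 2×2 table the statistic has one degree of freedom, so its upper-tail p-value equals erfc(√(χ²/2)), which the Python standard library provides as `math.erfc`. The sketch below assumes no continuity (Yates) correction, since the text names the plain Pearson test; the function name and the counts in the example are illustrative only.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-squared statistic and p-value (1 df) for a 2x2 count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("an empty row or column gives a zero expected count")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # Exp_ij under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    # A chi-squared variable with 1 df is the square of a standard normal Z,
    # so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = pearson_chi2_2x2([[30, 70], [10, 90]])  # illustrative counts
print(chi2, p)  # chi2 = 12.5, p ≈ 4.1e-4
```

Because a separate test is run for every GO term, the resulting p-values are best read as a ranking of nodes rather than as individually calibrated significance levels.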

About this article

Cite this article

Zhong, S., Storch, K., Lipan, O. et al. GoSurfer. Appl-Bioinformatics 3, 261–264 (2004).



  • Prostate Cancer
  • Gene Ontology
  • Directed Acyclic Graph
  • Normal Prostate
  • Mitotic Control