An Improved Statistic for Detecting Over-Represented Gene Ontology Annotations in Gene Sets

  • Steffen Grossmann
  • Sebastian Bauer
  • Peter N. Robinson
  • Martin Vingron
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


We propose an improved statistic for detecting over-represented Gene Ontology (GO) annotations in gene sets. While current methods treat each term independently and hence ignore the structure of the GO hierarchy, our approach takes parent-child relationships into account. Over-representation of a term is measured with respect to the presence of its parental terms in the set. This resolves the problem that the standard approach tends to falsely detect over-representation of more specific terms below terms known to be over-represented. To show this, we generated gene sets in which single terms are artificially over-represented and compared the receiver operator characteristics of the two approaches on these sets. A comparison on a biological dataset further supports our method. Our approach incurs no additional computational complexity compared with the standard approach. An implementation is available within the framework of the freely available Ontologizer application.
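The contrast between the two approaches can be illustrated with a minimal sketch. The abstract does not state the formulas, so everything here is an assumption: the standard approach is modelled as a one-sided hypergeometric test against the whole population, and the parent-child idea is modelled by restricting both the population and the study set to genes annotated to the term's parents. The function names and the toy gene sets are hypothetical and are not taken from the Ontologizer.

```python
from math import comb


def hypergeom_upper_tail(pop_size, pop_hits, sample_size, sample_hits):
    """P(X >= sample_hits) for X ~ Hypergeometric(pop_size, pop_hits, sample_size)."""
    total = comb(pop_size, sample_size)
    upper = min(pop_hits, sample_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


def term_for_term_p(population, study, term_genes):
    """Assumed standard approach: test the term against the whole population."""
    return hypergeom_upper_tail(
        len(population),
        len(population & term_genes),
        len(study),
        len(study & term_genes),
    )


def parent_child_p(population, study, term_genes, parent_genes):
    """Assumed parent-child variant: condition on annotation to the parents.

    parent_genes is the set of genes annotated to the term's parents; how
    several parents are combined is left open here (an assumption).
    """
    pop_parent = population & parent_genes
    study_parent = study & parent_genes
    return hypergeom_upper_tail(
        len(pop_parent),
        len(pop_parent & term_genes),
        len(study_parent),
        len(study_parent & term_genes),
    )


# Toy data (hypothetical): 100 genes, a parent term with 20 genes,
# and a child term covering half of the parent's genes.
population = set(range(100))
parent_genes = set(range(20))
child_genes = set(range(10))
# The study set lies entirely inside the parent and contains child genes
# only in the proportion expected given the parent.
study = set(range(5)) | set(range(10, 15))

print("term-for-term p:", term_for_term_p(population, study, child_genes))
print("parent-child p: ", parent_child_p(population, study, child_genes, parent_genes))
```

In this toy setting the study set is enriched for the parent term only. The term-for-term p-value for the child term is nevertheless very small, whereas the parent-conditioned p-value is large, which mirrors the false detections below over-represented terms described in the abstract. Both tests evaluate the same kind of hypergeometric tail, in line with the claim that no additional computational complexity is involved.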


Keywords (machine-generated): Gene Ontology; Receiver Operator Characteristic; True Positive Rate; Improve Statistic; Population Proportion





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Steffen Grossmann (1)
  • Sebastian Bauer (1, 2)
  • Peter N. Robinson (2)
  • Martin Vingron (1)

  1. Max Planck Institute for Molecular Genetics, Berlin, Germany
  2. Institute for Medical Genetics, Charité University Hospital, Humboldt University, Berlin, Germany
