A Novel Method for Expanding Current Annotations in Gene Ontology

  • Dapeng Hao
  • Xia Li
  • Lei Du
  • Liangde Xu
  • Jiankai Xu
  • Shaoqi Rao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4115)


Because the gap between the amount of protein sequence data and the number of reliable functional annotations in public databases keeps growing, characterizing protein function has become a major task in the post-genomic era. Some current approaches predict the function of a protein from its relationships to other proteins in databases. However, a large fraction of annotated proteins are not fully characterized, which limits how well novel proteins can be annotated this way. There is therefore strong demand for efficient computational methods that push the current broad functional annotations of partially known proteins toward more detailed and specific knowledge. In this study, we explore the capability of a rule-based method for expanding current annotations within a function categorization system such as the Gene Ontology. Applying the proposed method to predict human and yeast protein functions demonstrates its efficiency in expanding the knowledge space of partially known proteins.
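The abstract describes the approach only at a high level. One plausible reading, assumed here, is that rules are learned from proteins whose annotations already reach specific child terms of a Gene Ontology parent term, and are then applied to proteins annotated only to that broader parent. The sketch below illustrates that general idea, not the authors' implementation. It swaps in a minimal ID3-style decision tree, chosen for brevity, whose root-to-leaf paths act as if-then rules. It assumes binary features such as the presence of protein domain or motif signatures. The signature names `D1` to `D3`, the labels `child_A` and `child_B`, and all function names are hypothetical placeholders.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def majority(labels):
    """Most frequent label; used when no useful split remains."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, features):
    """ID3-style tree over binary (present/absent) features.

    rows     -- one set of present features (hypothetical signature IDs) per protein
    labels   -- the known child term of each protein
    features -- candidate features still available for splitting
    Returns a label (leaf) or a tuple (feature, subtree_if_present, subtree_if_absent).
    """
    if len(set(labels)) == 1:
        return labels[0]  # pure node: every protein has the same child term
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in features:
        present = [y for r, y in zip(rows, labels) if f in r]
        absent = [y for r, y in zip(rows, labels) if f not in r]
        if not present or not absent:
            continue  # this feature does not split the current set
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        gain = base - remainder  # information gain of splitting on f
        if gain > best_gain:
            best, best_gain = f, gain
    if best is None:
        return majority(labels)
    rest = [f for f in features if f != best]
    yes = [i for i, r in enumerate(rows) if best in r]
    no = [i for i, r in enumerate(rows) if best not in r]
    return (best,
            build_tree([rows[i] for i in yes], [labels[i] for i in yes], rest),
            build_tree([rows[i] for i in no], [labels[i] for i in no], rest))


def predict(tree, row):
    """Follow the learned rules from the root down to a leaf label."""
    while isinstance(tree, tuple):
        feature, if_present, if_absent = tree
        tree = if_present if feature in row else if_absent
    return tree


# Toy training set (hypothetical): proteins already annotated to two child
# terms of a common parent term, described by the signatures they contain.
train_rows = [{"D1"}, {"D1", "D3"}, {"D2"}, {"D2", "D3"}]
train_labels = ["child_A", "child_A", "child_B", "child_B"]
tree = build_tree(train_rows, train_labels, ["D1", "D2", "D3"])

# A protein annotated only to the parent term is pushed toward a child term.
print(predict(tree, {"D1", "D3"}))  # child_A
```

Each path in `tree` reads as a rule, for example "if D1 is present, assign child_A". A realistic version would need real features, far more training proteins, and pruning; C4.5, for instance, extends this scheme with the gain ratio criterion and post-pruning.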







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dapeng Hao (1)
  • Xia Li (1, 2)
  • Lei Du (1)
  • Liangde Xu (1)
  • Jiankai Xu (1)
  • Shaoqi Rao (1, 3)

  1. Department of Bioinformatics, Harbin Medical University, Harbin, P.R. China
  2. Biomedical Engineering Institute, Capital University of Medical Sciences, Beijing, P.R. China
  3. Departments of Cardiovascular Medicine and Molecular Cardiology, The Cleveland Clinic Foundation, Cleveland, USA
