Structure-Guided Rule-Based Annotation of Protein Functional Sites in UniProt Knowledgebase

  • Sona Vasudevan
  • C. R. Vinayaka
  • Darren A. Natale
  • Hongzhan Huang
  • Robel Y. Kahsay
  • Cathy H. Wu
Part of the Methods in Molecular Biology book series (MIMB, volume 694)


The rapid growth of protein sequence databases has necessitated the development of methods to computationally derive annotation for uncharacterized entries. Most such methods focus on “global” annotation, such as molecular function or biological process. Methods that supply high-accuracy “local” annotation of functional sites, based on structural information at the level of individual amino acids, are relatively rare. In this chapter we describe a method we have developed for annotating functional residues within experimentally uncharacterized proteins. The method relies on position-specific site annotation rules (PIR Site Rules) derived from structural and experimental information. These PIR Site Rules are manually defined to allow for conditional propagation of annotation. Each rule specifies a tripartite set of conditions: candidates for annotation must pass a whole-protein classification test (that is, have an end-to-end match to a whole-protein-based HMM), match a site-specific profile HMM and, finally, match the functionally and structurally characterized residues of a template. Positive matches trigger the appropriate annotation for active site residues, binding site residues, modified residues, or other functionally important amino acids. The strict criteria used in this process yield high-confidence annotation suitable for UniProtKB/Swiss-Prot features.
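The three-part test described above can be pictured as a short filter. The following Python sketch is illustrative only and is not the PIR implementation: the class and field names (`SiteRule`, `Candidate`, `min_family_coverage`, and so on), the use of a coverage fraction to stand in for an "end-to-end" match, the requirement that every template residue be matched, and all numeric values are assumptions made for the example. HMM scores and the template alignment are taken as precomputed inputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SiteRule:
    """Hypothetical container for one site rule; field names are illustrative."""
    rule_id: str
    min_family_score: float      # cutoff for the whole-protein HMM match
    min_family_coverage: float   # assumed proxy for an end-to-end match
    min_site_score: float        # cutoff for the site-specific profile HMM
    # template position -> (allowed residues, feature label to propagate)
    template_sites: Dict[int, Tuple[str, str]]


@dataclass
class Candidate:
    """Precomputed search results for one uncharacterized protein."""
    family_score: float
    family_coverage: float
    site_score: float
    # template position -> (candidate position, candidate residue)
    aligned: Dict[int, Tuple[int, str]]


def apply_rule(rule: SiteRule, cand: Candidate) -> List[Tuple[int, str, str]]:
    """Return (position, residue, feature) tuples, or [] if any condition fails."""
    # Condition 1: whole-protein classification test.
    if cand.family_score < rule.min_family_score:
        return []
    if cand.family_coverage < rule.min_family_coverage:
        return []
    # Condition 2: match to the site-specific profile HMM.
    if cand.site_score < rule.min_site_score:
        return []
    # Condition 3 (assumed strictness): every template residue must align
    # to an allowed residue in the candidate.
    features = []
    for template_pos, (allowed, label) in sorted(rule.template_sites.items()):
        hit = cand.aligned.get(template_pos)
        if hit is None or hit[1] not in allowed:
            return []
        features.append((hit[0], hit[1], label))
    return features


# Toy values, not taken from any real rule or protein.
toy_rule = SiteRule("toy-rule", 50.0, 0.9, 20.0,
                    {10: ("C", "active site"), 13: ("C", "active site")})
toy_hit = Candidate(80.0, 0.95, 30.0, {10: (42, "C"), 13: (45, "C")})
print(apply_rule(toy_rule, toy_hit))
```

Returning an empty list as soon as any condition fails mirrors the conditional propagation described above: no feature is transferred unless all three conditions hold.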

Key words

PIR Site-rules; Functional sites; Functional annotation; PIR Features



Thanks are given to our colleagues at the Protein Information Resource (PIR) and to our UniProt collaborators at the Swiss Institute of Bioinformatics (SIB) and European Bioinformatics Institute (EBI) for their support and fruitful discussions. This work was funded under a grant to the UniProt Consortium. UniProt is supported by the National Institutes of Health, grant number: 5U01HG02712-05.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Sona Vasudevan (1)
  • C. R. Vinayaka (1)
  • Darren A. Natale (1)
  • Hongzhan Huang (2)
  • Robel Y. Kahsay (3)
  • Cathy H. Wu (2)

  1. Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, USA
  2. Department of Computer and Information Sciences, University of Delaware, Newark, USA
  3. DuPont Central Research & Development, Wilmington, USA
