The Manually Annotated Section of the UniProt KnowledgeBase
  • Emmanuel Boutet
  • Damien Lieberherr
  • Michael Tognolli
  • Michel Schneider
  • Amos Bairoch
Part of the Methods in Molecular Biology™ book series (MIMB, volume 406)


The Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI), and the Protein Information Resource (PIR) form the Universal Protein Resource (UniProt) consortium. Its main goal is to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB) and several supplementary databases, including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). (1) UniProtKB is a comprehensive protein sequence knowledgebase that consists of two sections: UniProtKB/Swiss-Prot, which contains manually annotated entries, and UniProtKB/TrEMBL, which contains computer-annotated entries. UniProtKB/Swiss-Prot entries contain information curated by biologists and provide users with cross-links to about 100 external databases and with access to additional information or tools. (2) The UniRef databases (UniRef100, UniRef90, and UniRef50) define clusters of protein sequences that share 100%, 90%, or 50% identity, respectively. (3) The UniParc database stores and maps all publicly available protein sequence data, including obsolete data excluded from UniProtKB. The UniProt databases can be accessed online or downloaded in several formats. New releases are published every 2 weeks. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry, paying particular attention to the specificities of plant protein annotation. We will also present some of the tools and databases that are linked to each entry.
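As an illustration of what a guided tour of an entry involves, the following is a minimal sketch of reading one entry in a simplified layout modeled on the line-oriented text format, assuming two-letter line codes at the start of each line and a `//` terminator. The sample entry, its accession, and its cross-references are made up for illustration, and the helper names `parse_entry` and `cross_references` are hypothetical; nothing here is taken from the chapter itself.

```python
from collections import defaultdict

# Hypothetical, heavily shortened entry written for illustration only;
# the entry name, accession, and cross-references are made up.
SAMPLE_ENTRY = """\
ID   EXMP_ARATH              Reviewed;         120 AA.
AC   X00001;
DE   Example protein.
OS   Arabidopsis thaliana (Mouse-ear cress).
DR   InterPro; IPR000000; Example_domain.
DR   TAIR; AT0G00000; -.
SQ   SEQUENCE   120 AA;  13000 MW;  0000000000000000 CRC64;
     MAAAAAAAAA AAAAAAAAAA
//
"""


def parse_entry(text):
    """Group the lines of one entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # entry terminator
            break
        code, value = line[:2], line[5:].strip()
        if code.strip():  # skip sequence lines, which start with blanks
            fields[code].append(value)
    return dict(fields)


def cross_references(fields):
    """Return (database, identifier) pairs taken from the DR lines."""
    pairs = []
    for value in fields.get("DR", []):
        parts = [p.strip() for p in value.rstrip(".").split(";")]
        if len(parts) >= 2:
            pairs.append((parts[0], parts[1]))
    return pairs


entry = parse_entry(SAMPLE_ENTRY)
print(entry["ID"][0].split()[0])   # entry name from the ID line
print(cross_references(entry))     # links to external databases
```

Grouping lines by their code keeps the sketch short; a real entry has many more line types and continuation rules that this does not handle.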

Key Words

Swiss-Prot; TrEMBL; UniProt; protein database; amino-acid sequence; manual annotation



Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • Emmanuel Boutet (1)
  • Damien Lieberherr (1)
  • Michael Tognolli (1)
  • Michel Schneider (1)
  • Amos Bairoch (1)

  1. Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
