Protein Sequence Databases

  • Protocol
The Proteomics Protocols Handbook

Part of the book series: Springer Protocols Handbooks (SPH)

Abstract

With the availability of almost 150 completed genome sequences from both eukaryotic and prokaryotic organisms, efforts are now being focused on the identification and functional analysis of the proteins encoded by these genomes. The rapidly emerging field of proteomics, the large-scale analysis of these proteins, has started to generate huge amounts of data as a result of the new information provided by the genome projects and by a range of new technologies in protein science. For example, mass spectrometry approaches are being used in protein identification and in determining the nature of posttranslational modifications (1), and large-scale yeast two-hybrid screens provide valuable data about protein-protein interactions (2). These and other methods now make it possible to quickly identify large numbers of proteins in a complex, to map their interactions in a cellular context, to determine their location within the cell, and to analyze their biological activities. Protein sequence databases play a vital role as a central resource for storing the data generated by these efforts and making them freely available to the scientific community. Data from large-scale experiments are often no longer published in a conventional sense but are deposited in a database. This means that protein sequence databases are the most comprehensive resource of information on proteins available to scientists.

References

  1. Sickmann, A., Mreyen, M., and Meyer, H. E. (2003) Mass spectrometry - a key technology in proteome research. Adv. Biochem. Eng. Biotechnol. 83, 141–76.

  2. Coates, P. J. and Hall, P. A. (2003) The yeast two-hybrid system for identifying protein-protein interactions. J. Pathol. 199, 4–7.

  3. Wheeler, D. L., Church, D. M., Federhen, S., et al. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res. 31, 28–33.

  4. Miyazaki, S., Sugawara, H., Gojobori, T., and Tateno, Y. (2003) DNA Data Bank of Japan in XML. Nucleic Acids Res. 31, 13–16.

  5. Stoesser, G., Baker, W., van den Broek, A., et al. (2003) The EMBL Nucleotide Sequence Database: major new developments. Nucleic Acids Res. 31, 17–22.

  6. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., and Wheeler, D. L. (2003) GenBank. Nucleic Acids Res. 31, 23–27.

  7. Boeckmann, B., Bairoch, A., Apweiler, R., et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370.

  8. Wu, C. H., Yeh, L. S., Huang, H., et al. (2003) The Protein Information Resource. Nucleic Acids Res. 31, 345–347.

  9. Pruitt, K. D., Tatusova, T., and Maglott, D. R. (2003) NCBI Reference Sequence Project: update and current status. Nucleic Acids Res. 31, 34–37.

  10. Westbrook, J., Feng, Z., Chen, L., Yang, H., and Berman, H. M. (2003) The Protein Data Bank and structural genomics. Nucleic Acids Res. 31, 489–491.

  11. Dayhoff, M. O. (1978) Atlas of Protein Sequence and Structure Vol. 5, Supplement 3. National Biomedical Research Foundation, Washington, DC.

  12. Gasteiger, E., Jung, E., and Bairoch, A. (2001) SWISS-PROT: connecting biomolecular knowledge via a protein database. Curr. Issues Mol. Biol. 3, 47–55.

  13. Wain, H. M., Lush, M., Ducluzeau, F., and Povey, S. (2002) Genew: the human gene nomenclature database. Nucleic Acids Res. 30, 169–171.

  14. FlyBase consortium. (2003) The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res. 31, 172–175.

  15. Blake, J. A., Richardson, J. E., Bult, C. J., Kadin, J. A., and Eppig, J. T. (2003) MGD: the Mouse Genome Database. Nucleic Acids Res. 31, 193–195.

  16. Junker, V., Apweiler, R., and Bairoch, A. (1999) Representation of functional information in the Swiss-Prot data bank. Bioinformatics 15, 1066–1067.

  17. O’Donovan, C., Martin, M. J., Glemet, E., Codani, J., and Apweiler, R. (1999) Removing redundancy in Swiss-Prot and TrEMBL. Bioinformatics 15, 258–259.

  18. Apweiler, R. (2001) Functional information in SWISS-PROT: the basis for large-scale characterisation of protein sequences. Briefings in Bioinformatics 2, 9–18.

  19. Fleischmann, W., Moeller, S., Gateau, A., and Apweiler, R. (1998) A novel method for automatic and reliable functional annotation. Bioinformatics 15, 228–233.

  20. Mulder, N. J., Apweiler, R., Attwood, T. K., et al. (2003) The InterPro database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31, 315–318.

  21. Falquet, L., Pagni, M., Bucher, P., et al. (2002) The PROSITE database, its status in 2002. Nucleic Acids Res. 30, 235–238.

  22. Attwood, T. K., Bradley, P., Flower, D. R., et al. (2003) PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res. 31, 400–402.

  23. Bateman, A., Birney, E., Cerruti, L., et al. (2002) The Pfam protein families database. Nucleic Acids Res. 30, 276–280.

  24. Corpet, F., Servant, F., Gouzy, J., and Kahn, D. (2000) ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res. 28, 267–269.

  25. Letunic, I., Goodstadt, L., Dickens, N. J., et al. (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 30, 242–244.

  26. Haft, D. H., Selengut, J. D., and White, O. (2003) The TIGRFAMs database of protein families. Nucleic Acids Res. 31, 371–373.

  27. Huang, H., Barker, W. C., Chen, Y., and Wu, C. H. (2003) iProClass: an integrated database of protein family, function and structure information. Nucleic Acids Res. 31, 390–392.

  28. Gough, J., Karplus, K., Hughey, R., and Chothia, C. (2001) Assignment of homology to genome sequences using a library of Hidden Markov Models that represent all proteins of known structure. J. Mol. Biol. 313, 903–919.

  29. Rawlings, N. D., O’Brien, E., and Barrett, A. J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343–346.

  30. Butler, D. (2002) NIH pledges cash for global protein database. Nature 419, 101.

  31. Clamp, M., Andrews, D., Barker, D., et al. (2003) Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res. 31, 38–42.

  32. Harris, T. W., Lee, R., Schwarz, E., et al. (2003) WormBase: a cross-species database for comparative genomics. Nucleic Acids Res. 31, 133–137.

Copyright information

© 2005 Humana Press Inc., Totowa, NJ

About this protocol

Cite this protocol

Magrane, M., Martin, M.J., O’Donovan, C., Apweiler, R. (2005). Protein Sequence Databases. In: Walker, J.M. (ed.) The Proteomics Protocols Handbook. Springer Protocols Handbooks. Humana Press. https://doi.org/10.1385/1-59259-890-0:609

  • DOI: https://doi.org/10.1385/1-59259-890-0:609

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-343-5

  • Online ISBN: 978-1-59259-890-8

  • eBook Packages: Springer Protocols
