Molecular Biotechnology, Volume 42, Issue 1, pp 1–13

Data Deposition and Annotation at the Worldwide Protein Data Bank

  • Shuchismita Dutta
  • Kyle Burkhardt
  • Jasmine Young
  • Ganesh J. Swaminathan
  • Takanori Matsuura
  • Kim Henrick
  • Haruki Nakamura
  • Helen M. Berman


The Protein Data Bank (PDB) is the repository for three-dimensional structures of biological macromolecules determined by experimental methods. The data in the archive are free and easily available via the Internet from any of the worldwide centers managing this global archive. These data are used by scientists, researchers, bioinformatics specialists, educators, students, and general audiences to understand biological phenomena at a molecular level. Analysis of these structural data also inspires and facilitates new discoveries in science. This chapter describes the tools and methods currently used for deposition, processing, and release of data in the PDB. References to future enhancements are also included.


Protein Data Bank (PDB), wwPDB, Deposition, Annotation, Validation



The authors would like to acknowledge the staff of the three wwPDB sites and our advisory committees. At the RCSB PDB, we acknowledge the programming staff consisting of Li Chen, Zukang Feng, Vladimir Guranovic, Andrei Kouranov, John Westbrook, and Huanwang Yang; and the annotation staff consisting of Jaroslaw Blaszczyk, Batsal Devkota, Guanghua Gao, Sutapa Ghosh, Irina Persikova, Bohdan Schneider, Monica Sekharan, Chenghua Shao, Lihua Tan, and Jing Zhou. At PDBe, we acknowledge annotators Richard Newman, Gaurav Sahni, Glen van Ginkel, and Sanchayita Sen. At PDBj, we acknowledge annotators Minyu Chen, Mayumi Inoue, Reiko Igarashi, Yumiko Kengaku, and Kanna Matsuura. The RCSB PDB is operated by Rutgers, The State University of New Jersey, and by the San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego. It is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes and Digestive and Kidney Diseases. PDBe is supported by funds from the Wellcome Trust (GR062025MA), the European Union (TEMBLOR, NMRQUAL, SPINE, AUTOSTRUCT, and IIMS awards), CCP4, the Biotechnology and Biological Sciences Research Council (UK), the Medical Research Council (UK), and the European Molecular Biology Laboratory. PDBj is supported by a grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and the Ministry of Education, Culture, Sports, Science, and Technology (MEXT). The BMRB is supported by NIH grant LM05799 from the National Library of Medicine.



Copyright information

© Humana Press 2008

Authors and Affiliations

  • Shuchismita Dutta (1)
  • Kyle Burkhardt (1, 2)
  • Jasmine Young (1)
  • Ganesh J. Swaminathan (3)
  • Takanori Matsuura (4)
  • Kim Henrick (3)
  • Haruki Nakamura (4)
  • Helen M. Berman (1; corresponding author)

  1. Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, USA
  2. Office of Research and Project Administration, Princeton University, Princeton, USA
  3. The Macromolecular Structure Database at the European Bioinformatics Institute PDB-e, EMBL Outstation-Hinxton, Cambridge, UK
  4. Protein Data Bank Japan (PDBj), Institute for Protein Research, Osaka University, Suita, Japan
