Data Deposition and Annotation at the Worldwide Protein Data Bank

Dutta, Shuchismita; Burkhardt, Kyle; Young, Jasmine; Swaminathan, Ganesh J.; Matsuura, Takanori; Henrick, Kim; Nakamura, Haruki; Berman, Helen M.

doi:10.1007/s12033-008-9127-7

Review
Published: 10 December 2008

Volume 42, pages 1–13, (2009)
Molecular Biotechnology

Shuchismita Dutta¹,
Kyle Burkhardt¹,²,
Jasmine Young¹,
Ganesh J. Swaminathan³,
Takanori Matsuura⁴,
Kim Henrick³,
Haruki Nakamura⁴ &
Helen M. Berman¹

Abstract

The Protein Data Bank (PDB) is the repository for three-dimensional structures of biological macromolecules determined by experimental methods. The data in the archive are free and easily available via the Internet from any of the worldwide centers managing this global archive. These data are used by scientists, researchers, bioinformatics specialists, educators, students, and general audiences to understand biological phenomena at a molecular level. Analysis of these structural data also inspires and facilitates new discoveries in science. This chapter describes the tools and methods currently used for deposition, processing, and release of data in the PDB. References to future enhancements are also included.

Acknowledgment

The authors acknowledge the staff of the three wwPDB sites and our advisory committees. At the RCSB PDB, we acknowledge the programming staff, consisting of Li Chen, Zukang Feng, Vladimir Guranovic, Andrei Kouranov, John Westbrook, and Huanwang Yang, and the annotation staff, consisting of Jaroslaw Blaszczyk, Batsal Devkota, Guanghua Gao, Sutapa Ghosh, Irina Persikova, Bohdan Schneider, Monica Sekharan, Chenghua Shao, Lihua Tan, and Jing Zhou. At PDBe, we acknowledge annotators Richard Newman, Gaurav Sahni, Glen van Ginkel, and Sanchayita Sen. At PDBj, we acknowledge annotators Minyu Chen, Mayumi Inoue, Reiko Igarashi, Yumiko Kengaku, and Kanna Matsuura. The RCSB PDB is operated by Rutgers, The State University of New Jersey, and by the San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego. It is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes and Digestive and Kidney Diseases. PDBe is supported by funds from the Wellcome Trust (GR062025MA), the European Union (TEMBLOR, NMRQUAL, SPINE, AUTOSTRUCT, and IIMS awards), CCP4, the Biotechnology and Biological Sciences Research Council (UK), the Medical Research Council (UK), and the European Molecular Biology Laboratory. PDBj is supported by a grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and the Ministry of Education, Culture, Sports, Science, and Technology (MEXT). The BMRB is supported by NIH grant LM05799 from the National Library of Medicine.

Author information

Kyle Burkhardt
Present address: Office of Research and Project Administration, Princeton University, 4 New South Building, Princeton, NJ, 08544, USA

Authors and Affiliations

Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, NJ, 08854-8087, USA
Shuchismita Dutta, Kyle Burkhardt, Jasmine Young & Helen M. Berman
The Macromolecular Structure Database at the European Bioinformatics Institute PDB-e, EMBL Outstation-Hinxton, Cambridge, CB10 1SD, UK
Ganesh J. Swaminathan & Kim Henrick
Protein Data Bank Japan (PDBj), Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka, 565-0871, Japan
Takanori Matsuura & Haruki Nakamura

Corresponding author

Correspondence to Helen M. Berman.

Additional information

Shuchismita Dutta, Kyle Burkhardt, and Ganesh J. Swaminathan have contributed equally to this work.

About this article

Cite this article

Dutta, S., Burkhardt, K., Young, J. et al. Data Deposition and Annotation at the Worldwide Protein Data Bank. Mol Biotechnol 42, 1–13 (2009). https://doi.org/10.1007/s12033-008-9127-7

Received: 04 November 2008
Accepted: 06 November 2008
Published: 10 December 2008
Issue Date: May 2009
DOI: https://doi.org/10.1007/s12033-008-9127-7
