Abstract
The Protein Data Bank (PDB) is the repository for three-dimensional structures of biological macromolecules determined by experimental methods. The data in the archive are freely and easily available via the Internet from any of the worldwide centers that manage this global archive. These data are used by scientists, researchers, bioinformatics specialists, educators, students, and general audiences to understand biological phenomena at the molecular level. Analysis of these structural data also inspires and facilitates new discoveries in science. This chapter describes the tools and methods currently used for deposition, processing, and release of data in the PDB. Future enhancements are also outlined.
Acknowledgment
The authors would like to acknowledge the staff of the three wwPDB sites and our advisory committees. At the RCSB PDB, we acknowledge the programming staff, consisting of Li Chen, Zukang Feng, Vladimir Guranovic, Andrei Kouranov, John Westbrook, and Huanwang Yang; and the annotation staff, consisting of Jaroslaw Blaszczyk, Batsal Devkota, Guanghua Gao, Sutapa Ghosh, Irina Persikova, Bohdan Schneider, Monica Sekharan, Chenghua Shao, Lihua Tan, and Jing Zhou. At PDBe, we acknowledge annotators Richard Newman, Gaurav Sahni, Glen van Ginkel, and Sanchayita Sen. At PDBj, we acknowledge annotators Minyu Chen, Mayumi Inoue, Reiko Igarashi, Yumiko Kengaku, and Kanna Matsuura. The RCSB PDB is operated by Rutgers, The State University of New Jersey, and by the San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego. It is supported by funds from the National Science Foundation; the National Institute of General Medical Sciences; the Office of Science, Department of Energy; the National Library of Medicine; the National Cancer Institute; the National Center for Research Resources; the National Institute of Biomedical Imaging and Bioengineering; the National Institute of Neurological Disorders and Stroke; and the National Institute of Diabetes and Digestive and Kidney Diseases. PDBe is supported by funds from the Wellcome Trust (GR062025MA), the European Union (TEMBLOR, NMRQUAL, SPINE, AUTOSTRUCT, and IIMS awards), CCP4, the Biotechnology and Biological Sciences Research Council (UK), the Medical Research Council (UK), and the European Molecular Biology Laboratory. PDBj is supported by a grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and by the Ministry of Education, Culture, Sports, Science and Technology (MEXT). The BMRB is supported by NIH grant LM05799 from the National Library of Medicine.
Additional information
Shuchismita Dutta, Kyle Burkhardt, and Ganesh J. Swaminathan have contributed equally to this work.
About this article
Cite this article
Dutta, S., Burkhardt, K., Young, J. et al. Data Deposition and Annotation at the Worldwide Protein Data Bank. Mol Biotechnol 42, 1–13 (2009). https://doi.org/10.1007/s12033-008-9127-7