Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive

  • Stephen K. Burley (Email author)
  • Helen M. Berman
  • Gerard J. Kleywegt
  • John L. Markley
  • Haruki Nakamura
  • Sameer Velankar
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1607)

Abstract

The Protein Data Bank (PDB), the single global repository of experimentally determined 3D structures of biological macromolecules and their complexes, was established in 1971, becoming the first open-access digital resource in the biological sciences. As of May 2017, the PDB archive houses ~130,000 entries. It is managed by the Worldwide Protein Data Bank organization (wwPDB; wwpdb.org), which includes the RCSB Protein Data Bank (RCSB PDB; rcsb.org), the Protein Data Bank Japan (PDBj; pdbj.org), the Protein Data Bank in Europe (PDBe; pdbe.org), and BioMagResBank (BMRB; www.bmrb.wisc.edu). The four wwPDB partners operate a unified global software system that enforces community-agreed data standards and supports data Deposition, Biocuration, and Validation of ~11,000 new PDB entries annually (deposit.wwpdb.org). The RCSB PDB currently acts as the archive keeper, ensuring disaster recovery of PDB data and coordinating weekly updates. wwPDB partners disseminate the same archival data from multiple FTP sites, while operating complementary websites that provide their own views of PDB data with selected value-added information and links to related data resources. At present, the PDB archives experimental data, associated metadata, and 3D atomic-level structural models derived from three well-established methods: crystallography, nuclear magnetic resonance (NMR) spectroscopy, and 3D electron microscopy (3DEM). wwPDB partners are working closely with experts in related experimental areas (small-angle scattering, chemical cross-linking/mass spectrometry, Förster resonance energy transfer (FRET), etc.) to establish a federation of data resources that will support sustainable archiving and validation of 3D structural models and experimental data derived from integrative or hybrid methods.

Key words

Protein Data Bank; PDB; Worldwide Protein Data Bank; wwPDB; PDBx/mmCIF; Chemical Component Dictionary; Crystallography; NMR spectroscopy; NMR-STAR; NMR Exchange Format; NEF; 3D electron microscopy; Integrative or hybrid methods

Notes

Acknowledgments

The RCSB PDB is supported by the National Science Foundation (DBI 1338415), National Institutes of Health, and the Department of Energy; PDBe by the Wellcome Trust, BBSRC, MRC, EU, CCP4, and EMBL-EBI; PDBj by JST-NBDC; and BMRB by the National Institute of General Medical Sciences (GM109046). We thank Christine Zardecki for expert help with manuscript preparation.

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  • Stephen K. Burley (1, 2, 3), Email author
  • Helen M. Berman (1)
  • Gerard J. Kleywegt (4)
  • John L. Markley (5)
  • Haruki Nakamura (6)
  • Sameer Velankar (4)
  1. Research Collaboratory for Structural Bioinformatics Protein Data Bank, Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, and Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, USA
  2. Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, USA
  3. Skaggs School of Pharmacy and Pharmaceutical Sciences and San Diego Supercomputer Center, University of California, San Diego, La Jolla, USA
  4. Protein Data Bank in Europe, European Molecular Biology Laboratory–European Bioinformatics Institute, Cambridge, UK
  5. BioMagResBank, Department of Biochemistry, University of Wisconsin-Madison, Madison, USA
  6. Protein Data Bank Japan, Institute for Protein Research, Osaka University, Osaka, Japan
