The Protein Data Bank Archive

Structural Proteomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2305)

Abstract

The Protein Data Bank (PDB) is the single worldwide archive of experimentally determined macromolecular structure data. Established in 1971 as the first open access data resource in biology, the PDB archive is managed by the worldwide Protein Data Bank (wwPDB) consortium, which has four partners: the RCSB Protein Data Bank (RCSB PDB; rcsb.org), the Protein Data Bank Japan (PDBj; pdbj.org), the Protein Data Bank in Europe (PDBe; pdbe.org), and BioMagResBank (BMRB; www.bmrb.wisc.edu). The PDB archive currently includes ~175,000 entries. The wwPDB has established a number of task forces and working groups that bring together experts from the community, who provide recommendations on data standards and data validation to improve data quality and integrity. The wwPDB members continue to develop the joint deposition, biocuration, and validation system (OneDep) to improve data quality and accommodate new data from emerging techniques such as three-dimensional electron microscopy (3DEM). Each PDB entry contains a coordinate model and associated metadata for all experimentally determined atomic structures, experimental data for the traditional structure determination techniques (X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy), validation reports, and additional information on quaternary structures. The wwPDB partners are committed to following the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles and have implemented a DOI resolution mechanism that provides access to all the relevant files for a given PDB entry. On average, >250 new entries are added to the archive every week and made available by each wwPDB partner via its FTP area. The wwPDB partner sites also develop data access and analysis tools and make these available via their websites. The wwPDB continues to work with experts in the community to establish a federation of archives for structures determined using integrative/hybrid methods, in which multiple experimental techniques are combined.
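As an illustration of the DOI resolution mechanism mentioned in the abstract, the following is a minimal sketch of how an entry-level DOI could be derived from a PDB identifier. It assumes the commonly used pattern 10.2210/pdbXXXX/pdb (with XXXX the lowercase four-character identifier) and the classic four-character identifier format; the function names are hypothetical and are not part of any wwPDB software.

```python
import re

# Assumption: classic four-character PDB identifier, a digit 1-9
# followed by three alphanumeric characters (e.g. "4HHB").
_PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def pdb_entry_doi(pdb_id: str) -> str:
    """Build the entry DOI, assuming the pattern 10.2210/pdbXXXX/pdb."""
    if not _PDB_ID.match(pdb_id):
        raise ValueError(f"not a four-character PDB identifier: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def pdb_entry_doi_url(pdb_id: str) -> str:
    """Return the resolvable form of the DOI via the doi.org resolver."""
    return "https://doi.org/" + pdb_entry_doi(pdb_id)


if __name__ == "__main__":
    # Haemoglobin entry 4HHB is used here purely as a familiar example.
    print(pdb_entry_doi_url("4HHB"))
```

Validating the identifier before building the string keeps malformed input from producing a DOI that silently fails to resolve.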

Acknowledgments

The Protein Data Bank in Europe is supported by European Molecular Biology Laboratory-European Bioinformatics Institute; Wellcome Trust [104948]; Biotechnology and Biological Sciences Research Council [BB/G022577/1, BB/J007471/1, BB/K016970/1, BB/K020013/1, BB/M013146/1, BB/M011674/1, BB/M020347/1, BB/M020428/1, BB/P024351/1]; European Union [284209], ELIXIR, and Open Targets. The RCSB PDB is jointly funded by the National Science Foundation (DBI-1832184), the National Institutes of Health (R01GM133198), and the United States Department of Energy (DE-SC0019749). PDBj is funded by the National Bioscience Database Center of Japan Science and Technology Agency (JST-NBDC), the Basis for Supporting Innovative Drug Discovery and Life Science Research of Japan Agency for Medical Research and Development (AMED-BINDS), and the Joint Usage/Research Center project assigned to the Institute for Protein Research, Osaka University, by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan. BMRB is supported by US National Institutes of Health (NIH) grant R01GM109046. We gratefully acknowledge contributions from John Berrisford, Aleks Gutmanas, Eldon L. Ulrich, Jasmine Young, and John Westbrook, and from all wwPDB staff members, present and past. We would also like to acknowledge wwPDB collaborators and partners at the EMDB, SASBDB, CCP4, CCPEM, and CCPN, and the global structural biology and bioinformatics communities.

Corresponding author

Correspondence to Sameer Velankar.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Velankar, S., Burley, S.K., Kurisu, G., Hoch, J.C., Markley, J.L. (2021). The Protein Data Bank Archive. In: Owens, R.J. (ed) Structural Proteomics. Methods in Molecular Biology, vol 2305. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1406-8_1

  • DOI: https://doi.org/10.1007/978-1-0716-1406-8_1

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1405-1

  • Online ISBN: 978-1-0716-1406-8

  • eBook Packages: Springer Protocols
