Protein Structure Databases

  • Protocol
Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology (MIMB, volume 1415)

Abstract

Web-based protein structure databases come in a wide variety of types and levels of information content. Those of most general interest are the various atlases that describe each experimentally determined protein structure and provide useful links, analyses, and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the databases that classify 3D structures by their folds, as these can reveal evolutionary relationships that may be hard to detect from sequence comparison alone. Related to these are the numerous servers that compare folds, which are particularly useful for newly solved structures, and especially for those of unknown function. Beyond these are a vast number of databases for the more specialized user, dealing with specific families, diseases, structural features, and so on.

Acknowledgments

The author would like to thank Tom Oldfield for useful comments on this chapter.

Author information

Correspondence to Roman A. Laskowski.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Laskowski, R.A. (2016). Protein Structure Databases. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_2

  • DOI: https://doi.org/10.1007/978-1-4939-3572-7_2

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3570-3

  • Online ISBN: 978-1-4939-3572-7

  • eBook Packages: Springer Protocols
