Navigating Among Known Structures in Protein Space

  • Aya Narunsky
  • Nir Ben-Tal
  • Rachel Kolodny
Part of the Methods in Molecular Biology book series (MIMB, volume 1851)


Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical properties of proteins. It is difficult to distinguish evolutionary traces from the effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse (i.e., analyses that focus on structural similarity) are more likely attributable to physicochemical constraints, whereas instances of sequence reuse (analyses that focus on sequence similarity) are more likely indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights into protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and perhaps even design.

In broad strokes, studies of protein space differ in the entities they represent, the similarity measure used to compare these entities, and the representation used. The entities can be, for example, protein chains, domains, supra-domains, or smaller protein sub-parts termed themes. Similarity between entities can be measured by sequence, structure, function, or any combination of these. The representation can be global, encompassing the whole space, or local, focusing on the region surrounding one or more proteins of interest. Global representations include lists of grouped proteins, protein networks, and maps. Networks are the abstraction derived most directly from the similarity data: each node is a protein entity (e.g., a domain), and edges connect similar entities. Selecting the entities, the similarity measure, and the representation are three intertwined decisions: the similarity measure allows us to identify the entities, and the choice of entities determines which similarity measures are meaningful. Likewise, we seek entities whose relationships to one another a simple representation can describe succinctly and accurately. This chapter covers studies that rely on different entities, similarity measures, and representations to better understand protein structure space. Scholars may use publicly available navigators that offer either a global representation, in particular the hierarchical classifications SCOP, CATH, and ECOD, or a local representation, such as that provided by structural alignment algorithms. Alternatively, scholars can configure their own navigator from existing tools. To demonstrate this DIY (do-it-yourself) approach to navigating protein space, we investigate substrate-binding proteins. By presenting the sequence similarities within this large and diverse protein family as a network, we infer that one member (PDB ID 4ntl, of as-yet unknown function) may bind methionine, and we suggest a putative binding mechanism.
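The network abstraction described above, in which nodes are protein entities and edges join similar ones, and the idea of reading a putative function off a query's neighbors can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes pairwise similarity scores were already computed by some external tool, with higher scores meaning more similar. The cutoff, entity IDs, and labels are hypothetical, and a simple majority vote over annotated neighbors stands in for the far more careful analysis such an inference requires.

```python
from collections import Counter, deque


def build_similarity_network(pair_scores, threshold):
    """Build an undirected similarity network as an adjacency dict.

    pair_scores maps (entity_a, entity_b) -> score, where a HIGHER score is
    ASSUMED to mean MORE similar. Every entity becomes a node; an edge joins
    two distinct entities whose score is at or above the hypothetical threshold.
    """
    graph = {}
    for (a, b), score in pair_scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b and score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into clusters reachable via similarity edges (breadth-first search)."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def infer_label(graph, query, labels):
    """Majority vote over the annotated direct neighbors of `query`.

    Returns None when no neighbor is annotated. This "guilt by association"
    rule is only a crude, hypothetical stand-in for expert analysis.
    """
    votes = Counter(labels[n] for n in graph.get(query, ()) if n in labels)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    # Toy data: IDs, scores, and labels are invented purely for illustration.
    scores = {
        ("query", "p1"): 0.9,
        ("query", "p2"): 0.8,
        ("query", "p3"): 0.2,
        ("p1", "p2"): 0.85,
        ("p3", "p4"): 0.7,
    }
    labels = {"p1": "ligand_A", "p2": "ligand_A", "p3": "ligand_B", "p4": "ligand_B"}
    net = build_similarity_network(scores, threshold=0.5)
    print([sorted(c) for c in connected_components(net)])  # [['p1', 'p2', 'query'], ['p3', 'p4']]
    print(infer_label(net, "query", labels))  # ligand_A
```

With a distance-like score such as an E-value, where lower means more similar, the comparison in `build_similarity_network` would have to be reversed. The choice of cutoff largely determines how the network splits into clusters, so in practice one would inspect several thresholds rather than trust a single one.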

Key words

Protein space navigation; Structure space; Evolutionary relationships in protein space


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
  2. Department of Computer Science, University of Haifa, Haifa, Israel
