Protein Modeling and Structural Prediction

  • Sebastian Kelm
  • Yoonjoo Choi
  • Charlotte M. Deane


Proteins perform crucial functions in every living cell. The genetic information in an organism's DNA encodes each protein's amino acid sequence, which determines its three-dimensional structure, which in turn determines its function. In this postgenomic era, protein sequence information can be obtained relatively easily by experiment. Sequence databases already contain millions of protein sequences and continue to grow. Structural information, however, is harder to obtain experimentally: the structures of only about 75,000 proteins are currently known. Knowledge of a protein's structure is extremely useful for understanding its molecular function and for developing drugs that bind to it. Computational techniques have therefore been developed to bridge the ever-increasing gap between the number of known protein sequences and the number of known structures.

In addition to proteins in general, this chapter discusses the specific importance of membrane proteins, which make up about one-third of all known proteins. Membrane proteins control communication and transport into and out of every living cell and are involved in many medically important processes. Over half of current drug targets are membrane proteins.

A brief introduction to protein sequence and structure is followed by an overview of common techniques used in the process of computational protein structure prediction. Emphasis is put on two particularly hard problems, namely protein loop modeling and the structural prediction of membrane proteins.


Keywords: Dihedral Angle; Protein Structure Prediction; Steric Clash; Template Protein; Model Quality Assessment





BLAST: basic local alignment search tool
BLOSUM: blocks of amino acids substitution matrix
CASP: critical assessment of techniques for protein structure prediction
DNA: deoxyribonucleic acid
DSSP: define secondary structure of proteins
ESST: environment-specific substitution table
HA: high accuracy
HC: high coverage
MQAP: model quality assessment program
PDB: protein data bank
PHAT: predicted hydrophobic and transmembrane
QMEAN: qualitative model energy analysis
RMSD: root-mean-square deviation
RNA: ribonucleic acid
SLIM: scorematrix leading intramembrane




logistic regression



Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. Department of Statistics, University of Oxford, Oxford, UK
  2. Department of Computer Science, Dartmouth College, Hanover, USA
  3. Department of Statistics, University of Oxford, Oxford, UK
