Abstract
The molecular basis of life rests on the activity of large biomolecules, mostly nucleic acids (DNA and RNA), carbohydrates, lipids, and proteins. While each of these molecules has its role, there is something special about proteins, as they are the lead performers of cellular functions. This was dramatized by Jacques Monod, who stated that “C’est à ce niveau d’organisation chimique que gît, s’il y en a un, le secret de la vie,” i.e., that it is at this level of chemical organization that lies the secret of life, if there is one [1]. To understand how these molecules function we first need to know their shapes; consequently, structural molecular biology has emerged as a new line of experimental research focused on revealing the structure of these biomolecules. This branch of biology has recently experienced a major uplift through the development of high-throughput structural studies, the structural genomics projects, aimed at developing a comprehensive view of the protein structure universe. All these initiatives are expected to help us unravel the connections between the sequence, structure, and function of a protein. Experimental data at a molecular level are scarce, however; this has led to the development of many modeling initiatives to shed light on these connections. Probably the most famous is the study of the protein-folding problem, the “holy grail” for the structural biology community. Its elusive goal is to predict the detailed three-dimensional structure of a protein from its sequence as well as to decipher the sequence of events the protein goes through to reach its folded state. This chapter is dedicated to the first part of this task, namely the protein structure prediction problem. Structure prediction benefits from two different approaches to science, which differ in the importance they give to experimental data.
Further Reading
Branden C, Tooze J. 1991. Introduction to protein structure. New York: Garland Publishing.
Creighton TE. 1993. Proteins. New York: W.H. Freeman & Co.
Taylor WR, May ACW, Brown NP, Aszodi A. 2001. Protein structure: geometry, topology and classification. Rep Prog Phys 64:517-590.
Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. 2000. Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 29:291-325.
Bonneau R, Baker D. 2001. Ab initio protein structure prediction: progress and prospects. Annu Rev Biophys Biomol Struct 30:173-189.
Dill KA, Bromberg S, Yue KZ, Fiebig KM, Yee DP, Thomas PD, Chan HS. 1995. Principles of protein folding: a perspective from simple exact models. Protein Sci 4:561-602.
References
Monod J. 1973. Le hasard et la nécessité. Paris: Seuil.
Levy Y, Wolynes PG, Onuchic JN. 2004. Protein topology determines binding mechanism. Proc Natl Acad Sci USA 101:511-516.
Plaxco KW, Simons KT, Baker D. 1998. Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol 277:985-994.
Alm E, Baker D. 1999. Prediction of protein-folding mechanisms from free energy landscapes derived from native structures. Proc Natl Acad Sci USA 96:11305-11310.
Munoz V, Eaton WA. 1999. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc Natl Acad Sci USA 96:11311-11316.
Alm E, Morozov AV, Kortemme T, Baker D. 2002. Simple physical models connect theory and experiments in protein folding kinetics. J Mol Biol 322:463-476.
Koehl P, Levitt M. 2002. Protein topology and stability defines the space of allowed sequences. Proc Natl Acad Sci USA 99:1280-1285.
Smalheiser NR. 2002. Informatics and hypothesis-driven research. EMBO Rep 3:702.
Kell DB, Oliver SG. 2003. Here is the evidence, now what is the hypothesis? The complementary role of inductive and hypothesis driven science in the post genomic era. Bioessays 26:99-105.
Liolios K, Mavrommatis K, Tavernarakis N, Kyrpides NC. 2007. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucl Acids Res 36:D475-D479.
Bernstein FC, Koetzle TF, William G, Meyer DJ, Brice MD, Rodgers JR. 1977. The protein databank: a computer-based archival file for macromolecular structures. J Mol Biol 112:535-542.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H. 2000. The Protein Data Bank. Nucl Acids Res 28:235-242.
Schulz GE, Schirmer RH. 1979. Principles of protein structure. New York: Springer-Verlag.
Cantor CR, Schimmel PR. 1980. Biophysical chemistry: the conformation of biological macromolecules. New York: W.H. Freeman Company.
Branden C, Tooze J. 1991. Introduction to protein structure. New York: Garland Publishing.
Creighton TE. 1993. Proteins. New York: W.H. Freeman & Co.
Taylor WR, May ACW, Brown NP, Aszodi A. 2001. Protein structure: geometry, topology and classification. Rep Prog Phys 64:517-590.
Timberlake KC. 2004. General, organic, and biological chemistry: structures of life. San Francisco: Benjamin Cummings.
Brooks C, Karplus M, Pettitt M. 1988. Proteins: a theoretical perspective of dynamics, structure and thermodynamics. Adv Chem Phys 71:1-259.
Kendrew J, Dickerson R, Strandberg B, Hart R, Davies D, Philips D. 1960. Structure of myoglobin: a three dimensional Fourier synthesis at 2 angstrom resolution. Nature (London) 185:422-427.
Perutz M, Rossmann M, Cullis A, Muirhead G, Will G, North A. 1960. Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5 angstrom resolution, obtained by X-ray analysis. Nature (London) 185:416-422.
Levitt M, Chothia C. 1976. Structural patterns in globular proteins. Nature (London) 261:552-558.
Lesk AM, Chothia C. 1980. How different amino-acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J Mol Biol 136:225-270.
Chothia C, Janin J. 1981. Relative orientation of close packed beta pleated sheets in proteins. Proc Nat Acad Sci USA 78:4146-4150.
Cohen FE, Sternberg MJE, Taylor WR. 1981. Analysis of the tertiary structure of protein beta sheet sandwiches. J Mol Biol 148:253-272.
Chothia C, Janin J. 1982. Orthogonal packing of beta pleated sheets in proteins. Biochemistry 21:3955-3965.
Cohen FE, Sternberg MJE, Taylor WR. 1982. Analysis and prediction of the packing of alpha helices against a beta sheet in the tertiary structure of globular proteins. J Mol Biol 156:821-862.
Chou KC. 1995. A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space. Proteins: Struct Funct Genet 21:319-344.
Chou KC, Zhang CT. 1995. Prediction of protein structural classes. Crit Rev Biochem Molec Biol 30:275-349.
Bahar I, Atilgan AR, Jernigan RL, Erman B. 1997. Understanding the recognition of protein structural classes by amino acid composition. Proteins: Struct Funct Genet 29:172-185.
Liu WM, Chou KC. 1998. Prediction of protein structural classes by modified Mahalanobis discriminant algorithm. J Prot Chem 17:209-217.
Chou KC, Liu WM, Maggiora GM, Zhang CT. 1998. Prediction and classification of domain structural classes. Proteins: Struct Funct Genet 31:97-103.
Cai YD, Li YX, Chou KC. 2000. Using neural networks for prediction of domain structural classes. Biochim Biophys Acta 1476:1-2.
Zhou GP, Assa-Munt N. 2001. Some insights into protein structural class prediction. Proteins: Struct Funct Genet 44:57-59.
Luo RY, Feng ZP, Liu JK. 2002. Prediction of protein structural class by amino acid and polypeptide composition. Eur J Biochem 269:4219-4225.
Xiao X, Lin W-Z, Chou KC. 2008. Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes. J Comput Chem 29:2018-2024.
Hutchinson EG, Thornton JM. 1993. The Greek key motif: extraction, classification and analysis. Protein Eng 6:233-245.
Meirovitch H. 2007. Recent developments in methodologies for calculating the entropy and free energy of biological systems by computer simulation. Curr Opin Struct Biol 17:181-186.
Dill KA, Shortle D. 1991. Denatured states of proteins. Annu Rev Biochem 60:795-825.
Cozetto D, Tramontano A. 2005. Relationship between multiple sequence alignments and quality of protein comparative models. Proteins: Struct Funct Genet 58:151-157.
Chothia C, Lesk A. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J 5:823-826.
Flores TP, Orengo C, Moss DS, Thornton J. 1993. Comparison of conformation characteristics in structurally similar protein pairs. Protein Sci 2:1811-1826.
Russel RB, Saqi AS, Sayle RA, Bates PA, Sternberg MJE. 1997. Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation. J Mol Biol 269:423-439.
Sauder JM, Arthur JW, Dunbrack RL. 2000. Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins: Struct Funct Genet 40:6-22.
Lipman DJ, Pearson WR. 1985. Rapid and sensitive protein similarity searches. Science 227:1435-1441.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403-410.
Pearson WR. 1995. Comparison of methods for searching protein sequence databases. Protein Sci 4:1145-1160.
Agarwal P, States DJ. 1998. Comparative accuracy of methods for protein sequence similarity search. Bioinformatics 14:40-47.
Brenner SE, Chothia C, Hubbard TJ. 1998. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc Nat Acad Sci USA 95:6073-6078.
Rost B. 1999. Twilight zone of protein sequence alignments. Protein Eng 12:85-94.
Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T, Chothia C. 1998. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol 284:1201-1210.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res 25:3389-3402.
Eddy SR. 1996. Hidden Markov models. Curr Opin Struct Biol 6:361-365.
Jones DT. 1997. Progress in protein structure prediction. Curr Opin Struct Biol 7:377-387.
Marchler-Bauer A, Bryant SH. 1997. A measure of success in fold recognition. Trends Biochem Sci 22:236-240.
Levitt M. 1997. Competitive assessment of protein fold recognition and alignment accuracy. Proteins: Struct Funct Genet Suppl 1:92-104.
Godzik A. 2003. Fold recognition methods. Methods Biochem Anal 44:525-546.
Chothia C. 1992. One thousand fold families for the molecular biologist? Nature (London) 357:543.
Sali A, Blundell TL. 1993. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234:779-815.
Sanchez R, Sali A. 1997. Evaluation of comparative protein structure modelling by MODELLER-3. Proteins Suppl 1:50-58.
Lemer CMR, Rooman MJ, Wodak SJ. 1995. Protein structure prediction by threading methods: evaluation of current techniques. Proteins: Struct Funct Genet 23:337-355.
Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. 2000. Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 29:291-325.
Go N, Scheraga HA. 1970. Ring closure and local conformational deformations of chain molecules. Macromolecules 3:178-187.
Palmer KA, Scheraga HA. 1991. Standard-geometry chains fitted to X-ray derived structures: validation of the rigid-geometry approximation, 1: chain closure through a limited search of loop conformations. J Comput Chem 12:505-526.
Wedemeyer WJ, Scheraga HA. 1999. Exact analytical loop closure in proteins using polynomial equations. J Comput Chem 20:819-844.
Bruccoleri RE, Karplus M. 1985. Chain closure with bond angle variations. Macromolecules 18:2767-2773.
Moult J, James MNG. 1986. An algorithm which predicts the conformation of short lengths of chain in proteins. J Mol Graphics 4:180.
Deane CM, Blundell TL. 2000. A novel exhaustive search algorithm for predicting the conformation of polypeptide segments in proteins. Proteins: Struct Funct Genet 40:135-144.
Bruccoleri RE, Karplus M. 1990. Conformational sampling using high-temperature molecular dynamics. Biopolymers 29:1847-1862.
Carlacci L, Englander SW. 1993. The Loop problem in proteins: a Monte-Carlo simulated annealing approach. Biopolymers 33:1271-1286.
Ring CS, Cohen FE. 1994. Conformational sampling of loop structures using genetic algorithms. Israel J Chem 34:245-252.
Zheng Q, Rosenfeld R, Vajda S, Delisi C. 1993. Loop closure via bond scaling and relaxation. J Comput Chem 14:556-565.
Zheng Q, Rosenfeld R, Delisi C, Kyle JD. 1994. Multiple copy sampling in protein loop modeling: computational efficiency and sensitivity to dihedral angle perturbations. Protein Sci 3:493-506.
Lavalle SM, Finn PW, Kavraki LE, Latombe JC. 2000. A randomized kinematics-based approach to pharmacophore-constrained conformational search and database screening. J Comput Chem 21:731-747.
Fine RM, Wang H, Shenkin PS, Yarmush DL, Levinthal C. 1996. Predicting antibody hypervariable loop conformations, II: minimization and molecular dynamics studies of mcp603 from many randomly generated loop conformations. Proteins: Struct Funct Genet 1:342-362.
Canutescu AA, Dunbrack RL. 2003. Cyclic coordinate descent: a robotics algorithm for protein loop closure. Protein Sci 12:963-972.
Jones TA, Thirup S. 1986. Using known substructures in protein model building and crystallography. EMBO J 5:819-822.
Fidelis K, Stern PS, Bacon D, Moult J. 1994. Comparison of systematic search and database methods for constructing segments of protein-structure. Protein Eng 7:953-960.
Kolodny R, Guibas L, Levitt M, Koehl P. 2005. Inverse kinematics in biology: the protein loop closure problem. Int J Rob Res 24:151-163.
Ponder JW, Richards FM. 1987. Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol 193:775-791.
Dunbrack RL, Karplus M. 1994. Conformational-analysis of the backbone-dependent rotamer preferences of protein side-chains. Nat Struct Biol 1:334-340.
Pierce NA, Winfree E. 2002. Protein design is NP-hard. Protein Eng 15:779-782.
Chazelle B, Kingsfort C, Singh MA. 2004. A semi-definite programming approach to side-chain positioning with new rounding strategies. INFORMS J Comput 16:380-392.
Desmet J, Maeyer MD, Hazes B, Lasters I. 1992. The dead end elimination theorem and its use in protein side-chain positioning. Nature (London) 356:539-542.
Lasters I, Maeyer MD, Desmet J. 1995. Enhanced dead-end elimination in the search for the global minimum conformation of a collection of protein side chains. Protein Eng 8:815-822.
Goldstein RF. 1994. Efficient rotamer elimination applied to protein side-chains and related spin glasses. Biophys J 66:1335-1340.
Gordon DB, Mayo SL. 1998. Radical performance enhancements for combinatorial optimization algorithms based on the dead-end elimination theorem. J Comput Chem 19:1505-1514.
Looger LL, Hellinga HW. 2001. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol 307:429-445.
Holm L, Sander C. 1991. Database algorithm for generating protein backbone and side-chain co-ordinates from a C-alpha trace: Application to model building and detection of co-ordinate errors. J Mol Biol 218:183-194.
Peterson RW, Dutton PL, Wand AJ. 2004. Improved side-chain prediction accuracy using an ab initio potential energy function and a very large rotamer library. Protein Sci 13:735-751.
Lu M, Dousis AD, Ma J. 2008. OPUS-Rota: a fast and accurate method for side-chain modeling. Protein Sci 17:1576-1585.
Xiang Z, Honig B. 2001. Extending the accuracy limits of prediction for side-chain conformations. J Mol Biol 311:421-430.
Samudrala R, Moult J. 1998. A graph theoretic algorithm for comparative modeling of protein structure. J Mol Biol 279:298-302.
Canutescu AA, Shelenkov AA, Dunbrack RL. 2003. A graph theory algorithm for rapid protein side-chain prediction. Protein Sci 12:2001-2014.
Dukka-Bahadur KC, Tomita E, Suzuki J, Akutsu T. 2005. Protein side-chain packing problem: a maximum edge-weight clique algorithmic approach. J Bioinfo Comput Biol 3:103-126.
Koehl P, Delarue M. 1994. Application of a self consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. J Mol Biol 239:249-275.
Koehl P, Delarue M. 1996. Mean-field minimization methods for biological macromolecules. Curr Opin Struct Biol 6:222-226.
Koehl P, Delarue M. 1995. A self consistent mean field approach to simultaneous gap closure and side-chain positioning in homology modelling. Nat Struct Biol 2:163-170.
Levitt M, Lifson S. 1969. Refinement of protein conformations using a macromolecular energy minimization procedure. J Mol Biol 46:269-279.
Koehl P, Levitt M. 1999. A brighter future for protein structure prediction. Nat Struct Biol 6:108-111.
Venclovas C, Zemla A, Fidelis K, Moult J. 2003. Assessment of progress over the CASP experiments. Proteins: Struct Funct Genet 53:585-595.
Laskowski RA, MacArthur MW, Moss DS, Thornton J. 1993. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 26:283-291.
Hooft RW, Vriend G, Sander C, Abola EE. 1996. Errors in protein structures. Nature (London) 381:272.
Bowie JU, Lüthy R, Eisenberg D. 1991. A method to identify protein sequences that fold into a known three-dimensional structure. Science 253:164-170.
Lüthy R, Bowie JU, Eisenberg D. 1992. Assessment of protein models with three-dimensional profiles. Nature (London) 356:83-85.
Eisenberg D, Luthy R, Bowie JU. 1997. VERIFY3D, assessment of protein models with three-dimensional profiles. Methods Enzymol 277:396-404.
Sippl MJ. 1993. Recognition of errors in three-dimensional structures of proteins. Proteins: Struct Funct Genet 17:355-362.
Wiederstein M, Sippl MJ. 2007. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res 35:W407-W410.
Pawlowski M, Gajda MJ, Matlak R, Bujnicki JM. 2008. MetaMQAP: a meta-server for the quality assessment of protein models. BMC Bioinformatics 9:403.
Jones DT. 2001. Evaluating the potential of using fold-recognition models for molecular replacement. Acta Cryst D57:1428-1434.
Rossmann MG. 2001. Molecular replacement—historical background. Acta Crystallogr D Biol Crystallogr 57:1360-1366.
Ilari A, Savino C. 2008. Protein structure determination by x-ray crystallography. Methods Mol Biol 452:63-87.
Taylor G. 2003. The phase problem. Acta Crystallogr D Biol Crystallogr 59:1881-1890.
Friedberg I, Jaroszewski L, Ye Y, Godzik A. 2004. The interplay of fold recognition and experimental structure determination in structural genomics. Curr Opin Struct Biol 14:307-312.
Claude J-B, Suhre K, Notredame C, Claverie J-M, Abergel C. 2004. CaspR: a web server for automated molecular replacement using homology modeling. Nucl Acids Res 32:W606-W609.
Giorgetti A, Raimondo D, Miele AE, Tramontano A. 2005. Evaluating the usefulness of protein structure models for molecular replacement. Bioinformatics 21:72-76.
Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ, Baker D. 2007. High-resolution structure prediction and the crystallographic phase problem. Nature (London) 450:259-264.
Topf M, Sali A. 2005. Combining electron microscopy and comparative protein structure modeling. Curr Opin Struct Biol 15:578-585.
Zheng W, Doniach S. 2002. Protein structure prediction constrained by solution X-ray scattering data and structural homology identification. J Mol Biol 316:173-187.
Chen SW, Pellequer JL. 2004. Identification of functionally important residues in proteins using comparative models. Curr Med Chem 11:595-605.
Skrabanek L, Saini HK, Bader GD, Enright AJ. 2008. Computational prediction of protein-protein interactions. Mol Biotechnol 38:1-17.
Hutchins C, Greer J. 1991. Comparative modeling of proteins in the design of novel renin inhibitors. Crit Rev Biochem Mol Biol 26:77-127.
Hillisch A, Pineda LF, Hilgenfeld R. 2004. Utility of homology models in the drug discovery process. Drug Discovery Today 9:659-669.
Rockey WM, Elcock AH. 2006. Structure selection for protein kinase docking and virtual screening: homology models or crystal structures? Curr Protein Pept Sci 7:437-457.
Villoutreix BO, Renault N, Lagorce D, Sperandio O, Montes M, Miteva MA. 2007. Free resources to assist structure-based virtual ligand screening experiments. Curr Protein Pept Sci 8:381-411.
Roessler CG, Hall BM, Anderson WJ, Ingram WM, Roberts SA, Montfort WR, Cordes MH. 2008. Transitive homology-guided structural studies lead to the discovery of Cro proteins with 40% sequence identity but different folds. Proc Nat Acad Sci USA 105:2343-2348.
Bradley P, Misura KM, Baker D. 2005. Toward high-resolution de novo structure prediction for small proteins. Science 309:1868-1871.
Bonneau R, Baker D. 2001. Ab initio protein structure prediction: progress and prospects. Annu Rev Biophys Biomol Struct 30:173-189.
Hardin C, Pogorelov TV, Luthey-Schulten Z. 2002. Ab initio protein structure prediction. Curr Opin Struct Biol 12:176-181.
Chivian D, Robertson T, Bonneau R, Baker D. 2003. Ab initio methods. Methods Biochem Anal 44:547-557.
Jauch R, Yeo HC, Kolatkar PR, Clarke ND. 2007. Assessment of CASP7 structure predictions for template free targets. Proteins: Struct Funct Genet 69(Suppl 8):57-67.
Dill KA, Ozkan SB, Welkl TR, Chodera JD, Voetz VA. 2007. The protein folding problem: when will it be solved? Curr Opin Struct Biol 17:342-346.
Zhang Y. 2008. Progress and challenges in protein structure prediction. Curr Opin Struct Biol 18:342-348.
Dill KA, Bromberg S, Yue KZ, Fiebig KM, Yee DP, Thomas PD, Chan HS. 1995. Principles of protein folding: a perspective from simple exact models. Protein Sci 4:561-602.
Covell DG, Jernigan RL. 1990. Conformations of folded proteins in restricted space. Biochemistry 29:3287-3294.
Park BH, Levitt M. 1995. The complexity and accuracy of discrete state models of protein structure. J Mol Biol 249:493-507.
Lau KF, Dill K. 1989. A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22:3986-3997.
Shakhnovich EI, Gutin AM. 1993. Engineering of stable and fast-folding sequences of model proteins. Proc Natl Acad Sci USA 90:7195-7199.
Go N, Takemoti H. 1978. Respective roles of short- and long-range interactions in protein folding. Proc Nat Acad Sci USA 75:559-563.
Miyazawa S, Jernigan RL. 1985. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18:534-552.
Chan HS, Dill K. 1989. Compact polymers. Macromolecules 22:4559-4573.
Chan HS, Dill K. 1990. Origins of structure in globular proteins. Proc Nat Acad Sci USA 87:6388-6392.
Karplus M, McCammon JA. 2002. Molecular dynamics simulations of biomolecules. Nat Struct Biol 9:646-652.
Duan Y, Kollman PA. 1998. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 282:740-744.
Pitera JW, Swope W. 2003. Understanding folding and design: replica-exchange simulations of "Trp-cage" miniproteins. Proc Nat Acad Sci USA 100:7587-7592.
Lei H, Wu C, Liu H, Duan Y. 2007. Folding free energy landscape of villin headpiece subdomain from molecular dynamic simulations. Proc Nat Acad Sci USA 104:4925-4930.
Zagrovic B, Snow CD, Shirts MR, Pande VS. 2002. Simulation of folding of a small alpha-helical protein in atomistic detail using worldwide-distributed computing. J Mol Biol 323:927-937.
Chou PY, Fasman GD. 1974. Conformational parameters for amino-acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry 13:211-222.
Garnier J, Osguthorpe D, Robson B. 1978. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 120:97-120.
Heringa J. 2000. Computational methods for protein secondary structure prediction using multiple sequence alignments. Curr Protein Pept Sci 1:273-301.
Rost B. 2001. Review: protein secondary structure prediction continues to rise. J Struct Biol 134:204-218.
Rost B, Eyrich VA. 2001. EVA: large-scale analysis of secondary structure prediction. Proteins: Struct Funct Genet Suppl 5:192-199.
Rost B, Sander C. 1993. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 232:584-599.
Montgomerie S, Sundararaj S, Gallin WJ, Wishart DS. 2006. Improving the accuracy of protein secondary structure prediction using structural alignment. BMC Bioinformatics 7:301.
Fain B, Levitt M. 2001. A novel method for sampling alpha-helical protein backbones. J Mol Biol 305:191-201.
Bradley P, Baker D. 2006. Improved beta-protein structure prediction by multilevel optimization of nonlocal strand pairings and local backbone conformation. Proteins: Struct Funct Genet 65:922-929.
Wu GA, Coutsias EA, Dill KA. 2008. Iterative assembly of helical proteins by optimal hydrophobic packing. Structure 16:1257-1266.
Orengo C, Bray J, Hubbard T, Lo Conte L, Sillitoe I. 1999. Analysis and assessment of ab initio three-dimensional prediction, secondary structure, and contacts prediction. Proteins: Struct Funct Genet 37:149-170.
Ortiz AR, Kolinski A, Skolnick J. 1998. Native-like topology assembly of small proteins using predicted restraints in Monte Carlo folding simulations. Proc Nat Acad Sci USA 95:1020-1025.
Rohl CA, Strauss CE, Misura KM, Baker D. 2004. Protein structure prediction using Rosetta. Methods Enzymol 383:66-93.
Das R, Baker D. 2008. Macromolecular modeling with Rosetta. Annu Rev Biochem 77:363-382.
Lazaridis T, Karplus M. 2000. Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10:139-145.
Huang ES, Samudrala R, Park BH. 2000. Scoring functions for ab initio protein structure prediction. Methods Mol Biol 143:223-245.
Ngan S-C, Hung LH, Liu T, Samudrala R. 2008. Scoring functions for de novo protein structure prediction revisited. Methods Mol Biol 413:243-281.
Roux B, Simonson T. 1999. Implicit solvent models. Biophys Chem 78:1-20.
Koehl P. 2006. Electrostatics calculations: latest methodological advances. Curr Opin Struct Biol 16:142-151.
Sippl M. 1990. Calculation of conformational ensembles from potentials of mean force: an approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol 213:859-883.
Sippl M. 1993. Boltzmann’s principle, knowledge-based mean fields and protein folding: an approach to the computational determination of protein structures. J Comput Aided Mol Des 7:473-501.
Samudrala R, Moult J. 1998. An all-atom distance dependent conditional probability discriminatory function for protein structure prediction. J Mol Biol 275:895-916.
Moult J, Pedersen JT, Judson RS, Fidelis K. 1995. A large scale experiment to assess protein structure prediction methods. Proteins: Struct Funct Genet 23:R2-R4.
Subbiah S, Laurents DV, Levitt M. 1993. Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core. Curr Biol 3:141-148.
1.1 Electronic Supplementary material
Figure 1.1.
Amino acids: the building blocks of proteins. (A) Each amino acid has a mainchain (N, Cα, C, and O) on which is attached a sidechain schematically represented as R. The mainchain can itself be partitioned into three groups: the amino group, the central Cα group, and the carboxyl group. Note that even though the amino group and the carboxyl group are charged at neutral pH, the amino acid is neutral: we say that it is a zwitterion. (B) Amino acids in proteins are attached through planar peptide bonds, connecting atom C of the current residue to atom N of the following residue. Please visit http://extras.springer.com/ to view a high-resolution full-color version of this illustration. (PDF 2,785 KB)
Figure 1.2.
The three most common arrangements of secondary structure elements (SSE) found in proteins. (A) The regular α–helix is a right–handed helix, in which all residues adopt similar conformations. The α–helix is characterized by hydrogen bonds between the oxygen O of residue i, and the polar backbone hydrogen HN (bound to N) of residue i + 4. Note that all C=O and N–HN bonds are parallel to the main axis of the helix. (B) An anti-parallel β–sheet. Two strands (stretches of extended backbone segments) are running in an anti-parallel geometry. The atoms HN and O of residue i in the first strand hydrogen bond with the atoms O and HN of residue j in the opposite strand, respectively, while residues i + 1 and j + 1 face outward. (C) A parallel β–sheet. The two strands are parallel, and the atoms HN and O of residue i in the first strand hydrogen bond with the O of residue j and the HN of residue j + 2, respectively. The same alternating pattern of residues involved in hydrogen bonds with the opposite strand, and facing outward is observed in parallel and anti-parallel β–sheets. A strand can therefore be involved in two different sheets. For simplicity, sidechains and non-polar hydrogens are ignored. Figure drawn using Pymol (http://www.pymol.org). Please visit http://extras.springer.com/ to view a high-resolution full-color version of this illustration. (PDF 2,783 KB)
Figure 1.3.
The three main types of proteins. (A) Collagen is the main protein of connective tissues in animals and the most abundant protein in mammals, making up close to 30% of their body protein content. It is a fiber protein, with each fiber made up of three polypeptide strands possessing the conformation of left-handed helices. These three left-handed helices are twisted together into a right-handed coiled coil, a cooperative quaternary structure stabilized by numerous hydrogen bonds. (B) Bacteriorhodopsin is a mainly α–protein, containing seven helices, that crosses the membrane of a cell (a few lipids of the membrane are shown as a space-filling diagram in green). It serves as an ion pump, and is found in bacteria that can survive in high salt concentrations. (C) TIM is a globular protein that belongs to the α–β class. The protein chain alternates between β and α secondary structure types, giving rise to a β–sheet barrel in the center surrounded by a large ring of α–helices on the outside. This structure, first seen in the triose phosphate isomerase of chicken, has been observed in many unrelated proteins since then. Figure drawn using Pymol (http://www.pymol.org). Please visit http://extras.springer.com/ to view a high-resolution full-color version of this illustration. (PDF 2,792 KB)
Figure 1.7.
A self-consistent mean field (SCMF) approach to the problem of predicting sidechain conformation. (A) The multicopy approach. Let us assume that residue i in the protein of interest is a phenylalanine, and that this phenylalanine can adopt three possible conformations. A systematic enumeration of all possible sidechain conformations in the protein would require that all three conformations of phenylalanine i be considered. If the protein contains 100 residues, each with three possible conformations, the size of the corresponding conformational space is 3^100 (about 5 × 10^47), a number out of reach of modern computers. As an alternative, we construct a chimera molecule, where sidechains are represented as an ensemble of discrete conformations: phenylalanine i is now represented with 3 conformations, each with a weight P(i,j), such that the sum of the weights is 1. (B,C) The mean field. The chimera molecule considered contains all conformations of all sidechains in the protein. The energy of conformation k for residue i includes the internal energy for conformation k, the energy of interaction of conformation k for i with the backbone, and all interactions with all conformations of the remaining sidechains of the protein, each weighted with their probabilities. (D) Updating the probabilities. The initial probabilities are chosen to be uniform. Using the equations given in (C) we get the energies of all conformations of all residues in the chimera protein. These energies are then used to update the probabilities of these conformations. We have shown that updating the probabilities using a Boltzmann law is equivalent to minimizing the total free energy of the chimera molecule [97]. The new probabilities are then used to compute new energies; this procedure is repeated until we reach convergence (“self-consistency”), i.e., when the probabilities and energies do not change anymore. For each residue, we choose the conformation with the highest converged probability as its predicted conformation. Please visit http://extras.springer.com/ to view a high-resolution full-color version of this illustration. (PDF 2,802 KB)
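The iteration described in the Figure 1.7 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the chapter: the function name `scmf`, the `self_energy` table (internal plus backbone energy of each conformation), the `pair_energy` callback, and the toy numbers are all hypothetical. All residues are updated together from the previous probabilities; strongly coupled systems may need damping, which is left out here.

```python
import math


def scmf(self_energy, pair_energy, kT=1.0, max_iter=500, tol=1e-9):
    """Iterate sidechain conformation probabilities to self-consistency.

    self_energy[i][k]       : internal + backbone energy of conformation k of residue i
    pair_energy(i, k, j, l) : interaction energy between conformation k of residue i
                              and conformation l of residue j (hypothetical callback)
    """
    n = len(self_energy)
    # Panel (D): start from uniform probabilities.
    probs = [[1.0 / len(e)] * len(e) for e in self_energy]
    for _ in range(max_iter):
        new_probs = []
        for i in range(n):
            # Panels (B, C): mean-field energy of each conformation of residue i,
            # with other sidechains weighted by their current probabilities.
            energies = []
            for k in range(len(self_energy[i])):
                e = self_energy[i][k]
                for j in range(n):
                    if j != i:
                        for l, p in enumerate(probs[j]):
                            e += p * pair_energy(i, k, j, l)
                energies.append(e)
            # Boltzmann update; subtracting the minimum avoids overflow.
            e_min = min(energies)
            weights = [math.exp(-(e - e_min) / kT) for e in energies]
            total = sum(weights)
            new_probs.append([w / total for w in weights])
        change = max(abs(a - b)
                     for old, new in zip(probs, new_probs)
                     for a, b in zip(old, new))
        probs = new_probs
        if change < tol:  # probabilities no longer change: self-consistency
            break
    # Predicted conformation: highest converged probability for each residue.
    prediction = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, prediction


# Toy system (made-up energies): two residues, three conformations each,
# with a clash penalty when both residues adopt conformation 0.
toy_self = [[0.0, 1.0, 2.0], [0.5, 0.0, 1.5]]
toy_pair = lambda i, k, j, l: 5.0 if (k == 0 and l == 0) else 0.0
probs, prediction = scmf(toy_self, toy_pair)
print(prediction)  # [0, 1]
```

Each pass costs a number of energy evaluations that grows with the square of the number of residues times the square of the number of conformations per residue, instead of the 3^100 combinations of a full enumeration.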
Figure 1.8.
Lattice model of a protein structure. The figure depicts an example of a compact self-avoiding structure of a protein chain of 27 “residues” on a regular cubic lattice. This structure contains 28 contacts between non-sequential residues (shown as dashed lines). The total energy of this conformation is the sum of the energies over these contacts. Please visit http://extras.springer.com/ to view a high-resolution full-color version of this illustration. (PDF 2,789 KB)
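The contact count in the Figure 1.8 caption can be checked with a short Python sketch. The names `snake_cube`, `count_contacts`, and `total_energy` are hypothetical, and the serpentine path built here is only one compact 3 × 3 × 3 conformation, not necessarily the one drawn in the figure. Any chain that fills the cube gives the same count: the cube has 54 nearest-neighbor pairs, 26 of which are chain bonds, leaving 28 contacts.

```python
def snake_cube(n=3):
    """Build a compact self-avoiding walk that fills an n x n x n cubic lattice."""
    path = []
    x_forward = True
    for z in range(n):
        ys = range(n) if z % 2 == 0 else reversed(range(n))
        for y in ys:
            xs = range(n) if x_forward else reversed(range(n))
            for x in xs:
                path.append((x, y, z))
            x_forward = not x_forward
    return path


def count_contacts(coords):
    """Return pairs (i, j) of non-sequential residues on adjacent lattice sites."""
    index_at = {}
    for i, site in enumerate(coords):
        if site in index_at:
            raise ValueError("chain is not self-avoiding")
        index_at[site] = i
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")
    contacts = []
    for i, (x, y, z) in enumerate(coords):
        # Look only in the +x, +y, +z directions so each pair is seen once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is not None and abs(i - j) > 1:
                contacts.append((min(i, j), max(i, j)))
    return contacts


def total_energy(coords, contact_energy):
    """Sum a user-supplied contact energy over all non-sequential contacts."""
    return sum(contact_energy(i, j) for i, j in count_contacts(coords))


chain = snake_cube(3)
print(len(chain), len(count_contacts(chain)))  # 27 28
```

Passing a sequence-dependent `contact_energy(i, j)` function, for example one that looks up a residue-pair energy table, turns the contact list into the total energy described in the caption.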
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Koehl, P. (2010). Protein Structure Prediction. In: Jue, T. (eds) Biomedical Applications of Biophysics. Handbook of Modern Biophysics, vol 3. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-233-9_1
DOI: https://doi.org/10.1007/978-1-60327-233-9_1
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60327-232-2
Online ISBN: 978-1-60327-233-9
eBook Packages: Biomedical and Life Sciences (R0)