Abstract
The post-genomic era has witnessed an explosion of protein sequences in public databases, but this growth has not been matched by genome-wide structure and function information, owing to the technical difficulty and labor cost of existing experimental techniques. Rapid advances in computer-based protein structure prediction have enabled automated yet reliable generation of three-dimensional (3D) structural models of proteins. Genome-scale structure prediction experiments have been conducted by a number of groups, starting as early as 1997, and some noteworthy efforts have used the MODELLER and ROSETTA methods. Along another line, TOUCHSTONE was used to predict the structures of all 85 small proteins in the Mycoplasma genitalium genome, which established template-refinement-based structure prediction as a practical approach for genome-scale experiments. This was followed by the development of the Threading ASSEmbly Refinement (TASSER) and Iterative Threading ASSEmbly Refinement (I-TASSER) algorithms, which combine threading, fragment assembly, ab initio loop modeling, and structural refinement to predict structures. Structure prediction for all medium-sized open reading frames (ORFs) in the Escherichia coli genome was demonstrated with this approach, achieving high-accuracy models for 920 out of 1,360 proteins. G protein-coupled receptors (GPCRs) are an extremely important class of membrane proteins for which only very few structures are available in the Protein Data Bank (PDB). TASSER was used to predict the structures of all 907 putative GPCRs in the human genome, and the high accuracy confirmed by newly solved GPCR structures and recent blind tests has demonstrated the usefulness and robustness of the TASSER/I-TASSER models for the functional annotation of GPCRs.
Recently, the I-TASSER protein structure prediction method has been used as a basis for functional annotation of protein sequences. The increasing popularity and need for such automated structure and function prediction algorithms can be judged by the fact that the I-TASSER server has generated structure predictions for 35,000 proteins submitted by more than 8,000 users from 86 countries in the last 24 months. The success of these modeling experiments demonstrates significant new progress in high-throughput and genome-wide protein structure prediction.
Acknowledgments
The project is supported in part by the Alfred P. Sloan Foundation, NSF Career Award (DBI 0746198), and the National Institute of General Medical Sciences (R01GM083107, R01GM084222).
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Mukherjee, S., Szilagyi, A., Roy, A., Zhang, Y. (2011). Genome-Wide Protein Structure Prediction. In: Kolinski, A. (eds) Multiscale Approaches to Protein Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6889-0_11
DOI: https://doi.org/10.1007/978-1-4419-6889-0_11
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6888-3
Online ISBN: 978-1-4419-6889-0
eBook Packages: Biomedical and Life Sciences (R0)