Abstract
In the absence of experimental structures, comparative modeling remains the method of choice for obtaining structural information on target proteins. However, models lack the accuracy of experimental structures. Alignment error and structural divergence between target and template are the two factors that influence model accuracy the most. Here, we examine the potential additional impact of backbone geometry, as our previous studies suggested that the structural class (all-α, αβ, all-β) of a protein may influence the accuracy of its model. In the twilight zone (sequence identity ≤ 30%) and at a similar level of target-template divergence, the accuracy of protein models does indeed follow the trend all-α > αβ > all-β. This is mainly because alignment accuracy follows the same trend (all-α > αβ > all-β), with backbone geometry playing only a minor role. Differences in the diversity of sequences belonging to different structural classes lead to the observed accuracy differences, which allows the accuracy of alignments and models to be estimated a priori in a class-dependent manner. This study systematically describes and quantifies the structural class-dependent effect in comparative modeling. It also suggests that datasets for large-scale sequence/structure analyses should represent the different structural classes equally to avoid class-dependent bias.
Acknowledgments
We thank Sucheta Godbole for helping us with the profile–profile alignments. We thank Zhanwen Li of the Godzik Laboratory at the Burnham Institute for helping us with the Fold and Function Assignment (FFAS) server when investigating the test cases of profile–profile alignments. SC thanks Prof. Ming-Ming Zhou for encouragement. The study was supported by the National Institute of General Medicine at the National Institutes of Health [grant 1R01GM081713 (RS)], and South Dakota State University’s (SDSU) Agricultural Experiment Station and Center for Biological Control and Analysis by Applied Photonics (BCAAP) [grant 3SG163 (SC)].
Electronic Supplementary Material
ESM 1 (PDF, 2482 kb)
Cite this article
Chakravarty, S., Ghersi, D. & Sanchez, R. Systematic assessment of accuracy of comparative model of proteins belonging to different structural fold classes. J Mol Model 17, 2831–2837 (2011). https://doi.org/10.1007/s00894-011-0976-9
DOI: https://doi.org/10.1007/s00894-011-0976-9