Abstract
Homology modeling was long considered the method of choice for tertiary protein structure prediction. However, it yielded models of acceptable quality only when templates sharing appreciable sequence identity with the target could be found. The threshold was long assumed to lie around 20–30%. Below this level, the observed sequence identity comes dangerously close to values that can arise by chance when aligning random, unrelated sequences. In such cases, other approaches, such as ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the community-wide assessments of modeling methods, CASP (Critical Assessment of methods of protein Structure Prediction) and CAMEO (Continuous Automated Model Evaluation), have produced some surprising outcomes, showing that far more information can be inferred from protein sequence analysis than previously thought. In this chapter, we focus on recent advances in difficult protein modeling that push the threshold deep into the “twilight zone”, with particular attention to improvements in the application of machine learning and in model evaluation.
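The 20–30% figure is not a single cut-off: how much identity is needed to exceed chance level depends on how many residue pairs are aligned. The sketch below illustrates this. It computes pairwise identity from an already-gapped alignment and compares it with a length-dependent threshold curve of the form published by Rost (1999) for the twilight zone. The function names are hypothetical, counting identity over gap-free columns is only one of several conventions, and the curve constants are quoted for illustration and should be checked against the original paper before any real use.

```python
import math


def percent_identity(aln_a: str, aln_b: str) -> tuple[float, int]:
    """Percent identity of two equal-length, gapped aligned sequences.

    Assumption: identity is counted over columns where neither sequence
    has a gap ('-'); other conventions divide by the shorter ungapped length.
    Returns (identity in percent, number of aligned residue pairs).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * identical / len(pairs), len(pairs)


def twilight_threshold(length: int, margin: float = 0.0) -> float:
    """Length-dependent identity threshold in percent (hypothetical helper).

    Illustrative curve in the form attributed to Rost (1999);
    'margin' shifts the whole curve up or down.
    """
    if length <= 11:
        # Very short alignments cannot be distinguished from chance.
        return margin + 100.0
    if length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
        return margin + 480.0 * length ** exponent
    # The curve levels off for long alignments.
    return margin + 19.5


def in_twilight_zone(aln_a: str, aln_b: str, margin: float = 0.0) -> bool:
    """True if the observed identity falls below the length-dependent threshold."""
    identity, aligned_pairs = percent_identity(aln_a, aln_b)
    return identity < twilight_threshold(aligned_pairs, margin)


if __name__ == "__main__":
    for n in (50, 100, 250, 450, 1000):
        print(f"{n:5d} aligned pairs -> threshold {twilight_threshold(n):5.1f}%")
```

For roughly 100 aligned residue pairs this curve gives a threshold of about 29%, and beyond 450 pairs it flattens at 19.5%, which is consistent with the 20–30% range quoted above. A short alignment with high identity can therefore still fall below the curve, whereas a long one with about 25% identity lies above it.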
Acknowledgments
This research was partially performed under an OPUS grant from the National Science Center (NCN, Poland), grant number UMO-2017/27/B/NZ7/01767 (to A.A.K.).
Copyright information
© 2023 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Bartuzi, D., Kaczor, A.A., Matosiuk, D. (2023). Illuminating the “Twilight Zone”: Advances in Difficult Protein Modeling. In: Filipek, S. (eds) Homology Modeling. Methods in Molecular Biology, vol 2627. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2974-1_2
DOI: https://doi.org/10.1007/978-1-0716-2974-1_2
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2973-4
Online ISBN: 978-1-0716-2974-1
eBook Packages: Springer Protocols