eQuant - A Server for Fast Protein Model Quality Assessment by Integrating High-Dimensional Data and Machine Learning

  • Sebastian Bittrich
  • Florian Heinke
  • Dirk Labudde
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 613)

Abstract

In molecular biology, reliable protein structure models are essential for understanding the functional role of proteins and the diseases related to them. Structures are derived from complex and resource-demanding experiments, and in silico structure modeling and refinement approaches have been established to cope with experimental limitations. Nevertheless, both experimental and computational methods are prone to errors. Consequently, small local regions or even the whole tertiary structure can be unreliable or erroneous, leading researchers to formulate false hypotheses and draw false conclusions.

Here, we present eQuant, a novel and fast model quality assessment program (MQAP) and server. By combining established MQAPs with machine learning techniques in a hybrid approach, eQuant achieves more homogeneous assessments with less uncertainty than other established MQAPs. For normal-sized protein structures, computation takes less than ten seconds, making eQuant one of the fastest MQAPs available. The eQuant server is freely available at https://biosciences.hs-mittweida.de/equant/.

Keywords

  • Protein structure
  • Structure quality
  • Quality assessment
  • Quality scoring
  • eQuant

Notes

Acknowledgments

The authors thank the Free State of Saxony and the Saxon Ministry of Science and the Fine Arts for funding.

Conflict of Interest. The authors declare that there are no conflicts of interest.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Sebastian Bittrich (1)
  • Florian Heinke (1)
  • Dirk Labudde (1)
  1. University of Applied Science Mittweida, Mittweida, Germany
