A Consensus Approach to Predicting Protein Contact Map via Logistic Regression

  • Jian-Yi Yang
  • Xin Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6674)

Abstract

Predicting protein contact maps is of great importance because it can facilitate and improve the prediction of protein 3D structure. However, prediction accuracy is notoriously low. In this paper, a consensus contact map prediction method called LRcon is developed, which combines the prediction results from several complementary predictors by using a logistic regression model. Tests on the targets from the recent CASP9 experiment and on a large dataset, D856, consisting of 856 protein chains show that LRcon outperforms not only its component predictors but also the simple averaging and voting schemes. For example, LRcon achieves 41.5% accuracy on the D856 dataset for the top L/10 long-range contact predictions, which is about 5% higher than its best-performing component predictor. The improvements made by LRcon are mainly attributed to the application of a consensus approach to complementary predictors and to the logistic regression analysis under the machine learning framework.
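The consensus idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three component predictors, the toy scores, the gradient-descent fit, and all function names are hypothetical, and the long-range cutoff of 24 residues is an assumed convention that the abstract does not state. The sketch fits a logistic regression on the confidence scores of several component predictors and then measures accuracy on the top L/10 long-range predictions, where L is the sequence length.

```python
import math

MIN_SEPARATION = 24  # assumed long-range cutoff; not given in the abstract


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def consensus_score(w, scores):
    """Combine component-predictor scores into one contact probability."""
    return sigmoid(w[0] + sum(wj * s for wj, s in zip(w[1:], scores)))


def fit_logistic(X, y, lr=0.5, epochs=2000, ridge=1e-4):
    """Fit [bias, w1, ..., wd] by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = consensus_score(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Averaged gradient step; the ridge penalty skips the bias term.
        w = [wj - lr * (gj / n + (ridge * wj if j > 0 else 0.0))
             for j, (wj, gj) in enumerate(zip(w, grad))]
    return w


def top_long_range_accuracy(scored_pairs, true_contacts, seq_len, frac=10):
    """Fraction of true contacts among the top L/frac long-range predictions."""
    k = max(1, seq_len // frac)
    long_range = [(p, s) for p, s in scored_pairs
                  if abs(p[0] - p[1]) >= MIN_SEPARATION]
    top = sorted(long_range, key=lambda t: t[1], reverse=True)[:k]
    hits = sum(1 for p, _ in top if p in true_contacts)
    return hits / len(top) if top else 0.0


# Toy training data: each row holds the scores of three hypothetical
# component predictors for one residue pair; labels mark true contacts.
X_train = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.4], [0.7, 0.6, 0.9],
           [0.2, 0.3, 0.6], [0.1, 0.4, 0.2], [0.3, 0.1, 0.5]]
y_train = [1, 1, 1, 0, 0, 0]
weights = fit_logistic(X_train, y_train)

# Score unseen residue pairs (i, j) of a hypothetical 30-residue chain.
candidates = [((1, 30), [0.9, 0.7, 0.8]), ((2, 28), [0.3, 0.2, 0.4]),
              ((4, 29), [0.8, 0.8, 0.6]), ((3, 10), [0.9, 0.9, 0.9])]
scored = [(pair, consensus_score(weights, s)) for pair, s in candidates]
print(top_long_range_accuracy(scored, {(1, 30), (4, 29)}, seq_len=30))
```

A simple averaging baseline would replace `consensus_score` with the mean of the component scores; the learned weights let the model trust some predictors more than others, which is the property the abstract credits for the improvement.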

Keywords

Protein contact map; CASP; Logistic regression; Machine learning



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Jian-Yi Yang (1)
  • Xin Chen (1)
  1. Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
