Abstract
Predicting protein contact maps is of great importance because it can facilitate and improve the prediction of protein 3D structure. However, prediction accuracy is known to be rather low. In this paper, we develop a consensus contact map prediction method called LRcon, which combines the prediction results from several complementary predictors by using a logistic regression model. Tests on targets from the recent CASP9 experiment and on a large dataset, D856, consisting of 856 protein chains show that LRcon outperforms not only its component predictors but also the simple averaging and voting schemes. For example, LRcon achieves 41.5% accuracy on the D856 dataset for the top L/10 long-range contact predictions (where L is the sequence length), which is about 5% higher than its best-performing component predictor. The improvements made by LRcon are mainly attributed to applying a consensus approach to complementary predictors and to the logistic regression analysis under the machine learning framework.
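The consensus idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each residue pair is described by one confidence score per component predictor, fits a small ridge-penalised logistic regression by plain gradient descent, and evaluates the top L/10 predictions among long-range pairs. The function names, the learning-rate and penalty values, and the sequence-separation cutoff of 24 used to define "long-range" are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Fit weights (bias first) by batch gradient descent.

    X: one row per residue pair, one column per component predictor score.
    y: 1 if the pair is a true contact, else 0.
    l2: ridge penalty on the non-bias weights (illustrative value).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [w[0] - lr * grad[0] / n] + [
            wj - lr * (g / n + l2 * wj) for wj, g in zip(w[1:], grad[1:])
        ]
    return w


def consensus_score(w, scores):
    """Combine the component predictors' scores into one contact probability."""
    return sigmoid(w[0] + sum(wj * s for wj, s in zip(w[1:], scores)))


def top_accuracy(pairs, probs, labels, seq_len, frac=10, min_sep=24):
    """Fraction of true contacts among the top L/frac long-range predictions.

    min_sep=24 is an assumed definition of "long-range" (sequence separation).
    """
    ranked = sorted(
        (k for k, (i, j) in enumerate(pairs) if abs(i - j) >= min_sep),
        key=lambda k: probs[k],
        reverse=True,
    )
    top = ranked[: max(1, seq_len // frac)]
    return sum(labels[k] for k in top) / len(top)


if __name__ == "__main__":
    # Toy training data: two hypothetical predictors, four residue pairs.
    X = [[0.9, 0.8], [0.8, 0.7], [0.2, 0.1], [0.1, 0.3]]
    y = [1, 1, 0, 0]
    w = fit_logistic(X, y)
    print("weights:", w)
    print("consensus for scores (0.7, 0.6):", consensus_score(w, [0.7, 0.6]))
```

Unlike simple averaging or voting, the fitted weights let a more reliable component predictor contribute more to the consensus score, which is the kind of advantage the abstract attributes to the regression step.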
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, JY., Chen, X. (2011). A Consensus Approach to Predicting Protein Contact Map via Logistic Regression. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_16
DOI: https://doi.org/10.1007/978-3-642-21260-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21259-8
Online ISBN: 978-3-642-21260-4
eBook Packages: Computer Science (R0)