A Logistic Regression Approach for Identifying Hot Spots in Protein Interfaces

Li, Peipei; Ryu, Keun Ho

doi:10.1007/978-3-319-22741-2_4


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9267)

Included in the following conference series:

International Conference on Information Technology in Bio- and Medical Informatics


Abstract

Protein–protein interactions occur when two or more proteins bind together, often to carry out their biological function. A small fraction of the interface residues on a protein surface contribute most of the binding free energy; these residues are referred to as hot spots. Identifying hot spots is important for examining the actions and properties occurring around the binding sites. However, experimental studies require significant effort, and computational methods still have limitations in prediction performance and feature interpretation.

In this paper we describe a hot spot residue prediction method that provides a significant improvement over existing methods. Logistic regression is used to derive a prediction model that combines 8 features derived from accessibility, sequence conservation, inter-residue potentials, computational alanine scanning, small-world structure characteristics, phi-psi interaction, and contact number. To demonstrate its effectiveness, the proposed method is applied to ASEdb. Our prediction model achieves an accuracy of 0.819 and an F1 score of 0.743. Experimental results show that the additional features can improve the prediction performance; the phi-psi feature in particular is found to make an important contribution. We then perform an exhaustive comparison of our method with various machine learning based methods and with previously published prediction models in the literature. Empirical studies show that our method can yield significantly better prediction performance.
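As a rough illustration of the kind of model the abstract describes, the following minimal sketch fits a logistic regression on 8 numeric features per residue and reports accuracy and F1 score. It is not the authors' implementation: the data are synthetic (not ASEdb), and the helper names (`make_toy_data`, `fit_logistic`, `predict`), the learning rate, and the 0.5 decision threshold are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_data(n, n_features=8, seed=0):
    # Synthetic stand-in for residue features (hypothetical, NOT ASEdb):
    # the first feature carries the class signal, the rest are noise.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = hot spot, 0 = non-hot spot
        signal = rng.uniform(0.5, 1.5) * (1 if label else -1)
        noise = [rng.uniform(-1.0, 1.0) for _ in range(n_features - 1)]
        X.append([signal] + noise)
        y.append(label)
    return X, y


def fit_logistic(X, y, lr=0.1, epochs=300):
    # Batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    # Label a residue as a hot spot when its predicted probability
    # reaches the (assumed) threshold.
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


X, y = make_toy_data(40)
w, b = fit_logistic(X, y)
y_pred = predict(X, w, b)
print(f"accuracy={accuracy(y, y_pred):.3f}  F1={f1_score(y, y_pred):.3f}")
```

The scores printed here are computed on the training data of a toy problem and say nothing about the reported 0.819 accuracy or 0.743 F1 score. A realistic evaluation would use held-out or cross-validated residues.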



Acknowledgments

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (No. 2013R1A2A2A01068923) and by the ITRC (Information Technology Research Center) support program (NIPA-2014-H0301-14-1002).

Author information

Authors and Affiliations

Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, South Korea
Peipei Li & Keun Ho Ryu


Corresponding author

Correspondence to Keun Ho Ryu.

Editor information

Editors and Affiliations

Institute of Informatics and Telematics, Pisa, Italy
M. Elena Renda
Czech Technical University in Prague, Prague, Czech Republic
Miroslav Bursa
Medical University Graz, Graz, Austria
Andreas Holzinger
San Jose State University, San Jose, California, USA
Sami Khuri


About this paper

Cite this paper

Li, P., Ryu, K.H. (2015). A Logistic Regression Approach for Identifying Hot Spots in Protein Interfaces. In: Renda, M., Bursa, M., Holzinger, A., Khuri, S. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2015. Lecture Notes in Computer Science, vol 9267. Springer, Cham. https://doi.org/10.1007/978-3-319-22741-2_4


DOI: https://doi.org/10.1007/978-3-319-22741-2_4
Published: 11 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22740-5
Online ISBN: 978-3-319-22741-2
eBook Packages: Computer Science, Computer Science (R0)
