Abstract
Protein–protein interactions regulate many essential biological processes and play an important role in health and disease. Experimentally characterizing the protein residues that contribute most to protein–protein interaction affinity and specificity is laborious. Models that accurately characterize hotspots at protein–protein interfaces can therefore provide important information about how to inhibit therapeutically relevant protein–protein interactions. During the 2017 ICERM WiSDM workshop, we combined the KFC2a protein–protein interaction hotspot prediction features with Rosetta scoring function terms and interface filter metrics. Two-way and three-way forward selection strategies, as well as a reverse feature elimination strategy, were employed to train support vector machine classifiers. From these results, we identified subsets of combined KFC2a and Rosetta features that show improved performance over KFC2a features alone.
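The selection strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "two-way" and "three-way" forward selection mean adding features in groups of two or three per step, which the abstract does not spell out. The feature names and the `toy_score` function are hypothetical stand-ins; in a setting like the chapter's, the score would more plausibly be a cross-validated performance measure of a support vector machine trained on the candidate subset.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

# A score maps a candidate feature subset to a number (higher is better).
Score = Callable[[FrozenSet[str]], float]


def forward_select(features: Sequence[str], score: Score,
                   group_size: int = 1, max_features: int = 6) -> List[str]:
    """Greedy forward selection adding `group_size` features per step.

    Assumption: group_size=2 or 3 is one reading of "two-way" and
    "three-way" selection; the source does not define these terms.
    """
    selected: List[str] = []
    best = float("-inf")
    while len(selected) + group_size <= max_features:
        remaining = [f for f in features if f not in selected]
        candidates = [
            (score(frozenset(selected) | frozenset(group)), group)
            for group in combinations(remaining, group_size)
        ]
        if not candidates:
            break
        top_score, top_group = max(candidates, key=lambda c: c[0])
        if top_score <= best:  # no group improves the score: stop
            break
        best = top_score
        selected.extend(top_group)
    return selected


def backward_eliminate(features: Sequence[str], score: Score,
                       min_features: int = 1) -> List[str]:
    """Greedy reverse elimination: drop the least useful feature per step."""
    selected = list(features)
    best = score(frozenset(selected))
    while len(selected) > min_features:
        trials = [(score(frozenset(selected) - {f}), f) for f in selected]
        top_score, drop = max(trials, key=lambda t: t[0])
        if top_score < best:  # every removal lowers the score: stop
            break
        best = top_score
        selected.remove(drop)
    return selected


# Hypothetical feature names, used only for illustration.
FEATURES = ["kfc_a", "kfc_d", "ros_b", "ros_c", "ros_e"]
INFORMATIVE = frozenset({"kfc_a", "ros_b", "ros_c"})


def toy_score(subset: FrozenSet[str]) -> float:
    """Toy stand-in for classifier performance: reward informative
    features and charge a small penalty for every feature included."""
    return len(subset & INFORMATIVE) - 0.1 * len(subset)


one_way = forward_select(FEATURES, toy_score, group_size=1)
two_way = forward_select(FEATURES, toy_score, group_size=2)
reverse = backward_eliminate(FEATURES, toy_score)
```

On this toy problem, one-at-a-time forward selection and reverse elimination both recover the three informative features, while the pairwise variant also carries along one uninformative feature because features enter in pairs. This shows how the step size can change which subset a greedy wrapper finds.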
Acknowledgements
The feature table and feature selection code are available by email to the corresponding author. We thank the Association for Women in Mathematics (AWM) and the Brown University Institute for Computational and Experimental Research in Mathematics (ICERM) for hosting the Women in Data Science and Mathematics (WiSDM) workshop. The Brown University Center for Computation and Visualization (CCV) and the Institute for Protein Design at the University of Washington provided computational resources used for this project. Participation by JM was sponsored by the National Science Foundation [NSF DMS 1160360]. The AWM Advance Program supported participation by FS, AL, YC, TW, and HC. Participation by TW was also supported by DIMACS. FS is generously funded by the Washington Research Foundation Institute for Protein Design Postdoctoral Innovation Fellowship.
Copyright information
© 2019 The Author(s) and the Association for Women in Mathematics
About this chapter
Cite this chapter
Seeger, F., Little, A., Chen, Y., Woolf, T., Cheng, H., Mitchell, J.C. (2019). Feature Design for Protein Interface Hotspots Using KFC2 and Rosetta. In: Gasparovic, E., Domeniconi, C. (eds) Research in Data Science. Association for Women in Mathematics Series, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-030-11566-1_8
DOI: https://doi.org/10.1007/978-3-030-11566-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11565-4
Online ISBN: 978-3-030-11566-1
eBook Packages: Mathematics and Statistics (R0)