Learning Sequence Determinants of Protein:Protein Interaction Specificity with Sparse Graphical Models

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Abstract

In studying the strength and specificity of interaction between members of two protein families, key questions center on which pairs of possible partners actually interact, how well they interact, and why they interact while others do not. The advent of large-scale experimental studies of interactions between members of a target family and a diverse set of possible interaction partners offers the opportunity to address these questions. We develop here a method, DgSpi (Data-driven Graphical models of Specificity in Protein:protein Interactions), for learning and using graphical models that explicitly represent the amino acid basis for interaction specificity (why) and extend earlier classification-oriented approaches (which) to predict the \(\varDelta{G}\) of binding (how well). We demonstrate the effectiveness of our approach in analyzing and predicting interactions between a set of 82 PDZ recognition modules and a panel of 217 possible peptide partners, based on data from MacBeath and colleagues. Our predicted \(\varDelta{G}\) values agree well with the experimentally measured ones, reaching correlation coefficients of 0.69 in 10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation. Furthermore, the model serves as a compact representation of amino acid constraints underlying the interactions, enabling protein-level \(\varDelta{G}\) predictions to be naturally understood in terms of residue-level constraints. Finally, the DgSpi model readily enables the design of new interacting partners, and we demonstrate that designed ligands are novel and diverse.
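
The leave-one-PDZ-out evaluation mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data values are invented, the names `pearson`, `fit_peptide_means`, and `leave_one_domain_out` are hypothetical, and the per-peptide mean predictor is a stand-in baseline, not the DgSpi graphical model. The sketch only shows the protocol: hold out all measurements for one domain, fit on the rest, pool the held-out predictions, and report their Pearson correlation with the measured values.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_peptide_means(train):
    """Placeholder model (NOT DgSpi): mean measured dG per peptide."""
    totals = defaultdict(lambda: [0.0, 0])
    for _domain, peptide, dg in train:
        totals[peptide][0] += dg
        totals[peptide][1] += 1
    overall = sum(dg for _, _, dg in train) / len(train)
    means = {p: s / c for p, (s, c) in totals.items()}
    # Fall back to the overall mean for peptides unseen in training.
    return lambda peptide: means.get(peptide, overall)


def leave_one_domain_out(data, fit):
    """Hold out every measurement of one domain at a time; pool predictions."""
    predicted, measured = [], []
    for held_out in sorted({d for d, _, _ in data}):
        train = [r for r in data if r[0] != held_out]
        test = [r for r in data if r[0] == held_out]
        model = fit(train)
        for _domain, peptide, dg in test:
            predicted.append(model(peptide))
            measured.append(dg)
    return pearson(predicted, measured)


# Toy, made-up dG values (kcal/mol) for three domains and three peptides.
data = [
    ("pdzA", "pep1", -8.0), ("pdzA", "pep2", -6.0), ("pdzA", "pep3", -5.0),
    ("pdzB", "pep1", -7.5), ("pdzB", "pep2", -6.5), ("pdzB", "pep3", -5.5),
    ("pdzC", "pep1", -8.5), ("pdzC", "pep2", -6.0), ("pdzC", "pep3", -4.5),
]
r = leave_one_domain_out(data, fit_peptide_means)
print(f"leave-one-domain-out Pearson r = {r:.2f}")
```

Grouping the split by domain, rather than splitting individual measurements at random as in 10-fold cross-validation, means the model is never trained on the held-out domain, which is presumably why the reported leave-one-PDZ-out correlation (0.63) is somewhat lower than the 10-fold one (0.69).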

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kamisetty, H., Ghosh, B., Langmead, C.J., Bailey-Kellogg, C. (2014). Learning Sequence Determinants of Protein:Protein Interaction Specificity with Sparse Graphical Models. In: Sharan, R. (ed.) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_10

  • DOI: https://doi.org/10.1007/978-3-319-05269-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05268-7

  • Online ISBN: 978-3-319-05269-4

  • eBook Packages: Computer Science, Computer Science (R0)
