SH2 Domains, pp 83–97

An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions

  • Kousik Kundu
  • Rolf Backofen
Part of the Methods in Molecular Biology book series (MIMB, volume 1555)


The Src homology 2 (SH2) domain is an important subclass of modular protein domains that plays an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. To determine the subtle binding specificities of SH2 domains, it is important to understand the mechanisms by which these domains recognize their target peptides in a complex cellular environment. Several attempts have been made to predict SH2–peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal-to-noise ratio. Furthermore, the existing prediction methods have several additional shortcomings, such as the linearity problem and high computational complexity. Thus, computational identification of SH2–peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of interactions mediated by 51 SH2 domains in the human proteome. In our study, we employ several strategies to tackle the major problems in computational identification of SH2–peptide interactions.

Key words

Src homology 2 domain, Signal transduction, Protein–protein interaction, Phosphotyrosine peptides, Support vector machine, Semi-supervised learning



This chapter is based on our previous publication [18]. This work was funded by the Bundesministerium für Bildung und Forschung (e-bio; FKZ 0316174A to Rolf Backofen) and the Centre for Biological Signalling Studies (BIOSS), University of Freiburg.


  1. Sadowski I, Stone JC, Pawson T (1986) A noncatalytic domain conserved among cytoplasmic protein-tyrosine kinases modifies the kinase function and transforming activity of Fujinami sarcoma virus P130gag-fps. Mol Cell Biol 6(12):4396–4408
  2. Mayer BJ, Hamaguchi M, Hanafusa H (1988) A novel viral oncogene with structural similarity to phospholipase C. Nature 332(6161):272–275
  3. Anderson D, Koch CA, Grey L, Ellis C, Moran MF, Pawson T (1990) Binding of SH2 domains of phospholipase C gamma 1, GAP, and Src to activated growth factor receptors. Science 250(4983):979–982
  4. Lim WA, Pawson T (2010) Phosphotyrosine signaling: evolving a new cellular communication system. Cell 142(5):661–667
  5. Liu BA, Shah E, Jablonowski K, Stergachis A, Engelmann B, Nash PD (2011) The SH2 domain-containing proteins in 21 species establish the provenance and scope of phosphotyrosine signaling in eukaryotes. Sci Signal 4(202):ra83
  6. Magrane M, UniProt Consortium (2011) UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011:bar009
  7. Waksman G, Kominos D, Robertson SC, Pant N, Baltimore D, Birge RB, Cowburn D, Hanafusa H, Mayer BJ, Overduin M, Resh MD, Rios CB, Silverman L, Kuriyan J (1992) Crystal structure of the phosphotyrosine recognition domain SH2 of v-src complexed with tyrosine-phosphorylated peptides. Nature 358(6388):646–653
  8. Pawson T (2004) Specificity in signal transduction: from phosphotyrosine-SH2 domain interactions to complex cellular systems. Cell 116(2):191–203
  9. Liu BA, Engelmann BW, Nash PD (2012) The language of SH2 domain interactions defines phosphotyrosine-mediated signal transduction. FEBS Lett 586(17):2597–2605
  10. Imhof D, Wavreille A-S, May A, Zacharias M, Tridandapani S, Pei D (2006) Sequence specificity of SHP-1 and SHP-2 Src homology 2 domains. Critical roles of residues beyond the pY+3 position. J Biol Chem 281(29):20271–20282
  11. Sayos J, Wu C, Morra M, Wang N, Zhang X, Allen D, van Schaik S, Notarangelo L, Geha R, Roncarolo MG, Oettgen H, De Vries JE, Aversa G, Terhorst C (1998) The X-linked lymphoproliferative-disease gene product SAP regulates signals induced through the co-receptor SLAM. Nature 395(6701):462–469
  12. Tzeng SR, Pai MT, Lung FD, Wu CW, Roller PP, Lei B, Wei CJ, Tu SC, Chen SH, Soong WJ, Cheng JW (2000) Stability and peptide binding specificity of Btk SH2 domain: molecular basis for X-linked agammaglobulinemia. Protein Sci 9(12):2377–2385
  13. Tartaglia M, Mehler EL, Goldberg R, Zampino G, Brunner HG, Kremer H, van der Burgt I, Crosby AH, Ion A, Jeffery S, Kalidas K, Patton MA, Kucherlapati RS, Gelb BD (2001) Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan syndrome. Nat Genet 29(4):465–468
  14. Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31(13):3635–3641
  15. Li L, Wu C, Huang H, Zhang K, Gan J, Li SS-C (2008) Prediction of phosphotyrosine signaling networks using a scoring matrix-assisted ligand identification approach. Nucleic Acids Res 36(10):3263–3273
  16. Ng AY, Jordan MI (2001) On discriminative vs. generative classifiers: a comparison of logistic regression and naive bayes. In: NIPS, pp 841–848
  17. Kundu K, Costa F, Backofen R (2013) A graph kernel approach for alignment-free domain-peptide interaction prediction with an application to human SH3 domains. Bioinformatics 29(13):i335–i343
  18. Kundu K, Costa F, Huber M, Reth M, Backofen R (2013) Semi-supervised prediction of SH2-peptide interactions from imbalanced high-throughput data. PLoS One 8(5):e62732
  19. Kundu K, Mann M, Costa F, Backofen R (2014) MoDPepInt: an interactive web server for prediction of modular domain-peptide interactions. Bioinformatics 30(18):2668–2669
  20. Miller ML, Jensen LJ, Diella F, Jorgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, Olhovsky M, Pasculescu A, Alexander J, Knapp S, Blom N, Bork P, Li S, Cesareni G, Pawson T, Turk BE, Yaffe MB, Brunak S, Linding R (2008) Linear motif atlas for phosphorylation-dependent signaling. Sci Signal 1(35):ra2
  21. Jones RB, Gordus A, Krall JA, MacBeath G (2006) A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature 439(7073):168–174
  22. Kaushansky A, Gordus A, Chang B, Rush J, MacBeath G (2008) A quantitative study of the recruitment potential of all intracellular tyrosine residues on EGFR, FGFR1 and IGF1R. Mol Biosyst 4(6):643–653
  23. Diella F, Gould CM, Chica C, Via A, Gibson TJ (2008) Phospho.ELM: a database of phosphorylation sites–update 2008. Nucleic Acids Res 36(Database issue):D240–D244
  24. Liu BA, Jablonowski K, Shah EE, Engelmann BW, Jones RB, Nash PD (2010) SH2 domains recognize contextual peptide sequence information to determine selectivity. Mol Cell Proteomics 9(11):2391–2404
  25. Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, MA, pp 169–184
  26. Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, Berlin
  27. Wunderlich Z, Mirny LA (2009) Using genome-wide measurements for computational prediction of SH2-peptide interactions. Nucleic Acids Res 37(14):4629–4641
  28. Huang DW, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37(1):1–13
  29. Kundu K, Backofen R (2014) Cluster based prediction of PDZ-peptide interactions. BMC Genomics 15(Suppl 1):S5
  30. Li L, Zhao B, Du J, Zhang K, Ling CX, Li SS-C (2011) DomPep–a general method for predicting modular domain-mediated protein-protein interactions. PLoS One 6(10):e25528
  31. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21:1263–1284
  32. Ben-Hur A, Noble WS (2006) Choosing negative examples for the prediction of protein-protein interactions. BMC Bioinf 7(Suppl 1):S2
  33. Chawla N, Bowyer K, Hall L, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
  34. Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res 40(Database issue):D261–D270
  35. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  2. Department of Human Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
  3. Department of Haematology, University of Cambridge, Cambridge, UK
  4. Centre for Biological Signalling Studies (BIOSS), University of Freiburg, Freiburg, Germany
