An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1555)

Abstract

The Src homology 2 (SH2) domain is an important subclass of modular protein domains that plays an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. To determine the subtle binding specificities of SH2 domains, it is important to understand the mechanisms by which these domains recognize their target peptides in a complex cellular environment. Several attempts have been made to predict SH2–peptide interactions using high-throughput data. However, these data are often affected by a low signal-to-noise ratio. Furthermore, existing prediction methods have additional shortcomings, such as the linearity problem and high computational complexity. Thus, computational identification of SH2–peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of interactions mediated by 51 SH2 domains in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2–peptide interactions.

References

  1. Sadowski I, Stone JC, Pawson T (1986) A noncatalytic domain conserved among cytoplasmic protein-tyrosine kinases modifies the kinase function and transforming activity of Fujinami sarcoma virus P130gag-fps. Mol Cell Biol 6(12):4396–4408

  2. Mayer BJ, Hamaguchi M, Hanafusa H (1988) A novel viral oncogene with structural similarity to phospholipase C. Nature 332(6161):272–275

  3. Anderson D, Koch CA, Grey L, Ellis C, Moran MF, Pawson T (1990) Binding of SH2 domains of phospholipase C gamma 1, GAP, and Src to activated growth factor receptors. Science 250(4983):979–982

  4. Lim WA, Pawson T (2010) Phosphotyrosine signaling: evolving a new cellular communication system. Cell 142(5):661–667

  5. Liu BA, Shah E, Jablonowski K, Stergachis A, Engelmann B, Nash PD (2011) The SH2 domain-containing proteins in 21 species establish the provenance and scope of phosphotyrosine signaling in eukaryotes. Sci Signal 4(202):ra83

  6. Magrane M, UniProt Consortium (2011) UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011:bar009

  7. Waksman G, Kominos D, Robertson SC, Pant N, Baltimore D, Birge RB, Cowburn D, Hanafusa H, Mayer BJ, Overduin M, Resh MD, Rios CB, Silverman L, Kuriyan J (1992) Crystal structure of the phosphotyrosine recognition domain SH2 of v-src complexed with tyrosine-phosphorylated peptides. Nature 358(6388):646–653

  8. Pawson T (2004) Specificity in signal transduction: from phosphotyrosine-SH2 domain interactions to complex cellular systems. Cell 116(2):191–203

  9. Liu BA, Engelmann BW, Nash PD (2012) The language of SH2 domain interactions defines phosphotyrosine-mediated signal transduction. FEBS Lett 586(17):2597–2605

  10. Imhof D, Wavreille A-S, May A, Zacharias M, Tridandapani S, Pei D (2006) Sequence specificity of SHP-1 and SHP-2 Src homology 2 domains. Critical roles of residues beyond the pY+3 position. J Biol Chem 281(29):20271–20282

  11. Sayos J, Wu C, Morra M, Wang N, Zhang X, Allen D, van Schaik S, Notarangelo L, Geha R, Roncarolo MG, Oettgen H, De Vries JE, Aversa G, Terhorst C (1998) The X-linked lymphoproliferative-disease gene product SAP regulates signals induced through the co-receptor SLAM. Nature 395(6701):462–469

  12. Tzeng SR, Pai MT, Lung FD, Wu CW, Roller PP, Lei B, Wei CJ, Tu SC, Chen SH, Soong WJ, Cheng JW (2000) Stability and peptide binding specificity of Btk SH2 domain: molecular basis for X-linked agammaglobulinemia. Protein Sci 9(12):2377–2385

  13. Tartaglia M, Mehler EL, Goldberg R, Zampino G, Brunner HG, Kremer H, van der Burgt I, Crosby AH, Ion A, Jeffery S, Kalidas K, Patton MA, Kucherlapati RS, Gelb BD (2001) Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan syndrome. Nat Genet 29(4):465–468

  14. Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31(13):3635–3641

  15. Li L, Wu C, Huang H, Zhang K, Gan J, Li SS-C (2008) Prediction of phosphotyrosine signaling networks using a scoring matrix-assisted ligand identification approach. Nucleic Acids Res 36(10):3263–3273

  16. Ng AY, Jordan MI (2001) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: NIPS, pp 841–848

  17. Kundu K, Costa F, Backofen R (2013) A graph kernel approach for alignment-free domain-peptide interaction prediction with an application to human SH3 domains. Bioinformatics 29(13):i335–i343

  18. Kundu K, Costa F, Huber M, Reth M, Backofen R (2013) Semi-supervised prediction of SH2-peptide interactions from imbalanced high-throughput data. PLoS One 8(5):e62732

  19. Kundu K, Mann M, Costa F, Backofen R (2014) MoDPepInt: an interactive web server for prediction of modular domain-peptide interactions. Bioinformatics 30(18):2668–2669

  20. Miller ML, Jensen LJ, Diella F, Jorgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, Olhovsky M, Pasculescu A, Alexander J, Knapp S, Blom N, Bork P, Li S, Cesareni G, Pawson T, Turk BE, Yaffe MB, Brunak S, Linding R (2008) Linear motif atlas for phosphorylation-dependent signaling. Sci Signal 1(35):ra2

  21. Jones RB, Gordus A, Krall JA, MacBeath G (2006) A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature 439(7073):168–174

  22. Kaushansky A, Gordus A, Chang B, Rush J, MacBeath G (2008) A quantitative study of the recruitment potential of all intracellular tyrosine residues on EGFR, FGFR1 and IGF1R. Mol Biosyst 4(6):643–653

  23. Diella F, Gould CM, Chica C, Via A, Gibson TJ (2008) Phospho.ELM: a database of phosphorylation sites–update 2008. Nucleic Acids Res 36(Database issue):D240–D244

  24. Liu BA, Jablonowski K, Shah EE, Engelmann BW, Jones RB, Nash PD (2010) SH2 domains recognize contextual peptide sequence information to determine selectivity. Mol Cell Proteomics 9(11):2391–2404

  25. Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, MA, pp 169–184

  26. Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, Berlin

  27. Wunderlich Z, Mirny LA (2009) Using genome-wide measurements for computational prediction of SH2-peptide interactions. Nucleic Acids Res 37(14):4629–4641

  28. Huang DW, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37(1):1–13

  29. Kundu K, Backofen R (2014) Cluster based prediction of PDZ-peptide interactions. BMC Genomics 15(Suppl 1):S5

  30. Li L, Zhao B, Du J, Zhang K, Ling CX, Li SS-C (2011) DomPep–a general method for predicting modular domain-mediated protein-protein interactions. PLoS One 6(10):e25528

  31. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21:1263–1284

  32. Ben-Hur A, Noble WS (2006) Choosing negative examples for the prediction of protein-protein interactions. BMC Bioinf 7(Suppl 1):S2

  33. Chawla N, Bowyer K, Hall L, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  34. Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res 40(Database issue):D261–D270

  35. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29

Acknowledgements

This chapter is based on our previous publication [18]. This work was funded by the Bundesministerium für Bildung und Forschung (e-bio; FKZ 0316174A to Rolf Backofen) and the Centre for Biological Signalling Studies (BIOSS), University of Freiburg.

Author information

Corresponding author

Correspondence to Kousik Kundu.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Kundu, K., Backofen, R. (2017). An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions. In: Machida, K., Liu, B. (eds) SH2 Domains. Methods in Molecular Biology, vol 1555. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6762-9_6

  • DOI: https://doi.org/10.1007/978-1-4939-6762-9_6

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6760-5

  • Online ISBN: 978-1-4939-6762-9

  • eBook Packages: Springer Protocols
