Abstract
Src homology 2 (SH2) domain is an important subclass of modular protein domains that plays an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. For determining the subtle binding specificities of SH2 domains, it is very important to understand the intriguing mechanisms by which these domains recognize their target peptides in a complex cellular environment. There are several attempts have been made to predict SH2–peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal to noise ratio. Furthermore, the prediction methods have several additional shortcomings, such as linearity problem, high computational complexity, etc. Thus, computational identification of SH2–peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of 51 SH2 domain mediated interactions in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2–peptide interactions.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Sadowski I, Stone JC, Pawson T (1986) A noncatalytic domain conserved among cytoplasmic protein-tyrosine kinases modifies the kinase function and transforming activity of Fujinami sarcoma virus P130gag-fps. Mol Cell Biol 6(12):4396–4408
Mayer BJ, Hamaguchi M, Hanafusa H (1988) A novel viral oncogene with structural similarity to phospholipase C. Nature 332(6161):272–275
Anderson D, Koch CA, Grey L, Ellis C, Moran MF, Pawson T (1990) Binding of SH2 domains of phospholipase C gamma 1, GAP, and Src to activated growth factor receptors. Science 250(4983):979–982
Lim WA, Pawson T (2010) Phosphotyrosine signaling: evolving a new cellular communication system. Cell 142(5):661–667
Liu BA, Shah E, Jablonowski K, Stergachis A, Engelmann B, Nash PD (2011) The SH2 domain-containing proteins in 21 species establish the provenance and scope of phosphotyrosine signaling in eukaryotes. Sci Signal 4(202):ra83
Magrane M, UniProt Consortium (2011) UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011:bar009
Waksman G, Kominos D, Robertson SC, Pant N, Baltimore D, Birge RB, Cowburn D, Hanafusa H, Mayer BJ, Overduin M, Resh MD, Rios CB, Silverman L, Kuriyan J (1992) Crystal structure of the phosphotyrosine recognition domain SH2 of v-src complexed with tyrosine-phosphorylated peptides. Nature 358(6388):646–653
Pawson T (2004) Specificity in signal transduction: from phosphotyrosine-SH2 domain interactions to complex cellular systems. Cell 116(2):191–203
Liu BA, Engelmann BW, Nash PD (2012) The language of SH2 domain interactions defines phosphotyrosine-mediated signal transduction. FEBS Lett 586(17):2597–2605
Imhof D, Wavreille A-S, May A, Zacharias M, Tridandapani S, Pei D (2006) Sequence specificity of SHP-1 and SHP-2 Src homology 2 domains. Critical roles of residues beyond the pY+3 position. J Biol Chem 281(29):20271–20282
Sayos J, Wu C, Morra M, Wang N, Zhang X, Allen D, van Schaik S, Notarangelo L, Geha R, Roncarolo MG, Oettgen H, De Vries JE, Aversa G, Terhorst C, (1998) The X-linked lymphoproliferative-disease gene product SAP regulates signals induced through the co-receptor SLAM. Nature 395(6701):462–469
Tzeng SR, Pai MT, Lung FD, Wu CW, Roller PP, Lei B, Wei CJ, Tu SC, Chen SH, Soong WJ, Cheng JW (2000) Stability and peptide binding specificity of Btk SH2 domain: molecular basis for X-linked agammaglobulinemia. Protein Sci 9(12):2377–2385
Tartaglia M, Mehler EL, Goldberg R, Zampino G, Brunner HG, Kremer H, van der Burgt I, Crosby AH, Ion A, Jeffery S, Kalidas K, Patton MA, Kucherlapati RS, Gelb BD (2001) Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan syndrome. Nat Genet 29(4):465–468
Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31(13):3635–3641
Li L, Wu C, Huang H, Zhang K, Gan J, Li SS-C (2008) Prediction of phosphotyrosine signaling networks using a scoring matrix-assisted ligand identification approach. Nucleic Acids Res 36(10):3263–3273
Ng AY, Jordan MI (2001) On discriminative vs. generative classifiers: a comparison of logistic regression and naive bayes. In: NIPS, pp 841–848
Kundu K, Costa F, Backofen R (2013) A graph kernel approach for alignment-free domain-peptide interaction prediction with an application to human SH3 domains. Bioinformatics 29(13):i335–i343
Kundu K, Costa F, Huber M, Reth M, Backofen R (2013) Semi-supervised prediction of SH2-peptide interactions from imbalanced high-throughput data. PLoS One 8(5):e62732
Kundu K, Mann M, Costa F, Backofen R (2014) MoDPepInt: an interactive web server for prediction of modular domain-peptide interactions. Bioinformatics 30(18):2668–2669
Miller ML, Jensen LJ, Diella F, Jorgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, Olhovsky M, Pasculescu A, Alexander J, Knapp S, Blom N, Bork P, Li S, Cesareni G, Pawson T, Turk BE, Yaffe MB, Brunak S, Linding R (2008) Linear motif atlas for phosphorylation-dependent signaling. Sci Signal 1(35):ra2
Jones RB, Gordus A, Krall JA, MacBeath G (2006) A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature 439(7073):168–174
Kaushansky A, Gordus A, Chang B, Rush J, MacBeath G (2008) A quantitative study of the recruitment potential of all intracellular tyrosine residues on EGFR, FGFR1 and IGF1R. Mol Biosyst 4(6):643–653
Diella F, Gould CM, Chica C, Via A, Gibson TJ (2008) Phospho.ELM: a database of phosphorylation sites–update 2008. Nucleic Acids Res 36(Database issue):D240–D244
Liu BA, Jablonowski K, Shah EE, Engelmann BW, Jones RB, Nash PD (2010) SH2 domains recognize contextual peptide sequence information to determine selectivity. Mol Cell Proteomics 9(11):2391–2404
Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advanced in Kernel methods-support vector learning. MIT Press, Cambridge, MA, pp 169–184
Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, Berlin
Wunderlich Z, Mirny LA (2009) Using genome-wide measurements for computational prediction of SH2-peptide interactions. Nucleic Acids Res 37(14):4629–4641
Huang DW, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37(1):1–13
Kundu K, Backofen R (2014) Cluster based prediction of PDZ-peptide interactions. BMC Genomics 15(Suppl 1):S5
Li L, Zhao B, Du J, Zhang K, Ling CX, Li SS-C (2011) DomPep–a general method for predicting modular domain-mediated protein-protein interactions. PLoS One 6(10):e25528
He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21:1263–1284
Ben-Hur A, Noble WS (2006) Choosing negative examples for the prediction of protein-protein interactions. BMC Bioinf 7(Suppl 1):S2
Chawla N, Bowyer K, Hall L, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res 40(Database issue):D261–D270
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29
Acknowledgements
This chapter is based on our previous publication [18]. This work was funded by Bundesministerium für Bildung und Forschung (e-bio; FKZ 0316174A to Rolf Backofen), and the Centre for Biological Signalling Studies (BIOSS), University of Freiburg.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Kundu, K., Backofen, R. (2017). An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions. In: Machida, K., Liu, B. (eds) SH2 Domains. Methods in Molecular Biology, vol 1555. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6762-9_6
Download citation
DOI: https://doi.org/10.1007/978-1-4939-6762-9_6
Published:
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6760-5
Online ISBN: 978-1-4939-6762-9
eBook Packages: Springer Protocols