A New Machine Learning Approach for Protein Phosphorylation Site Prediction in Plants

  • Jianjiong Gao
  • Ganesh Kumar Agrawal
  • Jay J. Thelen
  • Zoran Obradovic
  • A. Keith Dunker
  • Dong Xu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5462)


Protein phosphorylation is a crucial regulatory mechanism in various organisms. With recent improvements in mass spectrometry, phosphorylation site data are rapidly accumulating. Despite this wealth of data, computational prediction of phosphorylation sites remains a challenging task. This is particularly true in plants, for two reasons: information on the substrate specificities of plant protein kinases is limited, and current phosphorylation prediction tools are trained on kinase-specific phosphorylation data from non-plant organisms. In this paper, we propose a new machine learning approach for phosphorylation site prediction. We incorporate protein sequence information and protein disordered regions, and integrate two machine learning techniques, k-nearest neighbor and support vector machine, to predict phosphorylation sites. Tests on the PhosPhAt dataset of phosphoserines in Arabidopsis and the TAIR7 non-redundant protein database show good performance of the proposed method.
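The abstract does not say how the k-nearest neighbor and support vector machine components are combined, so the following is only a minimal sketch of one plausible reading: a KNN score is computed for each candidate serine from the labelled sequence windows most similar to it, and that score could then be passed, together with a disorder score, as a feature to an SVM. Everything here is an assumption made for illustration: the function names, the window half-width, the value of k, and the identity-based similarity, which stands in for whatever similarity measure the paper actually uses. The SVM step itself is not shown.

```python
from typing import List, Tuple


def extract_windows(seq: str, half_width: int = 6,
                    residue: str = "S") -> List[Tuple[int, str]]:
    """Return (position, window) for every `residue` in `seq`.

    Windows are centred on the residue and padded with '-' at the
    sequence ends. The half-width of 6 is an illustrative assumption.
    """
    padded = "-" * half_width + seq + "-" * half_width
    windows = []
    for i, aa in enumerate(seq):
        if aa == residue:
            # Position i in seq sits at i + half_width in padded,
            # so this slice is centred on the residue.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def similarity(a: str, b: str) -> float:
    """Fraction of positions with identical, non-padding residues.

    A simple stand-in (assumption) for a substitution-matrix score.
    """
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def knn_score(query: str, labelled: List[Tuple[str, int]],
              k: int = 3) -> float:
    """Fraction of phosphorylated sites (label 1) among the k labelled
    windows most similar to `query`."""
    ranked = sorted(labelled,
                    key=lambda item: similarity(query, item[0]),
                    reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Toy labelled windows (made up for illustration, not real data).
toy_training = [("RRSPK", 1), ("RKSPE", 1), ("GGSGG", 0), ("LLSLL", 0)]
toy_feature = knn_score("RRSPE", toy_training, k=2)
```

In this reading, `toy_feature` would be one entry of the feature vector given to the SVM for the candidate site; a score near 1 means the site resembles known phosphorylation sites more than known non-sites.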


Protein Phosphorylation; Phosphoproteomics; Arabidopsis; Protein Disorder; KNN; SVM





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jianjiong Gao
    • 1
    • 2
  • Ganesh Kumar Agrawal
    • 2
    • 3
  • Jay J. Thelen
    • 2
    • 3
  • Zoran Obradovic
    • 4
  • A. Keith Dunker
    • 5
  • Dong Xu
    • 1
    • 2
  1. Department of Computer Science, USA
  2. C.S. Bond Life Sciences Center, USA
  3. Department of Biochemistry, University of Missouri, Columbia, USA
  4. Center for Information Science and Technology, Temple University, Philadelphia, USA
  5. Center for Computational Biology and Bioinformatics, Indiana University Schools of Medicine and Informatics, Indianapolis, USA
