PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine

Dou, Yongchao; Yao, Bo; Zhang, Chi

doi:10.1007/s00726-014-1711-5

Original Article
Published: 13 March 2014

Volume 46, pages 1459–1469, (2014)

Amino Acids

Yongchao Dou¹,
Bo Yao¹ &
Chi Zhang¹

Abstract

Phosphorylation is one of the most essential post-translational modifications in eukaryotes. Studies on kinases and their substrates are important for understanding cellular signaling networks. Because large-scale wet-bench experiments are costly in time and labor, computational prediction of phosphorylation sites has become important, and many computational tools have been developed in recent decades. These prediction tools fall into two categories: kinase-specific and non-kinase-specific. As new sequencing technologies uncover more kinases, accurate non-kinase-specific prediction tools are highly desirable for whole-genome annotation in a wider variety of species. In this manuscript, a support vector machine is used to combine eight different sequence-level scoring functions to predict phosphorylation sites. The attributes used in this work (Shannon entropy, relative entropy, predicted protein secondary structure, predicted protein disorder, solvent accessible area, overlapping properties, averaged cumulative hydrophobicity, and k-nearest neighbor) yielded better results than the attributes previously used by similar methods. The method achieved AUC values of 0.8405, 0.8183, and 0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively, in animals under tenfold cross-validation. The model trained on the animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test. The AUC values for the independent test dataset were 0.7761, 0.6652, and 0.5958 for S, T, and Y phosphorylation sites, respectively, which compared favorably with those of several existing methods. A web server based on our method was constructed for public use. The server, trained model, and all datasets used in the current study are available at http://sysbio.unl.edu/PhosphoSVM.
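
To make the first two attributes named in the abstract concrete, the following is a minimal Python sketch of Shannon entropy and relative entropy for a single alignment column. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the scores are computed, so the pseudocount, the uniform background distribution, and the helper names (`column_frequencies`, `shannon_entropy`, `relative_entropy`) are all hypothetical choices.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps and other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=1.0):
    """Estimate amino-acid frequencies for one alignment column.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency strictly positive so the logarithms below are defined.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def shannon_entropy(freqs):
    """Shannon entropy in bits: low for conserved columns, high for variable ones."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def relative_entropy(freqs, background):
    """Kullback-Leibler divergence (bits) of the column from a background distribution."""
    return sum(p * math.log2(p / background[a]) for a, p in freqs.items() if p > 0)


# Hypothetical uniform background; a real predictor would more likely use
# database-wide amino-acid frequencies.
uniform = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}

conserved = column_frequencies("SSSSSSSSSS")  # ten aligned serines
variable = column_frequencies("ACDEFGHIKL")   # ten different residues

print("entropy  conserved:", shannon_entropy(conserved))
print("entropy  variable: ", shannon_entropy(variable))
print("rel. ent conserved:", relative_entropy(conserved, uniform))
print("rel. ent variable: ", relative_entropy(variable, uniform))
```

In a predictor of this kind, such per-position scores would typically be computed for each residue in a window around a candidate S, T, or Y site and concatenated with the other attributes into the feature vector passed to the support vector machine.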

Acknowledgments

This project was supported by CZ’s startup funds from the University of Nebraska, Lincoln, NE. The manuscript was written through contributions of all authors. YD designed the study and implemented the algorithm. BY and CZ built the web servers. CZ supervised the whole project. All authors read and approved the final manuscript.

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Authors and Affiliations

Center for Plant Science and Innovation, School of Biological Sciences, University of Nebraska, Lincoln, NE, 68588, USA
Yongchao Dou, Bo Yao & Chi Zhang

Corresponding author

Correspondence to Chi Zhang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 87 kb)

About this article

Cite this article

Dou, Y., Yao, B. & Zhang, C. PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine. Amino Acids 46, 1459–1469 (2014). https://doi.org/10.1007/s00726-014-1711-5

Received: 23 July 2013
Accepted: 21 February 2014
Published: 13 March 2014
Issue Date: June 2014
DOI: https://doi.org/10.1007/s00726-014-1711-5
