Prediction of phosphorylation sites based on granular support vector machine

Abstract

Protein phosphorylation is the most extensive and important post-translational modification in eukaryotes, regulating the activity of almost all cells. Experimental methods for identifying phosphorylation sites, such as mass spectrometry, are costly and time-consuming. A number of algorithms have been developed to predict phosphorylation sites, but they often train on a small subset of the data selected by random sampling, so the prediction model cannot make full use of the characteristics of the entire data set. Combining granular computing with kernel fuzzy C-means clustering, this paper maps the massive raw data to a high-dimensional kernel space and then divides the data into grains by clustering to obtain high-dimensional equilibrium grains. A granular support vector machine (KFCC–GSVM) prediction model is then built on the equilibrium grain data. This model improves the rationality and reliability of phosphorylation site data compression, so that, when the traditional SVM classification algorithm is applied, the compressed data have the same distribution in the kernel space as the data before compression. Experimental results demonstrate that our method outperforms both Musite, an SVM-based non-kinase-specific phosphorylation site prediction method, and the traditional GSVM method.
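The granulation step described in the abstract can be illustrated with a minimal sketch of kernel fuzzy C-means clustering. This is not the authors' implementation: the abstract does not state the kernel, the fuzzifier, or how grains are balanced, so the Gaussian (RBF) kernel, the input-space prototype update (one common kernel fuzzy C-means variant), and the names `rbf`, `kernel_fcm`, `assign_grains`, `gamma`, `m`, and `init` are all assumptions made for illustration.

```python
import math
import random


def rbf(x, v, gamma):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, v)))


def kernel_fcm(points, n_grains, m=2.0, gamma=0.5, n_iter=50, init=None, seed=0):
    """Kernel fuzzy C-means with prototypes kept in the input space.

    Returns (prototypes, memberships). memberships[i][j] is the degree to
    which points[i] belongs to grain j; every row sums to 1.
    """
    rng = random.Random(seed)
    # Hypothetical convenience: start from given prototypes, else sample them.
    start = init if init is not None else rng.sample(points, n_grains)
    prototypes = [list(p) for p in start]
    memberships = []
    for _ in range(n_iter):
        # Membership update from the kernel-induced squared distance
        # ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v)) for an RBF kernel.
        memberships = []
        for x in points:
            dist = [max(2.0 * (1.0 - rbf(x, v, gamma)), 1e-12) for v in prototypes]
            inv = [d ** (-1.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Prototype update: mean of the points weighted by u^m * K(x, v).
        updated = []
        for j, v in enumerate(prototypes):
            weights = [(memberships[i][j] ** m) * rbf(x, v, gamma)
                       for i, x in enumerate(points)]
            total = sum(weights)
            if total == 0.0:
                updated.append(v)  # keep the old prototype if all weights underflow
                continue
            updated.append([sum(w * x[k] for w, x in zip(weights, points)) / total
                            for k in range(len(v))])
        prototypes = updated
    return prototypes, memberships


def assign_grains(memberships):
    """Hard-assign each point to the grain with the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Toy data: two well-separated groups standing in for encoded sequence windows.
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
_, toy_memberships = kernel_fcm(toy, n_grains=2, init=[toy[0], toy[-1]])
print(assign_grains(toy_memberships))
```

In a granular SVM pipeline, the grains produced this way would then be summarized or subsampled and passed to an SVM trainer; that stage is omitted here because the abstract gives no detail on it.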

Acknowledgements

The work reported in this paper was partially supported by National Natural Science Foundation of China projects 61751314 and 61963004, a key project of the Natural Science Foundation of Guangxi (2017GXNSFDA198033), and a key research and development plan of Guangxi (AB17195055).

Author information

Corresponding author

Correspondence to Qingfeng Chen.

Cite this article

Cheng, G., Chen, Q. & Zhang, R. Prediction of phosphorylation sites based on granular support vector machine. Granul. Comput. 6, 107–117 (2021). https://doi.org/10.1007/s41066-019-00202-5

Keywords

  • Phosphorylation sites
  • Kernel
  • Clustering
  • Granular support vector machine