Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites

  • Conference paper
The 12th Conference on Information Technology and Its Applications (CITA 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 734)

Abstract

The incidence of thyroid cancer and breast cancer is increasing every year, and their specific pathogenesis remains unclear. Post-translational modifications are an important regulatory mechanism that affects the function of almost all proteins. They are essential for a diverse and well-functioning proteome and can integrate metabolism with physiological and pathological processes. In recent years, post-translational modifications have become a research hotspot, with methylation, phosphorylation, acetylation and succinylation being the main focus. SUMOylated proteins are predominantly localized in the nucleus, and SUMO regulates nuclear processes, including cell cycle control and DNA repair. SUMOylation has been increasingly implicated in cancer, Alzheimer’s disease, and Parkinson’s disease. Therefore, identifying and characterizing SUMOylation sites is essential for modification-specific proteomics. This study proposes a novel schema for predicting protein SUMOylation sites that combines natural language features (Word2Vec) with sequence-based features. Based on this schema, a novel model, called RSX_SUMO, is proposed for the prediction of protein SUMOylation sites. Our experiments show that RSX_SUMO achieves the highest performance in both five-fold cross-validation and independent testing, reaching an accuracy of 88.6% and an MCC of 0.743 on the independent test set. In addition, a comparison with several existing prediction models shows that our proposed model outperforms them. We hope that our findings will provide useful suggestions and be helpful to researchers working on related studies.
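As a rough illustration of the ideas named in the abstract, the following Python sketch shows how a peptide window might be split into overlapping k-mer "words" (the kind of tokens a Word2Vec model could be trained on), how a simple sequence-based feature such as amino acid composition can be computed, and how accuracy and MCC are derived from confusion-matrix counts. The function names, the choice of k = 3, and the use of amino acid composition are assumptions made for illustration only; the abstract does not specify the actual features or the architecture of RSX_SUMO.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_tokens(window, k=3):
    """Split a peptide window into overlapping k-mers ("words").

    Hypothetical preprocessing step: such tokens could be fed to a
    Word2Vec-style embedding model. k=3 is an assumed value.
    """
    return [window[i:i + k] for i in range(len(window) - k + 1)]


def aac_features(window):
    """Amino acid composition: fraction of each standard residue.

    Used here as an assumed example of a sequence-based feature.
    """
    counts = Counter(window)
    return [counts[a] / len(window) for a in AMINO_ACIDS]


def accuracy_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from confusion counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

For example, `kmer_tokens("MKVE")` yields the two 3-mers of that string, and `accuracy_and_mcc` can be applied to the confusion counts of any binary site classifier to obtain the two metrics reported in the abstract.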

Acknowledgment

The authors sincerely thank TUEBA for partial financial support of this research under the TNU-level project ID: ÐH2023-TN08-05.

Author information

Corresponding author

Correspondence to Van-Nui Nguyen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tran, TX., Nguyen, VN., Le, N. (2023). Incorporating Natural Language-Based and Sequence-Based Features to Predict Protein Sumoylation Sites. In: Nguyen, N.T., Le-Minh, H., Huynh, CP., Nguyen, QV. (eds) The 12th Conference on Information Technology and Its Applications. CITA 2023. Lecture Notes in Networks and Systems, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-031-36886-8_7
