Incorporation of Kernel Support Vector Machine for Effective Prediction of Lysine Formylation from Class Imbalance Samples

  • Conference paper
Proceedings of the International Conference on Big Data, IoT, and Machine Learning

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies ((LNDECT,volume 95))

Abstract

Lysine formylation is a recently discovered post-translational modification (PTM). It is a reversible, dynamic biological process found primarily on histone proteins, where it plays a strong role in modulating chromatin conformation and in gene activation. Many traditional laboratory-based experimental methods and computational methods are currently available for identifying formylated lysine residues, but the experimental methods are more costly and time-consuming than the computational ones. Existing computational predictors of formylated lysine sites, however, do not satisfactorily select reliable non-formylated sites for balancing the training samples. This study develops a bioinformatics model named PLF_RNS that combines various sequence-based features with a support vector machine algorithm and the F-score feature selection method. Verified formylated lysine samples are labeled as positive; reliable negative samples, filtered from the remaining samples using evolutionary information based on the BLOSUM62 matrix, are labeled as negative. The experimental results show that PLF_RNS achieves an average accuracy of 95.09% on tenfold cross-validation, which is better than that of other currently available models. The model may therefore help in better understanding these molecular mechanisms and in developing drugs for related diseases.
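
The abstract names F-score feature selection but does not give its formula. The following is a minimal sketch, assuming the commonly used two-class F-score: for each feature, the squared deviations of the positive-class and negative-class means from the overall mean, divided by the sum of the two within-class sample variances. The function names `f_scores` and `top_k_features` and the toy data are hypothetical illustrations, not taken from the paper, and the paper's actual feature encoding is not reproduced here.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature column (assumed two-class definition).

    X: list of equal-length feature rows.
    y: labels, 1 = positive (formylated), 0 = negative (non-formylated).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples in each class")

    scores = []
    for j in range(len(X[0])):
        col_all = [row[j] for row in X]
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        overall = mean(col_all)
        # Between-class term: how far each class mean sits from the overall mean.
        between = (mean(col_pos) - overall) ** 2 + (mean(col_neg) - overall) ** 2
        # Within-class term: sum of the sample variances (n - 1 denominator).
        within = variance(col_pos) + variance(col_neg)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the highest F-score, best first."""
    scores = f_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
y = [1, 1, 1, 0, 0, 0]
print(top_k_features(X, y, 1))
```

A higher score indicates a feature whose class means are far apart relative to its within-class spread. In a pipeline like the one the abstract describes, the top-ranked features would then be passed to the support vector machine.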

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sohrawordi, M., Ali Hossain, M. (2022). Incorporation of Kernel Support Vector Machine for Effective Prediction of Lysine Formylation from Class Imbalance Samples. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol 95. Springer, Singapore. https://doi.org/10.1007/978-981-16-6636-0_15

  • DOI: https://doi.org/10.1007/978-981-16-6636-0_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-6635-3

  • Online ISBN: 978-981-16-6636-0

  • eBook Packages: Engineering, Engineering (R0)
