
FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Because detecting DNA-binding proteins (DBPs) experimentally is costly, many machine learning (ML) algorithms have been used to identify DBPs at large scale. Previous methods, however, did not account for noise samples. In this study, a fuzzy twin support vector machine (FTWSVM) is employed to detect DBPs. First, multiple types of protein sequence features are converted into kernel matrices. Then, a multiple kernel learning (MKL) algorithm linearly combines these kernels. Next, a self-representation (SR)-based membership function estimates the membership value (weight) of each training sample. Finally, the integrated kernel matrix and the membership values are fed into the FTWSVM-SR model for training and testing. Compared with other predictive models, FTWSVM based on SR (FTWSVM-SR) achieves the best Matthews correlation coefficient (MCC): 0.7410 and 0.5909 on the two independent test sets, PDB186 and PDB2272, respectively. The results indicate that our method is an effective DBP detection tool that can screen and analyze DBPs on a large scale before biochemical experiments are carried out.
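The abstract does not give the formulas for the MKL weights or for the SR membership function, so the following Python sketch only illustrates the general shape of the pipeline under stated assumptions: one RBF kernel per feature view, fixed kernel weights standing in for the weights the MKL step would learn, a hypothetical leave-one-out ridge self-representation score mapped to a membership value with the 1 - d/(max d + delta) form familiar from fuzzy SVMs, and the MCC metric used for evaluation. The FTWSVM training step itself (two quadratic programs) is omitted. The function names `rbf_kernel`, `combine_kernels`, `self_representation_membership` and `mcc`, the parameters `gamma`, `lam` and `delta`, and the toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def combine_kernels(kernels, weights):
    """Linear combination K = sum_m w_m K_m, with weights rescaled to sum to 1.
    Here the weights are supplied by hand; an MKL method would learn them."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wm * Km for wm, Km in zip(w, kernels))


def self_representation_membership(K, y, lam=1e-2, delta=0.1):
    """Hypothetical SR-style membership values in (0, 1].

    Each sample is reconstructed, in the kernel-induced feature space, from the
    other samples of its own class by ridge regression. A large reconstruction
    error marks a likely noise sample and maps to a small membership value.
    """
    s = np.zeros(len(y))
    m = np.zeros(len(y))
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        if len(idx) < 2:
            raise ValueError("each class needs at least two samples")
        for i in idx:
            others = idx[idx != i]
            K_oo = K[np.ix_(others, others)]
            k_oi = K[others, i]
            # Ridge coefficients: c = (K_oo + lam * I)^(-1) k_oi
            c = np.linalg.solve(K_oo + lam * np.eye(len(others)), k_oi)
            # ||phi(x_i) - sum_j c_j phi(x_j)||^2 written with kernel entries only
            s[i] = max(K[i, i] - 2.0 * c @ k_oi + c @ K_oo @ c, 0.0)
        # Fuzzy-SVM-style mapping: larger error -> smaller weight, never exactly 0
        m[idx] = 1.0 - s[idx] / (s[idx].max() + delta)
    return m


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels in {+1, -1}."""
    tp = float(np.sum((y_true == 1) & (y_pred == 1)))
    tn = float(np.sum((y_true == -1) & (y_pred == -1)))
    fp = float(np.sum((y_true == -1) & (y_pred == 1)))
    fn = float(np.sum((y_true == 1) & (y_pred == -1)))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data: two hypothetical feature views, 10 positives then 10 negatives.
rng = np.random.default_rng(0)
y = np.array([1] * 10 + [-1] * 10)
X1 = rng.normal(0.0, 0.3, size=(20, 3))
X1[10:] += 2.0
X2 = rng.normal(0.0, 0.3, size=(20, 5))
X2[10:] -= 2.0
X1[3] += 8.0  # sample 3 is a deliberate outlier in both views
X2[3] += 8.0

K = combine_kernels([rbf_kernel(X1, 0.5), rbf_kernel(X2, 0.5)], [0.5, 0.5])
membership = self_representation_membership(K, y)
print(int(np.argmin(membership[:10])))  # → 3 (the planted outlier)
```

The planted outlier has almost zero kernel similarity to the rest of its class, so it cannot be reconstructed from them and receives the smallest weight. In a fuzzy SVM-type model, membership values like these typically scale each sample's slack penalty, so likely noise samples pull less on the fitted hyperplanes; how FTWSVM-SR uses them exactly is defined in the full article, not here.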



Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7



Acknowledgements

This study is supported by the National Natural Science Foundation of China (NSFC 61873112, 61922020, 62172076 and 61902271) and the Special Science Foundation of Quzhou (2021D004). The authors also thank Professor Bin Liu, Xiuquan Du and Leyi Wei for kindly sharing the datasets.

Author information

Corresponding authors

Correspondence to Li Peng or Quan Zou.

Ethics declarations

Conflict of Interest

The authors have no competing interests.

Availability of data and material

The related data can be downloaded from: https://figshare.com/s/934f45e3a3e7693691d5.


About this article


Cite this article

Zou, Y., Ding, Y., Peng, L. et al. FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation. Interdiscip Sci Comput Life Sci 14, 372–384 (2022). https://doi.org/10.1007/s12539-021-00489-6


  • DOI: https://doi.org/10.1007/s12539-021-00489-6
