Abstract
DNA-binding proteins are fundamental to many basic processes of life, and many diseases are associated with them. DNA-binding proteins are mainly detected by biochemical experiments, which are time-consuming and extremely expensive. Many computational methods based on machine learning (ML) algorithms have therefore been developed to detect DNA-binding proteins. In this study, we propose a novel model for identifying DNA-binding proteins based on a Fuzzy Multiple Kernel Support Vector Machine. Multiple sequence and evolutionary features are extracted, and a separate kernel is constructed from each. These kernels are then integrated by a Multiple Kernel Learning (MKL) algorithm. Finally, a Fuzzy Support Vector Machine (FSVM) is employed to build an effective DNA-binding protein predictor. The accuracy of our model is 82.98% and 81.70% on PDB1075 (benchmark data set of DNA-binding proteins) and PDB186 (independent test set), respectively, which is comparable to previous methods.
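The pipeline described in the abstract (one kernel per feature type, kernel integration, fuzzy weighting of training samples) can be illustrated with a minimal sketch. The abstract does not specify the kernel type, the MKL weighting rule, or the membership function, so everything below is an assumption made for illustration: RBF kernels, weights proportional to kernel-target alignment, and a membership based on distance to the class centre. All function names are hypothetical and are not taken from the paper.

```python
import math

def rbf_kernel(X, gamma):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]

def frobenius(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def alignment(K, y):
    """Kernel-target alignment between K and the ideal kernel y y^T."""
    Y = [[yi * yj for yj in y] for yi in y]
    return frobenius(K, Y) / math.sqrt(frobenius(K, K) * frobenius(Y, Y))

def combine_kernels(kernels, y):
    """Assumed MKL rule: weight each kernel by its alignment, clipped at zero."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total > 0:
        weights = [s / total for s in scores]
    else:  # fall back to uniform weights if no kernel aligns with the labels
        weights = [1.0 / len(kernels)] * len(kernels)
    n = len(y)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels))
                 for j in range(n)] for i in range(n)]
    return weights, combined

def fuzzy_memberships(X, y, delta=1e-3):
    """Assumed membership: 1 - d_i / (r + delta), where d_i is the distance of
    sample i to its class centre and r is the largest such distance in the class."""
    memberships = [0.0] * len(X)
    dim = len(X[0])
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        centre = [sum(X[i][d] for i in idx) / len(idx) for d in range(dim)]
        dist = {i: math.dist(X[i], centre) for i in idx}
        radius = max(dist.values())
        for i in idx:
            memberships[i] = 1.0 - dist[i] / (radius + delta)
    return memberships

if __name__ == "__main__":
    # Toy stand-ins for sequence-based and evolutionary feature vectors.
    X_seq = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [1.2, 0.9]]
    X_evo = [[0.5], [0.4], [0.6], [0.5]]
    y = [1, 1, -1, -1]
    weights, K = combine_kernels([rbf_kernel(X_seq, 1.0), rbf_kernel(X_evo, 1.0)], y)
    s = fuzzy_memberships(X_seq, y)
    print("kernel weights:", weights)
    print("memberships:", s)
```

The combined Gram matrix and the memberships could then be handed to any SVM solver that accepts a precomputed kernel and per-sample weights, for example scikit-learn's `SVC(kernel="precomputed").fit(K, y, sample_weight=s)`. Scaling the penalty of each sample by its membership is what lets noisy or outlying proteins contribute less to the decision boundary.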
Acknowledgments
This work is supported by a grant from the National Natural Science Foundation of China (NSFC 61772362).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, Y., Tang, J., Guo, F. (2019). Identification of DNA-Binding Proteins via Fuzzy Multiple Kernel Model and Sequence Information. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_45
DOI: https://doi.org/10.1007/978-3-030-26969-2_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26968-5
Online ISBN: 978-3-030-26969-2
eBook Packages: Computer Science, Computer Science (R0)