Computational Prediction of Influenza Neuraminidase Inhibitors Using Machine Learning Algorithms and Recursive Feature Elimination Method
Recent outbreaks of highly pathogenic influenza have highlighted the need to develop novel anti-influenza therapeutics. Neuraminidase has become the most important target for the treatment of influenza virus infection. In this study, classification models were developed from a large training dataset containing 457 neuraminidase inhibitors and 358 non-inhibitors using the random forest and support vector machine algorithms. The recursive feature elimination (RFE) method was used to improve the accuracy of the models by selecting the most relevant molecular descriptors. The performance of the models was evaluated by five-fold cross-validation and independent validation. All models achieved accuracies above 86% under both validation methods. This work suggests that machine learning algorithms combined with the RFE method can be used to build useful models for predicting influenza neuraminidase inhibitors.
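The general idea of recursive feature elimination followed by five-fold cross-validation can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the abstract names random forest and support vector machine classifiers, whereas this sketch swaps in a small logistic regression so that it stays self-contained, and the data are synthetic stand-ins for molecular descriptors (two informative columns, four noise columns). All function names, the learning rate, and the epoch count are assumptions chosen for illustration.

```python
import math
import random

def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return class 1 if the linear score is positive, else class 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

def rfe(X, y, n_keep):
    """Refit the model, drop the feature with the smallest |weight|, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_logistic(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining

def cross_val_accuracy(X, y, k=5):
    """Plain k-fold cross-validation accuracy; folds are taken by index stride."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        w, b = train_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(w, b, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Synthetic data (assumption): columns 0 and 1 carry signal, columns 2-5 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(2.0 if yi else -2.0, 1.0), rng.gauss(2.0 if yi else -2.0, 1.0)]
     + [rng.gauss(0.0, 1.0) for _ in range(4)] for yi in y]

selected = rfe(X, y, n_keep=2)
X_selected = [[row[j] for j in selected] for row in X]
accuracy = cross_val_accuracy(X_selected, y)
print("selected feature indices:", selected)
print("5-fold CV accuracy:", round(accuracy, 3))
```

In this sketch the features are selected once on the full dataset before cross-validation, which can give optimistic accuracy estimates; a stricter design would repeat the elimination inside each training fold. Real descriptor matrices would also normally be scaled before weights are compared.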
Keywords: Machine learning; Neuraminidase inhibitor; Feature selection
This work was supported by the National Natural Science Foundation of China (No: 31570160), Innovation Team Project (No: LT2015011) from Education Department of Liaoning Province, Large-scale Equipment Shared Services Project (No: F15165400) and Applied Basic Research Project (No: F16205151) from Science and Technology Bureau of Shenyang. This project was supported by Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning.