Screen efficiency comparisons of decision tree and neural network algorithms in machine learning assisted drug design
In view of the huge search space in drug design, machine learning has become a powerful method for predicting the affinity between a small-molecule drug and its target protein, enabled by the development of artificial intelligence technology. However, the variety of machine learning algorithms, each with a large number of tunable parameters, makes choosing a prediction framework quite difficult. In this work, we took a recent drug design competition (from the XtalPi company on the DataCastle platform) as a typical case to find the optimized parameters for different machine learning algorithms and to identify the most effective algorithm. After parameter optimization, we compared typical machine learning methods, namely decision trees (XGBoost, LightGBM) and artificial neural networks (MLP, CNN), using root-mean-square error (RMSE) and the coefficient of determination (R2) as evaluation metrics. As a result, the decision tree methods were more effective than the neural networks, in the order LightGBM > XGBoost > CNN > MLP, for affinity prediction in this specific drug design problem with ~160000 samples. For a much larger screening task in a more complicated drug design study, a sophisticated neural network model may surpass the decision tree algorithms once its generalization is enhanced and overfitting is reduced. Such advanced machine learning methods can extract more information on protein-ligand binding than traditional approaches and improve the screening efficiency of drug design by a factor of 200-1000.
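The comparison protocol described above can be sketched as follows: train a decision-tree ensemble and a neural network on the same affinity data, then score both with RMSE and R2. This is an illustrative sketch, not the authors' code; scikit-learn's `GradientBoostingRegressor` and `MLPRegressor` stand in for LightGBM/XGBoost and the paper's MLP/CNN models, and a synthetic regression data set stands in for the ~160000 competition samples.

```python
# Hedged sketch of the RMSE / R2 model comparison described in the abstract.
# Assumptions: scikit-learn stand-ins for the actual LightGBM/XGBoost and
# MLP/CNN models, and synthetic data in place of the competition data set.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic "affinity" data: 2000 samples, 20 descriptors.
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision_tree_ensemble": GradientBoostingRegressor(random_state=0),
    "neural_network": MLPRegressor(hidden_layer_sizes=(64, 64),
                                   max_iter=500, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5  # root-mean-square error
    scores[name] = {"rmse": rmse, "r2": r2_score(y_test, pred)}
    print(f"{name}: RMSE={rmse:.2f}, R2={scores[name]['r2']:.3f}")
```

In the paper's setting, each candidate model would additionally undergo its own hyperparameter optimization before this final comparison, so the scores reflect each algorithm at its best rather than at default settings.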
Keywords: drug design, affinity prediction, protein-ligand binding, machine learning
This work was supported by the National Natural Science Foundation of China (31571026, 21727817).