A Machine Learning-Based QSAR Model for Benzimidazole Derivatives as Corrosion Inhibitors by Incorporating Comprehensive Feature Selection
Computational prediction of the inhibition efficiency (IE) of inhibitor molecules is an important complement to experiment when designing novel molecules that efficiently inhibit corrosion on metallic surfaces.
Here we develop a new machine learning-based predictor of the IE of benzimidazole derivatives.
First, the inhibitor molecules were represented numerically in terms of their energy, electronic, topological, physicochemical and spatial properties, all computed from 3-D structures, which yielded 150 valid structural descriptors. These descriptors were then investigated thoroughly. A multicollinearity-based clustering analysis grouped the linearly correlated feature variables, producing 47 feature clusters. Gini importance from a random forest (RF) was then used to measure the contribution of each descriptor within its cluster, and the descriptor with the highest Gini importance score in each cluster was selected, giving 47 descriptors that are not linearly correlated with one another. Further, given the limited number of available inhibitors, feature subsets of different sizes were constructed according to the Gini importance ranking of these 47 descriptors.
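The two selection steps above (grouping collinear descriptors, then keeping the most important descriptor per group) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the descriptor names, the toy values, the 0.9 correlation threshold and the single-linkage grouping rule are all assumptions, and the `importance` dictionary holds placeholder numbers standing in for Gini importance scores that a random forest would supply.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def correlation_clusters(columns, threshold=0.9):
    """Group descriptors linked by |r| >= threshold (single linkage)."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def pick_representatives(clusters, importance):
    """Keep the highest-importance descriptor per cluster, ranked by score."""
    reps = [max(cluster, key=lambda n: importance[n]) for cluster in clusters]
    return sorted(reps, key=lambda n: importance[n], reverse=True)


# Hypothetical descriptor table: one list of values per descriptor.
columns = {
    "homo": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lumo": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly linear in "homo"
    "logp": [5.0, 1.0, 4.0, 2.0, 3.0],
    "volume": [1.0, 3.0, 2.0, 5.0, 4.0],
}
# Placeholder scores standing in for random forest Gini importances.
importance = {"homo": 0.30, "lumo": 0.35, "logp": 0.20, "volume": 0.15}

clusters = correlation_clusters(columns, threshold=0.9)
selected = pick_representatives(clusters, importance)
print(clusters)
print(selected)
```

Candidate feature subsets for model comparison would then be prefixes of the ranked list, for example `selected[:k]` for increasing `k`.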
Finally, support vector machine (SVM) models built on the different feature subsets were evaluated by leave-one-out cross-validation. Comparison showed that the optimal SVM model uses the top 11 descriptors with a polynomial (Poly) kernel. This model achieves a correlation coefficient (R) of 0.9589 and a root-mean-square error (RMSE) of 4.45, indicating that the proposed method gives the best performance on the current data.
Using this model, six new benzimidazole molecules were designed, and their predicted IE values indicate that two of them have high potential as corrosion inhibitors.
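The evaluation protocol described here, leave-one-out cross-validation scored by R and RMSE, can be illustrated with a short standard-library sketch. To keep it dependency-free, the model is a one-descriptor least-squares line, which is only a stand-in for the SVM with a polynomial kernel used in the study. The descriptor values and IE percentages are invented toy numbers, and `fit_line` and `leave_one_out` are hypothetical helper names.

```python
from math import sqrt


def fit_line(xs, ys):
    """Stand-in model (not an SVM): least-squares line on one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def leave_one_out(xs, ys, fit):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        preds.append(model(xs[i]))
    return preds


def correlation_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = sum((o - mo) ** 2 for o in observed)
    sp = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(so * sp)


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Toy data: one hypothetical descriptor and invented IE values (%).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [60.0, 66.0, 71.0, 78.0, 83.0, 90.0]

preds = leave_one_out(xs, ys, fit_line)
r = correlation_r(ys, preds)
error = rmse(ys, preds)
print(f"R = {r:.4f}, RMSE = {error:.2f}")
```

Because each prediction comes from a model that never saw the held-out molecule, R and RMSE computed this way estimate performance on unseen compounds, which matters when the data set is small. Swapping `fit_line` for any other fitting function with the same signature reuses the harness unchanged.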
Keywords: Benzimidazole derivatives; Inhibition efficiency (IE); Machine learning methods; Feature extraction and selection
This work was financially supported by the Major Science and Technology Project of China National Petroleum Co. Ltd (No. 2016E-0609). We also thank the Comprehensive Training Platform of Specialized Laboratory, College of Chemistry, Sichuan University, for sample analysis.
Compliance with ethical standards
Conflict of interest
The authors declare no competing financial interests.