A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities
Quantitative structure–activity relationship (QSAR) modeling is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR-based drug design, used to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, which aims to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended to QSAR models by defining new distance metrics based on the continuous activity values of compounds with known activities. A semi-supervised feature selection method was then proposed that combines the Fisher and Laplacian criteria and exploits compounds with both known and unknown activities to select the relevant descriptors. To demonstrate its efficiency, we applied the proposed method and other feature selection methods to three QSAR data sets: serine/threonine–protein kinase PLK3 inhibitors, ROCK inhibitors, and phenol compounds. The QSAR models built on the descriptors selected by the proposed semi-supervised method performed better than the other models, which indicates that the method selects relevant descriptors effectively using compounds with known and unknown activities. These results show that using compounds with both known and unknown activities can improve the performance of combined Fisher and Laplacian based feature selection methods.
Keywords: Semi-supervised; Feature selection; Fisher criterion; Graph Laplacian; QSAR models
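To make the two criteria named in the abstract concrete, the following is a minimal sketch of the classical Fisher score (for discrete class labels) and the classical Laplacian score (computed on a nearest-neighbor graph over all samples). It is not the combined method proposed in this study: the extension to continuous activity values and the way the two criteria are combined are not described in the abstract and are not reproduced here. The function names, the parameters `n_neighbors` and `t`, and the toy data are assumptions chosen for illustration.

```python
import numpy as np


def fisher_score(X, y):
    """Classical Fisher score per feature (larger = more discriminative).

    Ratio of between-class scatter to within-class scatter, computed
    independently for each column of X using discrete labels y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def laplacian_score(X, n_neighbors=5, t=1.0):
    """Classical Laplacian score per feature (smaller = better).

    Uses no labels: a feature scores well if samples that are neighbors
    in the graph have similar values relative to the feature's variance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared distances

    # Symmetric k-nearest-neighbor graph (self excluded) with heat-kernel weights.
    k = min(n_neighbors, n - 1)
    order = sq.copy()
    np.fill_diagonal(order, np.inf)
    idx = np.argsort(order, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), idx.ravel()] = True
    mask |= mask.T
    S = np.where(mask, np.exp(-sq / t), 0.0)

    d = S.sum(axis=1)            # node degrees
    L = np.diag(d) - S           # graph Laplacian
    F = X - (d @ X) / d.sum()    # remove the degree-weighted mean of each feature
    num = (F * (L @ F)).sum(axis=0)
    den = (F * (d[:, None] * F)).sum(axis=0)
    return num / (den + 1e-12)


# Toy data (hypothetical): feature 0 separates two groups, feature 1 is noise.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 20)
X_demo = np.column_stack([
    10.0 * group + rng.normal(scale=0.1, size=40),
    rng.uniform(size=40),
])
labeled = np.r_[0:8, 20:28]  # pretend only these samples have known labels

fs = fisher_score(X_demo[labeled], group[labeled])  # labeled samples only
ls = laplacian_score(X_demo)                        # all samples, no labels
```

In this sketch the Fisher score sees only the labeled subset, while the Laplacian score uses every sample, which mirrors the semi-supervised idea of drawing on compounds with both known and unknown activities. How the two scores should be weighted against each other is left open here.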
This study was supported by the Hematology and Oncology Research Center of Shahid Sadoughi University of Medical Sciences (funding reference number: 5666).