Journal of Computer-Aided Molecular Design, Volume 32, Issue 2, pp 375–384

A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities

  • Mohammad Amin Valizade Hasanloei
  • Razieh Sheikhpour
  • Mehdi Agha Sarram
  • Elnaz Sheikhpour
  • Hamdollah Sharifi


Quantitative structure–activity relationship (QSAR) modeling is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design, because it selects the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, which aims to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended to QSAR models by defining new distance metrics based on the continuous activity values of compounds with known activities. A semi-supervised feature selection method was then proposed that combines the Fisher and Laplacian criteria and exploits compounds with both known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed method, we applied it, along with other feature selection methods, to three QSAR data sets: serine/threonine–protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The QSAR models built on the descriptors selected by the proposed semi-supervised method performed better than the other models, which indicates that the method selects relevant descriptors efficiently when it uses compounds with known and unknown activities. These results show that using compounds with both known and unknown activities can improve the performance of combined Fisher and Laplacian based feature selection.
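As a rough illustration of the two criteria named in the abstract, the sketch below computes the classic class-label Fisher score and the standard unsupervised Laplacian score for each descriptor with NumPy. It is not the authors' method: the extended distance metrics for continuous activities and the way the two criteria are combined are not given here. The function names, the `n_neighbors` and `t` parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def fisher_score(X, y):
    """Classic Fisher score per feature; higher means more discriminative."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        # Between-class scatter: class means spread around the overall mean.
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        # Within-class scatter: spread of samples around their class mean.
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)


def laplacian_score(X, n_neighbors=5, t=1.0):
    """Standard Laplacian score per feature; lower means smoother on the graph."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Symmetric k-nearest-neighbour graph (column 0 of the sort is the point itself).
    nearest = np.argsort(sq_dist, axis=1)[:, 1:n_neighbors + 1]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), nearest.shape[1]), nearest.ravel()] = True
    mask |= mask.T
    S = np.where(mask, np.exp(-sq_dist / t), 0.0)  # heat-kernel edge weights
    d = S.sum(axis=1)                              # node degrees
    L = np.diag(d) - S                             # graph Laplacian
    # Remove the degree-weighted mean of every feature.
    centered = X - (d @ X) / d.sum()
    numerator = (centered * (L @ centered)).sum(axis=0)
    denominator = (centered ** 2 * d[:, None]).sum(axis=0)
    return numerator / (denominator + 1e-12)


if __name__ == "__main__":
    # Toy data: descriptor 0 separates two classes, descriptor 1 is noise.
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1], 20)
    data = np.column_stack([5.0 * labels + 0.1 * rng.normal(size=40),
                            rng.normal(size=40)])
    print("Fisher scores:   ", fisher_score(data, labels))
    print("Laplacian scores:", laplacian_score(data))
```

The Fisher score needs labels, so in a semi-supervised setting it can only use compounds with known activities, while the Laplacian score uses the neighbourhood graph over all compounds. That difference is the reason for combining the two.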


  • Semi-supervised
  • Feature selection
  • Fisher criterion
  • Graph Laplacian
  • QSAR models



This study was supported by the Hematology and Oncology Research Center of Shahid Sadoughi University of Medical Sciences (funding reference number: 5666).




Copyright information

© Springer International Publishing AG, part of Springer Nature 2017

Authors and Affiliations

  1. Clinical Research Development Unit of Imam Khomeini Hospital, Urmia University of Medical Sciences, Urmia, Iran
  2. Department of Computer Engineering, Yazd University, Yazd, Iran
  3. Hematology and Oncology Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
