A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities

Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah

doi:10.1007/s10822-017-0094-6

Published: 26 December 2017

Journal of Computer-Aided Molecular Design, volume 32, pages 375–384 (2018)

Mohammad Amin Valizade Hasanloei¹,
Razieh Sheikhpour ORCID: orcid.org/0000-0002-3119-3349²,
Mehdi Agha Sarram²,
Elnaz Sheikhpour³ &
Hamdollah Sharifi¹

Abstract

Quantitative structure–activity relationship (QSAR) modeling is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR-based drug design, used to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, which aims to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended to QSAR models by defining new distance metrics based on the continuous activity values of compounds with known activities. A semi-supervised feature selection method was then proposed that combines the Fisher and Laplacian criteria and exploits compounds with both known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed method, we applied it and other feature selection methods to three QSAR data sets: serine/threonine–protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. QSAR models built on the descriptors selected by the proposed semi-supervised method performed better than the other models, which indicates that the method selects relevant descriptors effectively using compounds with known and unknown activities. The results show that using compounds with both known and unknown activities can improve the performance of combined Fisher- and Laplacian-based feature selection.
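As a rough illustration of the two criteria named in the abstract, the sketch below computes the classical Fisher score (for class labels) and the unsupervised Laplacian score with NumPy. It is not the authors' combined method: the continuous-activity extension of the Fisher criterion and the way the two scores are merged are not given in this abstract. The function names, the k-nearest-neighbour graph, and the heat-kernel width `t` are assumptions made for the example.

```python
import numpy as np


def fisher_score(X, y):
    """Classical Fisher score per feature (higher = more discriminative).

    Ratio of between-class scatter to within-class scatter, computed
    independently for each column of X. Assumes discrete class labels y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def laplacian_score(X, n_neighbors=5, t=1.0):
    """Laplacian score per feature (lower = better preserves local structure).

    Uses no activity values, so labeled and unlabeled samples can both be
    passed in. The graph construction (symmetrized kNN, heat kernel) is an
    assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest neighbours, skipping the sample itself.
    order = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), order.shape[1])
    cols = order.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-sq[rows, cols] / t)  # heat-kernel edge weights
    S = np.maximum(S, S.T)                       # make the graph symmetric
    d = S.sum(axis=1)                            # node degrees
    L = np.diag(d) - S                           # graph Laplacian
    # Remove the degree-weighted mean from every feature.
    Xc = X - (d @ X) / d.sum()
    numerator = (Xc * (L @ Xc)).sum(axis=0)      # f^T L f per feature
    denominator = (Xc ** 2 * d[:, None]).sum(axis=0)  # f^T D f per feature
    return numerator / (denominator + 1e-12)
```

Note the opposite orientations: descriptors would be ranked by descending Fisher score but ascending Laplacian score, so any combination of the two has to reconcile the directions first.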

Acknowledgements

This study was supported by Hematology and Oncology Research Center of Shahid Sadoughi University of Medical Sciences (funding reference number: 5666).

Author information

Authors and Affiliations

Clinical Research Development Unit of Imam Khomeini Hospital, Urmia University of Medical Sciences, Urmia, Iran
Mohammad Amin Valizade Hasanloei & Hamdollah Sharifi
Department of Computer Engineering, Yazd University, Yazd, Iran
Razieh Sheikhpour & Mehdi Agha Sarram
Hematology and Oncology Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
Elnaz Sheikhpour

Corresponding author

Correspondence to Razieh Sheikhpour.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 70 KB)

About this article

Cite this article

Valizade Hasanloei, M.A., Sheikhpour, R., Sarram, M.A. et al. A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities. J Comput Aided Mol Des 32, 375–384 (2018). https://doi.org/10.1007/s10822-017-0094-6

Received: 21 August 2017
Accepted: 15 December 2017
Published: 26 December 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s10822-017-0094-6
