
Ligand expansion in ligand-based virtual screening using relevance feedback

Journal of Computer-Aided Molecular Design


Query expansion is the process of reformulating an original query to improve retrieval performance in information retrieval systems, and relevance feedback is one of the most useful query modification techniques in such systems. In this paper, we introduce query expansion into ligand-based virtual screening (LBVS) using relevance feedback. In this approach, a Bayesian inference network first ranks the database against a single ligand molecule. A few high-ranking molecules of unknown activity are then selected from its output to form a set of ligand molecules, and this set is used to form a new ligand molecule. Simulated virtual screening experiments with the MDL Drug Data Report and maximum unbiased validation (MUV) data sets show that ligand expansion provides a very simple way of improving LBVS, especially when the active molecules being sought have a high degree of structural heterogeneity. Ligand expansion is slightly less effective when structurally homogeneous sets of actives are being sought.
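The feedback loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Bayesian inference network is replaced here by plain Tanimoto similarity on fingerprints stored as sets of feature ids, and the rule for forming the new ligand (keep any feature present in at least half of the feedback set, with the original ligand included in that set) is an assumption, since the abstract does not specify it. The names `tanimoto`, `rank`, `expand_ligand`, `top_k`, `min_support`, and the toy molecules are all hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of feature ids."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank(query, database):
    """Return molecule ids sorted by decreasing similarity to the query.

    Ties are broken by molecule id so the ordering is deterministic.
    """
    return sorted(database, key=lambda mol_id: (-tanimoto(query, database[mol_id]), mol_id))


def expand_ligand(query, database, top_k=2, min_support=0.5):
    """Build an expanded ligand from the query and its top_k nearest neighbours.

    Assumption: the top-ranked molecules are treated as if they were active
    (their real activity is unknown), and a feature is kept when it occurs in
    at least min_support of the feedback set.
    """
    ranking = rank(query, database)
    feedback = [query] + [database[mol_id] for mol_id in ranking[:top_k]]
    counts = Counter(feature for fp in feedback for feature in fp)
    threshold = min_support * len(feedback)
    return {feature for feature, count in counts.items() if count >= threshold}


# Toy database: m1 and m2 resemble the query, m9 is a structurally
# different molecule sharing features only with m1 and m2, m4 is a decoy.
database = {
    "m1": {1, 2, 3, 4, 5},
    "m2": {2, 3, 4, 5},
    "m4": {1, 7, 8, 9},
    "m9": {4, 5, 6},
}
query = {1, 2, 3}

initial_ranking = rank(query, database)      # m9 is ranked last
expanded = expand_ligand(query, database)    # picks up features 4 and 5
expanded_ranking = rank(expanded, database)  # m9 now ranks above the decoy m4
```

In this toy case the expanded ligand gains features 4 and 5 from its two nearest neighbours, which lets the second pass retrieve `m9` even though it shares nothing with the original ligand. That mirrors the abstract's observation that expansion helps most when the actives are structurally heterogeneous.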





This work is supported by the Ministry of Higher Education (MOHE) and the Research Management Centre (RMC) at Universiti Teknologi Malaysia (UTM) under the Research University Grant Category (VOT Q.J130000.7128.00H72).

Corresponding author

Correspondence to Ammar Abdo.


Cite this article

Abdo, A., Saeed, F., Hamza, H. et al. Ligand expansion in ligand-based virtual screening using relevance feedback. J Comput Aided Mol Des 26, 279–287 (2012).
