
Turbo prediction: a new approach for bioactivity prediction

Journal of Computer-Aided Molecular Design

Abstract

Nowadays, activity prediction is key to understanding the mechanism of action of active structures discovered from phenotypic screening or found in natural products. Machine learning is currently one of the most important and rapidly evolving topics in computer-aided drug discovery for identifying and designing new drugs with superior biological activities. The performance of a predictive machine learning model can be enhanced through the optimal selection of learning data, algorithm, algorithm parameters, and ensemble methods. In this article, we focus on how to enhance the prediction model using the learning data. However, obtaining additional and more accurate data is difficult and, in many cases, not possible. This motivated us to propose the turbo prediction model, in which nearest neighbour structures are used to increase prediction accuracy. Five datasets, well known in the literature, were used in this article, and the experimental results show that turbo prediction can improve the prediction quality of conventional prediction models, particularly for heterogeneous datasets, without any additional effort on the part of the user carrying out the prediction process, and at a minimal computational cost.
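
The abstract gives only the general idea that nearest neighbour structures support the prediction for a query structure. The sketch below is one possible reading of that idea, not the authors' implementation: the fingerprint representation (sets of on-bit positions), the Tanimoto ranking, the mean fusion rule, and the names `tanimoto`, `nearest_neighbours`, `turbo_score` and `base_score` are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence

# Assumption: a fingerprint is modelled as the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbours(
    query: Fingerprint, pool: Sequence[Fingerprint], k: int
) -> List[Fingerprint]:
    """Return the k structures in the pool most similar to the query."""
    ranked = sorted(pool, key=lambda fp: tanimoto(query, fp), reverse=True)
    return ranked[:k]


def turbo_score(
    query: Fingerprint,
    pool: Sequence[Fingerprint],
    base_score: Callable[[Fingerprint], float],
    k: int = 2,
) -> float:
    """Average the base model's score over the query and its k neighbours.

    Hypothetical fusion rule: the mean is an illustrative choice only.
    """
    group = [query] + nearest_neighbours(query, pool, k)
    return sum(base_score(fp) for fp in group) / len(group)


if __name__ == "__main__":
    # Toy base model (assumption): "active" if bit 1 is set.
    toy_model = lambda fp: 1.0 if 1 in fp else 0.0
    query = frozenset({1, 2, 3})
    pool = [frozenset({1, 2, 4}), frozenset({2, 3, 5}), frozenset({7, 8})]
    print(turbo_score(query, pool, toy_model, k=2))
```

In this toy run the two most similar pool structures are scored together with the query, so the final score reflects the neighbourhood rather than the query alone. Any trained classifier that returns a score per structure could stand in for `toy_model`.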

Acknowledgements

This work was supported by Lille University, CNRS and Programme national d’aide à l’Accueil en Urgence des Scientifiques en Exil (PAUSE).

Author information

Contributions

The research was conducted by mutual contributions of all authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ammar Abdo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 35 kb)

About this article

Cite this article

Abdo, A., Pupin, M. Turbo prediction: a new approach for bioactivity prediction. J Comput Aided Mol Des 36, 77–85 (2022). https://doi.org/10.1007/s10822-021-00440-3

  • DOI: https://doi.org/10.1007/s10822-021-00440-3
