Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance

  • Full-length paper
  • Published in: Molecular Diversity

Summary

Recent research has shown that applying data fusion rules in fingerprint-based similarity searching can improve results over traditional searches. Group fusion scores, which combine similarities to multiple reference compounds, have been shown to be particularly effective at increasing enrichment rates relative to searches based on a single reference structure. In this paper, the effectiveness of data fusion with multiple reference compounds for increasing similarity search recall rates was investigated using 44 biological targets and four different 2D fingerprinting systems, including a new 2D typed triangle fingerprinting system introduced here. Scaffold-hopping ability under the data fusion rules was investigated using eight different classes of scaffolds active against cGMP phosphodiesterase isoform 5 (PDE5). An approach that uses the reference group to rank and visualize important fingerprint bits, termed reverse fingerprinting, is presented and used to score and visualize important pharmacophore features within sample active molecules. Finally, similarity statistics within the reference groups were investigated and compared to recall rates.
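
The group fusion idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the Tanimoto coefficient is assumed as the pairwise similarity measure (the summary does not name one), fingerprints are modelled as sets of "on" bit indices, and the names `tanimoto`, `group_fusion_score`, `rank_database`, and the toy molecules are hypothetical. Only the MAX and AVE rule names come from the abbreviation list.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A fingerprint is modelled as the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two bit sets (assumed similarity measure)."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def group_fusion_score(candidate: Fingerprint,
                       references: Sequence[Fingerprint],
                       rule: str = "MAX") -> float:
    """Fuse the candidate's similarities to every reference compound."""
    if not references:
        raise ValueError("at least one reference structure is required")
    scores = [tanimoto(candidate, ref) for ref in references]
    if rule == "MAX":   # maximum fusion rule: best match to any reference
        return max(scores)
    if rule == "AVE":   # average fusion rule: mean similarity to the group
        return sum(scores) / len(scores)
    raise ValueError(f"unknown fusion rule: {rule!r}")


def rank_database(database: Dict[str, Fingerprint],
                  references: Sequence[Fingerprint],
                  rule: str = "MAX") -> List[Tuple[str, float]]:
    """Return (name, fused score) pairs, highest score first."""
    scored = [(name, group_fusion_score(fp, references, rule))
              for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): a reference group of n = 2 and three candidates.
references = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12, 13})]
database = {
    "mol_a": frozenset({1, 2, 3, 5}),      # close to the first reference only
    "mol_b": frozenset({1, 2, 10, 11}),    # moderately close to both references
    "mol_c": frozenset({30, 31}),          # shares no bits with either reference
}

max_ranking = rank_database(database, references, rule="MAX")
ave_ranking = rank_database(database, references, rule="AVE")
```

With this toy data the two rules disagree on the top hit: MAX places `mol_a` first because it closely matches one reference, while AVE places `mol_b` first because it is moderately similar to both. This shows how the choice of fusion rule can change which compounds are retrieved.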

Abbreviations

GpiDAPH: graph pi-donor-acceptor-polar-hydrophobe fingerprints

TGT: typed graph triangle fingerprints

PCH: polar-charged-hydrophobe fingerprints

MACCS: 166 public MACCS keys

n: group count (number of reference structures)

ROC: receiver operating characteristic curve

AVE: average fusion rule

MAX: maximum fusion rule

vHTS: virtual high-throughput screening

Ck: bit coverage in the training group

Tk: bit importance

wL: pharmacophore fragment score

fk i: bit position

Author information

Corresponding author

Correspondence to Chris Williams.

About this article

Cite this article

Williams, C. Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance. Mol Divers 10, 311–332 (2006). https://doi.org/10.1007/s11030-006-9039-z

  • DOI: https://doi.org/10.1007/s11030-006-9039-z
