Descriptor collision and confusion: Toward the design of descriptors to mask chemical structures

Summary

We examined “descriptor collision” for several chemical fingerprint systems (MDL 320, Daylight, SMDL) and for a 2D-based descriptor set. For large databases (ChemNavigator and WOMBAT), the smallest collision rate remains around 5%. We then systematically increased the descriptor collision rate (here termed “descriptor confusion”) in order to design a set of “descriptors to mask chemical structures” (DMCS). An effective DMCS system would not allow third parties to reverse engineer the original chemical structures from which the DMCS set was derived. Using SMDL keys, the confusion rate increased to 45.6% when keys with a low frequency of occurrence in WOMBAT structures were eliminated. To validate the biological relevance of the SMDL descriptors as a potential DMCS set, we applied an automated PLS engine, WB-PLS [Olah et al., J. Comput. Aided Mol. Des., 18 (2004) 437], to 1277 series of structures from 948 targets in WOMBAT. The “reduced set” of SMDL descriptors shows a small loss of modeling power (around 20%) compared to the initial descriptor set, while the collision rate is significantly increased. These results indicate that the development of an effective DMCS is possible. If well documented, DMCS systems would encourage private sector data release (e.g., related to water solubility) and directly benefit public sector science.
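
The idea of raising the collision rate by discarding rarely set keys can be illustrated with a minimal sketch. It is not the procedure used in the paper: the definition of the collision rate (the fraction of structures whose key vector is shared with at least one other structure), the function names `collision_rate` and `drop_rare_keys`, the frequency threshold, and the four-bit toy fingerprints are all assumptions made for illustration.

```python
from collections import Counter


def collision_rate(fingerprints):
    """Fraction of structures whose fingerprint is shared with at least one other.

    Assumed definition for this sketch; the paper may define the rate differently.
    """
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints)
    collided = sum(n for n in counts.values() if n > 1)
    return collided / len(fingerprints)


def drop_rare_keys(fingerprints, min_fraction):
    """Remove key positions that are set in fewer than min_fraction of structures."""
    n = len(fingerprints)
    width = len(fingerprints[0])
    # Frequency of occurrence of each key across the whole set.
    freq = [sum(fp[i] for fp in fingerprints) / n for i in range(width)]
    keep = [i for i in range(width) if freq[i] >= min_fraction]
    return [tuple(fp[i] for i in keep) for fp in fingerprints]


# Hypothetical toy data: five structures described by four binary keys.
fps = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 0, 1, 0),
    (1, 0, 0, 0),
    (0, 1, 0, 0),
]

# Keep only keys set in at least half of the structures (arbitrary threshold).
reduced = drop_rare_keys(fps, 0.5)

print(collision_rate(fps))      # all five full-length fingerprints are distinct
print(collision_rate(reduced))  # the shorter fingerprints collide more often
```

In this toy set, the two rarely set keys are what distinguish otherwise identical structures, so removing them makes several fingerprints indistinguishable. This is the trade-off described above: more confusion makes reverse engineering harder, at the cost of some descriptive power.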

Abbreviations

CMR: calculated molecular refractivity

ClogP: program produced by BioByte Corp., Claremont, CA

Daylight/DY: Daylight Chemical Information Systems

DMCS: descriptors to mask chemical structures

DMSO: dimethylsulfoxide

DPISMR: the NIH Small Molecule Repository as organized by DPI

LogP: the logarithm of the octanol-water partition coefficient

LogSw: the logarithm of the (molar) aqueous solubility

MACCS: Molecular ACCess System, an MDL product

MDL: Molecular Design Limited

MLI: Molecular Libraries and Imaging initiative

NIH: National Institutes of Health

PLS: Partial Least Squares/Projection to Latent Structures

QSAR: quantitative structure–activity relationships

SMDL: Sunset Molecular Discovery, LLC

SMILES: Simplified Molecular Input Line Entry Specification

WOMBAT/WB: WOrld of Molecular BioAcTivity database

References

  1. Austin, C.P., Brady, L.S., Insel, T.R. and Collins, F.S., Science, 306 (2004) 1138. Last access on 21.10.05.
  2. The PubChem database is available online at the National Center for Biotechnology Information, http://pubchem.ncbi.nlm.nih.gov/ Last access on 21.10.05.
  3. Hahn, M.M. and Green, R., Curr. Opin. Chem. Biol., 3 (1999) 379.
  4. Filimonov, D. and Poroikov, V., J. Comput. Aided Mol. Des., 19 (2005) in press.
  5. Weber, L., Curr. Opin. Chem. Biol., 2 (1998) 381.
  6. The iResearch Library™ is available from ChemNavigator, Inc., http://chemnavigator.com/cnc/products/IRL.asp Last access on 21.10.05.
  7. The Crossfire Beilstein database is available from Elsevier MDL, http://www.mdl.com/products/knowledge/crossfire_beilstein/index.jsp Last access on 21.10.05.
  8. Tetko, I.V., Abagyan, R. and Oprea, T.I., J. Comput. Aided Mol. Des., 19 (2005) in press.
  9. Faulon, J.L., Brown, W.M. and Martin, S., J. Comput. Aided Mol. Des., 19 (2005) in press.
  10. Olah, M., Mracec, M., Ostopovici, L., Rad, R., Bora, A., Hadaruga, N., Olah, I., Banda, M., Simon, S., Mracec, M. and Oprea, T.I., In Oprea, T.I. (Ed), Chemoinformatics in Drug Discovery, Wiley-VCH, New York, 2005, pp. 223–239.
  11. WOMBAT is available from Sunset Molecular Discovery LLC, http://www.sunsetmolecular.com/ Last access on 21.10.05.
  12. Weininger, D., J. Chem. Inf. Comput. Sci., 28 (1988) 31.
  13. Leo, A. and Weininger, D., CMR3. Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 1995.
  14. Leo, A., Chem. Rev., 93 (1993) 1281.
  15. Leo, A. and Weininger, D., CLOGP 4.0. Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 2001.
  16. Ran, Y., Jain, N. and Yalkowsky, S.H., J. Chem. Inf. Comput. Sci., 41 (2001) 1208.
  17. Livingstone, D.J., Ford, M.G., Huuskonen, J.J. and Salt, D.W., J. Comput. Aided Mol. Des., 15 (2001) 741.
  18. Tetko, I.V., Tanchuk, V.Y. and Villa, A.E., J. Chem. Inf. Comput. Sci., 41 (2001) 1407.
  19. Glen, R.C., J. Comput. Aided Mol. Des., 8 (1994) 457.
  20. Gasteiger, J. and Marsili, M., Tetrahedron, 36 (1980) 3219.
  21. Oprea, T.I., J. Comput. Aided Mol. Des., 14 (2000) 251.
  22. Balaban, A.T., SAR QSAR Environ. Res., 8 (1998) 1.
  23. Kier, L.B. and Hall, L.H., Molecular Connectivity in Structure-Activity Analysis. John Wiley, New York, 1986.
  24. Basak, S.C., Balaban, A.T., Grunwald, G.D. and Gute, B.D., J. Chem. Inf. Comput. Sci., 40 (2000) 891.
  25. Durant, J.L., Leland, B.A., Henry, D.R. and Nourse, J.G., J. Chem. Inf. Comput. Sci., 42 (2002) 1273.
  26. MacCuish, J. and MacCuish, N., Measures software, Mesa Analytics and Computing LLC, Santa Fe, New Mexico, http://www.mesaac.com/ Last access on 21.10.05.
  27. Daylight fingerprints are available from Daylight Chemical Information Systems, http://www.daylight.com/ Last access on 21.10.05.
  28. Olah, M., Bologa, C. and Oprea, T.I., J. Comput. Aided Mol. Des., 18 (2004) 437.
  29. Schneider, G., Neidhart, W., Giller, T. and Schmidt, G., Angew. Chem. Int. Ed., 38 (1999) 2894.
  30. The SMARTS toolkit and SMARTS are available from Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/dayhtml/doc/theory.smarts.html; online SMARTS tutorial: http://www.daylight.com/dayhtml/doc/theory/smarts.html, 2005.
  31. SMACK and OEChem are available from OpenEye Scientific Software, Santa Fe, New Mexico, http://www.eyesopen.com/products/applications/smack.html, 2005.
  32. Wold, S., Johansson, E. and Cocchi, M., In Kubinyi, H. (Ed), 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, 1993, pp. 523–550.
  33. Kappler, M.A., Allu, T.K., Bologa, C. and Oprea, T.I., J. Chem. Inf. Model., 45 (2005) in preparation.

Acknowledgments

We thank Jeremy (JJ) Yang from OpenEye Scientific Software (Santa Fe, NM) for advice on descriptor collision. This work was supported by New Mexico Tobacco Settlement Funds for Biocomputing (TKA, MO) and by the New Mexico Molecular Library Screening Center, NIH 1U54 MH074425-01 (CB, TIO).

Author information

Correspondence to Tudor I. Oprea.

About this article

Cite this article

Bologa, C., Allu, T.K., Olah, M. et al. Descriptor collision and confusion: Toward the design of descriptors to mask chemical structures. J Comput Aided Mol Des 19, 625–635 (2005). https://doi.org/10.1007/s10822-005-9020-4

Keywords

  • chemical fingerprints
  • ChemNavigator
  • descriptor collision
  • descriptor confusion
  • masking chemical structures
  • PLS
  • QSAR
  • SMILES
  • WOMBAT