Surrogate data – a secure way to share corporate data

Summary

The privacy of chemical structures is of paramount importance for the industrial sector, in particular for the pharmaceutical industry. At the same time, companies handle large amounts of physico-chemical and biological data that could be shared to improve our molecular understanding of pharmacokinetic and toxicological properties. Such sharing could improve predictivity and shorten drug development time, in particular in the early phases of drug discovery. The current study provides some theoretical limits on the information required to reverse-engineer molecules from generated descriptors and demonstrates that the information content of molecules can be less than one bit per atom. Thus, in theory, a single descriptor can be enough to disclose the molecular structure completely. Instead of sharing descriptors, we propose to share surrogate data. Sharing surrogate data amounts to sharing reliably predicted molecules. Surrogate data can provide the same information as the original set. We consider the practical application of this idea to the prediction of the lipophilicity of chemical compounds and demonstrate that surrogate and real (original) data provide similar prediction ability. Thus, our proposed strategy makes it possible to share not only descriptors but also complete collections of surrogate molecules without the danger of disclosing the underlying molecular structures.
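The surrogate-data workflow described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the descriptors and the "measured" property are synthetic random numbers, the model is ordinary least squares rather than a lipophilicity model, and "reliably predicted" is approximated by a hypothetical nearest-neighbour distance cutoff in descriptor space. All names (`X_private`, `X_public`, `reliable`, and so on) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_linear(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply a fitted linear model (last coefficient is the intercept)."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-ins (assumption): random descriptors and a noisy linear property.
n_desc = 5
true_w = rng.normal(size=n_desc)


def measure(X):
    """Simulated experimental measurement with noise."""
    return X @ true_w + rng.normal(scale=0.3, size=len(X))


X_private = rng.normal(size=(200, n_desc))   # confidential corporate molecules
y_private = measure(X_private)
X_public = rng.normal(size=(2000, n_desc))   # publicly known molecules
X_test = rng.normal(size=(300, n_desc))      # independent evaluation set
y_test = measure(X_test)

# Step 1: the data owner trains a model on the confidential data.
model_original = fit_linear(X_private, y_private)

# Step 2: the owner predicts public molecules and keeps the "reliable" ones.
# Reliability proxy (assumption): distance to the nearest private molecule
# in descriptor space is at or below the median of such distances.
dists = np.linalg.norm(
    X_public[:, None, :] - X_private[None, :, :], axis=2
).min(axis=1)
reliable = dists <= np.median(dists)
X_surrogate = X_public[reliable]
y_surrogate = predict_linear(model_original, X_surrogate)

# Step 3: only (X_surrogate, y_surrogate) is shared; the recipient trains on it.
model_surrogate = fit_linear(X_surrogate, y_surrogate)

rmse_original = rmse(y_test, predict_linear(model_original, X_test))
rmse_surrogate = rmse(y_test, predict_linear(model_surrogate, X_test))
print(f"surrogate molecules shared: {len(X_surrogate)}")
print(f"RMSE, model trained on original data:  {rmse_original:.3f}")
print(f"RMSE, model trained on surrogate data: {rmse_surrogate:.3f}")
```

In this toy setting both models are linear, so the model trained on surrogate data reproduces the original one almost exactly. With nonlinear learners and real molecules the agreement would only be approximate, and would depend on how the reliably predicted molecules are selected.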

Acknowledgement

The authors thank Scott Hutton for providing compounds from the iResearch library (ChemNavigator) used in the current study, and Cristian Bologa (University of New Mexico Division of Biocomputing) and Philip Wong (Institute for Bioinformatics) for their technical help. The authors also thank Robert S. Pearlman for his constructive comments. Part of this work was supported by the INTAS “Virtual Computational Chemistry Laboratory” grant, http://www.vcclab.org (IVT), and by New Mexico Tobacco Settlement Funds for Biocomputing (TIO).

Author information

Correspondence to Igor V. Tetko.

About this article

Cite this article

Tetko, I.V., Abagyan, R. & Oprea, T.I. Surrogate data – a secure way to share corporate data. J Comput Aided Mol Des 19, 749–764 (2005). https://doi.org/10.1007/s10822-005-9013-3

Key words:

  • drug design
  • structure–property prediction
  • information content of a molecule
  • representation of molecules
  • surrogate data
  • lipophilicity prediction