Journal of Computer-Aided Molecular Design, Volume 19, Issue 9–10, pp 749–764

Surrogate data – a secure way to share corporate data

  • Igor V. Tetko
  • Ruben Abagyan
  • Tudor I. Oprea
Article

Summary

The privacy of chemical structures is of paramount importance for the industrial sector, in particular for the pharmaceutical industry. At the same time, companies handle large amounts of physico-chemical and biological data that could be shared to improve the molecular understanding of pharmacokinetic and toxicological properties. Such sharing could improve predictivity and shorten drug development time, particularly in the early phases of drug discovery. The current study provides some theoretical limits on the information required to reverse engineer molecules from generated descriptors and demonstrates that the information content of a molecule can be less than one bit per atom. Thus, in theory, a single descriptor can be enough to completely disclose the molecular structure. Instead of sharing descriptors, we propose to share surrogate data, that is, molecules whose properties are reliably predicted. Surrogate data can provide the same information as the original set. We consider the practical application of this idea to the prediction of lipophilicity of chemical compounds and demonstrate that surrogate and real (original) data provide similar prediction ability. Thus, our proposed strategy makes it possible to share not only descriptors but also complete collections of surrogate molecules without the danger of disclosing the underlying molecular structures.
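The claim that a single descriptor can disclose a structure can be illustrated with a minimal sketch. It is not the authors' method: it assumes the structure is written as an ASCII SMILES string and that the "descriptor" is one arbitrary-precision integer. The function names `encode_as_descriptor` and `decode_descriptor` are hypothetical. This naive packing spends 8 bits per character, far more than the sub-one-bit-per-atom figure stated above, so it only shows that one number of sufficient precision can carry a whole structure.

```python
def encode_as_descriptor(smiles: str) -> int:
    """Pack an ASCII SMILES string into one non-negative integer.

    Hypothetical illustration: the integer plays the role of a single
    shared 'descriptor' value.
    """
    data = smiles.encode("ascii")
    # A leading 0x01 marker byte guarantees the byte length survives the
    # round trip through an integer.
    return int.from_bytes(b"\x01" + data, "big")


def decode_descriptor(value: int) -> str:
    """Recover the SMILES string from the integer produced above."""
    n_bytes = (value.bit_length() + 7) // 8
    raw = value.to_bytes(n_bytes, "big")
    # Drop the marker byte added during encoding.
    return raw[1:].decode("ascii")


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"  # example structure, chosen for illustration
    descriptor = encode_as_descriptor(aspirin)
    print("descriptor:", descriptor)
    print("recovered :", decode_descriptor(descriptor))
```

Anyone who receives the integer and knows the packing scheme recovers the exact structure, which is the risk that motivates sharing surrogate molecules rather than descriptors of the original ones.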

Key words:

drug design; structure–property prediction; information content of a molecule; representation of molecules; surrogate data; lipophilicity prediction

Notes

Acknowledgement

The authors thank Scott Hutton for providing compounds from the iResearch library (ChemNavigator) used in the current study, and Cristian Bologa (University of New Mexico Division of Biocomputing) and Philip Wong (Institute for Bioinformatics) for their technical help. The authors also thank Robert S. Pearlman for his constructive comments. Part of this work was supported by the INTAS “Virtual Computational Chemistry Laboratory” grant (http://www.vcclab.org) (IVT) and by New Mexico Tobacco Settlement Funds for Biocomputing (TIO).

Copyright information

© Springer Science + Business Media Inc., 2005

Authors and Affiliations

  • Igor V. Tetko (1, 4)
  • Ruben Abagyan (2)
  • Tudor I. Oprea (3)
  1. Institute of Bioorganic and Petroleum Chemistry, Ukrainian Academy of Sciences, Kyiv, Ukraine
  2. Molecular Biology, The Scripps Research Institute, La Jolla, USA
  3. Division of Biocomputing, University of New Mexico School of Medicine, University of New Mexico, Albuquerque, USA
  4. GSF – Forschungszentrum fuer Umwelt und Gesundheit, GmbH, Institute for Bioinformatics, Neuherberg, Germany