JEDA: Joint entropy diversity analysis. An information-theoretic method for choosing diverse and representative subsets from combinatorial libraries

  • Full–length Paper
Molecular Diversity

Summary

The joint entropy-based diversity analysis (JEDA) program is a new method for selecting representative subsets of compounds from combinatorial libraries. Like other cell-based diversity analyses, JEDA uses a set of chemical descriptors to partition the chemical space of a compound library. Unlike methods that apply other metrics to choose a compound from each partition, however, JEDA determines a representative subset with a Shannon entropy-based scoring function implemented in a probabilistic search algorithm. This approach selects compounds that are not only diverse but also reflect the densities of chemical space occupied by the original library. JEDA also lets the user define the size of the subset, so that restrictions on time and chemical reagents can be taken into account. Subsets created by JEDA are compared with subsets obtained using other partition-based diversity analyses, namely principal components analysis and median partitioning, on a combinatorial library derived from the Comprehensive Medicinal Chemistry dataset.
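As a rough illustration of the ingredients named in the summary, the sketch below partitions compounds into cells, scores a subset by the Shannon entropy of its cell occupancy, and improves the subset with a simple randomized swap search. It is not the published JEDA algorithm: the summary does not specify the scoring function or the search procedure, so the binary cut-point binning, the function names (`cell_of`, `joint_entropy`, `stochastic_subset`) and the accept-if-not-worse rule are all assumptions made for illustration. In particular, maximizing entropy alone favors even coverage of cells and does not by itself reproduce the density-matching behavior the summary describes.

```python
import math
import random
from collections import Counter


def cell_of(descriptors, cut_points):
    """Assign a compound to a cell: one bit per descriptor (assumed binary binning)."""
    return tuple(int(x > c) for x, c in zip(descriptors, cut_points))


def joint_entropy(cells):
    """Shannon entropy, in bits, of how the given compounds are spread over cells."""
    counts = Counter(cells)
    n = len(cells)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def stochastic_subset(cells, size, steps=2000, seed=0):
    """Hypothetical randomized search: swap one member at a time and keep the
    swap whenever the subset entropy does not decrease."""
    rng = random.Random(seed)
    indices = list(range(len(cells)))
    subset = rng.sample(indices, size)  # user-defined subset size
    best = joint_entropy([cells[i] for i in subset])
    for _ in range(steps):
        pos = rng.randrange(size)
        candidate = rng.choice(indices)
        if candidate in subset:
            continue  # keep subset members distinct
        trial = subset.copy()
        trial[pos] = candidate
        score = joint_entropy([cells[i] for i in trial])
        if score >= best:
            subset, best = trial, score
    return subset, best


if __name__ == "__main__":
    # Toy library: ten compounds crowd one cell, three occupy other cells.
    library = [(0, 0)] * 10 + [(0, 1), (1, 0), (1, 1)]
    chosen, score = stochastic_subset(library, size=4)
    print(sorted(chosen), round(score, 3))
```

The subset size is passed in explicitly, mirroring the user-defined subset size mentioned in the summary; a density-aware score would need an additional term comparing the subset's cell distribution with that of the full library.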

Author information

Corresponding author

Correspondence to Scott E. Schaus.

About this article

Cite this article

Landon, M.R., Schaus, S.E. JEDA: Joint entropy diversity analysis. An information-theoretic method for choosing diverse and representative subsets from combinatorial libraries. Mol Divers 10, 333–339 (2006). https://doi.org/10.1007/s11030-006-9042-4

  • DOI: https://doi.org/10.1007/s11030-006-9042-4
