Summary
The joint entropy-based diversity analysis (JEDA) program is a new method for selecting representative subsets of compounds from combinatorial libraries. As in other cell-based diversity analyses, a set of chemical descriptors is used to partition the chemical space of a compound library; however, rather than applying the usual metrics for choosing a compound from each partition, JEDA uses a Shannon entropy-based scoring function, implemented in a probabilistic search algorithm, to determine a representative subset of compounds. This approach enables the selection of compounds that are not only diverse but also reflect the densities of chemical space occupied by the original library. Additionally, JEDA lets the user define the size of the subset to be created, so that restrictions on time and chemical reagents can be taken into account. Subsets created by JEDA are compared with subsets obtained using other partition-based diversity analyses, namely principal components analysis and median partitioning, on a combinatorial library derived from the Comprehensive Medicinal Chemistry dataset.
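To make the idea of an entropy-scored, probabilistic subset search concrete, the following is a minimal sketch and not the JEDA implementation. The abstract does not specify the scoring function or the search procedure, so everything here is an assumption: the score is the plain joint Shannon entropy of the subset's cell occupancies, the search is a simple random-swap hill climb, and the names `joint_entropy` and `select_subset` are hypothetical. Maximizing joint entropy alone favors even coverage of occupied cells; it does not by itself reproduce the density-matching behavior the abstract attributes to JEDA.

```python
import math
import random
from collections import Counter


def joint_entropy(cells):
    """Joint Shannon entropy (in bits) of a list of cell labels.

    Each label is a tuple of binned descriptor values, so identical
    tuples mean two compounds fall into the same cell of chemical space.
    """
    counts = Counter(cells)
    n = len(cells)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_subset(cells, k, n_iter=2000, seed=0):
    """Pick k compounds by random-swap search on joint entropy.

    ASSUMPTION: this objective and search are illustrative only; the
    abstract does not describe JEDA's actual scoring or search details.
    """
    if not 0 < k < len(cells):
        raise ValueError("k must be between 1 and len(cells) - 1")
    rng = random.Random(seed)
    chosen = rng.sample(range(len(cells)), k)
    chosen_set = set(chosen)
    pool = [i for i in range(len(cells)) if i not in chosen_set]
    best = joint_entropy([cells[i] for i in chosen])
    for _ in range(n_iter):
        a = rng.randrange(k)
        b = rng.randrange(len(pool))
        # Propose swapping one selected compound with one unselected compound.
        chosen[a], pool[b] = pool[b], chosen[a]
        score = joint_entropy([cells[i] for i in chosen])
        if score >= best:
            best = score  # keep the swap
        else:
            chosen[a], pool[b] = pool[b], chosen[a]  # revert the swap
    return sorted(chosen), best


# Toy library: two binary descriptors, one crowded cell and three sparse cells.
library_cells = [(0, 0)] * 8 + [(0, 1), (1, 0), (1, 1)]
subset, score = select_subset(library_cells, k=4)
print(subset, round(score, 3))
```

The subset size `k` is a user-supplied parameter, mirroring the abstract's point that the chemist can fix the subset size in advance to respect time and reagent limits.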
Cite this article
Landon, M.R., Schaus, S.E. JEDA: Joint entropy diversity analysis. An information-theoretic method for choosing diverse and representative subsets from combinatorial libraries. Mol Divers 10, 333–339 (2006). https://doi.org/10.1007/s11030-006-9042-4
DOI: https://doi.org/10.1007/s11030-006-9042-4