Summary
Data for 3.8 million compounds from the structural databases of 32 providers were gathered and stored in a single chemical database. Duplicates were removed using the IUPAC International Chemical Identifier (InChI), leaving 2.6 million compounds. Each provider database and the final merged database were studied in terms of uniqueness, diversity, frameworks, and ‘drug-like’ and ‘lead-like’ properties. The study shows that the database contains more than 87 000 frameworks and 2.1 million ‘drug-like’ molecules, of which more than one million are ‘lead-like’. The study was carried out using ‘ScreeningAssistant’, a software package dedicated to managing chemical databases and generating screening sets. Compounds are stored in a MySQL database, and all operations on this database are carried out by Java code. Druglikeness and leadlikeness are estimated with in-house scores built from functions that estimate the suitability of each property; uniqueness is assessed using the InChI code, and diversity using molecular frameworks and fingerprints. The software was designed to make updating the database easy. ‘ScreeningAssistant’ is freely available under the GPL.
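The duplicate-removal step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Java/MySQL code of ‘ScreeningAssistant’. The record layout, the provider names, the compound identifiers and the function names `deduplicate_by_inchi` and `exclusive_counts` are all assumptions made for illustration. The InChI strings are assumed to have been generated beforehand by a separate tool.

```python
from collections import defaultdict


def deduplicate_by_inchi(records):
    """Collapse compound records that share the same InChI string.

    records: iterable of (provider, compound_id, inchi) tuples
             (a hypothetical layout, not the ScreeningAssistant schema).
    Returns (unique, providers_by_inchi) where
      unique             maps each InChI to the first record seen for it,
      providers_by_inchi maps each InChI to the set of providers selling it.
    """
    unique = {}
    providers_by_inchi = defaultdict(set)
    for provider, compound_id, inchi in records:
        providers_by_inchi[inchi].add(provider)
        # Keep only the first occurrence; later ones are duplicates.
        unique.setdefault(inchi, (provider, compound_id))
    return unique, dict(providers_by_inchi)


def exclusive_counts(providers_by_inchi):
    """Count, per provider, the compounds that no other provider offers."""
    counts = defaultdict(int)
    for providers in providers_by_inchi.values():
        if len(providers) == 1:
            counts[next(iter(providers))] += 1
    return dict(counts)


# Illustrative records only: methane appears in two catalogues.
records = [
    ("provider_a", "A-001", "InChI=1S/CH4/h1H4"),
    ("provider_b", "B-917", "InChI=1S/CH4/h1H4"),
    ("provider_a", "A-002", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("provider_b", "B-918", "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"),
]

unique, providers_by_inchi = deduplicate_by_inchi(records)
print(len(records), "records,", len(unique), "unique compounds")
print(exclusive_counts(providers_by_inchi))
```

Using the identifier string as a dictionary key makes duplicate detection a plain equality test. In a relational database the same effect could be obtained with a unique index on the InChI column, although whether ‘ScreeningAssistant’ does so is not stated here.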
Abbreviations
- HBA: H bond acceptor
- HBD: H bond donor
- HTS: high-throughput screening
- InChI: IUPAC International Chemical Identifier
- JNI: Java Native Interface
- MW: molecular weight
- RO5: rule-of-five
- SCA: stochastic clustering analysis
- SSSR: smallest set of smallest rings
Cite this article
Monge, A., Arrault, A., Marot, C. et al. Managing, profiling and analyzing a library of 2.6 million compounds gathered from 32 chemical providers. Mol Divers 10, 389–403 (2006). https://doi.org/10.1007/s11030-006-9033-5
DOI: https://doi.org/10.1007/s11030-006-9033-5