Estimation of the size of drug-like chemical space based on GDB-17 data
The goal of this paper is to estimate the number of realistic drug-like molecules that could ever be synthesized. Unlike previous studies based on exhaustive enumeration of molecular graphs or on combinatorial enumeration of preselected fragments, we used the results of Reymond's constrained graph enumeration to establish a correlation between the number of generated structures (M) and the number of heavy atoms (N): log M = 0.584 × N × log N + 0.356. The heavy-atom limit of drug-like chemical space for molecules that follow Lipinski's rules (N = 36) was obtained from an analysis of the PubChem database. This gives M ≈ 10^33, which lies between the estimates of Ertl (10^23) and Bohacek (10^60).
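As a quick check of the arithmetic, the correlation can be evaluated at N = 36. The sketch below is illustrative only: the function name `log10_structures` is hypothetical, and it assumes both logarithms in the formula are base 10, which the abstract does not state explicitly but which is consistent with the reported M ≈ 10^33.

```python
import math


def log10_structures(n_heavy_atoms: int) -> float:
    """Return log10(M) from the correlation log M = 0.584 * N * log N + 0.356.

    Assumption: both logarithms are base 10 (not stated in the abstract).
    """
    if n_heavy_atoms < 1:
        raise ValueError("number of heavy atoms must be a positive integer")
    return 0.584 * n_heavy_atoms * math.log10(n_heavy_atoms) + 0.356


if __name__ == "__main__":
    n = 36  # heavy-atom limit for Lipinski-compliant molecules, per the abstract
    exponent = log10_structures(n)
    print(f"N = {n}: log10(M) = {exponent:.2f}, i.e. M is roughly 10^{round(exponent)}")
```

Under this assumption the exponent at N = 36 comes out close to 33, matching the value quoted in the abstract. Working with log10(M) rather than M itself keeps the result readable and makes it easy to compare against the 10^23 and 10^60 estimates.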
Keywords: Chemical space; Drug-like chemical space; Graphs enumeration
The authors thank Dr. I. Baskin, Prof. I. Antipin and Dr. G. Marcou for valuable comments. PP thanks the French Embassy in Ukraine for supporting his stay at the University of Strasbourg in 2012. TM thanks Kazan Federal University for supporting his stay at the University of Strasbourg in 2012.
- 6. Bohacek RS, McMartin C, Guida WC (1996) The art and practice of structure-based drug design: a molecular modeling perspective. Med Res Rev 16(1):3–50. doi: 10.1002/(sici)1098-1128(199601)16:1<3:aid-med1>3.0.co;2-6
- 16. Polya G (1936) Algebraische Berechnung der Anzahl der Isomeren einiger organischer Verbindungen, Zeit. f. Kristall
- 18. Read R (1976) The enumeration of acyclic chemical compounds. Academic Press, New York
- 21. Sloane NJA, Sloane N (1973) A handbook of integer sequences, vol 65. Academic Press, New York
- 23. Weininger D (2002) Combinatorics of small molecular structures. In: Encyclopedia of computational chemistry. John Wiley & Sons, Ltd. doi: 10.1002/0470845015.cna014m
- 30. Giménez O, Noy M (2005) The number of planar graphs and properties of random planar graphs. In: International conference on analysis of algorithms DMTCS proc. AD, Barcelona, Spain, 6-10 June 2005. Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France. p 147–156
- 31. R: A Language and Environment for Statistical Computing (2012) R Foundation for Statistical Computing, Vienna, Austria
- 32. Lipinski C (1995) Computational alerts for potential absorption problems: profiles of clinically tested drugs. Paper presented at the tools for oral absorption. Part II. Predicting human absorption. BIOTEC. PDD symposium, AAPS, Miami