Journal of Computer-Aided Molecular Design, Volume 27, Issue 8, pp 675–679

Estimation of the size of drug-like chemical space based on GDB-17 data

  • P. G. Polishchuk
  • T. I. Madzhidov
  • A. Varnek


Abstract

The goal of this paper is to estimate the number of realistic drug-like molecules that could ever be synthesized. Unlike previous studies based on exhaustive enumeration of molecular graphs or on combinatorial enumeration of preselected fragments, we used the results of constrained graph enumeration by Reymond to establish a correlation between the number of generated structures ($M$) and the number of heavy atoms ($N$): $\log M = 0.584 \times N \times \log N + 0.356$. The number of heavy atoms limiting the drug-like chemical space of molecules that follow Lipinski's rules ($N = 36$) was obtained from an analysis of the PubChem database. This results in $M \approx 10^{33}$, which lies between the estimates of Ertl ($10^{23}$) and Bohacek ($10^{60}$).
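The fitted relation can be evaluated directly to reproduce the headline figure. The short Python sketch below is an illustration, not the authors' code: the names `SLOPE`, `INTERCEPT` and `log10_chemical_space_size` are hypothetical, and it assumes the logarithms are base 10, which is the reading that yields $M \approx 10^{33}$ at $N = 36$.

```python
import math

# Coefficients of the fit quoted in the abstract:
#   log10 M = 0.584 * N * log10 N + 0.356
# Assumption: base-10 logarithms (this reproduces M ~ 10^33 at N = 36).
SLOPE = 0.584
INTERCEPT = 0.356


def log10_chemical_space_size(n_heavy_atoms: int) -> float:
    """Return log10 of the estimated number of structures with n heavy atoms."""
    if n_heavy_atoms < 1:
        raise ValueError("n_heavy_atoms must be a positive integer")
    n = n_heavy_atoms
    return SLOPE * n * math.log10(n) + INTERCEPT


if __name__ == "__main__":
    # N = 36 is the heavy-atom limit for Lipinski-compliant molecules quoted above.
    exponent = log10_chemical_space_size(36)
    print(f"log10 M(36) = {exponent:.2f}")  # about 33.08, i.e. M ~ 10^33
    # The estimate falls between the earlier bounds cited in the abstract.
    ertl_exponent, bohacek_exponent = 23, 60
    assert ertl_exponent < exponent < bohacek_exponent
```

Because $N \log N$ grows faster than linearly, each additional heavy atom near $N = 36$ multiplies the estimate by roughly an order of magnitude (the slope of $\log M$ there is about 1.16), so the result is sensitive to the chosen atom limit.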


Keywords: Chemical space · Drug-like chemical space · Graph enumeration



Acknowledgments

The authors thank Dr. I. Baskin, Prof. I. Antipin and Dr. G. Marcou for valuable comments. PP thanks the French Embassy in Ukraine for supporting his stay at the University of Strasbourg in 2012. TM acknowledges Kazan Federal University for supporting his stay at the University of Strasbourg in 2012.


References

  1. Pólya G, Read RC (1987) Combinatorial enumeration of groups, graphs, and chemical compounds. Springer-Verlag, New York
  2. Bergeron F, Labelle G, Leroux P (1997) Combinatorial species and tree-like structures, vol 67. Cambridge University Press, Cambridge
  3. Fujita S (1991) Symmetry and combinatorial enumeration in chemistry, vol 8. Springer-Verlag, Berlin, Heidelberg
  4. Henze HR, Blair CM (1931) The number of isomeric hydrocarbons of the methane series. J Am Chem Soc 53(8):3077–3085. doi: 10.1021/ja01359a034
  5. Blair CM, Henze HR (1932) The number of stereoisomeric and non-stereoisomeric paraffin hydrocarbons. J Am Chem Soc 54(4):1538–1545. doi: 10.1021/ja01343a044
  6. Bohacek RS, McMartin C, Guida WC (1996) The art and practice of structure-based drug design: a molecular modeling perspective. Med Res Rev 16(1):3–50. doi: 10.1002/(SICI)1098-1128(199601)16:1<3::AID-MED1>3.0.CO;2-6
  7. Ertl P (2002) Cheminformatics analysis of organic substituents: identification of the most common substituents, calculation of substituent properties, and automatic identification of drug-like bioisosteric groups. J Chem Inf Comput Sci 43(2):374–380. doi: 10.1021/ci0255782
  8. Weaver DF, Weaver CA (2011) Exploring neurotherapeutic space: how many neurological drugs exist (or could exist)? J Pharm Pharmacol 63(1):136–139. doi: 10.1111/j.2042-7158.2010.01161.x
  9. Fink T, Bruggesser H, Reymond J-L (2005) Virtual exploration of the small-molecule chemical universe below 160 Daltons. Angew Chem Int Ed 44(10):1504–1508. doi: 10.1002/anie.200462457
  10. Ruddigkeit L, van Deursen R, Blum LC, Reymond J-L (2012) Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J Chem Inf Model 52(11):2864–2875. doi: 10.1021/ci300415d
  11. Cayley A (1875) Ueber die analytischen Figuren, welche in der Mathematik Bäume genannt werden und ihre Anwendung auf die Theorie chemischer Verbindungen. Ber Dtsch Chem Ges 8(2):1056–1059. doi: 10.1002/cber.18750080252
  12. Herrmann F (1897) Ueber das Problem, die Anzahl der isomeren Paraffine von der Formel CnH2n+2 zu bestimmen. Ber Dtsch Chem Ges 30(3):2423–2426. doi: 10.1002/cber.18970300310
  13. Schiff H (1875) Zur Statistik chemischer Verbindungen. Ber Dtsch Chem Ges 8(2):1542–1547. doi: 10.1002/cber.187500802191
  14. Losanitsch SM (1897) Die Isomerie-Arten bei den Homologen der Paraffin-Reihe. Ber Dtsch Chem Ges 30(2):1917–1926. doi: 10.1002/cber.189703002144
  15. Perry D (1932) The number of structural isomers of certain homologs of methane and methanol. J Am Chem Soc 54(7):2918–2920. doi: 10.1021/ja01346a035
  16. Pólya G (1936) Algebraische Berechnung der Anzahl der Isomeren einiger organischer Verbindungen. Z Kristallogr
  17. Harary F, Norman RZ (1960) Dissimilarity characteristic theorems for graphs. Proc Am Math Soc 11(2):332–334
  18. Read R (1976) The enumeration of acyclic chemical compounds. Academic Press, New York
  19. Robinson RW, Harary F, Balaban AT (1976) The numbers of chiral and achiral alkanes and monosubstituted alkanes. Tetrahedron 32(3):355–361. doi: 10.1016/0040-4020(76)80049-X
  20. Cyvin SJ, Brunvoll J, Cyvin BN (1995) Enumeration of constitutional isomers of polyenes. J Mol Struct THEOCHEM 357(3):255–261. doi: 10.1016/0166-1280(95)04329-6
  21. Sloane NJA (1973) A handbook of integer sequences, vol 65. Academic Press, New York
  22. Leonard JE, Hammond GS, Simmons HE (1975) Apparent symmetry of cyclohexane. J Am Chem Soc 97(18):5052–5054. doi: 10.1021/ja00851a003
  23. Weininger D (2002) Combinatorics of small molecular structures. In: Encyclopedia of computational chemistry. John Wiley & Sons, Ltd. doi: 10.1002/0470845015.cna014m
  24. Ogata K, Isomura T, Yamashita H, Kubodera H (2007) A quantitative approach to the estimation of chemical space from a given geometry by the combination of atomic species. QSAR Comb Sci 26(5):596–607. doi: 10.1002/qsar.200630037
  25. Drew KLM, Baiman H, Khwaounjoo P, Yu B, Reynisson J (2012) Size estimation of chemical space: how big is it? J Pharm Pharmacol 64(4):490–495. doi: 10.1111/j.2042-7158.2011.01424.x
  26. Walters WP, Stahl MT, Murcko MA (1998) Virtual screening: an overview. Drug Discov Today 3(4):160–178. doi: 10.1016/S1359-6446(97)01163-X
  27. Gorse A-D (2006) Diversity in medicinal chemistry space. Curr Top Med Chem 6(1):3–18
  28. Geysen HM, Schoenen F, Wagner D, Wagner R (2003) Combinatorial compound libraries for drug discovery: an ongoing challenge. Nat Rev Drug Discov 2(3):222–230
  29. Valler MJ, Green D (2000) Diversity screening versus focussed screening in drug discovery. Drug Discov Today 5(7):286–293. doi: 10.1016/S1359-6446(00)01517-8
  30. Giménez O, Noy M (2005) The number of planar graphs and properties of random planar graphs. In: International conference on analysis of algorithms, DMTCS proc. AD, Barcelona, Spain, 6–10 June 2005. Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France, pp 147–156
  31. R: A Language and Environment for Statistical Computing (2012) R Foundation for Statistical Computing, Vienna, Austria
  32. Lipinski C (1995) Computational alerts for potential absorption problems: profiles of clinically tested drugs. Paper presented at the Tools for Oral Absorption. Part II. Predicting Human Absorption. BIOTEC, PDD Symposium, AAPS, Miami
  33. Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG (2012) ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 52(7):1757–1768. doi: 10.1021/ci3001277
  34. Shoichet BK (2013) Drug discovery: nature's pieces. Nat Chem 5(1):9–10
  35. Gillet VJ, Khatib W, Willett P, Fleming PJ, Green DVS (2002) Combinatorial library design using a multiobjective genetic algorithm. J Chem Inf Comput Sci 42(2):375–385. doi: 10.1021/ci010375j
  36. van Deursen R, Reymond J-L (2007) Chemical space travel. ChemMedChem 2(5):636–640. doi: 10.1002/cmdc.200700021

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • P. G. Polishchuk (1)
  • T. I. Madzhidov (2)
  • A. Varnek (3), corresponding author

  1. A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Odessa, Ukraine
  2. A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
  3. Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
