Journal of Computer-Aided Molecular Design, Volume 25, Issue 7, pp 649–662

Visualisation of the chemical space of fragments, lead-like and drug-like molecules in PubChem

  • Ruud van Deursen
  • Lorenz C. Blum
  • Jean-Louis Reymond


The 4.5 million organic molecules with up to 20 non-hydrogen atoms in PubChem were analyzed using the MQN-system, which consists of 42 integer-valued descriptors of molecular structure. The 42-dimensional MQN-space was visualised by principal component analysis and representation of the (PC1, PC2), (PC1, PC3) and (PC2, PC3) planes. The molecules were organized according to ring count (PC1, 38% of variance), molecular size (PC2, 25% of variance), and H-bond acceptor count (PC3, 12% of variance). Compounds following Lipinski’s bioavailability, Oprea’s lead-likeness and Congreve’s fragment-likeness criteria formed separate groups in MQN-space, visible in the (PC2, PC3) plane. MQN-similarity searches of the 4.5 million molecules (see the browser available online) gave significant enrichment factors for recovering groups of fragment-sized bioactive compounds related to ten different biological targets taken from ChEMBL, allowing lead-hopping relationships not seen with substructure fingerprint similarity searches. The diversity of different compound series was analyzed by MQN-distance histograms.
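The projection onto the (PC1, PC2, PC3) axes described above can be illustrated with a minimal sketch. It assumes an unscaled, covariance-based principal component analysis; whether the descriptors were scaled in the actual study is not stated here. The function name `pca_project` and the random stand-in data are hypothetical, so the variance fractions it yields will not match the 38%, 25% and 12% reported for PubChem.

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project the rows of X onto the first principal components.

    Returns the scores and the fraction of variance carried by each
    retained component. Assumption: centering only, no scaling.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center every descriptor column
    # SVD of the centered matrix; the rows of Vt are the principal axes
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = S ** 2
    ratio = variance / variance.sum()
    scores = Xc @ Vt[:n_components].T
    return scores, ratio[:n_components]

# Hypothetical stand-in data: 1000 "molecules" x 42 integer descriptors
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(1000, 42))
scores, ratio = pca_project(X)

# The three 2D maps correspond to these column pairs of `scores`
planes = {"(PC1, PC2)": (0, 1), "(PC1, PC3)": (0, 2), "(PC2, PC3)": (1, 2)}
```

Plotting `scores[:, i]` against `scores[:, j]` for each entry of `planes` gives the three maps; colouring the points by ring count, size or H-bond acceptor count would show which property each axis tracks.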



Keywords: PubChem; Fragments; Chemical space; Virtual screening



This work was supported financially by the University of Berne, the Swiss National Science Foundation, and the NCCR TransCure.



City-block distance in MQN-space. Values are non-negative integers, with small numbers indicating high similarity


City-block distance in substructure-fingerprint space. Values are non-negative integers, with small numbers indicating high similarity


Center of gravity of a compound series in MQN-space. A 42-component vector composed of the 42 average MQN values calculated across the compound series


Subset of PubChem hac ≤ 20 following Oprea’s criteria for lead-likeness


Molecular Quantum Numbers. A set of 42 integer-valued descriptors of molecular structure. See the legend of Fig. 3 for a detailed listing


A position in MQN-space corresponding to a particular combination of 42 MQN-values. An MQN-bin may contain one or several molecules


42-dimensional space created by the MQNs


Subset of PubChem hac ≤ 20 following Congreve’s “rule of 3” for fragment-likeness


Subset of Ro3 with further restrictions in rotatable bonds (rbc ≤ 3) and polar surface area (PSA ≤ 60 Å²)


Subset of PubChem hac ≤ 20 following Lipinski’s “rule of 5”


Substructure fingerprint. Here the 1024-bit Daylight-type substructure fingerprint in the JChem package of ChemAxon was used


Scalar Tanimoto similarity coefficient comparing two scalar vectors of 42 MQN-values. Values between 0 and 1 with 1 indicating high similarity


Binary Tanimoto similarity coefficient comparing two binary substructure fingerprints. Values between 0 and 1 with 1 indicating high similarity
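The distance and similarity measures defined above can be sketched in a few lines. The function names are hypothetical, and the three-component vectors merely stand in for real 42-value MQN vectors. The scalar Tanimoto coefficient is written here in the common dot-product form, a·b / (a·a + b·b − a·b), which is an assumption: the source does not spell out which non-binary variant was used.

```python
import numpy as np

def city_block(a, b):
    """City-block (Manhattan) distance between two integer vectors."""
    return int(np.abs(np.asarray(a) - np.asarray(b)).sum())

def center_of_gravity(series):
    """Per-descriptor average across a compound series (rows = molecules)."""
    return np.asarray(series, dtype=float).mean(axis=0)

def tanimoto_scalar(a, b):
    """Scalar Tanimoto coefficient, assumed dot-product form."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ab = a @ b
    denom = a @ a + b @ b - ab
    return 1.0 if denom == 0 else float(ab / denom)

def tanimoto_binary(a, b):
    """Binary Tanimoto coefficient: shared on-bits over total on-bits."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

# Toy descriptor vectors (3 values instead of 42, for illustration only)
mol_a = [6, 1, 0]
mol_b = [5, 1, 2]
distance = city_block(mol_a, mol_b)          # |6-5| + |1-1| + |0-2| = 3
centroid = center_of_gravity([mol_a, mol_b])  # [5.5, 1.0, 1.0]
t_scalar = tanimoto_scalar(mol_a, mol_b)     # 31 / 36

# Toy 4-bit fingerprints (the source uses 1024-bit fingerprints)
fp_a = [1, 1, 0, 1]
fp_b = [1, 0, 0, 1]
t_binary = tanimoto_binary(fp_a, fp_b)       # 2 shared bits / 3 set bits
```

The two kinds of measure run in opposite directions: a city-block distance of 0 means the two molecules occupy the same MQN-bin, whereas a Tanimoto coefficient of 1 means the two vectors are identical.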



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Ruud van Deursen (1)
  • Lorenz C. Blum (1)
  • Jean-Louis Reymond (1)

  1. Department of Chemistry and Biochemistry, Swiss National Center of Competence in Research, NCCR-TransCure, University of Berne, Berne, Switzerland
