Visualisation of the chemical space of fragments, lead-like and drug-like molecules in PubChem
- First Online:
- Cite this article as:
- van Deursen, R., Blum, L.C. & Reymond, JL. J Comput Aided Mol Des (2011) 25: 649. doi:10.1007/s10822-011-9437-x
- 381 Downloads
The 4.5 million organic molecules with up to 20 non-hydrogen atoms in PubChem were analyzed using the MQN-system, which consists in 42 integer value descriptors of molecular structure. The 42-dimensional MQN-space was visualised by principal component analysis and representation of the (PC1, PC2), (PC1, PC3) and (PC2, PC3) planes. The molecules were organized according to ring count (PC1, 38% of variance), the molecular size (PC2, 25% of variance), and the H-bond acceptor count (PC3, 12% of variance). Compounds following Lipinski’s bioavailability, Oprea’s lead-likeness and Congreve’s fragment-likeness criteria formed separated groups in MQN-space visible in the (PC2, PC3) plane. MQN-similarity searches of the 4.5 million molecules (see the browser available at www.gdb.unibe.ch) gave significant enrichment factors for recovering groups of fragment-sized bioactive compounds related to ten different biological targets taken from Chembl, allowing lead-hopping relationships not seen with substructure fingerprint similarity searches. The diversity of different compound series was analyzed by MQN-distance histograms.