Molecular Diversity, Volume 21, Issue 4, pp 769–778

The octet rule in chemical space: generating virtual molecules

  • Rafel Israels
  • Astrid Maaß
  • Jan Hamaekers
Original Article


We present a generator of virtual molecules that selects valid chemistry on the basis of the octet rule. We also introduce a mesomer group key that allows fast detection of duplicates among the generated structures. Compared to existing approaches, our model is simpler and faster, generates new chemistry, and avoids invalid chemistry. Its versatility is illustrated by the correct generation of molecules containing third-row elements and by a surprisingly adept handling of complex boron chemistry. Because it uses no empirical parameters, the model is designed to remain valid in unexplored regions of chemical space. A first, unexpected finding is the high prevalence of dipolar structures among the generated molecules.
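To make the idea of selecting structures by the octet rule concrete, the following is a minimal sketch and not the authors' implementation. It assumes a molecule is given as a list of atoms with formal charges plus a list of bonds with integer bond orders; the names `VALENCE_ELECTRONS`, `satisfies_octet` and `molecule_is_valid` are hypothetical. Each atom is accepted only if its nonbonding electrons form complete pairs and its shell holds eight electrons (two for hydrogen).

```python
# Illustrative octet-rule filter (an assumption-laden sketch, not the paper's code).
# Valence electrons of the neutral atom, main-group elements only.
VALENCE_ELECTRONS = {
    "H": 1, "B": 3, "C": 4, "N": 5, "O": 6, "F": 7,
    "Si": 4, "P": 5, "S": 6, "Cl": 7,
}


def satisfies_octet(element, bond_order_sum, formal_charge=0):
    """Return True if the atom has a closed shell: 2 electrons for H, 8 otherwise."""
    valence = VALENCE_ELECTRONS[element]
    target = 2 if element == "H" else 8
    # Electrons left on the atom after the charge and one electron per bond order.
    nonbonding = valence - formal_charge - bond_order_sum
    if nonbonding < 0 or nonbonding % 2 != 0:
        return False  # impossible electron count, or an unpaired electron
    # Each unit of bond order contributes a shared pair to the atom's shell.
    return nonbonding + 2 * bond_order_sum == target


def molecule_is_valid(atoms, bonds):
    """atoms: list of (element, formal_charge); bonds: list of (i, j, order)."""
    bond_sums = [0] * len(atoms)
    for i, j, order in bonds:
        bond_sums[i] += order
        bond_sums[j] += order
    return all(
        satisfies_octet(element, bond_sums[k], charge)
        for k, (element, charge) in enumerate(atoms)
    )


# Dipolar example: H3N(+)-O(-), with the nitrogen carrying four single bonds.
amine_oxide_atoms = [("N", 1), ("O", -1), ("H", 0), ("H", 0), ("H", 0)]
amine_oxide_bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
print(molecule_is_valid(amine_oxide_atoms, amine_oxide_bonds))
```

Under this strict reading, a charge-separated structure such as the amine oxide above passes, while a neutral nitrogen with five bonds is rejected. How the published generator treats electron-deficient or third-row centres is not specified in the abstract, so the sketch makes no attempt to reproduce that behaviour.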


Chemical space, Virtual chemistry, GDB, Virtual compounds, Virtual libraries



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI, Schloss Birlinghoven, Sankt Augustin, Germany
