The octet rule in chemical space: generating virtual molecules

  • Original Article
Molecular Diversity

Abstract

We present a generator of virtual molecules that selects valid chemistry on the basis of the octet rule. We also introduce a mesomer group key that allows fast detection of duplicates among the generated structures. Compared to existing approaches, our model is simpler and faster, generates new chemistry, and avoids invalid chemistry. Its versatility is illustrated by the correct generation of molecules containing third-row elements and by a surprisingly adept handling of complex boron chemistry. Because it uses no empirical parameters, our model is designed to remain valid in unexplored regions of chemical space. A first unexpected finding is the high prevalence of dipolar structures among the generated molecules.

Notes

  1. Milliseconds on a 3.4 GHz Intel Core i7-3770 CPU.

Author information

Corresponding author

Correspondence to Jan Hamaekers.

Cite this article

Israels, R., Maaß, A. & Hamaekers, J. The octet rule in chemical space: generating virtual molecules. Mol Divers 21, 769–778 (2017). https://doi.org/10.1007/s11030-017-9775-2

  • DOI: https://doi.org/10.1007/s11030-017-9775-2
