Abstract
We present a generator of virtual molecules that selects valid chemistry on the basis of the octet rule. We also introduce a mesomer group key that allows fast detection of duplicates among the generated structures. Compared to existing approaches, our model is simpler and faster, generates new chemistry and avoids invalid chemistry. Its versatility is illustrated by the correct generation of molecules containing third-row elements and a surprisingly adept handling of complex boron chemistry. Because it uses no empirical parameters, our model is designed to remain valid in unexplored regions of chemical space. A first unexpected finding is the high prevalence of dipolar structures among the generated molecules.
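As an illustration of the kind of octet-rule test the abstract refers to, the following is a minimal sketch of the textbook electron count per atom. It is not the authors' implementation: the names `VALENCE_ELECTRONS`, `shell_electrons` and `satisfies_octet` are hypothetical, and the criterion (eight valence-shell electrons, two for hydrogen, counted from bond orders and formal charge) is an assumption based on the standard statement of the rule.

```python
# Valence electrons of the neutral atom (a small, illustrative selection
# of main-group elements; this table is an assumption, not from the paper).
VALENCE_ELECTRONS = {
    "H": 1, "B": 3, "C": 4, "N": 5, "O": 6, "F": 7,
    "Si": 4, "P": 5, "S": 6, "Cl": 7,
}


def shell_electrons(element, bond_order_sum, formal_charge=0):
    """Count valence-shell electrons: non-bonding electrons plus two per bond order."""
    # Electrons the atom owns = neutral valence count minus its formal charge.
    # One of them goes into each unit of bond order; the rest are non-bonding.
    non_bonding = VALENCE_ELECTRONS[element] - formal_charge - bond_order_sum
    if non_bonding < 0:
        raise ValueError("bond order sum exceeds the available electrons")
    # Each unit of bond order places a shared pair in the atom's shell.
    return non_bonding + 2 * bond_order_sum


def satisfies_octet(element, bond_order_sum, formal_charge=0):
    """Return True if the atom has a full shell (2 for H, 8 otherwise)."""
    target = 2 if element == "H" else 8
    return shell_electrons(element, bond_order_sum, formal_charge) == target
```

Under these assumptions, both atoms of the dipolar N–O unit in an N-oxide pass the test (nitrogen with four bonds and charge +1, oxygen with one bond and charge −1), whereas neutral three-coordinate boron, with only six shell electrons, does not.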
Notes
Milliseconds on a 3.4 GHz Intel Core i7-3770 CPU.
Cite this article
Israels, R., Maaß, A. & Hamaekers, J. The octet rule in chemical space: generating virtual molecules. Mol Divers 21, 769–778 (2017). https://doi.org/10.1007/s11030-017-9775-2
DOI: https://doi.org/10.1007/s11030-017-9775-2