Diversifying chemical libraries with generative topographic mapping
Generative topographic mapping (GTM) was used to investigate the possibility of diversifying the in-house compound collection of Boehringer Ingelheim (BI). For this purpose, a 2D map covering the relevant chemical space was trained, and the BI compound library was compared to the Aldrich Market Select (AMS) database of more than 8M purchasable compounds. To discover new (sub)structures, the "AutoZoom" tool was developed and applied to analyze the chemotypes of molecules residing in heavily populated zones of the map and to extract the corresponding maximum common substructures (MCS). A set of 401K new structures from the AMS database was retrieved and checked for drug-likeness and biological activity.
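The abstract gives no implementation details. The following pure-Python sketch illustrates one way a trained GTM could be used to compare two libraries. It computes each molecule's responsibilities over the map nodes, accumulates them into per-node population fractions for each library, and flags nodes that a vendor catalogue populates heavily while the in-house collection barely reaches them. It assumes the map is already trained, so the node images y_k in descriptor space (in GTM these come from the latent grid through an RBF mapping) and the inverse variance β are simply given. The function names, thresholds and toy data are hypothetical. This is not the authors' AutoZoom tool.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def responsibilities(t: Vector, nodes: Sequence[Vector], beta: float) -> List[float]:
    """Posterior probability that molecule t (a descriptor vector) belongs to each map node.

    Each node k is treated as an isotropic Gaussian centred on its image y_k in
    descriptor space with inverse variance beta. Node priors are equal, so the
    Gaussian normalisation constant cancels.
    """
    # Log of the unnormalised responsibilities: -beta/2 * ||t - y_k||^2
    logs = [-0.5 * beta * sum((a - b) ** 2 for a, b in zip(t, y)) for y in nodes]
    top = max(logs)  # log-sum-exp trick: subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def node_fractions(library: Sequence[Vector], nodes: Sequence[Vector], beta: float) -> List[float]:
    """Cumulated responsibilities per node, divided by library size (the fractions sum to 1)."""
    totals = [0.0] * len(nodes)
    for t in library:
        for k, r in enumerate(responsibilities(t, nodes, beta)):
            totals[k] += r
    return [x / len(library) for x in totals]


def underrepresented_nodes(reference, candidates, nodes, beta, min_cand=0.2, max_ref=0.05):
    """Hypothetical selection rule: return the nodes that the candidate library
    (e.g. a vendor catalogue) populates well but the reference (in-house) library leaves nearly empty."""
    ref = node_fractions(reference, nodes, beta)
    cand = node_fractions(candidates, nodes, beta)
    return [k for k in range(len(nodes)) if cand[k] >= min_cand and ref[k] <= max_ref]


if __name__ == "__main__":
    # Toy 1D "descriptor space" with three map nodes; all values are made up.
    nodes = [[0.0], [5.0], [10.0]]
    in_house = [[0.1], [0.2], [4.9]]
    vendor = [[0.0], [9.8], [10.1], [10.2]]
    print(underrepresented_nodes(in_house, vendor, nodes, beta=1.0))  # [2]
```

Normalising by library size keeps an 8M-compound catalogue comparable to a much smaller in-house set. The thresholds are arbitrary, and AutoZoom's actual criteria for "heavily populated zones" may differ. The molecules in flagged zones are the ones whose chemotypes would then be inspected, for example by computing their maximum common substructures with a cheminformatics toolkit.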
Keywords: Generative topographic mapping; Chemical library diversity enrichment; Big data
Abbreviations
GTM: Generative topographic mapping
RBF: Radial basis function
AMS: Aldrich Market Select
MCS: Maximum common substructure
The authors thank Boehringer Ingelheim Pharma GmbH & Co KG for the provided data.
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
The project leading to this article has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 676434, “Big Data in Chemistry” (“BIGCHEM”, http://bigchem.eu).