Virtual high-throughput screening of in-house compound collections and vendor catalogs is a validated approach in the search for novel molecular entities. However, these libraries are small compared to the number of compounds that could be synthesized with the validated "wet" chemical reactions known in pharma companies or the public domain.

To overcome this limitation, we designed a large virtual combinatorial chemistry space, the KnowledgeSpace, from publicly available combinatorial libraries. It gives access to billions of synthetically accessible compounds. Combined with FTrees, a fuzzy similarity calculator, it lets the researcher search for analogues of one or several query molecules within a few minutes. The resulting compounds not only have properties similar to those of the query molecule(s), but are also annotated with the synthetic route by which they can be made. Because of FTrees' scaffold-hopping capabilities, the results can be expected to be diverse and to provide ideas for hit follow-up in novel compound classes.
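
To illustrate why a combinatorial description reaches billions of compounds from comparatively few building blocks, the following minimal sketch counts the products of a small, made-up space. It is not the KnowledgeSpace implementation: the reaction names and reagent counts are hypothetical, and the result is only an upper bound, since it ignores duplicate products and incompatible reagent pairs.

```python
from math import prod

# Hypothetical reactions (illustrative only): each entry lists the number
# of available reagents at each variable position of the reaction.
reactions = {
    "amide_coupling": [20_000, 15_000],
    "reductive_amination": [12_000, 18_000],
    "three_component_reaction": [1_000, 1_500, 800],
}


def space_size(reactions):
    """Upper bound on the number of products in a combinatorial space.

    Each reaction contributes the product of its reagent counts;
    the space is the sum over all reactions.
    """
    return sum(prod(counts) for counts in reactions.values())


def reagent_count(reactions):
    """Total number of reagents that actually have to be stored."""
    return sum(sum(counts) for counts in reactions.values())


if __name__ == "__main__":
    print(f"{reagent_count(reactions):,} reagents describe "
          f"up to {space_size(reactions):,} virtual products")
```

The space is stored as reactions plus reagent lists rather than as enumerated molecules, which is what keeps a space of this size manageable.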

In this contribution, we present the design and properties of the KnowledgeSpace and of other in-house chemistry spaces built on the same strategy, together with a validation of the results and a number of successful applications, including prospective ones.