Background & Summary

Strong interest in virtual library screening started to emerge in the 1990s, with the advent of combinatorial chemistry and parallel synthesis1. Since then, progress in the field has been incremental, and driven mainly by two factors: the growing size of chemical libraries, and the exponential increase in computational power enabling the screening of ever larger compound collections. Indeed, it is now established that virtually screening larger libraries leads to the discovery of better-fitting molecules for a given binding site2. A compounding factor is the emergence of deep learning methods that are expected to soon enable robust screening at speeds not accessible to physics-based approaches3.

Based on these observations, sustained efforts are ongoing to increase the size of the synthetically accessible chemical space. The main actors in the field include chemical vendors such as Enamine, WuXi, Otava Chemicals, and Mcule, all of which have catalogs containing billions of molecules. In particular, Enamine now offers a library of 6 billion make-on-demand compounds from its Enamine REAL database4, and 48 billion make-on-demand compounds from the REAL space5. A similar effort is under way in industry, where pharmaceutical companies are rapidly growing their searchable chemical space6. In the public sector, the Synthetically Accessible Virtual Inventory (SAVI) is composed of 1.75 billion compounds accessible with commercial reagents using a collection of 53 chemical reactions7.

While the reactions used by chemical vendors and SAVI are generally well known to medicinal chemists, taking advantage of the innovative chemistry invented in academic laboratories could open the gates to vast areas of chemical space that have so far been less accessible to drug discovery. As a pioneering example, an efficient synthetic scheme for tetrahydropyridines developed by Ellman and colleagues enabled the construction of a bespoke library of 75 million molecules focused on aminergic G-protein-coupled receptors and mostly absent from chemical catalogs. Virtual screening of this biased set led to the discovery of the first 5-HT2A receptor agonists with antidepressant activity8. Inspired by this approach, we initiated the enumeration of the Pan-Canadian Chemical Library (PCCL), in which chemical reactions developed by a growing network of academic chemistry groups across Canada are enumerated into a virtual screening-ready collection of compounds accessible from commercially available reagents in up to two synthetic steps.

The PCCL combines chemical reactions from the research groups of Prof. Robert Batey at the University of Toronto, Prof. Tabitha Wood at the University of Winnipeg, and Prof. Frederick West at the University of Alberta. Combined with compatible reagents from the ZINC database, these reactions generate more than 148 billion compounds synthesizable at any cost, and up to 401 million cheap compounds. Here, “cheap” compounds are those made exclusively from affordable in-stock building blocks listed in the ZINC database (tiers BB-50 and BB-40, see Methods). Among these more affordable molecules, 128 million satisfy the Lipinski and Veber druglikeness rules and can be queried and downloaded from the website https://pccl.thesgc.org.

This druglike and inexpensive collection is as diverse as commercial catalogs in terms of physicochemical properties, three-dimensionality, and chemical scaffolds, while its overlap with existing libraries is almost non-existent.

Opening virtual screening to molecules accessible via novel chemistry invented in the public or private sector could substantially expand the boundaries of the accessible chemical space in drug discovery and other fields. The Pan-Canadian Chemical Library showcases the potential of integrating academic ingenuity and in silico compound generation to extend the frontiers of chemical exploration. It may also serve as a valuable resource for the development of pharmacological modulators for every human protein by 2035, a goal set by the Target 2035 initiative to explore the unknown biology of the dark proteome and reveal novel opportunities for precision medicine9,10.

Methods

Chemical reactions

The pilot version of the PCCL was created from six unique chemical reactions. For each reaction, a set of information was requested as part of the workflow:

  • Inclusion patterns, determined from the 2D diagram of the chemical reaction in the form reagent A + reagent B -> reaction product, where the reagents are building blocks with specified functional groups involved in the chemical reaction.

  • Global exclusion patterns, to exclude functional groups or structures incompatible with the chemical reaction or reaction intermediates, to be applied to all reagents.

  • Reagent-specific exclusion patterns, to exclude incompatible functional groups or structures in each reagent, and to describe more precisely what is and is not allowed for each R-group.

Inclusion patterns, exclusion patterns, and the chemical reaction were encoded in SMARTS format, which enables the specification of chemical patterns for each atom or group of atoms. In addition, up to 40 global exclusion rules from the ZINC patterns11 were added systematically to avoid reactive and unstable functional groups. Before they were applied, any patterns corresponding to the functional groups involved in the reaction under study were removed from the list on a case-by-case basis.
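The following minimal RDKit sketch shows how inclusion and exclusion SMARTS could be applied to a candidate building block. The patterns are hypothetical placeholders and are not the PCCL filters, which are listed in Tables 1-3. The inclusion pattern is a primary aliphatic amine, the global exclusions are azides and acyl halides, and the reagent-specific exclusion is a carboxylic acid. Requiring exactly one reactive site is an additional assumption, made here to avoid ambiguous products.

```python
from rdkit import Chem

# Hypothetical patterns for illustration only (not the PCCL SMARTS).
INCLUSION = Chem.MolFromSmarts("[NX3;H2;!$(NC=O)][CX4]")  # primary aliphatic amine
GLOBAL_EXCLUSIONS = [
    Chem.MolFromSmarts("N=[N+]=[N-]"),     # azide
    Chem.MolFromSmarts("C(=O)[Cl,Br,I]"),  # acyl halide
]
REAGENT_EXCLUSIONS = [
    Chem.MolFromSmarts("[CX3](=O)[OX2H1]"),  # carboxylic acid
]


def is_compatible(smiles: str) -> bool:
    """Return True if the building block passes inclusion and exclusion filters."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    # Assumption: exactly one reactive site, so each reagent gives one product.
    if len(mol.GetSubstructMatches(INCLUSION)) != 1:
        return False
    # Reject the reagent if any global or reagent-specific exclusion matches.
    return not any(mol.HasSubstructMatch(p) for p in GLOBAL_EXCLUSIONS + REAGENT_EXCLUSIONS)
```

With these placeholder patterns, ethylamine passes, but acetamide, aniline, ethylenediamine (two reactive sites) and β-alanine (acid present) are rejected.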

Finally, for each reaction, up to 100 products were selected with a MaxMin diversity-picking algorithm based on 2048-bit ECFP4 fingerprints and the Tanimoto coefficient. This yielded a representative set of reaction products and their respective reagents. Chemists then visually inspected this set and flagged incompatible reagents, which led to additional exclusion filters. After two or three such curation cycles, no chemical outliers were found, and the full library was enumerated.
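The diversity selection step can be sketched with RDKit's `MaxMinPicker`. The sketch assumes Morgan fingerprints of radius 2 as the ECFP4 equivalent and uses RDKit's default Tanimoto distance. The function name, seed and input layout are illustrative choices, not the authors' script.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker


def pick_diverse(smiles_list, n_pick=100, seed=42):
    """Select up to n_pick diverse molecules with the MaxMin algorithm."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    # ECFP4 corresponds to a Morgan fingerprint of radius 2; 2048 bits as in the text.
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
    fps = [generator.GetFingerprint(m) for m in mols]
    n_pick = min(n_pick, len(fps))
    # LazyBitVectorPick computes Tanimoto distances on the fly, so no
    # full distance matrix has to be held in memory.
    picks = MaxMinPicker().LazyBitVectorPick(fps, len(fps), n_pick, seed=seed)
    return [smiles_list[i] for i in picks]
```

In a curation cycle, the returned products, together with the reagents they were built from, would form the set handed to chemists for visual inspection.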

Reactions from the Batey lab, University of Toronto, ON

Chemical reactions from the Batey lab produced β-keto-imides12,13, 5-amino-thiatriazoles14, and 5-amino-tetrazoles15,16. β-Keto-imide products were enumerated from dioxinones and primary and secondary amides (Fig. 1A, Table 1). Given the small number of commercially available dioxinones, we added an upstream one-component reaction to obtain them from β-keto acids, including O-tert-butyl, O-methyl, O-ethyl and O-benzyl protected acidic groups.

Fig. 1

Simplified chemical reaction schemes from the Batey lab.

Table 1 The Batey lab inclusion filters and chemical reactions as SMARTS strings.

5-Amino-thiatriazoles were enumerated from primary amines, secondary amines, or amino acid derivatives in a one-component chemical reaction (Fig. 1B, Table 1). This reaction involves a single variable reagent and led to a small collection of only 7,410 compounds commercially available in the ZINC20 database of 1.4 billion compounds17.

5-Amino-tetrazoles were virtually synthesized from primary or secondary amines and isothiocyanates (Fig. 1C, Table 1).

Reactions from the Wood lab, University of Winnipeg, MB

The reaction submitted by the Wood lab is the Truce-Smiles rearrangement, which generates aryl-containing products18,19,20 (Fig. 2, Table 2). In this reaction, the Ar group of reagent A can be any aromatic ring, and Z-H can be a primary amine, an alcohol, a thiol or a primary sulfonamide group. The R1–X group of reagent B represents an acyl halide group (chloride, bromide or iodide), with the carbon ideally positioned three to five consecutive atoms from the electron-withdrawing group (EWG). Given the configuration of reagents A and B, multiple SMARTS were developed. Reagent A was defined using either the primary amine, alcohol or thiol in the first case, or the primary sulfonamide in the second case. Reagent B was defined by the number of additional carbons between the acyl halide carbon and the central carbon, with 0 to 2 additional sp3 CH2 carbons. A further subdivision was required to distinguish reagents B in which R2 is a hydrogen atom, which lead to achiral compounds, from reagents with other R2 groups, which lead to chiral compounds.

Fig. 2

Simplified chemical reaction scheme from the Wood lab.

Table 2 Wood Research lab inclusion filters and chemical reactions as SMARTS strings.

A distinctive feature of the Truce-Smiles rearrangement is the inversion of the R1 group with the additional carbons in the final product19. As it was not possible to create a single SMARTS for all types of R1, 12 reaction SMARTS had to be created, one for each combination of the 2 conditions for reagent A and the 6 conditions for reagent B.
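The count of 12 reaction SMARTS follows from crossing the reagent conditions described above. There are 2 definitions of reagent A and 6 of reagent B: 0, 1 or 2 additional CH2 carbons, each with R2 either hydrogen or another group. The short Python sketch below enumerates these combinations. The labels and reaction IDs are hypothetical and do not correspond to the identifiers used in the PCCL files.

```python
from itertools import product

# Reagent A: nucleophile type (two SMARTS definitions in the text).
REAGENT_A = ["amine/alcohol/thiol", "primary sulfonamide"]
# Reagent B: number of extra CH2 carbons x nature of R2.
REAGENT_B = [(n_ch2, r2) for n_ch2 in (0, 1, 2) for r2 in ("H (achiral)", "non-H (chiral)")]

# One reaction variant per (A, B) combination; IDs are hypothetical.
REACTIONS = [
    {"reaction_id": f"TS{i:02d}", "reagent_A": a, "extra_CH2": n_ch2, "R2": r2}
    for i, (a, (n_ch2, r2)) in enumerate(product(REAGENT_A, REAGENT_B), start=1)
]

for rxn in REACTIONS:
    print(rxn["reaction_id"], rxn["reagent_A"], rxn["extra_CH2"], rxn["R2"])
```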

Reactions from the West lab, University of Alberta, AB

The reactions proposed by the West lab are [2 + 2]- and [4 + 2]-cycloadditions, generating bicyclooctenes and bridged tricyclic products via the generation of cyclic allenes21,22,23,24 (Fig. 3, Table 3). These reactions require the same reagent A: 1,2-acyloxycyclohexadienes. However, as this family of compounds is not commercially available in sufficient diversity, it is necessary to synthesize them upstream from anhydrides or acyl chlorides24.

Fig. 3

Simplified chemical reaction schemes from the West group.

Table 3 The West group inclusion filters and chemical reactions as SMARTS strings.

In the case of the [2 + 2]-cycloaddition, reagent B is a styrene or an electron-deficient olefin (Fig. 3A). To cover all possible cases, reagent B was separated into two categories, depending on whether it carries one substituent (1-substituted, with R2 as H) or two (1,1-disubstituted, with R2 ≠ H).

The case of the [4 + 2]-cycloaddition is more complex, as several families of reagent B can be accepted depending on the type of atom X in the five-membered ring (Fig. 3B). Reagent B can be a furan, a cyclopentadiene, or a pyrrole, where X is an oxygen, a carbon or a nitrogen-based substituent, respectively. In addition, some reagents may be incompatible if they are too sterically hindered at positions R1 and R3. To provide several sets of enumerated compounds according to their steric hindrance, all families of reagent B were divided into three categories: R1 and R3 are both hydrogen atoms, only one of R1 or R3 is a hydrogen atom, or neither is a hydrogen atom.

As a result of the many variations in reagents A and B, there are a total of 4 chemical reactions encoded into SMARTS strings for the [2 + 2]-cycloaddition, and a total of 6 for the [4 + 2]-cycloaddition.

Building blocks

We searched the ZINC database on the Arthor website (arthorbb.docking.org) to identify compatible building blocks for each chemical reaction17. This database, updated in the first quarter of 2022, classifies commercial building blocks into five groups based on their availability and price25.

  • The BB-50 group includes in-stock building blocks with the best combination of price and delivery speed.

  • The BB-40 group includes second tier in-stock building blocks.

  • The BB-30 group includes in-stock building blocks with information that cannot be accurately verified.

  • The BB-20 group includes make-on-demand building blocks, with delivery around 6 weeks and a price above 500 USD per 100 mg.

  • The BB-10 group includes make-on-demand building blocks with delivery around 6 weeks and a price above 1000 USD per 100 mg, as well as expensive in-stock building blocks.

To facilitate the process, we organized these groups into two categories, “cheap” and “expensive”. The cheap category includes affordable in-stock building blocks from groups BB-50 and BB-40. The expensive category includes the remaining building blocks from groups BB-30, BB-20, and BB-10. These are in-stock compounds whose information cannot be verified, make-on-demand building blocks, and expensive in-stock building blocks.

Building blocks downloaded from Arthor were then subjected to exclusion filters using RDKit26. In addition, building blocks were limited in size to 40 heavy atoms. All reagents were saved in SMILES format.
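The tier-to-category mapping and the 40-heavy-atom limit can be sketched as follows. The input layout (tuples of identifier, SMILES and tier) is a hypothetical choice, and the identifiers in the usage example are placeholders rather than real ZINC IDs. The SMARTS exclusion step is omitted for brevity.

```python
from rdkit import Chem

# ZINC availability tiers mapped to the two categories used in the text.
CATEGORY = {
    "BB-50": "cheap", "BB-40": "cheap",
    "BB-30": "expensive", "BB-20": "expensive", "BB-10": "expensive",
}
MAX_HEAVY_ATOMS = 40


def prepare_building_blocks(records):
    """Split (identifier, smiles, tier) records into cheap/expensive lists.

    Unparsable molecules and building blocks above 40 heavy atoms are dropped;
    SMILES are written in RDKit canonical form.
    """
    out = {"cheap": [], "expensive": []}
    for bb_id, smiles, tier in records:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None or mol.GetNumHeavyAtoms() > MAX_HEAVY_ATOMS:
            continue
        out[CATEGORY[tier]].append((bb_id, Chem.MolToSmiles(mol)))
    return out
```

For example, a 41-carbon alkane is dropped by the size limit, while a 40-carbon alkane from tier BB-40 lands in the cheap list.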

Enumeration and physicochemical descriptors

The 2D enumeration of the chemical libraries was performed using Python 3 scripts based on RDKit. Using the Python multiprocessing library, this step was run at large scale on computing resources from the Digital Research Alliance of Canada (DRAC). All reagent SMILES files were divided into groups of up to 2,000 building blocks, so that the enumeration could be distributed over 48 or 64 CPU threads depending on the DRAC cluster used. Physicochemical parameters were generated using the QED module27. Structural alerts were processed using the RDKit FilterCatalog module. We applied three sets of structural alerts:
- the pan-assay interference compounds (PAINS) filters, separated into three sets (PAINS A, PAINS B and PAINS C), to identify compounds that can interact non-specifically and give false positive results28;
- the Brenk filters, to flag unwanted functionality associated with potential toxicity or unfavorable pharmacokinetics29;
- the NIH filters, to annotate compounds with reactive or undesired functional groups as well as fluorescent compounds30,31.
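Below is a minimal sketch of chunked two-component enumeration with RDKit. It uses a textbook amide-coupling template purely for illustration; this is not one of the PCCL reactions. The product naming scheme is hypothetical. The tasks run serially here. In a setup like the one described, each (chunk A, chunk B) task could instead be dispatched to a worker with `multiprocessing.Pool.imap_unordered`.

```python
from itertools import islice

from rdkit import Chem
from rdkit.Chem import AllChem

CHUNK_SIZE = 2000  # building blocks per chunk, as in the text

# Illustrative amide-coupling template; NOT one of the PCCL reactions.
RXN_SMARTS = "[C:1](=[O:2])-[OD1].[N!H0:3]>>[C:1](=[O:2])[N:3]"


def chunks(items, size=CHUNK_SIZE):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def make_tasks(reagents_a, reagents_b):
    """Pair every chunk of reagent A with every chunk of reagent B."""
    return [(ca, cb) for ca in chunks(reagents_a) for cb in chunks(reagents_b)]


def enumerate_chunk(task):
    """Enumerate all A x B products for one pair of (id, smiles) reagent chunks."""
    chunk_a, chunk_b = task
    rxn = AllChem.ReactionFromSmarts(RXN_SMARTS)
    mols_b = [(id_b, Chem.MolFromSmiles(s)) for id_b, s in chunk_b]
    mols_b = [(id_b, m) for id_b, m in mols_b if m is not None]
    products = set()
    for id_a, smi_a in chunk_a:
        mol_a = Chem.MolFromSmiles(smi_a)
        if mol_a is None:
            continue
        for id_b, mol_b in mols_b:
            for prods in rxn.RunReactants((mol_a, mol_b)):
                prod = prods[0]
                try:
                    Chem.SanitizeMol(prod)
                except Exception:
                    continue  # skip chemically invalid outcomes
                # Hypothetical naming scheme: "<reagent A id>_<reagent B id>"
                products.add((f"{id_a}_{id_b}", Chem.MolToSmiles(prod)))
    return sorted(products)
```

Splitting both reagent lists into fixed-size chunks keeps each task's memory bounded. It also gives a task count that can be matched to the number of available CPU threads.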

Some of these physicochemical parameters were used to apply druglikeness rules, including Lipinski’s rule of five32 and Veber’s rule33. The output included all the parameters used to define the druglike subset, as well as Fsp3, QED27, structural alerts, InChIKey and reagent identifiers.
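The annotation step could look like the following sketch. It reads MW, ALogP, HBA, HBD, PSA and rotatable bonds from `QED.properties` and applies the Lipinski cutoffs (MW ≤ 500, logP ≤ 5, HBD ≤ 5, HBA ≤ 10) and the Veber cutoffs (rotatable bonds ≤ 10, PSA ≤ 140 Å²). Two assumptions are worth flagging. First, no rule violations are tolerated. Second, QED's own acceptor and rotatable-bond definitions are used, and the PCCL scripts may count these differently.

```python
from rdkit import Chem
from rdkit.Chem import QED, rdMolDescriptors
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

ALERT_SETS = ("PAINS_A", "PAINS_B", "PAINS_C", "BRENK", "NIH")


def _build_catalog(name):
    params = FilterCatalogParams()
    params.AddCatalog(getattr(FilterCatalogParams.FilterCatalogs, name))
    return FilterCatalog(params)


# One catalog per alert set so that each flag can be reported separately.
CATALOGS = {name: _build_catalog(name) for name in ALERT_SETS}


def annotate(smiles):
    """Compute druglikeness flags and descriptors for one product (sketch)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    p = QED.properties(mol)  # namedtuple: MW, ALOGP, HBA, HBD, PSA, ROTB, AROM, ALERTS
    lipinski = p.MW <= 500 and p.ALOGP <= 5 and p.HBD <= 5 and p.HBA <= 10
    veber = p.ROTB <= 10 and p.PSA <= 140
    return {
        "druglike": lipinski and veber,
        "MW": p.MW, "logP": p.ALOGP, "HBD": p.HBD, "HBA": p.HBA,
        "TPSA": p.PSA, "RotB": p.ROTB,
        "Fsp3": rdMolDescriptors.CalcFractionCSP3(mol),
        "QED": QED.qed(mol),
        "InChIKey": Chem.MolToInchiKey(mol),
        "alerts": [name for name, cat in CATALOGS.items() if cat.HasMatch(mol)],
    }
```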

Additional modules were developed for three further analyses: Bemis-Murcko scaffolds, to assess scaffold and structural diversity34; principal moments of inertia with the normalized ratios NPR1 and NPR2, to assess the shape of the compounds35; and partitioning of InChIKeys into several files, for chemical identity searches against other databases.

The principal moments of inertia were computed following the method described by Irwin et al.11. The RDKit distance-geometry-based conformer generator EmbedMolecule was used to quickly obtain three-dimensional conformations, and the rdMolDescriptors module generated the NPR1 and NPR2 parameters. The data were then binned into a 200 × 200 grid using the pandas and NumPy libraries to reduce data volume and facilitate plotting.

The Bemis-Murcko scaffolds were generated using the MurckoScaffold.GetScaffoldForMol function from RDKit. Statistical analysis and overlap between libraries were computed using the pandas library36.

Finally, the InChIKeys were partitioned by writing them into different directories and files according to the number of heavy atoms and the first two letters of the InChIKey. The presence or absence of each compound in another library was then verified using the grep command, called from a Python 3 script running in parallel on up to 64 CPU threads.
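The PMI step can be sketched as below. One conformer is embedded with `EmbedMolecule`, using a fixed seed for reproducibility (an assumption on our part). NPR1 and NPR2 are then read from `rdMolDescriptors`, and the points are binned on a 200 × 200 grid. Here `numpy.histogram2d` stands in for the pandas/NumPy binning described in the text, and bins are expressed as percentages of the library, as in the shape-distribution figure.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolDescriptors


def npr_ratios(smiles, seed=0xF00D):
    """Return (NPR1, NPR2) for one embedded conformer, or None on failure."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        return None  # distance-geometry embedding failed
    return rdMolDescriptors.CalcNPR1(mol), rdMolDescriptors.CalcNPR2(mol)


def bin_npr(points, n_bins=200):
    """Bin (NPR1, NPR2) pairs on an n_bins x n_bins grid over [0, 1] x [0, 1]."""
    arr = np.asarray(points, dtype=float)
    counts, _, _ = np.histogram2d(
        arr[:, 0], arr[:, 1], bins=n_bins, range=[[0.0, 1.0], [0.0, 1.0]]
    )
    return 100.0 * counts / counts.sum()  # percentage of the library per bin
```

Because NPR1 = I1/I3 and NPR2 = I2/I3 with I1 ≤ I2 ≤ I3, every point satisfies NPR1 ≤ NPR2 and NPR1 + NPR2 ≥ 1. A planar molecule such as benzene sits near (0.5, 0.5).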

PostgreSQL/RDKit data management and website development

All cheap and druglike compounds from the PCCL were enumerated and stored in a PostgreSQL database with the RDKit cartridge. Once molecules are imported in SMILES format, such a database can efficiently generate a wide range of molecular descriptors, run substructure and similarity searches using fingerprints that it also calculates, and generate 2D depictions in SVG format.

Cheap druglike PCCL compounds were imported into the database from CSV files. Each record included the SMILES string, the identifier assigned to the compound during enumeration, the physicochemical parameters used to filter for druglikeness, the calculated Fsp3 and QED, and the ZINC identifiers of the reagents. For practicality and scalability, each chemical reaction was stored in a separate table.

A website available at https://pccl.thesgc.org/ was developed using a combination of HTML, JavaScript and PHP to make the cheap and druglike compound database accessible to the scientific community. Users can visualize and download, in SMILES format, any list of compounds satisfying their specified structural queries (drawn with the JavaScript applet JSME Molecule Editor37) and physicochemical or QED descriptor restrictions. Descriptor statistics and plots for all chemical reactions are also available, generated with the JavaScript charting library Chart.js38.

Data Records

The 127.5 million druglike compounds of the Pan-Canadian Chemical Library that are affordable to synthesize can be explored at https://pccl.thesgc.org/ and downloaded from Zenodo39 at https://zenodo.org/records/11371919. The PCCL library hosted on Zenodo is split by reaction, then by number of heavy atoms. Two types of files are available in zip archives:

  • SMILES files (tab-delimited), containing the SMILES string and product name of each compound.

  • CSV files (comma-delimited), containing all the information generated during enumeration: ZINC IDs of reagents, druglike properties, and purchasability.

The purchasability value takes one of two integer values:

  • 1 for products only composed of BB-50 reagents.

  • 2 for products composed of at least one BB-40 reagent, in combination with one BB-50 or one BB-40.
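The purchasability rule above can be written as a small function. The tier labels follow the ZINC groups described in the Methods. Returning None for products that use any non-cheap reagent is an assumption, since such products are not part of the cheap subset.

```python
def purchasability(reagent_tiers):
    """Return 1 or 2 for cheap products, None otherwise (sketch)."""
    tiers = set(reagent_tiers)
    if not tiers:
        raise ValueError("at least one reagent tier is required")
    if not tiers <= {"BB-50", "BB-40"}:
        return None  # uses a building block outside the cheap category
    return 2 if "BB-40" in tiers else 1
```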

Detailed inclusion and exclusion filters, as well as the encoded chemical reactions, are all available in the GitHub repository https://github.com/cbedart/PCCL in the “PCCL_reactions” section. For each chemical reaction, two types of files are available.

The “reagents” text files, with the names formatted as “REACTION_Reagents.txt”, containing:

  • The synthon SMARTS with an associated Synthon ID for each type of reagent used.

  • The symmetric synthon SMARTS filter in the case of symmetrical reagents.

  • Synthon-specific exclusion SMARTS filters for each Synthon ID.

  • Reaction tags for each Synthon ID.

The “reactions” text files, with the names formatted as “REACTION_Reactions.txt”, containing the reaction SMARTS, an associated Reaction ID, and the mapping of the chemical reactions using the reaction tags for each Synthon ID defined in the “REACTION_Reagents.txt” file.

Using these files, all 148 billion compounds can be re-enumerated.

Technical Validation

Composition of the database

The construction of the pilot version of the Pan-Canadian Chemical Library initially focused on six product classes: β-keto-imides, 5-amino-thiatriazoles, 5-amino-tetrazoles, Truce-Smiles reaction products, bicyclooctenes, and bridged tricyclics. The library enumeration was based on the ZINC building blocks available via the Arthor database. There, a total of 165.2 million compatible building blocks with a maximum of 40 heavy atoms were identified, including 1.9 million low-cost compounds. After applying the exclusion rules defined above, reagents incompatible with each chemical reaction were removed. This left a total of 76.8 million compatible building blocks, 736,639 of which were low-cost (Table 4). Building block availability was highly variable across chemical reactions, ranging from 305 reagent Bs for Truce-Smiles reactions to 40,091,545 reagent As for 5-amino-thiatriazoles.

Table 4 Number of commercially available building blocks for each chemical reaction from Arthor database, after filtering with exclusion filters.

Using commercially available building blocks, a total of 148 billion compounds were enumerated, including 401 million cheap compounds (Table 5).

Table 5 Number of enumerable compounds for each chemical reaction.

Enumeration of Cheap/Druglike subsets

A druglike library of 127.5 million compounds accessible with cheap reagents was compiled using the Lipinski and Veber rules described above, stored in a PostgreSQL/RDKit database, and made available at https://pccl.thesgc.org (Table 6). The distribution of physicochemical descriptors varies depending on the chemical reaction used to enumerate the library (Fig. 4A). In particular, [2 + 2]- and [4 + 2]-cycloadditions produce larger compounds due to the large core scaffolds created during the reactions. At the opposite end of the molecular weight spectrum, 5-amino-thiatriazoles are smaller, as they involve a single building block.

Table 6 Number of cheap and druglike compounds for each chemical reaction.
Fig. 4

(A) Distribution of physicochemical descriptors for each enumerated library. (B) Distribution of the main physicochemical parameters for the druglike subset of the PCCL (in green), Enamine REAL (in orange), and druglike-filtered SAVI 2020 (in blue) databases.

Comparison with Enamine REAL and SAVI databases

The main goal of the PCCL is to open new chemical spaces not covered by existing chemical libraries for applications in chemical biology, drug discovery or other fields. To evaluate its chemical diversity, we compared this first version of the PCCL with two ultra-large libraries: the commercial Enamine REAL and the academic Synthetically Accessible Virtual Inventory (SAVI) (Table 7). We used the June 2023 version of Enamine REAL, containing 6 billion druglike molecules. We also used the April 2020 version of SAVI, a library of 1.75 billion compounds developed by the NIH National Cancer Institute. Since not all SAVI compounds were druglike, we filtered the library with the same scripts and rules used to create the druglike subset of the PCCL, leading to a SAVI library of 1.4 billion molecules.

Table 7 Data for Enamine REAL and SAVI 2020 databases compared to the cheap and druglike subset of the PCCL.

Using RDKit filter catalogs, we evaluated the proportion of compounds flagged as problematic in each chemical library. The percentages of compounds flagged by the various filters were similar for the PCCL and SAVI, while Enamine REAL fared better on the various structural alerts. For instance, 2.55% of PCCL compounds and 2.77% of SAVI compounds were flagged as PAINS, compared with 0.29% of Enamine REAL compounds (Table 7 - Filters).

Physicochemical statistics

Using the same methods as above, we compared the distribution of the main physicochemical descriptors across the different libraries. We observed a marked difference in molecular weight distribution between Enamine REAL, SAVI, and the PCCL. Enamine REAL appears to offer a large majority of compounds with a molecular weight below 400 Da. These can be functionalized in hit-to-lead processes while remaining within the limits of Lipinski’s rule of five. The filtered SAVI library also features a majority of compounds below 400 Da. An analysis of the building blocks listed on the SAVI website indicates that this distribution results from the use of small building blocks, with an average weight of 212 Da and 13.5 heavy atoms40. While still satisfying the Lipinski and Veber rules, compounds from the PCCL are larger. They are synthesized from building blocks with an average weight between 230 and 290 Da and an average heavy atom count between 16 and 20, depending on the chemical reaction (Fig. 4B). Molecules with lower molecular weight are typically better chemical starting points for lead optimization, but larger compounds may be necessary to generate hits for challenging proteins with shallow binding sites. Importantly, the number of hydrogen-bond donors in PCCL compounds remains low. This is essential because, unlike the other Lipinski boundaries, the maximum of five hydrogen-bond donors is a limit that cannot be transgressed41.

Three-dimensional properties

The three-dimensional shapes of the compounds in each chemical library were analyzed using the normalized principal moments of inertia (PMI) ratios NPR1 and NPR235 (Fig. 5). This yields 2D plots in which the top-left corner represents one-dimensional rod-like molecules, the bottom is populated with planar, disc-like compounds, and the top-right corner is filled with three-dimensional, sphere-like molecules. The PCCL covers the same rod-shaped and disc-shaped regions as the other libraries, along the left side of the PMI triangle. The main benefit of the PCCL compared to Enamine REAL is its proportionally greater coverage of highly three-dimensional chemical space, which has historically been underrepresented, toward the sphere-shaped region at the top-right corner.
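One simple way to summarize where a compound falls in the PMI triangle is sketched below; it is not necessarily the approach used for the shape-distribution figure. Each compound is assigned to the nearest vertex: rod (0, 1), disc (0.5, 0.5) or sphere (1, 1). Its distance from the rod-disc edge is measured as NPR1 + NPR2 - 1, which is 0 along that edge and 1 at the sphere vertex.

```python
import math

# Vertices of the normalized PMI triangle as (NPR1, NPR2).
VERTICES = {"rod": (0.0, 1.0), "disc": (0.5, 0.5), "sphere": (1.0, 1.0)}


def shape_class(npr1, npr2):
    """Label a molecule by the closest vertex of the PMI triangle."""
    return min(VERTICES, key=lambda name: math.dist((npr1, npr2), VERTICES[name]))


def three_dimensionality(npr1, npr2):
    """0 on the rod-disc edge (NPR1 + NPR2 = 1), 1 at the sphere vertex."""
    return npr1 + npr2 - 1.0
```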

Fig. 5

Molecular shape distribution of PCCL, Enamine REAL, and druglike-filtered SAVI, from a principal moments of inertia analysis leading to the calculation of the normalized PMI ratios NPR1 and NPR2. The color of each pixel indicates the percentage of the database falling in that bin.

Chemical diversity and novelty

To assess the chemical diversity of the cheap and druglike PCCL, its Bemis-Murcko scaffold composition34 was compared to that of other libraries (Table 8). The PCCL and SAVI druglike collections had on average 14 compounds per Bemis-Murcko scaffold, compared with 17 for Enamine REAL. This reflects a modest increase of about 20% in scaffold diversity for the PCCL and SAVI libraries. The difference correlates with the average number of compounds produced per reaction, which is 21.4 million for the PCCL and 36.2 million for Enamine REAL. It also indicates that, on average, a slightly wider range of analogs should be available for any given hit compound from Enamine REAL. However, if a hit is identified in the cheap PCCL, analogs could also be sought in the much larger set of ~148 billion less affordable PCCL compounds. The Bemis-Murcko scaffold composition of this larger collection was not analyzed due to its overwhelming size. Because it is generated from the same six chemical reactions, however, we expect it to include a wide range of analogs for any molecule from the cheap and druglike PCCL set.
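In principle, the compounds-per-scaffold figure can be reproduced with RDKit's `MurckoScaffold.GetScaffoldForMol`, as in the sketch below. Counting acyclic molecules under a shared empty scaffold is an assumption here, since the text does not say how they were handled. As a check on the rounded averages quoted above, 1/14 ≈ 0.071 scaffolds per compound versus 1/17 ≈ 0.059 gives a ratio of about 1.21, consistent with the ~20% difference.

```python
from collections import Counter

from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_smiles(smiles):
    """Canonical SMILES of the Bemis-Murcko scaffold ('' for acyclic molecules)."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(mol))


def scaffold_stats(smiles_list):
    """Count scaffolds and the average number of compounds per scaffold."""
    counts = Counter(scaffold_smiles(s) for s in smiles_list)
    return {
        "n_compounds": len(smiles_list),
        "n_scaffolds": len(counts),
        "compounds_per_scaffold": len(smiles_list) / len(counts),
        "counts": counts,
    }


def scaffold_overlap_pct(scaffolds_a, scaffolds_b):
    """Percentage of unique scaffolds of library A also present in library B."""
    a, b = set(scaffolds_a), set(scaffolds_b)
    return 100.0 * len(a & b) / len(a)
```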

Table 8 Bemis-Murcko Scaffolds for the druglike subset of the PCCL, Enamine REAL, and druglike-filtered SAVI 2020 databases.

The chemical novelty of the PCCL was first assessed by calculating the overlap of its Bemis-Murcko scaffolds with those of the other libraries (Table 9). The overlap in chemical scaffolds is negligible: 0.29% of scaffolds found in the cheap and druglike PCCL are also found in Enamine REAL, and 0.25% in the druglike SAVI collection. This is in contrast with the significant overlap between the other two libraries, where 21.57% of SAVI scaffolds are also found in Enamine REAL.

Table 9 Comparison of the number of Bemis-Murcko Scaffolds shared between two chemical libraries.

To confirm the chemical novelty of the PCCL, we used InChIKey representations of the molecules to determine the presence or absence of each fully enumerated cheap and druglike PCCL compound in the Enamine REAL and druglike SAVI collections (Table 10). This analysis reinforced the previous results: only 21,581 of 128,207,251 PCCL compounds can be found in Enamine REAL, and only 33,050 in SAVI, an overlap below 0.03% in both cases. Limitations in computing power precluded us from comparing the SAVI set with the 6 billion REAL compounds. However, we were able to conduct the analysis with the 2020 version of Enamine REAL, which contains 1.2 billion compounds. Here, we found 142.8 million identical molecules, representing an overlap of 11.9% between the Enamine REAL and SAVI libraries. This probably reflects the fact that many of the chemical reactions used to generate SAVI, such as those in Hartenfeller’s collection42, also underlie the Enamine collection. Together, these results confirm that a library such as the PCCL, derived from chemical reactions that are underexplored in medicinal chemistry, opens up a novel and diverse chemical space for drug discovery.
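The partitioned identity search can be sketched in Python, with in-memory sets standing in for the grep-based lookup used in the study. The directory and file naming (`HA<heavy atoms>/<first two letters>.txt`) is hypothetical. Partitioning works because an identical molecule has the same heavy atom count and InChIKey prefix in both libraries, so each lookup only touches one small file. As a check on the reported figures, 21,581 / 128,207,251 ≈ 0.017% and 33,050 / 128,207,251 ≈ 0.026%, both below 0.03%.

```python
from collections import defaultdict


def partition_key(inchikey, n_heavy):
    """Hypothetical file layout: HA<heavy atoms>/<first two InChIKey letters>.txt"""
    return f"HA{n_heavy:02d}/{inchikey[:2]}.txt"


def build_index(entries):
    """Group (InChIKey, heavy atom count) pairs of a reference library by partition."""
    index = defaultdict(set)
    for inchikey, n_heavy in entries:
        index[partition_key(inchikey, n_heavy)].add(inchikey)
    return index


def count_shared(query_entries, reference_index):
    """Count query compounds whose InChIKey is found in the matching partition."""
    return sum(
        1
        for inchikey, n_heavy in query_entries
        if inchikey in reference_index.get(partition_key(inchikey, n_heavy), set())
    )
```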

Table 10 Comparison of the compounds shared between two chemical libraries.

Synthesis success rate

The average success rate for the chemical synthesis of PCCL compounds has not yet been established. We anticipate that in some cases (such as the reactions from the Batey lab above), it is close to the ~80% success rate reported by commercial vendors43,44, but we expect that it will vary from one reaction to another. A mechanism that may be implemented in the future would be to synthesize 50 or more representative compounds to experimentally evaluate the synthesis success rate before any new reaction is added to the PCCL.
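A Wilson score interval, one standard choice for binomial proportions, can estimate the uncertainty on the success rate observed in a 50-compound pilot. For a hypothetical outcome of 40 successes out of 50 (an 80% rate), the 95% interval spans roughly 67% to 89%. A pilot of this size can therefore separate reactions near the vendor benchmark from clearly weaker ones, but it cannot resolve small differences.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half
```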

Usage Notes

We envision that the primary use of the PCCL will be the discovery of hit molecules for challenging target classes where other libraries have failed to deliver a chemically tractable hit. As more chemical reactions underexplored in medicinal chemistry are incorporated, we expect that the PCCL will grow to trillions of molecules. The more limited cheap and druglike collection will probably reach billions of compounds. Given the low experimental confirmation rate of computational hit candidates, we anticipate that primary virtual screening will focus on this smaller, more affordable set, while hit expansion could benefit from the full PCCL collection.

Even with relatively modest computing resources, modern AI-accelerated or synthon-based virtual screening techniques are well suited to screening such ultra-large libraries. In synthon-based approaches, the synthons, rather than the combinatorially enumerated library, are screened and then assembled. One example is hierarchical structure-based screening, introduced by Zhou et al. in 200945 and popularized by the V-SYNTHES software developed by Sadybekov et al. in 202146. To facilitate the application of synthon-based screening to the PCCL, we developed SATELLiTES (Synthon-based Approach for the Targeted Enumeration of Ligand Libraries and Expeditious Screening), freely available software hosted at https://github.com/cbedart/SATELLiTES (to be published). SATELLiTES takes chemical reactions in SMARTS format as input and generates virtual-screening-ready collections of commercially available synthons in which the reactive functional group is replaced by a simple chemotype of choice, such as a methyl group. It then automatically combines synthon hit candidates into small collections of fully enumerated molecules for rapid virtual screening.
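RDKit's `ReplaceSubstructs` can illustrate the synthon-preparation idea of replacing a reactive functional group with a methyl cap. This is a generic sketch, not the SATELLiTES implementation or API. The SMARTS patterns in the usage example (the hydroxyl of a carboxylic acid and a primary aliphatic amine) are hypothetical choices.

```python
from rdkit import Chem


def cap_reactive_group(smiles, reactive_smarts, cap_smiles="C"):
    """Replace every match of reactive_smarts with a cap (methyl by default)."""
    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(reactive_smarts)
    cap = Chem.MolFromSmiles(cap_smiles)
    # replaceAll=True returns a single molecule with all matches replaced.
    capped = Chem.ReplaceSubstructs(mol, patt, cap, replaceAll=True)[0]
    Chem.SanitizeMol(capped)  # recompute implicit hydrogens on the new atoms
    return Chem.MolToSmiles(capped)
```

For example, capping the acid hydroxyl of benzoic acid with `[OX2H1;$(OC=O)]` gives acetophenone, and capping the amine of benzylamine gives ethylbenzene.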

We hope that the PCCL will prove a successful and convincing paradigm for opening uncharted areas of chemical space to virtual screening. In this paradigm, chemical reactions developed in academia or industry that are typically overlooked in large commercial libraries are used to build the library, with potential applications in drug discovery, materials science and other fields. Our choice to focus here on Canadian chemistry groups was meant to facilitate operations and was driven by the nationally fragmented nature of academic funding mechanisms. The process could, however, in principle be expanded across borders. Ideally, future breakthroughs in computational hit prediction, possibly driven by artificial intelligence and revealed by benchmarking challenges such as CACHE47, will turn this novel library screening paradigm into a well-established modus operandi.