Plasmodium species cannot salvage preformed pyrimidine bases for nucleic acid synthesis as their human hosts can. Dihydroorotate dehydrogenase (DHODH) is a critical enzyme in the de novo pyrimidine synthesis pathway in the parasite and, thus, a potential target for antimalarial drug therapies [7, 8]. The enzyme from Plasmodium species is located in the mitochondrion and utilizes ubiquinone (also known as coenzyme Q) as a cosubstrate.
Predicting DHODH inhibition
Data on PfDHODH inhibition were collected from the literature for a structurally diverse group of compounds. These included triazolopyrimidines [7, 9], diphenylureas, and 2-cyano-3-hydroxy acrylanilides, among others [12,13,14,15]. Quantitative data were not available for inhibition of the intact enzyme, which is bound to the mitochondrial membrane by its hydrophobic N-terminus in vivo. A total of 89 of those compounds had been assayed against the soluble recombinant form of PfDHODH (s-PfDHODH) from which the hydrophobic “tail” has been removed. The very low solubility of CoQ-10 limits its utility in vitro, so an artificial ubiquinone (dodecylubiquinone [DQ]) is usually used as substrate instead.
The IC50 data for these 89 compounds were converted to inhibition constants (Ki values) by assuming competitive inhibition against DQ and applying the method described by Cheng and Prusoff. Two artificial neural network ensemble (ANNE) regression models (henceforth referred to as Model A and Model B) were constructed using the ADMET Modeler™ module of ADMET Predictor®. The models were built with distinct 20% hold-out test sets (see Supplementary Methods for details). That and the use of different random number seeds led to somewhat different descriptor sets, complexities and performance statistics (Table 1). The two models complemented each other well, with Model A yielding more conservative activity estimates (fewer false positives) and Model B being more sensitive (fewer false negatives). Hence, using them together to identify potent compounds worked better than using either model alone. Plots of observed versus predicted values showed that both models adequately predicted the Ki values for the training set used to build the models and for the test set used to validate them (Fig. 1).
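The IC50-to-Ki conversion can be illustrated with a minimal sketch of the Cheng-Prusoff relationship for a competitive inhibitor, Ki = IC50 / (1 + [S]/Km). The function name and the numerical values below are hypothetical and chosen only for illustration; the actual DQ concentrations and Km values used in the source assays are not given in this section.

```python
def cheng_prusoff_ki(ic50: float, substrate_conc: float, km: float) -> float:
    """Convert an IC50 to a Ki, assuming purely competitive inhibition.

    Implements Ki = IC50 / (1 + [S]/Km). All three arguments must be
    expressed in the same concentration units; Ki is returned in those units.
    """
    if ic50 <= 0 or km <= 0 or substrate_conc < 0:
        raise ValueError("ic50 and km must be positive; substrate_conc must be non-negative")
    return ic50 / (1.0 + substrate_conc / km)


# Hypothetical assay values (µM), for illustration only; not taken from the study.
ki = cheng_prusoff_ki(ic50=0.30, substrate_conc=20.0, km=10.0)
print(f"Ki = {ki:.3f} µM")
```

Because the correction factor depends on the substrate concentration relative to Km, IC50 values measured under different assay conditions only become comparable after this kind of conversion.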
Selecting a lead series
The developed QSAR models were used to analyze structures for 13,533 antimalarials identified in a phenotypic screen of nearly 2 million compounds that was carried out by GlaxoSmithKline (GSK). The published data included P. falciparum growth inhibition for the chloroquine-sensitive strain 3D7 (PubChem AID 2306) as well as for the multidrug-resistant strain Dd2 (PubChem AID 2302). Data from a counter screen for mammalian cytotoxicity were available from PubChem AID 2303.
There were some important limitations in working with this data set. First, evaluating efficacy against parasites cultured in human erythrocytes provides no direct information on a compound’s mode of action. Second, only results for the active compounds (those that inhibited parasite growth by at least 80% at a concentration of 2 μM) were reported. The lack of data on inactives posed a considerable risk of synthesis resources being wasted on seemingly “novel” analogs that in fact had already been synthesized but not reported due to their lack of activity. Lastly, many of the compounds in the screening data set exhibited undesirable physical properties and/or undesirable substructures (e.g., multiple halogenated thiophene rings) that would cause them to be avoided by many pharmaceutical companies.
To identify candidate lead series, the active compounds in the phenotypic screening data set were grouped into structural classes in MedChem Studio™ based on shared substructures. Unlike most other approaches, the method used in MedChem Studio does not cluster compounds (in this case, actives) into exclusive subsets based on the degree of pairwise overlap in their substructural fingerprints. Rather, it generates nonexclusive classes of molecules that share a maximal common substructure. In most cases, this shared substructure is itself a scaffold in the medicinal chemistry sense. If not, it can readily be modified to become one. The nonexclusivity is advantageous for lead series identification because it typically makes it easier to see cases where overlapping classes can profitably be combined by merging the scaffolds that define them.
Classes were evaluated as potential lead series based on the QSAR-predicted inhibitory potencies against PfDHODH and the degree of parasite growth inhibition observed in the phenotypic screen for chloroquine-sensitive P. falciparum 3D7 (PubChem AID 2306). ADMET Risk™, which indicates how many of an array of ADMET property rules are violated by a compound, was also taken into consideration. The rule thresholds are calibrated against a subset of the World Drug Index (WDI) that is enriched in orally delivered commercial drugs similar to that used by Lipinski et al. for anticipating solubility and oral absorption problems. Ninety percent of the WDI reference set have an ADMET Risk score less than seven.
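The idea of a score that counts violated property rules can be sketched as follows. This is not the ADMET Risk rule set, whose rules and thresholds are not listed in this section; the four rules below are the familiar Lipinski rule-of-five thresholds, used purely as illustrative stand-ins, and the property names and example values are hypothetical.

```python
from typing import Callable, Dict, List

# Illustrative stand-in rules (Lipinski rule-of-five thresholds).
# These are NOT the ADMET Risk rules; they only demonstrate the counting idea.
RULES: Dict[str, Callable[[dict], bool]] = {
    "molecular weight > 500": lambda p: p["mol_weight"] > 500,
    "logP > 5": lambda p: p["logp"] > 5,
    "H-bond donors > 5": lambda p: p["hbd"] > 5,
    "H-bond acceptors > 10": lambda p: p["hba"] > 10,
}


def violated_rules(props: dict) -> List[str]:
    """Return the names of all rules that the compound's properties violate."""
    return [name for name, is_violated in RULES.items() if is_violated(props)]


def risk_score(props: dict) -> int:
    """Score a compound as the number of violated rules (lower is better)."""
    return len(violated_rules(props))


# Hypothetical compound properties, for illustration only.
example = {"mol_weight": 540.0, "logp": 6.1, "hbd": 2, "hba": 6}
print(risk_score(example), violated_rules(example))
```

Reporting which rules were violated alongside the count is what makes such a score useful for design: it points to the specific liabilities that a structural change would need to address.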
Low group averages were preferred for both predicted Ki and ADMET Risk, but the spread in each profile was also important. Lack of variation in potency indicates a “lack of SAR”, which makes it difficult to enhance activity against the target by modifying structure. Similarly, a lack of variation in ADMET Risk suggests that it will be difficult to engineer out potential liabilities by changing structures. The exception would be a class wherein all ADMET Risk scores are very low, and that was never the case in this dataset.
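The two criteria described above, a low class average and a usable spread, can be summarized per class with a short sketch such as the one below. The class names and all numbers are hypothetical; the standard deviation is used here as one possible measure of spread, which is an assumption rather than the measure used in the study.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Return the average and simple spread measures for one property of a class."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation as the spread measure
        "min": min(values),
        "max": max(values),
    }


# Hypothetical predicted values for two structural classes, for illustration only.
classes = {
    "class_1": {"pred_ki_uM": [0.05, 0.40, 2.0, 0.09], "admet_risk": [3, 6, 8, 5]},
    "class_2": {"pred_ki_uM": [0.30, 0.32, 0.29, 0.31], "admet_risk": [7, 7, 7, 7]},
}

for name, props in classes.items():
    for prop, values in props.items():
        s = summarize(values)
        print(f"{name} {prop}: mean={s['mean']:.2f} stdev={s['stdev']:.2f} "
              f"range={s['min']}-{s['max']}")
```

In this toy data, class_2 shows almost no variation in either property, which is the "lack of SAR" situation: there is little evidence that structural changes within the class move potency or risk in either direction.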
Results for three classes at an intermediate stage of the series definition process are shown in Fig. 2. Two of the classes—diphenyl ureas (DPUs) and triazolopyrimidines (Tzs)—represent PfDHODH inhibitor lead series that had already been explored or were being actively explored by other research groups [24, 27].
The 2-(3-aminopropylamino)-4-quinolone (APAQ) class was particularly promising because its scaffold bore a distinct resemblance to ubiquinone, which is a cosubstrate for PfDHODH in vivo (Fig. 3). The APAQs also resembled N-hydroxy-2-dodecyl-4-quinolone (HDQ) (Fig. 3), which Dong et al. had determined to have good antimalarial activity, with PfDHODH as its primary target. Although they reported that HDQ inhibited parasite growth in culture and were able to show that PfDHODH was the site of action, they did not have access to soluble enzyme. Hence the XC50 obtained in culture could not be translated into an in vitro Ki for the enzyme. HDQ was therefore not included in the data set used to build the PfDHODH inhibition models. Two quinolones found among the “hits” reported by Patel et al. were initially grouped with some of the APAQs. These lack the defining 2-amino substitution and so did not appear in the final lead series, which comprised 113 actives.
None of the APAQs were reported to be cytotoxic to the mammalian HepG2 cells used in the companion AID 2303 screen, which suggests that the intrinsic risk of mammalian toxicity is small. That said, many of them had discouragingly high ADMET Risk scores. In particular, the two most active examples (Fig. 3) violated 7 and 9 of the standard ADMET Risk rules, respectively. Both are large and were predicted to be excessively hydrophobic, to be highly bound in plasma, to exhibit low water solubility, and to be acutely toxic to rats. In addition, CID 44535189 has an excessively large number of rotatable bonds and was predicted to exhibit chronic toxicity in mice. Besides the five likely liabilities shared with CID 44535189, CID 44534046 was predicted to be metabolized by CYP2D6 and CYP3A4, to inhibit CYP3A4, and to bind to the human ether-a-go-go (hERG) gene product.
Analog generation and selection
The predictions of high lipophilicity, low water solubility, high susceptibility to metabolism by cytochrome P450s (especially CYP2D6), and hepatotoxicity were common among the actives in the APAQ class. Substituents from the less risky APAQs were combinatorially reshuffled in silico to generate lead candidates with more favorable activity and property profiles. Substituents from each R-group were drawn from the known actives in the class. Doing so increases the likelihood that the structures produced will be synthetically accessible, since at least one example already exists where that substituent was placed at that position. In addition: only single substitutions were allowed at the distal nitrogen; only simple substituents (hydrogen, halogen, methyl, or trifluoromethyl) were allowed on the quinolone ring; and substitution was restricted on the distal aromatic rings. The resulting constrained combinatorial R-group explosion produced a virtual library of ~ 99,000 novel analogs.
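The constrained R-group reshuffling described above amounts to taking the Cartesian product of the substituents allowed at each position and discarding combinations that already exist. A minimal sketch follows; the position labels, the substituent lists for R2 and R4, and the single "known active" are hypothetical placeholders, and only the R1 list mirrors the simple quinolone-ring substituents named in the text.

```python
from itertools import product
from typing import Dict, Iterator, List, Set, Tuple

# Allowed substituents per position. R1 follows the simple ring substituents
# named in the text; the R2 and R4 lists are hypothetical placeholders.
R_GROUPS: Dict[str, List[str]] = {
    "R1": ["H", "F", "Cl", "CH3", "CF3"],
    "R2": ["benzyl", "4-fluorobenzyl", "cyclohexylmethyl"],
    "R4": ["H", "CH3"],
}

# Combinations already present among the known actives (hypothetical), given
# in sorted position order (R1, R2, R4).
KNOWN_ACTIVES: Set[Tuple[str, ...]] = {("H", "benzyl", "H")}


def enumerate_novel_analogs(
    r_groups: Dict[str, List[str]], known: Set[Tuple[str, ...]]
) -> Iterator[Dict[str, str]]:
    """Yield every substituent combination that is not already a known compound."""
    positions = sorted(r_groups)
    for combo in product(*(r_groups[pos] for pos in positions)):
        if combo not in known:
            yield dict(zip(positions, combo))


library = list(enumerate_novel_analogs(R_GROUPS, KNOWN_ACTIVES))
print(len(library))  # 5 * 3 * 2 = 30 combinations, minus 1 known active
```

The library size grows as the product of the list lengths, which is why restricting each position to a short list of precedented substituents keeps the enumeration tractable.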
Attractive lead analogs from the virtual library were selected based on their predicted PfDHODH inhibitory power in both QSAR models, their predicted ADMET properties, and their expected ease of synthesis. Compounds for which the Model A predicted PfDHODH Ki was greater than 0.1 µM, and those with ADMET Risk > 4, were set aside, as were compounds that were out-of-scope with respect to Model A. Though Model B was useful for identifying good lead series, it failed to provide much further discrimination between APAQ analogs; hence, whether its prediction for a particular molecule was out-of-scope was disregarded.
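The selection criteria can be expressed as a simple filter. The sketch below assumes that a compound is set aside if it fails any one of the three Model A criteria (Ki above 0.1 µM, ADMET Risk above 4, or an out-of-scope prediction); that reading is an interpretation of the text. The record layout, field names, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List

KI_CUTOFF_UM = 0.1   # Model A predicted Ki threshold from the text (µM)
RISK_CUTOFF = 4      # ADMET Risk threshold from the text


@dataclass
class Prediction:
    """Hypothetical record holding the predictions for one virtual analog."""
    name: str
    ki_model_a_uM: float
    admet_risk: float
    in_scope_model_a: bool


def is_retained(p: Prediction) -> bool:
    """Keep a compound only if it passes all three Model A criteria.

    Assumption: failing any single criterion is enough to set a compound aside.
    Model B scope is deliberately ignored, as described in the text.
    """
    return (
        p.in_scope_model_a
        and p.ki_model_a_uM <= KI_CUTOFF_UM
        and p.admet_risk <= RISK_CUTOFF
    )


def select_candidates(predictions: List[Prediction]) -> List[Prediction]:
    """Return the analogs that survive the filter, most potent first."""
    kept = [p for p in predictions if is_retained(p)]
    return sorted(kept, key=lambda p: p.ki_model_a_uM)


# Hypothetical predictions, for illustration only.
virtual_library = [
    Prediction("analog_1", 0.05, 3.0, True),
    Prediction("analog_2", 0.50, 3.0, True),   # predicted Ki too high
    Prediction("analog_3", 0.05, 6.0, True),   # ADMET Risk too high
    Prediction("analog_4", 0.05, 3.0, False),  # out of scope for Model A
]
print([p.name for p in select_candidates(virtual_library)])
```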
As is often the case, descriptors for some of the compounds ultimately selected for synthesis based on activity predictions fell well outside the range of descriptors for compounds used to train one or more of the metabolism models available at the time. Predictions for such compounds were considered “out-of-scope” extrapolations and, therefore, not necessarily reliable. Compounds with in-scope predictions were favored in selecting candidates for further analysis but ignoring those for which any prediction was out-of-scope was impractical. Such limitations in coverage are common for any novel chemistry, but they are also a major motivation for continuously refining and expanding ADMET models to improve their predictive performance.
Once analogs predicted to have relatively low potency or discouraging ADMET properties had been filtered out, approximately 34,001 analogs still remained. Of those, 17,334 bore symmetrical central diamine moieties, a feature expected to greatly simplify synthesis. ADMET Predictor now includes a synthetic difficulty score analogous to the synthetic accessibility score described by Ertl and Schuffenhauer. That functionality was not available when this project began, however, so assessment was carried out manually by visual inspection of analogs displayed in a tile view, i.e., in a gridded rather than a row format. In retrospect, the manual process was reasonably effective. The synthetic difficulty scores for the symmetrical diamine analogs ranged from 1.95 to 5.0 (median 3.62, where 10 is most difficult) and the final dozen candidates (vide infra) ranged from 2.5 to 4.15 (median 3.35), indicating that the final candidates were fairly representative of the initial pool in terms of synthetic accessibility.
Marginal aqueous solubility was a central concern that had to be balanced against model predictions that substitution at the central carbon of a 1,3-diaminopropyl bridge should improve activity and reduce CYP metabolism. This and other ADMET Risks prompted us to manually expand the library by introducing small point changes to some marginal analogs. One such change was “mutating” a gem-cyclohexyl ring at the central carbon of the aminopropylamino bridge to a gem-cyclopentyl ring. No such compound existed among the GSK actives, though there were several derived from trans 1-amino-2-aminomethylcyclopentane [19, 27]. A 1,2-ring is expected to adopt a quite different 3D conformation from that for a 2,2-ring, thereby placing the terminal group at quite different positions in space.
After the initial set of lead compounds was selected, their expected pharmacokinetic profiles were evaluated by combining the in silico property predictions with standard human physiologies in GastroPlus® to predict plasma concentration-versus-time profiles. Property estimates were taken from ADMET Predictor and dosing regimens were similar to those used for existing antimalarial drugs, e.g., chloroquine. Target plasma concentrations were based on predicted Ki values.
An initial set of diverse synthesis candidates (Supplementary Table S2) was selected from among those predicted to have good potency as well as an acceptable ADMET Risk score and favorable simulated in vivo PK profiles. The twelve synthesis targets produced by this process had structural variations at three different positions: the R1 substituent on the quinolone end of the scaffold; the R2 and R3 substituents off of the distal nitrogen of the aminopropylamino bridge; and the R4 and R5 substituents off of the central carbon in the bridge. The beginning of the reduction-to-practice phase was announced in a press release in September 2011  and bids were solicited from several contract synthesis companies. Four of the targets distributed are shown in Fig. 4.
The individual compounds proposed were synthetically feasible, but the number of separate intermediates and variations in reaction conditions required would have been prohibitively expensive. Instead, variations were restricted to either end of the molecule and we focused on a single kind of aminopropylamino bridge. Doing so made it likely that similar reaction conditions and intermediates could be used to make multiple analogs. The convergent synthesis plan greatly reduced the potential for complications and budget overruns. Though not often dwelt on in the literature, such practical considerations are often critical for any molecular design project; hence their explicit inclusion here. In the end, we settled on seven analogs built around a simplified scaffold bearing the novel gem-cyclopentyl group at the central carbon of the bridge. Diversity was provided by placing three different substituents on the distal nitrogen (i.e., R2) (Fig. 5).
Kalexsyn, Inc. (Kalamazoo, MI) synthesized a revised list of seven target analogs. Carbonyl precursors suitable for reductive amination were available or accessible for some of the desired R2 substituents, considerably simplifying their synthesis.