Background

While recently progress has been made towards the reduction of malaria related morbidity and mortality, an increasing at-risk population and insecticide resistant vectors hamper eradication efforts. In 2015, nearly half of the world’s population (3.2 billion people) were at risk of contracting malaria, and 97 countries and territories had ongoing malaria transmission [1]. Most malaria cases and deaths occur in sub-Saharan Africa [1]; however, Asia, Latin America, and, to a lesser extent, the Middle East and parts of Europe are also at risk. In particular, the emergence and spread of drug-resistant parasites continues to limit the effective lifespan of current anti-malarial drugs [2].

Drug resistance has reduced the effectiveness of previous standard therapies for malaria, such as chloroquine and sulfadoxine–pyrimethamine [3, 4]. Today, artemisinin-based combination therapy (ACT) is frequently utilized, nearly universally in endemic regions, to reduce the selection of drug-resistant parasites [2]. Currently, five artemisinin-based combinations (ACT) are available for the treatment of uncomplicated Plasmodium falciparum malaria, namely artemether and lumefantrine, artesunate and amodiaquine, artesunate and mefloquine, dihydroartemisinin and piperaquine, as well as artesunate and sulfadoxine–pyrimethamine [5]. New combinations (such as artesunate and pyronaridine) have also recently been registered for use in particular countries [5]. In order to avoid drug resistance, combination therapies comprised of single agents with different modes of actions are usually indicated. However, this is not always true since among the ACT medicines, some combinations such as mefloquine and lumefantrine act on similar pathways to those of artemisinin-derived drugs [6]. Recently, reduced response to ACT has been observed which indicates an urgent need for new combination therapies [7].

In this regard, a high-throughput combination screening study has recently been performed for malaria to provide experimental evidence for the efficacy of combinations [6]. However, due to the combinatorial explosion of the number of combinations possible, using a computational rationale for optimal selection of compound combinations can save cost and effort by providing a way to prioritize subsets of combinations from many possible compound pairs. Such compound combination modelling approaches have recently emerged for other diseases, particularly cancer [8, 9]; however there have been relatively few studies focusing on malaria. The studies which have been published include in silico screening of targets and molecular docking to find novel anti-malarial agents [10,11,12], and a drug synergy visualisation tool for malaria [13]. The machine learning approaches applied in the malaria field are usually for diagnostic purposes [14], or have other aims such as improving docking scores [15] and identifying a malaria transmission model [16]. However, this excludes learning from large scale in vitro malaria screening data to predict novel anti-malarial combination therapies, which is the aim of this study. In this work, a novel systematic approach is proposed for identifying compounds that boost human response to severe malaria and combination of compounds that show synergy in the malaria in vitro system (Fig. 1). The former was achieved via a transcriptional drug repositioning approach for shortlisting single agents and the later was achieved with a machine learning approach trained on in-vitro malaria screening data.

Fig. 1
figure 1

Data analysis work flow for predicting active compounds against malaria. The computational approach benefits from both transcriptional drug repositioning and machine learning. a The transcriptional drug repositioning approach imports gene expression data from GEO dataset (GDS4259) and compares patients with severe malaria vs patients with mild malaria to get gene expression profile of malaria. Transcriptional drug repositioning approach developed in this work is then used to predict potentially active single agents. b The machine learning part is trained on a dataset of activity of 1540 compound combinations applied on three different malaria P. falciparum strains. Target prediction and pathway annotation is used to define the features. c All combinations of potentially active single agents were annotated with targets and pathways and used as a test set input to the machine learning model built. d Activity of all possible combinations were predicted and computationally validated. All possible combinations were also experimentally validated and full accuracy of the algorithm in practice was evaluated

Transcriptomic data has been used previously for drug repositioning [17, 18], as well as for the selection of combination therapies in other areas [19]. There have been some transcriptional drug repositioning studies published for infectious diseases; [20, 21] however it has not been previously employed for malaria to the best knowledge of the authors. The usage of transcriptional data for compound selection is generally based on the hypothesis that compounds that affect differentially expressed genes of a disease in the opposite way to the disease itself (‘anti-correlated gene signatures’) have therapeutic potential for treating that particular disease. Experimental evidence supporting this approach has been presented in several studies, such as the repositioning of the antiulcer drug cimetidine for lung cancer [22], and of the antiepileptic topiramate for inflammatory bowel disease (IBS) [22]. In the present work, the human response to severe malaria has been derived in a different way compared to previous studies, namely gene expression data of blood samples of children with severe malaria was compared to gene expression signatures of the same children presented 1 month later, with mild malaria symptoms. Next, 201,776 compound gene signatures extracted from the recently published LINCS database [23] were compared to the malaria severity signature. This leads to a rank ordered list of compounds with anti-correlated gene signatures with malaria severity signature.

In this work, apart from the novelty of the application of transcriptional drug repositioning to malaria, previous approaches are also extended by utilising pathway annotations and in silico target predictions [24, 25] to the compound selection step. In this way, both biological readouts and on target bioactivity information are used to increase the signal and to have a more accessible understanding of compound action at hand. While understanding gene expression signatures can be rather complex, involving the up- and downregulation of pathways as well as the activity on individual proteins can often be more readily understood.

To this end, the in vitro combination screening data from 1540 compound pairs against three P. falciparum laboratory strains (3D7, DD, HB3) were utilized as a training set for the machine learning approach. To integrate the transcriptional drug repositioning, the machine learning and pathway annotation approach, synergistic combination pairs were selected from the single agents that were discovered in the preceding transcriptional drug repositioning approach. The activity predictions of individual compounds, and subsequently compound pairs, have been followed by prospective experimental validation, which resulted in the identification of novel single agent and synergistic compound combinations against P. falciparum. Hence, the results suggest that this method may be useful for prioritising new drug combinations for treating malaria.

Methods

Data

Training data previously generated at NCATS [6] included 1540 combinations of 56 individual compounds measured on three different P. falciparum strains, namely 3D7, DD2 and HB3 (https://tripod.nih.gov/matrix-client/?p=183, with assay IDs: 1463, 1464, 1465). For each of the combinations, the γ measure was calculated to characterize synergy between the pairs as described previously [26]. Values for γ < 1 represent synergy, γ = 1 represents additivity and γ > 1 represents antagonism. Combination screening quality is characterized by a per-combination QC metric, whose value ranges from 1 to 18, where 1 represents the highest quality data and 18 represents lowest data quality. Only data points with QC values less than 3 were used for subsequent analyses.

Transcriptional drug repositioning approach

In order to select new compounds for screening, first a combined bioinformatics and cheminformatics approach developed earlier [27] was used to predict single compounds with activity against malaria. For this purpose GEO dataset [28] GDS4259 was downloaded using the GEOImporter [29] tool of GenePattern [30]. This dataset contains data derived from peripheral blood samples from five Malawian children with severe P. falciparum malaria, compared to the same children who presented with a mild case of malaria 1 month later [31]. This comparison gives rise to the gene signature of severe malaria, and it was used in this work as a disease signature. The controls were chosen as human host cells that have responded in a curative manner to the disease, and hence decreased disease severity. This choice was made to make sure that the host response is present in the malaria severity signature. In the next steps, it was searched for drugs to reverse this severity signature which is expected to change a state of severe malaria to convalescent and help boost the host response.

In order to find compounds that have strong negative connectivity (anti-correlation) with the gene signature of malaria severity, compound-gene expression profiles from the LINCS database (Phase I, GEO dataset GSE92742) [23] were utilized. This calculation was performed based on a modified Gene Set Enrichment Analysis (GSEA) [32]. Considering the top 50 and bottom 50 up- and down-regulated genes of every compound, the calculation derives a score describing the strength of connection of compound signatures with a given disease profile. The scoring system leads to a rank ordered list of compounds that are assumed to be able to reverse the gene expression signature present in the malaria severity signature. Fifty highly scored compounds were shortlisted as single agent candidates.

Machine learning approach

The machine learning part of the algorithm involved in-silico target prediction, pathway annotation, synergy model generation and computational validation as follows.

Target prediction

Firstly, protein targets of all the 56 compounds in the training dataset were predicted based on an updated version of target prediction algorithm developed previously [24]. The algorithm was retrained on compound/target pairs from ChEMBL [33] v17 and compound targets were predicted using a Laplacian Modified Naïve Bayes scoring system [25, 32]. Next, protein targets were predicted for the 50 shortlisted compounds from the transcriptional drug repositioning approach. The predicted protein targets with a score over 14 were selected as potential protein targets of each compound. The score cut-off gave rise on average to six targets for a compound, which is a reasonable number of targets for a ligand, based on previous analyses [34].

Pathway annotation

Next, pathways enriched by the targets of each compound were identified. For this purpose, 2010 human pathways and their associated genes from the Biosystems [35] database were extracted. The equivalent Entrez gene IDs of each protein target of compounds were retrieved using Biomart [36]. For each compound a list of, on average, six gene IDs that the compound is predicted to interact with (the ‘compound gene signature’) was obtained. Then, the number of shared genes between each compound gene signature and gene members of each of 2010 pathways were found and stored as a raw integer score in a sparse vector, resulting in a vector with 2010 values for each compound, termed the ‘pathway signature of a compound’.

Next, for each combination of compound ‘a’ and compound ‘b’ in the combination dataset, a descriptor vector was created by merging their compound pathway signatures using the following equation:

$$ p_{i,a,b} = \left( {P_{i,a} + 1} \right)*\left( {P_{i,b} + 1} \right) $$
(1)

where Pi,a and Pi,b are the ith value in compound ‘a’ or ‘b’ pathway signature which is the number of shared genes between ith pathway and compound ‘a’ or ‘b’ gene signature. pi,a,b is the ith value in the pathway signature of combination of compound ‘a’ and compound ‘b’. Each term was added with one to avoid pi,a,b = 0 in cases where one of Pi,a or Pi,b equals zero. In this way, it is avoided to filter out the effect of one compound on the pathway when the other one has no effect on that pathway. In other words, the above formula gives a high score to pathways that are shared between both compounds which is the product of the score of both compounds, while also not neglecting the pathways that are only hit by one compound.

Synergy model generation

Using the above formula, a training file was created for each pair of compound pathway signatures labelled with synergy or no synergy, depending on gamma and QC. Pairs of compounds with gamma value < 0.975 and QC ≤ 3 were marked as synergistic and gamma > 1.025 and QC ≤ 3 were marked as antagonistic. The 0.975 and 1.025 cut-offs for gamma were chosen to avoid the additivity window. All the data with QC > 3 was discarded. Next, the training file was used as input for a Random Forest [37] algorithm consisting of 200 trees as implemented in Matlab (2015B) using the TreeBagger [38] function. This algorithm was hence able to predict the synergy of a compound combination, based on the pathways that are hit by this given pair of compounds.

Computational validation for drug combination

As a computational step to validate the approach prior experiments, cross validation was used to evaluate performance of the methods. The number of true positives (TP), false positives (FP) and false negatives (FN) were used to calculate precision (\( \frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}} \)) and recall (\( \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} \)). The tenfold cross validation was applied on the original data which was yielding 66% precision, 61% recall and 61% for the F-measure over the two classes of synergy and antagonism, averaged over all three strains.

Prospective utilisation of the synergy model

Next, combinations of compounds in the shortlist produced using the above transcriptional drug repositioning approach were annotated with pathways using Eq. 1 and fed to the random forest model as a prospective test set (in order to predict likely synergistic compound combinations for this new set of compounds). The random forest model hence predicted synergies and antagonism for compound combinations, and predicted pairs were ranked by their probability of being synergistic. The top 17 compound combinations, representing 14 unique single agents, were selected for experimental validation as summarized in Additional file 1: Table S2.

Experimental validation of compounds for malaria

The P. falciparum parasite lines were maintained in in vitro culture conditions as described previously [39]. Methods for the SYBR qHTS and calculation of IC50 and definition of curve classes have been used as in previous work [39,40,41], as was the plating of compounds for combination screening using acoustic dispense methods [42]. As shown in the Additional file 2: Figure S1, classes of activity were assigned based on growth inhibition curves where − 1.1 shows a complete curve as well as high efficacy, − 1.2 represents a complete curve but only partial efficacy, − 2.1 symbolizes a partial curve which however exhibits high efficacy, − 2.2 represents a partial curve with only partial efficacy and − 2.3 represents a partial curve with high efficacy, but only poor curve fit. All HTS assays were read at 72 h. Percent response values shown in matrix heat maps represent relative growth as obtained from SYBR Green fluorescence intensity values normalized to controls. Synergy metrics for combinations were computed as described [42]. Data generated as part of the prospective validation can be accessed at https://tripod.nih.gov/matrix-client/?p=1261.

Results

Prospective validation of single agents

With the aid of the transcriptional drug repositioning approach, 14 single agents (structures are displayed in Fig. 2) were prioritized and subsequently tested experimentally for their activity against the 3D7, DD2 and HB3 strains of P. falciparum. The results of the screen are summarised in Table 1, where chromomycin-a3 (CHR), fulvestrant (FUL), apicidin (API), ingenol (ING) and tacrolimus (TAC) exhibited full inhibition in the dose response curves in low concentrations. Of those, chromomycin-a3, fulvestrant and apicidin were active in nanomolar doses in all the three strains tested, with the average AC50 values of 11, 67 and 78 nM. JX-401 (JX4), raloxifene (RAL), hydroxyzine (HYD), thioridazine (THI), KIN001-244 (KIN), PI 828 (PI8), megestrol (MEG) exhibited complete or partial activity in different individual strains. Monorden (MON) and chelidonine (CHE) exhibited partial or single dose activity across all strains and were not progressed further. All experimental results data can be accessed at https://tripod.nih.gov/matrix-client/?p=1261.

Fig. 2
figure 2

Chemical structures of predicted single agents against malaria. All the structures were experimentally tested in the P. falciparum screen (see Table 1 for results)

Table 1 Experimental validation of predicted anti-malarial single agent compounds

Prospective validation of drug combinations

Thirty five compound pairs were selected based on synergy prediction model as well as single agent screening data and were experimentally tested against P. falciparum strains of 3D7, DD2 and Hb3. Table 2 represents predicted probability for synergy of each of combinations in each strain (calculated before experiments) as well as synergy metric, γ [26] values, (calculated after experiments). Among 35 combination pairs that have been tested in this study, 28 were predicted to be synergistic at least in one strain. Among those 28 predicted pairs, 27 were showing mild-strong synergy, 17 were showing moderate-strong synergy and 8 were showing strong synergy at least in one strain. As one can see in the Table 3, only one of five currently used ACT medicines (mefloquine–artesunate) shows strong synergy on all the three strains and one (amodiaquine–artesunate) has strong synergy only on the HB3 strain. Precisions and recall measures were utilized to measure the accuracy of synergy predictions compared to experiments, the results of which are shown in Table 4. For this purpose various gamma cut-offs were used to signify mild-to-strong (gamma ≤ 0.995), moderate-to-strong (gamma ≤ 0.975) and strong (gamma ≤ 0.95) synergies. The overall average precision and recall of experiments over the three strains were 83.5 and 65.1% for mild-to-strong, 48.8 and 75.5% for moderate-strong and 12.0 and 62.7% for strong synergies.

Table 2 Synergy prediction and synergy scores based on experiments
Table 3 Gamma synergy values for ACT as current synergistic therapies
Table 4 Precision and recall calculated for synergistic compounds

A network representation of compound pairs were provided in Fig. 3 to represent synergy strength of different compound pairs and identify compounds that have highest number of synergies with other compounds. It can be seen that tacrolimus-hydroxyzine and raloxifene-thioridazine, represent the most synergistic compound pairs followed by ingenol-tacrolimus, tacrolimus-fulvestrant, tacrolimus-apicidin and raloxifene-megestrol pairs. Tacrolimus is synergistic with highest number of compounds (three).

Fig. 3
figure 3

Network representation of compound combination synergies. Red thick edges represent high synergy and blue thin edges represent additivity. The level of thickness or redness is inversely related to gamma (calculated based on experiments) in average for the tree P. falciparum strains used in this study. Tacrolimus-hydroxyzine, raloxifene-thioridazine, represent highest synergistic compound pairs followed by ingenol-tacrolimus, tacrolimus-fulvestrant, tacrolimus-apicidin and raloxifene-megestrol pairs. Tacrolimus is synergistic with highest number of compounds (three)

The experimental validation of the most synergistic compound combinations in the P. falciparum strains were depicted in Fig. 4 using the HSA synergy model [43] to identify the optimal synergy doses. In the DD2 assay, the most synergistic combination was tacrolimus with hydroxyzine, where 1.250 µM hydroxyzine and 0.156 µM tacrolimus demonstrated highest synergy of − 0.45 in the HSA system (Fig. 4b). In the 3D7 and HB3 strains on the other hand, raloxifene together with thioridazine was the most synergistic compound pair (Fig. 4d). Here for the 3D7 strain 1.250 µM raloxifene and 2.5 µM thioridazine achieved maximum synergy of − 0.85, while in the HB3 assay this is the case for concentrations of 2.5 µM of raloxifene and 10 µM of thioridazine with synergy of − 0.94. However, high synergy values were also observed at other compound concentrations, such as 2.5 µM of raloxifene with 5 µM of thioridazine, as well as 1.25 µM of raloxifene and 10 µM thioridazine with synergy values of − 0.78 and − 0.72, respectively.

Fig. 4
figure 4

Synergistic compound combinations identified against P. falciparum. Representation of dose–response matrix and HSA synergy model for tacrolimus-hydroxyzine (a, b) and raloxifene-thioridazine (c, d) in the three parasite lines (3D7, DD2, HB3). The effect of increasing drug concentration from right to left and bottom to top was assessed in a 10 × 10 block size matrix (a, c). HSA synergy model represents in which concentrations maximum synergy occurs (b, d)

Discussion

Mechanistic justification of the compounds

In this section, differentially expressed genes and pathways that are involved in the malaria severity signature as well as potential modes of action on the target and pathway level are discussed. It is important to consider that human targets and human pathways are used as the gene expression data utilized is also from human sources. This will enable studying the effect of the compounds on the host.

In the malaria severity signature, several immune response related genes were down-regulated. In other words, these genes were up-regulated in patients that responded to the disease in a curative way and hence recovered from the disease. The computational transcriptional drug repositioning approach is looking for compounds that reverse the malaria severity signature and up-regulate the immune response genes to help boost the immune response. The immunity related genes that are down-regulated in the malaria severity signature are: PRF1, GNLY, OAS2, MX1, OAS3 and CCL5. PRF1 encodes a protein with structural similarities to complement component C9 that is important in immunity [44]. GNLY protein is present in cytotoxic granules of cytotoxic T lymphocytes and natural killer cells, and it has antimicrobial activity against M. tuberculosis and other organisms [44]. OAS2 and OAS3 encode a members of the 2–5A synthetase family, essential proteins involved in the innate immune response to viral infection. These molecules activate latent RNase L, which results in viral RNA degradation and the inhibition of viral replication. They play significant role in the inhibition of cellular protein synthesis and viral infection resistance [46]. MX1 participates in the cellular antiviral response. CCL5 is one of several chemokine genes. Chemokines form a superfamily of secreted proteins involved in immunoregulatory and inflammatory processes. SLPI is also up-regulated in the malaria severity signature. Its inhibitory effect contributes to the immune response [44]. Pathways associated with mild malaria include the type I interferon-mediated signalling pathway, T cell activation and other pathways representing many aspects of immune activation [31]. The malaria severity signature contains six genes that were associated with severe malaria, including carbonic anhydrase 1 (CA1), G-protein-coupled receptor 89B (GPR89B), lipocalin 2 (LCN2), thymidine kinase 1 (TK1), small nucleolar RNA, C/D box 30 (SNORD30), and TBC1 domain family member 3 (TBC1D3) (P ≤ 0.05; fold change of ≥ 1.5) [31]. Thymidine kinase 1 was recently found to be a biomarker of cerebral malaria susceptibility in the murine model [45], and carbonic anhydrase, reflects the blood’s abnormal acid base environment during severe disease. Further discussion of genes and pathways in the malaria severity signature has been published before [31].

Among the most potent single agents, tacrolimus has targets with established links to malaria. Tacrolimus has been previously studied for its potential anti-malarial activity by binding to PfFKBP35 and PvFKBP35 proteins of the parasite [46, 47]; however, the synergistic combination with hydroxyzine has not been reported before. Tacrolimus as well as apicidin were predicted to target “TGF-beta receptor signalling” pathway of human host. It is known that tacrolimus enhances TGF-Beta expression [48] while production of this protein is inversely correlated with severity of murine malaria infection [49]. Activation of another member of this pathway namely latent TGF-beta is suggested as a novel mechanism for direct modulation of the host response by malaria parasites [50]. Hence, modulation of TGF-Beta signalling may be a novel mechanism of tacrolimus that directly affects the host. Moreover, in vivo validation of apicidin against Plasmodium berghei in mice is evident from the literature [51].

On the other hand, pathways enriched by the predicted targets of the selected single agents according to Biosystems pathways are depicted in Fig. 5 (also represented in Additional file 3: Figure S2 and listed in more detail in Additional file 4: Table S1). Figure 5 suggests that highest synergistic combinations such as tacrolimus with hydroxyzine (average (γ) = 0.928), as well as raloxifene and thioridazine (average (γ) = 0.932), target independent pathways rather than shared pathways. On the other hand, pairs of compounds that have the most similar pathway signature did not have strong synergy. As an example, tacrolimus-chromomycin, hydroxyzine-KIN001244 and fulvestrant-megestrol are pairs that are clustered together (Fig. 5) but all of them are showing week synergies with average γ values of 0.991, 0.965, and 0.966. The data suggests that targeting independent pathways by compound pairs is privileged to targeting the same pathway multiple times for synergy to emerge.

Fig. 5
figure 5

Pathway annotation of selected agents in the P. falciparum screen. Y axis pathway IDs are explained in Additional file 4: Table S1. Compounds are clustered based on their similarity in pathway bioactivity space, with yellow indicating high enrichment Z-score of a particular pathway, and red indicating lower enrichments

Limitations of the study

The limitation of transcriptional drug repositioning approaches that are gaining popularity is that they cannot be applied for predicting compounds that act on parasite directly. Some recent other applications of transcriptional drug repositioning approaches for finding antivirals took into account the effect of compounds on host cells [20, 21]. This limitation is due to CMap and LINCS databases that are the gateways for transcriptional drug repositioning approaches. The LINCS database used in this work contains 201,776 compound treatment gene signatures that are applied on “human” cell lines and hence provides genes up/down-regulated in human tissue. Hence, it is not applicable to the parasites directly. Another limitation of the transcriptional drug repositioning approaches is that they are limited to the compounds that are included in LINCS or CMap database.

The use of transcriptional drug repositioning in the context of this project can facilitate identifying single agent compounds that help boost immunity of host to the parasite. Given that the combination of those compounds are expected to show synergy in the in vitro system as well, a two round selection process was performed. In the first round, single compounds were predicted to be active in blood cells of patients infected with severe malaria. In the second round, combinations of those single compounds were selected that were predicted to be synergistic on the parasite infected in vitro system containing host red blood cells. The final compound combinations were as a result of using both filtering methods. It needs to be considered that the in vitro system includes human red-blood cells and this means human pathways are present in the in vitro system. Moreover, the training data for synergies was generated based on the same in vitro platform of the final experimental tests which has human red blood cells infected with malaria parasite. Hence, the final combination sets are correctly predicted for the right platform.

One of the limitations of the study is the number of patient samples used in this work. Even though the gene expression samples of only five patients were used in the study, they were hand-picked among 119 patients participating in Blantyre malaria research. These patients were selected as they manifested clinical features of severe malaria but were dramatically recovered, having no clinical evidence of severe disease. They were also reported to be clear of peripheral blood parasites and bacterial meningitis at admission and after 48 h [31]. The authors that published the database claimed that “five severe/mild paired samples had sufficiently high-quality RNA for microarray and further analysis”. However, the sample size was mentioned as a limitation of the study.

Generally, the application of computational biology approach introduced in this study is limited to early stage drug discovery/repurposing of single agents as well as combination therapies. Any of the findings of such early stage studies should be followed up with further lab and possibly animal studies to be able to be translated into clinical trials. However, the approach can be used on other diseases as well given availability of appropriate data.

Performance evaluation and effect of gamma choice

The choice of gamma affects sample size of the synergy class and hence it is a trade-off between the sample size and quality of samples (degree of synergy). Gamma cut-off of 0.975 marked in average between the three strains 379.3 out of 1540 combinations in the training set as synergistic while gamma cut-off of 0.95 reduced average number of synergistic instances to 229.7. Given the dimensionality of the data with 2010 features, higher gamma cut-off of 0.975 was chosen to keep the sample size in the synergy class as high as possible while avoiding the additivity window. To show that the gamma cut-off is relevant in this study, Table 3 shows gamma values of current combinational therapies. Only one of five currently used ACT medicines (mefloquine–artesunate) on all the three strains and one (amodiaquine–artesunate) on the HB3 strain have gamma values bellow 0.95. Hence, the cut-off chosen is appropriate, given the observed synergy even for established combination therapies.

To address the effect of gamma choice on the validation step, various gamma cut-offs were used and the accuracy of the algorithm based on each of the cut-offs were validated. Table 4 shows the precision and recall based on each of the cut-offs. When gamma cut-offs of 0.975 and 0.995 are used, the precision of the algorithm is 1.98 and 2.02 times better than the training set while recall has been kept as high as 75.5 and 65.1%. It is notable that the training data is selected in a biased way rather than randomly, as it includes current combination therapies and investigational drugs. In case of 0.95 cut-off, the average recall is kept still as high as 62.7% in average. The precision on the DD2 strain is 1.45 times better than the biased training set. However, in average over the three strains the precision is similar to the training set which was selected with biological insights. Hence the algorithm is almost twice better than biased selection for predicting mild-to-strong or moderate-to-strong synergies. In case of only strong synergy detection the approach is 1.4 times better than biased in the DD2 strain but similar to biased in average for the three strains. To the best of authors’ knowledge there is no similar study published elsewhere that has a better precision and recall for predicting synergistic pairs for malaria.

Conclusions

Recently drug resistance has emerged, and continues to develop, to the existing drug combination therapies for malaria, and hence discovering novel therapies is of much importance [7]. However, testing combinations is costly and time consuming. Systematic approaches towards novel compound combination discoveries can reduce financial cost as it helps to identify compounds in a more informed manner. In this work, an integrated transcriptional drug repositioning and machine learning approach was developed to predict synergistic compound combinations for malaria. The transcriptional drug repositioning approach led to the identification of active single agents against malaria. Chromomycin-a3 (AC50 = 0.011 µM), fulvestrant (0.0667 µM), apicidin (0.078 µM), ingenol (1.4408 µM) and tacrolimus (2.4383 µM) exhibited full inhibition in low concentrations. The machine learning approach utilized here was trained on a dataset of experimentally assayed compound combinations (including synergistic, antagonistic and additive responses), across three strains of malaria to suggest new compound combinations.

The predicted synergistic compound combinations were experimentally validated, and synergistic compound combinations were identified with overall precision and recall of 83.5 and 65.1% for mild-to-strong, 48.8 and 75.5% for moderate-strong and 12.0 and 62.7% for strong synergies. In all of the predictions recall has been kept high between 50 and 80% depending the expected synergy level and the strain. Highest precision is achieved for mild-to-strong synergies ranging from 61.3 to 72%. However, as the synergy level expectation is increased to moderate-to-strong or only strong synergies the precision drops to 30–59% and 4.8–22.7%. Strong novel synergistic compound combinations including combination of tacrolimus with hydroxyzine (average (γ) = 0.928) as well as raloxifene and thioridazine (average (γ) = 0.932) were identified. Moreover, according to the data, targeting independent pathways was more privileged that targeting the same pathway with both compounds in a combination. For retrospective validation, it is notable that tacrolimus and apicidin that were predicted for P. falciparum strains have previously been studied for malaria. In vivo validation of apicidin against P. berghei malaria in mice is evident from the literature [51]. This retrospective validation provides further assurance of the computational approach presented in this study.

For a single agent to be translated to actual cure, there are several requirements. One of these is that the dosage that activity is occurring, should be lower than maximal safe plasma concentrations of the drugs. The maximal plasma concentration of the single agents are provided in Table 5. In this study, the highest active single agents were apicidin, fulvestrant and chromomycin-a3. The only single agent that is active in lower doses than the maximal safe plasma concentration is apicidin. The AC50 value of apicidin is 74.9, 84.1 and 74.9 nM in 3D7, DD2 and HB3 P. falciparum strains while its maximal safe plasma concentration in human is 547.6 ± 136.6 nM. Apicidin at the dose of 500 nM kills in average 97% of the parasite while it is a safe dose for human. However, apicidin is not approved drug and is not in clinical trials. Hence, entering clinic requires further safety and efficacy studies of apicidin.

Table 5 Maximum plasma concentration of experimentally validated single agents with activity against malaria

The most synergistic compound combinations were tacrolimus-hydroxyzine, tacrolimus-fulvestrant, ingenol-tacrolimus, fulvestrant-megestrol and raloxifene-thioridazine. All of the synergies were occurring at doses above the maximal safe plasma concentrations, which renders the current study more of a proof-of-principle than a method to select therapeutically relevant compound combinations directly. A clear definition of required pharmacokinetics and pharmacodynamics properties will also help with the design of drug regimens for dosing anti-malarial drug combinations [52]. Hence, the synergistic pairs will require further investigations in in vivo models to fully evaluate the possibility of transferring them into clinic. Moreover, the generated dataset can be used as a training set for further compound combination predictions. Moreover, the approach is interesting to be followed up with larger synergy prediction candidates (with a different test set).

Accuracy of the predictions and identification of novel single agents and combinations suggests that the integration of machine learning methods with gene expression data analysis can lead to a robust platform for predicting novel synergistic compound combinations, in this case applied to malaria.