Drug efficacy and toxicity prediction: an innovative application of transcriptomic data

Drug toxicity and efficacy are difficult to predict partly because they are both poorly defined, which I aim to remedy here from a transcriptomic perspective. There are two major categories of drugs: (1) restorative drugs aiming to restore an abnormal cell, tissue, or organ to normal function (e.g., restoring normal membrane function of epithelial cells in cystic fibrosis), and (2) disruptive drugs aiming to kill pathogens or malignant cells. These two types of drugs require different definition of efficacy and toxicity. I outlined rationales for defining transcriptomic efficacy and toxicity and illustrated numerically their application with two sets of transcriptomic data, one for restorative drugs (treating cystic fibrosis with lumacaftor/ivacaftor aiming to restore the cellular function of epithelial cells) and the other for disruptive drugs (treating acute myeloid leukemia with prexasertib). The conceptual framework presented will help and sensitize researchers to collect data required for determining drug toxicity. Electronic supplementary material The online version of this article (10.1007/s10565-020-09552-2) contains supplementary material, which is available to authorized users.


Introduction
The most desirable drug is of high efficacy, low toxicity (side effects), low chance of drug resistance, low cost, and low deleterious effect on the environment, e.g., no reactivation by bacterial species after human use (Xia 2017). Among these five key features, drug toxicity is perhaps the most difficult to define, quantify, and predict (Sosnin et al. 2019). In this review, I aim to introduce a standard definition for drug efficacy and drug toxicity from a transcriptomic perspective to facilitate their prediction in drug discovery in a transcriptomic context. Drugs can be classified broadly into restorative and disruptive drugs. Restorative drugs aim to restore cellular functions. For example, in cystic fibrosis (CF) patients homozygous for the ΔF508 mutation (deletion of a phenylalanine at site 508) in the CFTR gene, the misfolded protein in endoplasmic reticulum (ER) is mostly degraded after failing to go through the quality control system (Fraser-Pitt and O'Neil 2015). The few CFTR proteins that do escape the degradation and are exported to their membrane location typically do not function very well. Thus, any modulators that can increase the export of CFTR protein to the membrane and improve its ion channel function would contribute to restoring the epithelial cell function and alleviate the associated symptoms (Deeks 2016;Sala and Jain 2018;Gentzsch and Mall 2018). Lumacaftor/ivacaftor for treating CF patients who are ΔF508 homozygotes is a drug combination representative of restorative drugs. From a transcriptomic perspective, the efficacy of such drugs is measured by how much they can reduce the difference in transcriptomic profile between patients and healthy controls, especially for a subset of genes directly related to the disease (Failli et al. 2019;Karagianni et al. 2019). Their toxicity is measured by the drug-induced differences in transcriptomic profile for genes that are not intended to be affected by the drug.
In contrast to restorative drugs, disruptive drugs are intended to disrupt cell growth and proliferation and to induce apoptosis. These drugs are used in the fight against pathogens or malignant cells (Moffat et al. 2014;Shoemaker 2006) without deleterious effect on normal human cells. Drug efficacy of disruptive drugs can be directly measured by the proportion of cancer cells or pathogens killed, from which one can obtain an estimate of the propensity of cancer cell or pathogen mortality. From a transcriptomic perspective, drug efficacy can be defined as an index of disruption, measured by the druginduced difference in transcriptomic profile of malignant cells before and after drug use, especially the induction of apoptosis genes and activation of apoptosis pathways. The drug toxicity could be conceptually defined as druginduced transcriptomic differences of normal cells before and after drug administration. In practice, this definition has limitations and alternatives are discussed.
I detail the definitions below, outline the rationale behind such definitions, and illustrate their applications that lead to meaningful quantification of drug efficacy and toxicity from transcriptomic data. Two sets of largescale transcriptomic data are used for the illustration and can be downloaded from NCBI. I also include two supplemental files containing the data used in this paper, with detailed instructions on how to replicate the results in the paper. The first data set involves a restorative drug, i.e., lumacaftor/ivacaftor used to restore the cellular function of epithelial cells of CF patients with the double ΔF508 mutation (Kopp et al. 2019;Kopp et al. 2020). The second data set resulted from treating acute myeloid leukemia with prexasertib (Kaufmann and Li 2019).

Drug efficacy and toxicity: restorative drug
Lumacaftor/ivacaftor for CF is a drug combination representative of restorative drugs. The majority of CF is caused by the deletion of F508 (ΔF508) in both alleles of the CFTR gene (Brockman et al. 2017;Esposito et al. 2016;Faure et al. 2016). The ΔF508 CFTR proteins cannot be folded properly in ER lumen and are mostly degraded after failing to be exported to the cell membrane to perform its ion channel function (Fraser-Pitt and O'Neil 2015). CFTR-Associated Ligand (CAL), an ER-localized protein, binds to ΔF508 CFTR, leading to degradation in the 26S proteasome (Bergbower et al. 2018) through the ubiquitin-proteasome pathway (Sondo et al. 2018). The few ΔF508 CFTR that do find their way to cell membrane do not function well due to severe gating defects (Bose et al. 2019). Thus, drugs that can decrease the degradation of ΔF508 CFTR protein, increase the export of ΔF508 CFTR protein to the plasma membrane, and improve its ion channel function, would contribute to restoring the epithelial cell function and alleviating the associated symptoms of cystic fibrosis. Such drugs and drug candidates include lumacaftor/ivacaftor (Deeks 2016;Gentzsch and Mall 2018;Kmit et al. 1865), tezacaftor/ivacaftor (Sala and Jain 2018;Faure et al. 2016;Donaldson et al. 2018), fatty acid cysteamine (Vu et al. 2017), or even rattlesnake phospholipase A2 (Faure et al. 2016). How to evaluate efficacy and toxicity of these drugs with transcriptomic data?
A recent study (Geo DataSets accession GSE124548) characterized whole-blood transcriptomic responses to lumacaftor/ivacaftor therapy in CF patients homozygous for ΔF508 (Kopp et al. 2019;Kopp et al. 2020). It gathered transcriptomic data for a total of 15,570 RNA and protein-coding genes from 20 CF patients before and after administration of lumacaftor/ ivacaftor, as well as 20 non-CF individuals as control. Thus, the complete set of gene expression data is a 15,700 × 60 matrix. For each gene, there are 20 gene expression values for patients before the drug administration, 20 values for the same set of patients after the drug administration, and 20 values for healthy controls. I normalized the total number of read counts (i.e., the summation of each column of 15,570 values) to one million to facilitate comparison.
One might question the relevance of whole-blood transcriptomic data to CF drug efficacy. The most direct measure of efficacy would seem to be peeling off a piece of epithelium (especially those lining the airways) to test for the presence (or increased amount) of functional CFTR protein. If this invasive approach is not acceptable, then there are simple alternatives such as the conventional sweat test for efficacy. A reduction in the amount of chloride in sweat would seem to be an excellent index of efficacy for any CF drug. However, while lumacaftor/ivacaftor treatment does not seem to reduce chloride in sweat, the treated patients did report an improved quality of life.
Experimental evidence that implicated leucocytes in CF development has accumulated in the last 20 years. CF is made much worse by human immune responses mediated by leucocytes (mainly through neutrophils) (Makam et al. 2009;Tirouvanziam et al. 2006;Tirouvanziam et al. 2000;Tirouvanziam et al. 2008). Not only can a subset of neutrophils cause CF in mice, such neutrophils from CF patients can even transfer CF to mice (Genschmer et al. 2019). While it is not clear which subset of genes exhibit abnormal expression that leads to the full-fledged development of CF, it is clear that gene expression in leucocytes plays a key role in the development of CF (Lin et al. 2008). In addition to mRNA differences, microRNA miR-155 is also highly expressed in circulating CF neutrophils biopsied from CF patients (Bhattacharyya et al. 2011). For this reason, it is not outlandish to use whole-blood transcriptomic differences to characterize efficacy and toxicity of CF drugs.
Designate transcriptomes for patients before and after the administration of the drug as P b and P a (where subscripted b and a stand for before and after) respectively, and those for healthy controls as H (for healthy). Ideally one should compile a list of target genes that are particularly relevant to CF and formulate transcriptomic efficacy and toxicity based on how the drug treatment would restore their gene expression to that of healthy controls. However, given the existing knowledge on CF, there is practical difficulty in compiling such a set of target genes, so we will use all genes with expression levels clearly above background. Gene expression differs dramatically among genes, with S100A9 and EEF1A1 having mean expression equal to 10,611.94 and 8871.31, respectively, but many others with small values. I excluded genes with mean expression values lower than 10. This leaves 8558 genes.
The first 20 genes that differ most between H and P b are listed in Table 1, together with differences between H and P a and the results of significance tests. The first gene is ARID3A which belongs to the Arid (AT-rich interaction domain) family of DNA-binding proteins. At least one member of the family (ARID3B) is highly expressed in adult fibrotic lung tissue (Lin et al. 2008).
ARID3A is expressed only in human B cells, but its function is little known (Nixon et al. 2004). The second gene (Table 1) is STX3 which encodes a protein targeted to the apical membrane of epithelial cells and is crucial for the normal function of CFTR (Tang et al. 2011). The third gene is SOD1 belonging to the superoxide dismutase family. Superoxide dismutases, especially extracellular ones, play a key role in preventing pulmonary fibrosis (Gao et al. 2008). These suggest that wholeblood transcriptomic data may shed light on mechanism of CF development.
It is almost never easy to identify key genes responsible for a disease. The patient group may exhibit altered expression of many genes including the disease-causing genes and those representing secondary responses. The healthy controls may include individuals who are about to have the disease and exhibit disease-specific gene expression patterns but have not yet manifested the disease symptoms. In this context, it is encouraging to identify genes such as ARID3A, STX3, and SOD1 that are known to be directly or indirectly related to CF.
Drug efficacy is the summation of everything better after drug treatment than before drug treatment. ARID3A expression is much higher in CF patients than in healthy control (95.4289 vs 63.5434, Table 1). After drug treatment, ARID3A expression is reduced to 82.7131 (Table 1), closer to that of the healthy control by 12.7158. Designate mean expression of H, P b , and P a as MeanH, MeanP b and MeanP a , respectively. Now for ARID3A where Δ D is desirable if positive and undesirable if negative. A better replacement of D MeanH∼MeanP a and D MeanH∼MeanP b is the t statistic which incorporates the standard error (SE) of the differences. Again for ARID3A, t MeanH∼MeanP b and t MeanH∼MeanP a values measure deviation of gene expression in CF patients from that of healthy controls before and after drug treatment, respectively. Ideally, all t MeanH∼MeanP a values would be zero; i.e., the gene expression is perfectly restored to that of healthy controls. In the case of ARID3A, although t MeanH∼MeanP a is not zero, it is at least smaller than t MeanH∼MeanP b ; i.e., the gene expression is closer to that of the healthy control after drug treatment than before drug treatment. The distribution of t MeanH∼MeanP b and t M eanH∼MeanP a for the 8558 genes ( Fig. 1a) suggests positive drug effect. That is, the distribution of t MeanH∼MeanP a has shifted towards smaller values relative to the distribution of t MeanH∼MeanP b .
Δt in Eq. (6) measures drug effect on a specific gene (ARID3A). If the drug is efficacious, we expect most genes to have positive Δt values than genes with negative Δt values. Among the 8558 genes, 5710 has positive values and 2848 have negative values. The distribution of the 8558 Δt values (Fig. 1b) suggests a mean Δt greater than 0. The 20 genes with the most negative Table 1 The first 20 genes that differ most in transcriptome between the control (H) and CF patients before drug administration (P b ), together with associated t tests Δ t values, i.e., genes with the greatest side effect (or toxicity effect), are listed in Table 2. I now define an index of drug desirability as where I DD is simply an average of Δ t . The drug is desirable if I DD > 0, undesirable if I DD < 0, and neither desirable nor undesirable if I DD = 0 (the null hypothesis). For the CF transcriptomic data, I DD is highly significantly greater than 0 based on the 8558 Δ t values (t = 41.8478, DF = 8557, p < 10 −20 ). The standard error (SE) of Δ t is 0.01456, so that 95% confidence interval of I DD is (0.58073, 0.63781).
Given 5710 genes with positive Δ t and 2848 genes with negative Δ t , the drug efficacy and drug toxicity can be defined as which implies that a drug can become more desirable by either increasing E or reducing T.
To generate a more informative I DD , one should use a fixed set of candidate genes known a priori to be relevant for the disease instead of using all 8558 genes. We can use this set of candidate genes to replace the 8558 genes in the computation. For disruptive drugs aiming to kill cancer cells, such a fixed set of genes could simply be all genes involved in apoptosis pathways. For restoration drugs aiming to restore a specific function, then all genes contributing to the function can be included in the set.
Suppose a researcher has done a similar experiment with a new drug, or the same drug with a different dose, and wish to compare his I DD , E and T against those reported above. He may compute I DD from his experimental result with the same set of genes. If the lower limit of his 95% confidence interval for I DD is greater than my calculated I DD (= 0.6093) or if his I DD is greater than the upper limit of my 95% confidence interval (= 0.63781), then he may conclude that his drug or his dosage is more desirable than the lumacaftor/ivacaftor treatment. He can then further dissect the result to see whether his increase in I DD is due to increased E or reduced T.

Drug efficacy and toxicity: disruptive drugs
Disruptive drugs aim to induce large changes in the target cells, ideally leading to cell death. Suppose we are treating liver cancer with a particular drug. We would need transcriptomes from normal liver cells and malignant liver cells before and after drug treatment, represented as GE nb , GE na , GE mb , and GE ma , where GE stands for gene expression, and the subscript n stands for normal, m for malignant, b for before, and a for after. I will first start with a general case with no specific set of target genes, and then narrow down to a set of genes involved in apoptosis.

Drug efficacy with no specific set of target genes
For an anti-cancer drug, it is desirable to disrupt the cancer cell as much as possible, so GE mb , and GE ma should differ as much as possible. For M genes with gene expression clearly above background, the transcriptomic efficacy (E) is defined as the mean |t| value: Take for example the transcriptomic data (GSE131912) for treating acute myeloid leukemia with 10 nM prexasertib (Kaufmann and Li 2019), with three controls (CTRL) and three treatments (TREAT). I again normalized each of the six columns of data (3 CTRLs and 3 TREATs) to have a summed total of 1,000,000. After excluding genes with mean expression smaller than 10, I have 9446 genes remaining. The resulting E is It is easy to test if this E = 5.54566 is statistically significant. The expected |t| value for any degree of freedom (ν) can be obtained by the following equation: where f(t| ν) is the probability density function of t distribution given ν. In our case with ν = 4 (for a t test with 3 CTRLs and 3 TREATs), the expected |t| value is 1 when the null hypothesis of no difference is true. Therefore, we can do a simple t test as where SE is the standard error of the 9446 t values used to calculate E in Eq. (12). The p value is effectively 0. In other words, the prexasertib treatment very strongly perturbed gene expression.
A multitude of diversifying lineages has been reported in tumors (Bailey et al. 2020;Turajlic et al. 2019;Wu et al. 2016), which can complicate transcriptomic data analysis (Xia 2017;Navin 2015). It would be interesting to know if a certain anti-cancer drug will perturb all different cancer cell lineages or just a subset of the lineages. An anti-cancer drug against one or only a subset of proliferating lineages will not be efficacious against the cancer.
Drug efficacy with a set of target genes such as genes involved in apoptosis Although one could calculate E by using all genes whose expressions are not too low, as is done above, the E value would be more informative if we use a set of candidate genes more relevant to the conventional sense of efficacy. For example, if the drugs are for inducing apoptosis in cancer cells, then we would be more interested in 80 or so apoptosis genes involved in extrinsic and intrinsic apoptosis pathways (Burke 2017), which we can obtain by using databases such as KEGG (Kanehisa 2002) or apoptosis database specific for human cancer such as ApoCanD (Kumar and Raghava 2016).
For illustration, I downloaded the 82 sequences for proteins involved in apoptosis from ApoCanD.    Table 3. In this case with a fixed set of genes, E is calculated as This E apoptosis from the 66 genes involved in apoptosis is greater than E (= 5.54566) from all 9446 genes, suggesting that gene expression of apoptosis genes has been changed more by the drug than that of an average gene. One question that a researcher is interested in answering is whether this difference is statistically significant. I tested the difference between the 66 t values from apoptosis genes against the 9446 t values from the 9446 genes. The difference is significant, with t = 2.059644, DF = 9510, p = 0.03946. Thus, the drug altered expression of apoptosis genes more than it does to an average gene.
One may criticize the formulation of E in Eq. (11) as being only an index of gene expression disruption. If the drug is to induce apoptosis, then the efficiency should be defined as the propensity of tumor cell death (p). This p is typically dose-dependent for a drug. Table 4 illustrates such a set of fictitious data with measured dosedependent cancer cell mortality (the first three columns), as well as expression for two apoptosis-related genes (AG1 and AG2). As there are 80 or so genes closely involved in various apoptosis pathways, a real data set could include 80 or so columns in Table 4, each column representing dose-dependent expression of a gene. For simplicity we illustrate with only two genes.
From the first three columns in Table 4, one can obtain the dose-dependent p using logistic regression (Berkson 1944), which would give us p ¼ e aþbÁDose 1 þ e e aþbÁDose ¼ e −3:3827þ0:0130Dose 1 þ e −3:3827þ0:0130Dose ð16Þ p is highly significantly related to dose (Fig. 2a) and the relationship is depicted in Fig. 2b. Because p seems to be a direct measure of the drug efficacy in killing cancer cells, what is the need for a transcription-based index of efficacy? Table 2 The first 20 genes that have the most negative Δt values (negative Δt means gene expression deviating even more from healthy control after drug treatment and is therefore undesirable). Column headings same as in Table 1 ID There are two arguments for the relevance of transcription data. First, we wish to know which gene whose altered expression may have contributed to the observed death of cancer cells. We can use generalized linear model (Nelder and Wedderburn 1972) to fit the relationship between p and AG1 and AG2. As shown in Fig. 2c, AG1 is not related to cancer cell mortality p, but AG2 is highly significantly related to p. Also, there is no interaction between AG1 and AG2 (Fig. 2c). The best model, based on either likelihood ratio test or informationtheoretic indices such as AIC or BIC (Burnham and Anderson 2002;Xia 2009), has AG2 as the only independent variable. The fitted model (Fig. 2 d and e) shows the AG2-dependent drug efficacy. Such knowledge gains us a better understanding of the mechanistic basis of cancer cell death; i.e., the drug may have induced apoptosis through the pathway with a strong dependence on differential AG2 expression. Second, alteration of gene expression typically occurs much earlier than cell death, so an efficacy index based on gene expression (especially those directly related to apoptosis) is likely more sensitive than one based on observed cell death.
Different drugs may target different sets of gene with totally different outcome in terms of transcriptome response although all of them may be highly efficient treatments against a certain disease. For example, some anti-cancer drugs aim to decrease expression of antiapoptotic genes such as BCL-2, BCL-XL, and MCL1 (Luo et al. 2014;Sattler et al. 1997;Beroukhim et al. 2010), or increase the expression of pro-apoptotic genes such as BID, BIM, BAD, PUMA, and NOXA (Happo et al. 2010;Slinger et al. 2016;Zhang et al. 2013). Both types can lead to mitochondrial outer membrane permeabilization and subsequent activation of apoptosis agents such as caspases. If one compares transcriptomic efficacy of drugs suppressing the expression of antiapoptotic genes such as BCL-2, BCL-XL, and MCL1 against transcription efficacy of drugs increasing proapoptotic genes such as BID, BIM, BAD, PUMA, and NOXA, then one would be comparing apples and oranges. In such cases, cancer cell mortality is a more general measure of drug efficacy. In short, the transcriptomic efficacy complements but does not replace the measure of cancer cell mortality as drug efficacy.

Toxicity
A good drug should have high efficacy but low toxicity. For disruptive drugs, the transcriptomic toxicity is difficult to define except for the simplest cases such as skin cancer and some mouth cancer where (1) the tumor and the surrounding normal tissues are clearly distinguishable and (2) topical chemotherapy is used so that the tumor and the surrounding normal tissues are subject to the same treatment. In such cases, GE m.b and GE m.a can be characterized from the tumor, and GE n.b and GE n.a can be characterized from surrounding normal tissues. Transcriptomic toxicity T can then be calculated from GE n.b and GE n.a , Again, the expected t, when there is no difference between GE n.a and GE n.b , is specified in Eq. (13), which allows us to carry out a significance test of whether the drug has statistically significant transcriptomic toxicity.
E and T values should mainly be used to facilitate comparisons. If we have a new drug with an E value much greater than that for the old one, but a T value that is similar to, or smaller than, that for the old one, then we would be inclined to choose the new one over the old one. Similarly, if a heavier dose of prexasertib leads to much higher E but the same T, then the heavier dose is preferred. The transcriptomic toxicity defined in Eq. (17) is limited for two reasons. First, GE n.a typically cannot be measured because anti-cancer drugs almost invariably have strong side effect so it is consequently unethical to recruit heathy human subjects to take the drugs for measuring GE na . Prednisone (a glucocorticosteroid) used in some anti-cancer chemotherapies was the only one tested with healthy volunteers, but with only a single dose, causing a 72% decrease of the total lymphocyte number and a 97% decrease in total eosinophil count (Schuyler et al. 1984). Such a study would not be possible today. Second, when anti-cancer drugs are infused intravenously, numerous numbers of tissues and cell lineages are affected. A reasonable assessment of toxicity would need GE nb and GE na from all of these affected tissues. One alternative is to use animal models of human diseases such as mouse models of human cancer (Borowsky 2011;Cheon and Orsulic 2011;Rudin et al. 2019;Swiatnicki and Andrechek 2019), especially when oncogenes and tumor-suppressor genes can be conditionally turned on or off. These animal models allow us to measure GE nb and GE na as well as GE mb and GE ma for a variety of tissues. Another alternative is to use cell lines.
However, in spite of these two available alternatives (animal models and cell lines), transcriptomic cancer studies tend to measure only GE mb and GE ma , but almost never GE nb and GE na . It is too wasteful to collect transcriptomic data that cannot be used to quantify toxicity without which one cannot say whether one drug is more preferable than another. This general negligence to collect relevant data to estimate transcriptomic toxicity hinders cancer research and informed decisionmaking in drug administration. I hope that the definitions and illustrations I used in this paper will encourage researchers to collect more complete and informative data in the future and to formulate better indices of efficacy and toxicity.

Conclusion
Drug toxicity prediction is a difficult subject, made more so by a lack of definitions. I have proposed informative definitions for transcriptomic efficacy and toxicity that are easy to use in real research settings with transcriptomic data. The conceptual framework associated with the definitions also highlights the general negligence of researchers in collecting data relevant to measure drug toxicity. I expect these definitions will result in significant improvement of accuracy and precision in drug development.
Funding information This research was funded by Discovery Grant from Natural Science and Engineering Research Council (NSERC, RGPIN/ 2018-03878) of Canada.

Compliance with ethical standards
Conflict of interest The author declares that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.