Background

APOBEC3A and APOBEC3B (apolipoprotein B mRNA-editing enzymes 3A and 3B, catalytic polypeptide-like) are cytosine deaminases from the AID/APOBEC family, members of which play important roles in host immunity against pathogens [1, 2]. The activity of multiple members of the AID/APOBEC family, including APOBEC3A but not APOBEC3B, has also been linked to epigenetic processes involving DNA demethylation via deamination of 5-hydroxymethylcytosine (5-hmC) to 5-hydroxymethyluracil (5-hmU) [1, 3, 4]. APOBEC3B is an endogenous mutagen that generates DNA substitutions, most frequently C to T, through cytosine-to-uracil deamination of single-stranded DNA, most commonly in the 5′-TCW-3′ sequence context (where W is either A or T) [2]. In multiple human cancer categories, increased APOBEC3B gene expression has been associated with genome-wide hypermutation and with kataegis, a mutagenic process that generates clusters of closely spaced, single-strand-specific DNA substitutions, which are predominantly C to T [5, 6]. Clusters of APOBEC3B mutations are often localized at breakpoints of chromosomal rearrangements [2]. Increased APOBEC3B gene expression, germline polymorphisms in the APOBEC3 genomic region, and greater abundance of APOBEC3B mutational signatures have been associated with increased cancer risk and with patient survival [5, 7].

APOBEC3B mutagenesis has a characteristic pattern of mutational specificity. It is most commonly represented by the 5′-T(C>T)W-3′ sequence motif [8], where “>” indicates the C to T substitution and W is [A or T]. This hypermutation pattern and high mRNA expression levels of APOBEC3B have been found in several cancer types [9, 10]. Additional mutation patterns have also been reported for APOBEC3B, although some of these patterns may also be attributable to other APOBEC family members [6, 7, 10, 11]. According to various reports, in addition to C>T transitions, these patterns may include C>G transversions and, in some specific cancer types such as ovarian carcinomas, C>A transversions, as well as a 5′-TC(A or G)-3′ sequence context, so that possible mutational motifs could be represented as 5′-T(C>K)W-3′, 5′-T(C>D)R-3′, or 5′-T(C>D)D-3′, where K is [G or T], W is [A or T], R is [A or G], and D is [A or G or T] according to the IUB-IUPAC ambiguity codes [6,7,8, 11,12,13]. Below, we present these sequence motifs in the 5′ to 3′ direction as T(C>K)W, T(C>D)R, and T(C>D)D.

While APOBEC3B plays a prominent role in cancer mutagenesis, several other AID/APOBEC family members also have mutagenic roles and affect DNA integrity [9, 14]. Most of them have distinct specificities for genomic sequence context [2, 8,9,10, 15, 16]. However, a possible overlap between the activities of APOBEC3B and APOBEC3A has not been fully resolved. The APOBEC3A gene is located near APOBEC3B in the APOBEC3 genomic cluster in chromosomal region 22q13.1 [7]. A germline deletion polymorphism removes the entire coding region of APOBEC3B and abolishes APOBEC3B gene expression; this deletion fuses the APOBEC3A gene to the 3′-UTR of the APOBEC3B gene, producing an APOBEC3A-APOBEC3B fusion transcript, and it has been associated with an increased risk of several types of cancer [7, 17]. The evidence for a mutagenic role of APOBEC3A has so far been less conclusive than that for APOBEC3B [12, 18]. However, a number of studies have suggested that APOBEC3A also acts as an endogenous mutagen that can produce genomic damage, with a mutation signature that may be distinguishable to some extent from that of APOBEC3B [7, 13, 19,20,21,22,23,24,25]. In addition to mutagenesis linked to deamination of single-stranded DNA, both APOBEC3B and APOBEC3A can bind RNA, and APOBEC3A has been reported to be involved in both C to U and G to A RNA editing [16, 26].

Given the strong evidence for APOBEC-associated mutagenesis in a variety of cancer types, it is important to learn whether such mutagenic processes affect cancer response to therapy, in order to exploit potential pathways involved in sensitivity and to avoid potential mechanisms of resistance. To date, the effect of APOBEC3B-like mutagenic processes on therapeutic response is not fully understood, and reported associations point in divergent directions. Some studies suggested a potential role of APOBEC mutagenesis in tumor resistance to therapy, possibly through increased tumor heterogeneity when APOBEC3B activity is elevated [18]. Clinical studies and an analysis of murine xenograft models found an association of increased APOBEC3B mRNA expression with tamoxifen resistance in estrogen receptor-positive (ER+) breast cancer [18]. In an analysis of 30 human cell lines, APOBEC3B expression levels were associated with resistance to vinblastine, topotecan, paclitaxel, mitoxantrone, mitomycin C, etoposide, and doxorubicin [27]. In contrast, a study of bladder cancer patients from The Cancer Genome Atlas (TCGA) demonstrated improved survival of patients with elevated numbers of APOBEC signature mutations [7]. Experimental in vitro overexpression of APOBEC3B in the 293-A3B and 293-GFP cell lines with inactivated p53 resulted in increased APOBEC mutagenesis and kataegic events, which were accompanied by cell hypersensitivity to small-molecule DNA damage response inhibitors, including ATR (VX-970 and AZD6738), CHEK1 (SAR020106), CHEK2 (CCT241553), PARP (olaparib and BMN-673), and WEE1 (AZD1775) inhibitors, as well as to combinations of cisplatin/ATR inhibitor, ATR/PARP inhibitor, and PARP/WEE1 inhibitor [28]. Increased APOBEC3B expression in breast cell lines was also correlated with sensitivity to the CHEK1 inhibitor CCT244747 [29].
In contrast, neither APOBEC3B nor APOBEC3A expression was significantly correlated with sensitivity to any drug in breast cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC, or GDSC1000) dataset [30]; however, in a joint analysis of all cancer types, their expression levels were associated with sensitivity to 38 and 16 agents, respectively [31].

At the molecular level, APOBEC3B hypermutation activity has been reported to act synergistically with the absence of uracil DNA glycosylase (UNG) and to involve molecular steps that require the activity of the translesion synthesis DNA polymerase REV1 [8, 20, 22, 24]. APOBEC mutagenesis may also increase when expression of the tumor suppressor fragile histidine triad protein (FHIT) is reduced or its protein activity is lost, and higher levels of APOBEC mutagenesis were observed in TCGA lung adenocarcinoma tumors that had both increased APOBEC3B expression and loss of FHIT protein expression [7, 9, 32].

Whereas many studies have focused on the molecular roles of APOBEC3B, and to some extent APOBEC3A, the possible combined effects of APOBEC3A, APOBEC3B, UNG, REV1, and FHIT on the generation of APOBEC3B-like mutation motifs and on drug sensitivity in cancer have not been clearly elucidated. To address this question, we investigated APOBEC3B-like mutational patterns and mRNA expression of the APOBEC3A, APOBEC3B, UNG, REV1, and FHIT genes in cancer cell lines in order to identify cell lines that may have experienced kataegis events. We further examined associations between mutational patterns of APOBEC3 activity, individual cancer types, and chemosensitivity to a variety of antitumor agents. This analysis used whole-exome sequencing (WES) data, gene expression microarray data, and drug response data for 255 agents from the Cancer Cell Line Encyclopedia (CCLE) [33, 34] and the GDSC resource [30, 35, 36].

Methods

Analysis of whole-exome sequencing data

We downloaded unprocessed WES BAM files, which were available for 325 CCLE cell lines (Fig. 1), from the CCLE project at the National Cancer Institute (NCI) Cancer Genomics Hub; these data are available at the NCI Genomic Data Commons (GDC) data portal [37]. All CCLE WES data had been reported to be sequenced at the Broad Institute using the same version of the Agilent Exome Bait kit, and the same sequencing protocols and data processing pipeline were applied to all samples across all cancer categories [37, 38].

Fig. 1 Venn diagram showing the numbers of CCLE cell lines with available data

Raw BAM files were preprocessed according to the GATK Best Practices pipeline v. 3.5 as of 15 May 2016 [39,40,41], using default or recommended parameters for each tool and hg19 as the reference human genome assembly. Single nucleotide variants were called from the preprocessed BAM files with VarScan2 using default parameters [42]. Nucleotide substitutions were filtered by their allele frequencies in the 1000 Genomes Project dataset (August 2015 release), eliminating common population variants with an allele frequency > 1% in the combined 1000 Genomes Project dataset from all populations [43]. To quantify mutation load, we computed the total number of identified single nucleotide variants (SNVs) across all sequenced exome regions in several separate categories of DNA sequence changes: all SNVs, as well as C>G, C>T, and C>K substitutions on one or both genome strands.
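The allele-frequency filter and the seven substitution counts described above can be illustrated with a minimal Python sketch. It is not the study's pipeline (which used GATK and VarScan2 output): the `Variant` record, the dictionary standing in for a 1000 Genomes allele-frequency lookup, and both function names are hypothetical. It assumes single-base reference and alternate alleles, treats variants absent from the population table as rare, and counts a change on the opposite strand through its complementary reference-strand change (a G>A change on the reference strand is a C>T change on the other strand).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical minimal SNV record (single-base ref/alt, hg19 coordinates).
@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


AfKey = Tuple[str, int, str, str]
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def remove_common_variants(variants: List[Variant],
                           population_af: Dict[AfKey, float],
                           max_af: float = 0.01) -> List[Variant]:
    """Drop variants whose combined-population allele frequency exceeds max_af.

    Variants missing from the population table are treated as rare (AF = 0).
    """
    return [v for v in variants
            if population_af.get((v.chrom, v.pos, v.ref, v.alt), 0.0) <= max_af]


def substitution_counts(variants: List[Variant]) -> Dict[str, int]:
    """Seven counts: all SNVs, plus C>G, C>T and C>K on the reference strand
    and on both strands (a G>X change on the reference strand is counted as
    C>complement(X) on the opposite strand)."""
    counts = {"all": len(variants)}
    for name, alts in (("C>G", {"G"}), ("C>T", {"T"}), ("C>K", {"G", "T"})):
        ref_strand = sum(1 for v in variants if v.ref == "C" and v.alt in alts)
        opposite = sum(1 for v in variants
                       if v.ref == "G" and COMPLEMENT[v.alt] in alts)
        counts[f"{name} ref strand"] = ref_strand
        counts[f"{name} both strands"] = ref_strand + opposite
    return counts


calls = [Variant("1", 100, "C", "T"), Variant("1", 200, "C", "G"),
         Variant("2", 50, "G", "A")]
af_table = {("1", 100, "C", "T"): 0.25, ("2", 50, "G", "A"): 0.005}
rare = remove_common_variants(calls, af_table)
print([v.pos for v in rare])  # [200, 50]
print(substitution_counts(rare))
```

In this toy input, the C>T call at position 100 is removed as a common variant (allele frequency 0.25), while the G>A call at position 50 is kept and contributes to the both-strand C>T and C>K counts.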

We searched the WES nucleotide changes in each cell line for the three reported APOBEC3B mutation motifs, T(C>K)W, T(C>D)R, and T(C>D)D. This motif representation uses IUPAC nucleotide symbols for three consecutive genomic positions; the two symbols in parentheses, separated by “>”, give the reference nucleotide and the substituted nucleotide(s). For example, T(C>K)W indicates that the reference genome sequence is 5′-TCA-3′ or 5′-TCT-3′ and that either a C>G or a C>T substitution was found at the second nucleotide of the triplet. We refer to the three sequence motifs analyzed in this study, T(C>K)W, T(C>D)R, and T(C>D)D, as APOBEC-like motifs, to distinguish them from the term APOBEC mutational signature, which commonly refers to a matrix of mutational changes characteristic of APOBEC activity in the 96-trinucleotide format [14, 44]. Both motif and signature formats represent the same patterns of APOBEC mutational activity, and both terms have been used interchangeably in earlier reports [10].
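The motif notation can be made concrete with a small matching function. This Python sketch is an illustration, not the code used in the study: the function names are hypothetical, only the IUPAC codes needed for the three motifs are included, and a change is assumed to match on the opposite strand when the reverse complement of its reference triplet and alternate base fits the motif.

```python
# IUPAC ambiguity codes needed for the three APOBEC-like motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "K": "GT", "R": "AG", "D": "AGT"}
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def parse_motif(motif: str):
    """Split e.g. 'T(C>K)W' into ('T', 'C', 'K', 'W')."""
    five, rest = motif.split("(")
    change, three = rest.split(")")
    ref, alt = change.split(">")
    return five, ref, alt, three


def matches(motif: str, context: str, alt: str) -> bool:
    """context: 5'->3' reference triplet centred on the mutated base."""
    five, ref, alt_code, three = parse_motif(motif)
    return (len(context) == 3 and len(alt) == 1
            and context[0] in IUPAC[five]
            and context[1] in IUPAC[ref]
            and alt in IUPAC[alt_code]
            and context[2] in IUPAC[three])


def motif_strand(motif: str, context: str, alt: str):
    """'+' if the change fits the motif on the reference strand,
    '-' if it fits on the opposite strand, otherwise None."""
    if matches(motif, context, alt):
        return "+"
    if matches(motif, revcomp(context), revcomp(alt)):
        return "-"
    return None


print(motif_strand("T(C>K)W", "TCA", "T"))  # +    (5'-TCA-3', C>T)
print(motif_strand("T(C>K)W", "TGA", "A"))  # -    (TCA, C>T on the other strand)
print(motif_strand("T(C>K)W", "TCG", "T"))  # None (G is not W)
```

Recording the strand alongside each match is what allows clusters to be restricted to motifs on the same genome strand.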

Because APOBEC activity is characterized by clusters of co-occurring APOBEC motifs with closely spaced mutations on the same genome strand, we further searched each cell line for kataegis clusters, defined using two related criteria: (a) the same motif occurring on the same genome strand at least five times within a 1000-bp window, which we refer to as 5/1000; or (b) the same motif occurring on the same genome strand at least six times within a 10,000-bp window, which we refer to as 6/10000. For each cell line, we considered four measures of APOBEC-like mutational activity, describing the overall abundance of APOBEC-like motifs and the abundance and length of kataegis clusters in the WES data of that cell line: (1) the total number of APOBEC-like motifs, (2) the number of APOBEC-like motifs in distinct non-overlapping kataegis regions, (3) the number of distinct non-overlapping kataegis regions, and (4) the total combined length of distinct non-overlapping kataegis regions. We also examined seven overall nucleotide substitution counts for each cell line: the combined count of all nucleotide substitutions, and the numbers of C>G, C>T, and C>K substitutions on the reference genome strand and on both genome strands.
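The two kataegis criteria and measures (2) to (4) can be sketched as a sliding-window scan over sorted motif positions. This is a minimal illustration under assumptions that the text does not specify: positions are pre-grouped by chromosome, strand, and motif; a run of k consecutive motif positions qualifies when its first and last positions lie within the window (last − first < window); overlapping qualifying runs are merged into one region spanning its first to last motif; and region length is counted inclusively. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def kataegis_regions(positions: Iterable[int], min_count: int = 5,
                     window: int = 1000) -> List[Tuple[int, int]]:
    """Merged regions where >= min_count motif positions fit in one window.

    positions: motif positions for one chromosome, one strand and one motif.
    A run of min_count consecutive positions qualifies when its first and
    last positions lie within `window` bp; overlapping qualifying runs are
    merged so that the returned regions do not overlap.
    """
    pos = sorted(positions)
    regions: List[Tuple[int, int]] = []
    for i in range(len(pos) - min_count + 1):
        start, end = pos[i], pos[i + min_count - 1]
        if end - start < window:
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


def cluster_measures(positions: Iterable[int], min_count: int = 5,
                     window: int = 1000) -> Dict[str, int]:
    """Measures (2)-(4): motifs in regions, number of regions, total length."""
    pos = sorted(positions)
    regions = kataegis_regions(pos, min_count, window)
    in_regions = sum(1 for p in pos if any(s <= p <= e for s, e in regions))
    return {"motifs_in_regions": in_regions,
            "regions": len(regions),
            "total_length": sum(e - s + 1 for s, e in regions)}


hits = [100, 200, 300, 400, 500, 5000, 20000]
print(cluster_measures(hits, 5, 1000))
# {'motifs_in_regions': 5, 'regions': 1, 'total_length': 401}
print(cluster_measures(hits, 6, 10000))
# {'motifs_in_regions': 6, 'regions': 1, 'total_length': 4901}
```

With these toy positions, the 5/1000 criterion finds one region with five motifs, and the wider 6/10000 criterion extends it to include the motif at position 5000.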

Gene expression analysis

Log2-transformed gene expression levels, available for 1036 cell lines from the Cancer Cell Line Encyclopedia (Fig. 1), were downloaded from the CCLE web resource of the Broad Institute [34]. These measures had been generated using Affymetrix Human Genome U133 Plus 2.0 microarrays and normalized using the Robust Multi-array Average (RMA) algorithm [33, 45]. We analyzed the expression of five genes that may be involved in generating APOBEC-like mutation motifs: APOBEC3B, APOBEC3A, REV1, UNG, and FHIT. Expression values from multiple microarray probes for each gene were averaged. Microarray-derived expression values for each of the five genes were in strong agreement with RNA-seq expression measures that recently became available from the CCLE resource [34], with Spearman correlation coefficients ρ between 0.883 and 0.947 and correlation p values ≤ 3.33 × 10⁻¹⁴⁴ for each gene (data not shown).
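A minimal sketch of the probe-to-gene averaging step, assuming values are averaged on the log2 (RMA) scale in which they were provided and that every probe lists its values in the same cell-line order. The function name, probe identifiers, and values below are hypothetical placeholders, not real Affymetrix probe set IDs or data.

```python
from statistics import mean
from typing import Dict, List


def gene_level_expression(probe_values: Dict[str, List[float]],
                          probe_to_gene: Dict[str, str]) -> Dict[str, List[float]]:
    """Average log2 (RMA) values of all probes mapped to the same gene.

    probe_values: probe ID -> expression values, one per cell line, in the
    same cell-line order for every probe. Unmapped probes are ignored.
    """
    rows_by_gene: Dict[str, List[List[float]]] = {}
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            rows_by_gene.setdefault(gene, []).append(values)
    # zip(*rows) walks the cell lines column by column.
    return {gene: [mean(col) for col in zip(*rows)]
            for gene, rows in rows_by_gene.items()}


# Hypothetical probe IDs and values for three cell lines.
probes = {"probe_1": [8.0, 6.0, 10.0], "probe_2": [9.0, 7.0, 11.0],
          "probe_3": [5.0, 5.5, 6.0]}
mapping = {"probe_1": "APOBEC3B", "probe_2": "APOBEC3B", "probe_3": "FHIT"}
print(gene_level_expression(probes, mapping))
# {'APOBEC3B': [8.5, 6.5, 10.5], 'FHIT': [5.0, 5.5, 6.0]}
```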

To examine possible associations of expression levels of APOBEC3A and APOBEC3B with the germline APOBEC3B gene deletion, we downloaded the copy number status of the APOBEC3B gene from the CCLE web resource of the Broad Institute [34]. The copy number data had been generated by the CCLE Consortium using Affymetrix 6.0 SNP arrays, with segmentation of normalized log2 ratios of the copy number estimates performed using the circular binary segmentation algorithm [34].

Analysis of drug response

The IC50 measures of cell line chemosensitivity, representing the drug concentration that reduced cell activity by 50%, were available for 24 drug agents from the Cancer Cell Line Encyclopedia [33] (Fig. 1). These data were downloaded from the CCLE web resource of the Broad Institute [34]. In addition, chemosensitivity values for 251 drug agents for the same cell lines were available from the Genomics of Drug Sensitivity in Cancer resource [30, 35, 36]. GDSC drug response data, in ln(IC50) format, were obtained from Supplementary Table 4A of Iorio et al. [30]. All drug sensitivity values derived from the CCLE and GDSC datasets were transformed to the log10(IC50) scale, which we refer to below as log(IC50). Identities of cell lines present in both the CCLE and GDSC datasets were verified using information from Cellosaurus [46]. Drug sensitivity measures for the 11 agents present in both the CCLE and GDSC datasets were analyzed separately for the CCLE and GDSC response measures. For agents with duplicate measurements within the GDSC dataset [30], we used the average of the drug response measurements from the separate experiments. The resulting dataset had 275 CCLE and GDSC drug response measures for 255 distinct antitumor agents. The concordance of drug response measures between the CCLE and GDSC datasets has been studied extensively [47, 48] and validated in an independent screening study [49]. While some authors questioned the extent of agreement between the two sets of measures [48], most studies confirmed that, for the majority of agents, there was solid overall agreement between the drug response measures, the classification of cell lines as sensitive or resistant, and the molecular predictors of drug sensitivity derived from the GDSC and CCLE datasets [47, 49].
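The conversion to a common log10(IC50) scale follows from log10(x) = ln(x)/ln(10). The sketch below applies it to GDSC ln(IC50) values, averages replicate GDSC screens of the same drug in the same cell line, takes log10 of CCLE IC50 values, and keeps the two sources as separate response measures so that an agent screened in both resources yields two measures. It assumes CCLE values are supplied untransformed and that both sources use the same concentration unit; the function names and tuple layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, float]   # (cell_line, drug_name, value)
Key = Tuple[str, str]          # (cell_line, drug_name)


def ln_to_log10(ln_ic50: float) -> float:
    """log10(x) = ln(x) / ln(10)."""
    return ln_ic50 / math.log(10)


def gdsc_log10_ic50(rows: Iterable[Row]) -> Dict[Key, float]:
    """GDSC rows carry ln(IC50); replicate screens are averaged on log10 scale."""
    grouped: Dict[Key, List[float]] = defaultdict(list)
    for cell_line, drug, ln_ic50 in rows:
        grouped[(cell_line, drug)].append(ln_to_log10(ln_ic50))
    return {key: mean(values) for key, values in grouped.items()}


def ccle_log10_ic50(rows: Iterable[Row]) -> Dict[Key, float]:
    """CCLE rows are assumed to carry untransformed IC50 values."""
    return {(cell_line, drug): math.log10(ic50) for cell_line, drug, ic50 in rows}


def combine_sources(ccle: Dict[Key, float],
                    gdsc: Dict[Key, float]) -> Dict[Tuple[str, str, str], float]:
    """Keep CCLE and GDSC as separate measures, even for shared agents."""
    combined = {("CCLE", cell, drug): v for (cell, drug), v in ccle.items()}
    combined.update({("GDSC", cell, drug): v for (cell, drug), v in gdsc.items()})
    return combined


gdsc = gdsc_log10_ic50([("CL1", "drug_x", math.log(10.0)),
                        ("CL1", "drug_x", math.log(1000.0))])
ccle = ccle_log10_ic50([("CL1", "drug_x", 100.0)])
print(combine_sources(ccle, gdsc))  # two separate log10(IC50) measures near 2.0
```

Because log10 is a linear rescaling of ln, averaging replicates before or after the conversion gives the same result.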

Statistical analysis

We examined Spearman rank-order correlations among gene expression values, mutation counts, measures of motif and kataegis cluster abundance, and drug sensitivity values (log10(IC50)), both in a combined analysis of all cancer types and within individual cancer types. The p values were adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) method [50], accounting for 275 drug sensitivity measures, 3 APOBEC-like motifs, 7 categories of mutation counts, and expression levels of 5 candidate genes. Correlations with FDR-adjusted p < 0.05 were considered statistically significant. In this report, ρ denotes the Spearman correlation coefficient, p is a p value prior to FDR adjustment, padj is an FDR-adjusted p value, Ntests is the number of correlation tests included in the FDR adjustment, and n is the sample size (the number of cell lines, or pairs of values, included in the correlation analysis). We focused our discussion on statistically significant moderate or strong correlations with padj < 0.05 and an absolute Spearman correlation coefficient |ρ| > 0.25.
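The Benjamini-Hochberg adjustment and the reporting thresholds (padj < 0.05 and |ρ| > 0.25) can be sketched as follows. This implementation of the standard step-up procedure should agree with R's `p.adjust(p, method = "BH")`; the function names, labels, and p values are hypothetical, and each call is assumed to receive one family of Ntests tests.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg (step-up) FDR-adjusted p values, capped at 1."""
    m = len(pvalues)
    # Walk from the largest p value to the smallest, keeping a running minimum
    # of p * m / rank so that adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def reportable(results: Sequence[Tuple[str, float, float]],
               alpha: float = 0.05, min_abs_rho: float = 0.25):
    """results: (label, rho, raw p) for one family of tests.
    Returns (label, rho, padj) for padj < alpha and |rho| > min_abs_rho."""
    padj = benjamini_hochberg([p for _, _, p in results])
    return [(label, rho, q) for (label, rho, _), q in zip(results, padj)
            if q < alpha and abs(rho) > min_abs_rho]


toy = [("pair_a", 0.71, 0.01), ("pair_b", 0.10, 0.04),
       ("pair_c", -0.30, 0.03), ("pair_d", 0.60, 0.50)]
print(benjamini_hochberg([p for _, _, p in toy]))
print([label for label, _, _ in reportable(toy)])  # ['pair_a']
```

In the toy family, pair_c has |ρ| > 0.25 but its adjusted p value (about 0.053) misses the 0.05 cut-off, which shows why both thresholds are applied.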

Analyses of candidate gene expression levels, motif and kataegis cluster abundance, and correlations were performed both in a combined dataset of all cell lines from different cancer types (pan-cancer analysis) and within 32 individual cancer categories (Table 1). Many cancer categories were based on TCGA definitions. However, some cancer types from the same organ were grouped into broader categories to include a wider range of cell lines than the TCGA enrollment criteria would allow, and additional categories were included for several cancer types not represented in TCGA (e.g., small cell lung cancer and pediatric tumor categories). These categories are described in Table 1 and in the list of abbreviations. Only cancer types with at least 5 cell lines that had pairs of matching data (e.g., WES and expression, expression and drug response, or WES and drug response information) were included in the stratified correlation analyses of individual cancer categories. Accordingly, the adjustment for false discovery rate in correlation analyses accounted for 23 cancer categories with ≥ 5 cell lines for gene expression comparisons, 17 cancer categories with ≥ 5 cell lines that had both expression and WES data, 26 cancer histologies with expression and chemosensitivity data, and 26 cancer types with ≥ 5 cell lines that had both drug sensitivity data and APOBEC-like motif counts derived from WES data. All cell lines with available data were included in the pan-cancer correlation analysis combining all cancer categories. To examine the possible effect of estrogen receptor status on drug sensitivity of breast cancer cell lines, we performed an additional stratified analysis of ER+ and ER− breast cancer cell lines, with estrogen receptor status defined from available literature reports [51,52,53,54].
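The stratified design (per-category correlations only for categories with at least five complete pairs, plus a pan-cancer correlation over all cell lines) can be sketched in standard-library Python. Spearman's ρ is computed here as the Pearson correlation of average ranks, which handles ties; p values are omitted, and in practice a library routine such as `scipy.stats.spearmanr` could supply them. The function names, the `None` convention for missing data, and the `PAN-CANCER` key are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def stratified_spearman(rows: Iterable[Tuple[str, Optional[float], Optional[float]]],
                        min_n: int = 5) -> Dict[str, Tuple[float, int]]:
    """rows: (cancer_category, x, y), with None marking missing data.
    Returns {category: (rho, n)} for categories with >= min_n complete
    pairs, plus a 'PAN-CANCER' entry over all complete pairs."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for category, x, y in rows:
        if x is not None and y is not None:
            groups[category].append((x, y))
    all_pairs = [pair for pairs in groups.values() for pair in pairs]
    result = {}
    for category, pairs in list(groups.items()) + [("PAN-CANCER", all_pairs)]:
        if len(pairs) >= min_n:
            xs, ys = zip(*pairs)
            result[category] = (spearman_rho(xs, ys), len(pairs))
    return result
```

For example, a category with only four complete pairs is skipped in the stratified output, although its cell lines still contribute to the pan-cancer entry.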

Table 1 Expression of the five candidate genes in cell lines from different cancer types

Bioinformatic and statistical analyses were performed using Python v. 2.7 and R v. 3.4.

Results

Candidate gene expression patterns

Table 1 provides expression levels of each candidate gene in cell lines from individual cancer types, as well as average expression levels in the pan-cancer dataset. In the pan-cancer dataset, APOBEC3B expression had a bimodal distribution (Fig. 2b), whereas APOBEC3A, REV1, UNG, and FHIT had unimodal distributions (Fig. 2a, c–e). Analysis of APOBEC3B copy number status showed that low APOBEC3B expression occurred both in samples with loss of the APOBEC3B gene due to the germline deletion polymorphism and in a number of samples without APOBEC3B gene loss (Fig. 2f). APOBEC3A expression was low in many of the cell lines (mean = 3.89; Table 1; Fig. 2a), in agreement with an earlier study [7].

Fig. 2 a–e Histograms and density functions showing the distributions of expression of the five candidate genes in the cell lines. a APOBEC3A. b APOBEC3B. c REV1. d UNG. e FHIT. The horizontal scale represents log2-transformed gene expression values. The left vertical scale represents cell line counts, whereas the right vertical scale represents density values. f A scatterplot of APOBEC3B vs APOBEC3A expression in 1012 cell lines from the CCLE microarray expression dataset, showing the copy number status of the APOBEC3B gene according to the CCLE data [33]. Cell lines with log2(normalized ratio of APOBEC3B copy number estimate) ≥ −0.75 are shown in blue, whereas those with log2(normalized ratio of APOBEC3B copy number estimate) < −0.75 are shown in red
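Fig. 2f colours cell lines by a log2 copy-number ratio threshold of −0.75. The short sketch below groups cell lines by that threshold and summarizes APOBEC3B expression per group; treating values below −0.75 as a "loss" call is an assumption made for illustration, and the function names and toy values are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple

# Display threshold from Fig. 2f; using it as a "loss" call is an assumption.
LOSS_THRESHOLD = -0.75


def copy_number_group(log2_ratio: float, threshold: float = LOSS_THRESHOLD) -> str:
    """'loss' below the threshold (red in Fig. 2f), otherwise 'no loss' (blue)."""
    return "loss" if log2_ratio < threshold else "no loss"


def median_expression_by_group(samples: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """samples: (log2 copy-number ratio, log2 APOBEC3B expression) per cell line."""
    groups: Dict[str, List[float]] = {}
    for ratio, expr in samples:
        groups.setdefault(copy_number_group(ratio), []).append(expr)
    return {group: median(values) for group, values in groups.items()}


toy = [(-1.20, 3.0), (-0.90, 4.0), (0.10, 9.0), (0.00, 10.0), (-0.75, 5.0)]
print(median_expression_by_group(toy))  # {'loss': 3.5, 'no loss': 9.0}
```

A group median alone would hide the low-expressing cell lines without APOBEC3B loss noted above (such as the toy value 5.0), so the full distribution per group is more informative than a single summary.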

When compared to the mean APOBEC3A and APOBEC3B gene expression levels in the pan-cancer dataset (Table 1; mean expression values of 3.89 and 8.43, respectively), cell lines from the following cancer categories had elevated expression values of both APOBEC3A and APOBEC3B: bladder (mean values of 4.11 and 9.59, respectively), head and neck (HNSC; 4.93 and 9.54), chronic myelogenous leukemia (LCML; 6.20 and 12.56), and multiple myeloma (MM; 4.12 and 9.52). Several other cancer types had increased levels of expression of the APOBEC3B gene, but their mean expression levels of APOBEC3A were comparable to the mean APOBEC3A expression across all cancer types. Among the cancer categories with ≥ 5 cell lines, these included acute myeloid leukemia (LAML; mean APOBEC3B expression of 9.44) and melanoma (MEL; 9.81).

Our findings of elevated APOBEC3B and APOBEC3A expression in cell lines from several cancer types (Table 1) were consistent with earlier studies of patient-derived samples. Many earlier studies reported elevated expression and activity of APOBEC3B and APOBEC3A in bladder cancer and of APOBEC3B in head and neck cancer patients [5, 6, 9, 55, 56]. APOBEC-derived mutagenesis is considered the predominant mutation source in 65% of invasive bladder cancers in the TCGA dataset [57]. Similarly, a genomic signature attributed to APOBEC3 activity was reported in a subset of patients with all melanoma subtypes, although C>T transitions attributed to APOBEC activity could be confounded with UV-induced substitutions in many melanoma cells [12, 57, 58]. Increased expression and activity of both APOBEC3A and APOBEC3B were also reported in multiple myeloma patients, most commonly in those with the t(14;16) translocation, which was associated with poor survival [56, 59, 60].

Elevated levels of UNG expression, but not of the other candidate genes, were found in the prostate adenocarcinoma (PRAD; 10.15) and small cell lung cancer (SCLC; 10.10) cell lines (Table 1). Clusters of single-strand mutation patterns suggestive of APOBEC activity were previously reported in prostate cancer [56], and increased UNG expression could possibly contribute to mutagenesis in that cancer category. Because abrogated FHIT activity may increase mutagenesis both on its own and synergistically with APOBEC3B [7, 9, 32], we note that cell lines from several cancer types, including head and neck (4.85) and sarcoma (4.87), had a considerably lower mean FHIT expression than the pan-cancer average (5.74). Therefore, both high APOBEC3B and APOBEC3A expression and low FHIT expression may influence APOBEC mutagenesis in head and neck cancer.

Expression levels of APOBEC3B showed strong and statistically significant positive correlation with APOBEC3A expression in 21 cancer categories (Table 2; ρ between 0.576 and 1.000; padj < 0.05). These categories (NSCLC, LAML, GLIOMA, COAD/READ, MATBCL, STAD, OVARIAN, RCC, MEL, CLLE, SAR, BREAST, BLADDER, LIHC, EC, PAAD, HNSC, CESC, MM, THCA, and UCEC; see the legend of Table 1 and the list of abbreviations for their description) included both solid tumors and hematological malignancies. A strong positive and highly significant correlation between APOBEC3B and APOBEC3A expression was also observed in the pan-cancer analysis (Table 2; ρ = 0.714, padj < 0.001, n = 1036, Ntests = 10). Interestingly, breast cancer cell lines were among the cancer types with a positive correlation between APOBEC3A and APOBEC3B expression (Table 2). Earlier studies found strong evidence for increased APOBEC3B activity and mutagenesis in a subset of breast cancers [7, 20, 21, 61] and for APOBEC signature enrichment in the HER2 breast cancer subtype and in triple negative breast cancer (TNBC) [6, 62]; however, a study of breast cancer cell lines found generally low levels of APOBEC3A expression [29]. The possible molecular impact of coordinated APOBEC3A and APOBEC3B expression in the breast cancer cell lines analyzed in our study is of interest and requires further investigation.

Table 2 Significant correlations among candidate gene expression levels

Expression of APOBEC3B was significantly negatively correlated with FHIT expression in glioma cell lines (ρ = −0.407, padj = 0.0022, n = 79, Ntests = 230). This negative correlation is notable because low FHIT expression or loss of FHIT function has been reported to cooperate with APOBEC3B in mutagenesis, even though APOBEC3B overexpression and DNA damage induced by replication stress due to FHIT loss have been proposed to occur independently of each other [7, 9, 32]. A negative correlation between APOBEC3B and FHIT expression could, in principle, favor hypermutated clusters in cells in which APOBEC3B expression was elevated and FHIT expression was diminished. However, this did not appear to be the case in our analysis: in glioma cell lines, which included astrocytoma, lower-grade glioma, and glioblastoma multiforme cell lines, mean APOBEC3B and FHIT expression levels were comparable to those in the pan-cancer dataset (Table 1). These expression levels were consistent with earlier studies [12, 63], which reported low APOBEC3B levels in lower-grade glioma TCGA patient samples and suggested that mutational processes in glioma tumors could be caused by mechanisms other than APOBEC mutagenesis.

UNG expression was negatively correlated with APOBEC3B expression in mature B cell lymphoma cell lines (MATBCL; ρ = −0.372, padj = 0.034, n = 60, Ntests = 230) and with APOBEC3A expression in chronic lymphocytic leukemia cells (CLLE; ρ = −0.324, padj = 0.037, n = 78, Ntests = 230). Expression levels of UNG and REV1 were significantly positively correlated in non-small cell lung cancer cell lines (NSCLC; ρ = 0.344, padj = 2.98 × 10⁻⁵, n = 186, Ntests = 230).

APOBEC-like mutation motifs and mutation loads in cancer cell lines

Table 3 provides mutation counts for the 325 cell lines with available WES data, both in the combined analysis of all cancer categories and within individual cancer types. Because some individual cancer categories had small numbers of cell lines with WES data, not all mutation counts in these cell lines were representative of mutation counts in large patient cohorts for specific cancer types. For example, mutation counts at single nucleotide positions in the bladder cancer category, which included six cell lines, were lower than the high mutation rates commonly seen in bladder cancer patients [12, 57, 64]. However, clusters of mutations in genome regions have been reported to provide a more robust representation of mutational processes in tumor genomes than do average mutation rates at single positions [13]. As discussed below, the prevalence of APOBEC-like motifs and kataegis clusters (Fig. 3) in bladder cancer cell lines and in cell lines from several other cancer categories in our dataset was generally consistent with the relative ranking of cancer categories previously described using patient data.

Table 3 Prevalence of mutation counts in the whole-exome sequencing data

Fig. 3 a–c Overall motif counts in different cancer types and across all cell lines (pan-cancer analysis). The y axis is presented on the log10 scale. a T(C>K)W motif counts. b T(C>D)R motif counts. c T(C>D)D motif counts. d–f Numbers of distinct, non-overlapping 5/1000 kataegis clusters with ≥ 5 motifs on the same genome strand per 1000 bp in different cancer types and in the pan-cancer dataset. d T(C>K)W motif. e T(C>D)R motif. f T(C>D)D motif. Horizontal middle bars show the mean for each cancer category. Vertical bars show mean ± standard deviation. Negative values of (mean − standard deviation) in d and e were truncated at 0. Cancer categories without vertical bars had no predicted kataegis clusters (d–f) and/or too few cell lines to compute the standard deviation (n = 2 for mesothelioma, a–c)

Table 4 shows the abundance of the three APOBEC-like motifs and their predicted kataegis clusters in the WES data of CCLE cell lines in the combined analysis of all cancer types. Among the three motifs, the commonly reported APOBEC3B motif with narrow specificity, T(C>K)W [7], yielded the smallest numbers of predicted motifs (mean ± standard deviation of 603.58 ± 121.17 per cell line) and kataegis clusters (0.12 ± 0.36 clusters of 5 motifs in 1000-bp windows per cell line), followed by the T(C>D)R motif (743.51 ± 317.68 motifs and 0.56 ± 0.77 clusters). The highest numbers of APOBEC-like motifs (1184.94 ± 887.46) and clusters (2 ± 1.2 per cell line) were predicted for the least specific motif, T(C>D)D, which includes all nucleotide changes covered by both T(C>K)W and T(C>D)R. Similar patterns were observed for the combined length of the 5/1000 kataegis clusters and for the numbers of motifs in distinct 5/1000 clusters, as well as when 6/10000 kataegis clusters were considered (Table 4).
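The nesting of the motifs can be checked by expanding each IUPAC motif into the concrete reference-triplet and alternate-base pairs it covers. In this illustrative Python sketch (the `expand` helper is hypothetical), T(C>D)D contains every change covered by T(C>K)W and T(C>D)R, plus the 5′-TCT-3′ C>A change covered by neither, whereas T(C>K)W and T(C>D)R are not nested in each other. Consequently, if all motifs are counted in the same way, a cell line's T(C>D)D motif count cannot be smaller than its T(C>K)W or T(C>D)R count; cluster counts are not guaranteed to follow this ordering, because additional motifs can merge neighboring regions.

```python
from itertools import product

# IUPAC ambiguity codes needed for the three APOBEC-like motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "K": "GT", "R": "AG", "D": "AGT"}


def expand(motif: str) -> set:
    """All concrete (reference triplet, alternate base) pairs for e.g. 'T(C>K)W'."""
    five, rest = motif.split("(")
    change, three = rest.split(")")
    ref, alt = change.split(">")
    return {(a + b + c, alt_base)
            for a, b, alt_base, c in product(IUPAC[five], IUPAC[ref],
                                             IUPAC[alt], IUPAC[three])}


kw, dr, dd = expand("T(C>K)W"), expand("T(C>D)R"), expand("T(C>D)D")
print(len(kw), len(dr), len(dd))   # 4 6 9
print(kw <= dd, dr <= dd)          # True True
print(kw <= dr)                    # False
print(sorted(dd - (kw | dr)))      # [('TCT', 'A')]
```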

Table 4 Prevalence of APOBEC mutation motifs and kataegis clusters in a combined analysis of all cancer categories

Similar trends in the abundance of motifs and kataegis-like clusters were also observed among individual cancer categories, as presented in Fig. 3, which shows the distributions of motif counts and numbers of the 5/1000 kataegis clusters among cell lines from different cancer types. For the most specific APOBEC motif, T(C>K)W, the highest mean number of motifs per cell line was observed in cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC; mean = 736 motifs per cell line), followed by bladder cancer (mean = 716 motifs), and melanoma (mean = 642 motifs; Fig. 3a). These categories have been reported to have high levels of APOBEC3 activity [12], although some C>K mutations in melanoma were likely caused by ultraviolet (UV) radiation [10, 14]. The highest mean number of the 5/1000 kataegis clusters with the T(C>K)W motif was observed in bladder cancer (mean = 0.33 clusters per cell line), followed by mature B cell lymphoma (MATBCL; mean = 0.28 clusters), and NSCLC (mean = 0.19 clusters; Fig. 3d). For a less specific motif, T(C>D)R, the three cell line categories with the highest mean numbers of motifs were CESC (mean = 1343 motifs per cell line), uterine corpus endometrial carcinoma (UCEC; mean = 842 motifs), and bladder cancer (mean = 781 motifs; Fig. 3b). While high levels of APOBEC3 activity have been reported in these cancers, additional mechanisms may also be contributing to UCEC mutagenesis [12]; in addition, only two UCEC cell lines had WES data, resulting in a very small sample size. The highest mean number of the 5/1000 kataegis clusters with the T(C>D)R motif was observed for THCA (mean = 1.00 cluster), followed by MATBCL (mean = 0.83 clusters) and the liver hepatocellular carcinoma (LIHC; mean = 0.76; Fig. 3e). The highest counts of the third and the least specific motif, T(C>D)D, were found in CESC (mean = 2744 motifs per cell line), UCEC (mean = 2177 motifs), and bladder cancer cell lines (mean = 1221 motifs; Fig. 3c). 
These cancer categories have been reported to have strong APOBEC3 activity [12]. The highest numbers of 5/1000 kataegis clusters with the T(C>D)D motif were observed in LIHC (mean = 2.65 clusters), renal cell carcinoma (RCC; mean = 2.50 clusters), and UCEC (mean = 2.50 clusters; Fig. 3f). When 6/10000 kataegis clusters were considered (data not shown), the two cancer types with the highest mean numbers of kataegis clusters were LIHC (mean = 0.76 clusters for T(C>K)W, 1.24 clusters for T(C>D)R, and 3.24 clusters for T(C>D)D) and RCC (mean = 0.38, 0.88, and 2.13 clusters, respectively).

Our findings for bladder cancer, melanoma, non-small cell lung cancer, uterine corpus endometrial carcinoma, and prostate adenocarcinoma were consistent with previous reports that suggested a role for APOBEC3 mutagenesis in these cancer types [5, 6, 12, 57, 60, 65]. In contrast, APOBEC3B was reported to be less likely to play a role in the mutagenesis of renal cell carcinoma cell lines [6, 12, 65], suggesting that the high prevalence of mutation clusters in the RCC cell lines observed in our study could be generated by molecular factors other than APOBEC3B. The increased prevalence of mutation clusters in mature B cell lymphoma cell lines may be explained by the effects of translesion synthesis DNA polymerase η [13, 66]. It is also possible that some of the mutations in MATBCL reflect a partial overlap between the motifs examined in our study and the characteristic signature of another member of the APOBEC family, activation-induced cytidine deaminase (AID), which has been linked to mutagenesis in MATBCL. However, AID has a distinct preference for the WRCY/RGYW motif, and its mutational signature is distinguishable from that of APOBEC3A/B [9, 10, 16, 67]. It is therefore less likely that the increased number of APOBEC3-like motifs found in MATBCL could be attributed to AID activity.

Significantly increased APOBEC3B gene and protein expression in hepatocellular carcinoma compared with non-tumor tissues, as well as high rates of C>D substitutions in hepatocellular carcinoma genomes, have been documented previously [68,69,70,71,72], in agreement with the increased prevalence of APOBEC-like motifs in LIHC cell lines in our dataset (Fig. 3). However, the role of APOBEC3B in hepatocellular carcinoma mutagenesis has been controversial, with some studies reporting a tumor-promoting role and others suggesting a role in tumor suppression. Mutation signature analysis found signatures other than those induced by APOBEC3B in patient samples of hepatocellular carcinoma [11]. Other molecular factors, such as transcription-coupled repair, inhibition of UNG accompanied by APOBEC3G-induced hypermutation, translesion synthesis by DNA polymerases, or APOBEC1 activity, have been implicated in the mutagenesis of hepatocellular carcinomas [10, 17, 69,70,71, 73, 74]; therefore, the increased prevalence of APOBEC-like motif clusters in LIHC cell lines may be caused by factors other than APOBEC3B.

Correlation of gene expression levels with mutation counts and with prevalence of APOBEC-like motifs

Analysis of the pan-cancer dataset showed very weak correlations (|r| ≤ 0.161) of candidate gene expression levels with motif counts, counts of kataegis clusters, and mutation counts in the WES data. None of these correlations was statistically significant (padj ≥ 0.08). Among the five candidate genes, the strongest correlations were observed for APOBEC3A, APOBEC3B, and REV1.

Among individual cancer types, we observed strong (ρ between −0.738 and −0.902) and statistically significant (padj < 0.05) negative correlations of the frequencies of C>T, C>G, and C>K substitutions and of overall nucleotide substitution counts with REV1 expression in sarcoma and with UNG expression in melanoma (Table 5). The third-ranking gene for correlations with mutation counts was APOBEC3A. Although its correlations did not reach the stringent threshold of FDR-adjusted p < 0.05, APOBEC3A expression showed strong positive correlations (ρ ≤ 0.90, padj ≥ 0.07) with several categories of mutation counts in renal cell carcinoma. APOBEC3B expression also correlated most strongly with mutation counts in RCC compared with other cancer categories; however, these correlations were somewhat weaker and less significant (ρ ≤ 0.86, padj ≥ 0.16) than those for APOBEC3A (data not shown). These results suggest that REV1, UNG, and possibly APOBEC3A may contribute to overall mutagenesis in sarcoma, melanoma, and renal cell carcinoma, respectively. A large proportion of C>T and C>G substitutions in melanoma cell lines were likely generated by mutagenic processes related to UV radiation exposure [10, 14]. However, a role for APOBEC3 in melanoma mutagenesis has also been established in a subset of melanomas [58], and experimental evidence has suggested an important role for APOBEC3A in generating mutations specific to skin lesions [75].
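The ρ values in this section denote rank correlations between expression and mutation measures. Assuming they are Spearman coefficients, the sketch below shows how such a coefficient is obtained: both variables are ranked (tied values receive the average of their ranks) and Pearson's correlation is computed on the ranks. It uses only the Python standard library; the function names and example values are hypothetical. In practice, a library routine such as scipy.stats.spearmanr, which also returns a p-value, would typically be used.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson's correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: log2 expression vs. substitution counts in 5 cell lines.
expression = [1.2, 3.4, 2.2, 5.0, 4.1]
substitutions = [900, 400, 700, 100, 300]
print(spearman_rho(expression, substitutions))  # -1.0
```

Because only ranks enter the calculation, a perfectly monotone decreasing relationship gives ρ = −1 regardless of how nonlinear it is, which is why rank correlation is robust to the skewed distributions typical of mutation counts.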

Table 5 Statistically significant correlations of gene expression levels with mutation counts

Among the correlations of gene expression levels with APOBEC-like motif counts and measures of kataegis, significant or nearly significant correlations were observed for UNG expression with kataegis measures of the T(C>D)D motif in melanoma (ρ between 0.81 and −0.80, 0.039 ≤ padj ≤ 0.063, n = 17, Ntests = 475), and for APOBEC3A expression with motif counts and kataegis measures of the T(C>D)R and T(C>D)D motifs in renal cell carcinoma (ρ between 0.93 and 0.98, 0.008 ≤ padj ≤ 0.087, n = 8, Ntests = 510; data not shown).
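The padj values are reported together with Ntests, the number of tests in the family being corrected. Assuming the FDR adjustment is the Benjamini–Hochberg procedure (the method is not named in this section), the sketch below shows how Ntests enters the calculation: each sorted p-value is multiplied by m/k, where m is the number of tests and k is the rank of the p-value, and monotonicity is then enforced from the largest p-value downward. The raw p-values in the example are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-values decrease (step-up procedure).
    for position in range(m - 1, -1, -1):
        i = order[position]
        rank = position + 1
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four correlation tests.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw)])  # [0.04, 0.0533, 0.0533, 0.2]
```

Because each p-value is scaled by m/k, the same raw p-value yields a larger padj when it belongs to a family of 26,610 tests than to a family of 475, all else being equal, which is one reason strong correlations in small cancer categories can fall short of padj < 0.05.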

Correlation of candidate gene expression with chemosensitivity

Table 6 lists the strongest (|ρ| > 0.25) statistically significant (padj < 0.05) correlations between candidate gene expression levels and cell line chemosensitivity to drug treatment. Several strong correlations were observed in PAAD, PRAD, CESC, MM, SAR, RCC, NSCLC, MEL, and SCLC cell lines.

Table 6 Strongest significant correlations between candidate gene expression and drug sensitivity

In pancreatic adenocarcinoma (PAAD) cell lines, expression of both APOBEC3A and UNG was significantly negatively correlated (Table 6; ρ ≤ −0.819, padj ≤ 0.0001; n = 28 for APOBEC3A and 5 for UNG; Ntests = 26,610) with log(IC50) of the BET inhibitor JQ1 (Fig. 4a). JQ1 has been reported to inhibit pancreatic cancer cells in vitro and in vivo [76,77,78]. The correlation of APOBEC3A and UNG expression with PAAD sensitivity to JQ1 suggests that expression of these genes may be relevant to the strength of the clinical response to this agent.

Fig. 4

Scatterplots of drug sensitivity measures from the GDSC dataset in selected cancer types. a log(IC50) of JQ1 vs log2 of the APOBEC3A gene expression in pancreatic adenocarcinoma cell lines. b log(IC50) of bicalutamide vs the combined length of predicted 5/1000 kataegis clusters with the T(C>D)D motif in breast cancer cell lines. The names of individual breast cancer cell lines are shown. r Pearson’s correlation coefficient

Expression of REV1 in non-small cell lung cancer cell lines was significantly positively correlated with log(IC50) of the MEK (mitogen-activated protein kinase kinase) inhibitors PD-0325901, RDEA119, and trametinib, as well as of AKT inhibitor VIII, the XIAP inhibitor embelin, the PI3Kβ inhibitor AZD6482, and the cyclin-dependent kinase (CDK) 4/6 inhibitor PD-0332991 (palbociclib) (Table 6; 0.348 ≤ ρ ≤ 0.405, padj ≤ 0.0436, n ≥ 100, Ntests = 26,610). Several of these agents, e.g., trametinib and its combination with palbociclib, have been used or are under investigation for the treatment of NSCLC [79, 80]. PD-0325901 inhibits NSCLC cells in vitro; however, a phase II clinical trial of this agent in NSCLC patients did not meet its primary efficacy end point [81, 82]. RDEA119 (refametinib) has antitumor activity in a variety of cancer types, including in vitro activity in NSCLC, and it has been under evaluation for its effectiveness in NSCLC [82,83,84].

In melanoma cell lines, FHIT expression was associated with chemoresistance to the ALK inhibitor TAE684 (Table 6; ρ = 0.621, padj = 0.0326, n = 38, Ntests = 26,610).

Multiple strong significant correlations between expression levels of each of the five candidate genes and sensitivity to multiple agents were found in prostate adenocarcinoma (Table 6); however, the sample size of the PRAD category was small (n = 5), and therefore the validity of such correlations may require confirmation in a larger dataset. Similarly, additional correlations found in MM, SAR, CESC, RCC, and SCLC cell lines reported in Table 6 had n between 5 and 6 and also require a follow-up confirmation in larger datasets.

In agreement with an earlier report [31], we did not observe an association between APOBEC3B expression in breast cancer cell lines and sensitivity to the CHK1 inhibitors AZD7762 (ρ = −0.198, padj = 0.8660, n = 33, Ntests = 26,610) or Calbiochem 681640 (ρ = 0.143, padj = 0.933, n = 40, Ntests = 26,610; data not shown), and no other correlations between gene expression and log(IC50) in breast cancer cell lines were statistically significant. Although an association between APOBEC3B expression in breast cancer cells and sensitivity to another CHK1 inhibitor, CCT244747, was previously reported [29], that agent was absent from both the CCLE and GDSC drug sensitivity datasets.

In the pan-cancer analysis, APOBEC3B expression was significantly negatively correlated with log(IC50) of the HSP90 (heat shock protein 90 molecular chaperone) inhibitor 17-AAG (tanespimycin) (Table 6; ρ = −0.293, padj = 5.85 × 10⁻⁹, n = 536, Ntests = 1375). Higher APOBEC3B expression was thus associated with higher sensitivity to this agent, which may be clinically significant. 17-AAG is active in a variety of tumor types [85], and sensitivity to this agent was also correlated with APOBEC3B expression in an earlier analysis of RNA-seq gene expression in the CCLE and GDSC cell lines by Cescon and Haibe-Kains [31].

Some other strong associations did not reach statistical significance but had padj close to 0.05. For example, higher APOBEC3B expression in glioma was correlated with increased sensitivity to the HSP90 inhibitor AUY922 (ρ = −0.556, padj = 0.0701, n = 44, Ntests = 26,610; data not shown). This correlation may be clinically significant, as this agent has an antitumor effect in glioblastoma [85].

Correlation between the prevalence of kataegis clusters and chemosensitivity

We examined correlations between chemosensitivity to anticancer drugs and the prevalence of predicted kataegis clusters of APOBEC-like motifs identified using the 5/1000 criterion. None of the correlations achieved statistical significance in the combined analysis of all cancer cell lines (padj > 0.1 for all comparisons). In an analysis stratified by cancer type, a number of statistically significant strong correlations (0.991 ≤ |ρ| ≤ 1.0, padj ≤ 0.0021) were observed in BREAST, COAD/READ, GLIOMA, OVARIAN, and PAAD cell lines (Table 7). However, the number of cell lines with significant correlations in each cancer category was small (n = 5–7), and these correlations therefore need confirmation in larger collections of cell lines of the respective cancer categories. Among notable correlations, the combined length of clusters with the T(C>D)D motif was strongly correlated (5 ≤ n ≤ 7, Ntests = 1834) with chemoresistance to bicalutamide, a nonsteroidal antiandrogen drug, in pancreatic adenocarcinoma and breast cancer cell lines (Table 7; Fig. 4b). As discussed above, we did not observe a statistically significant correlation between expression of any candidate gene and the prevalence of T(C>D)D or any other motif in breast cancer cell lines. Sequence variation in breast cancer genomes is shaped by a diversity of mutational processes [86], and further investigation is needed to establish whether the T(C>D)D motif in breast cancer cell lines is predominantly generated by APOBEC3B and APOBEC3A, whether it requires an additional role of REV1, UNG, and FHIT, or whether it involves other molecular mechanisms. Bicalutamide is effective in androgen receptor (AR)-positive breast tumors [87, 88], and previous studies demonstrated the effectiveness of this agent in triple-negative breast tumors [89].
To our knowledge, no relationship between the abundance of APOBEC-like signatures and sensitivity to this agent has been reported, although HER2-enriched cell lines have been reported to have high levels of APOBEC mutagenesis and to be among the breast cancer categories likely to be sensitive to bicalutamide [6, 62, 89]. Consistent with an earlier report suggesting a higher prevalence of the APOBEC signature in triple-negative breast cancer (TNBC) cells [62], we found that the two TNBC lines with available WES data and bicalutamide sensitivity measures, HCC1395 and MDA-MB-436, had large combined lengths of kataegis clusters with the T(C>D)D motif (Fig. 4b). However, both of these cell lines had relatively low sensitivity to bicalutamide in the GDSC dataset (Fig. 4b). We did not find any obvious association with the molecular subtypes of the available breast cancer cell lines in our dataset, including their HER2 status [51,52,53], that could explain the inverse relationship between the combined length of T(C>D)D motif clusters and bicalutamide sensitivity shown in Fig. 4b. It is possible that AR-positive status, which is associated with bicalutamide sensitivity, affects the expression of genes involved in generating the T(C>D)D motif signature; however, the molecular mechanisms underlying this relationship remain unclear.
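The combined length of kataegis clusters used in Fig. 4b summarizes clusters called under the 5/1000 criterion, which is defined elsewhere in the article. The sketch below assumes one plausible reading of that criterion: at least 5 motif-matching mutations on the same chromosome whose consecutive inter-mutation distances are each at most 1000 bp. The thresholds, the function names, and the input format (a hypothetical mapping from chromosome name to positions of mutations already filtered to a single motif such as T(C>D)D) are assumptions for illustration, not the study's exact implementation.

```python
def kataegis_clusters(positions, min_count=5, max_gap=1000):
    """Return (start, end) of runs of >= min_count mutations whose
    consecutive gaps are <= max_gap bp (an assumed reading of "5/1000")."""
    pos = sorted(positions)
    clusters = []
    start = 0
    for i in range(1, len(pos) + 1):
        # A run ends at the last position or where the next gap is too large.
        if i == len(pos) or pos[i] - pos[i - 1] > max_gap:
            if i - start >= min_count:
                clusters.append((pos[start], pos[i - 1]))
            start = i
    return clusters


def combined_cluster_length(mutations_by_chrom, min_count=5, max_gap=1000):
    """Sum of cluster spans (in bp, inclusive) over all chromosomes."""
    total = 0
    for positions in mutations_by_chrom.values():
        for start, end in kataegis_clusters(positions, min_count, max_gap):
            total += end - start + 1
    return total


# Hypothetical motif-matching mutation positions in one cell line.
mutations = {"chr1": [100, 400, 900, 1500, 2300, 10000], "chr2": [50, 60]}
print(kataegis_clusters(mutations["chr1"]))  # [(100, 2300)]
print(combined_cluster_length(mutations))    # 2201
```

Summing cluster spans rather than counting clusters gives more weight to long clusters, so two cell lines with the same number of clusters can differ substantially on this measure.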

Table 7 Significant correlations between the measures of prevalence of APOBEC-like motifs or kataegis clusters and drug sensitivity

Multiple other strong correlations were observed in different cancer categories. For example, in pancreatic adenocarcinoma cell lines, log(IC50) values of tipifarnib, a farnesyltransferase inhibitor of the Ras pathway [90], AKT inhibitor VIII, and the IGF1R/insulin receptor inhibitor GSK-1904529A [36] were associated (|ρ| = 1, padj ≤ 5.15 × 10⁻²², n = 5, Ntests = 1834) with overall counts of the T(C>K)W motif, which is commonly attributed to APOBEC3B activity. Similarly, log(IC50) values of the hedgehog signaling pathway inhibitor vismodegib [91] and of the PPARγ/PPARδ inhibitor FH535 [36] were associated with overall counts of the T(C>D)R motif. Overall counts of the T(C>D)D motif were associated with log(IC50) of the PKCβ inhibitor LY317615 [36], whereas the combined length of its predicted kataegis clusters was associated with log(IC50) of the Aurora kinase A/B inhibitor Genentech Cpd10, the DNA-damaging agent gemcitabine, and, as discussed above, the nonsteroidal antiandrogen bicalutamide (Table 7). While the correlations of these motif counts and kataegis measures with drug sensitivity in PAAD are notable, expression of none of the five candidate genes was significantly associated with sensitivity to these agents in PAAD cell lines, although, as discussed above, log(IC50) of AKT inhibitor VIII was correlated with REV1 expression in NSCLC cell lines (Table 6; ρ = 0.373, padj = 2.51 × 10⁻⁵, n = 121, Ntests = 26,610). Further validation of the observations presented in Table 7 is needed in larger datasets of specific cancer types.
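Several correlations in Table 7 reach |ρ| = 1 with only n = 5 cell lines. A short exact permutation calculation helps put this in perspective: with 5 untied observations there are only 5! = 120 possible rank orderings, and exactly two of them (perfect agreement and perfect reversal) give |ρ| = 1. The Python sketch below enumerates all orderings and returns the exact two-sided permutation p-value for an observed Spearman ρ, assuming no ties. It is an illustration of the small-sample limitation noted above, not the test used to compute the padj values in Table 7.

```python
from fractions import Fraction
from itertools import permutations


def spearman_no_ties(rank_x, rank_y):
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - Fraction(6 * d_squared, n * (n * n - 1))


def exact_two_sided_p(observed_rho, n):
    """Fraction of all n! rank orderings with |rho| >= |observed_rho|.

    Pass observed_rho as an int or Fraction so the comparison at the
    boundary is exact rather than subject to float rounding.
    """
    ranks = tuple(range(1, n + 1))
    orderings = list(permutations(ranks))
    threshold = abs(Fraction(observed_rho))
    extreme = sum(1 for perm in orderings
                  if abs(spearman_no_ties(ranks, perm)) >= threshold)
    return Fraction(extreme, len(orderings))


p = exact_two_sided_p(1, 5)
print(p, float(p))  # 1/60 0.016666666666666666
```

Even a perfect rank correlation among five cell lines therefore corresponds to an exact two-sided permutation p-value of 1/60 (about 0.017) before any multiple-testing correction.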

Discussion

We observed a bimodal distribution of APOBEC3B expression and unimodal distributions of APOBEC3A, REV1, UNG, and FHIT expression in the pan-cancer dataset (Figs. 2a–e). The bimodal distribution of APOBEC3B likely reflects several factors, including previously reported differences in the expression levels of this gene among specific cancer types and among individual cell lines within cancer categories, along with the germline deletion polymorphism that results in the loss of the APOBEC3B gene in a subset of samples [7, 11, 17, 43, 58, 92]. The bimodal distribution of APOBEC3B expression is of interest because some studies have suggested that genes with bimodally distributed expression may be useful as diagnostic and prognostic biomarkers within specific cancer types [93, 94].

We observed low expression levels of APOBEC3B in a subset of cell lines and of APOBEC3A in many cell lines (Fig. 2; Table 1). Low pre-treatment levels of APOBEC3A have been reported previously, and expression of both APOBEC3B and APOBEC3A has been reported to increase in response to treatment of cancer cells with DNA-damaging agents or as part of the cellular interferon-induced transcriptional response to viral infection [7]. Low expression levels of APOBEC3A in nearly all cancer categories and of APOBEC3B in specific cancer categories may introduce substantial noise into correlation analyses [95]; therefore, association results for these genes should be interpreted with caution.

As shown in Fig. 2f, the strong correlation between APOBEC3A and APOBEC3B expression levels (Table 2) appeared to be independent of the APOBEC3B deletion polymorphism, which removes the coding region of the APOBEC3B gene and creates a fusion transcript of APOBEC3A with the 3′-UTR of APOBEC3B, even though earlier reports suggest that this fusion transcript increases APOBEC3A levels owing to its greater stability [7, 17, 26]. According to Fig. 2f, the correlation between APOBEC3A and APOBEC3B expression levels also appears to be independent of APOBEC3B copy number status. One possible explanation is transcriptional co-regulation of these two genes, which are located close to one another in chromosomal region 22q13.1 [7].

Mutagenesis in cancer cells due to the activity of APOBEC family members, and of APOBEC3B in particular, has been the subject of many recent studies. While the contribution of REV1, UNG, and FHIT activity to mutagenic processes has been well established [8, 9, 14, 20, 24, 66], their contribution to the generation of signatures attributed to APOBEC3B and other APOBEC family members, and their possible effects on sensitivity to drug treatment, have not been examined in depth. Our analysis of cancer cell lines showed that expression levels of REV1 and UNG were significantly correlated with mutagenesis in sarcoma and melanoma cell lines, respectively (Table 5), and that expression of all five genes examined in our study was significantly correlated with chemosensitivity to various antitumor agents (Table 6).

We focused our analyses on two members of the AID/APOBEC family, APOBEC3A and APOBEC3B, and on three additional genes involved in molecular pathways associated with their mutagenesis. Several other APOBEC family members have been implicated in mutagenic processes, with some of them, e.g., AID, APOBEC3F, and APOBEC3G, showing sequence specificities distinct from those of APOBEC3A and APOBEC3B [9, 10, 16, 96]. However, the full extent of overlap among the sequence specificities of different APOBEC family members remains an active research area. Although we found an increased number of APOBEC-like motifs in mature B cell lymphoma, both the mutational sequence specificity of AID and the biological context in which AID mutations occur differ from those of APOBEC3B and APOBEC3A [1, 9, 10, 16]. AID is an important deaminating factor in the antigen-dependent diversification of immunoglobulin (Ig) genes through somatic hypermutation and class-switch recombination, and it has also been suggested to participate in epigenetic demethylation by deaminating cytosine, 5-methylcytosine (5-mC), or 5-hmC [1, 9, 10, 16, 67]. While translocations involving the Ig genes in B cell lymphomas, as well as off-target hypermutation by AID in other genome regions in several other cancer types (e.g., gastric, liver, breast, ovarian, and lung cancers and T cell lymphomas), have been reported, AID-specific mutational patterns are clearly distinguishable from the APOBEC3B/A signature patterns [9, 10]. AID deaminates cytosines within the characteristic WRC motif, or more broadly the WRCY/RGYW motif, and several other AID motif variants have been reported [1, 9, 10, 16]. The AID-specific motif differs from the three APOBEC3B and APOBEC3A motifs analyzed in our study, and AID signature patterns can be distinguished computationally from those of APOBEC3A and APOBEC3B [10, 11]. For these reasons, we excluded AID gene expression from our analysis.

Cancer cell lines provide a convenient model for the combined analysis of molecular information and drug response to a wide range of antitumor agents, which cannot be achieved in a clinical setting. However, additional factors may affect clinical outcomes in vivo, including, for example, the strength of the immune response and the interaction of the tumor with surrounding tissues. Expression levels of APOBEC3A, APOBEC3B, APOBEC3D, APOBEC3G, and APOBEC3H in tumor specimens from cancer patients have been associated with varying clinical responses to chemotherapy and with overall patient survival; proposed mechanisms for such associations, which may also involve other APOBEC genes, include immune targeting of the increased mutational diversity resulting from higher levels of APOBEC mutagenesis, associated inflammation, PD-L1 expression on tumor-infiltrating mononuclear cells, and the degree of T lymphocyte infiltration [7, 92, 97,98,99].

Because our study analyzed cell line data, it could examine only cell line responses to chemotherapy and did not account for in vivo effects that may also influence therapy response. Several correlations of APOBEC3B and APOBEC3A expression, and of motifs attributed to APOBEC3 activity, observed in our study were consistent with drug sensitivity associations with APOBEC3A and APOBEC3B activity identified in cell line models by a previous study [31]. However, owing to a lack of statistical significance, our analysis of breast cancer cell lines could not replicate the previously reported correlation between APOBEC3B expression and resistance to tamoxifen in ER+ breast cancer, observed in vivo in a clinical setting and in murine xenograft models [18]. We observed ρ between −0.118 and −0.049 (padj > 0.94, n = 43, Ntests = 26,110) for correlations of both APOBEC3B and APOBEC3A expression levels with log(IC50) of tamoxifen in breast cancer cell lines. Stratified analysis of ER− and ER+ breast cancer cell lines with available estrogen receptor status showed no association with log(IC50) of tamoxifen in the ER− cell lines (−0.083 ≤ ρ ≤ −0.026, unadjusted p > 0.67, n = 28). In the ER+ cell lines, we observed an association with tamoxifen sensitivity for both genes (ρ = −0.362 for APOBEC3A and −0.418 for APOBEC3B, n = 13), which was consistent with that of Law et al. [18]; however, these results were not statistically significant (p = 0.157 for APOBEC3A and 0.224 for APOBEC3B), possibly because of the small number of ER+ breast cancer cell lines in the dataset. Additionally, the study of Law et al. [18], which reported an association of APOBEC3B expression with tamoxifen resistance, included primary breast tumors from hormone therapy-naïve patients, whereas some of the cell lines in our analysis were likely derived from patients with prior treatment.
In our study, none of the correlations of chemosensitivity to tamoxifen with expression of any of the five candidate genes, in any cancer category or in the pan-cancer analysis, achieved statistical significance. Therefore, while our use of cell line resources allowed us to draw on a wealth of molecular information and data on sensitivity to multiple antitumor agents, the cell line-based approach also had several limitations, including restricted clinical information, much smaller sample sizes than those available in patient-based clinical studies, and the absence of normal tissues from the same patients that would allow more accurate mutation calling and tissue-specific normalization of gene expression levels.

Despite these limitations, we observed a number of correlations, e.g., between APOBEC3A and APOBEC3B expression levels, that have also been reported in patient tumor samples [7]. In addition, our results in Table 6 show that expression of all five candidate genes was correlated with sensitivity to chemotherapy and that log(IC50) of a number of antitumor agents was significantly correlated not only with expression levels of APOBEC3B but also with those of APOBEC3A, REV1, UNG, and FHIT. Three of these genes, REV1, UNG, and APOBEC3A, were also associated with overall mutation activity and/or with the prevalence of APOBEC-like motifs and kataegis clusters in specific cancer types. Because APOBEC3A is also involved in RNA editing [26], the association of its expression with drug sensitivity might involve RNA editing instead of or in addition to DNA mutagenesis; either mechanism would require experimental validation. Additionally, as APOBEC3A has also been linked to epigenetic DNA demethylation [1, 3, 4], its involvement in epigenetic mechanisms of sensitivity or resistance to cancer treatment cannot be ruled out, even though the associations reported in Tables 6 and 7 involve non-epigenetic agents.

Recent studies suggest that clustered mutations, including those attributed to APOBEC activity, represent mutagenic processes in tumors more accurately than overall mutation rates do [13]. We observed significant correlations of the prevalence of all three APOBEC-like motifs with chemosensitivity to multiple agents in small groups of cell lines from specific cancer types (Table 7). Using measures of kataegis clusters, we observed correlations of the combined length of kataegis clusters of the least specific motif, T(C>D)D, with sensitivity to various agents in breast cancer, pancreatic adenocarcinoma, and colon and rectum adenocarcinoma (COAD/READ) cell lines. However, because expression of none of the five candidate genes was significantly associated with the abundance of the T(C>D)D motif or with clusters containing this motif, further studies are needed to understand the mutational pathways generating the T(C>D)D motif and to examine whether additional members of the APOBEC family or translesion DNA polymerases contribute to its occurrence. The molecular mechanisms underlying correlations of cell line response to specific agents with motif abundance or with expression of APOBEC3A, APOBEC3B, REV1, UNG, and FHIT also require further investigation. Nevertheless, the specific correlations observed in our study suggest that both the expression levels of candidate genes and the prevalence of APOBEC-like motifs and their clusters could be examined as potential biomarkers of sensitivity to several agents. The association of the activity of these genes with drug response could be examined further when the significantly associated agents are evaluated in experimental in vitro studies and in a clinical setting.

Conclusions

Our analysis of cancer cell line data identified associations of drug sensitivity with expression levels of the APOBEC3A, APOBEC3B, REV1, and UNG genes and with the abundance of sequence motifs and kataegis clusters attributed to APOBEC activity. The analysis of exome sequence data suggested that expression of REV1 and UNG, and to a lesser extent of APOBEC3A, was correlated with mutation patterns attributed to APOBEC activity, suggesting that APOBEC-like mutagenic patterns may result from a complex interplay among multiple molecular factors. Future studies may examine the biological mechanisms by which each of the five genes associated with APOBEC-like mutagenic processes may contribute to the sensitivity or resistance of tumor cells to cancer drug treatment.