Background

Increasing evidence suggests that the clinical efficacy of cancer immunotherapy is driven by T cell reactivity against neo-antigens [1–5]. While not yet fully understood, immune response and recognition of tumor cells containing specific peptides depend critically on the ability of the MHC class I complexes to bind the peptide in order to present it to a T cell. Neo-antigens can be created by a multitude of processes, such as aberrant expression of genes normally restricted to immuno-privileged tissues, viral etiology, or tumor-specific DNA alterations that result in the formation of novel protein sequences. Furthermore, there is now evidence for neo-epitopes generated from alternative splicing [6] and alterations in non-coding regions [7].

With the advent of affordable short read sequencing, comprehensive neo-antigen screening based on whole exome sequencing has become feasible, and many cancer immunotherapy approaches try to use a detailed understanding of the neo-epitope spectrum to create additional T cell reactivity, or boost pre-existing reactivity, for therapeutic purposes [8, 9]. However, in practice the selection and validation of the most promising neo-epitope candidates is a difficult and time-consuming task. The typical approach is based on the private mutational catalogue of the individual patient: exome sequencing data is subjected to bioinformatics analysis and used to predict neo-epitopes and their binding affinities to the MHC class I complex. Our study aims to complement this approach with a precision medicine perspective. We search for and prioritize neo-epitope candidates which have a high potential for neo-antigen generation and are likely to appear in multiple patients. These neo-antigens hold the potential for the development of off-the-shelf T cell therapies for subgroups of cancer patients. We use epidemiological data to give rough estimates for the expected number of patients in these groups.

Candidate prediction always relies on somatic variant detection workflows and affinity prediction algorithms based on machine learning, see e.g. [10]. Binding prediction is far from perfect [11], especially for rarer HLA types, and may also depend on mutational context [12]. Catalogues of the neo-epitope landscape across various cancer entities have been created by various authors [13–15]. While the neo-antigen landscape is diverse and sparse [13], here we provide an unbiased, comprehensive ranking of candidates, defined as neo-epitopes arising from recurrent mutations and predicted to bind to a specific HLA-1 allele. The candidates are ranked according to the expected number of target patients.

Methods

Data sets

Somatic variants for different cancer entities have been determined using matched pairs of tumor and blood whole exome or whole genome sequencing in the TCGA consortium. We downloaded the open-access somatic variants from GDC data release 7.0 [16], consisting of 33 TCGA projects and 10,182 donors in total. Details of the somatic variant calling can be found in [17]. We excluded patients without corresponding entries in the clinical information tables, and 7 projects with fewer than 100 samples, yielding 9,641 samples covering 26 cancer studies. Figure 1 provides an overview of the complete bioinformatics process, from the GDC somatic single nucleotide variants to the identification of the candidates.

Fig. 1

Workflow overview. a Overview of the recurrent neo-epitope candidate generation process: TCGA studies are selected for at least 100 donors with clinical annotations. For each of these studies, recurrent strongly supported missense single-nucleotide variants (SNVs) are collected. Neo-epitopes binding to 11 HLA-1 types are predicted, redundancy is removed from that set (see b) and strong binders are retained. b Example of epitope redundancy: the 18 amino-acid long sequence surrounding recurrent variant GLRA3:S274L generates 7 binding neo-epitopes for the type HLA-A*02:01. Our pipeline retains only the strongest predicted binder for a given variant and HLA-1 type pair (the first, with an IC50 of 8.8 nM in the example). c Number of SNVs occurring in genes classified as oncogenes or tumor suppressors by Vogelstein et al. [28], at various points of the variant selection and neo-epitope selection process

Variant selection

For each sample we selected all single nucleotide variants obtained by the “mutect2” pipeline that had a “Variant_Type” equal to “SNP”, a valid ENSEMBL transcript ID and a valid protein mutation in “HGVSp_Short”. From these variants, we selected those with a “Variant_Classification” equal to “Missense_Mutation”. We checked that all variants had a “Mutation_Status” equal to (up to capitalisation) “Somatic”, that the total depth “t_depth” was the sum of the reference “t_ref_count” and the alternate “t_alt_count” allele counts, and that the genomic variant length was one nucleotide. To avoid a high number of false positives, we considered only variants that are supported by at least 5 reads and have a variant allele frequency (VAF) of at least 10%. Furthermore, we removed any variant that occurs at a frequency above 1% in any population contained in the ExAC database version 0.31 [18], after coordinate liftover from the GRCh38 to the hg19 human genome version. In this way we obtained 26 cancer entity data sets containing a total of 9,641 samples with an overall 1,384,531 variants.
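The read-level filters above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The column name “Transcript_ID” is an assumption (the text only speaks of a valid ENSEMBL transcript ID), “supported by at least 5 reads” is interpreted here as at least 5 alternate-allele reads, consistency checks are treated as filters, and the ExAC population filter is left out.

```python
from typing import Dict

MIN_ALT_READS = 5  # assumption: "at least 5 reads" refers to alternate-allele reads
MIN_VAF = 0.10     # variant allele frequency threshold stated in the text


def keep_variant(row: Dict[str, str]) -> bool:
    """Return True if a MAF-like record passes the filters described above."""
    if row.get("Variant_Type") != "SNP":
        return False
    if row.get("Variant_Classification") != "Missense_Mutation":
        return False
    # "Somatic" is accepted up to capitalisation.
    if row.get("Mutation_Status", "").lower() != "somatic":
        return False
    # "Transcript_ID" is an assumed column name for the ENSEMBL transcript.
    if not row.get("HGVSp_Short") or not row.get("Transcript_ID"):
        return False
    depth = int(row["t_depth"])
    ref = int(row["t_ref_count"])
    alt = int(row["t_alt_count"])
    # The total depth must be the sum of reference and alternate counts.
    if depth == 0 or depth != ref + alt:
        return False
    return alt >= MIN_ALT_READS and alt / depth >= MIN_VAF
```

A record with 40 reference and 10 alternate reads (VAF 20%) passes, whereas one with 95 reference and 5 alternate reads (VAF 5%) is rejected.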

Recurrent protein variant selection

We define recurrence strictly on the protein/amino acid exchange level, i.e. different nucleotide variants leading to the same amino acid exchange due to code redundancy are counted together. Recurrent protein variants are defined within each TCGA study. A protein variant is deemed recurrent when it appears in at least 1% of all the patients in the cohort. As cancer types are only considered when the number of patients involved in the studies is greater than 100, this threshold ensures that every recurrent variant has been observed in at least 2 patients for a given cancer type. To be conservative, the recurrence frequency has been computed using, for the denominator, all patients with clinical information in the study, including those without high-confidence missense SNVs. Using this definition, the total number of recurrent amino acid changes is 1,055. A variant recurrent in multiple cancer types is counted multiple times in this number; the number of unique recurrent variants regardless of the cancer type is 869. Additional file 1 shows the most frequent amino acid exchanges across 25 cancer entities, as no variant from the donors of project TCGA-KIRC is labeled as recurrent.
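The recurrence rule can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: the function name, the tuple layout of the input and the variant labels are hypothetical. Patients are collected in sets, so several nucleotide variants giving the same amino acid exchange in one patient are counted once, and the denominator is the full cohort size.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Iterable, Tuple

RECURRENCE_THRESHOLD = Fraction(1, 100)  # "at least 1% of all the patients"


def recurrent_variants(
    calls: Iterable[Tuple[str, str, str]],
    cohort_sizes: Dict[str, int],
) -> Dict[Tuple[str, str], float]:
    """Map (study, protein variant) to its frequency when it is recurrent.

    `calls` holds (study, patient, protein_variant) tuples; `cohort_sizes`
    gives, per study, the number of patients with clinical information.
    """
    carriers = defaultdict(set)
    for study, patient, protein_variant in calls:
        carriers[(study, protein_variant)].add(patient)

    recurrent = {}
    for (study, variant), patients in carriers.items():
        # Exact rational arithmetic avoids rounding issues at the threshold.
        frequency = Fraction(len(patients), cohort_sizes[study])
        if frequency >= RECURRENCE_THRESHOLD:
            recurrent[(study, variant)] = float(frequency)
    return recurrent
```

Because recurrence is evaluated per study, a variant found in several cohorts appears once per cohort in the result, which mirrors the distinction between the 1,055 recurrent changes and the 869 unique ones.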

Recurrent variants occurring at the same position (for example when codon R132 of gene IDH1 is mutated to amino acid H, C, G or S) have been merged into 819 variants suitable for comparison with the cancer hotspot lists [14]. 122 out of the 819 merged variants belong to the set of 470 cancer hotspot variants, and 5 (PCBP1:L100, SPTLC3:R97, EEF1A1:T432, BCLAF1:E163 & TTN:S3271) to the set of presumptive false-positive hotspots listed in the supplementary material of [14].

MHC class I binding prediction and epitope selection

For all recurrent variants identified, we assess in silico the propensity of the amino-acid exchange to generate a binding neo-epitope.

A variety of machine learning algorithms have been developed to determine MHC binding in silico, see ref. [19] for a review. Most methods are trained on Immune Epitope Database (IEDB) [20] entries and use allele-specific predictors for frequent alleles, while pan-methods are applied to extrapolate to less common alleles. We predicted MHC class I binding using NetMHCcons [21] v1.1, which predicts peptide binding IC50 values and classifies these predictions as non-binders, weak binders and strong binders, based on the relative ranking of binding predictions. As the range of IC50 binding values strongly depends on the HLA-1 allele [22], we have used the NetMHCcons classification to select our neo-epitope candidates.

For a given recurrent variant and a given HLA-1 type, the epitope prediction pipeline can produce multiple overlapping epitope candidates, differing by their length and/or their position (see Fig. 1b). To remove such redundancy, only the epitope with the lowest predicted mutant sequence IC50 is retained. This procedure also removes non-overlapping epitopes, so that at most one epitope is kept per recurrent protein variant and HLA-1 type. For comparison we also compute the IC50 for the respective wild type peptide.
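The redundancy removal step amounts to keeping, for each pair of variant and HLA-1 type, the prediction with the lowest IC50. A minimal Python sketch is given below. The `Prediction` record and the function name are hypothetical, and the predictions themselves are assumed to come from an external tool such as NetMHCcons.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Prediction(NamedTuple):
    """One predicted mutant peptide (hypothetical record layout)."""
    variant: str    # e.g. "GLRA3:S274L"
    hla: str        # e.g. "HLA-A*02:01"
    peptide: str    # mutant peptide sequence
    ic50_nm: float  # predicted IC50 in nM, lower means stronger binding


def strongest_binder_per_pair(
    predictions: Iterable[Prediction],
) -> Dict[Tuple[str, str], Prediction]:
    """Keep the lowest-IC50 peptide for every (variant, HLA-1 type) pair."""
    best: Dict[Tuple[str, str], Prediction] = {}
    for prediction in predictions:
        key = (prediction.variant, prediction.hla)
        if key not in best or prediction.ic50_nm < best[key].ic50_nm:
            best[key] = prediction
    return best
```

Since peptide length and position play no role in the key, overlapping and non-overlapping peptides of the same variant are collapsed alike, as described above.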

For MHC class I binding prediction we selected 11 frequent HLA-1 types: HLA-A*01:01, HLA-A*02:01, HLA-A*03:01, HLA-A*11:01, HLA-B*07:02, HLA-B*08:01, HLA-B*15:01, HLA-C*04:01, HLA-C*06:02, HLA-C*07:01, HLA-C*07:02. We limited the search to peptides 9, 10 and 11 amino acids long. For these alleles, we obtain 769 strong binding recurrent peptides and 1829 weak binders, over all considered cancer types. Their complete list is in Additional file 2, where each candidate is listed with the HLA-1 type it is predicted to bind to.
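To make the search space concrete, the following Python sketch enumerates every 9-, 10- and 11-mer that contains the mutated residue. It only illustrates the windowing; the function name and arguments are hypothetical, and the sketch does not reproduce how the study extracted protein sequences from transcripts.

```python
from typing import List, Sequence


def mutant_peptides(
    protein: str,
    position: int,
    alt_aa: str,
    lengths: Sequence[int] = (9, 10, 11),
) -> List[str]:
    """List all peptides of the given lengths that span the mutated residue.

    `position` is the 1-based position of the exchanged amino acid.
    """
    index = position - 1
    if not 0 <= index < len(protein):
        raise ValueError("position lies outside the protein sequence")
    mutated = protein[:index] + alt_aa + protein[index + 1:]

    peptides = []
    for k in lengths:
        # A window covers the mutated residue if it starts at most k - 1
        # residues upstream and still fits inside the protein.
        first_start = max(0, index - k + 1)
        last_start = min(index, len(mutated) - k)
        for start in range(first_start, last_start + 1):
            peptides.append(mutated[start:start + k])
    return peptides
```

Away from the protein ends, a single exchange yields 9 + 10 + 11 = 30 candidate peptides per HLA-1 type, which is why a redundancy removal step is needed afterwards.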

Data QC

To ensure that the proportion of variants caused by technical artifacts is small, we have computed, for each study, the proportion of SNVs called in poly-A, poly-C, poly-G or poly-T repeats of length greater than 6 [23], separately for unique variants (that occur in only one patient across a project cohort) and for variants that are observed more than once in a cohort (Additional file 3). For comparison, we have computed the expected frequency of such events, assuming that all possible 11-mers (the mutated nucleotide at the center, flanked by 5 nucleotides on each side) are equiprobable, regardless of their sequence.

Based on this equiprobable model, we have computed the probability that the number of mutations found in repeat loci is equal to or greater than the observed number. When considering variants appearing more than once, this probability is not significant for any study; when unique variants are considered, they appear in repeat loci significantly more often than expected by chance in 7 out of 26 studies (TCGA-COAD, TCGA-KIRP, TCGA-LIHC, TCGA-READ, TCGA-SKCM, TCGA-TGCT & TCGA-UCEC, significance level set to 0.05 after Benjamini-Hochberg multiple testing correction).
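One way to carry out such a test is sketched below in Python. The text does not name the distribution used, so the binomial upper tail is an assumption: each of n variants is taken to fall in a repeat locus independently, with the probability p given by the equiprobable model. The Benjamini-Hochberg adjustment follows the standard step-up procedure. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return min(1.0, total)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With one tail probability per study, a study would be flagged when its adjusted value falls below 0.05.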

Mice

ABabDII mice (described in detail in [24]) have been used for this study. They are transgenic for the entire human TCR-α and TCR-β gene loci, as well as for the HHD molecule [25], and deficient for the murine Tcr-α and -β chains, as well as for the murine β2m and H2-Db genes. The mice used in the study were generated and housed under SPF conditions (cages enriched with bedding material, 3-5 mice/cage, standard light/dark cycle, food and water ad libitum) at the Max-Delbrück-Center animal facility. All animal experiments were approved by the Landesamt für Arbeitsschutz, Gesundheitsschutz und technische Sicherheit, Berlin, Germany.

Generation of mutation-specific T cells in ABabDII mice

For each candidate, 3 ABabDII mice between 8 and 12 weeks old (6 in total) underwent immunisation. They were injected subcutaneously with 100 μg of mutant short peptide (9-10mers, JPT) supplemented with 50 μg CpG 1826 (TIB Molbiol), emulsified in incomplete Freund’s adjuvant (Sigma). Repeated immunizations were performed with the same mixture at least three weeks apart. Mutation-specific CD8+ T cells in the peripheral blood of immunized animals were assessed by intracellular cytokine staining (ICS) for IFNγ 7 days after each boost. All 6 animals were peptide-reactive. The 6 mice were sacrificed for spleen preparation by cervical dislocation after isoflurane anesthesia.

Patient number estimates and HLA-1 frequencies

HLA-1 frequency data ($f_{h}$) for the U.S. population was retrieved from the Allele Frequency Net Database (AFND) [26]. Frequency data were estimated by averaging the allele frequencies of multiple population datasets from the North American (NAM) geographical region. The major U.S. ethnic groups were included and sampled under the NAM category. Cancer incidence data for the U.S. population ($N_{d}$) was retrieved from the GLOBOCAN 2012 project of the International Agency for Research on Cancer, WHO [27].

Assuming that the fraction of a recurrent variant in the U.S. population affected by cancer entity d ($r_{d}$) is identical to the observed ratio of that variant in the corresponding TCGA study, the number of patients of HLA-1 type h whose tumors contain the variant is expected to be

$$ n_{h} = f_{h} \sum_{d} r_{d} N_{d}. $$

The summation runs over 18 diseases d for which both the TCGA projects and the cancer incidence data are available.
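The estimate can be written as a short Python function. This is a minimal sketch of the formula above; the function name and the dictionary-based inputs are hypothetical, and the summation is restricted to the diseases present in both inputs, mirroring the restriction to entities with both a TCGA project and incidence data.

```python
from typing import Dict


def expected_patients(
    hla_frequency: float,
    variant_fraction: Dict[str, float],
    incidence: Dict[str, float],
) -> float:
    """Compute n_h = f_h * sum_d r_d * N_d for one variant and one HLA-1 type.

    hla_frequency    -- f_h, population frequency of HLA-1 type h
    variant_fraction -- r_d, fraction of patients with the variant, per disease d
    incidence        -- N_d, newly diagnosed patients per year, per disease d
    """
    shared_diseases = variant_fraction.keys() & incidence.keys()
    return hla_frequency * sum(
        variant_fraction[d] * incidence[d] for d in shared_diseases
    )
```

For instance, with purely illustrative numbers, f_h = 0.25, a variant seen in 4% of a disease with 1,000 new cases and in 1% of a disease with 2,000 new cases gives 0.25 × (40 + 20) = 15 expected patients per year.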

Results

Recurrent variants and candidates

From the GDC repository [16], we have collected somatic variants for 33 TCGA studies. After removing patients without clinical meta-data, and studies with fewer than 100 patients, we have selected 1,384,531 high-confidence missense SNVs from 9,641 patients, see methods for details. Using this data, 1,055 variants are deemed recurrent (Additional file 1), as they can be found in at least 1% of the patients in the respective study cohort. These recurrent variants correspond to 869 unique protein changes, as some appear in multiple cancer entities. 77 of the recurrent variants occur in at least 3% of their cohort (43 unique protein changes).

From these 869 unique protein changes, we have generated candidates that are predicted to be strong MHC class I binders in the frequent HLA-1 types that we considered for initial selection. 415 (48%) of them lead to a strong binder prediction. In total, there are 772 candidates that are recurrent in a cancer entity cohort and predicted as binding for a considered HLA-1 type. These candidates are non-redundant among all the 9-, 10- & 11-mers containing the variant: the selection process retains only the peptide sequence with the lowest predicted IC50. Figure 1 and Table 1 provide an overview of the variant selection and neo-epitope candidate generation processes, while Additional file 2 lists all neo-epitopes (weak and strong predicted binders) after removing redundancy.

Table 1 Overview of the 33 TCGA studies used in this analysis

Despite large differences between variant selection protocols, 123 variants deemed recurrent by the above process can be found among the 470 variants identified in the cancer hotspot datasets [14] (Additional file 4). This overlap is strongly dependent on how frequently those variants are observed: there are 54 common variants out of the 61 variants observed more than 10 times over our dataset (>88%). Among the 819 variants retained for the comparison (see methods for details), only 5 appear among the variants flagged as possible false positives by Chang et al. (<1%).

Enrichment in known cancer related genes

We observe that recurrent variants occur substantially more frequently in known cancer-related genes than in other genes (Fig. 1c). Initially, approximately one percent of all observed variants are found in genes that have been described [28, 29] as oncogenes (54 genes) or tumor suppressor genes (71 genes). When recurrent unique protein changes are considered, the fraction of known oncogenes or tumor suppressor genes is substantially increased to 13% and 6.5% respectively (a $\chi^{2}$ test between unique protein changes and unique recurrent variants gives a P value smaller than $10^{-16}$). These fractions only marginally increase to 14% and 7% when only the unique protein changes leading to predicted strong binders for frequent HLA-1 types are considered (a $\chi^{2}$ test between unique recurrent variants and strong binders gives a non-significant P value). Additional file 5 shows a similar enrichment of known cancer-related genes per cohort. We observe that the enrichment is stronger for oncogenes than for tumor suppressors. This might be expected, as activating mutations in oncogenes are mainly concentrated on a few protein positions, while loss of function mutations in tumor suppressors are generally distributed more broadly along the protein sequence.

It is interesting to observe that several of the highly prevalent neo-epitope candidates occur in genes that are involved in known immune escape mechanisms: RAC1:P29S is recurrent in study SKCM (melanoma), is predicted to lead to strong binding neo-epitopes for HLA-A*01:01 and HLA-A*02:01, and is reported to up-regulate PD-L1 in melanoma [30]. CTNNB1:S33C is recurrent in studies LIHC (liver hepatocellular carcinoma) and UCEC (uterine corpus endometrial carcinoma), is predicted to lead to strong binding neo-epitopes for HLA-A*02:01, and has been shown to increase the expression of the Wnt-signalling pathway in hepatocellular carcinoma [31], leading to modulation of the immune response [32] and ultimately to tumor immune escape [33]. In a separate study, Cho et al. [34] show that this mutation confers acquired resistance to the drug imatinib in metastatic melanoma. Finally, FLT3:D835Y is recurrent in study LAML (acute myeloid leukemia) and is predicted to lead to strong binding neo-epitopes for HLA-A*01:01, HLA-A*02:01 and HLA-C*06:02. According to Reiter et al. [35], tyrosine kinase inhibitors promote the surface expression of the mutated FLT3, enhancing FLT3-directed immunotherapy options, as its surface expression is negatively correlated with proliferation.

While the described mechanisms are probably sufficient to explain immune escape in tumor evolution, the candidates could nevertheless be viable targets for adoptive T cell therapy or TCR gene therapy.

Recurrent neo-epitopes in patient populations

Under the assumption of statistical independence, the product of the frequency of a recurrent variant, the frequency of class I alleles in the population and the incidence rates of cancer types provides an estimate for the number of patients that carry that specific candidate. Using the number of newly diagnosed patients per year and HLA-1 frequency in the US population, we are able to compute the expected number of patients for 18 cancer entities for which both cancer census data and a TCGA study are available. The occurrence numbers for individual candidates range from 0 to 2,254 for PIK3CA:H1047R in breast cancer patients of type HLA-C*07:01; Table 2 presents a summary of expected patient numbers for the complete set of candidates. We estimate that, in the US alone, the previously discussed RAC1:P29S mutation might be present in 628 new patients carrying the HLA-A*02:01 allele each year (in 556 melanoma patients and in 72 lung small cell, head & neck or uterine carcinoma patients, see Additional file 6 for details). For the CTNNB1:S33C mutation, the total number of HLA-A*02:01 patients in the US is expected to be 364, from uterine corpus, prostate and liver cancer types. As another example, 115 myeloid leukemia patients in the US are expected to be of type HLA-A*02:01 and carry the FLT3:D835Y mutation.

Table 2 Expected number of newly diagnosed U.S. patients by HLA-1 type and cancer entity

Figure 2 shows the cumulative expected number of patients that carry a specific epitope and have a matching HLA-1 type, for the 50 candidates with the highest expected patient numbers. The number of patients is derived from the sum over all cancer entities, including those in which the candidate is not recurrent according to our criteria. For example, among newly diagnosed US patients of type HLA-C*04:01, 88 prostate cancer patients are expected to carry the mutation PIK3CA:R88Q, even though its observed frequency in the PRAD study is as low as 0.2%. The data shown in Fig. 2 can be found in Additional file 6.

Fig. 2

50 most frequent candidates in patients for which strong MHC I binding is predicted. For each candidate, the expected number of patients is obtained by summing over the 18 cancer entities for which the number of newly diagnosed patients in the US is available, and for which a corresponding TCGA study has been included in our analysis

Accessible patient population

As our current understanding of peptide immunogenicity is still incomplete [36], not all candidates predicted by our pipeline can be expected to trigger an immunogenic response in patients. To further evaluate the usefulness of our results, we consider the list of candidates (neo-epitope and HLA type pairs) selected from our ranking. Assuming a T cell therapy could be generated for every candidate, we can compute the number of patients that would benefit, see methods. Because of imperfections in candidate prediction, not all candidates hold the potential for an effective T cell therapy, and these ineffective candidates can thus be viewed as “false positives”. Because it is impossible to create a reliable estimate for the fraction of these false positives, due to the complexity of the underlying algorithm and biological process, we decided to consider a broad range of possible values from 50% to 95%, cf. Figure 3. Using a subset of 6868 patients for whom HLA types were known, we predict the number of patients for whom such a positive response might be expected, as a function of the proportion of “false positives” in our candidates. To estimate the impact of such “false positives”, we have randomly flagged 1000 times 337, 539, 607 & 640 candidates as “false positives”, corresponding to a fraction of about 50%, 80%, 90% and 95% of the total 674 candidates. This procedure left us with 1000 sets of 337, 135, 67 & 34 candidates that were not flagged as “false positives”. Figure 3 shows that for a pessimistic 90% of false positive candidates, more than 1.5% of patients over all cancer entities (95% CI between 1.25% & 2.65%, mean 1.78%, median 1.72%, both corresponding to about 20000 new patients per year in the U.S.) are still expected to carry at least one of the 67 remaining candidates’ mutation and corresponding HLA allele. While the proportions are modest, the absolute number of patients seems relevant.
The figure in Additional file 7 shows that there are considerable differences between entities: the proportion of matching patients is much higher in diseases with high mutational load such as melanomas (TCGA-SKCM, median about 9% for 90% false positives), than in diseases with lower mutational load, such as thyroid cancer (TCGA-THCA, 0.2%, 90% false positives).
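The random flagging procedure can be sketched as a small Monte Carlo simulation in Python. This is a hedged illustration under simplifying assumptions: patients are represented as pairs of sets (observed mutations and HLA types), candidates as (mutation, HLA type) pairs, and all names are hypothetical. Drawing the candidates that are kept is equivalent to flagging the complementary set as “false positives”.

```python
import random
from typing import List, Set, Tuple

Patient = Tuple[Set[str], Set[str]]  # (mutations, HLA types)
Candidate = Tuple[str, str]          # (mutation, HLA type)


def matched_fraction(patients: List[Patient], kept: Set[Candidate]) -> float:
    """Fraction of patients carrying at least one kept candidate's mutation
    together with the matching HLA type."""
    hits = sum(
        1
        for mutations, hla_types in patients
        if any(m in mutations and h in hla_types for m, h in kept)
    )
    return hits / len(patients)


def simulate_false_positives(
    patients: List[Patient],
    candidates: Set[Candidate],
    n_kept: int,
    n_draws: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Draw `n_kept` surviving candidates `n_draws` times and return the
    matched patient fraction obtained for each draw."""
    rng = random.Random(seed)
    pool = sorted(candidates)  # fixed order keeps the draws reproducible
    return [
        matched_fraction(patients, set(rng.sample(pool, n_kept)))
        for _ in range(n_draws)
    ]
```

Calling the function with `n_kept` set to 337, 135, 67 and 34 would correspond to the four false positive proportions considered above, and summary statistics such as the mean, median and percentile interval can then be taken over the returned list.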

Fig. 3

Expected influence of the proportion of false positive neo-epitope candidates on the patient population. Proportion of the patients that carry at least one neo-epitope candidate mutation, and whose HLA-I allele set contains the candidate HLA type, when a limited percentage of the neo-epitope candidates is considered. The patient cohort considered here consists of 6868 patients from the 18 TCGA cohorts for whom the HLA types are known. For each false positive proportion, the false positive candidates have been selected 1000 times at random

Confirmational evidence

A limited validation of our method was performed in two steps: first, we confirmed that our pipeline was able to identify candidates that have been previously reported as eliciting spontaneous CD8+ T-cell responses in cancer patients in whom the target epitopes were subsequently discovered [37, 38]. Both sets together (Additional file 8) contain 37 epitopes, 35 of which could be mapped to an ENSEMBL transcript (33 unique genes). For 27 of these epitopes our pipeline predicted strong binding with the specific HLA-1 type reported in the corresponding wet-lab investigations. Another 5 epitopes were predicted as weak binders; some of the latter are also predicted to be strong binders in other HLA-1 types. Our pipeline classified 70% of a set of known tumor neo-antigens as strong binders and another 14% as weak binders.

4 out of 34 unique identifiable variants studied by van Buuren et al. [38] and Fritsch [37] are found among our set of high confidence missense variants, but only one (CTNNB1:S37F) fulfills the 1% recurrence threshold (9 uterine carcinoma patients). This variant was shown to trigger an immunological response against HLA-A*24:02 [39], which is not in the set of alleles that we have systematically tested. However, our predictions show that the same peptide might also be reactive against HLA-C*07:02.

Finally, the CDK4:R24C peptide (sequence ACDPHSGHFV, see Additional file 8) is not predicted to bind to HLA-A*02:01, even though it leads to a confirmed T cell response [40] and has been related to cutaneous malignant melanoma and hereditary cutaneous melanoma [41, 42]. Taken together, these results show that our candidate prediction pipeline is able to recapitulate most clinically validated neo-epitopes reported in [38] and [37], and that some of these neo-epitopes arise from recurrent variants.

We have also performed a preliminary validation for two candidates: RAC1:P29S & TRRAP:S722F binding to HLA-A*02:01 (Fig. 4). We used ABabDII mice, transgenic animals that harbour the human TCR αβ gene loci and a chimeric HLA-A2 gene, and are deficient for mouse TCR αβ and mouse MHC I genes. These mice have been shown to express a diverse human TCR repertoire [24, 43] and thus mimic the human T cell response. They were immunized at least twice with mutant peptides, and IFNγ-producing CD8+ T cells were monitored in ex vivo ICS analysis 7 days after the last immunization. CD8+ T cells were purified from spleen cell cultures of reactive mice using either IFNγ-capture or tetramer-guided FACSort. Sequencing of specific TCR α and β chain amplicons that were obtained by RACE-PCR revealed that this procedure yields an almost monoclonal CD8+ T cell population (not shown). In both cases, the tested neo-antigen candidates led to T cell reactivity, confirming not only the MHC binding predicted by our pipeline but also immunogenicity in vivo in human TCR transgenic mice. Therefore, this workflow also allows the generation of potentially therapeutically relevant TCRs for clinical use in cancer immunotherapy.

Fig. 4

Recognition of predicted epitopes by CD8+ T cells. Epitopes for recurrent mutations that have been identified in silico to bind to HLA-A*02:01 using our pipeline were synthesized and used for immunization of human TCR transgenic ABabDII mice. Examples (RAC1:P29S and TRRAP:S722F) of ex vivo ICS analysis of mutant peptide immunized ABabDII mice 7 days after the last immunization are shown. Polyclonal stimulation with CD3/CD28 dynabeads was used as positive control, stimulation with an irrelevant peptide served as negative control (data not shown)

Discussion

By virtue of the underlying mutational processes, the genome architecture and accessibility, as well as for functional reasons within the disease process, certain somatic mutations will be present in multiple patients while still being highly specific to the tumor [14]. Using existing cancer studies and neo-epitope binding predictions to MHC class I proteins, we propose a ranking of candidates whose mutations occur frequently in observed cancer patient cohorts. The candidates are ranked according to the expected number of target patients. For one candidate, the target patients are defined as those who bear the candidate’s mutation and whose HLA types include the candidate’s HLA type. The expected number of target patients is proportional to the HLA type frequency in the population, and to the frequency of the mutation in the cancer cohorts. Taking into account the fact that MHC binding is a necessary but not sufficient condition for T cell activity, and the limitations of MHC binding prediction algorithms, our method provides an objective ranking of neo-epitopes based on recurrent variants, as a basis for the development of off-the-shelf immunotherapy treatments.

Despite numerous mechanisms of immune evasion, neo-epitopes are important targets of endogenous immunity [5]. In some cases at least, it has been shown that they contribute to tumor recognition [44] and achieve high objective response (in melanoma, see ref. [45, 46]), and that a single one of them is presumably sufficient for tumor regression [47]. Moreover, a positive association has been shown between antigen load and cytolytic activity [48], activated T cells [13] and high levels of the PD-1 ligand [49]. Taken together, these results suggest that neo-epitopes occupy a central role in regulating the immune response to cancer, and that this role can be exploited for cancer immunotherapy. However, the question of negative selection for strong binding neo-epitopes and its relation to other immune evasion mechanisms, such as HLA loss or PD-L1 and CTLA4 dysregulation, is still open [50]. A recent CRISPR screen suggests that more than 500 genes are essential for cancer immunotherapy [51].

Targeting neo-epitopes based on non-recurrent, private somatic variants requires generation of private TCRs or CARs for each individual patient, which is challenging [52]. Successful treatments based on genetically engineered lymphocytes have been shown for epitopes arising from unmutated proteins, i.e. public epitopes: MART-1 and gp100 proteins have been targeted in melanoma cases [53]. In another trial, Robbins et al. [54] have studied the long-term follow-up of patients who were treated with TCR-transduced T cells against NY-ESO-1, a protein whose expression is normally restricted to testis, but which is frequently aberrantly expressed in tumor cells. They show that treatment may be effective for some patients. These results show that immune treatments based on public variants can be beneficial, suggesting that similar success may potentially be achieved using candidates based on recurrent variants.

However, targeting such non-somatic epitopes presents safety and efficacy concerns [2]. The administration of T cells transduced with a MART-1 specific T-cell receptor has led to fatal outcomes [55]. Cross-reactivity of a TCR against MAGE-A3 (a protein normally restricted to testis and placenta) caused cardiovascular toxicity [56]. Neo-epitopes based on recurrent somatic variants potentially alleviate such problems, as the target sequences are truly restricted to tumor cells.

Our computation of expected targetable patient groups assumes that neither the cancer type nor the patient’s mutanome is associated with the patient’s HLA-1 alleles. In a recent study, Van den Eyden et al. [50] show that there is little (if any) antigen depletion due to the negative selection pressure from the immune response. Molecular evolution methods applied to somatic mutations show that nearly all mutations escape negative selection [57]. Taken together, these results suggest that the probability of a recurrent variant being present in a patient’s pool of somatic mutations should not be (significantly) affected by the patient’s HLA-1 alleles.

The neo-epitope landscape is diverse and sparse [13]. Few neo-epitopes are predicted to be both strong binders and present in multiple patients. In their analysis, Hartmaier et al. [58] estimate that neo-epitopes suitable for precision immunotherapy might be relevant for about 0.3% of the patients, which is in agreement with our results. However, the absolute number of patients is still considerable, see Table 2. Our study shows that a relatively large number of patients (about 1% of newly diagnosed patients) might benefit from a small library of candidates proven to generate an immunological response. These numbers must be compared to “conventional” personalised immunotherapy, where an immunologically active candidate of unknown efficacy and safety must be identified for each new patient. Even if a substantial part of the neo-epitopes we suggest turns out to be false positives, due to the limitations of prediction algorithms and of our understanding of the immune response, there is potential to help tens of thousands of patients.

Conclusions

Off-the-shelf immune treatments can be faster, less costly and safer for individual patients, because each neo-epitope based treatment scheme can be reused on hundreds of patients per year. In this respect, they might open the way to supplementing existing personalized cancer immune treatment approaches with precision treatment options.

We believe that our ranking provides a rational order for testing and selecting off-the-shelf neo-epitope based therapies. Our preliminary in vivo mouse experiments show that this is in principle feasible.