Background

The therapeutic efficacy of immune checkpoint blockade, such as PD1 and CTLA4 antibodies, is hypothesized to depend on mutant peptide epitopes that elicit T cell-dependent cytotoxicity toward tumor cells. Recognition of such epitopes by CD4 T cells is proposed to be a major mechanism. In mouse models, both artificial protein antigens and mutant peptide antigens derived from tumor cells were found to elicit tumoricidal T cell responses [1,2,3]. Clinical trials using long peptides or mRNA to deliver CD4 T cell epitopes to dendritic cells have succeeded in inducing mutant peptide-specific CD4 T cells, and these responses were associated with anti-tumor efficacy [4,5,6].

In this study, we analyzed next generation sequencing data from 147 lung adenocarcinoma patients deposited in The Cancer Genome Atlas (TCGA) to identify both driver and passenger mutations that may be presented by HLA Class II molecules. Because both the alpha and beta chains of HLA Class II molecules can be polymorphic, we studied only the binding of mutant peptides to HLA DRB1 molecules, which pair with an invariant alpha chain, HLA DRA.

Methods

Standardization and tracking of mutation data from TCGA

We collected lung adenocarcinoma mutations from TCGA [7]. The data collection criteria were as follows: (1) tumor and matched normal adjacent tissue were included; (2) only samples with somatic mutation, expression, and SNP (single nucleotide polymorphism) array information were included; (3) redundant tumor samples from the same patient were removed; (4) samples with purity lower than 20% or ploidy larger than 6 were removed, with purity and ploidy as reported by AbsCN-seq [8].
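The four criteria above can be sketched as a simple filter. This is an illustrative sketch only, not the pipeline used in the study: the `Sample` record and its field names are hypothetical, and the handling of criterion 3 (keeping the first sample seen per patient) is an assumption, since the text does not say which sample was retained.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical per-sample record; field names are illustrative."""
    patient_id: str
    has_matched_normal: bool
    has_mutation_data: bool
    has_expression_data: bool
    has_snp_array: bool
    purity: float  # fraction of tumor cells, 0..1 (e.g. from AbsCN-seq)
    ploidy: float


def select_samples(samples, min_purity=0.20, max_ploidy=6.0):
    """Apply criteria 1-4 in order and return the retained samples."""
    kept = []
    seen_patients = set()
    for s in samples:
        # Criterion 1: tumor must have a matched normal sample.
        if not s.has_matched_normal:
            continue
        # Criterion 2: mutation, expression and SNP array data all present.
        if not (s.has_mutation_data and s.has_expression_data and s.has_snp_array):
            continue
        # Criterion 4: drop low-purity or high-ploidy samples.
        if s.purity < min_purity or s.ploidy > max_ploidy:
            continue
        # Criterion 3 (assumption): keep only the first passing sample per patient.
        if s.patient_id in seen_patients:
            continue
        seen_patients.add(s.patient_id)
        kept.append(s)
    return kept
```

Applying the purity and ploidy cut before de-duplication means a patient is not lost merely because their first listed sample failed quality control; the reverse order would be an equally plausible reading of the criteria.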

To remove common sequencing artifacts and residual germline variation, each mutation was subjected to a ‘Panel of Normals’ filtering process using a panel of over 600 BAM files from normal samples. Mutations observed at a frequency above 1% in the panel of normals, dbSNP [9] or 1000G [10] were removed. Finally, all mutations with read coverage below 10X were filtered out.
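The mutation-level filters can be expressed as a single predicate. The sketch below assumes that the frequency of each variant in the panel of normals, dbSNP and 1000G has already been looked up (a variant absent from a resource is given frequency 0.0); the function and parameter names are hypothetical.

```python
def passes_mutation_filters(pon_freq, dbsnp_freq, kg_freq, depth,
                            max_freq=0.01, min_depth=10):
    """Return True if a mutation survives the artifact/germline filters.

    pon_freq, dbsnp_freq, kg_freq: observed frequency (0..1) of the variant
        in the panel of normals, dbSNP and 1000G, respectively.
    depth: number of reads covering the mutated position.
    """
    # Remove mutations with read coverage below 10X.
    if depth < min_depth:
        return False
    # Remove mutations seen at more than 1% in any reference resource.
    return all(freq <= max_freq for freq in (pon_freq, dbsnp_freq, kg_freq))
```

A mutation observed at exactly 1% is retained here, following the wording "more than 1%".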

Purity and ploidy analysis

Purity and ploidy were estimated by AbsCN-seq, a software tool developed for WES (whole exome sequencing) data, based on SNV (single nucleotide variant) frequency and segment copy number.

Mutation clonality analysis

After estimating the tumor purity, we calculated the CCF (cancer cell fraction) for each mutation. The CCF is the fraction of tumor cells harboring a given mutation. Clonal mutations have a true CCF of 1, and subclonal mutations have a true CCF < 1. The observed allele counts correspond to a probability density of the CCF, which can be estimated with the following equation, where q(m) is the local copy number at the given mutation m, α is the tumor purity, and CCF ranges from 0 to 1. Here pdf is the probability density function, βpdf is the beta probability density function, alt is the alternate allele count, and ref is the reference allele count [11].

$$ \mathrm{pdf}\left(\mathrm{CCF}, m\right) = \beta\mathrm{pdf}\left\{ \frac{\mathrm{CCF}\cdot\alpha}{2\left(1-\alpha\right) + \alpha\, q(m)},\; \mathrm{alt}(m) + 1,\; \mathrm{ref}(m) + 1 \right\} $$
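As a worked illustration, the CCF density can be evaluated on a grid of CCF values. This is a minimal sketch, not the authors' implementation. It assumes that the expected variant allele fraction of a mutation is CCF·α / [2(1 − α) + α·q(m)] and that this fraction is scored with a Beta(alt + 1, ref + 1) density, whose kernel is x^alt (1 − x)^ref. The function names and the grid resolution are hypothetical, and the study's rule for calling a mutation clonal from this density is not specified in the text, so none is implemented.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    if x == 0:
        return 0.0
    if y <= 0.0:
        return float("-inf")
    return x * math.log(y)


def ccf_posterior(alt, ref, purity, local_cn, n_grid=101):
    """Normalized CCF probabilities on an evenly spaced grid over [0, 1].

    alt, ref: alternate and reference allele read counts.
    purity:   tumor purity (alpha), 0 < purity <= 1.
    local_cn: local copy number q(m) at the mutated locus, > 0.
    """
    if not (0.0 < purity <= 1.0) or local_cn <= 0:
        raise ValueError("purity must be in (0, 1] and local_cn must be > 0")
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    log_w = []
    for ccf in grid:
        # Expected variant allele fraction for this CCF (assumed model).
        vaf = ccf * purity / (2.0 * (1.0 - purity) + purity * local_cn)
        vaf = min(vaf, 1.0)
        # Log of the Beta(alt + 1, ref + 1) kernel; the constant is dropped
        # because the weights are renormalized below.
        log_w.append(_xlogy(alt, vaf) + _xlogy(ref, 1.0 - vaf))
    peak = max(log_w)
    if peak == float("-inf"):
        raise ValueError("no grid point is compatible with the read counts")
    weights = [math.exp(v - peak) for v in log_w]
    total = sum(weights)
    return grid, [w / total for w in weights]


def ccf_mode(grid, probs):
    """Return the grid value with the highest probability."""
    return grid[max(range(len(grid)), key=probs.__getitem__)]
```

For example, in a pure diploid tumor (purity 1, local copy number 2), a heterozygous clonal mutation is expected at an allele fraction of 0.5, so 50 alternate and 50 reference reads place the mode of the density at CCF = 1.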

Neo-peptides prediction

We first used RNA-seq data to confirm that the mutated genes were expressed. Genes covered by 3 or more reads were defined as expressed according to Kandoth et al. [12]. 29-mer polypeptides centered on mutated residues, i.e., peptide sequences surrounding mutated amino acids resulting from missense mutations, frame-shift or non-frame-shift indels, were scanned to identify candidate peptides binding to MHC Class I or II molecules [13]. The affinity of 8- to 11-mer peptides binding to MHC Class I molecules was predicted using the NetMHCPan2.4 binding algorithm [14]. The affinity of 15-mer peptides binding to MHC Class II molecules was predicted using the NetMHCIIPan3.1 binding algorithm [15]. Strong binding peptides were defined by a half-maximum inhibitory concentration (IC50) < 50 nM, and weak binding peptides by an IC50 < 500 nM [15,16,17].
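The peptide scanning step can be sketched as follows for a missense mutation. This is an illustrative sketch under stated assumptions: the mutant protein sequence and the 0-based position of the mutated residue are given, predicted IC50 values come from an external predictor that is not called here, and all function names are hypothetical. Frame-shift and indel peptides would need additional handling that is not shown.

```python
def mutant_window(protein, pos, flank=14):
    """Return the window around a mutated residue and its index in the window.

    protein: mutant protein sequence.
    pos:     0-based index of the mutated residue.
    flank:   residues kept on each side; 14 gives a 29-mer when the
             mutation is far enough from both termini.
    """
    start = max(0, pos - flank)
    end = min(len(protein), pos + flank + 1)
    return protein[start:end], pos - start


def kmers_covering(window, mut_index, k):
    """Return every k-mer in the window that contains the mutated residue."""
    peptides = []
    for i in range(len(window) - k + 1):
        if i <= mut_index < i + k:
            peptides.append(window[i:i + k])
    return peptides


def classify_binder(ic50_nm):
    """Classify a predicted IC50 (nM) using the 50 nM and 500 nM thresholds."""
    if ic50_nm < 50:
        return "strong"
    if ic50_nm < 500:
        return "weak"
    return "non-binder"
```

With a full 29-mer window, `kmers_covering(window, 14, 15)` yields the fifteen 15-mers scored for MHC Class II binding, and calling it with k from 8 to 11 yields the MHC Class I candidates. Strong binders are treated as a separate class from weak binders, matching the way the counts are reported in the Results.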

MHC Class II molecules include HLA DP, DQ, and DR molecules. These molecules are composed of alpha and beta subunits. For DP and DQ molecules, both alpha and beta subunits are polymorphic. DR molecules are composed of a polymorphic beta subunit and an invariant alpha subunit. In this study, we focused on HLA DRB1, the most prevalent beta subunit of HLA DR [18]. The frequencies of other DRB molecules (DRB3, 4 and 5) are 5 to 10 fold lower than that of DRB1 [18]. DRB1 molecules are therefore expected to present neo-antigens more often than the other DRB molecules.

Results

To ensure high quality mutation calls for lung adenocarcinoma, stringent filters (Methods) were applied during sample and mutation collection. A total of 40,229 somatic mutations in 147 lung adenocarcinomas were included for downstream analysis, including 26,296 missense, 8965 silent, 2061 nonsense, 911 splice site, 98 non-stop/read through, 1735 frame shift insertions/deletions (indels) and 163 inframe indels.

We estimated the CCF (cancer cell fraction) of each mutation as described in Carter et al. [19] to determine whether mutations are clonal (i.e., present in all cancer cells). Mutations are considered clonal if the CCF is close to 1. To determine the CCF, we calculated the sample purity (i.e., the percentage of tumor cells in the sample), ploidy (i.e., a measure of the number of chromosomes in a cell) and absolute copy number with AbsCN-seq. We then identified clonal mutations based on the beta distribution. In total, we identified 21,710 clonal mutations (Fig. 1), including mutations in known proliferation-related genes (e.g., TP53, KRAS, EGFR).

Fig. 1 Flow chart of clonal mutation analysis and HLA-binding neo-antigen prediction for lung adenocarcinoma patients

High-affinity candidate T cell epitopes were identified in silico by scanning the mutant peptides resulting from missense mutations, frame-shift or non-frame-shift indels. T cell epitopes presented by MHC Class I molecules were predicted by the NetMHCPan2.4 binding algorithm (Additional file 1: Table S1, Additional file 2: Table S2 and Additional file 3: Table S3). T cell epitopes presented by MHC Class II molecules were predicted by the NetMHCIIPan3.1 binding algorithm. We focused on HLA DRB1, the most prevalent beta subunit of HLA DR, which pairs with the invariant alpha subunit HLA DRA [18]. In total, 8804 neo-peptides were found, including 375 strong binders and 8429 weak binders (Fig. 2). For DRB1*01:01, 950 neo-peptides were found, including 54 strong binders and 896 weak binders. The most commonly mutated genes with predicted neo-antigens are KRAS, TTN, RYR2, MUC16, TP53, USH2A, ZFHX4, KEAP1, STK11, FAT3, NAV3 and EGFR (Table 1). The exact mutated sequences are listed in Additional file 4: Table S4. The number of neo-peptides varies widely among individual lung adenocarcinoma patients, from 0 to 523 (Fig. 2). Table 2 shows the distribution of neo-antigens across different HLA DRB1 alleles. DRB1*01:02, DRB1*12:01, DRB1*11:04 and DRB1*01:01 were the DRB1 alleles predicted to present the most neo-antigens. A high frequency of neo-peptides was found at hotspots of KRAS (Table 3, G12C or G12V). INDEL mutations were found in most patients (Fig. 3). However, no linear correlation was found between the numbers of SNV and INDEL mutations.

Fig. 2 Predicted HLA-DRB1-binding neo-antigen mutant peptides in 147 lung adenocarcinoma patients. Binding of peptides derived from somatic mutations was predicted by NetMHCIIPan3.1. Patients are ordered by their number of HLA-DRB1-binding neo-antigen mutations, including both strong binders (SB, blue) and weak binders (WB, red). Gray indicates other mutations that are not predicted to bind MHC Class II molecules. Smokers and non-smokers were analyzed separately

Table 1 Top mutated genes with predicted HLA DRB1 binding neo-peptides in lung adenocarcinoma patients in this study
Table 2 Number of predicted neo-antigen peptides presented by MHC Class II molecule HLA DRB1
Table 3 Predicted HLA DRB1-binding neo-peptides of KRAS, EGFR, TP53, and MUC16 in lung adenocarcinoma patients in this study
Fig. 3 Predicted HLA-DRB1-binding INDEL mutant peptides in 147 lung adenocarcinoma patients

Discussion

Several groups have proposed predicting HLA Class II-presented neo-antigens through next generation sequencing for cancer immunotherapy [1,2,3,4,5,6]. In both mouse models and human patients, the function of predicted neo-antigens has been verified by measuring CD4 T cell responses or tumor rejection.

In this study, we predicted the HLA Class II-presented neo-antigen peptides in lung adenocarcinoma. An average of 59 HLA DRB1-presented neo-antigen mutations were predicted per lung cancer patient. This prediction is based on the assumption that any HLA DRB1 allele may be the MHC Class II molecule that presents mutated peptides in a patient. Since an individual patient expresses at most two HLA DRB1 alleles, the actual number of mutant peptide epitopes presented in a given patient is much lower. Unfortunately, HLA DRB1 allele data are not available in the public TCGA database for the lung cancer patients we studied. Assuming HLA DRB1*01:01 is the HLA DRB1 allele, 54 strong binders and 896 weak binders were found in 147 patients. On average, 5 mutant peptides were found per patient for the HLA DRB1*01:01 allele.

van Buuren et al. reported that the sensitivity of neo-epitope prediction from analysis of exonic SNVs in cancer exome sequencing data requires little improvement [20]. Our analysis of mutant peptides presented by HLA Class I molecules in lung cancer patients is consistent with this conclusion (Additional file 1: Table S1 and Additional file 5: Table S5, top mutated genes with predicted epitopes binding to HLA Class I molecules).

A weakness of our analysis is that the expression of predicted neo-epitopes could not be determined. As described above, genes covered by 3 or more reads in RNA-seq data were defined as expressed according to Kandoth et al. [12]. Although the normal copy of a gene may be expressed, its variants may not be, especially truncating variants that may undergo nonsense-mediated transcript decay. New mass spectrometry-based technologies are emerging to verify predicted neo-epitopes [21,22,23] through analysis of peptides eluted from HLA molecules purified from cancer tissues.

KRAS, TP53, and EGFR mutants are well-known vaccine candidates that are currently in clinical trials [24,25,26,27]. Our data suggest that such mutations in proliferation-related genes are also candidates for CD4 epitopes. In addition, neo-antigens from passenger mutations are attractive targets for individualized precision therapy. There is an urgent need for technologies that can help determine whether the predicted neo-antigen mutations are presented by HLA Class II molecules. Technical platforms include ELISPOT assays using synthetic candidate peptide epitopes, T cell stimulation assays using antigen presenting cell lines that express specific HLA DRB1 molecules, and tetramer staining-based sorting of neoantigen-specific T cells.

Conclusions

This study used clonal mutation analysis to predict neo-antigen mutant peptides that are presented by HLA DRB1 molecules and expressed at the RNA level. The genes identified here provide clues for identifying CD4 T cell epitopes for immune monitoring and therapy.