Use of an integrated pan-cancer oncology enrichment NGS assay to measure tumour mutational burden and detect clinically actionable variants

The identification of tumour mutational burden (TMB) as a biomarker of response to PD-1 immunotherapy has created a need for genomic assays that can measure it. We carried out comprehensive molecular profiling of cancers using the Illumina TruSight Oncology 500 panel (TSO500) and compared the results with whole genome sequencing (WGS). Methods: Cancer samples derived from formalin-fixed material were profiled on the TSO500 panel, sequenced on an Illumina NextSeq 500 instrument and processed through the TSO500 Docker pipeline. Either FASTQ files (PierianDx) or VCF files (OncoKDM) were processed to assess clinical actionability. In total, 108 samples (a mixture of colorectal, lung, oesophageal and control samples) were processed via the DNA panel. There was good correlation between TMB, SNV, indel and CNV calls from TSO500 and WGS (R² > 0.9) and good reproducibility, with less than 5% variability between repeated controls. For the RNA panel, 13 samples were processed, and all known fusions observed via orthogonal techniques were detected. For clinical actionability, 72 Tier 1 variants and 297 Tier 2 variants were identified, with clinical trials identified for all patients. The TruSight Oncology 500 assay accurately measures TMB, MSI, single nucleotide variants, indels, copy number/structural variation and gene fusions when compared with whole genome sequencing and orthogonal technologies. Coupled with a clinical annotation pipeline, this provides a powerful methodology for identification of clinically actionable variants.


Introduction
Recent developments in next-generation sequencing and tumour immunology have led to the discovery that targeting the CTLA4, PD-1 and PD-L1 immune checkpoints with therapeutic monoclonal antibodies (1) can unmask cancer to the immune system, facilitating its immune-mediated destruction. Although initial trials of PD-1 inhibitors had mixed results (2,3), it was subsequently determined that, as with previous targeted therapies, a specific tumour genotype was required for these inhibitors to be effective, and that dramatic regression could occur in tumours with the appropriate genotype.
For tumours to become immunogenic, a high neoepitope load must be generated via hypermutation (4-6), ideally through indels/frameshifts or non-synonymous mutations that generate novel proteins that can be recognised by the immune system. These neoepitopes can then be presented via MHC molecules to aid immune killing (7).
The CHECKMATE series of trials (8-10) suggests that a specific threshold of "tumour mutational burden" (TMB) must be reached for PD-1 blockade to become effective.
Although TMB has variable definitions, it is broadly accepted (9) as the number of missense mutations in the tumour genome, divided either by the size of the exome (35-45 Mb) or, for whole genome sequencing, by the size of the human genome (3.3 Gb). Based on the CHECKMATE trials, the suggested TMB threshold is greater than 10 mutations/Mb, because objective response rates in these studies did not improve substantially beyond this threshold.
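As a worked illustration of this definition, the following Python sketch normalises a mutation count to mutations per megabase and applies the "greater than 10 mut/Mb" cut-off. The function names and the mutation counts in the usage lines are hypothetical; only the denominators (3.3 Gb genome, 35 Mb as the lower end of the exome range) and the threshold come from the text.

```python
# Minimal sketch of TMB normalisation; names and example counts are hypothetical.

TMB_HIGH_THRESHOLD = 10.0  # mut/Mb, "greater than 10" as suggested by the CHECKMATE trials


def tmb_per_mb(n_mutations: int, region_bp: float) -> float:
    """Return tumour mutational burden in mutations per megabase of sequenced territory."""
    if region_bp <= 0:
        raise ValueError("region size must be positive")
    return n_mutations / (region_bp / 1e6)


def is_tmb_high(tmb: float, threshold: float = TMB_HIGH_THRESHOLD) -> bool:
    """TMB-high means strictly greater than the threshold."""
    return tmb > threshold


GENOME_BP = 3.3e9  # whole genome denominator
EXOME_BP = 35e6    # lower end of the 35-45 Mb exome range

print(tmb_per_mb(33_000, GENOME_BP))            # 10.0 (at, not above, the cut-off)
print(tmb_per_mb(700, EXOME_BP))                # 20.0
print(is_tmb_high(tmb_per_mb(700, EXOME_BP)))   # True
```

The same mutation count yields very different TMB values depending on the denominator, which is why the size of the sequenced region has to be reported alongside any TMB value.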
Initially, TMB was measured using whole genome and whole exome sequencing (11); however, these technologies are not currently cost-effective for routine use in the clinic.
Despite the falling cost of next-generation sequencing reagents, the volume of data required for sufficient coverage in either whole genome (200-300 Gb for 60X read depth) or whole exome (4-5 Gb for 100X read depth) sequencing makes them impractical except for dedicated sequencing cores. Secondly, identifying rare subclonal mutations (12) that may contribute to the neoantigen load requires much deeper coverage, of the order of 500X, so whole genome/exome sequencing is not cost-effective for this purpose (13). Rizvi et al (13) demonstrated that a panel size of at least 1.5 Mb is required to measure TMB accurately with an NGS-based assay. A panel of this size also offers opportunities for a pan-cancer assay, as it could cover the majority of known driver genes across multiple cancer types. In designing an oncology assay, other types of variation would ideally be included as well. Recent studies (14,15) have shown the potential utility of selecting targeted therapies using large gene panels, and therefore a panel should include mutations associated with targeted therapies.
medRxiv preprint doi: https://doi.org/10.1101/2020.02.01.20019992; this version posted February 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
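The data volumes quoted for whole genome and whole exome sequencing follow, to a first approximation, from multiplying the target size by the mean read depth. The sketch below (the helper name is hypothetical) computes that idealised minimum; it assumes no PCR duplicates, no off-target reads and perfectly even coverage, so real runs need more raw output than it reports.

```python
# Idealised sequencing yield: target size x mean depth.
# Assumes no duplicates, no off-target reads and perfectly uniform coverage.
GB = 1e9


def required_gigabases(target_bp: float, mean_depth: float) -> float:
    """Minimum raw output in gigabases to reach mean_depth across target_bp."""
    return target_bp * mean_depth / GB


print(required_gigabases(3.3e9, 60))   # whole genome, 60X: 198.0
print(required_gigabases(45e6, 100))   # 45 Mb exome, 100X: 4.5
print(required_gigabases(3.3e9, 500))  # whole genome, 500X: 1650.0
print(required_gigabases(1.5e6, 500))  # 1.5 Mb panel, 500X: 0.75
```

Deep (500X) coverage of a 1.5 Mb panel therefore needs well under 1 Gb of sequence, compared with more than 1.6 Tb for a whole genome at the same depth.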
An additional advantage of panel-based designs is the ability to enrich RNA targets. Recent studies have shown the importance of RNA fusions such as the TMPRSS2-ERG fusion in prostate cancer (16), FGFR2 fusions in cholangiocarcinoma (17) and NTRK fusions in lung and other cancers (18). These fusions are either targetable with molecularly targeted agents (e.g. larotrectinib (19) or pemigatinib (20)) or prognostically relevant (e.g. TMPRSS2-ERG).
An ideal oncology panel-based assay would have several characteristics (21): enrichment chemistry rather than PCR chemistry for identification of rare alleles with straightforward library preparation; a broad panel that targets the majority of DNA & RNA alterations in cancer; rapid run time; prediction of novel biomarkers such as TMB; and a standardised, reproducible analysis pipeline that can be used in a clinical setting.
In this study, we present our initial results using the Illumina TruSight Oncology 500 assay across a range of cancer types. We benchmarked it against whole genome and exome sequencing and assessed its ability to detect RNA fusions and copy number variants.

Bioinformatics
Raw sequencing output was transferred from the sequencing instrument to a bioinformatics server running Ubuntu 18.04 LTS. A pre-supplied Docker image (the TruSight Oncology 500 pipeline; Illumina, San Diego, CA, USA) was used to generate TMB and microsatellite instability (MSI) calls. The pipeline consists of several steps. First, raw BCL files were converted to sample-specific FASTQ files according to the sample index. FASTQ files were then aligned against the hg19 reference genome using Isaac 4, local realignment around indels was performed and paired-end reads were stitched together, followed by variant calling with the somatic variant caller Pisces. Germline variants were filtered using a proprietary database, and the called variants were then annotated to identify synonymous and nonsynonymous variants. The actual coverage of the panel was compared with the reference coverage, and TMB was calculated as the number of synonymous and nonsynonymous mutations detected divided by the size of the panel region successfully sequenced.
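The TMB step described here (eligible mutations divided by the panel region actually sequenced) can be sketched as follows. This is not the TSO500 code: the `Variant` record, the 100X depth cut-off for a base to count as successfully sequenced, the placeholder gene names and the per-base depth list are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

MIN_DEPTH = 100  # hypothetical depth cut-off for a panel base to count as sequenced


@dataclass
class Variant:
    gene: str
    consequence: str   # "synonymous", "nonsynonymous", "intronic", ...
    is_germline: bool  # True if removed by the germline filter


def callable_bases(depths: List[int], min_depth: int = MIN_DEPTH) -> int:
    """Number of panel positions covered at or above min_depth."""
    return sum(1 for d in depths if d >= min_depth)


def panel_tmb(variants: List[Variant], depths: List[int]) -> float:
    """Somatic coding (synonymous + nonsynonymous) variants per Mb of callable panel."""
    eligible = [
        v for v in variants
        if not v.is_germline and v.consequence in {"synonymous", "nonsynonymous"}
    ]
    bases = callable_bases(depths)
    if bases == 0:
        raise ValueError("no panel bases reached the depth cut-off")
    return len(eligible) / (bases / 1e6)


# Toy example: 1.0 Mb callable out of 1.2 Mb targeted; 12 somatic coding variants,
# plus 3 germline and 2 intronic variants that are excluded from the count.
depths = [500] * 1_000_000 + [20] * 200_000
variants = (
    [Variant("GENE_A", "nonsynonymous", False)] * 9
    + [Variant("GENE_B", "synonymous", False)] * 3
    + [Variant("GENE_C", "nonsynonymous", True)] * 3
    + [Variant("GENE_D", "intronic", False)] * 2
)
print(panel_tmb(variants, depths))  # 12.0
```

Using the callable region rather than the nominal panel size as the denominator avoids underestimating TMB in poorly covered samples, since mutations cannot be detected in bases that were not sequenced.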
Small variants were exported from the TSO500 pipeline, annotated using VEP, converted to MAF format using vcf2maf and imported into the maftools package of R/Bioconductor.
TMB calls for the whole-genome-sequenced control data were made using the Genomics England v3 pipeline for tumour-normal pairs and compared with calls from the TSO500 pipeline. In brief, this pipeline used Isaac v3 to align sequence data to the hg19 genome, followed by copy number variant calling using Canvas and structural variant calling using Manta. CNV calls for the TSO500 files were obtained using the Craft copy number caller in somatic tumour-only mode. SV calls for the TSO500 files were obtained using the Manta structural variant caller in tumour-only mode, with a custom modification to the C++ code of Manta to enable detection with fewer supporting reads and on amplicon sequencing data. Overlaps for both CNV and SV calls were computed using bedtools.
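For readers unfamiliar with the interval logic behind the bedtools overlap comparisons, the sketch below reimplements a reciprocal-overlap test in plain Python. The 50% reciprocal criterion is an assumption, since the study does not state which overlap criterion was used; with bedtools, a comparable query would be `bedtools intersect -a panel.bed -b wgs.bed -f 0.5 -r -u` (file names hypothetical). The example intervals are invented.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based start, as in BED
    end: int    # exclusive end, as in BED


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if the shared bases cover at least min_frac of both intervals."""
    shared = overlap_bp(a, b)
    return shared >= min_frac * (a.end - a.start) and shared >= min_frac * (b.end - b.start)


def concordant_calls(panel: List[Interval], wgs: List[Interval], min_frac: float = 0.5) -> List[Interval]:
    """Panel calls supported by at least one WGS call (each reported once, like -u)."""
    return [p for p in panel if any(reciprocal_overlap(p, w, min_frac) for w in wgs)]


# Hypothetical calls
panel_calls = [Interval("chr7", 55_000_000, 55_300_000), Interval("chr8", 127_700_000, 127_750_000)]
wgs_calls = [Interval("chr7", 54_900_000, 55_250_000)]
print(concordant_calls(panel_calls, wgs_calls))
# [Interval(chrom='chr7', start=55000000, end=55300000)]
```

Requiring reciprocal overlap stops a small panel call from being counted as concordant merely because it falls inside a much larger WGS event, and vice versa.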

DNA quality metrics
In total, 108 samples were profiled using the assay, with a median sample age of 2 years (range 4 months-10 years). All samples were from FFPE blocks. This input for all assays …

Precision of control calls
To assess the ability of the assay to detect variants at low variant allele frequencies (VAFs), we measured the VAF of two known low-VAF mutations in the HD753 cell line (Figure 4). This cell line has validated mutations in AKT1 (E17K, chr14:105246551C>T) with a VAF of 0.05 and in PIK3CA (E545K, chr3:178936091G>A) with a VAF of 0.056. The same control was sequenced in 12 runs, giving a median VAF of 0.059 (IQR 0.037-0.072) for AKT1 and 0.036 (IQR 0.033-0.0493) for PIK3CA.
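Summarising repeated VAF measurements, and estimating how much run-to-run spread read sampling alone would produce, can be sketched as follows. The quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, as the study does not say how the IQR was computed. The binomial model is also an assumption: it treats each read as independently carrying the variant with probability equal to the true VAF and ignores capture and PCR effects. The 500X depth is illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple


def vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele frequency from read counts."""
    return alt_reads / total_reads


def median_iqr(values: Sequence[float]) -> Tuple[float, Tuple[float, float]]:
    """Median and (Q1, Q3), using linear interpolation between data points."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), (q1, q3)


def binomial_vaf_sd(true_vaf: float, depth: int) -> float:
    """SD of the observed VAF if alt reads ~ Binomial(depth, true_vaf)."""
    return math.sqrt(true_vaf * (1 - true_vaf) / depth)


print(vaf(25, 500))                 # 0.05
print(binomial_vaf_sd(0.05, 500))   # about 0.0097
print(median_iqr([1, 2, 3, 4, 5]))  # (3, (2.0, 4.0))
```

Under this model, even with no technical noise, a true VAF of 0.05 sequenced at 500X would be expected to vary by roughly 0.01 (one SD) between runs.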

Copy number calls
A subset of 24 samples underwent copy number calling with the Craft pipeline. A variety of copy number gains and losses were detected in the 520 genes profiled on the TSO500 panel. The HD753 control was used to determine whether the observed copy number calls …
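Converting a depth ratio into an absolute copy number depends on tumour purity, which is one reason copy number calls from tumour-only FFPE samples benefit from benchmarking against a control. The sketch below is a simplified model and not the Craft algorithm: it assumes a diploid normal genome, a known purity and a depth ratio already normalised to a diploid baseline.

```python
NORMAL_CN = 2  # assumed diploid normal cells


def expected_ratio(tumour_cn: float, purity: float) -> float:
    """Depth ratio for a region at tumour_cn in a sample of the given purity."""
    return (purity * tumour_cn + (1 - purity) * NORMAL_CN) / NORMAL_CN


def copy_number_from_ratio(ratio: float, purity: float) -> float:
    """Invert expected_ratio to estimate the tumour copy number."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return (ratio * NORMAL_CN - (1 - purity) * NORMAL_CN) / purity


def copy_number_from_log2(log2_ratio: float, purity: float) -> float:
    """Same estimate starting from a log2 depth ratio."""
    return copy_number_from_ratio(2 ** log2_ratio, purity)


# A 6-copy amplification at 50% purity only doubles the depth (log2 ratio of 1).
print(expected_ratio(6, 0.5))           # 2.0
print(copy_number_from_log2(1.0, 0.5))  # 6.0
```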

Structural variant calls
The HD753 control is known to carry a variety of structural variants, including an SLC34A2/ROS1 fusion (VAF = 5.6%) and a CCDC6/RET fusion (VAF = 5.0%). Using our custom pipeline, we found evidence for both fusions: 7/506 reads supported the SLC34A2/ROS1 fusion and 5/498 reads supported the CCDC6/RET fusion. A variety of structural variants were observed in the tumour cohort. In addition, long indels were successfully detected by Manta, specifically the 15 bp deletion in EGFR (NM_005228.5:c.2235_2249del) known to be present in the HD753 control.
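One way to judge whether split-read support of 7/506 and 5/498 reads is in line with the nominal 5.6% and 5.0% VAFs of the HD753 fusions is a simple binomial model, sketched below. The model is our assumption: it treats reads as independent and ignores the fact that only reads spanning the breakpoint by enough bases can be counted as supporting, so a low tail probability here would be consistent with reduced sensitivity at the breakpoint rather than an incorrect call.

```python
from math import comb


def support_fraction(supporting: int, total: int) -> float:
    """Fraction of reads at the locus that support the fusion."""
    return supporting / total


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Read counts and nominal VAFs as reported for the HD753 control
fusions = [("SLC34A2/ROS1", 7, 506, 0.056), ("CCDC6/RET", 5, 498, 0.050)]
for name, supporting, total, nominal_vaf in fusions:
    frac = support_fraction(supporting, total)
    p_low = binom_cdf(supporting, total, nominal_vaf)
    print(f"{name}: {frac:.3f} of reads support; P(X <= {supporting}) = {p_low:.1e}")
```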

Tumour mutational burden (TMB) & Microsatellite instability (MSI)
TMB calling was successfully performed in 107/108 samples; the single failure was a sample of very poor quality that failed hybridisation. There was good correlation between TMB determined by TSO500 and by WGS (R² = 0.9, Figure 6). The median TMB was 8. […] 311 mut/Mb (range 289-325) in the HD753 control, a variation in TMB score of +/-5%.
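The R² for the TSO500 versus WGS TMB comparison can be computed from paired values as sketched below. We assume that R² here is the squared Pearson correlation, which equals the coefficient of determination of an ordinary least-squares line with intercept; the paired values in the usage lines are invented for illustration and are not study data.

```python
from statistics import mean
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined when one assay gives constant values")
    return sxy * sxy / (sxx * syy)


# Invented paired TMB values (mut/Mb): panel vs WGS
panel = [2.0, 5.5, 9.0, 14.0, 40.0]
wgs = [2.4, 5.0, 9.8, 13.1, 43.0]
print(round(r_squared(panel, wgs), 3))
```

On a linear scale a few hypermutated samples can dominate R², so comparisons are sometimes made on log-transformed TMB; which scale was used here is not stated.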
Comparison with The Cancer Genome Atlas (TCGA) tumour cohorts was performed in maftools and is shown in Figure 8.
For microsatellite instability (Figure 9), the threshold for classification as MSI-H was >10% of microsatellite sites being unstable. Using this threshold, both cancers with known somatic mismatch repair mutations (MLH1 and MSH6) were MSI-H, with 55% and 67% of sites unstable, respectively. Reassuringly, the POLE-mutant cancer had 2% of MSI sites unstable and was therefore microsatellite stable (MSS), as is typical of POLE-mutant cancers.
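The MSI-H rule used here (more than 10% of assessed microsatellite sites unstable) reduces to a single comparison, sketched below. The function name and the site counts in the usage lines are hypothetical, and using only adequately covered sites as the denominator is our assumption.

```python
MSI_H_THRESHOLD = 0.10  # fraction of unstable sites above which a tumour is called MSI-H


def msi_status(unstable_sites: int, assessed_sites: int, threshold: float = MSI_H_THRESHOLD) -> str:
    """Return "MSI-H" if the unstable fraction exceeds the threshold, else "MSS"."""
    if assessed_sites <= 0:
        raise ValueError("no microsatellite sites could be assessed")
    fraction = unstable_sites / assessed_sites
    return "MSI-H" if fraction > threshold else "MSS"


print(msi_status(55, 100))  # MSI-H (55% unstable, as in the MLH1-mutant case)
print(msi_status(2, 100))   # MSS (2% unstable, as in the POLE-mutant case)
```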

RNA fusions
RNA fusion analysis was carried out on 13 samples, of which 6 had known fusions. Fusions were detected between ETV6/NTRK3 (3 samples), RBPMS/NTRK3 (1 sample), EML4/ALK (1 sample) and TG/RET (1 sample). All fusions that had previously been identified by FISH were detected using this methodology. In one sample, an ETV6/NTRK3 fusion was detected that had not been identified via FISH; however, it was supported by 12,627 reads in the sequencing run, which we considered unlikely to represent a false positive, and we therefore classified it as a true fusion.

Clinical actionability
In order to recover as many clinically actionable variants as possible, mutational calls were … The PierianDx and OncoKDM pipelines were not directly comparable in this study because of their differing inputs (FASTQ for PierianDx, VCF for OncoKDM), but the combination of both platforms provided a comprehensive overview of variants.

Conclusions
We used the TruSight Oncology 500 assay to evaluate its utility and accuracy in determining both tumour mutational burden and druggable mutation calls in cancer. One of the key challenges in patient testing (23) is to take a biopsy sample with limited input material and produce sequencing data and mutation calls of sufficient quality to inform target selection and drug therapy (24).
The assay was designed in its first iteration to measure tumour mutational burden as a surrogate marker for response to anti-PD1 immunotherapy, as multiple studies have shown a correlation between TMB and response to this type of therapy (8,13). The TSO500 assay performs well in this respect, measuring TMB accurately when compared with whole genome sequencing. Taking a threshold of 10 mut/Mb as "TMB-high" (i.e. the level at which patients would be expected to benefit from immunotherapy), we found that the TSO500 assay classified samples with 100% accuracy. The precision of the calls varies at the extremes of TMB, most likely because of the limited panel size when calling TMB at extremely high levels. We conclude that the TSO500 pipeline is usable for clinical determination of TMB status across a range of clinical sample types and DNA inputs.
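A classification accuracy of this kind is the proportion of samples for which the panel and WGS agree on TMB-high status at 10 mut/Mb. A sketch of that comparison follows; treating the threshold as strictly greater than 10 mut/Mb mirrors the wording in the Introduction but is an assumption, WGS is assumed to be the reference, and the paired values are invented.

```python
from typing import Dict, Sequence


def tmb_high(tmb: float, threshold: float = 10.0) -> bool:
    """TMB-high means strictly greater than the threshold (mut/Mb)."""
    return tmb > threshold


def classification_agreement(panel: Sequence[float], wgs: Sequence[float], threshold: float = 10.0) -> Dict[str, float]:
    """Confusion counts and accuracy of panel TMB-high calls, with WGS as the reference."""
    if len(panel) != len(wgs) or not panel:
        raise ValueError("need matched, non-empty TMB lists")
    counts: Dict[str, float] = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for p, w in zip(panel, wgs):
        p_high, w_high = tmb_high(p, threshold), tmb_high(w, threshold)
        if p_high and w_high:
            counts["tp"] += 1
        elif not p_high and not w_high:
            counts["tn"] += 1
        elif p_high:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    counts["accuracy"] = (counts["tp"] + counts["tn"]) / len(panel)
    return counts


# Invented paired values (mut/Mb)
print(classification_agreement([3.1, 8.7, 12.5, 45.0], [2.8, 9.4, 11.0, 52.3]))
# {'tp': 2, 'tn': 2, 'fp': 0, 'fn': 0, 'accuracy': 1.0}
```

Disagreements concentrate in samples close to the threshold, so the number of samples near 10 mut/Mb helps in interpreting an accuracy figure.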
Using TSO500, we detected microsatellite instability in all samples known to be MSI-H. MSI detection by NGS has previously been shown to be feasible (25) with a variety of software tools, usually relying on off-target reads (26), whereas other assays, such as TSO500, use dedicated MSI probes. We have found the performance of this probe-based approach to be variable, as the probes are vulnerable to dropout in FFPE samples. We propose that TMB may instead be a good surrogate biomarker for MSI, as TMB values of 30-80 mut/Mb are typically seen in MSI tumours, compared with typically more than 150 mut/Mb in MSS POLE/POLD1-mutant tumours.
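The proposal to use TMB as a surrogate for MSI status can be written down as a simple rule. The sketch below uses only the ranges quoted in this paragraph (30-80 mut/Mb for MSI tumours, more than 150 mut/Mb for MSS POLE/POLD1-mutant tumours); it is a hypothetical, unvalidated heuristic, and values outside those ranges are deliberately left indeterminate rather than forced into a class.

```python
def tmb_msi_hint(tmb: float) -> str:
    """Hypothetical triage of MSI status from TMB (mut/Mb), using the quoted typical ranges."""
    if tmb > 150:
        return "suggests MSS POLE/POLD1 hypermutation"
    if 30 <= tmb <= 80:
        return "suggests MSI"
    return "indeterminate"


for value in (45.0, 210.0, 100.0, 8.0):
    print(value, tmb_msi_hint(value))
```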
A key requirement for clinical testing is the ability to process low-input specimens and to detect the low variant allele frequencies (VAFs) associated with them (27). Reassuringly, we found that the TSO500 assay performed well both at its recommended input and below it. In control samples with known VAF (approx. 0.05), we found good precision and reproducibility with minimal variability. Another advantage of tolerating low sample input is the possibility of working with the input levels seen in circulating tumour DNA (ctDNA), which are typically around 1 ng/ml of plasma in most cancers. This would allow derivation of blood TMB (28), which has been shown to be a better biomarker of response to PD-1/PD-L1 inhibitor therapy.
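The practical meaning of a ctDNA yield of about 1 ng/ml can be made concrete by converting DNA mass into haploid genome equivalents, taking roughly 3.3 pg per haploid human genome. The 10 ml plasma volume and 0.5% VAF in the usage lines are illustrative assumptions, not values from the study.

```python
HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome (pg)


def genome_equivalents(dna_ng: float) -> float:
    """Approximate number of haploid genome copies in dna_ng nanograms of DNA."""
    return dna_ng * 1000 / HAPLOID_GENOME_PG


def expected_mutant_copies(dna_ng: float, vaf: float) -> float:
    """Expected number of mutant molecules at a locus for a given VAF."""
    return genome_equivalents(dna_ng) * vaf


plasma_ml = 10               # assumed plasma volume
ctdna_ng = 1.0 * plasma_ml   # about 1 ng/ml, as quoted
print(round(genome_equivalents(ctdna_ng)))                 # about 3030 copies
print(round(expected_mutant_copies(ctdna_ng, 0.005), 1))   # about 15 mutant copies
```

With only a handful of mutant molecules per locus, the detection limit and precision of a blood TMB estimate are bounded by molecule counts as well as by sequencing depth.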
The assay also achieves a sufficiently high read depth to allow calculation of clonal TMB (29,30), another marker associated with more accurate identification of potential response to immunotherapy.
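Clonal TMB counts only mutations estimated to be present in (nearly) all cancer cells. A much-simplified illustration, not the method of the cited studies, is shown below. It estimates the cancer cell fraction (CCF) as 2 × VAF / purity, which holds only for heterozygous mutations in copy-neutral diploid regions, and treats a CCF of at least 0.8 as clonal; both the formula's scope and the 0.8 cut-off are our assumptions.

```python
from typing import Sequence


def cancer_cell_fraction(vaf: float, purity: float) -> float:
    """CCF for a heterozygous mutation in a copy-neutral diploid region, capped at 1."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return min(1.0, 2 * vaf / purity)


def clonal_tmb(vafs: Sequence[float], purity: float, panel_mb: float, ccf_cutoff: float = 0.8) -> float:
    """Clonal mutations per Mb: mutations whose estimated CCF reaches ccf_cutoff."""
    clonal = sum(1 for v in vafs if cancer_cell_fraction(v, purity) >= ccf_cutoff)
    return clonal / panel_mb


# Hypothetical sample: 60% purity, four mutations on a 2.0 Mb callable panel
vafs = [0.30, 0.28, 0.10, 0.05]
print(clonal_tmb(vafs, purity=0.6, panel_mb=2.0))  # 1.0 (two clonal mutations)
```

Subclonal mutations in impure samples can sit at VAFs of a few percent, which is why separating them from clonal mutations depends on high read depth.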
In terms of identifying druggable mutations for targeted therapy selection, the TSO500 pipeline presents an attractive platform, especially when coupled with a clinical annotation engine such as the two used here (OncoKDM and PierianDx CGW (31)). We found good correlation between the mutations detected by TSO500 and those detected in whole genome sequencing experiments, and the integrated clinical pipelines made the identification of druggable mutations straightforward and reproducible.
Copy number variations, especially amplifications, represent important therapeutic targets.
The TSO500 assay detected the known amplifications in a control sample, suggesting that patients with such amplifications could potentially be identified for targeted therapy. A unique advantage of the TSO500 system is its partner targeted RNA-seq assay, which can detect RNA fusions. We found that the assay reliably detected NTRK (32), ALK (33) and RET (34) fusions that had previously been identified by FISH, as well as one fusion not previously detected in that sample by other technologies. Intriguingly, we also detected known fusions de novo at the DNA level in the HD753 control sample, suggesting that this methodology may also be valid for future use, although DNA-based fusion calling has a high false negative rate. Fusion genes represent good drug targets, and a number of novel agents (19,32) have been shown to be active against them. Detection of circulating RNA for these fusion genes using this assay may also be possible (35) and could be explored further.
The UK 100,000 Genomes Project has recently been completed, and analysis and reporting are ongoing. Whole genome sequencing of tumour-normal pairs using fresh-frozen material still faces significant challenges, both in cost and in the practicalities of obtaining fresh-frozen tissue rather than readily available paraffin-embedded material. The TSO500 assay costs approximately one third of the price of a whole genome sequencing assay, requires no germline DNA control, allows RNA fusion detection and can be implemented on benchtop sequencers. Its main disadvantages are a more laborious library preparation and enrichment chemistry that is vulnerable to dropout.
In conclusion, we believe that the TruSight Oncology 500 assay is a cost-effective, accurate pan-cancer assay that can derive SNV, CNV and gene fusion information across the majority of cancers using a standardised pipeline, and is therefore suitable for routine use in precision oncology as a comprehensive genomic profiling solution.