Introduction

Epithelial ovarian cancer (EOC), which constitutes 90% of ovarian cancer (OC) cases, is the leading cause of death in women with gynecologic cancer, as up to 70% of cases are diagnosed at advanced stages (International Federation of Gynecology and Obstetrics (FIGO) stages III-IV) [1]. The 5-year survival rate depends strongly on the stage at diagnosis: patients with early-stage disease (FIGO I) have a survival rate above 90%, whereas for late-stage disease (FIGO IV) it is below 30% [2]. EOC can be categorized into five distinct subtypes: high-grade serous (HGSC, 70%), clear cell (oCCC, 10%), endometrioid (10%), mucinous (< 5%), and low-grade serous carcinomas (< 5%) [3]. These subtypes display different morphological, genetic, epigenetic, and clinical features [3]. For example, HGSC, the most common subtype of EOC, is characterized by TP53 mutations (96% of cases) and alterations in the homologous recombination (HR) DNA damage response (DDR) pathway (approximately 50% of cases) [4]. In contrast to HGSCs, oCCCs usually present a lower frequency of HR gene alterations and express wild-type p53 protein [5]. Genetic alterations in ARID1A and PIK3CA, as single or double-hit mutations, are the most frequent alterations in oCCC patients (approximately 50%) [6].

Diverse next-generation sequencing (NGS) approaches, such as whole genome sequencing (WGS), whole exome sequencing (WES), RNA sequencing, and targeted gene panels, are widely used in cancer research and diagnostics worldwide [7]. WGS and WES offer the potential to identify molecular biomarkers, especially biomarkers that can be validated in new clinical trials for future standardized or experimental treatment options, whereas targeted gene panels, which provide greater depth of coverage by focusing on known cancer-associated genes, are used routinely in diagnostics to guide personalized treatment [7, 8]. There is ongoing discussion about which of these approaches should be implemented in routine diagnostics, as many factors must be considered, e.g., time, cost, confidence in the detected variants, the data-analysis effort required for large datasets when no in silico-based strategies are used, and a low likelihood of benefit [9, 10]. Moreover, each of these sequencing approaches consists of a chain of biochemical steps and of data filtering and analysis strategies that may affect coverage and variant-calling accuracy and that differ among vendors [11]. DNA quality and extraction method, library preparation, sequencing platform, coverage, and the data filtering and analysis workflow are among the main factors that might affect the final sequencing results in a clinical context, where formalin-fixed paraffin-embedded tissues are routinely used [11, 12].

Recently, we published NGS results in OC patients based on two approaches: a targeted gene panel, the Oncomine™ Comprehensive Assay v3 (OCAv3) [13, 14], and WES [15]. Here, we present a direct comparison between OCAv3 and WES, covering detection of somatic single and multiple nucleotide variants (SNVs and MNVs) as well as small insertions/deletions (INDELs) in exonic and splice-site regions of 146 OCAv3 genes in 5 HGSC and 3 oCCC patients. To limit the number of factors that could contribute to variation, we performed the comparison of the two strategies (OCAv3 and WES) using fresh-frozen tissues, although formalin-fixed paraffin-embedded tissues are the main source for tumor molecular characterization in daily clinical routine. Formalin treatment can introduce a range of chemical modifications into DNA, thereby causing technical challenges and affecting the accuracy of sequencing [16]. For each patient, the libraries for both strategies were prepared from the same DNA sample.

Moreover, the two strategies are offered by the same vendor, which made it possible to use the same library preparation strategy and sequencing platform, as well as comparable data filtering and analysis workflows. The vendor was chosen because its platform is currently used for routine clinical testing in our department.

Materials and methods

Patient cohort

Fresh-frozen samples from five high-grade serous (HGSC) and three clear cell ovarian (oCCC) cancer patients were acquired from two Danish projects: the Pelvic Mass study (2004–2014) and the GOVEC (Gynecological Ovarian Vulva Endometrial Cervix cancer) study (2015 – ongoing) through the Bio- and Genome Bank Denmark. The oCCC_02 sample was described as having mixed clear cell and endometrioid histology, with oCCC histology dominant. The study was performed according to the guidelines of the Declaration of Helsinki, including written informed consent from all patients, and was approved by the Danish National Committee for Research Ethics, Capital Region (H-17,029,749/H-15,020,061). To determine the percentage of tumor cells, a pathologist specialized in gynecology examined hematoxylin and eosin (H&E)-stained tissue slides neighboring the excised tumor.

WES and OCAv3 sequencing

Genomic DNA was extracted from fresh-frozen samples using Maxwell RSC Tissue DNA (AS1610, Promega). DNA concentration was measured on the Qubit system with the High Sensitivity dsDNA assay kit (Q33120, Thermo Fisher Scientific). Exome sequencing libraries were prepared from 100 ng DNA using the Ion AmpliSeq Exome RDY kit (A38262, Thermo Fisher Scientific) according to the manufacturer’s protocol. The Oncomine™ Comprehensive Assay v3 (OCAv3) libraries were prepared according to the manufacturer’s instructions MAN0015885 (Revision C.0) with the Ion AmpliSeq™ Library Kit Plus (Thermo Fisher Scientific). Multiplex PCR amplification for the OCAv3 assay used approximately 20 ng of DNA as input. Amplified exome and OCAv3 DNA libraries were loaded onto an Ion 550 Chip (A34537, Thermo Fisher Scientific) using the Ion Chef System (Thermo Fisher Scientific). Sequencing was performed on an Ion S5XL System (Thermo Fisher Scientific).

Data processing and variant calling

Exome sequencing data were acquired, pre-processed, aligned to the human reference genome assembly hg19, and analyzed with Ion Reporter™ Software (v. 5.10) (Thermo Fisher Scientific) coupled with the AmpliSeq Exome single sample (Somatic) analysis module. For OCAv3, Ion Reporter™ Software (v. 5.18) (Thermo Fisher Scientific) coupled with the Oncomine Comprehensive v3 - w4.2 - DNA - Single Sample analysis module was used for initial automated analysis. Files were downloaded without any filter chain so that all identified variants were included. Further filtering for true variants was performed in the R environment and in Python (3.9.2) [17], as described previously [14, 15], with two modifications intended to filter out false-positive variants: “Potential germline” (allele ratio on target allele ≥ 0.98 instead of = 1) and “Strand bias” (Phred-scaled p-value from a Fisher’s exact test > 55 instead of > 60). Only SNV, MNV, or INDEL variants located in exonic or splice-site regions (located within the first three nucleotides of the 5’ or 3’ end) of the 146 OCAv3 genes were selected for further analysis. In addition, these variants had to pass the Ion Reporter™ Default Variant View filter and have a nucleotide length of at least 1. Because TP53 variants are found in more than 90% of HGSC cases according to previous reports [18], we determined the coverage cut-off for WES by first manually checking TP53 variants in the unfiltered data from the patients with HGSC (resulting value = 49). The following thresholds define the exclusion criteria subsequently applied in the OCAv3 and WES workflows:

  • UCSC Common SNPs (SNPs with a minor allele frequency of at least 1% and mapped to a single location in the reference genome assembly) = “CommonSNP”.

  • Ion Reporter Variant Effect = “Synonymous”.

  • Coverage < 100 (OCAv3) or Coverage < 49 (WES) = “Low overall coverage”.

  • Coverage < 10% of mean coverage above 100 (OCAv3) or 49 (WES) = “Low base coverage”.

  • Allele ratio on target allele ≥ 0.98 = “Potential germline”.

  • Homopolymer length > 5 = “High homopolymer content”.

  • Allele ratio < 25% of average allele ratio per sample = “Allele ratio below Q1”.

  • Phred score < 200 (OCAv3) or < 100 (WES) = “Low Phred score”.

  • Ion Reporter p-value > 0.01 = “Above p-value”.

  • Phred-scaled p-value from a Fisher’s Exact Test > 55 = “Strand bias”.
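The exclusion criteria above can be read as a chain of independent checks on each variant. The following Python sketch illustrates that reading only; it is not the authors' published code. All field names (`coverage`, `allele_ratio`, and so on) are hypothetical and do not correspond to Ion Reporter column names, allele ratios are assumed to be fractions between 0 and 1, and the “Allele ratio below Q1” rule is implemented as "less than 25% of the sample's mean allele ratio", which is one possible interpretation. The “Low base coverage” rule is left out because the reference mean it relies on is not fully specified in the text.

```python
from dataclasses import dataclass
from typing import List

# Assay-specific thresholds taken from the exclusion criteria listed above.
THRESHOLDS = {
    "OCAv3": {"min_coverage": 100, "min_phred": 200},
    "WES": {"min_coverage": 49, "min_phred": 100},
}


@dataclass
class Variant:
    # Hypothetical field names, chosen for illustration only.
    coverage: int
    allele_ratio: float        # fraction between 0 and 1 (assumption)
    homopolymer_length: int
    phred: float
    p_value: float
    strand_bias: float         # Phred-scaled p-value from Fisher's exact test
    is_common_snp: bool = False
    effect: str = "missense"


def exclusion_flags(v: Variant, assay: str, sample_mean_allele_ratio: float) -> List[str]:
    """Return the label of every exclusion criterion the variant triggers."""
    t = THRESHOLDS[assay]
    flags = []
    if v.is_common_snp:
        flags.append("CommonSNP")
    if v.effect.lower() == "synonymous":
        flags.append("Synonymous")
    if v.coverage < t["min_coverage"]:
        flags.append("Low overall coverage")
    if v.allele_ratio >= 0.98:
        flags.append("Potential germline")
    if v.homopolymer_length > 5:
        flags.append("High homopolymer content")
    # Assumed reading of the "Allele ratio below Q1" rule.
    if v.allele_ratio < 0.25 * sample_mean_allele_ratio:
        flags.append("Allele ratio below Q1")
    if v.phred < t["min_phred"]:
        flags.append("Low Phred score")
    if v.p_value > 0.01:
        flags.append("Above p-value")
    if v.strand_bias > 55:
        flags.append("Strand bias")
    return flags


# Toy variant (invented values): kept under the WES thresholds, but flagged
# for coverage under the stricter OCAv3 thresholds.
toy = Variant(coverage=60, allele_ratio=0.85, homopolymer_length=2,
              phred=500.0, p_value=0.0001, strand_bias=10.0)
print(exclusion_flags(toy, "WES", 0.4))    # []
print(exclusion_flags(toy, "OCAv3", 0.4))  # ['Low overall coverage']
```

Returning all triggered labels, rather than stopping at the first, keeps the reason for each exclusion visible when the two assays are compared.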

All variants that passed the above-described criteria were clinically annotated using the ClinVar database (data status check: March 14, 2023). Variants classified as “Benign” or “Likely Benign” in ClinVar were excluded from further analysis. All remaining variants were manually assessed for sequencing and annotation errors with the Integrative Genomics Viewer (IGV) (Broad Institute, USA) to confirm or exclude findings.
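The ClinVar-based exclusion step can be sketched as a small predicate. This is an illustration under stated assumptions, not the code used in the study: the function name `keep_variant` is hypothetical, a variant without a ClinVar record is represented by `None`, and treating the combined label “Benign/Likely benign” as excluded is an assumption that goes beyond the two labels named in the text.

```python
from typing import Optional

# Classifications named in the text as grounds for exclusion.
EXCLUDED_TERMS = {"benign", "likely benign"}


def keep_variant(clinvar_significance: Optional[str]) -> bool:
    """Return True if the variant stays in the analysis.

    None stands for a variant that has no ClinVar record.
    """
    if clinvar_significance is None:
        return True
    # Assumption: a combined label such as "Benign/Likely benign" is excluded too.
    parts = [p.strip().lower() for p in clinvar_significance.split("/")]
    return not all(p in EXCLUDED_TERMS for p in parts)


for label in ["Benign", "Likely Benign", "Uncertain significance", None]:
    print(label, keep_variant(label))
```

Variants absent from ClinVar are kept by this predicate, so they still reach the manual IGV review.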

Genomic ranges filtering

The OCAv3 enables DNA-targeted sequencing of 146 cancer-associated genes (Online Resource 1). To compare variants in genomic regions covered by both assays, we downloaded the coverage files from the Ion Reporter™ Software (“amplicons_low_no_coverage_statistics.txt”) and extracted the mutually covered locus positions of the 146 genes for OCAv3 and WES. Furthermore, gene names were cross-checked, and two gene names from the WES panel were updated to the newly approved gene names used in OCAv3: H3F3A and HIST1H3B were replaced with H3-3A and H3C2, respectively.
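Extracting mutually covered positions amounts to intersecting the amplicon intervals of the two assays after harmonizing gene names. The sketch below shows one way to do this in Python. It makes several assumptions: the layout of “amplicons_low_no_coverage_statistics.txt” is not described here, so amplicons are passed in as plain `(chromosome, start, end)` tuples with half-open coordinates, and the function names are hypothetical. Only the two gene-name replacements come from the text.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)

# Gene-name updates stated in the text (WES panel name -> OCAv3 name).
GENE_RENAMES = {"H3F3A": "H3-3A", "HIST1H3B": "H3C2"}


def harmonize_gene(name: str) -> str:
    """Map an outdated gene symbol to the approved symbol, if one is listed."""
    return GENE_RENAMES.get(name, name)


def mutual_regions(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by at least one amplicon in each list.

    Simple pairwise comparison per chromosome; adequate for a panel-sized
    amplicon list. Overlapping amplicons within one assay may yield
    overlapping output regions.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))

    shared = []
    for chrom, start, end in a:
        for other_start, other_end in by_chrom.get(chrom, []):
            lo, hi = max(start, other_start), min(end, other_end)
            if lo < hi:  # non-empty overlap
                shared.append((chrom, lo, hi))
    return sorted(shared)


# Toy coordinates, invented for illustration.
panel = [("chr17", 100, 200)]
exome = [("chr17", 150, 250), ("chr1", 100, 200)]
print(mutual_regions(panel, exome))  # [('chr17', 150, 200)]
print(harmonize_gene("H3F3A"))       # H3-3A
```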

Results

OCAv3 sequencing was performed according to our routine protocol used to support patient diagnosis and treatment decisions, in which we aim for at least 8 million reads per sample, corresponding to a mean coverage of approximately 2400. For OCAv3 sequencing, the mean number of mapped reads was 11.97 (± 3.40) million and the mean coverage depth was 3527.63 (± 1053.58) (Table 1). These numbers are in line with our previously described results of OCAv3 sequencing of 50 formalin-fixed paraffin-embedded (FFPE) samples, which yielded a mean of 11.41 (± 4.44) million mapped reads and a mean coverage depth of 3100.16 (± 1194.72) [14]. For WES, we aimed at 200x coverage according to the Ion AmpliSeq™ Exome RDY Library Preparation User Guide. This target was not met consistently: 5 of the 8 WES samples had coverage below the expected 200x (Table 1).

Among the 25 variants detected by the OCAv3 panel (pathogenic, likely pathogenic, or variants of uncertain significance) (Table 2), two did not pass the filtering criteria for WES because of low coverage: ARID1A:p.Gln563Ter and TP53:p.Ser261ValfsTer84 (Table 3). Neither variant is present in the ClinVar database. When all targeted regions specific to each assay are considered, the mean coverage is 1707 for OCAv3 and 189 for WES. The mean coverage per gene differs, however; for example, for WES the average coverage of the targeted regions is 120 for TP53 and 181 for ARID1A (Fig. 1 and Online Resource 2). Coverage is not uniform across the regions of either gene in either assay (Fig. 1 and Online Resource 2). All TP53 and 2 out of 99 ARID1A OCAv3 amplicons have coverage above 100x. Conversely, 2 out of 14 TP53 and 7 out of 43 ARID1A WES amplicons had coverage below 49 (Online Resource 2).

Coverage at these two variants differs markedly between the assays (1994 versus 25 for the ARID1A variant and 1756 versus 37 for the TP53 variant, OCAv3 versus WES) (Table 3). Both variants have a high allele frequency (> 83%). The tumor content estimated by the pathologist was 40% for oCCC_01 and 50% for oCCC_03. Another sample, HGSC_05, had a similar tumor content (40%), and no differences in detected variants between the two strategies (OCAv3 and WES) were reported for it (Table 1).
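Why these two variants are lost in the WES workflow follows directly from the minimum-coverage criteria given in the methods (100 for OCAv3, 49 for WES). The short Python check below applies those cut-offs to the coverage values reported above; the function name `passes_coverage` is hypothetical, and the check covers the coverage criterion only, not the full filter chain.

```python
# Minimum coverage per assay, from the exclusion criteria in the methods.
MIN_COVERAGE = {"OCAv3": 100, "WES": 49}

# Coverage at the two discordant variants, as reported in the text.
discordant = {
    "ARID1A:p.Gln563Ter": {"OCAv3": 1994, "WES": 25},
    "TP53:p.Ser261ValfsTer84": {"OCAv3": 1756, "WES": 37},
}


def passes_coverage(coverage: int, assay: str) -> bool:
    """True if the coverage meets the assay-specific minimum."""
    return coverage >= MIN_COVERAGE[assay]


for variant, depth_by_assay in discordant.items():
    status = {assay: passes_coverage(depth, assay)
              for assay, depth in depth_by_assay.items()}
    print(variant, status)
# ARID1A:p.Gln563Ter {'OCAv3': True, 'WES': False}
# TP53:p.Ser261ValfsTer84 {'OCAv3': True, 'WES': False}
```

Both variants clear the OCAv3 cut-off by a wide margin and fall short of the lower WES cut-off, which matches their absence from the filtered WES results despite allele frequencies above 83%.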

There were no variants that were reported only by WES and not by OCAv3 in the overlapping regions of the 146 shared genes.

Table 1 Sequencing metrics for Oncomine™ Comprehensive Assay v3 and whole-exome sequencing
Table 2 Summary of pathogenic (P), likely-pathogenic (LP), or variants of unknown significance (VUS) found by whole-exome (WES) and/or Oncomine™ Comprehensive Assay v3 (OCAv3) sequencing in exonic and splice-site regions of 146 OCAv3 genes that were covered by both assays
Table 3 Comparison of the coverage and allele frequency for variants that were found by the Oncomine™ Comprehensive Assay v3 (OCAv3) workflow, but have not passed minimum coverage filtering criteria for whole-exome sequencing (WES)

Discussion

Targeted gene panels such as the OCAv3 panel, which covers 146 cancer-associated genes, offer many advantages, such as lower cost, short turnaround time, a low rate of unspecific or incidental findings, a high depth of coverage, and compatibility with formalin-fixed paraffin-embedded tissue from the routine pathology setting. However, they may be less applicable to discovery studies than WES or WGS approaches [20]. WES and WGS, in turn, generate large datasets that require additional computational power, more data-analysis effort if all data are analyzed, and additional storage costs. In research studies, special consideration must be given to novel or secondary findings with regard to their potential impact [21].

Although the benefits and disadvantages of both investigated approaches are widely discussed, few reports compare the diagnostic performance of targeted gene panels with that of WES [20]. Therefore, we compared the performance of OCAv3 and WES in detecting somatic SNVs, MNVs, and INDELs in exonic and splice-site regions of all 146 OCAv3 genes in five HGSC and three oCCC patients. Our study indicates that there is a risk of missing variants in clinically relevant genes when performing WES, as illustrated by the coverage differences between the two assays for the TP53 gene (Fig. 1A) and the ARID1A gene (Fig. 1B). Indeed, two variants found by the OCAv3 assay were not found by WES: ARID1A:p.Gln563Ter and TP53:p.Ser261ValfsTer84 (Table 3). Variant classification is not straightforward, as there is so far no gold standard method for determining variant pathogenicity. Consequently, scientists and clinicians use various resources that gather population data, functional information, disease databases, and scientific reports to categorize variants [22]. The ClinVar database (http://www.ncbi.nlm.nih.gov/clinvar/) at the National Center for Biotechnology Information is a freely available archive of submitted interpretations of the clinical significance of variants [23]. Interestingly, interpretations of the same variant may disagree, as shown in Table 2 for, e.g., TP53:p.Val216Met and FANCA:p.His1417Asp, which emphasizes the need for a comprehensive set of standards for variant classification [24].

Both missing variants, ARID1A:p.Gln563Ter and TP53:p.Ser261ValfsTer84, were absent from the ClinVar database, but in routine clinical practice they would be classified as likely oncogenic/oncogenic based on the “Standards for the classification of pathogenicity of somatic variants in cancer” recently published as joint recommendations of the Clinical Genome Resource (ClinGen), Cancer Genomics Consortium (CGC), and Variant Interpretation for Cancer Consortium (VICC) [25]. The TP53:p.Ser261ValfsTer84 variant was recently reported as likely oncogenic in a cohort of Brazilian EOC patients [26].

The missing variants are not associated with low allele frequencies (Table 3). In OC, variants are subtype-specific; sequencing analysis can therefore help confirm or exclude an initial diagnosis. HGSC is characterized by TP53 mutations (96% of cases), whereas variants in ARID1A and PIK3CA are most frequent in oCCC patients (approximately 50%) [5, 6]. Achieving satisfactory coverage for the detection of somatic variants by WES (preferably 250x) is associated with higher costs in terms of consumables, turnaround time, data analysis, and storage [19]. Moreover, increasing the overall coverage for WES might not be sufficient, as reduced coverage in specific regions will remain a problem for precise variant calling [27, 28]. As shown in this study, the expected and achieved coverage differed for more than 50% of samples; the real values therefore need to be determined empirically in specific clinical settings. The overall WES coverage for the TP53 gene in overlapping regions is 120 (Online Resource 2), but in particular regions it may fall below the coverage threshold, which can lead to potentially relevant variants being missed, as shown in our study. However, this appears to affect only a subset of variants, as other variants, such as TP53:p.Gly244Asp located in exon 7 and ARID1A:p.Gln594SerfsTer25 located in exon 3, were detected by WES, showing that coverage is highly variable across the same exons. Therefore, before any sequencing strategy is implemented in a clinical context, the potential influence of reduced coverage on clinically relevant regions of the genome needs to be evaluated.

Targeted NGS requires target enrichment, which can be accomplished by two major approaches: PCR-based amplicon and hybrid capture-based methods [19]. PCR amplification has been highly effective in sequencing applications where nucleic acids are scarce or of poor quality, such as fine needle aspirates or formalin-fixed paraffin-embedded tissues. Moreover, it offers a short, simple, and cost-effective workflow. However, compared with hybrid capture-based methods, the PCR-based amplicon method might result in a higher rate of sequencing errors, for example in regions rich in repetitive sequences [19, 29, 30].

Our study focused on comparing the analytical performance of OCAv3 and WES. Both assays were performed on the same isolated DNA for each sample, were based on the same target enrichment strategy (PCR-based amplicon), and were sequenced on the same platform. It would therefore be beneficial to also compare the impact of different enrichment strategies and sequencers on the final list of detected variants in OC, as such an impact has been reported previously, although not in OC [29].

Conclusions

WES offers the potential to investigate nearly all protein-coding regions of the genome. It therefore enables the detection of variants in novel disease-associated genes, which might be particularly interesting for the discovery of potential new treatment targets and options. Such findings may not benefit individual patients if no approved treatment strategies are available at the time of sequencing. However, novel findings might facilitate recruitment into clinical trials when the affected genes are not yet used to guide diagnostics or treatment decisions. When a move from exploratory WES to diagnostic WES is considered, the need for a much higher sequencing depth should be taken into account. Therefore, to minimize the costs of performing the assay, data analysis, and storage in a routine diagnostic setting, targeted gene panels such as OCAv3 seem more suitable for clinical testing. Moreover, as we demonstrate that differences between WES and OCAv3 can be observed in genes that are clinically relevant for subtype classification, each testing center needs to weigh the advantages and limitations of these strategies before implementing them for clinical testing [19, 31, 32].

Fig. 1

The average coverage of overlapping TP53 (A) and ARID1A (B) regions in eight samples sequenced by both assays (OCAv3 and WES). The horizontal black line indicates a coverage of 250x, which is suggested as satisfactory for detection of somatic variants [19]. Note that the y-axis is not continuous, so that both strategies can be displayed on the same plot. The coverage for each overlapping region between OCAv3 and WES was averaged across the eight samples. Outliers (black dots) are defined as values exceeding 1.5 times the interquartile range above the 75th percentile or falling below 1.5 times the interquartile range below the 25th percentile.
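The outlier rule stated in the caption can be written out as a short Python function. This is a minimal sketch: the function name `iqr_outliers` is hypothetical, and the quartiles are computed with the standard library's "inclusive" method, which is an assumption, since plotting tools differ in how they interpolate percentiles and the method used for Fig. 1 is not stated.

```python
import statistics
from typing import List


def iqr_outliers(values: List[float]) -> List[float]:
    """Return values more than 1.5 x IQR above the 75th or below the 25th percentile."""
    # Assumption: "inclusive" quartile interpolation; other tools may differ slightly.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Toy per-region coverage values, invented for illustration.
print(iqr_outliers([10, 12, 11, 13, 12, 11, 100]))  # [100]
```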