Background

Recently, there has been considerable discussion regarding how massively parallel sequencing (MPS) can optimally be applied in the context of clinical genetics services. Whole-genome MPS remains prohibitive in terms of cost, throughput, data handling and the complexity of bioinformatic analysis; it also poses challenges for clinical interpretation and raises ethical issues around the reporting of results. Targeted MPS can address these issues by efficiently restricting clinical testing to sets of genes or genomic regions with known diagnostic value, while providing marked time- and cost-related advantages over traditional Sanger sequencing-based strategies.

We previously developed and reported Hi-Plex, a streamlined highly-multiplexed PCR approach for MPS library preparation, using DNA derived from both lymphoblastoid cell line and formalin-fixed, paraffin-embedded tumour tissue [1]. Our Hi-Plex library-building method integrates simple, automated primer design software that enables control of amplicon size. Importantly, this feature allows complete overlap of read pairs following paired-end sequencing to facilitate stringent downstream filtering of sequencing errors. We recently demonstrated that Hi-Plex using hybrid adapter primers (containing 5′-TruSeq compatible and 3′-Ion Torrent compatible sequences) can produce libraries suitable for both the Ion Torrent (PGM and Proton instruments, Life Technologies, Carlsbad, CA, USA) and TruSeq (MiSeq and HiSeq instruments, Illumina, San Diego, CA, USA) systems, which currently represent the two most commonly used MPS chemistries [2].

To assess the effectiveness of Hi-Plex in a high-throughput context, we used the MiSeq platform to perform mutation screening of 95 specimens, including three duplicated specimens, screened previously for genetic variants in the breast cancer susceptibility gene PALB2 (GenBank reference sequence NM_024675; MIM#610355). Variant calling was blinded to the known PALB2 germline status.

Methods

DNA samples

Our sample set consisted of 95 blood-derived DNAs from women affected by breast cancer that had been screened previously for mutations in the coding and flanking intronic regions of PALB2 (n = 90) or genotyped for known PALB2 pathogenic mutations (n = 5). All participants provided written informed consent for participation in the study. This study was approved by The University of Melbourne Human Research Ethics Committee.

Biological samples were provided by the Australian Breast Cancer Family Registry (ABCFR, 91 specimens, of which three were duplicated specimens) and the Kathleen Cuningham Foundation Consortium for research into Familial Breast cancer (kConFab, Melbourne, Australia, four specimens). DNAs from both resources were extracted using QIAamp DNA Blood Kit (Qiagen, Hilden, Germany). Quant-iT™ PicoGreen® dsDNA Assay Kit (Life Technologies) was used for quantification.

Previous screens were performed by Sanger sequencing and high-resolution melting curve analysis (HRM) for 85 specimens, including the duplicates, whereas five specimens were screened by HRM only. We included five specimens carrying pathogenic nonsense mutations identified previously by TaqMan probe-based assays: PALB2:c.196C>T (n = 1) and PALB2:c.3113G>A (n = 4). Sanger sequencing was performed as described previously [3] (unpublished data). HRM and TaqMan probe-based assays are described in [4] and results of variant detection are reported in [4, 5].

Mutation screening using Hi-Plex

This Hi-Plex assay was designed to target the PALB2 and XRCC2 genes. However, genotyping aspects of this study focus on PALB2 only, as we did not have a similar test set with genotype data for XRCC2.

Sixty primer pairs targeting the protein coding and some flanking intronic and untranslated regions of PALB2 and XRCC2 are described in [1] and Additional file 1. Dual-indexed hybrid adapter primer sets are described in Additional file 2. All oligonucleotides were obtained from Integrated DNA Technologies (Coralville, IA, USA).

Ninety-six individual PCR reactions (95 specimen DNAs and one no-template control) were performed in a standard skirted PCR plate, in a final volume of 50 μl, with 1X Phusion® HF PCR buffer (ThermoScientific, Waltham, MA, USA), 2 units of Phusion Hot Start II High-Fidelity DNA Polymerase (ThermoScientific), 400 μM dNTPs (Bioline, London, UK), approximately 0.5 μM gene-specific primer pool (individual gene-specific primer concentrations vary and are described in [2]), 2.5 mM MgCl2 (ThermoScientific) and 25 ng input genomic DNA. The following steps were then applied to conduct PCR: 98°C for 1 min, 6 cycles of [98°C for 30 sec, 50°C for 1 min, 55°C for 1 min, 60°C for 1 min, 65°C for 1 min, 70°C for 1 min], addition of 2 μM each dual-indexed hybrid N50#_TSIT_A and N70#_TSIT_P adapter primers, then a further 19 cycles of [98°C for 30 sec, 50°C for 1 min, 55°C for 1 min, 60°C for 1 min, 65°C for 1 min, 70°C for 1 min], followed by incubation at 60°C for 20 min. Five μl of each reaction were pooled before subjecting the resulting barcoded library (including the 96 sub-libraries) to electrophoresis on a 2% HR-agarose gel (Life Technologies). Size selection, gel extraction and purification were performed as described previously [1].
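The thermocycling program described above can be written down as a small data structure, which makes the cycle count and minimum run time easy to check. The following is an illustrative sketch only: the names CYCLE, PROGRAM, total_cycles and total_hold_minutes are hypothetical, the time total covers programmed hold times only (ramping and the pause for adapter primer addition are not included), and the pooled volume assumes that 5 μl was taken from all 96 wells.

```python
from typing import List, Tuple

# One step is (temperature in degrees C, hold time in minutes).
Step = Tuple[float, float]

# The six-step cycle used in both amplification phases.
CYCLE: List[Step] = [(98, 0.5), (50, 1), (55, 1), (60, 1), (65, 1), (70, 1)]

# Each stage is (label, number of repeats, steps per repeat).
PROGRAM: List[Tuple[str, int, List[Step]]] = [
    ("initial denaturation", 1, [(98, 1)]),
    ("cycles with gene-specific primers", 6, CYCLE),
    # Adapter primers are added to each reaction between these two stages.
    ("cycles after adapter primer addition", 19, CYCLE),
    ("final incubation", 1, [(60, 20)]),
]


def total_cycles(program: List[Tuple[str, int, List[Step]]]) -> int:
    """Count amplification cycles (stages that repeat the full CYCLE)."""
    return sum(repeats for _, repeats, steps in program if steps is CYCLE)


def total_hold_minutes(program: List[Tuple[str, int, List[Step]]]) -> float:
    """Sum programmed hold times; ramp times are not modelled."""
    return sum(repeats * sum(t for _, t in steps) for _, repeats, steps in program)


# Assumption: 5 ul is pooled from every one of the 96 reactions.
POOLED_VOLUME_UL = 96 * 5

if __name__ == "__main__":
    print(total_cycles(PROGRAM), "cycles")
    print(total_hold_minutes(PROGRAM), "min of programmed holds")
    print(POOLED_VOLUME_UL, "ul pooled")
```

Under these assumptions the program amounts to 25 amplification cycles and a little over two and a half hours of programmed hold time.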

The library was then sequenced on a MiSeq instrument, using the MiSeq Reagent kit v2 300 cycles (Illumina). Prior to performing the run, 3.4 μl of 100 μM sequencing primers were added to the respective read1, read2 and i7 primer reservoirs in the reagent cartridge. Sequencing primers were obtained from Integrated DNA Technologies (sequences are provided in Additional file 2).

Sequencing data were mapped to the entire human genome (hg19) using bowtie2-2.1.0 [6], applying default parameters except for --trim5 20 --trim3 20. Bedtools v2.16.1 [7] was used to compute on-target coverage. We used the ROVER variant caller, a software tool developed in-house and available at https://github.com/bjpop/rover, to perform automated variant calling. To be called in this application, a genetic variant had to appear in i) both members of a read-pair; ii) at least 2 read-pairs; and iii) ≥15% of read-pairs. Homozygous variants were called when the minor allele was present in ≥85% of read-pairs. The tool also reports the number of read-pairs covering each targeted amplicon. Sequencing statistics reported in this paper (on-target and coverage calculations) include both XRCC2 and PALB2, as they represent all the targeted regions. To assess the efficiency of the 60-plex assay across all 95 specimens, depth of coverage data were reported for 60 × 95 = 5,700 amplicons in total.
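The calling thresholds stated above can be summarised in a short sketch. This is not the ROVER source code: the function name classify_variant and its parameters are hypothetical, and criterion i) is assumed to have been applied upstream, so that supporting_pairs counts only read-pairs in which both reads show the variant.

```python
from typing import Optional


def classify_variant(
    supporting_pairs: int,
    total_pairs: int,
    min_pairs: int = 2,
    min_fraction: float = 0.15,
    hom_fraction: float = 0.85,
) -> Optional[str]:
    """Return 'heterozygous', 'homozygous', or None when no call is made.

    supporting_pairs: read-pairs in which BOTH reads show the variant.
    total_pairs:      read-pairs covering the amplicon.
    """
    if total_pairs <= 0:
        return None  # no coverage, nothing to call
    fraction = supporting_pairs / total_pairs
    # Criteria ii) and iii): enough read-pairs, and a large enough proportion.
    if supporting_pairs < min_pairs or fraction < min_fraction:
        return None
    # Zygosity threshold from the text: >= 85% of read-pairs.
    return "homozygous" if fraction >= hom_fraction else "heterozygous"


if __name__ == "__main__":
    print(classify_variant(35, 94))    # lowest heterozygous proportion reported
    print(classify_variant(513, 823))  # highest heterozygous proportion reported
```

Requiring an absolute minimum number of read-pairs as well as a minimum proportion guards against calls in poorly covered amplicons, where a single discordant read-pair could otherwise exceed the 15% threshold.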

When validation was required for a genetic variant identified by Hi-Plex but not reported in previous screens, Sanger sequencing was performed using BigDye Terminator v3.1 (Life Technologies), according to the manufacturer’s instructions.

Results and discussion

Across our set of 95 samples, an average of 96.62% of the reads that mapped to the hg19 human genome build were on target. The on-target rate ranged from 93.01% to 98.26% across samples, and the number of on-target reads ranged from 7,933 to 171,466. When considering only correctly paired, on-target reads, we observed that 99.93% (5,696/5,700) of amplicons were represented at ≥10× coverage across samples. Additionally, 88.37% (5,037/5,700), 96.02% (5,472/5,700), 98.54% (5,617/5,700) and 99.30% (5,660/5,700) of amplicons were represented within 5-fold, 10-fold, 20-fold and 30-fold of the median coverage, respectively. Additional file 3 illustrates the coverage distribution across a sample of BAM files.
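The uniformity figures above can be derived from a list of per-amplicon read-pair counts. The sketch below is illustrative: the function name coverage_uniformity is hypothetical, and "within k-fold of the median" is assumed to mean a depth between median/k and median × k inclusive, which the text does not define explicitly. The median is taken over whatever set of amplicons is passed in.

```python
from statistics import median
from typing import Dict, Sequence


def coverage_uniformity(
    coverages: Sequence[int],
    folds: Sequence[int] = (5, 10, 20, 30),
    min_depth: int = 10,
) -> Dict[str, float]:
    """Fraction of amplicons at >= min_depth and within k-fold of the median."""
    n = len(coverages)
    med = median(coverages)
    result = {"at_min_depth": sum(c >= min_depth for c in coverages) / n}
    for fold in folds:
        # Assumed definition: median / fold <= depth <= median * fold.
        low, high = med / fold, med * fold
        result[f"within_{fold}_fold"] = sum(low <= c <= high for c in coverages) / n
    return result


if __name__ == "__main__":
    # Toy read-pair counts for six amplicons (not data from this study).
    toy = [100, 100, 100, 10, 2000, 5]
    for name, value in coverage_uniformity(toy).items():
        print(f"{name}: {value:.2%}")
```

Applied to the 5,700 amplicon-by-specimen depths, a function of this kind would return the proportions reported in this section, provided the same definition of fold-range was used.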

We accurately detected all 56 variant calls identified through previous mutation screening by Sanger sequencing and/or HRM, and TaqMan probe-based genotyping. Heterozygous variants were observed in 37.23% (35/94) to 62.33% (513/823) of read-pairs (median = 51.23%). No false-positive calls were made. All three pairs of duplicated samples yielded concordant genotypes.

The 56 calls comprised instances of 11 distinct genetic variants, including two nonsense variants (PALB2:c.196C>T and PALB2:c.3113G>A), two frameshift variants (PALB2:c.1947_1948insA and PALB2:c.2982_2983insT), four missense variants (PALB2:c.1010T>C, PALB2:c.1676A>G, PALB2:c.2014G>C and PALB2:c.2993G>A) and three synonymous variants (PALB2:c.1572A>G, PALB2:c.3300T>G and PALB2:c.3495G>A). Additional information regarding genotyping results is available in Table 1.

Table 1 PALB2 variants identified in previous screens (Sanger sequencing and HRM) or genotyping assays (TaqMan probe-based), and detected via Hi-Plex

Our screening by Hi-Plex also detected one PALB2:c.1470C>T carrier that was identified by HRM but not reported by prior Sanger sequencing, and one PALB2:c.2590C>T carrier that was not reported by either method. Upon re-analysis of the respective chromatograms and HRM curve, both variants were apparent in the expected samples (Additional file 4).

Discordant results were observed for two samples screened by both Hi-Plex and HRM. The PALB2:c.2993G>A variant was detectable upon re-analysis of the HRM curve, whereas the PALB2:c.1676A>G variant was not (Table 1). All four additionally identified variants were confirmed by follow-up Sanger sequencing.

Here, we have validated that Hi-Plex is capable of accurate, cost-effective and rapid high-throughput mutation screening using a series of 95 specimens previously characterized for PALB2 genotype.

By performing single-step, highly-multiplexed PCR library-building, we avoided multiple manipulations, and waste of biological material and reagents associated with alternative methods [8]. Results reported here demonstrate that not only does Hi-Plex extensively reduce labour associated with amplification protocol optimization and library preparation, it also allows accurate screening without the need for normalisation of individual barcoded libraries before pooling and sequencing.

Easy and rapid library preparation did not compromise sequencing efficiency, as shown by the 99.93% of amplicons represented at ≥10× coverage. Nor did it compromise the sensitivity or specificity of variant detection: all previously identified genetic variants were detected using our method and no false-positive variants were called. Calls that were discordant with previous screens proved to be genuine variants following confirmatory Sanger sequencing or were detectable upon re-analysis of chromatograms and/or HRM curves. As stated previously, Hi-Plex’s experimental strategy includes a primer design tool that allows generation of primers for amplicons of a defined size, which should be shorter than the length of a sequencing read. As such, completely overlapping reads can be achieved when performing paired-end sequencing. This allows stringent filtering of sequencing chemistry-induced artefacts by only considering variants that appear in both reads of a pair. In turn, this allows highly accurate variant detection.

The screen for genetic variants across 95 specimens reported here was achieved in two days at a cost of ~AU$20/specimen, accounting for all aspects of library-building, MPS and analysis (including technician time). The equivalent Sanger sequencing-based screen would take approximately two weeks and cost ~AU$400/specimen.

This report shows that our Hi-Plex approach performs with a sensitivity and accuracy suitable for diagnostic application, while being more time- and cost-effective than Sanger sequencing, the current “gold standard” screening method. The mechanisms underlying Hi-Plex suggest that higher parallelization should be achievable without extensive protocol adjustment. Future experiments will involve increasing the level of multiplexing of Hi-Plex, with the aim of achieving robust thousands-plex multiplexing. Cost-effective and rapid methods for screening are highly desirable for mutation scanning, particularly in clinical settings, where eligibility is partly dictated by cost of testing. Lower screening costs could help facilitate the shift from single-gene to gene-panel screening and support a new approach to personalised clinical genetics service delivery.

Conclusions

In the context of research and ‘gene association’ studies, Hi-Plex enables large-scale sequencing in genetic epidemiological studies at relatively low cost, with more flexibility than currently offered commercial solutions where targeted sequencing is often constrained to specific platforms. The latter confer design inflexibilities and are costly to re-design in a setting where screening strategies are often re-directed by recent findings. Hi-Plex’s intrinsic modular flexibility in terms of target region design, as well as sequencing platform, renders the approach highly attractive for an extensive range of clinical and research applications.