Validation cohorts
Sequencing on a MinION instrument for 2 h generated 0.4 Gb of sequence data across the 6 individuals of the whole-blood (WB) validation cohort, with an average depth of 36,000x over the HBB locus and an average quality score of Q20. Genotypes were classified as either heterozygous or homozygous based on their variant allele frequency (VAF) values (40–60% for heterozygous, ≥75% for homozygous). There were 13 germline variants in our WB validation cohort. Of these, 7 were synonymous changes, 2 were non-synonymous and 4 were found to affect the HBB initiation codon. All 6 individuals carried a homozygous, non-pathogenic synonymous polymorphism (average VAF 99.7%, range 99.4–99.9%) affecting the His3 residue (HBB; c.9T > C). Two individuals were heterozygous for the pathogenic HBB; c.20A > T mutation associated with SCD, with VAFs of 46 and 50%. Four individuals were found to have the HBB; c.2T > C mutation, resulting in a loss of the HBB initiation codon. All 4 individuals were part of a single family group, 2 parents and 2 daughters. The mother, father and 1 daughter were heterozygous for the variant (VAF 56, 44 and 41%, respectively), while the remaining daughter was homozygous, with a VAF of 77%. All variants described were also identified by Sanger sequencing (Tables 1 and 2).
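The VAF-based genotype rule described above can be written as a short function. This is a minimal sketch rather than the pipeline used in the study: the function name and the "unclassified" label are assumptions introduced here, and the text does not state how VAFs falling outside both windows were handled.

```python
def classify_genotype(vaf_percent: float) -> str:
    """Classify a variant call from its variant allele frequency (in percent).

    Thresholds follow the text: 40-60% heterozygous, >=75% homozygous.
    """
    if 40.0 <= vaf_percent <= 60.0:
        return "heterozygous"
    if vaf_percent >= 75.0:
        return "homozygous"
    # Assumption: values outside both windows are flagged rather than forced
    # into a class; the source does not say how such calls were resolved.
    return "unclassified"


# VAFs reported above for the family carrying the initiation-codon variant.
family_vafs = {"mother": 56, "father": 44, "daughter_1": 41, "daughter_2": 77}
calls = {member: classify_genotype(vaf) for member, vaf in family_vafs.items()}

for member, call in calls.items():
    print(f"{member}: VAF {family_vafs[member]}% -> {call}")
```

Applied to the family described above, the rule reproduces the reported genotypes: three heterozygous carriers and one homozygous daughter.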
The 9 newborn dried blood spot (DBS) samples were sequenced in two groups, for 2 h each, and generated a total of 0.7 Gb of sequence data, with an average depth of coverage of 48,000x and an average quality score of Q20. Of the 18 mutations identified, 9 were synonymous and 9 were nonsynonymous. All synonymous mutations were His3His variants, including 1 heterozygous (VAF 55%) and 8 homozygous (average VAF 99.3%, range 97.9–99.9%). The SCD-associated HBB; c.20A > T variant was found in all 9 individuals, 5 heterozygous (average VAF 56.6%, range 44.7–72.6%) and 4 homozygous (average VAF 88.6%, range 80.7–95.1%). These variant calls were consistent with previously obtained data from IEF screening (Tables 1 and 2).
Overall, there was 100% concordance for identifying the SCD status between standard tests and Nanopore sequencing. In addition, the Nanopore sequencing approach detected additional variants of clinical significance in the beta-globin locus. There were no sequencing failures that required repeat analysis.
Discovery cohort
We then extended our analysis to a discovery cohort of 18 patients, using DNA extracted from both whole blood (n = 12) and dried blood spots (n = 6). Across three sequencing runs, lasting 2 to 4 h, we generated 1 Gb of sequence data, with an average quality score of Q20 and an average depth of coverage of 17,599x. There were 44 variants (22 synonymous, 14 nonsynonymous, 6 stop-gains and 2 splice-site) across all 18 individuals (Table 2). Again, all samples contained the common His3His polymorphism, while sample 245 harboured an additional synonymous polymorphism affecting the lysine residue at position 60 (HBB; c.180G > A). The HBB; c.20A > T mutation was identified in 12 individuals, 1 homozygous and 11 heterozygous, with average VAFs of 94 and 47%, respectively. In 2 patients we identified a heterozygous variant located in the splice-donor region of exon 1 (HBB; c.92 + 1G > A and HBB; c.92 + 1G > T, also referred to as IVS1-1G > A and IVS1-1G > T). Stop-gain mutations were found in two individuals, both located in exon 2 (HBB; c.114G > A and HBB; c.118C > T), and are predicted to result in the loss of ~75% of the coding region (Table 2).
As samples 144, 245 and 462 were all also heterozygous for the HBB; p.Glu7Val variant, we used WhatsHap to determine the phasing for each mutation pair. In all three cases we were able to demonstrate that the reference allele at the Glu7 residue is in trans with the alternate allele for the second mutation (HBB; c.92 + 1G > A, HBB; c.114G > A and HBB; c.118C > T, respectively; Fig. 2). This information is expected to have a significant impact on the phenotype of the patient, and would be missed by a protein-based SCD screening test.
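Once a read-based phasing tool such as WhatsHap has written phased genotypes to a VCF file (pipe-separated, for example 0|1), the cis/trans relationship between two heterozygous variants follows directly from their haplotype assignments. The helper below is an illustrative sketch only: the function name is hypothetical, it is not part of WhatsHap, and it assumes that both calls are heterozygous and belong to the same phase set.

```python
def phase_relationship(gt_a: str, gt_b: str) -> str:
    """Return 'cis' or 'trans' for the ALT alleles of two phased calls.

    Assumes both genotypes are heterozygous, phased ('|' separator) and
    assigned to the same phase set; otherwise the comparison is meaningless.
    """
    haplotypes = []
    for gt in (gt_a, gt_b):
        if "|" not in gt:
            raise ValueError(f"genotype {gt!r} is not phased")
        alleles = gt.split("|")
        if sorted(alleles) != ["0", "1"]:
            raise ValueError(f"genotype {gt!r} is not a heterozygous 0/1 call")
        haplotypes.append(alleles)
    # Identical haplotype order means both ALT alleles sit on the same
    # chromosome (cis); opposite order places them on different ones (trans).
    return "cis" if haplotypes[0] == haplotypes[1] else "trans"


print(phase_relationship("0|1", "1|0"))  # ALT alleles on opposite haplotypes
print(phase_relationship("0|1", "0|1"))  # ALT alleles on the same haplotype
```

The genotype strings in the usage lines are generic examples and are not taken from the samples in this study.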
We found no mutations in the 6 DBS samples, which were therefore excluded from Table 2, as it contains only non-synonymous mutations.
Cost analysis
One of the aims of this study was to develop a low-cost assay to screen for sickle cell and β-thalassaemia mutations, which would enable the assay to be adopted in low- and middle-income countries (LMICs). As a consequence of amplifying the full β-globin locus in each patient, we were able to incorporate molecular barcodes for each patient during the library preparation stage. This allowed us to sequence multiple samples in a single run, using their barcode sequences to differentiate between them during data analysis. One advantage of Oxford Nanopore flow cells is that they can be flushed without carry-over, allowing a second sequencing run on the same flow cell. On this basis, we prepared 24 samples per library, sequencing 8 libraries per flow cell. Thanks to this approach, we estimate the current cost of the test to be £11.57 per sample for consumables (Table 3). While this does not include hands-on time for library preparation and bioinformatics analysis, the consumable cost compares very favourably with that of other sequencing methods and even with those of IEF and HPLC.
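The multiplexing arithmetic behind a per-sample estimate can be sketched as follows. The split into flow-cell, per-library and per-sample consumables is an assumed cost structure, and the prices in the usage example are round placeholder values chosen for easy checking; they are not the figures from Table 3 and do not reproduce the £11.57 estimate.

```python
def samples_per_flow_cell(samples_per_library: int, libraries_per_flow_cell: int) -> int:
    """Number of barcoded samples that share the cost of one flow cell."""
    return samples_per_library * libraries_per_flow_cell


def consumable_cost_per_sample(
    flow_cell_cost: float,
    library_cost: float,
    per_sample_cost: float,
    samples_per_library: int = 24,
    libraries_per_flow_cell: int = 8,
) -> float:
    """Spread shared consumables over the samples that use them.

    The three cost components are an assumed breakdown, not the study's own.
    """
    n_samples = samples_per_flow_cell(samples_per_library, libraries_per_flow_cell)
    return (
        flow_cell_cost / n_samples            # flow cell shared by every sample
        + library_cost / samples_per_library  # library kit shared within a library
        + per_sample_cost                     # e.g. extraction and PCR reagents
    )


# 24 samples per library and 8 libraries per flow cell, as stated in the text.
print(samples_per_flow_cell(24, 8))

# Placeholder prices (hypothetical, NOT taken from Table 3).
example_cost = consumable_cost_per_sample(
    flow_cell_cost=384.0, library_cost=48.0, per_sample_cost=1.0
)
print(f"{example_cost:.2f}")
```

With 24 samples per library and 8 libraries per flow cell, 192 samples share a single flow cell, which is why the flow-cell contribution to the per-sample cost becomes small.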
Table 3 Cost analysis of sickle cell and β-thalassaemia screening by nanopore sequencing. Cost estimates are based on 24 samples per library and 8 libraries sequenced per flow cell.