Background

Infection remains a feared and devastating complication of orthopaedic implant surgery. It occurs in up to 2% of prosthetic joint replacements [1] and may present several years after implantation [2]. Recent studies in England of joint revisions undertaken for infection report an increase in prevalence for both knee and hip revisions between 2003 and 2014: 2.5-fold and 7.5-fold increases following primary and revision knee replacements, and 2.3-fold and 3.0-fold increases following primary and revision hip replacements, respectively [3, 4]. It has been estimated that in the USA there will be more than 65,500 infected joint replacements per year by 2020 [5]. Faster and more accurate diagnosis may improve outcomes following revision surgery by allowing more targeted therapy. Diagnosis of prosthetic joint infection (PJI) can be challenging because infections may be associated with biofilms that colonise orthopaedic devices [6], and a small but potentially problematic number are caused by fastidious or slow-growing organisms that are not detectable by culture, or occur in patients who have received prior antibiotics. Although culture of multiple periprosthetic tissue (PPT) samples remains the gold standard for microbial detection, it is relatively insensitive: only approximately 65% of causative bacteria are detected even when multiple PPT samples are collected [7,8,9].

Molecular methods, such as 16S rRNA gene sequencing, can be more sensitive for detecting PJI [10]. An alternative is metagenomic shotgun sequencing, which sequences all DNA present and can therefore recover bacterial genome sequences directly from a sample. Sequencing directly from samples can provide accurate diagnostic information for PJIs compared with laboratory culture, can detect additional organisms [11, 12], and can potentially provide further information such as the presence of antimicrobial resistance genes [12].

Third-generation sequencing technologies, developed by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), make longer read lengths and faster turnaround possible. The ONT MinION could potentially allow analysis to be conducted in real time, with obvious advantages for the clinical diagnosis of infection. Examples of metagenomic pathogen studies using the MinION include viral detection from serum [13] and bacterial detection from urine [12]. These studies have shown proof-of-principle for clinical sequencing directly from samples using the ONT MinION. However, PJI sequencing faces the further challenge of high human DNA contamination, which requires specific laboratory preparation and bioinformatic analyses to overcome. A previous study that used ONT MinION sequencing to identify pathogens in pleural effusion samples with high levels of human DNA used 16S rDNA sequencing [14]. It showed that rapid identification was possible in samples with high host DNA content, but it could not provide further genomic information.

Here we describe proof-of-principle for the use of ONT MinION sequencing for the diagnosis of PJI when compared to standard microbiological culture and Illumina sequencing. We describe an analysis work-flow that differentiates between predicted infection species and background contamination and can be run during sequencing for real-time species detection.

Methods

Samples

Samples used in this study were collected by the Bone Infection Unit at the Nuffield Orthopaedic Centre (NOC) in Oxford University Hospitals (OUH), UK, as previously described [11]. Nine samples previously assessed by Illumina MiSeq sequencing were chosen for further analysis by ONT MinION sequencing. Samples were chosen from the remaining DNA extracts that had sufficient DNA to be sequenced either directly or after amplification, and to represent a range of disparate species and compositions.

DNA preparation and sequencing

Libraries were prepared for sequencing on an Oxford Nanopore MinION (ONT) using genomic DNA previously extracted from sonication fluid samples [11]. Samples 259, 312, 335, 352 and 354 were prepared using the 1D genomic DNA by ligation protocol (SQK-LSK108) (ONT). Samples 229, 249, 506 and 509 had insufficient DNA for this protocol, so were prepared using either a PCR-based protocol for low input genomic DNA with modified primers (DP006_revB_14Aug2015) followed by rapid sequencing adapter ligation (ONT) (sample 229), or the 1D low input genomic DNA with PCR protocol (SQK-LSK108) (ONT) (samples 249, 506 and 509). Briefly, the protocols comprise DNA end-repair and dA-tailing (NEBNext Ultra II End Repair/dA-Tailing Module, New England Biolabs (NEB), Ipswich, MA, USA) followed by purification using AMPure XP solid phase reversible immobilisation (SPRI) beads (Beckman Coulter, High Wycombe, UK), then sequencing adapter ligation (Blunt/TA Ligase Master Mix, NEB) followed by a further SPRI bead purification. For the samples requiring PCR amplification, the additional steps between end-repair and sequencing adapter ligation were: PCR adapter ligation (Blunt/TA Ligase Master Mix, NEB) followed by SPRI bead purification; and PCR amplification (Phusion High Fidelity PCR Master Mix, NEB) with 18 cycles (samples 229 and 249) or 24 cycles (samples 506 and 509), followed by a further SPRI bead purification. Samples were sequenced on FLO-MIN105 (v.R9) (sample 229) or FLO-MIN106 (v.R9.4) (all other samples) SpotON flowcells.

PCR analysis of sample 354a

Quantitative real-time PCR (qPCR) was performed for sample 354a to determine the relative amounts of Arcanobacterium haemolyticum and Fusobacterium nucleatum DNA in the original sonication fluid genomic DNA extract. qPCR was performed on a Stratagene MX3005P QPCR System (Agilent Technologies, Santa Clara, CA, USA) using Luna Universal Probe qPCR Master Mix (New England Biolabs, Ipswich, MA, USA). For A. haemolyticum, primers and probe were designed to target the phospholipase D gene: forward primer ATGTACGACGATGAAGACGCG (previously published [15]), reverse primer TTGATTGCGTCATCGACACT, probe [6FAM]-TTGGTAGTGCGGCTGCTGCGCC-[TAM]. For F. nucleatum, primers and probe were designed to target the nusG gene: forward primer CAGCAACTTGTCCTTCTTGATCC, reverse primer CTGGATTTGTAGGAGTTGGTTC, probe [6FAM]-AGACCCTATTCCTATGGAAGAGGAAGAAGTA-[TAM]. Reactions were performed in a 20 μl volume containing 2 μl of template DNA, 0.4 μM of each primer and 0.2 μM of the probe. Cycling conditions were an initial denaturation at 95 °C for 1 min, followed by 40 cycles of denaturation at 95 °C for 15 s and extension at 60 °C for 30 s. Genomic DNA extracted from cultures of A. haemolyticum (Type Strain NCTC 8452) and F. nucleatum subspecies vincentii (Type Strain ATCC 49256) was diluted to 100,000 genome copies per μl, then serially diluted to 10 genome copies per μl, and used to create copy number standard curves for both species. Negative controls, with water replacing template DNA, were also included. All reactions were performed in triplicate.
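To illustrate how copy-number standards of this kind are turned into quantities, the following minimal Python sketch fits a standard curve (Ct against log10 genome copies per μl), derives amplification efficiency as E = 10^(-1/slope) - 1, and inverts the curve to estimate copies for an unknown. The function names are hypothetical, the Ct values are synthetic (generated from an assumed slope of -3.32 and intercept of 38.0), and this is not the analysis software used in this study.

```python
from statistics import mean


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x_bar = mean(log10_copies)
    y_bar = mean(ct_values)
    sxx = sum((x - x_bar) ** 2 for x in log10_copies)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to E close to 1 (100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate genome copies per microlitre."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series from 10^5 down to 10^1 genome copies per microlitre.
# Ct values are synthetic, generated from an assumed slope and intercept.
log10_copies = [5, 4, 3, 2, 1]
ct_values = [38.0 - 3.32 * x for x in log10_copies]

slope, intercept = fit_standard_curve(log10_copies, ct_values)
print(f"slope={slope:.2f} intercept={intercept:.2f} "
      f"efficiency={amplification_efficiency(slope):.1%}")

# Mean of synthetic triplicate Ct values for an unknown sample.
unknown_ct = mean([28.1, 28.3, 28.2])
print(f"estimated copies/ul: {copies_from_ct(unknown_ct, slope, intercept):.0f}")
```

Comparing the estimated copy numbers for the two species, rather than raw Ct values, corrects for any difference in efficiency between the two assays.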

Bioinformatics analysis

We assembled an analysis pipeline for detection of bacterial pathogens using ONT MinION sequencing of orthopaedic device infections. The pipeline includes filtering steps for the sequence data that were tuned on seven culture-positive samples with known infections and two culture-negative samples.

The analysis was performed within a Nextflow workflow [16], with the software contained within a Singularity [17] image generated from a Docker repository [18]. The workflow and software are publicly available [19], with the intention that the analysis be reproducible and replicable with other datasets on most systems.

The workflow, CRuMPIT, has three major components, as shown in Fig. 1. The first monitors the output of one or more MinION devices and creates batches of fast5 files (default 1000) as they are written to a storage location, Fig. 1 (a,b). The second receives the fast5 batches and passes them to a Nextflow workflow that basecalls the reads, classifies them and aligns them to specific reference sequences, pushing the results to a database, Fig. 1(c). The third determines the analysis results, including the species identified, and continually updates them as the run progresses, Fig. 1(d).
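As an illustration of the first component, the sketch below shows one way a directory could be polled for new fast5 files and grouped into fixed-size batches. It is a hypothetical simplification written for this description, not CRuMPIT's own code; the function name `poll_new_batches` and the in-place `seen`/`pending` bookkeeping are assumptions.

```python
import tempfile
from pathlib import Path


def poll_new_batches(directory, seen, pending, batch_size=1000):
    """Queue .fast5 files not seen before and return any batches that are full.

    `seen` (set of paths already queued) and `pending` (list of paths waiting
    for a batch to fill) are updated in place, so the function can be called
    repeatedly from a polling loop while the sequencer is writing files.
    """
    for path in sorted(Path(directory).rglob("*.fast5")):
        if path not in seen:
            seen.add(path)
            pending.append(path)
    batches = []
    while len(pending) >= batch_size:
        batches.append(pending[:batch_size])
        del pending[:batch_size]
    return batches


if __name__ == "__main__":
    # Demonstration with 25 empty placeholder files and a batch size of 10.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(25):
            (Path(tmp) / f"read_{i:04d}.fast5").touch()
        seen, pending = set(), []
        batches = poll_new_batches(tmp, seen, pending, batch_size=10)
        print(len(batches), len(pending))  # prints "2 5"
```

In a live run, a function like this would be called in a loop with a short pause between polls, each returned batch submitted to the workflow, and any files left in `pending` flushed as a final partial batch when sequencing stops.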

Fig. 1

Diagram of the analysis process. a MinION sequencing using MinKNOW (runs outside of CRuMPIT). b Fast5 files are detected and submitted in batches to the Nextflow workflow. c The Nextflow workflow, which is contained within a Singularity image and can be distributed across a cluster (SLURM used here) or run on a local machine. d Run analysis using data pushed to a MongoDB database; this can be conducted separately on any machine with network access to the database. Each component of CRuMPIT (green or blue rounded rectangle) can be run independently from the same or different networked computers (e), or the entire process can be run from a single program. Square-cornered rectangles represent programs, some of which are within Python wrappers. Arrows represent the direction of data transfer within the workflow or between components

During this project, ONT released several different software applications for basecalling, with each version improving accuracy [20]; we used the most up-to-date and reliable version available at the time of sequencing. Basecalling from the fast5 files used different versions of Metrichor (dragonet), MinKNOW-Live or ONT Albacore (Table 1). Fastq files were generated from the Metrichor- or MinKNOW-basecalled fast5 files using fast5watcher.py (commit b88e14a) [21] for downstream analysis. Albacore is now used as the basecaller within the CRuMPIT workflow, with sequences basecalled directly to fastq files for analysis. Guppy (version 0.3.0; ONT developer access required) was also tested experimentally as a basecaller to compare speeds, with an additional Porechop [22] (v0.2.3) step added to demultiplex barcodes when using Guppy.

Table 1 Nanopore basecallers and versions used for each sample

To minimise spurious read classifications caused by repeat regions, reads in the fastq files were separated based on sequence complexity, and only high-complexity reads were analysed further. Complexity was assessed with prinseq-lite-0.20.4 [23] using a DUST score threshold of seven, which removes reads containing sequences consisting only of homopolymer, dinucleotide or trinucleotide repeats.
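For readers unfamiliar with DUST, the sketch below computes the classic DUST score for a single sequence window: overlapping triplets are counted, and with c_t occurrences of triplet t among n triplets the score is the sum of c_t(c_t - 1)/2 divided by (n - 1), so a window dominated by a few repeated triplets scores highly. This is an assumed illustration of the general method only; prinseq-lite applies its own scaling and windowing, so the threshold of seven above is not directly comparable to the raw score computed here.

```python
from collections import Counter


def dust_score(seq):
    """Classic (unscaled) DUST score for one sequence window.

    Higher scores indicate lower complexity: homopolymers and short tandem
    repeats reuse the same few triplets many times.
    """
    seq = seq.upper()
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    n = len(triplets)
    if n < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)


print(dust_score("AAAAAAAAAA"))  # homopolymer: one triplet repeated
print(dust_score("ACGTACGTAC"))  # short tandem repeat: four triplets, each twice
print(dust_score("ACGTTGCAAG"))  # no triplet repeated
```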

Centrifuge [24] was used to classify sequencing reads to a taxonomic identifier. We used Centrifuge instead of Kraken [25] for this analysis because its initial matches use k-mers of length 16, which is better suited to the Nanopore error profile than Kraken, whose databases are built with a default k-mer size of 31. Additionally, Centrifuge indexes require significantly less storage and memory than Kraken databases. A Centrifuge index [24] was constructed using bacterial and viral genomes downloaded from NCBI RefSeq as of 3 March 2017, together with the human reference genome (GRCh38). Low-complexity regions with a DUST score greater than 20 in the reference sequences were masked using dustmasker (v1.0.0, NCBI). The precompiled "P_compressed_b + v + h" index available to download from the Centrifuge authors was also tested and yielded very similar results to our database. We used our own database for this analysis because it is a more recent and complete dataset; however, for ease of reproducibility, the precompiled databases can also be used.
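The advantage of short initial matches can be shown with a simple probability model. Assuming independent errors at a uniform per-base rate e (a simplification, since nanopore errors are neither independent nor uniform), a k-mer is error-free with probability (1 - e)^k. The sketch below uses an assumed error rate of 10%, a round number chosen for illustration rather than a value measured in this study.

```python
def p_error_free_kmer(k, error_rate):
    """Probability that a k-mer contains no sequencing error, assuming
    independent errors at a uniform per-base rate."""
    return (1.0 - error_rate) ** k


def expected_error_free_kmers(read_length, k, error_rate):
    """Expected number of error-free overlapping k-mers in one read."""
    n_kmers = max(read_length - k + 1, 0)
    return n_kmers * p_error_free_kmer(k, error_rate)


# Assumed 10% per-base error rate and a 1 kb read, for illustration only.
for k in (16, 31):
    print(k,
          round(p_error_free_kmer(k, 0.10), 3),
          round(expected_error_free_kmers(1000, k, 0.10), 1))
```

Under these assumptions a 16-mer is roughly five times more likely than a 31-mer to be error-free, so a 1 kb read offers several-fold more exact seeds for classification.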

Reads whose taxonomic ID, or a descendant of it, belonged to a list of bacterial reference genome sequences downloaded from NCBI RefSeq were mapped to those references using minimap2 [26] (v2.2-r409). To be considered for detection, a bacterial species first had to be classified by Centrifuge with a score of 150 or greater; this cutoff was chosen after several thresholds were tested (Additional file 1: Figure S1). To remove spurious hits and background laboratory contamination, a species was reported only if it accounted for over 10% of the bacterial bases classified by Centrifuge, which also removed the majority of negative control hits (Additional file 2: Figure S2). A read-number threshold could have been used instead; however, the margin in proportional read numbers between positive samples and negative controls was deemed too narrow. Therefore, a further mapping step was added to validate the Centrifuge classification.
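The Centrifuge stage of this filter can be sketched as follows. The per-read record layout (`ReadHit`) and the function name are hypothetical; in practice the species would be derived from the taxonomic ID, score and read length in Centrifuge's per-read output, and the reads below are synthetic. We assume here that the 10% is measured against all bacterial bases classified by Centrifuge, counting only reads at or above the score cutoff in the numerator.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadHit(NamedTuple):
    """Hypothetical per-read summary of a Centrifuge bacterial classification."""
    species: str
    score: int
    bases: int


def candidate_species(hits, min_score=150, min_fraction=0.10):
    """Return {species: fraction} for species whose reads with score >= min_score
    make up more than min_fraction of all classified bacterial bases."""
    total_bacterial_bases = sum(h.bases for h in hits)
    if total_bacterial_bases == 0:
        return {}
    passing = defaultdict(int)
    for h in hits:
        if h.score >= min_score:
            passing[h.species] += h.bases
    return {sp: b / total_bacterial_bases
            for sp, b in passing.items()
            if b / total_bacterial_bases > min_fraction}


# Synthetic reads, not data from this study.
hits = [
    ReadHit("Staphylococcus aureus", 900, 5000),
    ReadHit("Staphylococcus aureus", 400, 3000),
    ReadHit("Cutibacterium acnes", 120, 1500),  # below the score cutoff
    ReadHit("Escherichia coli", 200, 500),      # passes score, only 5% of bases
]
print(candidate_species(hits))
```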

To be confirmed as positive, the mapped reads for a species required a mapping quality score (mapq) of 50 or above and had to account for more than 1% of the classified bacterial bases. A mapq of 50 was used to ensure high-quality alignments and helped to remove any remaining indiscriminate alignments (Additional file 3: Figure S3). The 1% threshold was chosen after plotting bases against reads for positive samples and negative controls (Additional file 4: Figure S4). Once a species meets these criteria, its mapped reads are included in further analysis regardless of their Centrifuge score, so more reads can be retained when mapping provides a satisfactory alignment than from Centrifuge classification alone. This filtering method was tuned to remove all hits from the negative controls while retaining as many reads from validated positive species as possible. It is therefore a heuristic method, and it can be tuned with greater power as more samples are processed.
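The mapping confirmation step can likewise be sketched using minimap2's PAF output, in which column 2 is the query length, column 6 the target name and column 12 the mapping quality. This is an assumed simplification rather than CRuMPIT's implementation: the reference-to-species lookup, counting each read once at its first qualifying alignment, and using full read length as the mapped bases are illustrative choices, and the reference name and PAF lines are synthetic.

```python
def confirmed_species(paf_lines, ref_to_species, candidates,
                      total_bacterial_bases, min_mapq=50, min_fraction=0.01):
    """Confirm candidate species from minimap2 PAF alignment lines.

    Each read is counted once, at its first alignment with mapq >= min_mapq to
    a candidate species, irrespective of its Centrifuge score. Returns
    {species: mapped bases} for species above min_fraction of the classified
    bacterial bases.
    """
    if total_bacterial_bases <= 0:
        return {}
    species_bases = {}
    counted_reads = set()
    for line in paf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a complete PAF record
        read, read_len = cols[0], int(cols[1])
        target, mapq = cols[5], int(cols[11])
        species = ref_to_species.get(target)
        if mapq < min_mapq or species not in candidates or read in counted_reads:
            continue
        counted_reads.add(read)
        species_bases[species] = species_bases.get(species, 0) + read_len
    return {sp: b for sp, b in species_bases.items()
            if b / total_bacterial_bases > min_fraction}


def paf_record(read, qlen, target, mapq):
    """Build a minimal synthetic PAF line with placeholder coordinates."""
    fields = [read, qlen, 0, qlen, "+", target, 2_800_000, 0, qlen, qlen, qlen, mapq]
    return "\t".join(str(f) for f in fields)


# Hypothetical reference sequence name mapped to its species.
ref_to_species = {"ref_S_aureus": "Staphylococcus aureus"}
paf_lines = [
    paf_record("r1", 5000, "ref_S_aureus", 60),
    paf_record("r1", 5000, "ref_S_aureus", 60),  # repeat alignment: counted once
    paf_record("r2", 3000, "ref_S_aureus", 12),  # fails mapq >= 50
    paf_record("r3", 1000, "ref_S_aureus", 55),
]
result = confirmed_species(paf_lines, ref_to_species,
                           {"Staphylococcus aureus"}, total_bacterial_bases=200_000)
print(result)
```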

The entire workflow was run in Nextflow [16] with the software contained inside a Singularity [17] image. This enabled the pipeline to run on a distributed cluster (SLURM [27]), with the flexibility to run on other platforms, including locally on a single computer. A SLURM cluster was set up to handle the high computational demands of basecalling with Albacore; the remainder of the pipeline requires less compute time. The cluster consisted of a head node and four worker nodes with a total of 21 worker cores. Centrifuge was run on only two of the nodes, each with at least 16 GB of memory. The workflow can run in real time, detecting new fast5 files from a MinION sequencing run, processing them and pushing the results to a MongoDB database for analysis.

Results

Sample composition after analysis

Nine samples previously sequenced on an Illumina MiSeq were sequenced using the Oxford Nanopore MinION platform. Seven samples were extracted from culture-positive sonication fluid. The remaining two samples, extracted from culture-negative sonication fluid, were used as negative controls. Between 0.2 and 2.8 gigabases were basecalled for each sequencing run, with mean read lengths between 500 bp and 1.7 kb (Table 2).

Table 2 Oxford Nanopore Technologies MinION sequencing yields, basic run details and breakdown of Centrifuge classification

The majority of classified reads were human (Table 2): in the culture-positive samples, 80% to 97% of bases came from host contamination, and between 0.04% and over 6% of bases were classified as bacterial by Centrifuge.

Our analysis workflow identified one or more bacterial species per sample, with the exception of the two culture negative samples, 509a and 506a (Table 3). One sample, 354a, was polymicrobial, with Enterococcus faecalis, Arcanobacterium haemolyticum and Fusobacterium nucleatum identified. Two species of the same genus, Bacillus cereus and Bacillus thuringiensis, were identified in sample 352a. All other samples had only a single bacterial species identified.

Table 3 Species detected after read classification and reference genome alignment in CRuMPIT

The results from ONT MinION sequencing correspond with the previously published analysis of the same samples by conventional microbiological culture and metagenomic Illumina MiSeq sequencing, Table 2 [11]. A notable difference between the two molecular analyses is seen in sample 352a, where ONT MinION sequencing enabled species-level detection. Illumina short-read sequencing identified Bacillus spp. only (agreeing with the corresponding culture results), whereas ONT MinION sequencing identified two species from the Bacillus cereus group: Bacillus cereus and Bacillus thuringiensis. Note, however, that speciation within the Bacillus cereus group is problematic, as species within this group share a high level of genome sequence identity [28]. Further investigation would be required to determine whether both species are actually present in this sample.

Another difference between the two sequencing techniques is seen in sample 354a and concerns the relative abundance of sequencing reads/bases for the multiple species classified in this polymicrobial sample. Illumina MiSeq sequencing identified A. haemolyticum as the most abundant species, at 72% of bacterial reads, with F. nucleatum representing 7% of bacterial reads. In contrast, ONT MinION sequencing classified very similar numbers of bases for F. nucleatum and A. haemolyticum (493,717 and 547,413 bases, respectively). We speculated that this difference in the proportions of F. nucleatum and A. haemolyticum reads was caused by platform sequencing bias, possibly as a result of differing genome GC content: the A. haemolyticum genome is 54% GC, compared with 27% for F. nucleatum. We used qPCR to test this hypothesis and to determine which platform gave the estimate of relative genome abundance closest to that in the original DNA extract from sample 354a. qPCR detected approximately equal copy numbers of A. haemolyticum and F. nucleatum genomes in the original DNA extract, suggesting that ONT MinION sequencing gave a more accurate representation of species abundance in sample 354a (Table 4). However, standard deviations were high, so further investigation will be needed to confirm this.
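To make the size of the discrepancy concrete, the following sketch computes the A. haemolyticum to F. nucleatum ratio implied by each platform from the values given above. The two platforms are summarised with different measures (percentage of bacterial reads versus classified bases), so this is an approximate comparison only, and the function name is illustrative.

```python
import math


def ratio(a_haemolyticum, f_nucleatum):
    """A. haemolyticum signal divided by F. nucleatum signal (1.0 = equal)."""
    return a_haemolyticum / f_nucleatum


# Values reported in the text for sample 354a.
miseq = ratio(72, 7)              # % of bacterial reads (Illumina MiSeq)
minion = ratio(547_413, 493_717)  # classified bases (ONT MinION)

for name, r in (("MiSeq", miseq), ("MinION", minion)):
    # log2 ratio: 0 means equal abundance, as qPCR suggested for this extract
    print(f"{name}: ratio {r:.2f}, log2 {math.log2(r):+.2f}")
```

The MiSeq data imply roughly a tenfold excess of A. haemolyticum, whereas the MinION ratio is close to 1, in line with the approximately equal qPCR copy numbers.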

Table 4 qPCR results

Real time analysis

Using the ONT MinION platform, it was possible to analyse sequences in real time and predict the species composition of culture-positive samples minutes after data acquisition. Samples containing a larger yield of bacterial DNA, such as 354a and 249a, produced several hundred kilobases of sequence within the first two hours, Fig. 2a, b. Samples with lower yields, such as 352a, produced less sequence data, with several kilobases generated in the first two hours, Fig. 2c. Nevertheless, for all species that passed the analysis thresholds, the sequences generated after data acquisition were consistent with the species identified by traditional culture methods and MiSeq sequencing, Fig. 3. Each batch analysed within the Nextflow workflow took between four and fifteen minutes to process using a single core, depending on the node to which the job was submitted (Additional file 5: Figure S5a). Real time in this context therefore includes this bioinformatic processing time, most of which is basecalling. Encouragingly, basecalling speed improved dramatically when using Guppy on the graphics card of a single local PC (Additional file 5: Figure S5b). This enabled the CRuMPIT analysis to be conducted entirely on a single computer and more than halved the time to detection.
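Time to detection of the kind plotted in Fig. 2 can be estimated by accumulating classified bases in read-timestamp order until a threshold is reached. The sketch below is a hypothetical illustration: the function name, the 3,000-base threshold and the synthetic read stream are assumptions, and adding the 15-minute worst-case batch processing time reported above still ignores the time spent waiting for a batch to fill.

```python
from datetime import datetime, timedelta


def time_to_threshold(records, species, min_bases):
    """Return the earliest read timestamp at which cumulative classified bases
    for `species` reach `min_bases`, or None if the threshold is never reached.

    `records` is an iterable of (timestamp, species, bases) tuples, one per read.
    """
    total = 0
    for ts, sp, bases in sorted(records, key=lambda r: r[0]):
        if sp == species:
            total += bases
            if total >= min_bases:
                return ts
    return None


# Hypothetical run: one 800-base C. acnes read classified every 5 minutes.
start = datetime(2000, 1, 1, 9, 0)
records = [(start + timedelta(minutes=m), "Cutibacterium acnes", 800)
           for m in range(0, 60, 5)]
hit = time_to_threshold(records, "Cutibacterium acnes", min_bases=3000)
print(hit - start)  # elapsed time by read timestamp, excluding processing

# Adding the worst-case per-batch processing time reported in the text.
print(hit - start + timedelta(minutes=15))
```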

Fig. 2

Cumulative bases classified by Centrifuge and minimap2 reference alignment over the first few hours of sequencing on the MinION. Each marker on the plots represents a newly classified sequence. Times are on the day of sequencing, are taken from the read timestamps and do not include bioinformatic processing time. Three samples are shown, representing the best and worst performers. a Sample 354a containing three different species. b Sample 249a containing Cutibacterium acnes. c Sample 352a containing two different Bacillus species

Fig. 3

Percentage of mapped bases (minimap2) relative to total Centrifuge-classified bacterial bases over the first two hours of sequencing. As in Fig. 2, each marker on the plots represents a newly classified sequence. Times are on the day of sequencing. Three samples are shown, representing the best and worst performers. a Sample 354a containing three different species. b Sample 249a containing Cutibacterium acnes. c Sample 352a containing two different Bacillus species

Discussion

Here we demonstrate proof-of-principle that long-read sequencing using the ONT MinION can detect bacterial infections from DNA extracted directly from sonication fluid samples, and potentially do so within minutes of starting sequencing. If DNA extraction techniques can be similarly optimised, these technologies have the potential to make intra-operative diagnosis of the causes of specific infections possible. This would allow both local and systemic antibiotics to be targeted to the causative organisms in prosthetic joint infections, starting at the time of surgery.

Results from the MinION data were concordant with both the current gold standard, laboratory culture, and Illumina short-read sequencing. In addition, we present a new analytical tool, CRuMPIT, which automates analysis of MinION data in this setting and could be applied by other researchers and clinicians. By using negative controls, we were able to determine signatures of background contamination, a recognised challenge for the interpretation of diagnostic metagenomics [11, 29]. The thresholds and scores used within our bioinformatics workflow were set heuristically after sequencing two negative controls, to remove background sequences from kit contamination and false positives without masking the infecting species. It will be important to determine the limits of detection for bacterial DNA in samples with high host contamination. Future studies will involve sequencing more samples and spiked-in reference samples so that refined threshold scores can be determined; as before, this can be done using the Youden index (J statistic) [11]. The sensitivity and specificity of the MinION cannot be determined from this study, and further, more extensive studies are required before its use in a routine diagnostic microbiology laboratory can be recommended.
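As a sketch of how refined thresholds might be chosen, the function below scans candidate cutoffs and keeps the one maximising Youden's J statistic, J = sensitivity + specificity - 1, calling a sample positive when its score is at or above the cutoff. The scores are synthetic, the function name is illustrative, and the exact procedure used in [11] may differ; this only shows the general technique.

```python
def youden_threshold(positive_scores, negative_scores):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1,
    where a score >= threshold is called positive."""
    best_t, best_j = None, -1.0
    for t in sorted(set(positive_scores) | set(negative_scores)):
        sensitivity = sum(s >= t for s in positive_scores) / len(positive_scores)
        specificity = sum(s < t for s in negative_scores) / len(negative_scores)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Synthetic scores, e.g. % of bacterial bases assigned to a species (not study data).
positives = [45.0, 80.0, 12.0, 95.0, 30.0]
negatives = [0.5, 2.0, 8.0, 15.0]
print(youden_threshold(positives, negatives))
```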

Although we were able to predict each species present within the sequenced samples, the vast majority of DNA sequenced was human, from host contamination, despite efforts to reduce this during laboratory preparation. Depleting host DNA would increase pathogen genome sequencing coverage, but this remains challenging because the number of bacterial cells in joint infections is low relative to the number of human cells [7]. Previous studies applying the ONT MinION directly to clinical samples have used samples with relatively high concentrations of bacteria, such as urine [30] (compared with PJI samples), or moderate to high viral titres in blood [13], and the MinION has also been used for metagenomics of environmental samples [31]. Reducing human DNA, and so increasing the proportion of bacterial DNA, could enable better genotyping, transmission analysis and antimicrobial resistance gene prediction. Currently, this depends on laboratory developments that reduce the number of human cells in samples rather than on downstream bioinformatic analysis.
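The effect of host contamination on achievable genome coverage can be estimated with simple arithmetic: mean coverage is the total yield multiplied by the bacterial fraction, divided by genome size. The sketch below pairs the extremes of the yield (0.2 to 2.8 Gb) and bacterial fraction (0.04% to over 6%) ranges reported in the Results, with an assumed genome size of 2.5 Mb, so its outputs are illustrative bounds rather than values for any individual sample.

```python
def expected_coverage(total_bases, bacterial_fraction, genome_size):
    """Mean fold-coverage of one bacterial genome, assuming all bacterial
    bases come from that genome."""
    return total_bases * bacterial_fraction / genome_size


GENOME_SIZE = 2_500_000  # assumed ~2.5 Mb bacterial genome, for illustration

# Pairing the extremes of the reported ranges; not values for any single sample.
low = expected_coverage(0.2e9, 0.0004, GENOME_SIZE)
high = expected_coverage(2.8e9, 0.06, GENOME_SIZE)
print(f"{low:.3f}x to {high:.1f}x")
```

Because coverage scales linearly with the bacterial fraction, any fold reduction in host DNA (at constant yield) translates directly into the same fold increase in pathogen coverage.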

The sequencing yields here were low compared with other ONT MinION runs performed in the same laboratory (data not shown). Read lengths in this project were also relatively short, with the average under 1 kilobase, whereas mean read lengths over 10 kilobases can be expected with this method. This is likely due to the DNA extraction methods used, which were optimised for MiSeq sequencing. However, among the four samples amplified by PCR because of low DNA concentration, read length and sequencing depth varied widely, spanning the highest to the lowest values observed.

GC-content-associated biases are known for PCR-based sample preparation methods [32] and for Illumina metagenomic data [33]. We found some evidence that MinION sequencing may better reflect the relative abundance of pathogen DNA in polymicrobial infections, as it appeared less prone to GC bias than Illumina MiSeq short-read sequencing.

Detection of the species was possible within minutes of the sequencing run starting, including the time required to process the sequencing data, with basecalling the biggest bottleneck. The fast5 batch size affects turnaround time, and smaller batches are preferable for longer reads, which take more time to basecall. We have tested the pipeline both on a single PC and on a SLURM cluster on the same network as the computers running the MinION sequencers; the cluster enabled us to scale to the rate of sequencing, basecall with greater throughput than a single machine allowed, and analyse multiple sequencing runs in parallel.
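The trade-off between batch size and turnaround can be illustrated with a toy model in which reads arrive at a constant rate and each batch is processed as soon as it fills. All rates below are hypothetical, and the model ignores queueing, which becomes important whenever processing is slower than data acquisition.

```python
def batch_latency_minutes(batch_size, reads_per_minute, overhead_min, per_read_min):
    """Toy model of batched-processing turnaround.

    Returns (worst_case, mean) minutes from a read being written to its batch
    finishing processing, assuming constant read arrival, immediate processing
    of each full batch and no queueing between batches.
    """
    fill_time = batch_size / reads_per_minute
    processing = overhead_min + per_read_min * batch_size
    worst = fill_time + processing       # the first read in a batch waits longest
    mean = fill_time / 2 + processing    # reads arrive uniformly while filling
    return worst, mean


# Hypothetical rates chosen for illustration only.
for b in (250, 500, 1000):
    worst, mean = batch_latency_minutes(b, reads_per_minute=200,
                                        overhead_min=1.0, per_read_min=0.005)
    print(b, round(worst, 2), round(mean, 2))
```

Because basecalling time per read grows with read length, `per_read_min` rises for longer reads, which enlarges the processing term and makes smaller batches more attractive in that case.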

A limitation of this study arose in runs where reads were basecalled live with the MinKNOW basecaller: the runs produced data faster than the basecaller could keep up with, so some reads were skipped. Retrospective basecalling was not possible at the time, and the skipped reads have since been discarded. In future studies using Albacore, as used for the two most recent sequencing runs (506a and 509a), we therefore expect average DNA yields to increase, which will aid species classification and potential genome completion.

The ONT MinION sequencing process has undergone continual development with substantial improvements since this project began. Therefore, we have used three different basecallers, Metrichor, MinKNOW and Albacore, for converting the raw signal or event data to DNA sequences. It is possible to rebasecall some of this data, but as we no longer have access to some sample raw data files, we cannot rebasecall all the samples. Also, as this would not reflect the real-time analysis carried out, we have not rebasecalled all samples with the same software version. Future studies should continue to use the most accurate, current, and efficient basecaller for real-time analysis. Furthermore, as ONT routinely updates protocols and computational tools, the impact on clinical diagnostics would need to be constantly evaluated and tested to achieve and maintain accreditation.

Although analysis of the sequencing data is close to real time, DNA extraction and library preparation take several hours: 1D ligation library preparation currently takes approximately 70 min, and PCR amplification approximately 150 min. Rapid library preparation kits are available; however, we feel the sequencing yield is currently too low for these to be a viable route to detecting pathogens directly from samples, particularly samples with high host contamination. In addition, future studies will need to include replicate samples to show that this process is reproducible. This project was a proof of concept; to be cost-effective in the future, sample multiplexing, smaller and cheaper flowcells, or reusable (washable) flowcells may need to be employed.

Conclusions

The study shows reliable detection of infection species composition in prosthetic joint infections using ONT MinION sequencing. This represents proof of concept for utilising real time ONT MinION sequencing for PJI diagnostics. The speed of detection indicates that this technology has the potential to deliver results to the clinician in a timelier manner than traditional microbiological methods. Reduction of diagnostic time could have a significant positive influence on patient outcome, allowing prompt, targeted antimicrobial therapy.

The development of a reproducible workflow, as described in this study, has potential use for metagenomic ONT MinION sequencing of any clinical sample, not just sonication fluids. The software used for analysis is provided [19] and can be installed and run locally or on a distributed cluster to scale with throughput. Workflow control and analysis are written in a Python 3 wrapper that relies on open-source tools including pysam [34], Biopython [35], pandas [36], Matplotlib [37], the ETE3 toolkit [38] and NumPy [39].