Analytical and Bioanalytical Chemistry, Volume 410, Issue 22, pp 5439–5444

Serogroup-level resolution of the “Super-7” Shiga toxin-producing Escherichia coli using nanopore single-molecule DNA sequencing

  • Adam Peritz
  • George C. Paoli
  • Chin-Yi Chen
  • Andrew G. Gehring
Part of the following topical collections:
  1. Food Safety Analysis


DNA sequencing and other DNA-based methods are now broadly used for detection and identification of bacterial foodborne pathogens. For the identification of foodborne bacterial pathogens, taxonomic assignments must be made to the species or even subspecies level. Long-read DNA sequencing provides finer taxonomic resolution than short-read sequencing. Here, we demonstrate the potential of long-read shotgun sequencing obtained from the Oxford Nanopore Technologies (ONT) MinION single-molecule sequencer, in combination with the Basic Local Alignment Search Tool (BLAST) with custom sequence databases, for foodborne pathogen identification. A library of mixed DNA from strains of the “Super-7” Shiga toxin-producing Escherichia coli (STEC) serogroups (O26, O45, O103, O111, O121, O145, and O157[:H7]) was sequenced using the ONT MinION resulting in 44,245 long-read sequences. The ONT MinION sequences were compared to a custom database composed of the E. coli O-antigen gene clusters. A vast majority of the sequence reads were from outside of the O-antigen cluster and did not align to any sequences in the O-antigen database. However, 58 sequences (0.13% of the total sequence reads) did align to a specific Super-7 O-antigen gene cluster, with each O-antigen cluster aligning to at least four sequence reads. BLAST analysis against a custom whole-genome database revealed that 5096 (11.5%) of the MinION sequence reads aligned to one and only one sequence in the database, of which 99.6% aligned to a sequence from a “Super-7” STEC. These results demonstrate the ability of the method to resolve STEC to the serogroup level and the potential general utility of the MinION for the detection and typing of foodborne pathogens.


Keywords: MinION; Nanopore DNA sequencing; Shiga toxin-producing E. coli, STEC; Foodborne pathogen detection


In a recent report, the World Health Organization considered 31 foodborne hazards and estimated the annual worldwide burden of foodborne disease at 600 million illnesses and 420,000 deaths [1]. According to the most recent estimates from the US Centers for Disease Control and Prevention (CDC) [2], foodborne illnesses affect 48 million Americans and result in 128,000 hospitalizations and 3000 deaths annually. While 80% of the illnesses in the CDC report were the result of unspecified agents, the annual cost associated with the 20% of illnesses that were attributed to specific pathogens was estimated to be in the range of $15 billion [3]. These sobering figures underscore the importance of detecting and identifying pathogens in foods before the food products are distributed and sold to consumers.

The food industry and regulators require rapid, specific, and cost-effective methods for the detection of bacterial pathogens in foods. Several factors complicate the detection and identification of pathogens from foods, for example, the complex nature of food matrices, the diverse microbial communities present in foods, and the fact that, when they are present, pathogens are typically at very low levels. Furthermore, in order to confirm the presence of a foodborne pathogen, it is necessary to assign taxonomic classification to a species or subspecies level (e.g., serotype or pathotype). Traditional microbiological and biochemical methods used to detect foodborne pathogens can take up to 3 days for preliminary results and a week or more to confirm pathogen identification. Many methods have been developed for more rapid detection of pathogens. These methods include (1) DNA-based methods such as end-point PCR, qPCR, and microarrays; (2) immunological methods such as immunofluorescent and enzyme-linked immunosorbent assays; and (3) advanced biosensors [4, 5, 6, 7]. All of these rapid methods require the use of biorecognition elements such as oligonucleotide primers and/or probes or antibodies that bind to specific pathogen-derived molecules. Thus, it is not possible to apply these methods for the detection of pathogens unless the biorecognition elements are already developed and readily available. More recently, other pathogen identification methods, such as DNA sequencing and mass spectrometry, have gained acceptance. Although these methods are less reliant on the prior development of a specific biorecognition element, they require the existence of large reference databases to allow pathogen identification based on the identification of pathogen-specific biomolecules.
Although DNA sequencing and MALDI-TOF mass spectrometry have been used to assign taxonomy at the species level for pathogens [8], these methods typically require expensive instrumentation that does not lend itself to field portability and require highly skilled operators to collect and interpret data. While whole-genome sequencing (WGS) has recently been adopted by regulatory agencies, replacing pulsed field gel electrophoresis as the gold standard for identification and tracking of outbreaks of foodborne illnesses, WGS currently requires isolation of the foodborne bacterial pathogen prior to sequencing and analysis [9, 10].

Recently, the MinION DNA sequencer (Oxford Nanopore Technologies, Little Chesterford, UK) was introduced. The Oxford Nanopore Technologies (ONT) MinION DNA sequencer is inexpensive and, being about the size and weight of a smartphone, is very portable. The ONT MinION determines the nucleotide sequence of long strands of DNA by measuring changes in current as a single molecule of DNA passes through a protein nanopore, resulting in very long sequence reads, currently up to 100 kb. Nucleotide base calling is rapid, occurs in real time, and DNA sequence information can be collected within minutes of starting an instrument run. Benítez-Páez et al. [11] demonstrated that the long-read length DNA sequences obtained using the ONT MinION could achieve species level resolution by sequencing near full-length amplicons of 16S rDNA. In addition, Quick et al. [12] demonstrated that the ONT MinION could be used to type Salmonella during an outbreak with a time-to-result of only 2 h from the start of the ONT MinION sequencing run. Schmidt et al. [13] also used the ONT MinION to identify bacterial pathogens and antimicrobial resistance genes from urine samples without the need for pathogen isolation or culture enrichment. Herein we present a proof-of-concept study on the application of the field-portable ONT MinION to identify the seven USDA-regulated Shiga toxin-producing Escherichia coli serogroups (i.e., the “Super-7” STEC; serogroups O26, O45, O103, O111, O121, O145, and serotype O157[:H7]) from a polymicrobial sample using a mixture of DNAs from strains of the Super-7 STEC and a shotgun metagenomic sequencing approach.

Materials and methods

Bacterial strains, DNA extraction, and “Super-7 STEC library” preparation

STEC strains of serotypes O26:H11, O45:H2, O103:H2, O111:NM, O121:H19, and O145:H25 were obtained from the CDC. The genome sequences of these strains are available in the GenBank database (Bioproject PRJNA218110). STEC serotype O157:H7 strain Sakai was obtained from our own laboratory strain collection and the genome sequence is available in GenBank (accession no. NC_002695). The strains and relevant genome information are listed in Table 1.
Table 1

STEC strains and genome sequences used to prepare the Super-7 STEC library

Column headers: Sequencing technology | GenBank accession number | Genome size (Mbp) | Number of contigs

(Table rows not recovered)

a Nucleotide sequence assembly is not available in GenBank

These seven STEC strains were used to generate a Super-7 STEC library (one strain of each of the Super-7 serogroups) for ONT MinION sequencing. The STEC strains were grown separately overnight with shaking at 37 °C in Lysogeny Broth to an OD600 of 2.0. Genomic DNA was isolated using the Qiagen genome tip 100/G kit following the manufacturer’s instructions. The Super-7 STEC library was prepared by mixing an equal amount of DNA (150 ng) from each strain, and the mixed DNAs were prepared for ONT MinION sequencing using a 2D library kit (SQK-NSK007; Oxford Nanopore Technologies).

Custom O-antigen and whole-genome databases

Two custom DNA sequence databases were created for comparison to the ONT MinION sequences derived from the Super-7 STEC library: an O-antigen database and a whole-genome database. The O-antigen database consisted of the O-antigen gene cluster sequences from 198 E. coli serovars [14]. The whole-genome database consisted of the whole-genome sequences of 15 bacterial strains listed in Tables 1 and 2. The nucleotide sequence data for the CDC serogroup O103 strain 08-3366 were not assembled into contigs in the GenBank submission (Table 1), so the assembled genome sequence from the STEC serogroup O103 strain 12009 (NCBI accession no. NC_013353) was used for our custom database instead (Table 2).
Table 2

List of additional genome sequences used for the whole genome database

Column header recovered: GenBank accession number

Organisms listed (other cell values not recovered):
  • Escherichia coli (seven entries, one of which is K12 MG1655)
  • Listeria monocytogenes
  • Salmonella enterica

The whole genome database consisted of genome sequences from the strains listed in Table 1 in addition to the strains listed in this table

Nanopore sequencing and bioinformatics analysis

The Super-7 STEC library was sequenced on the ONT MinION using a version R9.1 flow cell and the ONT MinKNOW software version 0.84 (Oxford Nanopore Technologies). Basecalls were made from the HDF5 files on the Metrichor website using EPI2ME rev1.107 2D basecalling RNN software. Data were converted from HDF5 to FASTA format using Poretools [15]. DNA sequences were analyzed using stand-alone megaBLAST against the O-antigen and whole-genome databases described above. A Python script was created to parse the BLAST results based on the number of significant alignments (see Electronic Supplementary Material (ESM) Fig. S1).

Results and discussion

Shotgun sequencing of the library of mixed DNA extracted from the Super-7 STEC strains listed in Table 1 (the Super-7 STEC library) was done using the ONT MinION sequencer and yielded 139,488 DNA sequence reads. 2D basecalling (i.e., where sequences are derived from both strands of the DNA fragment) was successful for 63,297 of the 139,488 collected sequence reads resulting in 447.3 Mb of 2D sequence reads, with an average sequence length of 5.9 kb. Using the Poretools software, 2D reads smaller than 4.0 kb were filtered out to avoid analyzing DNA sequences derived from lambda phage control DNA. This left a total of 44,245 2D reads that were used in the BLAST analysis against two custom databases. One of the custom databases contained the DNA sequences of all E. coli O-antigen gene clusters (O-antigen database), and the other custom database was made up of the complete genome sequences of the strains of STEC and other bacteria listed in Tables 1 and 2 (the whole-genome database).
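The length filter described above can be illustrated with a minimal sketch. This is not the Poretools implementation used in the study; it is a plain-Python stand-in that assumes the reads are already in FASTA format, and the function names (parse_fasta, filter_by_length) are hypothetical. Only the 4.0 kb cutoff comes from the text.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 4000  # 4.0 kb cutoff stated in the text


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_length: int = MIN_LENGTH) -> List[Tuple[str, str]]:
    """Keep only records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in records if len(s) >= min_length]


if __name__ == "__main__":
    # Hypothetical demo input: one 40-base read and one 6000-base read.
    demo = [">short", "ACGT" * 10, ">long", "ACGT" * 1000, "ACGT" * 500]
    kept = filter_by_length(parse_fasta(demo))
    print([h for h, _ in kept])  # → ['long']
```

A fixed length cutoff is a blunt way to exclude control DNA, since it also discards genuine short reads; the text accepts that trade-off.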

Using a custom Python script (ESM Fig. S1), the sequence reads were parsed into three groups based on the number of significant alignments to sequences in the databases. The first group contained sequence reads with no significant alignments. The second group contained reads with more than one significant alignment. The final group contained reads for which there was only one significant alignment. A read was considered to have only one significant alignment using the following criteria: if the second significant alignment had a BLAST score that was less than 50% of the highest significant alignment, or if the sequence coverage of the second significant alignment was less than 75% of the read length.
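The grouping rule above can be sketched as follows. This is not the authors' script (ESM Fig. S1), only an illustration of the stated criteria. The Hit record, its field names, and classify_read are hypothetical, and the sketch assumes each read's hits have already been reduced to one best alignment per database sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    subject: str      # database sequence the read aligned to
    bitscore: float   # BLAST score of the alignment
    aligned_len: int  # number of read bases covered by the alignment


def classify_read(read_len: int, hits: List[Hit],
                  score_ratio: float = 0.50,
                  min_coverage: float = 0.75) -> str:
    """Return 'none', 'single', or 'multiple' for one read.

    A read counts as 'single' when it has exactly one hit, or when the
    second-best hit scores below score_ratio of the best hit, or covers
    less than min_coverage of the read length.
    """
    if not hits:
        return "none"
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)
    if len(ranked) == 1:
        return "single"
    best, second = ranked[0], ranked[1]
    weak_score = second.bitscore < score_ratio * best.bitscore
    short_cover = second.aligned_len < min_coverage * read_len
    return "single" if (weak_score or short_cover) else "multiple"


if __name__ == "__main__":
    # Hypothetical 10 kb read with a strong hit and a much weaker second hit.
    hits = [Hit("O157", 900.0, 9000), Hit("O26", 400.0, 9000)]
    print(classify_read(10_000, hits))  # → single
```

The sketch folds short alignments into the three groups named in the text; it does not reproduce the separate "short alignment" tallies reported in Table 3.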

Results of the BLAST alignment of the 44,245 ONT MinION 2D sequences from the Super-7 STEC library against the O-antigen database revealed that, as expected, a vast majority of these sequences (96.25%) were from regions outside of the O-antigen clusters and did not match any sequence in the database (Table 3). After removing short alignments (i.e., those showing less than 75% coverage of the sequence read), 87 of the Super-7 STEC library sequences aligned to only a single O-antigen cluster (Table 3), with only 9 of the Super-7 STEC library sequence reads aligning to O-antigen genes from more than one serogroup. Interestingly, the percentage of reads that correctly mapped to the O-antigen clusters (0.20%; 87/44,245 reads) is close to a predicted value of 0.22% based on the genomic size and O-antigen genome coverage (i.e., the average STEC genome is ~ 5500 kb and the average size of O-antigen clusters is ~ 12 kb).
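The predicted and observed percentages can be checked with a short calculation. The sketch assumes the prediction is simply the ratio of the average O-antigen cluster size to the average genome size, which is how the text describes it; it ignores read-length effects. The variable names are illustrative, and the read counts are the 87 single-cluster reads out of the 44,245 reads used in the BLAST analysis.

```python
GENOME_KB = 5500       # approximate average STEC genome size (from the text)
O_CLUSTER_KB = 12      # approximate average O-antigen cluster size (from the text)
TOTAL_READS = 44_245   # 2D reads used in the BLAST analysis
SINGLE_HIT_READS = 87  # reads aligning to a single O-antigen cluster

# Fraction of the genome occupied by the O-antigen cluster, as a percentage.
expected_pct = 100 * O_CLUSTER_KB / GENOME_KB
# Fraction of reads that aligned to a single O-antigen cluster, as a percentage.
observed_pct = 100 * SINGLE_HIT_READS / TOTAL_READS

print(f"expected {expected_pct:.2f}%, observed {observed_pct:.2f}%")
# → expected 0.22%, observed 0.20%
```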
Table 3

Distribution of ONT MinION Super-7 STEC library sequences with significant BLAST alignments to sequences in the O-antigen or whole genome databases

Number of MinION sequence reads (% of total reads)

Number of significant BLAST alignments | O-antigen database | CDC/whole genome database
0                                      | 42,587 (96.25%)    | 126 (0.28%)
1                                      | 87 (0.20%)         | 5096 (11.52%)
1 short alignment^a                    | 610 (1.38%)        | 180 (0.41%)
2                                      | 0 (0%)             | 633 (1.43%)
2 short alignment^a                    | 167 (0.38%)        | 93 (0.21%)
3 or more                              | 9 (0.02%)          | 32,225 (72.83%)
3 or more short alignment^a            | 773 (1.75%)        | 5892 (13.32%)

a Aligned to sequences less than 75% of the query length

When the 44,245 ONT MinION 2D sequences from the Super-7 STEC library were subjected to BLAST against the whole-genome database, a vast majority of the library sequences aligned with multiple sequences in the database. This would be expected because the whole-genome database included the genomes from 13 E. coli strains as well as the genomes from two other bacterial species (Tables 1 and 2). In addition, very few Super-7 STEC library sequences did not align to any sequences in the whole-genome database (0.28%; 126/44,245), and 11.5% (5096/44,245) aligned to only a single sequence in the whole-genome database.

The ONT MinION sequencing reads from the Super-7 STEC library that aligned to a single sequence within the custom databases were parsed to determine to which serogroup these sequences aligned (Table 4). The data reveal that the library sequences aligning to the O-antigen database were distributed across each of the Super-7 serogroups. As the O-antigen database was made up of the O-antigen clusters from all 198 different E. coli serogroups, it is worth noting that all of the Super-7 STEC library fragments aligned to O-antigen cluster genes of the Super-7 serogroups, with the exception of 29 sequences that aligned to serogroup O14 (data not shown). It must be noted that serogroup O14 strains are phenotypically rough, that is, these strains synthesize only the enterobacterial common antigen and do not produce an O-antigen [16]. Furthermore, O14 strains do not possess O-antigen cluster genes between galF and gnd [17]. As such, Iguchi et al. [18] excluded the serogroup O14 O-antigen cluster from their comprehensive analysis of the O-antigen gene clusters. Further examination of the resulting sequence alignment between the Super-7 DNA library sequences and the O14 O-antigen gene cluster revealed that this was indeed an artifact resulting from the inclusion of accessory genes necessary for lipopolysaccharide synthesis and colanic acid synthesis in the O14 GenBank submission (no. AB972414).
Table 4

Distribution of ONT MinION sequences with significant BLAST alignment to a single sequence in the whole-genome database

Column headers: O-antigen serogroup | Number of ONT MinION sequences, by custom sequence database: O-antigen database^a | CDC/whole-genome database

(Table rows not recovered)

a Twenty-nine additional ONT MinION library sequences aligned uniquely to the O14 O-antigen cluster and were not included in this table

The BLAST results for the 58 ONT MinION library sequences that aligned to a single STEC O-antigen cluster are listed in Table S1 (see ESM). The average identity score among the 58 sequence comparisons was 85.4% with a maximum of 97% and a minimum of 77%, suggesting a sequencing error rate of approximately 14.6%. While identity scores do not fully reflect the consequences of sequencing errors (e.g., indels are another measure of sequencing errors and are reflected in the BLAST gap scores which were an average of 6.3% with a maximum of 12% and a minimum of 2% among the 58 O-antigen sequence alignments), they do provide an estimate of mismatch sequencing errors. While earlier versions of the ONT MinION 2D sequencing chemistry and flow cells (R7.X) resulted in an average 85% sequence identity using M13 double-stranded genomic DNA [19], indicating a roughly 15% error rate [20], newer versions (R9.0) yielded 94% sequence identity using E. coli K-12 genomic DNA [21]. Although a version R9.1 flow cell was used in our study, the average sequence identity (85.4%) did not approach the 94% reported by Jain et al. [21]. In spite of the high sequencing error rate of the ONT MinION, the specific attribution of the individual STEC strains from the polymicrobial DNA mixture to their cognate O-antigen clearly demonstrates the efficacy of the ONT MinION sequencing for STEC identification from complex mixtures of genomic DNA.

Parsing of the 5096 single-hit sequence alignments between the Super-7 STEC library and the whole-genome database revealed that 5074 of the sequences (99.6%) aligned to a sequence from one of the Super-7 serogroups (Table 4). It should be noted here that, other than the genomes of the strains from which the Super-7 STEC library was prepared, only nine other genomes were included in the whole-genome database, and only three of those were not STEC (Table 2). Furthermore, only one or two genomes from each Super-7 STEC serogroup were included in the database. Thus, it is not possible to distinguish the single-alignment sequences that are serogroup specific from those that may be merely strain specific. This is particularly true because E. coli has an open genome, having a proportionately small core genome (less than 50% of the genome), a very large pan-genome, and a relatively large accessory genome for any given strain [22]. The Super-7 STEC library sequences that resulted in a single alignment will be further analyzed in an attempt to identify serogroup-specific sequences outside of the O-antigen cluster.

In summary, this proof-of-concept study demonstrated that DNA sequences acquired using a shotgun sequencing approach with the ONT MinION sequencer were of sufficient length and quality to allow alignment to O-antigen and other unique serogroup- or strain-specific sequences, providing sufficient taxonomic discrimination to identify the Super-7 STEC. Besides providing finer taxonomic resolution of the organisms in a microbiome, shotgun metagenomic sequencing also eliminates the PCR step required for 16S rDNA amplification prior to sequencing, affording more rapid and less biased detection of foodborne pathogens. Additional studies will be required to demonstrate sufficient sensitivity and the potential for broader application of the ONT MinION for detection of pathogens from foods.



This work is supported by the USDA, Agricultural Research Service, National Program 108 Food Safety in-house projects. The authors would like to thank Dr. Pina Fratamico for reviewing the manuscript. We would also like to thank Drs. Rebecca Lindsey and Nancy Strockbine from CDC for providing the non-O157:H7 STEC strains. Mention of brand or firm names does not constitute an endorsement by the USDA over others of a similar nature not mentioned. The USDA is an equal opportunity employer. BLAST® is a Registered Trademark of the National Library of Medicine.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflicts of interest.

Supplementary material

216_2018_877_MOESM1_ESM.pdf (172 kb)
ESM 1 (PDF 171 kb)


  1. WHO. WHO estimates of the global burden of foodborne diseases: foodborne diseases burden epidemiology reference group 2007–2015. Geneva: WHO; 2015. Accessed 20 Nov 2017.
  2. Scallan E, Griffin PM, Angulo FJ, Tauxe RV, Hoekstra RM. Foodborne illness acquired in the United States—unspecified agents. Emerg Infect Dis. 2011;17(1):16–22.
  3. Hoffman S, Maculloch B, Batz M. Economic burden of major foodborne illnesses acquired in the United States. USDA Economic Research Service: Economic Information Bulletin Number 140. USDA, Washington, DC, USA. 2015. Accessed 28 Nov 2017.
  4. Umesha S, Manukumar HM. Advanced molecular diagnostic techniques for detection of food-borne pathogens: current applications and future challenges. Crit Rev Food Sci Nutr. 2017;8:1–21.
  5. Law JW-F, Ab Mutalib N-S, Chan K-G, Lee L-H. Rapid methods for the detection of foodborne bacterial pathogens: principles, applications, advantages and limitations. Front Microbiol. 2014;5:770.
  6. Mangal M, Bansal S, Sharma SK, Gupta RK. Molecular detection of foodborne pathogens: a rapid and accurate answer to food safety. Crit Rev Food Sci Nutr. 2016;56:1568–84.
  7. Zhao X, Lin CW, Wang J, Oh DH. Advances in rapid detection methods for foodborne pathogens. J Microbiol Biotechnol. 2014;24:297–312.
  8. Sandrin TR, Goldstein JE, Schumaker S. MALDI TOF MS profiling of bacteria at the strain level: a review. Mass Spectrom Rev. 2013;32:188–217.
  9. Allard MW, Strain E, Melka D, Bunning K, Musser SM, Brown EW, et al. Practical value of food pathogen traceability through building a whole-genome sequencing network and database. J Clin Microbiol. 2016;54:1975–83.
  10. Gilchrist CA, Turner SD, Riley MF, Petri WA Jr, Hewlett EL. Whole-genome sequencing in outbreak analysis. Clin Microbiol Rev. 2015;28:541–63.
  11. Benítez-Páez A, Portune KJ, Sanz Y. Species-level resolution of 16S rRNA gene amplicons sequenced through the MinION™ portable nanopore sequencer. Gigascience. 2016;5:1–9.
  12. Quick J, Ashton P, Calus S, Chatt C, Gossain S, Hawker J, et al. Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella. Genome Biol. 2015;16:114.
  13. Schmidt K, Mwaigwisya S, Crossman LC, Doumith M, Munroe D, Pires C, et al. Identification of bacterial pathogens and antimicrobial resistance directly from clinical urines by nanopore-based metagenomic sequencing. J Antimicrob Chemother. 2017;72:104–14.
  14. Debroy C, Fratamico PM, Yan X, Baranzoni GM, Liu Y, Needleman DS, et al. Comparison of O-antigen gene clusters of all O-serogroups of Escherichia coli and proposal for adopting a new nomenclature for O-typing. PLoS One. 2016;11:e0147434.
  15. Loman NJ, Quinlan AR. Poretools: a toolkit for analyzing nanopore sequence data. Bioinformatics. 2014;30:3399–401.
  16. Kunin CM, Beard MV, Halmagyi NE. Evidence of common hapten associated with endotoxin fractions of E. coli and other Enterobacteriaceae. Proc Soc Exp Biol Med. 1962;111:160–6.
  17. Jensen SO, Reeves PR. Deletion of the Escherichia coli O14:K7 O antigen gene cluster. Can J Microbiol. 2004;50:299–302.
  18. Iguchi A, Iyoda S, Kikuchi T, Ogura Y, Katsura K, Ohnishi M, et al. A complete view of the genetic diversity of the Escherichia coli O-antigen biosynthesis gene cluster. DNA Res. 2014;22:101–7.
  19. Jain M, Fiddes I, Miga KH, Olsen HE, Paten B, Akeson M. Improved data analysis for the MinION nanopore sequencer. Nat Methods. 2016;12:351–6.
  20. McIntyre ABR, Rizzardi L, Yu AM, Alexander N, Rosen GL, Botkin DJ, et al. Nanopore sequencing in microgravity. NPJ Microgravity. 2016;2:16035.
  21. Jain M, Tyson JR, Loose M, Ip CLC, Eccles DA, O’Grady J, et al. MinION analysis and reference consortium: phase 2 data release and analysis of R9.0 chemistry. F1000Res. 2017;6:760.
  22. Sadiq, AM, Hazen TH, Rasko, DA, Eppinger M. Enterohemorrhagic Escherichia coli genomics: past, present, and future. Microbiol Spect. 2014; 2:EHEC-0020-2013.

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Molecular Characterization of Foodborne Pathogens Research Unit, Eastern Regional Research Center, Agricultural Research Service, U.S. Department of Agriculture, Wyndmoor, USA
