Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: Entero-Aggregative-Haemorrhagic Escherichia coli (EAHEC)
- Cite this article as:
- Brzuszkiewicz, E., Thürmer, A., Schuldes, J. et al. Arch Microbiol (2011) 193: 883. doi:10.1007/s00203-011-0725-6
The genome sequences of two Escherichia coli O104:H4 strains derived from two different patients of the 2011 German E. coli outbreak were determined. The two analyzed strains were designated E. coli GOS1 and GOS2 (German outbreak strain). Both isolates comprise one chromosome of approximately 5.31 Mbp and two putative plasmids. Comparisons of the 5,217 (GOS1) and 5,224 (GOS2) predicted protein-encoding genes with those of various E. coli strains, together with a multilocus sequence typing analysis, revealed that the isolates were most similar to the entero-aggregative E. coli (EAEC) strain 55989. In addition, one of the putative plasmids of the outbreak strain is similar to pAA-type plasmids of EAEC strains, which contain aggregative adhesion fimbrial operons. The second putative plasmid harbors genes for extended-spectrum β-lactamases. This type of plasmid is widely distributed in pathogenic E. coli strains. A significant difference between the E. coli GOS1 and GOS2 genomes and those of EAEC strains is the presence of a prophage encoding the Shiga toxin, which is characteristic of enterohemorrhagic E. coli (EHEC) strains. The unique combination of genomic features of the German outbreak strain, containing characteristics from pathotypes EAEC and EHEC, suggested that it represents a new pathotype, Entero-Aggregative-Haemorrhagic Escherichia coli (EAHEC).
Keywords: EHEC outbreak, EAHEC, Genome sequencing, Pathotype, Genome evolution
Escherichia coli is a bacterium that is commonly found in the intestine of humans and other mammals. Most E. coli strains are harmless commensals. However, some strains such as enterohemorrhagic E. coli (EHEC) strains can cause severe food-borne diseases. These pathogens are transmitted to humans primarily through consumption of contaminated drinking water and foods such as raw or undercooked ground meat products, raw milk, and even vegetables (Kaper et al. 2004). In addition, person-to-person transmission is possible. The significance of EHEC as a public health problem was first recognized in 1982, following an outbreak in the United States of America associated with undercooked hamburgers (Kaper et al. 2004).
Infections caused by EHEC may lead to severe diarrhea and hemorrhagic colitis with complications such as microangiopathic hemolytic anemia, thrombocytopenia, and fatal acute renal failure, which are summarized as hemolytic uremic syndrome (HUS) (Karmali et al. 1983, 1985; Law et al. 1992). Ruminants, predominantly cows, are the natural reservoir of EHEC strains (Kaper et al. 2004).
EHEC is known to produce characteristic toxins, which are similar to toxins produced by Shigella dysenteriae and are known as verocytotoxins or Shiga toxins (STX) (Kaper et al. 2004; Karch et al. 2005; Tarr et al. 2005). Absorption of these toxins by the bloodstream leads to damage to the kidneys and to HUS. The most significant serogroups among EHEC strains are O26, O103, O111, and O157. E. coli O157:H7 is the most important EHEC serotype with respect to public health in North America, the United Kingdom, and Japan (Kaper et al. 2004). Typical EHEC strains produce STX but also encode a LEE (locus of enterocyte effacement) pathogenicity island, which is important for adherence in the colon (Jores et al. 2004). E. coli strains that encode a Shiga toxin, but do not contain the LEE pathogenicity island, are designated as STEC (Shiga toxin-producing E. coli) strains. Approximately 200 different serogroups of STEC strains are known and more than 100 harbor a virulence potential. Up to 50% of infections with STEC strains are linked to non-O157 serogroups (Kaper et al. 2004).
The EHEC outbreak started in Germany in May 2011 with 3,368 cases including 36 deaths (as of June 14th, 2011, European Centre for Disease Prevention and Control; http://www.ecdc.europa.eu/en/Pages/home.aspx). This is the second largest food-borne E. coli outbreak in history. The enterohemorrhagic E. coli strain O104:H4 was identified as the causative agent of the EHEC infection outbreak. This strain was found in humans before but never as causative agent of an EHEC outbreak (Robert Koch Institute, Berlin, Germany; http://www.rki.de). Only one case of infection with strain O104:H4 has been documented in the literature prior to the 2011 outbreak. In this case, the strain was isolated from a 29-year-old Korean woman, who suffered from HUS (Bae et al. 2006).
In this study, we report on the genome sequences of two O104:H4 isolates, which were derived from two patients of the 2011 EHEC outbreak in Germany. The determination of the genomic features of the isolates provides insights into the genomic potential, pathogenicity, and evolution of the O104:H4 strain. Comparison of our E. coli O104:H4 genome sequences with those of other pathogenic E. coli suggests that strain O104:H4 represents a new E. coli pathotype, which we named Entero-Aggregative-Haemorrhagic Escherichia coli (EAHEC).
General features of E. coli GOS1 and GOS2 genome sequences
Table 1 Assembly data of the Escherichia coli GOS1 and GOS2 genome sequences. Columns: E. coli GOS1, E. coli GOS2. Rows: genome size (Mbp), GC content (%), number of large contigs (>500 bp), average contig size (kbp), N50 contig size (kbp), largest contig size (kbp), Q40 value (%).
Genome comparison of GOS1 and GOS2 with selected E. coli genomes
Sequence alignment of the E. coli GOS1 and GOS2 genome sequences using the MUMmer software tool (Kurtz et al. 2003) revealed 99.9% identity between the two sequences. We could not find a single-nucleotide polymorphism when we compared the draft genomes of E. coli GOS1 and GOS2 by employing the GS Reference Mapper software (Roche 454, Branford, USA). Thus, as these isolates were derived from patients of different gender and age, it appears that the genome of E. coli O104:H4 is stable during infection of different hosts. This assumption was supported by comparison of the E. coli GOS1 and GOS2 genomes with the three other available draft genome sequences of E. coli O104:H4 isolates derived from the German outbreak. The sequence identities of E. coli GOS1 to the genome sequences of E. coli O104:H4 isolates TY-2482 (Beijing Genomics Institute, China), LB226692 (Life Technologies, Germany; University of Münster, Germany), and H112180280 (Health Protection Agency, Cambridge, United Kingdom) were 99.8, 99.5, and 99.9%, respectively. Taking into account the overall high similarity of all five genome sequences and the different sequencing approaches used, we assume that the recorded differences between the genome sequences are mainly due to sequencing errors and not to changes within the genomes of the different isolates. In addition, as all analyzed chromosomal E. coli sequences share synteny over the whole chromosome length, we could align chromosomal contigs of all available sequences of the German outbreak to the chromosome of EAEC 55989 and obtain the contig order for the genomes of E. coli GOS1 and GOS2 (Fig. S2).
Comparison of the complete gene content of E. coli GOS1 and GOS2 with selected E. coli genomes showed that the chromosome of both isolates is most similar to that of the entero-aggregative E. coli (EAEC) strain 55989 (Fig. S2). E. coli strain 55989 was originally isolated from the diarrheagenic stools of an HIV-positive adult suffering from persistent watery diarrhea (Mossoro et al. 2002). Genome-wide BiBag comparisons revealed a set of 4,606 (GOS1) and 4,607 (GOS2) orthologous genes that are shared by at least one chromosome of the selected reference E. coli strains (Table S1). Of the remaining 611 (GOS1) and 617 (GOS2) genes, 122 and 211, respectively, were orthologous to genes located on plasmids.
We could identify 336 prophage-encoding genes for GOS1 and 334 for GOS2 (Tables S3, S4). The key virulence factor of EHEC, STX, is encoded on a lambda-like bacteriophage, the Stx-phage. Acquisition of this phage was a key step in the evolution of EHEC from EPEC (Reid et al. 2000). A Stx-phage is present in the outbreak strain (Fig. 1). This phage shows high identity to the stx2-containing enterobacteria phage VT2phi_272 from E. coli O157:H7 strain 71074 (HQ424691). The GOS1 Stx-prophage consists of 66 encoding genes and is identical to the GOS2 Stx-phage (Tables S3, S4). In addition to the Stx-phage, 70 prophage-encoding genes (Tables S3, S4) that are not present in E. coli 55989 could be identified in the genome of E. coli GOS1. These genes have high similarity to STX-producing prophages and also to the other above-mentioned phage in the outbreak strain, but lack stx2AB (Fig. S3).
EHEC O157:H7 strains resist the highly toxic tellurium oxyanion, tellurite (Tel) (Zadik et al. 1993; Taylor et al. 2002; Bielaszewska et al. 2005; Orth et al. 2007). Tellurite resistance (TelR) of EHEC O157:H7 is encoded by the chromosomal terZABCDEF gene cluster (Taylor et al. 2002; Bielaszewska et al. 2005), which is highly homologous to the ter cluster on plasmid R478 of Serratia marcescens (Whelan et al. 1995; Taylor et al. 2002). TelR is a common, but not obligatory, feature of EHEC O157:H7 strains, as tellurite-susceptible E. coli O157:H7 strains have been isolated in North America (Taylor et al. 2002) and Europe (Bielaszewska et al. 2005). We identified all proteins of the terZABCDEF operon in the outbreak strain (ORFs RGOS02836 to RGOS02842).
In addition, the German outbreak strain could bear a mercuric resistance plasmid, as in many bacteria resistance to mercury is associated with a plasmid (Smith 1967; Novick and Roth 1968; Summers and Silver 1972; Kondo et al. 1974). Correspondingly, the predicted proteins involved in mercury resistance were all located on one contig (GOS1_contig00023). These genes encode the putative mercuric ion transport proteins MerT, MerP, and MerC (RGOS00392, RGOS00393, and RGOS00394, respectively), the corresponding transcriptional regulators MerR (RGOS00391) and MerD (RGOS00396), and mercuric ion reductase MerA (RGOS00395). In addition to genes involved in mercuric resistance and tellurium resistance, we have predicted and annotated many genes involved in antibiotic resistance, such as putative genes encoding chloramphenicol (RGO00056), tetracycline (RGOS00387, RGOS00388), or streptomycin resistance (RGOS00359).
Chromosomes and plasmids
The chromosomes of the E. coli isolates GOS1 and GOS2 are most similar to the chromosome of EAEC strain 55989 isolated in Africa over a decade ago. EAEC strains are the most recently emerged E. coli intestinal pathotype and the second most common agent of traveler’s diarrhea (Huang et al. 2006). EAEC pathogenesis is thought to involve three primary steps. First, the bacteria adhere to the intestinal mucosa using aggregative adherent fimbriae (AAF). Second, these fimbriae cause autoaggregative adhesion, by which the bacteria adhere to each other in a ‘stacked-brick’ configuration producing a mucous-mediated biofilm on the enterocyte surface. Third, the bacteria release toxins that affect the inflammatory response, intestinal secretion, and mucosal cytotoxicity. Aspects of each of these steps involve plasmid-encoded traits but also chromosomal-encoded virulence factors (Kaper et al. 2004).
In addition to the chromosomal similarity, E. coli GOS1 and GOS2 share with EAEC strain 55989 part of the EAEC plasmid 55989p. This plasmid carries the AAF operon aat and the regulator aggR. Nevertheless, a different aggregative adhesion fimbrial complement was present in our strains. The AAF operon is usually localized on an approximately 100-kb plasmid, termed the “pAA plasmid” (Nataro et al. 1987). Four genetically distinct allelic variants of AAF have been identified previously, AAF/I from EAEC strain 17-2 (Nataro et al. 1992), AAF/II from strain O42 (Nataro et al. 1995), AAF/III from strain 55989 (Bernier et al. 2002), and Hda from strain C1010-00 (Boisen et al. 2008). All the identified AAF allelic types appear to be plasmid encoded, and most of the analyzed strains possess only a single AAF allelic type (Harrington et al. 2006). The outbreak strain is no exception and seems to contain the relatively rare AAF/I locus of EAEC. Additionally, the ipd gene encoding an extracellular serine protease and the gene encoding serine protease Pet were found in the German outbreak strain. Usually, these virulence factors are localized next to the AAF operon on the pAA plasmid. Another virulence feature, the aatPABCD operon (dispersin secretion locus), is a plasmid-borne characteristic of EAEC strains. This operon is also present in the genome of the German outbreak strain.
Two RepA proteins were found in the German outbreak strain. This suggests that this strain harbors at least two plasmids. In addition to the pAA-like plasmid, we identified contigs showing high similarity to the previously described plasmids pEC_Bactec, pCVM29188_101, and pEK204 (Fricke et al. 2009; Woodford et al. 2009; Smet et al. 2010). These plasmids encode the extended-spectrum β-lactamases blaCTX-M and blaTEM-1.
Evolution: horizontal gene transfer (HGT)
Escherichia coli virulence factors such as enterotoxins, invasion factors, adhesion factors, or Shiga toxins can be encoded by several mobile genetic elements, including transposons (Tn), plasmids, bacteriophages, or pathogenicity islands (e.g., LEE island). Bacterial plasmids play a key role in a variety of traits like drug resistance, virulence, and the metabolism of rare substrates under specific conditions (Actis et al. 1999). Plasmids are able to mobilize these traits between different strains and thus play an important role in horizontal gene transfer. The analyses indicate that a number of horizontal gene transfer events took place to create the genome of the German outbreak strain. This strain probably originated from an EAEC pathotype, which is suggested by the missing LEE island and the high similarity of the genome to the genome of EAEC strain 55989. In contrast to the EAEC strains, the German outbreak strain has acquired the Stx-phage, which is typical for EHEC strains (Fig. 1).
Another feature of the new outbreak strain is the acquisition of plasmid-encoded drug resistances. The strain has acquired a plasmid sharing high similarity with the plasmids pEC_Bactec, pCVM29188_101, and pEK204. The origin of this plasmid remains unclear, since the genes for the extended-spectrum β-lactamases (ESBLs) CTX-M and TEM-1 seem to be located on a Tn3-type transposon that has spread widely among enteric bacteria.
Materials and methods
Sample preparation and DNA extraction
The two E. coli O104:H4 isolates GOS1 and GOS2 were derived from stool samples of two different patients of the 2011 German outbreak. E. coli GOS1 and GOS2 were recovered from a 75-year-old woman and a 48-year-old man, respectively. To isolate these strains, stool samples were plated on Brilliance™ ESBL Agar plates (Oxoid, Wesel, Germany) and incubated for 24 h at 37°C. Initially, the E. coli O104:H4 strains were identified by the ability to produce STX2. For this purpose, the LightMix® kits E. coli EHEC Stx1 and Stx2 were applied as recommended by the manufacturer (TIB MOLBIOL, Berlin, Germany). A colony of each of the positive strains thus recovered, E. coli GOS1 and GOS2, was grown in 4 ml EHEC-direct-media (Heipha Diagnostics, Eppelheim, Germany) overnight at 37°C. To isolate genomic DNA, the cultures were pelleted (5 min, 2,000g), resuspended in 1 ml S.T.A.R. Buffer (Roche, Molecular Diagnostics, Rotkreuz, Switzerland), and incubated for 5 min at 95°C. Subsequently, the suspension was subjected to centrifugation for 1 min at 1,100g. The cell-free supernatant (500 μl) was used for the preparation of the genomic DNA by employing the High Pure 16 System Viral Nucleic Acid kit as recommended by the manufacturer (Roche Applied Science, Mannheim, Germany). The resulting DNA solution (260 ng/μl) was used for further analysis.
To confirm that E. coli isolates GOS1 and GOS2 were O104:H4 serotype, a PCR-based detection of four specific marker genes (stx2, terD, rfbO104, and fliC H4) was performed according to the PCR typing scheme by the group of Prof. Karch at the National Consulting Laboratory on HUS at the University of Münster (see http://www.ehec.org/pdf/Laborinfo_01062011.pdf, 2011) with slight adaptations. Briefly, the PCR reaction mixture (25 μl) contained 2.5 μl tenfold reaction buffer (Bioline, Luckenwalde, Germany), 0.2 mM of each of the four deoxynucleoside triphosphates, 1.5 mM MgCl2, 0.2 μM of each of the primers, 1 U of BIO-X-ACT™ DNA Polymerase (Bioline), and 100 ng of isolated genomic DNA as template. The stx2, terD, rfbO104, and fliC H4 were amplified with the following set of primers: stx2, 5′-ATCCTATTCCCGGGAGTTTACG-3′ and 5′-GCGTCATCGTATACACAGGAGC-3′; terD, 5′-AGTAAAGCAGCTCCGTCAAT-3′ and 5′-CCGAACAGCATGGCAGTCT-3′; rfbO104, 5′-TGAACTGATTTTTAGGATGG-3′ and 5′-AGAACCTCACTCAAATTATG-3′; and fliC H4, 5′-GGCGAAACTGACGGCTGCTG-3′ and 5′-GCACCAACAGTTACCGCCGC-3′. The following thermal cycling scheme was used: initial denaturation at 94°C for 5 min, 30 cycles of denaturation at 94°C for 45 s, annealing at 55°C (stx2, terD, rfbO104) or 63°C (fliC H4) for 45 s, and extension at 72°C for 60 s (stx2, terD, rfbO104) or 30 s (fliC H4) followed by a final extension period at 72°C for 5 min. Subsequently, PCR products were separated by agarose gel electrophoresis (1.5% gels) and analyzed. The analysis revealed that all four marker genes were present in E. coli isolates GOS1 and GOS2 in the expected sizes (Fig. S1).
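As an illustration of why the fliC H4 primer pair may call for a higher annealing temperature than the other three pairs, the following Python sketch tabulates length, GC content, and a rough melting-temperature estimate for the primers listed above. The primer sequences and annealing temperatures are taken from the text. The helper names (`gc_fraction`, `wallace_tm`, `primer_report`) are hypothetical, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is a coarse rule of thumb assumed here for illustration only; it is not stated to be the method used to design the typing scheme.

```python
# Primer pairs as listed in the text, in the order given (first, second).
PRIMERS = {
    "stx2": ("ATCCTATTCCCGGGAGTTTACG", "GCGTCATCGTATACACAGGAGC"),
    "terD": ("AGTAAAGCAGCTCCGTCAAT", "CCGAACAGCATGGCAGTCT"),
    "rfbO104": ("TGAACTGATTTTTAGGATGG", "AGAACCTCACTCAAATTATG"),
    "fliC H4": ("GGCGAAACTGACGGCTGCTG", "GCACCAACAGTTACCGCCGC"),
}

# Annealing temperatures (degrees C) from the thermal cycling scheme in the text.
ANNEALING_C = {"stx2": 55, "terD": 55, "rfbO104": 55, "fliC H4": 63}


def gc_fraction(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm estimate (Wallace rule, an assumption): 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_report():
    """Return one tuple per primer:
    (target, position in pair, length, GC percent, Wallace Tm, annealing temp)."""
    rows = []
    for target, pair in PRIMERS.items():
        for position, seq in zip(("first", "second"), pair):
            rows.append((
                target,
                position,
                len(seq),
                round(100 * gc_fraction(seq), 1),
                wallace_tm(seq),
                ANNEALING_C[target],
            ))
    return rows


if __name__ == "__main__":
    for row in primer_report():
        print(row)
```

In this tabulation, the fliC H4 primers come out as the most GC-rich pair and the rfbO104 primers as the least, which is at least consistent with the higher annealing temperature reported for fliC H4.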
Sequencing and assembly
The isolated DNA from both strains was used to create 454-shotgun libraries following the GS Rapid library protocol (Roche 454, Branford, USA). The resulting two 454 DNA libraries were sequenced with the Genome Sequencer FLX (Roche 454) using Titanium chemistry. For sequencing of each sample, 1.5 medium lanes of a Titanium picotiter plate were used. A total of 349,788 and 311,478 shotgun reads were achieved for E. coli GOS1 and E. coli GOS2, respectively. Reads were assembled de novo using the Roche Newbler assembly software 2.3 (Roche 454) (Table 1).
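The contig statistics summarized in Table 1 (number of large contigs, average, N50, and largest contig size) can be derived from a list of contig lengths. The following Python sketch shows one common way to compute them. The function name `assembly_stats` and the example lengths are hypothetical, the 500 bp cutoff mirrors the "large contig" definition in Table 1, and the N50 definition used (the length of the shortest contig in the set of longest contigs that together cover at least half of the total length) is the conventional one, assumed here rather than taken from the Newbler documentation.

```python
def assembly_stats(contig_lengths, min_len=500):
    """Summarize contigs longer than min_len (all lengths in bp)."""
    large = sorted((n for n in contig_lengths if n > min_len), reverse=True)
    if not large:
        raise ValueError("no contigs above the length cutoff")

    total = sum(large)

    # Walk from the longest contig downwards until half the total is covered.
    running = 0
    n50 = None
    for length in large:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "count": len(large),
        "total": total,
        "mean": total / len(large),
        "n50": n50,
        "largest": large[0],
    }


if __name__ == "__main__":
    # Hypothetical toy assembly; the 100 bp contig falls below the cutoff.
    print(assembly_stats([100, 600, 1000, 2000, 4000]))
```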
Gene prediction and annotation
Gene prediction was performed with Glimmer3 (Delcher et al. 2007). Automatic gene annotation was done by transferring annotations from orthologous genes of reference strains (Table S1) available at the EMBL database. Orthologous genes were identified as described previously by bidirectional BLAST comparisons (Schmeisser et al. 2009). Proteins without orthologs in the reference strains were annotated according to their best BLAST hits to the SwissProt subset of the UniProt Database (Jain et al. 2009, http://www.uniprot.org). Sequence data of isolates GOS1 and GOS2 are publicly available and can be downloaded from the Göttingen Genomics Laboratory website (ftp://18.104.22.168; UserID: EAHEC_GOS; Password: EAHEC_GOS).
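The idea behind identifying orthologs by bidirectional BLAST comparisons can be sketched as a reciprocal-best-hit search: gene A in one genome and gene B in another are paired only if B is the best hit of A and A is the best hit of B. The Python sketch below is a generic illustration under stated assumptions, not the procedure of Schmeisser et al. (2009): the input format (tuples of query, subject, and bit score), the function names, and the gene identifiers are hypothetical, and no identity or coverage cutoffs are applied because none are given in the text.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples,
    e.g. parsed from tabular BLAST output (parsing not shown).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_gene, b_gene) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


if __name__ == "__main__":
    # Hypothetical hit tables for an isolate (gos1_*) and a reference (ref_*).
    isolate_vs_ref = [
        ("gos1_0001", "ref_A", 950.0),
        ("gos1_0001", "ref_B", 400.0),
        ("gos1_0002", "ref_B", 300.0),
    ]
    ref_vs_isolate = [
        ("ref_A", "gos1_0001", 948.0),
        ("ref_B", "gos1_0001", 410.0),
        ("ref_B", "gos1_0002", 290.0),
    ]
    print(reciprocal_best_hits(isolate_vs_ref, ref_vs_isolate))
```

In the toy data, gos1_0002 is not reported as an ortholog of ref_B because the best hit of ref_B in the isolate is gos1_0001, so the relationship is not reciprocal.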
In order to analyze the presence of prophage regions, the Prophage Finder software has been employed (http://22.214.171.124/~phage/ProphageFinder.php). This web application provides a quick prediction of prophage loci in prokaryotic genome sequences based on BLASTX comparisons to predicted prophage sequences. The contig order of the E. coli GOS1 and GOS2 draft genomes was obtained by comparison to the reference genome of E. coli strain 55989 using the Mauve Multiple Genome Alignment software (Darling et al. 2010).
Whole genome sequence alignments of the different E. coli O104:H4 isolates (GOS1, GOS2, TY-2482, LB226692, H112180280) were done with the MUMmer software tool (Kurtz et al. 2003). Single-nucleotide polymorphism (SNP) analyses were performed using the GS Reference Mapper Software tool (Roche 454). SNPs were filtered using the following criteria: 100% variation frequency, a minimum of tenfold depth within the variation, the variation is located outside a homopolymer region, and each nucleotide exchange is located at least 100 bp away from a contig end. For whole genome comparison, the BiBag software tool (Bidirectional BLAST for the identification of bacterial pan and core genomes, Göttingen Genomics Laboratory, Germany) was applied. Visualization of genomic, plasmid, and phage region comparisons was done with the programs Artemis (Rutherford et al. 2000), ACT (Carver et al. 2005), and DNAplotter (Carver et al. 2009) from the Sanger Institute (http://www.sanger.ac.uk/).
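The four SNP filtering criteria listed above can be expressed as a single predicate. The following Python sketch is a minimal illustration, assuming that each candidate variant has already been reduced to a small record; the `Variant` class, its field names, and the way the distance to the contig end is computed are hypothetical and do not reflect the output format of the GS Reference Mapper. How a homopolymer region is detected is not specified in the text, so that flag is treated as precomputed.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    position: int         # 1-based position on the contig
    contig_length: int    # length of the contig in bp
    frequency: float      # percent of reads supporting the variant
    depth: int            # read depth at the variant position
    in_homopolymer: bool  # True if inside a homopolymer run (precomputed)


def passes_filter(v, min_freq=100.0, min_depth=10, end_margin=100):
    """Apply the four criteria from the text to one candidate SNP."""
    # Number of bases between the variant and the nearer contig end
    # (an assumed interpretation of "at least 100 bp from a contig end").
    dist_to_end = min(v.position - 1, v.contig_length - v.position)
    return (
        v.frequency >= min_freq
        and v.depth >= min_depth
        and not v.in_homopolymer
        and dist_to_end >= end_margin
    )


if __name__ == "__main__":
    # Hypothetical candidates: only the first satisfies all four criteria.
    candidates = [
        Variant(5000, 20000, 100.0, 25, False),
        Variant(50, 20000, 100.0, 25, False),    # too close to contig start
        Variant(5000, 20000, 98.5, 25, False),   # below 100% frequency
        Variant(5000, 20000, 100.0, 9, False),   # below tenfold depth
        Variant(5000, 20000, 100.0, 25, True),   # inside a homopolymer
    ]
    print([passes_filter(v) for v in candidates])
```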
Phylogenetic analysis based on MLST
The phylogenetic tree was calculated according to the Achtman MLST scheme (Wirth et al. 2006), which includes sequences of seven housekeeping genes adk, fumC, gyrB, icd, mdh, purA, and recA. The alleles for these genes were extracted from E. coli GOS1 and GOS2, and 42 completely sequenced E. coli strains. Sequences of the seven housekeeping genes were concatenated, and an alignment was calculated with ClustalW included in MEGA 5.05 (Tamura et al. 2011). The tree was calculated with the Maximum Likelihood method based on the Tamura-Nei model (Tamura and Nei 1993). The bootstrap consensus tree was inferred from 100 replicates. Tree calculation and drawing were done with the software MEGA 5.05 (Tamura et al. 2011). The alleles of the seven housekeeping genes from Escherichia fergusonii ATCC 35469 were used as outgroup.
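The concatenation step that precedes the alignment can be sketched as follows. This Python example joins the seven housekeeping gene alleles of each strain in a fixed locus order and writes them as FASTA records suitable as input for an alignment program. The function names, the toy sequences, and the locus order (taken from the order in which the genes are listed above) are assumptions for illustration; the alignment and tree calculation themselves were done in MEGA and are not reproduced here.

```python
# Loci of the MLST scheme, in the order listed in the text (assumed
# here to be the concatenation order).
MLST_LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def concatenate_alleles(alleles):
    """Join the allele sequences of one strain in the fixed locus order.

    alleles: dict mapping locus name to nucleotide sequence.
    """
    missing = [locus for locus in MLST_LOCI if locus not in alleles]
    if missing:
        raise KeyError("missing loci: " + ", ".join(missing))
    return "".join(alleles[locus].upper() for locus in MLST_LOCI)


def to_fasta(strain_alleles):
    """Return a FASTA string with one concatenated record per strain."""
    records = []
    for strain, alleles in strain_alleles.items():
        records.append(">" + strain + "\n" + concatenate_alleles(alleles))
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Hypothetical two-base "alleles", far shorter than real MLST fragments.
    toy = {
        "adk": "ac", "fumC": "gt", "gyrB": "aa", "icd": "cc",
        "mdh": "gg", "purA": "tt", "recA": "at",
    }
    print(to_fasta({"GOS1": toy, "GOS2": toy}), end="")
```

Keeping the locus order fixed for every strain ensures that homologous positions line up across all records before the alignment is computed.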
We thank Sascha Dietrich for bioinformatic support.
Conflict of interest
The authors declare no conflict of interest.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.