Background

ETEC infections are an important cause of childhood diarrhea resulting in significant morbidity and mortality, primarily among children aged <5 years living in developing countries [1] as well as travelers visiting these countries [2]. ETEC is characterized by the presence of the heat-labile toxin (LT) and/or the heat-stable toxin (ST), both of which are plasmid encoded [3]. The presence of virulence factors such as enterotoxins and colonization factors differentiate ETEC from other categories of diarrheagenic E. coli. [4]. Colonization factors (CFs) enable ETEC bacteria to adhere to the intestinal epithelium [5]. At present more than 25 different CFs have been identified [5]. In addition to the CFs, other putative factors involved in ETEC pathogenesis were also identified, such as EtpA and EatA. EtpA can act as a bridge between the bacterial flagella and host epithelial cells [6] and EatA is a protein of the serine protease autotransporters of the Enterobacteriaceae (SPATE) family [7].

For a long time E. coli H10407 and E24377A were the only two ETEC strains infecting humans that have their genomes completely sequenced together with a draft genome of ETEC strain B7A [8,9]. Recently whole genome sequences of additional draft genomes were published [10]. A comprehensive analysis of 362 ETEC genomes from strains isolated globally over three decades identified that ETEC distribute into several conserved monophyletic lineages that have distrubuted globally [11] . In this study we analysed four additional ETEC strains with the aim to compare additional ETEC isolated in China and Bangladesh with the global collection and to better understand the dissemination of the pathogen. We also included two additional Bangladeshi strains to increase the number of sequenced genomes from Bangladeshi ETEC strains.

Methods

Strain selection

To assess the frequency of ETEC in Beijing, China, we investigated patients presenting with acute watery diarrhea at four hospitals between 2010 and 2011. This research was approved by the Research Ethics Committee of the Institute of Microbiology, Chinese Academy of Sciences. ETEC isolates were recovered after streaking diarrheal samples on to MacConkey agar followed by PCR confirmation for ETEC-specific enterotoxins [12]. In total, 880 cases were enrolled and tested for ETEC but ETEC was only recovered from three cases (0.3%). The two ETEC isolates CE516 and CE549 from China were recovered from stool of patients that tested negative for Vibrio cholerae, Shigella spp and Salmonella spp. CE549 expressed the heat-labile enterotoxin (LT) and the human heat-stabile enterotoxin (STh) in combination with CFs CS2, CS3 and CS21; CE516 expressed LT and CS6, CS8. Antimicrobial susceptibility was determined using the VITEK 2 Gram Negative Susceptibility Test Cards AST-GN04 and AST-GN 13 (Biomerieux, Marcy l’Etoile France). CE549 was resistant to 14 of the 22 antibiotics tested (cefuroxime axetil, sulfamethoxazole, ampicillin, tobramycin, ceftriaxone, aztreonam, piperacillin, cefuroxime, cefazolin, ceftazidime, cefepime, levofloxacin, gentamicin, ciprofloxacin, and extended spectrum beta lacatamase (ESBL) positive), while CE516 showed sensitive to all 22 antibiotics and was ESBL negative.

The two ETEC isolates E1777 and E2265 were collected from adult Bangladeshi patients that sought medical attention for severe diarrhea in hospital facilities in April 2005 and March 2006 during the bi-annual ETEC epidemic peaks in Dhaka, Bangladesh [13]. Stool samples were confirmed to be negative for Vibrio cholerae, Shigella ssp and Salmonella ssp. MacConkey agar plates were used for identification of lactose fermenting E. coli like colonies selection followed by PCR confirmation for ETEC [12]. The strains were further characterized by immunodiagnostic methods for toxins and colonization factors [12]. Both isolates expressed the common virulence factor combination of the enterotoxins heat labile toxin LT and heat stable toxin STh and the CFs CS5 and CS6.

Genome sequencing, assembly and annotation

DNA was extracted from bacterial cells cultured in Luria broth (LB) medium using the DNA Tissue and Blood kit (Qiagen, Duesseldorf, Germany). Genome sequencing work was carried out at the Microbial Genome Research Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing. The genome sequences of each ETEC isolate were generated using paired-end libraries with 350 ~ 400 bp inserts on an Illumina GAIIX (Illumina, San Diego, CA, USA). The detailed methods for genome assembly were described in another paper [14]. Genome sequences were annotated by using Subsystem Technology (RAST) [15]. The functions of predicted protein-coding genes were then annotated through comparisons with the databases of NCBI-NR, and COG. To search the antibiotic resistance genes, the protein-coding sequences were aligned against Antibiotic Resistance Database (ARDB) [16], using similarity thresholds as recommended in ARDB.

Multiple locus sequence typing (MLST)

We used MLST system including the following seven housekeeping genes: adk, fumC, gyrB, icd, mdh, purA, and recA [17], which were extracted from draft genome sequences and were compared to allele profiles in the MLST database (http://mlst.warwick.ac.uk/mlst/dbs/Ecoli/documents/primersColi_html).

Comparative genomics

For comparative genomic analysis, genome sequences of 13 previously reported isolates including Escherichia coli B7A (GenBank accession number NZ_AAJT02000001.1), E24377A (NC_009786.1), H10407 (NC_017633.1), IAI39 (NC_011750.1), O127 H6 E2348 69 (NC_011601.1), O157 H7 EC4115 (NC_011350.1), O157 H7 EDL933 (NC_002655.2), O157 H7 TW14359 (NC_013008.1), O157 H7 Sakai (NC_002127.1), SMS 3 5 (NC_010485.1), TW10598 (NZ_AELA01000001.1), TW10722 (NZ_AELB01000001.1), and TW10828 (NZ_AELC01000001.1) were downloaded from the NCBI website (Table 1). Multiple sequence alignments of Escherichia coli genomes were performed with Mugsy [18]. The trees were constructed based on core SNPs (single nucleotide polymorphisms) from whole genome alignment by using the maximum-likelihood method in Phylogeny Inference Package (http://evolution.genetics.washington.edu/phylip.html). The map of ORF comparisons among E. coli genomes was constructed using Circos [19].

Table 1 Reference strains used for this study

Quality assurance

The genomic DNA was isolated from pure bacterial isolate and was further confirmed with 16S rRNA gene sequencing. Bioinformatic assessment of potential contamination of the genomic library by allochthonous microorganisms was done using PGAAP and RAST annotation system.

Initial findings

Genome characteristics

Through genome assembly, we obtained 99 scaffolds of 5,068,634 bp for CE516, 137 scaffolds of 4,859,890 bp for CE549, 150 scaffolds of 5,117,746 bp for E1777, and 142 scaffolds of 4,946,932 bp for E2265 (Table 2). RAST annotation of the whole genome indicated the presence of 611, 590, 605, and 605 SEED subsystems in CE516, CE549, E1777, and E2265, respectively. Table 3 shows the comparison of genomic features of the four sequenced ETEC genomes.

Table 2 Genomic characteristics of the 4 ETEC genomes
Table 3 Comparisons of subsystem features among the 4 ETEC genomes

Phylogenetic analysis

A maximum-likelihood tree of the sequenced 4 genomes and 13 publicly available Escherichia coli complete genomes which represent the classical phylogenetic groups (A, B1, B2, D, and E) were created based on core SNPs from whole genome alignment (Figure 1). The sequenced strains in this study grouped with the classical Escherichia coli phylogenetic groups A and B1. Specifically, strains CE549, H10407 and TW10598 which belong to group A were grouped together, while other sequenced strains which belong to group B1 as well as the previously sequenced strains formed a clade. Strains CE549 and TW10598 are closely related to each other, while strains E1777 and E2265 are closely related to each other. MLST analysis was used to compare the strains to a global collection of ETEC [11]. Three strains were found to belong to the major lineages described in ETEC [11]. Strains E1777 and E2265 belong to the global lineage L5 which express LT STh CS5 + CS5, while strain CE156, the multi drug-resistant isolate belongs to the conserved ETEC lineage L2 that is distributed globally [11]. The Chinese strain CS516 belonged to a MLST type previously identified in Bangladeshi and Egyptian ETEC strains [11].

Figure 1
figure 1

Phylogenetic relationships of E. coli strains based on SNPs from whole genome sequences. The trees were constructed by the maximum-likelihood method. Scale bar indicates nucleotides substitutions per site.

Genomic variants among ETEC strains

We compared proteins from the 4 draft genomes and 6 references within groups A and B1 with that from H10407 using BLASTP and revealed many large variable regions (VR1 to VR10) (Figure 2). Among these VRs, VR3 and VR10 (regions of 5,072 to 5,121 kb) were predicted to be prophage loci which were highly variable among all strains. Interestingly, all strains within group B1 lack VR7 gene cluster encoding general secretory pathway associated genes. In addition, region 2,405 to 2,414 kb adjacent to VR4, which encoded ribitol metabolism related genes, was presented within group A but not detected within group B1.

Figure 2
figure 2

ORF comparisons of E. coli genomes. Proteins from the 4 genomes and 6 references within groups A and B1 were aligned using H10407 as a reference. Track shows a plot of G + C contents. Circles from inside to outside are the BLASTP percent identities of H10407 against ORFs of H10407, TW10598, CE549, TW10722, E1777, E2265, E24377A, CE516, TW10828, B7A. Red is 90–100% identity, yellow is 60–89% identity, blue is 0–59% identity.

Virulence factors

The strains were analyzed for presence of known ETEC virulence factors. Strains E1777, E2265, and CE549 contained both LT and ST genes (Table 4). The ST structural gene (estA) was present in all strains except in strain CE516, while the LT structural gene (eltA) was present in all four genomes. In addition, genes clyA (cytolysin), eatA (serine protease autotransporter), and ecpA (pilus subunit) were also present in all of the 4 ETEC strains, but genes leoA (accessory protein for LT secretion), tibA (autotransporter), and tia (surface protein) were absent in all genomes. Only CE549 contained the complete ~14 kb operon encoding longus known as a type IV pilus [20]. The etpA gene, which mediates adhesion between ETEC flagella and host cells [6], was present only in CE549 but absent in other strains. These specific virulence factors present in CE549 may increase its virulence in humans, but their functional effects remain to be further determined.

Table 4 Virulence factors present or absent in the 4 ETEC genomes

Antibiotic resistance genes

We compared all the protein-coding genes from the 4 ETEC strains with known antibiotic resistance genes [16] and found many kinds of antibiotic resistance genes, such as macrolide, tetracycline, fosmidomycin and polymyxin resistance genes (Table 5), most of which were annotated as Multidrug resistance efflux pump. Interestingly, strain CE549 has two tetracycline resistance genes that were not identified in the other 3 isolates. In addition, different resistance genes profiles were found between ETEC strains from different countries. For instance, the resistant type EmrE was only identified in the two strains isolated from China.

Table 5 Putative antibiotic resistance genes in the 4 ETEC strains determined using the antibiotic resistance genes database

Future directions

This study analyzed the prevalence of ETEC in Beijing, China and it was found that ETEC is not common. However the results reveal for the first time to our knowledge that a strain that belong to the globally distributed ETEC lineage L2 is multi resistant. This might have important implications for transmission of multi resistant ETEC strains as well as treatment of ETEC diarrhea and needs to be further addressed. The Chinese genomes presented here together with the two novel Bangladeshi ETEC genomes, will be valuable for future comparative genomic analysis of ETEC and will aid in molecular characterization of this important diarrheal pathogen.

Availability of supporting data

The genome sequences of ETEC strains CE516, CE549, E1777 and E2265 reported in this paper have been deposited in the GenBank under accession numbers JTGM00000000, JTGK00000000, JTHI00000000 and JUBB00000000, respectively.