Genetic makeup of Shiga toxin-producing Escherichia coli in relation to clinical symptoms and duration of shedding: a microarray analysis of isolates from Swedish children

Shiga toxin (Stx)-producing Escherichia coli (STEC) cause non-bloody diarrhea, hemorrhagic colitis, and hemolytic uremic syndrome, and are the primary cause of acute renal failure in children worldwide. This study investigated the correlation between the genetic makeup of STEC strains, as revealed by DNA microarray, and clinical symptoms and duration of STEC shedding. All STEC isolated (n = 96) from patients <10 years of age in Jönköping County, Sweden, from 2003 to 2015 were included. Isolates were characterized by a DNA microarray covering almost 280 genes. Clinical data were collected through a questionnaire and by reviewing medical records. Of the 96 virulence genes (including stx) in the microarray, 62 were present in at least one isolate. Statistically significant differences in prevalence were observed for 21 genes when comparing patients with bloody diarrhea (BD) and with non-bloody stool (18 of 21 associated with BD). Most of these genes encode toxins (e.g., stx2 alleles, astA, toxB), adhesion factors (i.e., espB_O157, tir, eae), or secretion factors (e.g., espA, espF, espJ, etpD, nleA, nleB, nleC, tccP). Seven genes were associated with prolonged stx shedding: the presence of three genes (lpfA, senB, and stx1) and the absence of four genes (espB_O157, espF, astA, and intI1). We found STEC genes that might predict severe disease outcome already at the time of diagnosis. This could be used to develop diagnostic tools for risk assessment of disease outcome. Furthermore, genes associated with the duration of stx shedding were detected, which may enable better prediction of the length of STEC carriage after infection.


Introduction
Shiga toxin (Stx)-producing Escherichia coli (STEC) cause non-bloody diarrhea (NBD), bloody diarrhea (BD), hemorrhagic colitis (HC), and hemolytic uremic syndrome (HUS), and are the most common cause of acute renal failure in children worldwide [1][2][3][4]. A large variety of STEC circulate in cattle; however, only a subset of them seem to cause disease in humans. Human STEC isolates are also designated enterohemorrhagic E. coli (EHEC), and O157:H7 is the predominant serotype responsible for outbreaks worldwide [5,6]. Diagnostics have therefore focused mainly on EHEC O157; however, non-O157 serotypes, such as O26, O103, O111, and O145, also contribute significantly to cases of diarrhea, HC, and HUS [6]. In 2011, a large outbreak of E. coli O104:H4 in Germany caused HUS in more than 800 patients and 53 deaths [7][8][9]. It is recommended that samples from patients with suspected STEC infection be tested as soon as possible after symptom onset [10]. Rapid STEC detection is also important in outbreak management and patient treatment, including prompt parenteral hydration, monitoring for development of severe disease, and avoidance of antibiotics and antidiarrheal agents, which can exacerbate disease [11].
Stx is the most important virulence factor in EHEC/STEC. It is divided into two major types, Stx1 and Stx2, of which Stx2 is responsible for most cases with severe symptoms. Both types are further divided into subtypes: Stx1 comprises three subtypes (stx1a, stx1c, and stx1d), whereas Stx2 comprises seven (stx2a, stx2b, stx2c, stx2d, stx2e, stx2f, and stx2g) [12]. Stx is not solely responsible for the pathogenesis of STEC; several other virulence factors, encoded by genes carried on mobile genetic elements, also play a role [13,14]. A major virulence factor is the outer membrane protein intimin, encoded by eae, which is thought to be the key determinant in the formation of attaching and effacing (A/E) lesions. Additional virulence markers are enterohemolysin and, in strains lacking eae, an autoagglutinating adhesin (Saa). Furthermore, Ferdous et al. recently showed that isolates carrying virulence genes encoding type III secretion proteins and adhesins were associated with HC, and that these isolates came from diverse phylogenetic backgrounds [15]. In addition, de Boer et al. proposed a diagnostic algorithm, applied directly to stool samples from patients presenting with gastrointestinal symptoms, to assess the public health risk of STEC infection [16].
A variety of methods have been used to classify STEC. Karmali et al. introduced seropathotypes to assess the pathogenic potential of STEC on the basis of their reported frequencies in human illness [17]. In addition, specific genetic lineages of STEC, such as clade 8 of EHEC O157:H7, appear more prone to cause severe disease [18]. Analysis of single nucleotide polymorphisms (SNPs), which is also useful in outbreak investigations, can resolve closely related bacterial genotypes and help establish associations between bacterial genetic makeup and disease severity [18].
Long-term STEC shedding, persisting well after symptoms have resolved, has been reported. The median duration of shedding has been shown to be 20 days; however, some patients remained stx PCR-positive for up to 9 months [19,20]. This can have major implications for families; for instance, in Sweden, at least one stx-negative stool sample is legally required before a child is allowed to return to kindergarten. A recent study found no association between duration of shedding and either stx type or the presence of intimin [19]. The ability to predict the duration of shedding from STEC features would enable optimized infection control measures.
DNA-microarray-based genotyping of STEC allows simultaneous detection of hundreds of different genes, including virulence, typing, and resistance markers. This information can also be used to assign isolates to lineages and determine the genetic relationships between them, comparable to the SNP analysis mentioned above, as well as to discriminate between stx subtypes [21,22].
The aim of this study was to characterize, by microarray analysis, STEC isolated from children in Region Jönköping County, Sweden, from May 2003 through January 2015, and to correlate the genetic makeup of the strains with clinical symptoms and duration of STEC shedding. This could facilitate assessment of the public health risk posed by a strain when managing infected patients, and support optimized treatment regimens and infection control guidelines.

Patients and isolation of bacteria
Of the 215 children <10 years of age in Jönköping County (approximately 330,000 inhabitants, served by three hospitals and 46 health care centers) with STEC-PCR-positive samples from May 2003 through January 2015, all with a cultivable primary STEC isolate (n = 96) were included. STEC were detected by real-time PCR for stx performed on suspensions of overnight cultures on blood agar plates [23]. PCR-positive specimens were sent to the Karolinska University Laboratory, Stockholm, Sweden, for confirmatory testing and isolation of STEC according to the methods described by Svenungsson et al. [24]. Patients were sampled weekly until they were stx-PCR-negative, and the duration of stx shedding was defined as the time from the first positive sample to the first negative sample, as previously described [19].
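The shedding-duration definition above (first positive to first negative sample) can be expressed as a small calculation. The sketch below is illustrative only: it assumes each patient's results are available as (sampling date, stx PCR result) pairs, and the function name shedding_duration_days and the example dates are hypothetical, not study data. A patient with no negative sample after the first positive one is returned as None, i.e. still shedding at last follow-up.

```python
from datetime import date
from typing import List, Optional, Tuple


def shedding_duration_days(samples: List[Tuple[date, bool]]) -> Optional[int]:
    """Days from the first stx-positive sample to the first stx-negative
    sample collected after it (hypothetical helper, not the study's code).

    samples: (sampling date, True if stx PCR-positive) pairs, any order.
    Returns None if there is no positive sample or no later negative one.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    first_pos = next((d for d, pos in ordered if pos), None)
    if first_pos is None:
        return None
    first_neg = next((d for d, pos in ordered if not pos and d > first_pos), None)
    if first_neg is None:
        return None  # still shedding at last follow-up (censored)
    return (first_neg - first_pos).days


# Hypothetical weekly sampling of one patient (not study data)
patient = [
    (date(2010, 6, 1), True),
    (date(2010, 6, 8), True),
    (date(2010, 6, 15), True),
    (date(2010, 6, 22), False),
]
print(shedding_duration_days(patient))  # 21
```

With weekly sampling, durations computed this way are resolved only to the nearest week.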

Isolate characterization
In Sweden, all STEC isolates are submitted to the Public Health Agency of Sweden for confirmation and further typing as part of the national microbial surveillance program. At the Agency, isolates were serogrouped (O-type) by agglutination in microtiter plates using antisera (SSI Diagnostica, Copenhagen, Denmark).
Clinical data were collected from all patients through a questionnaire and by reviewing medical records (Table 1). Clinical symptoms (diarrhea, bloody diarrhea, abdominal pain, vomiting, fever, and HUS) were combined into a clinical symptom score ranging from 0 to 6, where 6 corresponded to the most severe presentation. HUS was defined by three primary criteria: hemolytic anemia with fragmentocytes, low platelet count, and acute renal failure with a creatinine level above the age-specific reference range.
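The symptom score described above can be sketched as a count over the six listed symptoms. This assumes one point per symptom present, which is consistent with a 0 to 6 range over six symptoms but is not stated explicitly in the text; the field names are hypothetical. The dichotomization into low (0-3) and high (4-6) scores follows the comparison reported in the Results.

```python
from typing import Dict

# ASSUMPTION: one point per symptom present; field names are hypothetical.
SYMPTOMS = ("diarrhea", "bloody_diarrhea", "abdominal_pain",
            "vomiting", "fever", "hus")


def symptom_score(record: Dict[str, bool]) -> int:
    """Count how many of the six scored symptoms are recorded as present."""
    return sum(1 for s in SYMPTOMS if record.get(s, False))


def score_group(score: int) -> str:
    """Dichotomize as in the Results: low (0-3) versus high (4-6)."""
    return "high" if score >= 4 else "low"


# Hypothetical patient record (not study data)
child = {"diarrhea": True, "bloody_diarrhea": True,
         "abdominal_pain": True, "vomiting": True, "fever": False}
s = symptom_score(child)
print(s, score_group(s))  # 4 high
```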

Statistical analyses
The χ2 test and Fisher's exact test were used to compare categorical data, using Statistica 12 (StatSoft, Inc., Tulsa, OK) and VassarStats (http://vassarstats.net). The Mann-Whitney U test was used to compare continuous data, using Statistica (StatSoft, Inc.). Kaplan-Meier survival analyses were performed using Statistica 12 (StatSoft, Inc.). Significant findings from univariate logistic regression were further explored in multiple logistic regression models using a forward stepwise approach (Statistica v.13.1, StatSoft, Inc.). Because of the small sample size, no more than ten genes were analyzed in each multiple logistic regression model. P < 0.05 was considered statistically significant in all analyses. A minimum spanning tree (MST) based on the 62 virulence genes present in at least one isolate was constructed using BioNumerics v. 6.1 (Applied Maths, Sint-Martens-Latem, Belgium).
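The study used Statistica and VassarStats for Fisher's exact test. As an illustration of what the test computes for one gene (e.g. gene present/absent versus BD/non-bloody stool), the sketch below is a standard-library implementation of the two-sided test under one common convention: sum the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table. Other tools may define the two-sided p-value differently; the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. rows = gene present/absent, columns = BD/non-bloody stool.

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # probability of x in the top-left cell given the fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are kept
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts (not study data)
print(round(fisher_exact_two_sided(1, 9, 11, 3), 4))  # 0.0028
```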

Results
Patient data, stx subtypes, and serogenotype distribution
Patient data and clinical symptoms are presented in Table 1. In total, 79 patients had diarrhea, and 17 of these developed BD (one of whom also developed HUS). Fourteen patients had none of the six symptoms included in the clinical symptom score; most of them were sampled during contact tracing, and the remainder because of other symptoms.
The concordance between agglutination and serogenotyping with regard to O-type was 83%. One isolate was assigned to different O-serotypes by the two methods. Twelve isolates by serogenotyping and eight by agglutination were not assigned an O-serotype. The O-serogenotype was considered the true serotype when available; otherwise, the serotype determined by agglutination was used. Serogenotyping always assigned an H-serotype.
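The O-type resolution rule above, and one possible way of computing concordance, can be sketched as follows. The resolution rule is taken from the text; the concordance definition is an ASSUMPTION (the text does not say how non-typeable isolates were counted), and the function names and isolates are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def resolve_o_type(serogenotype: Optional[str], agglutination: Optional[str]) -> Optional[str]:
    """Rule from the text: use the O-serogenotype when available,
    otherwise the O-type from agglutination; None if neither assigned one."""
    return serogenotype if serogenotype else agglutination


def o_type_concordance(pairs: Sequence[Tuple[Optional[str], Optional[str]]]) -> float:
    """ASSUMED definition: share of isolates for which both methods assigned
    the same O-type; isolates untyped by either method count as non-concordant."""
    agree = sum(1 for s, a in pairs if s is not None and s == a)
    return agree / len(pairs)


# Hypothetical isolates as (serogenotype, agglutination) pairs (not study data)
isolates = [("O157", "O157"), ("O26", None), (None, "O103"), ("O145", "O145")]
print([resolve_o_type(s, a) for s, a in isolates])  # ['O157', 'O26', 'O103', 'O145']
print(o_type_concordance(isolates))  # 0.5
```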

Microarray analysis
Analysis of the 96 virulence genes (including stx) in the microarray revealed that 62 genes (65%) were present in at least one isolate. Statistically significant differences (χ2 and Fisher's exact test) in the prevalence of these genes (Tables 3 and 4) were observed for:
• 21 genes when comparing patients with BD and without bloody stool (18 of 21 associated with BD);
• nine genes when comparing patients aged 0 to 4 with those aged 5 to 10 (six of nine associated with lower age);
• two genes when comparing short (<3 weeks) with long (≥3 weeks) carriage (both associated with longer stx shedding);
• 16 genes when comparing low (0-3) with high (4-6) total clinical symptom scores (ten of 16 associated with a high score).

Duration of stx shedding in feces
The median length of carriage was 24 days (Table 1); on this basis, carriage was categorized as short (<3 weeks) or long (≥3 weeks). Fisher's exact test revealed two genes (lpfA, senB) associated with long duration of stx shedding in feces. Kaplan-Meier survival analysis, with the duration of stx shedding as the time variable, was used to determine how genetic makeup related to the duration of shedding, and revealed seven genes associated with prolonged stx shedding: the presence of three genes (lpfA, senB, and stx1) and the absence of four genes (espB_O157, espF, astA, and intI1) predicted long duration of shedding.
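The Kaplan-Meier analysis above was run in Statistica. To show what the product-limit estimator computes for shedding data, the sketch below estimates the proportion of patients still shedding over time, treating a negative sample as the event and patients without one as censored. It is a standard-library sketch, not the study's code; the function names and the two groups (e.g. carriers and non-carriers of a gene) use hypothetical durations, and no between-group test is included.

```python
from collections import Counter
from typing import List, Optional, Tuple


def kaplan_meier(durations: List[int], ended: List[bool]) -> List[Tuple[int, float]]:
    """Product-limit estimate of the proportion of patients still shedding.

    durations: days of stx shedding (or follow-up) per patient
    ended: True if a negative sample was observed (event), False if censored
    Returns (day, S(day)) at each day on which at least one event occurred.
    """
    events = Counter(t for t, e in zip(durations, ended) if e)
    leaving = Counter(durations)      # events plus censorings at each time
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]         # everyone whose follow-up ends at t leaves the risk set
    return curve


def median_duration(curve: List[Tuple[int, float]]) -> Optional[int]:
    """First day on which the estimated proportion still shedding is <= 0.5."""
    return next((t for t, s in curve if s <= 0.5), None)


# Hypothetical durations in days (not study data)
with_gene = ([14, 21, 28, 35, 42], [True] * 5)
without_gene = ([7, 7, 14, 14, 21], [True] * 5)
print(median_duration(kaplan_meier(*with_gene)),
      median_duration(kaplan_meier(*without_gene)))  # 28 14
```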

Multiple logistic regression
When comparing patients with BD and with non-bloody stool, univariate logistic regression revealed statistically significant differences for the same genes as identified by the χ2 and Fisher's exact tests (Tables 3 and 4), with the exception of lpfA. The genes were assembled into the following groups: adhesins (eae and espB_O157), secretion systems (espA, espF, espJ, etpD, nleA, nleB, nleB O157:H7, nleC, tccP), toxins (astA and toxB), and Shiga toxins (stx1, stx2, stx2a, stx2c), and each group was analyzed separately using forward stepwise logistic regression. The univariate statistically significant genes in the miscellaneous group (iss, katP, and tir) were not analyzed by multiple logistic regression because of their heterogeneity. espB_O157, espF, nleC, astA, stx2a, and stx2c remained significant in the multiple logistic regression model. Of the eight genes (espB_O157, espF, etpD, nleB O157:H7, stx2, stx2a, stx2c) associated with a high total clinical symptom score in univariate logistic regression, espF and stx2a remained significant in the multiple logistic regression model. When comparing younger and older patients, univariate logistic regression revealed statistically significant differences for the same nine genes as identified by the χ2 and Fisher's exact tests (Tables 3 and 4); among these, only ireA remained significant in the multiple logistic regression model. A single gene (lpfA) was statistically significant in the univariate logistic regression model for duration of shedding; hence, no multiple logistic regression model was applied.
Fig. 1 Distribution of serotypes among STEC causing BD and NBS (ONT, O-serotype non-typeable)
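For a single binary gene, the univariate logistic regression step described above has a closed form: the maximum-likelihood slope equals the log odds ratio of the 2x2 table. The sketch below uses that identity with a Wald test; it is an illustration only, since the study fitted its models in Statistica and does not state which test statistic was used. Function name and counts are hypothetical, and a zero cell (which makes the estimate infinite) raises an error rather than being corrected.

```python
from math import erfc, exp, log, sqrt
from typing import Tuple


def univariate_logit_binary(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Univariate logistic regression of a binary outcome on one binary gene.

    a: gene present, outcome present   b: gene present, outcome absent
    c: gene absent,  outcome present   d: gene absent,  outcome absent

    For a single binary predictor the maximum-likelihood slope equals the
    log odds ratio of the 2x2 table, with standard error sqrt(1/a+1/b+1/c+1/d).
    Returns (odds ratio, two-sided Wald p-value).
    """
    if min(a, b, c, d) == 0:
        # a zero cell makes the ML estimate infinite (separation)
        raise ValueError("zero cell: odds ratio not estimable")
    beta = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = erfc(abs(beta / se) / sqrt(2))  # P(|Z| > |z|) for a standard normal Z
    return exp(beta), p


# Hypothetical counts (not study data)
odds_ratio, p = univariate_logit_binary(10, 5, 5, 10)
print(round(odds_ratio, 2), round(p, 3))
```

Here the odds ratio is 4, with a Wald p-value between 0.05 and 0.1.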

Analysis of similarity
The analysis of similarity (MST) included the 62 virulence genes (including stx) present in at least one isolate. Serogenotypes and antibiotic resistance genes were excluded from the data used to construct the trees. For descriptive purposes, the isolates in the MSTs (Figs. 2 and 3) were divided into three groups (A-C, shown only in Fig. 2). In group A, all strains contained an stx2 subtype and eae; in group B, all strains contained eae and 33% contained an stx2 subtype. In group C, stx1 dominated, 30% of strains contained stx2b, and none contained eae. Group A was significantly associated with BD (p = 0.0004) and with a high symptom score (p = 0.0007).
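The MSTs above were built in BioNumerics. To illustrate the general idea, the sketch below merges isolates with identical presence/absence profiles into one node (node size = number of isolates, as in the figure legends) and connects the nodes with Prim's algorithm. It ASSUMES a plain Hamming distance (number of genes differing) and name-based tie-breaking; BioNumerics applies its own distance settings and priority rules, so its trees may differ. Function names and profiles are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def hamming(p: str, q: str) -> int:
    """Number of genes whose presence/absence differs between two profiles."""
    return sum(x != y for x, y in zip(p, q))


def mst_of_types(profiles: List[str]) -> Tuple[Counter, List[Tuple[str, str, int]]]:
    """Prim's algorithm over unique gene profiles ('1' present, '0' absent).

    Identical profiles are merged into one node ("type") whose size is the
    number of isolates sharing it; edge weights are Hamming distances.
    """
    types = Counter(profiles)
    nodes = sorted(types)
    if not nodes:
        return types, []
    tree, remaining, edges = [nodes[0]], set(nodes[1:]), []
    while remaining:
        # cheapest edge from the tree to a node outside it (ties broken by name)
        u, v, w = min(((u, v, hamming(u, v)) for u in tree for v in remaining),
                      key=lambda e: (e[2], e[0], e[1]))
        tree.append(v)
        remaining.remove(v)
        edges.append((u, v, w))
    return types, edges


# Hypothetical 4-gene profiles for five isolates (not study data)
types, edges = mst_of_types(["1100", "1100", "1110", "0011", "0011"])
print(dict(types))  # {'1100': 2, '1110': 1, '0011': 2}
print(edges)  # [('0011', '1110', 3), ('1110', '1100', 1)]
```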

Antibiotic resistance gene content in STEC
Genes associated with resistance to aminoglycosides, beta-lactams, chloramphenicol, macrolides, quinolones, sulphonamides, and trimethoprim were detected at low prevalence among the STEC isolates. The most common antibiotic resistance genes were sul2, strB, and dfrA12 (present in 15, 14, and 13 isolates, respectively, and encoding sulphonamide, streptomycin, and trimethoprim resistance, respectively).

Discussion
In this study, we found various STEC genes associated with severe disease and long duration of stx shedding.
The microarray analyses revealed 62 virulence genes present in at least one of the isolates; of these, 18 were associated with severe disease (BD and one HUS case) and two with milder symptoms. Among the major virulence factors, stx2 and eae were associated with severe disease, in line with previous findings [27][28][29]. All severe STEC cases carried subtype stx2a and/or stx2c or, alternatively, stx1a. Multiple logistic regression models also showed that stx2a was associated with severe disease (BD and high total clinical symptom score). The subtypes stx2a and stx2c are known to cause severe outcomes such as HC and HUS in STEC-infected patients [30]. In contrast, stx1 and its subtypes have previously been associated with milder disease [12,29,31]. The MST cluster analysis showed three groups: A, B, and C (Fig. 2). Group A was significantly associated with BD; all of these strains contained stx2 subtypes in combination with eae. This combination of virulence genes has also been shown elsewhere to be predictive of severe disease [12]. Most strains in group B contained a subtype of stx2, and more than 60% of them harbored eae. In group C, all strains contained stx1 or stx2b and lacked eae. stx1 subtypes and stx2b have only rarely been associated with severe disease in previous studies [12].
Table 4 Prevalence of genes where significant differences were observed in at least one of the comparisons (length of shedding and symptom score)
Our data indicate a close relationship between the genetic makeup of strains and their propensity to cause BD rather than non-bloody stool (NBS). This has recently also been shown for EHEC O157 by typing based on 96 specific SNPs: nine E. coli O157:H7 clades were defined, and clade 8 strains were associated with most cases of severe disease, including HUS [18]. In addition to the major known virulence factors described above, 13 additional STEC genes were associated with BD, of which four (espB_O157, espF, nleC, astA) remained associated with BD after multiple logistic regression. Most of these genes encode toxins (e.g., astA, toxB), adhesion factors (i.e., espB_O157, tir), or secretion factors (e.g., espA, espF, espJ, etpD, nleA, nleB, nleC, tccP). The majority of these genes were also associated with BD in a recent Dutch study [15]. Furthermore, a recent diagnostic algorithm applied directly to fecal samples to assess the public health risk of STEC infection showed that six of the 13 genes described above (toxB, espA, tccP, nleA, nleB, and tir) predict severity [16]. In addition, Buvens et al. found that the individual genes stx2, eae, espP, sen, nleB, nleE, and the efa cluster were significantly more often present in non-O157 STEC associated with HUS [32]. STEC O157 is generally considered to cause more severe infections than non-O157 STEC [33,34]; however, this has recently been questioned [35]. Our data, in which severe cases (i.e., BD) were caused by STEC of five different serotypes, also indicate that serotype alone is not sufficient to predict the severity of infection. Instead, the results indicate that disease severity could possibly be predicted from STEC genetic makeup. In our study, we found two genes (astA and katP) associated with BD and one gene (iss) associated with NBS that were not identified in the studies by de Boer et al. and Ferdous et al. [15,16].
Fig. 2 MST based on microarray results of 62 virulence genes for all STEC isolates. eae is present in all strains in groups A and B but in none of the strains in group C. In group A, all strains have an stx2 subtype, and in group B 33% do. In group C, stx1 dominates and 30% harbor stx2b. All strains with serotype O157:H7 are found in group A, and those with O26:H11 in group B. Light grey represents patients with BD and dark grey patients with NBS. The length of the branches is proportional to the distance between the types. The size of the nodes represents the number of isolates included in that type.
Fig. 3 MST based on microarray results of 62 virulence genes for all STEC isolates. Dark blue represents patients with a total clinical symptom score of 0, light blue 1, purple 2, yellow 3, orange 4, and red 5. The length of the branches is proportional to the distance between the types. The size of the nodes represents the number of isolates included in that type.
Assessing STEC virulence from the genetic makeup of the infecting strain could provide community health services with valuable information for estimating the level of action required (with regard to source and contact tracing and infection control measures to minimize secondary transmission) to address the potential public health risk [30]. Rapid STEC detection has previously been described as important in outbreak management and patient treatment [11]; in line with this, the microarray procedure can be performed in a single day, making it a potential tool for risk assessment. The clinical outcome in patients infected with a given STEC strain varies partly because of differences in, for example, age, immunity, ingested dose, and antimicrobial treatment. This may limit conclusions about virulence factors associated with disease severity. Nevertheless, the number of cases and the statistical analysis in this study support the relevance of the findings, some of which have also been described by others [15,16].
Fisher's exact test revealed two genes (lpfA, senB) associated with long duration of stx shedding in feces; lpfA encodes fimbriae, and senB encodes a toxin. In addition, Kaplan-Meier analyses revealed seven genes associated with prolonged shedding: the presence of three genes (lpfA, senB, and stx1) and the absence of four genes (espB_O157, espF, astA, and intI1) predicted long duration of stx shedding. Our data indicate that the presence of genes coding for fimbriae (lpfA) and two toxins (senB and stx1), combined with the absence of genes coding for a toxin (astA), a secretion factor (espF), an adhesion molecule (espB_O157), and an integron (intI1), could predict long duration of stx shedding. To the best of our knowledge, an association between the presence or absence of specific STEC virulence factors and prolonged stx shedding in humans has not been shown before. In the future, this could be used to predict the duration of shedding and to optimize, or individualize, infection control measures to minimize secondary transmission of STEC. Furthermore, a multiplex real-time PCR performed directly on feces to detect genes associated with long duration of STEC shedding could be developed based on these findings. However, the relevance of these genes in predicting long duration of shedding needs further investigation.
The distribution of serotypes in the present study was very similar to national and European STEC data (http://www.sva.se/globalassets/redesign2011/pdf/om_sva/publikationer/surveillance-2015-w.pdf, http://ecdc.europa.eu/en/publications/publications/food-and-waterborne-diseases-surveillance-report-2015.pdf). The concordance between traditional serotyping and serogenotyping was 83% for O-serotypes; agglutination was not used to determine the H-serotype. Together, these data indicate that serogenotyping can be used to determine the serotype, as also shown in previous studies [25,36].
In conclusion, we found STEC genes that might predict severe disease outcome already at the time of diagnosis. These could be used to develop diagnostic tools for risk assessment of disease outcome and of public health risk, including in the food chain. Furthermore, we identified genes potentially associated with the duration of stx shedding, which may enable better prediction of the length of STEC carriage after infection.