Introduction

Urinary tract infection (UTI) is one of the most common infectious diseases in humans and a major cause of morbidity. It is estimated that 40–50% of healthy adult women have experienced at least one UTI episode (Foxman 2002). UTI can be caused either by pathogenic strains leading to symptomatic UTI or by asymptomatic bacteriuria (ABU) strains resulting in a symptom-free carriage resembling commensalism. Escherichia coli is responsible for more than 80% of all UTIs. Acute pyelonephritis is a severe acute systemic infection caused by uropathogenic E. coli (UPEC) clones with virulence genes clustered on “pathogenicity islands” (PAIs) (Eden et al. 1976; Funfstuck et al. 1986; Stenqvist et al. 1987; Orskov et al. 1988; Johnson 1991; Welch et al. 2002). Paradoxically, a large proportion of UTIs are caused by ABU E. coli. Individuals infected with ABU-class E. coli may carry high urine titres of a single E. coli strain for months or years without provoking a host response.

Escherichia coli 83972 is a prototype ABU strain and undoubtedly the best-characterised ABU-class E. coli to date. Strain 83972 was originally isolated in the 1970s from a young girl who had carried it for at least 3 years without symptoms (Lindberg et al. 1975; Andersson et al. 1991). The strain is well adapted for growth in the human urinary tract (UT) where it establishes long-term bacteriuria (Hull et al. 2000). It has been used for prophylactic purposes in numerous studies; as such it has been used as an alternative treatment in patients with recurrent UTI who are refractory to conventional therapy (Hull et al. 2000). An ongoing study on patients infected with strain 83972 has so far reported over 50 patient years with no serious side effects (Sundén et al. 2006).

ABU patients may carry a single strain for months or years, creating a condition that resembles commensalism, but with a strain that may have evolved from a pathogenic ancestor. Several lines of evidence support the notion that the ancestor of strain 83972 was a pyelonephritic UPEC strain; it belongs to the B2 clonal group, a group associated with pyelonephritis and other extra-intestinal invasive clinical syndromes such as bacteremia, prostatitis and meningitis; the strain also contains gene clusters in various stages of erosion encoding the three UPEC-class fimbriae, i.e. the fim, pap and sfa/foc clusters (Klemm et al. 2006; Roos et al. 2006a).

Several studies have investigated the virulence characteristics of uropathogenic E. coli (UPEC) isolates and ABU isolates, in order to gain a better understanding of how some UTI strains can cause severe disease, while others can be used prophylactically to prevent the same (Blanco et al. 1996; Dobrindt et al. 2003; Vranes et al. 2003; Johnson et al. 2005; Marrs et al. 2005). ABU strains have been shown to lack many of the virulence-associated phenotypes; many of them are nonhaemolytic, nonadherent and lack haemagglutination ability (Vranes et al. 2003). Strain 83972 lacks many of the virulence-associated phenotypes but has been shown to carry many of the virulence-associated genes, such as kps, iutA, fyuA and malX (Dobrindt et al. 2003). However, apart from the fimbrial clusters, the strain has not been sequenced and it is not known which genes it shares with other E. coli isolates, which of the genes in the E. coli “core genome” it carries and which genes it shares with other UTI isolates.

Thus far, although a number of UPEC isolates (i.e. CFT073, 536, UTI89 and F11) have been completely sequenced, no genomic sequencing of any ABU strain has been reported. Comparative genomics profiling using microarray chips designed to cover entire genomes is one strategy to obtain information about variability between different strains of the same species and indication of horizontal gene transfer (Willenbrock et al. 2006). DNA microarray-assisted functional genomics provides the global expression profile of a strain, revealing which genes are expressed under certain conditions. Global gene expression profiling of ABU strain 83972 employing the GeneChip E. coli Genome 2.0 Array (Affymetrix), containing four E. coli genomes including that of the UPEC isolate CFT073, will not only provide information regarding the up- and down-regulation of genes comparing different conditions, but will also reveal which genes are actually present (expressed) in the genome of 83972. Bacterial pathogens differ from commensals by expression of specific virulence factors such as those that mediate histological damage. Commensals, in contrast, have generally been regarded as bacteria lacking such virulence factors or other specific mechanisms for interaction with host tissues. Here we compare the global expression profiles of E. coli ABU strain 83972 grown under a number of different in vitro conditions and in three patients in order to get a representative picture of which genes are present/expressed in the genome of this asymptomatic UTI strain.

Materials and methods

Bacterial strain

Escherichia coli 83972 is a prototype ABU strain and lacks defined O and K surface antigens (Lindberg et al. 1975). It belongs to the ECOR group B2 together with many other UTI strains such as the well-characterized and virulent E. coli isolates CFT073, 536 and J96.

Growth conditions and stabilisation of RNA for microarray experiments of E. coli 83972 grown on urine agar plates

Human urine was collected from four healthy male and female volunteers who had no history of UTI or antibiotic use in the prior 2 months. The urine was pooled, filter sterilised, stored at 4°C, and used within the following day. E. coli 83972 was grown aerated in triplicate in 10 ml of human urine for 6 h. Thereafter, 100 μl of each culture was spread on urine plates (1:1 ratio of human urine and 0.9% NaCl) containing 1.5% agar. The plates were incubated at 37°C for 16 h. Subsequently, 600 μl of a 1:2 mixture of PBS and RNAprotect™ Bacteria Reagent (QIAGEN AG) was poured on the plates, mixed with the lawn of cells and incubated for 5 min at room temperature to stabilise RNA. The stabilised mixture was then centrifuged and pellets were stored at −80°C. The samples from 83972, grown exponentially in MOPS and urine, in urine biofilms and in patients (>10⁸ CFU/ml) were all treated identically with RNAprotect Bacteria Reagent and have been described previously (Roos and Klemm 2006; Hancock and Klemm 2007).

RNA isolation and microarray hybridisation

Total RNA was isolated using the RNeasy® Mini Kit (QIAGEN AG) and on-column DNase digestion was performed using RNase-Free DNase Set (QIAGEN AG). The quality of the total RNA was examined by agarose gel electrophoresis and by measuring the absorbance at 260 and 280 nm to ensure intact high-quality RNA. Purified RNA was precipitated with ethanol and stored at −80°C until further use. Conversion of RNA (10 μg per sample) to cDNA, labelling and microarray hybridisation were performed according to the GeneChip Expression Analysis Technical Manual 701023 Rev. 4 (Affymetrix, Inc., Santa Clara, CA). GeneChip E. coli Genome 2.0 Arrays (Affymetrix) were used for hybridisation of the labelled cDNA. The microarrays were scanned using the GeneChip Scanner 3000.

Data analysis

The raw intensities from the microarray experiments were background corrected and quantile-normalised. All microarray data in the study were obtained from mRNA being converted to cDNA, i.e. no genomic DNA was used for hybridisation. Probe intensities were summarised to yield expression values for each probe set or gene. These calculations were performed using the implementation of GCRMA (Wu et al. 2004) in Bioconductor (Gentleman et al. 2004) (http://www.bioconductor.org, http://www.r-project.org). In order to derive a cut-off expression value for making presence/absence calls, we made use of intensities due to control probe sets with IDs beginning with AFFY. There were 96 such probes. The cut-off value was set so that only the top 1/16th of these control probes would be flagged as present. As a result 4,109 genes in the array were marked as present; the remaining genes are referred to as “absent” throughout this report, i.e. these genes could be truly absent, non-homologous or not expressed during any of the seven different growth conditions. Orthologs of all the genes in the array across E. coli K12 MG1655, E. coli O157:H7 Sakai, E. coli O157:H7 EDL933 and E. coli CFT073 were identified using bidirectional best hit BLAST.
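
The presence/absence call described above can be sketched as follows. This is a minimal illustration and not the code used in the study (the expression values themselves were computed with GCRMA in Bioconductor). The function names `control_cutoff` and `call_present`, the handling of ties and all intensity values are assumptions made for the example; the text does not state whether replicates were averaged before the call.

```python
def control_cutoff(control_values, top_fraction=1 / 16):
    """Return a cut-off such that only the top `top_fraction` of the
    control probe sets lie strictly above it (assuming no ties)."""
    ranked = sorted(control_values, reverse=True)
    n_present = int(len(ranked) * top_fraction)  # 96 controls -> 6
    # The cut-off is the highest control value that is NOT flagged present.
    return ranked[n_present]


def call_present(expression_by_condition, cutoff):
    """A gene is 'present' if it exceeds the cut-off in at least one condition."""
    return {
        gene: any(value > cutoff for value in values)
        for gene, values in expression_by_condition.items()
    }


# Toy data: 96 hypothetical control intensities (not real array values).
controls = [float(i) for i in range(1, 97)]
cutoff = control_cutoff(controls)
flagged = [v for v in controls if v > cutoff]

# Hypothetical expression values for two genes across seven conditions.
genes = {
    "geneA": [2.0, 3.5, 1.0, 95.0, 4.0, 2.2, 3.1],
    "geneB": [2.0, 3.5, 1.0, 5.0, 4.0, 2.2, 3.1],
}
calls = call_present(genes, cutoff)
```

With 96 control probe sets, a top fraction of 1/16 corresponds to six controls above the cut-off. A gene that never exceeds the cut-off is called "absent", which, as noted above, does not distinguish between a missing gene, a non-homologous gene and a silent gene.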

Microarray data accession number

The supporting microarray data have been deposited in ArrayExpress (http://www.ebi.ac.uk/arrayexpress) with accession numbers E-MEXP-584 (MOPS, urine and patient arrays), E-MEXP-926 (biofilm arrays) and E-MEXP-1453 (urine-agar plate arrays).

Results

Genes expressed in ABU E. coli 83972

The bacterial transcriptome is a dynamic entity that reflects the organism’s immediate, ongoing response to its environment. DNA microarray-assisted functional genomics provides the global expression profile of the genome. The genomic expression profiles of the urinary tract infectious E. coli isolate 83972 were analysed under several different growth conditions and in different media using the GeneChip E. coli Genome 2.0 Array (Affymetrix). This array contains approximately 10,000 probe sets for all 20,366 genes present in E. coli strains MG1655 (K-12), CFT073 (UPEC), EDL933 (EHEC) and O157:H7-Sakai (EHEC). Due to the high degree of similarity between the E. coli strains, whenever possible, a single probe set is tiled to represent the equivalent ortholog in all the four strains.

In total, 21 microarrays were included in the study; arrays in triplicates were hybridised with RNA of the ABU strain 83972 cultured (1) aerobically to exponential phase in MOPS minimal medium, (2) aerobically to exponential phase in pooled human urine, (3) on urine agar plates, (4) statically in urine biofilm on Petri dishes and finally, (5–7) in three patients (Pat1, Pat2 and Pat3) in vivo. Figure 1 shows the expression levels of all CFT073 genes in strain 83972 during growth in the different environments; many genes were similarly expressed during all seven conditions. However, some genes were expressed only during one or a few of the conditions. For example, the genes encoding yersiniabactin in the high pathogenicity island (HPI), i.e. PAI-asnT, were highly expressed in Pat2 (and in biofilm), but much lower during the other conditions. The c2557–c2563 genes (around 2.4 M in Fig. 1), involved in nucleotide sugar and mannose metabolism and encoding hypothetical proteins, were highly expressed in Pat3 but not under any other condition. Another example is the c1968–c1971 genes (around 1.8 M), i.e. ydfI encoding a d-mannonate oxidoreductase, ydfJ encoding a metabolite transport protein and rspAB involved in the starvation response, which also were highly expressed only in Pat3.

Fig. 1

The expression levels of CFT073 genes in strain 83972 during seven different growth conditions. The outer blue circle shows the calculated absence (0.0) and presence (1.0) of CFT073 genes in ABU strain 83972. The seven PAIs of CFT073 are indicated in red

In total, there were 108 genes that were significantly changed in all six urine environments compared with MOPS. Twenty of these genes were up-regulated in all six urine conditions whereof half were related to different iron systems, i.e. iroN, fepA, fecI, iucBC, fhuA and exbD, as well as b3337 and b1995 involved in iron storage and encoding a putative haemin receptor, respectively. The other urine up-regulated genes were marA, a multiple antibiotic resistance gene, sodA, encoding superoxide dismutase, ahpC, encoding hydroperoxide reductase, b1452, c1220, c4210, lysA, rrsG, rrsH and yrbL. Most iron acquisition systems were expressed in all the six urine environments; the enterobactin, salmochelin, aerobactin, haem and sitABCD systems were all expressed in all the six urine conditions (although weaker in the urine plates). Interestingly, the fec system, which is a citrate-dependent iron uptake system found in K-12 but missing in CFT073 and other UPEC strains, was highly expressed in Pat3. Up-regulation of all these iron-uptake systems revealed that the strain has an impressive array of iron acquisition systems and all of these are active in the human bladder.

Nineteen of the top 31 highest expressed genes overall were genes involved in ribosomal synthesis. The high expression of ribosomal genes in E. coli 83972 suggests a rapid growth rate; the highest expression values were obtained in Pat1 followed by MOPS, urine and Pat2, indicating a growth rate just as fast in the patients in vivo as in exponential growth phase in a shake flask. This supports our hypothesis that the strain’s optimized growth properties in human urine explain its ability to successfully colonize the human urinary tract in the absence of functional fimbriae (Roos et al. 2006b).

Figure 1 reveals that strain 83972 almost exclusively expresses the iron uptake and transport systems in the seven CFT073 PAIs; almost none of the other genes in these islands are expressed. There are only two exceptions: c0300, located in PAI-aspV and encoding a hypothetical protein, and c3686–3690, located in PAI-pheV and encoding YrbH and KpsEDC. The yrbH gene belongs to the 131 genes that were recently identified as UPEC specific and it was the second highest expressed UPEC-specific gene in mice (Lloyd et al. 2007); in our samples the highest expression was found in the three patients and in MOPS. Outside the PAIs there are a few genes/gene clusters that are highly expressed in all urine samples or only in the patients. The enterobactin system was up-regulated during all urine conditions and the chu cluster (involved in haem uptake and transport) was most strongly up-regulated in the patients, followed by in vitro urine growth. The ycdO and ycdB genes were highly expressed in the three patients; these have recently been identified to encode haemoproteins, probably involved in iron transport, induced at acidic conditions (Sturm et al. 2006).

Looking at the significantly changed genes for all six urine conditions compared with MOPS (in total 1,897 genes) revealed that Pat2 and Pat3 shared the largest number of similarly changed genes; 75% of all changed genes in Pat2 are regulated in the same way (i.e. up or down-regulated) in Pat3 (Fig. 2). Interestingly, Pat1 shared the largest number of similarly regulated genes with the biofilm growth mode; also for Pat2 and Pat3, the biofilm growth mode showed a larger number of similarly regulated genes than Pat1 or any other condition. This could indicate that the expression profile of strain 83972 during in vivo growth is more closely related to biofilm growth than to growth in shake flasks or plates.

Fig. 2

Number of significantly up- and down-regulated genes in strain 83972 during the different growth conditions (i.e. exponential growth in urine, on urine-agar plates, in urine biofilm, in vivo in three patients) compared with exponential growth in MOPS minimal lab medium. The diagonal boxes (dark blue colour) show the number of significantly changed genes during cultivation in that specific condition compared with MOPS (e.g. 664 genes were up- or down-regulated in urine compared with MOPS and 938 genes were changed in plates compared with MOPS) and the other boxes show the number of significantly changed genes shared between two conditions (e.g. 311 of the 664 and 938 significantly changed genes in urine and plates compared with MOPS were shared between these two conditions, i.e. up- or down-regulated in both urine and plates compared with MOPS). Stronger blue colour indicates larger number of significantly changed genes shared between two conditions
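
The pairwise counts summarised in Fig. 2 can be illustrated with a short sketch. It assumes that each condition has already been reduced to a list of significantly changed genes with a direction ("up" or "down") relative to MOPS; the function name `shared_changes` and the toy calls below are hypothetical and do not reproduce the study's data.

```python
from itertools import combinations


def shared_changes(changes):
    """Count genes changed in the same direction for every pair of conditions.

    `changes` maps condition -> {gene: "up" or "down"} for genes that are
    significantly changed relative to the reference (MOPS in the text).
    Diagonal entries hold the number of changed genes per condition.
    """
    matrix = {cond: {cond: len(genes)} for cond, genes in changes.items()}
    for a, b in combinations(changes, 2):
        shared = sum(
            1
            for gene, direction in changes[a].items()
            if changes[b].get(gene) == direction
        )
        matrix[a][b] = shared
        matrix[b][a] = shared
    return matrix


# Hypothetical toy input; the gene names are real but the calls are made up.
toy = {
    "urine": {"iroN": "up", "fepA": "up", "flgB": "down"},
    "plates": {"iroN": "up", "fepA": "down", "sodA": "up"},
    "Pat1": {"iroN": "up", "fepA": "up", "sodA": "up"},
}
overlap = shared_changes(toy)
```

A gene that is up-regulated in one condition and down-regulated in another is not counted as shared, in line with the description "up- or down-regulated in both" in the legend.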

Closeness to CFT073

Given the different growth conditions analysed, it is not unrealistic to assume that most genes present in strain 83972 would be expressed, to some extent, under at least one of these seven different conditions/environments, i.e. growth in liquid and on solid media; during exponential phase, in biofilm and during colony-forming conditions; in different growth media (human urine and minimal lab medium); as well as in vivo in three different individuals.

Data analysis of the 21 microarrays revealed that of the 8,716 E. coli transcripts on the microarray (not including probes representing intergenic regions and controls), 4,109 transcripts (47%) showed expression levels above detection limit during at least one of the growth conditions investigated (referred to as “present”, see blue, outer circle in Fig. 1). Figure 3 shows the distribution among the four E. coli genomes represented on the microarray of these 4,109 transcripts expressed in E. coli 83972. Not surprisingly, the UTI strain 83972 shows highest similarity with the UPEC isolate CFT073 of the four genomes on the array; the large majority of the 4,109 transcripts found present (96.3%) can be found in CFT073, corresponding to 71% of the CFT073 genome. E. coli 83972 expressed 150 genes that do not exist in CFT073; 85 of these can be found in MG1655 and the remaining 65 genes can be found exclusively in one or both of the two EHEC strains present on the array (Fig. 3). Thirty of the 65 genes homologous to EHEC genes encode proteins of cryptic prophages, whereas the large majority of the remaining 35 genes encode unknown or hypothetical proteins. The 85 genes that can be found in MG1655 but not in CFT073 include the fec cluster encoding an iron citrate transport system (fecABCDEIR). In total, 3,959 CFT073 genes were expressed in strain 83972; this can be compared with 4,162 CFT073 genes present in the UPEC (cystitis) isolate F11 (Lloyd et al. 2007).
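
The proportions quoted in this paragraph follow directly from the reported counts, as the short check below shows. All counts are taken from the text; the variable names are ours, and taking the CFT073 total as "expressed plus not detected" (using the 1,636 CFT073 genes reported elsewhere in the Results as not detected) is an assumption about how the 71% figure was derived.

```python
transcripts_on_array = 8716  # E. coli transcripts, excluding intergenic/control probes
present_total = 4109         # filtered present in strain 83972
present_in_cft073 = 3959     # present transcripts that are found in CFT073
present_in_mg1655 = 85       # present, not in CFT073, found in MG1655
present_ehec_only = 65       # present, found only in one or both EHEC strains
cft073_not_detected = 1636   # CFT073 genes not detected in strain 83972

fraction_present = present_total / transcripts_on_array
fraction_in_cft073 = present_in_cft073 / present_total
not_in_cft073 = present_total - present_in_cft073
# Assumed denominator: CFT073 genes on the array = detected + not detected.
cft073_coverage = present_in_cft073 / (present_in_cft073 + cft073_not_detected)

print(f"present on array:        {fraction_present:.0%}")
print(f"present and in CFT073:   {fraction_in_cft073:.1%}")
print(f"present, not in CFT073:  {not_in_cft073}")
print(f"share of CFT073 genes:   {cft073_coverage:.0%}")
```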

Fig. 3

Venn diagrams showing the distribution of the 4,109 genes filtered present in strain 83972. The percentages indicated below each strain show how large part of the genome of the corresponding strain was filtered present in strain 83972

E. coli core genome

There is a large diversity in the size of the E. coli chromosome; among the 32 E. coli (and Shigella) genomes that have been fully sequenced, or at least sequenced with an expected coverage of greater than 99%, the size of the chromosome ranges from 4.5 to 5.6 Mbp. The genomes show a considerable amount of diversity, and the current pan-genome was estimated to contain 9,433 different genes (Willenbrock et al. 2008). Several studies have identified sets of “core genes” found in most E. coli genomes. However, the number of these core genes tends to decrease as the full genomic sequences of new E. coli strains become available. The size of the E. coli core genome has recently been predicted to contain 1,563 genes for an infinite number of E. coli strains, and the number of new genes predicted from each new E. coli genome that is sequenced is ∼79 (Willenbrock et al. 2008). In our analysis, 2,472 (60%) of the genes found present in strain 83972 were common to all four E. coli genomes on the array (Fig. 3), which is well above the estimated E. coli core genome and also above the 2,241 common genes conserved among the 32 sequenced E. coli strains (Willenbrock et al. 2008). Furthermore, considering the fact that the microarray contains only four E. coli genomes, the total number of genes detected present (4,109 genes) in 83972 seems reasonable compared with the size of other sequenced UTI E. coli genomes. The genome size of strain 83972 has been reported to be 4.9±0.2 Mbp (Zdziarski et al. 2007), indicating that the strain contains roughly an additional 800 genes, not identified in the present analysis.
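
The estimate of roughly 800 additional genes is not derived in the text. One way to arrive at a number of this order, sketched below, is to assume a gene density of about one gene per kb. This density is a rule of thumb that we introduce for illustration; it is not taken from the source, and the result is sensitive to it.

```python
GENES_PER_MBP = 1000  # assumed rule of thumb (~1 gene per kb); not from the source

genome_mbp = 4.9        # reported genome size of strain 83972
uncertainty_mbp = 0.2   # reported uncertainty
detected_genes = 4109   # genes filtered present on the array


def additional_genes(size_mbp):
    """Estimated genes in the genome minus the genes detected on the array."""
    return round(size_mbp * GENES_PER_MBP) - detected_genes


central = additional_genes(genome_mbp)
low = additional_genes(genome_mbp - uncertainty_mbp)
high = additional_genes(genome_mbp + uncertainty_mbp)
```

Under this assumption the central estimate is close to the figure given in the text, while the reported uncertainty in genome size alone shifts it by about 200 genes in either direction.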

Of the 2,734 transcripts on the chip that are present in all the four strains represented on the microarray, 393 transcripts were below detection limit on all 21 microarrays and filtered as “absent” in strain 83972. These included 81 genes encoding hypothetical proteins. Several of the absent genes were found in clusters, many of which are involved in surface structure elements and chemotaxis. These included genes involved in flagellar biosynthesis (flgABCDEFGHIJKL, flhABE, fliACDEFGHIJKLNOPQRSTZ and motAB), curli production (csgABCEFG), colanic acid synthesis (wcaABCDEFGHI and wza) and chemotaxis (cheBRWYZ and tap). Other whole gene clusters that were not expressed in the ABU strain but are found in all four E. coli strains present on the chip were hyaBCDEF (hydrogenase I), hycACD (hydrogenase 3), tauABCD (responsible for taurine uptake in E. coli) and b1500–1505 (containing the fimbrial-like genes ydeQRST), as well as the fimEAIC genes which previously have been shown to be absent in strain 83972 (Klemm et al. 2006).

UPEC-associated genes present in strain 83972

The four UPEC isolates that have been sequenced, CFT073, UTI89, 536 and F11, contain 5,379, 5,154, 4,766 and 4,467 genes, respectively, on the chromosome. CFT073 and 536 are both O6 strains and yet show a large diversity; the genome of 536 is almost 300 kb smaller than that of CFT073 (Brzuszkiewicz et al. 2006). The genomic differences are mainly restricted to large pathogenicity islands; the additional DNA in CFT073 consists of genes of five cryptic prophages, which are absent in 536 (Brzuszkiewicz et al. 2006). The 427 genes that are present only in strain 536, and the 432 genes present only in the two UPEC strains (compared with other sequenced E. coli), are scattered all over the genome (Brzuszkiewicz et al. 2006). Over 70% of the CFT073 transcripts were present in strain 83972 compared with 89% of the CFT073 transcripts found in strain 536. Figure 4 shows the homology of 16 sequenced E. coli and Shigella isolates including the three sequenced UPEC strains (UTI89, 536 and F11) pasted on the CFT073 genome; the outer, red circle in the figure shows the results from the presence/absence analysis on strain 83972. Many virulence-associated genes are located on the large pathogenicity islands (PAIs) found in different UPEC strains. The large pathogenicity island at pheV in CFT073 (also called PAI ICFT073) encodes haemolysin (hlyCABD), aerobactin biosynthesis proteins (iutA and iucABCD), antigen 43 (c3655) and the secreted autotransporter toxin (sat); these were all filtered present in our analysis, suggesting that strain 83972 harbours a similar island on its chromosome. Interestingly, the aerobactin system is missing in the other three UPEC isolates. Furthermore, this PAI contains genes encoding the uropathogenic-associated P fimbriae (papIBAHCDJKEFG). The pap gene cluster of 83972 has been sequenced (Klemm et al. 2006); the pap genes are all present and show 72–100% sequence homology with the corresponding genes in CFT073. The results of the microarray analysis corresponded very well to the observed sequence homology of the different genes in the cluster (i.e. if a specific gene on the microarray is represented with probes that contain a non-homologous region compared with the corresponding gene in the hybridised sample, that gene will not hybridise and will be filtered absent); the six genes with highest sequence homology were filtered present (i.e. papHCDJKF with 98, 100, 100, 98, 99 and 95% homology, respectively) and the four with least sequence homology were filtered absent (i.e. papIAEG with 94, 83, 77 and 72% homology, respectively).
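
The agreement between sequence homology and the microarray calls for the pap genes can be checked with the values reported above. The sketch below only restates these ten data points; the variable names are ours, and the resulting boundary should be read as an observation on this cluster, not as a general hybridisation threshold.

```python
# Sequence homology (%) of strain 83972 pap genes to CFT073 and the
# microarray call, as reported in the text.
pap = {
    "papH": (98, "present"), "papC": (100, "present"), "papD": (100, "present"),
    "papJ": (98, "present"), "papK": (99, "present"), "papF": (95, "present"),
    "papI": (94, "absent"), "papA": (83, "absent"),
    "papE": (77, "absent"), "papG": (72, "absent"),
}

lowest_present = min(h for h, call in pap.values() if call == "present")
highest_absent = max(h for h, call in pap.values() if call == "absent")

# True if every gene called present has higher homology than every gene called absent.
separable = lowest_present > highest_absent
```

For this cluster the two groups do not overlap: the least homologous gene called present (papF) and the most homologous gene called absent (papI) differ by a single percentage point.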

Fig. 4

BLAST atlas comparing the absent (0.0) and present (1.0) CFT073 genes in strain 83972 with other sequenced E. coli and Shigella strains, including the three sequenced UPEC isolates 536, UTI89 and F11. The UPEC CFT073 genome is used as reference. The outer blue circle represents the calculated absence/presence in 83972 followed by the three UPEC isolates; the six inner circles represent Shigella strains. The seven PAIs of CFT073 are indicated in red. The blow-up shows the presence/absence of the fim cluster (c5391–5400) in strain 83972

The employed microarray contains probes for all ten known and putative fimbriae-encoding gene clusters in CFT073. Together with the pap cluster, two other fimbrial clusters that have been associated with UPEC virulence are known to be present in strain 83972 and have been sequenced, i.e. the fim and sfa/foc clusters. As for the pap cluster, the filtering of absent genes corresponded very well to the actual presence and sequence homology of the genes; strain 83972 contains a large deletion in the fim cluster but shows high sequence homology with the present genes, and all the genes in the deleted part of the cluster, i.e. fimEAIC, were filtered absent (see blow-up in Fig. 4). Also, the sfa/foc cluster in 83972 shows high homology with that in CFT073 (98–100%), and eight of nine genes were filtered present; the putative regulatory gene, sfaC, was filtered absent. Regarding the other fimbrial clusters present on the microarray, none of the genes encoding F9 fimbriae, which appear to be common in UPEC and play a role in biofilm formation (Ulett et al. 2007), or another putative fimbrial cluster (yehABCD) were expressed; these genes might be absent in strain 83972 (Table 1).

Table 1 Analysis of fimbriae-encoding genes in strain 83972

Presence of other pathogenicity islands in 83972

Strain 83972 seems to carry most of the pathogenicity islands of CFT073 (or PAIs similar to the ones in CFT073) according to our present/absent analysis (Table 2). The only PAI of CFT073 in which most genes (i.e. 93%) were filtered absent in strain 83972 is PAI-pheU (PAI IICFT073), the island that contains a second pap cluster. The three genes filtered present in this PAI are present in several of the other sequenced E. coli and Shigella strains, indicating that these three genes are not unique to or characteristic of this island; this PAI is therefore most probably absent in strain 83972.

Table 2 Analysis of presence of pathogenicity islands in strain 83972

Insertion of the high pathogenicity island (HPI) of Yersinia pestis has been suggested to be one of the earliest events in the evolution of extraintestinal E. coli strains (Welch et al. 2002). The genes of HPI encoding yersiniabactin (Ybt) were all expressed in strain 83972. The HPI genes have been found up-regulated during urine biofilm growth of 83972 indicating that Ybt-mediated iron-uptake might play an important role in biofilm growth (Hancock and Klemm 2007) and a deletion mutant in the Ybt uptake receptor (FyuA) exhibits reduced biofilm formation (Hancock et al. 2008). The HPI genes have also been found up-regulated in vivo in two of the three patients (particularly in Pat2, see Fig. 1) infected with this strain (Roos and Klemm 2006).

The pks island, a recently characterised and widely spread genomic island found in, for example, meningitis strains and the uropathogenic strain CFT073, encodes a machinery for the synthesis of peptide–polyketide hybrid compounds (Nougayrede et al. 2006). The island has genotoxic activity; its presence is associated with the accumulation of double-strand DNA breaks in host cells (Nougayrede et al. 2006). This island was expressed in strain 83972 and up-regulated in urine and in vivo (Table 4; Fig. 1). The pks island is widely distributed within E. coli phylogenetic group B2, and has been found in both pathogenic and commensal isolates; in commensal strains the cell-cycle-blocking activity might slow the turnover of the intestinal epithelium, and therefore prolong colonisation.

Presence of positively selected UPEC genes

A recent paper comparing the UPEC isolates CFT073 and UTI89 with six other finished E. coli genome sequences presented 29 genes that are under positive selection only in UPEC strains (Chen et al. 2006). These 29 genes are involved in various aspects of cell surface structure, DNA metabolism, nutrient acquisition and UTI. Of these 29 genes, 25 were filtered present in our ABU strain 83972. Many of these genes are represented by more than one transcript on the array due to sequence differences among the four strains present on the array; in all cases the gene filtered present in 83972 corresponded to the CFT073 transcript. Four genes were filtered absent: agaI, yjiL, recC and yegO, which encode a putative galactosamine-6-phosphate isomerase, a hypothetical protein, exodeoxyribonuclease V gamma subunit and a hypothetical transport protein, respectively. The genes in the two COG categories that were significantly enriched in the two UPEC strains, i.e. “cell wall/membrane biogenesis” (amiA, cutE, fepE, ompC, ompF and yfaL) and “secondary metabolites biosynthesis, transport and metabolism” (entD, entF and yojI) (Chen et al. 2006), were all present in strain 83972.

Functional analysis of MG1655 transcripts of ABU E. coli 83972

To gain more information concerning what type of genes were absent, the MG1655 genes were grouped into functional categories defined by the clusters of orthologous groups (COGs) of proteins (Tatusov et al. 1997). Previous studies have, in attempts to identify essential genes and the E. coli core genome, found that groups with genes involved in metabolism and various cellular processes (excluding cell motility) contain a substantially higher percentage of conserved and essential genes, while COGs with genes of unknown function and external origin as well as genes involved in signalling and motility contain fewer essential genes (Anjum et al. 2003; Gerdes et al. 2003). Classification of the absent genes of strain 83972 revealed that the groups “cell motility”, “defence mechanisms” and “not in COGs” had a significant overrepresentation of absent genes (Table 3). A significantly lower proportion of absent genes was found in the groups “cell cycle control”, “posttranslational modification” and “translation”. This is in agreement with a previously published study of pathogenic E. coli; Anjum et al. (2003) studied 26 strains of E. coli and found that the two groups with the largest proportion of absent genes were “not in COGs” and “cell motility”, while the six groups with the lowest proportion of absent genes were “translation”, “cell division”, “posttranslational modification”, “coenzyme metabolism”, “nucleotide transport and metabolism” and “energy production and conversion”, which all, with the exception of the last group, contained significantly fewer absent genes in strain 83972 (Table 3). This suggests that strain 83972 utilises a similar set of core genes as other E. coli strains.
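
The text does not state which statistical test was used to judge over- or underrepresentation of absent genes in a functional category. As an illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for this kind of enrichment question; the function name and all counts are hypothetical and are not taken from Table 3.

```python
from math import comb


def hypergeom_upper_tail(total, total_absent, category_size, category_absent):
    """P(X >= category_absent) when `category_size` genes are drawn without
    replacement from `total` genes, of which `total_absent` are absent."""
    denominator = comb(total, category_size)
    upper = min(category_size, total_absent)
    numerator = sum(
        comb(total_absent, k) * comb(total - total_absent, category_size - k)
        for k in range(category_absent, upper + 1)
    )
    return numerator / denominator


# Hypothetical example: 100 genes in total, 20 of them absent; a category of
# 10 genes contains 6 absent genes (2 would be expected by chance).
p_over = hypergeom_upper_tail(100, 20, 10, 6)

# Sanity check: observing at least zero absent genes is certain.
p_trivial = hypergeom_upper_tail(100, 20, 10, 0)
```

A small value of `p_over` indicates that the category holds more absent genes than expected from the overall proportion; underrepresentation can be tested in the same way with the lower tail.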

Table 3 Distribution of absent genes in functional categories

CFT073 genes absent in strain 83972

There were 1,636 CFT073 genes that could not be detected according to our expression profiling in ABU strain 83972; 961 of these genes are exclusively found in CFT073, i.e. not present in the other three strains represented on the array. The majority, 645 genes, corresponded to hypothetical, putative or unknown proteins. Considering the very different patient symptom profiles of strains CFT073 and 83972 (the former being a true pathogen, the latter a commensal-like strain), genes that are present in UPEC isolate CFT073 but not expressed in ABU strain 83972 can be considered as virulence factor candidates. However, most genes associated with UPEC pathogenesis were expressed in strain 83972 and up-regulated during growth in urine, e.g. all iron-related genes encoding uptake and transport of aerobactin, salmochelin, yersiniabactin and haem/haemoglobin (Table 4). Two exceptions, both filtered absent, were the ireA gene, encoding an iron-regulated outer-membrane protein, and the tsx gene, encoding a nucleoside-binding outer-membrane protein. Although the tsx gene has not previously been associated with UPEC virulence, it has just recently been identified together with more well-known UPEC genes as involved in movement from the intestinal tract to the bladder and vagina (i.e. occurred significantly more often in multiple-site isolates than in rectal site-only isolates) (Xie et al. 2006); furthermore, Tsx was also recently identified together with 22 other outer-membrane proteins from CFT073 cells grown under conditions mimicking the urinary tract (Hagan and Mobley 2007).

Table 4 Characteristics of ABU isolate 83972 compared with UPEC isolates CFT073, UTI89 and 536

Type IV fimbriae are assembled by the type II general secretory pathway. They occur in a wide range of species and are frequently associated with disease. The ppdD and hofBC genes (b0106–0108), which encode type IV prepilin and are present in CFT073, EDL933 and MG1655, were filtered absent in strain 83972.

CFT073 genes present in strain 83972 but not found in other UPEC strains

The majority of the genes that are absent in the other three UPEC isolates (i.e. 536, UTI89 and F11) were filtered absent in strain 83972 as well (gaps in Fig. 4). However, there are a few exceptions where a gene that is not found in any of the other UPEC strains is filtered present in strain 83972. The aerobactin system is one of these exceptions, indicating that strain 83972 is particularly well equipped with iron uptake systems. All but one of the other exceptions are located on PAIs, and all encode hypothetical proteins: c1194–c1204 (on PAI-serX), c1522–c1528 (on PAI-icdA), c3394–c3396 (on PAI-metV), c3681–c3682 (on PAI-pheV, where the aerobactin genes are also found) and c5372–c5382. c3394–c3396 and c5372–c5382 are not present in any of the 16 sequenced E. coli and Shigella strains represented in Fig. 4, indicating that some genes unique to CFT073 can be found in strain 83972 as well.
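The comparison described above amounts to a set difference followed by grouping of neighbouring genes into ranges. The following is a minimal sketch of that logic, not the procedure used in the study. It assumes that CFT073 locus tags have the form `c` plus a number and that consecutive numbers correspond to adjacent genes. The function names are hypothetical, and the example sets are toy data (the tag `c0001` and its membership are invented for illustration).

```python
def unique_to_reference(present_in_abu, other_genomes):
    """Genes detected in the ABU strain that occur in none of the other genomes."""
    seen_elsewhere = set().union(*other_genomes)
    return present_in_abu - seen_elsewhere


def collapse_runs(locus_tags):
    """Collapse tags with consecutive numbers into (first, last) ranges.

    Assumes tags look like 'c3394' and that consecutive numbers are neighbours.
    """
    numbers = sorted(int(tag[1:]) for tag in locus_tags)
    runs = []
    for number in numbers:
        if runs and number == runs[-1][1] + 1:
            runs[-1][1] = number  # extend the current run
        else:
            runs.append([number, number])  # start a new run
    return [(f"c{first:04d}", f"c{last:04d}") for first, last in runs]


# Toy data: tags filtered present in 83972, and the gene content of two other genomes.
present_in_abu = {"c0001", "c3394", "c3395", "c3396", "c3681", "c3682"}
other_genomes = [{"c0001"}, set()]

exceptions = unique_to_reference(present_in_abu, other_genomes)
ranges = collapse_runs(exceptions)
```

Reporting ranges rather than single tags makes it easier to see when the exceptions cluster on a pathogenicity island.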

Discussion

Bacterial genomes are under constant change. New genes are acquired by horizontal transfer and old ones are lost by mutations. It is generally believed that commensal E. coli can become pathogenic through the acquisition of novel genes encoding virulence factors and niche-adaptation factors (Kaper et al. 2004). In contrast to organisms that have acquired genes for pathogenesis, E. coli 83972 is an example of an organism that has adapted to a commensal-like existence through gene deletions and point mutations. Using primarily the CFT073 genome as a scaffold, we used presence/absence data from seven sets of different gene expression profiles (in total 21 microarrays) to model the gene pool of strain 83972. Despite the limitations of the approach, i.e. genes not present on the employed chip have been ignored, a substantial body of information was gathered concerning the genomic content of the strain. As it turned out, the strain was highly similar to CFT073; 96% (3,959) of the genes on the employed microarray that were found to be expressed by 83972 are also found in CFT073, and genes on six of the seven pathogenicity islands of CFT073 were expressed by 83972; furthermore, CFT073 genes not found in any other UPEC isolate were expressed by 83972. An estimated ∼900 CFT073 genes are not expressed by 83972. Arguably, in the light of the difference in patient symptoms evoked by encounters with the two strains, this list represents virulence gene candidates.
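The idea of modelling a gene pool from presence/absence calls across several expression-profile sets can be sketched as follows. This is an illustration under stated assumptions, not the filtering procedure used in the study: the rule that a gene counts as present when it is detected in at least `min_sets` of the sets is an assumption, and the function names and toy gene names are hypothetical.

```python
def model_gene_pool(calls, min_sets=1):
    """Split genes into (present, absent).

    calls    : dict mapping gene name -> list of booleans, one per profile set
    min_sets : assumed threshold; a gene is 'present' if detected in at least
               this many sets
    """
    present = {gene for gene, flags in calls.items() if sum(flags) >= min_sets}
    absent = set(calls) - present
    return present, absent


def shared_fraction(present, reference_genes):
    """Fraction of the present genes that also occur in a reference genome."""
    if not present:
        return 0.0
    return len(present & reference_genes) / len(present)


# Toy calls for three placeholder genes across seven profile sets.
calls = {
    "gene_a": [True] * 7,
    "gene_b": [False] * 6 + [True],
    "gene_c": [False] * 7,
}
present, absent = model_gene_pool(calls)
fraction = shared_fraction(present, {"gene_a"})
```

Raising `min_sets` makes the presence call more conservative. A gene that is silent under every tested condition is reported as absent whether or not it exists in the genome, which is why the approach describes the expressed gene pool rather than the full gene content.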

Although strain 83972 seems to be a deconstructed uropathogen and does not provoke symptoms in the human host, it grows fast in urine and is an excellent coloniser of the human bladder (Roos and Klemm 2006; Roos et al. 2006b; Klemm et al. 2007). It can do so because it has kept a large assortment of fitness factors required for this particular ecological niche. Among the genes expressed under realistic environmental conditions, such as in the human bladder, are candidates for fitness factor genes, e.g. the many iron acquisition systems expressed by the strain and many genes involved in sugar acid and amino acid metabolism. Interestingly, many of the known and putative virulence factors of the urinary tract are expressed by strain 83972 and might therefore be considered fitness factors rather than virulence factors; these include 25 of 29 positively selected UPEC genes as well as the newly characterised pks island, which induces breaks in double-stranded DNA in host cells. Also, virulence-associated genes such as cdiA, mchBCDEF, flu, hcp, rfaH, sat, picU and vat were all expressed by strain 83972. Very few of the known or putative virulence factors were absent in (or not expressed by) strain 83972. The pap, fim and foc/sfa clusters encoding UPEC-class fimbriae are dysfunctional in strain 83972, and the clpB, ireA and tsx genes were not expressed in the ABU strain. These stand out as potential virulence candidates together with a number of uncharacterised genes encoding hypothetical proteins.

Thus, from the analyses performed here, we can make predictions about several gene categories such as potential virulence genes, fitness genes and “household-class” genes. It is also noteworthy that the information reported herein would complement a future genome sequence of strain 83972. Whole-genome sequencing can identify the presence of genes but cannot reveal whether they are transcribed. Genes can be silenced not only by lesions in the gene itself or its promoter but also by mutations in genes encoding regulatory factors. The methodology employed in the present work reveals the active genome of strain 83972.

ABU strain 83972 is closely related to fully virulent uropathogenic strains. All evidence suggests that the strain is a deconstructed pathogen. This study dispels the commonly held idea that ABU strains are commensals that have picked up niche-adaptation genes by horizontal gene transfer. Rather, strain 83972 was originally a truly pathogenic strain that has lost all or parts of operons that contribute to virulence.