Transcriptomics and adaptive genomics of the asymptomatic bacteriuria Escherichia coli strain 83972
- Cite this article as:
- Hancock, V., Seshasayee, A.S., Ussery, D.W. et al. Mol Genet Genomics (2008) 279: 523. doi:10.1007/s00438-008-0330-9
Escherichia coli strains are the major cause of urinary tract infections in humans. Such strains can be divided into virulent UPEC strains, which cause symptomatic infections, and commensal-like strains, which cause asymptomatic bacteriuria (ABU). The best-characterized ABU strain is strain 83972. Global gene expression profiling of strain 83972 was carried out under seven different sets of environmental conditions, ranging from laboratory minimal medium to human bladders. The data reveal highly specific gene expression responses to the different conditions, and a number of potential fitness factors for the human urinary tract could be identified. In addition, presence/absence calls from the gene expression data were used as an adaptive genomics tool to model the gene pool of 83972, using primarily UPEC strain CFT073 as a scaffold. In our analysis, 96% of the transcripts filtered present in strain 83972 can be found in CFT073, and genes on six of the seven pathogenicity islands were expressed in 83972. Despite the very different patient symptom profiles, the two strains seem to be very similar. Genes expressed in CFT073 but not in 83972 were identified and can be considered virulence factor candidates. Strain 83972 is a deconstructed pathogen rather than a commensal strain that has acquired fitness properties.
Keywords: Asymptomatic bacteriuria; Global gene expression; Microarray; Urinary tract infections; Virulence factors
Urinary tract infection (UTI) is one of the most common infectious diseases in humans and a major cause of morbidity. It is estimated that 40–50% of healthy adult women have experienced at least one UTI episode (Foxman 2002). UTI can be caused either by pathogenic strains, leading to symptomatic UTI, or by asymptomatic bacteriuria (ABU) strains, resulting in a symptom-free carriage resembling commensalism. Escherichia coli is responsible for more than 80% of all UTIs. Acute pyelonephritis is a severe acute systemic infection caused by uropathogenic E. coli (UPEC) clones with virulence genes clustered on “pathogenicity islands” (PAIs) (Eden et al. 1976; Funfstuck et al. 1986; Stenqvist et al. 1987; Orskov et al. 1988; Johnson 1991; Welch et al. 2002). Paradoxically, a large proportion of UTIs are caused by ABU E. coli. Individuals infected with ABU-class E. coli may carry high urine titres of a single E. coli strain for months or years without provoking a host response.
Escherichia coli 83972 is a prototype ABU strain and undoubtedly the best-characterised ABU-class E. coli to date. Strain 83972 was originally isolated in the 1970s from a young girl who had carried it for at least 3 years without symptoms (Lindberg et al. 1975; Andersson et al. 1991). The strain is well adapted for growth in the human urinary tract (UT), where it establishes long-term bacteriuria (Hull et al. 2000). It has been used for prophylactic purposes in numerous studies, for example as an alternative treatment in patients with recurrent UTI who are refractory to conventional therapy (Hull et al. 2000). An ongoing study on patients infected with strain 83972 has so far reported over 50 patient years with no serious side effects (Sundén et al. 2006).
ABU patients may carry a single strain for months or years, creating a condition that resembles commensalism, but with a strain that may have evolved from a pathogenic ancestor. Several lines of evidence support the notion that the ancestor of strain 83972 was a pyelonephritic UPEC strain. It belongs to the B2 clonal group, a group associated with pyelonephritis and other extra-intestinal invasive clinical syndromes such as bacteremia, prostatitis and meningitis. The strain also contains gene clusters, in various stages of erosion, encoding the three UPEC-class fimbriae, i.e. the fim, pap and sfa/foc clusters (Klemm et al. 2006; Roos et al. 2006a).
Several studies have investigated the virulence characteristics of uropathogenic E. coli (UPEC) isolates and ABU isolates to better understand how some UTI strains can cause severe disease while others can be used prophylactically to prevent it (Blanco et al. 1996; Dobrindt et al. 2003; Vranes et al. 2003; Johnson et al. 2005; Marrs et al. 2005). ABU strains have been shown to lack many of the virulence-associated phenotypes; many of them are nonhaemolytic, nonadherent and lack haemagglutination ability (Vranes et al. 2003). Strain 83972 lacks many of the virulence-associated phenotypes but has been shown to carry many of the virulence-associated genes, such as kps, iutA, fyuA and malX (Dobrindt et al. 2003). However, apart from the fimbrial clusters, the strain has not been sequenced, and it is not known which genes it shares with other E. coli isolates, which of the genes in the E. coli “core genome” it carries and which genes it shares with other UTI isolates.
Thus far, although a number of UPEC isolates (i.e. CFT073, 536, UTI89 and F11) have been completely sequenced, no genomic sequencing of any ABU strain has been reported. Comparative genomics profiling using microarray chips designed to cover entire genomes is one strategy for obtaining information about variability between different strains of the same species and indications of horizontal gene transfer (Willenbrock et al. 2006). DNA microarray-assisted functional genomics provides the global expression profile of a strain, revealing which genes are expressed under certain conditions. Global gene expression profiling of ABU strain 83972 employing the GeneChip E. coli Genome 2.0 Array (Affymetrix), which contains four E. coli genomes including that of the UPEC isolate CFT073, will not only provide information on the up- and down-regulation of genes between different conditions, but will also reveal which genes are actually present (expressed) in the genome of 83972. Bacterial pathogens differ from commensals by expression of specific virulence factors such as those that mediate histological damage. Commensals, in contrast, have generally been regarded as bacteria lacking such virulence factors or other specific mechanisms for interaction with host tissues. Here we compare the global expression profiles of E. coli ABU strain 83972 grown under a number of different in vitro conditions and in three patients in order to obtain a representative picture of which genes are present/expressed in the genome of this asymptomatic UTI strain.
Materials and methods
Escherichia coli 83972 is a prototype ABU strain and lacks defined O and K surface antigens (Lindberg et al. 1975). It belongs to the ECOR group B2 together with many other UTI strains such as the well-characterized and virulent E. coli isolates CFT073, 536 and J96.
Growth conditions and stabilisation of RNA for microarray experiments of E. coli 83972 grown on urine agar plates
Human urine was collected from four healthy male and female volunteers who had no history of UTI or antibiotic use in the prior 2 months. The urine was pooled, filter sterilised, stored at 4°C, and used within the following day. E. coli 83972 was grown with aeration, in triplicate, in 10 ml of human urine for 6 h. Thereafter, 100 μl of each culture was spread on urine plates (1:1 ratio of human urine and 0.9% NaCl) containing 1.5% agar. The plates were incubated at 37°C for 16 h. Subsequently, 600 μl of a 1:2 mixture of PBS and RNAprotect™ Bacteria Reagent (QIAGEN AG) was poured on the plates, mixed with the lawn of cells and incubated for 5 min at room temperature to stabilise RNA. The stabilised mixture was then centrifuged and pellets were stored at −80°C. The samples from 83972 grown exponentially in MOPS and urine, in urine biofilms and in patients (>10⁸ CFU/ml) were all treated identically with RNAprotect Bacteria Reagent and have been described previously (Roos and Klemm 2006; Hancock and Klemm 2007).
RNA isolation and microarray hybridisation
Total RNA was isolated using the RNeasy® Mini Kit (QIAGEN AG) and on-column DNase digestion was performed using RNase-Free DNase Set (QIAGEN AG). The quality of the total RNA was examined by agarose gel electrophoresis and by measuring the absorbance at 260 and 280 nm to ensure intact high-quality RNA. Purified RNA was precipitated with ethanol and stored at −80°C until further use. Conversion of RNA (10 μg per sample) to cDNA, labelling and microarray hybridisation were performed according to the GeneChip Expression Analysis Technical Manual 701023 Rev. 4 (Affymetrix, Inc., Santa Clara, CA). GeneChip E. coli Genome 2.0 Arrays (Affymetrix) were used for hybridisation of the labelled cDNA. The microarrays were scanned using the GeneChip Scanner 3000.
The raw intensities from the microarray experiments were background corrected and quantile-normalised. All microarray data in the study were obtained from mRNA being converted to cDNA, i.e. no genomic DNA was used for hybridisation. Probe intensities were summarised to yield expression values for each probe set or gene. These calculations were performed using the implementation of GCRMA (Wu et al. 2004) in Bioconductor (Gentleman et al. 2004) (http://www.bioconductor.org, http://www.r-project.org). In order to derive a cut-off expression value for making presence/absence calls, we made use of intensities due to control probe sets with IDs beginning with AFFY. There were 96 such probes. The cut-off value was set so that only the top 1/16th of these control probes would be flagged as present. As a result 4,109 genes in the array were marked as present; the remaining genes are referred to as “absent” throughout this report, i.e. these genes could be truly absent, non-homologous or not expressed during any of the seven different growth conditions. Orthologs of all the genes in the array across E. coli K12 MG1655, E. coli O157:H7 Sakai, E. coli O157:H7 EDL933 and E. coli CFT073 were identified using bidirectional best hit BLAST.
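The presence/absence call described in this paragraph can be sketched as follows. This is a minimal illustration, not the authors' actual script: the function names, the use of a single summarised value per control probe set, the handling of ties and the use of ">=" are all assumptions. The rule that a gene counts as present if it reaches the cut-off on at least one array follows from the definition of “absent” given above.

```python
import math


def control_cutoff(control_values, top_fraction=1 / 16):
    """Return the lowest value that still falls in the top `top_fraction`
    of the control probe-set values (hypothetical helper)."""
    ranked = sorted(control_values, reverse=True)
    # With 96 controls and a fraction of 1/16, six controls stay above the cut-off.
    n_present = max(1, math.floor(len(ranked) * top_fraction))
    return ranked[n_present - 1]


def present_genes(expression, cutoff):
    """Return genes whose expression reaches the cut-off on at least one array.

    `expression` maps gene ID -> list of expression values, one per array.
    Everything not returned is "absent" in the sense used in the text:
    truly absent, non-homologous, or not expressed under any condition.
    """
    return {gene for gene, values in expression.items() if max(values) >= cutoff}
```

With 96 distinct control values, exactly six controls meet the cut-off returned by `control_cutoff`, which matches the "top 1/16th" rule.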
Microarray data accession number
The supporting microarray data have been deposited in ArrayExpress (http://www.ebi.ac.uk/arrayexpress) with accession numbers E-MEXP-584 (MOPS, urine and patient arrays), E-MEXP-926 (biofilm arrays) and E-MEXP-1453 (urine-agar plate arrays).
Genes expressed in ABU E. coli 83972
The bacterial transcriptome is a dynamic entity that reflects the organism’s immediate, ongoing response to its environment. DNA microarray-assisted functional genomics provides the global expression profile of the genome. The genomic expression profiles of the urinary tract infectious E. coli isolate 83972 were analysed under several different growth conditions and in different media using the GeneChip E. coli Genome 2.0 Array (Affymetrix). This array contains approximately 10,000 probe sets for all 20,366 genes present in E. coli strains MG1655 (K-12), CFT073 (UPEC), EDL933 (EHEC) and O157:H7-Sakai (EHEC). Due to the high degree of similarity between the E. coli strains, whenever possible, a single probe set is tiled to represent the equivalent ortholog in all the four strains.
In total, 108 genes were significantly changed in all six urine environments compared with MOPS. Twenty of these genes were up-regulated in all six urine conditions, and half of those were related to different iron systems, i.e. iroN, fepA, fecI, iucBC, fhuA and exbD, as well as b3337, involved in iron storage, and b1995, encoding a putative haemin receptor. The other urine up-regulated genes were marA, a multiple antibiotic resistance gene, sodA, encoding superoxide dismutase, ahpC, encoding hydroperoxide reductase, b1452, c1220, c4210, lysA, rrsG, rrsH and yrbL. Most iron acquisition systems were expressed in all six urine environments: the enterobactin, salmochelin, aerobactin, haem and sitABCD systems were all expressed under all six urine conditions (although more weakly in the urine plates). Interestingly, the fec system, which is a citrate-dependent iron uptake system found in K-12 but missing in CFT073 and other UPEC strains, was highly expressed in Pat3. Up-regulation of all these iron-uptake systems shows that the strain has an impressive array of iron acquisition systems and that all of these are active in the human bladder.
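Selecting genes that are changed, or up-regulated, in every urine condition relative to a reference amounts to a set intersection. The sketch below illustrates that step under stated assumptions: the function name, the input layout and the toy values are hypothetical, the gene names are placeholders, and the significance test itself is not shown because the text does not specify it.

```python
def changed_in_all(significant):
    """Return (genes changed in every condition, genes up-regulated in every condition).

    `significant` maps condition name -> {gene: log2 fold change vs. the reference},
    listing only the genes that passed the significance filter in that condition.
    """
    per_condition = list(significant.values())
    # Genes that passed the filter in every condition.
    shared = set.intersection(*(set(genes) for genes in per_condition))
    # Of those, genes with a positive fold change everywhere.
    up_in_all = {g for g in shared if all(genes[g] > 0 for genes in per_condition)}
    return shared, up_in_all


# Toy input with invented values, for illustration only.
toy = {
    "urine": {"geneA": 2.1, "geneB": -1.5, "geneC": 0.9},
    "biofilm": {"geneA": 1.4, "geneB": 1.2},
    "patient": {"geneA": 3.0, "geneB": -0.8, "geneD": 1.1},
}
shared, up_in_all = changed_in_all(toy)
```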
Nineteen of the top 31 highest expressed genes overall were genes involved in ribosomal synthesis. The high expression of ribosomal genes in E. coli 83972 suggests a rapid growth rate; the highest expression values were obtained in Pat1 followed by MOPS, urine and Pat2, indicating a growth rate just as fast in the patients in vivo as in exponential growth phase in a shake flask. This supports our hypothesis that the strain’s optimized growth properties in human urine explain its ability to successfully colonize the human urinary tract in the absence of functional fimbriae (Roos et al. 2006b).
Figure 1 reveals that strain 83972 almost exclusively expresses the iron uptake and transport systems in the seven CFT073 PAIs; almost none of the other genes in these islands are expressed. There are only two exceptions: c0300, located in PAI-aspV and encoding a hypothetical protein, and c3686–3690, located in PAI-pheV and encoding YrbH and KpsEDC. The yrbH gene belongs to the 131 genes that were recently identified as UPEC specific, and it was the second most highly expressed UPEC-specific gene in mice (Lloyd et al. 2007); in our samples the highest expression was found in the three patients and in MOPS. Outside the PAIs there are a few genes/gene clusters that are highly expressed in all urine samples or only in the patients. The enterobactin system was up-regulated during all urine conditions, and the chu cluster (involved in haem uptake and transport) was most strongly up-regulated in the patients, followed by in vitro urine growth. The ycdO and ycdB genes were highly expressed in the three patients; these have recently been shown to encode haemoproteins, probably involved in iron transport, that are induced under acidic conditions (Sturm et al. 2006).
Closeness to CFT073
Given the different growth conditions analysed, it is not unrealistic to assume that most genes present in strain 83972 would be expressed, to some extent, under at least one of these seven different conditions/environments, i.e. growth in liquid and on solid media; during exponential phase, in biofilm and during colony-forming conditions; in different growth media (human urine and minimal lab medium); as well as in vivo in three different individuals.
E. coli core genome
The size of the E. coli chromosome varies considerably; among the 32 E. coli (and Shigella) genomes that have been fully sequenced, or at least sequenced with an expected coverage of greater than 99%, chromosome size ranges from 4.5 to 5.6 Mbp. The genomes show a considerable amount of diversity, and the current pan-genome has been estimated to contain 9,433 different genes (Willenbrock et al. 2008). Several studies have identified sets of “core genes” found in most E. coli genomes. However, the number of these core genes tends to decrease as the full genomic sequences of new E. coli strains become available. The E. coli core genome has recently been predicted to contain 1,563 genes for an infinite number of E. coli strains, and each newly sequenced E. coli genome is predicted to contribute ∼79 new genes (Willenbrock et al. 2008). In our analysis, 2,472 (60%) of the genes found present in strain 83972 were common to all four E. coli genomes on the array (Fig. 3), which is well above the estimated E. coli core genome and also above the 2,241 common genes conserved among the 32 sequenced E. coli strains (Willenbrock et al. 2008). Furthermore, considering that the microarray contains only four E. coli genomes, the total number of genes detected as present in 83972 (4,109 genes) seems reasonable compared with the size of other sequenced UTI E. coli genomes. The genome size of strain 83972 has been reported to be 4.9±0.2 Mbp (Zdziarski et al. 2007), indicating that the strain contains roughly an additional 800 genes not identified in the present analysis.
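The figures in this paragraph can be checked with a short calculation. The text does not state how the estimate of roughly 800 additional genes was derived; the sketch below shows one plausible route, assuming a gene density of about one gene per kb. That density, the function name and the variable names are assumptions made for illustration.

```python
def estimate_undetected_genes(genome_size_bp, detected_genes, genes_per_kb=1.0):
    """Estimate genes not detected on the array from genome size.

    genes_per_kb is an assumed average gene density, not a value from the text.
    """
    expected_total = genome_size_bp / 1000 * genes_per_kb
    return expected_total - detected_genes


detected = 4109         # genes called present in strain 83972
shared_all_four = 2472  # present genes common to all four array genomes

# Share of present genes common to all four genomes, in percent.
shared_percent = round(100 * shared_all_four / detected)

# Central estimate and the range implied by the reported 4.9 +/- 0.2 Mbp.
central = estimate_undetected_genes(4_900_000, detected)
low = estimate_undetected_genes(4_700_000, detected)
high = estimate_undetected_genes(5_100_000, detected)
```

Under this assumption the central estimate is 791 genes, consistent with the "roughly 800" quoted above, and the reported uncertainty in genome size gives a range of about 590 to 990 genes.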
Of the 2,734 transcripts on the chip that are present in all four strains represented on the microarray, 393 transcripts were below the detection limit on all 21 microarrays and were filtered as “absent” in strain 83972. These included 81 genes encoding hypothetical proteins. Several of the absent genes were found in clusters, many of which are involved in surface structure elements and chemotaxis. These included genes involved in flagellar biosynthesis (flgABCDEFGHIJKL, flhABE, fliACDEFGHIJKLNOPQRSTZ and motAB), curli production (csgABCEFG), colanic acid synthesis (wcaABCDEFGHI and wza) and chemotaxis (cheBRWYZ and tap). Other whole clusters of genes that were not expressed in the ABU strain but are found in all four E. coli strains present on the chip were hyaBCDEF (hydrogenase I), hycACD (hydrogenase 3), tauABCD (responsible for taurine uptake in E. coli) and b1500–1505 (containing the fimbrial-like genes ydeQRST), as well as the fimEAIC genes, which have previously been shown to be absent in strain 83972 (Klemm et al. 2006).
UPEC-associated genes present in strain 83972
Analysis of fimbriae-encoding genes in strain 83972
No of genes
No (%) of absent genes
Putative chaperone-usher fimbrial operon
Putative chaperone-usher fimbrial operon
Putative chaperone-usher fimbrial operon
Putative chaperone-usher fimbrial operon
P fimbriae (2)a
Type 1 fimbriae
Presence of other pathogenicity islands in 83972
Analysis of presence of pathogenicity islands in strain 83972
No of genesa
PAI III CFT073
cdiA (c0345), picU (c0350)
mchBCDEF (c1227, c1229–1232), sfa/foc (c1237–c1247), iroNEDCB (c1250–c1254), ag43 (c1273)
hcp (c3391), clpB (c3392)
PAI I CFT073
hlyA (c3570), pap (c3582–c3593), iha (c3610), sat (c3619), iutA, iucDCBA (c3623–3628), ag43 (c3655), kpsTM (c3697–c3698)
PAI II CFT073
Insertion of the high pathogenicity island (HPI) of Yersinia pestis has been suggested to be one of the earliest events in the evolution of extraintestinal E. coli strains (Welch et al. 2002). The genes of HPI encoding yersiniabactin (Ybt) were all expressed in strain 83972. The HPI genes have been found up-regulated during urine biofilm growth of 83972 indicating that Ybt-mediated iron-uptake might play an important role in biofilm growth (Hancock and Klemm 2007) and a deletion mutant in the Ybt uptake receptor (FyuA) exhibits reduced biofilm formation (Hancock et al. 2008). The HPI genes have also been found up-regulated in vivo in two of the three patients (particularly in Pat2, see Fig. 1) infected with this strain (Roos and Klemm 2006).
The pks island, a recently characterised and widely spread genomic island found in, for example, meningitis strains and the uropathogenic strain CFT073, encodes machinery for the synthesis of peptide–polyketide hybrid compounds (Nougayrede et al. 2006). The island has genotoxic activity, and its presence is associated with the accumulation of double-strand DNA breaks in host cells (Nougayrede et al. 2006). This island was expressed in strain 83972 and up-regulated in urine and in vivo (Table 4; Fig. 1). The pks island is widely distributed within E. coli phylogenetic group B2 and has been found in both pathogenic and commensal isolates; in commensal strains the cell-cycle-blocking activity might slow the turnover of the intestinal epithelium and therefore prolong colonisation.
Presence of positively selected UPEC genes
A recent paper comparing the UPEC isolates CFT073 and UTI89 with six other finished E. coli genome sequences presented 29 genes that are under positive selection only in UPEC strains (Chen et al. 2006). These 29 genes are involved in various aspects of cell surface structure, DNA metabolism, nutrient acquisition and UTI. Of these 29 genes, 25 were filtered present in our ABU strain 83972. Many of these genes are represented by more than one transcript on the array owing to sequence differences among the four strains present on the array; in all cases the gene filtered present in 83972 corresponded to the CFT073 transcript. Four genes were filtered absent: agaI, yjiL, recC and yegO, which encode a putative galactosamine-6-phosphate isomerase, a hypothetical protein, exodeoxyribonuclease V gamma subunit and a hypothetical transport protein, respectively. The genes in the two COG categories that were significantly enriched in the two UPEC strains, i.e. “cell wall/membrane biogenesis” (amiA, cutE, fepE, ompC, ompF and yfaL) and “secondary metabolites biosynthesis, transport and metabolism” (entD, entF and yojI) (Chen et al. 2006), were all present in strain 83972.
Functional analysis of MG1655 transcripts of ABU E. coli 83972
Distribution of absent genes in functional categories
Amino acid transport and metabolism
Carbohydrate transport and metabolism
Cell cycle control, cell division and chromosome partitioning
Cell wall/membrane/envelope biogenesis
Coenzyme transport and metabolism
Energy production and conversion
General function prediction only
Inorganic ion transport and metabolism
Intracellular trafficking, secretion and vesicular transport
Lipid transport and metabolism
Nucleotide transport and metabolism
Posttranslational modification, protein turnover, chaperones
Replication, recombination and repair
Secondary metabolites biosynthesis, transport and catabolism
Signal transduction mechanisms
Translation, ribosomal structure and biogenesis
Not in COGs
CFT073 genes absent in strain 83972
Characteristics of ABU isolate 83972 compared with UPEC isolates CFT073, UTI89 and 536
Expression in 83972b
U, BF, Pat
U, BF, Pat
Pl, U, BF, Pat
Pl, U, BF, Pat
Pl, U, BF, Pat
U, BF, Pat
U, BF, Pat
U, BF, Pat
Type IV fimbriae are assembled by the type II general secretory pathway. They occur in a wide range of species and frequently are associated with diseases. The ppdD and hofBC genes (b0106–0108), which encode type IV prepilin and are present in CFT073, EDL933 and MG1655, were filtered absent in strain 83972.
CFT073 genes present in strain 83972 but not found in other UPEC strains
The majority of the genes that are absent in the other three UPEC isolates (i.e. 536, UTI89 and F11) were filtered absent in strain 83972 as well (gaps in Fig. 4). However, there are a few exceptions where a gene that is not found in any of the other UPEC strains is filtered present in strain 83972. The aerobactin system belongs to one of the exceptions, indicating that strain 83972 is particularly well equipped with iron uptake systems. The other exceptions are all but one located on PAIs and they all encode hypothetical proteins: c1194–c1204 (on PAI-serX), c1522–c1528 (on PAI-icdA), c3394–c3396 (on PAI-metV), c3681–c3682 (on PAI-pheV where the aerobactin genes also are found) and c5372–c5382. c3394–c3396 and c5372–c5382 are not present in any of the 16 sequenced E. coli and Shigella strains represented in Fig. 4, indicating that some genes unique to CFT073 can be found in strain 83972 as well.
Bacterial genomes are under constant change. New genes are acquired by horizontal transfer and old ones are lost by mutations. It is generally believed that commensal E. coli can become pathogenic through the acquisition of novel genes encoding virulence factors and niche-adaptation factors (Kaper et al. 2004). In contrast to organisms that have acquired genes for pathogenesis, E. coli 83972 is an example of an organism that has adapted to a commensal-like existence through gene deletions and point mutations. Using primarily CFT073 as a scaffold, we used presence/absence data from seven sets of different gene expression profiles (21 microarrays in total) to model the gene pool of strain 83972. Despite the limitations of the approach, i.e. genes not present on the employed chip have been ignored, a substantial body of information was gathered concerning the genomic content of the strain. As it turned out, the strain was highly similar to CFT073: 96% (3,959) of the genes on the employed microarray found to be expressed by 83972 are also found in CFT073, and genes on six of the seven pathogenicity islands of CFT073 were expressed by 83972; furthermore, CFT073 genes not found in any other UPEC isolate were expressed by 83972. An estimated ∼900 CFT073 genes are not expressed by 83972. Arguably, in the light of the difference in patient symptoms provoked by encounters with the two strains, this list represents virulence gene candidates.
Although strain 83972 seems to be a deconstructed uropathogen and does not provoke symptoms in the human host, it grows fast in urine and is an excellent colonizer of the human bladder (Roos and Klemm 2006; Roos et al. 2006b; Klemm et al. 2007). It can do so because it has kept a large assortment of fitness factors required for this particular ecological niche. Among the genes expressed under realistic environmental conditions such as in the human bladder are candidates for fitness factor genes, e.g. the many iron acquisition systems expressed by the strain and many genes involved in sugar acid and amino acid metabolism. Interestingly, many of the known and putative virulence factors of the urinary tract are expressed by strain 83972 and might therefore be considered fitness factors rather than virulence factors; these include 25 of 29 positively selected UPEC genes as well as the newly characterised pks island, which induces breaks in double-stranded DNA in host cells. Also, virulence-associated genes such as cdiA, mchBCDEF, flu, hcp, rfaH, sat, picU and vat were all expressed by strain 83972. Very few of the known or putative virulence factors were absent in (or not expressed by) strain 83972. The pap, fim and foc/sfa clusters encoding UPEC-class fimbriae are dysfunctional in strain 83972, and the clpB, ireA and tsx genes were not expressed in the ABU strain. These stand out as potential virulence candidates together with a number of uncharacterised genes encoding hypothetical proteins.
Thus from the analyses performed here we can make predictions about several gene categories such as potential virulence genes, fitness genes and “household-class” genes. It is also noteworthy that the information reported herein complements a potential genome sequence of strain 83972. Whole genome sequencing can identify the presence of genes but is unable to reveal if they are transcribed. Genes can be silenced not only due to lesions in the actual gene and its promoter but also due to mutations of genes encoding regulatory factors. The methodology employed in the present work reveals the active genome of strain 83972.
ABU strain 83972 is closely related to fully virulent uropathogenic strains. All the evidence suggests that the strain is a deconstructed pathogen. This study dispels the commonly held idea that ABU strains are commensals that have picked up niche-adaptation genes by horizontal gene transfer. Rather, strain 83972 was originally a true pathogenic strain that has lost all or part of operons that contribute to virulence.
This work was supported by grants from the Danish Medical Research Council (271-06-0555), Lundbeckfonden and Inlaks Foundation, India.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.