Background

When a library of transposon (Tn) integrations is created in a bacterial genome, some insertions are not recovered in the resulting pool of mutants, either because the insertion disrupts an essential gene or because the gene is required for growth in the medium used. This fact has been exploited extensively to identify genes under selection when growth conditions are changed[1-3].

Another potential use of such data, explored here, is to compare different strains, serovars and species to reveal apparent orthologs that have very different levels of fitness in different strains. We perform the first experiments to quantify this phenomenon in Salmonella.

We used high-throughput sequencing to determine the locations of tens of thousands of integration sites of a Tn5 derivative in the genomes of Salmonella enterica serovar Typhimurium strain ATCC 14028 (STM1) and Salmonella enterica serovar Typhi Ty2 (STY1) after growth in rich medium (Luria broth). Our datasets were compared to each other and to a series of other published data on the fitness of mutations in Salmonella[4-6] and Escherichia coli (ECO)[7-10], including a previously obtained transposon profile of a separate Ty2 isolate, STY2, which differs from STY1 by carrying mutations in htrA, aroC and aroD[3].

Differences in the selective pressure on apparent orthologs in the related genomes (STM1, STY1, STY2 and ECO) are of interest because they likely reflect differences in the systems that interact with these otherwise functionally similar genes or their products.

Results and discussion

Profiling of a library of transposon insertions in Salmonella Typhimurium and Typhi

Five independent transposon libraries were constructed in S. Typhimurium ATCC 14028 (STM1) and two in S. Typhi Ty2 (STY1) using the EZ-Tn5 <KAN-2> Promoter Insertion Kit (Epicentre Biotechnologies) (see Methods), and grown in Luria broth (Additional file 1: Table S1). The genomic DNA directly adjacent to each transposon was obtained using a procedure similar to that described in Santiviago et al.[4] and sequenced as described in Additional file 2 and in Additional file 3: Figure S1.

We obtained a total of 16,642,379 first-strand Illumina sequencing reads, each 100 bases long. Reads were then filtered to retain those that contained a complete primer, including its unique barcode, followed by two bases of transposon (Tn) sequence beyond the primer. The remainder of each retained read was mapped to the genome to determine the transposon integration site.

Each Tn integration site was generally represented by multiple reads, and these reads usually differed in the length of Salmonella sequence because of the random DNA shearing used in the sequencing protocol. To reduce bias due to preferential PCR amplification of some fragments, reads with identical shear ends were counted only once. The remaining reads for each transposon integration site were used to determine the number of different “shear events” for that transposon. This filtering resulted in the mapping of 234,152 and 53,556 Tn integration locations, with 2,827,876 and 313,585 unique shear events, in STM1 and STY1, respectively. The average density of integrations was one every 20 bases, with about 12 shear events per site on average, in STM1, and one every 90 bases, with about 6 shear events per site, in STY1.
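The collapsing of duplicate reads into shear events can be illustrated with a short Python sketch. It assumes, hypothetically, that each mapped read has already been reduced to a tuple of (integration coordinate, strand, shear-end coordinate). The function names and the `min_events=3` default, which mirrors the removal of sites supported by only one or two shear events described in Methods, are illustrative choices and not the analysis pipeline used in this study. The last lines reproduce the per-site averages from the totals reported above; the ~4.8 Mb genome size used for the spacing estimate is an approximation assumed here, not a value given in the text.

```python
from collections import defaultdict


def count_shear_events(reads, min_events=3):
    """Count distinct shear events per (site, strand) integration location.

    `reads` is an iterable of (site, strand, shear_end) tuples. Reads sharing all
    three values are treated as PCR duplicates of a single shear event.
    Locations with fewer than `min_events` distinct shear ends are discarded.
    """
    shear_ends = defaultdict(set)
    for site, strand, shear_end in reads:
        shear_ends[(site, strand)].add(shear_end)  # duplicates collapse in the set
    return {loc: len(ends) for loc, ends in shear_ends.items() if len(ends) >= min_events}


def mean_events_per_site(total_events, n_sites):
    """Average number of distinct shear events per integration site."""
    return total_events / n_sites


# Per-site averages from the totals reported in the text.
print(round(mean_events_per_site(2_827_876, 234_152), 1))  # STM1, about 12
print(round(mean_events_per_site(313_585, 53_556), 1))     # STY1, about 6

# Spacing between sites, ASSUMING an approximate 4.8 Mb genome (not from the text).
GENOME_BP = 4_800_000
print(round(GENOME_BP / 234_152), round(GENOME_BP / 53_556))  # roughly 20 and 90
```

For example, reads at one site with shear ends 250, 250 and 310 count as two shear events, because the two reads ending at 250 are indistinguishable PCR copies.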

A genome-wide survey of permitted transposon integrations

The transposon libraries used in these experiments were constructed and grown in rich medium. Regions of the genome with rare or absent transposon integrations include those that are essential or under strong selection in rich medium. A sizeable subset of these regions should also be essential under all growth conditions. An example of a profile of transposon integrations is shown in Figure 1, which displays a region in the STM1 genome that includes a known essential gene, priA (primosome assembly), and a gene with an essential region, ftsN (involved in cell division). Regions that were essential in Luria broth are identifiable in this plot as having no transposon integrations. Near-essential regions have a lower than average number of integration sites, usually accompanied by a lower number of shear events.

Figure 1

Visualization of transposon integrations into an S. Typhimurium 14028 genome region. The number of different sequencing reads originating from transposons (shear events) is plotted, averaged across 500 bases. Red, positive strand; blue, negative strand. The displayed region contains one gene necessary for growth in LB (priA) and one gene where only a segment shows selection (ftsN).
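A per-strand profile of this kind could be produced by binning shear events into 500-base windows. The sketch below is a hypothetical reconstruction: the input format (position, strand, shear-event count) and the use of non-overlapping windows rather than a sliding window are assumptions, not details given in the text.

```python
from collections import defaultdict


def window_profile(events, window=500):
    """Average shear events per base in non-overlapping windows, per strand.

    `events` is an iterable of (position, strand, n_shear_events).
    Returns {strand: {window_start: mean shear events per base}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for pos, strand, n in events:
        totals[strand][pos // window] += n
    return {
        strand: {w * window: count / window for w, count in sorted(bins.items())}
        for strand, bins in totals.items()
    }


profile = window_profile([(10, "+", 5), (499, "+", 5), (500, "+", 10), (20, "-", 1)])
print(profile["+"])  # positive strand (red in the figure)
print(profile["-"])  # negative strand (blue in the figure)
```

Windows without any integrations are simply absent from the output and would be plotted as zero; such gaps are what mark essential regions like priA.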

Our data also provide information on the orientation of each transposon, which can be informative. For example, in Figure 1, at the beginning of cytR, adjacent to ftsN, the negative strand contains far more transposons; this is the strand in which the heavily expressed antibiotic resistance marker of the transposon is oriented away from ftsN, likely making these integrations less disruptive. Many other examples of general selection and strand-specific selection are seen in Additional file 4: Figure S2, which shows a plot of the frequency of transposon insertions across the entire STM1 genome. A dramatic example of strand-specific selection is seen in the ribosomal RNA operons, for example at position 290,000 in the genome (Additional file 4: Figure S2). In this operon transposons are only permitted in the antisense strand, perhaps because truncated sense strand transcripts produced by the strong antibiotic resistance promoter in the transposon would disrupt ribosome assembly.

The transposon frequency analyses for all genes in STM1 (compared with known essentiality information for STY2 and ECO) and STY1 are presented in Additional file 5: Table S2 and Additional file 6: Table S3, respectively. Data for the htrA, aroC and aroD mutant STY2 were derived from two selections: a single passage on solid medium (i) and six passages in Luria broth (ii).

Essential genes in E. coli

Among E. coli, S. Typhimurium and S. Typhi, over 60% of protein-coding genes are syntenic and share over 95% amino acid sequence identity[11]. Of the approximately 4,000 genes in ECO, 339 have been reported to be essential in at least one of two comprehensive studies, the PEC (Profiling of E. coli Chromosome) database and the Keio collection[7-10] (sources are summarized in Table 1). Of these potentially essential genes, 277 are considered essential in both databases, and all 277 have syntenic homologs in STM1. These genes are listed in Additional file 5: Table S2.

Table 1 Numbers of essential genes under laboratory conditions in relevant E. coli, S. Typhimurium and S. Typhi isolates

We ranked all Salmonella genes by their density of transposons and their total number of shear events, and set a threshold at the 15th percentile for “highly selected” genes. Exactly 549 genes in STM1, 582 genes in STY1 and 437 genes in STY2 met these criteria. Only six of the 277 genes essential in ECO were not among these highly selected genes in STM1 (Table 2): three narrowly missed the threshold (folK, yejM and trpS) and three had average numbers of transposon insertions (murB, ftsN and degS), indicating that mutants were not under selection in LB in this isolate. Two other genes did not meet the “selected” threshold in our STY1 assay (yrfF, gpsA), and one gene, folA, was not found to be selected in the published STY2 data[3].

Table 2 Essential genes in E. coli that are not as strongly selected in Typhimurium or Typhi*

Visual inspection of Additional file 4: Figure S2 revealed that in three of the genes that are essential in ECO but seemingly not essential in STM1, part of the respective gene was, in fact, devoid of any transposon insertions: yejM, a putative hydrolase; ftsN, which encodes a cell division protein; and murB, a UDP-N-acetylenolpyruvoylglucosamine reductase. Figure 1 shows the ftsN gene as an example. These genes can apparently be disrupted in certain locations without losing their essential function. Thus, because of the high density of transposon integration data, we were able to reveal those cases where only part of the gene is essential.

The fourth of the six cases, trpS, encodes a tryptophanyl-tRNA synthetase. It may tolerate transposon insertions in STM1 and STY because of the presence of a distant paralog (trpS2) that does not exist in ECO. TrpS2 may substitute for TrpS, although the two proteins are only 28% identical[12].

The fifth gene essential in ECO and not under strong selection in STM1 is degS, which encodes a serine endoprotease. This gene was under strong selection in STY2[3]. That strain (but not our STY1) is an htrA mutant, and htrA is a paralog of degS[13]. In our STY1 data, transposon insertion into degS was somewhat diminished, but not enough to qualify the gene for the “selected” category, suggesting at least some effect of the lack of a functional HtrA in STY2. However, in STM1 there is no evidence for any degree of transposon underrepresentation in degS, and it seems likely that another paralog, as yet undefined, can perform the proteolytic activity of DegS in this strain, if needed.

Finally, folK was somewhat underrepresented in transposon insertion frequency in STM1, but not to a degree that warranted inclusion in the “selected” category.

There are three other genes that are essential in ECO and STM1, but seemed not strongly selected in our STY1 survey and/or STY2. The folA gene, involved in folate biosynthesis, was strongly selected in STM1 and STY1 but did not show as much selection in STY2, especially after six passages in LB. Lastly, yrfF and gpsA are two genes essential in ECO that did not meet the “selected” threshold in our STY1 assay. Both these genes were under strong selection in STM1 and in STY2 after six passages in LB, indicating that mutations cannot be maintained for many passages in this medium. The yrfF (igaA) gene encodes an intracellular growth attenuator protein; and gpsA encodes a NAD(P)H-dependent glycerol-3-phosphate dehydrogenase. IgaA has been described as essential in S. Typhimurium unless there are additional mutations in the RcsCDB system, because it acts as a repressor of this system[14, 15]. Expression of igaA is positively regulated by Lon and negatively modulated by Hnr (MviA) through the transcriptional regulator RpoS[16]. S. Typhi Ty2 carries a defect in the rpoS gene[17, 18], which may explain the lesser degree of selection of igaA in STY.

Genes that are not essential in E. coli but are under strong selection in Typhimurium and Typhi

The list of essential genes in ECO is stringent, generally requiring that no viable deletion mutant could be obtained in rich medium. In contrast, transposon integrations in STM1 and STY reveal genes that are under strong selection but not necessarily essential. A group of 159 such genes that are under selection in all Salmonella datasets (i.e. STM1, STY1 and STY2) but not essential in ECO was identified (Additional file 5: Table S2). This list included many genes that might be expected to be under selection, such as genes encoding parts of the ribosome and its accessory proteins, as well as some genes encoding replication components. However, at least 14 genes still have an unknown or poorly understood function (ybaB, ybeD, ybeY, phoL, ycaR, ycdC, yciM, yciS, ygfZ, yhaL, yheM, yheN, wecF and yigP). Given the conservation of these genes between Salmonella and E. coli, they are particularly interesting targets for future studies to determine their exact function.
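Identifying this group amounts to simple set operations on ortholog identifiers. In this minimal sketch, genes are assumed to have already been mapped to shared ortholog identifiers across datasets, and the gene names and sets are hypothetical placeholders rather than data from this study.

```python
def selected_in_all_but_not_essential(selected_by_dataset, eco_essential):
    """Genes selected in every Salmonella dataset but not essential in E. coli."""
    shared_selected = set.intersection(*(set(s) for s in selected_by_dataset.values()))
    return shared_selected - set(eco_essential)


# Hypothetical toy sets keyed by dataset name.
selected = {
    "STM1": {"geneA", "geneB", "geneC", "geneD"},
    "STY1": {"geneA", "geneB", "geneD"},
    "STY2": {"geneA", "geneB", "geneD", "geneE"},
}
eco_essential = {"geneD"}
print(sorted(selected_in_all_but_not_essential(selected, eco_essential)))
```

Here geneC drops out because it is not selected in STY1, and geneD drops out because it is already essential in E. coli.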

The approximately 900 genes shared by STM1 and STY that have no syntenic counterpart in ECO (Salmonella-specific genes) yielded only two strongly and consistently selected genes, STM14_5442 and STM14_2872, both of which encode putative cytoplasmic proteins. However, Santiviago et al. reported successful knockout mutations in both genes in this same strain, so they are likely not essential[4]. Nevertheless, these genes may affect growth, because mutants in them grow poorly in competitive assays (unpublished data).

Genes under greater selection in Typhimurium than in Typhi

Genes that were under stronger selection in STM1 than in STY when the transposon libraries were grown in LB are listed in Table 3A. The most dramatic difference was in hns, with an almost equal effect on the nearly adjacent gene hnr. Whereas hns is essential in Salmonella unless certain second-site mutations are also present[19], no selection against insertion in this gene was found in STY in this study. The most likely explanation is that S. Typhi Ty2 carries a mutation in the rpoS gene[17, 18]. Mutations in rpoS allow hns mutants to remain viable[19], and hnr (mviA) is a response regulator that post-transcriptionally modulates RpoS levels[20]. Interestingly, stpA, which encodes a paralog of hns with 53% identity, also showed greater tolerance for transposons in STY than in STM1. Unlike hns mutants, viable hnr mutants can be obtained in S. Typhimurium[21], even though this gene appears to be strongly selected in STM1. Hnr controls RpoS stability by acting as an adaptor that targets RpoS for degradation by the ClpXP protease[22]. hnr mutants show a reduced growth rate because of increased RpoS stability, which increases transcription of genes involved in growth arrest and resistance to a variety of stresses[20]. The selection against hnr mutations seen in STM1 might be due to the disadvantage of slow cell division in these mutants during competitive growth.

Table 3 Genes displaying prominent differences in selection between Typhimurium and Typhi*

At least 26 additional genes appeared to be under strong selection in STM1 but not in either STY isolate. This class includes, among others, the putrescine ABC transporter potF; an excisionase; a cation transport regulator, chaB; the L,D-carboxypeptidase A ldcA; the phosphoribosylaminoimidazole-succinocarboxamide synthase purC; and cpxP, a periplasmic repressor of the envelope stress response pathway. Viable mutants were obtained in this same Typhimurium strain for six of these genes[4] (listed in Table 3). It is not yet known whether the remaining 20 genes can be deleted, but it is likely that most, if not all, are not essential.

Genes under greater selection in Typhi than in Typhimurium

Genes that were under greater selection in STY1 and STY2 than in STM1 are listed in Table 3B. Among the 10 genes under consistent selection in STY2 (both after passage in LB and after growth on aro-mix agar[3]) and in our own STY1 data, but not in STM1, were four genes that encode hypothetical proteins and the previously mentioned degS. Other genes in this class include eutA, involved in the ethanolamine utilization pathway; the dimethyladenosine transferase ksgA; and treC, a trehalose-6-phosphate hydrolase. The gene holE is an interesting example that is more strongly selected in STY than in STM1. This gene encodes the theta subunit of DNA polymerase III. The STM14_5586 protein, encoded on the STM1 virulence plasmid (which is not present in STY), is a paralog that may partially substitute for holE[23].

Pseudogenes

We inspected integrations in genes that are thought to be pseudogenes in STY but intact in STM1, or vice versa. Of approximately 60 putative pseudogenes annotated in the S. Typhimurium 14028 genome, four showed strong selection in STM1 and are annotated as intact in S. Typhi Ty2: STM14_1358, STM14_1498.L, STM14_1778 and STM14_4596. Only one of them, STM14_1358, has an ortholog in ECO (yceQ) and, interestingly, has been reported as essential in this species. In STM14_1358 and STM14_1498.L, levels of selection in STM1 were similar to those in both STY isolates. In the other two cases, there was no selection in at least one of the two STY isolates.

Of approximately 200 pseudogenes in S. Typhi Ty2, four were strongly selected in at least one of the STY isolates: eda, a keto-hydroxyglutarate-aldolase/keto-deoxy-phosphogluconate aldolase; astA, an arginine succinyltransferase; t2152 (STM14_0843), a putative glycosyltransferase involved in cell wall biogenesis; and t3548 (STM14_4894), a putative cytoplasmic protein. The eda gene is the only case showing strong selection in both STY1 and STY2.

In S. Typhi Ty2, an RNA-seq analysis of the transcriptome was recently published[24] and concluded that the vast majority of pseudogenes had low or undetected transcription. Only nine pseudogenes showed high levels of transcription, none of which correspond to our four strongly selected pseudogenes. A region annotated as a pseudogene and showing strong selection in any environmental condition suggests that a function is encoded in this region, whether it is a partial protein or a regulatory region.

Differences between transposon mutant libraries of two strains of Typhi

In our study, 53,556 transposon integration sites were determined for STY1. In previous work, 370,000 insertion sites were identified in STY2[3]. Over 100 genes showed a difference in fitness between these two strains. Some of these differences may be attributable to the growth conditions used in the two studies: our STY1 data are from a single LB growth passage, whereas the STY2 data were from L-agar supplemented with aromatic compounds as well as from six passages in LB. Furthermore, STY2 is an attenuated strain, CVD908-htrA[3], which differs from STY1 in that it carries additional mutations in aroC, aroD and htrA[25]. Deletions in the aro genes, which encode enzymes of the shikimate biosynthesis pathway, render bacteria auxotrophic for the aromatic amino acids, p-aminobenzoate (pABA) and 2,3-dihydroxybenzoate[26]. These aro mutations also result in the inability to produce ubiquinone and menaquinone, leading to respiration defects[27], and in defects in some components of the cell envelope[28], unless aromatic precursors are added to the medium. HtrA is a serine protease involved in the degradation of aberrant periplasmic proteins, and an htrA mutant is more susceptible to oxidative stress than the wild type[29].

We found 17 transposons in htrA and one each in aroC and aroD in STY1. The number of transposons in htrA was close to the number expected for random insertion (15.9), whereas aroC and aroD showed selection in both STY1 and STY2. The apparent selection of aroC and aroD in STY2 is explained by the fact that these genes were knocked out in this strain, leaving a much smaller gene remnant as a transposon target. The htrA mutation may explain at least some differences between the two STY strains, such as the selection of the gene encoding the periplasmic protein HlpA and the strong selection of dsbA after six passages in LB in STY2, but not in our STY1 study. In E. coli, the skp degP double mutant (hlpA htrA in Salmonella) is lethal[30] and the dsbA degP double mutant shows reduced growth[31].

STY1 showed selection in some Salmonella Pathogenicity Island 2 (SPI-2) genes[32, 33]. Some of these were also selected in STM1 (ssaI, ssaH, ssaR, ssaT and sifB) but not in STY2. Other SPI genes were interesting from a regulatory point of view. The genes encoding the SPI-1 regulators HilC and RtsA were selected in STY1, but not in STY2[34]. rtsA lies in an operon that also includes the similarly selected rtsB, whose product represses flhDC, the master regulator of the flagellar regulon[35].

Flagellar genes

Flagellar genes show different patterns of selection when growth in LB broth under aeration (STM1, STY1 and STY2) is compared with growth on LB agar (STY2). Flagella and motility are highly regulated in S. Typhimurium and involve over 60 genes integrated into a hierarchy of controlled transcription[36]. The flagellar structure consists of three components: the basal body, the hook and the filament. The basal body consists of three rings and a rod that traverses the periplasmic space. These three components are expressed coordinately, first the hook-basal body (HBB) and later the filament, and are assembled via a flagellar type III secretion apparatus[37]. The flgM gene, encoding the anti-sigma 28 negative regulator of flagellar filament synthesis, was selected in broth but not on LB agar. FlgM binds to sigma 28 to prevent transcription of late flagellar genes before completion of the HBB structure[38]. In contrast, flgC, flgJ, fliI, fliK and fliO were selected only when bacteria were grown on an agar surface. FlgC is one of the structural rod components, and FlgJ is the rod-capping protein, which also possesses muramidase activity[39]. FliI is an ATPase that forms part of the flagellar type III export apparatus, although it is not essential[40]. FliK regulates hook length by switching the secretion specificity from rod-hook-type substrates to filament-type substrates[41]. FliO is one of the integral membrane proteins of the flagellar secretion system and seems to stabilize another protein of this system, FliP[42]. These five proteins are components of the basal body or the flagellar export apparatus, or interact with these structures. Furthermore, fliT, which encodes a repressor of flagellar biosynthesis[43], was selected after six passages in LB in STY2[3]. The flagellar gene flhE showed selection during growth on agar and after six passages in LB in STY2[3].
Although the lack of FlhE is known not to affect flagellar biogenesis or swimming motility, flhE mutants are defective in swarming motility[44]. In agreement with these results, Wang et al. reported that flagellar genes are regulated in a surface-specific manner[45]. Overall, our data indicate selection for genes encoding inhibitors of flagellar biogenesis, such as flgM and fliT, when bacteria are grown in LB broth, and selection for genes necessary for flagellar biosynthesis when bacteria are grown on agar.

Comparison of fitness data with previous efforts to identify essential genes in Salmonella

We compared our data (Additional file 5: Table S2) with previous experiments that sought to identify essential and non-essential genes in Typhimurium (Table 1). We previously reported 1,023 genes that give viable mutants in rich medium in the same strain of Typhimurium as used here[4]. At least 38 of these genes are “selected” in our study (among the 15% with the lowest density of transposon integrations and shear events). These differences may be attributable to measuring essentiality, which is absolute, versus fitness selection, which is relative. In another study, 257 genes were identified as potentially essential for in vitro growth in LB using an insertion-duplication mutagenesis (IDM) strategy based on a temperature-sensitive integration plasmid[5]. Of these 257 genes, only 104 were under strong selection in STM1 in our data. It was previously suggested that some of these putative essential genes might be non-essential[46], and it appears that IDM may give a high rate of false positives for essential genes. In another study, a prediction based on metabolic reconstruction identified 144 genes that could be essential for growth of S. Typhimurium LT2 in LB, of which 71 were already known to be essential in E. coli[6]. Of the remaining 74 potentially novel essential phenotypes, 19 were under strong selection in STM1. Additionally, 57 genes were predicted to be essential only in minimal medium; six of these were under strong selection in rich medium in STM1. Given the considerable discrepancy between the observed and predicted genes under selection, fitness data have great potential to improve systems biology models in the future.

An extended motif for transposon integration

Tn5 transposition can occur into almost any sequence. Indeed, we were unable to find any bias in integration targeting when we applied a variety of motif-finding programs to all the transposon integration data available to us (data not shown), indicating that the vast majority of integration sites were largely randomly distributed. However, hotspots, where Tn5 integration is preferred, have been reported[47]. Goryshin et al. sequenced 198 integration sites in a plasmid and identified a short, degenerate, palindromic consensus motif at which integrations were more frequent: a-GntYWRanC-t[48]. Each dash indicates a site of cleavage of the target DNA; the 9-base segment between them is duplicated on either side of the integrated transposon. To examine whether hotspots for integration in our data might further refine this motif, we filtered over 300,000 integration sites for those that occurred in more than one of our independent transposon libraries. We then filtered for integration sites represented by more than the average number of total shear events. This strategy yielded 654 sites that were among the most highly preferred targets for integration. By aligning the sequences surrounding these sites, we were able to refine the originally reported motif and extend it by five bases on either side, including a highly conserved C and G located two and five bases upstream of the nick in the target DNA: cGcgCa-GttYWRaaC-tGcgCg (Figure 2). The opportunity for preferred interactions along the length of this 21-base target motif might help stabilize a transposase-DNA pre-cleavage or cleavage complex for co-crystallization or other imaging studies.
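The two filtering steps and the alignment can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis pipeline used in this study: sites are assumed to be stored as {position: {library_id: shear_events}} with only libraries that contain the site listed, the window offsets in `flanks` are hypothetical, flanking sequences are assumed to share one orientation, and the case convention in `consensus` (uppercase above 50%, lowercase from 30% to 50%, n otherwise) is a hypothetical simplification that ignores the two-base codes (Y, W, R) of the published motif.

```python
from collections import Counter


def hotspot_sites(sites, min_libraries=2):
    """Keep sites seen in >= min_libraries libraries AND with above-average total shear events."""
    totals = {pos: sum(libs.values()) for pos, libs in sites.items()}
    mean_total = sum(totals.values()) / len(totals)
    return sorted(
        pos for pos, libs in sites.items()
        if len(libs) >= min_libraries and totals[pos] > mean_total
    )


def flanks(genome, positions, left=10, right=11):
    """Extract equal-length windows around each position, skipping genome edges."""
    return [genome[p - left:p + right] for p in positions
            if p - left >= 0 and p + right <= len(genome)]


def consensus(aligned, strong=0.5, weak=0.3):
    """Column-wise consensus: uppercase if the top base exceeds `strong`,
    lowercase if it reaches `weak`, otherwise 'n'."""
    letters = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        freq = count / len(column)
        if freq > strong:
            letters.append(base.upper())
        elif freq >= weak:
            letters.append(base.lower())
        else:
            letters.append("n")
    return "".join(letters)
```

Requiring both recurrence across libraries and above-average read support is what separates reproducible hotspots from sites that are merely sampled often by chance in one library.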

Figure 2

An extended motif for transposon integration hot spots. Integration sites that occurred in the same location and were overrepresented in more than one transposon library were aligned and a consensus was obtained. Base frequencies over 50% are purple, between 30 and 50% light blue, and below 16%, orange.
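To look for candidate hotspots in a sequence of interest, the extended motif cGcgCa-GttYWRaaC-tGcgCg could be scored along that sequence. The following sketch is an illustration, not a method used in this study: at each offset it counts how many of the 21 motif positions match, reading Y as C/T, W as A/T and R as A/G, and treating upper- and lowercase motif letters alike. Only the given strand is scanned.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "CT", "W": "AT", "R": "AG", "N": "ACGT"}
MOTIF = "cGcgCa-GttYWRaaC-tGcgCg"


def motif_scores(seq, motif=MOTIF):
    """Number of matching motif positions at every offset of `seq` (one strand only)."""
    pattern = motif.replace("-", "").upper()  # drop cleavage marks, ignore case
    seq = seq.upper()
    k = len(pattern)
    return [
        sum(base in IUPAC[code] for base, code in zip(seq[i:i + k], pattern))
        for i in range(len(seq) - k + 1)
    ]


def best_offset(seq, motif=MOTIF):
    """Offset and score of the best-matching window (seq must be >= motif length)."""
    scores = motif_scores(seq, motif)
    top = max(scores)
    return scores.index(top), top
```

Weighting the uppercase (most conserved) positions more heavily than the lowercase ones would be a natural refinement of this unweighted match count.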

Conclusions

We have identified differences in the ability to tolerate transposon integrations between Salmonella Typhimurium strain 14028 and two strains of S. Typhi Ty2. We also found potential differences in essentiality of homologous genes between Salmonella and E. coli (summarized in Tables 2 and 3).

Sometimes, these differences in selection can be explained by an identifiable paralog that is present in one genome but not another and can take over some or all of the functions of the mutated gene. Examples include trpS and holE. In other cases, a difference in a function encoded elsewhere in the genome alters the role of orthologs. The best example of this phenomenon in the present study is hns, which is essential in STM1 but not in either STY dataset. It is known that rpoS mutations permit mutations in hns in Salmonella in some circumstances[19], and STY1 and STY2 are rpoS mutants[17, 18]. Similarly, hnr showed profound selection only in STM1. Although this gene is not essential in Salmonella, the presence of a functional copy of rpoS may also be the reason for this selection, because Hnr modulates RpoS stability[22].

For some genes, such as yejM, ftsN and murB, we identified ORF segments that are essential, whereas other regions of the same gene can be freely interrupted by transposon insertions. These non-essential ORF regions likely lie outside the protein domains involved in the critical functional modules of those genes.

Our experiments here involve comparisons among three Salmonella strains grown in rich media. However, the study of fitness profiles in hundreds of different strains in multiple growth conditions has the potential to reveal differences in life strategy not evident from the genome sequences alone and to contribute to understanding natural diversity. Advances in DNA sequencing and the ability to incorporate any number of different barcodes for comparison of multiple samples at the same time mean that comparative analysis of fitness among many different natural strains with different phenotypes has become practical. We speculate that as systems biology models become more refined, the fitness profiles of genomes may become useful for constraining these models.

Methods

Strains and growth conditions

The strains used in this study were Salmonella enterica serovar Typhimurium ATCC 14028 (STM1) and S. Typhi Ty2 strain JSG624 (STY1) provided by Ferric Fang (University of Washington, Seattle, WA). Bacterial cells were grown in LB medium containing 1% Bacto tryptone (Difco), 0.5% Bacto yeast extract (Becton, Dickinson and Company), and 1% NaCl, supplemented with kanamycin at 50 μg/ml, when necessary.

Construction of transposon integrations in S. Typhimurium 14028 and S. Typhi Ty2

Salmonella cells were made competent by standard methodology. Briefly, cells were grown in LB with shaking at 37°C to logarithmic phase, washed three times with cold 10% glycerol and concentrated 250-fold in 10% glycerol. Transposome mixtures were prepared by mixing 2 μl glycerol, 2 μl EZ-Tn5 <T7/KAN-2> transposon and 4 μl EZ-Tn5 transposase. After 3 h of incubation, 1 μl of this mixture was mixed with 50 μl of competent cells and 1 μl TypeOne restriction inhibitor. Electroporation was performed at 2.5 kV using 0.2 cm gap cuvettes and a Bio-Rad MicroPulser at the EC2 setting. Transformed cells in each cuvette were resuspended in 1 ml of LB and incubated for 1 h at 37°C. After incubation, aliquots from the individual reactions were pooled, library complexity was determined by counting colonies from plated dilutions on LB agar containing kanamycin, and the remainder was grown overnight at 37°C in LB broth supplemented with kanamycin. Stocks of the different transposon mutant libraries containing 20% glycerol were prepared from the overnight cultures.

High-throughput sequencing of transposon insertion sites

The entire procedure is illustrated in Additional file 3: Figure S1. In brief, DNA was sheared, poly(A) tailed and PCR amplified using a pair of primers, one located in the transposon and one appended to the poly(A) tail, in a manner similar to that described in Santiviago et al.[4]. Illumina sequencing primers were then added by PCR, and sequencing was performed on a Genome Analyzer GAII.

Mapping of transposons to the genome

The beginning of each read contained a barcode that identified the transposon library of origin; the barcodes are listed in Additional file 1: Table S1. The reads were sorted accordingly into seven libraries. The sequencing primers were positioned such that the first two bases beyond the primers used for PCR would be the last two bases of the transposon, and only reads that included this AG sequence were retained. Most transposons were represented by many shear events. Transposons represented by only one or two shear events were distributed throughout the genome, including in known essential regions, so these rare reads were considered artifactual samples of the whole genome and were eliminated from further consideration.
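The barcode sorting and the AG check can be sketched as a prefix match. The barcodes and primer sequence below are hypothetical placeholders (the real barcodes are in Additional file 1: Table S1 and the primer sequence is not given here), and the assumed read layout of barcode, then primer, then the transposon-derived AG is a simplification of the design shown in Additional file 3: Figure S1. Matching is exact, with no tolerance for sequencing errors.

```python
def demultiplex(read, barcodes, primer, tn_end="AG"):
    """Return (library, genomic_sequence), or None if no barcode+primer+AG prefix matches.

    Assumed read layout (hypothetical): barcode, then primer, then the last two
    transposon bases, then genomic sequence to be mapped.
    """
    for library, code in barcodes.items():
        prefix = code + primer + tn_end
        if read.startswith(prefix):
            return library, read[len(prefix):]
    return None


# Hypothetical placeholders, not the sequences used in the study.
BARCODES = {"STM1_lib1": "ACGT", "STY1_lib1": "TGCA"}
PRIMER = "GGCCAA"

print(demultiplex("ACGTGGCCAAAGTTTCCC", BARCODES, PRIMER))  # assigned to STM1_lib1
print(demultiplex("ACGTGGCCAACTTTTCCC", BARCODES, PRIMER))  # no AG after primer: discarded
```

The genomic remainder returned for each accepted read is what would then be mapped to the genome.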

Mapping of genes between genomes

Best hits among Typhimurium 14028, Typhimurium LT2, Typhi CT18, Typhi Ty2 and E. coli K-12 were identified by Blast searches of the annotated genes of each genome against those of the others. A best-hit pair was considered syntenic if at least one of its two adjacent genes was also a best hit and was adjacent in the other genomes. Because gene numbering differs among the genome annotations, all assigned gene numbers and gene symbols for these five genomes are reported in Additional file 5: Table S2 for the convenience of the reader.
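The best-hit and synteny criteria can be expressed as follows for one pair of genomes. This is a sketch under assumptions: Blast results are taken to be already parsed into dictionaries of best hits, "best hit" is interpreted here as a reciprocal best hit, and gene order is treated as a single linear list (ignoring chromosome circularity and additional replicons). Gene names in the test are hypothetical.

```python
def reciprocal_best_hits(best_ab, best_ba):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


def syntenic_pairs(rbh, order_a, order_b):
    """Keep a best-hit pair if a neighbor of a is paired with a neighbor of b."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    result = {}
    for i, a in enumerate(order_a):
        b = rbh.get(a)
        if b is None or b not in pos_b:
            continue
        for ni in (i - 1, i + 1):  # the two adjacent genes in genome A
            if 0 <= ni < len(order_a):
                nb = rbh.get(order_a[ni])
                if nb in pos_b and abs(pos_b[nb] - pos_b[b]) == 1:
                    result[a] = b
                    break
    return result
```

A gene whose best hit sits in a different genomic neighborhood (for example, a horizontally acquired copy) fails the adjacency test even if its sequence similarity is high.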

Data processing for genome comparisons

Because the distribution and number of transposons differ between STM1 and STY1, the data were processed to express, for each gene and intergenic region, the ratio of the observed number of transposon integrations to the number that would be expected if all integrations were randomly distributed. This observed/expected ratio was then log2-transformed. The published STY2 data[3] were recalculated in the same way to allow direct comparison.
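The observed/expected transformation can be written in a few lines. The expected count here assumes that integrations fall uniformly over the genome, so a region's expectation is the total number of integrations scaled by its share of the genome length. The `pseudocount`, added to avoid taking the logarithm of zero for genes with no insertions, is an assumption not described in the text.

```python
import math


def expected_insertions(total_insertions, region_length, genome_length):
    """Expected insertions in a region if integrations were uniformly random."""
    return total_insertions * region_length / genome_length


def log2_obs_exp(observed, expected, pseudocount=0.5):
    """log2 of observed/expected insertions; the pseudocount (an assumption) avoids log2(0)."""
    return math.log2((observed + pseudocount) / (expected + pseudocount))


print(round(log2_obs_exp(17, 15.9, pseudocount=0), 2))  # htrA in STY1: close to zero
print(round(log2_obs_exp(0, 15.9), 2))                  # no insertions: strongly negative
```

With the htrA figures from STY1 (17 observed, 15.9 expected), the log2 ratio is about 0.1, consistent with no selection, whereas a gene of similar size with no insertions scores around -5.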

To identify genes with large differences in fitness among STM1, STY1 and STY2, the log2 ratios (observed/expected number of insertion events) of all 3,907 genes shared among these genomes were ranked from the lowest to the highest density of transposons. The ranks in STM1 and in each STY dataset were then subtracted to identify the genes with the largest differences in rank order of transposon density.
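The rank comparison can be sketched as follows. Ranks are computed only over genes present in both datasets, ties are broken by gene name, and the sign convention (STY rank minus STM1 rank, so a large positive value marks a gene with relatively fewer transposons in STM1) is an illustrative choice. The same function would be applied separately to each STY dataset; the gene values in the example are hypothetical.

```python
def ranks(values):
    """Rank genes from lowest (rank 1) to highest log2 obs/exp; ties broken by gene name."""
    ordered = sorted(values, key=lambda g: (values[g], g))
    return {gene: r for r, gene in enumerate(ordered, start=1)}


def rank_differences(stm, sty):
    """Return (gene, rank_STY - rank_STM1) pairs sorted by absolute difference."""
    shared = sorted(stm.keys() & sty.keys())
    r_stm = ranks({g: stm[g] for g in shared})
    r_sty = ranks({g: sty[g] for g in shared})
    diffs = [(g, r_sty[g] - r_stm[g]) for g in shared]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical log2 obs/exp values.
stm1 = {"geneA": -5.0, "geneB": 0.1, "geneC": 0.3, "geneD": 0.4}
sty1 = {"geneA": 0.9, "geneB": 0.0, "geneC": 0.2, "geneD": 0.3}
print(rank_differences(stm1, sty1)[0])
```

Working with ranks rather than raw log2 ratios makes the comparison insensitive to overall differences in library saturation between strains, which were substantial here (one integration every 20 versus 90 bases).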

The percentile ranks of the number of transposons per base and of the number of reads per base (shear events) were calculated for each putative ortholog in each Salmonella experiment. We arbitrarily defined “selected” genes as those with a sum of the two percentile ranks below 30 (i.e., an average below the 15th percentile).
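The "selected" criterion can be expressed compactly. Percentile rank has several conventions; this sketch assumes it is the percentage of genes with a strictly smaller value (computed with `bisect_left` on the sorted values), and the ten-gene example data are hypothetical.

```python
from bisect import bisect_left


def percentile_ranks(values):
    """Percentage of genes with a strictly smaller value, for each gene."""
    ordered = sorted(values.values())
    n = len(ordered)
    return {gene: 100.0 * bisect_left(ordered, v) / n for gene, v in values.items()}


def selected_genes(tn_per_base, reads_per_base, cutoff=30.0):
    """Genes whose two percentile ranks sum to less than `cutoff` (mean below the 15th percentile)."""
    p_tn = percentile_ranks(tn_per_base)
    p_reads = percentile_ranks(reads_per_base)
    return {g for g in p_tn.keys() & p_reads.keys() if p_tn[g] + p_reads[g] < cutoff}


tn = {f"g{i}": float(i) for i in range(10)}     # hypothetical transposons per base
reads = {f"g{i}": float(i) for i in range(10)}  # hypothetical shear events per base
print(sorted(selected_genes(tn, reads)))
```

Because the two percentile ranks are summed, a gene can still qualify when it is very low on one measure and only moderately low on the other.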