Introduction

Rhizobia are alpha- and beta-proteobacteria that, through the establishment of symbiotic interactions with leguminous plants, are able to fix atmospheric nitrogen as ammonium. During this interaction, rhizobia induce the development of specialized root structures, the nodules, where they differentiate into bacteroids able to fix atmospheric N2, providing the plant with a source of nitrogen in exchange for photosynthetic carbon, mostly in the form of dicarboxylic acids [1,2,3,4,5]. Rhizobia are thus of particular interest because of their potential to improve crop yields and reduce the need for synthetic nitrogen fertilizers [6].

The successful establishment of a symbiotic interaction between rhizobia and legumes is highly dependent on the availability of nitrogen sources in the soil and on the specific strain of infecting rhizobia, since some strains are highly competitive for the colonization of plant roots but poor N2 fixers [7]. The symbiotic space (i.e. the number of nodules) is limited by the host to meet only its nitrogen needs and is therefore accessed only by the most competent rhizobia. The colonization of the rhizosphere, the region of soil under the influence of plant roots, is the earliest step in the symbiotic process, and is critical during the selection of the rhizobial strains that will end up within the nodules [8, 9]. The rhizosphere is a complex and dynamic environment where the interaction between plants, microorganisms, and soil particles takes place, thus playing a crucial role in plant growth and health [10]. It is not inhabited solely by rhizobia, but also by a diverse community of bacteria, fungi, and other microorganisms that compete for nutrients and for access to the host plant. The colonization of the rhizosphere is therefore a complex process that involves both plant-microbe interactions and competition with other rhizospheric organisms, and is influenced by various bacterial traits, including growth rate, motility, and the production of enzymes and other metabolites [9, 11]. Understanding the mechanisms and factors that influence the ability of rhizobia to colonize the rhizosphere can provide insight into the dynamics of the rhizobia-legume symbiosis, and thus valuable information for improving beneficial microbe-plant interactions.

Rhizobial genomes are very plastic, and are usually constituted by a chromosome, chromids –replicons of large size, difficult to distinguish from plasmids–, and several plasmids, some of them of large size –megaplasmids– and some smaller, generally considered as part of the accessory genome [3, 12, 13]. Most of the genes required for the establishment of the symbiotic interaction with legume plants are located in megaplasmids, or in certain cases, in chromosomal integrative and conjugative elements (ICEs) [3], and there is evidence suggesting that they actively mobilize within rhizobial populations [14]. Furthermore, rhizobial genomes contain a great number of insertion sequences (IS) [15,16,17].

Insertion sequences (ISs) are simple transposable genetic elements that can move to different locations within bacterial genomes [18]. ISs are known to play an important evolutionary role, contributing to genome plasticity by acting as recombination hot-spots and by disrupting coding and regulatory sequences [18,19,20,21]. In rhizobia, genome architecture has been shown to be under constant modification, with replicons frequently being cointegrated and excised [22]. Moreover, the role of tandemly repeated ISs as drivers of genomic recombination events was demonstrated in artificial evolution experiments [23]. ISs can also modify gene expression. A well-known example is the mutation of Sinorhizobium meliloti 1021 expR, a LuxR family transcriptional regulator that controls the expression of the symbiotically active exopolysaccharide (EPS) EPS II, by the insertion of an ISRm2011-1 insertion sequence [24]. Besides this example, rhizobial genomes possess large numbers of partial and most likely inactive transposases, suggesting that some IS insertions could have taken place in a distant ancestor. The role of recent transposition events in rhizobia has not been thoroughly explored thus far.

In this study, we used the ISCompare software [25] to identify ISs that have changed their location in closely related rhizobial strains, and analyzed the genes disrupted by these differentially located ISs (DLISs). Through this study, we have gained a better understanding of the impact of recent IS transposition events on the rhizobial lifestyle.

Materials and methods

Identification of differentially located ISs with ISCompare

ISCompare was used to compare the genomes of several rhizobial strains from the following genera and species: Bradyrhizobium diazoefficiens, Bradyrhizobium japonicum, Mesorhizobium ciceri, Mesorhizobium loti, Rhizobium etli, Rhizobium leguminosarum, and Sinorhizobium meliloti. Genome sequences were obtained from the National Center for Biotechnology Information (NCBI) assembly database [26]. The accession numbers for the sequences used in this study are compiled in Table S1.A. The sequences were downloaded in FASTA, CDS FASTA, and GBFF formats. The sequences used as IS database for ISCompare were downloaded from https://github.com/thanhleviet/Isfinder-sequences.

ISCompare [25] is a tool used to find differentially located ISs between two closely related strains. For each of the analyzed species we made pairwise comparisons using a selected strain as reference and all the complete closed genomes available as of 5/Jan/2023 as targets. Only the results classified by ISCompare as DLISs and with a complete IS match were analyzed in this study. The software was run with the following parameters: E-value = 1e-10, minLength = 50, ISdiff = 50, scaffoldDiff = 20, minAlnLength = 50, surroundingLen = 500, surroundingLen2 = 500, shift = 0. The total number of DLISs located in accessory replicons (plasmids and megaplasmids) or in the main replicon (the chromosome) was determined. DLIS counts were normalized by the species-average size of the main and accessory replicons (number of DLISs per 100,000 base pairs) to account for the differences in their sizes. For DLISs located within genomic symbiotic islands, the counts were normalized by the size of the respective island. The statistical analysis was done with Python scripts using SciPy [27]. For the comparison of proportions, a Chi2 test was used, summing the counts over all the strains within each species.
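The normalization described above amounts to dividing a DLIS count by a replicon size and scaling the result to 100,000 bp. The following is a minimal Python sketch of that arithmetic only; the function name and the example numbers are hypothetical and are not taken from the scripts or data of this study.

```python
def dlis_per_100kb(dlis_count: int, replicon_size_bp: float) -> float:
    """Return a DLIS count normalized per 100,000 bp of replicon."""
    if replicon_size_bp <= 0:
        raise ValueError("replicon size must be positive")
    return dlis_count / replicon_size_bp * 100_000


# Hypothetical example: 12 DLISs in a chromosome of 3,650,000 bp and
# 5 DLISs in accessory replicons with an average size of 1,500,000 bp.
chromosome_density = dlis_per_100kb(12, 3_650_000)
accessory_density = dlis_per_100kb(5, 1_500_000)
print(f"chromosome: {chromosome_density:.3f} DLISs / 100 kb")
print(f"accessory:  {accessory_density:.3f} DLISs / 100 kb")
```

The same function would apply to genomic symbiotic islands by passing the island size instead of the replicon size.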

Distribution of ISs in rhizobia

A custom Python script was used to count the genes and pseudogenes annotated as transposases or as part of an insertion sequence. To obtain this information, GenBank files were processed with Biopython [28], and a transposase was counted whenever the words ‘transposase’ or ‘insertion seq’ were found in the ‘product’ qualifier of a feature. Pseudogenes were identified based on the presence of a ‘pseudo’ feature qualifier. In addition, the replicon information (transposase counts per replicon) was saved for further analysis.
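The counting rule described above can be sketched as follows. This is an illustrative sketch, not the script used in this study: features are represented as plain tuples so the example runs without Biopython, the function names are hypothetical, and the restriction to CDS features and the case-insensitive matching are assumptions.

```python
from collections import Counter

# Search terms given in the text; matching is case-insensitive here (an assumption).
KEYWORDS = ("transposase", "insertion seq")


def is_related(qualifiers: dict) -> bool:
    """True if any 'product' value mentions a transposase or an insertion sequence."""
    products = qualifiers.get("product", [])
    return any(key in product.lower() for product in products for key in KEYWORDS)


def count_is_features(features):
    """Count IS-related genes and pseudogenes per replicon.

    `features` is an iterable of (replicon_id, feature_type, qualifiers) tuples,
    where `qualifiers` maps qualifier names to lists of strings.
    """
    counts = Counter()
    for replicon, feature_type, qualifiers in features:
        # Assumption: only CDS features are inspected.
        if feature_type != "CDS" or not is_related(qualifiers):
            continue
        kind = "pseudo" if "pseudo" in qualifiers else "intact"
        counts[(replicon, kind)] += 1
    return counts


# Hypothetical annotation records, for illustration only.
features = [
    ("chromosome", "CDS", {"product": ["IS66 family transposase"]}),
    ("chromosome", "CDS", {"product": ["DNA polymerase III subunit alpha"]}),
    ("plasmid_1", "CDS", {"product": ["IS5 family transposase"], "pseudo": [""]}),
    ("plasmid_1", "CDS", {"product": ["insertion sequence ATP-binding protein"]}),
    ("plasmid_1", "gene", {"locus_tag": ["X_0001"]}),
]
counts = count_is_features(features)
for (replicon, kind), n in sorted(counts.items()):
    print(replicon, kind, n)
```

With Biopython, the input tuples could be built from `record.id`, `feature.type` and `feature.qualifiers` while iterating over `SeqIO.parse(path, "genbank")` and each record's `features` list.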

DLISs functional analysis

To determine whether an IS was located within an intergenic region that could correspond to a promoter, the following analysis was done. First, the two closest CDSs to a DLIS insertion site were found using the “closest” command from the bedtools software [29]. When the insertion site was located within 150 nucleotides upstream of a CDS, the DLIS was considered to be interrupting a probable promoter region. This cutoff value was selected based on the average location of transcriptional start sites (TSS) in S. meliloti at -68 nucleotides [30] from the start codon (5′-UTRs longer than 100 nt were found in 1,041 genes), and the fact that promoter sequences are usually in the −10 and −35 positions from the TSS. To identify DLISs interrupting a probable operon, the two closest CDSs to the insertion site were determined with the bedtools “closest” command, and the orientation of these CDSs and the distance between them were analyzed. A possible operon was considered if the two CDSs were in the same orientation and separated by a maximum of 100 nucleotides [31, 32].
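The decision rules above can be illustrated with a short Python sketch. It assumes that the two flanking CDSs have already been found (for instance with bedtools), uses the 150 nt and 100 nt cutoffs stated in the text, and returns a single label per insertion. The names, the coordinate convention and the precedence given to the operon rule over the promoter rule are assumptions made for illustration, not a description of the scripts used in this study.

```python
from typing import NamedTuple, Optional

PROMOTER_WINDOW = 150  # nt upstream of a CDS start (cutoff given in the text)
OPERON_MAX_GAP = 100   # maximum nt between co-oriented CDSs (cutoff given in the text)


class CDS(NamedTuple):
    start: int   # 1-based, inclusive, leftmost coordinate
    end: int     # 1-based, inclusive, rightmost coordinate
    strand: str  # "+" or "-"


def upstream_distance(pos: int, cds: CDS) -> Optional[int]:
    """Distance from the insertion site to the CDS start codon if the site is upstream, else None."""
    if cds.strand == "+" and pos < cds.start:
        return cds.start - pos
    if cds.strand == "-" and pos > cds.end:
        return pos - cds.end
    return None


def classify_insertion(pos: int, left: CDS, right: CDS) -> str:
    """Classify an insertion site given its two closest CDSs (left.end < right.start)."""
    for cds in (left, right):
        if cds.start <= pos <= cds.end:
            return "intragenic"
    gap = right.start - left.end - 1
    # Assumption: the operon rule is checked before the promoter rule.
    if left.strand == right.strand and gap <= OPERON_MAX_GAP:
        return "putative operon"
    for cds in (left, right):
        distance = upstream_distance(pos, cds)
        if distance is not None and distance <= PROMOTER_WINDOW:
            return "putative promoter"
    return "intergenic"


# Hypothetical coordinates, for illustration only.
print(classify_insertion(1050, CDS(100, 1000, "+"), CDS(1080, 2000, "+")))
print(classify_insertion(1100, CDS(100, 1000, "-"), CDS(1200, 2000, "+")))
print(classify_insertion(1500, CDS(100, 1000, "+"), CDS(2000, 3000, "+")))
```

A single label per insertion is a simplification: an insertion between two divergent genes may lie upstream of both, and a fuller version would report each affected CDS separately.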

In the case of DLISs inserted within coding sequences, a functional classification was made using COG (Cluster of orthologous groups database) [33]. COGs were assigned to each protein using the eggNOG-mapper web server with the eggNOG 5 database [34].

Figures were plotted using python matplotlib [35] and seaborn [36]. Whenever it was required, final figures were edited with Inkscape v 1.3 (The Inkscape team, https://inkscape.org/).

Results

Analysis of ISs and DLISs distribution in rhizobia

The genomes of all Bradyrhizobium diazoefficiens, Bradyrhizobium japonicum, Mesorhizobium ciceri, Mesorhizobium loti, Rhizobium etli, Rhizobium leguminosarum, and Sinorhizobium meliloti strains with a complete assembly level were downloaded from the NCBI Assembly database [26] (accessed 5/Jan/2023), and pairwise comparisons to identify DLISs were made using ISCompare (Table S1.B). ISCompare identifies DLISs by searching for all the ISs in a query genome, extracting their flanking genomic DNA sequences, and looking for them in a target genome. A DLIS is reported when those sequences are found uninterrupted by an IS in the target genome. Next, the reference and target genomes are exchanged and the process is repeated [25]. For each species, a specific strain was used as reference for all the pairwise comparisons. In order to estimate the proportion of active ISs (the ratio between the number of DLISs and the total number of ISs in the pair of genomes), the total number of ISs in a given genome was estimated as the number of transposases plus the number of other genes annotated with functions related to insertion sequences (Fig. 1, Table S1.C). The estimated numbers of total ISs were in accordance with those previously reported [12, 15,16,17]. A recurrent observation is that chromosomal replicons tend to have a lower number of total ISs, whereas plasmids have a considerably higher number (Fig. 1.B, Table S1.E). This does not seem to be the case for DLISs (Fig. 2.B, Table S2), which were detected with comparable frequency in plasmids and chromosomal replicons (lower in plasmids, but not significantly so). Of the rhizobia studied here, Sinorhizobium and Bradyrhizobium showed the greatest number of DLISs (Fig. 2.A), which agrees with the fact that they also present the highest number of ISs in their genomes (Fig. 1.D-E, Table S1.C). Further, they also presented the highest proportion of ISs that have recently changed their location in the genome (active ISs), calculated as the number of DLISs / (total ISs in the reference strain + total ISs in the target strain) (Fig. 3.A). Finally, a weak but significant correlation between the number of DLISs and the total number of ISs was observed (Fig. 3.B, R2 = 0.11; p-value = 4.552 × 10⁻⁵).
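The proportion of active ISs defined above is a simple ratio computed for each pairwise comparison. A minimal Python sketch with hypothetical counts follows; the function name and the numbers are illustrative and do not come from the supplementary tables.

```python
def active_is_proportion(n_dlis: int, n_is_reference: int, n_is_target: int) -> float:
    """Number of DLISs divided by the total ISs of the reference and target genomes."""
    total_is = n_is_reference + n_is_target
    if total_is == 0:
        raise ValueError("no ISs annotated in either genome")
    return n_dlis / total_is


# Hypothetical pairwise comparison: 25 DLISs, with 120 ISs annotated in the
# reference genome and 130 ISs annotated in the target genome.
proportion = active_is_proportion(25, 120, 130)
print(f"proportion of active ISs: {proportion:.2f}")
```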

Fig. 1

Distribution of transposase and IS related genes. A. Normalized IS and pseudo IS counts. B. Normalized number of IS elements by replicon. C. Normalized numbers of ISs and pseudo ISs by replicon. D. Normalized number of IS elements by species and replicon. E. Normalized number of ISs and pseudo ISs by species. The number of transposases and IS related genes was estimated from the genbank files annotation using custom python scripts that looked for the terms ‘transposase’ and ‘insertion seq’ in the product descriptions of coding sequences and pseudo genes. Normalized counts were calculated as ISs / 100,000 bp. Significance was determined using the Mann-Whitney test and the normalized counts (*: 1.00e-02 < p <= 5.00e-02; **: 1.00e-03 < p <= 1.00e-02; ***: 1.00e-04 < p <= 1.00e-03; ****: p <= 1.00e-04)

Fig. 2

Distribution of DLISs. Chromosomes vs. plasmids. A. Total DLIS counts. B. DLIS counts and normalized DLIS counts discriminated by rhizobia and replicon type. C. Proportions of DLISs in chromosomal replicons, calculated from DLIS counts. D. Proportions of DLISs in chromosomal replicons, calculated from DLIS normalized counts. DLISs were identified with ISCompare, and those with a full match for an insertion sequence and confidently identified as DLIS were analyzed. Significance was determined either using Chi2 test and the total DLIS counts in each species, or the Mann-Whitney test and the normalized DLIS counts (DLIS counts / 100,000 bp). *: 1.00e-02 < p <= 5.00e-02; **: 1.00e-03 < p <= 1.00e-02; ***: 1.00e-04 < p <= 1.00e-03; ****: p <= 1.00e-04

Fig. 3

Proportion of recently active ISs. A. Proportion of active ISs. The proportion of recently active ISs was estimated as the relation of DLISs over the total ISs count. Significance was determined using the Mann-Whitney test and the proportions of active ISs (DLISs / ISs). *: 1.00e-02 < p <= 5.00e-02; **: 1.00e-03 < p <= 1.00e-02; ***: 1.00e-04 < p <= 1.00e-03; ****: p <= 1.00e-04. B. Pearson correlation of IS counts and DLIS counts

Next, the distribution of DLISs insertion sites within genes, intergenic regions and possible operons was determined. Most DLISs presented insertion sites within intergenic regions (ca. 60%, Fig. 4.A, Table S3.A), and the lowest proportion was observed for DLISs inserted within putative operons. Both Mesorhizobium species analyzed and R. etli presented a higher proportion of DLISs inserted within coding sequences (Fig. 4.B-C), although they also presented the lowest numbers of DLISs. Remarkably, half of the intergenic DLISs have the potential to interrupt promoter regions (Fig. 4.C).

Fig. 4

Distribution of DLISs in coding and intergenic regions. A. Proportion of DLISs inserted within genes, intergenic regions and putative operons. B. Proportion of DLISs inserted within genes, intergenic regions and putative operons discriminated by rhizobial species. C. Absolute counts of DLISs within genes, intergenic regions, and putative promoter regions. The count for the promoter regions is a subset of the counts for intergenic regions. When a DLIS was inserted within a short intergenic region between two divergent genes, it was counted twice, since it could affect the transcription of both genes. The location of the DLISs was determined using custom Python scripts and the bedtools software as described in Materials and methods. Significance was determined using the Mann-Whitney test and the normalized counts

For genera presenting genomic symbiotic islands (GSI), Bradyrhizobium and Mesorhizobium [16, 37, 38], we searched for DLISs that were inserted within the GSI in the reference genome but absent from the corresponding target genome. We found 6 DLISs in B. japonicum USDA 6, of which two were intragenic and four were intergenic, one of the latter inserted near a promoter. In addition, the normalized number of DLISs was higher in the GSI (0.452 ± 0.245) than in the chromosome (0.076 ± 0.036) (Table S2.C), although the difference was not significant (Wilcoxon test, p = 0.0625).

Functional distribution of genes interrupted by DLISs

To analyze the cellular functions affected by DLIS insertions, we recovered the amino acid sequences of the uninterrupted versions of the proteins from the genomes that did not have the DLIS. All these proteins were annotated with COGs using the eggNOG-mapper web server [34]. We analyzed all the genes with an inserted DLIS in the target genomes (the uninterrupted copy located in the reference genomes, Table S3.B). A total of 510 of the identified DLISs were inserted within genes, of which 280 (55%) presented a product annotation that was not related to transposases or hypothetical proteins. The overall results showed that the top COG categories, those with the greatest number of DLISs, were S (Function unknown), K (Transcription), L (Replication, recombination and repair), E (Amino Acid metabolism and transport), M (Cell wall/membrane/envelope biogenesis), T (Signal Transduction), P (Inorganic ion transport and metabolism) and Q (Secondary metabolites biosynthesis, transport and catabolism) (Fig. 5, Table 1, Table S3.C). L, S and K were also the principal categories in most rhizobia.
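Ranking the COG categories by the number of interrupted genes is a simple tally, sketched below in Python. The protein identifiers and category strings are hypothetical, and counting a multi-letter assignment (such as "KT") once per letter is an assumption, since the text does not state how such cases were handled.

```python
from collections import Counter


def tally_cog_categories(assignments: dict) -> Counter:
    """Count COG letters over proteins; multi-letter assignments count once per letter."""
    counts = Counter()
    for protein_id, category in assignments.items():
        letters = category.strip() or "-"  # an empty value is treated as "not assigned"
        for letter in letters:
            counts[letter] += 1
    return counts


# Hypothetical COG assignments for five interrupted proteins.
example = {
    "prot_1": "K",
    "prot_2": "S",
    "prot_3": "KT",
    "prot_4": "-",
    "prot_5": "L",
}
cog_counts = tally_cog_categories(example)
for letter, n in cog_counts.most_common():
    print(letter, n)
```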

Fig. 5

Distribution of COG functional categories. COG functional categories for the genes interrupted by a DLIS were assigned with eggNOG-mapper. A. Distribution of DLIS-interrupted genes by COG functional category. B. Heatmap of COG distribution per reference strain. COG categories are as follows. -: Not assigned; A: RNA processing and modification; B: Chromatin structure and dynamics; C: Energy production and conversion; D: Cell cycle control and mitosis; E: Amino Acid metabolism and transport; F: Nucleotide metabolism and transport; G: Carbohydrate metabolism and transport; H: Coenzyme metabolism; I: Lipid metabolism; J: Translation; K: Transcription; L: Replication and repair; M: Cell wall/membrane/envelope biogenesis; N: Cell motility; O: Post-translational modification, protein turnover, chaperone functions; P: Inorganic ion transport and metabolism; Q: Secondary metabolites biosynthesis, transport and catabolism; R: General Functional Prediction only; S: Function Unknown; T: Signal Transduction; U: Intracellular trafficking and secretion; V: Defense mechanisms; W: Extracellular structures; X: Mobilome: prophages, transposons; Y: Nuclear structure; Z: Cytoskeleton

Table 1 List of genes interrupted by a DLIS and annotated with a gene name in the reference strain

Proteins classified in categories S and (-) corresponded mostly to hypothetical and conserved hypothetical proteins, or to proteins with unknown function. However, 70% of the proteins in the S category were annotated in the genome with a more complete function description (putative cellulose biosynthesis, chitinase, dihydroxyacetone kinase and nitrile hydratase activities, among others; Table S3.B). The L category contained DLISs that in most cases were inserted within genes encoding transposases, integrases and reverse transcriptases, with only a few DLISs inserted in genes annotated as involved in DNA repair, nucleases, DNA polymerases, recombinases and methylases. For the K category, the proteins interrupted by DLISs corresponded to different families of transcriptional regulators, acetyltransferase domain-containing proteins, and proteins involved in plasmid partition. In particular, nifA and nodD1 presented DLIS insertions in S. meliloti AK76 and B. japonicum USDA 6, respectively. The nifA gene of S. meliloti is a regulatory nitrogen fixation gene required for the induction of several key nif and fix genes [39]. NifA null mutants induce white nodules in the roots of the host plant, have reduced swarming ability, and present lower levels of acyl-homoserine lactones and of extracellular proteins [40]. In B. japonicum USDA 6, the gene encoding NodD1, the positive regulatory protein of the nodYABC operon [41], was found to be interrupted by a DLIS. In that case, a shorter ORF is annotated in the genome, and its product might still be functional.

Several genes encoding proteins in the M (Cell wall/membrane/envelope biogenesis) COG category were interrupted by DLISs in S. meliloti strains, including those for three enzymes involved in the biosynthesis of lipopolysaccharide (LPS) (tetraacyldisaccharide 4’-kinase, LpxK; arabinose-5-phosphate isomerase, KdsD; and the polysaccharide biosynthesis protein LpsB2), a capsular polysaccharide biosynthesis export transmembrane protein (RkpI), and a choline-glycine betaine transporter. LpxK catalyses the sixth step in lipid A synthesis [42], while LpsB2 is required for O-antigen biosynthesis, and its mutants presented reduced motility, grew faster than the parental strain, and were more sensitive to maize benzoxazinones and polymyxin B [43, 44]. KdsD is required for the biosynthesis of 3-deoxy-d-manno-octulosonic acid (Kdo), a key sugar in the core region of LPS [45]. RkpI is a transmembrane protein involved in capsular polysaccharide (KPS) biosynthesis [46]. Both LPS and KPS have been reported to have a role in symbiosis, KPS being relevant in the early steps of the infection process, while LPS has been shown to be important in the later stages of the nodulation process [47, 48]. In addition, the gene encoding MacA, a membrane fusion protein of an ABC-type efflux transporter (a type 1 secretion system that transports diverse molecules, including antibiotics and peptides, across the inner and outer membranes) [49, 50], was interrupted in S. meliloti AK57.

In the E category (Amino Acid metabolism and transport), insertions were found within genes encoding different enzymes annotated as arginine/lysine/ornithine decarboxylase, aspartate/tyrosine/aromatic aminotransferase (AspB), choline dehydrogenase, ethanolamine ammonia lyase (EutB), and phosphoribosyl anthranilate isomerase (Usg). AspB catalyses the transfer of an alpha amino group from aromatic amino acids, but also from aspartate, to different substrates. It was reported to contribute, under high levels of exogenous tryptophan, to the biosynthesis of indole-acetic acid (IAA), and to nitrogen scavenging under nitrogen deprivation [51]. Also related to IAA metabolism, nthA, a gene encoding the nitrile hydratase subunit alpha, was found interrupted by a DLIS in S. meliloti RM41 [52].

Other insertions that could be relevant for rhizobial fitness were observed in genes for proteins belonging to the categories T (Signal Transduction) and P (Inorganic ion transport and metabolism). In particular, DLIS insertions were observed in adenylate/guanylate cyclases, in two-component systems (both in histidine kinases and response regulators), and in several Na+/H+, Mg2+/Co2+, Fe3+ and K+ transport proteins. Among these, the genes encoding a magnesium and cobalt transport protein (CorA), a putative molybdenum transport ATP-binding ABC transporter (ModC), and a Na+/H+ antiporter (NhaA) presented DLIS insertions in S. meliloti. CorA was shown to be important for growth on glucose at 21% O2 [53], while ModC is required for nitrogen fixation at limiting levels of molybdate and was reported to be the high-affinity molybdate transporter in B. diazoefficiens [54]. NhaA was shown to be induced under acid conditions, and it is likely that it functions by expelling protons toward the periplasm [55]. Also in the P category, pstS, a part of the pstSCAB phosphate-specific transport operon that functions as a high-affinity phosphate transporter [56], was found to be interrupted in B. diazoefficiens NK6.

In the G category (Carbohydrate metabolism and transport), DLISs were found mainly in ABC transporters, including a ribose/xylose/arabinose/galactoside transporter and a branched-chain amino acid transport system. Finally, in the C category (Energy production and conversion), genes encoding proteins annotated as succinate-semialdehyde dehydrogenase (GabD), nitrite reductase (NirB), cytochrome bd terminal oxidase, and pyruvate ferredoxin/flavodoxin oxidoreductase presented DLIS insertions. NirB was shown to be involved in the nitrate assimilatory pathway and to participate indirectly in NO synthesis, possibly contributing to the denitrification pathway [57], and thus to the generation of greenhouse gases. GabD homologs are required for growth on γ-aminobutyrate (GABA) as the sole nitrogen source [58].

Other genes that were found interrupted by DLISs and could affect rhizobial fitness were classified in less represented COG categories. Among them, DLISs were found interrupting acsA1 (acyl-coenzyme A synthetase, an enzyme required for growth using acetate as carbon source) [59] in S. meliloti strain RCAM1115; rhiA (the first gene of the rhizosphere-expressed genes operon, which was reported to influence nodulation) [60] in R. leguminosarum Vaf10; gshA (glutathione synthetase, involved in the biosynthesis of glutathione, which has been shown to be important for growth under defined conditions, and to play an important role in symbiosis) [61,62,63] in B. japonicum CC829; and finally, the gene for carbonic anhydrase (CA) (an enzyme involved in the interconversion of carbon dioxide and bicarbonate, which was hypothesized to help in the protonation of extracellular NH3, facilitating its diffusion and transport to the plant tissues) [64] in S. meliloti AK57.

Discussion

Rhizobial plasmids are known to have a larger number of ISs than chromosomes, and it has been shown that ISs can generate gene disruptions or act as mediators of homologous recombination leading to genomic rearrangements [22]. Surprisingly, we found a greater number of DLISs in chromosomal replicons (Fig. 2.C-D). DLISs are ISs that are inserted into highly conserved regions of a bacterial genome while absent from the homologous region of another genome under comparison, and could therefore be broadly considered as recently active ISs. That DLISs tend to be more abundant in chromosomal replicons than in plasmids might indicate a higher activity of ISs in the chromosomes. However, this result might also be a consequence of the inherent difficulty of confidently identifying DLISs in plasmids, since plasmids are less conserved, have a significantly higher number of pseudo ISs than chromosomal replicons (Fig. 1.C), and usually show a higher frequency of recombination events. In this work we only considered the DLISs identified by ISCompare for which complete BLAST hits to ISs in the library were found, disregarding partial ISs, which could be related to ancestral insertions, new ISs (absent from the IS database used) or recombination events. Thus, the actual number of DLISs in plasmids may have been underestimated. We also observed a weak correlation between the number of identified DLISs and the Average Nucleotide Identity (ANI) between the strains under comparison (Figure S1), with strains with ANI under 95% presenting in general fewer than 10 DLISs, and even fewer located in plasmids. This was expected, since the plasmids from those strains are less conserved, making the identification of DLISs using ISCompare even more difficult.

Of the analyzed rhizobial genera, Sinorhizobium and Bradyrhizobium presented the highest numbers of ISs and DLISs, and also the highest proportions of active ISs (number of DLISs in relation to the total number of ISs in the reference and target strains). Furthermore, we found a weak but significant correlation (r = 0.115, p-value = 4.552 × 10⁻⁵, Fig. 3.B) between the number of ISs and the number of DLISs. Thus, a rough estimate of the importance of DLISs in genome dynamics could be obtained from the number of ISs, which are easier to quantify using bioinformatic methods. A key finding was that most rhizobial DLISs were inserted within intergenic regions (Fig. 4), and almost half of them were far from the 5’-ends of coding sequences. Nevertheless, it must also be considered that intergenic regions could encode small ORFs or RNAs which have not yet been annotated. Furthermore, many DLISs were located in the neighborhood of promoter regions (Fig. 4.C, Figure S2). These insertions could have a role in environmental adaptation by affecting transcription. However, a more precise analysis taking into account the predicted promoters, transcriptional start sites, and the presence of transcriptional terminators within the DLISs will be required to accurately assess the real number of DLIS insertions that could have an effect on transcription. In addition, our results suggest that only a low proportion of the identified DLISs are expected to generate polar mutations by interrupting putative operons (Fig. 4.A-B), and thus knock out several ORFs with a single transposition event.

Regarding the type of interrupted genes, approximately half of them corresponded to insertion-sequence-related genes and hypothetical proteins, mostly in COG categories L (replication and repair), S (unknown function) and - (not assigned). Using the Fitness Browser database (https://fit.genomics.lbl.gov/, accessed 8/8/2024) we analyzed the phenotype of transposon insertion mutants in all S. meliloti 1021 genes that were interrupted by a DLIS insertion in other S. meliloti strains (Table S3.B, S. meliloti 1021), and found that across more than 80 growth conditions only 6 genes had a clearly affected phenotype (Table S3.D). These results, together with the small number of DLISs per genome (an average of 25 DLISs per comparison, Table S3.A, Fig. 2.A), may indicate that, in rhizobia, mutations produced by DLISs are of less importance as a mechanism for environmental adaptation than the role of ISs as recombination hotspots, which can ultimately lead to gene gain or loss. Nevertheless, certain strains presented DLISs inserted within genes reported to have a role in the symbiotic interaction with the host plant, or to be required for growth in certain conditions. In particular, we identified DLISs inserted in genes encoding enzymes involved in the biosynthesis of lipopolysaccharide (LPS) and capsular polysaccharide (KPS), enzymes of the nitrate assimilatory pathway, and several regulators, among others. The impact of these insertions on the symbiotic phenotype remains unknown. Moreover, a homolog of Smc03123 (putative transcriptional regulator), a gene from S. meliloti 2011 involved in the competence for rhizosphere colonization at 3 days post-inoculation [9], was interrupted by a DLIS in S. meliloti USDA1157.

Competition in the rhizosphere requires rhizobia to be able to use diverse compounds as carbon and nitrogen sources. Possessing a wide repertoire of metabolic pathways and transporters enables rhizobia to thrive and compete for the nutrients exuded into the rhizosphere by the plant host. Dynamic inactivation of genes by DLISs could play an important role, granting a faster growth capability. However, it could also be detrimental, affecting later steps in the interaction with the plant host, or fitness under changing environmental conditions.

Conclusion

In this work we analyzed recent transposition events in rhizobia by identifying DLISs by means of ISCompare. Our results revealed that recent IS transposition events could play a role in adaptation either by affecting transcription (e.g. through DLISs inserted in the proximity of promoter regions) or through the disruption of coding sequences. This hypothesis is supported by the majority of DLISs being inserted near possible promoters and transcriptional regulator regions, and by the fact that half of the intragenic DLISs were inserted within genes annotated with at least a rough functional prediction. It should also be noted that hypothetical proteins could have yet undiscovered functions that might prove to be relevant for the rhizobial lifestyle, either in the soil or in the symbiotic relation with legume plants. However, we found that in S. meliloti most of the genes with a DLIS appear to be non-essential. This makes evolutionary sense, since as a consequence of purifying selection DLISs will tend to be fixed more frequently in silent genomic regions. A secondary conclusion is that ISs might have a more important role in adaptation by acting as mediators of homologous recombination, allowing the dynamic interchange of information between chromosomal and plasmid replicons, and between different rhizobial species. To further characterize the role of DLISs in rhizobia, experimental evolution of single rhizobial isolates should be carried out. This type of experiment would probably give a clearer picture of the dynamics of IS transposition and recombination.