Background

Genomes of many higher eukaryotes are known to teem with repetitive DNA elements. By contrast, bacteria are notorious for their high coding density [1], which leaves significantly less space for expansion of repeats. Repetitive elements identified in bacteria can generally be divided into coding and noncoding ones. The former are typically represented by insertion sequences and transposons, parasitic DNA elements that catalyze their own movement and replication (with the help of the host cell's functions) [2]. Noncoding repeats (apart from repeated genes coding for structural RNAs) comprise several distinct types, often connected to various cellular functions. For example, short, overrepresented DNA motifs mark DNA to be taken up by natural transformation in Haemophilus and related bacteria [3]. Similarly, Chi sequences, which serve as sites of recombination initiation, are overrepresented in host genomes [4]. Repeated elements are part of sophisticated CRISPR systems, which provide defense against invading mobile elements [5]. Finally, various types of MITEs (miniature inverted-repeat transposable elements), which are predicted to be derived from autonomous transposable elements, are implicated in transcription regulation and other processes [6, 7].

REP (repetitive extragenic palindrome) elements have now been known for over 30 years [8], originally from Escherichia coli and related enterobacteria [9]. They were later identified in other species, belonging predominantly to gammaproteobacteria: Pseudomonas putida [10], Pseudomonas fluorescens [11, 12], Stenotrophomonas maltophilia [13], Xanthomonas campestris and others [14], each species possessing different types of REP sequences. REPs are typically highly numerous and occur almost exclusively in intergenic regions. The definition of REP elements was recently refined [14] to reflect their common features at the sequence level: a conserved 5'-terminal tetranucleotide (GTA/GG) and a downstream complementary (palindromic) region with variable base composition. REP elements are mostly arranged into repeats of higher order. REPINs (REP doublets forming hairpins) are composed of two closely spaced REPs in inverted orientation [15] and were found to represent the predominant REP form in P. fluorescens [11, 15], P. putida [10] and S. maltophilia [13]. BIMEs (bacterial interspersed mosaic elements), abundant in E. coli, consist of tandemly repeated REPIN-like doublets [16]. Importantly, in E. coli, three significant proteins interact with REPs or BIMEs: integration host factor [17], DNA gyrase [18] and DNA polymerase I [19], indicating a role for these elements in major cellular processes. Furthermore, REPs were shown to modulate transcription and mRNA stability in both E. coli [20] and S. maltophilia [13]. REPs inhabit only the core parts of host genomes and are absent from laterally transferred regions [11–13].
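The two defining sequence features of a REP element described above (a conserved 5'-terminal tetranucleotide followed by a palindromic region) can be illustrated with a minimal Python sketch. It is not the procedure used in this study. It assumes that "GTA/GG" denotes the tetranucleotides GTAG or GTGG, that the palindrome is a stem whose 5' arm starts directly after the tetranucleotide and pairs with the 3' end of the candidate, and that a stem of at least six bases is required; the function names and the `min_stem` threshold are hypothetical.

```python
# Illustrative sketch only: names and thresholds are assumptions, not taken from the study.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stem_length(body: str) -> int:
    """Count how many bases at the 5' end of `body` pair perfectly with
    its 3' end, i.e. the length of the outermost hairpin stem.
    The two arms are not allowed to overlap."""
    body = body.upper()
    rc = reverse_complement(body)
    k = 0
    while k < len(body) // 2 and body[k] == rc[k]:
        k += 1
    return k


def looks_like_rep(candidate: str, min_stem: int = 6) -> bool:
    """Check a candidate for the assumed REP features: a GTAG/GTGG
    tetranucleotide at the 5' end and a downstream stem of at least
    `min_stem` perfectly complementary bases (assumed threshold)."""
    candidate = candidate.upper()
    if candidate[:4] not in ("GTAG", "GTGG"):
        return False
    return stem_length(candidate[4:]) >= min_stem


# Made-up example sequence: tetranucleotide + 9-base arm + 4-base loop + matching arm.
example = "GTAG" + "GCCGGATGC" + "TTTT" + "GCATCCGGC"
print(looks_like_rep(example))
```

Requiring perfect complementarity keeps the sketch short; natural REP palindromes may tolerate mismatches, which this check would reject.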

A few years ago, we described a protein family associated with REP sequences, RAYTs (REP-associated tyrosine transposases) [14]. Related to transposases of the IS200/IS605 insertion sequence family [21, 22], RAYTs carry the conserved residues needed to perform DNA cleavage: the catalytic tyrosine and two metal-coordinating histidines. Since REP elements were found flanking RAYT genes in almost all species where they have been previously recorded, REPs were the likely substrates to be cleaved by RAYTs. The predicted REP-specific nuclease activity of E. coli RAYT was recently confirmed experimentally [23], and the crystal structure of the REP/RAYT complex was solved [24]. The structure helped to elucidate the roles of the conserved tetranucleotide and the palindrome (the two defining features of REP elements) in REP recognition by RAYTs.

Owing to the rapid expansion of next-generation DNA sequencing methods, increasing numbers of new genomic sequences are reported each year. These provide a great opportunity to conduct comparative analyses. We explored the distribution of REP elements and their associated RAYTs in sequenced genomes of sixty-three fluorescent pseudomonads and ten stenotrophomonads, two groups of omnipresent environmental bacteria with biotechnological and biocontrol applications [12, 25]. Our results indicate rapid diversification and proliferation of REPs in both studied groups. Furthermore, RAYTs appear to play a principal role in REP dissemination, as RAYT presence correlates with REP abundance. Our results provide support for the hypothesis that the REP/RAYT system is an example of mobile element domestication.

Results and discussion

Phylogenetic relationships of studied bacteria

Our preliminary analysis of available genomes revealed that the greatest intraspecific diversity of REP elements and their associated RAYTs existed in bacteria of the Pseudomonas fluorescens complex and in Stenotrophomonas sp. (data not shown). Comprehensive mining of bacterial genomic databases recovered 63 genomes affiliated to Pseudomonas fluorescens (fluorescent pseudomonads) and 10 genomes affiliated to Stenotrophomonas maltophilia (stenotrophomonads). Among fluorescent pseudomonads, strains of P. agarici, P. brassicacearum, P. chlororaphis, P. extremaustralis, P. fragi, P. fuscovaginae, P. mandelii, P. protegens, P. psychrophila, P. synxantha and P. tolaasii, species previously shown to belong to the P. fluorescens complex [26], were included, as well as numerous Pseudomonas sp. isolates not assigned to any species. For stenotrophomonads, Pseudomonas geniculata, a synonym for S. maltophilia [27], was included, as well as Stenotrophomonas sp. SKA14. To resolve the evolutionary relationships between the strains, phylogenetic trees were constructed from three housekeeping genes (Figure 1, Figure 2). The phylogram of fluorescent pseudomonads revealed nine well-supported clades (A – I). The phylogram of stenotrophomonads identified three clades (A – C) and two solitary strains. The inter- and intra-clade phylogram resolution was perfect for stenotrophomonads but only partially satisfactory for fluorescent pseudomonads. This difference might be due to the effect of recombination, since P. fluorescens was shown to be naturally competent for transformation [28], whereas natural competence is unknown in S. maltophilia.

Figure 1

Neighbor-Joining phylogram of 63 fluorescent pseudomonads. The tree was constructed from concatenated complete nucleotide sequences of gyrB, rpoB and rpoD genes. Resulting clades are marked with vertical lines to the right of corresponding strains and labeled with letters A–I.

Figure 2

Neighbor-Joining phylogram of 10 stenotrophomonads. The tree was constructed from concatenated complete nucleotide sequences of gyrB, rpoB and rpoD genes. Resulting clades are marked with vertical lines to the right of corresponding strains and labeled with letters A–C.

Diversity of REP sequences and RAYTs

In the next step, the spectrum of REP elements was determined in genomes of the studied strains. For this purpose, we utilized the specific association between RAYT (REP-associated tyrosine transposase) genes and REP elements. This approach (see Methods) yielded twenty-two and thirteen unique classes of REP elements in fluorescent pseudomonads (PF1 – PF22) and stenotrophomonads (SM1 – SM13), respectively (Table 1, Table 2, Additional file 1). For some REP classes, sequence ambiguities were detected when two slightly different REP sequences were associated with the same rayt gene. REPs of stenotrophomonads always contain eight or nine perfectly complementary bases, located directly adjacent to the GTA/GG tetranucleotide. In contrast, in REPs of fluorescent pseudomonads, the palindromes are flanked by two or three additional nucleotides on both sides and are significantly shorter (Table 1, Table 2). The majority of detected REPs occurred as close inverted doublets (REPINs), as reported previously [13, 15]. The cognate RAYTs of both bacterial groups are monophyletic (Additional file 2), suggesting that although quite diverse, they have been present in their host genomes for substantial evolutionary time. Intriguingly, several different classes of REP sequences were found to flank orthologous RAYT genes (as judged by their shared chromosomal location, i.e. synteny) between related strains in both bacterial sets. These cases were gathered into so-called orthogroups. An orthogroup comprises the classes of REP elements associated with syntenic (orthologous) RAYTs. Three orthogroups were detected in stenotrophomonads and four in fluorescent pseudomonads (Table 1, Table 2), of which orthogroup IV is the most numerous and includes nine distinct REP classes (PF8 – PF16).
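The arrangement of REPs into close inverted doublets (REPINs) mentioned above can be made concrete with a small Python sketch. It is a hypothetical illustration, not the authors' method: it assumes that REP hits are already available as (start coordinate, strand) pairs, and the `max_gap` cutoff for "closely spaced" is an arbitrary assumption, since no distance threshold is given here.

```python
from typing import List, Tuple

# A REP hit is represented as (start coordinate, strand), strand being "+" or "-".
Hit = Tuple[int, str]


def pair_repins(hits: List[Hit], max_gap: int = 100) -> List[Tuple[Hit, Hit]]:
    """Greedily pair neighbouring REP hits that lie on opposite strands
    and start no more than `max_gap` bases apart (assumed cutoff).
    Each hit is used in at most one doublet."""
    ordered = sorted(hits)
    pairs = []
    i = 0
    while i < len(ordered) - 1:
        first, second = ordered[i], ordered[i + 1]
        inverted = first[1] != second[1]
        close = second[0] - first[0] <= max_gap
        if inverted and close:
            pairs.append((first, second))
            i += 2  # both hits are consumed by this doublet
        else:
            i += 1  # leave `first` as a solitary REP
    return pairs


# Made-up coordinates for illustration only.
hits = [(100, "+"), (160, "-"), (1000, "+"), (5000, "-"), (5050, "+")]
print(pair_repins(hits))
```

In this toy input the hit at position 1000 stays unpaired, corresponding to a solitary REP.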

Table 1 Summary information on identified RAYTs and their cognate REP elements in sequenced fluorescent pseudomonads
Table 2 Summary information on identified RAYTs and their cognate REP elements in sequenced stenotrophomonads

Variability of REP copy numbers

The copy numbers of particular REP element classes were determined and compared in genomes of related bacterial strains. Table 3 and Table 4 reveal a strikingly uneven distribution of REP sequences among different hosts. High REP abundance was found to be restricted to a single strain (PF1 and PF22 in P. fluorescens SBW25, SM13 in S. maltophilia PML 168), a single clade (PF3 in clade B, PF4 in clade H) or several clades (PF8 in clades G and I, PF21 in clades C and H). Various other distribution patterns can also be detected. Particular REP classes typically reach hundreds of copies per genome and are generally more numerous in fluorescent pseudomonads. In this group, REP numbers exceed a thousand copies per genome in four cases (PF9 in P. sp. GM48, P. sp. GM79 and P. fluorescens R124, and PF10 in P. fluorescens NZ011). Typically, several REP classes occur in a single host strain.

Table 3 The abundances of 22 REP classes in genomes of 63 sequenced fluorescent pseudomonads
Table 4 The abundances of 13 REP classes in genomes of 10 sequenced stenotrophomonads

RAYTs and REP abundance

Finally, we examined whether the presence of RAYTs influenced REP abundance. In most cases, RAYTs associated with abundant REP classes were indeed present in host bacterial strains (Table 3, Table 4, Additional file 3, Additional file 4). On average, two to three RAYTs were present per strain. A maximum of four RAYTs was detected in a single host genome, and several strains contained no RAYTs at all. Sometimes, the RAYT genes contained frameshift or nonsense mutations, indicative of recent pseudogenization. Interestingly, three strains (P. fluorescens R124, P. sp. UW4 and P. sp. GM78) contained two RAYTs associated with two different REPs belonging to the same orthogroup IV. In these cases, one RAYT gene is always located at a novel chromosomal site. This indicates different evolutionary origins of these RAYTs/REPs, for example RAYT duplication followed by mutation of the flanking REPs into another REP class of orthogroup IV, or horizontal acquisition and integration of a RAYT gene into a novel locus.

Instances in which particular REPs were overrepresented while their cognate RAYTs were absent appeared quite often. However, for the great majority of these cases, one of the following was also observed: i) related strains possessed RAYTs associated with the REP sequences in question, or ii) RAYTs in the given strain were associated with different REP classes belonging to the same orthogroup (Additional file 3, Additional file 4). As for i), this might indicate loss of RAYT genes from the host strain. As for ii), this was represented for example by fluorescent pseudomonads of clade D, which harboured REP classes PF9, PF12 and PF13 of orthogroup IV. While multiple copies of each of these REP classes were present, a RAYT associated with only one class was detected in each genome. From this, it can be inferred that the original REP sequences flanking the RAYT genes have mutated into other REP variants, which subsequently multiplied, leading to the presence of both classes from the same REP orthogroup in host genomes. We will call this process an orthoswitch. Although the assumed orthoswitches occurred fairly frequently (i.e. at least once in every orthogroup, Table 1 and Table 2), we can only speculate about their molecular mechanism.

In Additional file 5, a model to explain the discrepancies between REP abundance and RAYT presence/absence is proposed. The model assumes an active role of RAYTs in REP proliferation, based on their REP-dependent nuclease activity [23] and on the coupling of transcription and translation in the uncompartmentalized bacterial cell, which allows for preferential RAYT action on REPs that flank their encoding genes (due to their juxtaposition during RAYT expression). According to the model, only the presence of an active RAYT can support multiplication of its cognate REPs and their long-term persistence. When a RAYT is inactivated by pseudogenization or completely lost from the host genome, the cognate REPs can no longer multiply, leading to their gradual degradation by mutational processes (Additional file 5A). The number of REP elements remaining in the host chromosome would depend on when RAYT loss or inactivation occurred. Similarly, if an orthoswitch occurred, novel REP variants associated with RAYT genes would spread, while the original REP elements would remain in the host genome and decay mutationally (Additional file 5C). Furthermore, RAYT duplication with concomitant REP diversification (which could proceed by a mechanism similar to an orthoswitch, see above) would lead to the emergence of novel REP classes (Additional file 5B). Finally, horizontal transfer from closely or more distantly related strains might have significantly impacted the REP/RAYT diversity within the analyzed genomes. Horizontal transfer is likely to have accounted at least for the isolated occurrences of some RAYTs and their cognate REPs (for example PF3 in P. sp. GM55, see Table 3).

Conclusions

In the last decade, there has been a considerable resurgence of interest in REP elements. This was prompted by several factors, notably genomic analyses of newly sequenced bacteria which revealed novel REP elements [29], and the discovery of candidate REP mobilizers, RAYTs [14]. In this study, we aimed to assess the diversity of REP elements and RAYTs in large genomic sets of environmental bacteria – fluorescent pseudomonads and stenotrophomonads. Two previous works have already focused on the intraspecific variability of REPs [12, 13], but their authors used different, less stringent criteria for REP selection leading to a more relaxed definition of REP classes. We analyzed precisely those REP elements for which association with RAYTs was detected. In addition, our dataset was much broader than those of the two aforementioned studies [12, 13]. Our results confirm that REPs of fluorescent pseudomonads and stenotrophomonads are very diverse and dynamic. Also, REP host specificity ranges greatly: strain-specific, clade ("subspecies")-specific and species-specific REP sequences were observed (Table 3, Table 4).

Such large-scale analysis of diverse bacteria allowed us to reconstruct the evolutionary scenario for these repetitive elements and associated RAYTs. Since RAYTs of both bacterial groups are monophyletic (Additional file 2), unique original RAYTs were likely to be present in the genomes of common ancestors of fluorescent pseudomonads and stenotrophomonads, their genes flanked by ancestral REPs. Later during evolution, RAYT genes underwent duplications and diversified to the state which is seen in more derived clades (Table 3, Table 4), with concomitant diversification of their cognate REPs. The later the novel RAYT/REP variants emerged, the more phylogenetically restricted their incidence is. Novel REP variants might also partially replace the original ones following an orthoswitch (Additional file 5). Upon RAYT pseudogenization, which may be followed by RAYT loss from the host genome, proliferation of cognate REPs would cease. Although beneficial roles for REPs have been proposed (see Background), extremely high REP numbers might pose a burden to bacterial hosts, and RAYT inactivation could help keep REP numbers within a range tolerable to the host cell. A minority of derived strains would lose all RAYTs, leading to greatly reduced REP numbers in their genomes (Table 3).

Since the mechanisms behind REP dissemination and changeability are not known yet, our findings could provide foundations for understanding the evolution of REP element diversity and suggest possible directions for further laboratory research.

Methods

Genomic analyses

Bacterial genomic sequences were downloaded from the NCBI Genome database [30]. RAYTs were identified by performing a TBLASTN search [31], using previously described Pseudomonas fluorescens and Stenotrophomonas maltophilia RAYTs [14] as query sequences. RAYTs that were not annotated were conceptually translated from the corresponding DNA sequences using Transeq [32]. Identified RAYTs were checked for the previously characterized sequence motifs specific to RAYTs [14]. REP sequences flanking rayt genes were identified as inverted repeats located both upstream and downstream of the genes, with characteristic REP features: a conserved 5'-terminal tetranucleotide (GTA/GG) and a downstream palindrome. REP copy numbers were determined using pDRAW32 [33].
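As a rough illustration of what determining a REP copy number involves, the following Python sketch counts exact occurrences of a REP sequence on both strands of a genome. It is not a description of pDRAW32 or of the settings used in this study: exact matching, the counting of overlapping hits and all function names are assumptions made for the example.

```python
# Illustrative sketch only: exact matching on both strands, not the pDRAW32 workflow.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_copies(genome: str, rep: str) -> int:
    """Count exact, possibly overlapping occurrences of `rep` on either
    strand of `genome`. A set is used so that a sequence identical to
    its own reverse complement is not counted twice."""
    genome = genome.upper()
    targets = {rep.upper(), reverse_complement(rep)}
    total = 0
    for target in targets:
        start = genome.find(target)
        while start != -1:
            total += 1
            start = genome.find(target, start + 1)
    return total


# Made-up toy sequence: two forward copies and one reverse-strand copy of GTAGCC.
toy_genome = "TTGTAGCCAAGGCTACTTGTAGCC"
print(count_copies(toy_genome, "GTAGCC"))
```

Where a REP class shows sequence ambiguities, exact matching would undercount; allowing mismatches or a degenerate pattern would be a natural extension of this sketch.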

Phylogenetic analyses

Concatenated complete nucleotide sequences of genes coding for RNA polymerase beta subunit (rpoB), DNA gyrase beta subunit (gyrB) and RNA polymerase sigma subunit (rpoD), as well as RAYT protein sequences, were processed with the MEGA5 package [34]. Sequences were aligned, trimmed of unaligned nucleotides or amino acids, and Neighbor-Joining phylograms were constructed with 1 000 bootstrap replicates.

Authors' contributions

JN conceived the study, performed the analysis and drafted the manuscript. BS and IL supervised the work and critically read the manuscript. All authors read and approved the final manuscript.