Background

Insertion sequences (IS) are ubiquitous, autonomous prokaryotic transposable elements (TEs), displaying variable genomic copy numbers [1,2,3]. They have a simple organization, typically consisting of a transposase (TnpA) coding gene, flanked by two terminal inverted repeats (IR) [4]. IS have been classified into families, not only by comparing their protein sequences, but also according to their TnpA chemistry as well as their structural features, including length of terminal IR and direct repeats (DR) generated upon transposition [5]. Details on IS families can be found on ISFinder, an online database including more than 4000 elements, grouped into 26 families (https://www-is.biotoul.fr/; last Database Update: 2019-11-13; last accessed May 2020 [6]). While most elements pertain to defined families, some are still “orphans” and are designated as Not Classified Yet or “ISNCY”.

Transposase chemistry dictates the breaking and re-joining of the DNA fragment during transposition. There are four distinct types of enzymes classified by their catalytic domains. Nevertheless, one important characteristic is shared by the four groups: the hydrolysis of high-energy cofactors is not required for any of the mobility steps [7]. During transposition, a nucleophilic attack allows strand break and the formation of an active protein-DNA complex, also known as a “synaptic complex” or “transpososome”. The next step is either the duplication of the element, thus mobilizing the TE via a replicative mechanism (i.e. copy-paste [8]), or a second strand break to excise it, therefore employing a conservative one (i.e. cut-paste [9]). The first enzyme type corresponds to TnpA with an RNase H-like catalytic domain. The active site of these enzymes includes a three-residue catalytic constellation: DDE (most frequently) or DDD. Often, the catalytic triad is surrounded by conserved amino acids (aa) or amino acids sharing chemical properties, the most common being K/R residues located six/seven aa downstream of the E (DDE(N)6/7K/R). The second type contains the HUH (H for histidine and U for bulky hydrophobic residue) TnpA, active only on single strand DNA (ssDNA), and operating via a peel and paste mechanism [10]. The third and fourth types are Serine (S) and Tyrosine (Y) TnpA, respectively. These transposases both share many catalytic features with other site-specific recombinases such as invertases and resolvases [7, 11]. Interestingly, most conjugative transposons are also mobilized by these S- or Y-TnpA [12, 13].

Transposase genes are highly regulated either through intrinsic regulation at the transcriptional or translational level, or by the host itself, as suggested in the case of certain bacterial stresses [14]. In some cases, the TnpA is encoded by more than one Coding DNA Sequence (CDS) in the event of a “programmed ribosomal frameshifting”, which may occur in various forms (−1, −2, +1 or +2) depending on the motifs present in the nucleotide sequence of the gene, such as slippery codons or hairpin loops in the DNA [15, 16]. These events are apparently used by both prokaryotic and eukaryotic organisms to regulate TnpA synthesis [14, 17], whose coding genes have been found to be the most prevalent genes in nature [18]. Their density on bacterial chromosomes is generally below 3%, with some exceptions, such as Bordetella pertussis, in which a particular abundance of IS481 elements is found on the chromosome [19]. As for plasmids, IS density can reach up to an extreme of 40% as in the case of the Shigella flexneri plasmid pW100 [20].

Among the known IS families, that of IS982 has remained largely ignored. Although many of its members are associated with important features such as antibiotic resistance, little is known about this family, its peculiarities and mode of transposition. Therefore, the aim of this mini-review was to bring new insights into the IS982 family, from its early days to the most recent discoveries. For this purpose, an extensive literature review of known IS982 elements is accompanied here by a bioinformatic approach.

IS982: discovery and known elements

Twenty-five years ago, Yu and collaborators (1995) discovered the first IS982 element in the lactose plasmid pSK11L of a Lactococcus lactis (L. lactis) strain, between the origin of replication and the opp (oligopeptide permease) gene cluster [21]. The second IS982 element, IS982B, was characterized shortly after, as an IS-like element identified in plasmid pCIT264 of L. lactis subsp. lactis biovar diacetylactis [22]. The subsequent discovery and characterization of related elements in Lactococcus strains, such as IS982C [23], led to the grouping of these IS into one family, designated as the IS982 family. This family was long thought to only consist of IS found in Lactococci (e.g. ISLgar2 and ISLgar3 [24, 25]). Currently, the IS982 family contains 70 distinct elements in the ISFinder database, from 35 bacterial and archaeal genera belonging to 12 different taxonomic groups (Table 1). Host genera were distributed among 23 Gram-negative and 10 Gram-positive bacteria, as well as 2 archaea. The most dominant bacterial groups are Firmicutes and Gamma-Proteobacteria. ISOt4 has a disrupted TnpA and was excluded from further analysis, bringing the number of analyzed elements to 69.

Table 1 List of IS982 elements from the ISFinder database. a Original host in which the element was found. b Bacterial/archaeal group to which the original host of the IS982 element belongs, according to LifeMap [26] and the NCBI taxonomy. c Length of the IS982 element including transposase CDS, left and right inverted repeats. d Length of the IS982 TnpA protein and coordinates of its CDS. e Length of identical nt in the left and right IR over the total length. f Direct repeats in the element’s original host/species. * = Disrupted transposase coding gene

The size of IS982 family members ranges between 845 and 1282 bp, with a mean length of 996 bp. By convention, the left and right IR are located upstream and downstream of the TnpA transcriptional unit, respectively. Their IR share between 11 and 32 identical bp. DR were either already reported in ISFinder or retrieved by manual search via nucleotide BLAST (BLAST.N) of the element against the original host/species genome sequence. When present, DR range from 2 to 10 bp (Table 1). This variation in DR length is not unusual and was previously reported in other families such as IS4 (4–13 bp, [27]). An extreme case of DR variability was also reported in the ISFinder database: the IS1182 family, with DR ranging between 2 and 60 bp. A multiple sequence alignment of the left and right IR (Additional Fig. S1) showed that the most conserved positions form 5′-AC(N)6T(N)5TT-3′ ends, as shown in Fig. 1. Out of 69 analyzed elements, 57 begin with “AC”, and 5 with “CC”.
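The IR comparison described above (identical nt between the left IR and the right IR, and the conserved 5′-AC(N)6T(N)5TT-3′ end) can be illustrated with a minimal Python sketch. It assumes that the right IR is read as the reverse complement of the element's 3′ end; the function names and the toy sequence are hypothetical and are not taken from ISFinder.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ir_identity(element: str, ir_len: int) -> tuple:
    """Count identical positions between the left IR and the right IR.

    The right IR is taken as the reverse complement of the last `ir_len`
    nucleotides (an assumption of this sketch). Returns (identical, ir_len).
    """
    seq = element.upper()
    left = seq[:ir_len]
    right = reverse_complement(seq[-ir_len:])
    identical = sum(a == b for a, b in zip(left, right))
    return identical, ir_len


# Conserved end reported in the text: 5'-AC(N)6T(N)5TT-3'
IR_END_PATTERN = re.compile(r"^AC[ACGT]{6}T[ACGT]{5}TT")


def matches_consensus_end(ir: str) -> bool:
    """True if the IR starts with the AC(N)6T(N)5TT pattern."""
    return IR_END_PATTERN.match(ir.upper()) is not None


# Hypothetical toy element: a 16-nt left IR, filler, and a right IR
# carrying a single mismatch relative to the left IR.
left_ir = "ACGGATCCTAGCTATT"
right_ir = "ACGGATCCTAGCTAAT"
toy_element = left_ir + "G" * 20 + reverse_complement(right_ir)

print(ir_identity(toy_element, 16))
print(matches_consensus_end(left_ir))
```

The ratio returned by `ir_identity` corresponds to the "identical nt over total length" notation used for IR in Table 1, under the stated assumption.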

Fig. 1

IS982 left and right inverted repeats sequence logo, generated by WebLogo 3 [28]. The x-axis represents the position of the corresponding nt. The y-axis represents the information content in bits, whose maximum depends on the sequence type (log2 4 = 2 bits for DNA/RNA). The height of symbols within the stack reflects the relative frequency of the corresponding nt at that position
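As a rough illustration of how the bit values in such a logo arise, the information content of one alignment column can be computed as log2 4 minus the Shannon entropy of the observed nucleotide frequencies. The sketch below is a simplification: it omits the small-sample correction that logo software may apply, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 4) -> float:
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed symbol frequencies; no small-sample correction is applied.
    """
    counts = Counter(column.upper())
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def symbol_heights(column: str, alphabet_size: int = 4) -> dict:
    """Height of each symbol in the stack: relative frequency x information."""
    counts = Counter(column.upper())
    total = sum(counts.values())
    info = column_information(column, alphabet_size)
    return {sym: (n / total) * info for sym, n in counts.items()}


# A fully conserved column reaches the 2-bit maximum for DNA,
# while a column with four equally frequent nt carries no information.
print(column_information("AAAA"))
print(column_information("ACGT"))
print(symbol_heights("AACC"))
```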

IS982 family members can be quite divergent and originate from many different species. However, some elements are isoforms, presenting high sequence identity (> 95% DNA; > 98% protein) such as IS19 and ISEfm1 from Enterococcus faecium, or IS982, IS982B and IS982C from Lactococcus lactis. While the location, copy number and potential association with host genes have been reported for several IS982 family members, some elements, such as ISDds4 and ISDds5, were simply annotated following whole genome sequencing without further information. An example of a well-described element is ISRa1, found on a Riemerella anatipestifer plasmid that contains the vapD gene, thought to encode a virulence factor, in two to twenty copies in some strains of this species [29]. Another example is ISLh1, present in multiple copies on the chromosome and on one plasmid of Lactobacillus helveticus strains [30]. For other characterized IS982 family elements, such as IS1187, ISLpl4, IS195 and ISEfm1, copy number ranged between one and ten copies, distributed between chromosomes and plasmids [31,32,33].

Transposase groups within IS982 family

As indicated above, members of the IS982 family are widely distributed within the prokaryotic world. The current host range spans from Gram-negative bacteria to archaea and includes intracellular bacteria. This wide host range, combined with pairwise TnpA sequence identities spanning from ca. 25 to 98%, highlights the extent of divergence within this family. Additional analysis of pairwise distance estimation with Poisson correction between IS982 family elements, using Mega-X [34] with a ClustalW algorithm, corroborates this divergence (Additional Table S1). This estimation highlights the possibility of aa substitution in a certain position of the protein. A smaller value entails a closer relationship and less divergent sequences. To further investigate the diversity of the IS982 family and the relationship between its members and their hosts, a genetic tree was constructed based on the comparison of their transposases. Following a MAFFT alignment of their protein sequences [35], a dendrogram of the relationship between IS982 TnpA was constructed by a neighbor-joining method (NJ) using a JTT model [36], with a bootstrap value of 500. The tree was rooted with the clade containing the two archaeal IS982 family elements, which were the most divergent in this family following pairwise alignment.
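For readers unfamiliar with the Poisson correction, the distance between two aligned protein sequences is d = −ln(1 − p), where p is the proportion of aligned sites that differ. The following Python sketch illustrates the formula only; skipping gapped positions pairwise is an assumption of this example and may not match the settings used to produce Additional Table S1.

```python
import math


def poisson_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Poisson-corrected distance between two aligned protein sequences.

    p is the fraction of differing sites among positions where neither
    sequence has a gap (pairwise deletion, assumed here); d = -ln(1 - p).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # every site differs: the distance is undefined/infinite
    return -math.log(1.0 - p)


# Toy aligned fragments (hypothetical, not real TnpA sequences):
# one difference over four sites gives p = 0.25.
print(poisson_distance("MKTA", "MKTV"))
```

As stated in the text, a smaller value indicates a closer relationship: identical sequences give d = 0, and d grows without bound as p approaches 1.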

As shown in Fig. 2, most, but not all, IS982 family elements originally found in Gram-negative or Gram-positive bacteria tend to cluster together. In some cases, clustering of elements originating from the same species (e.g. ISLhe1, ISLhe7, ISLhe5 and ISLh1 from Lactobacillus helveticus), the same genus (e.g. ISDds4, ISDds5, ISDge8 from Deinococcus spp.), or the same host group (e.g. 15 elements originating from Gamma-proteobacteria), is evident and might reflect the early presence of these elements in the evolution of their hosts. In other cases, there is a great distance between elements originating from the same species or genus. For instance, ISAba4, ISAba47, ISAba6, ISAba825 and ISAba9 from Acinetobacter baumannii (A. baumannii) strains are distant from each other.

Fig. 2

a Dendrogram representing the relationship among the 69 IS982 transposases. Protein sequences were first aligned with MAFFT, and the relationship tree was established via neighbor joining, with a bootstrap value of 100, followed by rooting [35]. Blue, pink and green colors refer to the IS original hosts as Gram-negative bacteria, Gram-positive bacteria and archaea, respectively; b WebLogo 3 [28] comparison of the left (top) and right (bottom) IR of each cluster; c complete IS element nt length range (bp); d transposase protein length range (aa); e direct repeats length range (bp) and f count of bacterial and archaeal groups within each cluster

Figure 2 also shows several deep branches within the transposase tree of IS982 family elements. Setting the threshold of TnpA protein sequence identity at 35% (dotted line; Fig. 2), seven clusters (I to VII) could be identified. This division is coherent not only with the aforementioned closeness of elements from the same host species, genus or group, but also with the conservation of IR sequences within each cluster.
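The idea of grouping transposases at a 35% identity threshold can be illustrated with a simple single-linkage grouping over pairwise identities. This is a swapped-in simplification and not the procedure used for Fig. 2, where the clusters were read from the neighbor-joining tree; the names and identity values below are invented for the example.

```python
def cluster_by_identity(names, identity, threshold=35.0):
    """Single-linkage grouping: link two sequences when their pairwise
    identity (in %) is at or above `threshold`, then return the groups.

    `identity` maps (name_a, name_b) pairs to percent identity.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical transposases and invented identities (%).
toy_names = ["tnpA_1", "tnpA_2", "tnpA_3", "tnpA_4"]
toy_identity = {
    ("tnpA_1", "tnpA_2"): 80.0,
    ("tnpA_2", "tnpA_3"): 40.0,
    ("tnpA_1", "tnpA_3"): 30.0,
    ("tnpA_1", "tnpA_4"): 20.0,
    ("tnpA_2", "tnpA_4"): 25.0,
    ("tnpA_3", "tnpA_4"): 28.0,
}
print(cluster_by_identity(toy_names, toy_identity))
```

Note that single linkage can join two sequences below the threshold through an intermediate (here tnpA_1 and tnpA_3 via tnpA_2), which is one reason a tree-based cut may give different clusters.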

Many families in the ISFinder database are divided into groups or sub-families, based on TnpA protein sequence identity, IR length and ends as well as DR length [4]. The IS982 family is no different, and this cluster division may allow the ISFinder team to define clear sub-families within IS982.

IS982 family transposase structure and chemistry

IS982 family elements present the typical simple IS organization of two terminal IRs flanking a transposase coding gene. The corresponding TnpA contains on average ca. 290 residues. Although most transposases originate from a single CDS, this is not always the case. For instance, ISTli1, found in the archaeon Thermococcus litoralis [37], displays two CDSs which, through a −1 frameshifting event, result in a single 280 aa enzyme, whose activity was not proven experimentally. Another example is that of the functional IS element ISLpl4. Interestingly, in this case, the two CDSs were shown to be fused by a +1 frameshifting event, the first described case of a functional +1 frameshifting among bacterial IS at the time [38]. The ISLpl4 pattern (copy number and single nucleotide polymorphisms) changed over generations of the original Lactobacillus plantarum strain, CECT 4645, indicating that this IS was active at one or multiple times in the strain’s evolution. In addition, the study by De Las Rivas et al. (2005) proved the functionality of the +1 frameshift by using lacZ as a reporter gene. The fusion of the frameshifting site with the reporter gene gave a low 1.5% β-galactosidase activity [35].

Although their mechanism is yet to be unraveled, previous studies pointed out that IS982 family transposases carry a DDE motif [35]. Yet, unlike other DDE TnpA described so far, they do not present a conserved K/R residue six/seven aa downstream of the catalytic glutamate, earning their motif the label of an atypical DDE motif [4]. However, a semi-conserved K/R residue was detected further downstream, after ca. 17 aa, just outside of the predicted DDE domain (Fig. 3). As for other DDE domains, the three catalytic acidic residues (two aspartates and a glutamate) are thought to initiate a nucleophilic attack on a phosphodiester bond of the donor DNA [7]. What follows is either replicative (copy/paste) or conservative (cut/paste) transposition to a target DNA site.
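The difference between a typical and an atypical DDE motif can be checked on a protein sequence once the position of the catalytic glutamate is known. The Python sketch below assumes that the DDE(N)6/7K/R notation means a K or R separated from the E by a spacer of six or seven residues; the function names and the toy peptides are hypothetical.

```python
def has_canonical_kr(protein: str, e_index: int) -> bool:
    """True if a K/R follows the catalytic E after a 6- or 7-residue
    spacer, i.e. the sequence matches E(N)6/7[KR] at `e_index` (0-based)."""
    if protein[e_index] != "E":
        raise ValueError("e_index must point at the catalytic glutamate")
    for spacer in (6, 7):
        pos = e_index + spacer + 1
        if pos < len(protein) and protein[pos] in "KR":
            return True
    return False


def first_kr_spacer(protein: str, e_index: int):
    """Number of residues between the E and the first downstream K/R,
    or None if no K/R follows."""
    for pos in range(e_index + 1, len(protein)):
        if protein[pos] in "KR":
            return pos - e_index - 1
    return None


# Toy peptides (not real TnpA sequences): a canonical spacer of 6 aa,
# and an atypical one with the first K/R after 17 aa.
canonical = "AAE" + "GGGGGG" + "K" + "AA"
atypical = "AAE" + "G" * 17 + "R"

print(has_canonical_kr(canonical, 2), first_kr_spacer(canonical, 2))
print(has_canonical_kr(atypical, 2), first_kr_spacer(atypical, 2))
```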

Fig. 3

Multiple sequence alignment of 15 randomly chosen IS982 elements from the seven clusters. Alignment was done using Mega-X via a ClustalW algorithm [34]. Only residues conserved in at least 50% of the sequences are highlighted. Predicted helix-turn-helix and DDE domains are indicated by orange and grey arrows, respectively. The catalytic triad DDE and the potential missing K/R residue are indicated by red and green marks, respectively

A TnpA multiple sequence alignment of 15 randomly selected IS982 elements from the seven clusters revealed several conserved aa, alongside the catalytic DDE triad, as shown in Fig. 3 (for an alignment of all elements, see Additional Fig. S2). A noteworthy observation is that many conserved aa are located in the regions between the aspartates and the glutamate of the DDE motif. IS982 family transposases possess a predicted helix-turn-helix motif at the N-terminus of the TnpA, acting as a DNA-interacting domain with the IR sequences, upstream of the predicted DDE domain, which spans more than 60% of the protein.

No crystal structure for any IS982 family TnpA is available yet. However, secondary structure predictions using the RaptorX-Property tool [39] revealed an abundance of helical structures (ca. 40%) (data not shown). Also, although the TnpA chemistry of a catalytic triad is conserved throughout all IS982 elements, some differences may arise at the level of DNA-TnpA interaction during DNA transfer (cut/paste, copy/paste or co-integrate) among the different elements from the seven clusters, due to the observed divergence in TnpA protein sequence.

Exploring the genome meta-database

ISFinder is a well-established database of “clean” IS elements, beyond which lies a large genomic world of bacterial and archaeal strains that we set out to further explore for the presence and distribution of IS982 family members. Therefore, archaeal and bacterial protein databases were mined for possible new IS982-like elements by a protein BLAST (BLAST.P) search, using the default parameters, with a cut-off of 50% query coverage (QC) and 30% identity (ID) and a maximum target sequence number of 1000. The search returned a high number of hits, a sample of which is described below. In the newly found elements, the DDE motif was highly conserved.
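The two cut-offs can be applied to a list of hits with a few lines of Python. The sketch below assumes that hits were exported as tab-separated lines holding the subject identifier, the percent identity and the query coverage, in that order, and that both thresholds are inclusive; this layout, the inclusive comparison and the hit names are assumptions of the example, not a description of how the search results were actually processed.

```python
def filter_hits(lines, min_qcov=50.0, min_ident=30.0):
    """Keep hits with query coverage >= min_qcov and identity >= min_ident.

    Each line is expected to be 'subject<TAB>percent_identity<TAB>query_coverage'
    (assumed column order). Blank lines and '#' comments are skipped.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subject, pident, qcov = line.split("\t")[:3]
        if float(qcov) >= min_qcov and float(pident) >= min_ident:
            kept.append((subject, float(pident), float(qcov)))
    return kept


# Hypothetical hits: only the first passes both cut-offs.
toy_hits = [
    "hit_A\t45.2\t88",
    "hit_B\t29.9\t95",   # identity below 30%
    "hit_C\t60.0\t49",   # query coverage below 50%
]
print(filter_hits(toy_hits))
```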

This approach remains limited in finding novel IS elements, since its reach is smaller than that of PSI-BLAST, which compiles BLAST.P hits into a position-specific scoring matrix used to find more distant results. However, PSI-BLAST results, although comparable to those of BLAST.P, would require more verification, which is the subject of a future study. The following section is exploratory, scratching the surface of bacterial groups holding IS982-related elements.

IS982-like elements in archaea

Knowledge about the archaeal super-kingdom, an important part of Earth’s microbiome, is ever changing since new meta-genomic, meta-transcriptomic and meta-proteomic datasets, metabolic predictions and phylogenetic assessments are being derived [40, 41]. This super-kingdom is, so far, divided into four main super-phyla: Euryarchaeota, TACK, DPANN and the recently described Asgards [42] (http://lifemap-ncbi.univ-lyon1.fr/; [26]; last update: December 2019).

In 2007, a review by Filée et al. highlighted what was known about the diversity of IS in archaea at the time, undoubtedly confirming that this group of organisms is an intriguing source of TEs. Several IS families were found within the available genomes, with ISPfu3 from Pyrococcus furiosus [43, 44] being the only archaeal IS982 family member. More recently, ISTli1 was found in Thermococcus litoralis [37]. Pyrococcus and Thermococcus both belong to the Thermococcales order. As shown in Fig. 2, a large distance separates the branch holding ISPfu3 and ISTli1 from the rest of the bacterial IS982 family elements. This suggests that these elements were likely not transferred from archaea to bacteria, or vice-versa.

Transposase protein sequences of ISPfu3 and ISTli1 were used to probe the archaeal database by BLAST.P searches. Six potential new IS982 elements were found in Methanotorris formicicus (5) and Methanocaldococcus bathoardescens (1), two Euryarchaeota species belonging to the Methanococcales order, indicating that IS982 family members are not restricted to the Thermococcales. Hits with the bacterial protein database did not pass the established thresholds of 50% QC and 30% ID, reinforcing the distance between IS982 family elements of archaeal and bacterial origins.

IS982-like elements in bacteria

For the bacterial IS982 family elements, five representative elements were chosen from distant branches of the IS982 transposase relationship dendrogram (Fig. 2). BLAST.P searches were conducted against the bacterial and archaeal non-redundant protein databases. Only hits with bacterial strains were found.

Elements originally from Gram-negative strains

Three IS982 elements out of the five selected ones are ISPasp3 from Parachlamydia, ISFtu4 from Francisella and IS1187 from Bacteroides [31]. The original Gram-negative hosts of these three IS are classified in different bacterial groups according to the NCBI taxonomy and LifeMap [26]: Parachlamydia is a part of the PVC group (Planctomycetes, Verrucomicrobia, and Chlamydiae), Francisella belongs to the Gamma-Proteobacteria group and Bacteroides is part of the FCB group (Fibrobacteres, Chlorobi and Bacteroidetes). BLAST.P analysis showed that the resulting hits were quite diverse for each IS, as shown in Table 2. For the three elements, a total of 265 new genera containing IS982-related elements were found. Among these genera, 37.7% belong to the FCB group, 16.2% to Gamma-Proteobacteria and 14.3% to Cyanobacteria.

Table 2 Distribution of IS982-related elements based on a BLAST.P search

Elements originally from Gram-positive strains

As for IS originating from Gram-positive hosts, ISCef2 from Corynebacterium efficiens, a high GC Gram-positive bacterium in the Actinobacteria group [45], and ISCth1 from Clostridium thermocellum, belonging to the Firmicutes group, were considered for BLAST.P searches (Table 2). In total, 180 new genera were identified, 36.3, 19.5 and 17.9% of which are classified as high GC Gram-positive bacteria, FCB group bacteria and Firmicutes, respectively. A noteworthy observation was that 42.16% of hits obtained with ISCef2 originated from Streptomyces strains.

IS982 elements, friends or foes?

IS982 family elements, like all IS, can affect the donor as well as the target site during transposition. The following section gives an overview of the possible effects of known IS982 family members on genes present at the target site. The discussed elements and their effects are summarized in Table 3.

Table 3 Consequences of the insertion of IS982 family elements into the promoter region (A) or the coding DNA sequence (B) of antibiotic resistance or virulence genes. The effects include the complete or partial activation/increase of expression (↑) or inactivation (↓) of the corresponding gene(s)

Antibiotic resistance

The emergence of antibiotic resistant bacteria is becoming a major threat to the environment as well as to human health. The IS982 family has its fair share of elements associated with antibiotic resistance genes. Such is the case of IS19 and ISEfm1 from Enterococcus faecium, inserted after transposition into the D-Alanine:D-Alanine ligase coding gene. Disruption of the corresponding ligase leads to the absence of the D-Alanine:D-Alanine precursors, and the presence of only those ending in D-Alanyl:D-Lactate. Consequently, these bacteria become resistant to vancomycin [33] and teicoplanin [46], two antibiotics that act specifically on the aforementioned D-Alanine:D-Alanine precursors and inhibit cell wall synthesis. Another example of gene disruption associated with antibiotic resistance is that of ISAba825 from A. baumannii, whose insertion inactivates carO, a gene encoding a transmembrane protein thought to participate in the influx of carbapenem. This led to the development of A. baumannii strains resistant to carbapenem. An interesting observation was also made regarding the difference in GC content between ISAba825 and its chromosomal insertion site, suggesting an exogenous origin of this element [47].
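A GC content comparison of the kind mentioned for ISAba825 amounts to computing the fraction of G and C nucleotides in the element and in its flanking region. The Python sketch below is a generic illustration; the sequences are invented placeholders and do not reproduce the values reported in [47].

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc_difference(element: str, flank: str) -> float:
    """GC content of the element minus that of its flanking region."""
    return gc_content(element) - gc_content(flank)


# Invented placeholder sequences (not ISAba825 or A. baumannii DNA).
toy_element = "ATATTAGCATTAATAT"
toy_flank = "GCGCATGCCGATGCGC"

print(gc_content(toy_element), gc_content(toy_flank))
print(gc_difference(toy_element, toy_flank))
```

A marked difference between the two values is the kind of signal interpreted in the text as suggesting an exogenous origin, although the size of difference considered meaningful is not specified here.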

Another way for ISAba825 to induce carbapenem resistance is by forming a hybrid promoter and activating, directly and indirectly, the expression of the OXA-type carbapenemase-coding genes (blaOXA), responsible for resistance to carbapenems such as imipenem (β-lactam antibiotics) [48,49,50]. Along the same lines, blaOXA gene expression was enhanced following the insertion of ISAba4, ISAba47 and ISAba9, other identified IS982 family members, in A. baumannii [51,52,53,54].

IS1187, found in a carbapenem resistant Bacteroides fragilis strain, also induces antibiotic resistance by gene activation. The cfiA gene encodes a carbapenemase, conferring resistance to practically all β-lactams. The insertion of IS1187 upstream of this normally promoter-less gene provided −7 and −33 motifs, thus forming a mobile Bacteroides promoter allowing the production of the carbapenemase [31].

Certain elements are not directly responsible for antibiotic resistance but are possibly implicated in the plasmid-mediated phenotype. This is the case of IS1592 located on pCCK13698, a 14.9 kb Pasteurella trehalosi plasmid which carries the floR gene, a florfenicol and chloramphenicol resistance gene. This plasmid is thought to be the result of several recombination events, in which IS1592 could be involved [55].

IS1599 and ISPsma1 are other elements merely associated with, but not directly causing, antibiotic resistance. The former is present on a Moraxella sp. plasmid with tetracycline resistance [56] and the latter is on a plasmid carrying five antibiotic resistance genes, pKLH80, from Psychrobacter maritimus [57]. All listed examples reflect the high implication of this family in antibiotic resistance development.

A counter effect of the insertion of an IS982 family element may be antibiotic susceptibility. This was reported in Enterococcus faecium, where the liaF gene, part of the LiaFSR operon encoding a stress response regulatory system, was disrupted by an IS982 family element. The disruption of liaF led to the reversion of daptomycin resistance to hypersusceptibility in the strain [58].

Reduction of bacterial virulence

Gene disruption might also lead to changes in the cell metabolism, possibly affecting its growth and ecology. For instance, ISSa4 is responsible for the loss of the hemolytic activity of Streptococcus agalactiae (S. agalactiae). Among the 15 ISSa4 copies present in a specific strain, one was inserted in cylB, a gene encoding the membrane-spanning domain of the putative hemolysin transporter. ISSa4 could be detected only in strains isolated after 1996, which might indicate a recent acquisition of this novel insertion element by S. agalactiae [59]. Also in Streptococci, ISScr1 interrupts a paaB gene in the downstream region of the antigen I/II gene in Streptococcus cricetus [60, 61]. Antigen I/II is a key element in mediating the attachment of the bacterial cell to host components and in determining cell surface properties [62]. Another example is ISBs1 that disrupts glgB (glycogen branching enzyme) coding gene in a strain of Bacillus stearothermophilus [63]. This enzyme plays a crucial role in carbon and energy storage, therefore affecting the cell metabolism and physiology [64].

In some cases, however, certain IS982 family members cause a reduction in activity in lieu of a total loss. An example is the ISLhe1 element from L. helveticus. Its location between the lacL and lacR genes in the lactose gene cluster may account for a reduced β-galactosidase activity in this strain [65]. Another example involves IS195 found in Porphyromonas gingivalis. Its insertion within a cysteine protease coding gene led to a disruption of its Arg-X cleavage site specificity, and thus to a massive decrease in the virulence and infectious capacities of this strain [32].

Conclusion

In this mini-review, the unexplored IS982 family was studied by investigating its known elements, their origins as well as their structural and chemical properties. In addition, the extent of this family beyond ISFinder was demonstrated.

The IS982 family harbors 70 members registered in the ISFinder database. They are ca. 1 kb in length, have IR starting with conserved 5′-AC(N)6T(N)5TT-3′ ends and carry a gene encoding an RNase H-like transposase with an atypical DDE motif. Exploring the genomic and proteomic databases via protein BLAST searches showed the immense number and variety of elements this family has yet to offer, in bacteria as well as in archaea, keeping in mind the impact IS982 family members can have on antibiotic resistance or virulence, as highlighted in this study. Nevertheless, the precise mode of transposition of IS982 family members remains unknown. Therefore, an in-depth analysis must take place to uncover the detailed transposition pathway of this old family that still has much hidden.