INTRODUCTION

The first part of the review covered strategies that allow host cells to avoid recognition by phages, innate immunity mechanisms that block early stages of infection, and systems that rely on DNA modification for self vs. non-self discrimination. Here, we continue the description of the variety of microbial antiviral systems.

CRISPR-Cas ADAPTIVE IMMUNITY SYSTEMS

In contrast to the DNA modification-based innate immunity systems, where target recognition relies on interaction of defense proteins with a predetermined sequence within a phage genome, prokaryotes also possess adaptive immunity CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated proteins) systems. Here, target nucleic acid recognition is driven by annealing of a complementary RNA molecule, and the system can generate and store guides for interference with novel sequences. The ability to preserve information about previous encounters with invaders is a feature shared between CRISPR-Cas systems and the immunity of higher eukaryotes, such as humans. Unlike adaptive immunity in mammals, CRISPR-Cas immunity is heritable. A CRISPR-Cas system consists of a CRISPR array (the number of arrays in prokaryotic genomes varies from one to several dozen) and associated cas genes [1-3]. A CRISPR array is a cluster of short repeated genomic DNA fragments separated by unique spacer sequences, at least some of which originate from foreign DNA. An AT-rich leader region is located in front of the CRISPR array [1]. The cas genes encode protein components of the CRISPR-Cas mechanism. CRISPR-Cas systems are responsible for two different processes: adaptation and interference. CRISPR adaptation is the process of integration of new invader-derived spacers into the CRISPR array. In each elementary act of CRISPR adaptation, the array is expanded by one new spacer and one repeat. The proteins responsible for CRISPR adaptation are generally homologous in all CRISPR-Cas systems. Transcription of the array leads to the formation of a pre-crRNA that is processed into short crRNAs in such a way that each crRNA contains a spacer flanked by partial repeats. A crRNA bound by Cas proteins forms an effector complex capable of specific recognition of a protospacer, a DNA or RNA sequence complementary to the spacer part of the crRNA. Protospacer recognition is followed by the degradation of the target nucleic acid molecule that contains it. The process of target recognition and destruction is called CRISPR interference (Fig. 1).
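The repeat-spacer bookkeeping described above can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the repeat and spacer sequences are invented, the function names `split_array` and `adapt` are hypothetical, spacers are assumed never to contain the repeat sequence, and a new repeat-spacer unit is simply placed at the leader-proximal end of the array.

```python
from typing import List


def split_array(array: str, repeat: str) -> List[str]:
    """Return the spacers of an array laid out as repeat-spacer-...-repeat."""
    # Splitting on the repeat leaves empty strings at both ends; drop them.
    return [piece for piece in array.split(repeat) if piece]


def adapt(array: str, repeat: str, new_spacer: str) -> str:
    """One adaptation event: the array grows by one repeat and one spacer."""
    # The array already starts with a repeat, so prepending repeat + spacer
    # keeps the repeat-spacer-repeat layout intact.
    return repeat + new_spacer + array


REPEAT = "GTTCC"  # invented toy repeat, far shorter than a real one
array = REPEAT + "AAATTT" + REPEAT + "CCCGGG" + REPEAT

print(split_array(array, REPEAT))     # spacers before adaptation
expanded = adapt(array, REPEAT, "TTGACA")
print(split_array(expanded, REPEAT))  # the new spacer is listed first
```

The order of the returned list mirrors the order of acquisition in this model: the most recently acquired spacer sits closest to the leader.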

Fig. 1.

Mechanism of CRISPR-Cas adaptive immunity in prokaryotes. a) Fragments originating from foreign DNA could be integrated into a CRISPR array in the process of CRISPR adaptation. CRISPR array is elongated by one new spacer and one repeat. The CRISPR array is then transcribed with formation of the pre-crRNA that is processed into short crRNAs so that each crRNA contains a spacer flanked by partial repeats. The cas genes code for protein components of the CRISPR interference and CRISPR adaptation complexes. CRISPR interference complex consists of crRNA bound by Cas proteins and interacts with a protospacer, i.e., a DNA sequence complementary to the sequence of the crRNA spacer. Recognition of the protospacer by CRISPR effector complex leads to degradation of the target DNA molecule. Protein composition of the interference module is variable and used as a major criterion in CRISPR-Cas system classification. CRISPR-Cas systems are subdivided into two classes, six types, and several subtypes. The two classes are distinguished based on the composition of the interference complexes: CRISPR-Cas systems of Class 1 are multi-subunit, while systems of Class 2 contain only one protein. b) In the process of primed adaptation, new spacers are preferentially selected from the DNA targeted by the CRISPR effector complexes during interference.

Diversity of CRISPR interference mechanisms. Classification of the CRISPR-Cas systems is based on the protein composition of effector complexes. According to the latest census, CRISPR-Cas systems can be subdivided into two classes, six types, and 33 subtypes [4]. Class 1 (Types I, III, and IV) systems utilize multisubunit effectors, while Class 2 (Types II, V, and VI) effectors are single-subunit proteins (Table). Different types of CRISPR-Cas systems are distinguished by the presence of specific “signature proteins”, most of which are responsible for target degradation (Cas3, Csf1, Cas10, Cas9, Cpf1, and C2c2 for Types I, IV, III, II, V, and VI, respectively [4]).
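The classification scheme can be expressed as a simple lookup. The Python sketch below only restates the signature proteins and class assignments given in the text; the dictionary and function names are hypothetical, matching by gene name is an assumption made for brevity, and subtypes are ignored.

```python
from typing import Iterable, Optional, Tuple

# Signature gene (lower case) -> CRISPR-Cas type, as listed in the text.
SIGNATURE_TO_TYPE = {
    "cas3": "I", "cas10": "III", "csf1": "IV",
    "cas9": "II", "cpf1": "V", "c2c2": "VI",
}
# Class 1: multisubunit effectors; Class 2: single-subunit effectors.
TYPE_TO_CLASS = {"I": 1, "III": 1, "IV": 1, "II": 2, "V": 2, "VI": 2}


def classify_locus(genes: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (class, type) for the first signature gene found, else None."""
    for gene in genes:
        system_type = SIGNATURE_TO_TYPE.get(gene.lower())
        if system_type is not None:
            return TYPE_TO_CLASS[system_type], system_type
    return None


print(classify_locus(["cas1", "cas2", "c2c2"]))
```

A locus that carries only the adaptation genes returns `None` here, because cas1 and cas2 are not type-specific.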

Table. Diversity of interference mechanisms and classification of CRISPR-Cas systems

CRISPR-Cas systems of Class 1 include three types: Type I, Type III, and Type IV. The effector complexes of Type I and Type III systems have been studied in detail. Architectural similarity of the effector complexes indicates a common origin of these systems [5]. The Type I effector is a large multisubunit protein complex called Cascade, which consists of the Repeat-Associated Mysterious Protein (RAMP) subunits in Cse1₁:Cse2₂:Cas7₆:Cas5₁:Cas6₁ stoichiometry. Cascade binds a processed 61-nucleotide-long crRNA with a 32-nt spacer [6-9]. Annealing of the Cascade-bound crRNA to the complementary protospacer leads to localized melting of the target dsDNA and formation of an R-loop, a heteroduplex between the crRNA spacer and the “target” strand of the DNA protospacer, while the “non-target” DNA strand of the protospacer is displaced and remains single-stranded. An obligatory condition for target recognition is the presence of a short, two- to three-nucleotide protospacer adjacent motif (PAM) located at the 3′-end of the target strand, i.e., downstream from the protospacer. The PAM requirement safeguards CRISPR-Cas systems from attacking the cell’s own genome, as a PAM is never adjacent to the spacers in the CRISPR loci. Following R-loop formation, the Cas3 nuclease/helicase is recruited to the complex [10]. Cas3 first introduces a single-strand break in the “non-target” protospacer strand 11-15 nucleotides downstream from the PAM, then begins to unwind and cleave the DNA in the 3′-5′ direction from the PAM.
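The role of the PAM in self vs. non-self discrimination can be sketched in a few lines of Python. This is a simplified illustration, not a model of Cascade binding: only one strand is searched, the sequence is assumed to be written as the non-target strand so that the PAM lies immediately 5′ of the spacer-matching sequence, exact matching is required, and the sequences and the function name `find_targets` are invented. The default PAMs are the AAG and ATG motifs reported for the E. coli I-E system.

```python
from typing import List, Sequence


def find_targets(dna: str, spacer: str,
                 pams: Sequence[str] = ("AAG", "ATG")) -> List[int]:
    """Return start positions of spacer matches that carry a PAM just upstream."""
    hits = []
    start = dna.find(spacer)
    while start != -1:
        # A match counts as a target only if a PAM ends right before it.
        if any(dna[:start].endswith(pam) for pam in pams):
            hits.append(start)
        start = dna.find(spacer, start + 1)
    return hits


SPACER = "ACGTTGCAATCGGA"                  # toy spacer, much shorter than 32 nt
phage_dna = "CCAAG" + SPACER + "TT"        # protospacer preceded by an AAG PAM
crispr_locus = "GTTCC" + SPACER + "GTTCC"  # same sequence flanked by toy repeats

print(find_targets(phage_dna, SPACER))
print(find_targets(crispr_locus, SPACER))
```

The spacer inside the toy CRISPR locus is an exact match to the guide, yet it is not reported as a target because the flanking repeat does not end with a PAM.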

In the Type III systems, the effector has a helicoid structure similar to that of Cascade [11, 12]. However, the Type III effector recognizes not dsDNA but RNA sequences complementary to the crRNA spacers [13, 14]. Recognition of the nascent RNA target stimulates the nonspecific DNA cleavage activity of the HD (Histidine-Aspartate) domain of the signature nuclease Cas10, which results in in situ degradation of DNA in the transcription bubble [15-20]. At the same time, the cyclic oligoadenylate (cOA) synthetase Palm domain of Cas10 is activated to produce the cOA secondary messenger. cOA can be sensed by auxiliary Cas ribonucleases (e.g., Csm6/Csx1) that degrade host and viral transcripts in a sequence-nonspecific manner [21-23]. The Type III systems do not rely on PAM for autoimmunity prevention, since the effector complex cannot target the CRISPR array or crRNA. However, the crRNA carries an 8-nt-long sequence tag that inhibits Cas10 activity to avoid self-DNA cleavage when the CRISPR array is transcribed from the opposite strand [17]. If the target sequence is completely complementary to both the spacer and the crRNA tag, interference does not occur.
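The tag-based self vs. non-self rule can be written as a short decision function. The Python sketch below is an all-or-none simplification under stated assumptions: the crRNA is taken to be a 5′ tag followed by the spacer, pairing is strictly Watson-Crick, and the tag, the spacer, and the function names are illustrative rather than taken from any particular system.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def type_iii_outcome(tag: str, spacer: str, transcript: str) -> str:
    """Classify a transcript by its pairing with the spacer and the crRNA tag."""
    site = reverse_complement(spacer)
    pos = transcript.find(site)
    if pos == -1:
        return "not recognized"
    # The crRNA 5' tag faces the sequence just 3' of the protospacer.
    flank_start = pos + len(site)
    flank = transcript[flank_start:flank_start + len(tag)]
    if flank == reverse_complement(tag):
        return "self (interference suppressed)"
    return "interference"


TAG = "ACGAGAAC"       # toy 8-nt tag
SPACER = "GGAUCCAUGC"  # toy spacer

# Antisense transcript of the CRISPR array: pairs with spacer and tag.
array_antisense = "AA" + reverse_complement(TAG + SPACER) + "AA"
# Phage transcript: pairs with the spacer only.
phage_rna = "AA" + reverse_complement(SPACER) + "CAUUAGCA" + "AA"

print(type_iii_outcome(TAG, SPACER, array_antisense))
print(type_iii_outcome(TAG, SPACER, phage_rna))
```

Only the transcript whose flank pairs with the tag is treated as self; the phage transcript, which pairs with the spacer alone, triggers interference.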

The exact immunity mechanism of the Type IV CRISPR-Cas systems is not yet fully understood. The signature protein of such systems is Csf1. The Type IV CRISPR–Cas systems have been found on plasmids or in prophage genomes, implying the possibility of recurrent transfer of the CRISPR–Cas machinery to and from mobile genetic elements (MGEs) [24, 25]. The Type IV CRISPR-Cas signature genes are not accompanied by the adaptation module genes [26]. This has led to the suggestion that the Type IV proteins may be involved in cellular functions unrelated to adaptive immunity [27, 28].

The CRISPR-Cas systems of Class 2 include three types: Type II, Type V, and Type VI. In the Type II systems, the monomeric Cas9 protein in complex with crRNA is responsible for the recognition of the target dsDNA and its degradation. Cas9 possesses two nuclease domains (RuvC and HNH) and is capable of generating double-strand breaks [29, 30]. It represents a minimal interference system and has therefore become the preferred tool in CRISPR–Cas-based genome engineering applications [31-33]. The Type V systems are characterized by the presence of the Cpf1 effector protein. Cpf1 contains a RuvC nuclease domain similar to that of Cas9, while the HNH domain is absent [34]. The Type V effector is able to destroy the target double-stranded DNA in a PAM-specific manner [34, 35], while binding of Cpf1 to its targets also unleashes its indiscriminate single-stranded DNase activity [36]. The majority of Type V systems contain the Cpf1 effector, while in the V-F subtype it is replaced by Cas14. To date, Cas14 is the smallest known CRISPR effector. Cas14a is an ssDNA-targeting CRISPR endonuclease that does not require a PAM for activation [37]. Some V-U subtype effectors demonstrate phylogenetic similarity to the TnpB transposases [37, 38]. The Type VI CRISPR-Cas system was bioinformatically predicted in 2015 [39]. Soon afterwards, the effector protein C2c2 from the VI-A subtype was described. The VI-A locus in Leptotrichia shahii contains only three genes (cas1, cas2, c2c2) and a CRISPR array. The C2c2 nuclease with bound crRNA forms an effector complex that is able to cleave single-stranded RNA molecules. In contrast to all other known CRISPR nucleases, C2c2 mediates RNA cleavage by the HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domain. Mutation in the catalytic centre of the HEPN domain inactivates the nuclease activity of the effector complex, but the RNA-binding activity of the resulting protein is retained [40]. Because of its ability to bind RNA molecules in a predetermined manner, C2c2 nuclease could be used as an effective tool for RNA editing and regulation of gene expression.

Origin of the CRISPR effectors diversity and their phylogenetic relations represent an interesting question. The effector complexes of Type I and Type III systems are quite similar in structure. It is assumed that the effector complex of the Type III system is more ancient. Here, cas genes are not always associated with the CRISPR arrays and cas1-cas2 adaptation module [41]. The standalone Cas1 homologs were detected in the mobile genetic elements named casposons. The Cas9 and the Cpf1 proteins, typical for the Type II and Type V, respectively, are similar to the TnpB transposon-encoded protein and contain an RuvC endonuclease domain [42]. The protein Cas13 (Type VI system) has RNAse HEPN domains. Thus, the CRISPR-Cas systems could have evolved by adopting interference and adaptation module genes from casposons, while effector nucleases may have originated from the cellular genomes or mobile genetic elements.

CRISPR adaptation. The most conserved components among all CRISPR-Cas systems are Cas1 and Cas2, which are required at the stage of spacer acquisition [43]. As a rule, the cas1 and cas2 genes are located close to each other, and the encoded proteins form a stable complex [44, 45]. Deletion of cas1 and cas2 does not affect CRISPR interference and crRNA maturation in Type I [46-49], Type II [50, 51] and Type III [52] systems. Cas1 is an endonuclease [53, 54], which also has ability to resolve Holliday junctions. In vitro, Cas1 can promote DNA integration and recombination events [55]. Cas2 displays nuclease activity towards both, RNA and DNA, in vitro [56, 57]. However, CRISPR adaptation in vivo requires nuclease activity associated only with Cas1 [44]. The ability to assemble stable Cas1–Cas2 complex is also essential for in vivo adaptation. Mutations that disrupt complex formation in vitro interfere with the in vivo spacer acquisition [44]. During the process of new spacer incorporation, the Cas1-Cas2 complex introduces a single-strand break exactly at the leader–repeat junction in the CRISPR array by catalyzing nucleophilic attack of the 3′-OH end of the incoming spacer on the 5′-end of the first repeat. Similarly, the other strand is nicked at the first repeat–spacer junction, and the 5′-end of the repeat strand is joined to the 3′-end of the new spacer. As a result, the incorporated spacer is flanked by the single-stranded repeat sequences that are filled later due to the activity of the host repair machinery [58]. Similar intermediates are known for the transposase-mediated mobile element integration suggesting that the spacer acquisition and transposon integration reactions are mechanistically similar [59-62].

The ability of the Cas1–Cas2 adaptation complex to uptake new spacers independent from the activity of CRISPR effector complexes is known as the process of naïve adaptation. During naïve adaptation, new spacers may be acquired from the extrachromosomal DNA as well as from the host genome, and only 50% of the new spacers contain consensus PAM. Naïve adaptation is essential for subsequent targeting of unknown foreign DNA and seems to be a universal feature of all CRISPR-Cas systems. The process is known to be at least partially dependent on the activity of the host RecBCD complexes [63]. The RecBCD performs processing of the stalled replication forks and it is believed that the resulting DNA fragments may be used by the Cas1–Cas2 complex for insertion into the CRISPR array. The absence of RecBCD reduces naïve adaptation efficiency but does not stop it. Consequently, the Cas1–Cas2 complex can utilize other sources of spacers. The question of participation of other host proteins in the CRISPR adaptation and regulation of this process has only recently attracted attention of the researchers [64-67]. For example, it was shown that the DNA polymerase I is necessary for both naïve and primed adaptation (presumably to fill single-stranded repeats that arise during spacers embedding) [68]. Dorman and Bhriain hypothesized that the negative supercoiling could alter various stages of the CRISPR proteins interaction with DNA, including adaptation, expression of cas genes and CRISPR loci, and, in fact, interference [65].

The presence of PAM allows distinguishing the host genome containing a spacer in the CRISPR array and protospacer within the target molecule. However, mutations in the PAM (or seed) sequence may protect viruses from recognition and degradation by the effector complexes [30, 69-72]. Therefore, the CRISPR-Cas system should update its “memory” to avoid infection by the “escaper” phages. To achieve this goal, some types of the CRISPR-Cas systems rely on the primed adaptation – a highly efficient process of new spacers acquisition from the already “known”, previously encountered phages whose fragments have been stored in the CRISPR array as immunological memory. Primed adaptation was demonstrated in the I-E [48], I-F [49, 73], I-B [74, 75], I-C [76], I-U [77], and II-A [78] CRISPR-Cas systems. Primed adaptation leads to the highly efficient and targeted accumulation of new spacers located in cis-position to the “priming” protospacer recognized by the effector [79]. The observed efficiency of primed adaptation (measured as the number of extended CRISPR arrays in the population) is very low if the target protospacer fully matches the crRNA and contains a consensus interference-proficient PAM (AAG or ATG in the case of E. coli I-E system) [48, 80]. Efficiency of the primed adaptation is enhanced by the presence of PAM or protospacer mutations that decrease interference efficiency [30, 70, 73]. Yet, primed adaptation requires activity of Cas3 protein, suggesting functional link between the CRISPR interference and primed adaptation [71, 81]. Recent in vitro study suggests that Cascade, Cas1-Cas2 and Cas3 form a single priming complex with activity leading to the effective selection of new spacers [82]. Two alternative models have been suggested to explain such a link. 
One model postulates that the effectors bound to protospacers with certain PAMs assume a specific conformation that recruits adaptation machinery (Cas1 and Cas2) as well as Cas3 protein, followed by the directional scanning of the target and selection of the new spacers [83]. In contrast, the complexes formed on targets with the interference-proficient PAMs do not support Cas1-Cas2 recruitment, leading to interference only [84]. The second model postulates that the apparent difference in the efficiency of primed adaptation with different targets is a consequence of the dynamics of degradation of the less-than-optimal targets [81, 85]. Since most MGEs are able to replicate and have copy-number maintenance mechanisms of their own, competition between the attenuated CRISPR interference and copy number maintenance mechanisms could create a situation when degradation fragments of the MGE genomes are present in the cell for an extended time, allowing the presumably slower adaptation reaction to occur [86].

ARGONAUTE-MEDIATED INTERFERENCE

Argonaute proteins play a key role in the regulation of gene expression and antiviral defense through RNA interference in eukaryotes. Members of this protein family are also widely encountered in Bacteria and Archaea [87, 88]. Functions of prokaryotic Argonautes (pAgos) are not yet fully understood, but these proteins are involved in the silencing of exogenous genetic material [89, 90]. pAgos use guide molecules to recognize nucleic acid targets, but in contrast to the CRISPR-Cas effectors and eukaryotic Agos, they often exploit short single-stranded DNA guides [91], although RNA guides have also been described [92]. The 5′-end of the guide is loaded into the MID domain of pAgo, while the 3′-end interacts with the PAZ domain [93]. In vitro experiments with pAgos from different organisms demonstrated that recognition of the complementary target leads to its nucleolytic cleavage by the catalytic PIWI domain. pAgos mostly target DNA molecules [94, 95], while some may also target RNA in vitro [96-98]. However, it is not clear whether RNA targeting is important for the in vivo activities of pAgos. Nevertheless, all possible combinations of pAgo-mediated DNA/RNA guide-target interactions potentially exist [89, 93, 99]. In vivo, the presence of pAgo affects plasmid maintenance and inhibits transformation [92, 100]. While it is generally accepted that pAgos are also involved in antiviral defense, the only experimental evidence was recently obtained with the pAgo from Clostridium butyricum, whose heterologous expression in E. coli decreased the titers of the chronic phage M13 and the lytic phage P1vir. However, the underlying mechanisms were not defined [90]. Based on domain architecture, pAgos are divided into classes, and some proteins, surprisingly, contain a catalytically inactive PIWI domain [89, 101]. The functions of such variants, if any, remain to be determined.

One of the major questions associated with pAgo interference is the mechanism of generation and the source of guide molecules, as well as how self-targeting is avoided. Sequencing of the DNA guides bound to pAgo in vivo has shown that they are preferentially derived from actively replicating or multicopy elements, including plasmids and transposons [90, 100]. A guide-independent nuclease activity, termed DNA chopping, was shown for different pAgo proteins [102, 103]. Sequence-independent plasmid chopping may generate a pool of DNA fragments of different sizes, some of which may be further loaded as guides to activate more efficient sequence-dependent degradation of the complementary targets [103]. To generate guides, pAgos could target free DNA ends or replication intermediates, which are more often present in exogenous DNA, and, akin to CRISPR-Cas, cooperate with RecBCD [90]. The pAgo from Thermus thermophilus was shown to sequester guides from the replication termination regions, similar to the pAgo from C. butyricum, and is thought to be involved in host replication control together with DNA gyrase by resolving catenated chromosomes [104]. It has been suggested that the DNA compaction characteristic of archaeal genomes can contribute to self/non-self discrimination by pAgos [102]. The RNA guides found associated with the pAgo from Rhodobacter sphaeroides are presumed to be promiscuously incorporated from degraded transcripts. Yet, the protein retains its specificity for foreign DNA [92]. A current model of the pAgo mechanism of action is presented in Fig. 2.

Fig. 2.

Model of pAgo mechanism of action and schematic structure of the protein loaded with a guide.

The described mode of interference may not be very efficient against fast-acting lytic phages, and the pAgo defense may be specialized in controlling less harmful mobile elements or may act together with other defense systems to enhance protection against viral infection. The fact that pAgo genes are often found coupled with nuclease or cas genes within the defense islands supports the latter hypothesis [88, 101, 105].

INDUCED CELL DORMANCY OR SUICIDE – ABORTIVE INFECTION (Abi) AND TOXIN–ANTITOXIN (TA) SYSTEMS

In this section we will consider abortive infection (Abi) in a broad sense – as cellular responses to infection that lead to cessation of the host metabolism (bacteriostatic effect) or cell death (bactericidal effect), prior to the completion of the viral life cycle, thus preventing production of active phage particles or decreasing phage burst size [106, 107]. The Abi systems are mechanistically very diverse. Generally, they are composed of two modules (Fig. 3): one module senses phage infection and transfers the signal, and another – effector module – shuts down host metabolism and/or causes cell suicide after receiving the signal [107, 108]. It is generally accepted that the induced dormancy provides more time for other defense mechanisms to deal with infection. It is also believed that some Abi systems may be the “last resort” of defense, i.e., they activate suicidal response at the later stages of viral infection when other immune mechanisms fail. The strategy of self-elimination by the infected cell stops spread of the infection at the community level and thus benefits a clonal population [109, 110]. Some systems whose action phenotypically resembles the Abi response may directly target phages and not per se cause active cell death. However, their action may be accompanied by the cell lysis caused by the vestigial viral toxic components.

Fig. 3.

General principle of the abortive infection response. Examples of effectors with various mechanisms are shown.

Abi systems. Diversity of the plasmid-encoded systems with Abi mechanism had been historically investigated in the Gram-positive lactococci [106, 111]. Amongst 23 described systems, designated from AbiA to AbiZ, the mode of action was determined only for a few. For example, AbiZ protein cooperates with the phage ϕ31 holin and lysin causing premature cell lysis [112]; AbiK exhibits untemplated DNA polymerization activity [113]; AbiA, AbiK, and AbiF are thought to inhibit replication [106, 114]; AbiB and AbiQ activities are associated with mRNA decay [115, 116]; AbiD1 might interfere with the viral DNA packaging through inhibition of phage resolving nuclease [117]; and AbiT and AbiV are thought to target expression of the late phage proteins [118, 119]. The way phage infection is sensed is not clear for most of these systems. The Abi system with a sensing module that relies on protein phosphorylation was found in Staphylococci [120]. Phosphorylation, an efficient way to amplify a signal, is often exploited by eukaryotic antiviral systems. Staphylococcal serine/threonine kinase Stk2 is activated by phage ϕNM1 protein PacK and phosphorylates multiple target host proteins thus inhibiting core metabolism [120].

A plethora of Abi mechanisms has also been described in the Gram-negative E. coli [121]. The Lit and PrrC proteins encoded by cryptic prophages are specific against T4 phage infection. The Lit protease is activated upon interaction with the conserved Gol peptide of the T4 capsid protein and arrests translation through specific proteolytic cleavage of the translation elongation factor EF-Tu [122]. The PrrC RNase also inhibits translation by cleaving tRNA(Lys). PrrC interacts with the EcoprrI restriction complex of a Type I Restriction–Modification (R-M) system and is activated only upon inhibition of the restriction complex by the T4-encoded peptide Stp [123]. Another interesting example is the F-plasmid-encoded protein PifA, which provides protection against phage T7 [124]. This membrane-associated protein is activated by T7 gp10 or gp1.2 and causes leakage of ATP and other small molecules from the infected cell by disrupting membrane integrity [124, 125]. The λ prophage-encoded system RexAB also acts by increasing membrane permeability [126, 127]. RexA is thought to recognize the DNA-protein intermediates of viral replication complexes and stimulate the membrane-associated RexB to form an ion channel, leading to the loss of membrane potential and inhibition of energy-dependent processes [127].

Toxin–Antitoxin (TA) based defense. Toxin–antitoxin systems are selfish elements comprising a stable toxin subunit and an unstable antitoxin. Under stress conditions, increased antitoxin degradation releases activity of the toxin, leading to the growth arrest [128, 129]. TA modules are involved in the stress response, biofilm formation and persistence (although the latter has been controversial [129]) but also may be implicated in the Abi anti-viral defense, since phage infection often interferes with the host metabolism in a way that may cause the loss of antitoxin (Fig. 4). TA modules are frequently found within the defense islands and there is an extensive domain exchange between the TA and Abi systems [130]. In fact, no clear boundary can be drawn between the Abi and TA systems, since Abi is a defense strategy, while TA is an organizational/mechanistic principle. Rather, certain Abi systems can be regarded as based on the TA mechanism, e.g., even some Abi systems discussed in the previous section can be considered as solitary toxins, while PrrC/EcoprrI can be considered as a bona fide toxin–antitoxin pair. Depending on the nature of toxin–antitoxin interaction, the TA systems are divided into 6 types, e.g., antitoxin might represent an RNA molecule that directly inhibits toxin protein (Type III) or regulates the level of toxin mRNA translation (Type I); in other types, antitoxin can be represented by a protein that inhibits toxin through the direct protein-protein interaction (Type II) or by counteracting toxin effect on the targets (Type IV) [129, 131].
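The logic of a stable toxin paired with an unstable antitoxin can be illustrated with a minimal numerical sketch in Python. All rate constants below are invented for illustration, the antitoxin is assumed to neutralize the toxin one-to-one (loosely resembling a protein antitoxin), and the function name `simulate_ta` is hypothetical; the sketch is not a quantitative model of any specific TA pair.

```python
from typing import List


def simulate_ta(steps: int, shutoff_step: int) -> List[float]:
    """Return free (un-neutralized) toxin per time step.

    Host gene expression is switched off from `shutoff_step` onward,
    mimicking a phage-induced block of host synthesis.
    """
    toxin_synth, toxin_decay = 1.0, 0.02          # stable toxin (assumed rates)
    antitoxin_synth, antitoxin_decay = 50.0, 0.5  # unstable antitoxin (assumed)
    toxin = toxin_synth / toxin_decay             # start at steady state
    antitoxin = antitoxin_synth / antitoxin_decay
    free_toxin = []
    for step in range(steps):
        expression_on = step < shutoff_step
        toxin += (toxin_synth if expression_on else 0.0) - toxin_decay * toxin
        antitoxin += ((antitoxin_synth if expression_on else 0.0)
                      - antitoxin_decay * antitoxin)
        # Toxin in excess of antitoxin is considered active.
        free_toxin.append(max(0.0, toxin - antitoxin))
    return free_toxin


free = simulate_ta(steps=20, shutoff_step=10)
print([round(value, 1) for value in free])
```

While expression continues, the antitoxin stays in excess and no free toxin appears; once synthesis stops, the antitoxin pool collapses within a few steps and most of the toxin becomes active.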

Fig. 4.

General principle of abortive infection based on the activity of TA modules. Several examples of toxin effectors are shown.

Examples of the Abi response based on TA modules include the ToxIN and RnlAB systems. ToxIN, originally identified as AbiQ in lactococci, is frequently found in bacterial genomes and functions as a Type III TA system, in which the activity of the RNase toxin ToxN is blocked by interaction with the ToxI RNA antitoxin [116, 132, 133]. The RnlAB system from E. coli represents a Type II TA module and protects against phage T4 infection [134]. The RnlA toxin is a stable RNase, whereas the RnlB antitoxin is quickly degraded by host proteases. Thus, if phage infection interferes with continuous gene expression, the lack of RnlB synthesis releases the toxic activity of RnlA, leading to cellular mRNA decay [134]. RnlAB homologs found on E. coli plasmids form a system called LsoAB [135]. The T4 phage-encoded protein Dmd functions as an antitoxin for both of these systems [134, 135]. Many TA systems have a reversible effect and do not induce cell death. Still, temporary cessation of growth can provide phage resistance. AbiE, a Type IV TA system, is an example: the AbiEii toxin transcribed from the abiE promoter does not directly interact with its antitoxin AbiEi. Instead, AbiEi binds to the promoter region and inhibits transcription of the entire TA operon [136]. The AbiEii toxin belongs to the DNA polymerase β-like superfamily and exhibits nucleotidyltransferase activity [136]. Recently, it was demonstrated that the AbiEii homologue MenT3 from Mycobacterium tuberculosis can transfer pyrimidines to the acceptor stem of specific tRNAs [137]. In agreement with this, overexpression of the AbiE toxin in Serratia causes growth cessation and a decrease in tRNA levels [138]. In E. coli, the Type II system MazEF interferes with phage P1 propagation [139], and the Type I TA system hok/sok reduces the burst size of T4 phage [140]. The latter system is based on the holin-like activity of the Hok toxin, while the antitoxin sok is an antisense RNA that inhibits Hok synthesis by binding to its mRNA [141]. The role of TA systems in phage defense remains controversial and poorly characterized [142, 143]. Nevertheless, based on the abundance of TA systems and their involvement in the Abi response in model bacteria, it can be expected that TA-based phage defenses are widespread [130, 142, 144].

Retrons as immunity systems. Retrons are genetic elements that encode a reverse transcriptase (RTase) and a non-coding RNA (ncRNA) that is used by the RTase to generate covalently linked RNA/DNA hybrids [145]. The functional role of retrons remained unknown until recent works demonstrated that they are part of tripartite TA systems that may be involved in phage defense through Abi [146-149]. The RTase complex with the RNA/DNA hybrid is inactive under normal conditions, while phage infection causes its activation and signal transduction to the cognate toxin effectors (Fig. 5). Anti-phage activity was demonstrated for multiple retrons, and mutations that affect the ncRNA secondary structure, the branching site, or the RTase catalytic motif eliminated the defense [146, 149]. About 2000 retron systems were found within the defense islands, with RTases that could be fused to or accompanied by ATPases, ribosyltransferases, and endonucleases as effector proteins [146, 149]. The Ec48 retron from E. coli “guards” the RecBCD enzyme, which is one of the main barriers to foreign DNA uptake: RecB inhibition by phage-encoded proteins (e.g., Gam from phage λ or gp5.9 of T7) activates the retron and releases the activity of the cognate membrane-anchored effector that causes premature cell lysis [146]. It was shown for the Sen2 retron from S. enterica that perturbations of the DNA part of the complex activate the RcaT toxin [147, 148]. DNA degradation or methylation associated with the phage-encoded RecE endonuclease or Dam methyltransferase activates the response, while some prophage-encoded proteins can serve as blockers of retron activation [147].

Fig. 5.

General principle of abortive infection based on retron elements.

Cyclic-Oligonucleotide-Based Anti-Phage Signaling Systems (CBASS). A widely encountered family of systems that rely on the synthesis of cyclic oligonucleotides to activate the Abi response has been recently described [107, 150, 151]. In the presence of CBASS phage infection triggers synthesis of a secondary messenger (cyclic GMP-AMP, cyclic triadenylate, etc.) by the cGAS/DncV-like nucleotidyltransferases (CD-NTases), which activates various effectors that induce programmed cell death (Fig. 6) [150-153]. These systems provide another link between the immunity defense of eukaryotes and prokaryotes since cyclic GMP-AMP synthase (cGAS) in animals is involved in anti-viral and inflammatory response through the cGAS-STING pathway activated by sensing of cytosolic DNA [154]. Oligonucleotide cyclase genes are found in about 10% of prokaryotic genomes, and more than half reside within the defense islands. Diversity of CBASS systems can be classified based on the composition of operon, type of the effector or produced signalling molecule [153]. The Type I systems contain only CD-NTase (oligonucleotide cyclase) and effector, while other systems also carry auxiliary components: genes with ubiquitin-associated domains in Type II, HORMA and Trip13-like domain genes in Type III, or nucleotide modification domains in rarely encountered Type IV CBASS [153]. The Type II CBASS from V. cholerae was the first to be studied experimentally [151, 155, 156]. Core of the system is composed of two components: cGAS enzyme DncV and cyclic GMP-AMP (cGAMP) sensing phospholipase CapV. These two components are sufficient to provide defense against the P1 phage, yet, protection against other phages required two additional proteins carrying E1, E2, and JAB domains, typical of ubiquitinating enzymes [151]. 
The sensing mechanism is not yet determined, but the cells were shown to produce cGAMP upon phage infection, which triggered CapV phospholipase-mediated disruption of the cell membrane before completion of the viral life cycle. In addition to phospholipases, the known effectors of CBASS systems include endonucleases and proteins carrying transmembrane domains [153, 157]. An interesting group of CBASS effectors contains a domain homologous to the eukaryotic STING (Stimulator of Interferon Genes). In bacteria, sensing of cyclic oligonucleotides by STING activates the coupled Toll/interleukin-1 receptor (TIR) domain of the effector, which leads to NAD+ degradation [158]. Comparative analysis of the metazoan and bacterial STING domain structures reveals its transition from the direct effector role in CBASS to the regulatory functions in the immune response of higher animals [158].
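The operon-composition rule for CBASS types described above can be illustrated with a minimal Python sketch. The domain labels (for example "NUC_MOD" for a nucleotide modification domain) and the order in which the rules are checked are assumptions made for illustration only; they are not taken from the classification in [153], and a real analysis would start from curated domain annotations.

```python
# Hypothetical domain labels; real labels would come from a domain annotation tool.
UBIQUITIN_LIKE = {"E1", "E2", "JAB"}        # auxiliary domains named for Type II
TYPE_III_MARKERS = {"HORMA", "TRIP13"}      # auxiliary domains named for Type III
NUCLEOTIDE_MODIFICATION = {"NUC_MOD"}       # placeholder label for Type IV domains


def classify_cbass(operon_domains):
    """Assign a CBASS type from the domain labels found in one operon.

    Returns None when no CD-NTase is present, because the operon is then
    not treated as CBASS in this sketch.
    """
    domains = {d.upper() for d in operon_domains}
    if "CD-NTASE" not in domains:
        return None
    # Precedence between the auxiliary-domain rules is an assumption.
    if domains & NUCLEOTIDE_MODIFICATION:
        return "Type IV"
    if domains & TYPE_III_MARKERS:
        return "Type III"
    if domains & UBIQUITIN_LIKE:
        return "Type II"
    return "Type I"  # only the cyclase and an effector


if __name__ == "__main__":
    print(classify_cbass(["CD-NTase", "phospholipase", "E1", "E2", "JAB"]))
```

In this sketch, an operon with only a CD-NTase and an effector falls through to Type I, which mirrors the description of Type I as the minimal two-gene arrangement.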

Fig. 6. Model of the CBASS-mediated immune response.

Many bacterial CD-NTases were shown to be inactive in vitro [152], and recent work demonstrated the importance of the auxiliary proteins for their in vivo activity [159]. The HORMA-domain proteins in eukaryotes bind to specific closure motifs in the target proteins to assemble into signaling complexes [160]. In the Type III CBASS of E. coli and P. aeruginosa, HORMA proteins activate the CD-NTase, leading to production of the cyclic triadenylate (cAAA) second messenger, which in turn activates the promiscuous endonuclease activity of the NucC effector, providing defense against phage infection via the Abi mechanism [159, 161]. Without infection, activity of the system is suppressed by the Trip13-like ATPase, which is thought to disassemble the CD-NTase complex with HORMA. Recognition of specific motifs in phage proteins is believed to change the HORMA conformation and activate the CD-NTase activity of the complex. Intriguingly, the NucC effector can be found as an accessory endonuclease in the Type III CRISPR-Cas systems that also rely on cyclic oligoadenylate signaling [21, 22, 161]. Another CBASS effector that can respond to different types of cyclic oligonucleotides is the Cap4 protein from Enterobacter cloacae [162], a founding member of an entire protein family. Cap4 proteins recognize secondary messengers through the divergent SAVED domains composed of two CRISPR-related CARF subunits, which induces oligomerization and release of the DUF4297 endonuclease activity [162].

A wide variety of bacterial CD-NTases and coupled effectors have been described in recent years, yet many questions remain unanswered. How is phage infection sensed by these systems? What are the functions of auxiliary proteins? Are all these systems involved only in phage defense, or can some of them perform other functions? What are the costs of the CBASS genes, and are there additional mechanisms to restrain their possible self-toxicity, as in the case of negative regulation by the Trip13-like ATPase?

PROPHAGE-MEDIATED DEFENSE

Temperate phages can integrate their genomes into the host chromosome to form prophages [163]. Most known bacteria carry prophages, and interaction of the prophage with the host can be considered as mutualistic: since prophage survival is dependent on the host, it is beneficial for the prophage to exclude secondary infection of the lysogenized cell [164]. Indeed, prophages often carry genes associated with antiviral defense (Fig. 7a) [165-168]. The simplest way by which infection with homoimmune phages can be inhibited is expression of a repressor protein, a transcription factor that regulates the switch between lytic and lysogenic life strategies of the phage. Since the repressor protein is constantly present in the lysogenic cell to inhibit expression of lytic genes of the prophage, secondary lytic infection with a phage regulated by the same or a similar repressor will be inhibited [169]. Although the repressor-associated defense has a narrow specificity range, systematic studies of prophages in P. aeruginosa and M. smegmatis uncovered abundant defense genes that provide heterotypic protection [165, 166].

Fig. 7. Defense associated with prophages and mobile genetic elements. a) Prophage-encoded defense systems; b) model of the PICI induction and interference; c) model of the PLE induction and interference.

Genes that are dispensable for the prophage survival but can increase fitness of the host were named “morons” (from adding “more on” to the phenotype). Morons can affect multiple host processes including motility, antibiotic resistance, metabolism, and phage defense [170, 171]. Some prophage-encoded TA systems and the Sie mechanism that can be considered as morons were already discussed. Morons can also affect the cell surface to prevent receptor recognition. For example, P. aeruginosa phages D3 and ϕ297 can alter the conformation of the O-antigen subunits in lipopolysaccharide (LPS) by encoding their own O-antigen polymerase [172-174], while some Shigella and E. coli prophages may block growth of the O-antigen chain by glucosylation or acetylation [175, 176].

Prophages of mycobacteria encode different superinfection exclusion systems. For example, phages Sbash and CarolAnn carry putative TA modules with membrane-associated effectors [177, 178]. Other mycobacteriophages were shown to encode restriction endonucleases, Sie membrane proteins, or guanosine pentaphosphate [(p)ppGpp] synthases capable of defending the host [166]. Initially found as the gp29 and gp30 proteins of the phage Phrann, the system encoding a (p)ppGpp synthase and its cognate inhibitor is part of a novel family of TA modules relying on alarmone signaling found in multiple prophages [166, 179]. The toxic component is an enzyme (SAS, Small Alarmone Synthetase) that synthesizes ppGpp or ppApp – signaling molecules typical for stress responses associated with starvation (stringent response) that cause growth cessation [180]. The antitoxins directly bind to the synthase or degrade the signaling alarmone [179]. Another Abi system widespread in the prophages of Gram-negative bacteria consists of the BstA effector that co-localizes with the replicating DNA of target phages and interferes with the replication process by an unknown mechanism [181]. This system provides an interesting example of self-immunity avoidance in prophages: BstA is inactivated by its binding to a specific anti-BstA locus (aba) within the prophage genome and thus does not prevent the phage's own lytic infection.

“Morons” can be very different even among closely related strains, and a recent study of P2- and P4-like prophages uncovered an unprecedented variety of compact defense systems in specific hotspot loci of their genomes [182]. In addition to the known antiviral systems, multiple gene clusters that carry predicted defense domains (TIR, SIR2, nuclease, ATPase) or domains of unknown function were reported in these hotspots. Defense activity has been validated for 14 novel systems. For example, the PARIS system, which consists of an ATPase and a DUF4435 protein, is suggested to provide an abortive infection response triggered by the phage T7 anti-restriction protein Ocr, which inhibits host R-M and BREX systems [182-184]. Mining of the diversity hotspots in other prophages might represent a valuable and simple tool for detection of novel phage defense systems.

PHAGE’S PARASITES – MOLECULAR PIRACY OF THE PICIs AND PLEs

MGEs can be randomly transferred by phages during the process of generalized transduction. However, some MGEs evolved to hijack the phage DNA packaging machinery and capsids to load their own genomes, a phenomenon sometimes called molecular piracy [185-188]. Phage-Inducible Chromosomal Islands (PICIs) are widespread in the genomes of Gram-positive and Gram-negative bacteria. Their induction from the chromosome is dependent on infection with helper phages, and from the host perspective they can be considered as a version of the Abi response [189, 190]. The infected cell is eventually lysed, but the burst size of the superinfecting lytic phage progeny is suppressed and the majority of the released particles carry the PICI genome instead of the phage genome [191]. Since the PICI elements do not undergo a lytic cycle on their own and limit the spread of helper phages, their presence can be beneficial for the bacterial population.

PICI. The most studied group of PICIs was discovered in staphylococci and is named SaPI (S. aureus pathogenicity islands). These chromosomal islands are less than 15 kbp in size; they encode an integrase, an excision protein, and replication machinery. Expression of these genes is strictly controlled by the master repressor Stl. SaPIs often carry accessory genes, including toxins and virulence factors, and their presence could contribute to the host pathogenicity [185, 192]. The life cycle of SaPI is coupled with the infection by a helper phage, and sensing of specific helper phage proteins relieves the Stl-mediated repression [193, 194]. Following activation, SaPIs could interfere with the helper phage reproduction via several different mechanisms (Fig. 7b). They facilitate assembly of small capsids capable of accommodating the SaPI genome but excluding the larger genome of the helper phage [195]. SaPIs can also interfere with the viral genome packaging by inhibiting the small terminase subunit TerS, while loading of the SaPI's own genome is supported by the SaPI-encoded terminase [195]. Finally, SaPI-encoded proteins can bind viral transcription factors, impairing expression of the late genes [196]. While these mechanisms have been described for the staphylococcal SaPIs, PICIs are widely distributed in bacteria [190, 197] and may rely on similar mechanisms for their parasitic lifestyles. For example, PmCI172 from Pasteurella multocida directs production of small-size capsids when the cell is infected by the Mu-like helper phage, while EcCICFT073 from E. coli encodes Rpp, a protein reprogramming phage λ TerS to package the genome of the PICI [190, 198].

PLE. Another type of satellite MGE, specific to V. cholerae and interfering with the propagation of phage ICP1, is called PLE – Phage-inducible chromosomal Island-like Element [199]. Similar to PICIs, PLE senses proteins of the helper phage ICP1 to trigger its own genome excision and exploits the molecular machinery and structural components of ICP1. Yet the genome organization of the PLE is different, and in contrast to PICIs, which only lower the phage burst size, no infectious ICP1 particles are produced when PLE is induced (Fig. 7c) [199-202]. ICP1 replication was shown to be severely inhibited in the PLE-carrying cells, while some PLEs can also modulate viral gene expression [202, 203]. An additional mechanism that may contribute to the ICP1 suppression is production of the PLE-encoded LidI, a protein that disrupts the lysis interference system of ICP1 and accelerates cell lysis [204]. It must be noted that ICP1 in its turn encodes a CRISPR-Cas system to target PLE [205].

SYSTEMS WITH PLAUSIBLE NOVEL MECHANISMS

Mining of the conserved gene clusters found within the defense islands of available prokaryotic genomes allowed predicting multiple novel types of defense systems [149, 206-208]. Systematic verification of these predictions has recently been carried out. The pipeline included cloning of 28 candidate systems from several source organisms into Gram-negative E. coli or Gram-positive B. subtilis surrogate hosts, followed by screening with a set of diverse phages. This work validated 10 novel systems that were named after mythological protective deities [207]. Druantia, Kiwa, and Zorya were validated in E. coli, while activities of Gabija, Hachiman, Lamassu, Thoeris, Septu, Shedu, and Wadjet were demonstrated in B. subtilis. Multiple protein domains distinct from those known for already studied defense systems were shown to be involved in antiviral protection, which suggests novel mechanisms of action of the discovered systems.

The Zorya system is active against both ssDNA and dsDNA phages. This system encodes the ZorA and ZorB proteins, which are homologous to the proton channel-forming unit MotAB of the bacterial flagellar motor [209]. In the Type I Zorya systems, ZorAB could be accompanied by the predicted small nuclease ZorE, while in the Type II systems it is accompanied by the helicase/ATPase and Pfam00691-containing proteins ZorCD. Phage infection of the cells carrying Zorya initiates premature cell lysis, likely due to the Abi response mediated by the ZorAB effector causing membrane depolarization. This view is further supported by the fact that mutations of amino acid residues predicted to be involved in proton transport reduce the degree of defense provided by the Zorya system [207].

Another studied system – Thoeris – is composed of the ThsA protein with SIR2 and SLOG domains and the Toll/interleukin-1 receptor (TIR) domain protein ThsB [207, 210]. The structures of both proteins have been recently resolved [211]. Mutations in the NAD-binding pocket of the ThsA SIR2 domain result in the loss of in vitro activity of the protein and of phage resistance in vivo, linking the Thoeris-mediated defense to NAD+ hydrolysis [207, 211]. The TIR domain could serve as a signal transducer in the eukaryotic immune pathways [212] and was also found in other prokaryotic systems (e.g., CBASS). In the Thoeris system, phage infection could trigger the TIR domain of ThsB to synthesize an isomer of cyclic ADP-ribose that transfers the signal to ThsA, causing an abortive infection response associated with NAD+ depletion [210]. Multiple copies of the thsB gene could be associated with a single thsA, and it was shown that diversification of TIR domains provides defense against a broader range of infecting phages [210].

The Wadjet system was shown to provide defense not against phages but against plasmid transformation [207]. Wadjet is composed of four components: JetABC, homologous to the SMC (structural maintenance of chromosomes) proteins involved in plasmid and host genome segregation, and the JetD protein with a putative topoisomerase VI domain [207, 213]. Suppression of plasmid maintenance associated with non-canonical SMC proteins has been reported in M. smegmatis, where modulation of the plasmid supercoiling status impaired its segregation to the daughter cells [214]. It could be suggested that the Wadjet system functions in a similar manner to exclude foreign extrachromosomal genetic elements.

Another recent study employed a similar strategy for experimental validation of the candidate systems but relied on a different prediction algorithm. Instead of assessing enrichment of protein domains, all sequences in the 10-gene neighborhood of the known defense systems were analyzed for the probability of being localized within defense islands. More than 7000 candidate defense genes were predicted, many of which harbored unannotated domains or domains of unknown function. Following manual curation, 48 predicted systems were selected for further work. Heterologous expression in E. coli validated antiviral activities for 29 novel systems [149].
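The neighborhood-based idea behind this prediction step can be sketched in Python as follows. This is a deliberately simplified stand-in and not the scoring used in [149]: it assumes that each genome is given as an ordered list of gene-family labels, that "neighborhood" means a fixed window of genes on either side, and that the score is simply the fraction of a family's occurrences found near a known defense gene. The function name and input format are hypothetical.

```python
from collections import defaultdict


def neighborhood_scores(genomes, known_defense, window=10):
    """Score gene families by how often they occur near known defense genes.

    genomes       -- list of genomes, each an ordered list of family labels
    known_defense -- set of family labels already known to be defense genes
    window        -- maximal distance (in genes) counted as "near" (assumption)

    Returns {family: fraction of occurrences within `window` genes of a
    known defense gene}; known defense families themselves are skipped.
    """
    near = defaultdict(int)
    total = defaultdict(int)
    for genes in genomes:
        defense_positions = [i for i, fam in enumerate(genes) if fam in known_defense]
        for i, fam in enumerate(genes):
            if fam in known_defense:
                continue
            total[fam] += 1
            if any(abs(i - p) <= window for p in defense_positions):
                near[fam] += 1
    return {fam: near[fam] / total[fam] for fam in total}


if __name__ == "__main__":
    toy_genome = ["famA", "RM", "famB", "h1", "h2", "h3", "famB"]
    print(neighborhood_scores([toy_genome], {"RM"}, window=2))
```

Families with a score close to 1 across many genomes would be candidates for manual curation; a realistic pipeline would also need to correct for family size and phylogenetic redundancy, which this sketch ignores.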

Association of retrons with phage defense systems was established independently. Moreover, reverse transcriptases (RTases) not associated with retron functions were also shown to be involved in the protection. Six groups of RTases were combined under the term DRT (defense-associated RTases). Some DRTs were shown to be associated with auxiliary proteins, whereas the Type I DRTs were fused to the nitrilase domain often involved in small-molecule metabolism [215]. Mutations of the predicted catalytic residues in the RTase or nitrilase inactivated the system. Analysis of the transcriptome showed that DRTs of different types do not interfere with the expression of early viral genes, while the Type I system affected accumulation of the late viral transcripts. The mechanism of DRT protection remains undetermined.

The RADAR (restriction by adenosine deaminase acting on RNA) system was studied in more detail. The core of the system is composed of the ATPase RdrA and the adenosine deaminase RdrB, which could be accompanied by ancillary proteins (SLATT, or Csx27 that is also involved in the Type VI CRISPR defense). Analysis of the transcriptome of the phage-infected cells identified A-to-G substitutions in the sequenced reads, consistent with the predicted RdrB-related formation of inosine in the host and viral RNAs. The RNA stem-loop secondary structures were shown to be preferential editing targets. Infection of the RADAR+ culture with phages at high multiplicity of infection (MOI) results in growth arrest, suggesting a defense response through the Abi mechanism. This behavior was attributed to the editing of the host transfer-messenger RNA (tmRNA) that rescues stalled ribosomes. It was further demonstrated that expression of certain phage T2 DNA-binding proteins triggers RNA editing in the uninfected culture, which allowed suggesting a mechanism for the RADAR response activation [149].
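The A-to-G readout mentioned above follows from the fact that inosine is read as guanosine during reverse transcription and sequencing. A minimal Python sketch of how such a signal could be quantified is given below; it assumes gapless alignments supplied as hypothetical (start, sequence) pairs against a single reference string and is not the analysis pipeline of [149].

```python
def a_to_g_fraction(reference, reads):
    """Fraction of read bases covering a reference 'A' that are read as 'G'.

    reference -- reference sequence (string over A, C, G, T)
    reads     -- iterable of (start, sequence) pairs; each read is assumed
                 to align without gaps beginning at 0-based position `start`
    """
    covered = 0  # read bases that overlap a reference adenosine
    edited = 0   # of those, bases that were sequenced as G
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference):
                break  # ignore any read overhang past the reference end
            if reference[pos] == "A":
                covered += 1
                if base == "G":
                    edited += 1
    return edited / covered if covered else 0.0


if __name__ == "__main__":
    print(a_to_g_fraction("ACAGA", [(0, "ACGGA"), (2, "AGA")]))
```

A real analysis would additionally have to separate editing from sequencing errors and genomic variants, for example by comparing infected and uninfected samples.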

NTPases from the STAND superfamily are involved in the programmed cell death signal transduction pathways in eukaryotes [216], but the role of these proteins in prokaryotes has long been undetermined. In the same study, several STAND NTPases were shown to be active in protection against phage infection, and 5 types of such systems were combined under the name AVAST (AntiViral ATPases/NTPases from the STAND superfamily). The NTPase domains were found fused to different putative effector domains (such as the nuclease previously known as DUF4297, protease, or SIR2) and were suggested to act through the Abi mechanism. Mutational analysis verified the importance of the ATPase and putative effectors for the AVAST-mediated defense [149].

Functional domains of other novel systems include nucleases (e.g., DUF4297 from the Lamassu system was shown to be a nuclease involved in CBASS), helicases, SIR2, DNA-binding proteins, phosphatases, and ATPases, as well as multiple unannotated domains [149, 207, 208, 217]. For example, the large Druantia system is composed primarily of the genes with unknown functions. These discoveries significantly expanded our understanding of the diversity of biochemical activities that may be involved in antiviral protection, and should pave the way for further experimental investigations of novel defense mechanisms.

CONCLUSIONS

Interaction of viruses with their hosts is a highly dynamic process that drives generation of multiple offense and defense strategies. Prokaryotic genes associated with phage resistance are amongst the most quickly evolving, and the high turnover rate of prokaryotic adaptations followed by viral counter-adaptations is often described in terms of the Red Queen hypothesis, i.e., both sides of this arms race need to constantly evolve just to keep the “status quo” [218, 219]. Long-term evolutionary studies show that the mutation accumulation rate is higher when the phage co-evolves with the host, compared to the situation when the phage evolves while the host maintains its genotype [220, 221]. One of the examples that allows estimating the dynamics of these interactions is the confrontation of the V. cholerae PLEs with the phage ICP1. Using V. cholerae stool samples collected since the 1940s, the history of this competition can be tracked: five different types of PLE gradually replaced each other, likely by escaping interference from the ICP1-encoded CRISPR-Cas system [199, 205].

Only in recent years have we begun to appreciate the real abundance of the defense systems and the diversity of their mechanisms. This immediately raises the question of how the presence of multiple defense systems in a single bacterium affects its survival and phage resistance. Some defense systems were shown to be highly specific, while others can target multiple phages, like some Abi systems that sense general perturbations in the host metabolism, or DNA-recognition systems that can adapt or evolve to interact with new sequences [3, 222]. Defense systems target different stages of the viral life cycle, and, in general, three lines of defense can be indicated: systems that affect the cell surface to prevent phage adsorption and genome entry; systems that degrade phage genetic material; and systems that induce cell dormancy or suicide if the first two lines of defense were circumvented. It can be suggested that the simultaneous presence of different defense systems in one cell increases the chances of survival and broadens the range of the targeted parasites. Some defense systems are even known to act in cooperation: for example, DNA degradation by the Type I R-M systems generates a pool of DNA fragments that can be used as spacer precursors for the CRISPR-Cas system [64, 223, 224], and the PrrC and retron Ec48 Abi systems exploit a “guarding” strategy and are activated only if the phage interferes with the function of other defense systems [146]. At the same time, the presence of defense systems imposes fitness costs on the host: in the absence of phage infection, expression of the defense genes is energy-costly, while their unrestrained activity is often associated with self-toxicity [110, 219, 225]. Thus, the balance between the benefits of antiviral protection and the accompanying fitness costs defines the number of active defense systems in the genome and the prevalence of specific defenses.
Defense systems can be lost under conditions of low phage pressure, and their persistence in the population is facilitated by the extensive horizontal gene transfer (HGT), downregulation of gene expression, and phase variation [226, 227].

Novel discoveries have highlighted additional features of the defense systems that were not so evident before:

Functional modules can be exchanged between the different defense systems, and similar protein domains can be employed in different types of defense systems [208]. For example, Dnd system modification module could be associated with DndFGH or PblABCDE effectors, NucC nuclease could be associated with CBASS or Type III CRISPR-Cas, TIR domains are found in CBASS and Thoeris, etc. [161, 210, 228].

Defense systems are not unique to the prokaryotic genomes, as different types of MGEs often adopt defense systems for inter-MGE conflicts or for the host suppression [205, 226]. For example, a recent metagenomic study of the giant phages uncovered multiple CRISPR loci with spacers targeting other phages [229].

Proteins typical for the eukaryotic immunity systems, such as cGAS, pAgo, STING, TIR, and STAND, were found to perform similar functions in prokaryotes [105, 149, 151, 158, 210]. Recent work of Burroughs and Aravind adds more to these findings by tracing homologs of the eukaryotic Wnt, YEATS, TPR-S and other domains in prokaryotic immunity systems [208]. The phylogenetic history of these domains indicates that their common root resides within prokaryotes, which suggests that some immunity mechanisms originated before the eukaryote branching and were inherited by the latter.