Microbial Arsenal of Antiviral Defenses. Part II

Bacteriophages, or phages, are viruses that infect bacterial cells (for the scope of this review we will also consider viruses that infect Archaea). The constant threat of phage infection is a major force that shapes the evolution of microbial genomes. To withstand infection, bacteria have evolved numerous strategies to avoid recognition by phages or to directly interfere with phage propagation inside the cell. Classical molecular biology and genetic engineering have been deeply intertwined with the study of phages and host defenses. Nowadays, owing to the rise of phage therapy, the broad application of CRISPR-Cas technologies, and the development of bioinformatics approaches that facilitate discovery of new systems, phage biology is experiencing a revival. This review describes the variety of strategies employed by microbes to counter phage infection. The first part discussed defenses associated with the cell surface, the roles of small molecules, and innate immunity systems relying on DNA modification. The second part focuses on adaptive immunity systems, abortive infection mechanisms, defenses associated with mobile genetic elements, and novel systems discovered in recent years through metagenomic mining.


INTRODUCTION
The first part of the review covered strategies that allow host cells to avoid recognition by phages, innate immunity mechanisms blocking early stages of infection, and systems that rely on DNA modification for self vs. non-self discrimination. Here, we continue the description of the variety of microbial antiviral systems.

CRISPR-Cas ADAPTIVE IMMUNITY SYSTEMS
BIOCHEMISTRY (Moscow) Vol. 86 No. 4 2021
In contrast to the DNA modification-based innate immunity systems, where target recognition relies on the interaction of defense proteins with a predetermined sequence within a phage genome, prokaryotes also possess adaptive immunity systems, CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated proteins). Here, recognition of the target nucleic acid is driven by annealing of a complementary RNA molecule, and the system can generate and store guides for interference with novel sequences. The ability to preserve information about previous encounters with invaders is a feature shared between CRISPR-Cas systems and the immunity of higher eukaryotes, such as humans. Unlike mammalian adaptive immunity, CRISPR-Cas immunity is heritable. A CRISPR-Cas system consists of a CRISPR array (the number of arrays in prokaryotic genomes varies from one to several dozen) and associated cas genes [1-3]. A CRISPR array is a cluster of short repeated genomic DNA fragments separated by unique spacer sequences, at least some of which originate from foreign DNA. An AT-rich leader region is located in front of the CRISPR array [1]. The cas genes encode the protein components of the CRISPR-Cas mechanism. CRISPR-Cas systems are responsible for two different processes: adaptation and interference. CRISPR adaptation is the integration of new invader-derived spacers into the CRISPR array. In a single adaptation event, the array is expanded by one new spacer and one repeat. The proteins responsible for CRISPR adaptation are generally homologous in all CRISPR-Cas systems. Transcription of the array produces a pre-crRNA that is processed into short crRNAs in such a way that each crRNA contains a spacer flanked by partial repeats.
A crRNA bound by Cas proteins forms an effector complex capable of specific recognition of a protospacer, a DNA or RNA sequence complementary to the spacer part of the crRNA. Protospacer recognition is followed by degradation of the target nucleic acid molecule that contains it. The process of target recognition and destruction is called CRISPR interference (Fig. 1).
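The array organization and the single adaptation event described above can be sketched in a few lines of Python. This is a toy model, not a bioinformatics tool: the class name `CrisprArray`, the sequences, and the handle lengths passed to `crrnas` are all hypothetical, and the transcript is approximated by simply replacing T with U on one strand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprArray:
    """Toy array: leader, then repeat, then (spacer + repeat) units."""
    leader: str
    repeat: str
    # Index 0 is the leader-proximal (most recently acquired) spacer.
    spacers: List[str] = field(default_factory=list)

    def sequence(self) -> str:
        """Return leader + repeat + (spacer + repeat) for every spacer."""
        units = "".join(spacer + self.repeat for spacer in self.spacers)
        return self.leader + self.repeat + units

    def adapt(self, new_spacer: str) -> None:
        """One adaptation event: the array grows by one spacer and one repeat.

        The new spacer is placed next to the leader (an assumption of this
        sketch, consistent with integration at the leader-repeat junction).
        """
        self.spacers.insert(0, new_spacer)

    def crrnas(self, handle5: int, handle3: int) -> List[str]:
        """Return crRNAs: partial repeat + spacer + partial repeat (as RNA).

        handle5 / handle3 are the numbers of repeat-derived nucleotides kept
        on each side; both values are illustrative parameters.
        """
        if not (0 < handle5 <= len(self.repeat)) or not (0 < handle3 <= len(self.repeat)):
            raise ValueError("handle lengths must be between 1 and the repeat length")
        five = self.repeat[-handle5:]   # end of the upstream repeat
        three = self.repeat[:handle3]   # start of the downstream repeat
        return [(five + s + three).replace("T", "U") for s in self.spacers]


if __name__ == "__main__":
    array = CrisprArray(leader="ATATAT", repeat="GGCCGG", spacers=["AAAA"])
    before = len(array.sequence())
    array.adapt("CCCCC")
    print("array grew by", len(array.sequence()) - before, "nt")
    print(array.crrnas(handle5=2, handle3=3))
```

The growth of the array after `adapt` equals the length of the new spacer plus one repeat, which mirrors the statement that each adaptation event adds exactly one spacer and one repeat.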
Fig. 1. Mechanism of CRISPR-Cas adaptive immunity in prokaryotes. a) Fragments originating from foreign DNA can be integrated into a CRISPR array in the process of CRISPR adaptation. The CRISPR array is elongated by one new spacer and one repeat. The CRISPR array is then transcribed to form the pre-crRNA, which is processed into short crRNAs so that each crRNA contains a spacer flanked by partial repeats. The cas genes code for the protein components of the CRISPR interference and CRISPR adaptation complexes. The CRISPR interference complex consists of a crRNA bound by Cas proteins and interacts with a protospacer, i.e., a DNA sequence complementary to the sequence of the crRNA spacer. Recognition of the protospacer by the CRISPR effector complex leads to degradation of the target DNA molecule. The protein composition of the interference module is variable and is used as a major criterion in CRISPR-Cas system classification. CRISPR-Cas systems are subdivided into two classes, six types, and several subtypes.
Diversity of CRISPR interference mechanisms. Classification of CRISPR-Cas systems is based on the protein composition of the effector complexes. According to the latest census, CRISPR-Cas systems can be subdivided into two classes, six types, and 33 subtypes [4]. Class 1 (Types I, III, and IV) systems utilize multisubunit effectors, while Class 2 (Types II, V, and VI) effectors are single-subunit proteins (table). Different types of CRISPR-Cas systems are distinguished by the presence of specific "signature proteins" responsible for degradation of the target nucleic acid (Cas3, Csf1, Cas10, Cas9, Cpf1, and C2c2 for Types I, IV, III, II, V, and VI, respectively [4]).
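The class/type/signature-protein assignments listed above can be captured as a small lookup table. The sketch below only restates what the text says; the names `CRISPR_TYPES`, `effector_architecture`, and `types_from_signature` are hypothetical helpers, and matching on signature proteins alone is a deliberate simplification of real classification, which also uses locus architecture and subtype-specific genes.

```python
# Type -> class and signature protein, as listed in the text above.
CRISPR_TYPES = {
    "I":   {"class": 1, "signature": "Cas3"},
    "III": {"class": 1, "signature": "Cas10"},
    "IV":  {"class": 1, "signature": "Csf1"},
    "II":  {"class": 2, "signature": "Cas9"},
    "V":   {"class": 2, "signature": "Cpf1"},
    "VI":  {"class": 2, "signature": "C2c2"},
}


def effector_architecture(crispr_type: str) -> str:
    """Class 1 effectors are multisubunit; Class 2 effectors are single proteins."""
    cls = CRISPR_TYPES[crispr_type]["class"]
    return "multisubunit" if cls == 1 else "single-subunit"


def types_from_signature(proteins) -> list:
    """Return the types whose signature protein occurs in `proteins`."""
    present = set(proteins)
    return sorted(t for t, info in CRISPR_TYPES.items()
                  if info["signature"] in present)


if __name__ == "__main__":
    locus = ["Cas1", "Cas2", "Cas9"]  # hypothetical gene content of a locus
    for t in types_from_signature(locus):
        print(f"Type {t}: Class {CRISPR_TYPES[t]['class']}, "
              f"{effector_architecture(t)} effector")
```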
CRISPR-Cas systems of Class 1 include three types: Type I, Type III, and Type IV. The effector complexes of Type I and Type III systems have been studied in detail. The architectural similarity of the effector complexes indicates a common origin of these systems [5]. The Type I effector is a large multisubunit protein complex called Cascade, which consists of Repeat-Associated Mysterious Protein (RAMP) subunits in a Cse1₁:Cse2₂:Cas7₆:Cas5₁:Cas6₁ stoichiometry. Cascade binds a processed 61-nucleotide-long crRNA with a 32-nt spacer [6-9]. Annealing of the Cascade-bound crRNA to the complementary protospacer leads to localized melting of the target dsDNA and formation of an R-loop, a heteroduplex between the crRNA spacer and the "target" strand of the DNA protospacer, while the "non-target" DNA strand of the protospacer is displaced and remains in a single-stranded form. An obligatory condition for target recognition is the presence of a short, two- to three-nucleotide-long protospacer adjacent motif (PAM) located at the 3′ end of the target strand, i.e., downstream from the protospacer. The PAM requirement safeguards CRISPR-Cas systems from attacking the cell's own genome, as a PAM is never adjacent to the spacers in the CRISPR loci. Following R-loop formation, the Cas3 nuclease/helicase is recruited to the complex [10]. Cas3 first introduces a single-stranded break in the "non-target" protospacer strand 11-15 nucleotides downstream from the PAM and then begins to unwind and cleave the DNA in the 3′→5′ direction from the PAM.
In Type III systems, the effector has a helical structure similar to that of Cascade [11,12]. However, the Type III effector recognizes not dsDNA but RNA sequences complementary to the crRNA spacers [13,14]. Recognition of a nascent target transcript stimulates the nonspecific DNA cleavage activity of the HD (histidine-aspartate) domain of the signature nuclease Cas10, which results in in situ degradation of DNA in the transcription bubble [15-20]. At the same time, the cyclic oligoadenylate (cOA) synthetase Palm domain of Cas10 is activated to produce the cOA second messenger. cOA can be sensed by auxiliary Cas ribonucleases (e.g., Csm6/Csx1) that degrade host and viral transcripts in a sequence-nonspecific manner [21-23]. Type III systems do not rely on a PAM to prevent autoimmunity, since the effector complex cannot target the CRISPR array or crRNA. However, the crRNA carries an 8-nt sequence tag that inhibits Cas10 activity to avoid self-DNA cleavage if the CRISPR array is transcribed from the opposite strand [17]. If the target sequence is fully complementary to both the spacer and the crRNA tag, interference does not occur.
The exact immunity mechanism of Type IV CRISPR-Cas systems is not yet fully understood. The signature protein of these systems is Csf1. Type IV CRISPR-Cas systems have been found on plasmids or in prophage genomes, implying the possibility of recurrent transfer of the CRISPR-Cas machinery to and from mobile genetic elements (MGEs) [24,25]. The Type IV CRISPR-Cas signature genes are not accompanied by adaptation module genes [26]. This has led to the suggestion that Type IV proteins may be involved in cellular functions unrelated to adaptive immunity [27,28].
The CRISPR-Cas systems of Class 2 include three types: Type II, Type V, and Type VI. In Type II systems, the monomeric Cas9 protein in complex with crRNA is responsible for recognition and degradation of the target dsDNA. Cas9 possesses two nuclease domains (RuvC and HNH) and is capable of double-stranded DNA cleavage [29,30]. It represents a minimal interference system and therefore became the preferred tool in CRISPR-Cas-based genome engineering applications [31-33]. Type V systems are characterized by the presence of the Cpf1 effector protein. Cpf1 contains a RuvC nuclease domain, similar to Cas9, while the HNH domain is absent [34]. The Type V effector is able to destroy target double-stranded DNA in a PAM-specific manner [34,35], while binding of Cpf1 to its targets also unleashes its indiscriminate single-stranded DNase activity [36]. The majority of Type V systems contain the Cpf1 effector, while in the V-F subtype it is replaced by Cas14. To date, Cas14 is the smallest known CRISPR effector. Cas14a is an ssDNA-targeting CRISPR endonuclease that does not require a PAM for activation [37]. Some V-U subtype effectors demonstrate phylogenetic similarity to the TnpB transposases [37,38]. The Type VI CRISPR-Cas system was bioinformatically predicted in 2015 [39]. Soon afterwards, the effector protein C2c2 from the VI-A subtype was described. The VI-A locus in Leptotrichia shahii contains only three genes (cas1, cas2, c2c2) and a CRISPR array. The C2c2 nuclease with bound crRNA forms an effector complex that is able to cleave single-stranded RNA molecules. In contrast to all other known CRISPR nucleases, C2c2 mediates RNA cleavage through its HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domain. Mutation in the catalytic center of the HEPN domain inactivates the effector complex, but the RNA-binding activity of the resulting protein is retained [40].
Because of its ability to bind RNA molecules in a predetermined manner, the C2c2 nuclease could be used as an effective tool for RNA editing and regulation of gene expression.
The origin of CRISPR effector diversity and the phylogenetic relationships among effectors remain an interesting question. The effector complexes of Type I and Type III systems are quite similar in structure. It is assumed that the effector complex of the Type III system is more ancient. Here, cas genes are not always associated with CRISPR arrays and the cas1-cas2 adaptation module [41]. Standalone Cas1 homologs were detected in mobile genetic elements named casposons. The Cas9 and Cpf1 proteins, typical for Type II and Type V systems, respectively, are similar to the TnpB transposon-encoded protein and contain a RuvC endonuclease domain [42]. The Cas13 protein (also known as C2c2; Type VI systems) has RNase HEPN domains. Thus, CRISPR-Cas systems could have evolved by adopting interference and adaptation module genes from casposons, while effector nucleases may have originated from cellular genomes or mobile genetic elements.
CRISPR adaptation. The most conserved components among all CRISPR-Cas systems are Cas1 and Cas2, which are required at the stage of spacer acquisition [43]. As a rule, the cas1 and cas2 genes are located close to each other, and the encoded proteins form a stable complex [44,45]. Deletion of cas1 and cas2 does not affect CRISPR interference and crRNA maturation in Type I [46-49], Type II [50,51], and Type III [52] systems. Cas1 is an endonuclease [53,54] that is also able to resolve Holliday junctions. In vitro, Cas1 can promote DNA integration and recombination events [55]. Cas2 displays nuclease activity towards both RNA and DNA in vitro [56,57]. However, CRISPR adaptation in vivo requires only the nuclease activity of Cas1 [44]. The ability to assemble a stable Cas1-Cas2 complex is also essential for in vivo adaptation: mutations that disrupt complex formation in vitro interfere with spacer acquisition in vivo [44]. During incorporation of a new spacer, the Cas1-Cas2 complex introduces a single-strand break exactly at the leader-repeat junction in the CRISPR array by catalyzing a nucleophilic attack of the 3′-OH end of the incoming spacer on the 5′ end of the first repeat. Similarly, the other strand is nicked at the first repeat-spacer junction, and the 5′ end of the repeat strand is joined to the 3′ end of the new spacer. As a result, the incorporated spacer is flanked by single-stranded repeat sequences that are later filled in by the host repair machinery [58]. Similar intermediates are known for transposase-mediated mobile element integration, suggesting that the spacer acquisition and transposon integration reactions are mechanistically similar [59-62].
The ability of the Cas1-Cas2 adaptation complex to acquire new spacers independently of the activity of CRISPR effector complexes is known as naïve adaptation. During naïve adaptation, new spacers may be acquired from extrachromosomal DNA as well as from the host genome, and only 50% of the new spacers contain a consensus PAM. Naïve adaptation is essential for subsequent targeting of unknown foreign DNA and seems to be a universal feature of all CRISPR-Cas systems. The process is known to be at least partially dependent on the activity of the host RecBCD complex [63]. RecBCD processes stalled replication forks, and it is believed that the resulting DNA fragments may be used by the Cas1-Cas2 complex for insertion into the CRISPR array. The absence of RecBCD reduces the efficiency of naïve adaptation but does not stop it. Consequently, the Cas1-Cas2 complex can utilize other sources of spacers. The participation of other host proteins in CRISPR adaptation and in the regulation of this process has only recently attracted the attention of researchers [64-67]. For example, it was shown that DNA polymerase I is necessary for both naïve and primed adaptation (presumably to fill in the single-stranded repeats that arise during spacer insertion) [68]. Dorman and Bhriain hypothesized that negative supercoiling could alter various stages of the interaction of CRISPR proteins with DNA, including adaptation, expression of cas genes and CRISPR loci, and interference itself [65].
The presence of a PAM makes it possible to distinguish a spacer in the CRISPR array of the host genome from a protospacer within the target molecule. However, mutations in the PAM (or seed) sequence may protect viruses from recognition and degradation by the effector complexes [30,69-72]. Therefore, the CRISPR-Cas system should update its "memory" to avoid infection by the "escaper" phages. To achieve this goal, some types of CRISPR-Cas systems rely on primed adaptation, a highly efficient process of acquiring new spacers from already "known", previously encountered phages whose fragments have been stored in the CRISPR array as immunological memory. Primed adaptation was demonstrated in the I-E [48], I-F [49,73], I-B [74,75], I-C [76], I-U [77], and II-A [78] CRISPR-Cas systems. Primed adaptation leads to highly efficient and targeted accumulation of new spacers located in cis to the "priming" protospacer recognized by the effector [79]. The observed efficiency of primed adaptation (measured as the number of extended CRISPR arrays in the population) is very low if the target protospacer fully matches the crRNA and contains a consensus interference-proficient PAM (AAG or ATG in the case of the E. coli I-E system) [48,80]. The efficiency of primed adaptation is enhanced by PAM or protospacer mutations that decrease interference efficiency [30,70,73]. Yet, primed adaptation requires the activity of the Cas3 protein, suggesting a functional link between CRISPR interference and primed adaptation [71,81]. A recent in vitro study suggests that Cascade, Cas1-Cas2, and Cas3 form a single priming complex whose activity leads to effective selection of new spacers [82]. Two alternative models have been suggested to explain such a link.
One model postulates that effectors bound to protospacers with certain PAMs assume a specific conformation that recruits the adaptation machinery (Cas1 and Cas2) as well as the Cas3 protein, followed by directional scanning of the target and selection of new spacers [83]. In contrast, complexes formed on targets with interference-proficient PAMs do not support Cas1-Cas2 recruitment, leading to interference only [84]. The second model postulates that the apparent difference in the efficiency of primed adaptation with different targets is a consequence of the dynamics of degradation of the less-than-optimal targets [81,85]. Since most MGEs are able to replicate and have copy number maintenance mechanisms of their own, competition between attenuated CRISPR interference and copy number maintenance mechanisms could create a situation in which degradation fragments of the MGE genomes are present in the cell for an extended time, allowing the presumably slower adaptation reaction to occur [86].
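The qualitative relationship described above (a fully matching protospacer with a consensus PAM favors interference, whereas PAM or protospacer mutations weaken interference and favor primed adaptation) can be expressed as a toy decision rule. The sketch below is purely illustrative: the function names, the `max_mismatches` cutoff, and the three outcome labels are hypothetical and are not taken from the cited studies; only the AAG/ATG consensus PAMs of the E. coli I-E system come from the text. For simplicity, the spacer is compared with the protospacer strand that has the same sequence rather than with its complement.

```python
# Interference-proficient PAMs of the E. coli I-E system (from the text).
INTERFERENCE_PAMS = frozenset({"AAG", "ATG"})


def count_mismatches(spacer: str, protospacer: str) -> int:
    """Number of positions at which spacer and protospacer differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have equal length")
    return sum(a != b for a, b in zip(spacer, protospacer))


def predict_outcome(spacer: str, protospacer: str, pam: str,
                    max_mismatches: int = 5) -> str:
    """Toy rule for the dominant outcome of target recognition.

    max_mismatches is a hypothetical cutoff beyond which the target is
    treated as unrecognized; it is not an experimentally derived value.
    """
    mismatches = count_mismatches(spacer, protospacer)
    if mismatches > max_mismatches:
        return "no recognition"
    if mismatches == 0 and pam in INTERFERENCE_PAMS:
        return "interference"
    # Mutated PAM and/or a few protospacer mismatches: interference is
    # attenuated, which in the text correlates with efficient priming.
    return "primed adaptation favoured"


if __name__ == "__main__":
    spacer = "ACGTACGTACGT"  # hypothetical 12-nt spacer
    print(predict_outcome(spacer, "ACGTACGTACGT", "AAG"))
    print(predict_outcome(spacer, "ACGTACGTACGT", "CCG"))
    print(predict_outcome(spacer, "ACGAACGTACGT", "AAG"))
```

The rule does not distinguish between the two models discussed above; it only encodes the observed correlation between attenuated interference and efficient primed adaptation.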

ARGONAUTE-MEDIATED INTERFERENCE
Argonaute proteins play a key role in the regulation of gene expression and antiviral defense through RNA interference in eukaryotes. Members of this protein family are also widely encountered in Bacteria and Archaea [87,88]. The functions of prokaryotic Argonautes (pAgos) are not yet fully understood, but these proteins are involved in the silencing of exogenous genetic material [89,90]. pAgos use guide molecules to recognize nucleic acid targets, but in contrast to CRISPR-Cas and eukaryotic Agos, they often exploit short single-stranded DNA guides [91], although RNA guides have also been described [92]. The 5′ end of the guide is loaded into the MID domain of pAgo, while the 3′ end interacts with the PAZ domain [93]. In vitro experiments with pAgos from different organisms demonstrated that recognition of the complementary target leads to its nucleolytic cleavage by the catalytic PIWI domain. pAgos mostly target DNA molecules [94,95], while some may also target RNA in vitro [96-98]. However, it is not clear whether RNA targeting is important for the in vivo activities of pAgos. Nevertheless, all possible combinations of pAgo-mediated DNA/RNA guide-target interactions potentially exist [89,93,99]. In vivo, the presence of pAgo affects plasmid maintenance and inhibits transformation [92,100]. While it is generally accepted that pAgos are also involved in antiviral defense, the only experimental evidence was recently obtained with the pAgo from Clostridium butyricum, whose heterologous expression in E. coli decreased the titers of the chronic phage M13 and the lytic phage P1vir. However, the mechanisms remained undefined [90]. Based on their domain architecture, pAgos are divided into classes, and some proteins, surprisingly, contain a catalytically inactive PIWI domain [89,101]. The functions of such variants, if any, remain to be determined.
One of the major questions associated with pAgo interference is the mechanism of generation and the source of guide molecules, as well as how self-targeting is avoided. Sequencing of the DNA guides bound to pAgo in vivo has shown that they are preferentially derived from actively replicating or multicopy elements, including plasmids and transposons [90,100]. Guide-independent nuclease activity, termed DNA chopping, was shown for different pAgo proteins [102,103]. Sequence-independent plasmid chopping may generate a pool of DNA fragments of different sizes, some of which may be further loaded as guides to activate more efficient sequence-dependent degradation of the complementary targets [103]. To generate guides, pAgos could target free DNA ends or replication intermediates, which are more often present in exogenous DNA, and, akin to CRISPR-Cas, cooperate with RecBCD [90]. Like the pAgo from C. butyricum, the pAgo from Thermus thermophilus was shown to acquire guides from replication termination regions, and it is thought to participate in host replication control together with DNA gyrase by resolving catenated chromosomes [104]. It has been suggested that the DNA compaction characteristic of archaeal genomes can contribute to self/non-self discrimination by pAgos [102]. The RNA guides found associated with the pAgo from Rhodobacter sphaeroides are presumed to be promiscuously incorporated from degraded transcripts. Yet, the protein retains its specificity for foreign DNA [92]. A current model of the pAgo mechanism of action is presented in Fig. 2.
The described mode of interference may not be very efficient against fast-acting lytic phages, and pAgo defense may be specialized for controlling less harmful mobile elements or may act together with other defense systems to enhance protection against viral infection. The fact that pAgo genes are often found coupled with nuclease or cas genes within defense islands supports the latter hypothesis [88,101,105].

INDUCED CELL DORMANCY OR SUICIDE: ABORTIVE INFECTION (Abi) AND TOXIN-ANTITOXIN (TA) SYSTEMS
In this section we consider abortive infection (Abi) in a broad sense, as cellular responses to infection that lead to cessation of host metabolism (bacteriostatic effect) or cell death (bactericidal effect) prior to completion of the viral life cycle, thus preventing production of active phage particles or decreasing phage burst size [106,107]. Abi systems are mechanistically very diverse. Generally, they are composed of two modules (Fig. 3): one module senses phage infection and transfers the signal, and the other, an effector module, shuts down host metabolism and/or causes cell suicide after receiving the signal [107,108]. It is generally accepted that induced dormancy provides more time for other defense mechanisms to deal with infection. It is also believed that some Abi systems may be the "last resort" of defense, i.e., they activate a suicidal response at the later stages of viral infection when other immune mechanisms fail. The strategy of self-elimination by the infected cell stops the spread of infection at the community level and thus benefits a clonal population [109,110]. Some systems whose action phenotypically resembles the Abi response may directly target phages and not per se cause active cell death. However, their action may be accompanied by cell lysis caused by residual toxic viral components.
Abi systems. The diversity of plasmid-encoded systems with an Abi mechanism has historically been investigated in Gram-positive lactococci [106,111]. Among the 23 described systems, designated AbiA to AbiZ, the mode of action has been determined for only a few. For example, the AbiZ protein cooperates with the phage ϕ31 holin and lysin, causing premature cell lysis [112]; AbiK exhibits untemplated DNA polymerization activity [113]; AbiA, AbiK, and AbiF are thought to inhibit replication [106,114]; AbiB and AbiQ activities are associated with mRNA decay [115,116]; AbiD1 might interfere with viral DNA packaging through inhibition of a phage resolving nuclease [117]; and AbiT and AbiV are thought to target expression of the late phage proteins [118,119]. The way phage infection is sensed is not clear for most of these systems. An Abi system with a sensing module that relies on protein phosphorylation was found in staphylococci [120]. Phosphorylation, an efficient way to amplify a signal, is often exploited by eukaryotic antiviral systems. The staphylococcal serine/threonine kinase Stk2 is activated by the phage ϕNM1 protein PacK and phosphorylates multiple target host proteins, thus inhibiting core metabolism [120].
A plethora of Abi mechanisms have also been described in Gram-negative E. coli [121]. The Lit and PrrC proteins encoded by cryptic prophages are specific against T4 phage infection. The Lit protease is activated upon interaction with the conserved Gol peptide of the T4 capsid protein and arrests translation through specific proteolytic cleavage of the translation elongation factor EF-Tu [122]. The PrrC RNase also inhibits translation by cleaving tRNA(Lys). PrrC interacts with the Type I Restriction-Modification (R-M) system EcoprrI restriction complex and is activated only upon inhibition of the restriction complex by the T4-encoded peptide Stp [123]. Another interesting example is the F plasmid-encoded protein PifA, which provides protection against phage T7 [124]. This membrane-associated protein is activated by T7 gp10 or gp1.2 and causes leakage of ATP and other small molecules from the infected cell by disrupting membrane integrity [124,125]. The λ prophage-encoded system RexAB also acts by increasing membrane permeability [126,127]. RexA is thought to recognize DNA-protein intermediates of viral replication complexes and stimulate the membrane-associated RexB to form an ion channel, leading to loss of membrane potential and inhibition of energy-dependent processes [127].
Toxin-Antitoxin (TA)-based defense. Toxin-antitoxin systems are selfish elements comprising a stable toxin subunit and an unstable antitoxin. Under stress conditions, increased antitoxin degradation releases the activity of the toxin, leading to growth arrest [128,129]. TA modules are involved in the stress response, biofilm formation, and persistence (although the latter has been controversial [129]) but may also be implicated in Abi antiviral defense, since phage infection often interferes with host metabolism in a way that may cause loss of the antitoxin (Fig. 4). TA modules are frequently found within defense islands, and there is extensive domain exchange between TA and Abi systems [130]. In fact, no clear boundary can be drawn between Abi and TA systems, since Abi is a defense strategy, while TA is an organizational/mechanistic principle. Rather, certain Abi systems can be regarded as based on the TA mechanism; e.g., some Abi systems discussed in the previous section can be considered solitary toxins, while PrrC/EcoprrI can be considered a bona fide toxin-antitoxin pair. Depending on the nature of the toxin-antitoxin interaction, TA systems are divided into six types. For example, the antitoxin may be an RNA molecule that directly inhibits the toxin protein (Type III) or regulates the level of toxin mRNA translation (Type I); in other types, the antitoxin is a protein that inhibits the toxin through direct protein-protein interaction (Type II) or by counteracting the effect of the toxin on its targets (Type IV) [129,131].
Examples of the Abi response based on TA modules include the ToxIN and RnlAB systems. ToxIN, originally identified as AbiQ in lactococci, is frequently found in bacterial genomes and functions as a Type III TA system, where the activity of the RNase toxin ToxN is blocked by interaction with the ToxI RNA antitoxin [116,132,133]. The RnlAB system from E. coli represents a Type II TA module and protects against phage T4 infection [134]. The RnlA toxin is a stable RNase; the RnlB antitoxin is quickly degraded by host proteases. Thus, if phage infection interferes with continuous gene expression, the block of RnlB synthesis releases the toxic activity of RnlA, leading to cellular mRNA decay [134]. A plasmid-encoded RnlAB homolog in E. coli is called LsoAB [135]. The T4 phage-encoded protein Dmd functions as an antitoxin for both of these systems [134,135]. Many TA systems have a reversible effect and do not induce cell death. Still, a temporary cessation of growth can provide phage resistance. AbiE, a Type IV TA system, is an example: the AbiEii toxin transcribed from the abiE promoter does not directly interact with its antitoxin AbiEi. Instead, AbiEi binds to the promoter region and inhibits transcription of the entire TA operon [136]. The AbiEii toxin belongs to the DNA polymerase β-like superfamily and exhibits nucleotidyltransferase activity [136]. Recently it was demonstrated that the AbiEii homologue MenT3 from Mycobacterium tuberculosis can transfer pyrimidines to the acceptor stem of specific tRNAs [137]. In agreement with this, overexpression of the AbiE toxin in Serratia causes growth cessation and a decrease in tRNA levels [138]. In E. coli, the Type II system MazEF interferes with phage P1 propagation [139], and the Type I TA system hok/sok reduces the burst size of T4 phage [140].
The latter system is based on the holin-like activity of the Hok toxin, while the antitoxin sok is an antisense RNA that inhibits Hok synthesis by binding to its mRNA [141]. The role of TA systems in phage defense remains controversial and poorly characterized [142,143]. Nevertheless, based on the abundance of TA systems and their involvement in the Abi response in model bacteria, it can be expected that TA-based phage defenses are widespread [130,142,144].
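The logic of a Type II module such as RnlAB (a stable toxin held in check by an unstable antitoxin that must be continuously resynthesized) can be illustrated with a minimal kinetic sketch. It assumes that antitoxin synthesis stops completely at infection, that the antitoxin then decays exponentially with a fixed half-life, that the toxin is perfectly stable, and that one antitoxin molecule neutralizes one toxin molecule. All function names and numerical values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def antitoxin_level(a0: float, half_life: float, t: float) -> float:
    """Antitoxin remaining at time t after synthesis stops (exponential decay)."""
    return a0 * 0.5 ** (t / half_life)


def free_toxin(toxin: float, a0: float, half_life: float, t: float) -> float:
    """Toxin not neutralized by antitoxin, assuming 1:1 neutralization."""
    return max(0.0, toxin - antitoxin_level(a0, half_life, t))


def release_time(toxin: float, a0: float, half_life: float) -> float:
    """Time at which the antitoxin level drops to the toxin level.

    Solves a0 * 0.5**(t / half_life) = toxin for t.
    """
    if a0 <= toxin:
        return 0.0  # toxin already in excess
    return half_life * math.log2(a0 / toxin)


if __name__ == "__main__":
    # Hypothetical numbers: 100 toxin and 400 antitoxin molecules per cell,
    # antitoxin half-life of 2 min.
    toxin, a0, half_life = 100.0, 400.0, 2.0
    print("toxin released after", release_time(toxin, a0, half_life), "min")
    for t in (0.0, 2.0, 4.0, 6.0):
        print(t, "min:", free_toxin(toxin, a0, half_life, t), "free toxin")
```

In this sketch the delay before toxin release scales with the antitoxin half-life and with the logarithm of the antitoxin excess, so a short-lived antitoxin allows the cell to respond quickly once phage infection shuts off host gene expression.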
Retrons as immunity systems. Retrons are genetic elements that encode a reverse transcriptase (RTase) and a non-coding RNA (ncRNA) that is used by the RTase to generate covalently linked RNA/DNA hybrids [145]. The functional role of retrons remained unknown until recent works demonstrated that they are part of tripartite TA systems that may be involved in phage defense through Abi [146-149]. The RTase complex with the RNA/DNA hybrid is inactive under normal conditions, while phage infection causes its activation and signal transduction to the cognate toxin effectors (Fig. 5). Anti-phage activity was demonstrated for multiple retrons, and mutations that affect the ncRNA secondary structure, the branching site, or the RTase catalytic motif eliminated the defense [146,149]. About 2000 retron systems were found within defense islands, with RTases that could be fused to or accompanied by ATPases, ribosyltransferases, and endonucleases as effector proteins [146,149]. The Ec48 retron from E. coli "guards" the RecBCD enzyme, which is one of the main barriers to foreign DNA uptake; RecB inhibition by phage-encoded proteins (e.g., Gam from phage λ or gp5.9 of T7) activates the retron and releases the activity of the cognate membrane-anchored effector that causes premature cell lysis [146]. It was shown for the Sen2 retron from S. enterica that perturbations of the DNA part of the complex activate the RcaT toxin [147,148]. DNA degradation or methylation associated with the phage-encoded RecE endonuclease or Dam methyltransferase activates the response, while some prophage-encoded proteins can serve as blockers of retron activation [147].
Cyclic Oligonucleotide-Based Anti-Phage Signaling Systems (CBASS). A widely encountered family of systems that rely on the synthesis of cyclic oligonucleotides to activate the Abi response has been recently described [107,150,151]. In cells carrying CBASS, phage infection triggers synthesis of a second messenger (cyclic GMP-AMP, cyclic triadenylate, etc.) by cGAS/DncV-like nucleotidyltransferases (CD-NTases), which activates various effectors that induce programmed cell death (Fig. 6) [150-153]. These systems provide another link between the immune defenses of eukaryotes and prokaryotes, since cyclic GMP-AMP synthase (cGAS) in animals is involved in the antiviral and inflammatory response through the cGAS-STING pathway, which is activated by sensing of cytosolic DNA [154]. Oligonucleotide cyclase genes are found in about 10% of prokaryotic genomes, and more than half reside within defense islands. CBASS systems can be classified based on the composition of the operon, the type of effector, or the signalling molecule produced [153]. Type I systems contain only a CD-NTase (oligonucleotide cyclase) and an effector, while other systems also carry auxiliary components: genes with ubiquitin-associated domains in Type II, HORMA and Trip13-like domain genes in Type III, or nucleotide modification domains in the rarely encountered Type IV CBASS [153]. The Type II CBASS from V. cholerae was the first to be studied experimentally [151,155,156]. The core of the system is composed of two components: the cGAS-like enzyme DncV and the cyclic GMP-AMP (cGAMP)-sensing phospholipase CapV. These two components are sufficient to provide defense against the P1 phage, yet protection against other phages required two additional proteins carrying E1, E2, and JAB domains, typical of ubiquitinating enzymes [151].
The sensing mechanism is not yet determined, but the cells were shown to produce cGAMP upon phage infection, which triggered CapV phospholipase mediated disrup tion of the cell membrane before completion of the viral life cycle. In addition to phospholipase, the known effec tors of CBASS systems include endonucleases or trans membrane domain carrying proteins [153,157]. An inter esting group of CBASS effectors contain domain homol ogous to the eukaryotic STING (Stimulator of Interferon Genes). In Bacteria, sensing of cyclic oligonucleotides by STING activates the coupled Toll/interleukin 1 receptor (TIR) domain of the effector, which leads to NAD + degradation [158]. Comparative analysis of the metazoan and bacterial STING domain structures reveals its transi tion from the direct effector role in CBASS to the regula tory functions in the immune response of higher ani mals [158].
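The operon-composition classification described above can be written as a small decision rule. The sketch below is only an illustration of that rule and not a published annotation tool: the function name and the domain labels (e.g., "nucleotide_modification") are hypothetical, and real operons would first need domain annotation by a dedicated pipeline.

```python
from typing import Set


def classify_cbass(operon_domains: Set[str]) -> str:
    """Assign a CBASS operon to Type I-IV from its annotated domains.

    Domain labels are illustrative assumptions, not a standard vocabulary.
    """
    # Every CBASS operon is built around an oligonucleotide cyclase.
    if "CD-NTase" not in operon_domains:
        return "not CBASS"
    # Type III: HORMA and/or Trip13-like auxiliary genes.
    if operon_domains & {"HORMA", "Trip13"}:
        return "Type III"
    # Type II: ubiquitin-associated auxiliary domains (E1, E2, JAB).
    if operon_domains & {"E1", "E2", "JAB"}:
        return "Type II"
    # Type IV: nucleotide modification domains (rarely encountered).
    if "nucleotide_modification" in operon_domains:
        return "Type IV"
    # Type I: only the cyclase and an effector.
    return "Type I"


if __name__ == "__main__":
    # DncV/CapV core plus E1, E2, and JAB domain proteins, as in V. cholerae.
    print(classify_cbass({"CD-NTase", "phospholipase", "E1", "E2", "JAB"}))
```

The order of the checks is a design choice of this sketch: an operon carrying both HORMA and ubiquitin-associated domains would be reported as Type III, which may not match how such mixed cases are treated in [153].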
Many bacterial CD-NTases were shown to be inactive in vitro [152], and recent work demonstrated the importance of auxiliary proteins for their in vivo activity [159]. The HORMA domain proteins in eukaryotes bind to specific closure motifs in their target proteins to assemble into signaling complexes [160]. In the Type III CBASS of E. coli and P. aeruginosa, HORMA proteins activate the CD-NTase, leading to production of the cyclic triadenylate (cAAA) second messenger, which in turn activates the promiscuous endonuclease activity of the NucC effector, providing defense against phage infection via the Abi mechanism [159,161]. Without infection, activity of the system is suppressed by the Trip13-like ATPase, which is thought to disassemble the CD-NTase complex with HORMA. Recognition of specific motifs in phage proteins is believed to change the HORMA conformation and activate the CD-NTase activity of the complex. Intriguingly, the NucC effector can be found as an accessory endonuclease in the Type III CRISPR-Cas systems, which also rely on cyclic oligoadenylate signaling [21,22,161]. Another CBASS effector that can respond to different types of cyclic oligonucleotides is the Cap4 protein from Enterobacter cloacae [162], a founding member of an entire protein family. Cap4 proteins recognize second messengers through divergent SAVED domains composed of two CRISPR-related CARF subunits, which induces oligomerization and releases the DUF4297 endonuclease activity [162].
A wide variety of bacterial CD-NTases and coupled effectors has been described in recent years, yet many questions remain unanswered. How is phage infection sensed by these systems? What are the functions of the auxiliary proteins? Are all these systems involved only in phage defense, or can some of them perform other functions? What are the costs of the CBASS genes, and are there additional mechanisms to restrain their possible self-toxicity, as in the case of the negative regulation by the Trip13-like ATPase?

PROPHAGE-MEDIATED DEFENSE
Temperate phages can integrate their genomes into the host chromosome to form prophages [163]. Most known bacteria carry prophages, and the interaction of a prophage with its host can be considered mutualistic: since prophage survival depends on the host, it is beneficial for the prophage to exclude secondary infection of the lysogenized cell [164]. Indeed, prophages often carry genes associated with antiviral defense (Fig. 7a) [165-168]. The simplest way to inhibit infection with homoimmune phages is expression of a repressor protein, a transcription factor that regulates the switch between the lytic and lysogenic life strategies of the phage. Since the repressor protein is constantly present in the lysogenic cell to inhibit expression of the lytic genes of the prophage, secondary lytic infection with a phage regulated by the same or a similar repressor will be inhibited [169]. Although the repressor-associated defense has a narrow specificity range, systematic studies of prophages in P. aeruginosa and M. smegmatis uncovered abundant defense genes that provide heterotypic protection [165,166].
Genes that are dispensable for prophage survival but can increase fitness of the host were named "morons" (from adding "more on" to the phenotype). Morons can affect multiple host processes, including motility, antibiotic resistance, metabolism, and phage defense [170,171]. Some prophage-encoded TA systems and the Sie mechanism, which can be considered morons, were already discussed. Morons can also affect the cell surface to prevent receptor recognition. For example, P. aeruginosa phages D3 and ϕ297 can alter the conformation of the O-antigen subunits in lipopolysaccharide (LPS) by encoding their own O-antigen polymerase [172-174], while some Shigella and E. coli prophages may block growth of the O-antigen chain by glucosylation or acetylation [175,176] [...] membrane-associated effectors [177,178]. Other mycobacteriophages were shown to encode restriction endonucleases, Sie membrane proteins, or guanosine pentaphosphate [(p)ppGpp] synthases capable of defending the host [166]. Initially found as the gp29 and gp30 proteins of the phage Phrann, the system encoding a (p)ppGpp synthase and its cognate inhibitor is part of a novel family of TA modules that rely on alarmone signaling and are found in multiple prophages [166,179]. The toxic component is an enzyme (SAS, Small Alarmone Synthetase) that synthesizes ppGpp or ppApp, signaling molecules typical of stress responses associated with starvation (stringent response) that cause growth cessation [180]. The antitoxins directly bind to the synthase or degrade the signaling alarmone [179]. Another Abi system widespread in the prophages of Gram-negative bacteria consists of the BstA effector, which co-localizes with the replicating DNA of target phages and interferes with the replication process by an unknown mechanism [181]. 
This system provides an interesting example of self-immunity avoidance in prophages: BstA is inactivated by binding to a specific anti-BstA locus (aba) within the prophage genome and thus does not prevent the phage's own lytic infection.
"Morons" can be very different even among the closely related strains and a recent study of P2 and P4 like prophages uncovered an unprecedented variety of the compact defense systems in the specific hotspot loci of their genomes [182]. In addition to the known antiviral systems, multiple gene clusters that carry predicted defense domains (~TIR, SIR2, Nuclease, ATPase) or domains of unknown function were reported in these hotspots. Defense activity has been validated for 14 novel systems. For example, PARIS system that consists of ATPase and DUF4435 protein is suggested to provide abortive infection response triggered by the phage T7 anti restriction protein Ocr that inhibits host R M and BREX systems [182 184]. Mining of the diversity hotspots in other prophages might represent a valuable and simple tool for detection of novel phage defense sys tems.

PHAGE'S PARASITES - MOLECULAR PIRACY OF THE PICIs AND PLEs
MGEs can be randomly transferred by phages during generalized transduction. However, some MGEs evolved to hijack the phage DNA packaging machinery and capsids to load their own genomes, a phenomenon sometimes called molecular piracy [185-188]. Phage-Inducible Chromosomal Islands (PICIs) are widespread in the genomes of Gram-positive and Gram-negative bacteria. Their induction from the chromosome depends on infection with helper phages, and from the host perspective they can be considered a version of the Abi response [189,190]. The infected cell is eventually lysed, but the burst size of the superinfecting lytic phage is reduced, and the majority of the released particles carry the PICI genome instead of the phage genome [191]. Since the PICI elements do not undergo a lytic cycle on their own and limit the spread of helper phages, their presence can be beneficial for the bacterial population.
PICI. The most studied group of PICIs was discovered in Staphylococci and is named SaPI (S. aureus pathogenicity islands). These chromosomal islands are less than 15 kbp in size; they encode an integrase, an excision protein, and replication machinery. Expression of these genes is strictly controlled by the master repressor Stl. SaPIs often carry accessory genes, including toxins and virulence factors, and their presence could contribute to the host pathogenicity [185,192]. The life cycle of a SaPI is coupled to infection by a helper phage, and sensing of specific helper phage proteins relieves the Stl-mediated repression [193,194]. Following activation, SaPIs could interfere with helper phage reproduction via several different mechanisms (Fig. 7b). They facilitate assembly of small capsids capable of accommodating the SaPI genome but excluding the larger genome of the helper phage [195]. SaPIs can also interfere with viral genome packaging by inhibiting the small terminase subunit TerS, while loading of the SaPI's own genome is supported by the SaPI-encoded terminase [195]. Finally, SaPI-encoded proteins can bind viral transcription factors, impairing expression of the late genes [196]. While these mechanisms have been described for the staphylococcal SaPIs, PICIs are widely distributed in bacteria [190,197] and may rely on similar mechanisms for their parasitic lifestyles. For example, PmCI172 from Pasteurella multocida directs production of small-size capsids when the cell is infected by the Mu-like helper phage, while EcCICFT073 from E. coli encodes Rpp, a protein that reprograms phage λ TerS to package the genome of the PICI [190,198].
PLE. Another type of satellite MGE, specific to V. cholerae and interfering with propagation of the phage ICP1, is called PLE (Phage-inducible chromosomal island-Like Element) [199]. Similar to PICIs, PLE senses proteins of the helper phage ICP1 to trigger excision of its genome and exploits the molecular machinery and structural components of ICP1. Yet the genome organization of PLE is different, and in contrast to PICIs, which only lower the phage burst size, no infectious ICP1 particles are produced when PLE is induced (Fig. 7c) [199-202]. ICP1 replication was shown to be severely inhibited in PLE-carrying cells, while some PLEs can also modulate viral gene expression [202,203]. An additional mechanism that may contribute to ICP1 suppression is production of the PLE-encoded LidI, a protein that disrupts the lysis inhibition system of ICP1 and accelerates cell lysis [204]. It must be noted that ICP1, in turn, encodes a CRISPR-Cas system to target PLE [205].

SYSTEMS WITH PLAUSIBLE NOVEL MECHANISMS
Mining of the conserved gene clusters found within the defense islands of available prokaryotic genomes allowed prediction of multiple novel types of defense systems [149,206-208]. Systematic verification of these predictions has recently been carried out. The pipeline included cloning of 28 candidate systems from several source organisms into the Gram-negative E. coli or Gram-positive B. subtilis surrogate hosts, followed by screening with a set of diverse phages. This work validated 10 novel systems that were named after mythological protective deities [207]. Druantia, Kiwa, and Zorya were validated in E. coli, while activities of Gabija, Hachiman, Lamassu, Thoeris, Septu, Shedu, and Wadjet were demonstrated in B. subtilis. Multiple protein domains distinct from those known for the already studied defense systems were shown to be involved in antiviral protection, which suggests novel mechanisms of action for the discovered systems.
The Zorya system is active against both ssDNA and dsDNA phages. This system encodes the ZorA and ZorB proteins, which are homologous to the proton channel-forming unit MotAB of the bacterial flagellar motor [209]. In the Type I Zorya systems, ZorAB could be accompanied by the predicted small nuclease ZorE, while in the Type II systems it is accompanied by the helicase/ATPase and Pfam00691-containing proteins ZorCD. Phage infection of cells carrying Zorya initiates premature cell lysis, likely due to an Abi response mediated by the ZorAB effector causing membrane depolarization. This view is further supported by the fact that mutations of amino acid residues predicted to be involved in proton transport reduce the degree of defense provided by the Zorya system [207].
Another studied system, Thoeris, is composed of the ThsA protein with SIR2 and SLOG domains and the Toll/interleukin-1 receptor (TIR) domain protein ThsB [207,210]. The structures of both proteins have been recently resolved [211]. Mutations in the NAD-binding pocket of the ThsA SIR2 domain result in the loss of in vitro activity of the protein and of phage resistance in vivo, linking the Thoeris-mediated defense to NAD+ hydrolysis [207,211]. The TIR domain could serve as a signal transducer in the eukaryotic immune pathways [212] and was also found in other prokaryotic systems (e.g., CBASS). In the Thoeris system, phage infection could trigger the TIR domain of ThsB to synthesize an isomer of cyclic ADP-ribose that transfers the signal to ThsA, causing an abortive infection response associated with NAD+ depletion [210]. Multiple copies of the thsB gene could be associated with a single thsA, and it was shown that diversification of TIR domains provides defense against a broader range of infecting phages [210].
The Wadjet system was shown to provide defense not against phages but against plasmid transformation [207]. Wadjet is composed of four components: JetABC, homologous to the SMC (structural maintenance of chromosomes) proteins involved in plasmid and host genome segregation, and the JetD protein with a putative topoisomerase VI domain [207,213]. Suppression of plasmid maintenance associated with non-canonical SMC proteins has been reported in M. smegmatis, where modulation of the plasmid supercoiling status impaired its segregation to the daughter cells [214]. It could be suggested that the Wadjet system functions in a similar manner to exclude foreign extrachromosomal genetic elements.
Another recent study employed a similar strategy for experimental validation of candidate systems but relied on a different prediction algorithm. Instead of assessing enrichment of protein domains, all sequences in the 10-gene neighborhood of the known defense systems were analyzed for the probability of being localized within defense islands. More than 7000 candidate defense genes were predicted, many of which harbored unannotated domains or domains of unknown function. Following manual curation, 48 predicted systems were selected for further work. Heterologous expression in E. coli validated antiviral activity for 29 novel systems [149].
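The neighborhood-based prediction described above can be illustrated with a toy score. The sketch below is a simplified assumption and not the algorithm used in [149]: for each gene family, it reports the fraction of its occurrences that lie within a fixed number of genes from a known defense gene. The function name, the input layout (each genome as an ordered list of gene family identifiers), and the use of a plain fraction as the score are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def defense_neighborhood_scores(
    genomes: Dict[str, List[str]],
    known_defense: Set[str],
    window: int = 10,
) -> Dict[str, float]:
    """Fraction of each family's occurrences found near known defense genes.

    genomes maps a genome identifier to its gene families in chromosomal
    order; `window` is the maximal distance (in genes) counted as "near".
    """
    near = defaultdict(int)   # occurrences within the window
    total = defaultdict(int)  # all occurrences of the family
    for genes in genomes.values():
        defense_positions = [
            i for i, family in enumerate(genes) if family in known_defense
        ]
        for i, family in enumerate(genes):
            if family in known_defense:
                continue  # only score candidate (non-reference) families
            total[family] += 1
            if any(abs(i - p) <= window for p in defense_positions):
                near[family] += 1
    return {family: near[family] / total[family] for family in total}


if __name__ == "__main__":
    toy = {"g1": ["a", "RM", "x", "b"], "g2": ["x", "c", "d", "e"]}
    print(defense_neighborhood_scores(toy, {"RM"}, window=1))
```

Families with a high score across many genomes would be candidates for manual curation; a real analysis would also need to cluster proteins into families and correct for genome redundancy, which this sketch does not attempt.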
The association of retrons with phage defense was established independently in this study. Moreover, reverse transcriptases (RTases) not associated with retron functions were also shown to be involved in protection. Six groups of such RTases were combined under the term DRT (defense-associated RTases). Some DRTs were shown to be associated with auxiliary proteins, whereas the Type I DRTs were fused to a nitrilase domain, which is often involved in small-molecule metabolism [215]. Mutations of the predicted catalytic residues in the RTase or nitrilase domain inactivated the system. Transcriptome analysis showed that DRTs of different types do not interfere with expression of the early viral genes, while the Type I system affected accumulation of the late viral transcripts. The mechanism of DRT protection remains undetermined.
The RADAR (restriction by adenosine deaminase acting on RNA) system was studied in more detail. The core of the system is composed of the ATPase RdrA and the adenosine deaminase RdrB, which could be accompanied by ancillary proteins (SLATT, or Csx27, which is also involved in the Type VI CRISPR defense). Transcriptome analysis of the phage-infected cells identified A-to-G substitutions in the sequenced reads, consistent with the predicted RdrB-related formation of inosine in the host and viral RNAs. The RNA stem-loop secondary structures were shown to be preferential editing targets.
Infection of the RADAR+ culture with phages at a high multiplicity of infection (MOI) results in growth arrest, suggesting a defense response through the Abi mechanism. This behavior was attributed to editing of the host transfer-messenger RNA (tmRNA), which rescues stalled ribosomes. It was further demonstrated that expression of certain DNA-binding proteins of phage T2 triggers RNA editing in the uninfected culture, which suggested a mechanism for activation of the RADAR response [149].
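Inosine is read as guanosine during reverse transcription and sequencing, which is why editing appears as A-to-G substitutions in the reads. A minimal sketch of how such substitutions could be counted is given below. It assumes reads that are already aligned to the reference strand without gaps and are given as (start, sequence) pairs; the function name and input layout are hypothetical, and a real analysis would work on alignment files and filter out sequencing errors and genomic variants.

```python
from typing import Iterable, Tuple


def count_a_to_g(
    reference: str, reads: Iterable[Tuple[int, str]]
) -> Tuple[int, int]:
    """Return (A-to-G mismatches, reference A positions covered by reads).

    Each read is a (start, sequence) pair aligned to `reference` without
    insertions or deletions (a simplifying assumption of this sketch).
    """
    edited = 0
    covered = 0
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if position >= len(reference):
                break  # ignore bases hanging over the reference end
            if reference[position] == "A":
                covered += 1
                if base == "G":
                    edited += 1
    return edited, covered


if __name__ == "__main__":
    n_edited, n_covered = count_a_to_g("ACGATA", [(0, "GCGA"), (3, "GTA")])
    print(f"A-to-G at {n_edited} of {n_covered} covered A positions")
```

Comparing this ratio between RADAR+ and control cultures, or between stem-loop and unstructured regions, would be one way to express the editing preference mentioned above.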
NTPases from the STAND superfamily are involved in the signal transduction pathways of programmed cell death in eukaryotes [216], but the role of these proteins in prokaryotes had long been undetermined. In the same study, several STAND NTPases were shown to be active in protection against phage infection, and five types of such systems were combined under the name AVAST (AntiViral ATPases/NTPases from the STAND superfamily). The NTPase domains were found fused to different putative effector domains (such as a nuclease previously known as DUF4297, a protease, or SIR2) and were suggested to act through the Abi mechanism. Mutational analysis verified the importance of the ATPase and the putative effectors for the AVAST-mediated defense [149].
Functional domains of other novel systems include nucleases (e.g., DUF4297 from the Lamassu system was shown to be a nuclease involved in CBASS), helicases, SIR2, DNA-binding proteins, phosphatases, and ATPases, as well as multiple unannotated domains [149,207,208,217]. For example, the large Druantia system is composed primarily of genes with unknown functions. These discoveries significantly expanded our understanding of the diversity of biochemical activities that may be involved in antiviral protection and should pave the way for further experimental investigations of novel defense mechanisms.

CONCLUSIONS
Interaction of viruses with their hosts is a highly dynamic process that drives generation of multiple offense and defense strategies. Prokaryotic genes associated with phage resistance are amongst the most quickly evolving, and the high turnover rate of prokaryotic adaptations followed by viral counter-adaptations is often described in terms of the Red Queen hypothesis, i.e., both sides of this arms race need to constantly evolve just to keep the "status quo" [218,219]. Long-term evolutionary studies show that the mutation accumulation rate is higher when the phage co-evolves with the host, compared to the situation when the phage evolves while the host maintains its genotype [220,221] [...] types of PLE gradually replaced each other, likely by escaping interference from the ICP1-encoded CRISPR-Cas system [199,205].
Only in recent years have we begun to appreciate the real abundance of defense systems and the diversity of their mechanisms. This immediately raises the question of how the presence of multiple defense systems in a single bacterium affects its survival and phage resistance. Some defense systems were shown to be highly specific, while others can target multiple phages, like some Abi systems that sense general perturbations in the host metabolism, or DNA recognition systems that can adapt or evolve to interact with new sequences [3,222]. Defense systems target different stages of the viral life cycle, and, in general, three lines of defense can be distinguished: systems that affect the cell surface to prevent phage adsorption and genome entry; systems that degrade phage genetic material; and systems that induce cell dormancy or suicide if the first two lines of defense were circumvented. It can be suggested that the simultaneous presence of different defense systems in one cell increases the chances of survival and broadens the range of the targeted parasites. Some defense systems are even known to act in cooperation: for example, DNA degradation by the Type I R-M systems generates a pool of DNA fragments that can be used as spacer precursors for the CRISPR-Cas system [64,223,224], while the PrrC and retron Ec48 Abi systems exploit a "guarding" strategy and are activated only if the phage interferes with the function of other defense systems [146]. At the same time, the presence of defense systems imposes fitness costs on the host: in the absence of phage infection, expression of the defense genes is energetically costly, while their unrestrained activity is often associated with self-toxicity [110,219,225]. Thus, the balance between the benefits of antiviral protection and the accompanying fitness costs defines the number of active defense systems in the genome and the prevalence of specific defenses. 
Defense systems can be lost under conditions of low phage pressure, and their persistence in the population is facilitated by extensive horizontal gene transfer (HGT), downregulation of gene expression, and phase variation [226,227].
Recent discoveries have highlighted additional features of defense systems that were not so evident before. Functional modules can be exchanged between different defense systems, and similar protein domains can be employed in different types of defense systems [208]. For example, the Dnd system modification module could be associated with the DndFGH or PblABCDE effectors, the NucC nuclease could be associated with CBASS or Type III CRISPR-Cas, TIR domains are found in CBASS and Thoeris, etc. [161,210,228].
Defense systems are not unique to prokaryotic genomes, as different types of MGEs often adopt defense systems for inter-MGE conflicts or for host suppression [205,226]. For example, a recent metagenomic study of giant phages uncovered multiple CRISPR loci with spacers targeting other phages [229].
Proteins typical of eukaryotic immunity systems, such as cGAS, pAgo, STING, TIR, and STAND, were found to perform similar functions in prokaryotes [105,149,151,158,210]. Recent work of Burroughs and Aravind adds to these findings by tracing homologs of the eukaryotic Wnt, YEATS, TPR-S, and other domains in prokaryotic immunity systems [208]. The phylogenetic history of these domains indicates that their common root resides within prokaryotes, which suggests that some immunity mechanisms originated before the branching of eukaryotes and were inherited by the latter.
[...] Andrey Kulbachinskiy for critical comments on the pAgo section. Illustrations were prepared using BioRender.com under the paid subscription.
Funding. The work was financially supported by the Russian Foundation for Basic Research (project no. 19-14-50560).
Authors' contributions. OM&AI prepared the CRISPR-Cas section, AI composed the rest of the manuscript, AI&OM prepared illustrations, KS revised the text.
Ethics declarations. The authors declare no conflict of interest in financial or any other sphere. This article does not contain any studies with human participants or animals performed by any of the authors.
Open access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.