Microbial Arsenal of Antiviral Defenses – Part I

Bacteriophages, or phages, are viruses that infect bacterial cells (for the scope of this review we also consider viruses that infect Archaea). The constant threat of phage infection is a major force shaping the evolution of microbial genomes. To withstand infection, bacteria have evolved numerous strategies to avoid recognition by phages or to interfere directly with phage propagation inside the cell. Classical molecular biology and genetic engineering have been deeply intertwined with the study of phages and host defenses. Today, owing to the rise of phage therapy, the broad application of CRISPR-Cas technologies, and the development of bioinformatics approaches that facilitate discovery of new systems, phage biology is experiencing a revival. This review describes the variety of strategies employed by microbes to counter phage infection, with a focus on novel systems discovered in recent years. The first part covers defenses associated with the cell surface, the role of small molecules, and innate immunity systems relying on DNA modification.


INTRODUCTION
Phages are found in every natural environment and are considered to be the most abundant biological entities on Earth [1-3]. The estimated numbers of phage particles can be as high as ~10¹⁰/liter in sea or freshwater and 10⁹/g in soil. Overall, phages are thought to outnumber their microbial hosts by 10 to 1 or more [4,5]. Phages play an important ecological role by controlling the size and diversity of microbial populations [6], while phage-induced lysis of cells ensures the flow of nutrients through food networks [7]. Phages greatly enhance lateral gene transfer, and medically relevant bacterial traits are often associated with prophages [8]. Host genes can be incorporated into a viral genome or packaged into capsids to generate transducing particles [9]. In addition, bacteria can produce phage-like gene transfer agents carrying host DNA [10]. The spread of plasmids is also facilitated by phage-induced cell lysis [11].
BIOCHEMISTRY (Moscow) Vol. 86 No. 3 2021
Theoretical works and mathematical models assume that parasitic or selfish genetic elements inevitably emerge in any replicator system and that counteracting strategies are required to achieve stable co-existence [12-14]. Thus, the presence of phages and host defense systems might be considered a fundamental property of prokaryotic life. Billions of years of co-evolution have resulted in the development of a broad range of offense and defense strategies employed by viruses and their hosts. Genes associated with phage defense can constitute up to 10% of a microbial genome [15]. Traditionally, the discovery of phage defense systems relied on selection of phage-resistant strains and characterization of their specific traits. The recent increase in the availability of genomic data and the application of bioinformatics approaches have significantly expanded the field, allowing systematic prediction of novel phage defense gene clusters. A popular "guilt by association" approach is based on the fact that functionally linked genes are often co-localized [16]. Using a gene with a known function as a "bait", neighboring genes in multiple genomes are assessed for the probability of being found in the vicinity of the "bait" in order to predict a possible functional linkage [17]. Applied to the known phage resistance systems, this approach led to the important concept of defense islands: genomic loci containing clusters of antiviral defense genes [18]. An individual genome can contain multiple defense islands. They often include mobile genetic elements, which contribute to extensive horizontal gene transfer of defense systems [19,20]. Since ~2/3 of the genes in defense islands were not known to be associated with phage resistance, novel types of antiviral systems were predicted [18].
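The "guilt by association" idea described above can be illustrated with a minimal sketch. It assumes each genome is reduced to an ordered list of gene-family labels and scores every family by the fraction of its occurrences that fall within a fixed window of the bait gene. The function name, the window size, and the toy family labels (`famX`, `hk1`, and so on) are hypothetical; published pipelines use proper statistical enrichment tests rather than this raw fraction.

```python
from collections import Counter


def neighborhood_enrichment(genomes, bait, window=1):
    """Fraction of each family's occurrences found within `window` genes of the bait.

    genomes: list of gene-family label lists, each in chromosomal order.
    Returns a dict {family: fraction between 0.0 and 1.0}.
    """
    total = Counter()      # all occurrences of each family
    near_bait = Counter()  # occurrences lying close to a bait gene
    for genes in genomes:
        bait_positions = [i for i, g in enumerate(genes) if g == bait]
        for i, family in enumerate(genes):
            if family == bait:
                continue
            total[family] += 1
            if any(abs(i - b) <= window for b in bait_positions):
                near_bait[family] += 1
    return {family: near_bait[family] / total[family] for family in total}


# Toy data (hypothetical labels): "hk*" stand for housekeeping genes,
# "famX" for an uncharacterized family that always sits next to the bait.
genomes = [
    ["hk1", "hk2", "hk3", "famX", "MTase", "REase", "hk4", "hk5"],
    ["hk4", "famX", "MTase", "hk1", "hk2", "hk3", "hk5"],
    ["hk1", "hk2", "hk5", "hk3", "hk4", "MTase", "famX"],
]
scores = neighborhood_enrichment(genomes, bait="MTase", window=1)
for family, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{family}: {score:.2f}")
```

In this toy set `famX` is always adjacent to the bait, while the housekeeping families are adjacent only occasionally or never, so `famX` would be the family flagged as a candidate defense gene.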
Recently, a systematic analysis of all Pfam database protein domains encountered in the defense islands within the microbial pan-genome resulted in the prediction of nearly 300 gene families that were over-represented around the known defense genes. Among these, genes that tend to form conserved clusters were assumed to represent candidate defense systems, and some were experimentally investigated. This resulted in the validation of a dozen novel types of antiviral systems [21,22]. Even though our understanding of the abundance of defense system types has expanded significantly in recent years, we have probably unraveled only a small proportion of their real diversity, since the vast majority of genomic data is considered "dark matter" and defense islands consisting solely of genes of unknown function are not detectable by current methods [23]. Further improvement of the algorithms, such as the use of more elaborate protein domain classification or the application of deep learning approaches that have already proved useful in gene prediction [24-26], should promote further advancement in the field. At the same time, discoveries of novel systems pose the challenge of their functional and biochemical characterization, as current [...]

The outline of defensive strategies targeting different stages of the viral life cycle is presented in Fig. 1. Microbial resistance to phage infection can be associated with the activity of specific immunity systems, whose main function is to inhibit propagation of foreign genetic material, or with mutations and phase variation in the host genes that are required for productive viral infection. Phage resistance can also be associated with small molecules [27], or with the activity of mobile genetic elements interfering with viral infection, like PLE (phage-inducible island-like element) and PICI (phage-inducible chromosomal islands), which in a sense can be considered parasites of parasites [28,29].
Immunity systems often rely on the recognition of specific sites in the invading nucleic acid, or sense phage infection in another way, to initiate the inhibitory response. To avoid self-toxicity, these systems must employ a mechanism of self versus non-self discrimination. They can be further classified into innate immunity [including different types of restriction-modification (R-M), bacteriophage exclusion (BREX), defense island system associated with restriction-modification (DISARM), toxin-antitoxin (TA), abortive infection (Abi), and a plethora of less studied systems] and adaptive immunity mediated by clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins (CRISPR-Cas). Online databases of prokaryotic immunity systems include REBASE, a collection of known R-M systems [30]; TASmania, which specializes in TA systems [31]; CRISPRminer and CRISPRCasdb for CRISPR-Cas systems [32,33]; and PADS, which contains annotations of genes associated with different types of defenses [34]. The taxonomic distribution of protein domains known to be involved in defense can also be viewed in AnnoTree [35].
Since phage genomes encode only a limited number of genes, most phages rely on their hosts for transcription and translation [36] and often sequester host proteins as cofactors, as in the case of thioredoxin, which is required for T7 DNA polymerase activity [37]. Multiple studies have revealed the importance of host genes for phage infection using the Keio collection of Escherichia coli single-gene knockouts, dCas9 inhibition of specific genes, and transposon insertion mutagenesis [38-44]. Mutation of non-essential genes that are required for phage propagation is a common way of acquiring resistance, first described in the classical experiments of Luria and Delbrück [45].
For every known microbial defense strategy, phages have evolved means of counter-defense, and thus it is only a matter of time before phage-encoded inhibitors of novel defense systems are described [25,26,46]. It has been proposed that, similar to the defense islands, anti-defense genes tend to form clusters in phage genomes or mobile genetic elements. The existence of such anti-defense islands should promote the discovery of novel host defense inhibitors [47]. Viral counter-defense systems are beyond the scope of the current paper, and this subject can be further explored in other reviews [48-50].
The first part of the current review covers microbial strategies for avoiding recognition by phages, innate immunity mechanisms blocking the early stages of infection, and systems that rely on DNA modification for self vs non-self discrimination. The second part will focus on the adaptive immunity systems and defenses activated at the late stages of infection.

THE SIMPLEST WAY OF PROTECTION: TO AVOID RECOGNITION
Phage infection is initiated upon recognition of specific receptors on the surface of the cell by the phage receptor-binding proteins (RBPs). Different types of surface molecules can be exploited by phages as receptors, including pili and flagella, proteins, lipopolysaccharides (LPS), or carbohydrates. Interaction between the phage RBPs and the host receptor can be considered a limiting stage of infection, responsible, at least partially, for determining the range of susceptible hosts [51-53]. Prokaryotes employ receptor masking, modification and mutation, or production of decoys to avoid recognition by phages (Fig. 2).
Role of extracellular matrix (ECM) and outer membrane vesicles (OMV). Many bacteria are capable of secreting high-molecular-weight polymers, and spatially structured communities of cells surrounded by ECM form biofilms [54,55]. While the details of phage-host interactions in biofilms are not yet fully understood [56], it has been shown that biofilm communities tend to be more resistant to viral predation, and increased phage pressure might even enhance the formation of biofilms, wherein surface receptors are less accessible [57,58]. Another benefit of spatial organization is that only the surface layer of cells is exposed to phages in the environment. Cells in this layer are often metabolically inactive and thus do not support phage reproduction, while reducing the chances of attack on the underlying cells [59]. Components of the ECM may also act as decoys or "sinks" that adsorb and immobilize phages before they can reach the cell surface [60,61]. It has been shown that curli, proteinaceous components of the ECM, are associated with increased phage resistance in E. coli biofilms, while secretion of the exopolysaccharide alginate protects Pseudomonas fluorescens [62,63]. A polysaccharide capsule is another way to prevent phage adsorption [64]. For example, overproduction of colanic acid is associated with resistance to different phages in Escherichia, and mucoid cells with gain-of-function mutations in the Rcs signaling pathway controlling this function can be selected upon phage infection [44,65]. The role of decoys has also been shown for the extracellular vesicles produced by Vibrio and Escherichia: such vesicles, which contain surface receptors, can adsorb phages and lower their titer in the environment [66,67]. On the other hand, it has been demonstrated that receptor-carrying vesicles can be incorporated into the membrane of otherwise non-susceptible Bacillus cells, making them sensitive to infection [68].
Receptor alterations. Phage adsorption can often be considered a two-stage process, in which the first stage involves reversible binding to surface-exposed structures (e.g., phage T5 binding to the E. coli LPS O-antigen, or phage SPP1 binding to the B. subtilis cell wall teichoic acid), followed by irreversible attachment to a secondary receptor (e.g., the FhuA and YueB proteins in the case of T5 and SPP1, respectively) [51]. Alteration of the primary and secondary surface receptors is a common way of acquiring phage resistance [69]. Even point mutations in the genes of protein receptors can affect the efficiency of interaction with phage RBPs, as was shown, for example, for the phage T5 receptor FhuA and the phage λ receptor LamB [70,71]. Mutations affecting the biosynthetic pathways responsible for the synthesis of cell wall components (LPS in Gram-negative bacteria or teichoic acid in Gram-positive bacteria) can alter the structure of these molecules and thus affect recognition by phages targeting these receptors [72,73]. Bacterial extracellular suprastructures can also serve as phage receptors, and mutations in the genes involved in pili or flagella formation are known to provide defense against phages [74,75].
Receptor mutations can hardly be considered a bona fide defense strategy, as phage infection only selects for pre-existing resistant cells within the population. However, specific mechanisms that control receptor accessibility do exist. Bacterial surface molecules are involved in important cellular processes, including motility and nutrient transport, and their mutations can be associated with fitness costs. Thus, phase variation (reversible switching of gene expression) or masking of receptors can be a safer long-term strategy [76,77]. Masking involves the synthesis of molecules that bind to the host receptors and physically block interaction with the phage RBPs, as in the case of the TraT protein binding to the OmpA receptor in E. coli [78]. Temporary chemical modifications of receptors also prevent their recognition: examples include pilus glycosylation in P. aeruginosa and O-antigen glucosylation in Salmonella enterica [79,80]. Transcriptomic studies have demonstrated that alteration of receptors can be part of the stress response, as in the case of Lactococcus lactis, wherein phage infection activates genes responsible for cell wall D-alanylation [81]. Downregulation of receptor expression through phase variation can be achieved by recombination or predisposed alterations in the promoter regions [82,83], while mutations in specific hot-spot loci within genes can lead to frameshifting and production of truncated proteins [84]. Multiple inversion systems, known as shufflons, can also potentially be involved in the regulation of receptor expression [85], as in the case of the PilV protein from E. coli IncI plasmids, where one out of seven C-terminal region variants can be selected for expression [86]. The common gut symbiont Bacteroides thetaiotaomicron was shown to employ phase variation in at least 19 loci, controlling the production of different capsule types and the expression of S-layer proteins [87]. Phase variation permits the co-existence of microbial subpopulations expressing different gene variants, which hedges the risks of phage infection and the effects of environmental factors.
Archaeal cell wall structure is very different from that of bacteria, and attachment of archaeal viruses to the surface of their hosts is poorly understood [88]. Recently, the first structure-based adsorption model was provided for the archaeal virus STIV binding to the pili-like structure of Sulfolobus cells [89]. Mutations in the genes associated with surface molecules in Sulfolobus were shown to provide resistance to SIRV2 infection [90]. Despite the lack of data, one can expect that mechanisms similar to those described for bacterial cells also prevent adsorption of archaeal viruses to their hosts.

SMALL MOLECULES AND PHAGE DEFENSE
Direct role in defense. A plethora of early phage biology studies investigated the effects of chemicals on the efficiency of viral infection [91-93]. It was demonstrated that certain compounds, including antibiotics synthesized by bacteria, may affect the production of phage progeny at concentrations that are subinhibitory for bacterial growth [94-96]. Likewise, DNA-staining dyes and intercalating agents (e.g., propidium iodide or doxorubicin) can inactivate phage particles [97]. Yet the involvement of small molecules produced by bacteria in phage defense has been re-evaluated only recently [27]. High-throughput screening of chemical libraries identified molecules that were able to interfere with phage λ infection in E. coli without affecting bacterial growth. The anti-phage activity was shown in a biologically relevant context: addition of the spent medium collected after growth of a doxorubicin- and daunorubicin-producing strain of Streptomyces peucetius inhibited infection in a phage-sensitive strain of S. coelicolor. The authors further demonstrated that ~1/3 of the tested Streptomyces extracts had anti-phage activity against natural isolates of actinophages, which suggested that chemical defense is a widespread strategy. In most of the extracts, anthracyclines or other DNA-intercalating agents were identified as the active components. Some DNA-intercalating agents can inactivate phage particles before their contact with the cell by promoting uncontrolled DNA ejection [98]. Yet it was shown that phage DNA can enter the cell in the presence of daunorubicin, while the early stages of infection are suppressed [27]. This work raises questions regarding the inhibitory mechanisms of small molecules, their specificity, the avoidance of self-toxicity, and the possibility of using anti-viral metabolites as a community resource in microbial populations.
Viperins and chain-terminating nucleotides. The interferon-induced antiviral response of higher eukaryotes, including humans, involves synthesis of the chain-terminating ribonucleotide ddhCTP, achieved through activation of the viperin enzyme [99]. Chain termination is thought to suppress viral transcription and inhibit replication of viruses with RNA genomes [100]. Viperin genes were sporadically encountered in the genomes of Bacteria and Archaea, and a recent study demonstrated that prokaryotic viperins (pVips) provide protection against phage infection [101,102]. In contrast to their eukaryotic homologs, which synthesize ddhCTP, pVips also produce ddhUTP and ddhGTP. Heterologous expression of various pVips in E. coli inhibited phage T7 infection and transcription by T7 RNA polymerase [102]. pVip expression had no effect on host transcription and was not toxic to cells, suggesting that viral RNA polymerases are more sensitive to ddhNTP inhibition. Intriguingly, pVip expression provided a much higher level of protection against phages P1 and λ, which rely on the host RNA polymerase for transcription of their genes, implying the existence of additional defense mechanisms associated with viperins. pVip genes are enriched within defense islands and can be accompanied by genes of nucleotide kinases, which generate NTPs from NMPs and possibly increase the pool of NTP substrates for pVips. It has also been suggested that the genes of HicA-like RNase or ankyrin repeat domain proteins found close to some pVip genes are involved in sensing phage infection [102].
Regulatory role. Small metabolites can also be involved in microbial phage defense as signaling molecules or as cofactors for immunity proteins. Those examples will be described later, while here we cover the indirect role of small molecules in regulating the expression of defense genes. Quorum sensing (QS) allows microbes to measure population density and can be considered a communication system based on the secretion of extracellular molecules [103]. High-density populations are more vulnerable to phage infections, and QS-mediated activation of defense barriers with increasing cell density is beneficial for survival. For example, in a selection experiment a QS-proficient P. aeruginosa culture was shown to achieve higher levels of phage resistance than QS-deficient cells [104]. Besides their role in biofilm formation [105], QS signals can regulate the expression of phage receptors and immunity system genes. For instance, N-acylhomoserine lactone treatment reduces the amount of phage λ receptors in E. coli [106], while in V. anguillarum it leads to the production of extracellular proteases and lowers the amount of the phage KVP40 receptors [107,108]. QS regulation was shown to activate expression of the CRISPR-Cas system components in Serratia, Pseudomonas, and other bacteria [109,110]. To guide the lysis-lysogeny decision, many phages exploit bacterial QS signals or encode their own signaling systems, like the recently described Arbitrium [111-114]. One can speculate that bacteria may intercept inter-viral communication molecules or produce their own specific signals upon phage infection to mobilize defense barriers in the population.

PHAGE GENOME ENTRY INHIBITION
Following adsorption, the phage genome is ejected from the capsid and transported into the host cell [115]. Several mechanisms block this stage of the viral life cycle [116]. As a rule, such mechanisms are encoded by prophages and underlie the phenomenon of superinfection exclusion (Sie), the prevention of secondary infection with homoimmune phages after the primary infection (or lysogenization) is established [117]. Membrane-associated Sie proteins can block phage DNA entry by targeting the phage tape measure protein, as in the case of the Streptococcus thermophilus phage TP-J34 lipoprotein [118,119] or E. coli phage HK97 gp15 [120,121]. The phage T4 Sp protein is known to inhibit T4 lysozyme, which is required for degradation of the cell wall peptidoglycan layer in E. coli [122]. Host proteins required for phage DNA translocation are thought to be other targets for Sie, as for the phage T4 Imm protein [122,123] or the mycobacteriophage Fruitloop gp52 protein, which interacts with the host Wag31 and inhibits infection by Wag31-dependent phages [124]. Sie systems with unknown targets were described for Lactococcus lactis phage Tuc2009 [125], S. enterica phage P22 [126], V. cholerae phage K139 [127], E. coli phage P1 [128], and P. aeruginosa B3-like phages [129]. Although Sie systems are considered primarily a means of competition between phages [130], they provide benefits for the host and can eventually become an integral part of the chromosome, as in the case of the DicB protein encoded by the cryptic prophage Qin in E. coli [131].

DNA MODIFICATION-BASED IMMUNITY SYSTEMS
Once the phage genome is inside the cell, it can be targeted by a variety of enzymes that cause its degradation. For example, the RecBCD nuclease/helicase complex, which is also involved in host DNA repair, targets free DNA ends that are exposed at the early stages of infection by dsDNA phages with linear genomes [132,133]. Most often, the degradation of incoming DNA is performed by innate and adaptive immunity systems. Modification of the host DNA, which is required for discrimination between the cell's own and foreign genetic material, is a hallmark of the R-M systems. The modification module is responsible for epigenomic labeling of the host DNA, while the non-labeled phage DNA is subject to endonucleolytic cleavage executed by the restriction module [134,135]. The general principle of the R-M mechanism is depicted in Fig. 3. In addition to the classical R-M systems, there is a plethora of defense systems encoding a modification module; however, their mechanisms of restriction of foreign genetic material have not yet been determined.
Classical R-M systems. R-M systems were discovered in the early 1950s while deciphering the phenomenon of host-controlled viral modification [136,137]. They were extensively studied during the early years of molecular biology, culminating in wide applications and the rise of recombinant DNA technologies [138]. More than 300,000 known or putative R-M enzymes are currently listed in REBASE, and R-M systems have been found in ~90% of the sequenced bacterial and archaeal genomes [30]. The functional subunits of R-M systems include a methyltransferase (MTase), which transfers a methyl group from the S-adenosyl methionine (SAM) donor molecule to cytosine or adenine in DNA, and a cognate restriction endonuclease (REase). Some systems also encode a translocase, which utilizes the energy of ATP hydrolysis for motor functions, and a specificity subunit containing target recognition domains (TRDs) that define the sequence specificity of the REase and MTase. Based on subunit composition, cofactor requirements, and mode of action, R-M systems are divided into four types. However, this classification does not reflect their evolutionary relationships [139,140]. Subunit composition of the modification and restriction complexes, some recognition sites, and cleavage patterns of the Type I-IV R-M systems are presented in Fig. 4.

Fig. 4. Functional subunits, recognition sites, and composition of modification and restriction complexes for representative members of Type I-IV R-M systems. Type II R-M systems usually recognize palindromic sites, and both DNA strands are cleaved within or in close proximity to the non-methylated sites; Type I R-M systems modify both strands of bipartite asymmetric DNA sites, and cleavage requires interaction of two restriction complexes bound to non-methylated sites, which is achieved through ATP-dependent DNA looping and occurs at a non-fixed position in between; Type III R-M systems modify only one strand of asymmetric recognition sites, and cleavage occurs at a fixed position from one recognition site when the restriction complex bound to the non-methylated site interacts with another complex that was activated by recognition of a nearby non-methylated site in the inverted repeat orientation; Type IV R-M systems lack a modification module and cleave DNA after recognition of modified sites. Dashed lines indicate that the subunit could be dispensable for the depicted activity.

Type II R-M is the most studied group. Systems of this type normally comprise separate MTase and REase proteins. The MTase is monomeric, while the REase acts as a homodimer. Typically, both cognate enzymes recognize the same specific 4-8 bp long palindromic DNA site. DNA cleavage occurs on both DNA strands at a fixed position within or in close proximity to the non-methylated recognition site and depends on the presence of divalent cations, in most cases Mg²⁺ [141,142]. The MTase efficiently methylates non-methylated sites and the hemi-methylated sites that are produced after replication of fully methylated DNA, while the REase has low binding affinity for methylated and hemi-methylated sites [143]. In addition to this simple mode of action, characteristic of the IIP subtype, enzymes of other subtypes can display unusual features [139].
For example, the Type IIA, IIS, and IIL enzymes recognize asymmetric sequences; the REase and MTase polypeptides in the Type IIC and IIL systems are fused; while the Type IIE and IIF REases require binding to two sites for cleavage. The IIL and IIG subtype enzymes (e.g., MmeI and Eco57I) recognize asymmetric sites that are methylated on only one DNA strand [144,145]. The second strand of the Eco57I sites is methylated by an additional methyltransferase [145]. It is not clear how post-replicational cleavage of the non-methylated sites in IIL systems is avoided. Excessive REase expression is toxic to the cell, while an excess of MTase can lead to methylation of incoming phage genomes and restriction evasion. Thus, expression of the MTase and REase genes must be tightly coordinated, for example, by the activity of the controller C protein [146,147] or by MTase binding to and/or methylation of its own promoter [148].
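The basic Type IIP logic (a palindromic site, cleaved unless methylated) can be sketched in a few lines. This is only an illustration under stated assumptions: the site GAATTC (the well-known EcoRI site, not taken from the text above) is used as an example, methylation is modeled simply as a set of protected site start positions, hemi-methylated sites are treated as protected, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A site is palindromic if it reads the same on both strands (5'->3')."""
    return site == reverse_complement(site)


def cleavable_sites(dna, site, methylated_starts=frozenset()):
    """Start positions of `site` occurrences that are not protected by methylation."""
    hits = []
    start = dna.find(site)
    while start != -1:
        if start not in methylated_starts:
            hits.append(start)
        start = dna.find(site, start + 1)
    return hits


SITE = "GAATTC"                      # example site (assumption, see lead-in)
dna = "TTGAATTCAAGGAATTCTT"          # toy sequence with two sites

print(is_palindromic(SITE))                                   # palindromic site
print(cleavable_sites(dna, SITE))                             # unmodified "phage" DNA
print(cleavable_sites(dna, SITE, methylated_starts={2, 11}))  # fully modified "host" DNA
```

Unmodified DNA exposes both sites to the REase, whereas DNA carrying the cognate methylation on every site exposes none, which is the self versus non-self discrimination described above.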
The Type I R-M systems encode an MTase (HsdM), an REase (HsdR), and a specificity subunit (HsdS); the most studied example of this kind of system is EcoKI. These enzymes function as HsdM₂HsdS₁HsdR₂ complexes, which can perform both restriction and methylation, while methylation can also be performed by the HsdM₂HsdS₁ or HsdM₂HsdS₁HsdR₁ complexes [149-151]. The bipartite DNA sites separated by a degenerate sequence (AACNNNNNNGTGC for EcoKI) are recognized by two TRDs in the HsdS subunit, and both DNA strands of these asymmetric sites are methylated. The mechanistic model of restriction activity is quite comprehensive: after recognition of a non-methylated site by the restriction complex, the ATPase motor function of the HsdR subunit is activated and the complex pulls on the bound DNA in both directions, creating loops [152]. Translocation consumes about 3 ATP molecules per nucleotide [153]. Cleavage occurs when two restriction complexes anchored at different sites collide or when a roadblock (a replication fork or a supercoiled region) is encountered by one of the complexes [154]. The positions of DNA cleavage are not fixed, and it usually occurs between two neighboring recognition sites [155]. SAM is required not only as a donor of methyl groups but also as a cofactor for the restriction complex. Enzymes of the Type ISP subclass are single polypeptides combining methylation and restriction activities, and they methylate only one DNA strand [156,157]. To lower the risk of host DNA damage, the activity of the Type I complexes can be additionally controlled, for example, by ClpXP-mediated proteolytic cleavage of the HsdR subunit, a phenomenon known as restriction alleviation [158,159]. The Type I enzymes are known to alter their sequence specificity through phase variation of their TRDs [160].
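Because the EcoKI site quoted above is bipartite and asymmetric, finding it in a sequence requires matching a degenerate pattern on both strands. The following is a minimal sketch, assuming only the IUPAC symbol N (any base) and reporting bottom-strand sites by their top-strand coordinates; the helper names and the toy sequence are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def reverse_complement(seq):
    """Reverse complement that also handles the degenerate symbol N."""
    return seq.translate(COMPLEMENT)[::-1]


def site_regex(site):
    """Compile a degenerate site into a lookahead regex (finds overlapping hits)."""
    body = "".join(IUPAC[base] for base in site)
    return re.compile("(?=" + body + ")")


def find_sites(dna, site):
    """Return sorted (start, strand) pairs; '-' hits are given in top-strand coordinates."""
    hits = [(m.start(), "+") for m in site_regex(site).finditer(dna)]
    rc_site = reverse_complement(site)
    if rc_site != site:  # avoid double-counting palindromic sites
        hits += [(m.start(), "-") for m in site_regex(rc_site).finditer(dna)]
    return sorted(hits)


ECOKI_SITE = "AACNNNNNNGTGC"  # bipartite site quoted in the text
dna = "GG" + "AACTTTTTTGTGC" + "CC" + "GCACAAAAAAGTT" + "GG"  # toy sequence
print(find_sites(dna, ECOKI_SITE))
```

The toy sequence carries one site on each strand, so two hits are reported. A real analysis would additionally track methylation of each hit, since only non-methylated sites activate the restriction complex.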
The Type III R-M systems are in many aspects similar to the Type I systems [161]. They function as multiprotein complexes consisting of Mod and Res subunits. DNA modification is performed by the Mod₂ homodimer, while Res₂Mod₂ or Res₁Mod₂ complexes serve as ATP- and SAM-dependent REases [162,163]. The Type III enzymes recognize short non-palindromic DNA sequences and methylate only one DNA strand. Thus, similar to the Type ISP enzymes, half of their hemi-methylated sites become non-methylated after replication. To ensure that the cell's own DNA is not subject to restriction, two sites in reverse (head-to-head or tail-to-tail) orientation are required for cleavage, i.e., two non-methylated neighboring sites located on different DNA strands, a situation that is not normally encountered in the host DNA [164,165]. Recognition of the non-methylated DNA site activates the translocase activity of the Res subunit, but in contrast to the Type I enzymes, it consumes much less ATP and, instead of bidirectional looping, triggers one-dimensional diffusion along the DNA [166,167]. Cleavage occurs at a fixed position from one of the recognition sites, when the activated restriction complex interacts with another complex bound to a non-methylated site. Expression of the Mod subunit can be regulated through phase variation [168].
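The orientation rule described above can be made concrete with a small sketch showing why freshly replicated host DNA escapes Type III restriction while unmodified phage DNA does not. It is a deliberate simplification: an arbitrary asymmetric hexamer (CAGCAG) is used as the recognition site, methylation is assumed to sit on the strand that carries the site, distances between sites are ignored, and any pair of non-methylated sites in opposite orientations is treated as sufficient for cleavage. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_oriented_sites(dna, site):
    """Return sorted (start, strand) pairs for an asymmetric site on both strands."""
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = dna.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(pattern, start + 1)
    return sorted(hits)


def is_cleavable(sites, methylated):
    """True if non-methylated sites are present in both orientations."""
    free_orientations = {strand for pos, strand in sites
                         if (pos, strand) not in methylated}
    return free_orientations == {"+", "-"}


def daughter_methylation(sites, parental_strand):
    """Sites still methylated in the daughter duplex that inherited `parental_strand`."""
    return {s for s in sites if s[1] == parental_strand}


SITE = "CAGCAG"                       # illustrative asymmetric site (assumption)
dna = "AACAGCAGTTTTCTGCTGAA"          # one site on each strand
sites = find_oriented_sites(dna, SITE)

print(is_cleavable(sites, methylated=set()))                            # unmodified phage DNA
print(is_cleavable(sites, methylated=set(sites)))                       # fully modified host DNA
print(is_cleavable(sites, methylated=daughter_methylation(sites, "+")))  # daughter duplex
```

After replication every non-methylated site in a given daughter duplex has the same orientation, so the two-orientation requirement is never met in host DNA, whereas unmodified incoming DNA presents free sites in both orientations.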
To avoid cleavage by the host REases, phages can incorporate modified bases into their genomes [169], and the Type IV R-M systems evolved in response to specifically target modified DNA [170,171]. This is a divergent and poorly studied group of solitary REase proteins that lack a cognate MTase. The subtype IIM enzymes also recognize methylated bases and are considered Type IV by some authors [171]. The Type IV REases usually have broad sequence specificity and can target methylcytosines (McrA), methyladenines (Mrr), or phosphorothioated DNA (ScoMcrA) [172-174]. Some enzymes of this group require ATP or GTP hydrolysis and more than one site for cleavage (McrBC or SauUSI) [175,176]. The abundance of these proteins and their ecological importance are likely underestimated.
In addition to their role in phage resistance and control of horizontal gene transfer (HGT), R-M systems influence other biological processes [177]. For example, MTase genes are often found without cognate REase genes, and such orphan enzymes are thought to be involved in the regulation of gene expression or replication. The best characterized examples include the Dam MTase in E. coli and CcrM in Caulobacter crescentus [178,179]. The R-M systems may be considered selfish TA elements, since the loss of an MTase gene can lead to post-segregational killing associated with DNA damage elicited by the REase [180]. The evolutionary and ecological roles of R-M systems have been addressed in several reviews [76,181-183].
Phage growth limitation (Pgl) system. Pgl may represent a unique example of a mode of action that is the reverse of the R-M systems: modified DNA is restricted, but unlike in the case of the Type IV R-M systems, modification of the phage genome is carried out by the host defense system itself. The Pgl defense was first described during isolation of the ϕC31 phage infecting Streptomyces coelicolor A3(2) [184], and later it was shown that phage progeny was released from the Pgl+ cells after the first round of infection, but subsequent rounds of infection of the host were restricted [185]. It was suggested that the initially released phages bore Pgl-specific modifications. The Pgl system has been found only in Actinomyces. It is assumed that such altruistic behavior could be afforded by multicellular bacteria that sacrifice one compartment for the protection of the whole mycelium. At the same time, the Pgl mode of action might be beneficial in competition with closely related Pgl-deficient cell types, since the Pgl+ cells produce modified phages that are capable of infecting other hosts, but not the Pgl+ ones [186]. The Pgl phenotype has an additional benefit: in the case of classical R-M systems, erroneous methylation of the phage genome often leads to the emergence of protected phage progeny, which would be able to wipe out the bacterial population. In contrast, the reverse mode of action characteristic of Pgl ensures that no escaper phages can emerge in the course of infection (Fig. 5).
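The two-round behavior described above can be summarized as a small decision table. The sketch below is a toy abstraction only: the function name, the `Outcome` type and, in particular, the assumption that phages propagated on a Pgl-deficient host lose the modification are illustrative choices and are not taken from the cited studies.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    productive: bool                    # did the infection release progeny?
    progeny_modified: Optional[bool]    # None when no progeny is released


def infect_pgl(host_is_pgl_plus: bool, phage_is_modified: bool) -> Outcome:
    """Toy rule set for the 'reverse' restriction logic attributed to Pgl."""
    if host_is_pgl_plus and phage_is_modified:
        # Modified phage DNA is restricted in a Pgl+ host.
        return Outcome(False, None)
    # Otherwise the infection is productive. Progeny leaves a Pgl+ host
    # modified; progeny from a Pgl-deficient host is assumed unmodified.
    return Outcome(True, host_is_pgl_plus)


# Round 1: unmodified phage on a Pgl+ host gives modified progeny.
first = infect_pgl(host_is_pgl_plus=True, phage_is_modified=False)
# Round 2: that progeny is restricted on Pgl+ cells ...
second = infect_pgl(True, bool(first.progeny_modified))
# ... but can still infect a Pgl-deficient competitor.
competitor = infect_pgl(False, bool(first.progeny_modified))
print(first, second, competitor)
```

Under these rules no modification state lets a phage propagate repeatedly on Pgl+ cells, which mirrors the point that escaper phages do not emerge in the way they can with classical R-M systems.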
The Pgl system encodes four components: PglX, an adenine methyltransferase; PglY, an ATPase; PglW, a protein kinase; and PglZ, an alkaline phosphatase (Fig. 6a) [187-189]. All four proteins are required for defense, and activity of the first three components has been demonstrated in vitro [190]. Deletion of the pglZ gene is impossible in the presence of a functional pglX. Thus, it was suggested that the proteins encoded by these genes form a TA pair and that PglX plays a critical role in restriction when its activity is unrestrained by PglZ [190]. The mechanisms of phage infection sensing by the Pgl system and of the restriction module functioning have not been determined yet.
BacteRiophage EXclusion systems (BREX). Global analysis of the pglZ gene distribution in the defense islands has shown that it is found not only in Actinomyces and is often embedded in conserved gene clusters distinct from Pgl [18]. It was assumed that these pglZ-containing clusters represent a novel superfamily of phage defense systems denoted BREX [191]. Based on the composition of components, the BREX systems have been classified into six types, and Pgl has been assigned to the Type II BREX (Fig. 6a). Besides the presence of PglZ, the common feature of all BREX systems is the presence of an ATPase and a methyltransferase. In the Type IV systems, the latter is replaced by a PAPS reductase, an enzyme that can be involved in DNA phosphorothioation [192]. The most prevalent is the Type I BREX, and systems of this type have been experimentally investigated in B. subtilis, E. coli, and V. cholerae (where it has been found in the SXT conjugative elements) [191,193,194]. Activity of the BREX methyltransferase has also been demonstrated in Lactobacillus casei [195]. The core components of Type I BREX systems include BrxX (PglX), an adenine-specific methyltransferase; BrxZ (PglZ), an alkaline phosphatase; BrxC, an ATPase; BrxL, a Lon-like protease; and BrxB, a small protein of unknown function. These predicted activities have not been verified in vitro, and some large domains of the BREX proteins have not been assigned a function yet. Additional small proteins that presumably play a regulatory role or are required to confer protection from specific phages can also be present (e.g., BrxA) [191,193].
Similar to the classical R-M systems, the BREX systems were shown to methylate the cell's own DNA, and the presence of BREX-specific modifications in the phage genome allows the phage to overcome the defense [191,193]. BREX sites are non-palindromic and are methylated on only one strand, which, similar to the Type III and Type ISP R-M systems, might imply a requirement for multiple sites and their specific orientation for restriction. BREX acts at the early stages of phage infection, and accumulation of the phage progeny DNA inside the BREX+ cells is prevented. Yet, the mechanisms of restriction remain unknown. The E. coli BREX defense is suppressed by the phage T7 DNA mimic protein Ocr [46], which is a well-known inhibitor of Type I R-M systems [151,196,197]. This result seems to suggest common mechanistic features between BREX and the multisubunit complexes of R-M systems.
Defense Island System Associated with Restriction-Modification (DISARM) systems. Following the discovery of BREX, mining of the conserved gene clusters in the defense islands resulted in the prediction of another novel system, DISARM [198]. Antiviral activity was demonstrated for the DISARM system from B. paralicheniformis. It comprises five components: the DrmA helicase; the DrmB protein with the domain of unknown function DUF1998; DrmC, containing a phospholipase D (PLD) domain; DrmE; and the cytosine-specific methyltransferase DrmMII [198]. This composition is typical for the class 2 DISARM, while in the more abundant class 1, DrmMII is substituted with DrmMI, an adenine methyltransferase, and DrmE with DrmD, an SNF2-like helicase (Fig. 6a). The PLD domains can be involved in the catalytic activity of nucleases [199]. Yet, surprisingly, DrmC was shown to be dispensable for the DISARM-mediated defense against phages. DrmMII alone can methylate symmetric (CCWGG) sites in the host DNA, and deletion of the methyltransferase gene in the presence of the full DISARM cluster is toxic to cells. Yet, in contrast to the classical R-M phenotype, phage ϕ3T with the DISARM-specific methylation was unable to infect DISARM+ cells, suggesting that methylation is necessary but not sufficient to avoid restriction [198]. Similar to BREX, DISARM does not affect phage adsorption and inhibits early stages of infection by an unknown mechanism.
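The distinction between symmetric sites such as CCWGG and the non-palindromic sites of BREX or Type III systems can be checked mechanically. The following minimal sketch assumes the standard IUPAC nucleotide codes (W stands for A or T); the helper names are hypothetical, only a subset of the degenerate codes is handled, and CAGCAG is used merely as an example of a non-palindromic sequence.

```python
import re
from typing import List

# Complements for the IUPAC codes used here (W = A/T and S = C/G are self-complementary).
IUPAC_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "W": "W", "S": "S", "N": "N"}
# Regular-expression classes for the same codes.
IUPAC_REGEX = {"A": "A", "T": "T", "C": "C", "G": "G", "W": "[AT]", "S": "[CG]", "N": "[ACGT]"}


def is_palindromic(motif: str) -> bool:
    """True if the motif equals its own reverse complement (a symmetric site)."""
    rc = "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif))
    return rc == motif


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) motif matches."""
    pattern = "".join(IUPAC_REGEX[base] for base in motif)
    # A lookahead keeps matches zero-width so that overlapping sites are reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


print(is_palindromic("CCWGG"))                  # symmetric: both strands carry the site
print(is_palindromic("CAGCAG"))                 # non-palindromic example
print(find_motif("AACCAGGTTCCTGGAA", "CCWGG"))  # start positions of CCAGG and CCTGG
```

For a symmetric site, a single scan of one strand finds every occurrence and both strands can carry the modification; for a non-palindromic site, the reverse complement has to be searched separately and only one strand of each site is modified.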
7-Deazapurine in DNA (DPD) systems. Besides DNA methylation, 7-deazaguanine-based modifications can also be coupled with R-M-like defense systems [200]. Multiple enzymes are involved in the generation of 7-deazaguanine, which usually serves as a precursor of the modified bases in tRNA. Some prokaryotes encode additional biosynthetic gene clusters responsible for the introduction of 7-deazaguanine into DNA [200]. Such DPD systems (from 7-deazapurine in DNA) can consist of up to 10 components (DpdA-K) [200,201]. The R-M-like activity of the DPD system was suggested based on the inhibition of transformation of a non-modified plasmid into cells of Salmonella Montevideo carrying a dpd cluster [200]. Activity of the DPD system against phage infection has not been demonstrated so far, and the possible restriction mechanism remains unclear. Auxiliary DPD components include helicases, a ParB-like NTPase, and a PLD nuclease, which may be involved in the restriction of unmodified DNA. Interestingly, similar 7-deazaguanine modification clusters have been identified in some viral genomes (e.g., in phages 9g or Cajan), where the modification was shown to protect phage DNA against a wide spectrum of REases [202,203].
Phosphorothioate (PT) modification-based systems. While the modifications discussed so far affect only nucleobases, the DNA sugar-phosphate backbone can also be subjected to modification. Replacement of a non-bridging oxygen with a sulfur atom, which leads to the formation of the phosphorothioate internucleotide linkage (PT modification), can be associated with different defense systems in Bacteria and Archaea [204-206]. These systems are summarized in Fig. 6b.
The PT modification results from the activity of the products of the dndABCDE genes (Dnd is a phenotype associated with DNA degradation), which encode the DndA cysteine desulfurase, the DndC PAPS reductase, the DndD ATPase/nicking endonuclease, and a small protein DndE that binds nicked DNA, while DndB regulates transcription of the dndBCDE operon and determines the proportion of PT-modified sites in the genome [204,207-209]. Not all stages of the biochemical pathway involved in PT modification have been determined, but it is known that cysteine serves as a donor of the sulfur atom, which is transferred to DndC and then incorporated in an energy-dependent manner into DNA that was previously nicked at specific sites by DndD [192,210,211]. Recently, it was shown that the dnd genes could also be involved in the PT modification of RNA [212]. On its own, the PT modification is thought to be involved in the maintenance of redox homeostasis and control of gene expression [213], but the Dnd modification module in Bacteria is often accompanied by the dndFGH restriction gene cluster [214]. In vitro activity of the DndFGH components has not been investigated, but the presence of the dndABCDE-FGH cluster in vivo inhibits transformation with non-modified DNA, while DndFGH expression in a strain without DndABCDE leads to cleavage of the cell's own DNA [214-216]. The most prominent feature of PT modification, which is quite distinct from R-M methylation, is that only a small proportion of the available sites are modified and modification of each specific site is transient, which raises questions about how self-restriction is avoided [217,218]. The presence of Dam methylation affects the distribution of the PT-modified sites but does not affect their overall density [219]. It was further suggested that PT modification specificity could be defined by the overall geometry of the DNA site, rather than its sequence [219].
The PT modification has also been shown in Archaea, where, instead of dndFGH, the restriction function is performed by the pbeABCD gene cluster [205]. The dndCDEA-pbeABCD system from Haloterrigena jeotgali was shown to provide antiviral defense, and restriction activity was dependent on a functionally active PT modification module, which is distinct from the dndFGH behavior [205]. Accumulation of the viral DNA was not observed inside infected cells carrying dndCDEA-pbeABCD, though its cleavage has not been demonstrated either. The pbeABCD genes can be found as solitary clusters or adjacent to methyltransferase genes, which implies a possibility of module exchange between different defense systems [205].
Recently, another novel PT modification-based defense system has been discovered: SspABCD-SspE [206]. The sspABCD genes are not homologous to dndABCDE but encode similar functional domains and perform PT modification of DNA, while SspE serves as a restriction component, inhibiting phage infection. In vitro, SspE was shown to possess an NTPase activity, which was stimulated by the presence of PT-modified sites, and a non-specific nicking endonuclease activity [206]. A distinctive feature of the SspABCD system is the modification of only one DNA strand within non-palindromic recognition sites.
Description of other prokaryotic defense strategies and discussion of the interplay between different antiviral systems will be continued in the second part of the manuscript [Biochemistry (Moscow), vol. 86, Issue 4].
Funding. The work was supported by the Russian Foundation for Basic Research (project nos. 19-14-50560, 19-34-90160). AI was supported by the grant from the Russian Foundation for Basic Research (no. 19-34-90160), OM was supported by the grant from the Russian Science Foundation (no. 19-74-00118).
Authors' contributions. AI composed the manuscript, AI and OM prepared illustrations, KS revised the text.
Ethics declarations. The authors declare no conflict of interest in financial or any other sphere. This article does not contain any studies with human participants or animals performed by any of the authors.
Open access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.