INTRODUCTION

Phages are found in every natural environment and are considered the most abundant biological entities on Earth [1-3]. Estimated numbers of phage particles can reach ~10^10 per liter in seawater or freshwater and ~10^9 per gram in soil. Overall, phages are thought to outnumber their microbial hosts by 10 to 1 or more [4, 5]. Phages play an important ecological role by controlling the size and diversity of microbial populations [6], while phage-induced lysis of cells ensures the flow of nutrients through food webs [7]. Phages greatly enhance lateral gene transfer, and medically relevant bacterial traits are often associated with prophages [8]. Host genes can be incorporated into a viral genome or packaged into capsids to generate transducing particles [9]. In addition, bacteria can produce phage-like gene transfer agents carrying host DNA [10]. The spread of plasmids is also facilitated by phage-induced cell lysis [11].

Theoretical works and mathematical models suggest that parasitic or selfish genetic elements inevitably emerge in any replicator system and that counteracting strategies are required to achieve stable co-existence [12-14]. Thus, the presence of phages and host defense systems may be considered a fundamental property of prokaryotic life. Billions of years of co-evolution have resulted in a broad range of offense and defense strategies employed by viruses and their hosts. Genes associated with phage defense can constitute up to 10% of a microbial genome [15]. Traditionally, the discovery of phage defense systems relied on the selection of phage-resistant strains and characterization of their specific traits. The recent increase in the availability of genomic data and the application of bioinformatics approaches have significantly expanded the field, allowing systematic prediction of novel phage defense gene clusters. A popular “guilt-by-association” approach is based on the fact that functionally linked genes are often co-localized [16]. Using a gene with a known function as a “bait”, neighboring genes in multiple genomes are assessed for their probability of being found in the vicinity of the “bait” in order to predict possible functional linkage [17]. Applied to the known phage resistance systems, this approach led to the important concept of defense islands: genomic loci with clusters of antiviral defense genes [18]. An individual genome can contain multiple defense islands. They often include mobile genetic elements, which contribute to extensive horizontal gene transfer of defense systems [19, 20]. Since ~2/3 of the genes in defense islands were not known to be associated with phage resistance, novel types of antiviral systems were predicted [18].
Recently, a systematic analysis of all Pfam protein domains encountered in the defense islands within the microbial pan-genome predicted nearly 300 gene families that are over-represented around the known defense genes. Among these, genes that tend to form conserved clusters were assumed to represent candidate defense systems, and some were experimentally investigated. This resulted in the validation of a dozen novel types of antiviral systems [21, 22]. Even though our knowledge of the range of defense system types has expanded significantly in recent years, we have likely uncovered only a small proportion of their real diversity, since the vast majority of genomic data is considered “dark matter” and defense islands consisting solely of genes of unknown function are undetectable by current methods [23]. Further improvement of the algorithms, such as the use of more elaborate protein domain classifications or the application of deep learning approaches that have already proved useful in gene prediction [24-26], should promote further progress in the field. At the same time, the discovery of novel systems poses the challenge of their functional and biochemical characterization, as the mechanisms of protection are currently poorly understood for most of them.
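The “guilt-by-association” idea described above can be illustrated with a minimal sketch. It assumes that each genome is represented simply as an ordered list of gene family labels; the function name, the window size, and the toy data are hypothetical and are not taken from the cited studies, which rely on proper statistical enrichment over thousands of genomes rather than a raw fraction.

```python
from collections import Counter


def defense_scores(genomes, known_defense, window=5):
    """For each gene family, return the fraction of its occurrences that lie
    within `window` genes of a known defense gene (the "bait").

    genomes: list of genomes, each a list of family labels in chromosomal order.
    known_defense: set of family labels already known to be defense genes.
    """
    total = Counter()  # all occurrences of a family
    near = Counter()   # occurrences located next to a bait
    for genes in genomes:
        bait_positions = [i for i, fam in enumerate(genes) if fam in known_defense]
        for i, fam in enumerate(genes):
            if fam in known_defense:
                continue  # baits themselves are not scored
            total[fam] += 1
            if any(abs(i - b) <= window for b in bait_positions):
                near[fam] += 1
    return {fam: near[fam] / total[fam] for fam in total}


# Toy data (entirely made up): "hk*" stand for housekeeping families,
# "famA" and "famB" for uncharacterized families.
genomes = [
    ["hk1", "hk2", "hk3", "famA", "RM_mtase", "RM_rease", "famB", "hk4", "hk5", "hk6"],
    ["hk4", "famA", "cas9", "hk1", "hk2", "hk3"],
    ["famB", "hk1", "hk2", "hk3", "hk4", "RM_rease", "hk5"],
]
known = {"RM_mtase", "RM_rease", "cas9"}
scores = defense_scores(genomes, known, window=1)
```

In this toy set, the family that is always found next to a bait receives the highest score, which makes it a candidate for a defense-related function; a family that is never found near a bait scores zero.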

An outline of defensive strategies targeting different stages of the viral life cycle is presented in Fig. 1. Microbial resistance to phage infection can be associated with the activity of specific immunity systems, whose main function is to inhibit the propagation of foreign genetic material, or with mutations and phase variation in host genes that are required for productive viral infection. Phage resistance can also be associated with small molecules [27], or with the activity of mobile genetic elements interfering with viral infection, like PLE (phage-inducible island-like element) and PICI (phage-inducible chromosomal islands), which in a sense can be considered parasites of parasites [28, 29]. Immunity systems often rely on the recognition of specific sites in the invading nucleic acid, or sense phage infection in another way, to initiate the inhibitory response. To avoid self-toxicity, these systems must employ a mechanism of self versus non-self discrimination. They can be further classified into innate immunity [including different types of restriction–modification (R-M), bacteriophage exclusion (BREX), defense island systems associated with restriction–modification (DISARM), toxin-antitoxin (TA), abortive infection (Abi), and a plethora of less studied systems] and adaptive immunity mediated by clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) proteins (CRISPR-Cas). Online databases of prokaryotic immunity systems include REBASE, a collection of known R-M systems [30]; TASmania, which specializes in TA systems [31]; CRISPRminer and CRISPRCasdb for CRISPR-Cas systems [32, 33]; and PADS, which contains annotations of genes associated with different types of defenses [34]. The taxonomic distribution of protein domains known to be involved in defense can also be viewed in AnnoTree [35].

Fig. 1. General outline of the microbial defense strategies targeting different stages of the viral life cycle.

Since phage genomes encode only a limited number of genes, most phages rely on their hosts for transcription and translation [36], and often co-opt host proteins as cofactors, as in the case of thioredoxin, which is required for the activity of T7 DNA polymerase [37]. Multiple studies have revealed the importance of host genes for phage infection using the Keio collection of Escherichia coli single-gene knockouts, dCas9 inhibition of specific genes, and transposon insertion mutagenesis [38-44]. Mutation of non-essential genes that are required for phage propagation is a common way of acquiring resistance, first described in the classical experiments of Luria and Delbrück [45].

For every known microbial defense strategy, phages have evolved means of counter-defense, and thus it is only a matter of time before phage-encoded inhibitors of novel defense systems are described [25, 26, 46]. It was proposed that, similar to the defense islands, anti-defense genes tend to form clusters in phage genomes or mobile genetic elements. The existence of such anti-defense islands should promote the discovery of novel host defense inhibitors [47]. Viral counter-defense systems are beyond the scope of the current paper; this subject is covered in other reviews [48-50].

The first part of the current review covers microbial strategies that allow cells to avoid recognition by phages, innate immunity mechanisms blocking the early stages of infection, and systems that rely on DNA modification for self versus non-self discrimination. The second part will focus on the adaptive immunity systems and defenses activated at the late stages of infection.

THE SIMPLEST WAY OF PROTECTION – TO AVOID RECOGNITION

Phage infection is initiated upon recognition of specific receptors on the cell surface by the phage receptor binding proteins (RBPs). Different types of surface molecules can be exploited by phages as receptors, including pili and flagella, proteins, lipopolysaccharides (LPS), and carbohydrates. Interaction between the phage RBPs and the host receptor can be considered a limiting stage of infection that is responsible, at least partially, for determining the range of susceptible hosts [51-53]. Prokaryotes employ receptor masking, modification, and mutation, or the production of decoys, to avoid recognition by phages (Fig. 2).

Fig. 2. Microbial strategies allowing prevention of recognition by phages.

Role of extracellular matrix (ECM) and outer membrane vesicles (OMV). Many bacteria are capable of secreting high-molecular-weight polymers, and spatially structured communities of cells surrounded by ECM form biofilms [54, 55]. While the details of phage-host interactions in biofilms are not yet fully understood [56], it was shown that biofilm communities tend to be more resistant to viral predation, and increased phage pressure might even enhance the formation of biofilms, wherein surface receptors are less accessible [57, 58]. Another benefit of spatial organization is that only the surface layer of cells is exposed to phages in the environment. Cells in this layer are often metabolically inactive and thus do not support phage reproduction, while reducing the chances of attack on the underlying cells [59]. Components of the ECM may also act as decoys or “sinks” that adsorb and immobilize phages before they can reach the cell surface [60, 61]. It was shown that curli, proteinaceous components of the ECM, are associated with increased phage resistance in E. coli biofilms, while secretion of the exopolysaccharide alginate protects Pseudomonas fluorescens [62, 63]. A polysaccharide capsule is another way to prevent phage adsorption [64]. For example, overproduction of colanic acid is associated with resistance to different phages in Escherichia, and mucoid cells with gain-of-function mutations in the Rcs signaling pathway controlling this function can be selected upon phage infection [44, 65]. A decoy role was also shown for the extracellular vesicles produced by Vibrio and Escherichia: such vesicles contain surface receptors and can adsorb phages, lowering their titer in the environment [66, 67]. On the other hand, it was demonstrated that receptor-carrying vesicles can be incorporated into the membrane of otherwise non-susceptible Bacillus cells, making them sensitive to infection [68].

Receptor alterations. Phage adsorption can often be considered a two-stage process, where the first stage involves reversible binding to surface-exposed structures (e.g., phage T5 binding to the E. coli LPS O-antigen, or phage SPP1 binding to the B. subtilis cell wall teichoic acid), followed by irreversible attachment to a secondary receptor (the FhuA and YueB proteins in the case of T5 and SPP1, respectively) [51]. Alteration of the primary and secondary surface receptors is a common way of acquiring phage resistance [69]. Even point mutations in the genes of protein receptors can affect the efficiency of interaction with phage RBPs, as was shown, for example, for the phage T5 receptor FhuA and the phage λ receptor LamB [70, 71]. Mutations affecting the biosynthetic pathways of cell wall components (LPS in Gram-negative bacteria or teichoic acids in Gram-positive bacteria) can alter the structure of these molecules, and thus affect recognition by phages targeting these receptors [72, 73]. Bacterial extracellular appendages can also serve as phage receptors, and mutations in the genes involved in pili or flagella formation are known to provide defense against phages [74, 75].

Receptor mutations can hardly be considered a bona fide defense strategy, as phage infection only selects for pre-existing resistant cells within the population. However, specific mechanisms that control receptor accessibility do exist. Bacterial surface molecules are involved in important cellular processes, including motility and nutrient transport, and their mutation can carry fitness costs. Thus, phase variation (reversible switching of gene expression) or masking of receptors can be a safer long-term strategy [76, 77]. Masking involves the synthesis of molecules that bind to the host receptors and physically block interaction with the phage RBPs, as in the case of the TraT protein binding to the OmpA receptor in E. coli [78]. Transient chemical modifications of receptors also prevent their recognition: examples include pilus glycosylation in P. aeruginosa and O-antigen glucosylation in Salmonella enterica [79, 80]. Transcriptomic studies demonstrated that alteration of receptors can be part of the stress response, as in the case of Lactococcus lactis, wherein phage infection activates genes responsible for cell wall D-alanylation [81]. Downregulation of receptor expression through phase variation can be achieved by recombination or by predisposed alterations in the promoter regions [82, 83], while mutations in specific hot-spot loci within genes can lead to frameshifting and the production of truncated proteins [84]. Multiple inversion systems, known as shufflons, can also potentially be involved in the regulation of receptor expression [85], as in the case of the PilV protein from E. coli IncI plasmids, where one of seven C-terminal region variants can be selected for expression [86]. The common gut symbiont Bacteroides thetaiotaomicron was shown to employ phase variation in at least 19 loci, controlling the production of different capsule types and the expression of S-layer proteins [87].
Phase variation permits the co-existence of microbial sub-populations expressing different gene variants, which hedges the risks posed by phage infection and by environmental factors.

Archaeal cell wall structure is very different from that of bacteria, and attachment of the archaeal viruses to the surface of their hosts is poorly understood [88]. Recently, the first structure-based adsorption model was provided for the archaeophage STIV binding to the pili-like structure of Sulfolobus cells [89]. Mutations in the genes associated with the surface molecules in Sulfolobus were shown to provide resistance to SIRV2 infection [90]. Despite a lack of data, one can expect that mechanisms similar to those described for bacterial cells also prevent adsorption of archaeal viruses to their hosts.

SMALL MOLECULES AND PHAGE DEFENSE

Direct role in defense. A plethora of early phage biology studies investigated the effects of chemicals on the efficiency of viral infection [91-93]. It was demonstrated that certain compounds, including bacteria-synthesized antibiotics, may affect the production of phage progeny at concentrations subinhibitory for bacterial growth [94-96]. Likewise, DNA-staining dyes and intercalating agents (e.g., propidium iodide or doxorubicin) can inactivate phage particles [97]. Yet only recently has the involvement of small molecules produced by bacteria in phage defense been re-evaluated [27]. High-throughput screening of chemical libraries identified molecules that were able to interfere with phage λ infection in E. coli without affecting bacterial growth. The anti-phage activity was shown in a biologically relevant context: addition of the spent medium collected after the growth of a doxorubicin- and daunorubicin-producing strain of Streptomyces peucetius inhibited infection of a phage-sensitive strain of S. coelicolor. The authors further demonstrated that ~1/3 of the tested Streptomyces extracts had anti-phage activity against natural isolates of actinophages, which suggested that chemical defense is a widespread strategy. In most of the extracts, anthracyclines or other DNA-intercalating agents were identified as the active components. Some DNA-intercalating agents can inactivate phage particles before their contact with the cell by promoting uncontrolled DNA ejection [98]. Yet it was shown that phage DNA can enter the cell in the presence of daunorubicin, but the early stages of infection were suppressed [27]. This work raises questions regarding the inhibitory mechanisms of small molecules, their specificity, the avoidance of self-toxicity, and the possibility of using anti-viral metabolites as a community resource in microbial populations.

Viperins and chain-terminating nucleotides. The interferon-induced antiviral response of higher eukaryotes, including humans, involves synthesis of the chain-terminating ribonucleotide ddhCTP, which is produced by the enzyme viperin [99]. Chain termination is thought to suppress viral transcription and inhibit the replication of viruses with RNA genomes [100]. Viperin genes were sporadically encountered in the genomes of Bacteria and Archaea, and a recent study demonstrated that prokaryotic viperins (pVips) provide protection against phage infection [101, 102]. In contrast to the eukaryotic homologs, which synthesize ddhCTP, pVips also produce ddhUTP and ddhGTP. Heterologous expression of various pVips in E. coli inhibited phage T7 infection and transcription by T7 RNA polymerase [102]. pVip expression had no effect on host transcription and was not toxic to cells, suggesting that viral RNA polymerases are more sensitive to ddhNTP inhibition. Intriguingly, pVip expression provided a much higher level of protection against phages P1 and λ, which rely on the host RNA polymerase for transcription of their genes, implying the existence of additional defense mechanisms associated with viperins. pVip genes are enriched within defense islands and can be accompanied by genes of nucleotide kinases, which generate NTPs from NMPs and possibly increase the pool of NTP substrates for pVips. It has also been suggested that the genes of HicA-like RNases or ankyrin repeat domain proteins found close to some pVip genes are involved in sensing phage infection [102].

Regulatory role. Small metabolites can also be involved in microbial phage defense as signaling molecules or as cofactors for immunity proteins. Examples will be described later; here we cover the indirect role of small molecules in regulating the expression of defense genes. Quorum sensing (QS) allows cells to measure microbial population density and can be considered a communication system based on the secretion of extracellular molecules [103]. High-density populations are more vulnerable to phage infections, so QS-mediated activation of defense barriers with increasing cell density is beneficial for survival. For example, in a selection experiment a QS-proficient P. aeruginosa culture was shown to achieve higher levels of phage resistance than QS-deficient cells [104]. Besides their role in biofilm formation [105], QS signals can regulate the expression of phage receptors and immunity system genes. For instance, N-acylhomoserine lactone treatment reduces the amount of phage λ receptors in E. coli [106], while in V. anguillarum it leads to the production of extracellular proteases and lowers the amount of phage KVP40 receptors [107, 108]. QS regulation was shown to activate expression of CRISPR-Cas system components in Serratia, Pseudomonas, and other bacteria [109, 110]. To guide the lysis-lysogeny decision, many phages exploit bacterial QS signals or encode their own signaling systems, like the recently described Arbitrium [111-114]. One can speculate that bacteria may intercept inter-viral communication molecules or produce their own specific signals upon phage infection to mobilize defense barriers in the population.

PHAGE GENOME ENTRY INHIBITION

Following adsorption, the phage genome is ejected from the capsid and transported into the host cell [115]. Several mechanisms block this stage of the viral life cycle [116]. As a rule, such mechanisms are encoded by prophages and underlie the phenomenon of superinfection exclusion (Sie): prevention of secondary infection with homoimmune phages after the primary infection (or lysogenization) is established [117]. Membrane-associated Sie proteins can block phage DNA entry by targeting the phage tape measure protein, as in the case of the Streptococcus thermophilus phage TP-J34 lipoprotein [118, 119] or E. coli phage HK97 gp15 [120, 121]. The phage T4 Sp protein is known to inhibit T4 lysozyme, which is required for degradation of the cell wall peptidoglycan layer in E. coli [122]. Host proteins required for phage DNA translocation are thought to be other targets of Sie proteins, such as the phage T4 Imm protein [122, 123] or the mycobacteriophage Fruitloop gp52 protein, which interacts with the host Wag31 and inhibits infection by Wag31-dependent phages [124]. Sie systems with unknown targets were described for Lactococcus lactis phage Tuc2009 [125], S. enterica phage P22 [126], V. cholerae phage K139 [127], E. coli phage P1 [128], and P. aeruginosa B3-like phages [129]. Although Sie systems are considered primarily a means of competition between phages [130], they provide benefits for the host and can eventually become an integral part of the chromosome, as in the case of the DicB protein encoded by the cryptic prophage Qin in E. coli [131].

DNA MODIFICATION-BASED IMMUNITY SYSTEMS

Once the phage genome is inside the cell, it can be targeted by a variety of enzymes that cause its degradation. For example, the RecBCD nuclease/helicase complex, which is also involved in host DNA repair, targets the free DNA ends that are exposed at the early stages of infection by dsDNA phages with linear genomes [132, 133]. Most often, degradation of the incoming DNA is performed by innate and adaptive immunity systems. Modification of the host DNA, which is required for discrimination between the cell’s own and foreign genetic material, is a hallmark of the R-M systems. The modification module is responsible for epigenetic labeling of the host DNA, while the non-labeled phage DNA is subject to endonucleolytic cleavage by the restriction module [134, 135]. The general principle of the R-M mechanism is depicted in Fig. 3. In addition to the classical R-M systems, there is a plethora of defense systems that encode a modification module but whose mechanisms of restricting foreign genetic material have not yet been determined.

Fig. 3. Principle of classical R-M systems. MTases of the Type I-III systems modify specific motifs in the host DNA, while non-methylated sites in the foreign DNA are cleaved by REases. Type IV REases lack a cognate MTase and cleave DNA modified by the viral MTase.

Classical R-M systems. R-M systems were discovered in the early 1950s during studies of the phenomenon of host-controlled viral modification [136, 137]. They were extensively studied during the early years of molecular biology, which culminated in their wide application and the rise of recombinant DNA technologies [138]. More than 300,000 known or putative R-M enzymes are currently listed in REBASE, and R-M systems have been found in ~90% of sequenced bacterial and archaeal genomes [30]. The functional subunits of R-M systems include a methyltransferase (MTase), which transfers a methyl group from the donor molecule S-adenosyl methionine (SAM) to a cytosine or adenine in DNA, and a cognate restriction endonuclease (REase). Some systems also encode a translocase, which uses the energy of ATP hydrolysis for motor functions, and a specificity subunit containing target recognition domains (TRDs) that define the sequence specificity of the REase and MTase. Based on subunit composition, cofactor requirements, and mode of action, R-M systems are divided into four types. However, this classification does not reflect their evolutionary relationships [139, 140]. The subunit composition of the modification and restriction complexes, some recognition sites, and the cleavage patterns of the Type I-IV R-M systems are presented in Fig. 4.

Fig. 4. Functional subunits, recognition sites, and composition of the modification and restriction complexes for representative members of the Type I-IV R-M systems. Type II R-M systems usually recognize palindromic sites, and both DNA strands are cleaved within or in close proximity to the non-methylated sites. Type I R-M systems modify both strands of bipartite asymmetric DNA sites; cleavage requires interaction of two restriction complexes bound to non-methylated sites, which is achieved through ATP-dependent DNA looping, and occurs at a non-fixed position between the sites. Type III R-M systems modify only one strand of asymmetric recognition sites; cleavage occurs at a fixed position from one recognition site, when the restriction complex bound to a non-methylated site interacts with another complex that was activated by recognition of a nearby non-methylated site in the inverted repeat orientation. Type IV R-M systems lack a modification module and cleave DNA after recognition of modified sites. Dashed lines indicate that the subunit may be dispensable for the depicted activity.

Type II R-M systems are the most studied group. Systems of this type normally comprise separate MTase and REase proteins. The MTase is monomeric, while the REase acts as a homodimer. Typically, both cognate enzymes recognize the same specific 4-8 bp long palindromic DNA site. DNA cleavage occurs on both DNA strands at a fixed position within or in close proximity to the non-methylated recognition site and depends on the presence of divalent cations, in most cases Mg2+ [141, 142]. The MTase efficiently methylates non-methylated sites as well as the hemi-methylated sites that are produced by replication of fully methylated DNA, while the REase has low binding affinity for methylated and hemi-methylated sites [143]. In addition to this simple mode of action, which is characteristic of the IIP subtype, enzymes of other subtypes can display unusual features [139]. For example, the Type IIA, IIS, and IIL enzymes recognize asymmetric sequences; the REase and MTase polypeptides of the Type IIC and IIL enzymes are fused; and the Type IIE and IIF REases require binding to two sites for cleavage. Enzymes of the IIL and IIG subtypes (e.g., MmeI and Eco57I) recognize asymmetric sites that are methylated on only one DNA strand [144, 145]. The second strand of the Eco57I sites is methylated by an additional methyltransferase [145]. It is not clear how post-replicational cleavage of the non-methylated sites in IIL systems is avoided. Excessive REase expression is toxic to the cell, while an excess of MTase can lead to methylation of incoming phage genomes and restriction evasion. Thus, expression of the MTase and REase genes must be tightly coordinated, for example, through the activity of the controller C-protein [146, 147] or through MTase binding to and/or methylation of its own promoter [148].
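The simple Type IIP logic described above (a palindromic site that is cleaved unless it is methylated) can be sketched in a few lines. This is only an illustration under stated assumptions: the well-known EcoRI site GAATTC is used as an example that does not appear in the text, methylation is modeled as a plain set of protected site positions (hemi-methylated and fully methylated sites are treated alike), and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A site is palindromic if it reads the same on both strands."""
    return site == reverse_complement(site)


def find_sites(dna, site):
    """Return 0-based start positions of `site` in `dna` (overlaps allowed)."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def cleavable_sites(dna, site, methylated_positions):
    """Sites the REase would cut: all occurrences that are not methylated."""
    return [p for p in find_sites(dna, site) if p not in methylated_positions]


# Illustrative sequence with two GAATTC sites (positions 2 and 12).
dna = "TTGAATTCAAGGGAATTCTT"
site = "GAATTC"
all_sites = find_sites(dna, site)
# Assumption for the example: the MTase has protected only the first site.
unprotected = cleavable_sites(dna, site, methylated_positions={2})
```

For a palindromic site, a single scan of one strand is sufficient, because every occurrence on the top strand is also an occurrence on the bottom strand at the same position.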

The Type I R-M systems encode an MTase (HsdM), an REase (HsdR), and a specificity subunit (HsdS); the most studied example of this kind of system is EcoKI. These enzymes function as HsdM2-HsdS1-HsdR2 complexes, which can perform both restriction and methylation, while methylation can also be performed by the HsdM2-HsdS1 or HsdM2-HsdS1-HsdR1 complexes [149-151]. Bipartite DNA sites whose two halves are separated by a degenerate spacer (AACNNNNNNGTGC for EcoKI) are recognized by two TRDs in the HsdS subunit, and both DNA strands of these asymmetric sites are methylated. The mechanistic model of restriction is well developed: after recognition of a non-methylated site by the restriction complex, the ATPase motor function of the HsdR subunit is activated and the complex pulls on the bound DNA in both directions, creating loops [152]. Translocation consumes about 3 ATP molecules per nucleotide [153]. Cleavage occurs when two restriction complexes anchored at different sites collide, or when one of the complexes encounters a roadblock (a replication fork or a supercoiled region) [154]. The position of DNA cleavage is not fixed, and cleavage usually occurs between two neighboring recognition sites [155]. SAM is required not only as a donor of methyl groups but also as a cofactor for the restriction complex. Enzymes of the Type ISP subclass are single polypeptides that combine methylation and restriction activities and methylate only one DNA strand [156, 157]. To lower the risk of host DNA damage, the activity of the Type I complexes can be additionally controlled, for example, by ClpXP-mediated proteolytic cleavage of the HsdR subunit, a phenomenon known as restriction alleviation [158, 159]. The Type I enzymes are also known to alter their sequence specificity through phase variation of their TRDs [160].
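Because a bipartite Type I site such as the EcoKI site quoted above is asymmetric, it has to be searched for on both DNA strands. The following minimal sketch does this with a regular expression; the function names, the coordinate convention (0-based position of the leftmost base on the top strand), and the test sequence are assumptions made for illustration only.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Only the symbols needed for the EcoKI site are mapped here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def site_to_regex(site):
    """Compile a site with N wildcards into a regex; the lookahead lets
    overlapping occurrences be reported as well."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in site) + "))")


def find_bipartite_sites(dna, site="AACNNNNNNGTGC"):
    """Return sorted (position, strand) pairs for all occurrences of `site`.

    Positions are 0-based coordinates of the leftmost base on the top strand;
    strand is "+" if the site reads 5'->3' on the top strand, "-" otherwise.
    """
    pattern = site_to_regex(site)
    n, k = len(dna), len(site)
    hits = [(m.start(), "+") for m in pattern.finditer(dna)]
    rc = reverse_complement(dna)
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Made-up sequence carrying one site instance (AAC GTACGT GTGC) at position 2.
top = "GGAACGTACGTGTGCTT"
hits_top = find_bipartite_sites(top)
# The same duplex read from the other strand: the site is now on the "-" strand.
hits_bottom = find_bipartite_sites(reverse_complement(top))
```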

The Type III R-M systems are in many aspects similar to the Type I systems [161]. They function as multiprotein complexes consisting of Mod and Res subunits. DNA modification is performed by the Mod2 homodimer, while the Res2-Mod2 or Res1-Mod2 complexes serve as ATP- and SAM-dependent REases [162, 163]. The Type III enzymes recognize short non-palindromic DNA sequences and methylate only one DNA strand. Thus, as with the Type ISP enzymes, half of their sites become completely non-methylated after replication. To ensure that the cell’s own DNA is not subject to restriction, two sites in inverted (head-to-head or tail-to-tail) orientation are required for cleavage, i.e., two neighboring non-methylated sites must be located on different DNA strands, a situation that is not normally encountered in host DNA [164, 165]. Recognition of a non-methylated DNA site activates the translocase activity of the Res subunit, but in contrast to the Type I enzymes, it consumes much less ATP and, instead of bidirectional looping, triggers one-dimensional diffusion along the DNA [166, 167]. Cleavage occurs at a fixed position from one of the recognition sites, when the activated restriction complex interacts with another complex bound to a non-methylated site. Expression of the Mod subunit can be regulated through phase variation [168].
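The reasoning behind the requirement for two inversely oriented sites can be made explicit with a toy model. The sketch below assumes the EcoP15I site CAGCAG as a familiar Type III example (it is not mentioned in the text), ignores the distance between sites and the difference between head-to-head and tail-to-tail arrangements, and uses hypothetical function names. It shows that after replication of fully modified host DNA, the non-methylated sites in each daughter duplex all share one orientation, so no cleavage-competent pair arises, whereas unmodified phage DNA carrying sites in both orientations is a target.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_oriented_sites(dna, site):
    """Return (position, strand) for each occurrence of an asymmetric site.

    "+" means the site reads 5'->3' on the top strand, "-" on the bottom.
    """
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


def has_inverted_unmethylated_pair(sites, methylated):
    """True if non-methylated sites occur in both orientations (toy criterion
    for cleavage; real enzymes also depend on how the sites are arranged)."""
    orientations = {strand for pos, strand in sites if (pos, strand) not in methylated}
    return orientations == {"+", "-"}


def daughter_methylation(sites, parental_strand):
    """After replication, only sites whose single methyl mark sits on the
    retained parental strand remain methylated."""
    return {s for s in sites if s[1] == parental_strand}


# Made-up sequence with one site on each strand (CAGCAG at 2, CTGCTG at 12).
dna = "TTCAGCAGAAAACTGCTGTT"
sites = find_oriented_sites(dna, "CAGCAG")

phage_cut = has_inverted_unmethylated_pair(sites, methylated=set())
host_cut = has_inverted_unmethylated_pair(sites, methylated=set(sites))
daughter_cuts = [
    has_inverted_unmethylated_pair(sites, daughter_methylation(sites, strand))
    for strand in ("+", "-")
]
```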

To avoid cleavage by host REases, phages can incorporate modified bases into their genomes [169], and the Type IV R-M systems evolved in response to specifically target modified DNA [170, 171]. This is a divergent and poorly studied group of solitary REase proteins that lack a cognate MTase. The subtype IIM enzymes also recognize methylated bases and are considered Type IV by some authors [171]. Type IV REases usually have broad sequence specificity and can target DNA containing methylcytosine (McrA), methyladenine (Mrr), or phosphorothioate modifications (ScoMcrA) [172-174]. Some enzymes of this group require ATP or GTP hydrolysis and more than one site for cleavage (McrBC or SauUSI) [175, 176]. The abundance of these proteins and their ecological importance are likely underestimated.

In addition to their role in phage resistance and control of horizontal gene transfer (HGT), R-M systems influence other biological processes [177]. For example, MTase genes are often found without cognate REase genes, and such orphan enzymes are thought to be involved in the regulation of gene expression or replication. The best characterized examples include the Dam MTase in E. coli and CcrM in Caulobacter crescentus [178, 179]. R-M systems may also be considered selfish TA elements, since the loss of an MTase gene can lead to post-segregational killing associated with DNA damage elicited by the REase [180]. The evolutionary and ecological roles of R-M systems have been addressed in several reviews [76, 181-183].

Phage growth limitation (Pgl) system. Pgl may represent a unique example of a mode of action that is the reverse of classical R-M systems: modified DNA is restricted, but, unlike in the case of the Type IV R-M systems, modification of the phage genome is carried out by the host defense system itself. The Pgl defense was first described during isolation of the ϕC31 phage infecting Streptomyces coelicolor A3(2) [184], and it was later shown that phage progeny was released from Pgl+ cells after the first round of infection, but subsequent rounds of infection of the host were restricted [185]. It was suggested that the initially released phages bore Pgl-specific modifications. The Pgl system has been found only in actinomycetes. It is assumed that such altruistic behavior can be afforded by multicellular bacteria that sacrifice one compartment to protect the whole mycelium. At the same time, the Pgl mode of action might be beneficial in competition with closely related Pgl-deficient cell types, since Pgl+ cells produce modified phages that are able to infect other hosts but not Pgl+ cells [186]. The Pgl phenotype has an additional benefit. In the case of classical R-M systems, erroneous methylation of the phage genome often leads to the emergence of protected phage progeny, which is able to wipe out the bacterial population. In contrast, the reverse mode of action characteristic of Pgl ensures that no escaper phages can emerge in the course of infection (Fig. 5).

Fig. 5.

a) Direct mode of R-M action: erroneous methylation of the phage DNA during infection of cells bearing a classical R-M system (RM+) leads to production of phage progeny that can continue to infect the RM+ host efficiently. The modification can be lost only after phage passage through R-M-deficient cells (RM–). b) Reverse mode of R-M action suggested for the Pgl system: after the first round of infection, Pgl+ cells produce Pgl-modified phage, which is restricted in Pgl+ cells during the second round of infection; Pgl-modified phage can efficiently infect Pgl-deficient hosts.
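The two modes of action compared in Fig. 5 can be written down as a toy state model. The sketch below is only an illustration: the function names, the boolean encoding of the phage modification state, and the assumption that a Pgl-modified phage loses its modification after passage through a Pgl-deficient host (by analogy with classical R-M) are hypothetical and are not taken from the cited studies.

```python
from typing import Callable, List, Optional, Tuple

# (infection succeeds, progeny carries the host modification or None if no progeny)
Outcome = Tuple[bool, Optional[bool]]


def infect_classical_rm(phage_modified: bool, host_rm: bool,
                        erroneous_methylation: bool = False) -> Outcome:
    """Direct mode: unmodified DNA is restricted, modified DNA is protected."""
    if not host_rm:
        return True, False            # modification is lost on an RM- host
    if phage_modified or erroneous_methylation:
        return True, True             # progeny inherit the protective mark
    return False, None                # unmodified phage is restricted


def infect_pgl(phage_modified: bool, host_pgl: bool) -> Outcome:
    """Reverse mode: the host marks the phage, and marked phage is restricted."""
    if not host_pgl:
        return True, False            # assumption: mark is lost on a Pgl- host
    if phage_modified:
        return False, None            # Pgl-modified phage is restricted
    return True, True                 # first round succeeds, progeny are marked


def passage(infect: Callable[[bool, bool], Outcome], hosts: List[bool],
            phage_modified: bool = False) -> List[bool]:
    """Pass a phage lineage through successive hosts until it is restricted."""
    results = []
    for host_has_system in hosts:
        ok, progeny_mark = infect(phage_modified, host_has_system)
        results.append(ok)
        if not ok:
            break
        phage_modified = bool(progeny_mark)
    return results
```

In this sketch, a single erroneous methylation event in the classical model yields a lineage that passes every later round on RM+ hosts, whereas in the Pgl model any lineage that completes one round on a Pgl+ host is blocked in the next round on the same host type.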

The Pgl system encodes four components: PglX, an adenine methyltransferase; PglY, an ATPase; PglW, a protein kinase; and PglZ, an alkaline phosphatase (Fig. 6a) [187-189]. All four proteins are required for defense, and activity of the first three components has been demonstrated in vitro [190]. Deletion of the pglZ gene is impossible in the presence of functional pglX. Thus, it was suggested that the proteins encoded by these genes form a TA pair and that PglX plays a critical role in restriction when its activity is unrestrained by PglZ [190]. The mechanisms by which the Pgl system senses phage infection and by which its restriction module functions have not yet been determined.

BacteRiophage EXclusion systems (BREX). Global analysis of the distribution of the pglZ gene in defense islands has shown that it is found not only in Actinomyces and that it is often embedded in conserved gene clusters distinct from Pgl [18]. It was assumed that these pglZ-containing clusters represent a novel superfamily of phage defense systems denoted BREX [191]. Based on the composition of components, the BREX systems have been classified into six types, and Pgl has been assigned to the Type II BREX (Fig. 6a). Besides the presence of PglZ, a common feature of BREX systems is the presence of an ATPase and a methyltransferase. In the Type IV systems, the latter is replaced by a PAPS reductase, an enzyme that can be involved in DNA phosphorothioation [192]. The most prevalent is the Type I BREX, and systems of this type have been experimentally investigated in B. subtilis, E. coli, and V. cholerae (where it has been found in the SXT conjugative elements) [191, 193, 194]. Activity of the BREX methyltransferase has also been demonstrated in Lactobacillus casei [195]. The core components of Type I BREX systems include BrxX (PglX), an adenine-specific methyltransferase; BrxZ (PglZ), an alkaline phosphatase; BrxC, an ATPase; BrxL, a Lon-like protease; and BrxB, a small protein of unknown function. These predicted activities have not been verified in vitro, and some large domains of the BREX proteins have not yet been assigned a function. Additional small proteins (e.g., BrxA) that presumably play a regulatory role or are required to confer protection from specific phages can also be present [191, 193].

Fig. 6.

a) Functional characteristics of subunits in different types of BREX and DISARM systems. The order of components in the scheme does not always reflect the actual organization of genes in the operons. b) PT modification-based systems and phosphorothioate modification of the DNA backbone; due to the transient nature of PT modification, only a small proportion of the sites in the genome are actually modified, and Dnd motifs can remain hemi-modified.

Similar to the classical R-M systems, the BREX systems were shown to methylate the cell’s own DNA, and the presence of BREX-specific modifications in the phage genome allows the phage to overcome the defense [191, 193]. BREX sites are non-palindromic and are methylated on only one strand, which, similar to the Type III and Type ISP R-M systems, might imply that multiple sites in a specific orientation are required for restriction. BREX acts at the early stages of phage infection and prevents accumulation of phage progeny DNA inside BREX+ cells. Yet, the mechanism of restriction remains unknown. The E. coli BREX defense is suppressed by the phage T7 DNA mimic protein Ocr [46], which is a well-known inhibitor of Type I R-M systems [151, 196, 197]. This result suggests common mechanistic features between BREX and the multisubunit complexes of R-M systems.

Defense Island System Associated with Restriction–Modification (DISARM) systems. Following the discovery of BREX, mining of conserved gene clusters in defense islands resulted in the prediction of another novel system, DISARM [198]. Antiviral activity was demonstrated for the DISARM system from B. paralicheniformis. It comprises five components: the DrmA helicase; DrmB, a protein containing the DUF1998 domain of unknown function; DrmC, which contains a phospholipase D (PLD) domain; DrmE; and the cytosine-specific methyltransferase DrmMII [198]. This composition is typical for the class 2 DISARM, while in the more abundant class 1, DrmMII is substituted with DrmMI, an adenine methyltransferase, and DrmE is substituted with DrmD, an SNF2-like helicase (Fig. 6a). PLD domains can be involved in the catalytic activity of nucleases [199]. Yet, surprisingly, DrmC was shown to be dispensable for the DISARM-mediated defense against phages. DrmMII alone can methylate symmetric (CCWGG) sites in the host DNA, and deletion of the methyltransferase gene in the presence of the full DISARM cluster is toxic to cells. Yet, in contrast to the classical R-M phenotype, phage ϕ3T with the DISARM-specific methylation was unable to infect DISARM+ cells, suggesting that methylation is necessary but not sufficient to avoid restriction [198]. Similar to BREX, DISARM does not affect phage adsorption and inhibits early stages of infection by an unknown mechanism.
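The difference between a symmetric site such as CCWGG and a non-palindromic site of the kind recognized by BREX can be checked computationally: a motif is palindromic when it equals its own reverse complement under the IUPAC ambiguity codes. The following is a minimal sketch; the function names are hypothetical, and GGTAAG is used purely as an example of a non-palindromic hexamer rather than as a site taken from the text above.

```python
import re

# Complement of each IUPAC nucleotide code (W = A/T, S = C/G, R = A/G, Y = C/T, ...)
IUPAC_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C", "W": "W", "S": "S",
    "R": "Y", "Y": "R", "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}

# Regular-expression character class for each IUPAC code
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "S": "[CG]",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
    "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(motif: str) -> str:
    """Reverse complement of a motif written with IUPAC codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif.upper()))


def is_palindromic(motif: str) -> bool:
    """True if the motif reads the same on both strands."""
    return motif.upper() == reverse_complement(motif)


def _positions(seq: str, motif: str) -> list:
    # A lookahead is used so that overlapping occurrences are also reported.
    pattern = "".join(IUPAC_REGEX[base] for base in motif.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def find_sites(seq: str, motif: str) -> list:
    """Return sorted (position, strand) pairs; positions are 0-based on the top strand."""
    if is_palindromic(motif):
        return [(pos, "both") for pos in _positions(seq, motif)]
    hits = [(pos, "+") for pos in _positions(seq, motif)]
    hits += [(pos, "-") for pos in _positions(seq, reverse_complement(motif))]
    return sorted(hits)
```

For a palindromic motif, every occurrence is present on both strands, so methylation of both strands is possible at each site. For a non-palindromic motif, each occurrence has a defined strand and orientation, which is the property that could make the relative orientation of sites relevant for restriction.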

7-Deazapurine in DNA (DPD) systems. Besides DNA methylation, 7-deazaguanine-based modifications can also be coupled with R-M-like defense systems [200]. Multiple enzymes are involved in the generation of 7-deazaguanine, which usually serves as a precursor of modified bases in tRNA. Some prokaryotes encode additional biosynthetic gene clusters responsible for the introduction of 7-deazaguanine into DNA [200]. Such DPD systems (from 7-deazapurine in DNA) can consist of up to 10 components (DpdA-K) [200, 201]. The R-M-like activity of the DPD system was suggested based on inhibition of transformation of a non-modified plasmid into cells of Salmonella Montevideo carrying a dpd cluster [200]. Activity of the DPD system against phage infection has not been demonstrated so far, and the possible restriction mechanism remains unclear. Auxiliary DPD components include helicases, a ParB-like NTPase, and a PLD nuclease, which may be involved in the restriction of unmodified DNA. Interestingly, similar 7-deazaguanine modification clusters have been identified in some viral genomes (e.g., in phages 9g or Cajan), where the modification was shown to protect phage DNA against a wide spectrum of REases [202, 203].

Phosphorothioate (PT) modification-based systems. While the modifications discussed so far affect only nucleobases, the DNA sugar-phosphate backbone can also be modified. Replacement of a non-bridging oxygen with a sulfur atom, which forms a phosphorothioate internucleotide linkage (PT modification), can be associated with different defense systems in Bacteria and Archaea [204-206]. These systems are summarized in Fig. 6b.

The PT modification is introduced by the products of the dndABCDE genes (Dnd is a phenotype associated with DNA degradation), which encode the DndA cysteine desulfurase, the DndC PAPS reductase, the DndD ATPase/nicking endonuclease, and DndE, a small protein that binds nicked DNA, while DndB regulates transcription of the dndBCDE operon and determines the proportion of PT-modified sites in the genome [204, 207-209]. Not all stages of the biochemical pathway of PT modification have been determined, but it is known that cysteine serves as the donor of the sulfur atom, which is transferred to DndC and then incorporated in an energy-dependent manner into DNA that has previously been nicked at specific sites by DndD [192, 210, 211]. Recently, it was shown that the dnd genes could also be involved in the PT modification of RNA [212]. On its own, the PT modification is thought to be involved in the maintenance of redox homeostasis and control of gene expression [213], but the Dnd modification module in Bacteria is often accompanied by the dndFGH restriction gene cluster [214]. In vitro activity of the DndFGH components has not been investigated, but the presence of the dndABCDE-FGH cluster in vivo inhibits transformation with non-modified DNA, while DndFGH expression in a strain without DndABCDE leads to cleavage of the cell’s own DNA [214-216]. The most prominent feature of PT modification, which distinguishes it from R-M methylation, is that only a small proportion of available sites are modified and that modification of each specific site is transient, which raises the question of how self-targeting is avoided [217, 218]. The presence of Dam methylation affects the distribution of the PT-modified sites but does not affect their overall density [219]. It was further suggested that PT modification specificity could be defined by the overall geometry of the DNA site, rather than its sequence [219].

The PT modification has also been found in Archaea, where the restriction function is performed by the pbeABCD gene cluster instead of dndFGH [205]. The dndCDEA-pbeABCD cluster from Haloterrigena jeotgali was shown to provide antiviral defense, and the restriction activity was dependent on a functionally active PT modification module, which is distinct from the behavior of dndFGH [205]. Accumulation of viral DNA was not observed inside infected cells carrying dndCDEA-pbeABCD, though its cleavage has not been demonstrated either. The pbeABCD genes can be found alone or adjacent to methyltransferase genes, which implies a possibility of module exchange between different defense systems [205].

Recently, another novel PT modification-based defense system, SspABCD-SspE, has been discovered [206]. The sspABCD genes are not homologous to dndABCDE but encode similar functional domains and perform PT modification of DNA, while SspE serves as the restriction component that inhibits phage infection. In vitro, SspE was shown to possess an NTPase activity, which was stimulated by the presence of PT-modified sites, and a non-specific nicking endonuclease activity [206]. A distinctive feature of the SspABCD system is modification of only one DNA strand within non-palindromic recognition sites.

The description of other prokaryotic defense strategies and the discussion of the interplay between different antiviral systems will be continued in the second part of the manuscript [Biochemistry (Moscow), vol. 86, Issue 4].