INTRODUCTION

The genus Mycobacterium includes both pathogenic and free-living species. The M. tuberculosis complex and M. leprae are the most important pathogens, causing tuberculosis and leprosy, respectively, while the M. avium complex, M. abscessus, M. ulcerans, and M. fortuitum are opportunistic pathogens in humans. Many mycobacteria are free living. The species M. neoaurum is considered a producer of steroid compounds for the pharmaceutical industry [1].

Mycobacterium tuberculosis causes tuberculosis, which is a widespread and dangerous infectious disease and is among the ten main causes of mortality in the global population. Mycobacterium tuberculosis can survive and reproduce within macrophages and other immunocompetent cells, causing damage to lymphoid tissue, the lung, and other organs [2, 3]. Tuberculosis requires long-term therapy, and the spread of broadly resistant and multidrug-resistant strains is a global medical problem. Targeted changes introduced in the microbial genome make it possible to investigate the relationship between the genotype and physiological features of microorganisms and to understand their molecular interactions with the environment and the host organism. The issue is of particular importance because the functions of a substantial part of M. tuberculosis genes remain unknown although their nucleotide sequences have been established [4, 5].

Genetic manipulations are rather difficult to perform in mycobacteria. Species of the M. tuberculosis complex grow slowly; it takes from 7 days to 3 weeks for them to produce colonies on solid media. Mycobacteria possess a hydrophobic cell wall, which consists of lipids, mycolic acids, polysaccharides, and peptidoglycans. These components allow mycobacteria to grow as a dense film and prevent efficient DNA transfer. Molecular genetic methods were nevertheless developed to transform mycobacteria with exogenous DNA, thus increasing the opportunity to modify their genome [6, 7].

Genome editing methods make it possible to change the mycobacterial genome in a targeted manner. It is important to construct mutant strains with knockouts in particular genes, conditional knockdowns, or overexpression of genes of interest in order to perform functional studies and to identify targets for new drugs [8–10]. Introduction of single-point amino acid substitutions allows studies of drug resistance, which may develop as a result of such mutations in the genes that code for drug targets or enzymes responsible for drug modification within the cell, including regulatory gene regions [11].

Successful genome editing improved the protective properties of an M. bovis bacille Calmette–Guerin (BCG) strain, which often fails to ensure efficient protection, especially in the adult population [12, 13]. A BCG strain was used to design the VPM1002 vaccine, which has shown success in clinical trials [14]. The VPM1002 recombinant strain was obtained by deleting the ureC urease gene and integrating Listeria monocytogenes hly, which codes for listeriolysin O. Better protective properties are characteristic of the vaccine strain because listeriolysin O facilitates cell exit from the phagosome at lower pH, while the urease gene deletion prevents mycobacteria from increasing the alkalinity of their environment [15]. A recombinant live vaccine was also created on the basis of M. tuberculosis to have a broader range of protective antigens, thus being advantageous over BCG. To achieve this, vaccine strain MTBVAC was constructed via consecutive unmarked deletions of two genes, fadD26 and phoP, which are important for bacterial virulence [16]. The new vaccine has already entered phase II clinical trials.

The review describes the main approaches to genome editing in mycobacteria with the use of homologous recombination, mycobacteriophage integrases, and the CRISPR/Cas systems.

HOMOLOGOUS RECOMBINATION-BASED GENOME ENGINEERING IN MYCOBACTERIA

Introduction of targeted changes in the genomes of microorganisms is of immense importance for studying the gene functions and constructing strains with desirable properties. A main approach to identifying the function of a gene is constructing its knockdown or knockout with subsequent complementation in the wild-type genome. Homologous recombination provides a means to replace the allele of interest with its mutant counterpart.

A gene knockout is obtained by allelic exchange, which requires two crossover events to occur on both sides of the region to be mutated (Fig. 1). To this end, cells are transformed with a vector that carries a modified target gene allele (e.g., an allele disrupted by insertion of a resistance gene) flanked with homology regions on both sides (Fig. 1a). Systems of this type were used in early allelic exchange experiments in mycobacteria. Only limited success was achieved with their use because cell transformation was inefficient, the homologous recombination rate is low in mycobacteria, and illegitimate recombination occurred at a high rate. Use of linear DNA fragments was assumed to increase the homologous recombination rate, but transformation with a linear fragment most often failed to lead to a replacement of the target region in the genome [4]. The allelic exchange rate was found to be higher with ssDNA vectors (alkaline denaturation products or phagemid ssDNAs) or UV-irradiated ssDNAs in both fast- and slow-growing species [17].

Fig. 1.

Construction of knockouts and other target gene modifications via homologous recombination. (a) Construction of a marker-carrying mutant by homologous recombination. Cells are transformed with a vector that carries a modified gene allele containing a resistance gene. Mutants that have undergone crossover events on both sides of the mutation to be introduced are selected. (b) Construction of an unmarked knockout by site-specific recombination. A construct used for modification contains a modified allele with a resistance gene, which is flanked with recombinase recognition sites. Allelic exchange proceeds in a single step, and the resistance gene is removed with recombinase after selecting the mutants. (c) Two-step construction of unmarked mutants with the use of a suicide vector. The suicide vector is engineered to harbor a modified allele of the target gene and a cassette containing a marker gene linked with a counter selectable marker. After transformation, cells are plated on a medium that contains the respective antibiotic to select the mutants in which the first crossover event has taken place to integrate the vector body into the chromosome. The resulting mutants undergo a second crossover event and are subject to negative selection, where survival is only possible for the mutants that have lost the counter selectable marker together with the selectable marker.

The presence of a marker gene ensuring resistance to a particular antibiotic in the genome of the resulting recombinant strain prevents its further use for biomedical purposes. Also, insertion of an antibiotic resistance gene may exert a polar effect on expression of downstream genes of the operon, thus affecting the resulting phenotype and complicating its characterization. Moreover, integration of the antibiotic resistance cassette in the chromosome excludes the marker from further genetic manipulation [18]. The problems associated with the presence of undesirable antibiotic-resistance genes in constructing serial knockouts can be solved by using the site-specific recombination systems obtained from bacteriophages or transposons (Fig. 1b).

Saccharomyces cerevisiae FLP recombinase can be used to remove the antibiotic resistance genes integrated in the bacterial genome as a result of homologous recombination. The enzyme was shown to be functional in M. smegmatis cells and to mediate site-specific recombination between two FLP recognition target (FRT) sites in the chromosome [19]. When the hyg hygromycin resistance gene is flanked with two FRT sites in the direct orientation, hyg is specifically removed from the chromosome in M. smegmatis mutants expressing the FLP gene [19]. Additional codon optimization in the FLPm gene ensured more efficient function of the system in M. smegmatis and provided the possibility of using the system with slow-growing mycobacteria. For example, approximately 40% of resistant clones lost the hyg resistance cassette after short-term FLPm expression in experiments with an M. bovis BCG strain. Chromosomal DNA sequencing showed that the FRT-hyg-FRT cassette was specifically excised by FLP [20].

The same principle underlies the function of a bacteriophage P1 system. A cassette with a resistance marker is flanked with loxP sites, and short-term expression of Cre recombinase ensures excision of the cassette [21]. Another variant utilizes the system of the γδ transposon. A plasmid carrying a kanamycin resistance cassette flanked with two res sites was used to obtain unmarked deletions in M. smegmatis and BCG cells. Expression of the tnpR resolvase gene of the γδ transposon in mutant strains led to efficient excision of the resistance gene flanked with res sites [18].
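The excision principle shared by the FLP/FRT, Cre/loxP, and γδ resolvase/res systems can be illustrated with a minimal string-level sketch. It assumes two identical recognition sites in the direct orientation and treats recombination as joining the first site to the sequence after the second one, which leaves a single site as a scar. The site, marker, and flanking sequences are hypothetical placeholders and are not the real FRT, loxP, or res sequences.

```python
def excise_cassette(chromosome: str, site: str) -> str:
    """Model recombination between two directly repeated copies of `site`.

    Everything between the two sites (the marker) and one site copy are lost;
    a single site copy remains in the chromosome.
    """
    first = chromosome.find(site)
    if first == -1:
        return chromosome  # no recognition site, nothing to recombine
    second = chromosome.find(site, first + len(site))
    if second == -1:
        return chromosome  # a single site cannot be resolved
    # Keep the sequence up to the end of the first site, then resume after the second site.
    return chromosome[: first + len(site)] + chromosome[second + len(site):]


# Hypothetical placeholder sequences (assumptions for illustration only).
SITE = "CTAGGTACCTAG"            # stands in for FRT, loxP, or res
UPSTREAM = "ATGCCGTTAGC"         # chromosomal flank
MARKER = "TTGACCGGAATCGGCTAA"    # stands in for a resistance gene such as hyg
DOWNSTREAM = "CCGGTTAACGT"       # chromosomal flank

marked = UPSTREAM + SITE + MARKER + SITE + DOWNSTREAM
unmarked = excise_cassette(marked, SITE)
print(unmarked == UPSTREAM + SITE + DOWNSTREAM)
```

The sketch ignores site orientation; with inverted sites, real recombinases invert the intervening segment instead of excising it.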

A system based on endogenous Xer recombinase can also be used to remove the resistance genes. In the system, an antibiotic resistance cassette is flanked with dif sites, which are recognized and resolved by XerC and XerD recombinases. The system does not require additional episomal elements to be introduced and then eliminated, thus providing a very simple and efficient tool [22]. The system was optimized and used to disrupt several genes in various regions of the mycobacterial genome. Such modifications are important for constructing attenuated mycobacterial strains and studying the synergistic effect of genes or the functions of duplicated genes [23].

Two-step allelic exchange to construct unmarked mutants became a classical tool in gene engineering of mycobacteria (Fig. 1c). Using this approach, Parish et al. [28, 29] designed the p2NIL/pGOAL system to facilitate construct assembly for homologous recombination. Vectors of the p2NIL series are used to clone the homology arms or to modify the target gene; vectors of the pGOAL series contain marker cassettes, such as lacZ combined with sacB and an antibiotic resistance gene. Insertion of the marker cassette from pGOAL to p2NIL with a modified target gene yields a suicide vector, which is not replicated in mycobacteria. A knockout or modification is achieved in two steps. At the first step, cells are plated on a medium containing the antibiotic and X-gal to isolate blue colonies, in which the plasmid has been incorporated in the genome via a single crossover event. To isolate the clones that have undergone a second crossover event, cells are plated on an antibiotic-free medium containing sucrose. Because the second crossover event results in a loss of lacZ and sacB, the colonies are white in color and not sensitive to sucrose. The method was used to construct the tlyAplcABC∆ unmarked double mutant [24]. The method is widely used to solve various problems in studying the roles of proteins and small RNAs in the physiology of mycobacteria [25–27].
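The selection logic of the two steps can be summarized in a small sketch. It assumes a simplified model in which a clone is described only by three markers (antibiotic resistance, lacZ, sacB); the class and function names are hypothetical and serve illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    antibiotic_resistant: bool  # carries the resistance gene of the marker cassette
    has_lacZ: bool              # blue on X-gal when present
    has_sacB: bool              # functional sacB makes cells sucrose sensitive


def passes_step1(clone: Clone) -> bool:
    """Step 1: growth on antibiotic + X-gal; blue colonies are picked."""
    return clone.antibiotic_resistant and clone.has_lacZ


def passes_step2(clone: Clone) -> bool:
    """Step 2: growth on sucrose without antibiotic; white colonies are picked."""
    return (not clone.has_sacB) and (not clone.has_lacZ)


untransformed = Clone(False, False, False)
single_crossover = Clone(True, True, True)       # whole suicide vector integrated
double_crossover = Clone(False, False, False)    # vector backbone lost again
sacB_inactivated = Clone(True, True, False)      # spontaneous sacB mutant, vector still present

for name, clone in [("untransformed", untransformed),
                    ("single crossover", single_crossover),
                    ("double crossover", double_crossover),
                    ("sacB inactivated", sacB_inactivated)]:
    print(name, passes_step1(clone), passes_step2(clone))
```

In this model, a clone with spontaneously inactivated sacB survives on sucrose but stays blue, so the lacZ screen rejects it. Plate phenotypes alone do not distinguish a double-crossover mutant from a clone that has reverted to the wild-type allele, so candidate clones still need to be verified, for example, by PCR.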

The sacB suicide gene (Bacillus subtilis) is broadly used to obtain both unmarked and marked mutations in M. tuberculosis. However, the frequency of spontaneous sacB inactivation may be almost the same as the recombination rate of certain genes, thus complicating mutant selection. Double counter selection (sacB with Escherichia coli galK) made it possible to achieve nearly 100% selection efficiency in mycobacteria [28]. The principle was used in optimized suicide vectors (the pKO series), which are part of a uniform cloning platform for genetic manipulations in mycobacteria [10].

Replicating vectors can increase the likelihood of homologous recombination, but are more difficult to remove. Plasmids with a temperature-sensitive origin of replication are capable of replication at a permissive temperature (30–32°C), but are lost rapidly at a higher temperature (39–42°C). A counter selectable marker, such as sacB, should also be included in a plasmid designed for homologous recombination. A drawback of the method is that the cell growth rate is low at a permissive temperature [8].

Another system includes two pAL5000-based replicating plasmids, one containing repA (primase) and the other, repB (a DNA-binding protein). The genes complement each other in trans and ensure plasmid replication [29]. When antibiotic pressure is removed, one or both of the plasmids are lost and replication becomes impossible. The approach makes it possible to increase the time for allelic exchange.

A method of specialized transduction ensures almost 100% efficiency of DNA delivery into cells. A vector is assembled from two components, a cosmid vector with a sequence for allelic exchange and a conditionally replicating shuttle phasmid, which is derived from broad host range bacteriophage TM4 [30]. Mycobacteriophages are propagated in M. smegmatis at a permissive temperature (30°C), which ensures phage replication. A mycobacterial strain of interest is then transduced and cultured at a restrictive temperature (37°C), which inhibits phage replication. The mutant allele contains a resistance gene flanked with resolvase recognition target sites, and transient expression of resolvase (tnpR) eliminates the marker. The method was successfully used in BCG [31] and M. tuberculosis [32] strains. More recently, the method was improved and tested by generating many single or multiple deletion substitutions in a targeted manner [33, 34].

Phage recombination proteins, such as Exo, Beta and Gam, or RecE and RecT, help to increase the homologous recombination rate and allow shorter homology arms to be used. The products of gp60 and gp61 of mycobacteriophage Che9c were shown to be homologous to RecE and RecT and to possess exonuclease and DNA-binding activities. The genes were used to construct pJV53 for homologous recombination in mycobacterial cells. Recombination occurs at a homology arm length of 50 bp, but is more efficient when the length exceeds 500 bp. The system makes it simpler to construct mutants in M. smegmatis and helps to overcome the illegitimate recombination effect in M. tuberculosis [35]. The green fluorescent protein gene (gfp) inserted in pJV53 facilitates verification of plasmid loss. With a hygromycin resistance cassette with dif sites, the system can be used to consecutively delete several genes in M. smegmatis [36]. A three-plasmid system was also designed to construct unmarked knockouts. In the system, a helper plasmid carries a temperature-sensitive origin and the sacB suicide gene [37]. The pYS2 plasmid is used to engineer the mutant allele, which is modified to include a cassette with the hygromycin resistance and green fluorescent protein genes flanked with loxP sites. A linear substrate is obtained from the plasmid and used to transform cells. The pYS1 plasmid, which is a pJV53 derivative, is used to deliver gp60 and gp61 under an inducible promoter. The plasmid additionally harbors the sacB counter selectable marker, a temperature-sensitive origin of replication, and the kanamycin resistance gene. After recombination, cells are transformed with a third plasmid, pML2714, to express Cre recombinase and to excise the cassette. Mutant clones lose fluorescence and hygromycin resistance as a result.

Phage Che9c recombinases were used to develop a system for introducing point mutations in mycobacterial genomes with the use of ssDNA. Recombination mediated by gp61 was efficient enough to introduce point nucleotide substitutions without performing direct selection; mutant strains were identified by PCR. However, the laborious screening procedure to select target clones limits the application of the system [38].

Thus, various homologous recombination methods are successfully employed in gene editing in mycobacteria. The approaches that combine homologous recombination for allelic exchange with site-specific recombination for subsequent elimination of the resistance genes make it possible to design simple and efficient editing systems, including the use of specialized transduction. Recombination engineering with RecE and RecT of phage Che9c ensures a manifold increase in system efficiency.

MYCOBACTERIOPHAGE RECOMBINASES IN GENE ENGINEERING OF MYCOBACTERIA

Site-specific recombination is a genetic recombination type where DNA strands are exchanged in a region between certain sequences. In contrast to homologous recombination, these specific DNA sites lack extended homology regions. To drive recombination, recombinase recognizes and binds the specific sequence to form a synaptic complex, which catalyzes chromosomal DNA cleavage with subsequent rearrangement and ligation of the cleavage site ends [39].

Site-specific recombination often leads to integration of one DNA molecule into the other, rather than to exchange of genetic information between the two molecules.

Recombinase can catalyze the reverse reaction of excising the integrated sequence in bacteriophages. Excision proceeds via site-specific recombination in many temperate bacteriophages. Recombinase alone is sufficient for some recombination systems, while additional host factors are necessary for others.

Stable integration of mycobacteriophages into the host genome via site-specific recombination was addressed in many studies [40–45]. Mycobacterial phages L5, Ms6, Bxb1, and ɸRv1 were studied most comprehensively; the mechanism of their integration is considered below. The mycobacterial integrative element pSAM2 was also investigated, being initially identified in Streptomyces.

Temperate phage L5 is the most typical mycobacteriophage and infects both fast- and slow-growing mycobacterial species to produce stable lysogens [45]. The L5 prophage is integrated into a certain chromosome site during the lysogen phase and is excised in the lytic phase. Both integration and excision are catalyzed by phage-encoded Int integrase and require the host-encoded integration host factor (mIHF). The direction of these recombination events is determined by Xis, which is encoded by the phage gene 36. Integration is efficient in the absence of Xis, while excision is Xis dependent [46].

To allow site-specific recombination, a phage attachment site (attP) interacts with a bacterial chromosomal attachment site (attB). The integration reaction was reproduced in vitro and shown to require mIHF and Int [41, 47]. Supercoiling of the attP or attB sequence stimulates integrative recombination, but is not absolutely essential [48].

The minimal sizes of the functional attP and attB sites are approximately 240 and 29 bp, respectively. The attP sequence includes 43 bp that are common for the two sites; strand exchange occurs within this region, and recombination yields the integrated prophage flanked with left (attL) and right (attR) sites (Fig. 2) [49]. The attB site differs in one nucleotide between fast- and slow-growing mycobacteria, but the difference does not substantially affect the capability of mycobacteriophage L5 to efficiently infect mycobacteria of both groups [45].
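The outcome of integrative recombination between attP and attB can be sketched at the sequence level. The example below assumes a circular phage genome written as a linear string, a chromosome written as a linear string, and a short common core shared by the two sites; the 6-nt core and all flanking sequences are hypothetical placeholders (the real common region is 43 bp). Host sequence is written in lowercase only to make the junctions visible.

```python
def integrate(chromosome: str, phage: str, core: str) -> str:
    """Insert a circular phage genome into the chromosome at the shared core.

    The circular phage genome is opened within its core copy, so the prophage
    ends up flanked by two core copies (attL on the left, attR on the right).
    """
    b = chromosome.find(core)
    p = phage.find(core)
    if b == -1 or p == -1:
        raise ValueError("core sequence must be present in both molecules")
    phage_after_core = phage[p + len(core):]   # phage DNA following the core
    phage_before_core = phage[:p]              # phage DNA preceding the core
    return (chromosome[:b] + core + phage_after_core
            + phage_before_core + core + chromosome[b + len(core):])


CORE = "GGTCTC"                       # hypothetical placeholder core
chromosome = "ccaa" + CORE + "ttgg"   # attB region of the host
phage = "ACGT" + CORE + "TGCA"        # circular phage genome carrying attP

lysogen = integrate(chromosome, phage, CORE)
print(lysogen)
```

The hybrid junctions produced in this way correspond to attL (host sequence on the left, phage sequence on the right) and attR (the converse); excision reverses the reaction.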

Fig. 2.

General scheme of site-specific recombination.

Bacteriophage L5 integrates in the vicinity of the 3′ end of the tRNAGly gene without altering its sequence. DNase I footprinting revealed an unusually long 413-bp region that serves as an attP site for L5 integrase. A 252-bp fragment sufficient for efficient phage integration was mapped within the attP site in a subsequent deletion analysis [49].

The L5 phage system was used for the first time to construct new recombinant BCG strains [43]. A DNA segment carrying the attP site and the integrase gene (Int) of mycobacteriophage L5 was used to replace the mycobacterial origin of replication (oriM) in the pMV261 shuttle vector, thus producing the pMV361 vector capable of integration. Because the phage Xis protein is absent, the integrated vector is stably maintained even without antibiotic pressure.

The genome organization is similar in mycobacteriophages Bxb1 and L5 [42]. Phage Bxb1 utilizes serine integrase to integrate into the functional groEL1 gene. There are two groEL genes in mycobacteria. The genes are highly similar (70%) at the nucleotide sequence level, but Bxb1 integrates only into groEL1 because sequence specificity is required for integration. In contrast to many other site-specific recombinases, serine integrases lack a strong specificity to the substrate DNA structure and recombine regions within the same DNA molecule (in the head-to-head or head-to-tail orientation) or between different DNA molecules, affecting supercoiled, linear, and even double-stranded molecules [50].

In contrast to tyrosine integrase sites, the attP and attB sites of serine integrases are short, approximately 50 and 40 bp, respectively. The mechanism of DNA strand exchange also differs. Tyrosine integrases introduce single-strand breaks in DNA and exchange only one strand of each site to produce an intermediate structure similar to a Holliday junction [51], while serine integrases induce double-strand breaks in DNA and exchange the strands via a rotation mechanism [52].

Oligonucleotide-mediated recombineering followed by Bxb1 integrase targeting (ORBIT) was developed by combining two molecular tools, homologous recombination and site-specific integration [53, 54]. Target DNA fragments are inserted in two steps in the ORBIT system. An M. smegmatis or M. tuberculosis strain carrying a plasmid that expresses RecT recombinase of phage Che9c and integrase of phage Bxb1 is cotransformed with a short synthetic DNA oligonucleotide and a nonreplicating plasmid harboring the Bxb1 attB site, an antibiotic resistance gene to allow transformant selection, and a target sequence. The oligonucleotide is designed so that the phage Bxb1 attP site (48 bp) is flanked with regions of 45–70 bp that are homologous to the target chromosomal region.

At the first step, the oligonucleotide carrying the attP acceptor site for Bxb1 integrase is inserted into a necessary region of the mycobacterial genome via homologous recombination driven by Che9c RecT. At the second step, Bxb1 integrase facilitates site-specific recombination between the plasmid attB and the attP site of the oligonucleotide insert obtained at the first step. The sequence of the synthetic oligonucleotide determines the insertion site position, while the plasmid serves as a target sequence donor in this system.
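The layout of the targeting oligonucleotide (homology arm, attP, homology arm) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and coordinates are hypothetical, the attP site is represented by a 48-nt placeholder of N characters rather than the real Bxb1 sequence, and strand choice and attP orientation, which matter in practice, are not modeled.

```python
def design_orbit_oligo(genome: str, del_start: int, del_end: int,
                       attp: str, arm_len: int = 60) -> str:
    """Return left arm + attP + right arm for replacing genome[del_start:del_end].

    Arms are taken from the sequence immediately flanking the target region.
    """
    if not 45 <= arm_len <= 70:
        raise ValueError("homology arms of 45-70 nt are assumed")
    if del_start - arm_len < 0 or del_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genome[del_start - arm_len:del_start]
    right_arm = genome[del_end:del_end + arm_len]
    return left_arm + attp + right_arm


PLACEHOLDER_ATTP = "N" * 48           # placeholder, NOT the real Bxb1 attP sequence
genome = "ACGTTGCAGGCC" * 30          # toy 360-nt sequence for illustration

oligo = design_orbit_oligo(genome, del_start=100, del_end=200,
                           attp=PLACEHOLDER_ATTP, arm_len=60)
print(len(oligo))
```

With 60-nt arms, the resulting oligonucleotide is 168 nt long (60 + 48 + 60).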

Because Bxb1 integrase functions independently of host factors, the success of ORBIT application depends on the efficiency of the first recombination step.

Mycobacteriophage Ms6 forms stable lysogens in M. smegmatis. The Ms6 attP site has a high A+T content and harbors many direct and inverted repeats. Ms6 integration into the host genome is mediated by Ms6 integrase, which targets integration to the 3' end of the tRNAAla gene in both fast- and slow-growing mycobacteria. A 26-bp central region of attP overlaps the 3′ end of the tRNAAla gene, which is conserved in both of the mycobacterial groups [40].

The genomes of M. tuberculosis H37Rv and CDC1551 contain two prophage-like elements, φRv1 and φRv2. The φRv2 element encodes a tyrosine recombinase, while φRv1 encodes a large serine recombinase [55]. Recombination takes place between a putative attP site and the host chromosome; the attB site is within a redundant repetitive element (REP13E12), which occurs in seven copies in the M. tuberculosis genome. It is of interest that both φRv1 and φRv2 are absent from nonvirulent M. bovis BCG strains. Clinical M. tuberculosis isolates do not all have φRv2, but all seem to harbor at least one copy of φRv1 or φRv2. The two related elements presumably play a role in the physiology of M. tuberculosis [44]. The functional character of φRv1 was confirmed by efficient transformation of M. bovis BCG cells with a nonreplicating plasmid that carried the integrase gene and the attP site (reconstructed from the attL and attR sites of the prophage). Four out of the seven REP13E12 sites present in the BCG genome can be used as attachment sites by vectors of this type, and the vectors can simultaneously occupy more than one site [55].

The pSAM2 element is an 11-kb integrative element and was initially characterized in Streptomyces ambofaciens. The pSAM2 recombination system includes integrase of the λ family [56] and the attB/P sites and is similar to the systems described above. The attB site covers the 58-bp region from the anticodon loop to the 3' end of the tRNAPro gene. The attB site is conserved among actinomycetes, including mycobacteria [57]. Thus, pSAM2 is capable of integration in various actinomycete species.

Several advantages are characteristic of the use of site-specific recombination. First, genes are reproducibly integrated into a known site (or at least a limited number of sites) in the mycobacterial genome. Every effect of the integration site (on gene expression and bacterial biology) is equally applicable to any insert. Second, a single gene copy is integrated, thus reducing the number of artifacts due to the use of multicopy plasmids. Third, strains that carry integrative plasmids are usually far more stable than strains that carry episomal vectors. Finally, mycobacteriophage integrases facilitate efficient integration of longer DNA fragments as compared with homologous recombination.

CRISPR/Cas SYSTEMS IN GENE ENGINEERING OF MYCOBACTERIA

CRISPR/Cas systems are prokaryotic systems that consist of clustered regularly interspaced short palindromic repeats (CRISPRs) and CRISPR-associated proteins. The systems are responsible for the adaptive immune response against foreign genetic material [58]. Class II systems include a single effector CRISPR protein, which utilizes CRISPR RNA (crRNA) to recognize and hydrolyze target DNA, and are used to edit various genomes in vitro. The type II-A system of Streptococcus pyogenes and the type V-A system of Francisella novicida are the most common [59, 60]. The S. pyogenes genome editing system includes multidomain RNA-dependent DNA endonuclease SpyCas9, whose specificity is determined by a guide RNA. The guide RNA has a 20-nt region known as the spacer, which is complementary to target DNA (a protospacer), and a structural part, which is recognized by SpyCas9 endonuclease. The guide RNA function may be performed by two RNAs, crRNA (encoded by CRISPRs) and trans-activating crRNA (tracrRNA, which is encoded by a separate gene and is necessary for crRNA processing and SpyCas9 binding), or a single guide RNA (sgRNA), which is a fusion product of the above two RNAs [61]. Correct recognition of the target DNA sequence (the protospacer) requires that a short element known as the protospacer adjacent motif (PAM) with the sequence 5′-NGG-3′ occurs immediately downstream of the target. When PAM is present and the spacer of the guide RNA nearly perfectly matches the protospacer of genomic DNA, DNase activity of SpyCas9 is activated. SpyCas9 produces a double-strand break with blunt ends, the break activates cell DNA repair mechanisms, and a heritable change in genome sequence arises during repair [60]. The V-A type system derived from F. novicida consists of two parts, multidomain RNA-dependent DNA endonuclease FnoCas12a (Cpf1) and a guide RNA, which has a short hairpin recognized by the enzyme and a 23-nt spacer [62]. 
The PAM recognized by FnoCas12a endonuclease has the consensus sequence 5′-BTTV-3′ [63] and should occur upstream of the DNA protospacer, unlike in the type II-A S. pyogenes system [64]. In addition, FnoCas12a produces ends with overhangs when introducing a double-strand break in DNA, in contrast to SpyCas9 [64].
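The different PAM positions of the two systems can be illustrated with a short target-search sketch. It assumes the values given in the text (a 20-nt protospacer followed by 5′-NGG-3′ for SpyCas9; 5′-BTTV-3′ followed by a 23-nt protospacer for FnoCas12a) and scans only the given strand; the function names and test sequences are hypothetical.

```python
import re

# IUPAC degenerate codes used in the PAM consensus sequences.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "B": "[CGT]", "V": "[ACG]", "W": "[AT]"}


def pam_regex(pam: str) -> str:
    """Translate an IUPAC PAM consensus into a regular expression."""
    return "".join(IUPAC[base] for base in pam)


def find_targets(seq: str, pam: str, spacer_len: int, pam_side: str):
    """Return (protospacer_start, protospacer, pam) tuples on the given strand.

    pam_side="3prime": PAM immediately downstream of the protospacer (SpyCas9).
    pam_side="5prime": PAM immediately upstream of the protospacer (FnoCas12a).
    """
    seq = seq.upper()
    pam_re = pam_regex(pam)
    spacer_re = rf"[ACGT]{{{spacer_len}}}"
    hits = []
    if pam_side == "3prime":
        # A lookahead makes overlapping candidate sites visible.
        for m in re.finditer(rf"(?=({spacer_re})({pam_re}))", seq):
            hits.append((m.start(), m.group(1), m.group(2)))
    elif pam_side == "5prime":
        for m in re.finditer(rf"(?=({pam_re})({spacer_re}))", seq):
            hits.append((m.start() + len(pam), m.group(2), m.group(1)))
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    return hits


# Toy sequences (assumptions for illustration only).
cas9_seq = "CC" + "ACGTACGTACGTACGTACGA" + "TGG" + "CC"
cas12a_seq = "CTTA" + "ACGTACGTACGTACGTACGTACG" + "GG"

cas9_hits = find_targets(cas9_seq, "NGG", 20, "3prime")
cas12a_hits = find_targets(cas12a_seq, "BTTV", 23, "5prime")
print(cas9_hits)
print(cas12a_hits)
```

A practical search would also scan the reverse complement and filter candidate spacers for off-target matches; both steps are omitted here for brevity.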

The S. pyogenes CRISPR/Cas system was the first to be used as a heterologous system to edit the mycobacterial genome. However, SpyCas9 endonuclease was found to be toxic to mycobacteria, especially when its gene was expressed under the control of the potent constitutive promoter of hsp60 [65, 66]. dSpyCas9, a form of SpyCas9 endonuclease devoid of nuclease activity, was used in early experiments to inhibit gene expression [67]. It should be noted that the dSpyCas9 gene was expressed under the control of a tetracycline-regulated promoter to avoid toxicity due to dSpyCas9 overproduction. Nuclease SpyCas9 and its derivatives with nickase activity or without nuclease activity were found to be highly toxic to Gram-positive Corynebacterium glutamicum, which has a GC-rich genome [68]. It seems natural to assume that SpyCas9 toxicity in bacteria with GC-rich genomes is associated with nonspecific DNA-binding activity due to the GC-rich PAM. However, the assumption is questionable because SpyCas9 remains toxic when its PAM-binding arginine residues are mutated [66]. Protein–protein interactions may be responsible for SpyCas9 toxicity to a substantial extent. Recent studies with mammalian cells showed that the wild-type SpyCas9 or dSpyCas9 interacts with the Ku78 subunit of the DNA-dependent kinase complex and thus negatively affects DNA repair by nonhomologous end joining (NHEJ) [69], stimulating spontaneous mutagenesis. Natural type II-A CRISPR/Cas9 systems were found to inhibit NHEJ activity in bacterial cells [70]. The findings indicate that SpyCas9 is capable of interacting with cell proteins to distort the normal cell processes. A contribution of other factors to SpyCas9 toxicity cannot be excluded either. For example, toxicity of the CRISPR/SpyCas9 system in C. glutamicum cells was recently reduced by using the potent E. coli rrnB terminator for sgRNA and RecT recombinase of the Rac prophage in combination with a donor DNA fragment as an oligonucleotide [71]. It is possible to assume by analogy that the S. pyogenes sgRNA terminator fails to function in mycobacteria and that mycobacterial cells are incapable of efficiently repairing the DNA breaks introduced by SpyCas9, although they possess three double-strand break repair pathways [72].

In view of CRISPR/SpyCas9 toxicity, alternative systems were sought to edit the mycobacterial genome. A model system with the Renilla luciferase reporter gene was used to study repressor activity of 11 SpyCas9 orthologs that lacked nuclease activity and belonged to type II-A and type II-C CRISPR/Cas systems [66]. A tetracycline-regulated promoter was used to control Cas endonuclease gene expression. The S. thermophilus dSt1Cas9 protein was identified as the most efficient and nontoxic repressor in the system. Experiments with inhibition of mycobacterial gene expression confirmed the results obtained with the model system [66]. Thus, St1Cas9 is a good candidate to be used in experiments with genome editing. However, constitutive expression of the St1Cas9 gene under the control of the hsp60 promoter was found to be toxic to M. smegmatis [65]. Controlled expression of the St1Cas9 gene, for example, from a tetracycline-regulated promoter, is therefore necessary when designing a St1Cas9-based system for genome editing in mycobacteria. Meijers et al. [73] performed such experiments. St1Cas9 toxicity was reduced not only by using a tetracycline-regulated promoter, but also by expressing St1Cas9 from a single copy of its gene, which was inserted in the mycobacterial genome with phage L5 integrase. The optimal PAM sequence for this editing tool is 5′-NNAGAAW-3′ (where degenerate W is A or T) and is extremely rare in the GC-rich mycobacterial genome. However, PAMs with minor differences from the consensus can be recognized by St1Cas9 at higher concentrations in vitro [74]. In agreement with this finding, St1Cas9 was shown to recognize suboptimal PAMs with the sequences 5′-NNGGAA-3′ and 5′-NNAGCAT-3′, which substantially increases the number of potential targets in the M. marinum and M. tuberculosis genomes. St1Cas9 was successfully used to construct target gene knockouts (the efficiency was approximately 50%) and gene deletions (the efficiency was 22%).
Whole-genome sequencing showed that short-term induction of St1Cas9 expression for 1 h generates virtually no off-target mutations. However, undesirable mutations were detected after longer expression (10 days). In addition, the regulated promoter used in the study was found to be leaky; i.e., edited colonies were produced even in the absence of an inducer. In view of the background expression of St1Cas9 and its potential to produce off-target mutations upon long-term expression, it was proposed that the St1Cas9 gene be removed from the mycobacterial genome by replacing it with the reporter gene for the red fluorescent protein tdTomato via homologous recombination [73].
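
The effect of relaxing the PAM requirement can be illustrated with a short script. The following Python sketch is not taken from any of the cited studies; it only counts, on both strands of an arbitrary DNA sequence, the positions that match the consensus PAM 5′-NNAGAAW-3′ and the two suboptimal PAMs mentioned above. The function names and the 20-nt protospacer length are assumptions introduced for illustration.

```python
import re

# IUPAC codes needed for the PAMs quoted in the text (W = A or T, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_pam_sites(seq, pam, spacer_len=20):
    """Return 0-based start positions of PAM matches on the given strand.

    Only matches with at least spacer_len bases on their 5' side are kept,
    so that a full protospacer fits upstream of the PAM. The default of
    20 nt is an illustrative assumption, not a value from the text.
    """
    seq = seq.upper()
    body = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead makes the match zero-width, so overlapping PAMs are all found.
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(seq) if m.start() >= spacer_len]


def count_targets(seq, pams, spacer_len=20):
    """Count candidate sites per PAM, summed over both DNA strands."""
    strands = (seq, reverse_complement(seq))
    return {
        pam: sum(len(find_pam_sites(s, pam, spacer_len)) for s in strands)
        for pam in pams
    }


if __name__ == "__main__":
    # Toy sequence for demonstration only; a real analysis would read a genome file.
    toy = "G" * 20 + "CCAGAAT" + "G" * 5
    print(count_targets(toy, ["NNAGAAW", "NNGGAA", "NNAGCAT"]))
```

Applied to a complete genome sequence, such a count gives a rough idea of how many additional sites become addressable when the suboptimal PAMs are accepted; it says nothing about the editing efficiency at any individual site.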

The type V-A system derived from F. novicida was the first to be used for genome editing in mycobacteria [75], as well as in C. glutamicum [68]. Double-strand breaks introduced in DNA by FnoCas12a were found to be lethal in mycobacteria. The finding confirmed the assumption that mycobacterial repair pathways are incapable of efficiently repairing the DNA breaks introduced by heterologous CRISPR/Cas systems. The recombinase of mycobacteriophage Che9c and a recombination donor DNA fragment were used to overcome the toxicity of double-strand breaks introduced in mycobacteria by FnoCas12a [75]. The green fluorescent protein reporter gene was used to estimate the efficiency of genome editing in M. smegmatis. The efficiency of site-directed mutagenesis in M. smegmatis was 80%, and an efficiency of 37–75% was observed for fragment insertion and deletion via recombination with double-stranded DNA fragments of approximately 1000 bp. Single-stranded oligonucleotides of 59 and 79 nt facilitated insertions and deletions of no more than 7 and 20 nt, respectively, with an efficiency of 70–80%; the efficiencies achieved with 418- and 1000-bp oligonucleotides were 17.4 and 8.2%, respectively.

Fourteen Cas proteins were tested as tools for editing the mycobacterial genome [65]. Expression of the genes for the Cas proteins of Treponema denticola (TdCas9), Neisseria meningitidis (NmCas9), and F. tularensis (novicida) (FnCpf1) under the control of the potent constitutive hsp60 promoter was observed to exert no effect on the M. smegmatis growth rate. When the FnCpf1 gene was optimized according to the codon usage in M. smegmatis, the product was toxic to mycobacteria. It was assumed that high-level expression of optimized genes leads to Cas accumulation and increases the frequency of Cas-induced nonspecific single-strand DNA breaks to a critical level [75]. The mRNA of the FnCpf1_cg gene, which was optimized for expression in C. glutamicum, is apparently translated at a low rate, and this prevents FnCpf1_cg from accumulating in dangerous amounts. Of the three Cas proteins that were nontoxic in mycobacteria, only FnCpf1_cg showed high activity with various guide RNAs and ensured a high efficiency (up to 79%) of mycobacterial genome editing. It should be noted that the NHEJ pathway was utilized to edit the mycobacterial genome in that work [65], while homologous recombination was used in the study considered above [75]. Because known NHEJ genes (ku and ligD) served as targets, it can be assumed that other known alternative pathways facilitated double-strand break repair [72].

Base editors were recently added to the CRISPR/Cas systems available for genome engineering in mycobacteria. Ding et al. [76] designed the MtbCBE two-plasmid system that utilizes a cytidine editor to efficiently change the M. tuberculosis genome. In the system, one of the plasmids codes for protein inhibitors of the RecA- and NucS-dependent DNA repair pathways, while the other plasmid codes for S. thermophilus nCas9Sth1 nickase fused with APOBEC1 cytidine deaminase and the uracil-DNA glycosylase inhibitor (UGI). The study showed that deaminated base repair involves not only uracil-DNA glycosylases, but also the homologous recombination (RecA is a key protein) and mismatch (NucS nuclease is a key protein) repair pathways, which substantially reduce the extent of deaminase-dependent genome editing. The plasmid that carries the genes for repair pathway inhibitors therefore serves to improve the efficiency of base editing.

The set of potential targets of the editor is limited by the relatively long Cas9Sth1 PAM and by the fact that only cytidine-to-thymine substitutions can be introduced. New base editors were recently designed to replace cytidines with guanines [77] or adenines [78]. The editors include uracil-DNA glycosylase, which excises a deaminated base to produce an apyrimidinic site. Further repair of the apyrimidinic site mostly generates C → A or C → G transversions. Adaptation of similar base editors to mycobacteria is now in progress and is expected to substantially extend the range of mutations that can be introduced.

A Class I type III-A CRISPR/Cas system was found in the genomes of pathogenic mycobacteria, such as M. tuberculosis and M. bovis, by bioinformatics analysis [79, 80]. In M. tuberculosis H37Rv, the system includes nine genes for Csm2-6, Cas10, Cas6, Cas1, and Cas2 and two CRISPR loci. Cas6 nuclease initiates maturation of crRNA by cleaving its precursor within repeats. The resulting intermediate products have 8 nt of the repeat at the 5′ end, a spacer, and a full-length repeat at the 3′ end [81]. A ribonucleoprotein complex that includes the Csm proteins and Cas10 nuclease is assembled on the intermediate products, while Cas6 is displaced. Part of the repeat is then cleaved from the 3′ end to yield mature crRNA [81]. It should be noted that, in contrast to the type II-A and V-A systems considered above, a type III-A system can use both DNA (Cas10 exerts nuclease activity in this case) and RNA (Csm3 exerts nuclease activity) as targets [82].

Early studies of the system structure, mostly of the repeats and spacers of CRISPR loci, suggested that these elements are suitable for genotyping mycobacterial strains in epidemiological research [83, 84]. Only recently did the mechanism of function of the CRISPR/Cas system and the regulation of expression of its components come to be studied in mycobacteria [85, 86]. The system was shown to be active against foreign DNA elements, and crRNA biogenesis was investigated in detail [85, 87]. The M. tuberculosis CRISPR system produced in an E. coli heterologous expression system displayed activity against both DNA and RNA targets, both in vivo and as a system reconstituted from purified proteins [88]. Studies of the structure and mechanisms of function made it possible to reprogram the endogenous CRISPR/Cas system for editing the cognate M. tuberculosis genome [89], as was earlier the case, for example, with endogenous CRISPR/Cas systems of clostridia [90, 91]. For this purpose, M. tuberculosis was transformed with a plasmid that coded for guide RNAs against target genome regions. Experiments with structurally different recombination donors demonstrated the possibility of inactivating genes and inserting reporter genes in the M. tuberculosis genome [89]. Moreover, a method was developed to inhibit gene expression by targeting the endogenous CRISPR/Cas10 system to mRNA. As demonstrated earlier, mRNA, rather than DNA, is hydrolyzed by the effector Cas10 when the sequence 5′-GAAAC-3′ is introduced in the 5′ end of the crRNA spacer and is complementary to mRNA [92]. The approach was used to efficiently inhibit expression of individual genes (katG, dcD, or esxT) and of several genes simultaneously (lpqE, katG, and inhA). In addition, screening studies identified the genes that affect the M. tuberculosis reproduction rate in culture and within macrophages [89].
Thus, it was demonstrated that mycobacterial endogenous CRISPR/Cas systems can, in principle, be used to edit the cognate genome, to regulate expression of individual genes or gene sets, and to carry out screening studies.

CONCLUSIONS

Substantial progress has been made in studying mycobacterial genetics over the past years, and molecular tools have been developed to allow efficient targeted genetic manipulation in mycobacteria. The methods considered in this review make it possible to introduce targeted changes in the mycobacterial genome, as is necessary for various biomedical applications.