INTRODUCTION

Double-strand breaks (DSBs) in genomic DNA are lethal if not repaired. To address this type of dangerous damage, cells rely on two major repair mechanisms: Homologous Direct Repair (HDR) and Non-Homologous End Joining (NHEJ). While HDR requires an undamaged homologous region as a template to guide the repair event, NHEJ is not template-directed: the broken DNA ends are brought together and ligated directly, usually in an error-prone way. In prokaryotic cells, HDR is predominant during the exponential phase of growth, when the cells are rapidly dividing, and extra copies of genomic DNA are available. NHEJ can be used whether homologous DNA template is available or not.

Mechanistically, NHEJ is straightforward: broken DNA ends are detected, processed, if necessary, to make them “ligatable”, and ligated, restoring the DNA integrity [1]. In eukaryotes, an arsenal of enzymes and accessory factors are required to carry out NHEJ [2]. The bacterial version is notably simpler. In the most stripped-down versions, like in the case of Bacillus subtilis, only two proteins are involved: a homodimeric DNA-binding protein Ku that binds and brings together both ends of broken DNA, and a multifunctional ligase LigD that is responsible for end processing (if the ends are non-complementary and/or chemically modified) and ligation [3]. While Ku is not directly involved in the DSB processing, it is required to recruit LigD to the broken ends and remains physically associated with the DNA during the entire repair event [4]. The latest massive search for ku- and ligD-like genes revealed that NHEJ machinery is sporadically distributed across bacteria, potentially present in ~20% of bacterial genomes accessible in the public databases [5].

Other non-templated repair mechanisms referred to as Microhomology-Mediated End-Joining (MMEJ) [6] and Alternative End-Joining (A-EJ) [7] have also been reported in prokaryotes. Both are Ku- and LigD-independent. MMEJ strictly requires the existence of microhomology regions on both ends of repaired DNA, while A-EJ can occur in the absence of such microhomology. The mechanisms of action and enzymes involved in either process are not yet fully characterized.

Bacteria of the Pseudomonas genus possess a versatile metabolic and physiologic capacity and can adapt to diverse environments [8]. Representatives of the genus colonize diverse terrestrial and aquatic habitats and infect a wide range of eukaryotic organisms, severely affecting agriculture and human health. The minimal components of NHEJ have been identified in several Pseudomonas species [9, 10]. The Ku and LigD enzymes have been extensively studied in the human pathogen P. aeruginosa (Pae). The 840 amino acids long Pae-LigD has an N-terminal phosphoesterase/nuclease domain, a central DNA polymerase domain, and a C-terminal DNA ligase domain. Pae-Ku is a homodimer of 293-amino acid long polypeptides; in vitro, it stimulates the addition of nucleotides to broken ends of plasmid DNA by Pae-LigD [11].

P. putida KT2440 (hereafter referred to as KT2440), another member of the genus, has captured the attention of the scientific community due to its high potential for metabolic engineering and bioremediation [12]. In consequence, there is a high demand for methods to manipulate the genome of this bacterium. The KT2440 genome harbors the ku and ligD genes whose products are highly similar to the Pae counterparts. Given that some of the most popular genetic engineering strategies currently available involve DSBs and considering the intrinsic mutagenic potential of NHEJ, the study of this pathway in KT2440 may help optimize the available engineering tools for this bacterium and contribute to the rational design of novel genome editing approaches.

CRISPR–Cas systems are adaptive prokaryotic immunity systems that rely on the introduction of programmable DSBs to protect the host from mobile genetic elements. Type II-A CRISPR-Cas systems consist of four genes: cas1, cas2, csn2, and cas9. Csn2, a protein involved in acquisition of foreign DNA fragments in CRISPR arrays, attaches to free DNA ends in the same way that the Ku protein does. If CRISPR–Cas proteins and proteins engaged in DNA repair pathways recognize the same substrate, there may be antagonistic interactions between CRISPR immunity and NHEJ. This could explain why Type II-A CRISPR-Cas systems and NHEJ co-occur only once among the 5536 fully sequenced prokaryotic genomes [13].

The type II-A Cas9 nuclease from Streptococcus pyogenes, when programmed with guide RNAs specific for E. coli genome has been validated as a reliable tool to efficiently kill E. coli, a bacterium lacking the NHEJ genes [14], providing an efficient means to counter select against parental bacteria during genome manipulation. Several research groups [1519] have successfully used Cas9 as a counterselection tool in other bacterial species, including KT2440, to enrich for desired mutations introduced by specific oligonucleotides transformed into cells. The broad applicability of this technique to different bacteria, including those that carry the NHEJ enzymatic machinery, suggests that the presence of NHEJ genes is not enough to save cells from death caused by programmable cleavage of genomic DNA by Cas9. Nevertheless, several lines of evidence indicate that this repair pathway is not predominant during active bacterial growth but exerts a protective effect to ionizing radiation (an agent that generates DSBs) in the stationary phase where the absence of an available homologous template makes HDR impossible [20]. In the current study we tested whether KT2440 cells can overcome Cas9 cleavage-mediated toxicity in different stages of cell growth and investigated the role of Ku and LigD enzymes in the process.

EXPERIMENTAL

Strains and Plasmids. Escherichia coli DH5α (F– endA1 glnV44 thi-1 recA1 relA1 gyrA96 deoR nupG purB20 φ80dlacZΔM15 Δ(lacZYA-argF) U169, hsdR17(rK–mK+), λ–) has been described [21].

Pseudomonas putida KT2440 (rmo– mod+) has been described [22].

P. putida KT2440 ΔligD (rmo– mod+) is a derivative of KT2440 carrying the deletion from the nucleotide #112 to the nucleotide #1335 of the coding region, generating a frameshift in the downstream gene sequence (obtained in this study).

P. putida KT2440 Δku (rmo– mod+) is a derivative of KT2440 carrying the deletion from the nucleotide #59 to the nucleotide #210 of the coding region, generating a frameshift in the downstream gene sequence (obtained in this study).

P. putida KT2440 pHERD26T-Ku (rmomod+) and P. putida KT2440 pHERD26T-LigD (rmomod+) were constructed by cloning of P. putida KT2440 ku and ligD genes amplified with GA-ku-F/GA-ku-R and GA-ligD-F and GA-ligD-R primers, respectively (Table S1, see Supplementary Information). GA-pHERD-F and GA-pHERD-R (Table S1, see Supplementary Information) primers were used to amplify commercial vector pHERD26T and then reaction product was treated with DpnI to degrade methylated template molecules. Then, PCR products were incubated together with HiFi DNA Assembly Master-mix (New England Biolabs). The plasmid identity was confirmed by Sanger sequencing.

Plasmid pBBR5-sgRNA-mutS was previously obtained by Ms. Jany Quintana-Cordero (Skoltech). The sgRNA scaffold was amplified using pKDsgRNA-ack plasmid [23] as the template and primers muts-sgrna-F and sgrna_rrnB-R-PstI (Supplementary Information, Table S1) that introduce a spacer targeting P. putida mutS gene. Pm encoding region was amplified using pSEVA258 [24] as the template and primers pm-R-mutssgrna-r and pm-F-NcoI (Supplementary Information, Table S1). The obtained fragments (containing a homologous region) were amplified in a single reaction using primers pm-F-NcoI and sgrna_rrnB-R-PstI. The obtained single segment Pm-sgRNA and plasmid pBBR5 [25] were restricted by NcoI and PstI enzymes and then ligated. Plasmids pBBR5-sgRNA_ mcb1/_mcb2/_pyrF carrying sgRNA were constructed by using the plasmid pBBR5-sgRNA-mutS as matrix. The pBBR5-sgRNA-mutS plasmid was amplified by PCR using primers (Supplementary Information, Table S1) which introduce the new spacer to the 5' ends. Each vector was then circularized by incubation with HiFi DNA Assembly Master-mix (New England Biolabs). The plasmid identity was confirmed by Sanger sequencing.

Plasmid pSEVA258-Cas9 was previously obtained by Ms. Jany Quintana-Cordero. Cas9 encoding gene was amplified by PCR using pCAS9 plasmid [26] as the template and primers Cas9_F_XmaI and Cas9_R_PstI (Table S1, see Supplementary Information). Amplified PCR fragment was first cloned into pJET1.2/blunt vector (ThermoFisher Scientific) using manufacturer’s protocol. The resulting plasmid pJET-Cas9 and pSEVA258 plasmid were digested by XmaI and PstI. Cas9 fragment and digested pSEVA258 were ligated. All plasmids used are listed in the Supplementary Information, Table S2.

Electrocompetent cells preparation. E. coli DH5α electrocompetent cells were prepared according to [27].

P. putida KT2440 electrocompetent cells: culture was grown overnight at +30°С with shaking (160 rpm) in LB medium. It was centrifuged at 6000 rpm for 4 min at +4°С. Then cells were washed with 10% sterilized solution of glycerol 1mL, initially precooled at +4°С. Then, they were centrifuged at 6000 rpm for 4 min at +4°С, supernatant discarded and pellets resuspended in glycerol. This centrifugation and washing steps were repeated 3 times. After it, cells were finally resuspended in 1 mL of 10% sterilized solution of glycerol and frozen at –80°С.

sgRNA-Cas9 chromosomal targeting. KT2440 cells were transformed simultaneously by electroporation with pSEVA258-Cas9 and pBBR5-sgRNA plasmids. Cells transformed were recovered for two hours at 30ºC in SOC medium. For experiments in the exponential phase, the expression of Cas9 endonuclease and the particular sgRNA was induced by plating the recovered cells on LB-agar, supplemented with kanamycin (50 μg/mL), gentamicin (50 μg/mL) and sodium benzoate (2 mM), and incubated at 30°C overnight. For experiments in the stationary phase, the recovered cells were used to start overnight cultures in LB medium supplemented with kanamycin and gentamicin. On the next day, aliquots from 0.5 to 2 mL from the overnight cultures were plated on LB-agar, supplemented with kanamycin, gentamicin and sodium benzoate (2 mM), and incubated at 30°C overnight. The final amount of cells plated varied based on the killing efficiency of the sgRNA used, in order to guarantee a final number of CFUs of ≥105. In all cases, control cells were plated on LB-agar supplemented with the corresponding antibiotics but without the inducer.

Overexpression of Ku and LigD. KT2440 were transformed with plasmids pSEVA258-Cas9, pBBR5-sgRNA and either pHERD26T-ku or pHERD26T-ligD and the same protocol described in the previous section was carried out, but this time we also added tetracycline (12 μg/mL) to the cultures and plates. To induce the expression of Ku or LigD in the stationary phase we added L-arabinose (0.2 mg/mL) to the cultures 5 hours prior to the final plating of the cells.

RT-qPCR. To measure expression level of ku and ligD genes in strains with overexpression RT-qPCR with intercalating dye was used. The total RNA of P. putida cells was extracted using the GeneJET RNA purification Kit following DNaseI treatment to degrade any remaining genomic DNA. RevertAid Reverse Transcriptase (Thermo Scientific) and reverse primers ku_trans_R or ligD_trans_R (Table S1, see Supplementary Information) were used for reverse transcription following instructions of the manufacturer. The relative increase of gene expression was calculated using the 2–∆∆Ct method [28]. As internal control, the ubiquitously expressed gene gyrA was used and the reverse transcription for this gene was done using the reverse primer gyrA_R. Cells transformed with the empty vector pHERD26T were used as calibrator. The qPCR measurements were performed in a QuantStudio3 system (Applied Biosystems, Thermo Scientific).

The reaction mixture consisted of: 1× EvaGreen Supermix (Biorad); 0.5 μМ of forward and reverse primers (amplifying ku, ligD or gyrA genes); 1–10 ng cDNA template.

Cas9 targeted sites obtaining. High Throughput Sequencing (HTS) was used for analyses of Cas9 targeted sites. The GeneJET Genomic DNA Purification Kit was used to collect the genomic DNA of the whole population of survivors after sgRNA-Cas9 targeting experiments in the stationary phase. The target sequence was amplified using the primers mcb_hseq_F/ mcb_hseq_R and R707_mcb_F/R707_mcb_R, R708_mcb_F/R708_mcb_R (for cells targeted with sgRNA mcb1, the R707 and R708 primers pairs were used in the second sequencing round for non-induced and induced targeting cases, respectively); mcb_hseq_F/ mcb_hseq_R for cells targeted with sgRNA mcb2. mutS_seq_F/mutS_seq_R and R703_mutS_F/ R703_mutS_R, R704_mutS_F/R704_mutS_R (for cells targeted with sgRNA mutS), and pyrF_hseq_F/ pyrF_hseq_R and R705_pyrF_F/R705_pyrF_R, R706_pyrF_F/R706_pyrF_R. All primers can be found in Table S1 (see Supplementary Information). Gel electrophoresis was used to evaluate the PCR product, and the GeneJET PCR Purification Kit was used to clean it up. The libraries were made with a New England Biolabs’ Next® Ultra II DNA Library Prep Kit for Illumina® (New England Biolabs) and deep sequenced with an Illumina MiniSeq System in pair-end 150-bp long-read mode, according to the manufacturer’s protocols.

Analysis of high throughput sequencing data. The tool CRISPResso2 [29] was used to examine the output fastq files. This tool autonomously makes raw reads processing and their alignment to target regions from the reference genome (GenBank: AE015451.2). By spanning the projected site of Cas9 cleavage, known as the “quantification window,” this method can discover variations in a limited region. The window size was set to 30 nucleotides, and the most likely site of Cas9 cleavage was set to 3 nucleotides upstream PAM. The program suggests this position, which is backed up by various structural and functional analyses of Cas9 from S. pyogenes. Further analyses of CRISPResso2 output, including statistical tests, was done in the R software environment. Mutations percent calculations were made as follows. For each mutation in each position was calculated number of reads with such mutation, this number was divided by overall number of reads. This was done for induction and no induction strains. Then, values of no induction samples were subtracted from the corresponding values of induction samples, this allowed to compensate background spontaneous mutations.

RESULTS

The Cas9-Mediated DSBs are Inaccurately Repaired in the Stationary Phase, with Mutations Concentrated Near the Point of the Break

To evaluate the role of NHEJ in DSB repair in wild-type KT2440, we tested the toxicity of DNA cleavage in several genomic locations by Cas9 programmed with different single guide RNAs (sgRNAs). To introduce DSBs we used a two-plasmid system (Fig. 1a), in which each plasmid codes for a different component of the sgRNA-Cas9 complex. The S. pyogenes Cas9 nuclease was cloned into the pSEVA258 vector under the control of the Pm promoter inducible with sodium benzoate. Individual sgRNA genes were cloned in pBBR5, a derivative of the pBBR1-MCS5 plasmid, also under the control of the Pm promoter. Thus, expression of both components of the editing complex could be turned on using the same inducer. The killing efficiency of sgRNA-Cas9 effectors was determined by plating cells from exponential or stationary cultures on selective media in the presence and in the absence of Cas9/sgRNA inducer (Fig. 1b).

Fig. 1.
figure 1

Experimental set-up and killing efficiencies of individual sgRNAs. (a) Schematics of the inducible two-plasmid sgRNA-Cas9 system used to generate DSBs in specific regions of the KT2440 genome. (b) A workflow of the experiment to calculate killing efficiencies of individual sgRNAs. Cells in transformation mixtures after 2 hours of outgrowth in SOC were considered as exponential. Figures 1a and 1b were created with Biorender.com. (c) The sequences and locations of sgRNA targeted sites in the KT2440 chromosome. The PAM sequences of targeted sites are highlighted and match the NGG consensus of SpCas9 [33]. (d) Killing efficiencies of individual sgRNAs. The killing efficiency was calculated as [100 – (CFUInd/CFUnoInd) × 100] where CFUnoInd is the number of colonies formed on plates without the inducer of sgRNA-Cas9 expression and CFUInd—the number of colonies formed in its presence. The bars indicate mean values obtained from at least three independent experiments. Standard deviations of the mean are indicated.

Four different sgRNAs were tested (Fig. 1c). The mcb1 and mcb2 sgRNAs target the promoter region of a putative thiazole-oxazole-modified microcin (mcb) operon [30]. The sites targeted by these sgRNAs are located less than 100 bp apart. The other two sgRNAs target the mutS and pyrF genes. Targets were chosen because of i) different levels of expression of targeted genes (silent (mcb) versus active (mutS and pyrF), ii) location at both sides of the origin of replication (Fig. 1c, bottom), and iii) different GC content (Fig. 1c, top). The mcb operon is not essential under laboratory conditions and is not expressed (data not shown) and so mutations introduced during repair of DSBs programmed by the mcb1 and mcb2 sgRNAs are not expected to impact the survival of the cell. The mutS gene codes for a mismatch repair (MMR) enzyme MutS. ΔmutS E. coli cells are reported to be hyper-mutagenic, but this does not negatively impact cell fitness at laboratory conditions, at least in the short term [31]. The pyrF gene codes for orotidine 5'-phosphate decarboxylase, an enzyme involved in the synthesis of pyrimidines. The deletion of this gene generates uracil auxotrophic and 5-fluoroorotic acid (5-FOA)-resistant phenotypes commonly used as, respectively, negative and positive selection markers in gene editing experiments [15, 32]. Mutations of pyrF that completely abolish the activity of the enzyme will be lethal in the absence of uracil added to a medium like LB used in this work. However, one should be able to detect mutations that do not severely compromise the function of the enzyme.

We expected that the killing efficiency of each sgRNA will be a combined measure of target recognition/cleavage (which should result in cell death if unrepaired) and repair (which should result in survival of the affected cell, provided that the repair process generates mutations that prevent further targeting and that are not lethal). As can be seen from Fig. 1d, the killing efficiency in actively growing KT2440 cultures was near 100% for all tested sgRNAs. However, when targeting was performed in stationary cultures, the killing efficiency was reduced to ~70‒95%, depending on the sgRNA used. The drop in the killing efficiency could be partially due to a decrease in the copy number of the plasmids encoding Cas9 and sgRNA in stationary cells and/or increased efficiency of repair that generates mutations that protect cells from further DNA cleavage by sgRNA-Cas9. To determine if the surviving cells overcome Cas9 toxicity due to accumulation of mutations in the targeted sites, DNA segments comprising each target and ∼90 bp of up- and downstream flanking regions were amplified from genomic DNA prepared from at least 105 pooled colonies grown in the presence of the inducer. The amplicons were next subjected to ILLUMINA sequencing. As controls, amplicons of DNA prepared from a similar number of colonies grown in the absence of inducers were also sequenced. The reads were filtered and analyzed using the CRISPResso2 software [29].

Most reads obtained from pooled DNA of cells surviving targeting by the most efficient mutS and mcb2 sgRNAs contain an intact target site (75 and 93%, respectively, Figs. 2a, 2b). Presumably, these reads came from cells that survived due to inactivation of the Cas9-sgRNA effector by plasmid-borne mutations or defective induction of expression of its components. The remaining 7% of reads corresponding to the mcb2 target contained point mutations whose individual frequencies did not exceed 0.5%. These mutations were randomly distributed along the entire length of the amplicon. A similar frequency and distribution of mutations was observed in reads obtained from amplicons from uninduced cells (data not shown). We therefore conclude that these mutations are either sequencing errors or spontaneous mutations that are irrelevant to mcb2-sgRNA-Cas9 targeting. Among the 25% of altered reads with the mutS target the most predominant one (7.5% of reads) was a point mutation destroying the target PAM. A G to T transversion 2 nucleotides upstream of the cleavage site (represented as a dashed line in Fig. 2) was found in 1.5% of reads. Two short deletions (12 and 24 bp) removing the PAM and partially removing the target were also detected (~0.5% of the reads) (Fig. 2a). During repair of DSBs via NHEJ, nucleotides located closer to the point of break are more likely to be mutated by the action of LigD [3]. Thus, one expects most mutations to occur at positions immediately adjacent to the point of cleavage, followed by a progressive decrease in mutation frequency in further positions. We surmise that the most frequent mutation at the mutS site, a PAM altering substitution 6 nucleotides downstream of the cleavage site, was likely selected from a pool of mutations that existed before the addition of the inducer. Therefore, in what follows, we exclude mutations of PAM from further consideration. The remaining mutations at the mutS site could have been generated by NHEJ, however, their low frequency precludes further analysis.

Fig. 2.
figure 2

Normalized frequency of mutations at each target position in surviving wild-type KT2440 stationary cells. The frequency of mutations was calculated for each amplicon position by dividing the number of mutated reads by the total number of reads. The value was normalized by subtracting the frequency of mutations obtained in cells untreated with sodium benzoate (inducer of sgRNA-Cas9 expression) to the frequency of mutations from cells treated with the inducer ([#modified_reads/ #total_reads]induced_cells ‒ [#modified_reads/#total_reads]uninduced_cells ). (a‒c and d) Correspond to loci targeted by indicated sgRNAs. Gray boxes mark the target sequences complementary to the sgRNA guide segments, PAMs are marked, the dashed lines show the expected Cas9 cleavage site position (between the third and fourth nucleotide upstream of the PAM [33]). Below each graph, the most frequent allele variants (>0.5% of reads) are shown. Substitutions are marked in colors (T—green, A—red, G—yellow, C—purple), deletions are represented as dashed lines and an insertion is framed in red.

In the case of mcb1 and pyrF, two sgRNAs that showed lower killing efficiencies, the frequency of wild-type reads in DNA from pooled surviving colonies was just 11 and 45%, respectively (Figs. 2c, 2d). In both cases, most frequent mutated reads contained point substitutions immediately upstream of the cleavage site. Presumably, these mutations resulted from inaccurate repair of DSBs generated by the effector. These mutations alter the seed region of the target, which should abolish the recognition by Cas9-sgRNA [34]. One interesting allele variant found in low frequency at the pyrF site is a 9 bp insertion, which is actually a duplication of a sequence located immediately upstream of the point of cleavage. It most likely arises as a result of end processing carried out by the nuclease and polymerase domains of LigD, which has been shown to generate small deletions and insertions in the repaired area [3]. This particular insertion might be favored because it preserves the pyrF reading frame and may result in a functional PyrF enzyme.

Overall, we conclude that in cases where the killing efficiency of an sgRNA in stationary cultures is below 90%, the surviving cells accumulate mutations at the targeted site that are consistent with the action of the NHEJ repair process.

LigD Is Dispensable for Repair of Cas9-Mediated DSBs While Ku Is Required at Some Targets

We next determined the role of the NHEJ enzymes Ku and LigD in the generation of mutations at the sgRNA-Cas9 generated cleavage sites. To construct the necessary mutant strains, we created two additional sgRNAs targeting internal regions of the ku and ligD genes and followed the protocol described in Fig. 1b (cells were grown to stationary phase). By PCR screening of 30 randomly chosen surviving colonies and sequencing of anomalously short target amplicons detected in some of them, we identified deletions within ku and ligD that generated frameshifts, and, therefore, inactivated the genes (see details in the Experimental). The creation of these strains is a proof of the utility of sgRNA-Cas9 targeting combined with endogenous systems for error-prone repair of DSBs as a tool for a one-step inactivation of chromosomal genes in KT2440. We named the new strains Δku and ΔligD and used them to carry out the experiment outlined in Fig. 1b.

The killing efficiencies of both mutS and mcb2 sgRNAs in the ku and ligD mutant cultures grown to stationary phase were similar to those observed in the wild-type cells, ranging from 95 to 98%. Compared to the wild-type, no significant changes in the pattern of mutations in surviving cells was observed in strains lacking either Ku or LigD for these targets (Figs. 3a to 3d). For sgRNA-mcb1, the patterns of mutations in both mutant strains were also similar to those observed in corresponding wild-type cells but their overall frequency was lower (Figs. 3e, 3f). This difference was particularly striking in the Δku background, where the frequency of mutations decreased ∼25 times compared to the wild-type strain (Fig. 3e). Coincidentally, the killing efficiency of sgRNA-mcb1 in the Δku background increased from 70 to ~85%. In the ΔligD background, the decrease in the frequency of mutations at the cleavage site was milder, ∼2.5 times compared to the wild-type (Fig. 3f). Consistently, the killing efficiency in this background increased to just 78%. At the pyrF locus, the killing efficiency in the Δku background was comparable to the wild-type (∼85%) and the most frequent mutation, a C to T transition immediately upstream the point of cleavage, was the same as in the wild-type (Fig. 3g). The frequency of reads corresponding to this mutation dropped 3 times in comparison with the wild-type cells. The deletion of ligD resulted in 79% killing efficiency and increased the frequency of mutations at the pyrF locus ∼1.5 times (Fig. 3h). Overall, we conclude that LigD is dispensable for NHEJ for repair of DSBs generated by Cas9 in KT2440, while the presence of Ku is required at some sites.

Fig. 3.
figure 3

Normalized frequency of mutations at each target position in Δku and ΔligD strains. Data for indicated sgRNAs in the Δku or ΔligD background are presented. See Figure 2 legend for details.

An Excess of LigD Changes the Pattern of Mutations Generated at Repaired DSBs

We next performed the experiment described in Fig. 1b using cells with extra copies of ku or ligD genes cloned on plasmids. To this end, low-copy plasmids pHERD26T-ku and pHERD26T-ligD carrying the ku and ligD genes, respectively, under the control of the inducible araBAD promoter were constructed (see Experimental). The pHERD26T plasmid is compatible with the pSEVA258-Cas9 and pBBR-sgRNA plasmids. Cells carrying three different plasmids were prepared and grown in the presence of 0.2% L-arabinose to induce the expression of ku or ligD genes, which led to a ~65- and ~100-fold increase in steady-state levels of ku or ligD transcripts, respectively, as judged by RT-qPCR (Fig. S1, see Supplementary Information). Next, expression of the sgRNA-Cas9 complexes was induced. For all sgRNAs tested, the killing efficiency values were comparable to those obtained with wild-type cells (Fig. S2, see Supplementary Information).

The distributions and frequencies of mutations in surviving cells are shown in Fig. 4. For the mutS and mcb2 targets, no changes from corresponding wild-type cell cultures were observed (Figs. 4a to 4d). For the pyrF site, the C to T transition immediately upstream of the point of cleavage remained the most common allele in either the ku or ligD overexpression backgrounds. Several additional mutations at position located two nucleotides upstream of the cleavage site as well as a small deletion were detected upon ku overexpression (Fig. 4g). A dramatic change in the mutation pattern was observed in the mcb1 site. At conditions of ligD overexpression, long (up to 60 bp) deletions covering the target-PAM region became predominant. A single nucleotide insertion at the cleavage site also became frequent (Fig. 4f, a full overview of the allele variants for the mcb1 site at conditions of LigD overexpression can be found in Supplementary Information, Fig. S3). In contrast, the excess of Ku decreased the frequency of a C to T transition at the cleavage site and increased the frequency of point substitutions at more distant locations of this target (Fig. 4e). A similar, though less pronounced effect was observed at the pyrF site (Fig. 4g). It is possible that the effect of increased concentrations of NHEJ enzymes is more marked at the mcb1 target site because some mutations disrupting the pyrF function compromise cell survival as explained above. This idea is supported by the fact that all the mutations observed at the pyrF locus are either point substitutions or in-frame deletions/insertions.

Fig. 4.
figure 4

Normalized frequency of mutations at each target position in strains where ku or ligD were overexpressed (+ku and +ligD, respectively). Data for indicated sgRNAs in the +ku or +ligD background are presented. See Fig. 2 legend for details.

DISCUSSION

Our findings show that P. putida KT2440 cells in stationary cultures can religate genomic DNA ends created after Cas9 cleavage at some genomic locations, generating mutations at the point of DSB joining. In agreement with our results, it has been previously reported that NHEJ enzymes participate in stationary-phase mutagenesis in carbon-starved KT2440 cultures [9]. However, we found remarkable differences in the number of mutations at the point of Cas9 cleavage among the four genomic regions analyzed. The chi-square distribution test on the proportion of modified reads for each genomic site showed a p-value of <.001, meaning that the nature of the target strongly influences the number of mutations generated due to DSB repair. The targeted loci fell into two categories: highly mutagenic (mcb1 and pyrF) and barely mutagenic (mcb2 and mutS). The repair of highly mutagenic loci can occur in the absence of LigD, which points to the existence of a redundant, not yet reported alternative for error-prone repair of DSBs in KT2440. Consistently, studies in different bacterial species unmasked the existence of proteins other than LigD and Ku, for instance orthologs of Ku [35] or stand-alone domains of LigD [36], as well as additional factors interacting with the LigD-Ku complex involved in NHEJ [35, 36]. We note, however, that no orthologs of Ku and LigD can be identified in the KT2440 genome by standard bioinformatic tools. A more advanced bioinformatic search, combined with powerful methods of forward genetics, like Tn-Seq, would be useful to identify genes involved in repair events observed in KT2440 on the background of ku or ligD deletions.

The repair events we observe appear to be highly context-dependent. For example, repair at the mcb1 but not at the pyrF site is drastically decreased in the absence of Ku. The widely accepted model of NHEJ states that the initial binding of Ku to DNA ends is a critical step because LigD, the enzyme that processes and ligates the DNA, is attracted to the site only after Ku is bound [4]. One can speculate that repair at mcb1 mostly relies on the canonical, Ku-dependent NHEJ. Meanwhile, at pyrF an alternative Ku-independent end-joining pathway must be operational.

Experiments on linear plasmid-recircularization carried out in Mycobacterium smegmatis and Sinorhizobium meliloti showed that upon the inactivation of NHEJ enzymes, an alternative, Ku and LigD independent end-joining (A-EJ) activity becomes operational [37, 38]. The efficiency and fidelity of this type of repair varied depending on the DNA end type. Cleavage by Cas9 generates blunt ends [39], however, these ends could become staggered by cellular exonucleases whose action may depend on the sequence of DNA at the break point. Perhaps, the broken ends at mcb1 and pyrF have different susceptibility to degradation which dictates whether Ku is required or not.

In the barely mutagenic sites mutS and mcb2 neither inactivation nor overexpression of NHEJ-related enzymes had any significant effect. One can propose three possibilities accounting for this behavior: (1) mutations at these sites compromises cell survival, (2) DNA ends at these sites are inaccessible to repair enzymes, or (3) the DSBs at these points are faithfully repaired and cells survive due to defects in Cas9 targeting. The first possibility seems very unlikely since we observe striking differences in the rate of mutations between mcb1 and mcb2, two genomic regions located a short distance from each other in a silent promoter of an operon coding for a non-essential post-translationally modified peptide. In fact, the drastic difference in the behavior of the mcb1 and mcb2 sites provides the strongest evidence for a very high significance of target sequence for the outcome of NHEJ. The only clear difference between the two mcb targets regions is the GC content in the 5' micro-vicinity of PAM (Fig. 5). It is well established that during sgRNA-Cas9 interference phase, a stable R-loop is formed that allows endonucleolytic cleavage of DNA [40]. Once the DSB has occurred, the R-loop is closed and DNA repair enzymes can bind and stabilize the broken ends. It is plausible that the PAM-distal DNA end, which is in a single-stranded form right after the Cas9-catalyzed cleavage, is less protected from degradation and therefore more unstable. Thus, a low GC content in the R‑loop forming region could increase its instability and lead to a higher rate of mutations during the end-joining repair. Coincidentally, the barely mutagenic mutS and mcb2 sites possess stretches of 5‒6 GC nucleotides in the 5' micro-vicinity of PAM, while the highly mutagenic mcb1 and pyrF sites have no more than two GC nucleotides in a row in this region.

Fig. 5.
figure 5

GC content at the target regions. Guanines and cytosines are colored in yellow. Adenines and thymines are colored in red. PAM is marked. The graph was obtained using the software Geneious.

In agreement with our hypothesis, heterologous expression of NHEJ enzymes from M. tuberculosis in E. coli cells at conditions of sgRNA-Cas9 targeting generates, with low-rate, short deletions whose right ends are always located between the PAM and the point of cleavage but whose left ends vary, suggesting that the DNA end containing PAM is more protected from mutations, probably by the bound Cas9 itself [14]. Mutations we observe in our experiments at the highly mutagenic targets and to a much lesser extent at the mutS site, generally follow that same pattern.

If true, the proposed ‘GC-driven mutagenesis’ hypothesis would imply that in the barely mutagenic loci the DSBs are repaired in an accurate way (possibility 3, above). However, we also consider an alternative where the high GC content in the region of the R‑loop formation stabilizes the DNA–sgRNA interaction, hindering the dissociation of the sgRNA-Cas9 complex and consequently making the broken DNA ends inaccessible to repair enzymes (possibility 2). Both scenarios are not mutually exclusive. Systematic testing a higher number of sgRNAs with variable GC contents will be necessary to prove or disprove our hypothesis.

Be it as it may, with the creation of Δku and ΔligD strains reported here we demonstrate that, under appropriate growth conditions, it is possible to take advantage of endogenous error-prone repair mechanisms to inactivate targeted KT2440 genes in one step. In fact, the combination of endogenous NHEJ and CRISPR-Cas has already been successfully applied as a gene editing tool in Mycobacterium tuberculosis [41]. We also show that overexpression of NHEJ enzymes, mainly LigD, increases the variability of the mutations at the point of repair, which potentially makes this enzyme a useful addition to the one-step sgRNA–Cas9-based gene editing method, to generate a higher number of mutated allele variants for a targeted gene.