Background

Genetic diversity, e.g., variation in coding sequences, non-coding genes, and expression-regulating elements, is the basis for plant domestication and crop breeding. Natural variation and random mutations induced by physical and chemical treatments are the sources of genetic diversity in conventional breeding [1]. However, with the growing global population and increasing food demand, there is an urgent need for even greater genetic diversity in modern agriculture. Fortunately, genome editing tools offer a promising solution for protein evolution [1]. Directed evolution can be achieved through two major types of change: insertions/deletions (indels) and base changes [2,3,4,5]. Indels normally disrupt the function of genes and regulatory elements. Base changes generate a wider range of genetic variants than indels, including gain-of-function changes that fine-tune gene function. Prime editor (PE) and base editor (BE) are the two major editing tools for introducing targeted base substitutions [6].

PE is a “search-and-replace” technology [7]. It has been used to create a large number of base substitutions at six conserved residues of acetyl-coenzyme A carboxylase 1 (OsACC1), generating 16 herbicide-resistant alleles in rice [5]. However, the editing products of PE are determined by the editing templates in the pegRNAs, and one pegRNA can only generate one editing product [7]. Hence, PE is suitable for modifying a defined base at a critical position rather than generating large mutant pools of altered bases for high-throughput screening for putative critical positions.

Unlike PE, BE can generate multiple types of editing products at each sgRNA target. Thus, BE has greater potential than PE for generating ample base variations. Previously, BE-mediated artificial evolution of genes focused on designing multiple sgRNAs to cover whole genes or their key regions [2]. However, the enormous evolutionary potential at every single sgRNA site has been overlooked. For example, cytosine base editor (CBE) is capable of converting multiple cytosines within the editing window in a single mutant derivative [8]. In addition to the primary C-to-T conversion, CBE produces atypical base substitutions such as C-to-G and C-to-A, albeit at lower frequencies [9]. As a result, it can theoretically generate a vast number of alleles, many of which rarely appear in nature. Previous studies have obtained herbicide-resistant rice by using base editors with multiple sgRNA targets designed to cover key positions of genes such as acetolactate synthase (ALS) [2]. However, this approach requires a large number of rice mutant plants, which makes it time-consuming, labor-intensive, and low-throughput. It also has a major drawback: few nucleotide and protein mutation forms are obtained at each target, which limits the possibilities of evolution. The generation of large-scale genetic variation depends on having a sufficient number of mutant lines, which are hard to create in crops. Moreover, storage and phenotypic screening of large-scale mutant libraries require much space, time, and labor. So, despite the rapid development of the genome editing toolbox, no system is yet capable of large-scale targeted gene mutagenesis and high-throughput screening in crops.

Arabidopsis (Arabidopsis thaliana) is a model plant that is small, has a short life cycle, and produces thousands of small seeds per plant. We can conduct phenotypic screening on young Arabidopsis seedlings in Petri dishes or soil. Furthermore, it shares many essential genes with crop plants, e.g., genes regulating stress tolerance, herbicide resistance, and photosynthetic efficiency [10, 11]. It would be a breakthrough to generate, in Arabidopsis, enough mutant derivatives of each sgRNA target for high-throughput directed evolution, and so to produce beneficial allelic forms for crop trait improvement. Here, we developed an efficient pipeline for germline-specific evolution that can easily produce and screen thousands to millions of mutant lines. The large mutant populations produced in this way substantially improved the directed evolution of genes and provided genetic resources for herbicide resistance that, more importantly, were successfully transferred to crop plants. We anticipate that this approach will speed up many aspects of directed evolution for crop improvement.

Results

Development of the germline-specific evolution system in Arabidopsis

Base editors have great potential for generating ample base variations. Owing to the high-throughput advantages described above, Arabidopsis might be an excellent candidate for base editor-mediated directed evolution in situ. However, the diversity of mutations generated by base editors in Arabidopsis still needs to be improved. We therefore developed a germline-specific evolution system that combines an efficient germline-specific base editor with next generation screening. The system aims to overcome this hurdle by generating diverse mutant forms during sexual propagation, enabling high-throughput screening for beneficial derivatives that can be further used for crop improvement (Fig. 1a). Base substitutions in some alleles often fail to be generated in the T1 plants, probably due to the short period of expression after floral dip. However, the same wild-type (WT) alleles continue to be subjected to editing in the germline cells of T1 plants during sexual propagation, thereby generating a variety of base-edited lines in the T2 generation (Additional file 1: Fig. S1). A single T1 Arabidopsis plant with an unchanged WT allele has the potential to produce thousands of seeds, approximately 75% or more of which contain T-DNA expressing the base editor. Thus, this system enables a single T1 plant with a WT allele to generate thousands of independent T2 mutant lines. Compared with traditional methods that screen the T1 plants, the system built here has the potential to increase the number of mutant lines by several orders of magnitude.

Fig. 1

Overview of the germline-specific evolution system. a The germline-specific evolution strategy. Base editors controlled by the germline-specific promoter are transformed into Arabidopsis via Agrobacterium-mediated floral dip. Base editing of the WT alleles can take place in germline cells of T1 transgenic plants during sexual reproduction. Typically, 5000 to 10,000 seeds can be collected from each T1 plant, which may contain thousands of T2 mutant lines with diverse allelic forms. The T2 lines are screened on selective plates for the desired mutants, which are subsequently genotyped to identify their mutant alleles. The desired alleles can then be used for crop breeding. b Structures of pHEE901 and pHEE901-A3A. U6p, Arabidopsis U6 promoter; EC1.2en-EC1.1p, EC1.1 promoter fused with the EC1.2 enhancer; UGI, uracil DNA glycosylase inhibitor; NLS, nuclear localization signal. c Percentages of different mutation types at five endogenous target sites created using pHEE901 and pHEE901-A3A in the T1 generation. d Frequencies of different mutation types in T2 plants from six T1 WT plants containing sgRNAs targeting AG and EPSPS, respectively. Homo, homozygous; Hetero, heterozygous

The widely used CBE vector in Arabidopsis, pHEE901, contains the rat APOBEC1-based BE3 (base editor 3) under the control of the EC1.2en-EC1.1p promoter [12]. It has been reported that unedited T1 Arabidopsis plants containing this vector can generate base-edited T2 mutants. However, the editing efficiency is less than 10%, and no C-to-A/G mutant has been identified, which significantly restricts the diversity of potential allele forms [12]. To address this problem, we replaced rat APOBEC1 with a more efficient cytosine deaminase, human APOBEC3A [13], to create pHEE901-A3A (Fig. 1b). To check whether the editing efficiency of pHEE901-A3A was increased relative to that of the original BE3, we designed five sgRNAs for four target genes: agamous (AG) [14], auxin response factor 2 (ARF2) [15], 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) [16], and hypersensitive to ABA 1 (HAB1) [17] (Additional file 1: Figs. S2, S3 and S4). A total of 817 transgenic lines were generated, and Hi-TOM sequencing was used to assess the genotypes of all T1 plants [18]. pHEE901-A3A proved to be more efficient than pHEE901, with a wide editing window (Fig. 1c, Additional file 2: Tables S1 and S2). Notably, only pHEE901-A3A produced C-to-A and C-to-G conversions (Additional file 2: Table S2).

Although pHEE901-A3A performed base editing efficiently in T1 plants, 50.0–92.3% of these plants were WT, and 2.6–14.0% were heterozygotes containing an intact WT allele (Additional file 2: Table S1). To evaluate the editing efficiency of pHEE901-A3A during the sexual reproduction process from T1 to T2, we randomly selected six T1 WT plants containing sgRNA targeting AG and six targeting EPSPS. We then assessed the genotype of approximately 300 T2 plants from each of the T1 plants. Base editing had occurred in 31.2–76.9% of the T2 plants with sgRNA targeting AG and 14.9–45.8% of the T2 plants with sgRNA targeting EPSPS (Fig. 1d and Additional file 2: Table S3). These results confirmed that thousands of T2 mutant lines could be generated from a single T1 WT plant using this germline-specific evolution system.

Diverse T2 mutants can be generated at each sgRNA target using the germline-specific evolution system

To maximize the mutation depth of every single target, we investigated whether the T2 mutants carried more extensive and diverse genetic variation than the T1 mutants at each target. The protospacers of both the AG and EPSPS targets contained seven cytosines (Additional file 1: Fig. S2). In theory, all seven cytosines could be altered, so we could expect to obtain many mutated allele forms for each target. Indeed, we identified 106 and 36 mutated allele forms for AG and EPSPS, respectively, in the 1800 T2 plants tested, compared with only 22 and 12 in the 88 and 85 T1 plants (Fig. 2a, b, Additional file 1: Figs. S4 and S5). These findings provide strong evidence for a correlation between the size of the mutant population and the number of genetic variant types. More importantly, in this experiment we tested only a very small number of T2 plants, equivalent to approximately a quarter of the seeds produced by one T1 plant; if we characterized and screened the seeds produced by all the T1 transgenic plants, we would obtain a significantly larger number of mutation forms.
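To give a rough sense of how many allele forms seven editable cytosines could yield in theory, the sketch below counts the possible combinations within the protospacer alone. It assumes that each cytosine changes independently and ignores edits outside the protospacer; the function name and these simplifications are illustrative assumptions, not part of the reported analysis.

```python
# Hypothetical helper: count the allele forms reachable at one protospacer
# when each editable cytosine independently keeps or changes its identity.

def allele_forms(n_cytosines: int, states_per_c: int) -> int:
    """Number of mutated allele forms, excluding the wild-type allele."""
    return states_per_c ** n_cytosines - 1


N_C = 7  # cytosines in each of the AG and EPSPS protospacers (from the text)

c_to_t_only = allele_forms(N_C, 2)  # each C is either C or T
c_to_any = allele_forms(N_C, 4)     # each C may be C, T, G, or A

print(c_to_t_only)  # 127
print(c_to_any)     # 16383
```

Under these assumptions, the 106 and 36 allele forms actually observed are a small fraction of the combinatorial space, which is in line with the expectation that screening more T2 plants would uncover more forms.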

Fig. 2

The germline-specific evolution system speeds up DNA evolution. a Numbers of alleles actually obtained in the T1 and T2 generations at AG and EPSPS targets. b Comparison of the mutated allele forms obtained in the T1 and T2 generations. c Base substitution frequencies at each base in the T1 and T2 generations. Numbers above the letters indicate positions of these bases in the protospacer. PAM sequences are indicated. d Numbers of mutated allele forms with typical C-to-T changes and atypical base substitutions in the T2 generation. e Numbers of mutated allele forms in the T2 generation with their frequencies. The mutated allele forms are shown in Additional file 1: Fig. S5. The frequencies denote the numbers of mutants harboring corresponding allele among the total plants sequenced

Analysis of the profiles of the editing products revealed that our system expanded the size of the editing window (Fig. 2c, Additional file 1: Figs. S4 and S5). For instance, the editing window of the EPSPS target expanded to 34 bp (from position −18 to 16) in the T2 generation compared to 7 bp (positions 5 to 11) in the T1 plants (Fig. 2c). Similarly, base substitutions outside of the protospacer of the AG target were detected only in the T2 plants (Fig. 2c). We also observed that our evolution system yielded a wide range of atypical base substitutions including not only C-to-G and C-to-A but also A-to-T, G-to-A, G-to-C, and T-to-C, suggesting that deamination also occurred on A and G (Fig. 2c). The T-to-C conversion might have resulted from deamination of A on the complementary strand. These results underscore the significant evolutionary potential of base editing at a single sgRNA target.

Although more than half of the T2 altered allele forms were the results of atypical base substitutions, the most frequent deamination events occurred on cytosines in a narrow window, and the most common change was a C-to-T transition (Fig. 2d and Additional file 1: Fig. S5). As a result, most of the altered allele forms, including those resulting from C-to-T conversions in the extended editing window, or involving atypical base substitutions, arose at relatively low frequencies: over 75% occurred at frequencies below 1% and more than 40% at frequencies less than 0.1% (only one plant harboring the corresponding allele in the set of approximately 1800 T2 plants) (Fig. 2e and Additional file 1: Fig. S5). These findings show that our system provides an effective strategy for generating large-scale mutant libraries at every single target needed for achieving high-throughput directed evolution and screening for rare mutant forms. The evolutionary depth and efficiency of atypical base substitution can be further improved by using more potent deaminases and designing multiple target sites.

The germline-specific evolution system facilitates directed evolution of proteins

Since our system increased variation at the DNA level, we expected that it would also generate a wider spectrum of variants of AG and EPSPS at the protein level. Although we observed only 2 types of full-length AG protein variant and 10 types of full-length EPSPS protein variant in T1 plants, we identified 15 and 25 forms of full-length protein, respectively, in the tested T2 plants (Fig. 3a, b).

Fig. 3

The germline-specific evolution system facilitates protein evolution. a Numbers of variants actually obtained in the T1 and T2 generations at AG and EPSPS targets. b Comparison of the altered protein forms obtained in the T1 and T2 generations. c Heatmap showing the frequencies of different amino acid substitutions in the T1 and T2 generations mutants with or without premature termination. d Numbers of plants harboring full-length mutation forms or premature stop codons in the T1 and T2 generations. e Number of mutant plants with full-length mutation variants of different types obtained in the T1 and T2 generations at the AG target. f Numbers of altered protein forms in the T2 generation with the corresponding frequencies. The altered protein forms are shown in Additional file 1: Fig. S6. The frequencies denote the numbers of mutants harboring corresponding mutated protein forms among the total plants sequenced

Next, we analyzed variation at the amino acid level and found that the presence of atypical base substitutions led to extensive amino acid variation. For example, P182 (CCA) of EPSPS was converted not only to Ser (TCA) and Leu (TTA and CTA) but also to Ala (GCA), Val (GTA), Ile (ATA), Thr (ACA), and a stop codon (TGA) (Fig. 3c, Additional file 1: Figs. S5 and S6). T2 plants contained a wider variety of amino acid changes at T152, R153, I154, R155, and S156 of AG and R181 and L183 of EPSPS than T1 plants (Fig. 3c and Additional file 1: Fig. S6). Additionally, due to the broader editing window, amino acid changes at L147, E148, and R149 of AG and G177 and A179 of EPSPS were detected only in T2 plants (Fig. 3c).
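The P182 example can be checked with a short script that enumerates every codon reachable from CCA when either cytosine is changed to T, G, or A, and translates each one with the standard genetic code. This is a minimal sketch; the helper name `cytosine_variants` is hypothetical and the enumeration assumes that only cytosines are edited.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def cytosine_variants(codon: str) -> set[str]:
    """Codons reachable by changing any subset of the cytosines to T, G, or A."""
    options = [BASES if base == "C" else base for base in codon]
    return {"".join(v) for v in product(*options)} - {codon}


# P182 of EPSPS is encoded by CCA (from the text).
reachable = {c: CODON_TABLE[c] for c in cytosine_variants("CCA")}

# Codons reported in the text for P182, printed with their one-letter residue.
reported = ["TCA", "TTA", "CTA", "GCA", "GTA", "ATA", "ACA", "TGA"]
for codon in reported:
    print(codon, reachable[codon])
```

All eight reported codons fall inside the 15-codon reachable set. Typical C-to-T editing alone can produce only Ser and Leu at this position, so the remaining residues depend on atypical substitutions.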

Premature termination of protein translation, which often results from C-to-T conversions at CAA, CGA, and CAG codons, poses a significant challenge to CBE-mediated directed evolution of proteins, as it can impact the evolution of not only these specific codons but also adjacent ones. For instance, the protospacer region of the AG target comprises two CGA codons encoding R153 and R155, with cytosines situated at positions 5 and 11 (Fig. 2c). In the T1 generation, only two of 42 mutant lines contained full-length mutated proteins with single amino acid changes (R153G or S156F), while the others resulted in premature termination, which prevented the functional detection of variants of the other amino acids in the protospacer region (Fig. 3d and Additional file 1: Fig. S6). Screening a large T2 mutant library circumvented this limitation: 102 of 1080 T2 mutated alleles encoded full-length mutated AG protein, in 15 different forms that included amino acid conversions at five different positions (Additional file 1: Fig. S6). We noted that in these full-length AG variants, either CGA underwent atypical C-to-G/A or G-to-A/C changes instead of C-to-T changes, or base editing occurred specifically on the surrounding codons but not on CGA (Fig. 3e and Additional file 1: Fig. S7). These results showed that our evolution system is well protected against the adverse effects of premature protein termination on artificial evolution. Similar to the frequency at the DNA level, over 60% of the different protein variants appeared at a frequency below 1% and more than 40% at a frequency less than 0.1% (Fig. 3f). In sum, the diversity of mutations increased significantly with this germline-specific evolution system, which could enable comprehensive and large-scale directed evolution of proteins.
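The codon logic behind premature termination can be illustrated as follows. The sketch applies a C-to-T change to the first base of CAA, CGA, and CAG, then applies the atypical changes named above to CGA. The helper `substitute` is a hypothetical name, and only single-base changes on the coding strand are considered.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitute(codon: str, pos: int, base: str) -> str:
    """Return the codon with the base at index pos replaced."""
    return codon[:pos] + base + codon[pos + 1:]


# Typical C-to-T editing of the first base converts each codon to a stop codon.
for codon in ("CAA", "CGA", "CAG"):
    edited = substitute(codon, 0, "T")
    print(codon, "->", edited, CODON_TABLE[edited])

# Atypical edits at CGA (Arg) leave the reading frame open.
atypical = {
    "C-to-G": substitute("CGA", 0, "G"),
    "C-to-A": substitute("CGA", 0, "A"),
    "G-to-A": substitute("CGA", 1, "A"),
    "G-to-C": substitute("CGA", 1, "C"),
}
for label, edited in atypical.items():
    print(label, edited, CODON_TABLE[edited])
```

The C-to-G change at CGA gives GGA (Gly), which matches the R153G variant mentioned above, and none of the four atypical changes creates a stop codon.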

Screening for EPSPS mutants exhibiting herbicide-resistance

Glyphosate is one of the most widely used herbicides that target EPSPS [16]. As the target site of the EPSPS sgRNA used above is in close proximity to the coding sequence of key amino acids related to glyphosate resistance, we assessed the potential to generate herbicide-resistant mutants. We sowed approximately 50,000 T2 seeds from 10 T1 WT plants containing this sgRNA on 1/2 MS medium supplemented with 100 mM glyphosate (Fig. 4a). Only three seedlings grew, and all of them produced the same form of EPSPS with a rare combination of amino acid substitutions: T178I/A179V/P182S (Fig. 4b). This variant was the result of three C-to-T substitutions at C-4, C-1, and C8 and was not encountered in the previously sequenced T2 plants (Fig. 4b and Additional file 1: Fig. S5). Moreover, C-4 was not changed in any of the previously sequenced T2 plants (Fig. 2c). These results demonstrate that a large mutant library is critical for isolating rare beneficial mutants. Interestingly, the T178I/A179V/P182S form of EPSPS has been found in a naturally occurring, highly glyphosate-resistant weed strain, and its equivalent variants have been used in rice and maize breeding [16, 19, 20].
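Why library size matters for such a rare allele can be sketched with a simple binomial estimate. The calculation below takes the observed 3 survivors in roughly 50,000 seeds as the allele frequency and treats each plant as an independent draw. Both are simplifying assumptions (for example, the three seedlings could be siblings from a single editing event), so the numbers are indicative only.

```python
def p_at_least_one(freq: float, n_plants: int) -> float:
    """Probability of seeing at least one carrier among n_plants independent draws."""
    return 1.0 - (1.0 - freq) ** n_plants


# Assumed frequency: 3 resistant seedlings among ~50,000 T2 seeds (from the text).
freq = 3 / 50_000

small_set = p_at_least_one(freq, 1_800)    # size of the sequenced T2 sample
large_set = p_at_least_one(freq, 50_000)   # size of the glyphosate screen

print(f"{small_set:.2f}")  # 0.10
print(f"{large_set:.2f}")  # 0.95
```

Under these assumptions, a sample of about 1800 plants would miss the allele roughly nine times out of ten, whereas a screen of 50,000 seeds would be expected to recover it in most cases.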

Fig. 4

Generating herbicide-resistant mutants with the germline-specific evolution system. a Representative glyphosate-resistant EPSPS mutant created by this evolution system. Scale bar, 1 cm. b The base substitutions in the glyphosate-resistant seedlings shown in a. c Schematic of the ALS gene. White and green bars represent untranslated regions and exons, respectively. The arrows indicate the positions of the sgRNAs on the sense (black rightward arrows) and antisense (red leftward arrows) strands. d Amino acid changes in the herbicide-resistant ALS mutants. e Base substitutions in the herbicide-resistant ALS mutants generated by sgRNA3. Numbers above the bases indicate positions in the protospacer, or positions of the amino acids in the protein. The PAM sequence is marked by yellow shading. Dots indicate unmutated nucleotides and unchanged amino acids. f Representative Arabidopsis seedlings of WT and ALS mutants harboring P197F, P197F/R198C, P197F/R198C/R199C, P197V, P197V/R198C, and P197V/R198C/R199C mutations grown on 1/2 MS plates supplemented with different concentrations and types of ALS-targeting herbicides for 2 weeks. Scale bar, 1 cm. g Representative rice plants with WT, genome-edited mutants with P171F and P171V/R172C/R173C forms of OsALS1, grown on medium with different ALS-targeting herbicides for 16 days. Scale bar, 2 cm. The herbicide concentrations used in these treatments are given in the “Methods” section. h Schematic of the HPPD gene and its sgRNAs used in this study. White and green bars represent untranslated regions and exons, respectively, and lines represent introns. The lines above or below the schematic represent sgRNAs, with bold and colored ones indicating sgRNAs that resulted in herbicide-resistant mutant forms. i Representative topramezone-resistant HPPD mutant. Scale bar, 1 cm. j Representative Arabidopsis seedlings of WT and HPPD mutants harboring P339S, P339L, and L368F mutations grown on 1/2 MS plates supplemented with different concentrations and types of HPPD-targeting herbicides for 9 days. Scale bar, 1 cm

In the above screening, 10 unedited T1 WT plants generated more than 50,000 T2 seeds that could be easily accommodated in 10 Eppendorf tubes. According to the mutation rate calculated above, these T2 seeds should contain approximately 15,000 independent mutant lines, a number that is almost impossible to obtain in other ways. Moreover, we needed only 25 petri dishes to screen for herbicide-resistance for the 50,000 seeds. Thus, the use of the germline-specific evolution system in Arabidopsis is space/labor-saving and effective in generating, storing and high-throughput screening of mutant libraries.
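The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below back-calculates the mutation rate implied by 15,000 mutant lines among 50,000 seeds and compares it with the 14.9–45.8% editing range reported for the EPSPS target in T2 plants. The variable names are illustrative, and the even split of seeds across dishes is an assumption.

```python
# Values taken from the text.
seeds = 50_000                  # T2 seeds from 10 unedited T1 plants
mutant_lines = 15_000           # estimated independent mutant lines
dishes = 25                     # petri dishes used for the screen
epsps_range = (0.149, 0.458)    # fraction of T2 plants edited at the EPSPS target

implied_rate = mutant_lines / seeds   # fraction of seeds expected to be edited
seeds_per_dish = seeds // dishes      # assumes seeds are spread evenly

print(f"{implied_rate:.0%}")  # 30%
print(seeds_per_dish)         # 2000
print(epsps_range[0] <= implied_rate <= epsps_range[1])  # True
```

The implied rate of about 30% sits within the measured EPSPS range, so the estimate of roughly 15,000 mutant lines is consistent with the editing frequencies reported earlier.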

Screening for herbicide-resistant ALS mutants

We also conducted directed evolution of Arabidopsis ALS, the target of five groups of commercial herbicides: imidazolinones (IMI), pyrimidinylthiobenzoates (PTB), sulfonylaminocarbonyltriazolinones (SCT), sulfonylureas (SU), and triazolopyrimidines (TP) [21]. The development of ALS mutants with broad-spectrum resistance (BSR) to several of these herbicides could benefit agricultural breeding focused on weed control. To isolate such a mutant, we designed 12 sgRNAs targeting the coding sequences of conserved ALS regions between Arabidopsis and rice and then created more than 60 transgenic T1 plants for each sgRNA (Fig. 4c, Additional file 1: Figs. S8 and S9). To screen for herbicide-resistant mutants, we grew 120 T2 seedlings from each T1 plant on 1/2 MS plates containing chlorsulfuron (SU herbicide), bispyribac (PTB herbicide), or imazapic (IMI herbicide). In total, we screened over 100,000 T2 seedlings from 834 T1 plants, determined the genotypes of 1244 herbicide-resistant T2 plants by Hi-TOM sequencing, and identified 46 forms of ALS mutant, including 32 that had not been reported previously (Fig. 4d, e, Additional file 1: Fig. S10 and Additional file 2: Table S4).

We focused on alleles with multiple resistance and found that one T2 seedling with a P197V/R198C/R199C mutation exhibited tolerance to imazapic; meanwhile, P197V and P197V/R198C mutations were observed in five chlorsulfuron-resistant seedlings (Additional file 2: Table S4). All P197F, P197F/R198C, and P197F/R198C/R199C mutants were resistant to three herbicides; moreover, the P197F/R198C and P197F/R198C/R199C mutants showed resistance to all three tested herbicides (Additional file 2: Table S4). To confirm their resistance to different ALS-targeting herbicides, we generated T3 homozygous derivatives. None of these mutations had a significant influence on plant growth in the absence of herbicide treatment (Additional file 1: Fig. S11). We then grew the mutants on 1/2 MS plates with each of five ALS-targeting herbicides (Fig. 4f). P197V offered robust resistance to four tested herbicides, namely imazapic, pyroxsulam, chlorsulfuron, and flucarbazone, whereas P197F showed strong resistance only to bispyribac and chlorsulfuron (Fig. 4f and Additional file 1: Fig. S12). Moreover, the R198C/R199C mutation increased the tolerance of P197F/V to higher doses of several herbicides (Fig. 4f and Additional file 1: Fig. S12).

To check whether P197F and P197V/R198C/R199C could be used as BSR forms of ALS in crops, we generated genome-edited rice plants with modified gene sequences encoding the corresponding P171F and P171V/R172C/R173C variants of OsALS1 using prime editing (Additional file 1: Fig. S13). The BSR of both variants in rice was similar to that observed in Arabidopsis, indicating that the germline-specific evolution system works well for crop improvement (Fig. 4g and Additional file 1: Fig. S14). These results demonstrate that the germline-specific evolution system can perform efficient and effective directed evolution of genes for prominent agronomic traits and, more importantly, that the mutations obtained can be readily applied in the molecular breeding of crops.

Screening for herbicide-resistant HPPD mutants

To conduct directed evolution on 4-hydroxyphenylpyruvate dioxygenase (HPPD), another conserved protein and important target of commercial herbicides (Additional file 1: Fig. S15) [22], we generated a Cas9-NG vector [23], pHEE901-A3A-NG, to expand the target scope of our system, and designed 92 sgRNAs to cover its coding sequence (Fig. 4h and Additional file 2: Table S5). Over 3 million T2 seedlings were planted on 1/2 MS medium supplemented with one of five HPPD inhibitor herbicides: pyrasulfotole, topramezone, mesotrione, tembotrione, or isoxaflutole (Fig. 4i). We obtained 26 herbicide-resistant HPPD mutation forms caused by 10 sgRNAs (Fig. 4h, Additional file 1: Figs. S16 and S17). These mutant forms contained substitutions at 17 amino acid residues that are highly conserved, particularly among dicot plants (Additional file 1: Figs. S15 and S16). Among the 26 mutant forms, 14 occurred at the target of sgRNA68, and all of them contained amino acid substitutions at P339, suggesting critical roles for P339 and its surrounding amino acids in herbicide binding. In a previous study, six amino acids, including P339, were identified as potentially contributing to the divergence in herbicide-binding activity between AtHPPD and ZmHPPD [24]. Our findings indicate that diversity at this amino acid may partially account for the HPPD-type herbicide resistance observed in maize. Further testing in the T2 generation indicated that mutants at the targets of sgRNA68 and sgRNA73 showed a higher degree of resistance to these herbicides (Fig. 4j). However, none of these mutants provided enough tolerance for application in crop plants. In these assays, we identified several important regions of HPPD strongly related to herbicide resistance. Thus, in the future, we will simultaneously express several sgRNAs to cover all of these key regions and perform larger-scale evolution to create high-level BSR mutants that can be applied in crop plants.

Discussion

In this study, we describe the potential of BE-mediated artificial evolution for generating large allelic diversity at a single target, which has previously been overlooked, probably due to the lack of an efficient tool for producing and screening large mutant libraries. To address this issue, we established a germline-specific evolution system to screen for beneficial alleles in Arabidopsis that could be applied to crop improvement. This system allows unedited T1 Arabidopsis plants with germline-specifically expressed A3A-BE to produce thousands of T2 mutant lines, all of which can be stored in a small Eppendorf tube, enabling space-saving and labor-efficient phenotypic screening.

It has been demonstrated that A3A-BE has a broader editing window than APOBEC1-mediated BE due to its higher editing efficiency [13]. Rare base substitutions, which occur at very low frequencies, can only be found in a large-scale mutant pool. Thus, the use of this germline-specific evolution system broadens the editing window and permits the detection of atypical base conversions, significantly increasing the available genetic diversity at both the DNA and protein levels. Importantly, this system reduces the inhibitory effect of premature translation termination on the directed evolution of proteins. We believe that the germline-specific evolution system should also permit us to co-evolve key amino acid residues distributed across different regions of a protein.

Several in vivo evolution studies have been conducted in plants, with screenings primarily focusing on crops to identify herbicide-resistant alleles [2, 25,26,27,28,29]. These screenings typically involved selecting calli on plates supplemented with herbicide, from which T0 resistant lines were subsequently generated. However, the labor-intensive and space-demanding nature of callus transformation limited the selection to only a few thousand calli, thus restricting the variety of mutants that could be examined. In contrast, our system conducted screening in the T2 generation of Arabidopsis, facilitating the evaluation of millions of transgenic lines for the same purpose, resulting in deep evolution on each sgRNA and higher yield of herbicide-resistant mutant alleles. For instance, in comparison to the study by Kuang et al., who employed 63 sgRNAs to evolve ALS in rice and identified only five types of herbicide-resistant variants, with P171F being the most potent [2], we discovered 46 variants. Among these, P171F/R172C/R173C and P171V/R172C/R173C exhibited stronger and broader resistance to ALS-targeting herbicides than P171F.

Multiple strategies could be applied to further increase the genetic diversity created by the germline-specific evolution system. For instance, dCas9-AID(x) has shown the capability to induce more C-to-G/A substitutions compared to dCas9-AID(x)-UGI [30]. Therefore, by modifying A3A-CBE through the removal of the UGI or replacing UGI with uracil-DNA glycosylase [31], we could potentially amplify the mutation forms. Furthermore, exploring the substitution of A3A-CBE with different glycosylase base editors to facilitate C-to-N, T-to-N, A-to-N, and G-to-N substitutions is another promising avenue [32, 33]. Additionally, the utilization of adenine and cytosine base editor (ACBE) could enable the simultaneous base substitution of both A and C nucleotides [25, 34,35,36,37,38,39]. These prospective strategies present exciting directions for our research. The use of BE-mediated evolution system is complementary to that of PE, which is well-suited to achieving directed changes at critical positions in a gene rather than high-throughput screening at the whole-gene level. Using BE, our system can identify critical positions for focused PE-mediated directed evolution.

In addition to studying resistance traits, Arabidopsis is an excellent model plant for investigating traits such as photosynthesis rate, fertilizer use efficiency, disease resistance, abiotic stress tolerance, and heavy metal absorption. Many key genes that determine these traits are highly conserved between Arabidopsis and crop plants. The germline-specific evolution system enables the generation of a diverse library of mutant lines for screening purposes. While some mutations can be easily screened in petri dishes by the naked eye, certain phenotypic screenings may require more extensive work and equipment than screening for herbicide resistance. For instance, stresses such as salt, temperature fluctuations, and pathogens can lead to an increase in cytoplasmic Ca2+ levels and a decline in photosystem II activity [40,41,42]. In such cases, we can screen for mutants affecting plant responses to these stresses by utilizing an aequorin-expressing background [43, 44], which reports Ca2+ signals through luminescence, or screen for mutants with altered photosystem II activity using Imaging-PAM [45]. In any case, the workload for screening in Arabidopsis is typically smaller than that in crop plants. Furthermore, there is potential to leverage phenomics tools in combination with image recognition using machine learning to establish a series of efficient high-throughput phenotypic screening systems.

Conclusions

Base editing holds significant promise for artificial evolution, but so far, its full potential has been underestimated. To unlock this potential, expansive mutant pools with diverse genetic profiles and reliable high-throughput screening systems are crucial. The germline-specific evolution system we established in this study has demonstrated the capacity to generate a wide range of genetic variations by introducing unconventional base substitutions and expanding the editing window, and to perform large-scale screening in a space- and labor-saving way. This system has successfully addressed the challenge of premature termination, a significant hurdle in CBE-mediated protein evolution. Notably, the application of this system has enabled the efficient production of herbicide-resistant variants, showcasing its potential benefits for crop breeding.

Methods

Plant material and growth conditions

Arabidopsis ecotype Columbia (Col-0) was used for transformation and phenotypic analysis. All Arabidopsis plants were grown in a growth chamber at 22 °C with a 16/8 h day/night photoperiod and 70% relative humidity. Rice variety Zhonghua11 was used for Agrobacterium-mediated transformation of rice callus cells, as previously described [46]. Hygromycin at 50 μg/mL was used to select transgenic calli, and transgenic plantlets were regenerated on selective medium 10 to 12 weeks later. The transgenic plantlets were grown in bottles at 28 °C under a 16/8 h day/night photoperiod.

Plasmid construction

To generate plasmid pHEE901-A3A, pHEE901 was digested with XbaI and SacI to generate a 12.8-kb fragment, and the pH-A3A-PBE plasmid was digested with AvrII and SacI to recover a 5.1-kb fragment [13]. The two fragments were then ligated using T4 DNA ligase to produce pHEE901-A3A.

To generate the pHEE901-A3A-NG plasmid, we first digested pHEE901-A3A with MluI and SacI. Next, we ligated this plasmid backbone with a 1.2-kb fragment excised from pHUE411-NG using the same enzymes to obtain an intermediate plasmid lacking the uracil glycosylase inhibitor (UGI). Then, the UGI sequence was amplified from pHEE901-A3A using the primer pair UGI-F and UGI-R and inserted into the SacI-digested intermediate plasmid by Gibson assembly to generate the final pHEE901-A3A-NG plasmid.

To obtain Arabidopsis transgenic lines, 109 pairs of oligonucleotides encoding the designed sgRNAs were synthesized, annealed, and cloned into BsaI-digested pHEE901, pHEE901-A3A, or spCas9-NG using T4 DNA ligase. The sequences of the synthesized DNA oligos and all primers used in this study are listed in Additional file 2: Table S5.

To generate genome-edited rice ALS mutants, the pegRNA and nicking sgRNA sequences, whose diagram and sequences are shown in Additional file 1: Fig. S18, were synthesized, digested with BsaI, and inserted into the BsaI-digested prime editing vector enPPE2 [47].

Agrobacterium-mediated transformation of Arabidopsis

Constructs were transformed into Agrobacterium tumefaciens strain GV3101 and introduced into Arabidopsis by the floral dip method as described [48]. Seeds were harvested one-and-a-half months later and selected on 1/2 MS plates containing 25 μg/mL hygromycin. Resistant seedlings were transplanted to soil 2 weeks later.

Agrobacterium-mediated transformation of rice callus cells

Constructs were transformed into Agrobacterium tumefaciens strain EHA105 by electroporation. Agrobacterium-mediated transformation of Zhonghua11 callus cells was carried out as reported [46]. Three days after cocultivation, the calli were moved to medium containing 50 μg/mL hygromycin to select positive transformants.

Assessment of base editing efficiency

To measure the editing efficiency of T1 plants, samples of WT and T1 plants were collected. The editing efficiency of T2 plants was measured using samples collected from the progeny of WT T1 lines. DNA was extracted by heating samples at 95 °C for 10 min in DNA extraction buffer (50 mM Tris–HCl pH 7.5, 300 mM NaCl, 300 mM sucrose). After centrifugation at 12,000 × g for 5 min, supernatants were collected and used as templates for PCR with SanTaq PCR Mix (Sangon Biotech). PCR products were analyzed by Sanger sequencing or Hi-TOM sequencing [18]. The sequencing results were analyzed using the BioEdit software.

The mutation type (homozygous, heterozygous, biallelic, or chimeric) of each T1 and T2 plant was determined by Hi-TOM sequencing. Alleles represented by less than 10% of the Hi-TOM reads were considered aerosol contaminants or technical errors and were excluded. Plants that contained one allele were identified as WT or homozygous; plants that contained two alleles whose frequencies differed by no more than twofold were considered to carry two equal alleles, i.e., to be heterozygous or biallelic; all other plants were considered chimeric.
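The classification rules above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in this study: the function name, the "WT" allele label, and the input format (read counts per allele) are hypothetical, and the handling of boundary cases (exactly 10% of reads, exactly a twofold difference) is an assumption. The sketch also assumes the usual definitions in which a plant with two equal alleles is heterozygous if one of them is WT and biallelic if both are mutant.

```python
from typing import Dict

WT = "WT"  # hypothetical label for the unedited reference allele


def classify_genotype(read_counts: Dict[str, int],
                      min_fraction: float = 0.10,
                      max_ratio: float = 2.0) -> str:
    """Classify one plant from per-allele Hi-TOM read counts."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads for this sample")

    # Discard alleles below 10% of reads (treated as contamination or error).
    # Assumption: an allele at exactly 10% is kept.
    kept = {allele: n / total
            for allele, n in read_counts.items()
            if n / total >= min_fraction}

    if len(kept) == 1:
        allele = next(iter(kept))
        return "WT" if allele == WT else "homozygous"

    if len(kept) == 2:
        low, high = sorted(kept.values())
        # Assumption: "within twice" includes an exactly twofold difference.
        if high <= max_ratio * low:
            return "heterozygous" if WT in kept else "biallelic"

    # Two unequal alleles, three or more alleles, or no allele above the
    # threshold: all fall through to chimeric in this sketch.
    return "chimeric"
```

For example, with made-up read counts, `classify_genotype({"WT": 55, "C>T": 45})` gives `"heterozygous"`, whereas `classify_genotype({"WT": 80, "C>T": 20})` gives `"chimeric"` because the two retained alleles differ by more than twofold.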

The frequencies of DNA alleles, protein variants, and single base or amino acid changes were calculated as the number of plants carrying the change divided by the total number of plants examined.
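As a small illustration of this calculation, the sketch below counts each change once per plant and divides by the number of plants examined. The function name and input format (one set of changes per plant) are hypothetical, and the example data are invented for illustration only; they are not results from this study.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def change_frequencies(plants: Iterable[Set[str]]) -> Dict[str, float]:
    """Return, for each change, the fraction of examined plants carrying it."""
    plants = list(plants)
    if not plants:
        raise ValueError("no plants examined")

    # Each plant is a set, so a change is counted at most once per plant.
    counts = Counter(change for changes in plants for change in changes)
    return {change: n / len(plants) for change, n in counts.items()}


# Invented example: four plants, two of which carry no change.
example = [{"P197F"}, {"P197F", "R198C"}, set(), set()]
frequencies = change_frequencies(example)
```

Unedited plants are passed as empty sets so that they still contribute to the denominator.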

Selection of herbicide-resistant plants

To screen for herbicide-resistant EPSPS mutants, seeds of T1 plants were sown on 1/2 MS plates supplemented with 100 mM glyphosate. To select herbicide-resistant ALS mutants, seeds of T1 plants were sown on 1/2 MS plates containing 0.02 mg/L chlorsulfuron, 0.03 mg/L bispyribac, or 0.05 mg/L imazapic. To obtain herbicide-resistant HPPD mutants, seeds of T1 plants were sown on 1/2 MS plates supplemented with 0.75 mg/L pyrasulfotole, 0.1 mg/L topramezone, 0.1 mg/L mesotrione, 0.1 mg/L tembotrione, or 0.04 mg/L isoxaflutole. Resistant seedlings were transplanted to soil 2 weeks later for DNA extraction and seed collection. Hi-TOM sequencing was used to determine the mutant alleles in herbicide-resistant plants.

BSR assay for ALS mutants

Seeds of the P197F, P197F/R198C, P197F/R198C/R199C, P197V, P197V/R198C, and P197V/R198C/R199C mutants were sown on 1/2 MS plates containing different concentrations of herbicides (low dosages: 0.04 mg/L imazapic, 0.04 mg/L pyroxsulam, 0.03 mg/L bispyribac, 0.3 mg/L chlorsulfuron or 0.3 mg/L flucarbazone-sodium; high dosages: 0.08 mg/L imazapic, 0.08 mg/L pyroxsulam, 0.06 mg/L bispyribac, 0.6 mg/L chlorsulfuron or 0.6 mg/L flucarbazone-sodium) for the BSR assay. After 2 weeks, photographs were taken, and the fresh weights of individual plants were recorded.

To investigate whether P171V/R172C/R173C, the OsALS mutant equivalent to P197V/R198C/R199C of AtALS, confers BSR to ALS herbicides in rice, prime editor was used to generate genome-edited plants carrying OsALS1 sequences encoding the corresponding P171F and P171V/R172C/R173C variants. T1 rice plants were grown on 1/2 MS medium for 2 days and then transferred to 1/2 MS solid medium containing 0.9 mg/L imazapic, 0.4 mg/L pyroxsulam, 1.68 mg/L bispyribac, 8 mg/L chlorsulfuron, or 3.15 mg/L flucarbazone-sodium for 16 days to assess growth.

BSR assay for HPPD mutants

T2 seeds of HPPD mutants were sown on 1/2 MS plates supplemented with 0.2 mg/L pyrasulfotole, 0.03 mg/L topramezone, 0.02 mg/L isoxaflutole, 0.03 mg/L mesotrione, or 0.03 mg/L tembotrione and grown for 9 days.