Overview

Plant breeding has traditionally been an iterative process of combining favorable alleles from two or more parents into individual progeny to improve their phenotypic characteristics. It is a biological lottery with poor odds, dependent primarily on the segregation and recombination of alleles. For example, the probability of recovering a homozygous inbred individual that combines favorable alleles at 10 unlinked loci is about one in one million. The probability decreases to one in 10 billion or worse if the loci are tightly linked on one or more chromosomes and the favorable alleles are carried by different parents. Given that many species have 25,000 to 30,000 genes, with multiple alleles present for each, relying on probability alone clearly yields incremental progress at best.
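The one-in-a-million figure follows from Mendelian segregation. A minimal sketch, assuming an F2 population derived from two inbred parents that differ at each of n unlinked loci, so that each locus independently yields favorable homozygotes with probability 1/4 (the function name is illustrative):

```python
from fractions import Fraction


def prob_homozygous_favorable(n_loci: int) -> Fraction:
    """Chance that one F2 plant is homozygous for the favorable allele at all
    n unlinked loci, assuming each locus segregates 1:2:1 independently."""
    return Fraction(1, 4) ** n_loci


p = prob_homozygous_favorable(10)
print(p)             # 1/1048576
print(round(1 / p))  # 1048576, i.e. roughly one in one million
```

Exact fractions are used so the result is not obscured by floating-point rounding.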

In addition to the probabilistic challenge of optimal individuals being present in a population, superior cultivars need to be identified based on phenotypic observation. If the lottery winner happens to be grown in a poor field plot, it will appear to underperform relative to others and may never be identified. Heritabilities for many relevant traits range from 30 to 80%; for low-heritability traits such as grain yield, more than one-half of the observed variation may be non-genetic, masking the true genetic value of individuals. The statistical improbability of generating genetically optimal individuals, together with the difficulty of measuring performance, has driven the search for technologies to improve cultivar development. Despite these difficulties, crop domestication and formal breeding, a large-scale collective effort spanning thousands of years, have produced the species and varieties that support civilization today. Time is no longer on our side, however, and breeding progress must be accelerated to meet current and future agricultural production needs in the face of population growth and changing climate conditions.
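The masking effect of low heritability can be illustrated with a small simulation. This is a hypothetical sketch, not a model from the cited work: it assumes purely additive genetic values, normally distributed environmental noise scaled so that heritability equals Var(G)/Var(P), and no genotype-by-environment interaction. It reports the correlation between phenotype and true genetic value (expected to be about the square root of heritability) and the fraction of the genetically best 1% that would also be chosen by selecting the top 1% on phenotype.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(h2, n=20000, top_frac=0.01, seed=1):
    """Simulate phenotype = genetic value + noise with heritability h2 (0 < h2 < 1).

    Returns (correlation between phenotype and genetic value,
             fraction of the top `top_frac` genotypes also in the top `top_frac` phenotypes).
    """
    rng = random.Random(seed)
    g = [rng.gauss(0.0, math.sqrt(h2)) for _ in range(n)]       # Var(G) = h2
    p = [gi + rng.gauss(0.0, math.sqrt(1.0 - h2)) for gi in g]  # Var(E) = 1 - h2
    k = max(1, int(n * top_frac))
    top_g = set(sorted(range(n), key=lambda i: g[i], reverse=True)[:k])
    top_p = set(sorted(range(n), key=lambda i: p[i], reverse=True)[:k])
    return pearson(g, p), len(top_g & top_p) / k


for h2 in (0.3, 0.8):
    r, overlap = simulate(h2)
    print(f"h2={h2}: r={r:.2f} (theory {math.sqrt(h2):.2f}), top-1% overlap={overlap:.2f}")
```

Under these assumptions, a trait with heritability near 30% leaves most of the truly best individuals unselected by phenotype alone.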

Genetic technologies play an increasingly important role in the breeding process: genomic sequencing and genetic marker analysis have been harnessed to identify genetically superior individuals based on their DNA composition, minimizing reliance on variable field evaluations. In addition to improving the probability of identifying genetically superior individuals, selecting on genetic marker profiles rather than on measured performance in production environments can shorten generation time and reduce cost. Genomic and marker-assisted selection are now common in breeding programs for many species. Despite the utility of these approaches, the process still fundamentally relies on the segregation and recombination lottery described above; it improves the odds of identifying superior individuals, if they exist, but does not create them. The process is also limited to a large extent by the allelic variation present within the scope of accessible sexual crosses. Precise, targeted modification of DNA sequences to accelerate the accumulation of favorable alleles, and to create variation that is not available through sexual crosses, has the potential to dramatically change the process, pace, and efficiency of cultivar development (Arora and Narula 2017; Scheben et al. 2017; Wang et al. 2019a, b). New breeding technologies, including transgenics, cisgenics, and gene editing, provide tools to raise crop improvement to a new level.

Specific Challenges and Opportunities in Crop Breeding

Positive allelic variation is necessary for breeding progress; major effect alleles are preferred

The breeding process relies on sequence variation from which improved individuals can be selected. Sequence variation exists on a continuum of complexity, ranging from single nucleotide changes to rearrangements spanning thousands of base pairs or more. The simplest form of sequence variation is the single nucleotide polymorphism (SNP), the substitution of one base for another at a single position. More complex types include insertions and deletions, commonly called InDels, which are typically defined as ranging from one to 50 base pairs in length. Copy number variants (CNVs) and presence/absence variants (PAVs) also belong in this category: CNVs are variations in the number of copies of a sequence in a genome, while PAVs are differences in the presence or absence of a sequence entirely. Complementing these are larger structural variants, such as inversions and translocations, which rearrange large portions of chromosomes.
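The size-based categories above can be expressed as a simple rule on reference and alternate allele strings. This is an illustrative sketch; the function name and output labels are assumptions, and CNVs, PAVs, inversions, and translocations cannot be distinguished from allele strings alone, so any length change larger than 50 bp is lumped together as a structural variant:

```python
def classify_variant(ref: str, alt: str, indel_max: int = 50) -> str:
    """Classify a ref/alt allele pair by size, using the 1-50 bp InDel convention.

    Length changes larger than indel_max are reported as structural variants;
    equal-length alleles are reported by the number of differing positions.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        n_diff = sum(a != b for a, b in zip(ref, alt))
        return "SNP" if n_diff == 1 else f"{n_diff} SNPs"
    size = abs(len(ref) - len(alt))
    return "InDel" if size <= indel_max else "structural variant"


print(classify_variant("A", "G"))             # SNP
print(classify_variant("A", "ATTG"))          # InDel
print(classify_variant("A", "A" + "T" * 60))  # structural variant
```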

Many genetic variants are “neutral changes” in sequence that do not result in observable phenotypic variation (Nei et al. 2010); however, the genetic changes most interesting and valuable to plant breeders are those that alter phenotypes for traits of interest. Consequently, understanding how sequence variation manifests as phenotypic variation is vital to manipulating variation intelligently. SNPs and InDels can induce a variety of changes. The coding sequence of a protein can be changed, resulting in altered structure and function (Wu et al. 2017). Alternatively, SNPs and InDels can affect translation start sites, create premature stop codons, or alter RNA stability and even translation efficiency. Changes can also occur outside coding regions, such as in cis-regulatory elements (Wang et al. 2019a, b), altering the expression pattern of the affected gene.

Larger modifications can induce all of the above changes as well as novel outcomes. Over evolutionary time, CNVs can lead to neofunctionalization and novel adaptations. For example, the rice gene TB2, a duplicate of TB1, has a novel role in positive regulation of tillering, opposite to the negative regulation by TB1; consequently, TB2 variants that increase its expression have been selected in upland rice for enhanced tiller number (Lyu et al. 2020). PAVs can modify whole networks of genes by introducing novel genes or modifying existing ones. This is exemplified by the VGT1 locus, where a miniature inverted-repeat transposable element (MITE) insertion into an upstream regulatory element causes differences in flowering time in maize (Castelletti et al. 2014). In this case, the MITE insertion changed the epigenomic landscape, which is another way for variation to manifest as phenotypic outcomes. Ultimately, sequence variation can create many different outcomes, each of which is potentially useful for generating the next productive crop variety.

Equally important to understanding the kinds of variation is understanding how variation has been generated historically. Much of the crop genetic variation accessed today is historical in origin and accumulated over thousands to millions of years of evolution. For many species, trillions or more individuals grow each year, and mutations occur at some frequency in all of them. In Arabidopsis thaliana, the spontaneous mutation rate has been estimated at only 7 × 10⁻⁹ base substitutions per site per generation, which, given a genome size of about 120 million base pairs, amounts to roughly one new base substitution per haploid genome per generation (Ossowski et al. 2010). In addition to mutagenic forces, sequence variants arise from recombination errors, DNA repair errors, and replication errors (Gabur et al. 2019). Most sequence variants are neutral or negative in effect, and the negative variants are rapidly purged from populations by negative selection. Some mutations are positive, and a small fraction of those have large effects on plant phenotype. These large-effect positive alleles are usually fixed quickly in populations through selection (Stetter et al. 2018), leaving breeders to sort through many small-effect alleles across many loci, a relatively inefficient substrate for breeding progress.
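The per-generation figure is simply the product of the mutation rate and the number of sites. A sketch using the values quoted above; the function name is illustrative, and the ploidy parameter is an added assumption for counting both genome copies of a diploid such as Arabidopsis:

```python
def expected_new_substitutions(rate_per_site: float, genome_bp: float, ploidy: int = 1) -> float:
    """Expected de novo base substitutions per generation = rate x sites x genome copies."""
    return rate_per_site * genome_bp * ploidy


haploid = expected_new_substitutions(7e-9, 120e6)
diploid = expected_new_substitutions(7e-9, 120e6, ploidy=2)
print(f"{haploid:.2f} per haploid genome, {diploid:.2f} per diploid genome")
# 0.84 per haploid genome, 1.68 per diploid genome
```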

Unfortunately, the processes that generate novel variation in plant breeding have become substantially limited. For each crop domesticated in human history, billions to trillions of plants were grown by countless small-scale farmers, and each genetically unique plant was observed individually by a human (Meyer and Purugganan 2013). Today, trillions of crop plants are grown each year in the USA alone, but the vast majority are bulk harvested for consumption, not individually observed, and not available to transmit new variation to the next generation. As a result, in current farming systems, the genetic variation-generating “machine” has largely been taken out of operation. Approaches to increase or produce positive allelic variation, ideally large-effect positive alleles, are critically needed to replace that machinery.

Targeted alteration of recombination frequencies and patterns could unlock cryptic variation

It is not sufficient merely to possess favorable alleles in a breeding population; they need to be combined in a single individual, which is complicated by genetic linkage. Linkage between genes contributing to phenotypic variation can be either beneficial or detrimental. If favorable alleles are linked, they are disproportionately transmitted together through sexual generations, increasing the chance that offspring inherit the favorable combination. In many cases, however, favorable and unfavorable alleles are linked, locking up variation that can only be expressed and resolved following recombination, which is a challenging process to rely on. Recombination occurs infrequently, with rarely more than three crossover (CO) events per pair of chromosomes per generation (Christine et al. 2015). Within a species, CO number increases with chromosome size, but this relationship does not hold across species. Furthermore, recombination is not uniform across the genome: in most plant species, 80% of COs are concentrated in only 25% of the genome (Blary and Jenczewski 2019).
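Two quick calculations make these constraints concrete. The first uses the 80%/25% figure above to compare crossover density in recombination-rich and recombination-poor regions. The second uses the Haldane map function, a standard model not drawn from the cited sources, which assumes crossovers occur independently with no interference, to convert map distance (in Morgans) into the fraction of recombinant gametes between two loci:

```python
import math


def relative_co_density(frac_of_cos: float, frac_of_genome: float) -> float:
    """Crossovers per unit length in a region, relative to a uniform distribution."""
    return frac_of_cos / frac_of_genome


def haldane_r(d_morgans: float) -> float:
    """Haldane map function: recombination fraction for loci d Morgans apart,
    assuming crossovers occur independently (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


hot = relative_co_density(0.80, 0.25)   # 3.2x the genome-wide average
cold = relative_co_density(0.20, 0.75)  # about 0.27x the genome-wide average
print(round(hot / cold, 1))             # 12.0
print(round(haldane_r(0.05), 4))        # 0.0476 (loci 5 cM apart)
```

Under these assumptions, crossovers are about 12 times denser per unit length in the recombination-rich quarter of the genome than in the rest, and loci 5 cM apart recombine in fewer than 5% of gametes, which is why tightly linked unfavorable alleles are hard to shed.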

One repercussion of this low recombination frequency is that undesirable alleles accumulate in poorly recombining regions because they are selected along with linked favorable alleles (Cutter and Payseur 2013). Genetic drift is also more likely to cause fixation in these regions, even in the absence of selection. Where crossovers occur depends on how accessible a region of the genome is: COs form preferentially in the more accessible euchromatin and rarely in the less accessible heterochromatin. The distribution of these regions has been found to be influenced by DNA methylation and chromatin accessibility, though this remains an open area of investigation (Blary and Jenczewski 2019). As a result, the exploitable sequence variation space is severely limited. For example, in maize, up to 20% of genes are located in regions largely untouched by recombination (Bauer et al. 2013). This limits both genetic gain within crops and the power of approaches such as QTL mapping and genomic selection, both of which rely on recombination. Overcoming this limitation represents a large potential gain for breeding progress.

While major-effect alleles have been impactful in crop development, much of the phenotypic variation resides outside detectable QTL regions, indicating that many small-effect alleles acting in concert underlie many traits (Huang et al. 2012; Boyle et al. 2017; Miao et al. 2019; Chen et al. 2020). Assembling these small-effect alleles into an optimal configuration is the central challenge of modern plant breeding, and it is currently limited by the number of COs per meiosis. Even in ideal conditions where breeders have perfect information on the best alleles, it would take dozens of generations to arrive at an ideal combination. Breeding is ultimately a numbers game, and recombination is a multiplier on the efficiency of a program. Editing technologies now offer a potential means to increase this multiplier.

Combining favorable genes and alleles from wild progenitors and across species boundaries

Integration of novel or exotic breeding material into the germplasm of a breeding program is generally limited both by biological boundaries that prevent effective sexual crosses and by the genetic maladaptation of exotic germplasm. In the first case, genomic information coupled with knowledge of local adaptation, evolutionary history, and direct performance measurements now supports identification of favorable alleles within diverse plant species. This information is commonly relevant to other species, even those across sexual boundaries. The greatest example of this is the past and present use of model species to learn about plants broadly; for instance, much of the understanding of flowering time in maize and rice originated in Arabidopsis and has been transferred through laborious experimentation (Hill and Li 2016). In the second case, exotic lines and wild progenitors carry negative traits along with potentially useful sequence variation. Because variation tends to be reduced in modern crop breeding pools relative to exotic germplasm and wild progenitors, breeders may be missing potentially useful variation. For example, the genetic regions that most strongly differentiate maize from its wild progenitor, teosinte, explain the least quantitative variation, demonstrating fixation of alleles during domestication. Overall, the number of detectable quantitative trait loci (QTLs) is much smaller in maize than in teosinte, providing additional evidence of fixation (Chen et al. 2020). Nucleotide diversity in maize has been reduced by 30% overall, and the reduction is even greater in regions under selection (Wright et al. 2005; Hufford et al. 2012). Similar patterns of diversity loss have been noted in other crops, such as wheat and soybean (Sedivy et al. 2017; Maccaferri et al. 2019). Without new sources of variation, increased selection pressure will only further deplete existing variation, causing breeding gains to plateau over time (Jannink 2010).
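Nucleotide diversity is commonly summarized as π, the average number of pairwise differences per site among sampled sequences. The sketch below computes π for equal-length aligned sequences and the percentage reduction between two populations; function names are illustrative, and the inputs in the usage lines are hypothetical, not the maize or teosinte data cited above.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site among aligned, equal-length sequences."""
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def percent_reduction(pi_wild: float, pi_crop: float) -> float:
    """Percentage of the wild population's diversity lost in the crop."""
    return 100.0 * (1.0 - pi_crop / pi_wild)


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs x 4 sites)
print(percent_reduction(0.010, 0.007))                 # about 30 (hypothetical values)
```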

Traditional breeding can overcome both barriers to some extent. Cross-species information can be applied by searching for existing variation within shared genetic regions, though there is no guarantee that the necessary materials will be available. Prebreeding can make exotic and wild germplasm more amenable for use; however, this is time- and resource-intensive. Editing technologies have the potential to lower the barrier to bringing the encyclopedia of evolution across all species to bear by enabling direct modification of existing sequences. These modifications can reproduce findings from other species and distant relatives, enabling faster development cycles at reduced cost. Furthermore, editing technologies may allow the creation of wholly designed sequences, potentially better than any found in existing individuals.

The Current Tools of Gene Editing

First-generation editing tools

The ability to selectively identify and manipulate DNA sequences can significantly advance crop functional genomics research and crop improvement. Gene editing systems now provide the molecular tools to change DNA sequences in specific and controlled ways. The first gene editing systems were based on meganucleases and zinc-finger nucleases (ZFNs), which locate, cut, and create double-strand breaks (DSBs) in targeted DNA sequences (Carroll 2008). ZFN systems pair the FokI nuclease cleavage domain with zinc finger DNA-binding protein motifs (ZFPs), each of which recognizes a 3 bp DNA sequence; modules are now available for every possible 3 bp combination, and linking 6–10 zinc finger units allows targeting of DNA sequences typically 20–30 bp long. Another genome editing system was subsequently developed that uses transcription activator-like (TAL) effector proteins to direct nuclease activity to specific DNA sequences; these are called transcription activator-like effector nucleases (TALENs) (Bedell et al. 2012). Each TAL effector repeat recognizes a single base pair, and repeats can be concatenated to target specific DNA sequences. As with ZFNs, the TAL effector array is paired with the FokI nuclease to provide the sequence recognition and DNA cleavage functions, respectively. In both systems, targeting a different DNA sequence requires building new vectors encoding ZFN or TALEN arrays that recognize the new target. Additionally, because FokI cleaves DNA only as a dimer, each target requires a pair of arrays designed to bind opposite strands on either side of the cleavage site. The complexity of array and vector design in the ZFN and TALEN systems, coupled with the laborious effort required to redesign vectors for each new target, has impeded widespread use of these systems for plant genome editing.
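Target length matters because a short sequence is expected to occur by chance many times in a large genome. A rough sketch, assuming a random genome with equal base frequencies (real plant genomes are repetitive, so actual counts can be far higher) and a hypothetical 1 Gb genome size; the function name is illustrative:

```python
def expected_random_matches(target_len: int, genome_bp: float, both_strands: bool = True) -> float:
    """Expected chance occurrences of one specific target_len-bp sequence in a random
    genome with equal base frequencies: positions scanned / 4**target_len."""
    strands = 2 if both_strands else 1
    return strands * genome_bp / 4 ** target_len


# Hypothetical 1 Gb genome; 9 bp is one three-finger array, 18-30 bp spans a ZFN pair.
for k in (9, 18, 24):
    print(k, expected_random_matches(k, 1e9))
```

Under these assumptions, a 9 bp site is expected thousands of times by chance, whereas an 18–24 bp site is expected less than once, which is why paired arrays recognizing longer sequences are used.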

CRISPR/Cas9-based editing systems

CRISPR/Cas9-based genome editing systems (Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated protein 9) have been used in plants since 2013 (Li et al. 2013; Nekrasov et al. 2013; Shan et al. 2013) and have continued to expand in applications and approaches. CRISPR/Cas9-based systems represent a significant advance over previous systems because of the simplicity and versatility of vector design and construction, which has led to a wider array of editing tools built on this system (Chen et al. 2019; Aamir et al. 2020; Zhu et al. 2020; Nadakuduti and Enciso-Rodríguez 2021). In the most common implementation, the Cas9 nuclease cleaves a specific DNA sequence, creating a DSB at the site specified by the single guide RNA (sgRNA), which consists of a variable targeting region and a fixed Cas9-associated region. Cleavage also requires a protospacer-adjacent motif (PAM) in the target DNA immediately 3’ of the sequence matched by the guide; the PAM is recognized by the Cas9 protein rather than by the sgRNA, and for the commonly used Streptococcus pyogenes Cas9 (SpCas9) it is 5’-NGG-3’ (Jinek et al. 2012; Cong et al. 2013; Mali et al. 2013a, b). This requirement restricts which sites can be targeted. Modifications of the original system have relaxed this limitation by altering the part of the Cas protein that confers PAM specificity, providing a range of possible PAM sites (Kleinstiver et al. 2015; Hu et al. 2018; Nishimasu et al. 2018). A recent development in this vein is the engineered SpCas9 variant “SpRY,” which has shown highly flexible PAM recognition in rice (Xu et al. 2021).
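Because SpCas9 requires an NGG PAM immediately 3’ of its 20-nt target, candidate sites in a sequence can be enumerated with a simple scan of both strands. A minimal sketch; the function names are illustrative, and real guide-design tools also score on-target efficiency and off-target risk:

```python
import re

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20):
    """Return (strand, start, protospacer, pam) for every spacer_len-nt protospacer
    immediately 5' of an NGG PAM on either strand. Start coordinates are 0-based
    on the scanned strand, read 5'->3' (the '-' strand is the reverse complement)."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so that overlapping sites are all reported.
        for m in re.finditer(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))", s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


print(find_spcas9_sites("A" * 20 + "TGG"))  # [('+', 0, 'AAAAAAAAAAAAAAAAAAAA', 'TGG')]
```

Scanning the reverse complement catches sites whose PAM appears as CCN on the given strand.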

Changing the DNA sequence to be targeted requires only changing the guide sequence of the sgRNA, rather than engineering a new protein as in previous editing systems. Additionally, because a single Cas9 protein can be directed to different sites by different guides, rather than requiring a new sequence-recognition protein for each target as in the ZFN and TALEN systems, deploying it against many targets at once (multiplex editing) is much easier. One drawback of the system is that off-target edits have been observed, though this effect seems to be smaller in plants than in other species (Fu et al. 2013; Zhang et al. 2018a, b). Cas9 is also not the only CRISPR-associated nuclease used for editing: related enzymes derived from other CRISPR systems, including Cas12a, CasΦ, and Cms1, each offer slightly different capabilities (Zetsche et al. 2015; Begemann et al. 2017; Pausch et al. 2020). Cas12a proteins have also been modified to recognize altered PAM sites (Gao et al. 2017; Li et al. 2018; Zhong et al. 2018).
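Off-target risk is often first assessed by looking for PAM-adjacent sites that differ from the guide at only a few positions. The sketch below counts mismatches naively, weighting every position equally, which is a simplification: published scoring schemes weight mismatches by position and type. Function names and the 3-mismatch threshold are assumptions.

```python
import re

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def off_target_candidates(guide: str, genome: str, max_mismatches: int = 3):
    """List (strand, start, site, pam, mismatches) for every guide-length site followed
    by an NGG PAM on either strand that differs from the guide at <= max_mismatches
    positions. Mismatches are counted uniformly, ignoring position and base identity."""
    guide, genome = guide.upper(), genome.upper()
    n = len(guide)
    hits = []
    for strand, s in (("+", genome), ("-", revcomp(genome))):
        for m in re.finditer(rf"(?=([ACGT]{{{n}}})([ACGT]GG))", s):
            site = m.group(1)
            mm = sum(a != b for a, b in zip(guide, site))
            if mm <= max_mismatches:
                hits.append((strand, m.start(), site, m.group(2), mm))
    return hits
```

The on-target site itself appears in the output with zero mismatches.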

The DSB created by Cas9 produces genetic modifications by engaging the cell’s natural DNA repair pathways, nonhomologous end joining (NHEJ) and homology-directed repair (HDR) (Symington and Gautier 2011; Ceccaldi et al. 2016; Kosicki et al. 2018). In plants, NHEJ operates through a classical Ku-dependent pathway as well as several backup pathways, resulting in small insertions and deletions (indels) at the repair site, weighted towards deletions over insertions (Shen et al. 2017). HDR uses a template sequence to repair the break, restoring the damaged site as a copy of the template. This template can come from a homologous chromatid or from exogenous DNA delivered to the cell, and HDR can produce sequence replacements up to several hundred base pairs in length (Shi et al. 2016). HDR is limited, however, to the S and G2 phases of the cell cycle, which makes it far less common than NHEJ, with some estimates of the natural rate of HDR as low as 1 in 10,000 (Puchta 2005; Salsman and Dellaire 2017).
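The practical consequence of a low HDR rate is the number of cells that must be treated and screened. Assuming, for illustration, that each cell independently undergoes HDR with probability 1 in 10,000 (the low-end estimate quoted above), the number of cells needed to obtain at least one HDR event with 95% confidence follows from 1 − (1 − p)^n; function names are illustrative:

```python
import math


def p_at_least_one(p_event: float, n_cells: int) -> float:
    """Probability that at least one of n independent cells undergoes the event."""
    return 1.0 - (1.0 - p_event) ** n_cells


def cells_needed(p_event: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one event) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_event))


n = cells_needed(1e-4)  # HDR rate of 1 in 10,000
print(n, round(p_at_least_one(1e-4, n), 3))
```

At this rate roughly 30,000 independent cells are needed, which motivates the HDR-enhancement strategies used in plants.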

Expanded applications of Cas9 for editing beyond double-strand breaks

The establishment of CRISPR/Cas9-based editing systems and their associated advantages have led to rapid, widespread adoption for editing in a broad array of species, including plants. This has driven steady development of modified and improved forms of Cas9 and Cas9-like nucleases, and of their associated RNA guides, leading to novel editing functions. These novel functions and applications are summarized in Table 1, both as applied in plants and in their non-plant origins. Deactivating one of the two nuclease domains of Cas9 produces a nickase (nCas9), which cuts only one strand of the double-stranded DNA target (Jiang and Doudna 2017). Two nickases targeted to opposite strands can be used together to create DSBs with greater specificity than standard Cas9 (Ran et al. 2013). If both nuclease domains are deactivated (dCas9), the enzyme still associates with guide RNAs and can thus be directed to specific DNA sequences. dCas9 can, for example, be targeted to cis-regulatory elements (CREs) and RNA polymerase binding sites to block gene expression, an approach referred to as CRISPR interference (Qi et al. 2013). dCas9 can also be fused to other proteins to bring them to specific regions of the genome. Transcriptional regulators attached to dCas9 can be brought to sites of interest, such as promoter regions, to enhance or repress expression. For example, a dCas9:EDLL fusion combined with an MS2:VPR activator was effective for strong targeted activation of lowly expressed genes in tomato (Vazquez-Vilar et al. 2019). A final application of dCas9 fusions uses methylation- and demethylation-related proteins to modulate gene expression epigenetically (Vojta et al. 2016; Xu et al. 2016).

Table 1. Fundamental gene editing studies in plant and non-plant organisms

One suite of tools enabled by fusing other proteins to nCas9 is base editing, in which one base is specifically exchanged for another. C:G to T:A transitions, known as cytosine base editing (CBE), can be accomplished by fusing nCas9 with a cytidine deaminase and a uracil DNA glycosylase inhibitor, which prevents the cell from excising the deaminated base (Komor et al. 2016). Although originally developed in mammalian cells, CBE has been optimized and tested in crops (Zong et al. 2017, 2018). Related deaminase-based systems are also useful for creating precise deletions through the activity of uracil DNA glycosylase and a co-expressed AP lyase, which cut at a specific location in the protospacer (Wang et al. 2020). A:T to G:C transitions are also possible; this adenine base editing (ABE) is enabled by fusing nCas9 with an adenosine deaminase (Gaudelli et al. 2017).
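Base editors act only on bases within a narrow window of the protospacer. A simplified sketch, assuming complete conversion of every eligible base in the window; the default windows (positions 4–8 for a cytosine base editor and 4–7 for an adenine base editor, counted from the PAM-distal end of a 20-nt protospacer) are approximate values commonly reported for first-generation editors and should be treated as assumptions, since real editors give mixtures of outcomes. Function names are illustrative.

```python
def base_edit(protospacer: str, target: str, product: str, window=(4, 8)) -> str:
    """Convert every `target` base to `product` at 1-based protospacer positions inside
    `window` (counted from the PAM-distal end). Assumes every eligible base is edited."""
    lo, hi = window
    return "".join(
        product if base == target and lo <= pos <= hi else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )


def cbe(protospacer: str, window=(4, 8)) -> str:
    return base_edit(protospacer, "C", "T", window)  # C:G -> T:A


def abe(protospacer: str, window=(4, 7)) -> str:
    return base_edit(protospacer, "A", "G", window)  # A:T -> G:C


print(cbe("GACCCTCCAGTACGATCAGT"))  # GACTTTTTAGTACGATCAGT
```

Cytosines outside the window (positions 3, 13, and 17 here) are left unchanged.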

Non-specific changes are another vital tool enabled by Cas9-mediated sequence targeting. One such system is the “saturated targeted endogenous mutagenesis editor” (STEME), which fuses a cytidine deaminase and an adenosine deaminase with nCas9 and a uracil DNA glycosylase inhibitor. Unlike ABEs, STEMEs can generate both C:G to T:A and A:T to G:C transitions, creating diverse changes at a given site (Li et al. 2020a, b). Another system, termed Evolvr, pairs nCas9 with an error-prone DNA polymerase and can create all 12 possible substitutions (Halperin et al. 2018). These methods are exciting because they enable targeted random mutagenesis; STEME was developed in rice, whereas Evolvr has yet to be applied in plants. Notably, however, the editing systems described above do not insert desired sequences, but only change what is already present.
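The difference between dual-deaminase and polymerase-based mutagenesis shows up in the number of sequences each can reach. In this illustrative sketch, which considers only the protospacer strand and assumes each eligible base either converts or not, a STEME-like editor can turn each C into T and each A into G, whereas an Evolvr-like editor could in principle place any of the four bases at each position. Function names are hypothetical.

```python
from itertools import product


def steme_like_outcomes(window_seq: str) -> set:
    """All sequences reachable if each C may stay C or become T, and each A may stay A
    or become G (protospacer strand only; other bases unchanged)."""
    choices = []
    for base in window_seq.upper():
        if base == "C":
            choices.append(("C", "T"))
        elif base == "A":
            choices.append(("A", "G"))
        else:
            choices.append((base,))
    return {"".join(combo) for combo in product(*choices)}


def any_base_outcomes(window_len: int) -> int:
    """Number of sequences if every position can become any of the four bases."""
    return 4 ** window_len


print(len(steme_like_outcomes("ACGTAC")), any_base_outcomes(6))  # 16 4096
```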

Controlled insertion of DNA at specified locations through editing

One of the most exciting applications of editing technologies is the precise insertion or replacement of specified sequences in genomes. This has been enabled by leveraging the HDR pathway, despite its rare occurrence. In this system, the standard Cas9 enzyme and an sgRNA are paired with a homologous repair DNA template. Cas9 cleaves the DNA at a specific site, and the provided repair template is ideally used to make the repair (Huang and Puchta 2019). Because the process is rare, many techniques have been developed to increase the frequency of sequence insertion. Modulating the expression of genes involved in the NHEJ and HDR processes has been fruitful. In rice, knocking out the DNA ligase 4 gene, which functions in NHEJ, enhanced HDR-directed editing (Endo et al. 2016). In Arabidopsis, expression of yeast-derived RAD54 was found to enhance HDR as well as the frequency of successful insertion events (Even-Faitelson et al. 2011). Another successful approach has been to increase the availability of the template sequence by delivering it on geminivirus replicons, which amplify to high copy number in plant cells; this has been deployed in both monocot and dicot species (Baltes et al. 2014; Wang et al. 2017b; Hahn et al. 2018). Given the importance of controlled insertions, new systems are continually being developed.
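An HDR repair template is typically built as the desired change flanked by sequence homologous to each side of the break. A minimal sketch of that layout; the 500 bp default arm length and the function name are illustrative assumptions, not values from the cited studies:

```python
def build_hdr_donor(reference: str, cut_site: int, insert: str, arm_len: int = 500) -> str:
    """Donor = left homology arm + desired insert + right homology arm.

    cut_site is the 0-based index of the first reference base 3' of the break;
    the arms are copied from the reference immediately on either side of it.
    """
    if not (arm_len <= cut_site <= len(reference) - arm_len):
        raise ValueError("reference too short for the requested homology arms")
    left = reference[cut_site - arm_len:cut_site]
    right = reference[cut_site:cut_site + arm_len]
    return left + insert + right


print(build_hdr_donor("A" * 10 + "C" * 10, cut_site=10, insert="GGG", arm_len=5))
# AAAAAGGGCCCCC
```

In practice, donor designs often also alter the PAM or protospacer in the template so that the repaired allele is not cut again; that step is omitted here.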

Potentially more useful than HDR-mediated approaches is the recently developed “prime editing” system, which uses nCas9 fused with a reverse transcriptase together with a prime editing guide RNA (pegRNA). The pegRNA carries, at its 3’ end, a primer binding site and a reverse transcription template encoding the desired edit. The nickase cuts the PAM-containing DNA strand, and the freed 3’ end of that strand anneals to the primer binding site, priming reverse transcription. The reverse transcriptase then synthesizes the DNA sequence specified by the pegRNA template, and this edited sequence is incorporated into the genome through ligation and DNA repair (Anzalone et al. 2019). In the original study, the system enabled any base substitution as well as insertions of up to 44 bp and deletions of up to 80 bp. Although it has been tested in rice, maize, wheat, and tomato, more work is needed to test, optimize, and apply the technology in diverse crops (Jiang et al. 2020; Lin et al. 2020; Lu et al. 2020). As a new technology, prime editing likely holds great unexplored potential.
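The pegRNA 3’ extension can be derived from the target sequence. A sketch under stated assumptions: a 20-nt SpCas9 protospacer whose nick falls between positions 17 and 18 (3 nt upstream of the PAM), a hypothetical 13-nt primer binding site (PBS), and sequences written in DNA letters for readability (the pegRNA itself uses U in place of T). The spacer and scaffold portions of the pegRNA are not included, and the function names are illustrative.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMP)[::-1]


def peg_extension(protospacer: str, edited_after_nick: str, pbs_len: int = 13) -> str:
    """Return the pegRNA 3' extension (5'->3', DNA letters) as RT template + PBS.

    protospacer: 20-nt SpCas9 target, written 5'->3' on the PAM-containing strand.
    edited_after_nick: desired sequence of that strand starting at the nick
        (the edit plus downstream homology).
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt SpCas9 protospacer")
    nick = 17  # nick falls between protospacer positions 17 and 18
    pbs = revcomp(protospacer[nick - pbs_len:nick])  # pairs with the nicked strand's 3' end
    rtt = revcomp(edited_after_nick)                 # template copied by reverse transcriptase
    return rtt + pbs


print(peg_extension("GGCACTGCGGCTGGAGGTGG", "TGGAGCTT"))  # AAGCTCCACCTCCAGCCGCAG
```

Because reverse transcription extends the nicked strand using the template, the newly synthesized DNA is the reverse complement of the RT template, that is, the desired edited sequence.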

Applying Genome Editing to the Creation of Genetic Variation

There are two interconnected approaches to creating genetic variation and diversity with gene editing tools: targeted and broad spectrum. In the first, identified genes or genome regions of interest are targeted for specific changes, for example, the deletion of a gene or sequence of interest, or the recombination of a known sequence into previously unexplored backgrounds. In the second, broad areas of the genome are targeted for nonspecific variation, for example, by altering recombination location or frequency, or by mutagenizing a targeted region. Targeted approaches rely on detailed pre-existing sequence knowledge but generate more predictable outcomes. Broad spectrum approaches can be applied more generally but can likewise be strengthened by specific information. These approaches exist on a continuum, and some fall into both categories.

Worth addressing is the relative value and power of the different forms of genetic variation that editing can create. Most studies thus far have used deletions to disrupt gene function. This has proven very powerful for plant functional genomics research and breeding applications; however, fundamentally subtractive processes will only go so far in advancing breeding. Creating novel additive variation is much more challenging because it either (A) requires foreknowledge of what variation is beneficial, as in knock-in experiments, or (B) requires screening for rare instances of novel variation when using random mutators such as Evolvr.

Targeted insertions and deletions for variation expansion

The ability to identify the genetic mechanisms underlying agriculturally relevant traits has greatly increased with the advent of low-cost genome sequencing technologies. QTL mapping and genome-wide association studies (GWAS) have proven very effective, for example, in finding causative alleles for flowering time and stalk biomass in maize (Buckler et al. 2009; Mazaheri et al. 2019), yield in rice (Li et al. 2018a), salt tolerance in soybean (Zeng et al. 2017), and many other traits worth interrogating. Added to this is a diverse body of mutant screens, expression assays, and other approaches that connect specific traits with causative genes. Editing technologies enable direct targeting of these identified genes, either to replicate known useful alleles or to determine what the optimal allele would be.
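At its core, a single-marker association scan regresses phenotype on allele dosage at each marker. A minimal sketch; real QTL and GWAS analyses add covariates, kinship or population-structure corrections, and significance testing, none of which are included here, and the function name is illustrative:

```python
def marker_effects(genotypes, phenotype):
    """Ordinary least-squares slope of phenotype on allele dosage (0/1/2),
    fitted one marker at a time. `genotypes` holds one list of dosages per marker.
    Monomorphic markers have no estimable effect and are reported as None."""
    n = len(phenotype)
    my = sum(phenotype) / n
    effects = []
    for dosages in genotypes:
        mx = sum(dosages) / n
        sxx = sum((x - mx) ** 2 for x in dosages)
        sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
        effects.append(sxy / sxx if sxx else None)
    return effects


print(marker_effects([[0, 1, 2, 0, 1, 2], [0, 0, 0, 2, 2, 2]], [10, 12, 14, 10, 12, 14]))
# [2.0, 0.0]
```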

The most apparent application of editing in plant breeding is knocking out trait-relevant genes to study gene function and effects on phenotype. This approach has been applied in species ranging from apple to wheat for traits ranging from disease resistance to yield enhancement (Shelake et al. 2019). Advances in multiplexing technology have enabled many genes to be targeted at once, such as multiple grain-related genes in rice (Zhou et al. 2019). This concept has been taken to a larger scale in maize, where 743 genes linked with agronomically relevant traits were targeted via pooled, multiplexed Cas9 constructs, and 118 genes were identified as producing phenotypic outcomes when edited (Liu et al. 2020). This subtractive process can also be used to increase expression.

Upstream open reading frames (uORFs) are short open reading frames located in the 5’ untranslated regions of mRNAs; they are highly abundant in plant genomes, with 30 to 40% of genes containing them (Von Arnim et al. 2014). uORFs generally attenuate translation of the downstream primary ORF because they engage the translational machinery before it reaches the main start codon (Kozak 2001, 2002). Editing these sites has been shown to increase and modulate translation of the primary ORF in Arabidopsis and lettuce (Zhang et al. 2018a). Specifically, disrupting the uORF of LsGGP2 enhanced both ascorbate content and oxidation stress tolerance. This technique provides another form of variation creation, one focused on altering expression levels.
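Candidate uORFs can be located by scanning a 5’ UTR for ATG start codons followed by an in-frame stop codon. A simplified sketch; it considers ATG starts only (non-ATG starts are ignored), reports only uORFs that terminate within the supplied UTR sequence, and the function name is illustrative:

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str):
    """Return (start, end) 0-based, half-open coordinates of ATG-initiated ORFs that
    begin in the 5' UTR and reach an in-frame stop codon within it. ORFs that run
    past the end of the supplied UTR without a stop are not reported."""
    utr5 = utr5.upper()
    uorfs = []
    for i in range(len(utr5) - 2):
        if utr5[i:i + 3] != "ATG":
            continue
        for j in range(i + 3, len(utr5) - 2, 3):
            if utr5[j:j + 3] in STOPS:
                uorfs.append((i, j + 3))  # end includes the stop codon
                break
    return uorfs


print(find_uorfs("CCATGAAATAGCC"))  # [(2, 11)]
```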

Because sequence knock-ins are challenging to generate, they have been used much less frequently than knockouts to introduce genetic variation. One application has been in maize, where the promoter of the maize gene ARGOS8 was replaced with the promoter of GOS2, a different endogenous gene (Shi et al. 2017). ARGOS8 negatively regulates ethylene responses and is generally expressed at a low level. By swapping the promoter to raise expression of the gene, the researchers suppressed the ethylene response, increasing grain yield by 5 bushels per acre under drought stress conditions. Another instance is in tomato, where a single nucleotide change was made to the ALC gene through HDR (Yu et al. 2017); the resulting mutant had excellent storage performance, and the edited sequence was inherited in subsequent generations. Base editing, another tool for additive variation, has also been used in a limited capacity for crop improvement. Most efforts have centered on modifying genes to confer herbicide resistance, where in many cases a single point mutation is sufficient (Shimatani et al. 2017a, b; Kuang et al. 2020).

Although very new, prime editing has already been used in several crop species to introduce specific variation. The first crop applications were in rice and wheat, where insertions of up to 15 bp were demonstrated (Lin et al. 2020). Subsequent studies have achieved larger insertions, with sequences as long as 2,049 bp inserted in rice (Lu et al. 2020). Prime editing has also been demonstrated in maize, where specific point mutations were made in the ALS genes with efficiencies of 53.2% and 6.3% at the two targets (Jiang et al. 2020). Dicots have proven more challenging to edit in this manner, although codon and promoter optimization enabled prime editing in tomato at reported frequencies of 0.025 to 1.66% (Lu et al. 2020). Increasing the efficiency of prime editing is a primary goal; strategies such as deploying paired, optimized pegRNAs have been shown to enhance editing efficiency in rice protoplasts (Lin et al. 2021), and further improvements are likely as the technology is explored. As it matures, prime editing could open the door to precise but wide-scale engineering of plants.

Targeted but non-specific variation creation

The applications above all depend on the quality of available target sequence information and on knowing which allelic state is best. It is common to know only that a gene or region affects a trait, not which configuration would be "optimal." Directing non-specific editing to such sites could generate new, useful alleles. The most straightforward way to do this is to target promoter regions, since changing gene expression patterns is a powerful way to modulate plant traits. A successful example of this idea, achieved with traditional transformation technologies, was the modulation of expression of the plant development gene ZMM28 in maize. The gene is naturally expressed at low levels after the V6 stage, and its expression is undetectable in earlier stages. Moderate constitutive expression of the gene across all developmental stages, driven by the GOS2 promoter, significantly increased yield by enhancing nitrogen utilization, carbon assimilation, and overall plant growth (Wu et al. 2019). Editing tools can similarly introduce novel variation at such loci through targeted, though nonspecific, variation creation.

One way to achieve this is to use Cas9 to target the promoter region of a gene of interest. This method has been applied in tomato to perturb expression of the SlCLV3 gene, which controls fruit locule number, as well as the S and SP genes, which control plant architecture (Rodríguez-Leal et al. 2017). A multiplexed Cas9 construct targeted the 2 kb region upstream of each gene without regard to predicted cis-regulatory elements (CREs). This produced a range of phenotypes for each gene, which could be fixed in the F2 generation. Notably, the researchers could not correlate phenotypic impact with expression level, revealing the complicated, non-linear nature of dose-sensitive genes in plants (Birchler et al. 2016). Furthermore, expression levels could not be explained by the simple action of predicted CREs, highlighting persistent gaps in our understanding of promoter structure, function, and regulation.
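Tiling a promoter with a multiplexed Cas9 construct starts with enumerating candidate target sites. The sketch below lists SpCas9 sites, a 20-nt protospacer immediately 5' of an NGG PAM, on both strands of an input sequence. It assumes SpCas9 (the text says only "Cas9") and omits the off-target, GC-content, and efficiency filters a real design would apply; `find_spcas9_sites` is a hypothetical helper, not a published tool.

```python
from __future__ import annotations

# Complement table for DNA (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(region: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """List (strand, start, protospacer) for every site followed by an NGG PAM.

    `start` is 0-based on the strand being scanned: for "-" sites it indexes
    the reverse complement of `region`, not the forward sequence.
    """
    seq = region.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Last valid start leaves room for the protospacer plus a 3-nt PAM.
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len:i + spacer_len + 3]
            if pam[1:] == "GG":  # first PAM base (N) can be anything
                sites.append((strand, i, s[i:i + spacer_len]))
    return sites
```

A multiplexed construct like the one described would then pick several of these sites spread across the upstream region; how to space or rank them is left open in this sketch.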

As with the other approaches, this process is subtractive in nature. The edits were predominantly deletions of various promoter regions, along with some inversions and small indels. However, gene expression is controlled by both positive and negative regulatory elements, so removing a negative element can raise expression. One example is the lc QTL in a cis-regulatory region of the tomato WUS gene; targeting it for mutation by editing was shown to enhance expression of the gene (Rodríguez-Leal et al. 2017). Through this mechanism, expression can be shifted in both directions. However, the approach still depends on sequences already present in the genome rather than on what could potentially be there. To date, no study has attempted to create additive variation within promoter regions.

Protein-coding sequences themselves are less commonly targeted for additive variation. This is due in part to the difficulty of understanding the structural features that give proteins their function; by comparison, ever-declining sequencing costs have made it much easier to manipulate expression levels by changing regulatory sequences. However, some agronomically important alleles, such as sd1, a gibberellin (GA) biosynthesis gene underlying the semi-dwarf phenotype in rice, have been attributed to changes in protein structure (Hedden 2003). Many of the herbicide-resistant plants that have been developed also fall into this category. Targeting variation to protein sequences is routine in bacterial research, but typically in the context of directed evolution, which is challenging in crops given their long generation times. Considerable progress in this area remains to be made.

Novel domestication to enhance variation

Targeted approaches can also be leveraged to rapidly domesticate new species. Domestication has concentrated agriculture on a relatively small group of plant species that provide agronomically and economically viable food, feed, fiber, and fuel, compared with the vast number of undomesticated species. Many of these undomesticated species carry readily apparent useful traits, such as resistance to disease and harsh conditions (Fernie and Yan 2019). Additionally, there is no reason to assume that the crops domesticated so far represent ideal performance for various traits across environments; many untested plant species may have a higher productivity ceiling. Domestication genes tend to be shared across species, meaning that knowledge gained from domesticated crops can be applied to wild species through editing technologies.

To date, directed domestication efforts have focused largely on species in the Solanaceae. Three recent studies used editing to target wild species related to tomato. Two of the studies focused on Solanum pimpinellifolium, known for its resistance to both disease and environmental stresses. One research group edited a single accession at six target sites (Zsögön et al. 2018), while a second group edited four accessions at only four target sites (Li et al. 2018b). Interestingly, only one targeted gene was shared between the two studies: SELF-PRUNING (SP), which modifies plant architecture. The other targeted genes were known to affect yield, nutritional quality, and flowering time. The most extensive editing-based domestication research to date has been in ground cherry (Physalis pruinosa), which included development of a transformation protocol and whole-genome sequencing, as well as demonstration of editing of domestication genes (Lemmon et al. 2018). Ground cherry suffers from negative traits similar to those of S. pimpinellifolium: bushy growth, undesirable flowering time, and small, few fruits. The genes chosen for editing overlapped with those of the previous studies, targeting orthologs of SP, SP5G, and CLV1. Interestingly, the null alleles had effects that were too strong and did not achieve the stated goals. Taken together, these three studies provide a roadmap for domesticating other plants using editing-based approaches.

Integration of exotic germplasm into existing breeding populations

Related to the issues of rapid domestication are the challenges of integrating exotic germplasm into plant breeding programs. Exotic materials are useful because they can bring in novel genetic variation that enhances valuable traits; however, they also tend to bring in many undesirable alleles through linkage (Wang et al. 2017). Enhancing recombination can help break these linkages, but more direct approaches are also possible. Once the specific causal genetic variant is identified, it can be incorporated immediately into more amenable germplasm or, in some cases, directly into elite lines through editing.

Targeted introduction of known genetic variation helps to address issues of scale in plant breeding. Physical crosses may be reasonable when working with a single pair of lines, but the workload escalates quickly when an allele must be evaluated in many elite genetic backgrounds. Furthermore, introducing alleles from many parents simultaneously inevitably takes several generations and requires evaluating many plants. In these cases, editing can be the more economical choice, even with time-intensive transformation protocols. Speed gains can also come from being able to immediately test and apply information from related species, broadening the base of information available for plant breeding applications.
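To put rough numbers on the scale argument, the sketch below assumes conventional backcross introgression as the comparison (a standard breeding scheme, not one named in the text). Without selection, the expected share of the recurrent (elite) parent's genome after t backcrosses is 1 - (1/2)^(t+1), so each additional elite background needs its own multi-generation crossing program. The 99% recovery target and all function names are illustrative assumptions; marker-assisted selection would shorten the schedule, and linkage drag around the target allele is not modeled.

```python
from __future__ import annotations


def recurrent_parent_fraction(backcrosses: int) -> float:
    """Expected recurrent-parent genome share after `backcrosses` rounds.

    The F1 (0 backcrosses) carries 50%; each backcross halves the donor
    share, giving 1 - (1/2)**(t + 1). Selection and linkage are ignored.
    """
    return 1.0 - 0.5 ** (backcrosses + 1)


def backcrosses_needed(target: float) -> int:
    """Smallest number of backcrosses whose expected share reaches `target`."""
    if not 0.5 <= target < 1.0:
        raise ValueError("target must be in [0.5, 1.0)")
    t = 0
    while recurrent_parent_fraction(t) < target:
        t += 1
    return t


def crossing_generations(n_backgrounds: int, target: float = 0.99) -> tuple[int, int]:
    """(crossing generations per background, total background-generations).

    Hypothetical baseline: one donor x elite cross followed by backcrosses,
    run separately in each elite background.
    """
    generations = 1 + backcrosses_needed(target)
    return generations, n_backgrounds * generations
```

Under these assumptions, reaching at least 99% expected recurrent-parent genome takes 6 backcrosses, so `crossing_generations(20)` returns `(7, 140)`: seven crossing generations per background and 140 background-generations of crossing across 20 elite lines.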

Genome-wide recombination enhancement to increase variation

Among broad-spectrum approaches to increasing genetic variation, enhancing recombination is alluring because it fits naturally into existing breeding programs. Native recombination is unevenly distributed and relatively infrequent, which leaves much of the variation available to breeders underutilized. This limits the ability to recombine material and, in turn, limits integration of exotic germplasm, prebreeding efforts, and arrival at ideal genotypes, each of which would benefit from higher recombination rates (Taagen et al. 2020). Editing tools also make targeted recombination possible; estimated gains from targeted recombination range from 60 to 400% over nontargeted recombination, depending on the trait (Bernardo 2017; Brandariz and Bernardo 2019).

Many factors contribute to where and how frequently recombination occurs, and manipulating them can enhance the overall recombination rate in plants (Choi 2017; Fayos et al. 2019; Taagen et al. 2020). The most straightforward approach is to modify genes involved in the recombination process: eliminating those that suppress COs and enhancing expression of those that stimulate them (Taagen et al. 2020). The first strategy was tested in rice, pea, and tomato by targeting orthologs of the FANCM, RECQ4, and FIGL1 genes, originally identified in Arabidopsis, each of which is involved in the recombination process. Among these targets, RECQ4 proved the most impactful, with editing-mediated deletions of the gene producing a threefold increase in the number of COs (Mieulet et al. 2018). Editing-based alteration of such genes can also have far-reaching side effects, however; for example, FIGL1 knockouts resulted in sterility in tomato and pea. Because COs are an essential element of meiosis, interfering with the CO mechanism can cause sterility. The same was found for another CO antagonist, MEICA1, in rice: knocking out the gene increased CO frequency but also caused high sterility rates (Hu et al. 2017).

Another route to broad recombination enhancement is modification of a genome's epigenetic landscape. In Arabidopsis, plants with mutations in the decreased DNA methylation 1 (ddm1) gene showed enhanced recombination in euchromatic regions (Melamed-Bessudo and Levy 2012; Yelina et al. 2012). Unfortunately, this effect did not extend to heterochromatic pericentric regions, revealing additional recombination control mechanisms at work. Increasing recombination in these pericentric regions of Arabidopsis required simultaneously disrupting H3K9 methyltransferase genes and non-CG DNA methylation genes (Underwood et al. 2017). As with targeting of CO regulators, targeting epigenetic changes can have complicated genome-wide impacts that impede application in crops. In tomato, editing of orthologs of the cytosine methylation regulator ddm1 altered methylation patterns as expected but also impaired growth and development of the resulting plants (Corem et al. 2018). Further research will be needed to make this a viable approach for enhancing recombination in crops.

Pooled guide approaches toward creation of genetic variation

Some approaches to genetic variation creation bridge the gap between targeted and broad approaches by applying specific knowledge at a broad scale. One recent editing-based technique for generating genetic variation uses pools of guide RNAs targeting a large number of genomic sites. Successful guides can be identified by sequencing the resulting mutants, so results can be sorted even though the pool is mixed. In tomato, 72 genes were targeted via editing, yielding 30 recovered mutants with aberrant phenotypes ranging from boron deficiency to modified leaves (Jacobs et al. 2017). The largest-scale experiment to date has been in rice, where all 39,045 rice genes were targeted for mutagenesis, generating 91,004 mutants that are valuable for functional genomics research and breeding (Lu et al. 2017; Meng et al. 2017). This pooled approach has also been deployed successfully in maize and soybean, in each case targeting novel QTLs for discovery (Bai et al. 2020; Liu et al. 2020).
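The deconvolution step can be illustrated with a small sketch: given the spacer sequences in a guide library and sequencing reads of the integrated guide cassette from each mutant plant, it reports which guides each plant carries. The exact-substring matching, the assumption that reads share the spacers' orientation, and the names `assign_guides`, `library`, and `reads_by_plant` are hypothetical simplifications; a real pipeline would tolerate sequencing errors and then confirm the induced mutation at the target site.

```python
from __future__ import annotations


def assign_guides(library: dict[str, str],
                  reads_by_plant: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each plant ID to the set of library guides detected in its reads.

    A guide counts as present when its spacer occurs as an exact substring of
    at least one read. Reads are assumed to be in the same orientation as the
    spacers; no mismatch tolerance is applied.
    """
    calls: dict[str, set[str]] = {}
    for plant, reads in reads_by_plant.items():
        calls[plant] = {
            name
            for name, spacer in library.items()
            if any(spacer.upper() in read.upper() for read in reads)
        }
    return calls
```

Plants whose phenotype is interesting can then be linked to the guide, and therefore the target gene, that they carry, without tracking each construct as a separate experiment.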

The strength of this combined approach lies in targeting many sites at once, eliminating the need to conduct many individual experiments. In a transformation-based editing pipeline, each distinct experiment must be conducted, tracked, and monitored separately. This costs time and resources, both in physical effort and in the cognitive work of managing each experiment. In a pooled approach, these costs can be minimized because only the events that produce a useful phenotype are traced back, via sequencing, to the causal change in the genome sequence. Even the labor of tracing can be eliminated when the only goal is to improve end-product performance; no sequencing is needed if identifying the genetic cause of the phenotypic change is not a goal. This approach will be most powerful when the expected percentage of "positives" is low, since the greatest time savings come from avoiding work on negatives. If only 10% of experiments would yield a phenotypic positive, then about 90% of overhead costs could be avoided by using a pooled approach.
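The overhead argument is simple arithmetic, sketched below under a stated simplification: every individually run experiment carries the same tracking cost, while a pooled design pays that cost only for the positives it traces back. The one-time cost of building and transforming the pool is ignored, and the function names and cost units are hypothetical.

```python
from __future__ import annotations


def overhead_costs(n_targets: int, positive_rate: float,
                   cost_per_tracked: float) -> tuple[float, float]:
    """(individual, pooled) tracking overhead under a flat per-experiment cost.

    Individual design: every target is run and tracked as its own experiment.
    Pooled design: only the expected positives are traced back by sequencing.
    """
    if not 0.0 <= positive_rate <= 1.0:
        raise ValueError("positive_rate must be between 0 and 1")
    individual = n_targets * cost_per_tracked
    pooled = n_targets * positive_rate * cost_per_tracked
    return individual, pooled


def fraction_avoided(n_targets: int, positive_rate: float,
                     cost_per_tracked: float = 1.0) -> float:
    """Share of the individual-design overhead that pooling avoids."""
    individual, pooled = overhead_costs(n_targets, positive_rate, cost_per_tracked)
    return 1.0 - pooled / individual
```

With 100 targets, a 10% positive rate, and 1 cost unit per tracked experiment, `overhead_costs(100, 0.10, 1.0)` gives 100 versus 10 units, so `fraction_avoided(100, 0.10)` is 0.9, matching the 90% figure in the text. Under this model the savings depend only on the positive rate, not on the number of targets.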

Controlled nonspecific recombination enhancement as a means to increase variation

Another strategy that falls between targeted and broad approaches is targeting specific genome regions for recombination enhancement. Routes proposed for this include somatic homologous recombination (HR), demonstrated in plants; Spo11 fusions, demonstrated in yeast; and epigenetic modification (Hayut et al. 2017; Sarno et al. 2017; Taagen et al. 2020). Somatic HR takes advantage of the same DSB repair pathway used for HDR-mediated insertions, except that the break is repaired using the homologous chromosome as the template: Cas9 induces a DSB on one homolog, which is then repaired from the other. This approach has been demonstrated in tomato by targeting the phytoene synthase gene, producing reversion to wild-type red sectors in fruit (Hayut et al. 2017). The approach has not yet been widely explored, and further research is needed to define its value as a technique for enhancing genetic variation.

A potentially exciting approach is to direct the initiators of the CO process to sites of interest. This has been tested in yeast, where Spo11, a conserved protein that generates the DSBs that initiate meiotic recombination, was targeted to different genomic regions by fusing it to dCas9; the fusion enhanced COs in otherwise "cold" regions of the genome (Sarno et al. 2017). This approach has not yet been reported in crops but merits consideration. Finally, epigenomic modification may be a future route toward controlled COs, given that DNA methylation and histone modifications affect where COs occur. This approach has yet to be combined with the epigenetic editing tools that have been developed (Taagen et al. 2020).

Expanding the Crop Genetic Terrain into New Areas

Each of the previous examples of genetic variation creation works within the context of the existing breeding landscape and pangenome. One informative way to think about the pangenome is as a multidimensional space. The axes are different regions of the genome, and moving along them equates to exploring the genetic variation within those regions. Fixing these axes at particular states, as in a given crop variety, produces measurable outputs: yield, pest resistance, drought hardiness, nutritional value, and so on. This metaphor is useful because it reframes breeding as the exploration of a crop's variation-space. Areas of depleted variation become unexplored genetic terrain. Editing technologies will allow both understanding and rapid shifting of this landscape, resulting in greater variation and diversity.

One consequence of this landscape is that allelic effects depend on the genetic background, owing both to epistatic interactions and to the combined action of many small-effect alleles. Different genetic backgrounds occupy different regions of genetic space, and few changes within these spaces will have uniform outcomes. This is especially true of resistance alleles. For example, an analysis of 19 crop QTL studies on resistance alleles found that most displayed epistatic interactions with other alleles, so that QTLs performed differently across backgrounds (Gallois et al. 2018). In maize, the effect of flowering time alleles varies with genetic background; specifically, the MADS69 QTL had a large effect in dent backgrounds but a small effect in flint backgrounds (Rio et al. 2020). Also in maize, the effect of genes involved in the hypersensitive response has been shown to depend on the genetic background of the plant (Chaikam et al. 2011). Editing techniques support understanding and manipulating these background effects by enabling alleles to be tested in more diverse backgrounds than would otherwise be feasible (Gao et al. 2020).
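A toy two-locus model makes the background dependence explicit. In the sketch below, all effect sizes are hypothetical and chosen only for illustration: locus A adds 2 units on its own, but a negative interaction term with locus B means that substituting the same A allele is worth much less in a background that already carries B. This is a generic additive-plus-epistasis model, not a fit to any of the cited studies.

```python
def phenotype(a: int, b: int, base: float = 10.0, effect_a: float = 2.0,
              effect_b: float = 1.0, epistasis: float = -1.5) -> float:
    """Toy phenotype for allele indicators a, b in {0, 1} at two loci.

    Additive effects plus an A x B interaction term; all defaults are
    hypothetical numbers chosen only to illustrate background dependence.
    """
    return base + effect_a * a + effect_b * b + epistasis * a * b


def substitution_effect(background_b: int) -> float:
    """Phenotypic change from swapping locus A from 0 to 1 in a given B background."""
    return phenotype(1, background_b) - phenotype(0, background_b)
```

Here `substitution_effect(0)` is 2.0 while `substitution_effect(1)` is 0.5: the same allele looks large-effect in one background and small-effect in the other, qualitatively like the dent versus flint contrast described for MADS69.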

Traditional breeding approaches to understanding and leveraging QTLs are further hampered by linkage drag and the unfavorable statistics of multiple segregating alleles. Linkage drag causes unwanted alleles to accompany the QTL of interest, obscuring its actual effect (Zhou et al. 2019). Combining multiple favorable QTLs into a single line is labor-intensive: combining favorable alleles from 6 parents would take at least 3 generations of crossing in a diploid crop, not counting the time needed to segregate away unfavorable alleles. Editing bypasses each of these problems because it is precise, can be completed in fewer generations, and can be deployed in many backgrounds at once. This has been illustrated in rice, where three yield-related QTLs were targeted in elite varieties (Zhou et al. 2019). By the T2 generation, a wide variety of single, double, and triple mutants had been generated and confirmed to be transgene-free. Yield per panicle increased by 68% and 30% in the two most thoroughly studied lines, J809 and L237, respectively. Background-dependent differences also appeared in the single mutants, with GS3 mutants having a greater effect on grain length in J809 than in L237. An equivalent study using non-editing approaches would have required far more resources and years to conduct.
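The "at least 3 generations for 6 parents" figure follows from pairwise crossing: each generation of crosses can at best halve the number of separate lines carrying favorable alleles, so the minimum is ceil(log2 n) generations. The sketch below counts those halvings; it assumes a simple pyramiding scheme and, like the text, ignores the additional generations needed to recover homozygotes or purge unfavorable linked alleles. The function name is hypothetical.

```python
import math


def pyramiding_generations(n_parents: int) -> int:
    """Minimum generations of pairwise crosses to combine alleles from n parents.

    Each generation pairs up the current lines (an odd line is carried over),
    so the line count goes n -> ceil(n / 2) until one line carries everything.
    """
    if n_parents < 1:
        raise ValueError("need at least one parent")
    generations = 0
    lines = n_parents
    while lines > 1:
        lines = math.ceil(lines / 2)
        generations += 1
    return generations
```

For six parents the line count goes 6, 3, 2, 1, so `pyramiding_generations(6)` returns 3; nine parents would already need 4 generations.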

Establishment and manipulation of large effect genes

Within variation-space, axes are not uniformly impactful. Genes with a disproportionate effect on trait performance, termed large-effect genes (LEGs), can strongly shape the traits and performance of a crop. Domestication genes are generally of this type, as has been found in many crops (Meyer and Purugganan 2013). These genes radically change a species to make it amenable to human cultivation and are commonly shared across crops; for example, Shattering1 (Sh1) was a domestication target in rice, maize, and sorghum (Lin et al. 2012). Modern breeding efforts have also generated LEGs. For example, Rht in wheat and sd1 in rice cause semi-dwarf phenotypes and were among the key innovations of the Green Revolution (Hedden 2003). Another example is the brown midrib genes of grasses, which alter lignin content and make the plants much more digestible for ruminants (Barrière and Argillier 1993).

Editing can be used both to modify existing LEGs and to create new ones. Many domestication genes and LEGs have been fixed by selection, depleting their variation (Meyer and Purugganan 2013). It would be naive to assume that most of these genes exist in their ideal configuration, given that they were selected during a period of agriculture that looked vastly different from today's. Even for non-fixed genes, it has been demonstrated that alleles superior to naturally occurring ones can be engineered. ZMM28 is an AP1-FUL MADS-box gene in maize that, when expressed under the moderately constitutive GOS2 promoter, increased grain yield relative to wild type (Wu et al. 2019). These selected regions could be modified by first identifying existing variation in wild relatives that have not undergone selection and then reintroducing that variation through targeted editing or recombination-enhanced breeding. Alternatively, novel variation could be generated using targeted mutagenesis tools.

Transfer of genetic enhancement across species borders

In addition to LEGs, evolution has produced many potentially valuable processes that are confined to particular species. Nitrogen fixation and photosynthesis optimization stand out as targets for large-scale gains in agriculture. In theory, these traits could be selected for using traditional breeding, but significant challenges mean that achieving the desired outcome would be an extremely lengthy process. Specifically, the number of genetic changes required, combined with the lack of the necessary sequence variation, makes these traits very difficult to address through traditional breeding. For instance, photosynthetic conversion efficiency in soybean has improved over the last 80 years, but only under well-irrigated conditions, and without improvement in other aspects of photosynthetic efficiency (Koester et al. 2016). Editing would enable direct changes to the genetic factors underlying these processes, faster and more broadly, since findings in one species could be applied to another.

Introducing nitrogen fixation into non-nitrogen-fixing crops would be a significant achievement for modern agriculture, which currently depends on added nitrogen derived largely from the energy-intensive Haber-Bosch process. In theory, this could be achieved either by engineering the legume symbiosis pathway into plants or by introducing nitrogenase-related genes directly (Oldroyd and Dixon 2014). Improvements in the photosynthetic machinery would be equally significant, as they would make more energy available for yield. The available routes are very diverse, but one of the main approaches investigated is the introduction of C4 photosynthesis into C3 crops such as wheat (Kubis and Bar-Even 2019). Both nitrogen fixation and photosynthetic enhancement are active areas of research, though not yet at the implementation phase. Either development will only be possible through modern editing technologies, because existing pathways will need to be eliminated or modified to accommodate the new ones. Altering metabolite flux will require fine-tuning through novel variation creation in contributing genes and their promoters. The process will not be as simple as transforming in new genes, because these processes interact with existing regulatory pathways and product stoichiometry can be critical.

Evolutionary history can offer further insight into worthwhile breeding goals for editing-based technologies. Two important forces in plant evolution have been whole genome duplication (WGD) and tandem duplication (TD). Both introduce new gene copies into the genome, which are subsequently subject to deletion, subfunctionalization, or, most relevant here, neofunctionalization, the process by which genes take on new functions. Neofunctionalization has been linked to plant adaptation to new environments as well as to many agronomic traits (Lyu et al. 2020). Replicating these processes could be an avenue for future crop advancement, either by creating TDs through insertion or through synthetic chromosome engineering. Continued advances in editing technologies will be needed to make this a reality, however.

Variation creation via exploration of insignificant data

A currently underexplored application of editing for variation creation is leveraging "insignificant" data. QTL analysis, GWAS, gene regulatory network (GRN) inference, and other bioinformatic techniques all rely on statistical "significance" cutoffs defined by acceptable error rates. QTL and GWAS studies generally begin with a standard significance threshold of p = 0.05, which is then adjusted to account for multiple comparisons, population structure, and recombination rates (Liu and Yan 2019). Given the large effort involved in proving that a single QTL causes variation in a trait, it makes sense to set high standards for what is accepted as significant. However, this approach removes from consideration many potentially important QTLs that did not reach the required level of significance. GRN inference suffers from similar issues in balancing precision and recall when reporting interactions. In maize, the GENIE3 (GEne Network INference with Ensemble of trees) network mapping tool was used to build a gene network from protein and mRNA expression data, achieving an area under the receiver operating characteristic (ROC) curve of 0.717 (Walley et al. 2016). Put another way, at one operating point on this curve the network would identify 50% of true positives at a false-positive rate of 20%. Even single-cell expression datasets pose challenges for the inference task despite the high density of their data (Chen and Mar 2018). To date, GRNs have not been widely used for crop improvement because of these challenges.
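The relationship between an AUC and a single true/false-positive pair can be sketched directly. The AUC equals the probability that a randomly chosen true interaction is scored above a randomly chosen non-interaction (ties counted as half), so an AUC of 0.717 means roughly a 72% chance of correct ranking; a pair such as 50% true positives at 20% false positives is one threshold on the same curve. The functions below are a minimal, quadratic-time illustration with hypothetical names and toy data, not the evaluation code used by Walley et al. (2016).

```python
from __future__ import annotations


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as P(random positive scores above random negative); ties count 0.5.

    Quadratic pairwise version, kept simple for clarity.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rates_at_threshold(scores: list[float], labels: list[int],
                       threshold: float) -> tuple[float, float]:
    """(true-positive rate, false-positive rate) when scores >= threshold are called."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / n_pos, fp / n_neg
```

Sweeping `threshold` over all observed scores traces out the ROC curve; the AUC summarizes the whole curve, whereas each threshold yields just one (TPR, FPR) point.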

As a result of these cutoffs, investigation is concentrated on the most impactful genes with sufficient variation. Yet these data still hold useful information of "insignificant" quality that could be exploited by approaches similar to that of Liu et al. (2020), who used pooled guides to target vast numbers of genomic sites for editing. For QTL and GWAS data, this would mean directly generating variation at low-significance targets; for GRNs, it would mean targeting the "spokes" of known important hub genes. This has not yet been reported in crops, but as the technological capacity for large-scale editing expands, so too will the acceptable range of error.
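Selecting "insignificant" targets can be sketched as a two-tier split of association p-values. The example assumes a Bonferroni correction (one common multiple-comparison adjustment; the text does not specify which is used) and a hypothetical, user-chosen "suggestive" cutoff: markers that fail the corrected threshold but pass the suggestive one become candidates for a pooled-guide screen. Marker names and the 1e-3 default are illustrative only.

```python
from __future__ import annotations


def partition_by_significance(pvalues: dict[str, float], alpha: float = 0.05,
                              suggestive: float = 1e-3) -> tuple[list[str], list[str]]:
    """Split markers into Bonferroni-significant hits and 'suggestive' candidates.

    significant: p <= alpha / m, where m is the number of tests.
    candidates:  fails that threshold but p <= `suggestive` (a hypothetical,
                 user-chosen cutoff), i.e. the low-significance markers a
                 pooled-guide screen might target.
    """
    m = len(pvalues)
    if m == 0:
        return [], []
    threshold = alpha / m
    significant = sorted(k for k, p in pvalues.items() if p <= threshold)
    candidates = sorted(k for k, p in pvalues.items() if threshold < p <= suggestive)
    return significant, candidates
```

With 1,000 markers the Bonferroni threshold is 0.05/1000 = 5e-5, so a marker at p = 5e-4 would be discarded by a conventional analysis but kept here as a screening candidate.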

The Future of Editing Technologies

While gene editing technologies have advanced rapidly, there is still room for improvement. Chiefly, although there are many tools for making targeted genetic modifications, it remains difficult to make modifications at a large scale and to target many sequences at once. One component of this bottleneck is that genome editing in crops currently relies on crop transformation systems. Many crops are still challenging to transform, and current transformation protocols tend to be genotype-dependent, inefficient, expensive, time-consuming, and/or subject to regulatory or intellectual property issues (Altpeter et al. 2016). Improved transformation systems that overcome these limitations, such as morphogenic gene-based systems, meristem-based systems, and nanoparticle-mediated protocols, will help expand the use, and reduce the time and cost, of editing-based genetic study and improvement of crops (Altpeter et al. 2016; Kausch et al. 2019; Nadakuduti and Enciso-Rodríguez 2021). This should enable application of gene editing technologies at a much larger scale, across more species and genotypes.

The other side of this bottleneck is that solutions for deploying large numbers of guides remain under-explored in plants. Pooled-guide approaches have been successful but have used few guides per construct. This suits the studies' objective of identifying causative QTLs. However, where phenotypic gains matter more than basic genetic research, it would be advantageous to deploy multiplexed guide cassettes that multiply the number of variation sites. A promising lead in this area is work pairing Cas12 with a Pol II promoter to express a large number of guides, though the approach has not yet been used specifically to generate novel variation (Campa et al. 2019). Combined with different molecular machinery that can create diverse forms of variation, this could become a very powerful approach.
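Cas12a processes its own pre-crRNA at the direct repeats, which is what allows a single transcript to yield many guides. The sketch below only assembles the repeat-spacer layout of such an array. It assumes the Cas12a family (the text says "Cas12") and a DR-spacer-...-DR arrangement with an optional terminal repeat, and it leaves the direct-repeat sequence as a required argument because the correct repeat depends on the ortholog used. The function name and the placeholder sequences in the example are hypothetical.

```python
from __future__ import annotations


def build_crrna_array(spacers: list[str], direct_repeat: str,
                      terminal_repeat: bool = True) -> str:
    """Assemble a Cas12a-style array, 5'->3': DR-spacer-DR-spacer-...[-DR].

    `direct_repeat` must be the repeat recognized by the Cas12a ortholog in
    use; it is deliberately not hard-coded here. The trailing repeat is an
    optional design choice.
    """
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts: list[str] = []
    for spacer in spacers:
        parts.append(direct_repeat)
        parts.append(spacer)
    if terminal_repeat:
        parts.append(direct_repeat)
    return "".join(parts)
```

With placeholder strings, `build_crrna_array(["AAAA", "CCCC"], "DR")` returns `"DRAAAADRCCCCDR"`; in practice each spacer would be a full-length protospacer chosen for a target site, and the array length grows linearly with the number of guides.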

Even in the era of big data, information gaps abound

Advances in molecular tools must be paired with better information to be useful. Although this is considered the era of "big data," identifying actionable biological information in that data, beyond simply the largest effects, remains challenging. This is partly due to the magnitude of the questions asked of the data. Each plant genome contains tens of thousands of genes, each of which can have hundreds of variants across the pangenome, and each variant may modify developmental pathways by incremental amounts. Scientists are trying to describe a nearly infinite developmental and genetic space with big data approaches, so only the most impactful signals rise to the top. Another complicating issue is the current difficulty of integrating diverse datasets. For instance, there are many ways to attempt to identify the location and size of a gene's promoter. This information is needed to deploy editing components to the correct location, but it is not presently clear whether to use ATAC-seq or DNase-seq, or how to integrate both. Supporting these objectives will require continued advances in sequencing methods to further reduce costs, as well as the development and maintenance of standard datasets against which aggregate models can be evaluated.
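As one naive illustration of dataset integration, the sketch below keeps only regions called as accessible by both an ATAC-seq and a DNase-seq peak set on the same chromosome: a conservative intersection, not a recommended pipeline. It assumes half-open coordinates and that peaks within each set are already merged so they do not overlap one another; the function name and example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def intersect_peaks(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by peaks in both sets, via a two-pointer sweep.

    Assumes peaks within each set are already merged (non-overlapping).
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever peak ends first; it cannot overlap any later peak.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out
```

For ATAC-seq peaks `[(0, 100), (200, 300)]` and a DNase-seq peak `[(50, 250)]`, `intersect_peaks` returns `[(50, 100), (200, 250)]`. Taking the union instead would favor sensitivity over specificity, which is the kind of unsettled choice the text describes.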

Beyond the considerations raised above, trust must be developed and sustained between the creators of molecular technologies and the people affected by them. While some editing technologies have been deemed exempt from regulation in the USA and many South American countries, they have been largely restricted in the European Union and elsewhere. If these technologies are to be used to address current and growing global agronomic needs, there remains a need both to develop safe technology and to communicate its benefits. This communication cannot rely solely on discussion of facts, as is the tendency, because that approach has been shown to be ineffective (Kahan et al. 2012). Instead, it needs to come from a place of shared values and goals, and from knowledge of how editing and other molecular techniques can meet crop improvement needs and benefit society.

Genome editing is a very powerful technique that can bring much-needed genetic variation into plant breeding programs and has already delivered many gains. Sustained development of editing technologies should enable advances beyond mainly subtractive processes into de novo creation of diversity that can be leveraged to improve crop yield, quality, and biotic and abiotic stress tolerance, and to reduce the environmental footprint of crop production.