WheatCRISPR: a web-based guide RNA design tool for CRISPR/Cas9-mediated genome editing in wheat
CRISPR/Cas9 gene editing has become a revolutionary technique for crop improvement as it can facilitate fast and efficient genetic changes without the retention of transgene components in the final plant line. Lack of robust bioinformatics tools to facilitate the design of highly specific functional guide RNAs (gRNAs) and prediction of off-target sites in wheat is currently an obstacle to effective application of CRISPR technology to wheat improvement.
We have developed a web-based bioinformatics tool to design specific gRNAs for genome editing and transcriptional regulation of gene expression in wheat. A collaborative study between the Broad Institute and Microsoft Research used large-scale empirical evidence to devise algorithms (Doench et al., 2016, Nature Biotechnology 34, 184–191) for predicting the on-target activity and off-target potential of CRISPR/SpCas9 (Streptococcus pyogenes Cas9). We applied these prediction models to determine the on-target activity and potential off-target activity of individual gRNAs targeting specific loci in the wheat genome. The genome-wide gRNA mappings and the corresponding Doench scores for on-target and off-target activity were used to build a gRNA database, which serves as the data source for the web application, termed WheatCRISPR.
The WheatCRISPR tool allows researchers to browse all possible gRNAs targeting a gene or sequence of interest and select effective gRNAs based on their predicted high on-target and low off-target activity scores, as well as other characteristics such as position within the targeted gene. It is publicly available at https://crispr.bioinfo.nrc.ca/WheatCrispr/.
Keywords: CRISPR, Cas9, gRNA design tool, Genome editing, Transcriptional regulation, Wheat
Abbreviations
CFD: Cutting frequency determination
CRISPR: Clustered regularly interspaced short palindromic repeats
rs2: Rule set 2
Genome editing technology based on a bacterial adaptive immune system, termed CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas9 (CRISPR-associated endonuclease 9) [1, 2, 3, 4], has sparked a new revolution in biological and agricultural research [5, 6]. CRISPR/Cas9 technology originating from Streptococcus pyogenes relies on two components: the Cas9 endonuclease and a single guide RNA (sgRNA) formed by fusing two small RNA molecules, the CRISPR RNA (crRNA) and an auxiliary trans-activating crRNA (tracrRNA), which together guide the Cas9 nuclease to a specific DNA site [7, 8]. Each crRNA contains a 20-nt guide sequence complementary to the target site, designated the guide RNA (gRNA). Another critical feature of the Cas9 system is the protospacer-adjacent motif (PAM) flanking the 3′ end of the DNA target site, which dictates the target search mechanism of Cas9. The PAM is a base-pair triplet with the canonical sequence 5′-NGG-3′, where “N” is any nucleotide. Other non-canonical PAM triplets, including NAG, NCG and NGA, have also been described; they support less efficient Cas9 activity [10, 11] and may therefore contribute to off-target activity.
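The PAM rule described above can be made concrete with a short scan that lists every 20-nt protospacer immediately 5′ of a PAM on either strand. This is a minimal sketch, not WheatCRISPR's code: the function names `reverse_complement` and `find_protospacers` are hypothetical, PAMs are assumed to be written as IUPAC-style strings in which only `N` is degenerate, and sequences are plain Python strings.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(seq: str, pams=("NGG",), guide_len: int = 20):
    """List candidate gRNA sites as (strand, start, guide, pam) tuples.

    A site is a guide_len-nt protospacer immediately 5' of a PAM on either
    strand; 'N' in a PAM pattern matches any base. start is the 0-based
    forward-strand coordinate of the leftmost base of the protospacer+PAM.
    Guides on the '-' strand are reported in their own 5'->3' orientation.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for pam in pams:
            pam_re = pam.replace("N", "[ACGT]")
            site_len = guide_len + len(pam)
            # A zero-width lookahead lets overlapping sites all be reported.
            pattern = re.compile(f"(?=([ACGT]{{{guide_len}}})({pam_re}))")
            for m in pattern.finditer(s):
                i = m.start()
                # Convert reverse-strand offsets back to forward coordinates.
                start = i if strand == "+" else len(s) - i - site_len
                hits.append((strand, start, m.group(1), m.group(2)))
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

With the default `pams=("NGG",)` only canonical sites are listed; passing `pams=("NGG", "NAG", "NCG", "NGA")` also reports sites with the weaker non-canonical PAMs, which matter mainly when enumerating potential off-targets.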
Although CRISPR/Cas9 applications promise to accelerate the pace and course of crop improvement [5, 6], several hurdles limit full exploitation of this technology, especially in crops with large polyploid genomes. Wheat is an economically important cereal crop that provides about 20% of the calorie and protein intake of the global population. It has a complex allohexaploid genome of 16 Gb, approximately 85% of which consists of repetitive elements, with an estimated 107,921 high-confidence and a further 161,537 low-confidence annotated genes [12]. Because of the presence of up to six homoeoalleles per gene and large gene families, off-target gRNA binding and cleavage is one of the most critical issues affecting implementation of CRISPR/Cas9 technology in wheat. The gRNA is a key component of the CRISPR/Cas9 system because it determines the efficacy and specificity of the Cas9 nuclease; an effective gRNA should have high on-target activity and low off-target potential. Rational design and optimization of gRNA sequences is therefore essential to maximize editing efficiency and targeting specificity at the intended genomic location(s).
Multiple bioinformatics tools have been developed to facilitate the design of gRNAs and prediction of off-target sites [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]; however, only two of these programs, E-CRISP and CRISPRdirect, support the design of gRNAs for wheat. CRISPRdirect predicts specific gRNAs based on in silico prediction of specificity, but it does not implement evidence-based metrics to predict off-target sites, which is a notable caveat. E-CRISP identifies off-target sites by aligning gRNAs to the genome with Bowtie2. However, Bowtie2 is a heuristic aligner that does not guarantee that all possible hits will be found, especially when the number of mismatches is high, which can lead to an underestimation of potential off-target sites. A collaborative effort between scientists at the Broad Institute and machine learning experts at Microsoft Research used large-scale empirical data on the cleavage activity of thousands of gRNAs targeting a panel of 15 genes to uncover position-specific sequence features that are predictive of gRNA efficacy and specificity. These features include the position and frequency of single nucleotides and dinucleotides, the GC content of the gRNA, the location of the gRNA within the protein coding region, and the melting temperatures of the first 5, middle 8 and last 5 base pairs of the gRNA [11, 25]. These findings were used to devise new rules for predicting gRNA on-target activity [rule set 2 (rs2)] and cutting frequency determination (CFD) scores for predicting gRNA off-target effects, both of which can be broadly applied. In this study, we applied these prediction models to determine the on-target activity and potential off-target activity of individual gRNAs targeting any locus in the wheat genome, and designed a web-based bioinformatics portal (WheatCRISPR) for the design of highly specific gRNAs for CRISPR/Cas9-mediated genome editing and CRISPR-based transcriptional regulation of gene expression in Chinese Spring wheat.
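The completeness problem with heuristic aligners can be illustrated with a brute-force alternative that checks every window of the genome. The sketch below is hypothetical and is neither the E-CRISP nor the WheatCRISPR implementation: `exhaustive_offtargets` counts substitutions only over the 20-nt protospacer (no DNA or RNA bulges), and accepts a site if its 3-nt PAM matches any of the canonical or non-canonical PAMs named in the text. By construction it finds every site within `max_mismatches`, but its cost grows linearly with genome length for each guide, so a 16 Gb genome would in practice need an index-based search.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (other characters kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def pam_matches(site_pam: str, pam: str) -> bool:
    """True if site_pam fits the PAM pattern, where 'N' matches any base."""
    return len(site_pam) == len(pam) and all(
        p == "N" or p == b for p, b in zip(pam, site_pam)
    )


def exhaustive_offtargets(guide, genome, max_mismatches,
                          pams=("NGG", "NAG", "NCG", "NGA")):
    """Return every (strand, start, site, mismatches) within max_mismatches.

    Mismatches are counted only over the guide-length protospacer; the 3-nt
    PAM must match one of `pams`. start is the 0-based forward-strand
    coordinate of the leftmost base of the protospacer+PAM site, and `site`
    is written 5'->3' on the strand where it was found.
    """
    guide, genome = guide.upper(), genome.upper()
    g = len(guide)
    site_len = g + 3
    hits = []
    for strand, s in (("+", genome), ("-", reverse_complement(genome))):
        # Slide over every possible window: nothing is skipped heuristically.
        for i in range(len(s) - site_len + 1):
            pam_seq = s[i + g:i + site_len]
            if not any(pam_matches(pam_seq, p) for p in pams):
                continue
            protospacer = s[i:i + g]
            mm = sum(a != b for a, b in zip(guide, protospacer))
            if mm <= max_mismatches:
                start = i if strand == "+" else len(s) - i - site_len
                hits.append((strand, start, protospacer + pam_seq, mm))
    return hits
```

Raising `max_mismatches` can only add hits, never lose them, which is the guarantee a seed-and-extend heuristic does not provide.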
gRNA database construction and content
[Table 1: Survey of PAM sites in the IWGSC v1.0 Chinese Spring wheat reference genome sequence; rows include non-canonical PAMs (NAG, NCG and NGA)]
[Table 2: Tiered k (mismatch) levels applied for the search of off-targets in the wheat genome; for each PAM class, including non-canonical (NAG, NCG, NGA), rows distinguish exons and promoters from introns and UTRs]
Utility and discussion
A key summary statistic for evaluating the off-target activity of a gRNA is its maximum CFD score, i.e. the score of its single worst off-target hit. The gRNA plot (exemplified in Fig. 3a) and table (Additional file 1: Table S1) for a given gene present the rs2 score and the maximum CFD score for each of four genomic regions: coding, promoter, other genic and intergenic. Reporting the maximum CFD separately for each region helps users select specific gRNAs by indicating how severe each potential off-target effect could be, since unintended cleavage in a coding region is more likely to cause a functional change than cleavage elsewhere.
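The per-region maximum can be sketched in two steps: score each off-target site, then keep the worst score per region. Following Doench et al. (2016), a CFD score is a product of activity values, one for each mismatch (by position and base pair) and one for the PAM; the published lookup values are not reproduced here. The names `cfd_score` and `max_cfd_by_region`, the simplified `(position, guide_base, site_base)` keying, and reporting 0.0 for a region with no hits are all assumptions of this sketch.

```python
REGIONS = ("coding", "promoter", "other_genic", "intergenic")


def cfd_score(guide, site, pam, mismatch_activity, pam_activity):
    """CFD-style score for one off-target site.

    Simplified keying (an assumption of this sketch): mismatch_activity maps
    (position, guide_base, site_base) -> activity in [0, 1], with position
    1..20 counted from the 5' end of the protospacer; pam_activity maps the
    last two PAM bases (e.g. "GG") -> activity in [0, 1]. A missing key
    raises KeyError rather than silently guessing a value.
    """
    score = 1.0
    for pos, (g, t) in enumerate(zip(guide, site), start=1):
        if g != t:
            score *= mismatch_activity[(pos, g, t)]
    return score * pam_activity[pam[-2:]]


def max_cfd_by_region(hits):
    """Maximum CFD per genomic region; hits is an iterable of (region, cfd).

    Regions with no off-target hit are reported as 0.0 (sketch assumption).
    """
    best = {region: 0.0 for region in REGIONS}
    for region, cfd in hits:
        best[region] = max(best[region], cfd)
    return best
```

For example, with an invented table `{(20, "A", "C"): 0.5}` and PAM activities `{"GG": 1.0, "AG": 0.25}`, a single 3′-terminal A/C mismatch next to a TAG PAM scores 0.5 × 0.25 = 0.125. These numbers are for illustration only and are not Doench values.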
WheatCRISPR helps the user find a trade-off between high on-target activity and low off-target activity by calculating an overall score for each gRNA that rewards high rs2 scores and penalizes high CFD scores (Additional file 1: Table S1). The overall score is a weighted combination of the rs2 score and the maximum CFD scores, with off-target hits in coding and promoter regions weighted most heavily. An optional variant of this score can be toggled on if the user wishes to target all homoeologous copies of the gene. In this variant, high CFD scores in homoeologs are rewarded while the maximum CFD in non-homoeologs remains penalized (Fig. 3b). Homoeologs were identified from the annotation available at ensemblgenomes.org. The overall score is used to rank all gRNAs for a gene so that the user can quickly identify the most promising candidates. The overall scoring function has not been empirically validated; it is a heuristic intended to speed up the search for effective gRNAs. Users are strongly encouraged to consider the individual rs2 and CFD scores, and other factors such as the location of the gRNA within the protein coding region of the gene, before selecting a gRNA. The exact function used when targeting a specific gene (the default mode) is:
$$0.5\,\mathrm{rs2} + 1 - 0.5\left(0.7\max(\mathrm{cfd}_{\mathrm{coding}}, \mathrm{cfd}_{\mathrm{promoter}}) + 0.2\max(\mathrm{cfd}_{\mathrm{other\_genic}}) + 0.1\max(\mathrm{cfd}_{\mathrm{intergenic}})\right)$$
and when targeting homoeologs is enabled:
$$0.33\,\mathrm{rs2} + \left(1 - 0.33\left(0.7\max(\mathrm{cfd}_{\mathrm{coding}}, \mathrm{cfd}_{\mathrm{promoter}}) + 0.2\max(\mathrm{cfd}_{\mathrm{other\_genic}}) + 0.1\max(\mathrm{cfd}_{\mathrm{intergenic}})\right)\right) + 0.34\,\mathrm{mean}(\mathrm{cfd}_{\mathrm{hmlgs}})$$
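The two scoring functions translate directly into code. In this sketch the function names are hypothetical and this is not WheatCRISPR source: `max_cfd` is a dictionary of per-region maximum CFD scores keyed `coding`, `promoter`, `other_genic` and `intergenic`; in homoeolog mode it is assumed to hold maxima over non-homoeologous hits only, and `cfd_homoeologs` lists the CFD scores of the homoeologous hits. Treating an empty homoeolog list as a mean of 0.0 is an assumption of this sketch.

```python
def weighted_offtarget(max_cfd):
    """Region-weighted off-target term shared by both scoring modes."""
    return (0.7 * max(max_cfd["coding"], max_cfd["promoter"])
            + 0.2 * max_cfd["other_genic"]
            + 0.1 * max_cfd["intergenic"])


def overall_score(rs2, max_cfd):
    """Default mode: 0.5*rs2 + 1 - 0.5*(weighted off-target term)."""
    return 0.5 * rs2 + 1 - 0.5 * weighted_offtarget(max_cfd)


def overall_score_homoeologs(rs2, max_cfd, cfd_homoeologs):
    """Homoeolog mode: hits in homoeologs are rewarded via their mean CFD.

    max_cfd should be computed over non-homoeologous hits only, so that
    those hits remain penalized.
    """
    mean_h = (sum(cfd_homoeologs) / len(cfd_homoeologs)
              if cfd_homoeologs else 0.0)  # empty list -> 0.0 (assumption)
    return (0.33 * rs2
            + (1 - 0.33 * weighted_offtarget(max_cfd))
            + 0.34 * mean_h)
```

As written, the default score equals 0.5·rs2 + 0.5·(1 − w) + 0.5, where w is the region-weighted CFD term, so it is a weighted average shifted by a constant; the shift does not change how gRNAs are ranked. Likewise, the homoeolog variant differs from 0.33·rs2 + 0.33·(1 − w) + 0.34·mean(cfd_hmlgs) only by the constant 0.67.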
Besides the predicted on-target and off-target activity metrics, the location of the gRNAs within a gene can also be important. It is often desirable to select gRNAs from exons that occur in all splice isoforms of a gene to ensure that all alternative transcripts are targeted. To identify the location of gRNAs within a gene, WheatCRISPR presents a genome browser-style Gene Plot with tracks for the gene models and the selected gRNA (Fig. 3).
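WheatCRISPR presents isoform coverage visually, but the same check can be expressed programmatically: intersect the exon intervals of all transcripts and test whether a gRNA site falls inside the shared region. This is a minimal sketch under stated assumptions: the function names are hypothetical, each transcript is a list of non-overlapping half-open `(start, end)` exon intervals on one coordinate system (for example, parsed from a GFF3 annotation), and at least one transcript is supplied.

```python
def intersect_intervals(a, b):
    """Intersection of two sorted lists of half-open (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def constitutive_exonic(transcripts):
    """Regions that are exonic in every transcript (list of exon lists)."""
    regions = sorted(transcripts[0])
    for exons in transcripts[1:]:
        regions = intersect_intervals(regions, sorted(exons))
    return regions


def in_all_isoforms(site_start, site_end, transcripts):
    """True if [site_start, site_end) lies entirely in one shared exonic region."""
    return any(s <= site_start and site_end <= e
               for s, e in constitutive_exonic(transcripts))
```

For two isoforms with exons `[(100, 200), (300, 400)]` and `[(150, 250), (350, 450)]`, the shared exonic regions are `(150, 200)` and `(350, 400)`, so a 23-nt site at 160 to 183 is accepted while one at 100 to 123 is not.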
The precomputed on-target to off-target mappings improve performance but limit the target sites to annotated genes. To search for targets outside annotated genes, WheatCRISPR also allows the user to paste in an arbitrary sequence of interest. In this case, gRNAs are extracted and off-target sequences and scores are computed on the fly. For performance reasons, this mode has limited functionality: the maximum number of mismatches is restricted as described in Table 2, and a set of homoeologs cannot be targeted.
As an elegant alternative to reliance on natural or induced mutagenesis, CRISPR/Cas9-based gene editing has the potential to change the pace and course of crop breeding. To facilitate the application of this technology in wheat, we have developed a robust web-based bioinformatics tool (WheatCRISPR) that enables selection of specific gRNAs for a user-specified target gene or sequence and prediction of potential off-target sites. The current implementation of WheatCRISPR supports the selection of gRNAs that guide S. pyogenes Cas9 to genomic locations in the wheat genome. Identification of guide sequences for the different PAMs recognized by other Cas nucleases, such as StCas9 (Streptococcus thermophilus Cas9), NmCas9 (Neisseria meningitidis Cas9), SaCas9 (Staphylococcus aureus Cas9) and FnCpf1 (Francisella novicida Cpf1), would be highly desirable. However, because the Doench algorithms rely on empirical data (gRNA efficacy and specificity) specific to the PAM sites of SpCas9, WheatCRISPR cannot readily be extended to the PAM sites of other nucleases. Additionally, in wheat there will be some genes for which unique gRNAs are difficult to find, owing to polyploidy, high repetitive DNA content and the frequent occurrence of genes as members of multi-gene families with high levels of sequence identity. In such cases, users may have to consider other strategies (for example, dual gRNAs) to improve targeting specificity.
Availability and requirements
The WheatCRISPR web application is publicly available at https://crispr.bioinfo.nrc.ca/WheatCrispr/.
We would like to thank Dr. Dan Tulpan for critical reading of the manuscript.
SK conceived the study. SK and AGS coordinated the study. DC designed and implemented the WheatCRISPR application. MK and PB provided technical and data analysis support. KR provided vectors and reporter constructs. MB and IAPP contributed to the development of the on-the-fly BLAST-based gRNA design tool. NR performed in vitro nuclease assays for gRNA validation. SK, MK and DC wrote the manuscript. All authors edited the manuscript and approved the final version.
This research was supported by the Canadian wheat improvement flagship program, National Research Council Canada. The funding agency had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Ethics approval and consent to participate
This research did not need ethics approval and consent as it did not involve human subjects, material or data.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
- 12. International Wheat Genome Sequencing Consortium. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science. 2018;361(6403):eaar7191.
- 13. Ding Y, Li H, Chen L-L, Xie K. Recent advances in genome editing using CRISPR/Cas9. Front Plant Sci. 2016;7:703.
- 27. Wang W, Simmonds J, Pan Q, Davidson D, He F, Battal A, Akhunova A, Trick HN, Uauy C, Akhunov E. Gene editing and mutagenesis reveal inter-cultivar differences and additivity in the contribution of TaGW2 homoeologues to grain size and weight in wheat. Theor Appl Genet. 2018;131(11):2463–75.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.