Rapid Quantitative Evaluation of CRISPR Genome Editing by TIDE and TIDER.

Current genome editing tools enable targeted mutagenesis of selected DNA sequences in many species. However, the efficiency of the editing method and the type of mutations it introduces depend largely on the target site. As a consequence, the outcome of an editing experiment is difficult to predict. Therefore, a quick assay to quantify the frequency of mutations is vital for a proper assessment of genome editing actions. We developed two methods that are rapid, cost-effective, and readily applicable: (1) TIDE, which can accurately identify and quantify insertions and deletions (indels) that arise after introduction of double strand breaks (DSBs); (2) TIDER, which is suited for template-mediated editing events including point mutations. Both methods only require a set of PCR reactions and standard Sanger sequencing runs. The sequence traces are analyzed by the TIDE or TIDER algorithm (available at https://tide.nki.nl or https://deskgen.com). The routine is easy, fast, and provides much more detailed information than current enzyme-based assays. TIDE and TIDER accelerate testing and designing of DSB-based genome editing strategies.


Introduction
CRISPR-based systems are widely used for genome editing in molecular biology. The CRISPR endonuclease Cas9 introduces a DSB into the genomic DNA with high precision. Due to the error-prone repair mechanisms of the cell, this often results in insertions or deletions at the targeted site [1]. This is exploited to make functional knock-outs of specific genes and regulatory elements [2-4]. Alternatively, to gain more control over the nature of the mutations, strategies have been developed that introduce small nucleotide changes around a precisely targeted site by using a donor template [5,6]. In the latter approach the genomic DNA around the DSB is replaced by the DNA of the donor template through homology-directed repair (HDR), resulting in the introduction of a designed mutation with high accuracy [7,8]. This precise editing creates the possibility to generate and study specific disease-causing nucleotide variants [6,9]. Typically, one starts with a homogeneous cell line and ends up with a pool of cells with a complex mix of indels and/or designer mutations [10-12]. To study a mutation of interest, clonal mutant lines need to be isolated from the cell pool. Because this is a very labor-intensive process, it is important to know a priori the efficiency with which the desired mutation(s) have been introduced. However, a complicating factor is that the efficacy of the programmable nucleases can vary dramatically depending on the sequence that is targeted. In addition, cell types differ in how efficiently they can be transfected. These factors make the efficacy of a CRISPR experiment difficult to predict. For this reason it is usually necessary to test several guide RNAs (gRNAs) that lead the endonuclease to the site of interest.
This is even more critical when a template-directed strategy is applied, which often has a low efficiency because HDR pathways are generally less active than error-prone non-templated repair [10,12]. Hence, a quick and easy assay to estimate the frequencies of the diverse introduced mutations in the cell pool is of key importance.
We developed two methods that can accurately quantify the frequency of either indels or template-directed mutations in a pool of cells. Both methods are rapid and cost-effective. The method TIDE (Tracking of Indels by DEcomposition) identifies and quantifies indels. It requires only a pair of standard Sanger sequence traces of two PCR products [13]. The sequence traces are then analyzed using an easy-to-use web tool. Note that TIDE can only detect overall indel frequencies, but not nucleotide substitutions or specifically designed indels. For the latter purpose we developed TIDER (Tracking of Insertions, DEletions, and Recombination events) [14]. This method can estimate the incorporation frequency of template-directed mutations, including point mutations, and distinguish them from a background of additional indels that originate from competing error-prone repair pathways. Although TIDER can also quantify indels alone, TIDE is slightly simpler to implement and therefore better suited for the assessment of non-templated editing experiments. The corresponding web tools for both TIDE and TIDER are freely accessible at http://tide.nki.nl.

DNA Purification Buffers and Solutions
Usually, genomic DNA is isolated 1-3 days after transfection. Genomic DNA of a minimum of 1000 cells should be isolated to get a comprehensive sampling of the complexity of the mutations that are introduced by the repair of the CRISPR-Cas9 double strand break. A standard genomic DNA isolation kit (e.g., BioLine ISOLATE II Genomic DNA Kit) can be used according to the manufacturer's protocol. DNA can also be isolated with the protocol for isolation of high-molecular-weight DNA from mammalian cells using proteinase K and phenol/chloroform extraction [15].
Purify the PCR product using a kit according to the manufacturer's instructions (e.g., BioLine ISOLATE II PCR and Gel Kit).

Sanger Sequencing
We strongly recommend that all PCR products (control, experimental sample(s), and for TIDER also the reference) are sequenced in parallel. Purified PCR samples are prepared for Sanger sequencing with the following protocol or can be sent for commercial Sanger sequencing.

Control and Experimental Sample Generation
For both methods genomic DNA is isolated from the cell pool that was transfected with the nuclease or guide RNA alone (control) and from cells exposed to both Cas9 and guide RNA (experimental sample). For TIDER the experimental sample is also co-transfected with the donor template. Then a region of about 500-1500 base pairs around the target site is amplified by PCR from DNA of the control and experimental sample (Fig. 1a, b). Next, the PCR amplicons are subjected to conventional Sanger sequencing. In the PCR product of the experimental sample, the sequence trace may consist of a combination of multiple sequences derived from unmodified DNA and DNA that has acquired a mutation (Fig. 2a).

Reference Sample Generation (TIDER Only)
TIDER is required for genome editing experiments in the presence of a donor template. In addition to the control and experimental sample trace (see Subheading 3.1), TIDER requires one extra Sanger sequencing trace called "reference." The reference is similar to the control sequence, except that it carries the desired base pair changes as designed in the donor template (Fig. 2e). There are two paths to obtain the reference sequence as described below. The reference sequence can be easily created in a 2-step PCR protocol based on site-directed mutagenesis [16]. Here, two additional primers are required that overlap and carry the desired mutation(s) (mutated primers c, d, which are reverse complement of each other) (Fig. 1c). These primers are used in combination with the primers used for the amplification of the control and experimental sample (control primers a, b). The control forward primer a is combined with the reverse mutated primer c and the forward mutated primer d with the control reverse primer b, resulting in two PCR amplicons that incorporate the designed mutations. Then the two amplicons are denatured and hybridized at the complementary ends in an annealing reaction. The second PCR uses the annealing mixture as a template and the control forward and reverse (primers a and b) as primers. This PCR starts with an extension step followed by exponential amplification. This results in a PCR product carrying the designed mutations (see Notes 2 and 3).
Alternatively, the reference DNA can be ordered as synthesized DNA. The design should have the same DNA sequence as the PCR product of the control sample, except that it should carry the designed mutation(s) as in the donor template. The annealing sequences for the forward and reverse primers (a, b) should also be present in the synthesized fragment. Like the control and experimental sample, the reference can be amplified with primers a and b (see Note 3).
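As a rough illustration of the 2-step PCR route, the overlapping mutated primers c and d can be derived from the control sequence as sketched below in Python. The function names, the toy sequence, and the symmetric flank of 10 nt (chosen to follow the recommendation in Note 2) are assumptions made for illustration only; real primer design should also take melting temperature and specificity into account.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutated_primers(control, pos, new_base, flank=10):
    """Build overlapping primers c, d carrying one designed substitution.

    control  : control amplicon sequence (top strand, 5'->3')
    pos      : 0-based position of the base to change
    new_base : designed base at that position
    flank    : matching nucleotides kept on each side of the mutation
               (hypothetical default, see lead-in)
    """
    if pos - flank < 0 or pos + flank >= len(control):
        raise ValueError("mutation too close to the end of the amplicon")
    mutated = control[:pos] + new_base + control[pos + 1:]
    primer_d = mutated[pos - flank: pos + flank + 1]  # forward mutated primer
    primer_c = reverse_complement(primer_d)           # reverse mutated primer
    return primer_c, primer_d


# Toy control sequence; position 15 (a T) is changed into a G
control = "ATGCGTACGTTAGCCTAGGATCCGATCGTA"
primer_c, primer_d = mutated_primers(control, 15, "G")
print("primer c:", primer_c)
print("primer d:", primer_d)
```

Primer c is then paired with control primer a, and primer d with control primer b, as described above.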

Web Tool
The PCR products of the control, optional reference, and experimental sample are processed by conventional Sanger sequencing. The guide RNA sequence is aligned to the control sequence in order to determine the position of the expected Cas9 break site. Next, in all Sanger sequence traces an alignment window is automatically selected that runs from nucleotide 100 up to 15 bp upstream of the break site. The sequence segment in this window of the experimental sample (and the optional reference) is aligned to that of the control in order to determine any offset between the sequence reads. Users may change the default settings for these calculations, which is necessary when alignment problems occur with these settings (see Notes 6 and 7). Subsequently, two output plots are generated: one plot that can help with quality control and one that displays the indel/HDR spectrum.
In a Sanger sequence trace, each position is normally represented by one predominant nucleotide signal indicative of the actual nucleotide. The minor signals from the other three nucleotides are normally considered as background. In TIDE(R) the percentage of these aberrant nucleotides is plotted along the sequence trace of the control and the experimental sample. Thus, a value of 0% at a position indicates that the detected nucleotide does not differ from the control sequence, while a value of 100% indicates that the expected nucleotide was not detected at all (and instead only one or more of the other three nucleotides) (Fig. 2b). The percentages of aberrant nucleotides in the control should be low along the whole sequence trace. However, the experimental sample consists of a mixture of multiple sequences due to the presence of indels and possible point mutations. Around the break site the sequences start to deviate from the control, which is visible as a consistently elevated aberrant sequence signal. Note that, because there are only four different nucleotides, there is a 25% chance that a mutated sequence carries the same nucleotide as the control sequence at a given position. This plot allows the user to visually inspect the sequence deviation caused by the targeted nuclease and makes it possible to verify the alignments and the quality of the data. It is important to confirm that (1) the break site is located as expected, (2) the aberrant signal increases only around the break site, and (3) the aberrant signal remains elevated downstream of the break site.
The sequence trace downstream of the break site is decomposed into its individual sequence components. The region used for this purpose is marked as the decomposition window. All parameters in TIDE(R) have default settings but can be adjusted if necessary. The user can interactively change the alignment and decomposition windows. Choosing a different decomposition window is often a way to exclude locally poor stretches of the sequence trace, which should be avoided (see Notes 8-10).
For TIDER two additional quality plots are generated. In one, the aberrant signal of the reference trace compared to the control trace is plotted. This can be used to verify whether the designed mutation(s) is/are present at the expected location. In the second one, the percentage of the designed mutation(s) present in the experimental sample is plotted, representing the relative incorporation of the donor template (Fig. 2f).
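The aberrant sequence signal shown in the quality plot can be illustrated with a short Python sketch. It assumes that the peak heights of the four nucleotides at each position have already been extracted from the chromatogram into dictionaries. That data layout and the function name are hypothetical and do not reflect the code of the web tool.

```python
def aberrant_percentage(peaks, control_seq):
    """Percentage of signal not belonging to the control base, per position.

    peaks       : list of dicts, one per position, mapping 'A', 'C', 'G', 'T'
                  to the peak height (assumed, hypothetical input format)
    control_seq : base-called control sequence aligned to `peaks`
    """
    result = []
    for signal, expected in zip(peaks, control_seq):
        total = sum(signal.values())
        if total == 0:
            result.append(0.0)  # no signal at all, nothing to call aberrant
            continue
        aberrant = total - signal[expected]
        result.append(100.0 * aberrant / total)
    return result


# Toy example: position 1 is a clean A, position 2 is a 50/50 mixture of A and G
peaks = [
    {"A": 980, "C": 10, "G": 5, "T": 5},
    {"A": 500, "C": 0, "G": 500, "T": 0},
]
print(aberrant_percentage(peaks, "AA"))  # prints [2.0, 50.0]
```

In this toy example the first position behaves like a control trace (low background), whereas the second position resembles a mixed signal downstream of a break site.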

Mutation Detection by Decomposition
For the detection of individual mutations with the corresponding frequencies, the TIDE and TIDER software perform a decomposition of the mixed sequence signal in the experimental sample. This composite sequence trace is a linear combination of the wild-type (control) and the mutated sequences. For TIDE, the decomposition is performed on a sequence segment downstream of the break site. As a rule of thumb, the larger the decomposition window, the more robust the estimation of mutations (see Note 9). To perform the decomposition, a set of sequence trace models is generated that contains all possible indels of size {0..n} (n is by default set to 10). The models are derived from the control trace and contain all nucleotide peak signals of the decomposition window shifted by the appropriate number of positions to the left or right. A wild-type trace (shift 0) is also added as a model. Then, using non-negative linear modeling, the combination of trace models that can best explain the composite sequence trace in the experimental sample is determined (Fig. 2c) (see Note 11). An R² value is calculated as a measure of the goodness of fit (see Notes 10 and 12), and the statistical significance of the detection of each indel is calculated.
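The decomposition idea can be sketched in Python with idealized, noise-free traces. This is a toy illustration under stated assumptions, not the TIDE implementation: the traces are simulated one-hot signals, the non-negative fit uses a simple coordinate-descent solver as a stand-in, R² is computed as 1 minus the ratio of residual to total sum of squares, and all function names are hypothetical. The significance test for each indel is not covered.

```python
import random

BASES = "ACGT"


def one_hot_trace(seq):
    """Idealized trace: full signal for the called base, zero for the others."""
    return [[1.0 if b == base else 0.0 for b in BASES] for base in seq]


def shifted_model(trace, start, end, indel):
    """Flattened window of the control trace as seen after an indel.

    indel < 0 is a deletion, indel > 0 an insertion; downstream of the break
    the window shows the control signal shifted by `indel` positions.
    """
    return [v for pos in range(start, end) for v in trace[pos - indel]]


def nnls(columns, target, sweeps=200):
    """Non-negative least squares by cyclic coordinate descent."""
    coef = [0.0] * len(columns)
    residual = list(target)
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue
            step = sum(c * r for c, r in zip(col, residual)) / norms[j]
            new = max(0.0, coef[j] + step)  # keep the coefficient non-negative
            delta = new - coef[j]
            if delta:
                residual = [r - delta * c for r, c in zip(residual, col)]
                coef[j] = new
    return coef, residual


def decompose(control, sample, start, end, max_indel=10):
    """Estimate indel percentages and R^2 inside the decomposition window."""
    indels = list(range(-max_indel, max_indel + 1))
    models = [shifted_model(control, start, end, i) for i in indels]
    target = [v for pos in range(start, end) for v in sample[pos]]
    coef, residual = nnls(models, target)
    mean = sum(target) / len(target)
    ss_tot = sum((t - mean) ** 2 for t in target)
    ss_res = sum(r * r for r in residual)
    return {i: 100.0 * c for i, c in zip(indels, coef)}, 1.0 - ss_res / ss_tot


def mix(control, fractions, start, end):
    """Simulate a mixed experimental trace inside the window only."""
    sample = [[0.0] * 4 for _ in control]
    for pos in range(start, end):
        for indel, frac in fractions.items():
            for k in range(4):
                sample[pos][k] += frac * control[pos - indel][k]
    return sample


rng = random.Random(1)
control = one_hot_trace("".join(rng.choice(BASES) for _ in range(300)))
# Toy pool: 50% wild type, 30% 1-bp deletion, 20% 2-bp insertion
sample = mix(control, {0: 0.5, -1: 0.3, 2: 0.2}, 150, 270)
freqs, r2 = decompose(control, sample, 150, 270, max_indel=5)
for indel, pct in sorted(freqs.items()):
    print(f"{indel:+d}: {pct:5.1f}%")
print(f"R^2 = {r2:.3f}")
```

With these noise-free toy traces the fit should recover approximately the simulated fractions with an R² close to 1. Real traces contain noise, so the fit is less exact and the quality checks described in this section matter.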
For TIDER the mutation detection is more complex. It is mandatory that the decomposition window in TIDER covers the location of the designed mutation(s) in the donor template (see Notes 9 and 13). In contrast to TIDE, the decomposition window of TIDER spans by default only 100 bp. When only a few base pair changes are introduced, the sequence with the designed mutation will be very similar to the wild-type sequence. The smaller decomposition window of TIDER gives more weight to the difference between the control and the reference. Simulations of all possible insertions and deletions are generated from the control file and placed in a decomposition matrix together with the control and reference. Subsequently, decomposition of the experimental sample is performed by choosing the best combination of the models in the decomposition matrix. This results in an estimation of the incorporation frequency of template-directed mutation(s) and distinguishes these from the background of indels that are introduced by error-prone repair (see Note 14).
The reliability of TIDE and TIDER depends on the quality of the input samples (see Note 15). For an accurate TIDE(R) estimation it is recommended that (1) R² > 0.9 and (2) aberrant signals upstream of the break site are below 10% in the quality plot. This applies to all files: control, reference, and experimental sample. To verify the results the samples can be sequenced from the opposite strand (see Note 13).

Sequence Determination of the +1 Insertion (TIDE Only)
During repair of a CRISPR-Cas9-induced break, a single base pair is frequently inserted at one of the DNA ends of the break [13,17,18]. TIDE provides an estimate of the base composition of this insertion. This may be of interest if one wishes to obtain a particular sequence variant (Fig. 2d). For longer insertions this base calling is computationally complicated and currently not implemented.

Eva Karina Brinkman and Bas van Steensel

Notes
1. Primer design recommendations for control and experimental sample. Primers a, b need to cover the CRISPR target site. The length of the PCR product can vary, but there should be at least 50 bp upstream and downstream of the break site for the alignment (see Notes 6 and 7) and decomposition windows, respectively (see Note 9).
2. Primer design recommendations for reference sample. Primers c, d should carry the designed mutation(s) as present in the donor template (see Subheading 3.2, Note 3). It is advised to include at least 10 complementary nucleotides on the 3′ side of the mutation(s).
3. Donor plasmid contamination in isolated genomic DNA. Potentially, a donor template that was transfected into the cells could co-purify with genomic DNA and be co-amplified in the PCR if it contains the primer sequences. This could result in an overestimation of the HDR events. This is generally not a problem with short ssODN donors, but with plasmid templates with long homology arms the primers a, b should be chosen outside of these homology arms. Alternatively, the donor plasmid may be cleared from the cells by a few passages of culturing.
4. Nuclease type. TIDE(R) is currently designed for regular Cas9, but it can be used to analyze data from another nuclease by entering in the web tool the DNA sequence around the expected cut site. The TIDE(R) web tool assumes that the DSB is induced between nucleotides 17 and 18 of the guide RNA sequence string (Fig. 3f). Note that if the exact breakpoint is unknown, TIDE will estimate the amount of the indels correctly, but the nucleotide composition of the +1 insertion will not be reliable. TIDER will only work when the exact cutting position is known and when the nuclease is a blunt cutter.

5. No guide RNA match. Sometimes a mismatch occurs in the control sequence at the location of the sgRNA. This will stop the TIDE(R) analysis. In this case, edit the base annotation in the chromatogram file into IUPAC nucleotides of the expected control sequence (Fig. 3g). The peak signals in the chromatogram should not be altered. Viewing and editing of chromatogram files can be performed with Snapgene or Chromas software.
6. Alignment cannot be performed. By default, the alignment window begins at nucleotide number 100, because the first part of the sequence read tends to be of low quality. The end of the alignment window is set automatically at 15 bp upstream of the break site. When this window is too small or when the break site is located upstream of nucleotide 100, the alignment cannot be performed correctly. Then the start of the alignment window should be set manually closer to nucleotide number 1 (Fig. 3c).
7. Incorrect alignment. When the beginning of the sequence trace is of poor quality, the alignment function can make a mistake. This results in a quality plot with a high aberrant sequence signal along the whole length of the sequence trace (Fig. 3d). The aberrant sequence signal should only increase around the expected cut site (blue dashed line). In case of poor alignment, the start of the alignment window (default of 100) needs to be adjusted until a proper alignment is achieved.
8. Quality plot recommendations. In the experimental sample, around the break site a consistently elevated signal is expected, which is due to indels introduced at the break site. The starting position of this elevated signal may be used to verify that breaks were induced at the expected location. The control trace should have a low and equally distributed aberrant sequence signal along the whole trace. The reference trace in the case of TIDER should only have high scores at the positions of the altered nucleotides. Fluctuations in the control and reference signal reflect local variation in the quality of the sequence trace. Near the end of the sequencing traces the aberrant signal is often high, typically due to the lower quality of the trace toward the end (Fig. 3a). When a sequence stretch of poor local quality is present in the decomposition window, the calculations of TIDE(R) are compromised. The boundaries of the decomposition window can be manually adjusted to remove the region that is of low quality; this will improve the estimations. Another area to avoid in the decomposition window is a stretch of repetitive sequences. These regions can be recognized in the quality plot as a sudden stretch without aberrant nucleotides (Fig. 3b). Such a region might confound the decomposition of the sequence trace.
9. Decomposition window recommendations. For TIDE, the default decomposition window spans the entire sequence trace from the break site until the end of the sequence minus the size of the maximum indel. When the boundaries of the decomposition window cannot fulfill this constraint, the software will report that the boundaries are not acceptable. For example, this can occur when the break site is too close to the end of the sequence trace. To address this, the decomposition window boundaries should be set further apart or a smaller indel size should be chosen. Alternatively, new primers have to be designed according to Note 1. For TIDER the decomposition window runs by default from 20 bp upstream of the break to 80 bp downstream of the break. Compared to the TIDE window, this smaller window has more discriminatory power for subtle designed base pair changes.
10. Goodness of fit. R² is a measure for the reliability of the estimated values. For example, if the R² value is 0.95, it means that 95% of the variance can be explained by the model; the remaining 5% consists of random noise, very large indels, non-templated point mutations, and possibly more complex mutations. Decomposition results with a low R² must be interpreted with caution. A low R² can result when the settings are not optimal or when the sequence quality is not good (see Note 15). A low R² value can also arise when a sequence stretch with a poor local quality is present in the decomposition window (see Note 8). Furthermore, the presence of indels larger than the maximum indel size that is considered (default of 10) can affect the R². By default these are not modeled, which may result in a low R² score. The size range of indels that are modeled can be manually changed to a larger number to test if this improves the fit (Fig. 3e).
11. Allele-specific indels. The different bars in the plot represent the insertions, deletions, and/or template-directed mutations in the cell population. These mutations are not allele-specific. To determine allele-specific information a cell clone needs to be isolated and analyzed again by TIDE(R). A diploid cell gives a percentage of ~50% per mutation.
12. Overall efficiency. The overall efficiency refers to the estimated total fraction of DNA with mutations around the break site. It is calculated as R² × 100 − % wild type.
13. Distal designed mutations. It has been reported that the incorporation of donor template sequence is less efficient when the designed point mutations are further away from the break site [19]. This often leads to a variation in incorporation frequency between the distal and proximal designed mutations, as can be observed in the quality plots. Such a situation may confound TIDER estimates. The decomposition window can be restricted to either the proximal or the distal mutations to resolve the individual efficiencies.
15. Poor sequence quality. When the sequence has poor quality overall, TIDE(R) will yield poor results with a low R² value (see Note 10) since too much noise is present in the data. The quality plot will show an overall high aberrant sequence signal in the control (and the reference) and the experimental sample, before and after the break site (see Note 8). It is recommended to check the chromatograms of the samples (Fig. 3h) for poor sequencing quality. Samples with poor sequencing quality cannot be analyzed reliably by TIDE(R). Note that sometimes the peak signals in the chromatogram appear normal, but the file can contain nucleotides that were wrongly left unannotated or that were additionally annotated (Fig. 3i). TIDE(R) gives a warning when the spacing between the nucleotides in the chromatogram of the sequence trace is not consistent, which is often an indication of such annotation errors. In case of this warning, the chromatograms should be carefully investigated (use Snapgene or Chromas software).
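The default settings mentioned in Notes 4, 6, 9, and 12 can be summarized in a small Python sketch. It is only an illustration under stated assumptions: the function names and the example sequences are hypothetical, positions are counted as the number of nucleotides upstream of the cut in the sequence read, and the overall efficiency follows our reading of Note 12 as R² × 100 minus the wild-type percentage.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_break_site(control_seq, guide, cut_after=17):
    """Number of nucleotides upstream of the expected cut in control_seq.

    The cut is assumed between nucleotides 17 and 18 of the guide (Note 4).
    Both strands are searched; no match raises an error (compare Note 5).
    """
    i = control_seq.find(guide)
    if i >= 0:
        return i + cut_after
    i = control_seq.find(reverse_complement(guide))
    if i >= 0:
        return i + len(guide) - cut_after
    raise ValueError("no guide RNA match in the control sequence")


def default_alignment_window(break_site, start=100, gap=15):
    """Alignment window from nucleotide `start` to `gap` bp before the break."""
    end = break_site - gap
    if end <= start:
        raise ValueError("alignment window is empty; set the start closer to 1")
    return start, end


def default_decomposition_window(break_site, trace_length, method="TIDE",
                                 max_indel=10):
    """Default decomposition window as described in Note 9."""
    if method == "TIDE":
        window = (break_site, trace_length - max_indel)
    elif method == "TIDER":
        window = (break_site - 20, break_site + 80)
    else:
        raise ValueError("method must be 'TIDE' or 'TIDER'")
    if window[0] < 0 or window[1] > trace_length or window[1] <= window[0]:
        raise ValueError("decomposition window boundaries are not acceptable")
    return window


def overall_efficiency(r2, wild_type_percent):
    """Overall efficiency in percent (our reading of Note 12)."""
    return r2 * 100.0 - wild_type_percent


# Made-up control read: 160 nt upstream, 20-nt guide, PAM, 300 nt downstream
guide = "GACCTGAAGTTCATCTGCAC"
control = "ACGT" * 40 + guide + "TGG" + "TTGCA" * 60
site = find_break_site(control, guide)
print("break site:", site)
print("alignment window:", default_alignment_window(site))
print("TIDE window:", default_decomposition_window(site, len(control)))
print("TIDER window:", default_decomposition_window(site, len(control), "TIDER"))
print("overall efficiency:", overall_efficiency(0.95, 40.0))
```

When a function raises an error here, the corresponding remedy in the Notes applies, for example moving the start of the alignment window closer to nucleotide 1 or choosing a smaller maximum indel size.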