iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity
Genome engineering methods have advanced greatly with the development of programmable nucleases, but methods for quantifying on- and off-target cleavage sites and associated deletions remain nascent. Here, we report an improvement of the GUIDE-seq method, iGUIDE, which allows filtering of mispriming events to clarify the true cleavage signal. Using iGUIDE, we specify the locations of Cas9-guided cleavage for four guide RNAs, characterize associated deletions, and show that naturally occurring background DNA double-strand breaks are associated with open chromatin, gene dense regions, and chromosomal fragile sites. iGUIDE is available from https://github.com/cnobles/iGUIDE.
In the GUIDE-seq method (Fig. 1a), cells are exposed to designer nucleases such as Cas9/sgRNA complexes, and a marker double-stranded oligodeoxynucleotide (ODN) is transfected into the cells. Cellular DNA repair pathways then incorporate the ODN into double-strand break sites during repair, covalently marking the location of each break. Break sites can then be read out using ligation-mediated PCR (Fig. 1a, b): DNA is fragmented by sonication, adaptors are ligated to the fragment ends, and the DNA is amplified by two rounds of PCR using primers that bind to the adaptor and primers that bind to the incorporated ODN. PCR products are then analyzed by next-generation DNA sequencing and mapped to the human reference genome.
A complication, however, is that the ODN-specific PCR primers sometimes anneal to human DNA sequences other than the ODN and prime amplification there. The resulting products are indistinguishable from products formed by primer binding to the ODN, because the primer itself contributes sequence identical to the ODN; mispriming therefore obscures the true distribution of cleavage sites. Estimates of Cas9/sgRNA off-target cleavage positions have varied widely, probably in part because of authentic variation among sgRNAs, but likely also because of variable admixture of mispriming artifacts.
Here, we present a revision of the GUIDE-seq method that allows mispriming artifacts to be distinguished from authentic ODN integration sites (Fig. 1b), together with a vetted software pipeline that implements the analysis (available at https://github.com/cnobles/iGUIDE). The iGUIDE method involves the same steps as GUIDE-seq, but uses a longer ODN (46 nt versus 34 nt). As a result, the PCR primer binding sites can be moved away from the junction between the ODN and the flanking human-derived DNA, leaving a segment of the ODN in the final PCR product. After sequencing, this ODN reporter sequence can be recognized in the sequence data. Only products generated by PCR priming on the ODN carry the reporter; products of mispriming lack it. Thus, correct priming on the ODN can be distinguished from artifactual mispriming elsewhere in the human genome (Fig. 1b), a distinction that was not possible with the original GUIDE-seq design.
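The reporter check described above can be illustrated with a minimal sketch. It assumes that each read begins at the ODN-specific primer, so the reporter should appear within the first few bases, and it tolerates only substitutions (no indels). The reporter sequence, the 20-base search window, and the one-mismatch threshold are all hypothetical placeholders; the real oligonucleotide sequences are in Additional file 2: Table S2, and this is not the iGUIDE pipeline's own code.

```python
# Sketch of ODN-reporter filtering (illustrative; not the iGUIDE implementation).
# REPORTER is a HYPOTHETICAL placeholder; the real dsODN sequences are listed in
# Additional file 2: Table S2.
from typing import List, Optional, Tuple

REPORTER = "ACGTTAGCCATGCA"


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def reporter_end(read: str, reporter: str = REPORTER,
                 max_mismatch: int = 1, search_window: int = 20) -> Optional[int]:
    """Return the index just past the reporter if it starts within the first
    `search_window` bases of the read (allowing `max_mismatch` substitutions);
    otherwise return None."""
    k = len(reporter)
    for start in range(0, min(search_window, len(read) - k) + 1):
        if hamming(read[start:start + k], reporter) <= max_mismatch:
            return start + k
    return None


def split_reads(reads: List[str]) -> Tuple[List[str], List[str]]:
    """Partition reads into (genomic flanks of ODN-primed reads, misprimed reads)."""
    flanks, misprimed = [], []
    for read in reads:
        end = reporter_end(read)
        if end is None:
            misprimed.append(read)        # no reporter: likely a mispriming product
        else:
            flanks.append(read[end:])     # human-derived sequence passed on to alignment
    return flanks, misprimed
```

With real data, reads would come from a FASTQ parser and the retained flanks would be handed to an aligner; the thresholds shown trade sensitivity to sequencing errors against the risk of accepting misprimed reads.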
Results and discussion
Given the ability to distinguish mispriming from correct priming, we investigated the frequency and distribution of mispriming events in model studies of sgRNA/Cas9 nuclease targeting four loci. The first two, both in VEGFA, have been studied extensively and serve as methodological controls. The other two are in the beta2-microglobulin gene (B2M) and the T cell receptor alpha chain constant region (TRAC). These latter two targets are of interest for disrupting pathways of antigen detection, a crucial consideration in the development of allogeneic T cell immunotherapies. Disruption of either the T cell receptor alpha chain or the beta chain is sufficient to knock out the T cell receptor, while B2M is essential for presentation of the HLA-I complex. Samples were tested with or without sgRNA and Cas9, and GUIDE-seq and iGUIDE were compared. All sgRNAs were tested in primary human T cells, which are of particular interest as substrates for modification because of their extensive use in human immunotherapy. Detailed protocols are provided in the “Methods” section and in Additional file 2: Table S2, together with directions to custom software for sequence analysis.
Mispriming is readily detectable experimentally. In control reactions to which no ODN was added, amplification products were still detected, documenting amplification after binding of ODN primers to sequence-related sites in the human genome (Additional file 3: Table S3). An approach to quantifying mispriming in reactions with DNA from cells transfected with the ODN is shown in Fig. 1c. Mispriming is expected to occur when PCR primers bind human DNA sequences that happen to resemble the primer; the primer binding sites inferred from mispriming events should therefore resemble the PCR primer sequence more closely than expected by chance. Indeed, a substantial fraction of GUIDE-seq sites lacking the ODN reporter (Fig. 1c, top) show greater sequence similarity to the PCR primer (blue) than random controls (red), indicating widespread mispriming. In contrast, sites filtered for the iGUIDE dsODN reporter (Fig. 1c, bottom) were closer to the random control, indicating that misprimed sequences had been removed.
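The comparison with random controls can be sketched as follows. For each site, the genomic sequence at the inferred primer binding position is scored by its best ungapped identity to the primer, and the scores are compared with those of a null background. The primer sequence is a hypothetical placeholder, and the null here is uniform random sequence, a simplification of the matched random genomic positions one would use in practice; the scoring scheme behind Fig. 1c may differ.

```python
# Sketch: do inferred primer-binding sites resemble the ODN primer more than chance?
# PRIMER is a HYPOTHETICAL placeholder; random controls are uniform random
# sequence, a simplification of matched random genomic positions.
import random
from statistics import mean
from typing import List

PRIMER = "CAGTCAGGTACCATGCAAGT"


def identity(a: str, b: str) -> float:
    """Fraction of identical positions in an ungapped comparison of equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(b)


def primer_similarity(flank: str, primer: str = PRIMER) -> float:
    """Best ungapped identity between the primer and any window of `flank`.
    `flank` must be at least as long as the primer."""
    k = len(primer)
    return max(identity(flank[i:i + k], primer) for i in range(len(flank) - k + 1))


def random_controls(n: int, length: int, seed: int = 0) -> List[str]:
    """Generate n random DNA sequences to serve as a null background."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


def fraction_above(scores: List[float], threshold: float) -> float:
    """Fraction of scores at or above `threshold`."""
    return sum(s >= threshold for s in scores) / len(scores)


if __name__ == "__main__":
    null = [primer_similarity(s) for s in random_controls(500, len(PRIMER) + 10)]
    print(f"mean identity of random controls: {mean(null):.2f}")
```

Misprimed sites would be expected to shift the score distribution upward relative to the null, which is the signal summarized by `fraction_above` at a chosen (arbitrary) threshold.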
Figure 1d shows the sequence profile returned for on-target cleavage: the relative sequence coverage of the bases reported by iGUIDE surrounding the site of nuclease cleavage (arrow). The positions of ODN incorporation also report the edges of deletions at sites of sgRNA/Cas9 cleavage. Additional file 4: Figure S1 shows the data broken down by site of DNA breakage.
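How ODN junction positions can mark deletion edges can be illustrated with a small sketch. It uses the canonical SpCas9 behavior of a blunt cut 3 bp upstream of the NGG PAM, and tallies how far each observed ODN/genome junction lies from that predicted cut; offsets away from zero are consistent with deleted bases at the break. The coordinate conventions and the junction inputs are assumptions made for illustration, not the iGUIDE algorithm.

```python
# Sketch: position ODN junctions relative to the predicted SpCas9 cut site.
# Coordinates are 0-based; a "boundary" b denotes the gap between bases b-1 and b.
from collections import Counter
from typing import Dict, Iterable


def predicted_cut(pam_start: int, strand: str) -> int:
    """Boundary of the blunt SpCas9 cut, 3 bp 5' of the NGG PAM.

    `pam_start` is the lowest + strand coordinate of the PAM (NGG on '+',
    its reverse complement CCN on '-')."""
    if strand == "+":
        return pam_start - 3      # 20-nt spacer ends at pam_start - 1
    if strand == "-":
        return pam_start + 6      # CCN occupies pam_start..pam_start + 2
    raise ValueError("strand must be '+' or '-'")


def junction_offsets(junctions: Iterable[int], cut: int) -> Counter:
    """Tally distances (bp) of observed ODN/genome junction boundaries from the cut."""
    return Counter(j - cut for j in junctions)


def relative_coverage(offsets: Counter) -> Dict[int, float]:
    """Normalize the tallies to fractions, ordered by offset."""
    total = sum(offsets.values())
    return {k: v / total for k, v in sorted(offsets.items())}
```

In this toy representation, a profile concentrated at offset 0 corresponds to clean incorporation at the cut, while mass at negative or positive offsets outlines the left and right edges of deletions.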
A standard operating procedure for carrying out iGUIDE analysis is provided in Additional file 5. Empirically, iGUIDE typically yields more total reads aligning to the human genome than GUIDE-seq does; possibly the longer ODN is more stable in cells or is incorporated more efficiently by cellular enzymes. We also supply software that takes raw iGUIDE sequence data as input and outputs a series of data tables and summaries. An example of such a reproducible report is provided in Additional file 6, and the most current version of the software is available at https://github.com/cnobles/iGUIDE.
Improved filtering by iGUIDE allowed us to clarify the chromosomal features associated with spontaneous cellular DNA double-strand breaks marked by ODN incorporation (Fig. 2e–g). Detailed analysis showed that spontaneous DNA double-strand breaks occur preferentially near active genes (Fig. 2g) and near epigenetic marks associated with gene activity (Fig. 2e, f). Breaks also occur preferentially in AT-rich DNA and near previously annotated chromosomal fragile sites (13 to 19% enrichment; p < 0.001 compared with random incorporation sites). In GUIDE-seq data, the extent of these trends was obscured by admixture of mispriming artifacts. These findings raise the question of whether chromatin structure and gene activity influence the initial formation of dsDNA breaks or the subsequent activity of repair pathways leading to ODN incorporation.
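The comparison against random incorporation sites can be sketched as an overlap test. The sketch counts the fraction of observed and random positions that fall inside annotated intervals (for example, fragile sites) and compares the two fractions. The statistical test behind the reported p value is not described in this section; the pooled two-proportion z-test below is a stand-in chosen for illustration, and restricting positions to a single chromosome is a further simplifying assumption.

```python
# Sketch: are break sites over-represented in annotated regions (e.g. fragile
# sites) relative to random positions? Single-chromosome coordinates and a
# pooled two-proportion z-test are simplifying assumptions.
import bisect
import math
from typing import List, Sequence, Tuple


def fraction_in(sites: Sequence[int], intervals: List[Tuple[int, int]]) -> float:
    """Fraction of positions inside any half-open [start, end) interval.
    Intervals are assumed to be non-overlapping."""
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    inside = 0
    for pos in sites:
        i = bisect.bisect_right(starts, pos) - 1   # last interval starting <= pos
        if i >= 0 and pos < ends[i]:
            inside += 1
    return inside / len(sites)


def two_proportion_p(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test.
    Undefined (division by zero) if the pooled proportion is exactly 0 or 1."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))
```

Enrichment can then be expressed as the difference or ratio of the observed and random fractions; which of these underlies the reported 13 to 19% figure is not stated here.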
In conclusion, iGUIDE provides a method for quantifying sites of nuclease cleavage free of confounding mispriming artifacts and allows more accurate assessment of the distribution of dsDNA breaks in cells.
Editing the genes encoding beta2-microglobulin and the T cell receptor alpha constant region with Cas9 in T cells
Cas9 protein was delivered complexed with single-guide RNAs (sgRNAs) against B2M (target sequence: GAGTAGCGCGAGCACAGCTANGG), TRAC (target sequence: TGTGCTAGACATGAGGTCTANGG), VEGFA site 2 (target sequence: GACCCCCTCCACCCCGCCTCNGG), and VEGFA site 3 (target sequence: GGTGAGTGAGTGTGTGCGTGNGG); each target sequence is given as the 20-nt protospacer followed by the NGG PAM. Primary human CD4+ and CD8+ T cells were isolated from healthy volunteer donors following leukapheresis, by negative selection using RosetteSep kits. Primary lymphocytes were stimulated with anti-CD3/CD28 beads for 3 days. Cas9/sgRNA complexes were formed by incubating Cas9 protein with guide RNA at a 1:2.4 molar ratio for 10 min at room temperature. On day 4, Cas9 complexed with sgRNAs targeting B2M was electroporated into the cells. After expansion for an additional 6 days, the T cells were harvested and genomic DNA was isolated.
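The target sequences listed above (20-nt protospacer plus NGG PAM) can be used to scan a DNA sequence for candidate on- and off-target sites. The sketch below is a simple mismatch-tolerant scan of both strands; it ignores DNA/RNA bulges and non-canonical PAMs, and the default of up to three spacer mismatches is an arbitrary illustrative choice. It is not part of the iGUIDE pipeline.

```python
# Sketch: locate matches to the four sgRNA targets (20-nt spacer + NGG PAM)
# on both strands, allowing spacer mismatches. Illustrative only; it ignores
# bulges and non-NGG PAMs and is not the iGUIDE pipeline.
from typing import List, Tuple

GUIDES = {
    "B2M": "GAGTAGCGCGAGCACAGCTANGG",
    "TRAC": "TGTGCTAGACATGAGGTCTANGG",
    "VEGFA site 2": "GACCCCCTCCACCCCGCCTCNGG",
    "VEGFA site 3": "GGTGAGTGAGTGTGTGCGTGNGG",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pam_ok(site: str, pam: str) -> bool:
    """IUPAC 'N' in the PAM matches any base; other letters must match exactly."""
    return len(site) == len(pam) and all(p == "N" or p == s for p, s in zip(pam, site))


def find_sites(seq: str, target: str, max_mismatch: int = 3) -> List[Tuple[str, int, int]]:
    """Return (strand, 0-based + strand start, spacer mismatches) for each hit."""
    seq = seq.upper()
    spacer, pam = target[:20], target[20:]
    n, width = len(seq), len(target)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(n - width + 1):
            window = s[i:i + width]
            if not pam_ok(window[20:], pam):
                continue
            mismatches = sum(a != b for a, b in zip(window[:20], spacer))
            if mismatches <= max_mismatch:
                # Convert a reverse-strand index back to + strand coordinates.
                start = i if strand == "+" else n - i - width
                hits.append((strand, start, mismatches))
    return hits
```

Sequence-based predictions like this are only candidates; assays such as iGUIDE are what establish which sites are actually cleaved in cells.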
Library preparation, DNA sequencing, and analysis
Libraries were prepared as described in the associated iGUIDE protocol. Genomic DNA from each sample was purified and randomly fragmented by ultrasonication. Adaptors were ligated to end-repaired DNA, and target DNA was amplified by nested PCR from the incorporated dsODN to the ligated adaptor sequence. Amplicons were purified and sequenced on an Illumina MiSeq using 300-cycle v2 reagent kits. Oligonucleotides used in this study are listed in Additional file 2: Table S2. Output sequence data were analyzed using the iGUIDE pipeline.
iGUIDE standard operating procedure
An SOP for carrying out iGUIDE is associated with this manuscript (Additional file 5).
We are grateful to members of the Bushman laboratory for help and suggestions.
This work was supported by AI 082020, AI 045008, AI 126620 and AI 117950. We also acknowledge support from the Penn Center for AIDS Research (P30 AI 045008), and the PennCHOP Microbiome Program.
Availability of data and materials
Sequencing data from the iGUIDE and GUIDE-seq protocols have been deposited in the Sequence Read Archive under BioProject accession PRJNA506241. Source code for iGUIDE has been deposited in archived form at https://doi.org/10.5281/zenodo.2538220, and its most up-to-date version is available at https://github.com/cnobles/iGUIDE under the GNU General Public License. Source code used to generate the figures for this manuscript has also been archived at https://doi.org/10.5281/zenodo.2531901.
All authors helped conceive the study. SR, JS, XL, MD, and YZ conducted biochemical analyses. CLN and FDB carried out informatic analysis. All authors read and approved the final manuscript.
Ethics approval and consent to participate
The authors declare that they have no competing interests.
- 13. Veres A, Gosis BS, Ding Q, Collins R, Ragavendran A, Brand H, Erdin S, Cowan CA, Talkowski ME, Musunuru K. Low incidence of off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing. Cell Stem Cell. 2014;15:27–30.
- 19. Nobles CL, Reddy S, Salas-McKee J, Liu X, June CH, Melenhorst JJ, Suhoski M, Zhao Y, Bushman FD. iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity. Seq Read Arch. 2018. https://www.ncbi.nlm.nih.gov/bioproject/506241.
- 20. Nobles CL, Reddy S, Salas-McKee J, Liu X, June CH, Melenhorst JJ, Suhoski M, Zhao Y, Bushman FD. iGUIDE software. Zenodo. 2019. https://doi.org/10.5281/zenodo.2538220.
- 21. Nobles CL, Reddy S, Salas-McKee J, Liu X, June CH, Melenhorst JJ, Suhoski M, Zhao Y, Bushman FD. iGUIDE manuscript source code. Zenodo. 2019. https://doi.org/10.5281/zenodo.2531901.
- 22. Fraietta JA, Nobles CL, Sammons MA, Lundh S, Carty SA, Reich TJ, Cogdill AP, Morrissette JJD, DeNizio JE, Reddy S, et al. Disruption of TET2 promotes the therapeutic efficacy of CD19-targeted T cells. Nature. 2018;558(7709):307–12.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.