High-resolution genome-wide mapping of chromosome-arm-scale truncations induced by CRISPR–Cas9 editing

Lazar, Nathan H.; Celik, Safiye; Chen, Lu; Fay, Marta M.; Irish, Jonathan C.; Jensen, James; Tillinghast, Conor A.; Urbanik, John; Bone, William P.; Gibson, Christopher C.; Haque, Imran S.

doi:10.1038/s41588-024-01758-y

Article
Open access
Published: 29 May 2024

Volume 56, pages 1482–1493, (2024)

Nathan H. Lazar¹,
Safiye Celik¹,
Lu Chen¹,
Marta M. Fay¹,
Jonathan C. Irish¹,
James Jensen¹,
Conor A. Tillinghast¹,
John Urbanik¹,
William P. Bone¹,
Christopher C. Gibson¹ &
Imran S. Haque ORCID: orcid.org/0000-0001-7782-2852¹

Abstract

Clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated protein 9 (Cas9) is a powerful tool for introducing targeted mutations in DNA, but recent studies have shown that it can have unintended effects such as structural changes. However, these studies have not yet looked genome wide or across data types. Here we performed a phenotypic CRISPR–Cas9 scan targeting 17,065 genes in primary human cells, revealing a ‘proximity bias’ in which CRISPR knockouts show unexpected similarities to unrelated genes on the same chromosome arm. This bias was found to be consistent across cell types, laboratories, Cas9 delivery methods and assay modalities, and the data suggest that it is caused by telomeric truncations of chromosome arms, with cell cycle and apoptotic pathways playing a mediating role. Additionally, a simple correction is demonstrated to mitigate this pervasive bias while preserving biological relationships. This previously uncharacterized effect has implications for functional genomic studies using CRISPR–Cas9, with applications in discovery biology, drug-target identification, cell therapies and genetic therapeutics.

Main

Clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated protein 9 (Cas9)-based methods are powerful genome editing tools with applications in in vitro discovery biology, ex vivo editing for cell therapies and in vivo editing for genetic therapeutics¹. Cas9 is programmable, scalable and relatively specific compared with earlier technologies such as zinc finger nucleases, TALENs, and small interfering RNAs^2,3. However, CRISPR–Cas9-based editing is known to have off-target activity and undesired on-target changes, such as kilobase-scale deletions^4,5,6, chromosome truncation^7,8,9,10 and complex rearrangements^5,6. Profiling these effects systematically is crucial for discovery and therapeutic development but is costly and labor intensive with existing molecular or sequencing-based methods.

Pooled CRISPR–Cas9 knockout screens have been widely used to identify essential genes in tumor-derived cell lines^11,12,13. These studies have reported associations between copy number variants (CNVs) and chromosomal instability at the target site after CRISPR–Cas9 editing and introduced methods to help correct for those effects^{14,15,16,17,18}. However, they have not explored the effects of CRISPR–Cas9-induced chromosomal changes that are unrelated to CNVs nor have they explored these effects in primary cell types or with endpoints other than essentiality.

Cellular morphological profiling, or ‘phenomics’, is an emerging technology for high-dimensional phenotyping that offers a powerful alternative to transcriptomic or proteomic assays¹⁹. Measuring cellular morphology, a holistic functional endpoint of the cellular state, generates high-dimensional single-cell data at a lower cost than molecular methods such as single-cell RNA sequencing (scRNA-seq).

We applied phenomics to systematically profile CRISPR–Cas9 knockouts in primary human cells, targeting over 17,000 genes with more than 100,000 CRISPR guides. A proprietary deep-learning model encoded six-stain fluorescent images of plated cells¹⁹, producing a single ‘gene vector’ representing the phenotype of each perturbed gene. Cosine similarity between gene vectors acted as a pairwise measure of phenotypic similarity between knockouts, recapitulating known and novel biological relationships, including protein complexes and annotated pathways, and can be extended to assess similarity among a broad range of cellular perturbations, including genetic, large-molecule and small-molecule treatments²⁰. In this Article, we report the observation of ‘proximity bias’, where CRISPR knockout phenotypes are systematically more similar to unrelated genomically proximal genes located on the same chromosome arm. This effect is found to be general across laboratories, cell types and Cas9 delivery mechanisms and is dependent on nuclease activity. Also, patterns of proximity bias reflect differences between reference genomes and true chromosomal structure, including large-scale structural variants. Molecular investigation with bulk and single-cell transcriptomic analysis supports large-scale chromosomal truncation as the driving mechanism. Additionally, we reanalyzed the Cancer Dependency Map (DepMap) genome-wide CRISPR–Cas9 screens²¹ in cancer cell lines to confirm the impact of proximity bias on target discovery, propose potential mediators and show that this effect persists even when controlling for cell-line specific CNVs. Finally, we show that an arm-based normalization of gene-level features largely corrects for this bias without affecting the recovery of known biological relationships.

Results

Genome-wide profiling recapitulates known biology

To produce a genome-wide ‘map’ of pairwise phenotypic similarity between gene knockouts, we performed a phenomics screen using CRISPR–Cas9 to knock out 17,065 genes in primary human umbilical vein endothelial cells (HUVEC) with 101,029 guides (typically six guides per gene and 24 replicates per guide) leveraging a highly automated robotic workflow (rxrx3 dataset)²².

To validate phenomics, we computed a complementary similarity map using the cpg0016 dataset from the Joint Undertaking in Morphological Profiling–Cell Painting consortium²³. The key differences between rxrx3 and cpg0016 include gene sets, cell types and Cas9/guide delivery protocols. Cpg0016 profiles fewer genes (n = 7,975) with fewer samples (n = 4 guides and five replicates per guide), screens the U2OS osteosarcoma cell line and uses lentiviral Cas9 delivery with lipofection of guide RNA pools. For rxrx3, we applied a proprietary deep-learning model to extract features; for cpg0016, we used the CellProfiler-derived features²⁴ provided in the Joint Undertaking in Morphological Profiling–Cell Painting source data. For both datasets, the gene vectors were aggregated to build a genome-wide ‘map’ to compare phenotypic similarities using cosine similarity (Fig. 1a).
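
As a rough illustration of how such a map can be assembled, the sketch below computes cosine similarity for every unordered pair of gene vectors. It is not the pipeline used for rxrx3 or cpg0016: the function names, the dictionary layout and the three-feature toy vectors are all assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_map(gene_vectors):
    """Return {(gene_a, gene_b): similarity} for every unordered gene pair."""
    genes = sorted(gene_vectors)
    return {
        (a, b): cosine_similarity(gene_vectors[a], gene_vectors[b])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


# Made-up three-feature gene vectors (hypothetical, not real data).
toy_vectors = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [2.0, 0.0, 0.0],
    "GENE_C": [0.0, 3.0, 0.0],
}
toy_map = similarity_map(toy_vectors)
```

Because cosine similarity ignores vector length, GENE_A and GENE_B in the toy data are treated as identical phenotypes even though their magnitudes differ.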

**Fig. 1: Heat maps of phenotypic similarity between gene knockouts recapitulate known biology as well as genomic proximity effects.**

We evaluated the ability of rxrx3 and cpg0016-based maps to recapitulate known biology in both a targeted and a broad sense. Targeted examination of genes in well-studied pathways showed that gene–gene similarities recapitulate both biology that is highly conserved across cell types (for example, microtubule, proteasome and autophagy genes), as well as therapeutically relevant pathways including Janus kinase (JAK)/signal transducer and activator of transcription (STAT), transforming growth factor (TGF)-beta and insulin receptor (Fig. 1b and Extended Data Fig. 1). Despite the methodological differences between datasets, large-scale benchmarking²⁰ of both datasets shows substantial recall of known annotations drawn from public datasets, including Reactome²⁵, HuMap²⁶ and CORUM²⁷ (Fig. 1c).

Knockouts show increased similarity within chromosome arms

Upon the generation of the rxrx3 full-genome knockout data, we noticed a curious bias: the distribution of cosine similarities for gene pairs on the same chromosome was shifted relative to gene pairs on different chromosomes (Extended Data Fig. 2a). Visualizing the full genome-wide dataset ordered by genomic coordinate showed a striking structure in which knockouts of genomically proximal genes on the same chromosome arm were systematically more phenotypically similar to one another than distal pairs (Fig. 1d,e). To test whether proximity bias was an artifact of our experimental setup (laboratory protocol, Cas9 and guide delivery system, HUVEC cell type, computational image analysis and featurization scheme and so on), we performed the same visualization in cpg0016 and found a very similar effect both visually in the genome-wide heat map (Fig. 1d,e) and in intra- versus interchromosomal similarity distributions (Extended Data Fig. 2a).

As proximity blocks appeared to correlate with chromosomal structure, we wondered whether proximity blocks would also reflect nonreference structure in genomically abnormal cells. The U2OS line used in cpg0016 is known to be karyotypically abnormal and heterogeneous, with different clones exhibiting distinct genotypes²⁸. DepMap¹¹ cataloged a fusion between RAD50 on chromosome 5q and ZNF536 on chromosome 19q in this cell line, and examination of the cpg0016 map shows a clear block of interchromosomal proximity bias between 5q and 19q that closely recapitulates the boundaries of the annotated fusion (Fig. 1f).

Finally, we quantified the proximity bias effect by estimating, for each chromosome arm, the probability of a within-arm relationship displaying a higher cosine similarity than a between-arm relationship using a nonparametric Brunner–Munzel test²⁹. This metric is both comparable across maps that may have different numbers of tested genes and flexible enough to be used to quantify the bias encoded in an entire map, within each chromosome arm or at the gene level (by restricting to relationships involving that arm or gene). At the full-map level, the probability of a within-arm relationship being ranked above a between-arm relationship was 0.71 for the rxrx3 dataset and 0.72 for the cpg0016 data (P < 1 × 10⁻¹⁰). At the chromosome-arm level, both datasets show a significant effect for all chromosome arms (Fig. 1g,h).
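
The quantity reported here can be illustrated with a minimal sketch that estimates the probability that a within-arm similarity exceeds a between-arm similarity, counting ties as one half. This is only the effect-size estimate associated with the Brunner–Munzel test, not the test itself. The function name and toy values are assumptions. For the accompanying statistic and P value, SciPy provides `scipy.stats.brunnermunzel`.

```python
from bisect import bisect_left, bisect_right


def prob_within_exceeds_between(within, between):
    """Estimate P(W > B) + 0.5 * P(W = B) for similarity samples W and B."""
    if not within or not between:
        raise ValueError("both samples must be non-empty")
    ordered = sorted(between)
    total = 0.0
    for w in within:
        below = bisect_left(ordered, w)           # between-arm values strictly below w
        ties = bisect_right(ordered, w) - below   # between-arm values equal to w
        total += below + 0.5 * ties
    return total / (len(within) * len(between))


# Hypothetical cosine similarities, for illustration only.
toy_within = [0.30, 0.25, 0.10]
toy_between = [0.05, 0.10, 0.00, -0.02]
toy_prob = prob_within_exceeds_between(toy_within, toy_between)
```

A value of 0.5 corresponds to no proximity bias. Restricting `within` and `between` to relationships that involve one arm or one gene gives the arm-level and gene-level versions described above.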

Proximity bias arises from chromosome-arm truncation

Several unintended consequences of Cas9 editing have been discussed in the literature^{4,5,6,7,8,9,10,30,31,32,33,34}, and while the rxrx3 and cpg0016 genome-wide maps show widespread proximity bias, they do not directly nominate a mechanism. However, observing the maps in Fig. 1, we found knockouts of genes closer to a centromere often display a stronger proximity bias signal. To quantify this, we plotted the gene-level Brunner–Munzel probability versus relative chromosome-arm position and found negative correlations that were significant for most chromosome arms (Fig. 2a,b and Extended Data Fig. 2b,c). This suggests a model in which Cas9 editing can cause chromosomal truncations resulting in mixed phenotypes due to multiple gene deletions (Fig. 2c).

**Fig. 2: Genome-wide phenomic measurements and transcriptomic data support a model of chromosomal truncation underlying proximity bias.**

We sought to test this hypothesis by searching for and quantifying chromosome-arm truncations in sequencing data, paralleling similar searches for truncations and deletions in literature^9,10. We reanalyzed two scRNA-seq datasets uniformly reprocessed as part of the scPerturb resource^34,35, which profiled the effects of CRISPR–Cas9 gene knockout in the THP-1 leukemia line and in melanoma-derived melanocytes, respectively^36,37. Following previous work^9,10, we assessed deletions in these data by identifying the genes that, when targeted by CRISPR–Cas9, result in substantial, significant copy number loss in more than 70% of the 150 genes near the cut site in the 3′ or 5′ direction (Methods). Across these datasets, editing at 4–25% of targeted genes resulted in copy number loss. Moreover, those losses were significantly more likely to occur in the telomeric direction (Fisher’s exact test P = 1.4 × 10⁻⁵), further supporting the model of chromosome-arm truncations and supporting the cell-type independence of proximity bias (Table 1). As reported in previous work⁹, for targets with a called loss, only a fraction of cells exhibited a deletion (mean 4.3% and maximum 15.1%) (Supplementary Table 2). Figure 2d,e highlights the enrichment for loss in the telomeric direction for each of the two datasets examined, by showing whole-genome copy number calls for the cells that exhibited a deletion.
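
The directional rule described above (loss in more than 70% of the 150 genes flanking the cut site) can be sketched as follows. This is a simplified illustration, not the authors' procedure. It assumes that an upstream copy number caller has already produced one loss flag per gene, that genes are listed in reference coordinate order, and that windows shorter than 150 genes at chromosome ends are evaluated on the genes available. It also ignores windows that cross the centromere. All names are hypothetical.

```python
WINDOW = 150         # genes inspected on each side of the cut site (from the text)
MIN_FRACTION = 0.7   # fraction of window genes that must show loss (from the text)


def loss_directions(loss_flags, target_index, arm):
    """Return the set of directions ('telomeric', 'centromeric') with a called loss.

    loss_flags: booleans in chromosome coordinate order, True where an upstream
                caller (assumed, not shown) reported copy number loss for a gene.
    arm: 'p' or 'q'; on the p arm, lower coordinates point toward the telomere.
    """
    if arm not in ("p", "q"):
        raise ValueError("arm must be 'p' or 'q'")
    lower = loss_flags[max(0, target_index - WINDOW):target_index]
    higher = loss_flags[target_index + 1:target_index + 1 + WINDOW]
    called = set()
    for window, side in ((lower, "lower"), (higher, "higher")):
        if window and sum(window) / len(window) > MIN_FRACTION:
            toward_telomere = (side == "lower") == (arm == "p")
            called.add("telomeric" if toward_telomere else "centromeric")
    return called


# Toy flags: losses only at coordinates above the targeted gene (index 200).
toy_flags = [False] * 201 + [True] * 200
toy_target = 200
```

Counting how many targets yield telomeric versus centromeric calls would then give the kind of contingency table to which a Fisher's exact test can be applied.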

Table 1 Single-cell sequencing reveals widespread on-target proximal deletions from CRISPR–Cas9

To further validate the finding of chromosomal arm truncations, we searched an internal Recursion database of HUVEC bulk RNA sequencing data and focused on a high-replicate set of 45 targeted genes, each treated with a single intron-targeting guide in 63 replicate samples compared with a no-guide reference pool of 3,320 samples (average 1.3 million unique reads per well). Comparing copy number calls from Cas9-edited wells to Cas9-free controls, we observed multiple loci enriched for deletions between the target cut site and the telomere, including the genes ZNF394 (located on chromosome 7q) and RCAN3 (on chromosome 1p) (Fig. 2f, Extended Data Fig. 2d and Supplementary Table 1). Although the number of loci found in this search was limited, these events are expected to be rare and, therefore, difficult to detect in bulk assays, particularly by RNA rather than DNA sequencing.

Proximity bias confounds therapeutic target identification

A key application of genome-wide knockout screening is in mapping of biological pathways, particularly for therapeutic target discovery. Consequently, we investigated the potential impact of proximity bias on target discovery in a widely used, publicly available resource.

Project Achilles has performed genome-wide CRISPR screens of cell survival in hundreds of cancer cell lines in an effort to identify potentially druggable essential genes for a range of tumor types, contributing to the DepMap¹¹. We surmised that if DepMap CRISPR screens were also affected by proximity bias, then it would manifest as patterns of essentiality across cell types that cluster unexpectedly by genomic proximity. To that end, we built genome-wide maps from DepMap CRISPR 19Q3 data; in these maps, each gene was characterized by a vector representing its essentiality in each of the 625 tested cell lines rather than as a vector of morphological features (Fig. 3a). Visual examination and quantification of the DepMap CRISPR map confirm the presence of arm-scale proximity bias (Fig. 3b,c), and the proximity bias effect is maintained in a newer version of these data (22Q4), which controls for CNVs¹⁷.

**Fig. 3: CRISPR–Cas9 screens of cancer gene essentiality are significantly confounded by proximity bias.**

While correlations between gene dependency and genomic location have been reported before and several correction methods have been implemented^{14,15,16,17,38}, the effect was thought to be primarily driven by copy number variation in these cancer cell lines. Since we observe similar effects in copy number-normal cell lines, we sought to disentangle CNV-based effects from the proximity bias effect using the following procedure (Methods). Beginning with the full set of 1,078 cell lines in the CRISPR–Cas9 DepMap 22Q4 data, we first looked for subsets of cell lines that were free from CNVs on each autosomal chromosome arm. Then, for each pair of those arms, we intersected the cell line sets and assessed proximity bias by computing the Brunner–Munzel probability of within-arm cosine similarities exceeding between-arm cosine similarities (741 arm pairs; intersection cell line counts minimum, maximum and mean of 73, 314 and 174, respectively) (Supplementary Tables 5 and 6). These values were compared with probabilities from the same process but using all cell lines in both the CRISPR–Cas9 and short hairpin RNA (shRNA) data (190 cell lines) (Extended Data Fig. 3b). Controlling for copy number in this way significantly reduces proximity bias (Mann–Whitney U, P value <1 × 10⁻¹⁰) but fails to eliminate it, with all arm pairs showing arm-level Brunner–Munzel probabilities above 0.5. Since restricting to fewer cell lines reduces the power to detect gene–gene interactions in general, we also see a reduction in Brunner–Munzel probabilities toward 0.5 when subsampling cell lines randomly. While a further reduction is observed in the cell lines with very few CNVs, there is still significantly more proximity bias in CNV-controlled CRISPR–Cas9 arm pairs than in pairs formed from shRNA data in which CNVs are still present, but there is no DNA cutting, suggesting that CNVs alone cannot explain the effects observed (Mann–Whitney U, P value <1 × 10⁻¹⁰) (Extended Data Fig. 3b). 
Finally, we note that the DepMap Chronos pipeline for CRISPR data excludes guides which map to multiple regions, so these results cannot be explained by multitargeting guides¹⁷.
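
The arm-pair step of this procedure can be sketched as below: for every unordered pair of arms, keep only the cell lines that are CNV-free on both. The input layout, function name and toy cell line labels are assumptions for illustration. The count of 741 arm pairs equals the number of unordered pairs among 39 arms, which would be consistent with 22 autosomes minus the five acrocentric short arms, although the arm count is not stated explicitly in the text.

```python
import math
from itertools import combinations


def cnv_free_intersections(cnv_free_lines_by_arm):
    """Map each unordered arm pair to the cell lines that are CNV-free on both arms."""
    return {
        (arm_a, arm_b): cnv_free_lines_by_arm[arm_a] & cnv_free_lines_by_arm[arm_b]
        for arm_a, arm_b in combinations(sorted(cnv_free_lines_by_arm), 2)
    }


# Hypothetical CNV-free cell line sets for three arms.
toy_sets = {
    "1p": {"LINE_1", "LINE_2", "LINE_3"},
    "1q": {"LINE_2", "LINE_3"},
    "2p": {"LINE_3", "LINE_4"},
}
toy_pairs = cnv_free_intersections(toy_sets)

# Number of unordered pairs among 39 arms.
n_arm_pairs = math.comb(39, 2)
```

Each intersection would then be used to rebuild gene vectors from only those cell lines before the within-arm versus between-arm comparison is repeated.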

In addition, we examined the results from shinyDepMap³⁹, which sought to cluster genes with similar dependencies to identify druggable targets and pathways using the 19Q3 data. Examination of 16,941 inferred gene–gene relationships from shinyDepMap CRISPR data revealed that a large number of putative relationships inferred are within chromosomal arms. Comparing the odds of identifying intra- versus inter-arm connectivity in the shinyDepMap clusters with other databases of known biological relationships derived from pathway or protein complex data revealed that shinyDepMap contains far more intra-arm annotated relationships than other sources (Fisher exact test odds ratio of 0.068, P < 0.0001). This suggests that results of downstream DepMap CRISPR analyses may be significantly confounded by proximity bias (Extended Data Fig. 3c,d).

Finally, we sought to identify cancer-specific false-positive dependency calls due to proximity bias and not CNVs. If the hypothesis that proximity bias is caused by telomeric truncations were correct, some unexpressed genes centromeric of driver genes would spuriously appear as essential, since occasional truncations telomeric of the targeted unexpressed gene would also delete the true driver. To explore this, we used the DepMap 22Q4 dependency data and first stratified cell lines by their annotated cancer subtype; then, for each gene, we restricted the cell lines to those without CNVs (copy number within (1.75, 2.25)) and tested for differences in dependency for that subtype versus all other cell lines. Examining three cancer subtypes, we found a number of genes centromeric of known subtype-specific driver genes that have low expression (transcripts per million reads <0.3) but nevertheless exhibit significantly higher dependency (that is, appear more essential) than in other subtypes (Benjamini–Hochberg adjusted t-test, P < 0.01). For example, on chr2p for renal cell carcinoma, four genes centromeric of the driver EPAS1 (ref. ⁴⁰) (C2orf73, ARHGAP25, VAX2 and LRRTM4) satisfy these criteria, as do two genes on chr18q for B-lymphoblastic leukemia and lymphoma centromeric to the driver BCL2 (ref. ⁴¹) (ELOA2 and GRP) and three genes on chr2p for neuroblastoma centromeric to the driver SOX11 (ref. ⁴²) (KCNF1, NTSR2 and FAM166C) (Supplementary Table 7). This suggests that these essentiality annotations may be spurious and may instead be caused by proximity bias-related truncations of nearby true driver genes.

Proximity bias is dependent on Cas9 nuclease activity

Given the proposed model of large truncations, we sought to confirm whether proximity bias is dependent on nuclease activity of Cas9 by analyzing CRISPR interference (CRISPRi)⁴³ and shRNA screens⁴⁴. We extended our analysis of scRNA-seq CRISPR–Cas9 datasets from scPerturb³⁵ to three CRISPRi datasets^45,46,47 and found that in contrast to CRISPR–Cas9-perturbed cells, in which 4.2–25% of genes resulted in large chromosomal losses when targeted (Table 1), only up to 2.6% of target genes were observed to have such losses across three CRISPRi datasets (Table 1 and Supplementary Table 2). We also examined whether there is evidence of telomeric loss enrichment in the significantly smaller proportion of genes showing loss in the CRISPRi datasets, and our findings were negative (Fisher’s exact test, P = 0.68). Additionally, a map built from DepMap shRNA screening data did not show substantial proximity bias, suggesting that the effect arises as a specific consequence of CRISPR–Cas9 editing (Extended Data Fig. 4a,b).

DepMap data connects proximity bias and cell cycle

As the DepMap data profiled a wide range of cell lines with diverse genetic backgrounds, we hypothesized that by stratifying cell lines according to genetic features and constructing maps out of slices of these data, we may be able to elucidate the potential biological mechanisms behind, and the mediators of, proximity bias.

TP53 expression has been suggested as a marker for reduced aneuploidy in CRISPR–Cas9 editing^9,31,48, and previous work has established that p53 activity reduces CRISPR–Cas9 editing efficiency through activation of DNA repair and apoptosis^{49,50,51,52,53}. Thus, loss of TP53 would be expected to increase proximity bias by increasing the rate of chromosomal arm truncations. We stratified DepMap cell lines by TP53 loss-of-function (LOF) status and found significantly increased proximity bias (t-test, P <1 × 10⁻¹⁰) in a CRISPR map built from putatively TP53-null cell lines (LOF) compared with one built using only cell lines with putatively functional wild-type (WT) TP53 (Fig. 4a).

**Fig. 4: Role of *TP53*, cell-cycle and replication-associated genes in proximity bias.**

Next, we searched for additional gene mediators while controlling for TP53 status across eight splits: for genes in which putative LOF or amplification (AMP) either increased or decreased proximity bias, in either a TP53 null or functional background (Supplementary Table 3). Several genes showed interesting behaviors that support their known functions in cell cycle and TP53 regulation (Fig. 4b,c and Supplementary Table 3). In both the TP53 WT and TP53 LOF settings, we found that loss of CDKN2A or CDKN2B significantly increases proximity bias while CDKN2C AMP decreases proximity bias in a TP53 LOF background (cell line bootstrap t-test Bonferroni-corrected P < 0.05; DepMap did not have sufficient cell lines for us to test CDKN2C in the TP53 WT setting). This suggests that these cell cycle regulators^54,55,56 act independently of p53. Conversely, AMPs of the TP53 regulators MDM2 and MDM4 (ref. ⁵⁷) show differential effects on proximity bias depending on the TP53 background. Both AMPs increase proximity bias when a functional TP53 is present, but MDM2 AMP has no effect in the TP53 LOF setting, while MDM4 AMP decreases proximity bias in that environment. This suggests that the effect of MDM2 on proximity bias is mediated through TP53.

Additionally, we found that because of large-scale CNVs, identifying drivers of proximity bias is itself affected by chromosome-position effects, making it difficult to confidently fine-map individual driver genes within a genomic region. For example, BTG2 surfaced as a potential driver but is located only 1.2 Mb in the centromeric direction from MDM4 on chromosome 1 and appears to mimic its impact on proximity bias in both the TP53 WT and LOF conditions. Upon closer inspection, we find that all 15 of the genes between these two, with sufficient data to assess, show the same pattern despite no known cancer or TP53 associations (Extended Data Fig. 5a,b).

We also looked for enriched biological processes among the genes with largest impacts on proximity bias in each of the above contexts using ShinyGO (v0.77)⁵⁸. Selecting genes with mean differences in Brunner–Munzel probabilities between WT and LOF and AMP conditions of less than −0.1 or greater than 0.2, we found the strongest associations in the TP53 WT setting were with ‘regulation of cell population proliferation’ and ‘positive regulation of cell population proliferation’, where AMP of 34 and 25 genes, respectively, show increased proximity bias (P <1 × 10⁻¹⁰). In the TP53 LOF setting, the strongest associations were with ‘regulation of apoptotic processes’ and, again, ‘regulation of cell population proliferation’ (P < 1 × 10⁻¹⁰) where, in both cases, AMP for 28 genes shows increased proximity bias (Supplementary Table 4). This supports the hypothesis that proximity bias is driven by chromosome-arm truncations and suggests a mechanism in which inhibition of apoptosis may lead to unrepaired double-strand breaks and loss of acentric chromosome-arm fragments during mitosis^9,10.

Geometric correction reduces proximity bias

Given that the proximity bias effect appears to be largely localized within chromosome arms, we hypothesized that applying a chromosome-arm correction to rxrx3 and cpg0016 maps might mitigate the unwanted signal. To that end, we adjusted the vector representation for each gene by subtracting an estimated representation of the chromosome arm in which the gene is located built using unexpressed genes (Methods). This significantly reduces the proximity bias effect, both globally and per chromosome arm (Fig. 5a–d) while maintaining or improving genome-wide benchmarking metrics in both datasets (Fig. 5e). Interestingly, the recall of annotated within-arm relationships decreases with the chromosome-arm correction (Fig. 5e), but this is outweighed by improved recall on the larger number of between-arm annotated relationships, suggesting that the proximity bias effect can confound such benchmarking efforts if it is not taken into account.
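
A minimal sketch of this kind of arm-based correction is shown below. It assumes that the arm representation is the mean vector of the unexpressed genes on that arm and that arms without any unexpressed gene are left unchanged. Both choices are assumptions of the example, since the exact estimator is given in the Methods rather than here. All names and toy values are hypothetical.

```python
def arm_correct(gene_vectors, gene_arm, unexpressed_genes):
    """Subtract a per-arm mean vector, estimated from unexpressed genes, from each gene."""
    sums, counts = {}, {}
    for gene in unexpressed_genes:
        arm, vec = gene_arm[gene], gene_vectors[gene]
        if arm not in sums:
            sums[arm], counts[arm] = [0.0] * len(vec), 0
        sums[arm] = [s + x for s, x in zip(sums[arm], vec)]
        counts[arm] += 1
    arm_means = {arm: [s / counts[arm] for s in sums[arm]] for arm in sums}

    corrected = {}
    for gene, vec in gene_vectors.items():
        mean = arm_means.get(gene_arm[gene])
        # Arms with no unexpressed gene are left unchanged (assumption of this sketch).
        corrected[gene] = list(vec) if mean is None else [x - m for x, m in zip(vec, mean)]
    return corrected


# Made-up two-feature vectors; GENE_A and GENE_B play the role of unexpressed genes.
toy_vectors = {
    "GENE_A": [1.0, 2.0],
    "GENE_B": [3.0, 4.0],
    "GENE_C": [5.0, 5.0],
    "GENE_D": [1.0, 1.0],
}
toy_arms = {"GENE_A": "1p", "GENE_B": "1p", "GENE_C": "1p", "GENE_D": "2q"}
toy_corrected = arm_correct(toy_vectors, toy_arms, ["GENE_A", "GENE_B"])
```

The rationale is that knocking out an unexpressed gene should have little phenotype of its own, so whatever signal these knockouts share on an arm is a plausible estimate of the confounding truncation signal.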

Following the preprint publication of this work, DepMap released the 23Q2 revision of its Project Achilles CRISPR screens, incorporating a correction similar to that suggested in this section (https://forum.depmap.org/t/announcing-the-23q2-release/2518). A genome-wide map built from these data shows similarly reduced proximity bias, demonstrating the generality of this geometric correction across modalities (Extended Data Fig. 3a). Additionally, the potential false-positive driver genes for specific cancer subtypes discussed in the previous section are reduced, with six of the nine highlighted genes no longer showing a subtype-specific dependence (Supplementary Tables 7 and 8).

Discussion

Since its discovery, the CRISPR–Cas9 editing system has become a valuable research tool; however, deep characterization has revealed potential issues arising from undesired on-target effects. In this work, we use cellular phenomics to systematically profile CRISPR-induced gene knockouts for virtually all human protein-coding genes in a primary human cell type and have replicated our findings across cell types, assay contexts and molecular follow-up. We discover an undesired on-target effect driven by a small fraction of cells that results in knockout phenotypes displaying a ‘proximity bias’ that probably arises from chromosomal truncations and is ubiquitous across cell types, genetic loci and measurement modalities but can be computationally corrected given proper controls.

This refines prior work that asserted aneuploidy and chromosome truncation as a potential consequence of CRISPR–Cas9 editing in T cells^9,10, primarily by focusing on the TRAC locus (14q11.2, near the centromere of the acrocentric chromosome 14). Reanalyzing Perturb Cellular Indexing of Transcriptomes and Epitopes (Perturb-CITE) sequencing data from melanoma cells³⁷, we found evidence for similar occasional loss of the entire chromosome 21 arising from editing of SLC19A1, located on the q arm (21q22.3) (Fig. 2e and Supplementary Table 2). In our transcriptomic analysis, chromosomal truncations were primarily seen to proceed in the direction away from the centromere, but since it is well established in medical genetics that the short arms of acrocentric chromosomes 13, 14, 15, 21 and 22 are nonessential⁵⁹, this suggests that the observation of whole-chromosome loss in T cells is probably a specific artifact of editing pericentromeric loci on acrocentric chromosomes.

The apparent generality of undesired on-target effects from CRISPR–Cas9 editing raises potential concerns for both functional genetic screening and therapeutic gene editing. Prior literature has primarily examined cell lines^31,32,33 or zygotes, embryos or embryonic stem cells^4,6,7,8,30, which have varying DNA damage responses, so continuing to establish the importance of these effects in somatic primary cells will be important^49,50,52. More recent work^9,10 has shown recurrent aneuploidy in ex vivo edited human T cells and suggested that protocols inducing TP53 expression before editing may be protective against chromosome truncation. However, TP53 induction may not be feasible in many settings. In particular, somatic loss of TP53 has been observed to increase in frequency with age in a variety of nonmalignant tissues, including colonic epithelium and blood^60,61,62, suggesting that potential risks related to in vivo CRISPR–Cas9 editing may be age dependent. Although no negative consequences due to unintended effects of CRISPR cutting have yet been documented in patients, further research is necessary to detect and quantify the presence of chromosomal losses in in vivo editing to maximize patient safety.

Our chromosome-arm truncation hypothesis is consistent with recent findings from other groups^7,32, based on transcriptomic evidence and DepMap¹¹ analysis, and suggests a mechanism involving CRISPR–Cas9-induced losses in a subpopulation of cells, with increased mitosis potentially amplifying the effect. We find higher rates of deletions in CRISPR–Cas9 RNA sequencing data relative to CRISPRi on both sides of the cut, but these deletions are more common in the telomeric direction. While previous work analyzing dependency studies in cancer cell lines found similar effects due to copy number variations^14,15,16, we demonstrate that this effect is largely independent of copy number by quantifying its presence in primary cell types and in regions of cancer cell lines that lack CNVs. The mechanism proposed here generates testable hypotheses for future research, exploring the impact of mitogens, cell cycle inhibitors or repeated passaging on deletion rates and suggests that highly mitotic cell types may experience more proximity bias in CRISPR–Cas9 functional genomics screens than slowly or nondividing cells.

Additionally, inspection of whole-genome similarity maps similar to Fig. 1d–f suggests that proximity bias patterns are more complex than just increased similarity within chromosome arms and that these patterns probably differ between cell types. This may be due to a wide variety of factors including differences in susceptibility to truncation, epigenetic state influencing Cas9 efficiency, gene haploinsufficiency, gene essentiality and the strength of phenotypic effects caused by genes telomeric from the target loci. To deconvolve these effects, a further investigation across many cell types with consistent data collection and processing is needed.

Finally, we suggest a correction strategy that estimates and removes the confounding signal on each chromosome arm using unexpressed genes. This highlights the advantages of taking a genome-wide view and suggests control strategies both for large gene surveys and for more targeted screening. Beyond this geometric correction, a wide range of other mitigation strategies may be developed to combat proximity bias. From a biological or biochemical perspective, it is probable that the use of noncutting perturbations—for example, CRISPRi⁴³, CRISPRoff⁶³, base editors or RNA-targeting perturbations such as Cas13d⁶⁴—would circumvent proximity bias; however, some recent studies suggest that base and prime editors can induce double-strand breaks and associated deletions or translocations⁶⁵. With cutting-based CRISPR assays, activation of p53 or DNA repair pathways (for example, through nutlin pretreatment)⁶⁶ may mitigate this effect, as may the addition of free nucleotides or optimizing the timing of experimental steps⁶⁷, modifying Cas9 constructs⁶⁸ or extending the 5′ end of single guide RNAs with cytosine bases⁶⁹. Additionally, given that these effects are probably driven by a relatively small subpopulation of cells, improved data cleaning strategies may also prove fruitful. Ideas here include the filtering of subsets of cells in transcriptomics or patches of images in phenomics or utilizing loss functions during neural network training to ignore populations of affected cells. While each method has particular limitations (for example, durability, specificity and computational intensity), the quantification methods presented in this study can be used to judge effectiveness and to drive innovation in this area.

Methods

This research complies with all relevant ethical regulations as approved by Recursion Pharmaceuticals.

Cell culture

Human umbilical vein endothelial cells (HUVEC; Lonza, C2519A) at early passage were expanded within an acceptable window of in vitro culture in single-use bioreactor systems that provide 250,000 cm² of growth surface. This results in a yield of 10 × 10⁹ cells, enough to screen up to 4,000 1,536-well plates. HUVEC were produced and banked in vapor-phase liquid nitrogen and seeded into high-throughput screens directly from stasis after editing. HUVEC were seeded into 1,536-well microplates (Greiner, 789866) via Multidrop (Thermo Fisher) and incubated at 37 °C in 5% CO₂ for the duration of the experiment.

CRISPR–Cas9 editing

Custom-designed Alt-R CRISPR–Cas9 reagents were purchased from Integrated DNA Technologies and prepared following the manufacturer’s guidelines and protocols (Alt-R CRISPR–Cas9 crRNA, Alt-R CRISPR–Cas9 trans-activating RNA (tracrRNA) cat. no. 1072534, Alt-R S.p. Cas9 Nuclease V3, cat. no. 1081059). Alt-R CRISPR–Cas9 crRNA was duplexed to Alt-R CRISPR–Cas9 tracrRNA and then combined with Alt-R S.p. Cas9 Nuclease V3, following Integrated DNA Technologies guidelines, to form a functional CRISPR–RNP (ribonucleoprotein) complex. This CRISPR–RNP complex was transfected into cells with a proprietary lipofection-based process for high-throughput application.

To control for and filter nonproximal off-target effects of individual guides, each gene was targeted with 4–12 nonoverlapping guides (89% of genes targeted by six guides), for a total of 101,029 guides. Each guide was assessed independently in an arrayed format, typically with 24 total replicate wells per guide across two executional batches.

Phenomic imaging

The plates were stained using a modified cell painting protocol¹⁹. The cells were treated with MitoTracker deep red (Thermo, M22426) for 35 min; fixed in 3–5% paraformaldehyde; permeabilized with 0.25% Triton X-100; stained with Hoechst 33342 (Thermo), Alexa Fluor 568 Phalloidin (Thermo), Alexa Fluor 555 wheat germ agglutinin (Thermo), Alexa Fluor 488 concanavalin A (Thermo) and SYTO 14 (Thermo) for 35 min at room temperature; then washed and stored in Hanks’ balanced salt solution + 0.02% sodium azide. The images were acquired with ImageXpress micro confocal microscopes (Molecular Devices) in wide field mode using a PlanApo 10× 0.45 numerical aperture objective and Spectra-3 light-emitting diode light engine (Lumencor). For the sake of acquisition speed, six-channel imaging was accomplished using three combinations of two dichroic mirrors and three emission filters.

Phenomic analysis

All images were uploaded to cloud storage and featurized by embedding them with a proprietary convolutional neural network trained on the public RxRx1 dataset using Google Cloud Platform as described in a previous work⁷⁰. The images are captured at 2,048 × 2,048 pixels and divided into 16 tiled patches, which are each embedded separately. Those embeddings are averaged to create a single representation for each imaged well.

Generation of gene-level representations for rxrx3 HUVEC data

For this screen, Recursion ran 176 12-plate experiments in 1,536-well plates, generating 24 images per guide for a total of 101,029 guides and 17,065 genes. The embedding vectors for each image were centered on a set of perturbation controls, aligned using typical variance normalization and aggregated to the gene level as described in Celik et al.²⁰. The externally released version of this data described in Fay et al.²² contains all the same gene guides but was processed with an older pipeline and contains fewer replicates per guide (18), so there may be small discrepancies with the data shown here.

Generation of gene-level representations for cpg0016 U2OS data

The well-level aggregated CellProfiler profiles were downloaded from the Cell Painting Gallery²³. The CellProfiler ‘Image’ and ‘ObjectNumber’ features were discarded, and the remaining features were normalized by plate. A principal component analysis was performed using a 98% variance cutoff to reduce the dimensionality of the data, followed by an additional plate normalization step. The experimental replicates were aggregated by taking the mean to yield a feature representation per gene.

Normalization of cosine distributions across maps

To make different heatmaps (for example, rxrx3 HUVEC data and cpg0016 U2OS data) visually comparable, cosine similarity values for each map were quantile normalized to a normal distribution with mean zero and standard deviation 0.2 for display purposes only.
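The display normalization described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the plotting position (rank − 0.5)/n and the use of average ranks for ties are all assumptions made here for the example.

```python
from statistics import NormalDist


def quantile_normalize_to_normal(values, mean=0.0, sd=0.2):
    """Map values onto quantiles of N(mean, sd^2), preserving rank order.

    Hypothetical helper: the plotting position and tie handling are
    assumptions, not details taken from the study.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend over a run of tied values so they share one average rank.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    target = NormalDist(mu=mean, sigma=sd)
    # (rank - 0.5) / n keeps every probability strictly inside (0, 1).
    return [target.inv_cdf((r - 0.5) / n) for r in ranks]
```

Because only ranks are used, the transformation changes the scale of a heatmap but not the ordering of gene pairs within it.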

Benchmarking of known relationships

To assess how well a map embedding recapitulates known biology, we calculated recall measures on known pairwise relationships from annotated sources (Reactome²⁵, HuMap²⁶ and CORUM²⁷) as follows. Given pairwise cosine similarities between the aggregated perturbation embeddings of all perturbed genes, we selected the top 5% and bottom 5% of gene pairs (excluding self-relationships) from the cosine similarity distribution as ‘predicted relationships’. We then calculated the recall as the proportion of all relationships in the annotation source that fall among these predicted relationships. If annotated relationships were spread randomly throughout the cosine distribution, this would produce a recall of 0.1, so that value is used as a baseline. For Fig. 1c, we have 314, 460 and 530 annotated relationships within chromosome arms and 6,713, 11,502 and 11,376 between chromosome arms for Reactome, HuMap and CORUM, respectively.
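A small sketch of this recall computation is given below. The data layout (a dictionary keyed by unordered gene pairs) and the function name are assumptions for illustration, and annotated pairs that are absent from the map are simply skipped here, which may differ from the handling in the study.

```python
def recall_of_known_pairs(similarity, known_pairs, tail_fraction=0.05):
    """Fraction of annotated pairs in the top or bottom tail of similarities.

    similarity:  dict mapping frozenset({gene_a, gene_b}) -> cosine similarity
                 (hypothetical layout; self-pairs are assumed to be absent)
    known_pairs: iterable of (gene_a, gene_b) from an annotation source
    """
    ordered = sorted(similarity.values())
    n = len(ordered)
    k = max(1, int(n * tail_fraction))
    low_cut = ordered[k - 1]   # largest value inside the bottom tail
    high_cut = ordered[n - k]  # smallest value inside the top tail
    annotated = [frozenset(p) for p in known_pairs if frozenset(p) in similarity]
    if not annotated:
        return float("nan")
    hits = sum(
        1 for p in annotated
        if similarity[p] <= low_cut or similarity[p] >= high_cut
    )
    return hits / len(annotated)
```

With both 5% tails counted as predictions, annotated pairs scattered at random would give a value near 0.1, which matches the baseline stated above.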

Statistics and reproducibility

No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.

Analysis was performed using Python (v3.9), numpy (v1.22.3), pandas (v1.4.2), scipy (v1.10.1) and statsmodels (v0.13.2). The visualizations were generated using matplotlib (v3.5.2), scikit-image (v0.18.3) and seaborn (v0.12.3). scRNA-seq data were processed with STAR (v2.7.7a) and scanpy (v1.9.3), and the CNV elements were determined with infercnvpy (v0.4.1). The list of cancer genes was downloaded from OncoKB (v4.4), https://www.oncokb.org/cancer-genes.

To quantify the level of proximity bias for cell-line splits in the DepMap data, we computed the effect size of intra-arm relationships being larger than inter-arm relationships, according to a Monte Carlo variant of the Brunner–Munzel test²⁹. In particular, we compute:

$$P\left(\mathrm{intra\ arm} > \mathrm{inter\ arm}\right)=\frac{\sum_{i=1}^{N}\mathrm{rank}\left(\mathrm{intra\ arm}_{i}\right)}{NM}-\frac{N+1}{2M}$$

where N is the number of intra-arm samples, M is the number of inter-arm samples and rank(x) is the index of sample x when all samples are sorted, with ties being assigned their average rank. To perform bootstrapping, we set N = M and repeatedly sampled N random pairs of genes from both the intra-arm and inter-arm populations for T trials. The dependency scores for each gene pair across cell lines are used to compute a cosine similarity, which is then used as the ranking metric. Except where otherwise noted, we utilized N = 500 and T = 100. For the final score, we took the empirical mean over these trials. The numbers of cell lines used in each split and other statistical details are included in Supplementary Table 3.
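The rank-based estimate and its bootstrap can be sketched as below. This is an illustrative reading of the formula, not the released implementation: the function names are hypothetical, the inputs are precomputed cosine similarities rather than gene pairs, and sampling with replacement is an assumption because the text does not specify it.

```python
import random


def average_ranks(values):
    """1-based ranks over all values, with ties given their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def prob_intra_greater(intra, inter):
    """Estimate P(intra arm > inter arm) from ranks in the pooled sample."""
    n, m = len(intra), len(inter)
    ranks = average_ranks(list(intra) + list(inter))
    return sum(ranks[:n]) / (n * m) - (n + 1) / (2 * m)


def bootstrap_prob(intra, inter, n=500, trials=100, seed=0):
    """Mean estimate over `trials` resamples of size n from each population.

    Sampling with replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        a = rng.choices(intra, k=n)
        b = rng.choices(inter, k=n)
        estimates.append(prob_intra_greater(a, b))
    return sum(estimates) / trials
```

A value of 0.5 indicates no tendency for intra-arm similarities to exceed inter-arm similarities, and values approaching 1 indicate strong proximity bias.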

To quantify proximity bias at the genome level, the Brunner–Munzel test statistic was computed between the full inter-arm and intra-arm cosine similarity distributions across all chromosome arms (without sampling). The statistic estimates the probability that the intra-arm cosine similarity is greater than the inter-arm cosine similarity, and the P values shown in Fig. 1g,h and Extended Data Fig. 4b are one tailed and Bonferroni corrected. For arm-level metrics, we restricted the distributions to gene pairs within a given arm versus pairs with one gene on that arm. The sample sizes can be determined by the number of genes on each chromosome arm, which are given in Supplementary Table 5 for the rxrx3, cpg0016 and DepMap 22Q4 data, and tests were performed only for chromosome arms with at least 20 within-chromosome-arm pairs.
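For the arm-level metric, the two populations of gene pairs can be assembled as in the following sketch. The function name and the dictionary layout are hypothetical, and the minimum of 20 within-arm pairs mentioned above would be checked by the caller.

```python
from itertools import combinations


def split_pairs_for_arm(arm_of, arm):
    """Split gene pairs into within-arm pairs and pairs with one gene on the arm.

    arm_of: dict mapping gene -> chromosome arm label (hypothetical layout)
    Pairs with neither gene on `arm` are ignored.
    """
    within, across = [], []
    for a, b in combinations(sorted(arm_of), 2):
        genes_on_arm = (arm_of[a] == arm) + (arm_of[b] == arm)
        if genes_on_arm == 2:
            within.append((a, b))
        elif genes_on_arm == 1:
            across.append((a, b))
    return within, across
```

The cosine similarities of the `within` pairs and of the `across` pairs are then the two distributions compared by the test.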

Gene-level proximity bias quantification was computed by the Brunner–Munzel test between all cosine similarities of the gene to all other genes on the same arm and the cosine similarities to genes on other arms (no sampling). The correlation between proximity bias and rank gene location between the telomere and the centromere was computed using the Spearman rank correlation of the Brunner–Munzel statistics and the ordered position of the genes on the chromosome arm. For Fig. 2b (rxrx3 data) and Extended Data Fig. 2c (cpg0016 data), the sample sizes are given in Supplementary Table 6.

Analysis of public scRNA-seq data

Files containing scRNA-seq AnnData objects for two CRISPR–Cas9 and three CRISPRi screens were downloaded from PapalexiSatija2021_eccite_arrayed_RNA.h5ad³⁶, FrangiehIzar2021_RNA.h5ad³⁷, ReplogleWeissman2022_rpe1.h5ad⁴⁴, TianKampmann2021_CRISPRi.h5ad⁴⁵ and AdamsonWeissman2016_GSM2406681_10X010.h5ad⁴⁶ as harmonized by the scPerturb study³⁵. Each dataset was loaded using the ‘scanpy’ package⁷¹ (v1.9.3), and we determined the CNV events using the ‘infercnvpy’ package (v0.4.1), which is a scalable implementation of ‘inferCNV’ of the Trinity CTAT project (https://github.com/NCIP/Trinity_CTAT). For each of the datasets, we identified genes that, when perturbed, led to chromosomal loss proximal to the target gene (Table 1 and Supplementary Table 2). A perturbed gene is identified as resulting in proximal chromosomal loss in a cell if 70% or more of the neighboring 150 genes on the same chromosome are lost (that is, inferred CNV value of ≤−0.05) in that cell.

It is crucial to ensure that the chromosomal loss at a targeted locus is specifically due to the perturbation at that locus and is not a nonspecific loss commonly observed when other genes on distal chromosomes are targeted. Observed proximal loss when a gene G is perturbed is considered specific if the fraction of cells exhibiting loss near G is a minimum of three standard deviations away from the average fraction of cells demonstrating chromosomal loss near G when any gene within the dataset is perturbed. For each of the genes called using this process, the fraction of impacted cells (that is, cells that lose more than 70% of the 150 genes near the perturbed gene) is reported (Supplementary Table 2). Finally, we generate the heat maps in Fig. 2d,e using the ‘infercnvpy’ package (https://github.com/icbi-lab/infercnvpy).
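The two calling rules described in this section can be sketched as follows. The function names are hypothetical, and two details are assumptions of the sketch rather than statements from the text: the deviation is taken as an excess above the mean, and the sample standard deviation is used.

```python
from statistics import mean, stdev


def has_proximal_loss(neighbor_cnv, loss_cutoff=-0.05, min_fraction=0.70):
    """True if enough neighboring genes have an inferred CNV value at or below the cutoff.

    neighbor_cnv: inferred CNV values for the (up to 150) genes neighboring
                  the target on the same chromosome, in one cell
    """
    lost = sum(1 for v in neighbor_cnv if v <= loss_cutoff)
    return lost / len(neighbor_cnv) >= min_fraction


def is_specific_loss(fraction_when_targeted, fractions_all_perturbations, n_sd=3.0):
    """True if loss near gene G is unusually frequent when G itself is targeted.

    fraction_when_targeted:      fraction of cells with loss near G when G is perturbed
    fractions_all_perturbations: fraction of cells with loss near G for each
                                 perturbed gene in the dataset
    Assumes 'away' means above the mean and uses the sample standard deviation.
    """
    mu = mean(fractions_all_perturbations)
    sd = stdev(fractions_all_perturbations)
    return fraction_when_targeted >= mu + n_sd * sd
```

The first function is applied per cell, and the second is applied per target gene to the resulting per-perturbation fractions.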

Analysis of bulk RNA sequencing data

Illumina reads were aligned to the hg38 reference and gene-level counts generated for each sample using the gencode_v33 gene annotation set and stored in AnnData objects together with sample perturbation metadata using STAR v2.7.7a (ref. ⁷²) and scanpy (v1.9.3)⁷¹. The CNV events and determination of chromosomal loss near the on-target cut site were determined using the method described above for the scRNA-seq analysis. A total of 45 intron-cutting CRISPR guide perturbations were tested in HUVEC cells using this method (Supplementary Table 1).

Analysis of DepMap data

Four datasets from DepMap (https://depmap.org/portal) were analyzed: CRISPR–Cas9 23Q2 (Chronos pipeline with arm normalization correction), CRISPR–Cas9 22Q4 (Chronos pipeline), CRISPR–Cas9 19Q3 (CERES pipeline) and shRNA (DEMETER2 pipeline)¹⁷,¹⁸. For each gene, we treated the dependency scores across different cell lines as a feature vector and computed the cosine similarity to other genes in the dataset (Fig. 3a). To reduce the bias toward essential genes from cosine similarity computation, we recentered the dependency scores for each gene by subtracting the mean from all cell lines. The cosine similarity values were then quantile normalized to a normal distribution with mean zero and standard deviation 0.2 for display purposes only.
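The effect of recentering before computing cosine similarity can be illustrated with a short sketch. The helper names and the dictionary layout are hypothetical, and missing dependency scores are assumed to have been handled beforehand.

```python
from math import sqrt


def recenter(scores):
    """Subtract the gene's mean dependency score across cell lines."""
    m = sum(scores) / len(scores)
    return [s - m for s in scores]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def gene_similarity(dependency, gene_a, gene_b):
    """Cosine similarity of two genes after per-gene recentering.

    dependency: dict mapping gene -> list of scores, one per cell line,
                in the same cell-line order for every gene
    """
    return cosine_similarity(recenter(dependency[gene_a]), recenter(dependency[gene_b]))
```

Two essential genes with uniformly negative scores have a high raw cosine similarity even when their patterns across cell lines are opposite; after recentering, only the shared pattern contributes.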

To disentangle proximity-bias-driven effects from CNV dependencies, we reanalyzed the DepMap 22Q4 data (which have been corrected for copy number using the Chronos pipeline¹⁷) by estimating the Brunner–Munzel probabilities across pairs of chromosome arms with almost no CNVs. For each pair of autosomal chromosome arms, we restricted the analysis to cell lines where fewer than 1% of genes had copy-number calls outside of (1.75, 2.25) (Supplementary Tables 5 and 6) and calculated the arm-level Brunner–Munzel probabilities (without sampling) for each pair (1,482 total values). These were then compared with Brunner–Munzel probabilities using all cell lines in both the full CRISPR–Cas9 data and shRNA data, as well as to randomly sampled cell lines matching the number of cell lines without CNVs (ten random sampling runs) (Extended Data Fig. 3b).

Additionally, we performed an analysis of the difference in proximity bias effect observed in WT cell lines as compared with AMP or LOF cell lines using the 22Q4 DepMap data. This was further stratified by looking at both a TP53 WT background and a TP53 partial LOF background. This was performed by restricting to the cell lines that are TP53 WT or partial LOF (copy number ≤1.5) before computing the proximity bias score. To select the cell lines matching LOF or AMP, we first subset to cell lines that have copy number ≤1.5 or ≥2.5 for LOF and AMP, respectively. Then, for AMP, we additionally subset to cell lines that do not have a nonsense or frameshift mutation, as these cell lines may have LOF despite the AMP. To control for different numbers of cell lines in different conditions, we computed a bootstrap version of the Brunner–Munzel proximity bias metric described above with an additional level of sampling. This consists of taking a random sample of 20 cell lines in each condition, constructing maps and calculating the Brunner–Munzel probabilities for S = 4 trials. Additionally, we increased the number of trials T (of genes used to compute cosine similarity) to 200 and excluded any conditions with fewer than 25 cell lines. Once this metric was computed for all 602 genes with a sufficient number of cell lines to meet the above conditions, we computed the difference between the WT and each mutant condition in each background condition, subset to the top 200 genes and repeated the computation with S = 32 trials (Supplementary Table 3). The initial list of cancer genes was downloaded from OncoKB⁷³ (v4.4, https://www.oncokb.org/cancer-genes).

Gene-set enrichment was conducted for Gene Ontology biological process with ShinyGO (v0.77)⁵⁸ using all genes with mean differences in whole-genome level Brunner–Munzel statistics above 0.2 for increases in proximity bias and below −0.1 for decreases in proximity bias in each condition. All default settings were used (false discovery rate cutoff of 0.05; number of processes to show; process size: minimum 2, maximum 2,000; selected by false discovery rate and sorted by fold enrichment) (Supplementary Table 4).

For the data in Supplementary Tables 7 and 8, the cell lines were grouped by disease annotations and then for each gene, we performed t-tests on the dependency values, first between all cell lines with that annotation and then after restricting to cell lines with copy number calls within (1.75, 2.25). False discovery rate correction was applied to all tests (Benjamini–Hochberg). We report all significant genes in B-lymphoblastic leukemia and lymphoma, neuroblastoma and renal cell carcinoma for both the 22Q4 and 23Q2 datasets along with the gene expression values (in transcripts per million reads). The 23Q2 data have a correction applied for the proximity bias effect, so differences between these two tables highlight the reduction in potential false-positive disease-specific driver genes.
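For readers unfamiliar with the Benjamini–Hochberg procedure named above, a minimal sketch of the adjustment is shown here. It is a generic textbook implementation with a hypothetical function name, not the code used to produce Supplementary Tables 7 and 8.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene is then reported as significant when its adjusted value falls below the chosen false discovery rate threshold.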

Geometric method for proximity bias reduction

For each chromosome arm, we first calculate the feature-wise mean across all unexpressed genes on that arm and then subtract those means from the features for each gene located on that arm. The gene locations were identified by National Center for Biotechnology Information RefSeq transcript locations against the hg38 reference assembly. The unexpressed genes were defined as those with zFPKM (fragments per kilobase of transcript per million mapped reads) <−3.0 in normalized bulk RNA sequencing of the given cell type before any CRISPR–Cas9 treatment.
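The correction amounts to a per-arm mean subtraction, which can be sketched as below. The function name and data layout are hypothetical, and leaving genes unchanged on arms that have no unexpressed genes is an assumption of this sketch.

```python
def correct_proximity_bias(features, arm_of, unexpressed):
    """Subtract, per chromosome arm, the mean feature vector of unexpressed genes.

    features:    dict mapping gene -> list of feature values
    arm_of:      dict mapping gene -> chromosome arm label
    unexpressed: set of genes regarded as unexpressed in the cell type
    """
    sums, counts = {}, {}
    for gene, vec in features.items():
        if gene in unexpressed:
            arm = arm_of[gene]
            if arm not in sums:
                sums[arm] = [0.0] * len(vec)
                counts[arm] = 0
            sums[arm] = [s + x for s, x in zip(sums[arm], vec)]
            counts[arm] += 1

    corrected = {}
    for gene, vec in features.items():
        arm = arm_of[gene]
        if arm in sums:
            arm_mean = [s / counts[arm] for s in sums[arm]]
            corrected[gene] = [x - m for x, m in zip(vec, arm_mean)]
        else:
            # Assumption: arms without unexpressed genes are left uncorrected.
            corrected[gene] = list(vec)
    return corrected
```

Knockouts of unexpressed genes are not expected to have a phenotype of their own, so their average on an arm serves as an estimate of the arm-wide confounding signal.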

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Raw images, metadata and deep-learning-derived embeddings for rxrx3 are available at https://rxrx.ai; however, the majority of gene identities are currently masked due to commercial considerations. Due to contractual obligations with partners, Recursion is unable to share additional data underlying the rxrx3 analyses or the bulk RNA sequencing in Fig. 2f. cpg0016 is available as part of the JUMP Cell Painting datasets available from the Cell Painting Gallery on the Registry of Open Data on Amazon Web Services at https://registry.opendata.aws/cellpainting-gallery/. The scRNA-seq datasets are available through scPerturb (https://scperturb.org/). The DepMap data are available at https://depmap.org/portal/download/all/. The JUMP CP data were downloaded from S3 using Python with numpy (v1.22.3) and pandas (v1.4.2) at https://registry.opendata.aws/cellpainting-gallery/. The hg38 gene locations and annotations were downloaded from the University of California, Santa Cruz Genome Browser at https://genome.ucsc.edu/cgi-bin/hgTables. The shinyDepMap data were downloaded and processed using R (v4.1) at https://depmap.org/portal/download/all/. The files containing scRNA-seq AnnData objects for two CRISPR–Cas9 and three CRISPRi screens were downloaded from Zenodo at https://zenodo.org/record/7416068 (ref. ⁷⁴). Source data are provided with this paper.

Code availability

The Python-based data analysis source code to reproduce plots from public datasets is available at https://github.com/recursionpharma/proxbias and Zenodo at https://doi.org/10.5281/zenodo.10795539 (ref. ⁷⁵). Due to contractual obligations with partners, Recursion is unable to share code to reproduce plots from the rxrx3 data or Fig. 2f.

References

Raguram, A., Banskota, S. & Liu, D. R. Therapeutic in vivo delivery of gene editing agents. Cell 185, 2806–2827 (2022).
Jackson, A. L. & Linsley, P. S. Recognizing and avoiding siRNA off-target effects for target identification and therapeutic application. Nat. Rev. Drug Discov. 9, 57–67 (2010).
Becker, S. & Boch, J. TALE and TALEN genome editing technologies. Gene Genome Editing 2, 100007 (2021).
Adikusuma, F. et al. Large deletions induced by Cas9 cleavage. Nature 560, E8–E9 (2018).
Geng, K. et al. Target-enriched nanopore sequencing and de novo assembly reveals co-occurrences of complex on-target genomic rearrangements induced by CRISPR–Cas9 in human cells. Genome Res. 32, 1876–1891 (2022).
Kosicki, M., Tomberg, K. & Bradley, A. Repair of double-strand breaks induced by CRISPR–Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 36, 765–771 (2018).
Zuccaro, M. V. et al. Allele-specific chromosome removal after Cas9 cleavage in human embryos. Cell 183, 1650–1664 e15 (2020).
Papathanasiou, S. et al. Whole chromosome loss and genomic instability in mouse embryos after CRISPR–Cas9 genome editing. Nat. Commun. 12, 5855 (2021).
Tsuchida, C. A. et al. Mitigation of chromosome loss in clinical CRISPR–Cas9-engineered T cells. Cell 186, 4567–4582 e20 (2023).
Nahmad, A. D. et al. Frequent aneuploidy in primary human T cells after CRISPR–Cas9 cleavage. Nat. Biotechnol. 40, 1807–1813 (2022).
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 e16 (2017).
Girish, V. & Sheltzer, J. M. A CRISPR competition assay to identify cancer genetic dependencies. Bio Protoc. 10, e3682 (2020).
Lin, A., Giuliano, C. J., Sayles, N. M. & Sheltzer, J. M. CRISPR/Cas9 mutagenesis invalidates a putative cancer dependency targeted in on-going clinical trials. eLife 6, e24179 (2017).
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
Aguirre, A. J. et al. Genomic copy number dictates a gene-independent cell response to CRISPR–Cas9 targeting. Cancer Discov. 6, 914–929 (2016).
Munoz, D. M. et al. CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions. Cancer Discov. 6, 900–913 (2016).
Dempster, J. M. et al. Chronos: a cell population dynamics model of CRISPR experiments that improves inference of gene fitness effects. Genome Biol. 22, 343 (2021).
Iorio, F. et al. Unsupervised correction of gene-independent cell responses to CRISPR–Cas9 targeting. BMC Genomics 19, 604 (2018).
Bray, M. A. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat. Protoc. 11, 1757–1774 (2016).
Celik, S. et al. Biological cartography: building and benchmarking representations of life. Preprint at bioRxiv https://doi.org/10.1101/2022.12.09.519400 (2022).
Behan, F. M. et al. Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens. Nature 568, 511–516 (2019).
Fay, M. M. et al. RxRx3: phenomics map of biology. Preprint at bioRxiv https://doi.org/10.1101/2023.02.07.527350 (2023).
Chandrasekaran, S. N. et al. JUMP Cell Painting dataset: morphological impact of 136,000 chemical and genetic perturbations. Preprint at bioRxiv https://doi.org/10.1101/2023.03.23.534023 (2023).
Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).
Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
Drew, K., Wallingford, J. B. & Marcotte, E. M. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol. Syst. Biol. 17, e10016 (2021).
Giurgiu, M. et al. CORUM: the comprehensive resource of mammalian protein complexes–2019. Nucleic Acids Res. 47, D559–D563 (2019).
Raftopoulou, C. et al. Karyotypic flexibility of the complex cancer genome and the role of polyploidization in maintenance of structural integrity of cancer chromosomes. Cancers 12, 591 (2020).
Brunner, E. & Munzel, U. The nonparametric Behrens–Fisher problem: asymptotic theory and a small-sample approximation. Biom. J. 42, 17–25 (2000).
Alanis-Lobato, G. et al. Frequent loss of heterozygosity in CRISPR–Cas9-edited early human embryos. Proc. Natl Acad. Sci. USA 118, e2004832117 (2021).
Cullot, G. et al. CRISPR–Cas9 genome editing induces megabase-scale chromosomal truncations. Nat. Commun. 10, 1136 (2019).
Leibowitz, M. L. et al. Chromothripsis as an on-target consequence of CRISPR–Cas9 genome editing. Nat. Genet. 53, 895–905 (2021).
Przewrocka, J., Rowan, A., Rosenthal, R., Kanu, N. & Swanton, C. Unintended on-target chromosomal instability following CRISPR–Cas9 single gene targeting. Ann. Oncol. 31, 1270–1273 (2020).
Weisheit, I. et al. Detection of deleterious on-target effects after HDR-mediated CRISPR editing. Cell Rep. 31, 107689 (2020).
Peidli, S. et al. scPerturb: harmonized single-cell perturbation data. Nat. Methods 21, 531–540 (2024).
Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).
Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).
Amici, D. R. et al. FIREWORKS: a bottom-up approach to integrative coessentiality network analysis. Life Sci. Alliance 4, e202000882 (2021).
Shimada, K., Bachman, J. A., Muhlich, J. L. & Mitchison, T. J. shinyDepMap, a tool to identify targetable cancer genes and their functional connections from Cancer Dependency Map data. eLife 10, e57116 (2021).
Chen, B., Wang, L., Zhao, J., Tan, C. & Zhao, P. Expression and prognostic significance of EPAS-1 in renal clear cell carcinoma. Ann. Ital. Chir. 92, 671–675 (2021).
Adams, C. M., Clark-Garvey, S., Porcu, P. & Eischen, C. M. Targeting the Bcl-2 family in B cell lymphoma. Front. Oncol. 8, 636 (2018).
Decaesteker, B. et al. SOX11 regulates SWI/SNF complex components as member of the adrenergic neuroblastoma core regulatory circuitry. Nat. Commun. 14, 1267 (2023).
Larson, M. H. et al. CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat. Protoc. 8, 2180–2196 (2013).
Paddison, P. J., Caudy, A. A., Bernstein, E., Hannon, G. J. & Conklin, D. S. Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells. Genes Dev. 16, 948–958 (2002).
Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575 e28 (2022).
Tian, R. et al. Genome-wide CRISPRi/a screens in human neurons link lysosomal failure to ferroptosis. Nat. Neurosci. 24, 1020–1034 (2021).
Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882 e21 (2016).
Cullot, G. et al. Cell cycle arrest and p53 prevent ON-target megabase-scale rearrangements induced by CRISPR–Cas9. Nat. Commun. 14, 4072 (2023).
Haapaniemi, E., Botla, S., Persson, J., Schmierer, B. & Taipale, J. CRISPR–Cas9 genome editing induces a p53-mediated DNA damage response. Nat. Med. 24, 927–930 (2018).
Ihry, R. J. et al. p53 inhibits CRISPR–Cas9 engineering in human pluripotent stem cells. Nat. Med. 24, 939–946 (2018).
Enache, O. M. et al. Cas9 activates the p53 pathway and selects for p53-inactivating mutations. Nat. Genet. 52, 662–668 (2020).
Bowden, A. R. et al. Parallel CRISPR–Cas9 screens clarify impacts of p53 on screen performance. eLife 9, e55325 (2020).
Sinha, S. et al. A systematic genome-wide mapping of oncogenic mutation selection during CRISPR–Cas9 genome editing. Nat. Commun. 12, 6512 (2021).
Zhao, R., Choi, B. Y., Lee, M. H., Bode, A. M. & Dong, Z. Implications of genetic and epigenetic alterations of CDKN2A (p16(INK4a)) in cancer. EBioMedicine 8, 30–39 (2016).
Xia, Y. et al. Dominant role of CDKN2B/p15INK4B of 9p21.3 tumor suppressor hub in inhibition of cell-cycle and glycolysis. Nat. Commun. 12, 2047 (2021).
Stampone, E. et al. Genetic and epigenetic control of CDKN1C expression: importance in cell commitment and differentiation, tissue homeostasis and human diseases. Int. J. Mol. Sci. 19, 1055 (2018).
Toledo, F. & Wahl, G. M. MDM2 and MDM4: p53 regulators as targets in anticancer therapy. Int. J. Biochem. Cell Biol. 39, 1476–1482 (2007).
Ge, S. X., Jung, D. & Yao, R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629 (2020).
Spinner, N. B., Conlin, L. K., Mulchandani, S. & Emanuel, B. S. in Emery and Rimoin’s Principles and Practice of Medical Genetics 6th edn (eds Rimoin, D. et al.) Ch. 45 (Academic Press, 2013).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Jaiswal, S. & Ebert, B. L. Clonal hematopoiesis in human aging and disease. Science 366, eaan4673 (2019).
Kessler, M. D. et al. Common and rare variant associations with clonal haematopoiesis phenotypes. Nature 612, 301–309 (2022).
Nunez, J. K. et al. Genome-wide programmable transcriptional memory by CRISPR-based epigenome editing. Cell 184, 2503–2519 e17 (2021).
Tao, J., Bauer, D. E. & Chiarle, R. Assessing and advancing the safety of CRISPR–Cas tools: from DNA to RNA editing. Nat. Commun. 14, 212 (2023).
Fiumara, M. et al. Genotoxic effects of base and prime editing in human hematopoietic stem cells. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01915-4 (2023).
Vassilev, L. T. et al. In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 303, 844–848 (2004).
Wienert, B. & Cromer, M. K. CRISPR nuclease off-target activity and mitigation strategies. Front. Genome Ed. 4, 1050507 (2022).
Yin, J. et al. Cas9 exo-endonuclease eliminates chromosomal translocations during genome editing. Nat. Commun. 13, 1204 (2022).
Kawamata, M., Suzuki, H. I., Kimura, R. & Suzuki, A. Optimization of Cas9 activity through the addition of cytosine extensions to single-guide RNAs. Nat. Biomed. Eng. 7, 672–691 (2023).
Sypetkowski, M. et al. RxRx1: a dataset for evaluating experimental batch correction methods. Preprint at https://arxiv.org/abs/2301.05768 (2023).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00011 (2017).
Peidli, S. et al. scPerturb single-cell perturbation data: RNA and protein h5ad files. Zenodo https://zenodo.org/record/7416068 (2022).
Lazar, N. H. et al. Nathanlazar/proxbias_clone: initial release. Zenodo https://doi.org/10.5281/zenodo.10795539 (2024).
Babon, J. J., Varghese, L. N. & Nicola, N. A. Inhibition of IL-6 family cytokines by SOCS3. Sem. Immunol. 26, 13–19 (2013).
Yamamoto, T. et al. The nuclear isoform of protein-tyrosine phosphatase TC-PTP regulates interleukin-6-mediated signaling pathway through STAT3 dephosphorylation. Biochem. Biophys. Res. Commun. 297, 811–817 (2002).
Tzavlaki, K. & Moustakas, A. TGF-β signaling. Biomolecules 10, 487 (2020).
Tecalco-Cruz, A. C., Ríos-López, D. G., Vázquez-Victorio, G., Rosales-Alvarez, R. E. & Macías-Silva, M. Transcriptional cofactors Ski and SnoN are major regulators of the TGF-β/Smad signaling pathway in health and disease. Sig. Transduct. Target Ther. 3, 15 (2018).
Haeusler, R. A., McGraw, T. E. & Accili, D. Biochemical and cellular properties of insulin receptor signalling. Nat. Rev. Mol. Cell Biol. 19, 31–44 (2018).

Acknowledgements

We thank G. H. L. Roberts, H. Donnella, S. Guler, T. Ahfeldt, Y. Chong and K. Thomas for their help and discussions in generating this manuscript and the incredible Recursion lab and engineering teams for design and execution of experiments and storage and processing of data.

Author information

Authors and Affiliations

Recursion, Salt Lake City, UT, USA
Nathan H. Lazar, Safiye Celik, Lu Chen, Marta M. Fay, Jonathan C. Irish, James Jensen, Conor A. Tillinghast, John Urbanik, William P. Bone, Christopher C. Gibson & Imran S. Haque

Authors

Nathan H. Lazar
Safiye Celik
Lu Chen
Marta M. Fay
Jonathan C. Irish
James Jensen
Conor A. Tillinghast
John Urbanik
William P. Bone
Christopher C. Gibson
Imran S. Haque

Contributions

I.S.H. contributed to the writing—initial draft. All authors contributed to the writing, review and editing. N.H.L., C.A.T., I.S.H., S.C., J.C.I., L.C., J.J., J.U., M.M.F. and W.P.B. contributed to the formal analysis, visualization, validation and methodology. C.C.G. and I.S.H. contributed to the supervision.

Corresponding author

Correspondence to Imran S. Haque.

Ethics declarations

Competing interests

All authors are current or former employees of Recursion Pharmaceuticals, Inc., and have received real or optional ownership interest in the company.

Peer review

Peer review information

Nature Genetics thanks Jinghui Zhang, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Rxrx3 and cpg0016 example pathways.

a, Table of genes shown in Fig. 1b. b, c, d, Heatmaps of rxrx3 (above diagonal) and cpg0016 (below diagonal) data for selected biological pathways with corresponding pathway diagrams for JAK/STAT, TGF-beta, and insulin biology. Data not present in cpg0016 are shown in gray. b, Example interleukin (IL) 6 pathway: IL6, IL6R, IL6ST, JAK1, and STAT3 activate the IL-6 signaling pathway⁷⁶. CRISPR-Cas9 targeting of these genes leads to similar cellular phenotypes and produces positive cosine similarities (red squares between IL6, IL6R, IL6ST, JAK1, and STAT3 in the heatmap). As inhibitors of the IL-6 pathway, SOCS3 and PTPN2 demonstrate a negative cosine similarity to the pathway components IL6/IL6R/IL6ST/JAK1/STAT3, especially in the rxrx3 HUVEC data (blue squares)⁷⁶,⁷⁷. c, Example TGF-beta pathway: FURIN, TGFB1, TGFBR1, TGFBR2, SMAD2, and SMAD3 activate the TGF-beta pathway. CRISPR-Cas9 targeting of these genes gives a similar cellular phenotype and high cosine similarity (red squares in the heatmap)⁷⁸. SMURF2 and SKI inhibit the TGF-beta pathway, and CRISPR-Cas9 targeting of SMURF2 and SKI shows high cosine similarity to each other but negative cosine similarity to FURIN, TGFB1, TGFBR1, TGFBR2, SMAD2, and SMAD3 (blue squares)⁷⁸,⁷⁹. Gray squares indicate genes not present in the cpg0016 data. d, Example insulin pathway: INSR, IRS2, AKT1, and PIK3CA transmit insulin signaling⁸⁰. CRISPR-Cas9 targeting of these factors gives similar phenotypes, and they are therefore highly cosine-similar (red squares between INSR/IRS2/AKT1/PIK3CA in the heatmap). GRB10 and FOXO1 inhibit insulin signaling, reflected in negative cosine similarities between CRISPR-Cas9 targeting of GRB10 and INSR, IRS2, AKT1 and PIK3CA (blue squares)⁸⁰. Gray squares indicate genes not present in the cpg0016 data.

Extended Data Fig. 2 Additional figures showing proximity bias effects in rxrx3 and cpg0016.

a, Distribution plots of within-chromosome and between-chromosome cosine similarity for rxrx3 (left) and cpg0016 data (right). The within-chromosome distribution is shifted toward the positive, which was the initial indication that some bias was present. b, Scatterplot of gene-level one-sided Brunner-Munzel probabilities versus relative chromosome-arm position for three chromosome arms in the cpg0016 dataset. The value on the y-axis estimates the probability of an intra-chromosome-arm relationship involving a given gene having a higher cosine similarity than an inter-chromosome-arm relationship involving the same gene. c, Spearman correlations in plots similar to b across all chromosome arms for the cpg0016 data. The height of the bar for each arm agrees well with the degree of fading in diagonal blocks in Fig. 1c below the diagonal. Colors show Bonferroni-corrected p-values. d, Bulk RNA sequencing gene count depletion for cells treated with a ZNF394-targeting guide relative to untreated cells in 10-gene blocks across chromosome 7. Decreased expression is evident on the telomeric side of the cut site.
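The probability plotted in panel b can be illustrated with a small sketch. It estimates the relative effect P(inter < intra) + 0.5 · P(inter = intra) by direct pairwise comparison, which is the quantity a Brunner-Munzel test is built around. The function name `relative_effect` and the similarity values are hypothetical, and this is not the authors' implementation, only an illustration of what the y-axis value means.

```python
from typing import Sequence


def relative_effect(intra: Sequence[float], inter: Sequence[float]) -> float:
    """Estimate P(inter < intra) + 0.5 * P(inter == intra) over all pairs."""
    if not intra or not inter:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for x in intra:
        for y in inter:
            if x > y:
                wins += 1.0   # intra-arm similarity is higher
            elif x == y:
                wins += 0.5   # ties count as half
    return wins / (len(intra) * len(inter))


# Hypothetical cosine similarities for a single gene (illustrative values only).
intra_arm = [0.30, 0.25, 0.10, 0.40]         # partners on the same chromosome arm
inter_arm = [0.05, -0.10, 0.20, 0.00, 0.15]  # partners on other chromosome arms
p_hat = relative_effect(intra_arm, inter_arm)
print(p_hat)
```

A value near 0.5 would indicate no tendency for intra-arm relationships to be more similar, while values approaching 1 indicate strong proximity bias for that gene.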

Extended Data Fig. 3 Proximity bias quantification in DepMap data.

a, Split genome-wide heatmap of the DepMap 22Q4 (above diagonal) and 23Q2 (below diagonal) CRISPR data. Both are processed with the Chronos pipeline¹⁷ but 23Q2 has an additional correction applied to reduce proximity bias. 22Q4 has 1,078 cell lines, 23Q2 has 1,095 cell lines. b, Distributions of arm-level Brunner-Munzel probabilities for maps built using pairs of autosomal chromosome arms (741 pairs represented twice in blue, green and red distributions). Blue distribution is built using all DepMap 22Q4 CRISPR-Cas9 cell lines, orange samples random cell lines matching the numbers from the green distribution (10 random sampling runs), green uses only cell lines with less than 1% of genes having copy number calls outside of [1.75, 2.25] (counts in Supplementary Table 4), and red uses all cell lines in the DepMap shRNA data. Two-sided Mann-Whitney U tests between all distributions are highly significant (p-value < 1e-10) for all pairwise comparisons. c, Boxen plots showing distributions of the ratio of within-chromosome-arm relationships to between-arm relationships for each chromosome arm across different gene annotation sets (n = 39 chromosome arms for all sources). Boxes are drawn at each octile with outliers outside of those boxes. The 19Q3 DepMap data show a much higher ratio of within-arm to between-arm annotations, suggesting a systematic bias to the predicted associations. d, Counts of gene-gene relationships within and between chromosome arms for shinyDepMap 19Q3 data (blue and tan, n = 4747 and 9271 respectively) and public annotation sets (Reactome, HuMap, and CORUM) (green and red, n = 98 and 2825 respectively)²⁵,²⁶,²⁷. DepMap predicts a much higher proportion of within-chromosome-arm relationships than are found in public annotation sets (odds ratio 0.068, Fisher exact p-value < 1e-10).
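The two counts quoted in this caption can be checked with a few lines of arithmetic. The sketch below assumes that the 741 pairs in panel b are the unordered pairs of the 39 chromosome arms quoted for panel c, and that the odds ratio in panel d is the annotation-set odds of a within-arm relationship divided by the DepMap odds. Both are assumptions about orientation that happen to reproduce the reported numbers; the Fisher exact p-value is not recomputed here.

```python
from math import comb

# Unordered pairs of 39 chromosome arms (assumed to be the arm set used in panel b).
n_arm_pairs = comb(39, 2)

# Counts of gene-gene relationships taken from the caption of panel d.
depmap_within, depmap_between = 4747, 9271   # shinyDepMap 19Q3
annot_within, annot_between = 98, 2825       # Reactome, HuMap and CORUM

# Odds of a relationship falling within a chromosome arm, per source.
odds_annot = annot_within / annot_between
odds_depmap = depmap_within / depmap_between

# Assumed orientation: annotation-set odds relative to DepMap odds.
odds_ratio = odds_annot / odds_depmap

print(n_arm_pairs, round(odds_ratio, 3))  # prints: 741 0.068
```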

Extended Data Fig. 4 Proximity bias correction in DepMap Data.

a, Split genome-wide heatmap built from 625 CRISPR cell lines, 190 shRNA cell lines, and 11,169 genes shared between CRISPR and shRNA datasets in the DepMap 19Q3 and DEMETER2 v6 data. CRISPR-Cas9 data are shown above the diagonal and shRNA data below. No proximity bias signal is visible in the shRNA data. b, Quantification of proximity bias in the DepMap shRNA dataset with colors showing Bonferroni-corrected p-values from the one-sided arm-level Brunner-Munzel test. Only a few chromosome arms display a significant deviation of intra- versus inter-chromosome-arm similarities, in contrast with the CRISPR data shown in Fig. 3b.

Extended Data Fig. 5 Proximity bias quantification for additional genes between BTG2 and MDM4.

Box and scatter plots of whole-genome level proximity bias quantification by Brunner-Munzel intra-arm vs inter-arm probability from DepMap 22Q4 data with cell lines stratified by gene status. Each point represents a bootstrap sample of cell lines; 128 bootstraps were run for each condition. Box plots show the median, lower and upper quartile with whiskers extending to the furthest points within 1.5 times the interquartile range. Detailed test statistics are in Supplementary Table 3. a, All genes with sufficient data in order of chromosome position between BTG2 and MDM4 on chromosome 1q; wild-type (WT) vs amplification (AMP) in TP53 WT background (FMOD: WT n = 175, AMP n = 87, PRELP: WT n = 174, AMP n = 87, OPTC: WT n = 172, AMP n = 87, ATP2B4: WT n = 167, AMP n = 87, LAX1: WT n = 171, AMP n = 87, ZC3H11A: WT n = 172, AMP n = 87, SNRPE: WT n = 177, AMP n = 86, SOX13: WT n = 169, AMP n = 87, ETNK2: WT n = 170, AMP n = 87, REN: WT n = 173, AMP n = 87, KISS1: WT n = 176, AMP n = 87, GOLT1A: WT n = 173, AMP n = 87, PLEKHA6: WT n = 158, AMP n = 86, PPP1R15B: WT n = 171, AMP n = 87, PIK3C2B: WT n = 158, AMP n = 87). b, Same as a, in TP53 LOF background (FMOD: WT n = 182, AMP n = 81, PRELP: WT n = 183, AMP n = 81, OPTC: WT n = 182, AMP n = 81, ATP2B4: WT n = 180, AMP n = 80, LAX1: WT n = 184, AMP n = 80, ZC3H11A: WT n = 182, AMP n = 80, SNRPE: WT n = 189, AMP n = 80, SOX13: WT n = 180, AMP n = 80, ETNK2: WT n = 186, AMP n = 80, REN: WT n = 186, AMP n = 79, KISS1: WT n = 189, AMP n = 79, GOLT1A: WT n = 185, AMP n = 79, PLEKHA6: WT n = 174, AMP n = 78, PPP1R15B: WT n = 178, AMP n = 79, PIK3C2B: WT n = 172, AMP n = 79). Genes with fewer than 25 cell lines in a given condition are not shown.
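The bootstrap and whisker conventions described in this caption can be sketched as follows. The helper names `bootstrap` and `whiskers`, the toy cell-line labels and the placeholder statistic are all hypothetical; in the paper the statistic is the Brunner-Munzel intra-arm vs inter-arm probability computed from the resampled cell lines, which is not reproduced here. Quartile conventions also differ between plotting libraries, so the whisker positions are indicative only.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def bootstrap(items: Sequence[str],
              statistic: Callable[[List[str]], float],
              n_boot: int = 128,
              seed: int = 0) -> List[float]:
    """Resample `items` with replacement `n_boot` times and apply `statistic`."""
    rng = random.Random(seed)
    return [statistic(rng.choices(items, k=len(items))) for _ in range(n_boot)]


def whiskers(values: Sequence[float]) -> Tuple[float, float]:
    """Furthest data points within 1.5 * IQR of the lower and upper quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    reach = 1.5 * (q3 - q1)
    lower = min(v for v in values if v >= q1 - reach)
    upper = max(v for v in values if v <= q3 + reach)
    return lower, upper


# Toy data: 50 hypothetical cell lines, 15 of which carry a hypothetical flag.
lines = [f"line_{i}" for i in range(50)]
flagged = set(lines[:15])


def flagged_fraction(sample: List[str]) -> float:
    # Placeholder statistic standing in for the proximity bias measure.
    return sum(name in flagged for name in sample) / len(sample)


estimates = bootstrap(lines, flagged_fraction)  # 128 bootstrap estimates
low, high = whiskers(estimates)
print(len(estimates), low, high)
```

Fixing the seed keeps the resampling reproducible, which matters when bootstrap distributions are compared across conditions.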

Supplementary information

Reporting Summary

Peer Review File

Supplementary Tables 1–8

Supplementary Table 1: CRISPR–Cas9 perturbations in HUVEC measured by bulk RNA sequencing, with the fraction of wells per perturbation that exhibited specific copy loss telomeric or centromeric to the target gene.
Supplementary Table 2: Genes resulting in specific chromosome loss around the cut region for each CRISPR–Cas9 or CRISPRi scRNA-seq dataset and direction tested (3′ or 5′).
Supplementary Table 3: Statistics for DepMap proximity bias gene driver analysis for the highest positive and highest negative proximity bias effect size when comparing mutant to WT cell lines, stratified against TP53 WT or mutant background.
Supplementary Table 4: Statistics for DepMap proximity bias gene ontology biological process enrichment for the highest positive and highest negative proximity bias effect size when comparing mutant to WT cell lines, stratified against TP53 WT or mutant background.
Supplementary Table 5: Number of genes and Brunner–Munzel sample sizes (number of pairs) on each chromosome arm for each dataset (rxrx3, cpg0016, DepMap 19Q3 and DepMap 22Q4). rxrx3 and cpg0016 data were restricted only to genes found in both datasets. Also lists the sample sizes for the analysis of DepMap data quantifying proximity bias in each pair of chromosome arms with less than 1% CNVs.
Supplementary Table 6: Brunner–Munzel probabilities for each pair of chromosome arms in the DepMap 22Q4 data when restricting to cell lines without CNVs, using all cell lines or considering RNAi data.
Supplementary Table 7: Detailed statistics for DepMap search for false-positive dependencies due to proximity bias in the 22Q4 data.
Supplementary Table 8: Detailed statistics for DepMap search for false-positive dependencies due to proximity bias in the 23Q2 data.

Source data

Source Data Fig. 4

Statistical source data for Fig. 4a–c.

Source Data Extended Data Fig. 5 and Supplementary Table 3

Statistical source data for Extended Data Fig. 5 and Supplementary Table 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lazar, N.H., Celik, S., Chen, L. et al. High-resolution genome-wide mapping of chromosome-arm-scale truncations induced by CRISPR–Cas9 editing. Nat Genet 56, 1482–1493 (2024). https://doi.org/10.1038/s41588-024-01758-y

Received: 30 June 2023
Accepted: 18 April 2024
Published: 29 May 2024
Issue Date: July 2024
DOI: https://doi.org/10.1038/s41588-024-01758-y
Springer Nature America, Inc.

This article is cited by

A benchmark of computational methods for correcting biases of established and unknown origin in CRISPR-Cas9 screening data
- Alessandro Vinceti
- Raffaele M. Iannuzzi
- Francesco Iorio
Genome Biology (2024)
