Background

CCCTC-binding factor (CTCF) is an 11-zinc finger protein that directionally binds to a well-defined DNA motif [1,2,3]. Although CTCF was initially reported as a transcription factor [4, 5], subsequent studies found that it also serves as an insulator [6,7,8]. CTCF has since been implicated in multiple cellular processes, such as transcriptional regulation, insulator activity, epigenetic regulation, organization of chromatin architecture and X chromosome inactivation [1, 2, 9,10,11,12,13,14,15]. CTCF activates and silences gene expression by preventing the spread of heterochromatin and by blocking unrelated enhancer-promoter interactions [1, 16]. The mammalian genome is organized into thousands of highly self-interacting topologically associated domains (TADs), with CTCF demarcating individual TAD boundaries [17]. Analyses based on high-resolution interaction matrices further identified ~10,000 chromatin loops of ~185 kb in the human genome, anchored by convergent CTCF-binding motif pairs at TAD boundaries [14]. In addition, chromatin loops mediated by CTCF and cohesin can tether distal enhancers to gene promoters and regulate target gene expression [14, 15, 18, 19]. Further studies showed that CTCF-mediated DNA loops can determine chromatin architecture, with anchor genes being almost exclusively housekeeping genes and loop genes being tissue-specific genes [15, 20]. For instance, inversion of a CTCF-binding site reconfigured the topology of chromatin loops and activated gene expression by creating a new chromatin loop [21, 22].

Recent studies showed that disruption of CTCF binding in mammalian cells resulted in loss of TADs [19], genomic instability [23], developmental failure [24, 25] and other malfunctions. Our recent study demonstrated that CTCF plays an important role in stabilizing enhancer-promoter interactions and reducing gene expression noise in mammalian cells [18]. In particular, we found that CTCF-KD or deletion of CTCF binding sites led to increased cell-to-cell variation in the expression of GATA3, CD28, CD90 and CD5 [18]. However, the genome-wide change of cell-to-cell variation after CTCF-KD remains unknown. In this study, we conducted single cell RNA-seq on both WT and CTCF-KD cells to investigate the changing landscape of cell-to-cell variation at a genome-wide scale. Interestingly, GO terms including regulation of transcription, DNA binding and zinc finger were significantly enriched in CTCF-KD-specific highly variable genes. We also found that cellular variation-increased genes were significantly enriched in down-regulated genes, indicating that knockdown of CTCF simultaneously reduced the expression level and increased the expression noise of its regulated genes.

Results

Efficient CTCF knockdown and single cell RNA-seq

We knocked down CTCF in EL4 cells by short hairpin RNA (shRNA). Western blotting showed a dramatic decrease of the CTCF protein level in shCTCF #1 and shCTCF #2 compared to shRNA luciferase controls (shLuc) (Fig. 1a). Quantitative polymerase chain reaction (qPCR) revealed that CTCF mRNA levels in shCTCF #1 and shCTCF #2 were reduced to 38% and 40% of that in shLuc, respectively (Fig. 1b). These results confirmed the efficient knockdown of CTCF expression in shCTCF #1 and shCTCF #2, with consistent reductions at both the RNA and protein levels.

Fig. 1

Knockdown of CTCF and schema of single cell sequencing. a. Western blot analysis of CTCF in luciferase control (shLuc) and CTCF-KD cells (shCTCF#1 and shCTCF#2). EL4 cells were infected with retroviral particles encoding GFP and an shRNA targeting CTCF or a control sequence for 5 days. b. Real-time quantitative PCR (RT-qPCR) analysis of CTCF expression in luciferase control (shLuc) and knockdown (shCTCF#1 and shCTCF#2) cells. The expression level of CTCF was normalized to GAPDH. c. Schema of single cell RNA sequencing using Fluidigm C1 system

To investigate the changing landscape of cell-to-cell variation after CTCF knockdown, we conducted single cell RNA-seq for shLuc #1, shLuc #2, shCTCF #1 and shCTCF #2 using 4 integrated fluidics circuits (IFCs) (Fig. 1c). The gene expression levels of pooled shLuc single cells were highly correlated with those of bulk data from our previous study [18] (r² = 0.86; Additional file 1: Figure S1A). The gene expression of pooled single cells from shLuc #1 was also highly correlated with that of shLuc #2 (r² = 0.87; Additional file 1: Figure S1B). In addition, the gene expression of pooled single cell repeats and bulk cell repeats in CTCF-KD cells was highly correlated (Additional file 1: Figure S1C-S1D).

Systematic differences between WT cells and CTCF-KD cells

A total of 95 cells, including 24 cells from shLuc #1, 24 cells from shCTCF #1, 23 cells from shLuc #2 and 24 cells from shCTCF #2, were kept for further analyses after quality control. We conducted principal component analysis (PCA) on 11,361 genes shared by those 95 cells (Fig. 2a). The coordinates of cells from experiment 1 and experiment 2 on the PCA projection were not significantly different (P = 0.8; Student’s t-test), indicating no obvious batch effect between the two experiments. Further analysis showed that WT cells and CTCF-KD cells were distinguishable on the PCA projection and concentrated in two different clusters on PC1 (Fig. 2b,c), implying a systematic difference of gene expression profiles between CTCF-KD and WT cells. We also noticed a correlation between the CTCF expression level of a cell and its coordinates on the PCA projection (using the first 10 PCs), among which CTCF expression level and PC2 exhibited the highest correlation (r² = 0.18, P = 0.22 × 10⁻⁶; Additional file 2: Figure S2A).

Fig. 2

Systematic difference between CTCF-KD and WT cells. a. No significant batch effect among the experimental repeats based on PCA analysis. b. CTCF-KD cells were largely distinguishable from WT cells on PCA projection. c. Distribution of individual WT cells and individual CTCF-KD cells on PC1. d. Heatmap of differentially expressed genes (TOP 20) between WT cells and CTCF-KD cells

We further calculated the differential gene expression between WT and CTCF-KD cells using edgeR [26]. In total, we identified 195 up-regulated and 107 down-regulated genes in CTCF-KD cells compared to WT cells (Additional file 2: Figure S2B). A heatmap of the most differentially expressed genes between WT cells and CTCF-KD cells exhibited cellular heterogeneity within the same cell population (Fig. 2d). The most enriched gene categories in down-regulated genes include glycolytic processing, prolyl 4-hydroxylase α subunit, iron-dependent dioxygenase and carbon metabolism (Additional file 2: Figure S2D). In comparison, the most enriched gene categories in up-regulated genes include RNA binding, ribosome biogenesis, WD40 repeat domain and RNA processing (Additional file 2: Figure S2C), which is partly consistent with our recent study based on bulk data [18].

CTCF knockdown changed the landscape of cell-to-cell variation

To distinguish true signals of cellular variation from technical noise, we calculated the expression noise of each gene (σ²/μ²) [27, 28]. The expression noise exhibited two distinct scaling properties: a negative association with expression at low expression levels and no association at high expression levels (log2TPM > 1) (Fig. 3a). We filtered out lowly expressed genes (log2TPM ≤ 1) to reduce the impact of technical noise, resulting in 7843 genes for further analysis (Fig. 3a). The coefficient of variation (CV) was calculated to measure the cell-to-cell variation of each gene across the cell populations. The changes in cell-to-cell variation between pre- and post-CTCF knockdown followed a normal distribution (Additional file 3: Figure S3). Using the mean ± SD of these changes as thresholds, we identified 602 cellular variation-increased genes and 890 cellular variation-decreased genes after CTCF knockdown (Fig. 3b).

Fig. 3

Identification and analyses of genes showing cellular variation changes after CTCF KD. a. The relationship between expression level and noise level of reference genes. Genes with low cellular variation were used for further analyses. b. Scatter plot of cellular variation-changed genes after CTCF KD. Blue and red indicate variation decrease and variation increase, respectively. c. The top 15 gene categories enriched in variation-increased genes. d. The top 15 gene categories enriched in variation-decreased genes

GO analyses showed that variation-increased genes were significantly enriched in GO terms such as regulation of transcription, DNA binding, zinc finger proteins, covalent chromatin modification and transcription factor binding (Fig. 3c). In fact, almost all genes involved in DNA binding, zinc finger proteins and transcription factor binding are transcription factors. The significant enrichment of transcription factors in CTCF-KD-specific highly variable genes potentially indicates a high sensitivity of transcription factors to CTCF level. Dysregulation of certain transcription factors possibly explains why knockdown of CTCF leads to a systematic change of gene expression. In contrast, variation-decreased genes were significantly enriched in housekeeping gene-related GO terms such as rRNA processing, DNA repair, tRNA processing and RNA modification (Fig. 3d). The enrichment of housekeeping genes in WT-specific highly variable genes potentially indicates a higher cellular variation of cell activity in WT cells compared to CTCF-KD cells.

CTCF knockdown simultaneously altered expression level and cellular variation of its regulated genes

We identified 302 expression-changed genes and 1490 cellular variation-changed genes pre- and post-CTCF knockdown. Next, we examined whether those cellular variation-changed genes were enriched in expression-changed genes. A Venn diagram showed that 47 of the 107 down-regulated genes exhibited increased cellular variation (Fig. 4), a significant over-representation (P = 0.29 × 10⁻²³, χ² test). These results indicate that CTCF knockdown simultaneously reduced the expression level and increased the gene expression noise. Among those genes with decreased expression and increased cellular variation, EGR1 and JUNB play an important role in maintaining cell type-specific gene regulation. For instance, EGR1 belongs to the EGR family of C2H2-type zinc-finger proteins, and encodes a nuclear protein that participates in transcriptional regulation.

Fig. 4

Genes showing cellular variation change tend to be differentially expressed genes. Genes with expression decrease and cellular variation increase were significantly over-represented (P = 0.29 × 10⁻²³, χ² test). Genes with decreased cellular variation and increased expression level were significantly over-represented (P = 0.48 × 10⁻²³, χ² test)

Meanwhile, 96 of the 195 up-regulated genes exhibited decreased cellular variation (Fig. 4), which was also a significant over-representation (P = 0.48 × 10⁻²³, χ² test). The 96 genes with decreased cellular variation and increased expression level were significantly enriched in poly(A) RNA binding, rRNA processing, WD40, purine nucleobase biosynthetic processing, rRNA methylation and RNA methyltransferase activity. These enriched GO terms are associated with basic cellular functions typical of housekeeping genes. Taken together, our results indicate that disruption of CTCF expression can simultaneously change the gene expression level and cell-to-cell variation of its regulated genes.
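The over-representation tests above can be illustrated with a minimal sketch of a Pearson χ² test on a 2 × 2 table. The function names (`overlap_table`, `chi2_2x2`) are hypothetical, the absence of a continuity correction is an assumption, and so is the choice of the 7843 expressed genes as the background set; the sketch is therefore not expected to reproduce the reported P values exactly.

```python
import math

def overlap_table(k, n_a, n_b, n_total):
    """Build a 2x2 table for two gene sets A and B drawn from n_total genes.

    k is the overlap size, n_a and n_b are the set sizes.
    Returns (in both, A only, B only, in neither).
    """
    return (k, n_a - k, n_b - k, n_total - n_a - n_b + k)

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]]. Returns (statistic, p)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Down-regulated genes (107) vs variation-increased genes (602), overlap 47.
# Background of 7843 genes is an assumption for illustration only.
table = overlap_table(47, 107, 602, 7843)
stat, p = chi2_2x2(*table)
print(table, stat, p)
```

The same call with an overlap of 96, set sizes 195 and 890, and the same assumed background would describe the up-regulated, variation-decreased comparison.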

Furthermore, we identified CTCF binding sites using CTCF ChIP-seq data in WT EL4 cells from our previous study [18]. For each gene, we defined the number of gene-associated CTCF sites as the count of CTCF binding sites within 20 kb of its transcriptional start site (TSS). The number of gene-associated CTCF sites was significantly higher for variation-increased genes than for variation-decreased genes (P = 0.0033; Wilcoxon test) and for variation-unchanged genes (P = 0.16 × 10⁻⁶; Wilcoxon test). The number of gene-associated CTCF sites for variation-decreased genes did not differ significantly from that for variation-unchanged genes (P = 0.5; Wilcoxon test). These results suggest that genes regulated by multiple CTCF binding sites tend to show higher cellular variation after CTCF knockdown.
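The counting step can be sketched as follows. This is a minimal illustration, assuming that each binding site is represented by a single coordinate (for example a peak midpoint) and that "within 20 kb" means a symmetric window on both sides of the TSS; the function name and the input layout are hypothetical.

```python
from bisect import bisect_left, bisect_right

def count_sites_near_tss(tss_by_gene, sites_by_chrom, window=20_000):
    """Count binding sites within +/- window bp of each gene's TSS.

    tss_by_gene:    dict gene -> (chromosome, TSS position)
    sites_by_chrom: dict chromosome -> list of site positions
    Returns a dict gene -> number of sites in the window (inclusive).
    """
    # Sort once per chromosome so each gene needs only two binary searches.
    sorted_sites = {c: sorted(p) for c, p in sites_by_chrom.items()}
    counts = {}
    for gene, (chrom, tss) in tss_by_gene.items():
        positions = sorted_sites.get(chrom, [])
        lo = bisect_left(positions, tss - window)
        hi = bisect_right(positions, tss + window)
        counts[gene] = hi - lo
    return counts

# Toy example with made-up coordinates.
sites = {"chr1": [1_000, 15_000, 50_000, 100_000]}
genes = {"A": ("chr1", 10_000), "B": ("chr1", 80_000), "C": ("chr2", 5_000)}
print(count_sites_near_tss(genes, sites))
```

The per-gene counts of two gene groups could then be compared with a rank-based test, for example `scipy.stats.mannwhitneyu`; which Wilcoxon variant was used in the analysis is not stated in the text.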

Discussion

CTCF plays an important role in chromatin structure organization and regulation of gene expression [14,15,16,17]. In this study, we used single cell RNA-seq to analyze genome-wide gene expression profiles of WT and CTCF-KD cells at single cell resolution. The WT and CTCF-KD cell populations concentrated at distinct positions on PC1, indicating that knockdown of CTCF had a systematic impact on the genome-wide gene expression profile. These results further imply that CTCF contributes to key functions in controlling genome-wide gene regulation. We generated the genome-wide landscape of cell-to-cell variation in both WT and CTCF-KD cells. By comparing cell-to-cell variation between WT and CTCF-KD cells, we identified genes showing a significant change of cellular variation after CTCF knockdown. Interestingly, the cellular variation-increased genes were significantly enriched in expression-decreased genes, suggesting that CTCF-mediated promoter-enhancer interactions not only play an important role in maintaining the expression of regulated genes, but also reduce their expression noise.

In this study, we identified numerous genes with an obvious change of cellular variation after CTCF knockdown, although the knockdown efficiency was moderate. We would expect more genes to show alterations of cellular variation with a more efficient CTCF knockdown, which would strengthen the conclusions of this study. Interestingly, the variation-increased genes were significantly enriched in GO terms such as chromatin DNA binding, zinc finger proteins and zinc ion binding, implying that the expression noise of those zinc finger proteins was strongly increased after CTCF knockdown. The increased cellular variation of zinc finger proteins potentially indicates a high sensitivity of zinc finger proteins to CTCF expression level or to environmental change within the cell. In fact, the majority of those zinc finger proteins are transcription factors that play an important role in the regulation of cell type-specific gene expression. Our observation that CTCF knockdown increased the expression variability of many transcription factors may help explain why disruption of CTCF expression leads to pronounced biological effects such as developmental failure [24, 25]. Taken together, our findings provide evidence that CTCF serves as a key player in stabilizing the expression of zinc finger-related genes.

Conclusion

We conducted single cell RNA-seq on both wild type (WT) cells and CTCF-knockdown (CTCF-KD) cells using the Fluidigm C1 system. Principal component analysis of the single cell RNA-seq data showed that WT and CTCF-KD cells concentrated in two different clusters on PC1, indicating a systematic difference of gene expression profiles between WT and CTCF-KD cells. Interestingly, GO terms including regulation of transcription, DNA binding, zinc finger and transcription factor binding were significantly enriched in CTCF-KD-specific highly variable genes, implying that tissue-specific genes such as transcription factors are highly sensitive to CTCF level. Dysregulation of transcription factors possibly explains why knockdown of CTCF leads to a systematic change of gene expression. In contrast, housekeeping functions such as rRNA processing, DNA repair and tRNA processing were significantly enriched in WT-specific highly variable genes, potentially due to a higher cellular variation of cell activity in WT cells compared to CTCF-KD cells. We further noticed that cellular variation-increased genes were significantly enriched in down-regulated genes, indicating that CTCF knockdown simultaneously reduced the expression levels and increased the expression noise of its regulated genes. To our knowledge, this is the first attempt to explore the genome-wide landscape of cellular variation after CTCF knockdown. Our study not only advances our understanding of CTCF function in maintaining gene expression and reducing expression noise, but also provides a framework for examining gene function.

Methods

Cell culture

EL4 (ATCC® TIB-39™), derived from a lymphoma in a C57BL/6N mouse, was purchased from ATCC. EL4 cells were cultured in DMEM (GIBCO Invitrogen) supplemented with 50 IU/mL penicillin, 50 mg/mL streptomycin (GIBCO Invitrogen) and 10% heat-inactivated calf serum (Sigma, USA). Cultures were maintained by replacement with fresh medium every 3 days, and cell density was kept between 1 × 10⁵ and 1 × 10⁶ cells/mL.

Knockdown of CTCF by shRNA

Knockdown of CTCF was performed using lentiviral-mediated short hairpin RNA (shRNA) in EL4 cells as described previously [18]. Briefly, 293T cells were co-transfected with an envelope plasmid (pLP/VSVG) to generate lentiviral particles. The medium containing lentiviral particles was harvested 48 h after transfection. EL4 cells were infected with the harvested shLuc and shCTCF retroviral particles packaged in GP2–293. After 5 days of infection, GFP+ cells were sorted and the knockdown efficiency was checked using RT-qPCR and Western blotting. The cell populations displaying efficient knockdown of CTCF were used for single cell RNA-seq.

The following shRNA sequences were used for CTCF knockdown: mouse CTCF-shRNA 1: 5′-GGTGCAATTGAGAACATTATA; mouse CTCF-shRNA 2: 5′-TGGACGATACCCAGATCATAA.

Western blot analyses

After thorough washing, the knockdown (shCTCF) and control (shLuc) cells were harvested for Western blotting analyses. Protein concentration of the cell lysates was measured using a BCA kit (Pierce, Rockford, IL). Protein samples (40 μg/lane) were separated by SDS-PAGE followed by Western blotting with anti-CTCF antibody (07–729, Millipore) and anti-GAPDH antibody (sc-1616, Santa Cruz Biotechnology).

Quantitative real-time PCR

Total RNA from the knockdown (shCTCF#1 and shCTCF#2) and control (shLuc) cells was extracted using the miRNeasy Micro Kit (QIAGEN). cDNA was synthesized using oligo(dT)20 and SuperScript III Reverse Transcriptase (Invitrogen) according to the manufacturers’ instructions. RT-qPCR samples were mixed with the following Taqman probe mixtures (Applied Biosystems) and run on a LightCycler 96 (Roche): Gapdh: Mm03302249_g1; CTCF: Mm00484027_m1. Results were normalized to the mRNA level of Gapdh.

Single cell RNA sequencing

The Fluidigm C1™ Single-Cell Autoprep System (Fluidigm, South San Francisco, CA, USA) was used for single cell RNA-seq. In the initial experiment, shLuc#1 and shCTCF#1 cells were each loaded onto a separate C1 integrated fluidics circuit (IFC) for cell capture. We inspected each IFC to count the captured cells and to distinguish live from dead cells for later data processing. After successful completion of the second knockdown, cells from shLuc#2 and shCTCF#2 were processed in the same way as in the initial experiment. Single-cell RNA-seq libraries were prepared with the SMARTer protocol (Clontech, Mountain View, CA, USA) following the Fluidigm manual ‘Using the C1 Single-Cell Auto Prep System to generate mRNA from Single Cells and Libraries for Sequencing’. Wells containing either no cell or a doublet were filtered out. We selected the 24 cells with the highest quality from each IFC. The DNA material obtained from the 96 single cells was sequenced on an Illumina HiSeq 3000, as illustrated in Fig. 1c.

Reads mapping and quality control

Quality of the reads was assessed using FASTQC. All reads were aligned to the mouse genome (Ensembl version GRCm38.89) using STAR v.2.5.2 [29, 30]. Only uniquely mapped reads were retained (using the parameter --outFilterMultimapNmax). The alignments were used as input to HTSEQ v.0.9.1 [31] to count the number of reads mapping to each of the 24,057 RefSeq genes in each cell. We filtered out low-quality cells from our dataset based on a threshold of at least 3000 unique genes per cell. The final dataset contained 95 cells (including 24 cells from shLuc #1, 24 cells from shCTCF #1, 23 cells from shLuc #2 and 24 cells from shCTCF #2) with a mean of 216,157 sequenced reads per cell. Gene expression levels were normalized as transcripts per million (TPM) and log2 transformed. Furthermore, genes with log2(TPM + 1) > 1 in fewer than two individual cells were filtered out, leaving a total of 95 samples and 11,361 genes for further analyses.
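The normalization and filtering steps described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, gene lengths in kilobases are assumed to be available from the annotation, and a gene is assumed to count as detected in a cell when its read count is greater than zero.

```python
import math

def tpm(counts, lengths_kb):
    """Convert one cell's read counts to transcripts per million (TPM).

    counts and lengths_kb are parallel lists (one entry per gene).
    """
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    return [r / total * 1e6 for r in rates]

def log2_tpm(tpm_values):
    """log2(TPM + 1) transform, as used for the gene filter."""
    return [math.log2(v + 1.0) for v in tpm_values]

def keep_cell(counts, min_genes=3000):
    """Cell passes QC if at least min_genes genes are detected
    (count > 0 is an assumed definition of 'detected')."""
    return sum(1 for c in counts if c > 0) >= min_genes

def keep_gene(log_values_across_cells, threshold=1.0, min_cells=2):
    """Gene is kept if log2(TPM + 1) > threshold in at least min_cells cells."""
    return sum(1 for v in log_values_across_cells if v > threshold) >= min_cells

# Toy example: three genes in one cell.
print(tpm([10, 20, 0], [1.0, 2.0, 1.0]))
```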

Statistical analyses and gene enrichment analyses

To identify genes with biologically significant cell-to-cell variation, we used η² = σ²/μ² (σ denotes standard deviation; μ denotes mean) to measure the noise of gene expression [27, 28, 32]. We filtered out genes with low expression (log2TPM ≤ 1), since the expression noise is inversely proportional to the expression level in this range (Fig. 3a), leaving 7843 genes for further analyses. To examine possible enrichment of particular gene categories and pathways in a given gene list, GO enrichment analysis was performed using DAVID [33, 34]. Multiple comparison corrections for GO enrichment analyses were performed using Benjamini adjustment.
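As a minimal sketch of the noise measure η² = σ²/μ² and the low-expression filter, the following uses only the Python standard library. The function names are hypothetical, the use of the population variance is an assumption, and applying the log2TPM > 1 cutoff to the mean expression across cells is also an assumption, since the text does not specify how the cutoff is evaluated.

```python
import statistics

def expression_noise(values):
    """Noise eta^2 = variance / mean^2 of one gene across cells.

    Uses the population variance (an assumption); returns NaN when
    the mean is zero.
    """
    mu = statistics.fmean(values)
    if mu == 0:
        return float("nan")
    return statistics.pvariance(values) / (mu * mu)

def is_expressed(log2_tpm_values, cutoff=1.0):
    """Keep a gene if its mean log2TPM across cells exceeds the cutoff
    (evaluating the cutoff on the mean is an assumption)."""
    return statistics.fmean(log2_tpm_values) > cutoff

# Toy example: a gene measured in two cells.
print(expression_noise([1.0, 3.0]))
```

Because the coefficient of variation is σ/μ, this noise measure equals the square of the CV computed with the same variance estimator.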

In this study, we used the coefficient of variation (CV) to measure cell-to-cell variation. Variation-increased genes are defined by

$$ CV_{KD} - CV_{WT} > \mathrm{mean}_{CV_{KD} - CV_{WT}} + \mathrm{sd}_{CV_{KD} - CV_{WT}} $$

while variation-decreased genes are defined by

$$ CV_{KD} - CV_{WT} < \mathrm{mean}_{CV_{KD} - CV_{WT}} - \mathrm{sd}_{CV_{KD} - CV_{WT}} $$
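
The two thresholds above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code: the function names are hypothetical, the CV uses the population standard deviation, and the SD of the CV differences uses the sample estimator.

```python
import statistics

def cv(values):
    """Coefficient of variation (sd / mean) of one gene across cells."""
    mu = statistics.fmean(values)
    return statistics.pstdev(values) / mu if mu else float("nan")

def classify_variation(cv_wt, cv_kd):
    """Split genes by the change in CV between KD and WT.

    cv_wt, cv_kd: dict gene -> CV in WT and CTCF-KD cells.
    Returns (increased, decreased), where a gene is 'increased' if
    CV_KD - CV_WT > mean + sd of all differences, and 'decreased'
    if it is < mean - sd.
    """
    genes = sorted(set(cv_wt) & set(cv_kd))
    delta = {g: cv_kd[g] - cv_wt[g] for g in genes}
    diffs = list(delta.values())
    m = statistics.fmean(diffs)
    s = statistics.stdev(diffs)  # sample SD (assumption)
    increased = [g for g in genes if delta[g] > m + s]
    decreased = [g for g in genes if delta[g] < m - s]
    return increased, decreased

# Toy example with made-up CV values.
wt = {"a": 1.5, "b": 1.5, "c": 1.5, "d": 1.5, "e": 1.5}
kd = {"a": 2.5, "b": 1.5, "c": 1.5, "d": 1.5, "e": 0.5}
print(classify_variation(wt, kd))
```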