Introduction

Genome-wide gene expression studies have revealed that several genes are regulated in a cell cycle-specific manner1,2,3,4,5. Many of these genes are involved in basic cellular processes, such as cell cycle control, DNA repair, DNA replication, and chromosome segregation2,6. One example is the cyclins, which in complex with cyclin-dependent kinases (CDKs), control cell cycle progression. The cyclins are periodically expressed throughout the cell cycle; the E-type cyclins CCNE1 and CCNE2 are G1/S-specific2,7,8, whereas the B-type cyclins CCNB1 and CCNB2 are G2/M-specific2,9,10.

Long non-coding RNAs (lncRNAs) have emerged as important regulators of gene expression at the epigenetic, transcriptional, and translational level, and are recognized as key modulators in several cancers as well as neurological, autoimmune, and cardiovascular diseases. LncRNAs are more than 200 nucleotides long with little or no protein coding potential, and they generally have a more cell type-specific expression pattern compared to mRNAs11. LncRNAs are classified into five main categories according to where they are encoded in the genome in relation to mRNAs: sense, antisense, bi-directional, intergenic, and intronic. They are able to regulate the gene expression at the transcriptional level by acting as signals, guides, scaffolds, or decoys12. Most lncRNAs are transcribed by RNA Polymerase II (Pol II) and are poly-adenylated and 5′-capped like mRNAs13. A curated knowledgebase of lncRNAs from existing databases and published literature indicates that there are more than 268,000 human lncRNA transcripts, and only a few of them have known functional roles14.

Several lncRNAs are involved in the cell cycle, possibly through the regulation of other well-known cell cycle regulators like the cyclins, p53, retinoblastoma protein (RB), CDKs, and the CDK inhibitors15. Many lncRNAs have a cyclic expression profile, and the majority peak during the G1 phase. The cell cycle phase-specific expression of lncRNAs may be consistent with their phase-specific function. For instance, the expression of Metastasis Associated Lung Adenocarcinoma Transcript 1 (MALAT1) is highest during G1/S transition and mitosis, which is consistent with its independent function during these phases16. A known cell cycle-associated lncRNA is the growth arrest-specific transcript 5 (GAS5), which is found downregulated in several cancers where its overexpression results in cell cycle arrest or apoptosis. In prostate cancer GAS5 functions as a tumor suppressor that inhibits proliferation by targeting the CDK inhibitor p2717. Another known cell cycle-associated lncRNA is the zinc finger antisense 1 (ZFAS1). ZFAS1 can act as an oncogene in some cancer types18,19, and as a tumor suppressor in others20, possibly depending on both type of tissue and state of progression.

The function of the majority of lncRNAs is still unknown, as only about 500–1500 have been functionally characterized. As lncRNAs can bind DNA, RNAs, and proteins, they can regulate gene expression at the transcriptional, post-transcriptional, and epigenetic level. They can serve as signaling molecules; as sponges that bind microRNAs (miRNAs) and inhibit miRNA-induced degradation of mRNAs; as guides that recruit transcription regulators; or as scaffolds that bind proteins to regulate gene expression21. As RNA molecules, lncRNAs need physical proximity to their targets to exert their function. Thus, the subcellular localization of lncRNAs provides important information regarding their potential function22. For example, nuclear enriched lncRNAs can act as transcriptional and epigenetic regulators, but they are unlikely to have any coding potential since translation occurs in the cytoplasm. In general, most lncRNAs demonstrate a stronger nuclear localization than mRNAs23. Moreover, because lncRNAs have a higher cell type specificity, targeting them is expected to cause fewer side effects than targeting protein coding genes24.

Some lncRNAs can regulate gene transcription by modulating histone modifications25, although little is known about how lncRNAs are transcriptionally regulated26. Histone modifications such as H3K4me3 and H3K27me3 are considered key epigenetic regulators of transcription. H3K4me3 is a mark of actively transcribed genes and H3K27me3 is associated with silenced genes27. Previous studies have demonstrated that RNA sequencing (RNA-seq) combined with chromatin immunoprecipitation sequencing (ChIP-seq) is useful for detecting transcriptional fluctuations by correlating gene expression with changes in histone modifications28,29. A study by Wan et al. combined RNA-seq with ChIP-seq and identified differential peaks for H3K4me3 and H3K27me3 around the promoter area and at enhancer regions of differentially expressed lncRNAs in an Alzheimer’s disease mouse model compared to control, suggesting that most of these lncRNA genes were transcriptionally regulated by histone modifications30. Since the majority of lncRNAs are spatially and temporally regulated and expressed, ChIP-seq is a sensitive method for capturing these changes by identifying enriched peak regions of histone modifications and other transcriptional regulators26.

Building on our previous work, which identified protein coding genes with tissue-specific cell cycle-dependent expression1, we set out to identify lncRNAs with cell cycle-dependent expression and potential cell cycle functions. By combining RNA-seq and Pol II, H3K4me3, and H3K27me3 ChIP-seq data from synchronized HaCaT cells, we identified genes where expression and ChIP-seq signal were correlated and varied depending on the cell cycle. Genes with high correlation to Pol II or H3K4me3 were strongly enriched for cell cycle functions. From the RNA-seq data we identified 99 lncRNAs with cell cycle-dependent expression profiles; 57 of these had highly correlated Pol II or H3K4me3 signals. We selected four lncRNAs for further functional characterization and showed that knockdown of these lncRNAs affected cell cycle phase distributions and reduced cell proliferation in multiple cell lines.

Results

Total RNA sequencing of HaCaT cells identifies cell cycle genes

Our group previously published a microarray-based study of cell cycle synchronized HaCaT cells identifying a set of genes with cell cycle-dependent expression and strong enrichment for known cell cycle functions1. Although our microarray-based study identified several genes with significant periodic expression patterns during the cell cycle, the microarrays precluded the detection of non-coding RNA (ncRNA) transcripts. To identify ncRNAs that are differentially expressed during the cell cycle, we therefore performed total RNA-seq on synchronized HaCaT cells.

To study cell cycle regulated genes, it is essential to obtain a proper cell synchronization. In two independent experiments (Epi1, Epi2), we used a double thymidine block to arrest and subsequently release HaCaT cells at the G1/S transition, and collected cells every third hour for 24 h, covering approximately two cycles of DNA replication (Fig. 1A,B). Flow cytometry analyses of the cells’ DNA content showed that the distributions of cells through the cell cycle were reproducible between the two independent experiments (Fig. 1A). Upon release from thymidine block, approximately 90% of the cells progressed into S phase and continued through the cell cycle (Fig. 1B). Cells gradually lost synchrony, such that 70% instead of 90% of the cells synchronously re-entered the second S phase. Based on these results, we considered the HaCaT cells to be effectively synchronized and proceeded with gene expression profiling of the two experiments by using total RNA-seq.

Figure 1

Total RNA-seq of synchronized HaCaT cells identifies cyclic gene expression patterns. (A) HaCaT cells were synchronized by double thymidine block in two independent experiments (Epi1, Epi2) and cell synchrony was monitored by flow cytometry of propidium iodide-stained cells. The figure shows superimposed DNA content profiles of the two replicate experiments for each time point. Horizontal axes show DNA content (arbitrary units) and vertical axes show the number of cells with the corresponding DNA content. Control is unsynchronized cells. (B) Percentage of cells assigned to G1, S, and G2/M phases for each of the time points analyzed. Values and error bars are averages and standard deviations (n = 2). (C) Percentage of cell cycle genes assigned to G1/S (21.5%), S (46.3%), G2 (10.1%), G2/M (3.8%), and M/G1 (18.2%) phases. (D) Heatmap showing the expression changes of the cell cycle genes relative to their median expression. Colour bars in the gene margin (y axis) show the genes’ assigned cell cycle phase; blue bars above the time points (x axis) show the time points having the highest percentage of S phase cells. (E) Distribution of the genes’ average RNA-seq expression for cell cycle genes in G1/S (n = 388), S (n = 835), G2 (n = 183), G2/M (n = 68), and M/G1 (n = 329) phases. ****p ≤ 0.0001 (Welch’s t-test, p-values for each phase group against S phase were Bonferroni corrected for multiple testing). (F) RNA-seq profiles for cell cycle genes CCNB1, CCNE2, PCNA, and TOP2A. (G) Relative expression profiles for CCNB1, CCNE2, PCNA, and TOP2A as measured by RT-qPCR in two new biological replicates (Val1, Val2; Figure S3). Spearman's correlation coefficients (ρ) were calculated based on the mean RNA-seq expression per time point (Epi1, Epi2) and mean RT-qPCR fold change per time point (Val1, Val2).

Genes that had a cell cycle profile were identified using PLS regression, as described in1. We identified 1803 genes with significant periodic expression patterns during the HaCaT cell cycle. These genes are referred to as cell cycle genes. Using the Ensembl BioMart tool (human genes dataset GRCh38.p13), we found 1692 protein coding genes and 108 ncRNAs (Supplementary Dataset 1). Three genes were not detected in BioMart. We used existing annotations2 to subdivide the genes into five main groups that represent G1/S, S, G2, G2/M, and M/G1 phases of the cell cycle. Of the 1803 cell cycle genes, 835 were assigned to S phase, whereas only 68 were assigned to G2/M phase (Fig. 1C); the genes’ cyclic expression patterns are presented in Fig. 1D. Examining the RNA-seq expression for the 1803 cell cycle genes by their different cell cycle phases, we observed a significant difference in gene expression between different phases (analysis of variance (ANOVA) p-value < 2e−16) and that S phase genes had a significantly lower expression level than the genes expressed in the other phases (Fig. 1E). This was in accordance with our previous findings1. When examining the genes with comparable expression levels to the cell cycle genes (that is, mean expression ≥ the cell cycle gene with lowest mean expression) but no significant cell cycle-dependent expression pattern (!CC genes), we found that these genes had a bimodal distribution of expression levels (Figure S1). We therefore divided this group of genes into genes with high expression level (!CC_high, n = 9259) and genes with low expression level (!CC_low, n = 3003) as reference sets for comparison against the cell cycle genes in further analysis.
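As a quick consistency check on the phase assignment, the per-phase gene counts given in the Fig. 1 legend can be converted back into the percentages reported for Fig. 1C. The sketch below only redoes that arithmetic; the function name is illustrative and not part of the original analysis.

```python
# Gene counts per assigned phase, as reported in the Fig. 1 legend.
phase_counts = {"G1/S": 388, "S": 835, "G2": 183, "G2/M": 68, "M/G1": 329}


def phase_percentages(counts):
    """Return each phase's share of all cell cycle genes, in percent (one decimal)."""
    total = sum(counts.values())
    return {phase: round(100 * n / total, 1) for phase, n in counts.items()}


total_genes = sum(phase_counts.values())
percentages = phase_percentages(phase_counts)

print(total_genes)  # 1803
for phase, pct in percentages.items():
    print(f"{phase}: {pct}%")
```

The counts sum to the 1803 cell cycle genes, and the rounded shares match the percentages stated in the legend.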

To further examine cell cycle profiles at the single gene level, we selected four genes with cell cycle-specific expression patterns and known functions in the cell cycle: Proliferating cell nuclear antigen (PCNA), Cyclin E2 (CCNE2), DNA topoisomerase 2-alpha (TOP2A), and Cyclin B1 (CCNB1). PCNA is essential for DNA replication whereas CCNE2 plays a role in the G1/S transition of the cell cycle. Thus, both PCNA and CCNE2 are G1/S-specific genes. TOP2A is involved in processes such as chromosome condensation and chromatid separation whereas CCNB1 is a regulatory protein involved in mitosis. Thus, both TOP2A and CCNB1 are G2/M-specific genes. For all four genes, our RNA-seq data corresponded with their previously reported cell cycle-dependent expression (Fig. 1F). Technical validation by RT-qPCR confirmed the RNA-seq expression patterns (Figure S2). As a biological validation of the RNA-seq data, we did two new double thymidine block cell cycle synchronization experiments (Val1, Val2) in HaCaT cells (Figure S3). Analyses by RT-qPCR showed a high correlation between the original RNA-seq profiles and the new validation experiments (Fig. 1G). Thus, both the biological and technical validation indicated that gene expression profiling by total RNA-seq of HaCaT cells could successfully identify cell cycle genes.

ChIP sequencing maps dynamic transcriptional responses in HaCaT cell cycle

By gene expression profiling we identified a set of genes with cell cycle-dependent expression patterns in HaCaT cells. We wanted to further characterize the dynamic transcriptional response of these genes during the cell cycle. Specifically, we asked whether the genes’ expression changes were accompanied by similar cell cycle-dependent changes in Pol II occupancy at the genes’ transcription start sites (TSSs). Moreover, we asked whether the histone modifications H3K4me3, associated with actively transcribed genes, and H3K27me3, associated with silenced genes, also changed dynamically with gene expression changes. Using ChIP-seq we measured and quantified genome-wide Pol II, H3K4me3, and H3K27me3 occupancy during two independent synchronization experiments (Epi1, Epi2).

First, we performed a sanity check of our sequencing data, by correlating gene expression levels with histone modification patterns and Pol II occupancy in TSS regions for all genes expressed in our RNA-seq data. As Pol II and H3K4me3 both are linked to actively transcribed genes, we expected a positive correlation with gene expression. In contrast, we expected a negative correlation with gene expression for the H3K27me3 modification. Indeed, we found that enhanced Pol II and H3K4me3 signals correlated well with highly expressed genes (Spearman's correlation coefficient ρ = 0.61 and ρ = 0.64, respectively), whereas there was a negative correlation between gene expression and H3K27me3 signal (ρ = − 0.34) (Fig. 2A–C). Similarly, dividing the genes into quantiles based on their RNA-seq expression showed that highly expressed genes had the highest Pol II and H3K4me3 signals, whereas genes that were expressed at a low level or not expressed, had low levels of Pol II and H3K4me3 (Figure S4). The opposite was observed for H3K27me3, where genes that were not expressed or expressed at a low level had the highest H3K27me3 signal.

Figure 2

ChIP-seq of synchronized HaCaT cells identifies dynamic H3K4me3, H3K27me3, and Pol II changes during cell cycle. (A–C) Average RNA-seq expression (genes expressed > 0) plotted against average (A) Pol II (n = 30,744), (B) H3K4me3 (n = 30,729), and (C) H3K27me3 (n = 30,732) ChIP-seq signal normalized against input. Values are Spearman's correlation coefficients (ρ). (D) Percentage of cell cycle gene promoters containing Pol II, H3K4me3 (K4), or H3K27me3 (K27) marks and combined Pol II/K4, Pol II/K27, or bivalent K4/K27 marks. (E) The UCSC Genome browser (GRCh38/hg38) view of RNA-seq, Pol II, H3K4me3, H3K27me3, and input ChIP-seq data at the CCNB1 (NM_031966) gene locus. (F) Pol II ChIP-seq profiles for cell cycle genes CCNB1, CCNE2, PCNA, and TOP2A. (G) Biological validation of Pol II ChIP-seq data for CCNB1, CCNE2, PCNA, and TOP2A. Spearman's correlation coefficients (ρ) were calculated based on the mean Pol II ChIP-seq signal per time point (Epi1, Epi2) and mean Pol II ChIP-qPCR fold change per time point from the new biological replicates (Val1, Val2; Figure S3). (H–J) Distribution of the genes’ average (H) Pol II (n = 1737), (I) H3K4me3 (n = 1726), and (J) H3K27me3 (n = 802) ChIP-seq signal (normalized against input) for cell cycle genes in G1/S, S, G2, G2/M, and M/G1 phases. ns p > 0.05, *p ≤ 0.05; **p ≤ 0.01, ***p ≤ 0.001, ****p ≤ 0.0001 (Welch’s t-test, p-values for each phase group against S phase were Bonferroni corrected for multiple testing).

When investigating the Pol II signal, we observed low signal at time 12 h in the Epi2 experiment (Figure S4), but western blot analysis of the global Pol II protein level throughout the cell cycle indicated no differences in Pol II protein levels between different time points in the cell cycle or between the two independent synchronization experiments (Figure S5A). Therefore, we concluded that this was not a biological but a technical issue and decided to exclude the sample (12 h, Epi2) in the following analysis. We also investigated global H3K4me3 and H3K27me3 modification levels and found no changes during cell cycle or between the two independent synchronization experiments (Figure S5B).

As cell cycle control is a central housekeeping function, we expected the cell cycle genes we identified by RNA-seq to be enriched for H3K4me3 within their promoters, as this epigenetic mark is associated with actively transcribed genes. We also expected the genes to be enriched for Pol II indicating active promoters. Indeed, 96% of the cell cycle genes had promoter regions containing H3K4me3 or Pol II marks. Consistent with these genes being highly expressed, only 45% of the cell cycle gene promoters contained H3K27me3 marks (Fig. 2D). In 95% of the gene promoters we found combined Pol II/H3K4me3 marks, whereas 43% of the gene promoters contained combined Pol II/H3K27me3 or bivalent H3K4me3/H3K27me3 marks.

To further inspect the data quality, we visualized RNA-seq and ChIP-seq data in the UCSC Genome browser. CCNB1 was one of the well-known cell cycle genes we identified, and we observed RNA-seq reads for all exons and Pol II ChIP-seq signals were captured around the TSS (Fig. 2E). Additionally, H3K4me3 was highly enriched at this active promoter near TSS, whereas H3K27me3 signals were at the level of the background signal (input ChIP-seq). In contrast, Myelin transcription factor 1 (MYT1) is a transcription factor involved in the development of the nervous system, and this gene was not expressed in our HaCaT RNA-seq data (Figure S6). The H3K27me3 histone modification was captured at the MYT1 locus, consistent with this gene not being expressed. Pol II and H3K4me3 signals were observed, but not higher than the level of the background signal.

To characterize the cell cycle ChIP-seq profiles, we first focused on the four cell cycle genes, CCNB1, CCNE2, PCNA, and TOP2A. The Pol II ChIP-seq profiles for CCNB1 and TOP2A indicated that these genes had the highest Pol II signal and active transcription at 9 and 24 h (Fig. 2F), which is consistent with both genes having increased expression in the G2/M phase of the cell cycle (Fig. 1F). Additionally, H3K4me3 signals for CCNB1 and TOP2A showed the same cyclic patterns peaking in G2/M phase (Figure S7A). CCNE2 and PCNA are both G1/S-specific genes, but only CCNE2 showed the expected Pol II (Fig. 2F) and H3K4me3 (Figure S7A) profiles peaking at 0–3 and 15–18 h. As PCNA displayed inconsistent RNA-seq and ChIP-seq profiles, we inspected the sequencing data for the PCNA locus in the UCSC Genome browser (Figure S8). Interestingly, PCNA has two different transcript variants whose TSSs are separated by 6,664 bp. As illustrated in Figure S8, RNA-seq reads were from the short transcript variant (NM_182649) with a cyclic expression pattern peaking in the G1/S phase. In the ChIP-seq data analysis, the longest transcript variant is always selected (see “Methods”). Thus, the ChIP-seq signals we observed for PCNA were the weak signals from the longest transcript variant (NM_002592). This explains the inconsistency between RNA-seq and ChIP-seq data at the PCNA locus, and could also be the case for other genes with multiple TSSs separated by more than 5000 bp. The H3K27me3 ChIP-seq profiles for CCNB1, CCNE2, PCNA, and TOP2A showed negative H3K27me3 signal values, which are consistent with these genes being actively transcribed and highly expressed (Figure S7B).

To validate the Pol II ChIP-seq data for CCNB1, CCNE2, PCNA, and TOP2A we did Pol II ChIP-qPCR from the two new cell cycle synchronization experiments (Val1, Val2) in HaCaT cells. We designed PCR primers against the TSS region for these genes (Table S1); for PCNA we designed primers against the short transcript variant (NM_182649). For CCNB1, TOP2A, and CCNE2 we found good correspondence between the original Pol II ChIP-seq experiments and the Pol II ChIP-qPCR from the new validation experiments (Fig. 2G). Moreover, the Pol II ChIP-qPCR profile for the short PCNA transcript showed increased expression in the G1/S phase (Fig. 2G), which is consistent with the RNA-seq (Fig. 1F) and RT-qPCR (Fig. 1G) expression profiles for PCNA peaking in G1/S phase (Figure S9).

Since we observed a significant difference in gene expression between the S phase genes and the genes upregulated in the other cell cycle phases (Fig. 1E), we wanted to examine the distribution of Pol II, H3K4me3, and H3K27me3 ChIP-seq signals during cell cycle. We included cell cycle genes with positive ChIP-signal in at least two time points of the cell cycle. As for gene expression, we observed a significant difference in both Pol II and H3K4me3 signals between different cell cycle phases, with ANOVA p-values of 7.6e−11 and 1.6e−10, respectively. And as for gene expression, the S phase genes had significantly lower Pol II and H3K4me3 signals than the genes expressed in the other phases (Fig. 2H,I), with the exception of H3K4me3 signal for genes in G2 phase, which showed no significant difference from S phase genes. As expected, genes with high expression (!CC_high) showed significantly higher levels of Pol II and H3K4me3 signals than genes with low expression (!CC_low). There were no significant differences in H3K27me3 signal between different cell cycle phases, and as expected, !CC_low genes had significantly higher levels of H3K27me3 signal than !CC_high genes (Fig. 2J).
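The phase comparisons above rely on Welch's t-test with Bonferroni correction. The following is a minimal, standard-library sketch of the two ingredients: the Welch statistic with its Welch-Satterthwaite degrees of freedom, and the Bonferroni adjustment of a set of p-values. The signal values and p-values are hypothetical, and converting the statistic to a p-value (which needs the t distribution, available for instance through SciPy's `scipy.stats.ttest_ind` with `equal_var=False`) is deliberately left out.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups may have unequal variances.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical ChIP-seq signals for two phase groups (not data from this study).
s_phase = [1.0, 2.0, 3.0, 4.0, 5.0]
g2m_phase = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(s_phase, g2m_phase)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")

# Four comparisons against S phase (G1/S, G2, G2/M, M/G1); raw p-values are made up.
print(bonferroni([0.01, 0.04, 0.5, 0.2]))
```

With four phase groups each compared against S phase, the correction factor is four, which is why a raw p-value must be below 0.0125 to remain significant at the 0.05 level.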

A set of cell cycle genes is highly correlated with Pol II and H3K4me3 changes and has strong enrichment for cell cycle functions

Having quantified gene expression and mapped histone modification patterns together with Pol II occupancy during the cell cycle, we did an integrated bioinformatic analysis of RNA-seq and ChIP-seq data for the cell cycle genes. Specifically, we asked to what extent the genes’ RNA-seq expression through the cell cycle was correlated with their ChIP-seq data and whether such correlation patterns were related to the genes’ functional role in the cell cycle. As a reference, we included the genes without cell cycle-dependent expression profiles, divided by their gene expression levels (!CC_high, !CC_low; Figure S1).

We found that all three gene sets (cell cycle (CC) genes, !CC_high, and !CC_low) on average showed a significant positive correlation for RNA-seq expression against Pol II ChIP-seq signal (Fig. 3A). Importantly, CC genes showed a significantly higher correlation than !CC genes. The cell cycle genes CCNB1 (Spearman’s ρ = 0.47), CCNE2 (ρ = 0.74), and TOP2A (ρ = 0.78) are examples of genes with high correlation values. We also found that CC genes and !CC_high genes, but not !CC_low genes, showed a significant positive correlation for RNA-seq expression against H3K4me3 ChIP-seq signal (Fig. 3B). Again, CC genes showed a significantly higher correlation than !CC genes. Correlation values for CCNB1, CCNE2, and TOP2A were ρ = 0.74, ρ = 0.24, and ρ = 0.58, respectively. There were no significant differences between the three gene sets for the H3K27me3 modification (Figure S10).
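The per-gene correlation used here compares a gene's RNA-seq profile with its ChIP-seq profile across the time points of the synchronization series. The following is a minimal sketch of Spearman's ρ in plain Python, computed as the Pearson correlation of tie-averaged ranks. The two profiles are hypothetical values for nine time points and are not data from this study; the helper names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical profiles for one gene across nine time points.
expression = [5.1, 4.8, 6.0, 9.2, 8.7, 5.5, 5.0, 6.3, 9.0]
pol2_signal = [1.2, 1.0, 1.9, 3.1, 2.2, 1.4, 1.3, 1.8, 3.3]
print(f"rho = {spearman(expression, pol2_signal):.2f}")
```

Because the measure depends only on ranks, it is insensitive to the very different scales of RNA-seq expression and input-normalized ChIP-seq signal.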

Figure 3

A set of cell cycle genes correlate with Pol II and H3K4me3 signals. (A,B) Cumulative distributions of Spearman's correlation between RNA-seq expression and ChIP-seq signal through the cell cycle for the cell cycle (CC) genes, and other highly (!CC_high) and lowly (!CC_low) expressed genes. Marked on the x axis are the correlation values for CCNB1, CCNE2, and TOP2A. (A; Pol II) All three gene sets had a significant (p-value < 2.2e−16) positive correlation for RNA-seq against Pol II ChIP-seq. CC genes showed significantly higher correlation than !CC_high (p = 1.49e−11) and !CC_low (p-value < 2.2e−16). (B; H3K4me3) CC genes and !CC_high had a significant positive correlation for RNA-seq against H3K4me3 ChIP-seq with p-values of 3.83e−15 and < 2.2e−16, respectively. P-value for the !CC_low gene set was not significant (p = 0.2287). CC genes showed significantly higher correlation than !CC_high (p = 0.0003859) and !CC_low (p = 1.002e−08). Significant differences were determined by Student's t-test (unpaired, two-tailed) assuming unequal variances. (C) GO analysis for cell cycle genes. The results show GO biological process (BP), cellular component (CC), molecular functions (MF) terms, and KEGG and REACTOME pathways significantly enriched (p-values < 0.05) for cell cycle genes divided in five groups (Figure S11). (D) Cell cycle genes are divided into four groups based on the arrangement of TSSs and other nearby genes. Genes in group 1 (n = 127) have one single TSS and a TSS from at least one other gene within 10 kb. Genes in group 2 (n = 61) have one single TSS and no TSSs from other genes within 10 kb. Genes in group 3 (n = 356) have multiple TSSs all within 1 kb and a TSS from at least one other gene within 10 kb. Genes in group 4 (n = 152) have multiple TSSs all within 1 kb and no TSSs from other genes within 10 kb. 
(E) Odds ratios from Fisher's exact tests comparing the fraction of highly correlated genes in the groups (D) with all cell cycle genes with Pol II signals (n = 1735). Group 1 p = 0.006, group 2 p = 0.026, group 3 p = 0.028, and group 4 p = 0.041.
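The odds ratios in panel E come from Fisher's exact tests on 2×2 tables. As an illustration of what such a test computes, here is a standard-library sketch that returns the sample odds ratio and the two-sided p-value (summing all tables with the same margins that are no more probable than the observed one). The counts are hypothetical, and how the published tables were constructed (group versus all remaining genes) is an assumption here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so tables tied with the observed one are included.
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# Hypothetical counts: rows are (genes in the group, all other genes),
# columns are (highly correlated, not highly correlated).
odds, p = fisher_exact_two_sided(40, 21, 783, 891)
print(f"odds ratio = {odds:.2f}, p = {p:.3g}")
```

An odds ratio above 1 indicates that highly correlated genes are over-represented in the group relative to the comparison set.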

Further, we divided the cell cycle genes into high (ρ > 0.2), low (ρ < − 0.2), and middle (ρ > − 0.2 and ρ < 0.2) correlated genes, based on the genes’ Spearman correlation value. We did a gene ontology (GO) analysis for five different groups (Figure S11): high correlation for both Pol II and H3K4me3 (n = 400), high correlation for Pol II only (n = 423), high correlation for H3K4me3 only (n = 186), middle correlation for Pol II or H3K4me3 (n = 580), and low correlation for both Pol II and H3K4me3 (n = 125). We identified non-redundant terms among the top 20 significant terms within each GO category. The results showed that genes with high correlation for Pol II and H3K4me3 were specifically enriched for cell cycle-related terms, including cell cycle regulation, DNA replication, and nuclear division (Fig. 3C, Supplementary Dataset 2). Moreover, middle correlated genes were specifically enriched for cell signalling, including p53 signalling, whereas genes with low correlation were specifically enriched for initiation of replication. For H3K27me3, genes with high or middle correlation were weakly enriched for cell cycle functions whereas genes with low correlations were weakly enriched for the ribosome biogenesis pathway (Figure S12; Supplementary Dataset 3).
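The grouping by correlation level can be written as a small decision rule. The sketch below uses the ±0.2 cutoffs from the text; everything else is an assumption made for illustration: the group labels, the treatment of values exactly at ±0.2 as "middle", and the assignment of every remaining combination to the middle group. The example values are the Pol II and H3K4me3 correlations reported above for CCNB1, CCNE2, and TOP2A.

```python
def level(rho, cutoff=0.2):
    """Classify one correlation value as high, low, or middle."""
    if rho > cutoff:
        return "high"
    if rho < -cutoff:
        return "low"
    return "middle"  # includes the boundary values, by assumption


def correlation_group(rho_pol2, rho_k4):
    """Assign a gene to one of five groups from its Pol II and H3K4me3 correlations."""
    p, k = level(rho_pol2), level(rho_k4)
    if p == "high" and k == "high":
        return "high_both"
    if p == "high":
        return "high_pol2_only"
    if k == "high":
        return "high_k4_only"
    if p == "low" and k == "low":
        return "low_both"
    return "middle"  # all remaining combinations, by assumption


# (Pol II rho, H3K4me3 rho) as reported in the text.
reported = {"CCNB1": (0.47, 0.74), "CCNE2": (0.74, 0.24), "TOP2A": (0.78, 0.58)}
groups = {gene: correlation_group(*rhos) for gene, rhos in reported.items()}
print(groups)
```

Under this rule all three example genes fall in the group with high correlation for both Pol II and H3K4me3.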

The combined RNA-seq and ChIP-seq analysis indicated distinct functions for cell cycle genes based on their correlation with Pol II and H3K4me3 signals, as highly correlated genes had strong enrichment for cell cycle functions. Nevertheless, we wondered to what extent the results were affected by ambiguous mapping of ChIP-seq signals. Specifically, we wondered if the correlation was affected by whether the gene had one or multiple annotated TSSs and by whether the gene was isolated or had annotated TSSs for neighboring genes within its TSS region. Thus, we divided the cell cycle genes into four different groups based on the arrangement of TSSs and other nearby genes (Fig. 3D). Genes in group 1 had one single TSS and other genes within 10 kb, whereas genes in group 2 had one single TSS and no other genes within 10 kb. Genes in group 3 had multiple TSSs all within 1 kb and other genes within 10 kb, whereas genes in group 4 had multiple TSSs all within 1 kb and no other genes within 10 kb. Compared with all cell cycle genes (CC), a larger fraction of genes with one TSS (groups 1, 2) were highly correlated with their Pol II signal (Fig. 3E); for genes in groups 3 and 4, the fractions were between those of all CC genes and single TSS genes. Moreover, for genes with clearly defined TSS areas and with no overlap with other genes (group 2 compared to 1 and group 4 compared to 3), there was a slightly larger fraction of highly correlated genes. We found no such differences for H3K4me3 and H3K27me3 modifications (Figures S13 and S14), possibly because these signals on average cover wider genomic regions than do Pol II.
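The four TSS arrangement groups can be expressed as a short classification function. This is a sketch under stated assumptions, not the procedure used in the study: coordinates are plain base-pair positions on one chromosome, "all within 1 kb" is read as the span between a gene's outermost TSSs, a neighboring gene counts if any of its TSSs lies within 10 kb of any TSS of the gene, and genes whose own TSSs spread wider than 1 kb are left unassigned. All positions in the example are hypothetical, apart from the 6,664 bp separation reported for PCNA.

```python
def tss_group(own_tss, other_tss, cluster_span=1_000, neighbour_window=10_000):
    """Return the TSS arrangement group (1-4), or None if the gene fits none.

    own_tss:   positions of the gene's annotated TSSs (bp)
    other_tss: TSS positions of other genes on the same chromosome (bp)
    """
    if len(own_tss) > 1 and max(own_tss) - min(own_tss) > cluster_span:
        return None  # multiple TSSs spread over more than 1 kb
    has_neighbour = any(
        min(abs(pos - t) for t in own_tss) <= neighbour_window
        for pos in other_tss
    )
    if len(own_tss) == 1:
        return 1 if has_neighbour else 2
    return 3 if has_neighbour else 4


print(tss_group([50_000], [55_000]))          # single TSS, neighbor within 10 kb
print(tss_group([50_000], [200_000]))         # single TSS, isolated
print(tss_group([50_000, 50_400], [58_000]))  # clustered TSSs, neighbor within 10 kb
print(tss_group([50_000, 50_400], []))        # clustered TSSs, isolated
print(tss_group([0, 6_664], []))              # PCNA-like separation: unassigned
```

The unassigned case matters for interpretation: a gene such as PCNA, with TSSs several kilobases apart, is exactly where ChIP-seq signal at one annotated TSS may not reflect the transcript that is actually expressed.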

Thus, whereas ambiguous gene annotations could explain low correlation levels for some genes with cell cycle functions, including PCNA, genes with cell cycle-dependent expression and correlated Pol II or H3K4me3 changes were strongly enriched for known cell cycle functions. This result suggested that these highly correlated genes are prime candidates for functional follow-up, so we focused on lncRNAs among these genes.

Combined RNA-seq and ChIP-seq analysis identifies cell cycle-associated lncRNAs

By combining RNA profiling with ChIP-seq data, we identified 97 cell cycle lncRNAs (Table S2). Similar to a set of 57 well-described cell cycle genes2, these lncRNAs had higher expression in proliferating tissues than in non-proliferating tissues (Fig. 4A), further supporting their potential role in proliferation. Thus, the combinational use of RNA-seq and ChIP-seq data enabled us to focus on lncRNAs with strong enrichment for cell cycle functions.

Figure 4

Joint RNA-seq and ChIP-seq analysis identifies lncRNA affecting cell growth and cell cycle progression. (A) Relative tissue expression (log transcript per kilobase million) of known cell cycle genes2 and our cell cycle lncRNAs in selected tissues from the Genotype-Tissue Expression (GTEx) project. Gene expression values were normalized to relative expression values by subtracting the gene’s average expression across all GTEx tissues. (B–E) The genomic loci of SNHG26, EMSLR, ZFAS1, and EPB41L4A-AS1 from the Ensembl Genome Browser (http://www.ensembl.org/). (F) RNA-seq profiles for SNHG26, EMSLR, EPB41L4A-AS1, and ZFAS1. (G) Relative expression profiles for SNHG26, EMSLR, EPB41L4A-AS1, and ZFAS1 as measured by RT-qPCR in two new biological replicates. Spearman's correlation coefficients (ρ) were calculated based on the mean RNA-seq expression per time point (Epi1, Epi2) and mean RT-qPCR fold change per time point (Val1, Val2). (H) Effect of siRNA-mediated knockdown of SNHG26 (siRNA A1; siRNA A2 for A549), EMSLR (siRNA R1), EPB41L4A-AS1 (siRNA E1), and ZFAS1 (siRNA Z1) on proliferation in four different cell lines. Data are the number of cells following siRNA treatment relative to control-treated cells (percentage of control) as measured by cell counting. Bars and error bars are mean and SEM of three or more independent replicates. *p ≤ 0.05; **p ≤ 0.01 (Welch’s t-test, Bonferroni corrected for multiple testing). ANOVA p-values from the hierarchical, linear model: SNHG26 p = 6.9e−4, EMSLR p = 3.92e−4, EPB41L4A-AS1 p = 0.09, and ZFAS1 p = 0.371. (I) The distribution of cells in G1, S, and G2/M phases in response to knockdown of SNHG26 (siRNA A1), EMSLR (siRNA R1), EPB41L4A-AS1 (siRNA E1), and ZFAS1 (siRNA Z1) in HaCaT cells. Data are the difference in percentages of G1, S, and G2/M cells of siRNA-treated HaCaT cells to those of control-treated HaCaT cells. Bars and error bars are mean and SEM of three independent replicates. *p ≤ 0.05; **p ≤ 0.01 (Welch’s t-test, Bonferroni corrected for multiple testing).
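The normalization described for panel A amounts to centering each gene's log expression on its mean across tissues. A minimal sketch follows; the tissue names and values are hypothetical, and the function name is illustrative.

```python
def relative_expression(log_expr_by_tissue):
    """Subtract a gene's mean log expression across tissues from each tissue value."""
    avg = sum(log_expr_by_tissue.values()) / len(log_expr_by_tissue)
    return {tissue: value - avg for tissue, value in log_expr_by_tissue.items()}


# Hypothetical log expression values for one gene in three tissues.
gene = {"skin": 5.0, "brain": 1.0, "testis": 6.0}
print(relative_expression(gene))  # {'skin': 1.0, 'brain': -3.0, 'testis': 2.0}
```

After centering, positive values mark tissues where the gene is expressed above its own average, which makes genes with very different absolute expression levels comparable in one plot.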

We selected four lncRNAs for further functional characterization. The two lncRNA candidates Small nucleolar RNA host gene 26 (SNHG26) and E2F1 mRNA stabilizing lncRNA (EMSLR) were chosen as they had high correlation for Pol II and H3K4me3. SNHG26 is located between the genes TOMM7 and FAM126A on chromosome 7 (Fig. 4B), while EMSLR is located on the same chromosome between the genes FIS1 and IFT22 (Fig. 4C). The lncRNA ZNFX1 Antisense RNA 1 (ZFAS1) was identified among the candidates, and since it has previously been reported to affect proliferation in different types of cancer31, we chose this lncRNA as a candidate. ZFAS1 is located between the genes DDX27 and ZNFX1 on chromosome 20 (Fig. 4D). In addition, we chose EPB41L4A Antisense RNA 1 (EPB41L4A-AS1) as a candidate, as this lncRNA had ChIP-seq characteristics similar to those of ZFAS1 (Figure S15, Table S2). EPB41L4A-AS1 is located between the genes NREP and EPB41L4A on chromosome 5 (Fig. 4E). Of these neighboring genes, FAM126A, IFT22, and ZNFX1 had cell cycle-dependent expression (Figure S16). Tissue-specific expression for the four lncRNA candidates confirmed that all were highly expressed in proliferating tissues (Figure S17). All candidates were predicted to have a low protein coding probability, as assessed by the Coding Potential Assessment Tool32 (Table S3).

Since lncRNAs depend on proximity to their target molecules to exert their function, their subcellular localization can provide important information about their biological role. We used the lncAtlas online tool (https://lncatlas.crg.eu/) to determine the subcellular localization of the four candidate lncRNAs. The lncAtlas displays the subcellular localization of lncRNAs based on a relative concentration index (RCI = log2-transformed ratio of a gene's concentration, per unit mass of RNA, between cytoplasm and nucleus) derived from RNA-seq data sets from available cell lines and cellular compartments from Human Gencode v2433. Based on RCI from several cell lines, each of the four lncRNA candidates was mainly localized in either the nucleus or the cytoplasm. SNHG26 was enriched in the nucleus with an average RCI of −2.03, whereas EMSLR, EPB41L4A-AS1, and ZFAS1 had the highest levels in the cytoplasm with average RCIs of 1.82, 1.12, and 1.43, respectively (Figure S18).
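The sign convention of the RCI can be illustrated with a minimal sketch. It assumes the index is the log2 ratio of cytoplasmic to nuclear abundance, so that negative values indicate nuclear enrichment and positive values cytoplasmic enrichment. The function name and the abundance values below are hypothetical and are not taken from lncAtlas.

```python
import math


def relative_concentration_index(cyto_abundance: float, nuc_abundance: float) -> float:
    """Return log2(cytoplasm / nucleus); negative values mean nuclear enrichment.

    Assumption: both inputs are expression estimates (e.g. FPKM) measured
    per unit mass of RNA in each compartment.
    """
    if cyto_abundance <= 0 or nuc_abundance <= 0:
        raise ValueError("both abundances must be positive")
    return math.log2(cyto_abundance / nuc_abundance)


# Hypothetical abundances chosen only to show the sign convention.
nuclear_example = relative_concentration_index(cyto_abundance=1.0, nuc_abundance=4.0)
cytoplasmic_example = relative_concentration_index(cyto_abundance=8.0, nuc_abundance=2.0)
```

Under this convention, an average RCI of about −2 corresponds to roughly four-fold higher concentration in the nucleus than in the cytoplasm.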

The RNA-seq data showed that all four lncRNA candidates had cell cycle-dependent expression profiles that peaked 12 h after release (Fig. 4F). At 12 h, the majority of the cells were in G1 phase (Fig. 1B), but when taking into account each gene’s overall profile compared with those of known cell cycle genes, the genes were assigned to G2/M (SNHG26), M/G1 (EMSLR, EPB41L4A-AS1), and G1/S (ZFAS1). These profiles were confirmed by technical validation by RT-qPCR (Figure S19), but only the profiles for SNHG26 and EMSLR could be validated in the two new independent double thymidine block synchronization experiments (Val1, Val2) in HaCaT cells (Fig. 4G). The phase-dependent expression profile of SNHG26 was further confirmed by FACS analysis of HaCaT cells, showing a significantly reduced expression in S phase compared to G1 phase (Figure S20). Whereas EMSLR and EPB41L4A-AS1 also showed reduced expression in S phase, these differences were not significant (p-values = 0.054 and 0.074, respectively).

To explore the functional role of the candidate lncRNAs in the cell cycle, we used siRNAs to knock down each candidate (Figure S21) and cell counting to evaluate the effect of knockdown on proliferation. We tested four different cell lines (HaCaT, A549, LS411N, and DLD1) and used a hierarchical, linear model and ANOVA to calculate percent growth inhibition across the four cell lines, assuming a random effect for each cell line (Fig. 4H). The growth of all four cell lines was significantly affected by SNHG26 knockdown, with an average growth inhibition of 35% (Fig. 4H). Because antisense oligos (ASOs) are reportedly more effective than siRNAs for nuclear-localized transcripts, we used ASO-mediated knockdown to confirm the overall growth inhibitory effect of SNHG26 knockdown (average 55%; Figure S22). Proliferation was also significantly reduced in all cell lines in response to siRNA-mediated knockdown of EMSLR, with average growth inhibition of 28% and 37% for two independent siRNAs (Figs. 4H and S22). Meanwhile, the overall growth reduction across all four cell lines in response to EPB41L4A-AS1 and ZFAS1 knockdown was not significant. However, knockdown of EPB41L4A-AS1 resulted in a significant growth reduction in HaCaT, A549, and DLD1 cell lines (average 31%; p-value = 0.045; Fig. 4H), whereas the growth of LS411N cells was not affected. An independent siRNA gave similar results, although the growth reduction in HaCaT, A549, and DLD1 was not significant (p-value = 0.070; Figure S22). In response to ZFAS1 knockdown, the growth of the cancerous cell lines A549, LS411N, and DLD1 was significantly reduced by an average of 19% (p-value = 0.0054; Fig. 4H), which was validated using another siRNA (average 20%; p-value = 0.0016; Figure S22). Notably, both ZFAS1 siRNAs gave a slight growth increase in HaCaT cells (20% and 6%; Figs. 4H and S22).

Finally, we investigated the distribution of HaCaT cells in different cell cycle phases in response to siRNA-mediated knockdown of the candidate lncRNAs (Figs. 4I, S23). Whereas the effects of individual siRNAs varied somewhat per phase, the overall patterns of increased and decreased percentages of cells per phase were largely consistent between the two independent siRNAs. Specifically, ANOVA showed that SNHG26 knockdown decreased the percentage of cells in G1 and increased the percentage in G2/M; EMSLR knockdown increased G1 and decreased S; EPB41L4A-AS1 knockdown decreased G1 and increased G2/M; and ZFAS1 knockdown decreased G1 and increased G2/M (Figure S23). In addition, as SNHG26 was mainly enriched in the nucleus (Figure S18), we used ASO-mediated knockdown (Figure S24) and confirmed the significant reduction of cells in the G1 phase and the significant enrichment of cells in the G2/M phase (Figure S25).

Discussion

Whereas many lncRNAs have tissue-specific expression, about 11% of lncRNAs are ubiquitously expressed, suggesting an involvement in cellular functions generally necessary for normal growth and development34,35. By sequencing total RNA through the cell cycle we identified 99 lncRNAs with cell cycle-dependent gene expression. Of these, 57 lncRNAs were highly correlated with changes in Pol II or H3K4me3 occupancy at their annotated TSS as measured by ChIP-seq, supporting that these lncRNAs are transcribed in a cell cycle-dependent manner and thereby are likely to have roles in cell proliferation. Indeed, protein coding genes with similar cell cycle-dependent expression and correlated Pol II changes were strongly enriched for cell cycle functions.

We have previously shown that a subset of protein coding cell cycle genes is cell type-specific in expression and function1. Due to the high cell type specificity of many lncRNAs, we therefore expect that some of the lncRNAs identified as cell cycle-associated in HaCaT cells will differ somewhat in other cell types. Nevertheless, similar to protein coding genes with known cell cycle functions, the cell cycle lncRNAs had increased expression in samples from proliferating compared with non-proliferating tissues, suggesting that many of these lncRNAs will have roles in cell proliferation in multiple cell types. Indeed, several lncRNAs that are commonly dysregulated in cancer and that have known cell cycle functions, such as GAS5, ZFAS1, LINC00963, DANCR, and MALAT120,36,37,38,39, were among the 99 lncRNAs with a cyclic expression profile. Moreover, three of the top five lncRNAs (Table S2) have already been connected to proliferation and cancer in at least one functional study: CTD-2555C10.340, SNHG1641, and SNHG2642. Of these, SNHG16 is the best characterized and is often found overexpressed in cancers where it is associated with poor prognosis43,44. Cao et al.41 demonstrated that SNHG16 increases proliferation in bladder cancer by epigenetic silencing of p21, a potent CDK inhibitor with several functions in the cell cycle45.

To further investigate the cell cycle-associated lncRNAs, we used siRNAs to down-regulate four candidates—SNHG26, EMSLR, EPB41L4A-AS1, and ZFAS1—and evaluated the resulting effects on cell proliferation in four cell lines (HaCaT, A549, DLD1, and LS411N) and cell cycle progression in HaCaT.

SNHG26 was the top cell cycle-associated lncRNA candidate, based on its expression and corresponding correlation to Pol II and H3K4me3 ChIP-seq signals. Downregulation of SNHG26 led to growth inhibition in all four cell lines and affected the cell cycle distribution of HaCaT cells by increasing the number of cells in G2/M by more than 10 percentage points. Thus SNHG26 seems to be necessary for normal G2/M progression, which is in line with the gene’s RNA-seq expression profile, which was most similar to those of known G2/M phase genes. SNHG26 belongs to a group of lncRNAs called small nucleolar RNA host genes (SNHGs) that are often found upregulated in cancers, and that have oncogenic functions connected to proliferation and cell cycle progression. Zimpta et al.44 reviewed the role of SNHGs focusing on oncogenic properties and potential clinical applications. Overexpression of SNHGs is often correlated with progression and lower overall survival in several cancers, including lung, stomach, bone, esophagus, liver, brain, and colon. Moreover, in line with our results, the oncogenic properties of SNHGs can be effectively impaired by downregulating their expression using RNA interference44. SNHG26 has been identified as dysregulated in gene expression datasets from different types of cancer46,47,48. The only study involving any functional characterization of SNHG26 was a recent publication identifying SNHG26 as a direct transcriptional target of the well-known oncogene c-MYC and a mediator of MYC-driven proliferation in human lymphoid cells42. Our study is the first to report that SNHG26 is necessary for a proper G2 to M transition during the cell cycle, and is important for normal proliferation in HaCaT, A549, DLD1, and LS411N cells.

EMSLR expression peaked in G1 phase, and its knockdown resulted in growth inhibition in all four cell lines, an enrichment of cells in the G1 phase by 3–7 percentage points, and a corresponding reduction of cells in the S phase. These results are in line with a previous functional study of EMSLR in A549 cells49. In addition to reporting similar growth inhibition and enrichment of cells in G1 phase, the study revealed that EMSLR, there named E2F1 mRNA stabilizing (EMS) lncRNA, is a direct transcriptional target of c-MYC. Mechanistically, their results suggest that EMS modulates E2F1 stability and promotes G1 to S cell cycle progression through c-MYC. EMSLR is upregulated in tissue from colon cancer patients in several datasets, and its expression is associated with poor prognosis50,51,52. In one study EMSLR was differentially expressed between patients with early and advanced stage endometrial carcinoma, where increased expression of EMSLR was associated with disease progression53. Previous studies, together with our results, indicate an oncogenic role of EMSLR, possibly by interfering with the progression from G1 to the S phase of the cell cycle.

In our study, knockdown of EPB41L4A-AS1 resulted in an overall growth inhibition of HaCaT, A549, and DLD1 cells, and the distribution of HaCaT cells in the different phases of the cell cycle was also affected, with an increase of cells in the G2/M phase by 7 percentage points and a decrease in G1 by 6 percentage points. In the RNA-seq data and RT-qPCR technical validation, EPB41L4A-AS1 expression peaked in M/G1, but this pattern was only reproduced in one of the two biological validation experiments (Val2, Fig. 4G). Several gene expression studies have identified EPB41L4A-AS1 as either over- or underexpressed in cancer, probably depending on type of tissue and stage of progression54,55,56. Functional studies suggest a central role of EPB41L4A-AS1 in metabolic reprogramming and as a repressor of the Warburg effect in placental tissue of miscarriage57 and in cancer cells (cervical, breast, bladder, and liver)58. Another functional study investigating the role of miR-146a in the proliferation of bone marrow-derived mesenchymal stem cells (BMSCs) reported that miR-146a interacts with and inhibits the expression of EPB41L4A-AS1 and SNHG7. Also, overexpression of EPB41L4A-AS1 increased proliferation and affected the phase distribution of BMSCs, with a reduction of cells in G1/G0 phase and an increased percentage of cells in the S and G2/M phases59. The expression of EPB41L4A-AS1 is higher in colorectal cancer tissue compared to normal tissue, and in line with our results, knockdown of EPB41L4A-AS1 decreased proliferation in colorectal cancer cell lines HCT116 and SW62060. Based on previous studies, the expression of EPB41L4A-AS1 and its biological role seem to vary between different types of cells, tissues, and stages of disease, suggesting a potential role as a biomarker for disease progression and as a therapeutic target. Our study is the first to evaluate how EPB41L4A-AS1 affects the cell cycle and proliferation in HaCaT, A549, and DLD1 cells. The results suggest an oncogenic role of EPB41L4A-AS1, possibly by affecting the progression from G2 to M phase.

The function of ZFAS1 has been evaluated in several papers, and like that of many other lncRNAs, it varies depending on the type of cell line or tissue being investigated. ZFAS1 was first described as a regulator of mammary development61. Later studies identified it as an oncogene upregulated in several cancers, including lung, colon, ovary, glioma, liver, and gastric cancers, but downregulated in breast cancer62. In a study by Fan et al., overexpression of ZFAS1 resulted in G1/G0 phase arrest in two breast cancer cell lines20. In contrast, we found that ZFAS1 knockdown had no significant effect on HaCaT proliferation and phase distributions. Instead, our results demonstrated reduced proliferation in response to knockdown in the cancerous cell lines A549, DLD1, and LS411N. Consequently, and in line with previous studies describing the oncogenic characteristics of ZFAS1, our results show that ZFAS1’s effects on cell proliferation are cell type-specific and may depend on other oncogenic transformations.

Three of the four lncRNAs we investigated (SNHG26, ZFAS1, and EPB41L4A-AS1) overlap with snoRNAs and could therefore be SNHGs, the primary transcripts for these snoRNAs. Whereas our results show that knockdown of these lncRNAs affects proliferation and cell cycle progression, we acknowledge that some of these effects may be mediated by overlapping snoRNAs.

In this study we used double thymidine block to arrest and subsequently release HaCaT cells at the G1/S transition and we identified 1803 genes with significant periodic expression patterns during the cell cycle. However, all chemical synchronization procedures have disadvantages as they may perturb the normal cell cycle and potentially trigger stress‐related responses and cause DNA damage63. Thus, disruptions of normal biological processes caused by the manipulations during cell synchronization can give false positive cyclic profiles and hide the expression of true cell cycle regulated genes. Although we cannot exclude such false positives among our list of candidate lncRNAs, the additional FACS and RT-qPCR analyses showed reduced expression levels in S phase for SNHG26, EPB41L4A-AS1, and EMSLR (Figure S20), consistent with their RNA-seq profiles following the double thymidine block.

Several lncRNAs have been identified as cell cycle-associated64,65,66,67. Hung et al.66 used tiling microarrays to investigate ncRNA expression in promoters of 56 cell cycle genes. Their expression data included HeLa and U2OS cells synchronized by double thymidine block, primary cells or cell lines perturbed by DNA damage, or differentiation or oncogenic stimuli, and tumors and paired normal tissues. Cell cycle-dependent expression of lncRNAs in HeLa cells synchronized by double thymidine block was also confirmed by RNA-seq67, though this study identified a different set of 39 lncRNAs with cyclic expression. Ali et al.64 used RNA-seq of nascent RNAs in HeLa cells synchronized by thymidine and hydroxyurea. The data included three time points in the subsequent S-phase and their analyses identified 1145 lncRNAs with temporal expression changes. Hao et al.65 used deep RNA-seq of U2OS cells synchronized to different phases of the cell cycle and identified more than 2000 lncRNAs that had phase-specific expression. Cells were synchronized to different stages by nocodazole (M) followed by mitotic shake-off (G1), and by double thymidine block (G1/S) and subsequent isolation at 4 h (S) and 8 h (G2) following release. They observed that 35–40% of the genes differentially expressed during the cell cycle consisted of lncRNAs, with the majority being highly expressed in G1 phase. In our study, 56% of the cell cycle-associated lncRNAs were assigned to S phase, whereas 19% were assigned to G1. All of our candidate lncRNAs were identified in the study by Hao et al.65 where SNHG26, EPB41L4A-AS1, and EMSLR showed reduced expression levels in S phase, which is in line with our results. We do note the differences between these studies in the number of identified lncRNAs and in their phase-specific expression characteristics. But we also note differences in data generation, lncRNA annotations, and data analyses, which all may have contributed to these differences.

Our analyses suggest that by using a positive correlation between the RNA-seq expression profile and H3K4me3 and Pol II signal as a selection criterion, we identify genes that are actively transcribed and highly enriched for cell cycle functions. Although this is a useful method for detecting cell cycle-associated lncRNAs, cyclic lncRNAs with low correlation to ChIP-seq signal should not be dismissed, as they may also be candidates for cell cycle involvement. As exemplified by PCNA, genes with several transcripts may have a low correlation value if the wrong TSS was selected for ChIP-seq signal analysis. More sophisticated analyses that use the ChIP-seq data to identify the most likely active TSS per gene may eliminate some of these false negatives.

In summary, our results indicate that all four candidate lncRNAs tested in this study influenced both proliferation and cell cycle progression, although the degree of effect varied between the candidates. The top candidate SNHG26, which has a cyclic expression pattern and the highest overall correlation to Pol II and H3K4me3 ChIP-seq signal, had a consistent effect on overall growth inhibition and cell cycle phase distribution in response to knockdown. Results from our functional evaluation support that our multi-omics method is well suited for identifying lncRNAs involved in the cell cycle.

Methods

Cell culture

All cell lines were obtained from the American Type Culture Collection (ATCC) and cultivated in a humidified incubator at 37 °C and 5% CO2. The human keratinocyte cell line HaCaT was cultured in Dulbecco’s modified Eagle’s medium (DMEM, Sigma-Aldrich, D6419) supplemented with 10% fetal bovine serum (FBS, Sigma-Aldrich, F7524), 2 mM glutamine (Sigma-Aldrich, G7513), 0.1 mg/ml gentamicin (Gibco, 15710049), and 1.25 µg/ml fungizone (Sigma-Aldrich, A2942). For the lung carcinoma cell line A549 we used DMEM supplemented with 10% FBS and 2 mM glutamine, while the colorectal carcinoma cell line LS411N and the colorectal adenocarcinoma cell line DLD1 were cultivated in RPMI 1640 medium (Gibco, A1049101) supplemented with 10% FBS.

Cell cycle synchronization

HaCaT cells were seeded in 150-mm culture dishes (2 × 10^6 cells per dish) and were arrested at the G1/S transition by double thymidine block. Briefly, cells were treated with 2 mM thymidine for 18 h, released from the arrest for 10 h, and arrested a second time with 2 mM thymidine for an additional 18 h. After blocking, the medium was replaced and cells were collected every third hour for 24 h, covering approximately two cell cycles. Unsynchronized cells were used as a reference sample.
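The timing of the synchronization protocol can be summarized in a short sketch. The block and release durations, the 3 h sampling interval, and the 24 h span are taken from the text. Whether the 0 h and 24 h time points are both collected is an assumption made here for illustration, and all names are hypothetical.

```python
BLOCK_H = 18             # duration of each thymidine block (from the text)
RELEASE_H = 10           # release between the two blocks (from the text)
SAMPLING_INTERVAL_H = 3  # cells collected every third hour (from the text)
SAMPLING_SPAN_H = 24     # total sampling period after release (from the text)


def collection_times(span_h=SAMPLING_SPAN_H, interval_h=SAMPLING_INTERVAL_H,
                     include_zero=True):
    """Return sampling times in hours after release from the second block.

    Assumption: both end points (0 h and span_h) are sampled when
    include_zero is True; the text does not state this explicitly.
    """
    start = 0 if include_zero else interval_h
    return list(range(start, span_h + 1, interval_h))


# Hours from the first thymidine addition until release from the second block.
time_to_release_h = 2 * BLOCK_H + RELEASE_H
times = collection_times()
```

With these assumptions, release happens 46 h after the first thymidine addition and the schedule contains nine collection times.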

Cell cycle and fluorescence-activated cell sorting (FACS) analysis

We used FACS analysis to determine the cell cycle phase distribution. HaCaT cells were washed twice with preheated PBS and trypsinized for 8 min before being collected in cold PBS supplemented with 3% FBS. The cells were then centrifuged at 4 °C for 5 min. The cell pellet was resuspended in 100 µl cold PBS, fixed by adding 1 ml cold (−20 °C) methanol dropwise while vortexing at 1600 rpm, and stored at 4 °C until DNA measurement. Cells were then washed with cold PBS and incubated with 200 µl of DNase-free RNase A in PBS (100 μg/ml) for 30 min at 37 °C before DNA staining with 200 µl of Propidium Iodide (PI, Sigma; 50 μg/ml) at 37 °C for 30 min. Cell cycle analyses were performed using a BD FACS Canto flow cytometer (BD Biosciences). The excitation maximum of PI is 535 nm and the emission maximum is 617 nm. Here, PI-stained cells were excited with the blue laser (488 nm), and the PI fluorescence was detected in the Phycoerythrin (PE) channel (578 nm). Quantification of cells in each phase was done with the FlowJo software, and the percentage of cells assigned to G1, S, and G2/M phases was calculated.
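Once cells have been assigned to phases, the percentages and the knockdown-versus-control differences in percentage points can be computed as in the following minimal sketch. It does not reproduce the FlowJo phase assignment itself; the function names and the cell counts are hypothetical.

```python
PHASES = ("G1", "S", "G2/M")


def phase_percentages(counts):
    """Convert per-phase cell counts into percentages of all assigned cells."""
    total = sum(counts[phase] for phase in PHASES)
    if total == 0:
        raise ValueError("no cells were assigned to any phase")
    return {phase: 100.0 * counts[phase] / total for phase in PHASES}


def phase_shift(treated_counts, control_counts):
    """Difference in percentage points (treated minus control) per phase."""
    treated = phase_percentages(treated_counts)
    control = phase_percentages(control_counts)
    return {phase: treated[phase] - control[phase] for phase in PHASES}


# Hypothetical counts, for illustration only.
control = {"G1": 5000, "S": 3000, "G2/M": 2000}
knockdown = {"G1": 4000, "S": 3000, "G2/M": 3000}
shift = phase_shift(knockdown, control)
```

In this toy example the knockdown sample has 10 percentage points fewer G1 cells and 10 percentage points more G2/M cells than the control.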

Total RNA-seq

Total RNA was isolated using the mirVana miRNA Isolation Kit (ThermoFisher Scientific, AM1560) according to the manufacturer’s protocol. Integrity and stability of RNA samples were assessed by Agilent 2100 Bioanalyzer (Agilent Technologies), whereas the RNA concentration and quality were measured on a NanoDrop ND-1000 UV–Vis Spectrophotometer. RNA-seq libraries were prepared using the Illumina TruSeq Stranded Total RNA with Ribo-Zero™ Human/Mouse/Rat kit, according to the manufacturer's instructions (Illumina; Supplementary Method 1). The sequencing (50 cycles single end reads) was performed on an Illumina HiSeq2500 instrument, in accordance with the manufacturer’s instructions. FASTQ files were created with bcl2fastq 2.18 (Illumina).

Identifying cell cycle genes

RNA-seq raw reads were quality-filtered using fastq_quality_filter 0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/; parameters -Q33 -q 20 -p 80 -z), and subsequently aligned to the human genome (version GRCh38.p7) with STAR 2.4.0.f168 (parameters --chimSegmentMin 30 --runThreadN 12 --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000). Read alignments were then feature-counted on annotated exons and summarized on genes, using htseq-count 0.6.069 (parameters -r pos -i gene_id -t exon -s yes). The resulting raw count matrix was stripped of genes with a zero count in any of the profiles, preventing such genes from dominating the partial least squares regression (PLS) model. Specifically, out of 58,051 gtf-annotated genes in Human Gencode v24, at least a single read was present for 31,433 genes, and a total of 14,059 genes were left for analysis. This filtered count matrix was finally transformed to the logarithmic domain and adjusted with precision weights to reduce heteroscedasticity based on abundance using voom from the Limma package70. To identify genes with a cell cycle-dependent profile, we used PLS as previously described1, except that we used the FACS cell fraction matrix directly for the response matrix. Cell cycle phases were assigned by using the profiles of known phase-associated cell cycle genes as previously described1.
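The filtering and log transformation steps can be illustrated with a minimal pure-Python sketch. It removes genes with a zero count in any sample and then applies the log2 counts-per-million transform that voom uses, log2((count + 0.5) / (library size + 1) × 10^6). It does not estimate voom's precision weights, and the gene names and counts are hypothetical.

```python
import math


def drop_genes_with_any_zero(counts):
    """Keep only genes that have a non-zero count in every sample."""
    return {gene: row for gene, row in counts.items() if all(c > 0 for c in row)}


def log2_cpm(counts):
    """voom-style log2-CPM transform, without the precision weights.

    Library sizes are computed from the matrix that is passed in, so
    filtering first changes the library sizes used here.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
               for j in range(n_samples)]
        for gene, row in counts.items()
    }


# Hypothetical toy matrix: gene -> counts per sample (time point).
raw = {"geneA": [10, 20, 30], "geneB": [0, 5, 7], "geneC": [100, 200, 300]}
filtered = drop_genes_with_any_zero(raw)
log_expr = log2_cpm(filtered)
```

Here geneB is dropped because it has no reads in the first sample; only the two remaining genes are transformed.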

Quantitative reverse transcription PCR (RT-qPCR)

We isolated total RNA by using the mirVana miRNA Isolation Kit (ThermoFisher Scientific, AM1560) before DNA was removed using TURBO DNA-free™ Kit (Invitrogen, AM1907), according to the manufacturer’s instructions. RNA concentration and quality were measured on a NanoDrop ND-1000 UV–Vis spectrophotometer. Total RNA was reverse transcribed using TaqMan reverse transcription reagents (Applied Biosystems, N8080234) followed by quantitative real-time PCR using SYBR™ Select Master Mix (Applied Biosystems, 44729199) and quantification by the Step One Real-time PCR system (Applied Biosystems). RT2 lncRNA PCR assays (Qiagen, 330701) and QuantiTect primer assays (Qiagen, 249900) that were used for lncRNA and mRNA expression analysis are listed in Table S4. The relative expression of mRNAs and lncRNAs was calculated using the ∆∆Ct method71 with Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as an endogenous control.
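The ∆∆Ct calculation can be written out as a minimal sketch. It assumes approximately 100% amplification efficiency, so that one cycle corresponds to a two-fold difference; the function name and Ct values are hypothetical.

```python
def delta_delta_ct_fold_change(ct_target_sample, ct_reference_sample,
                               ct_target_calibrator, ct_reference_calibrator):
    """Relative expression as 2^(-ddCt).

    dCt = Ct(target) - Ct(reference gene, e.g. GAPDH), computed for the
    sample of interest and for the calibrator (e.g. control-treated cells).
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target appears one cycle later in the sample
# than in the calibrator, relative to the reference gene.
fold_change = delta_delta_ct_fold_change(
    ct_target_sample=26.0, ct_reference_sample=18.0,
    ct_target_calibrator=25.0, ct_reference_calibrator=18.0,
)
```

A ∆∆Ct of 1 cycle corresponds to half the expression of the calibrator, i.e. a fold change of 0.5.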

ChIP-seq

Chromatin immunoprecipitation (ChIP) was performed as described in Supplementary Method 2. The antibodies used for immunoprecipitation (IP) were obtained from Diagenode: anti-H3K4me3 (C15410003-50), anti-H3K27me3 (C15410195), and anti-Pol II (C15200004). As a control for successful IP, qPCR was performed using human positive and negative control qPCR primer sets from Active Motif (Table S5). Immunoprecipitated material and input chromatin were submitted to the Genomics Core Facility (GCF) at the Norwegian University of Science and Technology (NTNU) for library preparation (Supplementary Method 3) and sequencing. ChIP-seq libraries were prepared using the MicroPlex Library Preparation Kit v4 (Diagenode) and the sequencing (50 cycles single end reads) was performed on an Illumina HiSeq2500 instrument, in accordance with the manufacturer’s instructions (Illumina). FASTQ files were created with bcl2fastq 2.18 (Illumina).

ChIP-seq data analysis

The ChIP-seq FASTQ files were aligned against the human reference genome hg38 with the hisat2 aligner72. Using the -k flag, we kept only one primary alignment per read, and we disallowed spliced alignments. We then computed the reads per kilobase per million mapped reads (RPKM) coverage of each alignment file for the genes in hg38 using deepTools bamCoverage73. The coverage of each ChIP sample was divided by the average coverage of the input sample to provide a normalized signal per gene. For each gene in Human Gencode v24, we chose the longest transcript variant and used its transcription start site (TSS). We then divided the 10,000 bp region around each TSS into bins of 50 bp. To count the reads within each bin, we extended each read by 75 bp (half the fragment size) and determined which bin overlapped with the point TSS + 75. Each read was assigned to only one bin.
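One way to read the binning step is sketched below. The window size, bin size, and 75 bp shift come from the text. The interpretation that each read is represented by its 5′ end shifted 75 bp in the direction of its strand, and that the window is centered on the TSS, is an assumption made for illustration; the sketch also ignores the orientation of the gene itself. All names and coordinates are hypothetical.

```python
WINDOW_BP = 10_000  # region around each TSS (from the text)
BIN_BP = 50         # bin size (from the text)
SHIFT_BP = 75       # half the fragment size (from the text)
N_BINS = WINDOW_BP // BIN_BP


def read_bin(tss, read_five_prime, read_strand):
    """Return the single bin index a read falls in, or None if outside the window.

    Assumption: a read is represented by its 5' end shifted by SHIFT_BP
    toward the fragment center, and the window spans TSS +/- WINDOW_BP / 2.
    """
    if read_strand == "+":
        point = read_five_prime + SHIFT_BP
    else:
        point = read_five_prime - SHIFT_BP
    offset = point - (tss - WINDOW_BP // 2)
    if 0 <= offset < WINDOW_BP:
        return offset // BIN_BP
    return None


def bin_counts(tss, reads):
    """Count reads per bin; each read contributes to at most one bin."""
    counts = [0] * N_BINS
    for five_prime, strand in reads:
        index = read_bin(tss, five_prime, strand)
        if index is not None:
            counts[index] += 1
    return counts


# Hypothetical reads as (5' position, strand) around a TSS at position 100000.
example_counts = bin_counts(100_000, [(99_925, "+"), (100_075, "-"), (90_000, "+")])
```

In this toy example the first two reads both map to the bin that starts at the TSS, and the third read falls outside the window and is ignored.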

Pol II ChIP-qPCR

Cells were harvested and Pol II ChIP was performed as described in Supplementary Method 2, except that cells were sonicated using a Bioruptor Pico (Diagenode) for 14 cycles of 30 s ON/30 s OFF in a volume of 300 µl. No digestion with Micrococcal Nuclease was included. For accurate fragment assessment, the sheared chromatin was analyzed on a 2% agarose gel. Fragment size was optimized to be 200–500 bp. As a control for successful IP, qPCR was performed using human positive and negative control qPCR primer sets from Active Motif (Table S5). PCR primers were designed in the TSS region for selected genes (Table S1). ChIP DNA was diluted 1:2 in TE buffer and qPCR was performed on immunoprecipitated material and input chromatin. We added 2 µl ChIP DNA and 500 nM of each primer to SYBR Select Master Mix (Applied Biosystems) in technical duplicates. Target values from all qPCR samples were normalized with matched input DNA using the percent input method [100 × 2^(adjusted input Ct − Ct(IP))]. The relative expression of selected genes (Table S1) was normalized against GAPDH and ACTB (mean).
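The percent input calculation can be sketched as follows. The adjusted input Ct is the input Ct corrected for the fraction of chromatin that was set aside as input; that fraction is not stated in the text, so the value used below (1%) is purely an assumption, as are the function name and Ct values.

```python
import math


def percent_input(ct_input, ct_ip, input_fraction):
    """Percent input = 100 * 2^(adjusted input Ct - Ct(IP)).

    input_fraction is the share of the chromatin used as input
    (e.g. 0.01 for a 1% input). The input Ct is adjusted to what it
    would have been for 100% of the chromatin.
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical example: the IP sample has the same Ct as a 1% input sample,
# so the recovered amount corresponds to 1% of the input chromatin.
recovered = percent_input(ct_input=25.0, ct_ip=25.0, input_fraction=0.01)
```

An IP sample that amplifies at the same cycle as a 1% input corresponds to 1% of input; each cycle earlier doubles that value.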

Western blot analysis

We used western blot analysis to validate the Pol II protein level, and H3K4me3 and H3K27me3 modification levels in double thymidine blocked synchronized HaCaT cells (Supplementary Method 4). Primary antibodies for Pol II (C15200004), H3K4me3 (C15410003), and H3K27me3 (C15410195) were obtained from Diagenode.

RNA interference

All cells were transfected with 20 nM siRNAs or Antisense LNA GapmeR (Antisense oligo; ASO) using Lipofectamine RNAiMAX (Invitrogen™, 13778030) when seeded, according to the manufacturer’s protocol. Cells were harvested after 48 and/or 72 h at about 70% confluence. MISSION® siRNA Universal Negative Control #1 (Sigma, SIC001) and the negative control A Antisense LNA GapmeR (Qiagen, LG00000002) were used as controls for siRNAs and ASO, respectively. The producers and sequences of siRNAs and ASO are listed in Table S6. All cell culture experiments were performed in three or more independent experiments, and with siRNAs/ASO targeting two different sequences within the same lncRNA.

Viability assay

We performed cell counting using a Moxi z mini automated cell counter (ORFLO Technologies) to investigate how knockdown of the lncRNA candidates affected cell growth. All four cell lines (HaCaT, A549, LS411N, and DLD1) were seeded in triplicate for each condition in a 24-well plate and counted 72 h after transfection. Each well was washed twice with preheated PBS and trypsinized for 5–10 min before the cells were resuspended in preheated growth medium and counted. We applied a two-tailed, paired Student’s t-test to test whether the growth was significantly different (p < 0.05) between cells transfected with negative control siRNA/ASO and a lncRNA target-specific siRNA/ASO in at least three independent experiments.
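The relation between cell counts, percentage of control, and percent growth inhibition can be sketched as below. Averaging the triplicate wells before taking the ratio is an assumption made for illustration, and the function names and cell counts are hypothetical.

```python
from statistics import mean


def percent_of_control(treated_wells, control_wells):
    """Mean cell count of treated wells as a percentage of the control mean."""
    control_mean = mean(control_wells)
    if control_mean <= 0:
        raise ValueError("control count must be positive")
    return 100.0 * mean(treated_wells) / control_mean


def growth_inhibition(treated_wells, control_wells):
    """Percent growth inhibition relative to control-treated cells."""
    return 100.0 - percent_of_control(treated_wells, control_wells)


# Hypothetical triplicate counts from one experiment.
control_counts = [200_000, 210_000, 190_000]
knockdown_counts = [130_000, 125_000, 135_000]
inhibition = growth_inhibition(knockdown_counts, control_counts)
```

In this toy example the knockdown wells contain 65% of the control cell number, which corresponds to a growth inhibition of 35%.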

Statistical analysis

The results were presented as the mean values ± standard error of the mean (SEM). Differences between groups were estimated using the two-tailed Welch’s t-test, assuming unequal variances. The one-sample t-test was used to compare the mean of the sample data to a known value. We used a hierarchical, linear model and ANOVA to calculate (1) the percent growth inhibition across four cell lines and (2) the distribution of HaCaT cells in different cell cycle phases in response to siRNA/ASO-mediated knockdown of the candidate lncRNAs, assuming a random effect for each cell line/siRNA/ASO. A value of P < 0.05 was considered statistically significant for all tests. All data analyses were performed in R. Plots were made using the packages ggplot2 and ggpubr. Gene ontology (GO) analyses were done with the package gProfileR.
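The summary statistics named above can be illustrated with a minimal pure-Python sketch of the SEM, the Welch t statistic with its Welch-Satterthwaite degrees of freedom, and the Bonferroni correction. The analyses in this study were performed in R; this sketch is not that code, it stops short of computing p-values (which require the t distribution), and all names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical replicate values for two groups and hypothetical raw p-values.
t_stat, df = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
adjusted = bonferroni([0.01, 0.04, 0.5])
```

The two-tailed p-value would then be obtained from the t distribution with `df` degrees of freedom, for example with a statistics library.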