Live cell imaging and proteomic profiling of endogenous NEAT1 lncRNA by CRISPR/Cas9-mediated knock-in

In mammalian cells, long noncoding RNAs (lncRNAs) form complexes with proteins to execute various biological functions such as gene transcription, RNA processing and other signaling activities. However, methods to track endogenous lncRNA dynamics in live cells and to screen for lncRNA-interacting proteins are limited. Here, we report the development of CERTIS (CRISPR-mediated Endogenous lncRNA Tracking and Immunoprecipitation System) to visualize and isolate endogenous lncRNA by precisely inserting a 24-repeat MS2 tag into the distal end of the lncRNA locus using CRISPR/Cas9 technology. In this study, we show that CERTIS effectively labeled the paraspeckle lncRNA NEAT1 without disturbing its physiological properties and could monitor variation in endogenous NEAT1 expression. In addition, CERTIS displayed superior performance in both short- and long-term tracking of NEAT1 dynamics in live cells. We found that NEAT1 and paraspeckles were sensitive to topoisomerase I-specific inhibitors. Moreover, RNA immunoprecipitation (RIP) of the MS2-tagged NEAT1 lncRNA revealed several new protein components of paraspeckles. Our results support CERTIS as a tool suitable for tracking both spatial and temporal lncRNA regulation in live cells and for studying lncRNA-protein interactomes.


INTRODUCTION
Noncoding RNAs (ncRNAs) have been characterized as the dark matter of the human genome because they do not encode protein sequences despite making up a major portion of the transcriptome (Derrien et al., 2012; Evans et al., 2016; Jandura and Krause, 2017). ncRNAs longer than 200 nt are called long noncoding RNAs (lncRNAs) and are of particular interest because of their important roles in a variety of cellular activities (Marchese et al., 2017; Kopp and Mendell, 2018; Yao et al., 2019). It is well documented that lncRNAs can fold into structural units and form complexes with proteins such as RNA-binding proteins (RBPs) to regulate different functions, including gene imprinting, antiviral response, differentiation and development (Weinrich et al., 1997; Nguyen et al., 2001; Xing et al., 2017; Munschauer et al., 2018). One intriguing characteristic of lncRNAs is that they may be post-transcriptionally modified and bind different RBPs to regulate diverse biological functions. For instance, during development or under stress, the localization and organization of lncRNA ribonucleoprotein (RNP) complexes are regulated spatially and temporally by RNA modification and by changes in RNA-binding proteins to suit the functions being performed (Yang et al., 2015; Chen, 2016). In other words, lncRNA RNPs can be dynamic. Therefore, studying the spatiotemporal distribution and protein interactome of lncRNAs within cells is central to uncovering their biological functions and mechanisms during physiological and pathological processes.
Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s13238-020-00706-w) contains supplementary material, which is available to authorized users.
Some lncRNAs play an architectural role together with RBPs in the assembly of subcellular structures, such as paraspeckles and nuclear speckles in the nucleus (Hutchinson et al., 2007), which are ideal models to illustrate the dynamic regulation of lncRNA-protein complexes. Paraspeckles are mammal-specific, membraneless RNA-protein nuclear bodies that are built on a long non-protein-coding RNA, nuclear-enriched abundant transcript 1 (NEAT1), and a series of core RNA-binding structural protein components (Sunwoo et al., 2008; Clemson et al., 2009; Sasaki et al., 2009; Fox et al., 2018). Since paraspeckles were first described in 2002 as nuclear foci, knowledge about them has accumulated rapidly (Fox et al., 2002; Naganuma et al., 2012). Existing research suggests that paraspeckles play a role in a variety of developmental and disease scenarios, including female reproduction, viral infection, ALS pathogenesis and cancer progression (Zhang et al., 2013; Nakagawa et al., 2014; Shelkovnikova et al., 2014; Standaert et al., 2014; Fujimoto et al., 2016; Lanzós et al., 2017; Wang et al., 2017). However, functional studies are less advanced. As yet, no distinct catalytic activity has been found to occur within paraspeckles, and it is commonly believed that the molecular basis of paraspeckle function lies mainly in the sequestration of certain RNA species and paraspeckle protein components by the NEAT1 lncRNA scaffold (Chen and Carmichael, 2009; Hirose et al., 2014; Wang et al., 2018).
However, the study of such a highly dynamic lncRNA has been limited by the lack of an efficient and versatile method that can both track endogenous lncRNA dynamics in live cells and identify lncRNA-interacting proteins. Currently, methods based on hybridization of antisense nucleotide probes to RNA are the gold standard for studying lncRNA localization and interaction. lncRNAs can be visualized with high specificity and signal-to-noise ratio (SNR) by RNA fluorescence in situ hybridization (RNA FISH) (Levsky and Singer, 2003; Itzkovitz and van Oudenaarden, 2011). Although RNA FISH is ideal for fixed samples, it cannot be used to track subcellular lncRNA dynamics in live cells. Methods that use biotinylated oligonucleotides complementary to the lncRNA to identify lncRNA-interacting proteins include RAP and ChIRP (Cao et al., 2019). However, both methods rely on cross-linking and hybridization, which may disrupt lncRNA structure, and they are insufficient for studying lncRNAs with higher-order structure or high RBP occupancy. Other recent RNA labeling systems include RNA-targeting Cas9 (RCas9) and nuclease-inactive dCas13a (Nelles et al., 2016; Rau and Rentmeister, 2016; Cox et al., 2017). Although both have shown satisfactory SNR for tracking cytoplasmic mRNA trafficking in live cells, neither has yet been developed into a routine method for identifying the proteins that interact with the labeled RNA. Given the intricate connection between lncRNA localization, RBP interactions and function, there is a pressing need for methods to visualize and purify endogenous lncRNA-RNPs.
The MS2-MCP system has been widely used to label exogenous RNA in organisms ranging from yeast and zebrafish to higher organisms (Wu et al., 2012; Liu et al., 2015; Tutucci et al., 2018a, b), enabling efficient RNA immunoprecipitation and tracking of exogenous RNA with single-molecule sensitivity in live cells. MS2 is a bacteriophage RNA stem-loop that binds specifically and with high affinity to the MS2 coat protein (MCP); multiple copies of MS2 (up to 24) can further improve the SNR because each MS2 stem-loop can bind an MCP-fluorescent protein (MCP:FP) dimer (Fusco et al., 2003). Moreover, MS2 tags have recently been used to visualize endogenous RNA localization and trafficking in living yeast, mouse and human cells (Lionnet et al., 2011; Park et al., 2014; Tutucci et al., 2018a, b; Kim et al., 2019; Spille et al., 2019; Yang et al., 2019), demonstrating that insertion of MS2 repeats does not disturb the spatiotemporal distribution of certain RNAs. However, because the highly complex human genome can give rise to unexpected off-target events and homologous recombination is much less efficient, precisely tagging endogenous RNAs, especially lncRNAs, with MS2 in human cells remains challenging.
Here, we develop a simple, efficient and scalable endogenous lncRNA labeling system for human cells called CERTIS (CRISPR-mediated Endogenous lncRNA Tracking and Immunoprecipitation System). By combining CRISPR/Cas9-mediated gene knock-in (KI) with the MS2-MCP RNA labeling system, CERTIS enables both live cell imaging and immunoprecipitation of endogenous lncRNA-protein complexes, as well as monitoring of lncRNA expression level. First, the MS2-repeat cassette is knocked into the distal end of the lncRNA locus by CRISPR/Cas9-induced micro-homology-mediated end joining (MMEJ). To enrich the KI cell population and to monitor the lncRNA expression level, an IRES-GFP fragment is placed after the MS2 repeats and driven by the endogenous promoter. Successful KI cells are then enriched by consecutive rounds of GFP-positive sorting before clonal isolation and genotyping. To guard against unexpected off-target insertions and to recycle GFP for future multicolor labeling, an optional round of GFP-negative selection can be performed after precise excision of the IRES-GFP fragment. After stable expression of MCP fused to fluorescent protein tags, the targeted lncRNA-protein complex can be visualized or affinity purified from these cells. In this study, we show that CERTIS effectively labeled the endogenous paraspeckle lncRNA NEAT1 without disturbing its physiological properties and could be used to quantitatively monitor variation in endogenous NEAT1 expression. Moreover, CERTIS achieved superior performance in live cell imaging for both short- and long-term tracking of NEAT1 dynamics. Using this system, we were able to determine the effect of different classes of cellular stress on NEAT1 expression and dynamics. For example, we found that NEAT1 and paraspeckles were sensitive to topoisomerase I-specific inhibitors and viral infection. In addition, we performed RNA immunoprecipitation (RIP) of tagged NEAT1 lncRNA and identified several new protein components of paraspeckles. Overall, our results suggest that CERTIS is a tool suitable for tracking both spatial and temporal lncRNA dynamics in live cells and for studying lncRNA-protein interactomes.

RESULTS
Targeting the NEAT1 lncRNA through MMEJ-mediated CRISPR KI
The NEAT1 gene locus generates two noncoding isoforms in human: the 3.7 kb short NEAT1_1 and the 23 kb long NEAT1_2 (Fig. 1A). NEAT1_1 completely overlaps the 5′ end of NEAT1_2. Although both NEAT1 transcripts localize to paraspeckles, NEAT1_2, but not NEAT1_1, plays an architectural role in paraspeckle formation (Clemson et al., 2009). EM and super-resolution microscopy have revealed that paraspeckles are composed of fused chains of spherical subcompartments, with NEAT1_2 arranged perpendicular to the long axis of the paraspeckle and its 5′ and 3′ ends facing outward (West et al., 2016). Thus, observing and manipulating NEAT1_2 is a route to functional studies of paraspeckles, and this requires targeting the unique distal 3′ end of the NEAT1_2 transcript. Unless otherwise specified, NEAT1 hereafter refers to the longer isoform. Considering the important functions and obscure regulatory mechanisms of NEAT1, we set out to develop a new approach for labeling this highly dynamic structural lncRNA.
Recently, CRISPR/Cas9 technology has been developed to edit and label genomic loci in living cells owing to its accuracy and efficiency in recognizing DNA and RNA (Nelles et al., 2016; Shao et al., 2016; Qin et al., 2017; Tasan et al., 2018). CRISPR/Cas9-mediated gene KI usually employs donor templates with long homology arms, ranging from dozens of nucleotides for single-stranded oligodeoxynucleotides (ssODNs) to several hundred base pairs for larger fragments. The process relies on homology-directed recombination (HDR), which occurs predominantly during the late S/G2 phases of the cell cycle and has relatively low efficiency in human cells (Taleei and Nikjoo, 2013). Moreover, the traditional homologous recombination (HR) method requires laborious construction of donor vectors with long homology arms, which impedes high-throughput applications. Interestingly, donor templates with considerably shorter homology arms (<20 nt) have been shown to mediate efficient KI via the MMEJ pathway (Bae et al., 2014; Nakade et al., 2014; Kostyrko and Mermod, 2016). Unlike HDR, the alternative end-joining process MMEJ uses microhomology regions of 5-25 bp flanking a double-strand break (DSB) to repair DNA and is highly active during the G1 and early S phases of the cell cycle. The short homology arms also make it easier to amplify templates and to scale up targeting of different loci by one-step, cloning-free PCR.
To adapt the MS2-MCP system for labeling endogenous NEAT1, we designed and optimized a workflow for efficient CRISPR/Cas9 knock-in of MS2 repeats at the distal end of the targeted site (Fig. 1C). First, we adopted a PCR-based strategy in which each primer carries a 20 nt micro-homology arm (MHA) at its 5′ end, followed by a 25-30 nt region that overlaps the universal template plasmid, to exponentially amplify the linear donor DNA. Therefore, only the 20 nt MHAs at the 5′ ends of the primers need to be replaced to prepare donor templates for other lncRNAs, which makes the labeling scalable. Notably, to avoid rapid degradation of the linear DNA fragment in the nucleus, the terminal 5 nt at the 5′ end of each primer were thiophosphorylated (Fig. 1B). The universal plasmid template contains the 24×MS2-IRES-GFP cassette, in which IRES-GFP is flanked by a non-target pre-gRNA site and a CC-rich region (serving as PAM motifs). This design makes it possible to later minimize random off-target insertions or to recycle GFP for multicolor labeling.
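The primer layout described above can be sketched in code. The following is a minimal illustration, assuming a known Cas9 cut position in the target locus and a known donor cassette sequence. The function names, the 25 nt annealing length and the toy sequences are assumptions made for illustration and are not taken from the study; the 5′ thiophosphorylation is a synthesis detail and is not modeled.

```python
from typing import Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def design_mmej_primers(genome: str, cut_index: int, cassette: str,
                        arm_len: int = 20, anneal_len: int = 25) -> Tuple[str, str]:
    """Build forward and reverse donor primers (hypothetical helper).

    Each primer is a micro-homology arm (5' end) followed by a region that
    anneals to the universal cassette template (3' end).
    cut_index is the position in `genome` where the cassette is inserted.
    """
    if cut_index < arm_len or cut_index + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    if len(cassette) < anneal_len:
        raise ValueError("cassette is shorter than the annealing region")

    left_arm = genome[cut_index - arm_len:cut_index]    # upstream of the cut
    right_arm = genome[cut_index:cut_index + arm_len]   # downstream of the cut

    # Forward primer: left arm, then the first bases of the cassette.
    forward = left_arm + cassette[:anneal_len]
    # Reverse primer: reverse complement of (cassette end + right arm),
    # so the right arm ends up at the primer's 5' end.
    reverse = reverse_complement(cassette[-anneal_len:] + right_arm)
    return forward, reverse
```

To retarget another locus in this sketch, only `genome` and `cut_index` change, while `cassette` stays fixed, which mirrors the idea that only the MHAs need to be replaced.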
Two gRNAs were designed, each targeting the distal end of NEAT1 (Fig. 1A and Table S1). Following introduction of the donor template and Cas9 RNP into HEK293T cells, successful KI cells were enriched by several consecutive rounds of FACS (Fig. 1D). mScarlet is a bright red fluorescent protein with superior resistance to photobleaching (Bindels et al., 2017), which makes it ideal for long-term tracking of tagged proteins. When we expressed tdMCP-mScarlet-3×Flag in the two independent MS2 KI cell lines, speckle-like puncta were observed in the nucleus of both lines but not in control cells (Fig. 1E), indicating that these are likely paraspeckles revealed by MCP-mScarlet. The cells were then plated into 96-well plates to isolate single-cell clones. To further confirm correct KI of the MS2 repeats, genotyping PCR was carried out with junction PCR primers, followed by Sanger sequencing (Figs. S1, S7 and S8). PCR products of the expected size for both NEAT1 KI cell lines indicated successful and efficient insertion by MMEJ-based KI (NEAT1-g1: 19/19 and NEAT1-g2: 16/19). Sanger sequencing after TA cloning of the PCR products confirmed that in most clones the cassette was precisely inserted into the expected locus. One clone from each gRNA KI was used for further analysis.
Endogenous NEAT1 distally tagged with the MS2 cassette displays normal spatiotemporal distribution and function
When we compared the expression level of NEAT1 in the two KI cell lines with that in parental HEK293T cells by RT-qPCR, we observed no significant difference (Fig. 1F). Next, we carried out RNA FISH using an Alexa Fluor 647-labeled NEAT1 RNA probe that targets the middle region of NEAT1 (Fig. 2A and 2B). The super-resolution microscopy STEP revealed that the mScarlet signals (red, endogenous NEAT1) overlapped well with signals from the FISH probe (displayed in green). Scatter plots also demonstrated that the mScarlet channel colocalized well with the FISH probe channel at the level of individual pixels, with both a high Pearson's correlation coefficient and a high overlap coefficient (Fig. 2C). Interestingly, less background noise was observed with CERTIS than with FISH, probably because CERTIS avoids the cumbersome steps that can leave residual non-specifically bound FISH probe.
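The two pixel-wise colocalization measures mentioned above can be illustrated with a short sketch. It assumes that the overlap coefficient is Manders' overlap coefficient, which the text does not state explicitly, and that the two channels are supplied as equal-length flat lists of pixel intensities; the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_coefficient(red: Sequence[float], green: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two intensity channels."""
    if len(red) != len(green) or not red:
        raise ValueError("channels must be non-empty and of equal length")
    mean_r = sum(red) / len(red)
    mean_g = sum(green) / len(green)
    covariance = sum((r - mean_r) * (g - mean_g) for r, g in zip(red, green))
    var_r = sum((r - mean_r) ** 2 for r in red)
    var_g = sum((g - mean_g) ** 2 for g in green)
    if var_r == 0 or var_g == 0:
        raise ValueError("a channel with constant intensity has no correlation")
    return covariance / math.sqrt(var_r * var_g)


def overlap_coefficient(red: Sequence[float], green: Sequence[float]) -> float:
    """Manders' overlap coefficient (assumed definition, not mean-subtracted)."""
    if len(red) != len(green) or not red:
        raise ValueError("channels must be non-empty and of equal length")
    numerator = sum(r * g for r, g in zip(red, green))
    denominator = math.sqrt(sum(r * r for r in red) * sum(g * g for g in green))
    if denominator == 0:
        raise ValueError("at least one channel is entirely zero")
    return numerator / denominator
```

Pearson's coefficient subtracts the channel means and ranges from -1 to 1, whereas the overlap coefficient does not and ranges from 0 to 1 for non-negative intensities, so reporting both gives complementary views of colocalization.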
To further evaluate the MS2-tagged NEAT1, we marked paraspeckles with antibodies that recognize the paraspeckle marker proteins PSPC1 and SFPQ. Both proteins are essential components of paraspeckles and interact directly with NEAT1 (Fox et al., 2005; Lee et al., 2015). In fact, deficiency of either can result in NEAT1 instability and eventual disaggregation of paraspeckles (Naganuma et al., 2012). We observed extensive overlap of mScarlet-labeled NEAT1 with Alexa Fluor 647-labeled PSPC1 or SFPQ in both MS2 KI cell lines (Fig. 2D and 2F). Furthermore, when we knocked down SFPQ by siRNA in the NEAT1-labeled cells, mScarlet signals appeared diffuse in the nucleus with no clear puncta, consistent with paraspeckle disaggregation upon SFPQ deficiency (Fig. 2E and 2F). Taken together, these results suggest that MS2-tagged NEAT1 correctly associates with its protein binding partners and assembles paraspeckles, and that these cells are a good model system for studying NEAT1 lncRNA and its binding proteins.

CERTIS quantitatively monitors both spatial and temporal regulation of NEAT1 in live cells
Quantitative reporting of lncRNA expression in live cells would aid our understanding of lncRNA function and regulatory mechanisms, especially in high-throughput gene inactivation or drug screening. Because CERTIS-labeled NEAT1 is tagged with IRES-GFP driven by the endogenous promoter, we next evaluated whether changes in lncRNA level could be monitored through GFP intensity measured by flow cytometry (Fig. 3A). Actinomycin D (ActD) rapidly decreases NEAT1 expression by inhibiting RNA synthesis (Fox et al., 2002). We treated NEAT1-g1 KI cells with ActD for 48 h and then measured NEAT1 expression by flow cytometry. Compared with untreated cells, the GFP peak shifted to the left and the mean fluorescence intensity decreased significantly after ActD treatment, indicating reduced NEAT1 expression (Fig. 3B and 3C). This was consistent with RT-qPCR quantification of NEAT1 level (Fig. 3D). Since herpes simplex virus 1 (HSV-1) infection has been reported to enhance NEAT1 expression (Imamura et al., 2014), we next infected the cells with HSV-1 at the indicated MOI and measured the change in GFP intensity. Indeed, the expected increase in GFP level was observed after 48 h of HSV-1 infection, in line with NEAT1 expression (Fig. 3E-G). Moreover, fluorescence imaging of the mScarlet signal at paraspeckles also faithfully reported the spatial regulation of NEAT1. During ActD treatment, the NEAT1 signal diminished, disaggregated from the paraspeckle puncta and dispersed rapidly within the nucleus, whereas both the number and total area of paraspeckles increased in response to HSV-1 infection (Fig. 3H and 3I). These results demonstrate that both the spatial and temporal regulation of CERTIS-labeled lncRNA can be faithfully measured in live cells, paving the way for high-throughput screening of lncRNA regulators.
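As a rough illustration of the flow cytometry readout described above, the shift in GFP signal can be summarized as a ratio of mean fluorescence intensities between treated and untreated cells. The sketch below assumes per-cell GFP intensities have already been exported as plain lists of numbers; the function names are hypothetical and gating, compensation and statistics are not covered.

```python
from statistics import fmean
from typing import Sequence


def mean_fluorescence_intensity(gfp_values: Sequence[float]) -> float:
    """Arithmetic mean of per-cell GFP intensities for one sample."""
    if not gfp_values:
        raise ValueError("sample contains no cells")
    return fmean(gfp_values)


def mfi_fold_change(treated: Sequence[float], control: Sequence[float]) -> float:
    """Ratio of treated MFI to control MFI (hypothetical helper).

    A value below 1 corresponds to a left shift of the GFP peak,
    a value above 1 to a right shift.
    """
    control_mfi = mean_fluorescence_intensity(control)
    if control_mfi == 0:
        raise ValueError("control MFI is zero, fold change is undefined")
    return mean_fluorescence_intensity(treated) / control_mfi
```

Under this sketch, a transcription inhibitor would be expected to give a ratio below 1 and an inducing stimulus a ratio above 1, which could then be compared with the RT-qPCR fold change for the same samples.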

RESEARCH ARTICLE
Bohong Chen et al.
Tracking of NEAT1 and paraspeckles dynamics in response to different stimuli over time
To salvage the GFP marker for multicolor labeling and to eliminate potential off-target insertions, two gRNAs, one targeting the pre-gRNA site and the other targeting the junction between the CC-rich region and the endogenous lncRNA, were introduced simultaneously with Cas9 into the GFP-positive cell lines (Figs. 1B and 4A). If the intact tagging cassette is precisely inserted into the lncRNA locus, Cas9 cleavage should remove IRES-GFP and render the cells GFP-negative, whereas cells carrying off-target insertions generated by the error-prone NHEJ pathway, or cells in which the double cut was inefficient, continue to express GFP. As expected, only the paired gRNAs significantly eliminated GFP expression (Fig. 4B). After double knockout (KO) by the paired gRNAs, the remaining GFP-positive cells were also subcloned and analyzed by genotyping PCR (Fig. S2). Judging by PCR products of the expected size, the majority of these GFP-positive cells still carried correct on-target insertions, probably because simultaneous double KO of the IRES-GFP fragment was inefficient. The correctly tagged GFP-negative cells were then enriched by a second round of GFP-negative selection. Proteins such as SFPQ and NONO are known to be key regulators of paraspeckles and to interact with NEAT1 (Naganuma et al., 2012). To study the spatiotemporal distribution of NEAT1 in live cells, we stably expressed a GFP-SFPQ fusion protein in the NEAT1-g1 KI cell line and isolated cell clones whose expression level was similar to that of endogenous SFPQ for further observation. As with endogenous SFPQ, CERTIS-labeled NEAT1 colocalized well with GFP-SFPQ (Fig. 4C). Different stimuli were then added to the cells to track NEAT1 and paraspeckle dynamics over time. Upon addition of ActD, SFPQ aggregated into nucleolar caps within a few hours, followed by rapid depolymerization of NEAT1 and disruption of paraspeckles. Interestingly, when the cells were treated with another RNA pol II inhibitor, α-Amanitin, little SFPQ aggregated at nucleolar caps and paraspeckles remained stable, although the expression level of NEAT1 was also dramatically reduced (Fig. 4C and 4D). This implies that an additional mechanism may participate in the collapse of paraspeckles.
Since paraspeckles were previously speculated to be related to the DNA damage response, we further treated the cells with a series of DNA-damaging reagents to observe dynamic changes in NEAT1 and paraspeckles. Hydroxyurea (HU) triggers replication fork stalling through inhibition of ribonucleotide reductase. We did not observe any change in paraspeckle distribution after HU treatment. The radiomimetic chemical zeocin generates random DNA double-strand breaks (DSBs) in the genome, while another DSB inducer, etoposide, works by inhibiting topoisomerase II (Delacôte et al., 2007; Wu et al., 2011). Paraspeckles were insensitive to zeocin-induced DSBs, with no significant change in morphology or distribution even under prolonged treatment. Etoposide treatment for 4 h did not disrupt paraspeckles, but we found reduced NEAT1 signals and some SFPQ moved away from paraspeckles (Fig. 4C). RT-qPCR demonstrated that all of these drugs lowered NEAT1 expression to some extent (Fig. 4E). Interestingly, when we treated the cells with the DNA single-strand break (SSB) reagent CPT, a well-known topoisomerase I inhibitor, SFPQ accumulated in the nucleolus within a short time, resulting in loss of NEAT1 signal and paraspeckle collapse. To further confirm the relationship between topoisomerase I inhibition and paraspeckle disruption, we treated the cells with SN-38, another topoisomerase I-specific inhibitor. SN-38 produced a phenotype similar to that of CPT even at a very low concentration (1 μmol/L) (Fig. 4F). The corresponding NEAT1 expression levels were consistent with the phenotype (Fig. 4G), and the number of colocalized NEAT1/SFPQ foci per nucleus was significantly reduced after CPT or SN-38 treatment, strongly indicating that paraspeckles may respond to topoisomerase I defects. In addition, using the NEAT1-labeled cells before GFP excision, we found that the effect of CPT on NEAT1 was also correctly mirrored by GFP intensity measured by flow cytometry (Fig. S3A-C). These results indicate that paraspeckles may play multifunctional roles under different types of DNA damage. In addition, the CERTIS platform is suitable for tracking endogenous lncRNA dynamics in live cells and for expedient screening of lncRNA regulators.

CERTIS enables long-term time-lapse imaging of endogenous NEAT1 in living cells
To assess the long-term imaging performance of CERTIS and analyze paraspeckle dynamics in living cells, we performed time-lapse imaging of CERTIS-labeled NEAT1. GFP-SFPQ was monitored in parallel as a reference for paraspeckle protein dynamics. First, the KI cells were cultured under normal conditions, and 3D z-stack images were taken every 15 min for 6 h (Fig. 5A and Supplementary Movie S1). The mScarlet signal showed no photobleaching after 6 h of observation. Notably, even in normal culture medium, the number and size of paraspeckles changed dynamically, while SFPQ was consistently located inside or next to NEAT1 foci. In addition, a fraction of SFPQ was scattered across the nucleus, consistent with the multifunctional character of SFPQ (Knott et al., 2016). Since we found that NEAT1 and paraspeckles were sensitive to topoisomerase I-specific inhibitors, we next added CPT to the cells and recorded NEAT1 and paraspeckle dynamics. SFPQ moved away from NEAT1 and rapidly relocated to the nucleolus. Consequently, NEAT1 broke up into smaller puncta within a few hours and finally dispersed into the nucleoplasm (Fig. 5B and Supplementary Movie S2). Similarly, when the RNA synthesis inhibitor ActD was added, disruption of paraspeckles was successfully tracked while SFPQ depolymerized together with NEAT1 (Fig. S4). To further evaluate CERTIS in long-term imaging and to study the reassembly of paraspeckles, we replaced the CPT-containing medium with normal medium after 6 h of treatment and performed live cell imaging for another 14 h. After removal of CPT, SFPQ gradually moved out of the nucleolus and rejoined NEAT1 puncta, followed by regrouping of NEAT1 foci, which finally formed larger paraspeckles (Fig. 5C and Supplementary Movie S3). We suspect that this dynamic behavior of paraspeckles resembles phase separation. Overall, these results demonstrate that CERTIS labeling is suitable for long-term dynamic tracking of endogenous lncRNA. Using CERTIS, we successfully recorded the reversible disruption and reconstitution of paraspeckles in live cells.

Proteomic profiling of CERTIS labeled NEAT1 by native RIP-MS
To understand the function and regulatory mechanisms of lncRNAs, it is necessary to identify the proteins that associate with them. Current methods for identifying lncRNA-protein interactomes mostly rely on RNA hybridization and crosslinking, which is time-consuming and laborious (George et al., 2018). Since the MCP fusion protein contains epitope tags for immunoprecipitation, we developed an efficient and simple native RIP (nRIP) approach to identify proteins that interact with lncRNA. In our system, 3×Flag-fused MCP is used to pull down the MS2-tagged lncRNA because of the high affinity and specificity of the Flag antibody. For CERTIS-based nRIP, the cells were lysed in a mild lysis buffer, and the MCP-lncRNA-protein complexes were then pulled down using M2 magnetic beads. After proteinase K treatment and RNA extraction from part of the sample, the fold enrichment of the target lncRNA can be determined by qPCR, or the RNA can be used for further applications such as RNA-seq. The rest of the sample was digested with RNase A to release the interacting proteins for western blot or mass spectrometry analysis (Fig. 6A). By qPCR, CERTIS-labeled NEAT1 was significantly enriched after nRIP (Fig. 6B). Importantly, the paraspeckle component PSPC1 was also enriched after immunoprecipitation compared with the control (Fig. 6C). Next, NEAT1-associated proteins were identified by mass spectrometry. We cross-referenced our list of enriched proteins with paraspeckle proteins previously identified in two different genome-wide screens (Naganuma et al., 2012; West et al., 2014) and found that many known paraspeckle-associated proteins were recovered, suggesting that CERTIS-based nRIP enables efficient pull-down of lncRNA-interacting proteins (Fig. 6D and Table S2). The proteins shared among the three methods are displayed in a Venn diagram (Fig. S5). Core paraspeckle components such as the Drosophila behavior/human splicing (DBHS) protein family and a series of RNA splicing factors were successfully identified. GO analysis showed that the enriched proteins were consistent with the previously described functions of paraspeckles, mainly participating in the regulation of RNA splicing, protein complex disassembly, RNA localization, DNA recombination and innate immune response (Fig. 6E and Table S2). Notably, some of the proteins have not been reported to colocalize with NEAT1 or paraspeckles. To identify new paraspeckle proteins, we determined the localization of a few candidates using GFP fusions in NEAT1 KI cells. Among them, C11orf84, QKI and RBM10 colocalized well with NEAT1 puncta (Fig. 6F). The roles of these proteins in paraspeckles warrant further study. These results demonstrate that CERTIS-based nRIP can be a robust screening method for identifying lncRNA-interacting proteins.
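The fold enrichment of the target lncRNA after nRIP, mentioned above, could be computed from qPCR Ct values along the following lines. This is a minimal sketch, assuming a ΔΔCt-style calculation in which each pulldown is normalized to its own input and the tagged sample is compared with a control pulldown, and assuming 100% amplification efficiency. The normalization scheme actually used in the study is not stated, and the function name and example Ct values are hypothetical.

```python
def rip_fold_enrichment(ct_ip_tagged: float, ct_input_tagged: float,
                        ct_ip_control: float, ct_input_control: float) -> float:
    """Fold enrichment of a target RNA in a tagged RIP versus a control RIP.

    Assumes perfect doubling per PCR cycle (efficiency of 2).
    """
    # Normalize each pulldown to its matching input sample.
    delta_ct_tagged = ct_ip_tagged - ct_input_tagged
    delta_ct_control = ct_ip_control - ct_input_control
    # Compare the tagged pulldown with the control pulldown.
    delta_delta_ct = delta_ct_tagged - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Illustrative Ct values only, not data from the study.
example = rip_fold_enrichment(ct_ip_tagged=22.0, ct_input_tagged=25.0,
                              ct_ip_control=27.0, ct_input_control=25.0)
```

A lower Ct in the tagged pulldown relative to the control, after input normalization, translates into a fold enrichment above 1.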

DISCUSSION
Paraspeckles are highly dynamic membraneless organelles built on the architectural lncRNA NEAT1. Although paraspeckles were discovered long ago, their function and regulation remain obscure. In this study, we successfully labeled the 23 kb lncRNA NEAT1 using the newly developed CERTIS method. First, we demonstrated that knock-in of the MS2 repeats does not affect the expression or physicochemical properties of endogenous NEAT1. Next, we showed that regulation of NEAT1 expression can be quantitatively monitored by measuring GFP intensity by flow cytometry, paving the way for potential genome-wide screening of lncRNA regulators. Taking advantage of CERTIS, we performed short- and long-term tracking of NEAT1 dynamics under a series of drug treatments and successfully recorded the reversible disaggregation and reconstitution of NEAT1 in living cells upon treatment with, and withdrawal of, the topoisomerase I inhibitor CPT. Finally, we developed an efficient approach for native RIP of CERTIS-labeled NEAT1 and identified several new paraspeckle proteins. We found that paraspeckles respond differently to different drug treatments. Interestingly, paraspeckles collapsed rapidly after the addition of the topoisomerase I inhibitors CPT and SN-38, which frequently generate single-strand DNA damage in the genome. It is also notable that although both ActD and α-Amanitin are transcription inhibitors, ActD, but not α-Amanitin, disrupted paraspeckles within a short time. Their mechanisms of action differ somewhat: α-Amanitin is a specific inhibitor of RNA polymerases II and III (Weinmann et al., 1974), whereas ActD inhibits DNA-primed RNA synthesis by forming a stable complex with double-stranded DNA. In addition, ActD can cause single-strand breaks in DNA (Perry and Kelley, 1970; Roots and Smith, 1976; Sobell, 1985). Based on these previous findings, we suspect that the disruption of paraspeckles after ActD treatment might also be caused by frequent DNA SSBs rather than by transcription inhibition. In short, the regulatory relationship between paraspeckles and DNA single-strand break repair (SSBR) merits further research.
In this work, we establish CERTIS as a promising tool for studying endogenous lncRNA-protein complexes; its applications and advantages are summarized in Fig. 7. Precise insertion of the MS2 repeats by CRISPR/Cas9-mediated knock-in enables tracking of both the expression level and the spatiotemporal distribution of a lncRNA, as well as screening for lncRNA-protein interactomes. After Cas9 cleavage, DNA double-strand breaks can be repaired accurately through several recombination pathways, such as HR, HMEJ and MMEJ, depending on the length of the homology arms provided. Among them, the MMEJ repair pathway needs only 20 nt micro-homology arms, which are much easier to prepare, and it is more efficient than HR in certain cell lines. Random off-target insertion by the error-prone NHEJ pathway is a main concern in this field. With the short homology arms used for MMEJ, the junction between the CC-rich region and the endogenous lncRNA is predictable and forms a new fusion gRNA site at the downstream end of the insertion, whereas the longer homology arms usually used in HR templates cannot generate such a fusion. To eliminate off-target insertions, a pair of gRNAs targeting the pre-gRNA site and the CC-rich region-endogenous lncRNA junction, respectively, can be introduced into the GFP-positive knock-in cells, followed by another round of GFP-negative sorting. Although there is still a possibility that off-target insertions
At present, there is still a lack of efficient methods for simultaneously tracking endogenous lncRNA dynamics and screening for lncRNA-interacting proteins in mammalian cells. Compared with existing RNA labeling systems, the CERTIS platform has an advantage in high-throughput screening of regulators that affect lncRNA spatiotemporal dynamics in living cells, changes that are usually accompanied by alterations in the interacting proteins. Consequently, nRIP can be performed to identify differentially interacting proteomes under specific conditions, which may help reveal new roles of lncRNAs in specific biological processes. The nRIP method also has some advantages over cross-linking methods. Besides being easily accessible, nRIP reveals the identity of the lncRNAs directly bound by the protein and their abundance in the immunoprecipitated samples. However, one drawback of nRIP is that proteins that interact weakly or indirectly with the lncRNA might be lost during immunoprecipitation. Recently, a method called RaPID has been developed to detect RNA-protein interactions based on proximity labeling, which captures weak and indirect interactions through biotinylation of proximal proteins at high affinity (Ramanathan et al., 2018). Combined with CERTIS by fusing BirA* to MCP, it holds promise for identifying the interactome of an endogenous lncRNA regardless of interaction strength. However, like other BioID-type proximity labeling methods, RaPID can only identify proteins in the proximity of the biotinylating enzyme. Thus, nRIP and RaPID would complement each other in unveiling the protein interactome of a target lncRNA.
The MS2-MCP system has long been validated in human cells, but mainly under overexpression conditions. However, ample evidence indicates that overexpression of RNAs may cause false-positive phenotypes and might not reflect the authentic regulation of endogenous RNAs. Moreover, exogenous overexpression and tagging of very long lncRNAs (∼5% are over 8 kb in the human genome) (Ma et al., 2014) is usually inefficient and hard to achieve. By combining CRISPR/Cas9-mediated KI with the MS2-MCP system, we developed an efficient system for both visualization and immunoprecipitation of endogenous lncRNAs such as NEAT1, which is 23 kb in length. However, some concerns about the CERTIS labeling system remain. First, insertion of the MS2 repeats might affect the spatiotemporal distribution of certain lncRNAs, although this concern is likely smaller for very long lncRNAs. Second, for lncRNAs with three-dimensional structures at the 3′ end, it may be necessary to test for a suitable insertion site to avoid mislocalization or weakened MS2-MCP binding. To address these problems, the MS2 repeats could be reduced to 12× or 6× according to the length of the target lncRNA, or a smaller RNA motif system such as boxB-λN could serve as an alternative to MS2-MCP (Kawakami et al., 2006).
In summary, as a novel lncRNA labeling system, CERTIS shows great potential for deciphering the functions and regulatory mechanisms of highly dynamic lncRNAs such as NEAT1. Applying CERTIS to more lncRNAs is needed to further optimize the system. Meanwhile, combining it with other established methods can help to obtain more reliable and unbiased results on lncRNA biology. We believe that CERTIS could become a widely used tool in lncRNA research and help advance the study of lncRNAs.

Plasmid construction and sgRNA preparation
To construct pLenti-tdMCP-mScarlet-3×Flag, the NLS-tdMCP fragment was amplified from pHAGE-UBC-NLS-HA-tdMCP-GFP (a gift from Robert Singer; Addgene plasmid no. 40649), and the mScarlet-3×Flag fragment was amplified from a synthesized plasmid (IGE Biotechnology); the two fragments were then assembled into a lentiviral vector containing an EF-1α promoter by NEBuilder HiFi DNA Assembly (NEB). The universal KI template plasmid was constructed by assembling the 24×MS2 and IRES-GFP fragments into the multiple cloning site of pBlueScript II SK(+) (Agilent) by NEBuilder HiFi DNA Assembly (NEB). The 24×MS2 fragment was amplified from pHAGE-CMV-CFP-24×MS2 (a gift from Robert Singer; Addgene plasmid no. 40651), and the IRES-GFP fragment was amplified from pMSCV PIG (a gift from David Bartel; Addgene plasmid no. 21654). The pre-gRNA site and CC-rich region were introduced into the fragments through overlapping PCR primers. The sequence of the full-length universal KI template vector is shown in Fig. S6.
Figure 5. CERTIS enables continuous, long-term live cell imaging of endogenous NEAT1 dynamics. (A) Time-lapse snapshots from live imaging of CERTIS-labeled NEAT1 (red). GFP-fused SFPQ (green) was stably expressed to monitor changes in paraspeckle protein dynamics. The movie was recorded for 6 h at 15 min intervals. Insets show the magnified boxed region of the merged channel of NEAT1 and its interaction with SFPQ. Scale bar: 2 μm. Also see Supplementary Movie S1.
Full-length SFPQ and other paraspeckle candidate proteins were PCR amplified from the cDNA of HEK293T cells. The PCR products were first cloned into pENTR™ vector (Invitrogen) and then transferred into a Gateway-compatible destination vector containing a GFP tag by LR reaction according to the manufacturer's protocol (Invitrogen).
The sgRNAs were designed using CHOPCHOP (https://chopchop.cbu.uib.no/#) as described before (Montague et al., 2014). The sgRNA oligos were cloned into the pDR274 vector and transcribed in vitro using the MEGAshortscript T7 transcription kit (Thermo Fisher). sgRNAs were purified using the MEGAclear kit (Thermo Fisher), dissolved in RNase-free water and quantified using a NanoDrop 1000 (Thermo Fisher). The target sequences of the gRNAs are shown in Table S1.
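As a rough illustration of what a guide-design tool searches for, the sketch below scans the forward strand of a sequence for 20 nt protospacers followed by an NGG PAM (the PAM of SpCas9). It is not the CHOPCHOP algorithm, which additionally scores efficiency and off-targets; the function name and the example sequence are hypothetical, and the reverse strand is ignored for brevity.

```python
import re

def find_spcas9_targets(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for each forward-strand NGG site."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Made-up sequence, for illustration only.
example = "TTGACCTGAAGTCTGCATCGATCGGTACCAGGAT"
for start, protospacer, pam in find_spcas9_targets(example):
    print(start, protospacer, pam)
```

Candidates found this way would still need to be ranked and checked for off-target matches, which is the part a dedicated tool handles.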

One-step generation of linear DNA donor
The universal KI template plasmid was used as the PCR template to generate the linear DNA donor. Primers (IGE Biotechnology) contained 20 nt micro-homology arms (MHAs) flanking the cleavage site (3 nt upstream of the PAM) and a 25 to 30 nt region overlapping the universal template plasmid. In addition, all primers were 5′ thiophosphorylated to extend the half-life of the linear donor fragments as previously reported (Orlando et al., 2010; Tasan et al., 2018), and the phosphorothioate bonds were further extended to the last 5 nt at the 5′ end. PCR was performed using KOD FX (TOYOBO) with step-down cycling conditions following the manufacturer's instructions. The PCR products were then purified with a Gel Extraction Kit (Qiagen) and dissolved in nuclease-free water. Sequences of the primers are shown in Table S1.
To study the dark matter in the genome, CERTIS allows precise and efficient knock-in of the MS2 cassette into the lncRNA. The second-round knock-out for GFP-negative selection is optional and eliminates off-target insertions. CERTIS has an advantage in labeling very long lncRNAs, which is hard to achieve through exogenous overexpression and tagging. The endogenous promoter-driven GFP enables quantitative monitoring of lncRNA regulation in live cells and is compatible with high-throughput gene inactivation or drug screening. Stable expression of the MCP-mScarlet-FLAG fusion protein allows both short- and long-term tracking of the labeled lncRNA in live cells as well as study of lncRNA-protein interactomes.
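The primer layout described in this section (20 nt micro-homology arms on either side of the cut site, joined to a region overlapping the universal template) can be sketched as follows. This is a minimal illustration under stated assumptions: the protospacer lies on the forward strand with the PAM on its 3′ side, the function names and toy sequences are hypothetical, and the phosphorothioate modifications are not modeled.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def design_mha_primers(locus, pam_start, template, arm=20, overlap=25):
    """Build forward/reverse donor primers carrying micro-homology arms.

    locus      forward-strand genomic sequence around the target (assumed input)
    pam_start  index of the first PAM base within locus
    template   insert sequence to be amplified from the universal plasmid
    """
    cut = pam_start - 3  # cleavage site, 3 nt upstream of the PAM
    if cut - arm < 0 or cut + arm > len(locus):
        raise ValueError("locus too short for the requested arm length")
    if len(template) < overlap:
        raise ValueError("template shorter than the overlap")
    left_arm = locus[cut - arm:cut]
    right_arm = locus[cut:cut + arm]
    # Forward primer: 5' left arm, then the start of the template.
    forward = left_arm + template[:overlap]
    # Reverse primer: reverse complement of (template end + right arm).
    reverse = revcomp(template[-overlap:] + right_arm)
    return forward, reverse

# Toy sequences chosen only to make the arms easy to see.
locus = "A" * 40 + "G" * 40
template = "C" * 30 + "T" * 30
fwd, rev = design_mha_primers(locus, pam_start=43, template=template)
print(fwd)
print(rev)
```

With these primers, the amplified product carries the left arm, the template and the right arm in that order, which is the arrangement MMEJ repair requires at the cut site.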

MMEJ-based CRISPR KI
To generate highly efficient KI at the target locus, Cas9 RNP and the linear DNA donor were electroporated into the cells simultaneously as previously reported. Specifically, 10 μg TrueCut Cas9 Protein v2 (Thermo) and 2 μg in vitro transcribed sgRNA were incubated at 37°C for 20 min, and 5 × 10^5 HEK293T cells (70%-90% confluent) were harvested and centrifuged at 90 ×g for 10 min. The Cas9 RNP was then electroporated into the cell pellet together with 1 μg linear DNA donor using a Lonza 4D-Nucleofector (Lonza) under a pre-optimized program. At 72 h after electroporation, GFP-positive cells were enriched by FACS (BD FACSAria II) as described before (Lee et al., 2011), followed by two more rounds of consecutive sorting to ensure that the cell population was nearly 100% GFP positive. The KI cells were then visualized by stable expression of the tdMCP-mScarlet-3×Flag fusion protein. Single-cell clones were isolated for further genotyping and experiments. For precise excision of the IRES-GFP fragment, a pair of gRNAs targeting the pre-gRNA site and the GFP-NEAT1 fusion junction, respectively, were transcribed in vitro and electroporated simultaneously with Cas9 protein into the GFP-positive clones. At 72 h after electroporation, GFP-negative cells were enriched by another round of FACS.

Cell culture, RNAi, lentiviral production and stable cell line construction
HEK293T cells were cultured in DMEM supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin. siRNA oligos targeting SFPQ and the negative control (siNC) were ordered from RiboBio and transfected into cells using Lipofectamine® RNAiMAX (Thermo Fisher Scientific). The sequences of the oligos were the same as previously described (Naganuma et al., 2012) and are listed in Table S1. Lentiviral particles were produced by transient transfection of HEK293T cells with pLenti-tdMCP-mScarlet-3×Flag, psPAX2 and pMD2.G in a ratio of 4:3:1. The supernatant containing viral particles was harvested twice, at 48 h and 72 h after transfection, and the virus titer was determined with the TransLv™ Lentivirus qPCR Titration Kit (Transgene Biotech) following the manufacturer's instructions. To construct a stable cell line with a low expression level of the tdMCP cassette, the MOI (multiplicity of infection) was kept below 0.3; cells were cultured in medium containing lentivirus and 8 μg/mL polybrene (Sigma) for 48 h and then subjected to puromycin (5 μg/mL) selection for another 72 h. Infection at a low viral titer is critical for increasing the signal-to-noise ratio (SNR) of the labeled lncRNA within the nucleus.
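To make the MOI constraint concrete, the sketch below computes the volume of viral supernatant needed for a chosen MOI and estimates how many cells receive zero, one or several transduction events. The Poisson model is a common approximation and an assumption here, not something stated in the protocol; the function names, cell number and titer are hypothetical.

```python
import math

def virus_volume_ul(n_cells, moi, titer_tu_per_ml):
    """Volume of viral supernatant (in µL) that delivers the requested MOI."""
    if titer_tu_per_ml <= 0:
        raise ValueError("titer must be positive")
    transducing_units = n_cells * moi  # total units needed
    return transducing_units / titer_tu_per_ml * 1000.0  # mL to µL

def integration_fractions(moi):
    """Poisson estimate of cells with 0, 1 or more than 1 transduction event."""
    p_none = math.exp(-moi)
    p_single = moi * math.exp(-moi)
    return {"none": p_none, "single": p_single,
            "multiple": 1.0 - p_none - p_single}

# Hypothetical numbers: 1e6 cells, MOI 0.3, titer 1e8 TU/mL.
print(virus_volume_ul(1e6, 0.3, 1e8))
print(integration_fractions(0.3))
```

Under this model most transduced cells at an MOI of 0.3 carry a single copy, which is consistent with the aim of keeping tdMCP expression, and therefore unbound background fluorescence, low.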

Clonal isolation, genotyping and sequencing
Clonal isolation was started by limiting dilution in 96-well plates as described before (Mali et al., 2013). Once the clones expanded in 12-well plates reached ∼80% confluency, half of the cells were collected for genomic DNA (gDNA) extraction (Qiagen) and the rest were preserved in liquid nitrogen until further use. Genotyping and Sanger DNA sequencing were carried out as described previously (Tasan et al., 2018). Briefly, all genotyping PCRs were performed using KOD FX polymerase (TOYOBO) according to the manufacturer's protocol. For clones with the expected genotyping results, the knock-in band was purified with a Gel Extraction Kit (Qiagen) and subjected to Sanger DNA sequencing (IGE Biotechnology) to verify the sequences of the 5′ and 3′ junctions as well as the integrity of the MS2 repeats. Sequences of the junction PCR primers can be found in Table S1.

RNA isolation and qRT-PCR
Total RNA from each cell line or from cells under different treatments was extracted with Trizol Reagent (Invitrogen). For exact quantification of NEAT1 expression, an improved RNA extraction method specific to nuclear body-associated architectural lncRNAs was used, in which the cell lysate is sheared through a needle in Trizol reagent as previously described (Chujo et al., 2017). A total of 1 μg RNA was treated with RNase-free RQ DNase I (Promega) for 10 min at 37°C and then reverse transcribed using SuperScript III (Invitrogen) with oligo(dT) and random hexamers. Quantitative polymerase chain reaction (qPCR) was carried out on an Applied Biosystems StepOne™ Real-Time PCR System using the GoTaq qPCR Master Mix (Promega). The relative expression of each gene was normalized to GAPDH mRNA. Primer sequences for qRT-PCR are listed in Table S1.
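The text states that expression was quantified relative to GAPDH but does not name the calculation. One common way to do this is the comparative Ct (2^-ΔΔCt) method, sketched below as an assumption; it presumes roughly 100% amplification efficiency for both primer pairs, and the function name and Ct values are made up for illustration.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene versus a control sample (2^-ddCt).

    ct_target / ct_ref            Ct of target and reference gene in the sample
    ct_target_ctrl / ct_ref_ctrl  the same two Ct values in the control sample
    """
    delta_sample = ct_target - ct_ref          # normalize sample to reference
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize control to reference
    return 2.0 ** -(delta_sample - delta_ctrl)

# Made-up Ct values: the target comes up one cycle earlier in the treated sample.
fold = relative_expression(ct_target=24.0, ct_ref=18.0,
                           ct_target_ctrl=25.0, ct_ref_ctrl=18.0)
print(fold)
```

Here the reference gene would be GAPDH and the target NEAT1 or another gene of interest; technical replicates would normally be averaged before this step.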

RNA FISH and immunofluorescence
HEK293T cells were grown on a glass slide and fixed in 4% formaldehyde in PBS. Cells were washed once with PBS, permeabilized with Triton X-100, dehydrated through a series of ethanol washes and hybridized overnight with a digoxigenin-labeled NEAT1 probe (Table S1). The RNA FISH signal was detected by incubation with an Alexa Fluor 647-labeled anti-digoxigenin antibody (Roche) and examined on a Leica TCS SP8 STED microscope.
Immunofluorescence was carried out as described previously (Deng et al., 2019). Briefly, cells growing on glass slides were rinsed with PBS, fixed with 4% PFA for 15 min, permeabilized in 0.2% Triton X-100 and blocked with 5% BSA for 30 min. The cells were then stained with human anti-PSPC1 (Abcam, ab104238) or anti-SFPQ (Abcam, ab11825) antibodies in blocking buffer overnight at 4°C and washed three times with PBS. They were then stained with Alexa Fluor 647-labeled secondary antibodies (Jackson Laboratory) in blocking buffer for 1 h. After three washes with PBS, coverslips were mounted with VECTASHIELD mounting medium containing 0.5 μg/mL DAPI and visualized at 100× with a CCD camera mounted on a Leica TCS SP8 STED microscope using imaging software. The magenta signal was pseudocolored green in ImageJ (National Institutes of Health) before colocalization analysis.

Image acquisition and analysis
For fixed cells on a slide with a coverslip, images were acquired on a Leica TCS SP8 STED microscope in confocal mode with a 100× oil immersion/1.4 N.A. objective (Leica Microsystems) following the manufacturer's instructions. For long-term imaging of living cells, a 93× glycerol immersion/1.4 N.A. objective was used and the cells were maintained in a live-cell workstation at 37°C with 5% CO2. Excitation wavelengths were chosen according to the specifications of each dye or fluorescent protein: DAPI at 405 nm, GFP at 488 nm, mScarlet at 561 nm and Alexa Fluor 647 at 633 nm. Emission was detected with a PMT detector. Gain and offset were set to values that prevented saturated and empty pixels. Images of 1,024 × 1,024 pixels were obtained. 2D STED images were recorded as single planes or as z-sections with a slice distance of 300 nm, and regions of interest were adjusted manually. After acquisition, images were deconvolved under the lightning model using Leica LAS Lite Software (Leica). Three-dimensional projections of the acquired movies were generated using maximum intensity projection in ImageJ. Signal intensity and colocalization were also analyzed in ImageJ as reported before (Shao et al., 2016).

RESEARCH ARTICLE
Bohong Chen et al.

Native RNA immunoprecipitation
Native RIP was carried out as previously described (Xing et al., 2017). NEAT1-labeled cells (2 × 10^7) were harvested and rinsed twice with ice-cold PBS; WT HEK293T cells stably expressing tdMCP-mScarlet-3×Flag were used as a control. Cells were suspended in 1 mL RNA immunoprecipitation (RIP) buffer containing 50 mmol/L Tris pH 7.4, 150 mmol/L NaCl, 0.5% NP-40, 1× RNasin plus (Promega), 2 mmol/L ribonucleoside vanadyl complex (VRC, NEB), 1 mmol/L PMSF and 1× protease inhibitor cocktail (Sigma). After brief sonication, cell lysates were centrifuged at 10,000 ×g for 10 min at 4°C and the supernatants were precleared with 10 μL Dynabeads Protein G (Invitrogen). The precleared supernatants were then divided into two equal parts and incubated with 20 μL anti-FLAG M2 Magnetic Beads (Sigma) for 2 h at 4°C, followed by three washes with RIP buffer. One-third of the beads were incubated with elution buffer (100 mmol/L Tris pH 6.8, 4% SDS, 10 mmol/L EDTA) at room temperature for 10 min, followed by western blotting. The remainder was treated with proteinase K buffer (50 mmol/L Tris pH 7.4, 150 mmol/L NaCl, 1× RNasin plus, 0.5% SDS and 200 μg/mL proteinase K) at room temperature for 30 min before RNA extraction. For mass spectrometry, the bound proteins were released from the beads by incubation with RNase A buffer (50 mmol/L Tris pH 7.4, 150 mmol/L NaCl and 1 μg/mL RNase A) before trypsin digestion. Liquid chromatography-tandem MS (LC-MS/MS) was carried out on a Thermo Q Exactive hybrid quadrupole-Orbitrap mass spectrometer as reported before (Tsai et al., 2011).