Short-read and long-read full-length transcriptome of mouse neural stem cells across neurodevelopmental stages

Ding, Chaoqiong; Yan, Xiang; Xu, Mengying; Zhou, Ran; Zhao, Yuancun; Zhang, Dan; Huang, Zongyao; Pan, Zhenzhong; Xiao, Peng; Li, Huifang; Chen, Lu; Wang, Yuan

doi:10.1038/s41597-022-01165-0

Data Descriptor
Open access
Published: 02 March 2022

Volume 9, article number 69, (2022)

Scientific Data

Chaoqiong Ding¹^na1,
Xiang Yan¹^na1,
Mengying Xu²^na1,
Ran Zhou²,
Yuancun Zhao²,
Dan Zhang²,
Zongyao Huang¹,
Zhenzhong Pan¹,
Peng Xiao¹,
Huifang Li ORCID: orcid.org/0000-0001-7101-0802³,
Lu Chen ORCID: orcid.org/0000-0002-1083-9729² &
…
Yuan Wang ORCID: orcid.org/0000-0002-6324-6134¹

Abstract

During brain development, neural stem cells (NSCs) undergo multiple fate-switches to generate various neuronal subtypes and glial cells, exhibiting distinct transcriptomic profiles at different stages. However, full-length transcriptomic datasets of NSCs across different neurodevelopmental stages under similar experimental settings are lacking, although such datasets are essential for uncovering stage-specific transcriptional and post-transcriptional mechanisms underlying the fate commitment of NSCs. Here, we report the full-length transcriptome of mouse NSCs at five different stages during embryonic and postnatal development. We used fluorescence-activated cell sorting (FACS) to isolate CD133⁺Blbp⁺ NSCs from C57BL/6 transgenic mice that express enhanced green fluorescent protein (EGFP) under the control of a Blbp promoter. By integrating short- and long-read full-length RNA-seq, we created a transcriptomic dataset of gene and isoform expression profiles in NSCs at embryonic days 15.5 and 17.5, and postnatal days 1.5, 8, and 60. This dataset provides a detailed characterization of full-length transcripts in NSCs at distinct developmental stages, which could be used as a resource for the neuroscience community to study NSC fate determination, neural development, and disease.

Measurement(s)	transcriptomic datasets of neural stem cells
Technology Type(s)	Next-Generation Sequencing(NGS) • third-generation full-length sequencing(TGS)
Sample Characteristic - Organism	Mus musculus
Sample Characteristic - Environment	laboratory facility
Sample Characteristic - Location	China

Background & Summary

During mammalian brain development, neural stem cells (NSCs) give rise to major cell types in various brain regions, including neurons and glial cells (astrocytes and oligodendrocytes). Although embryonic and postnatal NSCs share molecular markers such as brain lipid-binding protein (Blbp, also known as fatty acid-binding protein 7, Fabp7) and CD133 (also known as Prominin-1), their cellular identities and fates vary significantly at different developmental stages. Embryonic NSCs are radial glial cells in the ventricular zone (VZ), which initially generate neurons in different cortical layers and subsequently undergo a neuron-glia fate-switch to produce astrocytes and oligodendrocytes at late embryonic and perinatal stages¹. After birth, a subset of radial glial cells transforms into postnatal NSCs in the subventricular zone (SVZ) and the subgranular zone (SGZ) of the hippocampus, which continue to generate interneurons and glia². In the adult brain, the majority of NSCs in the SVZ and SGZ are committed to a neuronal fate³,⁴. These fate switches in NSCs are driven by dramatic transcriptional alterations. Extensive efforts have been made to characterize human and mouse brain cells, including NSCs, at bulk and single-cell levels during neurodevelopment⁵,⁶,⁷. However, partly due to the scarcity of NSCs, full-length transcriptomic datasets of NSCs across different neurodevelopmental stages under similar experimental settings are lacking, although such datasets are essential for uncovering stage-specific transcriptional and post-transcriptional mechanisms underlying the fate commitment of NSCs.

Smart-seq2 is a powerful single-cell full-length sequencing protocol, which provides complete coverage across the genome, allowing the detection of alternative transcript isoforms and SNPs⁸,⁹. This protocol can also be adapted for full-length bulk RNA-seq of rare cell populations, such as NSCs. However, for conventional second-generation RNA-seq, cDNA generated from Smart-seq2 is fragmented before sequencing. This yields accurate but short reads, which complicates the task of reconstructing and quantifying transcript isoforms. Long-read sequencing, or third-generation sequencing, on the other hand, does not require cDNA fragmentation and provides a complete picture of the transcriptome at the cost of sequencing accuracy. Combining short- and long-read sequencing can draw on their respective strengths.

In this study, we used fluorescence-activated cell sorting (FACS) to isolate CD133⁺Blbp-EGFP⁺ NSCs from C57BL/6 transgenic mice at five different stages of embryonic and postnatal development: embryonic day 15.5 (E15.5, the peak of cortical neurogenesis), E17.5 (the transition to gliogenesis), postnatal day 1.5 (P1.5, neonatal stage), P8 (the peak of postnatal NSC proliferation and gliogenesis), and P60 (adult). We used the Smart-seq2 protocol to prepare the cDNA samples of NSCs at these stages, and sequenced a total of 20 short-read RNA-seq libraries with at least three samples per stage, along with paired Oxford Nanopore long-read RNA-seq for each stage. The overall study design is presented in Fig. 1a. The resultant dataset provides a detailed characterization of full-length transcripts in NSCs at distinct developmental stages in a similar experimental setting and could be used as a resource to study NSC fate determination, neural development, and disease.

Methods

Animals

The Blbp-EGFP mice used in this study were initially generated by Anthony et al. at The Rockefeller University and obtained from Dr. Yuan Zhu’s lab at Children’s National Medical Center in Washington, DC under a material transfer agreement with Sichuan University. The mice were bred in the Experimental Animal Centre of Sichuan University and maintained on a C57BL/6 genetic background. Mice were housed in pressurized, individually ventilated cages (PIV/IVC) and maintained under specific-pathogen-free conditions, with free access to food and water in a 12 h light/dark cycle. All animal studies were approved by the Animal Care and Use Committee of Sichuan University. For timed pregnancies, the plug date was designated as E0.5 and the date of birth was defined as P0.5.

Sample collection and FACS

To collect embryonic NSCs, embryonic brains were placed into pre-chilled 10% FBS solution (10% FBS in DPBS, Gibco), and the dorsal wall of the lateral ventricle (LV) was dissected out under a dissecting microscope (Motic). The tissue was dissociated by pipetting, and the cells were filtered through a 40 µm nylon mesh cell strainer (BD Falcon) to prepare a single-cell suspension.

Postnatal brains were placed into 10% FBS solution and cut into coronal slices. The SVZ region was harvested, minced into small pieces, and dissociated with Accutase solution (Millipore) at 37 °C for 20 min. The resultant cells were filtered through a 40 µm nylon mesh cell strainer (BD Falcon) to prepare a single-cell suspension.

NSCs were stained with the viability marker fixable viability stain 510 (FVS510, BD Horizon, 564406) and the NSC-marking antibody CD133-APC (Abcam, ab19898), and subjected to FACS. From each mouse, 1000 FVS510⁻/CD133⁺/Blbp-EGFP⁺ cells were collected for subsequent RNA-seq (Fig. 1b). The FACS plots for all samples are presented in Figshare¹⁰. The cells for 10X single-cell RNA sequencing were also obtained by FACS with the same sorting strategy as for bulk RNA-seq.

NSC Culture

Sorted FVS510⁻/CD133⁺/Blbp-EGFP⁺ cells were cultured in NSC culture medium (1% N2, 2% serum-free B27, 20 ng/ml EGF and 20 ng/ml bFGF in DMEM/F12) in 6-well ultra-low binding plates (Corning). Neurospheres were visible after 4 days of culture (Fig. 1c).

cDNA library construction and sequencing

cDNA preparation

cDNA preparation was modified from a published protocol that was originally used for single-cell RNA sequencing¹¹. Briefly, all the components of the lysis buffer (TritonX-100, dNTP, Oligo-dT VN primer, and RNase inhibitor) were increased to 2X, except the RNase inhibitor, which was increased to 8X, resulting in a total volume of 8.8 µL. The components of the RT-PCR reaction mix were increased accordingly. NSCs were collected in tubes containing lysis buffer and were immediately transferred onto dry ice. The lysate was vortexed vigorously for 1 min, incubated at 72 °C for 3 min, and subjected to RT-PCR. The reverse transcription mixture was prepared by mixing 1.6 µL SuperScript II reverse transcriptase, 1.6 µL RNase inhibitor, 6.5 µL SuperScript II first-strand buffer, 1.6 µL DTT, 6.5 µL betaine, 0.2 µL MgCl₂, 0.3 µL TSO and 0.9 µL nuclease-free H₂O to reach a total volume of 19.2 µL. Cell lysate was mixed with the reverse transcription mixture and incubated at 42 °C for 90 min, followed by 10 RT-PCR cycles: ① 50 °C for 2 minutes and ② 42 °C for 2 minutes. Afterwards, the reverse-transcribed cDNA samples were incubated at 70 °C for 15 min. For additional PCR amplification, 33.5 µL cDNA was mixed with 33.5 µL KAPA HiFi HotStart ReadyMix and 0.7 µL ISPCR primers to a total volume of 67.7 µL. The mixture was first incubated at 98 °C for 3 minutes, followed by 20 PCR cycles: ① 98 °C for 20 seconds, ② 67 °C for 15 seconds and ③ 72 °C for 6 minutes. The amplified cDNA samples were then incubated at 72 °C for 5 minutes. cDNA purification was carried out with Ampure XP magnetic beads (0.8:1 ratio, Beckman Coulter, A63881). Before library construction, cDNA quality was checked with an Agilent 2100 Bioanalyzer (Invitrogen). Library construction was performed with qualified cDNA for both short-read and long-read sequencing.

The 10X single-cell RNA-seq libraries were prepared with the Chromium Single Cell Gene Expression Solution using the Chromium Single Cell 3′ Gel Bead, Chip and Library Kits v2 (10X Genomics) as per the manufacturer’s protocol. 8000–10,000 total cells were added to each channel. The cells were then partitioned into Gel Beads in Emulsion in the Chromium instrument, where cell lysis and barcoded reverse transcription of RNA occurred, followed by amplification, shearing, and 5′ adapter and sample index attachment. Libraries were sequenced on the Illumina NovaSeq 6000 platform at Novogene, Beijing, China¹².
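As a quick arithmetic check on the reagent volumes listed above, the following Python sketch sums the reverse transcription and PCR mixes and scales them to a master mix. The `scale_mix` helper and its 10% overage are illustrative assumptions and are not part of the published protocol.

```python
# Reagent volumes (µL) as listed in the cDNA preparation paragraph.
RT_MIX_UL = {
    "SuperScript II reverse transcriptase": 1.6,
    "RNase inhibitor": 1.6,
    "SuperScript II first-strand buffer": 6.5,
    "DTT": 1.6,
    "betaine": 6.5,
    "MgCl2": 0.2,
    "TSO": 0.3,
    "nuclease-free H2O": 0.9,
}
PCR_MIX_UL = {
    "cDNA": 33.5,
    "KAPA HiFi HotStart ReadyMix": 33.5,
    "ISPCR primers": 0.7,
}


def total_volume(mix):
    """Return the total volume of a mix in µL, rounded to 0.1 µL."""
    return round(sum(mix.values()), 1)


def scale_mix(mix, n_reactions, overage=0.1):
    """Scale a per-reaction mix to n reactions.

    The fractional overage (default 10%) is a hypothetical pipetting
    allowance, not a value taken from the protocol.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in mix.items()}


if __name__ == "__main__":
    print("RT mix total (µL):", total_volume(RT_MIX_UL))
    print("PCR mix total (µL):", total_volume(PCR_MIX_UL))
```

The totals agree with the volumes stated in the text (19.2 µL and 67.7 µL).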

Short-read sequencing

Qualified cDNA samples were used for short-read library construction, which included DNA fragmentation, end-repair, 3′ end A-tailing, adapter ligation, PCR amplification, and library validation. Each cDNA library was subjected to quality inspection with a PerkinElmer LabChip® GX Touch. Qualified libraries were then loaded on the Illumina HiSeq platform for PE150 sequencing.

Long-read sequencing

The cDNA samples for long-read sequencing were taken from the same pools used for short-read sequencing. Equal amounts of cDNA from samples of the same stage were mixed into one sample. The ONT Ligation Sequencing Kit (SQK-LSK109) was used for library preparation according to the manufacturer’s instructions, except that DNA was not sheared before native barcode ligation. The library construction included DNA repair, end preparation, native barcode ligation, purification, pooling, and sequencing adapter ligation. The cDNA libraries were pooled evenly, with 80 ng from each stage. Sequencing was performed on a MinION equipped with an R9.4.1 flow cell and controlled by MinKNOW (v20.06.4, Oxford Nanopore Technologies Ltd.), the instrument control software that runs on the host computer. The data output from MinKNOW consist of 4,000 sequence reads in an HDF5 format called FAST5.

Validation of splice junctions by PCR

To validate stage-specific splice junctions in NSCs, we designed exon-specific PCR primers for 8 stage-specific alternative splicing events covering two types of splice junctions: alternative 3′ splice site (A3SS) and skipped exon (SE). Pooled cDNA libraries for NSCs from each time point (E15.5, E17.5, P1.5, P8 and P60) were mixed with the primers and subjected to 30 PCR cycles. Each PCR reaction contained 25 ng of library DNA template and 10 pmol of each gene-specific primer in a PCR master mix (2X Phusion Plus Green, Thermo Scientific) at a total volume of 50 µl. PCR products were separated by electrophoresis in 2% TBE/agarose gels. The images were captured and gel band intensity was quantified with ImageJ. The PCR PSI was calculated as the intensity of the long transcript divided by the total intensity of the long and short transcripts.

Gene Name	Forward Primer (5′→3′)	Reverse Primer (5′→3′)
*Capzb*	GCACGCTGAATGAGATCTAC	GCGTGGTCGATGCAAACTG
*Dync1i2*	CTATGTCTCCATCCTCCAAGTC	GGTCTGAGTTTCCTTTGTGTATG
*Gkap1*	CTCCCGCTCCAGAGCACAAC	GGACCGTCAGCTCGTTCTTG
*Gpm6b*	CATGTCCTATCACCTGTTCATTG	CAGTTCCTGCTCTTCCTTTGC
*Hnrnpdl*	CCAGAACAATTACCAGCCCTAC	GAGTCATCATAACACAGGTAGC
*Nkain4*	GTCTATGGTTGCTACGTGGTCAG	CTCACAGTTGTAGCCACCCTGTC
*Abat*	GAGAACGGTGGCTGGAATCATCG	GCAGGTCTTCCCGCTTGATGATG
*Sox5*	CACCAGGCTTAGGCCCACTC	CAGAGCTGGCATGTGAGGAGAG
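
The PCR PSI defined above is a simple ratio of band intensities. A minimal Python sketch follows; the function name and the example intensities are hypothetical, and any background subtraction done in ImageJ is assumed to have been applied beforehand.

```python
def pcr_psi(long_intensity, short_intensity):
    """PSI from gel bands: long / (long + short).

    Intensities are assumed to be background-corrected and non-negative.
    """
    if long_intensity < 0 or short_intensity < 0:
        raise ValueError("band intensities must be non-negative")
    total = long_intensity + short_intensity
    if total == 0:
        raise ValueError("no signal in either band")
    return long_intensity / total


if __name__ == "__main__":
    # Hypothetical band intensities in arbitrary units.
    print(pcr_psi(30.0, 10.0))
```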

Data processing

Deconvolution of bulk RNA-seq

We used MuSiC to deconvolute the transcriptome of the bulk RNA-seq samples into their likely constituent cell types, using scRNA-seq datasets from the same samples as the bulk RNA-seq as a reference. We calculated the predicted proportion of each cell type in the bulk samples and visualized these proportions with a bar plot¹².

ScRNA-seq data analysis

We used Seurat (v3.1.0) for downstream analyses including data normalization (NormalizeData, LogNormalize method, scaling factor 10,000), data feature scaling (ScaleData), variable gene detection (FindVariableGenes with vst method) and PCA of variable genes (RunPCA). The statistically significant PCs were passed to Harmony to remove the batch effect, and the two-dimensional UMAP was calculated from the Harmony matrix¹³. The cells were then clustered with the original Louvain algorithm (FindClusters) at a clustering resolution of 1.4. We computed DEGs using the FindAllMarkers function in the Seurat package with default parameters. To determine the cell types, we used the list of DEGs and the published dataset of marker genes¹⁰,¹⁴.
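
To make the normalization step concrete, the sketch below reproduces the arithmetic of log-normalization for one cell in plain Python: each count is divided by the cell total, multiplied by the scale factor (10,000), and transformed with the natural log of one plus the value. It illustrates the formula only and is not a replacement for Seurat; the function name is hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the raw counts of a single cell.

    Each value becomes log1p(count / total_counts * scale_factor).
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


if __name__ == "__main__":
    # Hypothetical cell with three genes.
    print(log_normalize([1, 0, 3]))
```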

Base calling

The raw data generated by MinKNOW software were converted from .fast5 files to base-called .fastq files under high accuracy mode using the ONT basecaller Guppy software (v.4.0.14)¹⁵. In the meantime, the sample barcodes were trimmed off with the options ‘--barcode_kits’ and ‘--trim_barcode’.

Quality control

The quality of the short-read sequencing data was checked using FastQC software (v0.11.8) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and RSeQC package (v4.0.0)¹⁶ (http://rseqc.sourceforge.net). For long-read sequencing data, the quality check was performed with NanoComp software (v1.33.1)¹⁷.

Alignment

The paired-end reads of short-read sequencing were aligned to the mouse reference genome GRCm38 with annotation from Ensembl release 93 using STAR (v2.7.1a)¹⁸, and the reads of long-read sequencing were aligned to the same genome file using Minimap2 (v2.17-r974-dirty)¹⁹.

Aligned reads distribution

Gene body coverage, reads’ distribution over genome feature, and RNA integrity at cDNA level of short-read sequencing data were calculated by geneBody_coverage.py, read_distribution.py, and tin.py from RSeQC¹⁶, respectively.

Gene expression quantification

For short-read sequencing data, gene expression was quantified using HTSeq (v0.11.2)²⁰. The raw read counts were normalized by their library size factors and then transformed to stabilize the variance across samples using the variance stabilizing transformation (VST) in DESeq2 (v1.28.1)²¹. The top 500 highly variable genes were used for unsupervised clustering analysis.
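
The two quantitative ideas in this step, library size factors and selection of highly variable genes, can be sketched in plain Python. The code below implements the median-of-ratios size factor estimate and a variance ranking; it does not reproduce the VST itself, and all function names and toy values are illustrative assumptions rather than the authors' code.

```python
import math
from statistics import median, pvariance


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts per sample.
    Genes with any zero count are skipped because their geometric
    mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def top_variable_genes(expr, n=500):
    """Return the n gene names with the largest variance.

    expr: dict mapping gene name to a list of transformed values.
    """
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    toy = [[10, 20], [100, 200], [5, 10], [0, 7]]
    print(size_factors(toy))
```

In the toy matrix the second sample has twice the depth of the first, so its size factor is twice as large.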

Differential splicing usage

The percent-spliced-in (PSI, also denoted Ψ) value was calculated according to a previous study²² and is recapitulated here. The PSI metric was computed directly by counting reads that aligned to known or predicted splicing junctions (SJs) generated by STAR¹⁸. The significance of enrichment was tested by a two-tailed hypergeometric test²³. The functional annotation of each AS event was performed using the GenomicRanges R package (v1.40.0, findOverlaps module)²⁴ and the mouse reference genome GRCm38 with annotation from Ensembl release 93.
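
As an illustration of junction-based PSI, the sketch below uses one common intron-centric formulation for an internal exon: the reads on the two inclusion junctions divided by those reads plus twice the reads on the skipping junction. This formula is an assumption chosen for illustration; the exact counting rules used for this dataset are those of the cited study and the deposited code.

```python
def exon_psi(inc_upstream, inc_downstream, skip):
    """PSI of an internal exon from splice junction read counts.

    inc_upstream / inc_downstream: reads on the two junctions that
    include the exon; skip: reads on the junction that skips it.
    Returns None when no junction read supports either form.
    """
    denom = inc_upstream + inc_downstream + 2 * skip
    if denom == 0:
        return None
    return (inc_upstream + inc_downstream) / denom


if __name__ == "__main__":
    # Hypothetical junction counts.
    print(exon_psi(30, 30, 10))
```

The factor of two balances the single skipping junction against the two inclusion junctions.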

Identification of AS modes

The resulting alignments (in BAM format) of long-read sequencing data were used to build sample-specific transcriptomes assembled with StringTie2 (v2.1.4)²⁵. The option -G was specified to use the mouse reference annotation, and the option -L was specified for long-read input. GffCompare²⁶ was used to compare and evaluate the accuracy of the StringTie2²⁵ transcript assemblies. The consolidated set of accurate isoforms (GTF format) was used to obtain a list of all possible AS events, and the SUPPA2 generateEvents mode³¹ was used to generate all AS events with the parameters “-f ioe -e SE SS MX RI FL”.
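
The assembly and event-generation steps above can be summarized as three command lines. The Python sketch below only builds the argument lists and does not run anything. The file names, the `suppa.py` executable name, and the use of `-r` for GffCompare are assumptions for illustration; only `-G`, `-L`, and `-f ioe -e SE SS MX RI FL` are stated in the text.

```python
import shlex


def stringtie_cmd(bam, reference_gtf, out_gtf):
    """StringTie2 long-read assembly guided by a reference annotation."""
    return ["stringtie", "-L", "-G", reference_gtf, "-o", out_gtf, bam]


def gffcompare_cmd(reference_gtf, assembled_gtf, out_prefix):
    """Compare an assembly against the reference (-r is an assumption)."""
    return ["gffcompare", "-r", reference_gtf, "-o", out_prefix, assembled_gtf]


def suppa_generate_events_cmd(gtf, out_prefix):
    """SUPPA2 generateEvents with the event types named in the text."""
    return [
        "suppa.py", "generateEvents",
        "-i", gtf, "-o", out_prefix,
        "-f", "ioe",
        "-e", "SE", "SS", "MX", "RI", "FL",
    ]


if __name__ == "__main__":
    # Hypothetical file names.
    for cmd in (
        stringtie_cmd("E15.5.bam", "reference.gtf", "E15.5.gtf"),
        gffcompare_cmd("reference.gtf", "E15.5.gtf", "E15.5_cmp"),
        suppa_generate_events_cmd("E15.5.gtf", "E15.5_events"),
    ):
        print(shlex.join(cmd))
```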

Transcript visualization

For visualization of transcripts, samtools (v1.10.2)²⁷ was used to extract the gene region, bedtools (v2.30.0)²⁸ was used to convert BAM files to BED format, and UCSC tools were used to convert BED files to GTF format. Visualization was carried out using the R package ggbio (v1.36.0)²⁹. Sashimi plots of short-read sequencing data were plotted using pysashimi (https://github.com/ygidtu/pysashimi).

Data Records

The raw fastq files were deposited at NCBI under accession number SRP321063³⁰. The FACS data for individual samples, as well as the processed files, including the quantification of gene expression, isoforms, and SJs from both short-read and long-read data, were uploaded to Figshare¹⁰. We also included the detailed gene expression matrix and cell type determination of a validation scRNA-seq dataset in Figshare¹⁰.

Technical Validation

NSC purity

To confirm that our sorted cells are indeed mainly composed of NSCs, we first performed a neurosphere culture assay. Sorted cells readily formed Blbp-EGFP⁺ neurospheres after 4 days of non-adherent culture in vitro, indicating that they are neural stem or progenitor cells (Fig. 1c). We further performed single-cell RNA-seq on CD133⁺Blbp-EGFP⁺ cells from P60 SVZ to identify the individual cell types among these cells¹⁰. The majority of the cells are activated NSCs (aNSCs, 73.8%) (Fig. 1d,e). The quiescent NSC (qNSC) population is relatively minor and is transcriptionally hard to distinguish from astrocytes. Consistent with our sampling of dorsal SVZ regions, Gli1, a marker for ventral NSCs, is barely detected in the NSC populations (Fig. 1f). To determine the NSC percentage in bulk samples, we performed deconvolution of bulk RNA-seq using our scRNA-seq dataset as a reference. Consistently, the majority of the cells are aNSC-like (Fig. 1g). These data support that our sample collection method can enrich for NSCs.

RNA integrity

As the mRNA was reverse transcribed immediately after bulk NSCs were lysed, the mRNA integrity could not be measured directly. Instead, we performed the quality check by examining the fragment size distribution of the cDNA. All peaks of the sample cDNA were longer than 1200 bp (Online-only Table 1). The RNA integrity at the transcript level was further evaluated using the Transcript Integrity Number (TIN) algorithm, which was calculated with the tin.py script from the RSeQC package¹⁶. TIN is a score ranging from 0 to 100 for each expressed transcript, and the medTIN (median TIN score across all the transcripts) can be used to measure the RNA integrity at the sample level. The mean TIN score of all samples was 43.87 (Online-only Table 1).
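
The sample-level summary described above is a median over per-transcript scores. A minimal Python sketch, assuming the per-transcript TIN values have already been produced by tin.py and loaded into a list (the function name and toy values are hypothetical):

```python
from statistics import median


def med_tin(tin_scores):
    """Median TIN across transcripts; scores must lie in [0, 100]."""
    if not tin_scores:
        raise ValueError("no TIN scores supplied")
    if any(t < 0 or t > 100 for t in tin_scores):
        raise ValueError("TIN scores must be between 0 and 100")
    return median(tin_scores)


if __name__ == "__main__":
    # Hypothetical per-transcript TIN scores for one sample.
    print(med_tin([12.5, 48.0, 77.3]))
```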

Data quality

Biological replicates are fundamental to guarantee data reliability. In the present study, we took 3 E15.5 samples, 6 E17.5 samples, 4 P1.5 samples, 4 P8 samples, and 3 P60 samples for bulk transcriptome sequencing. The average depth was 25.99 M (SD = 11.78) for the short-read sequencing. The quality of each base was assessed using FastQC. There is no significant difference in the distribution of the average per-base quality score among samples from different stages, and the mean Q30 exceeds 85% (Fig. 2a). The reads generated from all samples were distributed approximately uniformly across the gene body (Fig. 2b). We further examined the gene regions to which the reads mapped, and more than 90% of reads were mapped to exon regions (Fig. 2c). Moreover, Cook’s distance was calculated to test for outliers, with none detected (Fig. 2d). All the samples have over 80% uniquely mapped reads (Online-only Table 1). In addition, the Q30 of each individual sample is higher than 85% (Online-only Table 1).
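
For readers unfamiliar with the Q30 metric, the sketch below computes the fraction of bases with a Phred quality of at least 30 from FASTQ quality strings. It assumes the standard Phred+33 encoding; the function name and example strings are hypothetical and this is not how FastQC itself is implemented.

```python
def q30_fraction(quality_strings, offset=33, threshold=30):
    """Fraction of bases whose Phred score is >= threshold.

    quality_strings: iterable of FASTQ quality lines (Phred+33 assumed).
    The result always lies between 0 and 1.
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


if __name__ == "__main__":
    # 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
    print(q30_fraction(["II55"]))
```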

For long-read sequencing data, we selected 20.6 Gb of reads using Guppy¹⁵ from 77 Gb of raw data and identified 12,387,984 reads. The sequencing statistics were computed using NanoComp¹⁷. The mean read length of all five stages was around 800 bp, and the P60 sample had the maximum mean length (Fig. 2e, Online-only Table 1). In terms of base-calling quality, the mean read quality score was above 12 for each sample (Fig. 2f, Online-only Table 1).

To establish the congruency of short-read data among all stages, we carried out principal component analysis (PCA) (Fig. 3a) and hierarchical clustering (Fig. 3b) using the top 500 highly variable genes from RNA-seq data normalized with the variance stabilizing transformation (VST) in DESeq2²¹. PC1 explained 51% of the variance, while PC2 explained 15%. PCA revealed that stage E15.5 was close to E17.5, while stage P1.5 was close to P60 (Fig. 3a). In addition, the P8 samples were distant from all the other stages (Fig. 3a). Hierarchical clustering showed a similar result to that of PCA (Fig. 3b).

Differential splicing usage

As an index of AS, the PSI value was calculated for the inclusion levels of internal exons, as described in a previous study²². Differential PSI was assessed via a two-tailed hypergeometric test. The heatmap (Fig. 3c) shows all 4403 differential alternative SJs across the 5 stages with P-value < 0.01 and ΔPSI > 0.2; these SJs were detected in more than 60% of the samples of a target group and also existed in other groups in short-read sequencing data. To further understand the differential splicing usage, we analyzed the AS types and the host genes of the differential SJs. SJs with a single skipped exon (SE) (30.39%) were much more frequent than SJs of the other single AS types, including alternative 3′ splice site (A3), alternative 5′ splice site (A5), alternative first exon (AF), alternative last exon (AL) and mutually exclusive exons (MX) (Fig. 3d). In addition, 22.21% of SJs had unknown AS types or belonged to clusters (multiple AS types). Most host genes (93.9%) were protein-coding genes, with only 2.37% of host genes expressing processed transcripts, 1.88% expressing lincRNA, and 1.37% being antisense genes (Fig. 3d). The remaining 0.49% of host genes belong to other types (Fig. 3d).
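
The selection thresholds described above can be expressed as a small filter. The Python sketch below is illustrative only: the record fields are hypothetical, the absolute value of ΔPSI is an assumption (the text states ΔPSI > 0.2), and the additional requirement that the SJ exists in other groups is not modeled.

```python
from dataclasses import dataclass


@dataclass
class JunctionTest:
    junction_id: str
    p_value: float            # from the hypergeometric test
    delta_psi: float          # PSI difference, target group vs. others
    detected_fraction: float  # share of target-group samples with the SJ


def is_differential(j, p_cutoff=0.01, dpsi_cutoff=0.2, min_detected=0.6):
    """Apply the P-value, ΔPSI and detection-rate thresholds."""
    return (
        j.p_value < p_cutoff
        and abs(j.delta_psi) > dpsi_cutoff  # abs() is an assumption
        and j.detected_fraction > min_detected
    )


if __name__ == "__main__":
    # Hypothetical records.
    tests = [
        JunctionTest("sj1", 0.001, 0.35, 0.8),
        JunctionTest("sj2", 0.05, 0.50, 1.0),
    ]
    print([t.junction_id for t in tests if is_differential(t)])
```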

Stage-specific AS

For long-read sequencing data, seven types of AS were quantified to analyse the relative contribution of AS at all five stages. The quantification of each type of AS event was performed using the SUPPA2 program³¹. In total, 33,230 AS events were identified from all five stages. The P60 stage had the most AS events (13,211), while P8 had the fewest (6,649) (Fig. 4a). Moreover, both E15.5 and E17.5 had nearly 10,000 AS events (Fig. 4a). Stages E15.5 and E17.5 shared the most overlapping AS events (Fig. 4a). The P60 stage had the largest percentage of stage-specific AS events (22.1%), while P8 had the smallest (9.84%) (Fig. 4b). The other 3 stages had around 14% stage-specific AS events (Fig. 4b). The percentage of each AS type was also analyzed. SE was the most frequent AS type, constituting 30%~37% of AS events in each stage, whereas AL and MX were the least frequent, each accounting for less than 5% (Fig. 4c).
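
One way to compute stage-specific events is with set operations, sketched below in Python. The choice of denominator (events found in that stage) is an assumption, since the text does not state how the percentages were normalized; the function name and toy event identifiers are hypothetical.

```python
def stage_specific_fraction(events_by_stage):
    """Fraction of each stage's AS events found in no other stage.

    events_by_stage: dict mapping stage name to a set of event IDs.
    """
    result = {}
    for stage, events in events_by_stage.items():
        others = set().union(
            *(ev for name, ev in events_by_stage.items() if name != stage)
        )
        specific = events - others
        result[stage] = len(specific) / len(events) if events else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical event identifiers.
    toy = {"E15.5": {"e1", "e2"}, "P60": {"e2", "e3", "e4", "e5"}}
    print(stage_specific_fraction(toy))
```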

To further test the consistency between short-read and long-read sequencing data, we checked the SE events of two neural development-associated genes, Sox5 and Abat (Fig. 5a,b). We selected reads whose start and end sites fell within 600 bp upstream and downstream of the annotated transcripts in the mouse reference genome GRCm38. For Sox5, Exon 7 skipping was annotated by Ensembl and involves 2 transcripts, ENSMUST00000170367 and ENSMUST00000038815 (Fig. 5a). These 2 transcripts were detected in the long-read sequencing data but were not evenly distributed among the five stages. These transcripts were highest in E15.5 samples and not expressed in P1.5 samples (Fig. 5a). Samples from embryonic stages (E15.5 and E17.5), but not postnatal stages, contain the transcript that skips Exon 7 (Fig. 5a). The splicing junction analysis of short-read sequencing data showed that Exon 7 skipping occurred partially at the E15.5 (PSI = 0.69) and E17.5 (PSI = 0.21) stages (Fig. 5a). However, no Exon 7 skipping occurred at the P8 (PSI = 1) or P60 (PSI = 1) stages (Fig. 5a). For Abat, Exon 12 skipping was annotated by Ensembl and involves 2 transcripts, ENSMUST00000115839 and ENSMUST00000065987 (Fig. 5b). The former transcript, which lacks Exon 12, was only found in long-read sequencing data of the P60 stage, while the latter, which harbours Exon 12, was found in both E15.5 and P60 stages (Fig. 5b). Splicing junction analysis of short-read sequencing also found that Exon 12 skipping occurred at the P60 stage (PSI = 0.26) but not at E15.5 (PSI = 1). The exon retention ratios for Sox5 and Abat were confirmed by PCR analysis (Fig. 5a,b). We performed PCR validation for 6 additional alternative splicing events (Gkap1, Hnrnpdl, Capzb, Gpm6b, Dync1i2 and Nkain4), and the PCR PSIs correlated consistently with the RNA-seq PSIs (Fig. 5c). Detailed transcript information for these genes is available in Figshare¹⁰.
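
The read screening step can be read as a window test on read boundaries. The Python sketch below is one interpretation, assuming that a read is kept when both its start and its end lie within 600 bp of the annotated transcript start and end, respectively; the function name and coordinates are hypothetical.

```python
def within_window(read_start, read_end, tx_start, tx_end, window=600):
    """Keep a read whose ends are both within `window` bp of the
    annotated transcript ends (an assumed reading of the criterion)."""
    return (
        abs(read_start - tx_start) <= window
        and abs(read_end - tx_end) <= window
    )


if __name__ == "__main__":
    # Hypothetical coordinates on one chromosome.
    print(within_window(1000, 5200, 1300, 5000))
```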

Usage Note

The present study provides full-length transcriptomic profiles of mouse NSCs across embryonic and postnatal stages. Because the dataset contains both short-read and long-read sequencing data, the profiles can be cross-validated between the two platforms and are reliable for related research. The profiles are valuable for studying transcriptional and post-transcriptional mechanisms of neurodevelopment and fate commitment of NSCs, especially stage-specific gene expression and alternative splicing. The profiles are also suitable for exploring the molecular mechanisms underlying diseases related to neurodevelopment.

Code availability

The code used in this article was deposited at https://github.com/LuChenLab/Neuron.

References

Kriegstein, A. & Alvarez-Buylla, A. The glial nature of embryonic and adult neural stem cells. Annu Rev Neurosci 32, 149–184, https://doi.org/10.1146/annurev.neuro.051508.135600 (2009).
Article CAS PubMed PubMed Central Google Scholar
Bond, A. M., Ming, G. L. & Song, H. Adult Mammalian Neural Stem Cells and Neurogenesis: Five Decades Later. Cell Stem Cell 17, 385–395, https://doi.org/10.1016/j.stem.2015.09.003 (2015).
Article CAS PubMed PubMed Central Google Scholar
Menn, B. et al. Origin of oligodendrocytes in the subventricular zone of the adult brain. J Neurosci 26, 7907–7918, https://doi.org/10.1523/JNEUROSCI.1299-06.2006 (2006).
Article CAS PubMed PubMed Central Google Scholar
Suh, H. et al. In vivo fate analysis reveals the multipotent and self-renewal capacities of Sox2+ neural stem cells in the adult hippocampus. Cell Stem Cell 1, 515–528, https://doi.org/10.1016/j.stem.2007.09.002 (2007).
Article CAS PubMed PubMed Central Google Scholar
Codega, P. et al. Prospective identification and purification of quiescent adult neural stem cells from their in vivo niche. Neuron 82, 545–559, https://doi.org/10.1016/j.neuron.2014.02.039 (2014).
Article CAS PubMed PubMed Central Google Scholar
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182, https://doi.org/10.1126/science.aam8999 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhong, S. et al. Decoding the development of the human hippocampus. Nature 577, 531–536, https://doi.org/10.1038/s41586-019-1917-5 (2020).
Article ADS CAS PubMed Google Scholar
Picelli, S. et al. Smart-seq 2 for sensitive full-length transcriptome profiling in single cells. Nat Methods 10, 1096–1098, https://doi.org/10.1038/nmeth.2639 (2013).
Article CAS PubMed Google Scholar
Ramskold, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol 30, 777–782, https://doi.org/10.1038/nbt.2282 (2012).
Article CAS PubMed PubMed Central Google Scholar
Chaoqiong, Ding. et al. Short-read and long-read full-length transcriptome of neural stem cells across different stages of mouse brain development, Figshare, https://doi.org/10.6084/m9.figshare.14658867.v1 (2021).
Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq 2. Nature Protocols 9, 171–181, https://doi.org/10.1038/nprot.2014.006 (2014).
Article CAS PubMed Google Scholar
Wang, X. et al. Sequential fate-switches in stem-like cells drive the tumorigenic trajectory from human neural stem cells to malignant glioma. Cell Res 31, 684–702, https://doi.org/10.1038/s41422-020-00451-z (2021).
Article CAS PubMed PubMed Central Google Scholar
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296, https://doi.org/10.1038/s41592-019-0619-0 (2019).
Article CAS PubMed PubMed Central Google Scholar
Borrett, M. J. et al. Single-Cell Profiling Shows Murine Forebrain Neural Stem Cells Reacquire a Developmental State when Activated for Adult Neurogenesis. Cell reports 32, 108022, https://doi.org/10.1016/j.celrep.2020.108022 (2020).
Article CAS PubMed Google Scholar
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol 20, 129, https://doi.org/10.1186/s13059-019-1727-y (2019).
Article CAS PubMed PubMed Central Google Scholar
Wang, L., Wang, S. & Li, W. RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184–2185, https://doi.org/10.1093/bioinformatics/bts356 (2012).
Article CAS PubMed Google Scholar
De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666–2669, https://doi.org/10.1093/bioinformatics/bty149 (2018).
Article CAS PubMed PubMed Central Google Scholar
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21, https://doi.org/10.1093/bioinformatics/bts635 (2013).
Article CAS PubMed Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100, https://doi.org/10.1093/bioinformatics/bty191 (2018).
Article CAS PubMed PubMed Central Google Scholar
Anders, S., Pyl, P. T. & Huber, W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169, https://doi.org/10.1093/bioinformatics/btu638 (2015).
Article CAS PubMed Google Scholar
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq 2. Genome Biol 15, 550, https://doi.org/10.1186/s13059-014-0550-8 (2014).
Article CAS PubMed PubMed Central Google Scholar
Pervouchine, D. D., Knowles, D. G. & Guigo, R. Intron-centric estimation of alternative splicing from RNA-seq data. Bioinformatics 29, 273–274, https://doi.org/10.1093/bioinformatics/bts678 (2013).
Kachitvichyanukul, V. & Schmeiser, B. Computer-Generation of Hypergeometric Random Variates. J Stat Comput Sim 22, 127–145, https://doi.org/10.1080/00949658508810839 (1985).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput Biol 9, e1003118, https://doi.org/10.1371/journal.pcbi.1003118 (2013).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 20, 278, https://doi.org/10.1186/s13059-019-1910-1 (2019).
Pertea, G. & Pertea, M. GFF Utilities: GffRead and GffCompare. F1000Res 9, https://doi.org/10.12688/f1000research.23297.2 (2020).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079, https://doi.org/10.1093/bioinformatics/btp352 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842, https://doi.org/10.1093/bioinformatics/btq033 (2010).
Yin, T., Cook, D. & Lawrence, M. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol 13, R77, https://doi.org/10.1186/gb-2012-13-8-r77 (2012).
Chaoqiong, Ding. Short- and long-read RNA-seq of mouse neural stem cells across five developmental stages, NCBI Sequence Read Archive, https://identifiers.org/bioproject:PRJNA731598 (2021).
Trincado, J. L. et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol 19, 40, https://doi.org/10.1186/s13059-018-1417-1 (2018).

Acknowledgements

We thank Bin Chen for technical support. Y.W. is supported by the National Key Research and Development Program of China, Stem Cell and Translational Research (2017YFA0106500), the National Natural Science Foundation of China (31871376), the Distinguished Young Scientists Program of Sichuan Province (2019JDJQ0029), and the 1·3·5 project for disciplines of excellence, West China Hospital, Sichuan University (ZYYC20019). L.C. is supported by the National Key Research and Development Program of China, Stem Cell and Translational Research (2017YFA0106800 and 2017YFA0106500), and the National Science Fund for Excellent Young Scholars (81722004).

Author information

These authors contributed equally: Chaoqiong Ding, Xiang Yan, Mengying Xu.

Authors and Affiliations

Department of Neurosurgery, State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University and National Collaborative Innovation Center, Chengdu, 610041, China
Chaoqiong Ding, Xiang Yan, Zongyao Huang, Zhenzhong Pan, Peng Xiao & Yuan Wang
Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, State Key Laboratory of Biotherapy, West China Second Hospital, Sichuan University, Chengdu, 610041, China
Mengying Xu, Ran Zhou, Yuancun Zhao, Dan Zhang & Lu Chen
Core Facilities of West China Hospital, Sichuan University, Chengdu, China
Huifang Li

Contributions

Y.W. and L.C. conceived and supervised the study and finalized the manuscript. C.D., X.Y. and M.X. drafted the manuscript. C.D., X.Y., P.X., Y.Z. and H.L. performed the tissue preparation, FACS, RNA-seq, and PCR experiments and analyzed the data. M.X., assisted by R.Z., D.Z. and Z.H., performed most of the computational analyses and analyzed the data.

Corresponding authors

Correspondence to Lu Chen or Yuan Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Online-only Table

Online-only Table 1 Sample and sequencing information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Ding, C., Yan, X., Xu, M. et al. Short-read and long-read full-length transcriptome of mouse neural stem cells across neurodevelopmental stages. Sci Data 9, 69 (2022). https://doi.org/10.1038/s41597-022-01165-0

Received: 26 July 2021
Accepted: 08 February 2022
Published: 02 March 2022
DOI: https://doi.org/10.1038/s41597-022-01165-0
Springer Nature Limited
