Cell types differ in global coordination of splicing and proportion of highly expressed genes

Trakhtenberg, Ephraim F.; Pho, Nam; Holton, Kristina M.; Chittenden, Thomas W.; Goldberg, Jeffrey L.; Dong, Lingsheng

doi:10.1038/srep32249

Article
Open access
Published: 31 August 2016

Volume 6, article number 32249, (2016)

Scientific Reports

Abstract

Balance in the transcriptome is regulated by coordinated synthesis and degradation of RNA molecules. Here we investigated whether mammalian cell types intrinsically differ in global coordination of gene splicing and expression levels. We analyzed RNA-seq transcriptome profiles of 8 different purified mouse cell types. We found that different cell types vary in proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene and that the cell types that express more variants of alternatively spliced transcripts per gene are those that have higher proportion of highly expressed genes. Cell types segregated into two clusters based on high or low proportion of highly expressed genes. Biological functions involved in negative regulation of gene expression were enriched in the group of cell types with low proportion of highly expressed genes and biological functions involved in regulation of transcription and RNA splicing were enriched in the group of cell types with high proportion of highly expressed genes. Our findings show that cell types differ in proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene, which represent distinct properties of the transcriptome and may reflect intrinsic differences in global coordination of synthesis, splicing and degradation of RNA molecules.

Introduction

How does a cell maintain global properties of the transcriptome? This question has been addressed using thermodynamic models explaining the maintenance of RNA homeostasis and involving equilibrium between synthesis and degradation^{1,2,3,4,5,6,7,8,9}. Evidence also exists that global levels of transcription could be affected by genes such as c-Myc or by chromosomal aneuploidies^{10,11,12}; however, it is unknown whether various mammalian cell types differ intrinsically in how they maintain their global properties of the transcriptome. For example, do different cell types vary in a negative feedback threshold or a general molecular mechanism for regulating the levels of highly expressed genes? Is the alternative splicing mechanism active at similar levels across cell types?

To investigate these questions, we compared the proportion of expressed genes, alternatively spliced transcripts and other global properties of the transcriptome at different expression thresholds in transcriptome profiles of 8 purified mouse cell types from different developmental lineages: retinal ganglion cells (RGC)¹³, cortical neurons, astrocytes, oligodendrocytes, microglia, endothelial cells¹⁴, megakaryocyte-erythroid progenitors (MEP) and erythroid-committed precursors (ECP) Gata1 knockout (KO, which cannot differentiate into the erythroid cells without Gata1)^{15,16}.

Results

To analyze the cell types’ transcriptome profiles, we selected the datasets that had two replicates and were generated using libraries prepared from the polyA-selected RNA and paired reads sequenced 100 bp from each end on HiSeq 2000 Sequencer (Illumina) in all samples. The origins of the datasets used in this study are shown in Table 1. We analyzed the datasets using the Cufflinks pipeline^{17,18,19} (class codes for the novel predicted transcripts are summarized in Figure S1). As comparative RNA-seq analyses could be affected by noise, sequencing depth, gene length and normalization^{20,21,22,23,24,25}, we filtered the datasets to improve their quality (the pipeline is summarized in Fig. 1A; see Methods for details). Filtering improved the quality of the data, as shown by the average correlation between replicates within the samples increasing from an average r of 0.715 in the unfiltered data to 0.946 in the filtered data and further to 0.949 after random subsampling (Fig. 1B). The filtered replicates’ gene expression profiles were highly correlated within but not between the samples (correlation matrix in Table 2). On average, over 95% of the filtered reads aligned to transcripts across cell types, with less than 5% aligning to introns and intergenic regions (Fig. 1C).
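The replicate-concordance measure described above can be illustrated with a minimal sketch. It assumes that each sample is represented by two FPKM vectors in the same gene order and that the per-sample Pearson r values are simply averaged. The function names and the toy values are hypothetical and are not taken from the study, which used SPSS for this step.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_replicate_correlation(samples):
    """Average r over samples; `samples` maps name -> (replicate_1, replicate_2)."""
    rs = [pearson_r(rep1, rep2) for rep1, rep2 in samples.values()]
    return sum(rs) / len(rs)


# Toy FPKM vectors (hypothetical values, same gene order in both replicates).
toy = {
    "cell_type_A": ([0.0, 1.2, 35.0, 8.1], [0.1, 1.0, 33.5, 9.0]),
    "cell_type_B": ([5.0, 0.0, 2.2, 60.3], [4.6, 0.2, 2.9, 58.8]),
}
print(round(mean_replicate_correlation(toy), 3))
```

Running the same function on unfiltered, filtered and subsampled profiles would give the kind of before/after comparison reported in Fig. 1B.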

Table 1 Sources of the cell type-specific RNA-seq datasets used in this study.

Table 2 Correlation Matrix (Pearson, 2-tailed).

We then analyzed clustering of the cell types’ expression profiles (Fig. 2). Due to transcript length bias and possible noise at very low levels of expression (Fig. 3B), only genes expressed above 1 FPKM in at least one sample were retained for this analysis. Hierarchical cluster analysis segregated cell types into 3 groups (Fig. 2): (a) mesodermal origin myeloid precursors-derived MEPs and ECPs Gata1 KO; (b) microglia, which also originated from the myeloid precursors but formed a discrete group of their own, consistent with their divergence towards a different cell fate; and (c) neuroectodermal origin/neural stem cell-derived RGCs, cortical neurons, astrocytes and oligodendrocytes, although endothelial cells also associated with this neuro-cluster despite their mesodermal origin. In the original study from which we obtained the raw reads for several of the cell types, the endothelial cells also clustered closely with some neural lineage cell types¹⁴. Thus, cell types’ expression profile clusters segregate consistently with their developmental lineages, cell fates and previous analyses.

Next, we compared the number of genes expressed at different expression thresholds in cell types’ transcriptome profiles. We plotted the number of expressed genes across increasing normalized expression (FPKM) thresholds and found that cell types differed significantly in the proportion of highly expressed genes (p < 0.001 by ANOVA with repeated measures, sphericity assumed, Fig. 3A), particularly ≥20 FPKM (also see later, Fig. 4C). We also tested with the upper quartile normalization and found similar differences between cell types in the proportion of highly expressed genes (p < 0.001, Figure S2), with the same four cell types comprising either upper or lower ranking groups (Table S1), as we also show later in Fig. 4B, although there were minor differences within the upper ranking group (Table S1). These data show that findings were not driven by the normalization method. Across samples, transcript length correlated weakly with the expression level at very low levels of expression, but there was no correlation above 1 FPKM (Fig. 3B). The differences in average transcript length at higher expression thresholds (≥1 FPKM, Fig. 3C) did not follow the pattern of how cell types differed in the proportion of highly expressed genes. For example, oligodendrocytes and microglia were amongst the cell types with the highest proportion of highly expressed genes, but both were in the middle of the distribution of cell types’ average transcript length at high expression thresholds. We then examined whether cell types vary in the number of alternatively spliced transcripts expressed from a locus at different expression thresholds. We found that while at low expression levels (<1 FPKM) the ratio of transcripts per gene was similar across cell types, at higher expression thresholds (≥1 FPKM) the ratio differed between cell types (Fig. 3D).
Further, the differences between cell types in the ratio of transcripts per gene at high expression thresholds (particularly ≥20 FPKM) followed the pattern of differences between cell types in proportion of highly expressed genes (also ≥20 FPKM). These data suggest that cell types differ in proportion of highly expressed genes and that these differences are associated with the number of alternatively spliced transcripts expressed per gene. Thus, our analyses show that cell types that express more variants of alternatively spliced transcripts per gene also tend to express higher proportion of highly expressed genes, suggesting that alternative splicing activity and the level of gene expression are linked.
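The two threshold-based summaries discussed above can be sketched as follows. This is an illustrative reading only. It assumes that a gene's FPKM is the sum of its transcripts' FPKMs, that the proportion is taken over genes with non-zero expression, and that transcripts per gene counts the non-zero transcripts of the genes passing each threshold. The exact definitions used in the study are not given in the text, and the function name, dictionary layout and toy values are hypothetical.

```python
def summarize_by_threshold(transcript_fpkm, thresholds):
    """Summarize a transcriptome profile at several FPKM thresholds.

    transcript_fpkm: {gene_id: [FPKM of each transcript of that gene]}
    Returns {threshold: {"n_genes", "proportion", "transcripts_per_gene"}}.
    """
    # Assumption: gene-level FPKM is the sum over the gene's transcripts.
    gene_fpkm = {g: sum(tx) for g, tx in transcript_fpkm.items()}
    expressed = [g for g, x in gene_fpkm.items() if x > 0]
    summary = {}
    for t in thresholds:
        genes = [g for g in expressed if gene_fpkm[g] >= t]
        # Assumption: only transcripts with non-zero FPKM count as expressed.
        n_tx = sum(1 for g in genes for x in transcript_fpkm[g] if x > 0)
        summary[t] = {
            "n_genes": len(genes),
            "proportion": len(genes) / len(expressed) if expressed else 0.0,
            "transcripts_per_gene": n_tx / len(genes) if genes else 0.0,
        }
    return summary


# Toy profile for one cell type (hypothetical values).
toy_profile = {
    "g1": [30.0, 5.0, 2.0],
    "g2": [0.5],
    "g3": [10.0, 5.0, 0.0],
    "g4": [0.0],
}
result = summarize_by_threshold(toy_profile, thresholds=[1, 20])
for threshold, row in result.items():
    print(threshold, row)
```

Computing this table for each cell type and comparing the rows at the 20 FPKM threshold mirrors the comparison made between Fig. 3A and Fig. 3D.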

Then we asked whether cell types segregate into groups based on patterns in proportion of highly expressed genes. Hierarchical cluster analysis segregated cell types into 2 major groups (Fig. 4A,B): (a) RGCs, astrocytes, cortical neurons and endothelial cells and (b) MEPs, ECPs Gata1 KO, microglia and oligodendrocytes. Similarly to clustering based on genes’ expression level (Fig. 2), the neuroectodermal origin neural stem cell-derived RGCs, cortical neurons and astrocytes, as well as mesodermal-derived endothelial cells, clustered together. Further, mesodermal origin myeloid precursors-derived MEPs, ECPs Gata1 KO and microglia clustered together, even though in clustering based on genes’ expression level microglia formed a discrete group of their own. However, oligodendrocytes followed neither the pattern of clustering based on genes’ expression level nor developmental lineage, as they clustered with mesodermal instead of their neuroectodermal origin cell types. These data suggest that differences between cell types in proportion of highly expressed genes represent a distinct property of the transcriptome that is related to, but is not always explained by, clustering based on genes’ expression levels and developmental lineage.

Next, we identified genes differentially enriched in the two clusters which segregated based on patterns in proportion of highly expressed genes. Cell types in each group were treated as one condition and the analysis of differential expression between the two conditions was performed as above (see Methods for details). The difference between these groups in the average proportion of highly expressed genes was significant (p < 0.01; Fig. 4C). Further, the ratio of the average number of expressed genes in the group with a high proportion of highly expressed genes to that in the group with a low proportion increased at higher expression thresholds (Fig. 4D). Consistent with one of the two groups of cell types having a higher proportion of highly expressed genes, more genes were differentially enriched in this group (Fig. 5A,B) and the ratio of the numbers of enriched differentially expressed (DE) genes in the groups with high versus low proportion of highly expressed genes also increased at higher expression thresholds (Fig. 5C).

Finally, we analyzed functional annotations of the DE genes. As we found that even weak correlation between the transcript length and expression level does not persist at expression above 1 FPKM in our filtered datasets (Fig. 3B), we required each gene to be expressed above 1 FPKM in the condition in which its expression was enriched. We set the minimum fold-change threshold to 2. There was no significant difference between the average length of expressed DE and not-DE transcripts (Fig. 5D). We then proceeded to Functional Annotation Clustering of the biological processes GO terms using the Database for Annotation, Visualization and Integrated Discovery (DAVID), where higher enrichment score signifies more cluster enrichment and is the geometric mean (in -log scale) of p-values for the individual annotation categories comprising the cluster^{26,27}. We found enrichment of biological functions involved in negative regulation of gene expression in the group of cell types with low proportion of highly expressed genes and an enrichment of biological functions involved in regulation of transcription and RNA splicing in the group of cell types with high proportion of highly expressed genes (Tables 3, S2 and S3). Our analyses raise the hypothesis that the genes comprising these predicted biological pathways underlie the intrinsic differences between cell types in proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene.
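The gene selection criteria stated above (expression above 1 FPKM in the enriched condition, at least 2-fold change, and the q-value cut off of 0.05 given in Methods) can be sketched as a simple filter. The record layout, field names and function name below are hypothetical, and two details are assumptions: the q-value comparison is taken as strict, and a gene with zero expression in the other condition is treated as passing the fold-change requirement.

```python
def select_de_genes(records, min_fpkm=1.0, min_fold=2.0, max_q=0.05):
    """Return (gene, enriched_condition) pairs passing all three filters.

    records: iterable of dicts with hypothetical keys
             "gene", "fpkm_a", "fpkm_b", "q_value".
    """
    selected = []
    for r in records:
        high = max(r["fpkm_a"], r["fpkm_b"])
        low = min(r["fpkm_a"], r["fpkm_b"])
        if r["q_value"] >= max_q:
            continue  # not significant after FDR correction
        if high <= min_fpkm:
            continue  # not expressed above 1 FPKM in the enriched condition
        if low > 0 and high / low < min_fold:
            continue  # fold change below the minimum
        enriched = "a" if r["fpkm_a"] > r["fpkm_b"] else "b"
        selected.append((r["gene"], enriched))
    return selected


# Toy records (hypothetical values).
toy_records = [
    {"gene": "g1", "fpkm_a": 10.0, "fpkm_b": 2.0, "q_value": 0.01},
    {"gene": "g2", "fpkm_a": 0.5, "fpkm_b": 0.1, "q_value": 0.01},
    {"gene": "g3", "fpkm_a": 3.0, "fpkm_b": 2.0, "q_value": 0.001},
    {"gene": "g4", "fpkm_a": 1.0, "fpkm_b": 8.0, "q_value": 0.2},
    {"gene": "g5", "fpkm_a": 0.0, "fpkm_b": 4.0, "q_value": 0.03},
]
print(select_de_genes(toy_records))
```

Splitting the returned pairs by condition gives the two gene lists that would be submitted separately for functional annotation.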

Table 3 Functional Annotation Clustering of the GO terms using DAVID.

Discussion

The molecular mechanisms of how cells regulate balance in global properties of the transcriptome are not well understood and it is unknown whether various mammalian cell types differ in their homeostatically maintained transcriptome properties. Broadly speaking, homeostasis could be regulated at the level of transcription, stabilization and degradation, as well as alternative promoter site usage and mRNA splicing. Prior studies attempted to decipher how cells maintain global properties of the transcriptome in a stable state by investigating the molecular mechanisms controlling synthesis and degradation of RNA, the equilibrium between these processes and the thermodynamic models explaining the transcriptome homeostasis^{1,2,3,4,5,6,7,8,9,10,11,12}.

Here we investigated whether various mammalian cell types differ in global transcriptome properties. To address this question, we compared 8 mouse cell types’ RNA-seq datasets. All cell types were acutely purified primary cells, except ECPs Gata1 KO, which were a cell line derived from immature embryonic mouse erythroblasts with targeted Gata1 gene deletion^{15,28}. However, despite ECPs Gata1 KO being a cell line, it was most closely associated on all parameters with acutely purified MEPs¹⁵, consistent with their erythroid precursor lineage, suggesting that ECPs Gata1 KO being a cell line or lacking the ability to differentiate into the erythroid cells due to the absence of Gata1 did not substantially alter its global transcriptome properties. We found that different cell types vary in proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene and that the cell types that express more variants of alternatively spliced transcripts per gene are those that have higher proportion of highly expressed genes. Such an association could occur if, for example, the cell types with a higher proportion of highly expressed genes had elevated basal transcriptional activity, which also involves splicing activity and would result in both of these global parameters being higher in the same cell types. Remarkably, cell types segregated into two upper hierarchy clusters based on high or low proportion of highly expressed genes alone. Although clustering was associated with cell types’ developmental lineage for most cell types, because it was not associated with all cell types that we tested, the proportion of highly expressed genes alone would not be sufficient for establishing cell type identities. However, this property of the transcriptome may not be determined by the developmental lineage alone, but also by other factors, such as positioning in the tissue and signaling by adjacent cells.
With regard to the highly expressed genes themselves, since they may have stronger weight in clustering analysis, it would be interesting to investigate in future studies the extent to which the underlying biology may be driven specifically by these groups of genes.

Are there consistent differences between the specific genes or pathways expressed by cells based on proportion of highly expressed or more highly spliced genes? Analysis of Functional Annotation Clustering of the GO terms associated with genes differentially enriched in the two clusters of cell types identified pathways involved in regulating gene expression and RNA splicing. Thus, cell types could vary in intrinsic properties of the transcriptome by maintaining different proportion of highly expressed genes and different number of alternatively spliced transcripts expressed per gene. These processes, in turn, may reflect intrinsic differences between cell types in coordination of synthesis, splicing and degradation of RNA molecules. This discovery should promote investigation into contributions of individual genes’ or pathways’ effects on the transcriptome homeostasis and subsequent downstream cellular or tissue phenotypes. The additional identified GO terms may also be involved in these biological processes and could provide clues for future studies.

What is the biological significance of spatio-temporal variance between cell types in the proportion of lowly and highly expressed genes and the number of alternatively spliced transcripts expressed per gene? More highly expressed genes exhibit more gradients in their concentration in cells or tissues, which could lead to more fine-tuned interactions and increased functional complexity in the downstream molecular network. This increased functional complexity could underlie differences between cell types at different stages in development or at different positions within the tissue, much like gradients in morphogenic factors during development contribute to anatomical complexity of an organ. A higher number of alternatively spliced transcripts expressed per gene may also enable increased functional complexity stemming from that gene locus. Because we find that cell types that express more variants of alternatively spliced transcripts per gene are those that demonstrate a higher proportion of highly expressed genes, these properties could be coupled and involved in regulation of the same underlying biological attribute(s). However, a higher number of lowly expressed genes may also lead to more fine-tuned regulation and increased functional complexity, if they are not regarded by the cell as noise. It is also possible that a high proportion of highly expressed genes may be indicative of a larger total transcriptome size²⁹ and may be related to cell volume and cellular metabolism, which interestingly was one of the biological processes enriched in cell types with higher proportion of highly expressed genes (Table 3). These hypotheses need to be addressed experimentally in future studies.

Our observations have a unique implication for RNA-seq studies where transcriptional or epigenetic factors are experimentally targeted, as such factors may regulate global properties of the transcriptome. For example, if transcriptional or epigenetic factor manipulations elicit a negative feedback mechanism to downregulate highly expressed genes or the frequency of RNA splicing events, they will also render differential gene expression analysis difficult to interpret. While identifying absolute levels of gene expression requires additional methods such as synthetic spike-in standards³⁰, analyzing proportion of highly expressed genes and the number of alternatively spliced transcripts expressed per gene could be done with RNA-seq data generated using standard methods, which will at least enable accounting for such relative differences. Utilizing spike-in standards³⁰ in future studies is also important because it will facilitate investigating various aspects of the transcriptome biology alluded to by our studies. For example, one could then derive a more accurate reconstruction of alternatively spliced transcripts that are expressed at a very low level, as well as predict the negative feedback threshold for global homeostatic downregulation of highly expressed genes, which our studies suggest may differ between cell types and possibly between species.

In conclusion, our findings suggest that cell types vary in intrinsic properties of the transcriptome by maintaining different proportion of highly expressed genes and different number of alternatively spliced transcripts expressed per gene. Such intrinsic differences between cell types could be associated with differential coordination of synthesis, splicing and degradation of RNA molecules and should be accounted for in comparative RNA-seq analysis, particularly if transcriptional or epigenetic factors are experimentally targeted. The molecular mechanisms and pathways regulating global properties of transcriptome, their biological significance and the differences between more of the various cell types and of the same cell type between species, are important to investigate in future studies.

Methods

Cell purification methods and RNA-seq datasets Gene Expression Omnibus (GEO) accessions

Astrocytes were purified by FACS from single-cell suspensions of cortices of Aldh1l1–BAC-eGFP transgenic mice following an established protocol¹⁴ (original raw reads available from the NCBI GEO accession numbers GSE52564/GSM1269903/GSM1269904). Endothelial cells were purified by FACS from single-cell suspensions of cortices of Tie2–EGFP transgenic mice following an established protocol¹⁴ (original raw reads available from the NCBI GEO accession numbers GSE52564/GSM1269915/GSM1269916). Cortical neurons were purified from single-cell suspensions of mouse cortices by immunopanning for L1CAM after depletion of endothelial cells, oligodendrocyte precursor cells, microglia and macrophages (using BSL1, O4 and CD45, respectively) and washing off the nonadherent cells, following an established protocol¹⁴ (original raw reads available from the NCBI GEO accession numbers GSE52564/GSM1269905/GSM1269906). Oligodendrocytes were purified from single-cell suspensions of mouse cortices by immunopanning for MOG after depletion of endothelial cells, oligodendrocyte precursor cells, microglia and macrophages (using BSL1, PDGFRα, A2B5 and CD45, respectively) and washing off the nonadherent cells, following an established protocol¹⁴ (original raw reads available from the NCBI GEO accession numbers GSE52564/GSM1269911/GSM1269912). Microglia were purified from single-cell suspensions of mouse cortices by immunopanning for CD45 after depletion of macrophages through perfusing the mice with PBS to wash away blood from the brain, following an established protocol¹⁴ (original raw reads available from the NCBI GEO accession numbers GSE52564/GSM1269913/GSM1269914). Megakaryocyte-erythroid progenitors (MEP) were purified from adult mouse bone marrow by FACS¹⁵ using an established protocol [Lineage(−), cKit(+), Sca1(−), CD34low, CD16/32(−)]³¹ (original raw reads available from the NCBI GEO accession numbers GSE40522/GSM995525).
Erythroid-committed precursors (ECP) Gata1 KO (which cannot differentiate into the erythroid cells without Gata1) were derived from immature embryonic mouse erythroblasts with targeted Gata1 gene deletion¹⁵ using an established protocol²⁸ (original raw reads available from the NCBI GEO accession numbers GSE40522/GSM995536). Retinal ganglion cells (RGCs) were purified by the authors from single-cell suspensions of postnatal day 5 mouse eyes by immunopanning for Thy1 (CD90, MCA02R, Serotec) after depletion of macrophages (using anti-mouse macrophage antibody, AIA31240, Accurate Chemical) and washing off the nonadherent cells, following an established protocol^{13,32}; RNA extracted using the Direct-zol RNA kit (Zymo Research) had a RIN ≥ 8.5 (Bioanalyzer 2100, Agilent 6000 kit; raw reads available from the NCBI GEO accession numbers pending). All animal procedures for collecting RGCs were approved by the University of Miami Institutional Animal Care and Use Committee and by the Institutional Biosafety Committee at the University of Miami and performed in accordance with the ARVO Statement for the Use of Animals in Ophthalmic and Visual Research. C57BL/6J mice were obtained from Charles River Laboratories, Inc. For all cell type samples, libraries were prepared using polyA-selected RNA and paired reads sequenced 100 bp from each end on HiSeq 2000 Sequencer (Illumina)^{14,15}. All cell type samples included two biological replicates for which raw reads and analyzed/reanalyzed datasets are available through the GEO accession numbers provided above.

RNA-seq analysis pipeline commands and software versions

Reads were mapped to the mouse reference genome mm10 (UCSC Genome Browser) and a comprehensive transcriptome annotation database GTF file, which was assembled by using the UCSC Table Browser Intersection utility to merge the GENCODE M4³³ transcripts in a non-redundant manner with the UCSC Gene Track³⁴ transcripts that did not overlap more than 90% with the GENCODE transcripts. The raw reads were mapped using the TopHat/Bowtie2/Cufflinks pipeline^{17,18,19}, with the -g option, to construct a merged GTF file that included the annotated and novel transcript structures from all samples. We then used the intersectBed tool (Bedtools) to retain only the reads that mapped to the merged GTF, which was converted to BED with the gtf2bed tool (Bedops). This filtering step allowed selecting the reads which contributed to the identified gene structures and excluding noise and artifacts, that is, reads that mapped to the genome but did not contribute to gene structure. Next, we selected only uniquely mapped and properly paired reads using the view -bq 4 -bh -f2 -F12 command (Samtools). After this step we used the DownsampleSam tool (Picard) to randomly subsample an equal number of paired reads, which provided representative samples of the same size for all samples (34.6 M per sample/replicate; properly paired and total reads counted with flagstat, Samtools). Then we used the TopHat/Bowtie2/Cufflinks/Cuffdiff pipeline^{17,18,19} with the -g option for determining normalized expression in fragments per kilobase of transcript sequence per million mapped fragments (FPKMs) in each replicate of each sample with Cuffdiff’s across-sample normalization (Table 2) and assessed the filtered reads aligned to transcripts or introns and intergenic regions using RnaSeqMetrics (http://broadinstitute.github.io/picard)³⁵. For the differential expression analysis where cell types in each of the two upper hierarchy clusters were treated as one condition, each replicate of each cell type was assigned to one of only two cluster groups.
For differential expression analysis, the Cuffdiff q-value (which is the FDR corrected p-value^{17,18}) cut off was set to 0.05. For the upper quartile normalization, the FPKMs were normalized to the upper quartile across samples and scaled by the mean of upper quartiles from all samples. Software versions used: TopHat 2.0.12, Bowtie 2.2.4, Cufflinks 2.2.1, Samtools 0.1.19, Picard 1.79, Bedops 2.4.2, Bedtools 2.19.0. Analyses were performed on the Orchestra High Performance Compute Cluster at Harvard Medical School, an NIH-supported shared facility consisting of thousands of processing cores and terabytes of associated storage. The datasets from these analyses are available through the GEO accession Series GSE85458.
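The upper quartile normalization described above can be sketched as follows: each sample's FPKMs are divided by that sample's upper quartile and multiplied by the mean of the upper quartiles across samples. This is an illustration under stated assumptions, not the code used in the study. In particular, computing the quartile over non-zero values only and the interpolation method of the quartile are assumptions, and the function names and toy values are hypothetical.

```python
from statistics import quantiles


def upper_quartile(values):
    """75th percentile of the expressed (non-zero) values.

    Assumption: zeros are excluded before taking the quartile.
    """
    expressed = [v for v in values if v > 0]
    return quantiles(expressed, n=4, method="inclusive")[2]


def upper_quartile_normalize(fpkm_by_sample):
    """Rescale each sample by its upper quartile, then by the mean upper quartile.

    fpkm_by_sample: {sample_name: [FPKM per gene, same gene order in all samples]}
    """
    uq = {s: upper_quartile(v) for s, v in fpkm_by_sample.items()}
    mean_uq = sum(uq.values()) / len(uq)
    return {
        s: [x / uq[s] * mean_uq for x in v]
        for s, v in fpkm_by_sample.items()
    }


# Toy data (hypothetical): sample_2 is sample_1 measured at twice the scale.
toy_fpkm = {
    "sample_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sample_2": [2.0, 4.0, 6.0, 8.0, 10.0],
}
normalized = upper_quartile_normalize(toy_fpkm)
print(normalized["sample_1"])
print(normalized["sample_2"])
```

In the toy data the two samples differ only by a global scale factor, so they become identical after normalization, which is the behavior the procedure is meant to provide.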

Statistics, Cluster analysis and Functional Annotations

Pearson correlation and matrix analysis (2-tailed) of gene expression profiles, as well as ANOVA with posthoc LSD, were performed using SPSS software with p < 0.05 indicating statistical significance. Dendrogram and hierarchical clustering heat maps, with uncentered Pearson correlation and centroid linkage, were generated using Gene Cluster 3.0 and visualized with Java Treeview 1.1.6r4^{36,37}. Functional Annotation Clustering of the GO terms associated with differentially expressed genes was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID), with higher enrichment score signifying more cluster enrichment^{26,27}. Enrichment score is the geometric mean (in −log scale) of p-values for the individual annotation categories comprising a cluster^{26,27}. Minimum enrichment score threshold was set to 0.75 and clusters implicated in the same higher order biological process were manually merged and the averages of their enrichment scores are shown.
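The enrichment score defined above can be worked through in a short sketch. The geometric mean of the p-values expressed in −log scale equals the arithmetic mean of the individual −log values. The base-10 logarithm is an assumption here, since the text says only "−log scale". The function names are hypothetical, and treating the 0.75 threshold as inclusive is also an assumption.

```python
import math


def enrichment_score(p_values):
    """-log10 of the geometric mean of the p-values in one annotation cluster."""
    if not p_values or any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("p-values must be a non-empty list in (0, 1]")
    # -log10(geometric mean) is the arithmetic mean of the -log10 values.
    return sum(-math.log10(p) for p in p_values) / len(p_values)


def merged_cluster_score(scores, min_score=0.75):
    """Average score of merged clusters, keeping only those at or above the threshold."""
    kept = [s for s in scores if s >= min_score]
    return sum(kept) / len(kept) if kept else None


# Toy p-values for one cluster (hypothetical values).
print(enrichment_score([0.01, 0.0001]))
print(merged_cluster_score([3.0, 1.0, 0.5]))
```

For the toy cluster, the geometric mean of 0.01 and 0.0001 is 0.001, so the score is 3. A cluster made up entirely of categories with p = 0.05 would score about 1.3.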

Additional Information

How to cite this article: Trakhtenberg, E. F. et al. Cell types differ in global coordination of splicing and proportion of highly expressed genes. Sci. Rep. 6, 32249; doi: 10.1038/srep32249 (2016).

References

Konishi, T. A thermodynamic model of transcriptome formation. Nucleic Acids Res 33, 6587–6592, doi: 10.1093/nar/gki967 (2005).
Article CAS PubMed PubMed Central Google Scholar
Pérez-Ortín, J. E., Alepuz, P., Chávez, S. & Choder, M. Eukaryotic mRNA decay: methodologies, pathways and links to other stages of gene expression. J Mol Biol 425, 3750–3775, doi: 10.1016/j.jmb.2013.02.029 (2013).
Article CAS PubMed Google Scholar
Miller, C. et al. Dynamic transcriptome analysis measures rates of mRNA synthesis and decay in yeast. Mol Syst Biol. 7, 458, doi: 10.1038/msb.2010.112 (2011).
Article CAS PubMed PubMed Central Google Scholar
Schwalb, B. et al. Measurement of genome-wide RNA synthesis and decay rates with Dynamic Transcriptome Analysis (DTA). Bioinformatics 28, 884–885, doi: 10.1093/bioinformatics/bts052 (2012).
Article CAS PubMed Google Scholar
Sun, M. et al. Comparative dynamic transcriptome analysis (cDTA) reveals mutual feedback between mRNA synthesis and degradation. Genome Res 22, 1350–1359, doi: 10.1101/gr.130161.111 (2012).
Article CAS PubMed PubMed Central Google Scholar
Dori-Bachash, M., Shema, E. & Tirosh, I. Coupled evolution of transcription and mRNA degradation. PLoS Biol 9, e1001106, doi: 10.1371/journal.pbio.1001106 (2011).
Article CAS PubMed PubMed Central Google Scholar
Rabani, M. et al. Metabolic labeling of RNA uncovers principles of RNA production and degradation dynamics in mammalian cells. Nat Biotechnol 29, 436–442, doi: 10.1038/nbt.1861 (2011).
Article CAS PubMed PubMed Central Google Scholar
Amorim, M. J., Cotobal, C., Duncan, C. & Mata, J. Global coordination of transcriptional control and mRNA decay during cellular differentiation. Mol Syst Biol 6, 380, doi: 10.1038/msb.2010.38 (2010).
Article CAS PubMed PubMed Central Google Scholar
Dölken, L. et al. High-resolution gene expression profiling for simultaneous kinetic parameter analysis of RNA synthesis and decay. RNA 14, 1959–1972, doi: 10.1261/rna.1136108 (2008).
Article CAS PubMed PubMed Central Google Scholar
Lin, C. Y. et al. Transcriptional amplification in tumor cells with elevated c-Myc. Cell 151, 56–67, doi: 10.1016/j.cell.2012.08.026 (2012).
Article CAS PubMed PubMed Central Google Scholar
Nie, Z. et al. c-Myc is a universal amplifier of expressed genes in lymphocytes and embryonic stem cells. Cell 151, 68–79, doi: 10.1016/j.cell.2012.08.033 (2012).
Article CAS PubMed PubMed Central Google Scholar
Upender, M. B. et al. Chromosome transfer induced aneuploidy results in complex dysregulation of the cellular transcriptome in immortalized and cancer cells. Cancer Res 64, 6941–6949, doi: 10.1158/0008-5472.CAN-04-0474 (2004).
Article CAS PubMed PubMed Central Google Scholar
Trakhtenberg, E. F. et al. Regulating Set-β‘s Subcellular Localization Toggles Its Function between Inhibiting and Promoting Axon Growth and Regeneration. J Neurosci. 34, 7361–7374, doi: 10.1523/JNEUROSCI.3658-13.2014 (2014).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Y. et al. An RNA-sequencing transcriptome and splicing database of glia, neurons and vascular cells of the cerebral cortex. J Neurosci 34, 11929–11947, doi: 10.1523/JNEUROSCI.1860-14.2014 (2014).
Article CAS PubMed PubMed Central Google Scholar
Paralkar, V. R. et al. Lineage and species-specific long noncoding RNAs during erythro-megakaryocytic development. Blood 123, 1927–1937, doi: 10.1182/blood-2013-12-544494 (2014).
Article CAS PubMed PubMed Central Google Scholar
An, X. et al. Global transcriptome analyses of human and murine terminal erythroid differentiation. Blood 123, 3466–3477, doi: 10.1182/blood-2014-01-548305 (2014).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7, 562–578, doi: 10.1038/nprot.2012.016 (2012).
Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 31, 46–53, doi: 10.1038/nbt.2450 (2013).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359, doi: 10.1038/nmeth.1923 (2012).
McIntyre, L. M. et al. RNA-seq: technical variability and sampling. BMC Genomics 12, 293, doi: 10.1186/1471-2164-12-293 (2011).
Tarazona, S., García-Alcalde, F., Dopazo, J., Ferrer, A. & Conesa, A. Differential expression in RNA-seq: a matter of depth. Genome Res 21, 2213–2223, doi: 10.1101/gr.124321.111 (2011).
Wagner, G. P., Kin, K. & Lynch, V. J. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci 131, 281–285, doi: 10.1007/s12064-012-0162-3 (2012).
Young, M. D., Wakefield, M. J., Smyth, G. K. & Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol 11, R14, doi: 10.1186/gb-2010-11-2-r14 (2010).
Oshlack, A. & Wakefield, M. J. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4, 14, doi: 10.1186/1745-6150-4-14 (2009).
Rapaport, F. et al. Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol 14, R95, doi: 10.1186/gb-2013-14-9-r95 (2013).
Jiao, X. et al. DAVID-WS: a stateful web service to facilitate gene/protein list analysis. Bioinformatics 28, 1805–1806, doi: 10.1093/bioinformatics/bts251 (2012).
Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44–57, doi: 10.1038/nprot.2008.211 (2009).
Welch, J. J. et al. Global regulation of erythroid gene expression by transcription factor GATA-1. Blood 104, 3136–3147, doi: 10.1182/blood-2004-04-1603 (2004).
Lovén, J. et al. Revisiting global gene expression analysis. Cell 151, 476–482, doi: 10.1016/j.cell.2012.10.012 (2012).
Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res 21, 1543–1551, doi: 10.1101/gr.121095.111 (2011).
Pronk, C. J. et al. Elucidation of the phenotypic, functional and molecular topography of a myeloerythroid progenitor cell hierarchy. Cell Stem Cell 1, 428–442, doi: 10.1016/j.stem.2007.07.005 (2007).
Trakhtenberg, E. F. et al. The N-terminal Set-β Protein Isoform Induces Neuronal Death. J Biol Chem 290, 13417–13426, doi: 10.1074/jbc.M114.633883 (2015).
Harrow, J. et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol 7 Suppl 1, S4.1–9, doi: 10.1186/gb-2006-7-s1-s4 (2006).
Hsu, F. et al. The UCSC Known Genes. Bioinformatics 22, 1036–1046, doi: 10.1093/bioinformatics/btl048 (2006).
DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532, doi: 10.1093/bioinformatics/bts196 (2012).
de Hoon, M. J., Imoto, S., Nolan, J. & Miyano, S. Open source clustering software. Bioinformatics 20, 1453–1454, doi: 10.1093/bioinformatics/bth078 (2004).
Saldanha, A. J. Java Treeview–extensible visualization of microarray data. Bioinformatics 20, 3246–3248, doi: 10.1093/bioinformatics/bth349 (2004).

Acknowledgements

We gratefully acknowledge support from the AHA (15POST25080290, EFT), NEI (EY026766, JLG) and NCRR (1S10RR028832-01, Research Computing Group at Harvard Medical School). Portions of this research were conducted on the Orchestra High Performance Compute Cluster at Harvard Medical School, an NIH-supported shared facility. We thank the Research Computing Group (Harvard Medical School) for assistance with computing resources and bioinformatics analysis; William Hulme, Ryan Gentry and Daniel Pita-Thomas (Center for Genome Technology, University of Miami) for next-generation sequencing; and Larry Benowitz (Boston Children’s Hospital, Harvard Medical School), Isaac Kohane (Boston Children’s Hospital, Harvard Medical School) and Brian Haas (Broad Institute, MIT) for advice.

Author information

Authors and Affiliations

Department of Neurosurgery, F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
Ephraim F. Trakhtenberg
Research Computing Group, Harvard Medical School, Boston, MA, USA
Nam Pho, Kristina M. Holton, Thomas W. Chittenden & Lingsheng Dong
Byers Eye Institute, Stanford University, Palo Alto, CA, USA
Jeffrey L. Goldberg

Contributions

E.F.T. conceived and designed the study, performed RGC RNA-seq and bioinformatics analyses and wrote the manuscript. N.P., K.M.H., T.W.C. and L.D. contributed to bioinformatics analyses. J.L.G. contributed to study design and edited the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Trakhtenberg, E., Pho, N., Holton, K. et al. Cell types differ in global coordination of splicing and proportion of highly expressed genes. Sci Rep 6, 32249 (2016). https://doi.org/10.1038/srep32249

Received: 10 March 2016
Accepted: 01 August 2016
Published: 31 August 2016
DOI: https://doi.org/10.1038/srep32249
Springer Nature Limited
