Introduction

Stroke is diagnosed based on patient history, neurological exam, and brain imaging. Differential diagnosis can be difficult, particularly in distinguishing ischemic stroke (IS) from intracerebral hemorrhage (ICH) when imaging is unavailable in the acute setting. An accurate, inexpensive, and rapid blood test would therefore be useful.

Blood transcriptomes show promise as diagnostic biomarkers and have provided insight into the nature of the immune response following human stroke [16]. However, these studies have investigated only a portion of the protein coding transcriptome, since they have used 3′-biased microarrays to measure blood mRNA expression [16]. Though these studies demonstrated proof-of-principle, most of the stroke transcriptome, which comprises all alternatively spliced isoforms, remains unstudied. The importance of alternative splicing is supported by evidence implicating it in the pathogenesis of many diseases [7, 8].

Alternative splicing is the process whereby exons from a single gene are included or excluded in the final mRNA transcript (Supplementary Figure 1). A single gene can produce several alternatively spliced isoforms which have specific functions in different cells, tissues, developmental stages, and disease states. Thus, the ~20,000 known genes code for >250,000 different mRNAs and proteins. Differential alternative splicing (DAS) is alternative splicing that differs between groups. We hypothesized that DAS would vary for different causes of IS (cardioembolic, large vessel, and lacunar) and for ICH when compared to each other and to controls.

RNA-seq is a new technology that allows for estimation of expression of each splice variant (Supplementary Figure 1), a significant advance over previous technologies. Because there have been no studies of alternative splicing related to stroke etiology, or for IS versus ICH either in humans or in animal models, we performed this pilot RNA-seq study to examine DAS in whole blood following IS and ICH in humans.

Methods

Stroke patients and control subjects were randomly selected from those recruited at the University of California Davis Medical Center between 2008 and 2012. Stroke patients were chosen to represent the major IS etiologies (cardioembolic, large vessel atherosclerotic, lacunar) or had ICH. IS diagnosis and causes were assessed as described previously [5, 9]. ICH patients had deep ICH (basal ganglia, deep white matter, or thalamus) confirmed by CT and/or MRI brain scans; the hemorrhages were associated with hypertension, with no evidence of vascular malformation, tumor, or aneurysm. Control subjects were selected to match stroke subjects for age, race, sex, and vascular risk factors and had no history of previous stroke or cardiovascular events. Blood from all subjects was drawn into PAXgene tubes between 5.8 and 101.2 h following IS or ICH. RNA from whole blood was isolated as previously described [3]. The UCD Institutional Review Board approved this study and all subjects provided informed consent.

Whole blood RNA was used to prepare mRNA libraries using the TruSeq RNA Sample Prep v2 kit and protocol (Illumina). Two hundred million PE 100-bp RNA-seq reads were obtained from each mRNA library using Illumina Solexa sequencing by synthesis on the Illumina HiSeq 2000. TopHat v2.0.7 (Bowtie v2.0.6) was used with default parameters to map reads to a reference genome (Hg19) and generate bam files for analysis [10]. RNA transcript quantification was performed using Hg19 AceView transcripts in the Partek Genomics Suite 6.6 RNA-seq workflow.

The raw reads for genes displaying DAS are shown in Supplementary Table 2 and the raw reads for genes displaying differential exon usage are shown in Supplementary Table 6. They were generated from aligned bam files using featureCounts against AceView (NCBI 37) [11] with options allowing for any and multiple overlaps [12]. These raw counts were not used directly for the statistical analysis. Instead, raw aligned reads were normalized, and differential alternatively spliced transcript expression and exon expression were quantified using the expectation/maximization (E/M) algorithm (briefly described below) as implemented in Partek Genomics Suite [13]. DAS was determined with one-way ANOVA on Group (Benjamini-Hochberg false discovery rate, FDR; p < 0.05), and differential exon usage was assessed between each pair of groups (p < 0.0005, |fold change| > 1.2).
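The Benjamini-Hochberg correction named above can be illustrated with a minimal Python sketch. It is not the Partek Genomics Suite implementation; the function name `benjamini_hochberg` and the gene labels and p-values are hypothetical and serve only to show how per-gene ANOVA p-values would be converted to FDR-adjusted values and thresholded at 0.05.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene ANOVA p-values (illustrative only, not study data).
gene_p = {"geneA": 0.0001, "geneB": 0.004, "geneC": 0.02, "geneD": 0.2, "geneE": 0.9}
q = dict(zip(gene_p, benjamini_hochberg(list(gene_p.values()))))

# Genes that would be called significant at FDR p < 0.05.
significant = [gene for gene, value in q.items() if value < 0.05]
print(significant)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from increasing as the raw p-values decrease.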

Principal components analysis (PCA) and hierarchical clustering were performed in Partek Genomics Suite. Ingenuity Pathway Analysis (IPA®) and DAVID identified regulated pathways and processes as described previously [6].

Results

Subject Demographics

Subject demographics and clinical characteristics are presented in Table 1. Only Caucasian males were studied because of the small group sizes. Age, time since event for IS or ICH, and vascular risk factors were not significantly different between groups. Coverage of a wide range of post-stroke biology was obtained by selecting patients with early (5.8 h) through late (101.2 h) blood draw times after IS and ICH. However, the means were similar between the stroke groups. Cardioembolic IS post-event blood draw times were, on average, 33.7 h; large vessel averaged 47.4 h; lacunar averaged 34.6 h; and ICH averaged 29.4 h (Table 1).

Table 1 Subject demographics and clinical characteristics

RNA Sequencing Alignments

RNA sequencing alignment statistics for all samples among the five groups are presented in Supplementary Table 1. Cardioembolic stroke samples had, on average, 1.60E+08 alignments; large vessel had 1.65E+08 alignments; lacunar stroke had 1.64E+08 alignments; and ICH and control groups each averaged 1.59E+08 alignments. These data show that there is no bias in the numbers of alignments for any of the five groups.

Distinct Alternative Splicing of Genes in Whole Blood of Stroke Patients and Controls

A total of 412 genes displayed differential alternative splicing (DAS) in the whole blood transcriptomes of the five groups of patients with ischemic stroke (cardioembolic, large vessel, and lacunar), intracerebral hemorrhage (ICH) and controls (FDR p < 0.05; Supplementary Table 2, raw reads; Supplementary Table 3, ANOVA results). These 412 genes are those predicted using the E/M algorithm [13] as implemented in Partek, to have DAS for IS (cardioembolic, large vessel, lacunar) versus ICH versus controls. The E/M algorithm probabilistically assigns reads to known isoforms/exons of a gene [13]. Partek then uses a log-likelihood ratio test to identify genes with DAS across samples [13, 14]. The 412 significant genes displaying DAS across IS, ICH, and controls are involved in cellular immunity, cytokine signaling, and cell death and survival pathways (Supplementary Table 4).
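The idea of probabilistically assigning reads to isoforms can be illustrated with a toy expectation/maximization loop in Python. This is a simplified sketch, not the algorithm as implemented in Partek: it ignores isoform length and read position, and the function name `em_isoform_fractions`, the two-isoform gene, and the read counts are all hypothetical.

```python
def em_isoform_fractions(read_compat, n_isoforms, n_iter=200):
    """Estimate isoform fractions from read-to-isoform compatibility sets."""
    theta = [1.0 / n_isoforms] * n_isoforms  # start from equal fractions
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read across its compatible isoforms
        # in proportion to the current fraction estimates.
        for compat in read_compat:
            total = sum(theta[k] for k in compat)
            for k in compat:
                counts[k] += theta[k] / total
        # M-step: new fractions are the normalized expected read counts.
        n_reads = sum(counts)
        theta = [c / n_reads for c in counts]
    return theta


# Hypothetical gene with two isoforms: isoform 0 includes a cassette exon,
# isoform 1 skips it. Reads on the cassette exon fit only isoform 0, reads
# spanning the skipping junction fit only isoform 1, and reads on shared
# exons fit both.
reads = [(0,)] * 30 + [(1,)] * 10 + [(0, 1)] * 60
fractions = em_isoform_fractions(reads, n_isoforms=2)
print([round(f, 3) for f in fractions])
```

In this toy case the ambiguous reads end up divided in the same ratio as the uniquely assignable reads, which is the behavior the E/M iteration converges to when no other information is modeled.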

Pathways highly over-represented with differentially alternatively spliced genes between the five groups included CD28 signaling in T helper cells, CDC42 signaling, Nur77 signaling in T lymphocytes, fMLP signaling in neutrophils, and interferon signaling (Supplementary Table 4). Molecular and cellular functions most highly associated with the differentially alternatively spliced genes were cell death and survival of immune cells, cell-cell signaling, activation and recruitment of leukocytes, antigen-presenting cells, activation of T lymphocytes, adhesion of vascular endothelial cells, and immune response of neutrophils (Supplementary Table 5).

Specific Exon-Usage Profiles for ICH and Different Ischemic Stroke Etiologies

A total of 308 exons from 292 genes were differentially expressed among the three causes of IS (cardioembolic, large vessel, lacunar), ICH, and controls (p < 0.0005, |fold change| > 1.2; Supplementary Table 6, raw reads; Supplementary Table 7, ANOVA results). These exons separated all five groups on principal components analysis (PCA) plots (Fig. 1a) and on unsupervised hierarchical clustering (Fig. 1b). Given that the E/M algorithm uses counts of the numbers of reads on each exon [13, 14], the differential expression of exons across the five groups represents differential exon usage. These results are relevant to DAS because DAS results from differential exon usage.
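The exon selection criteria can be sketched in Python as a simple two-part filter. The sketch assumes a signed fold-change convention (a ratio below 1 is reported as its negative reciprocal) and positive, linear-scale group means; the helper names and the example exons are hypothetical and are not study data.

```python
def signed_fold_change(mean_a, mean_b):
    """Ratio a/b as a signed value: 1.5 is 1.5-fold up, -1.5 is 1.5-fold down."""
    ratio = mean_a / mean_b  # assumes both means are positive
    return ratio if ratio >= 1.0 else -1.0 / ratio


def passes_filter(p_value, fold_change, p_cut=0.0005, fc_cut=1.2):
    """Keep an exon only if it meets both the p-value and fold-change cutoffs."""
    return p_value < p_cut and abs(fold_change) > fc_cut


# Hypothetical exons: (name, p-value, mean in group A, mean in group B).
exons = [
    ("exon1", 0.0001, 150.0, 100.0),  # 1.5-fold up, small p: kept
    ("exon2", 0.0002, 100.0, 150.0),  # 1.5-fold down, small p: kept
    ("exon3", 0.0001, 110.0, 100.0),  # only 1.1-fold: dropped
    ("exon4", 0.0100, 300.0, 100.0),  # p too large: dropped
]
selected = [
    name for name, p, a, b in exons
    if passes_filter(p, signed_fold_change(a, b))
]
print(selected)
```

Taking the absolute value of the signed fold change treats increases and decreases of the same magnitude identically, which is how the threshold is read here.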

Fig. 1

Principal components analysis (PCA) (Fig. 1a) and unsupervised hierarchical clustering (Fig. 1b) of the 308 exons (292 genes) with differential exon usage among intracerebral hemorrhage (n = 4), ischemic strokes (IS) (cardioembolic, large vessel, and lacunar) (n = 12), and control subjects (n = 4). In Fig. 1a, the expression of the 308 exons is compressed onto three axes in the PCA plot. The three principal components on the PCA plot account for 64.1% of the variance. In Fig. 1b, exon expression is shown on the X-axis and subjects are shown on the Y-axis. Each row on the Y-axis represents a single individual, with four individuals per group. The dendrograms were removed from this figure. Red indicates increased expression. Green indicates decreased expression.

Biological functions and networks represented by genes with differentially expressed exons in each group (Fig. 1b) are summarized in Supplementary Table 8. Cardioembolic stroke genes with differential exon usage were involved in ion binding/transport and cellular assembly/organization. Large-vessel stroke genes were associated with cell death, transcription, and chromatin remodeling. Lacunar stroke genes were associated with cellular compromise, cell cycle, cell death and survival. ICH genes were involved with protein transport and localization (Supplementary Table 8).

Discussion

Although differential alternative splicing (DAS) is implicated in many human diseases, this is the first study to show that DAS differs between intracerebral hemorrhage (ICH), ischemic stroke, and control subjects. In addition, it is the first study to show that DAS differs between different etiologies of ischemic stroke including cardioembolic, large vessel, and lacunar causes. Identification of DAS in RNA from whole blood for specific stroke etiologies and ICH suggests the immune response varies for each condition. This will be important for understanding the pathogenesis of each condition and for developing biomarkers to differentiate ischemic stroke from ICH and to differentiate the different causes of ischemic stroke.

This study identified several pathways, molecular functions, and genes previously reported in human ischemic stroke using 3′-biased microarrays [6, 15]. These included actin cytoskeleton signaling, CCR5 signaling in macrophages, NF-κB activation, α-adrenergic signaling, cellular growth and proliferation, cell death and survival, cell morphology, hematopoiesis, hematological system development, and inflammatory response [4, 5, 16, 17]. Moreover, a number of the pathways implicated in different etiologies of ischemic stroke in our previous microarray studies were confirmed in these RNA-seq studies [4, 5, 16, 17].

This study is the first to describe genes with DAS and pathways unique for ICH. Among the genes that differentiated ICH from IS were INPP5D (inositol polyphosphate-5-phosphatase) and ITA4 (integrin alpha 4). The INPP5D enzyme regulates myeloid cell proliferation and programming, and its expression correlates with hemorrhagic transformation of ischemic stroke [18]. ITA4 is involved in leukocyte recruitment after intracerebral hemorrhage [19], and leukocytes are intimately associated with ICH. For example, leukocytes are involved in clotting and interact with injured vessels and brain following ICH [15].

Other genes with DAS associated with ICH in this study included NAV1 (neuron navigator 1), PDGFC (platelet derived growth factor C), and CCM2 (cerebral cavernous malformation 2). These genes participate in vascular endothelial growth factor (VEGF) signaling, which predisposes the brain to hemorrhage because of new vessel formation [20]. Of interest, mutations of CCM2 cause cerebral cavernous malformations, which can lead to intracerebral hemorrhage [21]. Other genes with DAS associated with ICH included EXOSC1 (exosome component 1) and EXOSC9 (exosome component 9), which code for core components of the exosome complex [22]. Although exosomes have been implicated in neuroinflammation, neurodegeneration, and cancer, they have not previously been associated with ICH [23, 24]. Lastly, another gene with DAS associated with ICH was DGCR8 (DiGeorge syndrome critical region 8, a microprocessor complex subunit), which is involved in the biogenesis of microRNAs [25]. This could suggest that miRNAs are involved with differential alternative splicing following ICH.

Study Limitations

Sample sizes in this pilot study were small. Thus, we cannot rule out splicing changes due to vascular risk factors. However, hierarchical clustering of the differentially expressed exons demonstrates separation on diagnosis and not on vascular risk factors (Supplementary Figure 2). Validation of these findings in a separate cohort is needed to confirm the present results. These results are important because they provide evidence for differential alternative splicing in the pathophysiology of the immune response to ischemic stroke and intracerebral hemorrhage and also might provide novel biomarkers for ICH and different causes of IS.