A genomic search for alternative splicing in the human placenta
Placental alternative splicing is a prominent feature of gestational diseases
To determine whether mature transcripts of a given gene differ between healthy and diseased placentas, gene expression was measured at the exon level in placentas from two distinct cohorts of patients: a cohort of PE-affected women and matched controls (Institut Cochin-PE cohort, n = 7 and 9, respectively) and a cohort of IUGR-affected women and matched controls (Angers Hospital-IUGR cohort, n = 13 and 8, respectively). Patient characteristics are presented in Supplementary Table S1. As expected, both gestational age and birth weight differ significantly between the disease and control groups. This difference is a recurrent feature of transcriptome analyses comparing normal and pathological placentas. In this case, since most of the samples (including the pathological ones) are from third-trimester placentas, the effects linked to placental age are probably mild compared to those associated with the disease per se. To ascertain this more thoroughly, we identified and estimated surrogate variables for known and unknown sources of variation in our dataset using the SVA package (Leek et al. 2012). In the IUGR vs control comparison, 6.5% of the probes differentially expressed according to disease were also associated with gestational age, 12.5% with birth weight, and 10.9% with sex. In the PE vs control comparison, 7.5% of the genes significantly changed by PE were also significantly associated with gestational age, 6.8% with birth weight and 10% with the sex of the baby. Considering these data, we assumed that the gene expression and splicing alterations detected are strongly influenced by disease status. These data are summarized in Supplementary Tables S2a, S2b and S2c for the comparisons of controls vs IUGR, all PE, and isolated PE, respectively.
The comparison of gene expression at the exon level was carried out using the Affymetrix ClariomD microarray, an attractive solution for tackling the complex question of alternative splicing regulation compared with more computation-intensive approaches based on RNA-seq. Analyses with the Affymetrix Transcriptome Analysis Console software led to the definition of a splicing index associated with a corresponding p value for each gene and each comparison (Gardina et al. 2006), allowing genes to be ranked from the most to the least differentially spliced. The complete dataset is available under the accession number E-MTAB-9416 (EMBL-EBI ArrayExpress).
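As an illustration of the splicing index concept, the sketch below assumes the commonly used definition in which each exon signal is first normalized by the signal of its gene, and the index is the log2 ratio of these normalized intensities between cases and controls. The function names and the input values are hypothetical, and the Transcriptome Analysis Console may compute or scale its reported index differently.

```python
import math
from statistics import mean


def normalized_exon_intensity(exon_signal, gene_signal):
    """Exon signal divided by the signal of its gene (both on a linear scale)."""
    return exon_signal / gene_signal


def splicing_index(exon_case, gene_case, exon_ctrl, gene_ctrl):
    """log2 ratio of mean gene-normalized exon intensities, cases vs controls.

    Each argument is a list with one value per sample (assumed layout).
    """
    ni_case = mean(normalized_exon_intensity(e, g)
                   for e, g in zip(exon_case, gene_case))
    ni_ctrl = mean(normalized_exon_intensity(e, g)
                   for e, g in zip(exon_ctrl, gene_ctrl))
    return math.log2(ni_case / ni_ctrl)


# Hypothetical values: the exon doubles in cases while the gene level is stable.
si_exon_specific = splicing_index([200.0, 200.0], [100.0, 100.0],
                                  [100.0, 100.0], [100.0, 100.0])

# Hypothetical values: exon and gene both double, so no exon-specific change.
si_expression_only = splicing_index([200.0, 200.0], [200.0, 200.0],
                                    [100.0, 100.0], [100.0, 100.0])
```

Under this definition, a change in overall gene expression alone leaves the index at zero, which is why splicing index and expression deregulation can be uncorrelated.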
When using a p value threshold of 0.001, 1060 and 1409 genes had a significant splicing index value in PE vs controls and IUGR vs controls, respectively. Splicing index alterations were not correlated with gene expression deregulation between preeclamptic and control samples (Fig. 1a).
For instance, CYP19A1 and BAIAP2 were not strongly modified at the expression level in PE and IUGR placental samples, respectively, whereas their splicing index alterations were among the highest (Table 1).
Table 1. 48 genes sorted by Splicing Index as modified in pregnancy diseases
Alternative splicing concerns specific gene categories in PE and IUGR
Since the p value pinpoints significant but also small variations in splicing (which may not be relevant), we decided to focus on transcripts with |SI| > 3, a list marginally different from the one obtained with the p < 0.001 threshold. We identified 1456 transcripts in PE vs control placentas and 725 transcripts in IUGR vs control placentas at this threshold. Among those, 1071 and 575 have an official gene symbol (Supplementary Tables S3 and S4), and 176 were found alternatively spliced in both diseases (Fig. 1b). These 176 genes encode proteins that constitute a network of protein–protein interactions (p = 0.00097, Supplementary Figure S1, STRING database, https://string-db.org/cgi (Szklarczyk et al. 2019)), showing that splicing alterations do not occur randomly in the placental genome in pathological situations. We then performed an over-representation analysis using WebGestalt (Wang et al. 2017; Liao et al. 2019) on the 176 genes found alternatively spliced in both disease conditions. We compared our gene list with the GLAD4U database (Jourquin et al. 2012), which collects gene sets associated with diseases. We found a significant enrichment for pregnancy complications, fetal diseases, gestational hypertension, hypoxia and several cancer pathways (Fig. 2a). The exhaustive values and the genes involved are presented in Fig. 2b for ‘Gestational Hypertension’, ‘Pregnancy complications’, ‘Anoxia’, and ‘Pregnancy’.
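Over-representation analysis is typically based on a hypergeometric (one-sided Fisher) test. The sketch below shows that test in its simplest form. It is an assumption about the general principle, not a description of WebGestalt's internal implementation, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p value for gene-set over-representation.

    Probability of observing at least `n_overlap` members of a gene set of
    size `n_in_set` when `n_selected` genes are drawn without replacement
    from a background of `n_background` genes.
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 176 selected genes, a background of 20,000 genes,
# a disease gene set of 300 genes, and 12 genes shared with the selection.
p_example = hypergeom_enrichment_p(20_000, 300, 176, 12)
```

In practice the p values obtained for all tested gene sets would also be corrected for multiple testing.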
Functional clustering of alternatively spliced genes partly differs between PE and IUGR
We then separately analyzed the genes alternatively spliced in either IUGR or PE. While the two pathologies shared biological processes related to secretion, exocytosis and vesicle metabolism, substantial clustering differences were observed.
In IUGR (Table 2), KEGG and Reactome pathways related to hypoxia, steroidogenesis, hormone metabolism and anion transport were enriched. Against disease databases (GLAD4U and OMIM), the alternatively spliced genes in IUGR were widely associated with 'pregnancy complications', 'Nitric Oxide metabolism' (known to be intimately linked to placental pathology) (Aouache and Biquard 2018; Motta-Mejia et al. 2017; Dymara-Konopka and Laskowska 2019), 'preeclampsia' and 'eclampsia', but also with autoimmune diseases, especially 'lupus erythematosus' (SLE). This is consistent with the increased risk of IUGR and other adverse pregnancy outcomes documented in autoimmune patients, especially those affected by SLE (Do and Druzin 2019).
Table 2 Enrichment analysis of genes differentially spliced in IUGR
The same type of analysis applied to the genes alternatively spliced in PE yielded quite different results (Table 3). In terms of biological processes, extracellular matrix, circulatory system and several neuron/axon ontology terms were found enriched specifically in PE. KEGG and Reactome pathways consistently pointed to extracellular matrix organization pathways. In terms of disease databases (GLAD4U and OMIM), ‘pregnancy’ and ‘pregnancy complications’ appeared enriched but, interestingly, numerous pathological pathways implicated in neurodevelopmental diseases also emerged with GLAD4U ('Brain diseases', 'CNS diseases', 'Dementia', 'Cri-du-Chat syndrome'), while OMIM pointed to SLE (Qing et al. 2011) and malaria susceptibility, consistent with the IUGR OMIM terms. In this case, however, an enrichment in genes involved in Alzheimer's disease was also found, consistent with the GLAD4U keywords.
Table 3 Enrichment analysis of genes differentially spliced in PE
In summary, this comprehensive description of abnormal splicing in IUGR and PE, each compared with normal placentas, reveals quite different enrichments of alternatively spliced genes in these two important placental diseases. Notably, a connection with neurological disease genes was exclusive to PE.
Comparing our sQTL results with placental eQTLs identified in previous studies (Peng et al. 2017), we found four genes in common among the cis-eQTLs (ACER3, CLDN1, PSG4, LGALS8). A contingency chi-square test revealed a marginally significant enrichment (12.8% expected versus 33% observed, p = 0.034).
Individual validation of alternative splicing on an enlarged collection of placental samples
We next validated some of these alterations through a targeted approach based on exon-specific primers and RT-qPCR. For this validation, we selected 12 genes based on their known involvement in preeclampsia and/or their very high splicing index differences between healthy and preeclamptic placentas: FLT1, CLDN1, LEP, FSTL3, TXK, CAP2, CA10, TNFRSF1B, ACOXL, TIE1, LGALS14, CPXM2. Maximal splicing indexes (p values) were respectively: FLT1 11.27 (p = 0.0016), CLDN1 15.51 (p = 0.0063), LEP 2 (p = 0.2957), FSTL3 −21.76 (p = 0.005), TXK 10.9 (p = 8.55E−05), CAP2 −8.58 (p = 0.0331), CA10 10.84 (p = 0.0557), TNFRSF1B 7.03 (p = 0.0035), ACOXL 4.46 (p = 0.0241), TIE1 4.01 (p = 0.0034), LGALS14 −2.22 (p = 0.5275), CPXM2 10.64 (p = 0.0301). Additional new samples were included along with the samples used in the microarray analysis, for a final total of 13 controls, 15 isolated PE, 5 PE + IUGR and 10 isolated IUGR samples. Overall, we confirmed splicing alterations in diseased placentas for 10 out of 12 genes (all except LGALS14 and CPXM2). An example of the analysis is given for five genes (FLT1, CLDN1, LEP, FSTL3 and TXK) in Fig. 3, and the remaining genes are presented in Supplementary Figure S2.
Genetic regulation of alternative splicing in the human placenta—determination of sQTLs
We next analyzed the genetic basis of splicing alterations in the individual placentas, focusing on a sample of the 48 genes with the most differential splicing index between normal and preeclamptic placentas (Table 1). Several of these genes, such as FLT1, LEP, PAPPA2 and HTRA4, are well known from the literature to be modified at the expression level between PE and normal placentas (Vaiman et al. 2013). Nineteen of the 48 genes were also significantly modified at the expression level between controls and IUGR, as assessed by the analysis of the Angers Hospital-IUGR cohort. Principal component analysis showed no major batch effect (Supplementary Figure S3).
Splicing is influenced by variants located in the vicinity of genes (cis-sQTL) and at far locations (trans-sQTL)
Each placental DNA was genotyped using a SNP genotyping array that encompasses ~710,000 SNPs, and the MatrixEQTL R package was used to investigate the impact of SNP variants on individual splicing indexes (Shabalin 2012). Placental Individual Splicing Indexes (ISIs) were computed along with the genotype data. A QQ-plot (Fig. 4) was obtained and allowed the identification of 180 cis-sQTLs (within 1 Mb of the gene under scrutiny) with p < 0.01 and 52 with FDR < 0.05, as well as 199,884 trans-sQTLs (> 1 Mb) with p < 0.01 and 52 with FDR < 0.05. The detailed list of cis-sQTLs identified is given in Supplementary Table S5. To add stringency to the approach, we re-ran the program on the control samples only (Tong et al. 2018). Despite the loss of power due to the reduced sample size, we were still able to detect 5 cis-sQTLs with an FDR < 0.05, all of which are common to the complete list and are shown in bold in Supplementary Table S5.
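The cis/trans distinction used above can be sketched as a simple distance rule. The snippet below is a minimal illustration that assumes the 1 Mb distance is measured from the gene boundaries. The function name, the coordinates and the exact way the distance is measured are assumptions, and this is not the MatrixEQTL code itself.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb, as stated in the text


def classify_pair(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=CIS_WINDOW_BP):
    """Label a SNP-gene pair as 'cis' or 'trans'.

    A pair is 'cis' when the SNP lies on the same chromosome as the gene and
    within `window` bp of the gene boundaries; every other pair is 'trans'.
    """
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start - window <= snp_pos <= gene_end + window:
        return "cis"
    return "trans"


# Hypothetical coordinates, for illustration only.
near = classify_pair("15", 51_000_000, "15", 51_200_000, 51_300_000)
far = classify_pair("15", 10_000_000, "15", 51_200_000, 51_300_000)
other_chrom = classify_pair("14", 51_250_000, "15", 51_200_000, 51_300_000)
```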
Cis-sQTLs have additive effects on splicing
A selection of highly significant cis-sQTLs is presented in Fig. 5. It is interesting to note that in these cases, the splicing effect is overall linear as a function of the number of copies of one of the alleles (additive effect). For instance, in the first example (CYP19A1-rs12907866), the splicing index is lowest in AA genotypes, intermediate in AB and highest in BB. The same type of profile is visible in the other examples presented. These influences on SI do not appear to be strictly connected to disease status. Cis-sQTLs may induce alternative splicing by influencing the binding of splicing factors, by modifying the secondary structure of mRNAs, or through any other local influence. However, they do not provide a picture of alternative splicing regulation at the genome level. Thus, to identify variants that influence splicing at a distance from their location, we studied trans-sQTLs in more detail.
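An additive effect of this kind corresponds to a linear regression of the splicing index on allele dosage. The sketch below assumes genotypes are encoded as the number of B alleles (AA = 0, AB = 1, BB = 2) and uses ordinary least squares. The function name and the data are hypothetical, and the snippet only illustrates the additive model rather than reproducing the analysis pipeline.

```python
from math import sqrt

# Assumed encoding: number of B alleles carried by each sample.
DOSAGE = {"AA": 0, "AB": 1, "BB": 2}


def additive_fit(genotypes, splicing_indexes):
    """Least-squares fit of splicing index on B-allele dosage.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    x = [DOSAGE[g] for g in genotypes]
    y = list(splicing_indexes)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype dosage and splicing index must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data in which each extra B allele raises the index by one unit.
slope, intercept, r = additive_fit(
    ["AA", "AB", "BB", "AA", "AB", "BB"],
    [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
)
```

A positive slope means that the splicing index increases with each additional B allele, as in the AA < AB < BB profile described for CYP19A1-rs12907866.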
Trans-sQTL analysis reveals major loci associated with splicing in the placenta
The conventional significance threshold for a genome-wide analysis (p < 10⁻⁸) identified 52 significant SNP-gene couples for trans-sQTLs (Supplementary Table S6). At p < 0.001, 24,867 QTLs were found, while 3,639 QTLs passed the p < 0.0001 threshold. As reported in many studies, keeping an FDR < 0.05 or a p < 10⁻⁸ threshold will miss QTLs with biological relevance (Morrow et al. 2018). We therefore developed a novel approach to avoid losing this biologically relevant information. In particular, we were interested in identifying 'bandmaster' loci that would control the splicing of a large fraction of the 48 genes that we identified as the most strongly alternatively spliced.
To do this, we sorted the SNPs along the chromosomes to find isolated SNPs, or windows of SNPs separated by less than 2000 bp (a distance in the range of the minimal chromosomal block sizes of markers in linkage disequilibrium (Pritchard and Przeworski 2001)), that were significantly associated with the alternative splicing of more than one gene.
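The windowing step can be sketched as follows, assuming each significant association is available as a (chromosome, position, gene) triple. The function names, the data layout and the positions are hypothetical. Only the rule that neighbouring SNPs are less than 2000 bp apart and the requirement of more than one associated gene come from the text.

```python
MAX_GAP_BP = 2000  # maximal distance between neighbouring SNPs in a window


def group_into_windows(hits, max_gap=MAX_GAP_BP):
    """Group significant SNP-gene couples into windows of nearby SNPs.

    `hits` is an iterable of (chrom, pos, gene) tuples. Couples are sorted by
    chromosome and position, and a couple joins the current window when it is
    on the same chromosome and less than `max_gap` bp from the previous one.
    """
    windows = []
    for hit in sorted(hits, key=lambda h: (h[0], h[1])):
        last = windows[-1][-1] if windows else None
        if last is not None and last[0] == hit[0] and hit[1] - last[1] < max_gap:
            windows[-1].append(hit)
        else:
            windows.append([hit])
    return windows


def multi_gene_windows(windows):
    """Keep only the windows associated with more than one distinct gene."""
    return [w for w in windows if len({gene for _, _, gene in w}) > 1]


# Hypothetical positions; the gene names are used for illustration only.
hits = [
    ("5", 1000, "FLT4"),
    ("5", 1702, "CLDN1"),
    ("5", 50_000, "FLT1"),
    ("7", 1200, "TXK"),
]
windows = group_into_windows(hits)
candidates = multi_gene_windows(windows)
```

An isolated SNP associated with several genes also forms a valid window here, because its couples share the same position.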
A Monte Carlo statistical analysis was performed to evaluate the longest possible windows of consecutive SNP-gene couples obtained from a random organization of the SNPs (simulated windows). At a threshold of 0.001, the maximal window length in the real dataset was 19 consecutive gene-SNP couples, whereas the randomized datasets led to a maximal size of 10 gene-SNP couples. Performing the same operation at the 0.0001 threshold, we identified a maximum window size of 6 in the randomized dataset (one occurrence), while nine windows of more than 6 consecutive SNP-gene couples (Fig. 6a) were identified in the real dataset. The nine windows identified are located on chromosomes 2, 5, 7, 8, 10, 13, 14 (at two locations) and 21. Of note, four of these windows were also detected at the threshold of 0.001: on chromosomes 5, 7, 8, and the second region on chromosome 14. In addition, two of these windows (on chromosomes 2 and 14) encompassed gene-SNP couples that were significant at a genome-wide threshold (p < 10⁻⁸, rs13006826-SLC6A10P and rs9323491-CYP19A1). We decided to focus on the four regions detected at both thresholds, presented in Fig. 6b as the blue, green, orange and purple fountains of a circular plot (Gu et al. 2014).
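One possible reading of this Monte Carlo procedure is sketched below. Significance flags ordered along the genome are shuffled repeatedly, and the longest run of consecutive significant couples seen in any shuffle gives the null reference length. The function names, the number of permutations and the flag layout are assumptions, and the randomization scheme actually used may differ.

```python
import random


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def null_max_run(flags, n_permutations=1000, seed=0):
    """Largest 'longest run' observed over random shuffles of the flags."""
    rng = random.Random(seed)
    shuffled = list(flags)
    max_seen = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        max_seen = max(max_seen, longest_run(shuffled))
    return max_seen


# Hypothetical flags: 12 significant couples clustered among 500 ordered couples.
observed_flags = [False] * 200 + [True] * 12 + [False] * 288
observed = longest_run(observed_flags)
null_reference = null_max_run(observed_flags, n_permutations=200)
```

Windows in the real data that are longer than the null reference would then be retained as candidate regions.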
The Chromosome 5 region contains two SNPs separated by 702 bp, rs13185255 and rs12520828. These SNPs are located inside the ARHGAP26 gene or its antisense, and are associated with increased splicing alterations for FLT4, CLDN1, PLA2G2F, SH3BP5, P4HA1 and SLC6A10P.
The Chromosome 7 region spans 314 base pairs and encompasses two SNPs (rs6964915 and rs6965391). These SNPs are located inside the LOC105375161 non-coding RNA, whose highest level of expression is found in the placenta (https://www.ncbi.nlm.nih.gov/gene/105375161; Fagerberg et al. 2014). RT-qPCR results interrogating two distinct regions of LOC105375161 showed no difference in expression between the sample groups (Supplementary Fig. S5). The splicing alterations of three genes were found to be associated with variants in this Chr7 region (FLT1, TXK and NTRK2).
The Chromosome 8 region encompasses a single SNP (rs1431647) located inside the non-coding RNA LOC105375897, which is expressed at a low level in the placenta. Eight genes were potentially affected by this variant at the splicing level (FSTL3, FLT1, P4HA1, NTRK2, FLT4, BAIAP2, CLDN1, ST18). By RT-qPCR, we could show that LOC105375897 is expressed at similar levels in control and preeclamptic placentas, while it is significantly overexpressed in IUGR, by ~30-fold (p value < 0.01) (Supplementary Fig. S5).
The Chromosome 14 region spans 828 bp and encompasses two SNPs, rs7145295 and rs7151086, which are located inside the GPHN gene (harboring the non-coding RNA LOC105370538, which was not detectable in our samples by RT-qPCR). Six genes were potentially affected at the splicing level: CYP19A1, FLT1, P4HA1, FLT4, SH3BP5, and NTRK2.
The association between these SNPs and splicing levels is represented in Fig. 7. The two SNPs identified on chromosome 5, like the two on chromosome 7, had the same profile, and thus only one SNP from each region is shown. For rs13185255, the heterozygous genotype differed from the homozygous genotypes (Fig. 7a). There was no obvious association between splicing levels, QTL genotypes and patient status. For rs6965391, the AA genotype was characterized by a higher splicing index for the three target genes (Fig. 7b), with a possible association with preeclampsia that should be confirmed in a larger sample, since the AA genotype was the rarest. For rs1431647, the heterozygous genotype also differed from the others, without obvious connection to disease status (Fig. 7c). Fig. 7d shows the two SNPs characterizing the fourth region studied. One of the SNPs (rs7145295) shows an additive behavior in terms of splicing levels for CYP19A1 (r = 0.85, p < 10⁻⁴). The AA genotype was marginally associated with preeclampsia (chi-square, p = 0.022; log-likelihood, p = 0.029), and the data are consistent with those for the second, nearby SNP, rs7151086. In both cases, a higher splicing index characterizes the BB genotype.