Patterns of differential gene expression in a cellular model of human islet development, and relationship to type 2 diabetes predisposition

Aims/hypothesis Most type 2 diabetes-associated genetic variants identified via genome-wide association studies (GWASs) appear to act via the pancreatic islet. Observed defects in insulin secretion could result from an impact of these variants on islet development and/or the function of mature islets. Most functional studies have focused on the latter, given limitations regarding access to human fetal islet tissue. Capitalising upon advances in in vitro differentiation, we characterised the transcriptomes of human induced pluripotent stem cell (iPSC) lines differentiated along the pancreatic endocrine lineage, and explored the contribution of altered islet development to the pathogenesis of type 2 diabetes. Methods We performed whole-transcriptome RNA sequencing of human iPSC lines from three independent donors, at baseline and at seven subsequent stages during in vitro islet differentiation. Differentially expressed genes (q < 0.01, log2 fold change [FC] > 1) were assigned to the stages at which they were most markedly upregulated. We used these data to characterise upstream transcription factors directing different stages of development, and to explore the relationship between RNA expression profiles and genes mapping to type 2 diabetes GWAS signals. Results We identified 9409 differentially expressed genes across all stages, including many known markers of islet development. Integration of differential expression data with information on transcription factor motifs highlighted the potential contribution of REST to islet development. Over 70% of genes mapping within type 2 diabetes-associated credible intervals showed peak differential expression during islet development, and type 2 diabetes GWAS loci of largest effect (including TCF7L2; log2FC = 1.2; q = 8.5 × 10−10) were notably enriched in genes differentially expressed at the posterior foregut stage (q = 0.002), as calculated by gene set enrichment analyses. In a complementary analysis of enrichment, genes differentially expressed in the final, beta-like cell stage of in vitro differentiation were significantly enriched (hypergeometric test, permuted p value <0.05) for genes within the credible intervals of type 2 diabetes GWAS loci. Conclusions/interpretation The present study characterises RNA expression profiles during human islet differentiation, identifies potential transcriptional regulators of the differentiation process, and suggests that the inherited predisposition to type 2 diabetes is partly mediated through modulation of islet development. Data availability Sequence data for this study has been deposited at the European Genome-phenome Archive (EGA), under accession number EGAS00001002721. Electronic supplementary material The online version of this article (10.1007/s00125-018-4612-4) contains peer-reviewed but unedited supplementary material, which is available to authorised users.

was aspirated and cells were washed once in PBS (with Ca 2+ /Mg 2+ ) before starting the differentiation. The differentiation protocol differed substantially to that used in our previous studies [10], and was carried out essentially as described by Rezania and colleagues [9] but with some modifications (ESM Tables 1, 2). All three iPSC lines were differentiated once, in parallel, using the same culture and differentiation media.

Flow cytometry
In vitro differentiation efficiency was evaluated by measuring the expression of stage- Methods for flow cytometry were as described previously [10], with details of antibodies listed in ESM Table 3.

RNA extraction, sequencing, and quantification
Cells were harvested into suspension using TrypLE Select before pelleting via centrifugation, and storage at -80°C in TRIzol reagent (15596-018; ThermoFisher Scientific, Paisley, UK). RNA was subsequently extracted as per manufacturer's guidelines. Library preparation and sequencing was performed at the Oxford Genomics Centre (Wellcome Centre for Human Genetics, University of Oxford) as described previously [10]. Briefly, polyadenylated transcripts were isolated with the NEBNext PolyA mRNA Magnetic Isolation Module (E7490 L; New England Biolabs, Ipswich, MA, USA), this followed by library preparation using the NEBNext Ultra Directional RNA Counts were normalized using variance stabilizing normalization and the network was constructed by hierarchical clustering of adjacency-based dissimilarity. This divided the genes into 29 co-expressed modules.

Differential expression analysis
Counts were filtered to include only genes that reached one cpm in all donors of at least one differentiation stage. Only autosomal protein-coding genes and long intergenic non-coding RNA (lincRNA) genes annotated in Ensembl Genes v88 (http://mar2017.archive.ensembl.org/index.html) were retained for downstream analysis. After removing genes with low counts, 15,221 protein coding genes and lincRNAs remained for normalization and differential expression analysis (ESM table 4). Gene counts were normalised and transformed to log-cpm using the voom function within the limma package (v.3.32.5) in R [16]. Contrasts were fitted comparing all the differentiation stages with iPSC as the baseline, adjusting for donor effect, as input for the eBayes function in limma for the differential expression analysis. All the reported pvalues of differential expression are adjusted for false discovery rate (q values) using the Benjamini and Hochberg method [18]. To determine the stage-specific biomarkers, differentially-expressed genes (q<0.01) with absolute log2 fold change (log2FC)>1 were assigned to the stage in which they were most upregulated compared to the baseline iPSC profile. When the log2FC was negative for all contrasted stages, the gene was assigned to iPSC (ESM Table 5).
For the comparative analysis between the previous [10] and current protocols, we retained only genes detected in both experiments, expressed at >1 cpm in all donors of at least one stage in at least one of the experiments, resulting in 15,464 protein coding genes and lincRNAs (ESM Table 6). The differential expression analysis was performed exclusively for the stages shared between the protocols (iPSC, DE, GT, PF, PE and EN) (ESM Table 7). Of note, sample numbers differed between the protocols with three samples (SB Ad2.1, SB Ad3.1, SB Neo1.1) differentiated with the optimised protocol, and two samples (SB Ad2.1, SB Ad3.4) with the previously published protocol.

Gene ontology and transcription factor binding motif enrichment
Differentially expressed genes in each stage were tested for enrichment in gene ontology (GO) terms for biological processes, using the GOstats package (v. 2.40.0) in R [19]. All the genes tested for differential expression were used as background. GO terms that were significant (5%) after BH p value adjustment were retained (ESM Table   8).
For transcription factor enrichment, upstream regulators were predicted for the lists of differentially expressed genes assigned to each stage using the iRegulon (v1.3) Cytoscape plugin [20]. Significantly enriched binding sequence motifs were detected based on 1120 chromatin immunoprecipitation (ChIP) sequencing tracks from ENCODE and 9713 non-redundant transcription factor (TF) binding motifs from 7 species. Regions 10 kb around the transcription start site of each input gene were considered, and enrichment was run with default parameters. Motifs and ChIP sequencing tracks were ranked based on Normalized Enrichment Score (NES), with only those with NES>3 (corresponding to a false discovery rate (FDR) of 3-9%) being considered. Enriched motifs were then matched to transcription factors known to bind them (ESM Table 9).

Type 2 diabetes and fasting glucose gene enrichment
Enrichment analysis was implemented in two ways: as a hypergeometric test in R (using all genes tested for differential expression as background) or using the gene scoring function in MAGENTA [21] followed by a gene set enrichment analysis (GSEA) [22,23].
For the hypergeometric test, we analysed the differentially expressed genes from each differentiation stage for enrichment in genes mapping to type 2 diabetes or fasting glucose GWAS signals, which were defined as protein coding genes and lincRNAs located within 0, 50, 100, 200 or 500 kb surrounding the credible intervals for type 2 diabetes/fasting glucose-associated loci. Credible intervals were defined by the boundaries of the 99% credible sets of variants (that collectively encompass the 99% posterior probability of association with the trait) [24] from DIAGRAM (150,000 European subjects imputed to 1000 Genomes [25], 96 loci) and ENGAGE (46,694 European subjects imputed to 1000 Genomes [26], 16 loci), respectively (ESM Table   10). This resulted in lists of 195, 262, 340, 481 and 903 genes and lincRNAs present in the background for type 2 diabetes; and 14, 27, 39, 57 and 101 for FG.
We pooled as "beta-cell function" 15 loci influencing hyperglycaemia, beta-cell function (defined as reduced insulin secretion and fasting hyperglycaemia), and insulin processing (reduction in fasting proinsulin levels); or that were significantly associated in a GWAS for intravenous glucose test for insulin secretion [27, 28] (ESM Table 11).
The enrichment analysis by hypergeometric test in R was performed using all genes tested for differential expression as background. A distribution of p values was calculated for each stage from 10,000 random samplings (without replacement) of groups of genes, of the same size as the gene sets tested, from the background, which after applying the hypergeometric test gave a distribution of random p values. Then a permuted p value was calculated for each stage, defined as the fraction of p values from the random distribution that are more extreme (smaller) than the p value from the differentially expressed gene set for each stage.  Antibodies were validated in all cases for absence of staining in developmental stages where they are not expressed. Table 4. Gene counts from each stage and sample, generated with the current differentiation protocol.

ESM
ESM Table 5. Results of the differential expression analysis contrasting the expression of each gene at each stage against the iPSC baseline from the current differentiation protocol.
ESM Table 6. Gene counts from each stage and sample generated with the current and previously-published [10] differentiation protocols. Table 7. Results of the differential expression analysis comparing corresponding stages (iPSC, DE, GT, PF, PE and EN) between the current and previously-published [10] differentiation protocols.

ESM
ESM Table 8. Enrichment in gene ontology (GO) terms for biological processes of differentially expressed genes in each stage.
ESM Table 9. Predicted transcription factors with iRegulon from enriched transcription factor binding motifs and ChIP-seq tracks from the differentially expressed genes in each stage. Table 10. 99% credible intervals of 96 type 2 diabetes-associated loci (from DIAGRAM) and 16 fasting glucose-associated loci (from ENGAGE). Table 11. Beta cell function loci, as defined by: i) Dimas and colleagues (2014) [26] as involved in hyperglycaemia (HG), beta cell function (BC) or insulin processing (PI), or 2) Wood and colleagues' (2017) Table 10), and for all differentially expressed genes in only physiological type 2 diabetes loci ("T2D [beta-cell])", ESM Table 11). We consider beta cell function loci 15 loci influencing hyperglycaemia, beta-cell function, and insulin processing [26,27]. The y-axis represents the results of the hypergeometric test in permuted p values (-log10). The horizontal grey dashed line marks the 5% significance threshold.