Identification and analysis of murine pancreatic islet enhancers
The paucity of information on the epigenetic barriers that block reprogramming protocols, and on what makes a beta cell unique, has hampered efforts to develop novel beta cell sources. Here, we aimed to identify enhancers in pancreatic islets, to understand their developmental ontologies, and to identify enhancers unique to islets, in order to increase our understanding of islet-specific gene expression.
We combined H3K4me1-based nucleosome predictions with pancreatic and duodenal homeobox 1 (PDX1), neurogenic differentiation 1 (NEUROD1), v-Maf musculoaponeurotic fibrosarcoma oncogene family, protein A (MAFA) and forkhead box A2 (FOXA2) occupancy data to identify enhancers in mouse islets.
We identified 22,223 putative enhancer loci in in vivo mouse islets. Our validation experiments suggest that nearly half of these loci are active in regulating islet gene expression, with the remaining regions probably poised for activity. We showed that these loci have at least nine developmental ontologies, and that islet enhancers predominantly acquire H3K4me1 during differentiation. We next discriminated 1,799 enhancers unique to islets and showed that these islet-specific enhancers have reduced association with annotated genes, and identified a subset that are instead associated with novel islet-specific long non-coding RNAs (lncRNAs).
Our results indicate that genes with islet-specific expression and function tend to have enhancers that, in embryonic stem cells and liver, lack histone methylation marks or, less often, are bivalent or repressed. Further, we identify a subset of enhancers unique to islets that are associated with novel islet-specific genes and lncRNAs. We anticipate that these data will facilitate the development of novel sources of functional beta cell mass.
Keywords: ChIP-seq, Enhancer, H3K4me1, lncRNA, Pancreas, Transcription factor
Abbreviations
ChIP-qPCR: Chromatin immunoprecipitation with quantitative PCR
ChIP-seq: Chromatin immunoprecipitation sequencing
ESC: Embryonic stem cell
FOXA2: Forkhead box A2
H3K4me1: Histone H3-lysine 4 monomethylation
H3K4me3: Histone H3-lysine 4 trimethylation
H3K9me3: Histone H3-lysine 9 trimethylation
H3K27me3: Histone H3-lysine 27 trimethylation
H3K27ac: Histone H3-lysine 27 acetylation
KEGG: Kyoto Encyclopedia of Genes and Genomes
lncRNA: Long non-coding RNA
MAFA: v-Maf musculoaponeurotic fibrosarcoma oncogene family, protein A (avian)
NEUROD1: Neurogenic differentiation 1
PDX1: Pancreatic and duodenal homeobox 1
PhyloCSF: Phylogenetic codon substitution frequency
SAGE: Serial analysis of gene expression
TSS: Transcriptional start site
UCSC: University of California, Santa Cruz
Prevention of diabetes depends on maintaining beta cell mass and insulin-secretory capacity. For this reason, recent efforts have focused on finding ways of enhancing beta cell survival, preventing beta cell death, and stimulating replacement of beta cell mass. One strategy involves reprogramming embryonic stem cells (ESCs), or more abundant cell types such as hepatocytes, into glucose-responsive insulin-secreting cells; the liver, like the pancreas, is derived from the foregut endoderm. However, such protocols often generate multihormonal cells, predominantly produce alpha cells, or generate ‘beta cells’ that cannot match the insulin-secretory capacity of a native beta cell. In part, progress in developing better protocols is hampered by our lack of understanding of the regulatory networks that drive beta cell genesis and function, by our inability to assess how closely protocols recapitulate normal beta cell development, and by our limited awareness of the epigenetic barriers encountered during reprogramming.
As enhancers largely govern tissue-specific gene expression, we anticipated that identifying enhancers in islets, and analysing their chromatin states in ESCs, hepatocytes and islets, would be particularly valuable for improving protocols for generating beta cells and would provide novel insights into beta cell development and function. To date, enhancer loci have been detected genome-wide using transcription factor binding [3, 4] or histone modification enrichment data [2, 5], or by identifying regions of open chromatin [6, 7]. Although each of these approaches has specific benefits, each either suffers from a high false-positive rate or fails to detect the majority of enhancers. We therefore sought to develop a combined approach to identifying enhancers that would take advantage of the benefits of each method while mitigating their limitations.
For this, we combined nucleosome predictions based on histone H3 lysine-4 monomethylation (H3K4me1) enrichment data, which demarcates active and poised enhancer loci [8, 9, 10, 11], with genome-wide occupancy data for pancreatic and duodenal homeobox 1 (PDX1), v-MAF musculoaponeurotic fibrosarcoma oncogene family, protein A (avian) (MAFA), neurogenic differentiation 1 (NEUROD1) and forkhead box A2 (FOXA2), which are critical regulators of beta cell development and function [12, 13, 14, 15]. Using these data, we identified 22,223 putative enhancer loci in in vivo mouse pancreatic islets. We then compared the chromatin states of these loci in ESCs and hepatocytes to assess their developmental ontologies and to begin to understand the epigenetic barriers faced by protocols that use these cell types to generate beta cells. Finally, we identified enhancers unique to islets to gain additional insight into the transcriptional networks that make beta cells different from other cell types.
Chromatin immunoprecipitation sequencing (ChIP-seq) and RNA sequencing (RNA-seq)
We performed ChIP-seq as previously described [8, 16], using antibodies to MAFA (Abcam, Toronto, ON, Canada) and NEUROD1 (Santa Cruz, Santa Cruz, CA, USA) in islets freshly isolated from 8–10-week-old mice, and antibodies to histone H3 lysine-9 trimethylation (H3K9me3) (Millipore, Billerica, MA, USA) and histone H3 lysine-27 trimethylation (H3K27me3) (Millipore) in islets and liver. RNA-seq was performed as previously described using pooled islets from C57BL/6J mice, with two replicates sequenced. For more information on mouse maintenance, islet isolations, ChIP-seq and RNA-seq, see electronic supplementary material (ESM) Methods. Data were deposited under GEO accession GSE30298.
Analysis of ChIP-seq datasets and detection of H3K4me1-marked nucleosome locations
Probabilistic inference for nucleosome positioning (PING) was used to identify H3K4me1-marked nucleosome positions from sonicated H3K4me1 ChIP-seq data in pancreatic islets. Sequence reads from previously generated sonication-based ChIP-seq experiments were obtained for H3K4me1, histone H3 lysine-4 trimethylation (H3K4me3), H3K9me3 and H3K27me3 in ESCs; for H3K4me1 and H3K4me3 in islets and liver; and for PDX1 and FOXA2 in islets (ESM Table 1) [8, 19, 20, 21, 22, 23, 24, 25]. Reads were mapped and peaks identified as described in ESM Methods. Histone modification data were clustered using the total read counts in 2 kb windows around locus midpoints.
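The clustering input described above is one number per locus and per mark: the total read count in a 2 kb window around the locus midpoint. The Python sketch below illustrates only that counting step, assuming each read has been reduced to a single sorted genomic position and each locus is a hypothetical (chromosome, start, end) tuple; the function names and track labels are illustrative, and the clustering algorithm itself is not specified here and is not reproduced.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Locus = Tuple[str, int, int]  # (chromosome, start, end); illustrative representation


def window_counts(
    read_positions: Dict[str, List[int]],  # chromosome -> SORTED read positions
    loci: List[Locus],
    window: int = 2000,
) -> List[int]:
    """Count reads whose position falls within `window` bp centred on each locus midpoint."""
    half = window // 2
    counts = []
    for chrom, start, end in loci:
        mid = (start + end) // 2
        positions = read_positions.get(chrom, [])
        # binary search for reads in the half-open interval [mid - half, mid + half)
        lo = bisect_left(positions, mid - half)
        hi = bisect_left(positions, mid + half)
        counts.append(hi - lo)
    return counts


def feature_matrix(
    tracks: Dict[str, Dict[str, List[int]]],  # hypothetical track name -> reads per chromosome
    loci: List[Locus],
    window: int = 2000,
) -> List[List[int]]:
    """One row per locus and one column per track (tracks in sorted name order)."""
    names = sorted(tracks)
    columns = [window_counts(tracks[name], loci, window) for name in names]
    return [list(row) for row in zip(*columns)]


# usage: a locus spanning chr1:1000-3000 has its midpoint at 2000
reads = {"chr1": [100, 1500, 1900, 2100, 5000]}
print(window_counts(reads, [("chr1", 1000, 3000)]))  # three reads fall in [1000, 3000)
```

Each row of the matrix can then be passed to whichever clustering method is preferred; raw counts are used here only because the text describes total read counts.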
Chromatin immunoprecipitation with quantitative PCR (ChIP-qPCR)
For ChIP-qPCR, we performed ChIP on cells using 3 μg of anti-H3K4me1 (Abcam), anti-histone H4 acetylation (H4ac) (Millipore), anti-E1A binding protein p300 (p300) (Santa Cruz), anti-histone H3-lysine 27 acetylation (H3K27ac) (Millipore) or rabbit IgG (Santa Cruz). DNA from triplicate ChIP experiments was amplified using an ABI Viia7 real-time PCR system (Applied Biosystems, Carlsbad, CA, USA) and SYBR Green Supermix (Applied Biosystems). The fold enrichment of each target site was calculated as $2^{-\Delta C_t}$, where $\Delta C_t$ is the difference in Ct between the immunoprecipitated sample and the rabbit IgG control. Primers were designed using Primer3; primer sequences are available upon request.
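The fold-enrichment formula can be made concrete with a short calculation. The sketch below assumes ΔCt is taken as mean Ct of the immunoprecipitate minus mean Ct of the rabbit IgG control, so a site that amplifies earlier in the immunoprecipitate gives a value above 1; averaging the triplicate Ct values before exponentiating is an assumption, not a detail stated in the methods.

```python
from statistics import mean
from typing import Sequence


def fold_enrichment(ct_ip: Sequence[float], ct_igg: Sequence[float]) -> float:
    """Fold enrichment over rabbit IgG as 2^(-dCt).

    dCt = mean Ct(immunoprecipitate) - mean Ct(IgG); averaging replicate Ct
    values first is an assumption of this sketch.
    """
    delta_ct = mean(ct_ip) - mean(ct_igg)
    return 2.0 ** (-delta_ct)


# usage: an immunoprecipitate crossing threshold 3 cycles earlier than IgG -> 2^3 = 8-fold
print(fold_enrichment([24.0, 24.0, 24.0], [27.0, 27.0, 27.0]))
```

The formula assumes roughly 100% amplification efficiency, so that each cycle doubles the template.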
Dual luciferase assays
We cloned selected loci from mouse islet DNA into pGL4.23 (Promega, Madison, WI, USA). Cells were seeded in 96-well plates and co-transfected with 40 fmol of the pGL4 enhancer (firefly luciferase) vector and 0.7 fmol of the Renilla luciferase vector (pRL-TK; Promega) using Lipofectamine 2000 (Life Technologies, Burlington, ON, Canada). Six replicate transfections were performed for each cell line. Cell lysates were prepared and analysed using the Dual-Luciferase Reporter Assay System (Promega) according to the manufacturer’s instructions. Firefly luciferase values were normalised to Renilla luciferase values. A locus was defined as active if its normalised activity was more than 2 SD (p < 0.01) above the median activity of the negative controls.
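The normalisation and activity call can be sketched as follows. This is a minimal illustration under stated assumptions: each replicate is normalised as firefly/Renilla, a locus is summarised by the mean of its replicate ratios, and the SD is that of the pooled negative-control ratios. None of these choices is specified in the methods, and the function names are hypothetical.

```python
from statistics import mean, median, stdev
from typing import List, Sequence


def normalised_activity(firefly: Sequence[float], renilla: Sequence[float]) -> List[float]:
    """Firefly luciferase signal divided by Renilla signal, per replicate well."""
    return [f / r for f, r in zip(firefly, renilla)]


def is_active(
    locus: Sequence[float], negative_controls: Sequence[float], n_sd: float = 2.0
) -> bool:
    """Assumed rule: mean locus activity > median(controls) + n_sd * SD(controls)."""
    threshold = median(negative_controls) + n_sd * stdev(negative_controls)
    return mean(locus) > threshold


# usage with made-up ratios (six replicate transfections each)
controls = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
print(is_active([3.0] * 6, controls))  # well above the control threshold
```

Using the median as the centre makes the threshold less sensitive to a single unusually bright negative-control well.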
Identification of novel transcripts in islets
To identify novel transcripts expressed in islets, we performed de novo assembly of islet RNA-seq reads (ESM Methods) using Trans-ABySS. Contigs were aligned to the NCBIM37 reference genome, and alignments were annotated against Ensembl v64 genes. Contigs whose alignments did not overlap an annotated gene in this database were retained and filtered further (ESM Methods), and the coding potential of the remaining transcripts was determined with phylogenetic codon substitution frequency (PhyloCSF) using an eight-way multispecies alignment. Transcripts with a PhyloCSF score below 100 were considered non-coding.
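The two selection criteria named above (no overlap with an annotated gene, and a PhyloCSF score below 100) can be expressed as a simple filter. The sketch assumes PhyloCSF scores have already been computed for each contig and that contigs and genes are represented as half-open genomic intervals; the `Contig` type and function names are hypothetical, and the additional filters in ESM Methods are not modelled.

```python
from typing import List, NamedTuple, Tuple


class Contig(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, half-open [start, end)
    end: int
    phylocsf: float  # score assumed to be precomputed by PhyloCSF


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def candidate_noncoding(
    contigs: List[Contig], genes: List[Tuple[str, int, int]], max_score: float = 100.0
) -> List[str]:
    """Names of contigs that overlap no annotated gene and score below `max_score`."""
    keep = []
    for c in contigs:
        annotated = any(
            chrom == c.chrom and overlaps(c.start, c.end, g_start, g_end)
            for chrom, g_start, g_end in genes
        )
        if not annotated and c.phylocsf < max_score:
            keep.append(c.name)
    return keep
```

The linear scan over genes is kept for clarity; a genome-wide run would use an interval index instead.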
The expression of the contigs, in reads per kilobase per million mapped reads (RPKM), was calculated for the islet RNA-seq dataset and for 14 other tissues (ESM Table 1) by using SAMtools to count the reads with a minimum quality score of 10 that overlapped each exon. A contig was considered expressed in a library only if at least three reads overlapped it.
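The expression calculation and the three-read threshold are sketched below. It assumes that the "quality score" is the read mapping quality (MAPQ), that overlapping reads have already been extracted (for example with SAMtools), and that the exon length passed in is the summed exonic length of the contig; the helper names are illustrative.

```python
from typing import Iterable


def count_quality_reads(mapqs: Iterable[int], min_quality: int = 10) -> int:
    """Count overlapping reads whose mapping quality is at least `min_quality`."""
    return sum(1 for q in mapqs if q >= min_quality)


def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)


def is_expressed(read_count: int, min_reads: int = 3) -> bool:
    """A contig counts as expressed in a library if at least `min_reads` reads overlap it."""
    return read_count >= min_reads


# usage: 30 reads on 1.5 kb of exon in a library of 20 million mapped reads -> 1.0 RPKM
print(rpkm(30, 1500, 20_000_000))
```

The read-count threshold is applied to raw counts rather than RPKM, so a short contig cannot pass on a single read.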
qPCR validation of novel transcripts
RNA was isolated from adult tissues using Trizol (Life Technologies) and RNA purification columns. Triplicate cDNAs were obtained by reverse transcription of 1 μg total RNA from freshly isolated tissue. All primers were designed using Primer3Plus and spanned introns where possible; primer sequences are available on request. A Viia7 real-time PCR system (Applied Biosystems) and Fast SYBR Green Master Mix (Applied Biosystems) were used for all reactions, with 10 ng of cDNA per reaction and all reactions performed in triplicate. Ct values were normalised to β-actin Ct values to determine the percentage abundance of each transcript relative to β-actin in each sample.
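The normalisation to β-actin can be written as 100 × 2^-(Ct_target - Ct_β-actin). The sketch below assumes roughly 100% PCR efficiency for both amplicons and that triplicate Ct values are averaged first. Neither assumption is stated in the methods.

```python
from statistics import mean
from typing import Sequence


def percent_of_actin(ct_target: Sequence[float], ct_actin: Sequence[float]) -> float:
    """Abundance as a percentage of beta-actin: 100 * 2^-(Ct_target - Ct_actin).

    Assumes ~100% amplification efficiency (template doubles each cycle).
    """
    delta_ct = mean(ct_target) - mean(ct_actin)
    return 100.0 * 2.0 ** (-delta_ct)


# usage: a transcript crossing threshold 5 cycles after beta-actin -> 100 / 2^5 = 3.125%
print(percent_of_actin([25.0, 25.0, 25.0], [20.0, 20.0, 20.0]))
```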
Genome-wide identification of putative enhancers in pancreatic islets
As a first step to identifying enhancers in mouse pancreatic islets, we identified in vivo locations of H3K4me1-enriched nucleosomes genome-wide [8, 18]. From these, using a strict definition of an enhancer and a stringent set of selection criteria chosen to minimise false positives (ESM Methods and ESM Fig. 1a), we identified 16,835 putative enhancer loci. We identified additional loci using 13,770 PDX1-occupied and 6,176 FOXA2-occupied loci from previously generated ChIP-seq data, as well as 3,638 MAFA-occupied and 6,568 NEUROD1-occupied loci from newly generated ChIP-seq data from mouse islets. Combining these data allowed us to identify 22,223 putative enhancer loci flanked by H3K4me1-marked nucleosomes in in vivo islets (ESM Fig. 1a, b and ESM Table 2).
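Combining several locus sets into a single non-redundant list is, at its core, an interval union. The sketch below shows only that step, assuming each source is a list of (chromosome, start, end) tuples and that overlapping or touching intervals should be merged. It does not model the requirement, described in ESM Methods, that transcription factor-occupied loci be flanked by H3K4me1-marked nucleosomes, so it is an illustration rather than the pipeline used here.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end); illustrative representation


def merge_loci(*sources: Iterable[Interval]) -> List[Interval]:
    """Union several locus sets and merge intervals that overlap or touch."""
    all_loci = sorted(locus for source in sources for locus in source)
    merged: List[Interval] = []
    for chrom, start, end in all_loci:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # same chromosome and overlapping/adjacent: extend the previous interval
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# usage with made-up coordinates for three hypothetical sources
h3k4me1 = [("chr1", 100, 300), ("chr1", 1000, 1200)]
pdx1 = [("chr1", 250, 400), ("chr2", 50, 80)]
foxa2 = [("chr1", 1100, 1150)]
print(merge_loci(h3k4me1, pdx1, foxa2))
```

Sorting once and sweeping keeps the merge linear after the sort, which matters when tens of thousands of loci are combined.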
Enhancer loci have distinct developmental ontologies
We next compared the frequency with which PDX1, MAFA, NEUROD1 and FOXA2 occupied the enhancers in the different clusters. In general, the enhancer clusters had similar occupancy frequencies (Fig. 2c, d), except for clusters e5 and e8, which had reduced occupancy compared with the other clusters (1.6-fold decrease, p < 0.0001, Fisher’s exact test). In contrast, when we compared the motifs enriched in the clusters (ESM Methods), we found that cluster e1, which was in an active state in ESCs, liver and islets, was highly enriched for motifs of widely produced transcription factors (Fig. 2e). Cluster e3, which was in an active state in liver and islets, was highly enriched for motifs of transcription factors produced in both islets and liver, and clusters e4, e5 and e6, which were predominantly in an active state in islets only, were enriched for motifs of transcription factors produced in the islet lineage.
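The occupancy comparison above is a 2×2 contingency test: rows for a cluster versus all other clusters, columns for occupied versus unoccupied enhancers. The dependency-free sketch below implements the standard two-sided Fisher's exact test by enumerating all tables with the same margins. The row and column layout and the `fold_change` helper are illustrative assumptions, and no counts from this study are used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # probability of x in the top-left cell given fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are counted
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def fold_change(a: int, b: int, c: int, d: int) -> float:
    """Occupied fraction in row 2 divided by occupied fraction in row 1."""
    return (c / (c + d)) / (a / (a + b))


# usage: the classic 'lady tasting tea' table [[3, 1], [1, 3]] gives p = 34/70
print(fisher_exact_two_sided(3, 1, 1, 3))
```

For large tables a library implementation (for example, SciPy's) would be the usual choice; the explicit enumeration here just makes the test's logic visible.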
Enhancers with different developmental ontologies regulate genes with distinct attributes
Identification of islet-specific enhancer (ISE) regions
To begin to determine why the identified ISEs acquired an active chromatin state uniquely in islets, we compared their underlying DNA sequence with that of NSEs. ISEs had a significantly lower (p < 0.0001) average GC content than NSEs (Fig. 4e). Consistent with this, ISEs were enriched for HNF3-, PAX-, NKX-, RFX- and PDX1-like motifs (Fig. 4f and ESM Figs 6, 7), which are A/T rich, and were specifically depleted of ZFX-, SP-, AP- and CTCF-like motifs, which are more G/C rich (ESM Methods). Comparing the spatial distributions of ISEs and NSEs, we observed that ISEs were more broadly distributed around transcriptional start sites (TSSs), whereas NSEs were more often proximal to TSSs (Fig. 4g); indeed, the mean enhancer-to-TSS distance was over three times greater for ISEs than for NSEs (p < 0.0001). As a result, significantly fewer ISEs than NSEs could be mapped to known genes (p < 0.0001) (Fig. 4h).
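The two sequence-level quantities compared above, GC content and distance to the nearest TSS, can be computed with a few lines of Python. The sketch assumes GC content is measured over unambiguous bases only (N and other IUPAC codes ignored) and that TSS coordinates are supplied as a sorted list per chromosome; these conventions, and the function names, are assumptions rather than the exact procedure in ESM Methods.

```python
from bisect import bisect_left
from typing import Dict, List, Optional


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T); other characters ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    gc = s.count("G") + s.count("C")
    return gc / acgt if acgt else 0.0


def distance_to_nearest_tss(
    chrom: str, position: int, tss_by_chrom: Dict[str, List[int]]
) -> Optional[int]:
    """Absolute distance (bp) from `position` to the closest TSS on the same chromosome.

    `tss_by_chrom` must map each chromosome to a SORTED list of TSS coordinates.
    Returns None if the chromosome has no annotated TSS.
    """
    sites = tss_by_chrom.get(chrom)
    if not sites:
        return None
    i = bisect_left(sites, position)
    candidates = []
    if i < len(sites):
        candidates.append(sites[i] - position)  # closest TSS at or above `position`
    if i > 0:
        candidates.append(position - sites[i - 1])  # closest TSS below `position`
    return min(candidates)


# usage with made-up coordinates
tss = {"chr1": [1000, 5000, 9000]}
print(gc_content("ATGCNN"), distance_to_nearest_tss("chr1", 4200, tss))
```

Applied to enhancer midpoints, the second function yields the enhancer-to-TSS distances whose means can then be compared between ISEs and NSEs.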
ISEs are associated with novel islet-specific transcripts
To determine whether the ISEs that we were unable to map to any known gene (Fig. 4h) might regulate novel transcripts, such as long non-coding RNAs (lncRNAs), we used Trans-ABySS to perform de novo assembly of the islet RNA-seq reads. Trans-ABySS identified 2,498 transcripts that met our minimum read count and exon count thresholds (ESM Methods) and had no annotation in the Ensembl, RefSeq or University of California, Santa Cruz (UCSC) databases (ESM Table 3). These transcripts were expressed at significantly lower levels than known protein-coding genes (p < 0.0001), as previously reported for non-coding transcripts in other tissues. The 2,498 transcripts represented 1,473 distinct loci, suggesting that some loci generate multiple transcript variants, or that Trans-ABySS did not assemble complete transcripts because of the low read counts associated with them [28, 36]. Some 92% of these transcripts were associated with H3K4me1/H3K4me3-enriched regions, and 78% could be associated with an identified islet enhancer, further supporting their validity.
As enhancers are thought to be the primary regulators of tissue-specific gene expression [2, 9], we attempted to develop a compendium of in vivo enhancer elements in pancreatic islets. We subsequently sought to determine how the developmental ontologies of the enhancer loci might affect protocols for generating beta-like cells from ESCs and hepatocytes, and to identify enhancers associated with islet-specific transcriptional networks.
In an effort to mitigate the limitations of previous approaches to identifying enhancers genome-wide [2, 3, 4, 5, 6, 7], we combined predictions of H3K4me1-marked nucleosomes with locations of PDX1, NEUROD1, MAFA and FOXA2 binding in vivo to identify enhancer regions in islets. This approach ensured that the loci identified were flanked by H3K4me1-marked nucleosomes and were therefore both within open chromatin and in an appropriate chromatin state [9, 10, 11]. Nevertheless, the identified loci could contain central unmarked nucleosomes, although such central nucleosomes would probably contain the histone variants H3.3 and/or H2A.Z, which are common within regulatory regions. Further, by using H3K4me1 rather than H3K27ac, which marks only active loci [21, 39], we were able to identify loci both in an active state and in a poised state that may become active under different physiological conditions. Meanwhile, including PDX1, NEUROD1, MAFA and FOXA2 occupancy data allowed us to recover additional loci that were missed because of the stringent selection criteria applied to the H3K4me1-marked nucleosome-based predictions.
Despite the benefits of our approach, we note that the 22,223 loci identified here are not a fully comprehensive list of enhancers in islets, in part because the stringent criteria used probably eliminated many true enhancers. In addition, mouse islets are predominantly composed of beta cells (∼80%), and PDX1, MAFA and NEUROD1 are found only in beta cells in the adult. The majority of loci identified here are therefore probably beta cell enhancers, and enhancers from other islet cell types (alpha, delta, PP and epsilon cells) are probably largely undetected. Regardless, our data suggest that roughly 50% of the regions identified are active, a higher fraction than reported by other efforts to identify functional enhancers in HeLa cells using histone modification data (∼36%) or in islets using open chromatin (∼33%), supporting the relative success of our approach. Thus, we think it likely that our list of putative enhancers in islets, while not exhaustive, is highly enriched in loci capable of acting as functional enhancer elements in beta cells, and should be of significant utility in generating novel biological insights.
Although previous efforts to identify cis-regulatory loci genome-wide in human islets have proven valuable [6, 7], these studies were unable to provide insight into the developmental ontologies of the loci. To begin to address this, we compared the chromatin states of the identified enhancers in islets with their chromatin states in ESCs, which can be used to infer the chromatin state of the loci before differentiation, and in liver, which, like the pancreas, develops from the foregut endoderm and thus can be used to infer whether a specific chromatin-state transition occurred before pancreas/liver specification. We found that enhancers in islets are generated through at least nine distinct developmental ontologies. Of these, three were associated with genes that have islet-specific expression and function: enhancers that uniquely acquired H3K4me1/H3K4me3 marks in islets, and enhancers that were in either a repressed or a bivalent state in ESCs but in an active state in islets. Our results agree with previous work indicating that H3K27me3 prepatterns Pdx1 regulatory regions, and further show that genes for many of the transcription factors important in pancreas development, including Nkx2-2, Nkx6-1, Mafa and Mnx1, also have enhancers that were either bivalent or repressed in ESCs but in an active state in islets. Together, these results suggest that the primary epigenetic barriers faced in the conversion of ESCs and hepatocytes into beta-like cells are the appropriate recruitment, to islet-critical cis-regulatory regions, of trithorax complexes, which can induce H3K4 methylation, and of H3K27me3 demethylases, such as lysine (K)-specific demethylase 6B (KDM6B, also known as JMJD3).
To further define islet-specific transcriptional networks, we next discriminated 1,799 enhancers unique to islets using H3K4me1 data from 19 other cell or tissue types. These enhancers had a lower average GC content than NSEs and were enriched for A/T-rich motifs for pancreas-specific transcription factors. Although we found ISEs associated with several known islet-critical factors, we also found ISEs associated with many genes that have islet-enriched expression but as yet uncharacterised roles in beta cell function, including genes involved in RNA splicing, cell adhesion and cytokine-mediated signalling. We further identified several ISEs associated with previously unknown islet-expressed lncRNAs. At least 75% of these lncRNAs were more abundantly expressed in islets than in any other tissue type, suggesting that they are also islet specific. Consistent with this, many lncRNAs are tissue specific and are thought to regulate tissue-specific transcriptional networks or to establish the chromatin state of tissue-specific regulatory regions. The roles of these lncRNAs, and of the other genes with an associated ISE, in beta cell development and function will be of considerable future interest.
In summary, we identified 22,223 putative enhancer loci in pancreatic islets. We show that these loci have at least nine distinct developmental ontologies, and find that, in contrast with promoters, the majority of enhancers acquire H3K4me1 either specifically in islets or in a shared islet/liver multipotent progenitor. Our analysis of these regions clearly points to the importance of the coordinated, stage-specific action of histone methyltransferases and histone demethylases in establishing appropriate chromatin states at enhancers and promoters that regulate genes critical to pancreas specification and beta cell function. Further, we identify 1,799 of these loci as unique to islets, and show the utility of these data by using them to help identify novel islet-specific lncRNAs. We anticipate that our data will contribute towards ongoing efforts to understand beta cell development and function, and will facilitate the development of novel strategies for generating glucose-responsive, insulin-secreting cells.
The authors would like to acknowledge Canada’s Michael Smith Genome Sciences Centre’s (BCGSC) sequencing, bioinformatics and SAGE library construction teams, as well as P. Plettner (BCGSC) and A. He (BCGSC) for technical assistance, and J. Johnson (BCGSC) and A. Kotzer (Terry Fox Research Institute) for project management.
M.A. Marra, P.A. Hoodless and S.J.M. Jones are Senior Scholars of the Michael Smith Foundation for Health Research. L. Li was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01-ES-101765). R. Gottardo was supported by NIH grant HG005692. X. Zhang was supported by an NSERC-CGS fellowship. F.C. Lynn was supported by the Canadian Institutes for Health Research (MOP-102628 and RMF-111626) and the Juvenile Diabetes Research Foundation (2-2011-91). Funding was provided by Genome Canada, Genome British Columbia, the Child and Family Research Institute, the Juvenile Diabetes Research Foundation (5-2011-85), Common Wealth Insurance, the Canadian Institutes for Health Research (MOP-111010), and the Canucks for Kids Foundation, with infrastructure support provided by the British Columbia Cancer Foundation.
Duality of interest
The authors declare that there is no duality of interest associated with this manuscript.
BRT, MK, CJW, AK and BGH were involved in the acquisition and analysis of the ChIP-qPCR, qPCR and luciferase data. MB, PVS, FCL, MAM, SJMJ and BGH participated in acquisition of the ChIP-seq and/or RNA-seq data, while AGR, LL, XZ, NT, RC, KM and BGH performed the analysis and interpretation of the data. RG, MAM, SJMJ, PAH and BGH conceived and designed the experiments. BRT, AGR and BGH wrote the manuscript. MK, MB, CJW, PVS, AK, LL, XZ, NT, RC, KM, RG, MAM, FCL, SJMJ, and PAH provided critical revisions for important intellectual content. All authors approved the final version.