Methylation of histone H3 lysine 4 is required for maintenance of beta cell function in adult mice

Aims/hypothesis Beta cells control glucose homeostasis via regulated production and secretion of insulin. This function arises from a highly specialised gene expression programme that is established during development and then sustained, with limited flexibility, in terminally differentiated cells. Dysregulation of this programme is seen in type 2 diabetes but mechanisms that preserve gene expression or underlie its dysregulation in mature cells are not well resolved. This study investigated whether methylation of histone H3 lysine 4 (H3K4), a marker of gene promoters with unresolved functional importance, is necessary for the maintenance of mature beta cell function. Methods Beta cell function, gene expression and chromatin modifications were analysed in conditional Dpy30 knockout mice, in which H3K4 methyltransferase activity is impaired, and in a mouse model of diabetes. Results H3K4 methylation maintains expression of genes that are important for insulin biosynthesis and glucose responsiveness. Deficient methylation of H3K4 leads to a less active and more repressed epigenome profile that locally correlates with gene expression deficits but does not globally reduce gene expression. Instead, developmentally regulated genes and genes in weakly active or suppressed states particularly rely on H3K4 methylation. We further show that H3K4 trimethylation (H3K4me3) is reorganised in islets from the Leprdb/db mouse model of diabetes in favour of weakly active and disallowed genes at the expense of terminal beta cell markers with broad H3K4me3 peaks. Conclusions/interpretation Sustained methylation of H3K4 is critical for the maintenance of beta cell function. Redistribution of H3K4me3 is linked to gene expression changes that are implicated in diabetes pathology. Graphical abstract Supplementary Information The online version of this article (10.1007/s00125-023-05896-6) contains peer-reviewed but unedited supplementary material.

ChIP-seq reads were aligned to concatenated mouse (GRCm38/mm10) and Drosophila (BDGP6.28) genome assemblies using Bowtie2 v2.3.4.1 with options '--very-sensitive --no-unal --no-discordant' [3]. Multi-mapped reads, reads with mapping quality MAPQ < 20, and suspected PCR-duplicates were discarded using Samtools [4]. To scale enrichment according to the Drosophila cell spike-ins, the fraction of remaining reads that mapped to the Drosophila genome was determined, and a scaling factor was calculated for each sample as (minimum Drosophila read fraction per histone mark)/(sample Drosophila read fraction). After determining a scaling factor, each sample was downsampled to the lowest mapped read count per each histone mark using Samtools, and then reads mapped to the Drosophila genome were removed. For visualization, libraries were converted to bedgraph format using the calculated scaling factor in the '-scale' argument of BEDtools v2.26.0 genomecov [5]. Genome browser views were generated for merged replicates using Spark [6] and TSS profiles using Deeptools v3.4.2 [7]. Peak locations and breadth were determined using MACS2 v2.2.6 with parameters '-broad --nolambda' [8]. Peaks detected in only one biological replicate or that overlapped with high-background blacklisted regions defined by the ENCODE project [9] were excluded. Chromosomes X, Y, and M were also excluded from the analyses. DESeq2 [10] was used for differential peak width analysis using a significance cutoff of P ≤ 0.05 (Wald test with Benjamini and Hochberg adjustment), and enrichment for differential breadth in peak quantiles was determined by one-sided Fisher's exact test. We defined TSS locations as the 5' end of the most abundant annotated transcript of each gene detected in our RNA-seq data. H3K4me3 peaks were assigned to genes for which they overlap the region ±1 kb of the TSS. For ESM Fig. 4b, H3K27ac peaks were assigned to the nearest TSS with no maximum distance. The null distribution of peak-to-gene associations was estimated by randomly permuting peak genomic locations corresponding 100 times using bedtools shuffle.
Residual DNA was digested by Turbo DNase (Thermo Fisher Scientific) for 30-minutes at 37°C. Purified total RNA content was measured using the Qubit RNA HS assay (Thermo Fisher Scientific). Mature mRNA was enriched from 400 ng of total RNA per sample using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs) and used to prepare libraries with the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (New England Biolabs) with 11 amplification cycles. Uniquely indexed libraries were quantified, pooled, and sequenced for 2 × 38 nucleotide reads as described above. Sequence data is available in the GEO under accession number GSE181951.
Sequenced reads were aligned to the concatenated mouse (GRCm38/mm10; Gencode vM24) and Drosophila (BDGP6.28; Ensembl release 99) genomes and transcriptomes using STAR v2.7.3a [12] with default settings. Transcript abundance was calculated using Salmon v1.4.0 [13] in alignment mode with seqBias and gcBias options. Gene counts and differential expression were calculated using DESeq2 [10]. Genes with fewer than two counts in any sample were filtered out and genes showing ≥ 2-fold difference in expression with P ≤ 0.01 (Wald test with Benjamini and Hochberg adjustment for multiple comparisons) are considered differentially expressed. Genes showing <1.1-fold difference in expression are considered stable. Except in Fig. 3e, Drosophila transcripts were removed after abundance calculation; for Fig. 3e, Drosophila transcripts were used as control genes in the estimateSizeFactors function of DESeq2, and then removed.
Single cell RNA-seq Islets from one WT and one KO mouse were dissociated to single-cell suspensions as described above without FACS-enrichment. The cell suspensions were processed through the Chromium Single Cell 3' protocol using the Chromium Controller (firmware v4.0) with Reagent Kit v3.1 (10X Genomics) and Dual Index Kit TT Set A (10X Genomics) according to the manufacturer's instructions for a targeted 5,000-cell recovery. Total cDNA was amplified for 11-cycles, and then 13-cycles during index-ligation. cDNA concentrations were quantified by qPCR with the NEBNext Library Quant Kit for Illumina (New England Biolabs). Libraries were pooled and sequenced for 28-90 paired-end nucleotide reads to an average depth of ~70,000 mapped fragments per cell. Sequence data is available in the GEO under accession number GSE181951.
Cellranger (v5.0.0, 10X Genomics) was used to generate FASTQ files and demultiplex reads. Cell barcode detection, read mapping, quality filtering, and transcript counting were performed using Alevin (Salmon v1.4.0) [14] against the mouse protein-coding transcriptome (Gencode vM24), which we appended with the Tdtomato and Egfp sequences, as well as the mouse genome (GRCm38/mm10) as mapping decoys for selective alignment [15]. Additional cell quality filtering was performed using Seurat v4.1.2 [16] based on unique features (> 750 genes) and mitochondrial RNA content (< 20%). Putative doublets were excluded using DoubletFilter v2.0 [17]. Genes detected in < 3 cells were excluded from analysis. After quality filtering, the dataset comprised 3484 WT and 3685 KO cells. Filtered data were scaled and normalized using SCTransform v0.3.5 [18] with default parameters. PCA and UMAP dimensionality reduction using the first 60 principal components was used to cluster and visualize islet cell populations. Beta cell clusters (identified based on Ins1/2 expression) from WT and KO samples were then merged, transformed, and clustered in UMAP space using 15 principal components. Slingshot v2.4.0 [19] was used for pseudotime analysis to rank cells along a continuous trajectory of transcriptome remodeling from beta cell cluster 1 to 2. Entropy scores are a modified Shannon entropy calculated using a published R function (ref [20]). Variance/mean ratios for each gene in each beta cell cluster was calculated from the raw RNA count, excluding genes detected in < 1 % of cells. Gene regulatory network inference was performed using SCENIC [21,22]. For this, raw RNA counts from cells in beta cell clusters were exported in loom format using SCopeLoomR v0.13.0. A list of mouse transcription factors was downloaded from the cisTarget database (https://resources.aertslab.org/cistarget/). Co-expression modules were inferred using GRNBoost2 in a Docker Hub image of pySCENIC v0.12.0 [22]. This step was performed 100 times (with seed 1-100) and transcription factor/gene pairs identified in fewer than 80 iterations were discarded. The "importance" values of surviving pairs were averaged. The resulting high-confidence gene network was processed through pySCENIC's ctx and aucell functions using mouse transcription factor motif and target gene promoter annotations downloaded from the cisTarget database (motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl, mm10_500bp_up_100bp_down_full_tx_clustered.genes_vs_motifs.rankings.feather, mm10_10kbp_up_10kbp_down_full_tx_clustered.genes_vs_motifs.rankings.feather) with default settings. The resulting loom file was loaded back into R. To plot the relative activity of each regulon in individual beta cells, regulon AUCs were added to the Seurat object using CreateAssayObject() and scaled using ScaleData() commands. To compare the overall activity of each regulon between beta cell clusters, regulon specificity scores were generated using the calcRSS() command after down-sampling cells to equalize cell numbers in each cluster being compared. randomizing the genomic location of peaks 100 times. For example, the left-most column shows that roughly 3% of H3K27ac peaks that increase in intensity are closest to genes that are upregulated; if H3K27ac peaks are randomly distributed in the genome, roughly 1 to 2% of H3K27ac peaks that increase in intensity are closest to genes that are upregulated. P values were calculated using permutation tests. **p<0.01