Background

Gene regulation is essential to the development and maintenance of life. Gene regulatory networks describe the interplay between regulatory regions, such as promoters and enhancers, and expression of their target genes [1]. Deciphering how specific regulatory regions control gene transcription can provide insights into biological processes such as cell type differentiation [2, 3], responses to addictive substances [4], and other cell functions.

The advent of new sequencing techniques has led to a greater understanding of how genes are differentially expressed. RNA-seq has provided a broader and more detailed picture of complex transcriptional states and responses [5, 6]. While genome-wide RNA-seq experiments can yield information on the many genes that are differentially transcribed in different conditions, these rich datasets reveal little about the regulatory mechanisms involved in directing these expression changes.

Epigenomic assays such as chromatin immunoprecipitation (ChIP-seq), DNAse-seq, and assay for transposase-accessible chromatin by sequencing (ATAC-seq) can interrogate chromatin accessibility and identify transcription factor binding sites [7,8,9]. The relationship between chromatin accessibility and transcription is complicated. Previous studies show little overlap between corresponding differences in chromatin and transcription [10,11,12], which highlights the complex interactions between the chromatin state and downstream gene expression. Furthermore, few studies have analyzed if changes in open chromatin induced by an intervention occur in transcriptional enhancers that can be coupled to heterologous minimal promoters to engineer intervention-inducible reporter constructs.

Here, we sought to characterize the relationship between ATAC-seq and RNA-seq data in more detail, with particular focus on whether intervention-induced changes in chromatin accessibility can accurately predict gene expression. We used the insulin signaling pathway as a model because the insulin receptor activates multiple downstream signaling pathways [13,14,15], resulting in widespread changes to the chromatin state [16] and gene expression [17]. Our data from Drosophila S2 cells show that ATAC-seq and RNA-seq datasets are correlated, mainly driven by the ATAC-seq peaks/reads located in gene promoter regions. We also show that DNA regions with increased accessibility after insulin treatment can be harnessed to generate insulin-inducible reporter constructs.

Results

ATAC-seq and RNA-seq changes in insulin-exposed S2 cells

To investigate the concordance in changes in gene expression and chromatin accessibility, we exposed serum-starved Drosophila S2 cells to insulin or vehicle and harvested the cells 4 hours later for ATAC-seq and RNA-seq analysis. We determined genome-wide changes in open chromatin by ATAC-seq and identified 9726 high-confidence peaks (i.e., regions of accessible DNA mapped to the nuclear genome) in the insulin-exposed S2 cells and 9560 in the vehicle-exposed S2 cells. Merging the control and experimental peak sets resulted in 10,269 peaks. The largest variance in this dataset (6 samples; 2 treatments × 3 replicates) arose from insulin treatment, as shown by principal component analysis (PCA; Fig. 1A). In parallel, we identified 10,287 transcripts in vehicle- and insulin-exposed S2 cells using RNA-seq. PCA indicated that the largest variance between the 6 samples resulted from insulin treatment (Fig. 1B). Because ATAC-seq provides a view of chromatin accessibility along all features of genes, we evaluated the feature distribution of both the treatment and the control ATAC-seq data (Fig. 1C, D). We observed the same genome features in the control and treated data, but the relative proportion of features was significantly different (χ2 = 19.6, df = 10, p = 0.03). This difference largely resulted from a change in the proportion of peaks annotated to distal (1-2 kb from the TSS) and proximal (≤1 kb) promoters, which increased from 8 to 9% and 58 to 60%, respectively (Fig. 1C), though the proportion of each feature was not significantly different (Fig. 1D). These results suggest that insulin signaling recruits additional regulatory features by changing chromatin accessibility.

Fig. 1
figure 1

Insulin induces widespread alterations in chromatin accessibility and transcription. Serum-starved S2 cells were treated with vehicle or insulin for 4 h, and nucleic acids were isolated and analyzed. A Principal component analysis of chromatin accessibility determined by ATAC-seq. B Principal component analysis of transcript expression by RNA-seq. C Proportions of each genomic feature type in all annotated chromatin peaks analyzed by ATAC-seq in S2 cells after treatment with insulin or vehicle. D Proportion of each annotated feature type in each biological replicate. The data represent the mean ± SD. None of the proportions were significantly different

ATAC-seq and RNA-seq reads show weak correlation driven by ATAC-seq peaks in proximal promoters

We next asked whether RNA transcript levels were correlated with ATAC-seq reads and whether the feature annotation of those ATAC-seq peaks, i.e. where in a gene they were located, mattered for any levels of correlation. 8621 out of 10,269 ATAC-seq peaks were mapped to a gene, and we plotted these ATAC-seq peak reads against the RNA-seq reads for each peak (thus duplicating many RNA-seq data points, since each gene has a median number of 2 (Quartile 1-3: 2-4) ATAC-seq peaks mapped to it). Overall, RNA-seq and ATAC-seq peak reads showed only a weak correlation (Pearson correlation coefficient R = 0.1; Fig. 2A), which was still highly significant (p = 2.2e-16) due to the number of data points. To determine if the counts for specific feature types were correlated, we stratified this analysis by the 11 ATAC-seq peak gene features. Only the peak reads in the ≤1 kb promoter class correlated with RNA-seq reads (R = 0.2, p = 2.2e-16; Fig. 2B), though this correlation was also weak. This would suggest that more highly transcribed genes require a greater extent of DNA accessibility in their promoters, which might be expected for efficient transcriptional initiation. Further, these results indicate that while information can be obtained from raw count data, these data cannot be used to predict the functional relevance of a specific genome feature type.

Fig. 2
figure 2

Chromatin peaks annotated to proximal promoters drive the correlation between normalized ATAC-seq and RNA counts. A Transcripts identified by RNA-seq were overlapped with chromatin peaks annotated to the same genes. The normalized ATAC- and RNA-seq counts were log scaled and analyzed using Pearson correlation analysis. B Overlapping ATAC- and RNA-seq counts from (A) were stratified by genomic feature. Pearson correlation analysis was used to identify feature-specific correlations between ATAC- and RNA-seq counts. Here, and in following figures, ns = not significant, *p < 0.05; **p < 0.01; ***p < 0.001

Differential gene expression and DNA accessibility correlate for multiple ATAC-seq peak feature annotations

Next, we determined the insulin-induced changes in DNA accessibility and RNA expression. In the ATAC-seq peak set, 773 peaks were significantly differentially accessible (false discovery rate, FDR < 0.1) between the insulin-exposed and control samples. 364 peaks were more accessible upon insulin exposure, while 409 peaks were less accessible after exposure to insulin (Fig. 3A). The feature distribution of those differential peaks was very similar to the feature distribution in the whole ATAC-seq peak dataset (χ2 = 6.13, df = 10, p = 0.80; Fig. 3B), though we did not detect distal downstream elements (1-2 and 2-3 kb downstream) in the differentially accessible peaks. We also examined the significant gene expression changes from the RNA-seq dataset. In this dataset, 3616 genes were differentially expressed (FDR < 0.05) between the insulin-exposed and control samples. 2056 genes were upregulated after insulin exposure, while 1560 were downregulated (Fig. 3C).

Fig. 3
figure 3

Insulin induces differential chromatin accessibility and gene expression in S2 cells. A Volcano plot of differential chromatin accessibility in significantly different chromatin peaks, stratified by feature type. The dotted line represents FDR = 0.05. B) Proportions of genomic features annotated to chromatin peaks with differential accessibility after treatment with insulin or vehicle. C Volcano plot of differential gene expression in S2 cells after treatment with vehicle or insulin. Red indicates upregulated genes and blue indicates downregulated genes. The dotted line represents FDR = 0.05

Then, we investigated the correlation between all ATAC-seq and RNA-seq log2 fold changes after insulin treatment. The overall correlation between the two datasets was significant, but very weak (R = 0.05, p = 1.8e-06; Fig. 4A). Performing the same analysis after stratifying by feature type showed weak but significant correlations between the differential RNA-seq transcripts and the differential ATAC-seq features in addition to ≤1 kb promoters (R = 0.096, p = 5.7e-13), including weak correlations with ATAC-seq peaks in enhancers (2-3 kb from the TSS: R = 0.15, p = 5.4e-4 and 1-2 kb from the TSS: R = 0.13, p = 9.0e-05). Furthermore, there were significant anticorrelations in ATAC-seq peaks for downstream elements (1-2 kb: R = − 0.67, p = 0.035) and distal intergenic regions (R = − 0.14, p = 0.0011; Fig. 4B). When we restricted the analysis to only the significant changes in ATAC-seq (FDR < 0.1) and RNA-seq (FDR < 0.05) peaks, the overall correlation for all features increased 8-fold (R = 0.42; Fig. 5A). Log2 fold changes with the same sign in ATAC- and RNA-seq peaks (i.e. both increased or both decreased) were significantly more numerous than half of all the restricted data points (p = 5.38e-10), indicating that increased chromatin accessibility is associated with increased gene expression and vice versa. The overall feature type correlations also changed when restricted to only significant changes (Fig. 5B), particularly the correlation with enhancers 1-2 kb from the TSS (R = 0.65), which increased by approximately 5-fold. Conversely, the anticorrelations with peaks in downstream and distal intergenic regions disappeared. Together, these results suggest that predicting functionally-relevant genome regions may be more accurate when the regions have significantly different accessibility. These results also suggest that DNA accessibility in enhancers is involved in mediating changes in transcription.

Fig. 4
figure 4

Insulin-induced log2 fold changes correlate between ATAC-seq and RNA-seq. A Chromatin peaks were overlapped with expressed transcripts. Pearson correlation analysis shows a weak correlation between log2 fold changes in chromatin accessibility and transcript expression. B Overlapping ATAC- and RNA-seq log2 fold change values from (A) were stratified by genomic feature. Pearson correlation analysis was used to identify correlations between ATAC- and RNA-seq counts by feature

Fig. 5
figure 5

Significant insulin-induced changes in ATAC-seq indicates recruitment of enhancers for transcript regulation. A Significant log2 fold change values from differentially-accessible chromatin peaks and differentially-expressed transcripts were analyzed by Pearson correlation analysis. B Overlapping chromatin peaks and differentially-expressed genes from (A) were stratified by feature type and reveal that distal (1-2 kb) promoter accessibility is correlated with insulin-induced transcript changes

Functional testing of significant differentially accessible ATAC-seq peaks

We next wanted to test whether any of the DNA regions from significantly more accessible ATAC-seq peaks could drive insulin-induced expression. We cloned a number of ATAC-seq peaks in front of a luciferase gene with a minimal promoter and transfected S2 cells with these vectors for 48 h. The cells were serum-starved for 18 h and then treated with 10 μM insulin or vehicle for 4 h. We were particularly interested in identifying putative enhancer elements, which are often distant from proximal promoter sequences [18] and are cell-type specific [19], so we focused on regions farther than 1 kb from the TSS. Accordingly, we selected three groups of four ATAC-seq peaks each: first, we chose the four peaks with the largest log2 fold change, indicating increased accessibility after insulin treatment (Additional file 1). Of the four tested plasmids, one showed significantly increased luciferase activity after insulin treatment: 3 L114 (annotated to CG6149; 1.2-fold increase; p = 0.016; Fig. 6A). Because ATAC-seq peaks in enhancer regions were the most strongly correlated with differential gene expression in our above analysis (Fig. 5B), we next chose four peaks with the highest log2 fold change from enhancer regions that were significantly more accessible after insulin. Of the tested peaks, 2 produced significantly increased luciferase activity after insulin treatment: 2 L225 (annotated to snoRNA:psi285-2996; 1.1-fold increase; p = 0.0033) and 2R111 (annotated to schnurri; 1.4-fold increase; p = 0.025; Fig. 6B). Lastly, because introns often contain regulatory regions that contain instructive DNA for expression [20], we chose the four intron regions with the largest log2 fold changes for luciferase assays. One ATAC-seq peak resulted in significantly increased luciferase activity: X216 (annotated to lncRNA flamenco; 1.3-fold increase; p = 0.05) (Fig. 6C). These data show that DNA regions with increased accessibility upon insulin treatment can indeed drive insulin-induced increases in expression when placed in front of a heterologous promoter.

Fig. 6
figure 6

Cloned DNA from differentially-accessible chromatin regions can induce luciferase activity upon insulin application. DNA was cloned in front of a minimal promoter and luciferase gene, S2 cells were transfected for 18 h and then treated with insulin or vehicle for 4 h. A Candidate ATAC-seq peaks were selected by the largest log2 fold change. B Chromatin peaks from promoters (1-2 kb from the TSS) were the feature that was most highly correlated with differential transcript expression. Peaks with the highest log2 fold change from this correlation were cloned upstream of luciferase for functional validation. C Chromatin peaks with significantly different accessibility were selected from introns, a genomic feature known to contain regulatory regions. Data represent means ± standard error of three biological replicates. Differences were analyzed by Student’s t-test

Limited predictability of the levels of expression and inducibility

Out of the twelve ATAC-seq peaks we cloned and tested, all led to variable levels of luciferase expression, while only four caused significant insulin-inducibility. To determine whether the luciferase expression levels and inducibility by insulin was predictable from our ATAC-seq and RNA-seq datasets, we analyzed the correlation between the –omics data and luciferase activity. First, we asked if expression levels of luciferase were correlated with ATAC-seq peak reads, but found no correlation (Fig. 7A), even when we stratified the data according to enhancer- (Fig. 7B) or intron-derived ATAC-seq peaks (Fig. 7C). Similarly, RNA-seq counts did not correlate with S2 luciferase luminescence (Fig. 7D-F). Next, we asked whether the log2 fold changes in the –omics data sets correlated with the relative inducibility of luciferase by insulin (measured as insulin/vehicle ratios). Again, we failed to observe significant correlations of S2 inducibility with log2 fold changes in ATAC-seq (Fig. 8A) and RNA-seq (Fig. 8B) reads, even when we analyzed only the cloned peaks that led to significant insulin-induced changes (Fig. 8C, D).

Fig. 7
figure 7

ATAC-seq and RNA-seq counts are not correlated with functional luciferase activity. Log-transformed ATAC-seq counts (A-C) and RNA-seq counts (D-F) from S2 cells were correlated to insulin-induced luciferase activity. A Overall correlation between counts from the tested ATAC-seq peaks and luciferase activity; B) Promoters; C) Introns. D Overall correlations between counts from genes annotated to the tested ATAC-seq peaks and luciferase activity; E) Promoters; F) Introns. Associations were analyzed by Pearson correlation analysis. Each point represents a biological replicate, and each peak was tested in triplicate

Fig. 8
figure 8

ATAC-seq and RNA-seq log2 fold changes do not predict insulin inducibility. A ATAC-seq and B) RNA-seq log2 fold changes from S2 cells were correlated to insulin-induced luciferase activity, shown as the ratio between luciferase activity in vehicle- vs. insulin treated cells. C Correlation between the ATAC-seq peaks driving significantly increased luciferase activity and associated ATAC-seq log2 fold changes. D Correlation between the ATAC-seq peaks driving significantly increased luciferase activity and RNA-seq log2 fold changes in the associated genes. Associations were analyzed by Pearson correlation analysis. Each point represents a biological replicate, and each peak was tested in triplicate

Discussion

Next-generation sequencing has enabled an unprecedented amount of genome-wide information on RNA transcript levels and DNA accessibility. ATAC-seq data provides accessibility information from distinct features/regions of a gene, thereby suggesting gene regions that act as functional enhancers (or suppressors) of gene expression. Here, we investigated the correlation between genome-wide changes in DNA accessibility and transcript levels and found weak, but significant correlations that were mostly driven by proximal promoter and distal enhancer regions. Cloning some of these DNA regions with increased accessibility upon insulin stimulation showed that some of them indeed act as transcriptional enhancers, demonstrating that genome-wide ATAC-seq can be harnessed to clone functionally-active insulin-response elements.

To investigate the functional relevance of differential DNA accessibility, we first determined genome-wide ATAC-seq reads in Drosophila S2 cells from serum-starved and insulin-exposed conditions (Fig. 1). The insulin receptor activates several downstream pathways, including the PI3K [21] and Ras/ERK [22] pathways, which have various effects on the chromatin state [23, 24] and gene expression [25] during several cellular processes including cell growth, protein synthesis, and gluconeogenesis [26]. Thus, activating insulin signaling provided a way to identify genome-wide chromatin and gene expression changes, which allowed us to integrate these physiological changes and determine whether chromatin regions that become more open after insulin signaling could predict gene regulation. We found significant (though weak) overall correlations between ATAC-seq reads and transcript levels, which were driven by ATAC-seq peaks in proximal promoters (Fig. 2). In ATAC-seq, genome regions with increased accessibility result in a higher mapped read count [9]. Because promoter regions are critical for the initiation of transcription, these genomic regions are generally accessible for actively-transcribed genes [27]. Thus, proximal promoter regions largely drive the overall weak correlation between ATAC-seq and RNA-seq counts that we observed. These data indicate that normalized counts can identify correlations between chromatin and gene expression, but these correlations are likely limited to promoter regions for actively transcribed genes.

When we analyzed correlations between all insulin-induced log2 fold changes in ATAC-seq peak and transcript reads (Fig. 3), changes in open chromatin in distal (1-2 and 2-3 kb away from the TSS) promoter regions also correlated significantly (though weakly) with changes in transcript levels (Fig. 4). This suggests that the application of insulin recruits additional enhancers that participate in promoting transcription. Conversely, other enhancer regions become less accessible, and the linked genes are less transcribed with insulin. The weak correlations between distal enhancer and proximal promoter with transcript levels increased when we analyzed only ATAC-seq peaks that changed significantly with insulin (Fig. 5). These results suggest that perturbations that cause differential gene expression occur via recruitment of additional regulatory promoter features. The correlations between differential transcript levels and differentially accessible promoter regions were all positive, suggesting that these regions play a role in the downstream differential gene expression. However, these data do not exclude the possibility that in some genes, insulin might lead to increased accessibility at promoters which are then bound by transcriptional repressors, leading to decreased transcription. Indeed, numerous ATAC-seq peak/transcript data points are in quadrants of anticorrelation (Fig. 5), and the insulin-induced transcription factor FOXO is known to have transcriptional repressor activity [28, 29]. Future experiments focusing on such anticorrelated data pairs/genes might reveal DNA regions that lead to insulin-induced transcriptional repression.

Our main goal was to determine whether we could harness our ATAC-seq data to generate insulin-inducible reporter plasmids. We selected ATAC-seq peaks based on our correlation analysis of differentially-accessible chromatin regions and differential transcript expression. We particularly focused on enhancers (1-2 kb from the TSS) because the correlation increase was the largest for this feature. Distal regions may include regulatory regions such as enhancers or repressors that are critically involved in regulating gene expression [30]. Our results suggested that these regions can drive differential gene expression (Fig. 6). We also selected peaks with relatively large log2 fold changes in intron peaks. In Drosophila, intronic regions often contain regulatory sequences [20], so altering chromatin accessibility in genome regions associated with introns is one mechanism to control gene expression [31]. Finally, we selected peaks with the largest log2 fold changes, irrespective of feature type. In each of these three categories we found peaks that led to significant insulin-induced increases in reporter gene expression. However, none of the three categories seemed obviously more promising for predicting insulin-inducibility. Furthermore, neither read counts nor log2 fold changes in ATAC-seq or RNA-seq were predictive of insulin-inducibility (Figs. 7, 8). This suggests that while ATAC-seq data can be successfully harnessed to generate insulin-inducible reporter constructs, their efficacy is not obviously predictable and will require larger datasets to understand which ATAC-seq peaks can be utilized to generate functionally relevant transgenes. Indeed, previous studies investigating putative enhancer elements identified candidates based on overlap with known histone marks (H3K4me1, H3K27ac, etc. identified by ChIP-seq), known enhancers associated with annotated genes of interest [32,33,34], or used massively parallel reporter assays [35]. In contrast, our goal was to determine whether ATAC-seq alone could predict downstream transcription using only feature-based or fold change-based selection. Importantly, these previous studies showed similar success rates to ours. Peaks with increased chromatin accessibility after insulin treatment that did not result in insulin-induced luciferase activity may represent regulatory elements that are involved in setting up poised transcription or may contain repressor regions that pause transcription. In contrast, peaks causing increased luciferase activity may represent sequences that are sufficient to initiate transcription or activate promoter clearance [36,37,38]. Additional studies using ChIP-seq to identify the histone marks at our tested peak sequences will be required to determine whether they are enhancers involved in poised versus active transcription.

Conclusions

Our investigation shows that ATAC-seq data can be harnessed to isolate regulatory DNA regions that are both expressed and inducible. However, because chromatin peaks may be one of several regulatory sequences [20, 30], these chromatin regions cannot be easily predicted by analysis of these genome-wide –omics data alone and must be functionally validated. Still, our data show the feasibility of using ATAC-seq to generate active transgenes that are inducible by an intervention or by a diseased state to drive a reporter, or even a disease-antidote gene.

Methods

Cell culture

Drosophila S2 cells (Drosophila Genomics Resource Center, Bloomington, IN, USA) were cultured in Schneider’s Drosophila Medium (ThermoFisher, Waltham, MA, USA) supplemented with 10% fetal bovine serum (ThermoFisher) at 25 °C. Cells were cultured in Schneider’s medium without FBS for 24 h before experiments. Then, cells were incubated with 10 μM insulin (Sigma Aldrich, St. Louis, MO, USA) or vehicle (25 mM HEPES, pH 8.2) for 4 h at 25 °C.

ATAC-seq

S2 cells were incubated with 3 μM DAPI for 10 min. 60,000 cells per sample were sorted using a BD FACS Aria flow cytometer (BD Biosciences, San Jose, CA, USA). DAPI-negative cells were collected into ice-cold PBS (pH 7.4). After sorting, the samples were washed once with ice-cold PBS and centrifuged at 500 g for 5 min at 4 °C. ATAC-seq libraries were prepared as previously described [39]. Briefly, 50 μL lysis buffer (10 mM Tris-HCl 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP40) was added to each sample, and the sample was centrifuged at 500 g for 10 min at 4 °C. The supernatant was removed, and the nuclei pellet was tagmented using a Nextera DNA Library Prep kit (Illumina, Inc., San Diego, CA, USA) as previously described. Then, the tagmented DNA was purified using a Qiagen MinElute PCR Purification Kit (Qiagen, Germantown, MD, USA). The purified DNA was PCR amplified for 5 cycles using a Nextera DNA Library Index kit (Illumina) and Phusion HF Master Mix (New England BioLabs, Inc., Ipswich, MA, USA) with the following protocol: 72 °C for 5 min, 98 °C for 30 sec, and 5 cycles of 98 °C for 10 sec, 63 °C for 30 sec, and 72 °C for 1 min. A 5-μL aliquot of the pre-amplified reaction was analyzed by qPCR using SsoFast EvaGreen Supermix (Bio-Rad Life Science, Inc., Hercules, CA, USA) and an Applied Biosystems 7900HT qPCR instrument (ThermoFisher) using the following protocol: 1 cycle of 98 °C for 30 sec and 40 cycles of 98 °C for 10 sec, 63 °C for 30 sec, and 72 °C for 1 min. Then, the pre-amplified PCR mixture was amplified for another 10 cycles (corresponding to 1/3 maximum fluorescence from the qPCR assay) using the same thermocycling parameters. After amplification, the libraries were purified using AMPure XP beads (Beckman Coulter Life Sciences, Indianapolis, IN, USA). Libraries were sequenced on an Illumina HiSeq 2500 instrument using 50-bp single-end reads.

RNA-seq

Total RNA was isolated from S2 cells using a PureLink RNA purification kit (ThermoFisher). Then, rRNA was removed from each sample with a Ribo-Zero rRNA Removal kit (Illumina). RNA libraries were constructed using a NEBNext Ultra II RNA Library Kit for Illumina and NEBNext Multiplex Oligos for Illumina, Primer Set 1 (New England Biolabs). Libraries were sequenced on an Illumina HiSeq 2500 instrument using 50-bp single-end reads.

ATAC-seq data analysis

ATAC-seq Fastq files were aligned to the dm6 genome assembly (http://ftp.flybase.net/releases/FB2018_06/dmel_r6.25/fasta/) using Novocraft Novoalign with the following settings: --NonC -o SAM -r Random. SAM files were processed to BAM format, sorted, and indexed using Samtools [40]. BAM files were reads per million-normalized and converted to bigWig files using the Bio-ToolBox ‘bam2wig.pl’ program (https://github.com/tjparnell/biotoolbox/blob/master/scripts/bam2wig.pl). Peak calling was performed on the bigWig files by utilizing the Multi-Replicate Macs ChIPseq pipeline (https://github.com/HuntsmanCancerInstitute/MultiRepMacsChIPseq) with the following settings: --dupfrac 0.2 --size 200 --cutoff 2 --peaksize 300 --peakgap 200. Fractions of reads in peaks values were calculated in R using the Rsubread package (version 2.8.2). Transcription start site enrichment was calculated using Deeptools (version 3.3.2). Called peaks were annotated in R with the ChIPseeker package [41]. Count data for called peaks was collected from processed BAM files using the Bio-ToolBox ‘get_datasets.pl’ program (https://metacpan.org/pod/distribution/Bio-ToolBox/scripts/get_datasets.pl). The count data was then used to identify differentially accessible regions with the R package DEseq2 [42]. ATAC-seq quality metrics are shown in Additional file 1.

RNA-seq data analysis

RNA-seq fastq files were aligned to the BDGP6 genome assembly using the STAR aligner [43] with the following settings: --twopassMode Basic --outSAMtype BAM SortedByCoordinate --outWigType bedGraph --outWigStrand Unstranded --clip3pAdapterseq AGATCGGAAGAGCACACGTCTGAACTCCAGTCA. The resulting sorted BAM files were indexed using Samtools (Li et al., 2009). FeatureCounts was used to collect count data for BDGP6 genes using the following command: -T 16 -s 2 [44]. Count data for all replicates and experimental conditions were combined into a single count matrix in R. The count matrix was subsequently used to identify differentially expressed genes with the R package DEseq2 [42].

Integration analysis of ATAC-Seq and RNA-Seq datasets

The ATAC-seq peak data were compared to the RNA-seq data to determine how chromatin accessibility influenced gene expression. The raw RNA-seq and ATAC-seq counts for each sample were compared using the gene annotation of the ATAC-seq peak and the assigned RNA-seq gene. The raw count value was averaged by experimental condition and genomic assay type. Then, the RNA-seq and ATAC-seq datasets were compared using the annotated genes and the log2 fold change values for each peak/gene in the respective genomic assay. ATAC-seq peaks with an FDR < 0.1 and genes detected by RNA-seq with an FDR < 0.05 were used to compare the differentially accessible peaks and differentially expressed genes. Pearson correlation analysis was performed between the log2 fold change values of the genomic assays and between the average raw count values of the genomic assays (controlling for the experimental condition).

Plasmid construction and transformation

A multiple cloning site (MCS) was cloned into the backbone pDEST VanGlow-GL vector [45]. Then, we removed the mini-white+ cassette using the restriction enzymes AflII (3 L137, 3R131, X216, 3 L114, 2 L796, 3 L981, 2 L220, 2R177, 2 L225, and FOXO TFBS) or SmaI and PmlI (X950, 2 L846, and 2R111). The digested plasmids were incubated with T4 ligase for 15 min at room temperature and purified by 1% gel electrophoresis. The plasmids were linearized using AvrII and PacI (sites contained in MCS; all enzymes from New England BioLabs).

Genomic DNA was purified from S2 cells using a Monarch Genomic DNA Purification kit. Candidate chromatin peak sequences and 100-200 bp flanking sequences [34] (Additional file 2) were amplified using Phusion High-Fidelity PCR MasterMix (primer sequences are listed in Additional file 3) and a C1000 thermocycler (Bio-Rad Life Science). The peak sequences were amplified for 98 °C for 5 min, followed by 35 cycles of 98 °C for 30 sec, 52-68 °C gradient for 30 sec, 72 °C for 4 min, and a final incubation for 5 min at 72 °C. The amplified fragments were purified on 1% agarose gels and extracted using a Monarch Gel Purification kit and cloned into linearized VanGlo-GL-MCS plasmid using NEBuilder HiFi Assembly master mix. The assembled plasmids were transformed into DH5α cells and grown overnight. Clones were screened by restriction digestion using EcoRI-HF. Sequences were confirmed by Sanger sequencing at GeneWiz (South Plainfield, NJ, USA). Confirmed plasmids were transformed into S2 cells using TransIT Insect Transfection Reagent (Mirus Bio, Madison, WI, USA). Transformed cells were grown for 48 h before use in experiments.

Luciferase assays

Transformed S2 cells were serum starved for 24 h and treated with insulin or vehicle as described above (Cell culture). Then, luciferase activity was assayed using a Luciferase Reporter Substrate Assay Kit-Firefly (Abcam, Cambridge, MA, USA). Luminescence was detected with a BioTek Synergy HTX microplate reader (BioTek Instruments, Winooski, VT, USA) and Gen5 2.01.17 software (BioTek Instruments).

Statistical analysis

Statistical differences in relative luminescence data were analyzed by Student’s t-tests with at least three biological replicates using GraphPad Prism 9.0 software. Differences between genome feature proportions were analyzed using χ2 tests included in R [46] Correlations were analyzed using Pearson correlation tests included in R. One-proportion z-tests were performed using the binom.test function in R. Heatmaps were generated using the R package ComplexHeatmap [47]. PCA plots were created using the R package pcaExplorer [48]. Correlation plots were produced with the R package ggpubr (https://github.com/kassambara/ggpubr).