Background & Summary

An estimated 15,780 children in the United States will be diagnosed with cancer in 20191. While 80% of pediatric cancer patients overcome this disease, 20% of children do not survive, and survivors often have multiple side effects of therapy1. Neuroblastoma accounts for more than 7% of malignancies in patients under 15 years of age and approximately 12% of all pediatric cancer-related deaths (for review see2). Neuroblastoma shows wide phenotypic variability, with tumors arising in children diagnosed under the age of 18 months often spontaneously regressing with little or no treatment, but patients diagnosed at an older age or with unfavorable genomic features often showing a relentlessly progressive and widely metastatic disease pattern despite intensive, multimodal therapy (for review see2,3,4). Ninety-eight percent of low-risk neuroblastoma disease are currently cured5, however, the survival rate for patients with high-risk neuroblastoma remains less than 50%6. Relapsed high-risk neuroblastoma is typically incurable7, and thus these children require improved therapeutic options.

A major prognostic factor predicting the severity, risk, and inferior outcome for neuroblastoma patients is amplification of the proto-oncogene MYCN. MYCN amplification occurs in nearly 20% of all neuroblastomas, and approximately 50% of patients with high-risk disease8,9. It is a truncal genomic event, and typically stable across the spectrum of therapy and disease recurrence. MYCN, along with structural and binding homologues MYC and MYCL, are members of the MYC transcription factor family10 and have been implicated in transcriptional regulation of proteins involved in cell growth11, proliferation12, and ribosome biogenesis12. Mounting evidence has also indicated that MYCN and MYC are functionally redundant13,14,15. However, the protein expression of MYC and MYCN appears to be mutually exclusive. For example, neuroblastoma tumors with MYCN amplification typically lack or have low MYC mRNA expression9. The strong influence of MYCN on the progression and metastasis of neuroblastoma makes it a key target for therapy, but due to its global transcriptional activity, it is necessary to develop a better understanding of which of its gene targets directly influence oncogenesis.

To better understand the regulatory effects of MYC family proteins in neuroblastoma, we performed ChIP-Seq data for MYCN in six neuroblastoma cell lines with MYCN amplification, MYC in four neuroblastoma cell lines without MYCN amplification, and H3K27Ac, H3K27me3, H3K4me1, and H3K4me3 histone modifications along with ATAC-Seq in all ten neuroblastoma cell lines (with ATAC data in four additional lines also reported here). All of the cell lines here also have RNA sequencing data freely available16.

Methods

Online Table 1 summarizes which assays were performed for each cell line, and an overview of the workflow is shown in Fig. 1.

Fig. 1
figure 1

Experimental workflow. (a) Cells were thawed, grown, and expanded until 70–80% confluency in a 150 mm dish. (b) For ChIP-Seq, cells were fixed, collected, and frozen (N = 1 biological replicate per cell line). Libraries were prepared, sequenced, and data analyzed. (c) For ATAC-Seq (n = 14 samples with n = 2 biological replicates), cells were incubated in a transposition reaction, DNA was purified, and amplified with limited PCR. Libraries were prepared, sequenced, and analyzed. Diagram was created using Servier Medical ART (https://smart.servier.com/).

Cell growth and expansion

The cell lines used to collect this data were obtained from multiple sources: the Children’s Oncology Group (COG) Cell Culture and Xenograft Repository at Texas Tech University Health Sciences Center (www.cccells.org), the American Type Culture Collection (Manassas, MA), or the Children’s Hospital of Philadelphia (CHOP) cell line bank. All of the cell growth and preparations were done at CHOP. The neuroblastoma cell lines were cultured using media (Table 1) and methods as previously described16. Briefly, cells were thawed by floating in a 37 °C water bath for 2–3 minutes. Cells were then added to a 15 mL conical tube, containing 5 mL of the appropriate growth media, and centrifuged at 300 × g for 5 minutes at room temperature (RT). Media was then carefully aspirated off, and the pelleted cells were resuspended in 1 mL of media before being transferred to a 75 mm2 flask containing 10 mL of growth media. Cells were incubated at 37 °C with a 5.0% CO2 concentration. When cells reach 70–80% confluency, media was aspirated off and cells were gently washed with 1X PBS. Following aspiration of the PBS, 3 mL of the appropriate detachment solution (noted in Table 1) was added and the flask was incubated at 37 °C for 2–5 minutes. Cells were then gathered by tilting the plate at a 45° angle and washing with at least 4 mL of the appropriate growth media, and transferred to a 15 mL conical. After centrifugation for 5 minutes at 300 × g. Media was aspirated off, and the pellet was resuspended in 1 mL of growth media and transferred to a 150 mm cell culture dish containing 19 mL of growth media. Cells were incubated at 37 °C with a 5.0% CO2 concentration until reaching 70–80% confluency. Necessary materials and reagents are listed in Online Table 2.

Table 1 Neuroblastoma cell line information.

Immunoblotting

Whole cell lysates were prepared using a mixture of cell lysis buffer (Cell Signaling, #9803), PSMF (Cell Signaling, 8553S), Phosphatase Inhibitor Cocktail 2 (P5726, Sigma Aldrich), Phosphatase Inhibitor Cocktail 3 (P0044, Sigma Aldrich), and PBS (Gibco, 14190–136). Cells were resuspended in lysis buffer and kept on ice for 15 minutes. Cells were then spun at 14,000 × g at 4 °C for 15 minutes. The supernatant was collected and protein concentration was quantified using the Pierce BCA Protein Assay kit (Thermo Scientific, #23225). Next, 20 μg of protein was loaded using 4X Laemmli sample buffer (BioRad, #1610747) and separated on a 4–15% Criterion™ TGX™ Precast Midi Protein Gel (#5671085), and transferred to an Immobilon Membrane (Cat No. IPVH00010, 0.45 μm pore size). The membrane was blocked in 5% non-fat milk in Tris-buffered saline and Tween-20 (TBS-T) at room temperature for one hour. Incubation with primary antibody was overnight, rocking at 4 °C. Membranes were then washed three times for 10 mins in TBS-T, and then incubated with HRP-labeled Rabbit secondary antibody at room temperature for one hour (1:2000–1:5000; Millipore, AP132P). The membranes were then developed using chemiluminescence (SuperSignal West Femto, Thermo Fischer Scientific). The primary antibodies used were: N-MYC (1:1000; Cell Signaling, #9405S), MYC (1:800; Cell Signaling #5605), and β-Actin (1:5000; Cell Signaling, #4967S).

ChIP-Seq protocol

The ChIP-Seq Protocol is separated into four sections: Cell Fixation, Chromatin Immunoprecipitation (ChIP) and Library Preparation, Library Sequencing, and ChIP-Seq Analysis. Of note, the MYCN ChIP-Seq for Kelly and NGP cell lines was performed using a varied procedure and is noted in a separate section within this protocol. Necessary materials and reagents are listed in Online Table 2.

Cell fixation

Cells were grown as described in Cell Growth and Expansion section of protocol to 70–80% confluence in 150 mm tissue culture plates in 20 mL of media. The Formaldehyde solution (Online Table 2) was freshly prepared. Cells were removed from incubation and 1/10th of the growth media volume of the Formaldehyde Solution was added to the existing media in the plate (i.e. if the current volume of the plate is 20 mL of media, 2 mL of Formaldehyde Solution would be added). The solution was gently swirled, and then rocked at RT for 15 minutes. To stop the fixation, 1/20th the current volume of the Glycine Solution (Online Table 2) was added to the plate (i.e. if the current volume in the plate is 22 mL then 1.1 mL of Glycine Solution should be added). The plate was gently swirled to mix, and then allowed to sit at RT for 5 minutes. Following this incubation, a cell scraper was used to collect the cells, and then all cells and solution were transferred to a 50 mL conical on ice. From this point forward, all samples were kept on ice. The 50 mL conical was centrifuged at 800 × g at 4 °C for 10 minutes to pellet the cells. Supernatant was removed and discarded, and the cells were resuspended with 10 mL of chilled, sterile PBS. Centrifugation of the tube at 800 × g at 4 °C for 10 minutes was repeated. The supernatant was removed and discarded, and the cells were resuspended with 10 mL of chilled, sterile PBS with 100 uL of PMSF. The tube was centrifuged at 800 × g at 4 °C for 10 minutes, the supernatant was removed, and then the cells were snap frozen on dry ice and stored at −80 °C. The cells were then shipped to Active Motif on dry ice following the instructions listed at on the Sample Submission Form, downloaded from www.activemotif.com/sample-submission.

ChIP and library preparation by active motif

Chromatin immunoprecipitation was completed by Active Motif. Full methods are proprietary. Chromatin was isolated using a lysis buffer and membranes were disrupted with a dounce homogenizer. The lysates were then sonicated with Active Motif’s EpiShear probe sonicator (#53051) and cooled sonication platform (#53080) to an average fragment length 300–500 bp. A portion of the sample was collected as the Input DNA, treated with RNase, proteinase K, and incubated to reverse crosslinking. The DNA was then collected by ethanol precipitation. The Input DNA was resuspended and concentration was quantified by a NanoDrop spectrophotometer. Extrapolation of this concentration to the original chromatin volume allowed for quantitation of the total chromatin yield. Aliquots of the fixed chromatin were used in the immunoprecipitation were precleared with protein A agarose beads (Invitrogen, #15918014). Genomic DNA regions of interest were isolated using specific ChIP antibodies (Online Table 2). Antibody DNA complexes were isolated using additional protein A agarose beads, and the crosslinked DNA, antibody, and bead complexes were washed. The cross-linked DNA was eluted from the beads with SDS buffer, and subjected to RNase and proteinase K treatment. Reverse crosslinking was done in an overnight incubation at 65 °C, and ChIP DNA was purified with a phenol-chloroform extraction and ethanol precipitation.

Illumina sequencing libraries were prepared from the ChIP and Input DNAs using the standard consecutive enzymatic steps of end-polishing, dA-addition, and adaptor ligation using Active Motif’s custom liquid handling robotics pipeline. Samplers were amplified with a 15 cycle PCR amplification and then quantified before being shipped to the Jefferson Cancer Genomics Laboratory at the Kimmel Cancer Center for sequencing.

MYCN ChIP-Seq: Kelly and NGP cell lines

Chromatin immunoprecipitation was performed on adherent cells as described in Bosse et al.17. Of note, a different MYCN antibody was used than listed in Bosse et al., 2017 (Santa Cruz B8.4B, sc-53993). Cells were grown as described in Cell Growth and Expansion section of protocol to 70–80% confluence in 150 mm tissue culture plates in 20 mL of media. To the existing media, 415 mL of 37% formaldehyde (final concentration of 0.75%) was added, and rocked for 10 minutes at RT. To this, 1.5 mL of 2.5 M glycine (Online Table 2) (final concentration of 0.18 M) was added to inactivate the formaldehyde, and the plate was rocked for an additional 5 min. Cells were lysed with a volume of FA Lysis Buffer (Online Table 2) equivalent to 5 pellet volumes. Beads were washed 3 times in ChIP Wash Buffer (Online Table 2) and one time with Final Wash Buffer (Online Table 2). Libraries were constructed using NEB Ultra Kit following the manufacturer’s instructions. Libraries were sequenced as single-end, 50 bp reads on a MiSeq to a depth of ~50 M reads by the Children’s Hospital of Philadelphia Nucleic Acid and PCR Core.

ChIP library sequencing for ChIP

Sequencing was conducted by the Jefferson Cancer Genomics Laboratory at the Kimmel Cancer Center. Samples were quality control tested using an Agilent High Sensitivity Screen Tape to determine average fragment length. The concentration of each library was measured using a High Sensitivity Qubit Quantification kit, and samples were diluted to an appropriate amount for the loading protocol (4 nM or less). Samples were normalized to the same nanomolar concentration, and libraries were pooled together in equal amounts. Samples were diluted to 1.51 pM in Low EDTA TE Buffer. Samples were then sequenced as single-end, 75 bp reads to an average depth of ~30 M reads on a NextSeq. 500.

ATAC-Seq protocol

The following ATAC-Seq protocol was adapted from Buenrostro, et al.18. This protocol consists of four parts: Cell Preparation, Transposition Reaction and Purification, PCR Amplification, qPCR, and Library Preparation. Primer 1 and Primer 2 were custom synthesized by Integrated DNA Technologies (IDT), using sequences provided in Buenrostro, et al., 2015. Note: ATAC-Seq for NB-69 and NGP was performed using a slightly varied procedure and is noted in a separate section.

Cell preparation

Cells were grown as described in Cell Growth and Expansion section of protocol to 70–80% confluence in a 75 mm2 tissue culture flasks in 10 mL of media. Following detachment and pelleting, cells were resuspended in 1.0 mL of the appropriate growth media. Cells were triturated until they were in a homogenous single-cell suspension. Using an automated cell counter, the volume for 500,000 cells was determined and aliquoted into a sterile 1.5 mL Eppendorf tube containing 500 μL of sterile 1X PBS. Cells were centrifuged at 500 × g for 5 minutes at 4 °C. The supernatant was carefully aspirated, and the cells were resuspended in 500 mL of sterile 1X PBS. Centrifugation was repeated and cells were resuspended in 500 mL of cold lysis buffer by gently pipetting up and down, and then immediately centrifuged at 500 × g for 10 minutes at 4 °C. The supernatant was carefully removed and discarded. The pellet was immediately resuspend in 50 μL of nuclease free water by gently pipetting up and down, and the protocol immediately continued on to Transposition Reaction and Purification section.

Transposition reaction and purification

The pellet was placed on ice. The following reagents were prepared and combined: transposition reaction mix (25 μL TD (2X reaction buffer from Nextera Kit), 2.5 μL TDE1 (Nextera Tn5 Transposase from Nextera Kit), 17.5 μL nuclease-free water, and 5.0 μL of resuspended DNA/protein from the final step in Cell Preparation (resuspended pellet in 50 μL of nuclease free water). The transposition reaction was incubated in a thermocycler at 37 °C for 30–35 minutes. The reaction was immediately purified using Qiagen MinElute PCR Purification Kit, and the transposed DNA was eluted in 10.5 μL of elution buffer (Buffer EB from the MinElute Kit consisting of 10 mM Tris-Cl (pH 8)). The eppendorf tube containing purified DNA was parafilmed, and stored at −20 °C. NOTE: This can act as a good stopping point, however these DNA fragments are not PCR amplifiable if melted at this point.

PCR amplification

Primer sequences are shown in Table 2. To amplify the Transposed DNA, the following were combined into a 0.2 mL PCR tube: 10 μL transposed DNA, 10 μL nuclease-free H2O, 2.5 μL 25 mM PCR Primer 1 (Ad1), 2.5 μL 25 mM Barcoded PCR Primer 2 (Ad2.X, X being the unique number of samples), and 25 μL NEBNext High-Fidelity 2X PCR Master Mix. The thermal cycle was as follows:

$$\begin{array}{rl}1\,{\rm{c}}{\rm{y}}{\rm{c}}{\rm{l}}{\rm{e}}: & 5\,{\rm{m}}{\rm{i}}{\rm{n}}\,7{2}^{\circ }{\rm{C}}\\ & 30\,{\rm{s}}{\rm{e}}{\rm{c}}\,9{8}^{\circ }{\rm{C}}\\ 5\,{\rm{c}}{\rm{y}}{\rm{c}}{\rm{l}}{\rm{e}}{\rm{s}}: & 10\,{\rm{s}}{\rm{e}}{\rm{c}}\,9{8}^{\circ }{\rm{C}}\\ & 30\,{\rm{s}}{\rm{e}}{\rm{c}}\,6{3}^{\circ }{\rm{C}}\\ & 1\,{\rm{m}}{\rm{i}}{\rm{n}}\,7{2}^{\circ }{\rm{C}}\end{array}$$
Table 2 Primer sequences used in ATAC-Seq.

The five minute extension in the first cycle is critical to allow extension on both ends of the primer after transposition, thereby generating amplifiable fragments. This ensures that downstream quantitative PCR (qPCR) quantitation will not change the complexity of the original library.

qPCR

To reduce the GC and size bias in PCR, the appropriate number of PCR cycles (N) was determined using qPCR, allowing us to stop prior to saturation. The samples were kept in the thermocycler following the PCR Amplification reaction, and the qPCR side reaction was run. In a 0.2 mL PCR tube the following were added: 5 μL of DNA PCR amplified DNA, 2 μL of nuclease free H2O, 1 μL of 6.25 mM Custom Nextera PCR Primer (Ad1), 1 μL of 6.25 mM Custom Nextera PCR Primer 2 (Ad2.X), 1 μL 9X SYBR Green I, and 5 μL NEBNext High-Fidelity 2X PCR Master Mix. This sample was run in the qPCR instrument with the following cycles:

$$\begin{array}{rl}1\,{\rm{c}}{\rm{y}}{\rm{c}}{\rm{l}}{\rm{e}}: & 30\,{\rm{s}}{\rm{e}}{\rm{c}}\,9{8}^{\circ }{\rm{C}}\\ 20\,{\rm{c}}{\rm{y}}{\rm{c}}{\rm{l}}{\rm{e}}{\rm{s}}: & 10\,{\rm{s}}{\rm{e}}{\rm{c}}\,9{8}^{\circ }{\rm{C}}\\ & 30\,{\rm{s}}{\rm{e}}{\rm{c}}\,6{3}^{\circ }{\rm{C}}\\ & 1\,{\rm{m}}{\rm{i}}{\rm{n}}\,7{2}^{\circ }{\rm{C}}\end{array}$$

To calculate the additional number of cycles needed, a linear plot of Rn versus cycle was generated. This determined the cycle number (N) that corresponds to one-third of the maximum fluorescent intensity.

The remaining 45 mL PCR reaction was run to the cycle number (N) determined by qPCR. Cycles are as follows:

$$\begin{array}{rl}1\,{\rm{c}}{\rm{y}}{\rm{c}}{\rm{l}}{\rm{e}}: & 30\,{\rm{s}}{\rm{e}}{\rm{c}}\,9{8}^{\circ }{\rm{C}}\\ {\rm{N}}\,{\rm{c}}{\rm{y}}{\rm{c}}{\rm{l}}{\rm{e}}{\rm{s}}: & 10\,{\rm{s}}{\rm{e}}{\rm{c}}\,9{8}^{\circ }{\rm{C}}\\ & 1\,{\rm{m}}{\rm{i}}{\rm{n}}\,7{2}^{\circ }{\rm{C}}\\ & 30\,{\rm{s}}{\rm{e}}{\rm{c}}\,6{3}^{\circ }{\rm{C}}\end{array}$$

The amplified library was purified using Qiagen MinElute PCR Purification Kit after the additional PCR. The purified library was eluted in 20 μL of elution buffer (Buffer EB from the MinElute Kit consisting of 10 mM Tris-Cl (pH 8)). It is important to make sure that the column is dry prior to adding elution buffer to avoid ethanol contamination of final library. The amplified library was purified using AMPure XP beads at a 1.8x ratio to get rid of adapter dimers, using 80% ethanol for the wash steps. Sample was eluted in 50 μL of nuclease free H2O. The concentration of the DNA eluted from the column should be about 30 nM.

Library preparation

The quality of the purified libraries was assessed using a Bioanalyzer High-Sensitivity DNA Analysis kit (Agilent). If libraries contained predominant peaks around 1000 bp, SPRI beads were used to remove these fragments. This was accomplished by first, with a new vial of SPRI beads, performing size selection with various ratios to ensure larger peaks are removed. For example, ratios could include 0.4X, 0.45X, 0.5X. Choose the ratio that removes 1000 bp fragments, but leaves 800 bp fragments. Libraries were eluted in 20 μL of nuclease-free water, and sequenced as described below.

Sequencing for ATAC-Seq by Beijing Genomics International (BGI)

Sequencing was conducted by Beijing Genomics International at the Children’s Hospital of Philadelphia. Samples were quality control tested using an Agilent High Sensitivity Screen Tape to confirm average fragment sizes were ~180, 380, 580, 780, and 980 bp. The concentration of each library was measured using a High Sensitivity Qubit Quantification kit, to ensure they were 5.5 nM. Samples were normalized and libraries were pooled together in equal amounts. Samples were then sequenced as paired-ends, 100 bp to an average depth of 80 M reads on a HiSeq. 2500.

ATAC-Seq NB-69 and NGP cell lines via Active Motif

Cells were grown as described in Cell Growth and Expansion section of protocol to 70–80% confluence in a 75 mm2 tissue culture flasks in 10 mL of media. Following detachment and pelleting, cells were resuspended in 1.0 mL of the appropriate growth media. Cells were triturated into a homogenous single-cell suspension. Using an automated cell counter, the volume for 100,000 cells was determined, and aliquoted into a sterile 1.5 mL eppendorf tube containing 500 μL of sterile 1X PBS. Cells were then centrifuged at 500 × g for 5 minutes at 4 °C. The supernatant was carefully aspirated off, and the cells were resuspended in 500 μL of growth media with 5% DMSO. The sample was transferred to a 1.7 mL microfuge tube on ice. Cells were frozen with a slow cooling to minimize cell lysis. Samples were shipped on dry ice to Active Motif (1914 Palomar Oaks Way, Ste 150, Carlsbad, CA 92008) following the instructions listed at on the Sample Submission Form, downloaded from www.activemotif.com/sample-submission. Samples were prepared and sequenced following Active Motif’s ATAC-Seq proprietary protocol. Cells were thawed in a 37 °C water bath, pelleted, washed with cold PBS, and tagmented as previously described18, with some modifications based on19. Cell pellets were resuspended in lysis buffer, pelleted, and tagmented using the enzyme buffer provided in the Nextera Library Prep Kit (Illumina). Tagmented DNA was then purified using the MinElute PCR purification kit (Qiagen), amplified with 10 cycles of PCR, and purified using Agencourt AMPure SPRI beads (Beckman Coulter). The resulting material was quantified using the KAPA Library Quantification Kit for Illumina platforms (KAPA Biosystems) and sequenced with PE42 sequencing on the NextSeq. 500 sequencer (Illumina).

ChIP-Seq data analysis

FASTQ quality was assessed using FastQC v0.11.4 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and sequences were adapter- and quality-trimmed using default parameters for Trim Galore v.0.4.0 and CutAdapt v.1.1220,21. MultiQC v1.4 was used to aggregate FastQC results across all samples, with the report available on Figshare22. Since multiple sequencers were used, FASTQ phred sequencing scores23 were calculated using a perl script (https://raw.githubusercontent.com/douglasgscofield/bioinfo/master/scripts/phredDetector.pl). This value was used as input into the alignment algorithm. The bwa v.0.7.12 samse24 was used to align the reads to hg19 reference genome and Picard tools v.2.17.9-SNAPSHOT25 was used to remove duplicates. Fragment sizes were estimated using MaSC 1.2.126 and these values were used as input into the –extsize argument of MACS2 v.2.1.127 for narrow peak calling (transcription factors) or broad peak calling (histone marks). Broad peaks were called significant using a q-value (minimum False Discovery Rate) cut off of 0.10 and narrow peaks at a q-value cutoff of 0.05. Results were returned in units of signal per million reads to get normalized peak values. Repetitive centromeric, telomeric and satellite regions known to have low sequencing confidence were removed using blacklisted regions defined by the ENCODE project: http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg19-human/wgEncodeHg19ConsensusSignalArtifactRegions.bed.gz. The resulting filtered peakfiles were used as input into Homer v4.10.4 for gene annotation and motif analysis.

ATAC-Seq data analysis

Samples were quality-controlled and trimmed as described in Chip-Seq Analysis. FASTQ files were aligned using bwa aln for BGI samples (100 bp reads) and bwa mem for Active Motif samples (42 bp reads). Reads with mapping quality <10 were discarded. Biological duplicate BAMs were merged using Picard v.2.17.9-SNAPSHOT. Broad peaks were called using –extsize 200, –shift 100, –nomodel. Results were returned in units of signal per million reads to get normalized peak values. Finally, repetitive centromeric, telomeric and satellite regions known to have low sequencing confidence were removed using merged blacklisted regions defined by the ENCODE project: http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg19-human/wgEncodeHg19ConsensusSignalArtifactRegions.bed.gz.

ChIP-Seq quality control metrics

We investigated three metrics to assess ChIP-seq quality. To calculate enrichment of reads within peaks we determined the FRiP score using deeptools228. The FRiP score is defined as the fraction of reads that fall within a peak divided by the total number of reads. To measure read enrichment independent of peak calling we calculated the NSC (normalized strand cross-correlation) and the RSC (relative strand cross-correlation) using phantompeakqualtools29,30 as part of the ENCODE ChIP-seq processing pipeline. All ChIP-Seq data passed quality control and results are reported in Online Table 3.

ATAC-Seq quality control metrics

To compare reproducibility between ATAC-seq biological replicates we performed irreproducible discovery rate (IDR) analysis using scripts downloaded from https://github.com/nboley/idr. Peaks passing the suggested threshold (IDR < = 0.05%) between two replicates were kept. The ratio between the number of peaks between true replicates (Nt) and pooled pseudoreplicates (Np) was calculated. In accordance with ENCODE guidelines, we confirmed that at least 50% of true replicate IDR analysis based peaks (Nt) were identified in the IDR comparison of pseuduoreplicates (Np): Np/Nt < 2. A similar analysis was done with self-pseudoreplicates (N1 and N2). We confirmed that the ratio between Np/Nt or N1/N2 was <2. All ATAC-seq data passed IDR results and are reported in Online Table 4. Peakfiles resulting from IDR analysis are available from FigShare31.

Super-enhancer calling and comparison

Super-enhancers (SEs) were called from H3K27Ac BAM files using the default parameters of LILY (https://github.com/BoevaLab/LILY), which includes correction for copy number variation inherently present in cancer samples. Enhancers were classified into SEs, enhancers, and promoters and annotated using Homer v4.10.4. Scripts to run LILY can be found on Github (https://github.com/marislab/epigenomics-data-descriptor). SEs were also called from H3K27Ac MACS2 peaks using ROSE v.0.1 (https://bitbucket.org/young_computation/rose/src/master/) using default parameters and annotated using Homer v4.10.4. SEs which overlapped with the MYCN locus (hg19, chr2:16080683-16087129) were removed from the analysis. SE genes which we annotated as transcription factors32 were used for comparison to two literature studies33,34.

Heatmap preparation

The 5,000 most significant (sorted by highest -log10(p-value) and -log10(q-value)) MYCN peaks for each of the five MYCN amplified cell line were intersected using bedtools. Heatmaps were generated for regions +/−4 kb from the transcription start site (TSS) for the 5,046 peaks common to at least four MYCN amplified cell lines. Heatmaps were created for LA-N-5 and NB-69 at loci annotated as enhancers, SEs, and promoters-TSS by LILY. All ChIP-seq heatmaps were created using deepTools 3.2.0 package plotHeatmap tool28. The code and parameters used to generate heatmaps can be found on GitHub (https://github.com/marislab/epigenomics-data-descriptor).

Cell line authentication

All cell lines were STR-authenticated by Guardian Forensic Sciences (Abington, PA) using the GenePrint 24 (Promega, #B1870).

Data Records

Raw, concatenated FASTQ files were deposited in Sequence Read Archive under the SRA study accessions SRP22394135, SRP22397736, and SRP22394237. Processed BIGWIG files for all sequencing data were deposited into the Gene Expression Omnibus (GEO) under SuperSeries Accession Number GSE13831538. MYCN and MYC ChIP-Seq data for the Kelly and NGP cell lines were deposited into GEO under Accession Number GSE9478239, all other MYCN and MYC ChIP-Seq were deposited under Accession Number GSE13829540, histone ChIP-Seq data were deposited under Accession Number GSE13831441, and ATAC-Seq data were deposited under Accession Number GSE13829342. Homer motif analysis and motif files are available on FigShare22,31,43.

Technical Validation

Prior to selecting cell lines for MYCN and MYC profiling, we assessed RNA expression (Fig. 2a,b) and protein expression (Fig. 2c,d) across a subset of neuroblastoma cell lines. NB-LS, while MYCN non-amplified, has substantial MYCN RNA and protein expression44, but was not chosen, as we restricted MYCN ChIP-Seq to MYCN amplified cell lines plus one negative control. SK-N-BE(2)-C, a MYCN amplified cell line, showed high MYCN mRNA expression, but surprisingly low protein expression, and thus was excluded. The remaining cell lines had concordant MYCN and MYC mRNA and protein expression, thus, COG-N-415, KELLY, NB-1643, LA-N-5, and NGP were chosen for MYCN ChIP-Seq while NB-69, SK-N-AS, and SK-N-SH were chosen for MYC ChIP-Seq. As additional controls, we performed MYCN ChIP-Seq in the MYCN non-amplified line NB-69, and MYC ChIP-Seq on the MYCN amplified cell line KELLY. To validate the MYCN and MYC ChIP-Seq antibodies, we first intersected loci bound by MYCN in two or more cell lines and of the 157 MYCN transcriptional targets previously reported using ChIP-on-ChIP45, found 139 loci occupied by the MYCN via ChIP-Seq (Fig. 2e). Next, we integrated the top 5,000 MYCN peaks from each MYCN amplified cell line. We generated heatmaps for the peaks (1,335) which overlapped in all five cell lines (as defined in Heatmap Preparation) and depict occupancy of MYCN (Fig. 2f) and MYC (Fig. 2g) at these sites. As expected, the MYCN amplified cell lines COG-N-415, KELLY, NB-1643, LA-N-5, and NGP show similar binding profiles, while the negative control MYCN non-amplified line NB-69 depicted an absence of binding for MYCN at the same loci. Importantly, Homer motif analysis of the 34,906 target sequences bound by MYC in NB-69 were significantly enriched (Benjamini q-value < 0.001) for the canonical CACGTG e-box motif, while this motif was absent from the 112 target sequences found in the NB-69 MYCN ChIP-Seq sample. We observed MYC bound to the same loci in the MYCN non-amplified cell lines, SK-N-AS, SK-N-SH, and NB-69 as well as the MYCN amplified and low MYC-expressing line KELLY (Fig. 2g), and observed shared CACGTG motif binding for both MYCN and MYC in KELLY, supporting the notion of redundant functionality of MYC family protein members. To further validate both the specificity and functional redundancy of the MYCN and MYC ChIP-Seq, we assessed MYCN and MYC binding to transcriptional targets of an 18-gene MYC family (MYCN/MYC/MYCL1) activity signature46 in KELLY (MYCN and MYC) and SKNBE(2)C (MYCN) cell lines alongside six non-MYC family core regulatory TFs (ASCL1, GATA3, HAND2, ISL1, PHOX2B, TBX2) from publicly-available ChIP-Seq data (ASCL1: GEO accession number GSE120074 and GATA3, HAND2, ISL1, PHOX2B, TBX2: GEO accession number GSE94824) reprocessed with our pipeline (see Methods). Supplemental Fig. 2 shows the binding patterns for four of the 18 genes: APEX1, NME1, ENO1, and ODC1. APEX1, NME1, and ENO1 are not bound by the six non-MYC family core regulatory TFs (ASCL1, GATA3, HAND2, ISL1, PHOX2B, TBX2), while ASCL1 shows binding at ODC1 because it recognizes the e-box motif, CANNTG. Altogether, these data demonstrate specificity of MYCN and MYC antibodies and functional redundancy of MYCN and MYC proteins.

Fig. 2
figure 2

Comparison of MYCN and MYC binding based on MYCN amplification status. The log2 FPKM mRNA expression of MYCN (a) and MYC (b) in all neuroblastoma cell lines assayed herein. Protein expression for MYCN (C) and MYC (d) in a subset of cell lines. (e) Comparison of unique MYCN peaks called in at least two of the MYCN ChIP-Seq samples (blue) to known MYCN regulated genes (gray)45 demonstrates concordance of MYCN ChIP-Seq to MYCN ChIP-ChIP. (f) The top 1,335 MYCN peaks (p < 0.05, q < 0.05) are plotted as ChIP-Seq heatmaps for five MYCN amplified cell lines (COG-N-415, Kelly, NB-1643, LAN-5, and NGP) and one MYCN non-amplified cell line (NB-69). All heat map densities ranges from +/−4.0 Kb from the TSS, with average signal plots shown above. (g) MYC ChIP-Seq heat maps for the same peaks in two MYCN non-amplified lines (SK-N-AS, and NB-69) and one MYCN amplified cell line (Kelly) showing redundant binding of MYC in non-amplified cell lines.

Next, we evaluated genome-wide binding densities of the histone antibodies and assessed open chromatin by plotting binding of one MYCN amplified cell line LA-N-5 (Fig. 3a), and one MYCN non-amplified cell line NB-69 (Fig. 3b). Of note, cell-line specific promoters are located in regions of open chromatin and strongly occupied by narrow regions of H3K4me3 and devoid of H3K27me3 and H3K4me1, as expected. The majority of promoters are also occupied by MYCN in LA-N-5 and MYC in NB-69. Enhancers have bivalent marking of MYCN, H3K4me3, H3K27Ac, open chromatin, and absence of H3K27me3. SEs are broadly marked by MYCN, H3K4me3, H3K27Ac, H3K4me1, and open chromatin.

Fig. 3
figure 3

Validation of ChIP-Seq promoter, enhancer, and open chromatin occupancy. Binding densities of MYCN, MYC, histone antibodies, and open chromatin for promoter regions of the MYCN-amplified cell line LA-N-5 (a) and the MYCN non-amplified cell line NB-69 (b) are depicted (+/−4.0 Kb from gene TSS) and distinct profiles are shown for promoters (+/−4.0 Kb from gene TSS), enhancers (+/−10.0 Kb from gene TSS), and SEs (+/−10.0 Kb from gene TSS). For LA-N-5: Npromoters-TSS = 4,662, Nenhancers = 25,601, NSE = 826, and for NB-69: Npromoters-TSS = 4,718, Nenhancers = 31,769, NSE = 667.

Finally, we used our H3K27Ac ChIP-Seq data to compare SE prediction of cell line lineage in our dataset compared to those reported in two other publications describing the SE landscape in neuroblastoma (Fig. 4 and Supplemental Fig. 1). Boeva and colleagues identified 4,791 SE-associated genes in Table S3 to identify core regulatory transcriptional circuitry in neuroblastoma using 25 cell lines33. Four cell lines were common to our study: SK-N-BE(2)C, SK-N-FI, SK-N-AS, and NB-69. Therefore, to validate our H3K27Ac ChIP-Seq, we utilized the same algorithm (LILY, see Methods) to call SEs from our H3K27Ac data, and restricted comparison analyses to genes defined as transcription factors (TFs), as defined by core regulatory circuitry32. We annotated 396 of the SEs reported by Boeva and colleagues as transcription factors and found 59–85% concordance of our TF SE calls (Supplemental Fig. 1). While a majority of SEs called in each of our cell lines was concordant with Boeva and colleagues, the high variance in total number of SEs called likely stems from the diversity of cell lines in both studies, as well as pipeline processing and filtering parameters. We were unable to directly compare methods without their code and raw data readily available. Thus, we additionally compared our TF SE calls to those from an independent neuroblastoma study34 which used the ROSE algorithm (see Methods) and reported smaller SE genesets (Online Table 5) driving the lineage-specific mesenchymal (MES, N = 20 TFs) and adrenergic (ARDN, N = 18 TFs) subtypes. To mimic the analysis performed by van Gronigen and colleagues, we ran ROSE on our H3K27Ac ChIP-Seq data and removed any peaks which overlapped the MYCN locus (see Methods) to account for false SE calls due to MYCN amplification. There were no common neuroblastoma cell lines between van Gronigen and colleagues study and the lines used in our study. We assessed the number of MES or ADRN SE-associated TFs detected in each of our study and found between five and eight ADRN SEs were detected using ROSE (Fig. 4a) and between five and 11 ADRN SEs were detected using LILY (Fig. 4b). SK-N-SH has a known MES subtype; its subclone, SH-SY-5Y, was profiled as MES by van Gronigen and colleagues. Combining the calls, we were able to significantly (Fisher’s exact test, p < 0.05) validate ADRN subtypes in eight of the ten cell lines we profiled (Fig. 4c). Interestingly, SK-N-AS contains SEs from both subtypes and thus may reflect a heterogeneous cell line. Specific SEs are reported per algorithm per cell line in Online Table 5. As further validation, we re-analyzed publicly-available SK-N-SH H3K27Ac (Biosample SAMN05733860, Run SRR5338927) and SK-N-SH Input (Biosample SAMN05733844, Run SRR5471111) ChIP-Seq data (GEO accession GSM2534162) using the same peak-calling and SE pipelines used on our data (see Methods). We observed enhancer binding (H3K27Ac) and open chromatin (ATAC) at the same loci we observe strong MYC occupancy (Supplemental Fig. 1A). Further, we assessed concordance of SEs called in SK-N-SH with those previously reported and found 76% of TF SEs called in SK-N-SH in common with those from Boeva, et. al, similar to our findings.

Fig. 4
figure 4

Super-enhancer calls correctly determine neuroblastoma cell line lineage (a) Total number of lineage-specific SEs called from ROSE and (b) LILY per cell line (ADRN = adrenergic, MES = mesenchymal subtype). (c) Total number of lineage-specific SEs called when combining ROSE and LILY results. One-tailed fisher’s exact test p-values are listed (*denotes significance at an α level of 0.05).

Together, we have validated both MYCN and MYC ChIP-Seq antibodies for use in ChIP-Seq, as well as genome-wide occupancy profiles for histone markers and open chromatin across a cohort of neuroblastoma cell lines. We ran two algorithms (LILY and ROSE) and compared our data to two independent datasets to validate reproducibility of lineage-specific SEs in neuroblastoma cell lines. Finally, we demonstrate integration of publicly-available H3K27Ac data from SK-N-SH with our MYC ChIP-Seq and ATAC-Seq data, and show reproducibility of SE calls between the publicly-available data and two independent reports. These data should be a valuable resource to the childhood cancer and MYC research communities.

Usage Notes

Here, we provide raw FASTQ and bigwigs for a comprehensive, validated ChIP-Seq (MYCN, MYC, H3K27Ac, H3K27me3, H3K4me3, and H3K4me1) and ATAC-Seq neuroblastoma cell line dataset which can be coupled with our previous RNA-Seq profiling dataset16 to interrogate novel transcriptional regulation in this disease. For example, the H3K27me3 ChIP-Seq can be used to identify genes being repressed via the PRC2 complex, while H3K27Ac and H3K4me1 ChIP-Seq can be used to interrogate promoter-enhancer mechanisms. CSI-ANN can be used to integrate histone ChIP-Seq data to predict regulatory DNA segments47, and IM-PET can use the results from CSI-ANN to predict enhancer-promoter interactions without the need for Hi-C data13. Additionally, chromatin states can be inferred48,49, and these data can be later integrated with whole exome or genome sequencing data or genome-wide association studies to identify molecular alterations driving transcriptional regulatory marked by histone marks or open chromatin.

All data are openly-available from GEO as described in the Data Records section.