Data description

This project was approved by the Royal Children’s Hospital Human Research Ethics Committee (RCH HREC# 29140C). We performed Methyl-Binding Domain protein 2 (MBD2) enrichment on DNA from 40 samples and sequenced the isolated fractions on the Sequencing by Oligonucleotide Ligation and Detection (SOLiD™) platform (SOLiD™MBD-Seq, Life Technologies, Carlsbad, USA). MBD2 has been shown to bind double-stranded methylated DNA and has been used to interrogate the human methylome [1]. Because both the enriched fraction and the "input" total genomic DNA fraction are sequenced, methylated genomic regions can be inferred where reads from the enriched fraction exceed those expected from the input. The samples analysed comprise three model cell lines: JWL (an in-house non-leukaemic cell line [2]), CEM-CCRF (a childhood T-cell acute lymphoblastic leukaemia [ALL] cell line) and K562 (an adult chronic myelogenous leukaemia cell line). From two non-leukaemic individuals (pbsc1 and pbsc2), peripheral blood mononuclear cells were sampled and four haematopoietic cell populations (CD34-positive, CD19-positive, CD33-positive and CD45-positive) were isolated for SOLiD™MBD-Seq analysis. From another two non-leukaemic individuals (bm9 and bm10), the same haematopoietic cell populations were isolated from bone marrow. Eight cases of childhood ALL (identifiers 135, 197, 292, 316, 362, 367, 378 and 386) were analysed at diagnosis (leuk) and 28 days after induction chemotherapy (rem). A third sample was taken at relapse (lap) for cases 197, 316, 362 and 367 (Table 1).
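The idea of inferring methylated regions by comparing the enriched fraction with input can be illustrated with a minimal sketch. It assumes that 0-based read start positions for one chromosome have already been extracted from the alignments, counts them in fixed-width bins (the 1 kb bin size and the pseudocount are arbitrary choices for illustration), scales each fraction by its total read count, and reports a log2 ratio of enriched to input per bin. This is not the procedure used by MACS or HOMER, and the function names are hypothetical.

```python
import math
from collections import Counter

def bin_counts(positions, bin_size):
    """Count read start positions falling into fixed-width bins."""
    return Counter(pos // bin_size for pos in positions)

def log2_enrichment(enriched_pos, input_pos, bin_size=1000, pseudocount=1.0):
    """Return {bin_index: log2(enriched / input)} after library-size scaling.

    Hypothetical helper: positions are 0-based read starts on one chromosome.
    """
    e_counts = bin_counts(enriched_pos, bin_size)
    i_counts = bin_counts(input_pos, bin_size)
    e_total = max(len(enriched_pos), 1)
    i_total = max(len(input_pos), 1)
    ratios = {}
    for b in sorted(set(e_counts) | set(i_counts)):
        # Reads per million in each fraction; the pseudocount avoids log2(0)
        e_rpm = (e_counts[b] + pseudocount) / e_total * 1e6
        i_rpm = (i_counts[b] + pseudocount) / i_total * 1e6
        ratios[b] = math.log2(e_rpm / i_rpm)
    return ratios
```

Positive values mark bins where the enriched fraction is over-represented relative to input; in practice a statistical model, rather than a raw ratio, decides which bins count as enriched.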

Table 1 Samples analysed in this study and sequencing metrics

Genomic DNA from archived bone marrow smear microscope slides of ALL patients, from isolated cells and from cell lines was extracted as previously described [3] and used for CpG methylation enrichment with the MethylMiner™ Methylated DNA Enrichment Kit (Life Technologies) according to the manufacturer’s protocols. The fragmented input genomic DNA (I) and the enriched E5 fraction (E) from each sample were used for library preparation and sequencing with SOLiD™ v3 and v4 chemistry according to the manufacturer’s protocols (Life Technologies).

Single-end and paired-end SOLiD™ sequencing reads were aligned to the hg19 reference genome using the LifeScope™ Genomic Analysis Suite (Life Technologies) with default parameters. Alignment efficiency (the ratio of uniquely aligned reads to total sequenced reads for each sample) ranged from 26.57% to 93.15% across all samples in this study (Table 1).
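Alignment efficiency as defined above is a simple ratio expressed as a percentage. The sketch below computes it per sample and reports the range across samples; the sample names and read counts are invented placeholders and are not values from Table 1.

```python
def alignment_efficiency(unique_aligned, total_reads):
    """Percentage of sequenced reads that aligned uniquely to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * unique_aligned / total_reads

# Hypothetical per-sample counts: (uniquely aligned reads, total sequenced reads)
samples = {
    "sample_A": (3_000, 10_000),
    "sample_B": (8_000, 10_000),
    "sample_C": (5_500, 10_000),
}

efficiencies = {name: alignment_efficiency(u, t) for name, (u, t) in samples.items()}
low, high = min(efficiencies.values()), max(efficiencies.values())
print(f"Alignment efficiency ranged from {low:.2f}% to {high:.2f}%")
```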

Alignments were then processed using MACS (Model-based Analysis for ChIP-Seq) [4] and HOMER (Hypergeometric Optimization of Motif EnRichment) [5,6] to identify enrichment peaks.
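MACS models read counts with a Poisson distribution and asks whether a region holds more enriched-fraction reads than expected from a background rate estimated with the help of the control. The sketch below illustrates only that core idea on pre-binned counts: the expected count per bin is the input count scaled to the enriched library size, floored at a uniform background rate, and bins with a small upper-tail Poisson p-value are reported. It omits MACS's fragment-size modelling, its multi-scale local lambda and multiple-testing correction, and it does not reflect HOMER's method; the function names and the p-value threshold are assumptions.

```python
import math

def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def poisson_sf(k, lam):
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0  # a zero-rate Poisson never produces k >= 1
    if k <= lam:
        # At or below the mean the complement of the lower tail is safe
        return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))
    # Above the mean, sum the upper tail directly; terms shrink geometrically
    total, i = 0.0, k
    while True:
        term = poisson_pmf(i, lam)
        total += term
        if term <= total * 1e-17:
            return total
        i += 1

def call_enriched_bins(enriched, control, p_threshold=1e-5):
    """Return indices of bins whose enriched count exceeds the Poisson expectation."""
    e_total, c_total = sum(enriched), sum(control)
    if c_total == 0:
        raise ValueError("control must contain reads")
    scale = e_total / c_total             # scale control to enriched library size
    background = e_total / len(enriched)  # uniform per-bin rate
    hits = []
    for idx, (e, c) in enumerate(zip(enriched, control)):
        lam = max(background, c * scale)  # conservative expected count
        if poisson_sf(e, lam) < p_threshold:
            hits.append(idx)
    return hits
```

Taking the larger of the background and control-derived rates mirrors, in simplified form, the conservative choice of expected count that keeps bins with inflated input coverage from being called.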

This study is unique in several respects. It is the first sequencing-based DNA methylation profiling study in childhood ALL to use archived bone marrow samples, whose quality is similar to that of formalin-fixed paraffin-embedded (FFPE) tissue samples [7]. We selected samples that had also been profiled on an orthogonal platform, the Illumina Infinium Human Methylation 450K BeadArray [3,8], and included replicate samples, both to assess the reproducibility of SOLiD™MBD-Seq and to identify regions of differential DNA methylation relevant to childhood ALL.

We performed replicate DNA methylation enrichment on the JWL cell line with 1 μg and 5 μg of starting genomic DNA to determine whether 1 μg of starting material was sufficient. This is less than the recommended quantity but is typical of the amount obtainable from our primary patient samples.
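One simple way to ask whether 1 μg behaves like 5 μg is to correlate the two replicates' read counts over the same genomic bins. The section does not state which reproducibility metric was used, so Pearson correlation of log2-transformed counts is an assumption here, and the per-bin counts below are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)

def replicate_concordance(counts_1ug, counts_5ug, pseudocount=1):
    """Correlate log2 per-bin counts from two enrichment replicates."""
    log_a = [math.log2(c + pseudocount) for c in counts_1ug]
    log_b = [math.log2(c + pseudocount) for c in counts_5ug]
    return pearson(log_a, log_b)

# Hypothetical enriched-fraction counts over the same six bins
counts_1ug = [3, 0, 25, 40, 1, 12]
counts_5ug = [7, 1, 61, 88, 2, 30]
print(round(replicate_concordance(counts_1ug, counts_5ug), 3))
```

The log transform keeps a few very deep bins from dominating the correlation, and a constant offset between replicates (as expected from different input amounts or sequencing depths) does not lower it.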

We isolated four haematopoietic cell populations (CD34, CD19, CD33 and CD45) representing major developmental stages that correspond to the stages at which development is arrested in paediatric leukaemia. The cells were obtained from four individuals by positive selection with fluorescently labelled antibodies and fluorescence-activated cell sorting (FACS). These populations enable us to track changes in DNA methylation between cell lineages and to contrast them with leukaemic cells. After peak calling with MACS, a large proportion of peaks were shared between the CD19 cells from three individuals, supporting the premise of cell-type-specific DNA methylation profiles in haematopoietic cells (Figure 1A).
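Counting shared peaks for Venn diagrams such as those in Figure 1 requires a rule for when two peak intervals overlap. The sketch below assumes each peak set is a list of (chromosome, start, end) half-open intervals, as in BED files, treats two peaks as shared if they overlap by at least one base, and counts the peaks in set A that overlap any peak in set B. Tools such as BEDTools or HOMER's mergePeaks may apply different rules (for example, merging nearby peaks or requiring a minimum overlap), so counts from this sketch need not match Figure 1; the function names are hypothetical.

```python
from collections import defaultdict

def index_peaks(peaks):
    """Group (chrom, start, end) peaks by chromosome, each list sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom

def count_shared(peaks_a, peaks_b):
    """Count peaks in A that overlap at least one peak in B (half-open, BED-style)."""
    b_index = index_peaks(peaks_b)
    shared = 0
    for chrom, a_intervals in index_peaks(peaks_a).items():
        b_intervals = b_index.get(chrom, [])
        j = 0
        for a_start, a_end in a_intervals:
            # B peaks ending at or before a_start cannot overlap this or any
            # later A peak, because A peaks are visited in order of start
            while j < len(b_intervals) and b_intervals[j][1] <= a_start:
                j += 1
            for k in range(j, len(b_intervals)):
                b_start, b_end = b_intervals[k]
                if b_start >= a_end:
                    break  # B is sorted by start, so no later peak can overlap
                if b_end > a_start:
                    shared += 1
                    break
    return shared
```

Note that `count_shared(A, B)` need not equal `count_shared(B, A)`: a single broad peak in one set can overlap several narrow peaks in the other, which is one reason overlap counts differ between tools.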

Figure 1

Venn diagrams summarising peak region overlaps between samples analysed in this study. Overlapping peak regions were identified after MACS peak calling. (A) Peaks on chromosome 21 from CD19 cells positively selected by FACS from three non-leukaemic individuals; a high degree of peak overlap was observed. (B) Peaks from matched leukaemic and remission samples from individual 135. Although some peaks overlap (183), each sample has a substantial number of distinct peaks. (C) Overlapping peaks between three leukaemic samples. (D) Overlapping peaks between three remission samples.

When DNA methylation enrichment peaks are compared between leukaemic and remission samples (tumour versus normal) from the same individual, each sample shows distinct enrichment peaks; these are likely to reflect disease state (Figure 1B). Fewer peaks were shared among the leukaemic samples and among the remission samples (Figure 1C and 1D) than among the haematopoietic cell populations, which may reflect differences in sample quality.

For each sample analysed in this study, we have generated track hubs that can be loaded and visualised in the UCSC Genome Browser. This permits immediate visualisation of regions of differential DNA methylation with potential biological significance. Because Infinium analysis has also been performed on these samples, visualisation in the Genome Browser permits direct comparison with these and other publicly available data, such as The Cancer Genome Atlas (TCGA) [9] and TARGET (Therapeutically Applicable Research to Generate Effective Treatments) [10]. The data can also be analysed further and compared with publicly available data using the Galaxy [11,12] and Cistrome [13] web servers.
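A UCSC track hub is a small set of plain-text files served over HTTP(S): `hub.txt` names the hub and points to `genomes.txt`, which maps each assembly (hg19 here) to a `trackDb.txt` file listing the tracks. The sketch below builds a minimal single-assembly hub for bigWig coverage tracks; the hub name, labels, email address and file names are hypothetical placeholders and do not describe the hubs distributed with this dataset. The bigWig files themselves would need to be produced separately (for example with UCSC's `bedGraphToBigWig`).

```python
from pathlib import Path

def build_track_hub(hub_name, email, tracks, genome="hg19"):
    """Return {relative_path: file_text} for a minimal single-assembly hub.

    tracks: list of (track_name, bigwig_path, label) tuples. Each bigwig_path
    is resolved relative to the trackDb.txt file that lists it.
    """
    hub = (
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {hub_name} MBD-Seq tracks\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    genomes = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    stanzas = [
        f"track {name}\n"
        f"bigDataUrl {bigwig}\n"
        f"shortLabel {label}\n"
        f"longLabel {label} enrichment coverage\n"
        "type bigWig\n"
        "visibility full\n"
        for name, bigwig, label in tracks
    ]
    # Track stanzas in trackDb.txt are separated by blank lines
    trackdb = "\n".join(stanzas)
    return {"hub.txt": hub, "genomes.txt": genomes, f"{genome}/trackDb.txt": trackdb}

def write_track_hub(out_dir, files):
    """Write the hub files under out_dir, creating subdirectories as needed."""
    for rel_path, text in files.items():
        path = Path(out_dir) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
```

Once the directory is served from a web server, the URL of `hub.txt` is what gets pasted into the Genome Browser's track hub page; keeping file generation separate from writing makes the hub text easy to inspect before publishing.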

In summary, our data represent one of the first DNA methylation enrichment analyses using SOLiD™MBD-Seq on archival bone marrow smears from children diagnosed with ALL. Such specimens are readily available in most pathology laboratories across the world and, as we have demonstrated here, are amenable to genome-scale analysis. These data should prove valuable for other DNA methylation studies in childhood ALL and in haematopoietic cell development.

Availability of supporting data

Supporting data are available from the GigaScience database, GigaDB [14], and from NCBI under BioProject PRJNA272864.

Data file details

  • SRA files included in BioProject PRJNA272864

  • MACS and HOMER output files of peaks and peak locations

  • Track Hubs for UCSC Genome Browser