Distinguishing Active Versus Passive DNA Demethylation Using Illumina MethylationEPIC BeadChip Microarrays

The 5-carbon positions on cytosine nucleotides preceding guanines in genomic DNA (CpG) are common targets for DNA methylation (5mC). DNA methylation removal can occur through both active and passive mechanisms. Ten-eleven translocation enzymes (TETs) oxidize 5mC in a stepwise manner to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC). 5mC can also be removed passively through sequential cell divisions in the absence of DNA methylation maintenance. In this chapter, we describe approaches that couple TET-assisted bisulfite (TAB) and oxidative bisulfite (OxBS) conversion to the Illumina MethylationEPIC BeadChIP (EPIC array) and show how these technologies can be used to distinguish active versus passive DNA demethylation. We also describe integrative bioinformatics pipelines to facilitate this analysis. to the DNA that indicate if a cytosine nucleotide was modified. If the cytosine is modified (either 5mC or 5hmC), the C will stay a C, but if the cytosine is unmodified, it will be deaminated to uracil (and converted into thymine during PCR amplification). HRM takes advantage of these methylation-specific SNPs. After bisulfite conversion, regions of interest in the genome are amplified by real-time qPCR, and then a high-resolution melt step, in which the temperature is raised in very small increments and fluorescence is detected after each increment, is conducted to determine the melting temperature of the amplicon. The more Ts (unmodified cytosines) in the amplicon, the lower the melting temperature; the more Cs (modified cytosines) in the amplicon, the higher the melting temperature. By using differences in melting temperature of an amplicon across samples, the relative DNA modification state of a sample can be determined. in the 4c), suggesting that a threshold of 5hmC abundance is needed in a sample population for OxBS array to reliably quantify this mark. Indeed, by coupling this rationale with our ELISA-based global quantification results (Fig. 2c), we hypothesize that a threshold of at least 0.1% total 5hmC is needed in a gDNA sample to be detected by OxBS array, as our DAC + VitC samples exceeded this threshold while our DAC and NoTx samples did not. To further investigate this notion, we downloaded OxBS array data from a patient cohort of brain tissue samples and performed the subtraction calculation on these data (GSE138597) [59]. While in our analysis of NoTx and DAC treated samples we observed a large percentage of probes with negative 5hmC β -values following subtraction (Fig. 4c), the brain samples with known high levels of 5hmC only demonstrated 9% of negative 5hmC β -values on average (data not shown). We believe that samples with low 5hmC abundance processed by OxBS array analysis are more susceptible to this issue than if they are processed by TAB array, as OxBS array provides an indirect measurement of 5hmC while TAB array provides a direct measurement. Further work needs to be conducted to determine what exact global 5hmC threshold needs to be met to allow accurate quantification of 5hmC by the OxBS array. We recommend that the decision for use of TAB array or OxBS array should be made following global quantification of 5hmC levels in a sample. If the global amount of 5hmC is very low, we recommend the TAB array approach, as it can directly measure low levels of 5hmC present in a sample population. If the global amount of 5hmC is relatively high, then we recommend the OxBS approach, as reagent cost and sample processing time is limited in comparison to TAB array.


Introduction
Methylation on the 5-carbon of cytosine nucleotides in genomic DNA of eukaryotes is the most extensively studied epigenetic modification. To date, over 70,000 research papers, methods chapters, and review articles have been dedicated to the study of DNA methylation (5mC). 5mC provides diverse functionality in the regulation of gene expression, genome stability, chromatin compaction, and developmental timing [1]. Indeed, DNA methylation is largely regarded as one of the most stable epigenetic modifications, as its inheritance to daughter cells following cell division is faithfully copied during DNA replication by the maintenance methylation machinery, DNA methyltransferase 1 (DNMT1) and Ubiquitin-like, containing PHD and RING finger domains, 1 (UHRF1) [2][3][4][5]. 5mC patterning is conserved across most somatic tissues, with the most dynamics occurring at enhancers and other distal regulatory regions of the genome that influence gene expression [6,7]. Additional dynamic changes in 5mC are observed in disease transformation and in early mammalian development, as further described below.
DNA methylation can be passively removed in dividing cells that lack DNA methylation maintenance activity. An active mechanism for 5mC removal remained elusive until 2009, when the existence of an oxidized form of DNA methylation, DNA hydroxymethylation (5hmC) was thrust into the spotlight with the discovery of its abundance in neuronal tissue and the identification of an enzyme that could oxidize 5mC to 5hmC, Ten-eleven translocation 1 (TET1) [8][9][10]. Subsequently, two additional TET enzymes, TET2 and TET3, also demonstrated the ability to oxidize 5mC in a stepwise manner to 5hmC, 5formylcytosine (5fC), and 5-carboxylcytoine (5caC) [11,12]. Oxidation of 5mC to 5fC and 5caC allows for base-excision repair of the oxidized nucleotide by thymidine deglycosylase (TDG) and replacement by unmodified cytosine (5C) [13][14][15]. Combined, these discoveries laid the foundation for what is now widely accepted as the active DNA demethylation pathway.
While recent evidence suggests that the oxidized forms of 5mC can act in a regulatory manner through the recruitment of reader proteins [16,17], perhaps the most well-studied roles for the active DNA demethylation pathway are in the early stages of mammalian development [18]. Following fertilization, both the paternal and maternal genomes undergo massive changes in DNA methylation patterning that occurs through both active and passive DNA demethylation, respectively [19][20][21][22][23]. Primordial germ cells (PGCs) also undergo a dramatic loss of DNA methylation that can be attributed to both passive and active DNA demethylation mechanisms [24,25]. Embryonic stem cells (ESCs) also rely on TET proteins to maintain self-renewal properties as well as to direct lineage specification upon induction of differentiation [11,26].
Given the importance of 5mC for maintaining proper control of chromatin structure and function, aberrant patterning of 5mC has been widely studied in the context of aging, psychiatric and developmental disorders, and cancer [27][28][29][30]. As hypermethylation of tumor suppressor genes is a hallmark of cancer, significant effort has been devoted to developing therapies that induce DNA demethylation of these genes in order to restore their expression and function in cancer cells [27]. Accordingly, both passive and active DNA demethylation mechanisms are now being targeted for combination cancer therapies with DNMT inhibitors like 5-aza-2-′-deoxycytidine (DAC) and with L-ascorbic acid (Vitamin C, VitC), a co-factor for TET dioxygenase activity [31][32][33].
In this chapter, we use DAC and VitC to induce active and passive DNA demethylation in the human germ cell tumor-derived cell line NCCIT, known to express TET enzymes [34,35]. To distinguish between active and passive DNA demethylation at base-resolution, we coupled Tet-assisted bisulfite (TAB) and oxidative bisulfite (OxBS) conversion chemistries to DNA methylation analysis with the Illumina MethylationEPIC BeadChIP (EPIC array) [36][37][38][39][40]. The EPIC array is a high-throughput platform that interrogates the DNA methylation status of approximately 850,000 individual CpG dinucleotides at baseresolution across multiple features of the genome (e.g., CpG islands, promoters, enhancers). Using bisulfite-converted genomic DNA (gDNA) as an input, single-stranded DNA probes hybridize to the bisulfite-converted gDNA, and single base-pair extensions with fluorescently labeled nucleotides reveal the underlying modification status of the gDNA (Fig. 1a). For example, if a cytosine nucleotide is unmodified in the gDNA, bisulfite conversion will deaminate the cytosine to uracil, which will then be read as thymine following whole-genome amplification. Once the probe for this specific CpG hybridizes to the bisulfite-converted gDNA, an adenine nucleotide will be incorporated and give off a fluorescent signal to indicate that the cytosine was unmethylated (Fig. 1a). Vice versa, cytosine nucleotides that are modified (5mC/5hmC) are protected from bisulfite conversion and will remain cytosines [41]. Following bisulfite conversion, whole-genome amplification, and hybridization, a fluorescently labeled guanine nucleotide will be incorporated, informing that the underlying cytosine was methylated (Fig. 1a).
TAB conversion is an upstream modification to the standard bisulfite conversion method that allows for only 5hmC nucleotides to be read as modified cytosines [38,42]. 5hmC nucleotides in gDNA are first protected from downstream steps by addition of a glucose moiety mediated by T4 β-glucosyltransferase (β-GT) (Fig. 1b). 5mC nucleotides are targeted for TET-mediated stepwise oxidation to 5hmC, 5fC, and 5caC by incubating β-GTtreated gDNA with the recombinant catalytic domain of TET2 and its required co-factors. Following bisulfite conversion and amplification, only 5hmC nucleotides will be read as cytosine by the EPIC array, and 5mC/5C nucleotides are read as thymine (Fig. 1b). With OxBS conversion, 5hmC nucleotides in gDNA are oxidized by potassium perruthenate (KRuO 4 ) to 5fC prior to bisulfite conversion (Fig. 1c) [43]. Following EPIC array processing, 5mC will be read as cytosine while all oxidized cytosines and 5C will be thymine (Fig. 1c).
In this chapter, we demonstrate the utility of EPIC arrays for determining active versus passive DNA demethylation using the techniques shown in Fig. 1. We provide bioinformatic pipelines that can be used to analyze the 5mC and 5hmC signals from TAB and OxBS arrays. Additionally, we detail assays that can be used to determine relative global change in 5mC and 5hmC across gDNA samples, which we use to check samples prior to EPIC array analysis. Finally, we provide a comparison of the TAB array and OxBS array approaches and discuss how to determine which platform is best suited for different experiments.

Benchtop Assays to Detect DNA Modification Change
We treated NCCIT cells (biological duplicate) with PBS (NoTx), 1 μM DAC to induce passive DNA demethylation, and 1 μM DAC with 57 μM VitC (DAC + VitC) to induce both passive and active DNA demethylation (Fig. 2a). In order to conduct TAB array or OxBS array, each gDNA sample must undergo two different treatments: (1) bisulfite conversion and (2) TAB/OxBS conversion. Both treatments of an individual gDNA sample are then submitted for processing on the EPIC array, meaning that the user cost is doubled for analysis of each sample. Depending on the nature of the experiment, querying both 5mC and 5hmC on the EPIC array can become quite expensive. In this section, we describe quick, low-cost benchtop assays commonly used in our laboratory to detect locus-specific and global changes in 5mC and 5hmC across gDNA samples of interest prior to submission on the EPIC array.

Locus-Specific High-Resolution Melt (HRM) Analysis-High-resolution
melt (HRM) analysis is a quantitative, real-time PCR-based method that allows the user to determine the relative nucleotide composition of a region of double-stranded DNA by analyzing the melting curve of a PCR amplicon [45]. Initially designed to identify mutations and polymorphisms in a gDNA sample, HRM has been adapted for use in epigenetics research to determine the relative amount of DNA modifications at a given locus [46,47]. Following DNA isolation, a sodium bisulfite conversion step is performed to make single nucleotide polymorphisms (SNPs) to the DNA that indicate if a cytosine nucleotide was modified. If the cytosine is modified (either 5mC or 5hmC), the C will stay a C, but if the cytosine is unmodified, it will be deaminated to uracil (and converted into thymine during PCR amplification). HRM takes advantage of these methylation-specific SNPs. After bisulfite conversion, regions of interest in the genome are amplified by real-time qPCR, and then a high-resolution melt step, in which the temperature is raised in very small increments and fluorescence is detected after each increment, is conducted to determine the melting temperature of the amplicon. The more Ts (unmodified cytosines) in the amplicon, the lower the melting temperature; the more Cs (modified cytosines) in the amplicon, the higher the melting temperature. By using differences in melting temperature of an amplicon across samples, the relative DNA modification state of a sample can be determined.
In our treatment paradigm (Fig. 2a), we measured a decrease in the peak melting temperature for DAC and DAC + VitC samples relative to the NoTx group, indicating that the genomic loci being queried (PPP1R18, DAXX) have less modified cytosines than the NoTx group (Fig. 2b). The RPL30 promoter served as a negative control for cytosine modifications, as it is completely unmodified in NCCIT (Fig. 2b). RPL30 also served as a positive control for bisulfite conversion, as amplification of this region could not occur without high conversion efficiency. Collectively, these results demonstrate that the DAC treatment was effective in reducing the overall modification level of cytosines at known regions of modification in the NoTx sample, indicating that these samples and treatment paradigms were good candidates for EPIC array analysis.
Step 4 95 °C 30 s Step 5 60 °C 1 min Step 6 Melt Curve 65-95 °C-10 s/step-increase temperature by 0.1 °C each step, and capture fluorescence at end of each step.

6.
The CFX manager software will automatically calculate the melting temperature of each sample. All fluorescence data can also be exported for each individual temperature measurement ("Melt Curve Derivative Results.xlsx") to build the plots as shown in Fig. 2b (see Note 4).

Global
Quantification of 5hmC-As 5mC is substantially more abundant in the genome than 5hmC (approximately 14-fold higher), quantification by HRM for passive loss of DNA methylation is sufficient. However, detecting global changes in 5hmC and active DNA demethylation is challenging due to the low level of this modification on cytosine nucleotides. In this section, we discuss two approaches to determine the global level of 5hmC across samples: (1) ELISA-based quantification and (2) 5hmC DNA Dot Blot.
ELISA-Based Assay: HRM analysis of DAC and DAC + VitC treated samples suggested substantial loss in cytosine modifications relative to NoTx (Fig. 2b). As 5mC is the most abundant cytosine modification, detection of changes in 5hmC are likely masked by 5mC changes in the HRM assay. Using the EpiGentek MethylFlash Global DNA Hydroxymethylation (5-hmC) ELISA Easy Kit (Colorimetric), we profiled global 5hmC levels in our gDNA samples to determine if treatment with DAC or DAC + VitC induced changes in 5hmC. Indeed, while DAC treatment did not significantly affect global 5hmC levels relative to NoTx, the addition of VitC to the DAC treatment lead to a significant increase in 5hmC detectable by this assay (Fig. 2c). Taken together with our HRM results of these gDNA samples, we concluded that our treatment conditions induced changes to both 5mC and 5hmC.

1.
Prepare gDNA samples in a 96-well plate at a concentration of 25 ng/μL. A total of 100 ng gDNA is added to the assay wells.

3.
Follow analysis instructions as outlined in the EpiGentek MethylFlash Global DNA Hydroxymethylation (5-hmC) ELISA Easy Kit (Colorimetric) manual (see Note 7). Analyze all biological and technical duplicates separately.

DNA Dot Blot:
For additional confirmation that our treatments sufficiently promoted changes in 5hmC levels, we performed gDNA dot blot analysis. Briefly, gDNA is denatured and immobilized on a nitrocellulose membrane prior to being probed with a 5hmC antibody. With this assay, changes in global 5hmC were detected in samples treated with DAC and DAC + VitC relative to NoTx (Fig. 2d). Complimenting the results of our HRM analysis and ELISA-based assays, we further concluded that both active and passive DNA demethylation would be observed in our samples after application to TAB array and OxBS array.

2.
Use a NanoDrop spectrophotometer to measure gDNA sample concentration.

5.
Neutralize samples with 1 M ammonium acetate on ice. Incubate sample on ice for 10 min.

6.
Load 240 μL of each sample into the top row of a 96-well plate. Load 120 μL 1× TE buffer in each sequential row. Using a multichannel pipette, ensure the samples in the top row containing gDNA are thoroughly mixed and transfer 120 μL to the row below. Repeat this process working down the rows to achieve twofold serial dilutions.

7.
Equilibrate nitrocellulose membrane and two sheets of filter paper in 6× SSC buffer.

8.
Secure membrane on top of filter papers in the dot blot apparatus. Tighten knobs as much as possible, apply vacuum, and re-tighten knobs.

9.
Wash wells with 200 μL 1× TE buffer (see Note 9). 5. Make sure to mix all gDNA samples extremely well before applying to the assay plate, as comparisons among samples are dependent on the amount of gDNA loaded. 6. Measure remaining gDNA by NanoDrop to ensure accurate concentrations and amount loaded into the assay wells. 7. We determined that the calculation using polynomial second order regression fit our standard curve best for our analysis of % 5hmC (data not shown). 8. Even loading of gDNA across samples is crucial. To achieve this, use the average of at least two Nanodrop readings and thoroughly mix samples and wells. In addition, it is helpful to move quickly through the denaturation and neutralization steps to ensure even processing across samples. If an alternative starting concentration of gDNA is desired, adjust accordingly, but account for a minimum of 20 μL dead volume in each well of the 96 well plate to improve pipetting accuracy. 9. Always apply liquid to the membrane while the vacuum is off. If individual wells do not clear, pipetting a few times will allow them to flow through.

10.
Using a multichannel pipette, apply 109 μL of each sample to the membrane. Final amount of gDNA is 800 ng followed by twofold serial dilutions. Allow samples to sit on membrane 2-5 min before applying vacuum (see Note 9).

11.
Apply vacuum to pull samples through the manifold. Once each well has cleared, wash wells in 200 μL 2× SSC buffer.

12.
Remove membrane from apparatus, mark corners with a pencil to maintain orientation, place in a covered container (we use pipette tip box lids), and dry at 80 °C for 45 min in a hybridization oven.

14.
Block for 1 h in Superblock at room temperature.

17.
Incubate blot in rabbit secondary antibody diluted 1:5000 in Superblock at room temperature for 1 h.

20.
For verification of gDNA loading, incubate blot in stripping buffer for 20-30 min. Rinse with distilled water and incubate in 5% methylene blue stain for 15-20 min. Rinse with distilled water and place between plastic to scan image.

Tet-Assisted Bisulfite (TAB) Array
General Procedure Preparation of gDNA

1.
Quantify gDNA by Invitrogen Qubit fluorometer dsDNA HS assay and dilute 5 μg gDNA in nuclease-free water to a final volume of 130 μL.

2.
Transfer prepared gDNA to a Covaris microtube, and shear sample with Covaris E220 sonicator to a final size of <10,000 bp using the following parameters:

3.
Transfer sheared gDNA from the Covaris microTUBE to a 1.5 mL microcentrifuge tube.

6.
Wash samples once with 70% ethanol, and centrifuge at 17,090 RCF for 10 min at room temperature.

7.
Air-dry pelleted gDNA upside-down over a KimWipe for approximately 8-10 min at room temperature.

1.
In a PCR-tube strip, combine the following reagents from the NEB T4-βGT kit:

3.
Add 80 μL of nuclease-free water to 20 μL of reaction. Transfer total volume to 1.5 mL tube.

4.
Add an additional 100 μL of nuclease-free water to bring the total volume to 200 μL.

5.
Add KAPA Pure Beads to sample at a 1:1 ratio. In this case, add 200 μL of beads (see Note 11).

6.
Mix sample and beads well by flicking the tube multiple times.

7.
Incubate at room temperature for 10 min. 11. We tried several different methods of purifying gDNA following β-GT and Tet oxidation reactions (including phenol:chloroform purification with ethanol precipitation and standard DNA purification kits) and determined that KAPA Pure Beads most reliably gave the best yield of gDNA. As the amount of gDNA to be submitted to core facilities for EPIC array processing is crucial, it is important to use a method of DNA recovery that will provide the best overall yield as measured by Invitrogen Qubit dsDNA HS assay.

8.
Place samples on DynaMag magnetic rack and let beads move to the back of the tube. Usually this step takes about 10 min for the supernatant to become completely clear.

10.
With the beads still on the rack, wash beads with 500 μL of 80% ethanol.

11.
Let wash sit on beads for 30 s and then remove.

13.
Remove the wash, and let beads air-dry for 4 min.

14.
Remove the beads from the magnetic rack and resuspend in 30 μL of nucleasefree water to elute gDNA.

15.
Incubate beads at room temperature for 10 min.

16.
Place beads back on the magnetic rack and allow beads to move to the back of the tube.

17.
Carefully withdraw elution and save in 1.5 mL tube.

1.
Add components in the following order and amounts (see Notes 13-15):

2.
Incubate samples in the dark at 37 °C for 2 h.

4.
Bring final volume up to 100 μL with nuclease-free water.

5.
Incubate at 37 °C for 2 h in the dark. 12. To avoid pulling beads with the withdrawal of the supernatant, leave a small amount of volume at the bottom of the tube. Addition of 80% ethanol will take the small proportion of beads at the bottom of the tube and efficiently capture in the magnetic field so that accidentally taking beads will not be an issue in the removal of the washes. 13

6.
Add an additional 100 μL of nuclease-free water to bring the total volume to 200 μL.

7.
Add KAPA Pure Beads to sample at a 1:1 ratio. In this case, add 200 μL of beads.

9.
Add an additional 20 μL of nuclease-free water to the KAPA Pure beads following removal of the TAB-treated gDNA elution.

10.
Incubate beads at room temperature for 10 min.

11.
Place beads back on the magnetic rack and allow beads to move to the back of the tube.

12.
Carefully withdraw elution and save in a different 1.5 mL microcentrifuge tube than the first elution. This elution will be used to process the 5mC/5hmC standards described below.

13.
Quantify TAB-treated gDNA from the first elution using Invitrogen Qubit dsDNA HS assay. TAB-treated gDNA may be stored at −20 °C for up to 2 weeks.

14.
Submit TAB-treated gDNA and non-treated gDNA from the same sample to a genomics core that processes EPIC arrays (see Note 16).

1.
Using 10 μL of the second elution of the TAB-treated gDNA recovered from step 12 above, perform bisulfite conversion overnight with the ZYMO DNA EZ Methylation kit per the manufacturer's instructions (see Note 2). Elute in 10 μL nuclease-free water.

2.
Set up PCR reaction mixture to amplify the 5mC/or 5hmC spike-in standard from step 1 of β-GT reaction (see Note 17): 16. Contact the core facility that will be processing samples on the EPIC array prior to submission. Most core facilities that process EPIC arrays require a certain amount and quality of gDNA, and TAB-treated gDNA typically does not meet these standards. Discuss with the core facility the upstream modification that will be done with the gDNA, and how much gDNA is expected to be submitted. Our laboratory typically recovers 280-330 ng of TAB-treated gDNA from the initial 500 ng of gDNA that was put into the reaction. If TET2-CD enzyme is in ample supply, we recommend doubling the Tet oxidation reaction for each sample and pooling the reactions together prior to KAPA Pure Beads DNA purification. 2. Add an additional centrifugation step following the last ethanol wash to remove any excess ethanol left in the column. Excess ethanol will adversely affect the results of this analysis. 17. For the initial spike-in of the standard to the T4-βGT reaction, add only 5mC to one reaction and only 5hmC to a different reaction.
Both the 5mC and 5hmC standard from ZYMO have the same sequence, so determining efficiency of the TAB oxidation reaction for both standards needs to be separated in space. The standards are also modified at every cytosine (CpG and CpH) in the sequence. We designed bisulfite primers using MethPrimer that would still allow for amplification following bisulfite conversion of this DNA molecule:

4.
Run amplification products on a 1.5% agarose gel at 100 V for 30 min.

5.
Excise amplification product from the agarose gel and purify using NEB Monarch Gel Extraction Kit following all manufacturer's instructions. Elute PCR product in 6 μL nuclease-free water.

6.
Ligate PCR product into Promega pGEM-T vector overnight at room temperature using the following reaction mixture:

8.
Aliquot 50 μL of competent cells per ligation product into a new tube.

9.
Add 5 μL of ligation product to cells, gently flick the tube a few times, and incubate the cells on ice for 30 min.

10.
Heat shock cells for exactly 30 s at 42 °C.

11.
Place the cells on ice for 5 min.

12.
Add 450 μL of SOC media to the cells and incubate with shaking at 37 °C for 1 h.

13.
Split the cells onto two different ampicillin bacterial agar plates that have been coated with 80 μL of 80 mg/mL X-gal and spread until mostly dry.

15.
The next day, make a master ampicillin agar plate for each PCR product and pick at least 30 white clones to grow up individually on the plate.

16.
Incubate the master agar plates overnight at 37 °C.

17.
Perform colony PCR using the reaction conditions from steps 2 and 3 on at least 20 clones to verify successful insertion of the product.

18.
Using illustra TempliPhi DNA Sequencing Template Amplification Kit, prepare a 96-well plate of clones to be sequenced by adding 5 μL of Denature Buffer to each well and a small amount of a positive colony (see Note 18).

19.
Denature the samples at 95 °C for 3 min and let the samples cool to 4 °C.

20.
Add 5 μL of the Premix buffer to the cooled samples, seal the plate, and submit for Sanger sequencing.

21.
Analyze sequences using QUMA online software with all parameters set to account for CpH methylation [44].

1.
Follow all manufacturer instructions exactly. gDNA samples should be in water rather than TE buffer. gDNA input is 500 ng for both the sample that will be treated with oxidant and the sample without oxidant.

Bioinformatic Pipelines for EPIC Array Analysis
To model the utility of BS array (Fig. 1a), TAB array (Fig. 1b), and OxBS array (Fig. 1c) for detecting active and passive DNA demethylation, we treated NCCIT embryonal carcinoma cells with compounds to inhibit the DNMTs (DAC) and enhance TET activity (VitC).
Notably, NCCIT cells are derived from a germ cell tumor, giving them pluripotent properties and the ability to differentiate upon treatment with retinoic acid [49]. Given these properties, NCCIT cells serve as an excellent model to study active and passive DNA demethylation, as cytosine modification patterning by both DNMTs and TETs is dynamic [35,50]. To specifically inhibit the catalytically active DNMTs, of which all are highly expressed in NCCIT, we treated cells with 1 μM DAC for 24 h, and then refreshed cells with media lacking DAC for the remainder of the growth period (Fig. 2a). To both inhibit DNMTs and enhance TET activity, we treated cells with a 24-h pulse of 1 μM DAC and then added VitC at a physiologic concentration (57 μM) every 24 h until collection. Cells treated with PBS 18. For validation of TAB reactions on the 5mC standard, we submit at least 30 colonies to ensure that all 5mC was successfully oxidized.
(NoTx) served as our control (Fig. 2a). All treatments were done in biological duplicate over 72 h, and differences in population doublings across treatments were insignificant (data not shown), indicating that all treatment groups went through DNA replication roughly an equivalent number of times. As discussed, we performed benchtop assays to determine the effectiveness of our drug treatments for 5mC loss (Fig. 2b) and induction of 5mC conversion to 5hmC (Fig. 2c, d) prior to submission on the EPIC array. All EPIC array analysis is conducted in the R statistical software environment (Version 3.6.1) (R Core Team).

TAB Array
Processing-We validated the efficiency of TAB oxidation reactions by standard bisulfite Sanger sequencing (detailed in Subheading 3.1) of fully modified 5mC and 5hmC spike-in standards (Fig. 3a). Following validation of the reaction, BS and TAB array were completed for both biological duplicates of NoTx, DAC, and DAC + VitC samples to measure the levels of 5mC/5hmC and 5hmC, respectively. While BS array samples demonstrated a high retention rate of probes following SeSAMe processing with default settings, TAB array samples were more likely to fail array QC standards due to a high detection p-value (≥0.05). As the intensity values from the unmethylated and methylated fluorescent channels are used to determine the quality of probe detection, we hypothesized that SeSAMe was overestimating our failure rate due to the low signal from the methylated fluorescent channel [51]. In an effort to retain more probes in the TABtreated samples that were biologically meaningful, we relaxed the detection p-value threshold to include all probes with a detection p-value ≤0.15. At this threshold, we were able to retain almost 70,000 more probes in our analysis without compromising our biological conclusions. For this analysis, we included all probes that had a detection p-value ≤0.15 across all samples queried on BS and TAB arrays (12 samples, n = 466,341 probes).
DNA modifications across a sample's population of DNA molecules are quantified on the EPIC array by the β-value in which a β-value of 1 indicates the cytosine is fully modified (5mC/5hmC) in the population and a β-value of 0 indicates the cytosine is completely unmodified (5C) in the population. For initial sample characterization, we profiled the density distribution of β-values for cytosine modifications (BS array) and 5hmC alone (TAB array) (Fig. 3b, c). BS array analysis demonstrated a bimodal distribution of β-values for the NoTx samples in which the majority of cytosines were either fully modified or fully unmodified ( Fig. 3b (top), c (left panel)). For both DAC and DAC + VitC samples, a leftward shift in β-value distributions was observed, consistent with loss of DNA modifications. The DAC + VitC samples also appeared to lose slightly more cytosine modifications relative to the DAC samples, although the difference between the median losses was not as pronounced as compared to the NoTx samples (Fig. 3b, top). Unlike BS array, TAB array samples yielded a unimodal distribution of β-values closer to 0, as the level of 5hmC in a sample population is typically very low (Fig. 3b, bottom; Figure 3c, middle), indicating that treatment of NCCIT cells with Vitamin C effectively enhanced TET activity and the conversion of 5mC to 5hmC.
As BS array β-values are a summation of 5mC and 5hmC signal, 5mC signal alone can be calculated by subtraction of TAB array β-values from BS array β-values from the same sample. In principle, this subtraction works well and yields 5mC β-values that are interpretable. As previously reported, this subtraction occasionally results in negative βvalues, typically when the cytosine nucleotide is primarily modified by 5hmC in the population with little to no detectable 5mC [38,52]. To account for negative β-values, a correction was applied that discarded all probes that yield a β-value for 5mC that was < −0.05. Calculated 5mC β-values that fell between −0.05 and 0 were adjusted to have a βvalue of 0.001 [38]. Performing this correction on our dataset resulted in a loss of 8953 probes from our analysis. Distribution of 5mC β-values among all samples revealed that the DAC + VitC samples demonstrated a more significant leftward β-value shift than DAC and NoTx, indicating that DAC + VitC treatment induced more DNA demethylation than DAC (Fig. 3c, right panel). In Subheading 3.3.4, the calculated β-values for 5mC and TAB array β-values for 5hmC from this processing pipeline were used to determine the significance of modification changes across all samples.

1.
Load necessary R packages for analysis.

2.
Move all IDAT files for analysis to the same directory, and then set the working directory to the location of the IDAT files.

3.
Make a signal summary dataset for all the IDAT files and run SeSAMe to generate and normalize β-values for each sample [51]. Relax the pval.threshold to 0.15 to include more probes in the analysis. Name your samples as needed.
Make sure the sample order is the same as the order of EPIC array number and position.

5.
Transform the β-value data matrix into a data frame and remove all probes that do not have a β-value for all samples queried. Plot the density of β-value distributions across samples.

6.
Calculate true 5mC β-values by subtracting the TAB array β-values from the BS array β-values for each individual sample.

7.
To correct for negative 5mC β-values, write an if-else statement such that any βvalue that is less than −0.05 will be given the new value "10," and any β-value between −0.05 and 0 will be corrected to 0.001. If the β-value does not meet either of these criteria, it will remain as it was originally calculated from the code above. Finally, remove all 5mC β-values that were transformed into "10" as they will not remain in the analysis.

OxBS Array
Processing-BS and OxBS array were performed on an individual set of drug treatments (NoTx1, DAC1, DAC1 + VitC) to determine levels of 5hmC/5mC and 5mC, respectively. Distribution of β-values for NoTx and DAC treatments were similar to those observed on the BS array conducted alongside the TAB array, where a bimodal distribution was observed for highly modified and completely unmodified cytosines (Fig. 4a,  top). Similar to the BS array results in Subheading 5.1, we observed a leftward β-value shift in DAC treated samples, indicating a loss of DNA modifications. Importantly, the difference in this leftward shift between DAC1 and DAC1 + VitC was minimal in the BS array β-value distribution (Fig. 4a, top). While the NoTx1 and DAC1 β-value distribution from the OxBS array were similar to the BS array pattern, the DAC1 + VitC distribution of β-values on OxBS array demonstrated a greater degree of a leftward shift relative to both NoTx1 and DAC1 than observed by BS array, suggesting that 5hmC patterning was also changing in this sample (Fig. 4a, bottom).
To determine the β-values for 5hmC in drug treatments, we performed the same calculation as described for TAB array, except subtraction of OxBS array (5mC) from BS array (5mC/ 5hmC) yielded 5hmCβ-values rather than 5mC. Next, we plotted the density distributions of β-values from the BS array, OxBS array, and calculated 5hmC values. Unlike the results from TAB array, we noticed that a large fraction of calculated 5hmCβ-values fell below zero, particularly for NoTx1 and DAC1 treated samples (Fig. 4b). We quantified the number of CpG probes with a 5hmCβ-value below and above zero among all samples and determined that while NoTx1 5hmCβ-values were evenly split, more probes fell above zero in DAC1 than NoTx1. DAC1 + VitC 5hmCβ-values were almost all above zero (Fig. 4c). Taken together with the results from ELISA-based assays (Fig. 2c) and TAB array 5hmCβvalue distributions (Fig. 3b, bottom; c middle), we believe that the overall abundance of 5hmC in a sample population can predict the ability of OxBS array to quantify 5hmCβvalues via the subtraction method, a perspective that will be further discussed in Subheading 6.
To correct for the subtraction method disparity, we employed a Bioconductor package specifically designed to correct for this problem in OxBS array data, OxBS-MLE [53]. OxBS-MLE uses the paired CpG probe intensity values from the BS array and OxBS array to calculate maximum likelihood estimates of 5mC and 5hmCβ-values within a sample. OxBS-MLE correction produced β-value distributions (Fig. 4d, e) for 5mC (top) and 5hmC (bottom) that closely resembled results obtained by BS array and TAB array (Fig. 3b, c), where DAC + VitC samples demonstrated the greatest loss of 5mC and the greatest increase in 5hmC relative to both NoTx and DAC. Finally, we calculated β-values for 5mC and 5hmC of DAC and DAC + VitC relative to the NoTx sample to quantify changes in 5mC and 5hmC (Fig. 4f). As we only performed OxBS array on a single drug treatment set, we used β-values to determine significance of these changes in Subheading 5.3.

1.
Load necessary R packages for analysis.

2.
Move all IDAT files for analysis to the same directory, and then set the working directory to the location of the IDAT files.

3.
Make a signal summary dataset for all the IDAT files and run SeSAMe to generate and normalize β-values for each sample. Name your samples as needed.
Make sure the sample order is the same as the order of EPIC array number and position.

4.
Transform the β-value data matrix into a data frame and remove all probes that do not have a β-value for all samples queried.

6.
Determine the number of CpG probes with a 5hmCβ-values above and below 0.

7.
To correct for the number of 5hmCβ-values below 0, use the OxBS-MLE command from ENmix [53]. First, isolate the β-values for BS array and then isolate the β-values for OxBS array.

8.
Next, isolate the intensity values independently for both BS array and the OxBS array. A critical note is that all samples must remain in the same order and be named the same thing between BS array and OxBS array. aa.5mC","NoTx.5hmC","DAC.5hmC","DAC.aa.5hmC")

10.
Calculate β-values for each drug treatment relative to each other for both 5mC and 5hmC.

Comparison of TAB and OxBS Array
Results-To directly compare results derived from TAB array and OxBS array among samples, we merged the calculated 5mC and 5hmCβ-values from each analysis for probes that maintained high QC standards between both arrays (n = 448,954 CpGs). Multi-dimensional scaling (MDS) analysis among all probes revealed that samples clustered based on drug treatments and cytosine modification rather than by platform (TAB versus OxBS), indicating that our results between the approaches were consistent (Fig. 5a). Next, we directly compared β-values of 5mC and 5hmC for samples that were the same between the two arrays: NoTx1, DAC1, and DAC1 + VitC (Fig. 5b). Overall, 5mC β-values were consistent between TAB and OxBS array with Pearson correlation coefficients above 0.9 (Fig. 5b, top). While 5hmCβ-values were not as consistent as 5mC, we noted that DAC1 + VitC, the sample with known higher amounts of 5hmC compared to NoTx1 and DAC1, yielded the highest Pearson correlation coefficient (R = 0.365) (Fig. 5b, bottom), suggesting that when 5hmC is abundant, both platforms may more consistently capture this distribution. We believe that the lack of strong correlation between TAB and OxBS5hmCβ-values is due to the low abundance of this cytosine modification and the difference in how the β-values are determined in TAB (directly) versus OxBS array (indirectly).

2.
Perform multidimensional scaling (MDS) analysis on all sample β-values using the following command from minfi to determine variance and relative separation among samples:

Determining Active Versus Passive DNA Demethylation Using TAB and
OxBS Arrays-For each individual platform, we next determined which cytosines were significantly differentially modified for 5mC and 5hmC relative to the other drug treatment samples. For TAB array analysis, we assayed both biological duplicates of each drug treatment, which allowed us to conduct significance testing for each cytosine modification using limma [54,55]. We considered a CpG as differentially modified if the adjusted p-value ≤0.01 and the log 2 fold-change ≥1.0. As would be expected from global analysis, DAC + VitC exhibited gains in 5hmC relative to both NoTx and DAC (Fig. 5c, left). Both DAC and DAC + VitC drug treatments also demonstrated significant loss of 5mC relative to NoTx, and DAC + VitC additionally had a number of CpGs that significantly lost 5mC relative to DAC (Fig. 5c). We performed OxBS arrays on a single set of drug treatments, so rather than conduct significance testing across biological duplicates, we calculated β-values among the samples and set the following thresholds for determining differential modifications: 5hmC |β-value| ≥0.1; 5mC |β-value| ≥0.2. Consistent with our comparison of TAB array and OxBS array β-values, the pattern of differentially modified cytosines, as queried by OxBS array, was almost identical to that of TAB array (Fig. 5c). Notably, OxBS array differential analysis did call more probes significant using our set criteria; however, this was most likely due to our inability to call statistical significance, as we only submitted one of the drug treatment sets for OxBS array analysis, indicating the importance of querying biological replicates when possible.
Our ultimate goal for TAB array and OxBS array analysis was to distinguish the degree of active versus passive DNA demethylation in drug treatments. To do this, we classified the collective behavior of cytosine nucleotides, both 5mC and 5hmC, for each individual platform. Using the criteria for determining differential modifications as discussed for TAB array and OxBS array analysis (Fig. 5c), we classified a CpG's collective behavior by asking how 5hmC changed in one drug treatment group relative to another, and then asking how 5mC changed as well. If the criteria was not met for determining differential modifications, we classified this as "no change" in the modification. For example, if 5hmC at an individual CpG increased ("+") in DAC + VitC relative to NoTx, and 5mC at the same CpG decreased ("−"), we considered this a CpG that was susceptible to active DNA demethylation (Fig. 5d, middle circles, red). However, if 5hmC did not have a significant change ("0"), but 5mC decreased ("−"), then we would consider this passive loss of DNA methylation (Fig. 5d, blue). No change in 5mC or 5hmC at a CpG locus is represented by dark green. All classifications that had a measurable number of probes that behaved in the given manner are shown in the legend for Fig. 5d. Overall, for both TAB array and OxBS array analysis, DAC treatment compared to NoTx demonstrated predominately passive DNA demethylation (Fig.  5d, top, blue). Addition of VitC to DAC treatments (DAC + VitC) successfully induced active DNA demethylation in addition to passive DNA demethylation relative to NoTx as queried by both platforms (Fig. 5d, middle, red/blue). Finally, by comparing DAC + VitC to DAC, we observed that while passive DNA demethylation is largely conserved with DAC treatments (highlighted by the increase of "no change" in dark green), the primary difference between DAC + VitC and DAC is the induction of 5mC conversion to 5hmC and an increase in active DNA demethylation with the addition of VitC (Fig. 5d, bottom see Note 19).

1.
For statistical testing, we use the standard workflow within limma to compare sample groups [54][55][56][57]. First, transform β-values to M-values, and transform the data frame into a data matrix.

3.
Construct a contrast matrix for the samples to be compared, and then proceed with the standard limma workflow to calculate the statistical significance. For simplicity of comparisons and to get individual statistics for each comparison, make each contrast matrix individually and combine all statistical data at the end. 19. Of particular note, we would like to comment on deciding between TAB array and OxBS array for quantifying 5hmC in a sample population. As previously mentioned, 5hmC abundance is very low in comparison to 5mC (almost 14-fold on average), with the exception of ESCs and brain tissue [10,11,58]. In our analysis, the OxBS array was inefficient at detecting 5hmC in the NoTx and DAC samples due to the low level of 5hmC; however, it was much better at capturing this modification in the DAC + VitC sample in which VitC successfully induced an increase in the mark (Fig. 4c), suggesting that a threshold of 5hmC abundance is needed in a sample population for OxBS array to reliably quantify this mark. Indeed, by coupling this rationale with our ELISA-based global quantification results (Fig. 2c), we hypothesize that a threshold of at least 0.1% total 5hmC is needed in a gDNA sample to be detected by OxBS array, as our DAC + VitC samples exceeded this threshold while our DAC and NoTx samples did not. To further investigate this notion, we downloaded OxBS array data from a patient cohort of brain tissue samples and performed the subtraction calculation on these data (GSE138597) [59]. While in our analysis of NoTx and DAC treated samples we observed a large percentage of probes with negative 5hmCβ-values following subtraction (Fig. 4c), the brain samples with known high levels of 5hmC only demonstrated 9% of negative 5hmCβ-values on average (data not shown). We believe that samples with low 5hmC abundance processed by OxBS array analysis are more susceptible to this issue than if they are processed by TAB array, as OxBS array provides an indirect measurement of 5hmC while TAB array provides a direct measurement. Further work needs to be conducted to determine what exact global 5hmC threshold needs to be met to allow accurate quantification of 5hmC by the OxBS array. We recommend that the decision for use of TAB array or OxBS array should be made following global quantification of 5hmC levels in a sample. If the global amount of 5hmC is very low, we recommend the TAB array approach, as it can directly measure low levels of 5hmC present in a sample population. If the global amount of 5hmC is relatively high, then we recommend the OxBS approach, as reagent cost and sample processing time is limited in comparison to TAB array.

1.
Using the calculated β-values, define the direction of change for each modification or note if the change is not significant using the following criteria: