1 Introduction

The crucial role of the regulatory elements that comprise the non-coding genome has been demonstrated in development, disease, and evolution [1, 2]. Advances in genomics (e.g., the ability to sequence and assemble the genomes of many animal species) and techniques in molecular biology have now made it possible to explore the role of the regulatory genome in the process of whole-body regeneration. Previous techniques to characterize regulatory elements have relied either on species-specific reagents or on a large number of input cells, hindering the genome-wide identification of putative enhancers in emerging model systems. The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) [3], which is relatively simple at the bench and requires a small amount of input material, has the potential to revolutionize the fields of functional genomics and evolutionary-developmental biology by providing a method to identify putative enhancers at high resolution in emerging systems of study (Fig. 1).

Fig. 1

Overview of an ATAC-seq experiment to assay regeneration-responsive chromatin. ATAC-seq involves applying a transposase (left panel) capable of cutting open chromatin and simultaneously ligating in sequencing primers (“tagmentation”). The transposase enzyme integrates less in closed chromatin (WT) and preferentially inserts into open chromatin, e.g., a region that harbors an enhancer (blue) that opens during regeneration (“regen,” bottom left). The final library consists of small fragments of open chromatin that are ready to be sequenced. After alignment of these sequences to the genome (green lines, right panel), “peaks” of open chromatin (green) can be called and compared across regenerating samples (differential accessibility). Transcription factor (TF) binding can be inferred by viewing the number of transposase cutting events (# cuts) around TF binding sites. When a TF is bound, it occludes the transposase from inserting into that region and leaves a “footprint,” which can be compared across samples (“differential TF footprinting”).

ATAC-seq works by treating a small number of permeabilized cells or exposed nuclei with a transposase enzyme that preferentially accesses regions of open chromatin, simultaneously cutting DNA and inserting primers for sequencing (“tagmentation”) (Fig. 1). Following sequencing, reads mapped to the genome provide information on open chromatin, nucleosome position, and transcription factor binding. The main benefits of the assay are (1) no species-specific reagents are required; (2) low input, from 50,000 cells down to a few thousand; (3) reproducibility, in that replicates are highly concordant; and (4) speed, in that one can go from intact tissue to a sequencing-ready library in a single day.

Due to the experimental ease and high resolution of ATAC-seq, a number of methods papers have been published that describe the assay in detail. These include step-by-step instructions for cell lines [4], zebrafish [5, 6], echinoderms [7], Xenopus [8], and plants [9]. Recent advances to the protocol (“Omni-ATAC”) have improved the sensitivity of the assay and made it possible to perform in frozen tissues [10]. In addition to the wet-lab protocols for ATAC-seq, there are a number of methods papers that describe the bioinformatic data analysis portion of ATAC-seq [11, 12, 13]. The majority of the wet and dry lab portions of ATAC-seq are quite similar across organisms and do not deviate much from the original methods paper describing the assay [4]. The critical factor when performing ATAC-seq in a “new” species is attaining the correct number of cells for proper transposition. With this as the focus, here we describe step-by-step instructions for ATAC-seq in the acoel worm Hofstenia miamia. A defining step of this protocol is direct disruption of tissue in lysis buffer (as opposed to traditional dissociation and cell counting), followed immediately by transposition. This rapid processing of samples likely reduces background noise and better captures transcription factor binding as inferred by footprinting. We envision that this protocol will work robustly for all invertebrate animals that are generally easy to lyse or dissociate into single cells.

2 Materials

  1. Octylphenoxypolyethoxyethanol (IGEPAL CA-630) (Sigma cat # I8896).
  2. Tagment DNA enzyme 1 (TDE1).
  3. 2× Tagment DNA (TD) buffer (Illumina cat # 20034197) (see Note 1).
  4. Mini kit for gel extraction and PCR clean up (e.g., Nucleospin, Macherey-Nagel cat # 740609).
  5. High-fidelity 2× PCR master mix (New England Biolabs cat # M0541).
  6. PCR primers (Table 1).
  7. DNA concentration measurement equipment (e.g., Qubit, Thermo Fisher Scientific, cat # Q32851).
  8. Automated electrophoresis tool (e.g., Tapestation, Agilent).
  9. 40-μm cell strainer (Falcon cat # 352340).
  10. Lysis buffer: 10 mM Tris–HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% (v/v) IGEPAL CA-630. Prepare fresh, keep on ice.
  11. Transposition reaction mix (50 μL per sample): 25 μL TD buffer, 2.5 μL TDE1 enzyme, 22.5 μL ddH2O. Prepare fresh, keep on ice.
  12. PCR reaction mix (40 μL per sample): 25 μL high-fidelity 2× PCR mix, 2.5 μL 25 μM universal PCR primer 1, 2.5 μL 25 μM barcoded PCR primer 2, 10 μL H2O. Make fresh when performing the PCR amplification.
  13. Samtools software version 1.10 [14].
  14. Bowtie2 software version 2.3.2 [15].
  15. Picard software version 2.24.0.
  16. NGmerge software version 0.3 [16].

Table 1 Primer sequences used for PCR. Table reproduced from Supplementary Table 1 of [3]
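When several samples are processed together (see Note 4), the per-reaction recipes for the transposition and PCR mixes can be scaled into master mixes. The Python sketch below simply multiplies the per-reaction volumes listed above by the number of samples. The 10% pipetting overage and the choice to leave the barcoded primer 2 out of the PCR master mix (so that it can be added to each tube individually) are assumptions made here for illustration, not part of the protocol.

```python
# Per-reaction volumes in microliters, taken from Materials items 11 and 12.
TRANSPOSITION_MIX = {"2x TD buffer": 25.0, "TDE1 enzyme": 2.5, "ddH2O": 22.5}

# Assumption: the barcoded primer 2 (2.5 uL per reaction) is sample-specific,
# so it is left out of the shared mix and added to each tube separately.
PCR_MASTER_MIX = {
    "2x high-fidelity PCR mix": 25.0,
    "25 uM universal PCR primer 1": 2.5,
    "H2O": 10.0,
}
BARCODED_PRIMER_UL = 2.5


def scale_mix(per_reaction, n_samples, overage=0.10):
    """Return the total volume (uL) of each component for n_samples reactions.

    overage is an assumed pipetting allowance (10% by default).
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


if __name__ == "__main__":
    for name, vol in scale_mix(TRANSPOSITION_MIX, n_samples=4).items():
        print(f"{name}: {vol} uL")
```

Setting `overage=0.0` reproduces the exact per-sample volumes, which is a convenient check that the recipe was transcribed correctly.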

3 Methods

Care should be taken to move as quickly as possible from tissue extraction to dissociation to retain chromatin state at the appropriate timepoint. This protocol is based on the original ATAC-seq protocol [3]. Modifications that improve the assay have been described (“Omni-ATAC”) [10], but they use a detergent mixture that may be harmful to more sensitive cells. Thus, we suggest attempting the original protocol first and subsequently exploring the Omni-ATAC modifications to potentially improve the experiment.

3.1 Library Preparation

Before starting, determine the size of the tissue of interest that contains ~50,000–100,000 cells (see Note 2).

  1. Extract the desired tissue at the timepoint of interest using a sterile surgical blade (see Notes 3 and 4).
  2. Transfer the sample to a 1.5-mL tube filled with ~25 μL of appropriate solution (e.g., PBS, sea water).
  3. Replace the solution with 200 μL of cold lysis buffer.
  4. Dissociate the tissue by gently pipetting using a p200 pipette until the fragment is completely in solution (~30 s) (see Note 5).
  5. Filter the solution through a 40-μm filter into a new 1.5-mL tube.
  6. Centrifuge the solution at 800 rcf for 10 min at 4 °C to pellet the cells/nuclei.
  7. Gently remove the supernatant.
  8. Resuspend the (invisible) pellet in 50 μL of the transposition reaction mix.
  9. Incubate the cells at 37 °C for 30 min under 1000 rpm orbital shaking (e.g., thermomixer).
  10. Purify the transposed DNA using the gel extraction and PCR clean up mini kit (see Note 6) according to the manufacturer’s instructions.
  11. Elute in 12 μL of elution buffer.
  12. Store purified DNA at −20 °C.
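The calibration described at the start of this section (and in Note 2) comes down to a cell-count calculation. As a minimal sketch, assuming counts are taken on a standard hemocytometer (an assumption, since the protocol does not specify a counting method), the total number of cells in a dissociated tissue fragment can be estimated and compared with the 50,000–100,000 target as follows. The example counts are hypothetical.

```python
def cells_in_suspension(square_counts, dilution_factor, volume_ul):
    """Estimate the total number of cells in a suspension.

    Assumes a standard hemocytometer in which each large square
    (1 mm x 1 mm x 0.1 mm deep) holds 0.1 uL, so that
    cells/mL = mean count per square * dilution factor * 10,000.
    """
    if not square_counts:
        raise ValueError("at least one square count is required")
    mean_count = sum(square_counts) / len(square_counts)
    cells_per_ml = mean_count * dilution_factor * 1e4
    return cells_per_ml * volume_ul / 1000.0


def within_target(total_cells, low=50_000, high=100_000):
    """True if the estimate falls in the range recommended by the protocol."""
    return low <= total_cells <= high


if __name__ == "__main__":
    # Hypothetical counts from four large squares of an undiluted sample.
    total = cells_in_suspension([30, 34, 28, 32], dilution_factor=1, volume_ul=200)
    print(f"Estimated cells: {total:,.0f}; in target range: {within_target(total)}")
```

If the estimate falls outside the range, the size of the excised fragment can be adjusted and the count repeated before committing to the direct-lysis workflow.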

3.2 PCR Amplification of Library and Sequencing

  1. Add 10 μL of the eluted library to the 40 μL PCR reaction mix in a 0.2-mL PCR tube.
  2. Run PCR using the following conditions (see Note 7): 1 cycle of 5 min at 72 °C; 11 cycles of 10 s at 98 °C, 30 s at 63 °C, and 1 min at 72 °C; hold at 4 °C.
  3. Purify the amplified DNA using the gel extraction and PCR clean up mini kit according to the manufacturer’s instructions.
  4. Elute in 22 μL of elution buffer.
  5. Store purified DNA at −20 °C.
  6. Determine the concentration of the library using the DNA concentration measurement equipment according to the manufacturer’s instructions. We typically attain ~10–20 ng/μL, but concentration can range from ~1 to 30 ng/μL.
  7. Run purified DNA on the automated electrophoresis tool according to the manufacturer’s instructions (see Note 8, Fig. 2).
  8. Pool libraries according to the Illumina sequencing platform and the desired ratio of reads (see Note 9).
  9. Sequence using 50 bp paired-end reads on an Illumina platform at ~15 million mapped reads per Gb of genome (see Note 10).
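The depth guideline in the final step can be turned into a request for raw reads once the genome size and the expected mapping rate are known. The following Python sketch does this arithmetic. The genome size (1.0 Gb) and run output (400 million reads) in the example are hypothetical placeholders, the 80% mapping rate is the lower bound quoted for quality libraries in Subheading 3.3, and losses from duplicate and mitochondrial reads are not modeled.

```python
def raw_reads_needed(genome_size_gb, mapped_per_gb=15_000_000, mapping_rate=0.80):
    """Raw reads to request so that the mapped-read target is met.

    mapped_per_gb follows the ~15 million mapped reads per Gb guideline;
    mapping_rate is the fraction of reads expected to align.
    """
    if not 0 < mapping_rate <= 1:
        raise ValueError("mapping_rate must be in (0, 1]")
    target_mapped = genome_size_gb * mapped_per_gb
    return int(round(target_mapped / mapping_rate))


def libraries_per_run(run_output_reads, genome_size_gb, **kwargs):
    """Number of libraries that fit in one run at the requested depth."""
    return int(run_output_reads // raw_reads_needed(genome_size_gb, **kwargs))


if __name__ == "__main__":
    # Hypothetical 1.0 Gb genome and a run yielding 400 million reads.
    print(raw_reads_needed(1.0))
    print(libraries_per_run(400_000_000, 1.0))
```

Because duplicates and mitochondrial reads are removed later, it is prudent to treat the result as a minimum rather than an exact target.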

Fig. 2

Tapestation examples of ATAC libraries of different quality. ATAC libraries were run on an Agilent Tapestation 2200 using an HD5000 tape. The “ideal” trace shows extensive nucleosomal “laddering,” indicating a high-quality library. The “acceptable” trace also shows nucleosomal laddering, but with an extended sub-nucleosomal peak that may indicate slight “over-tagmentation” (over-cutting by the enzyme, likely due to too few cells in the reaction). If no “ideal” libraries are present, this library is acceptable to sequence. The “overtagmented” trace shows no laddering and only a single peak, likely due to too few cells being added to the reaction. This library should not be sequenced, and the experiment should be repeated with more input cells. The “undertagmented” trace (insufficient cutting by the enzyme) shows no nucleosomal laddering but also no clear sub-nucleosomal peak, which could be the result of too many cells in the reaction. This library should not be sequenced, and the experiment should be repeated with fewer cells.
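The same laddering can be checked after sequencing from the fragment lengths of properly paired alignments (the absolute value of the template length field of the alignment file). The Python sketch below bins fragment lengths into nucleosomal classes. The bin boundaries are approximate, illustrative assumptions rather than values from this protocol, and sequenced fragment lengths exclude the adapters that contribute to the sizes seen on a Tapestation trace.

```python
from collections import Counter

# Assumed, approximate size classes in bp (lower bound inclusive, upper exclusive).
SIZE_CLASSES = [
    (0, 150, "sub-nucleosomal"),
    (150, 300, "mono-nucleosomal"),
    (300, 500, "di-nucleosomal"),
]
LARGEST_CLASS = "tri-nucleosomal or larger"


def classify_fragment(length_bp):
    """Assign a fragment length to one of the assumed size classes."""
    for low, high, label in SIZE_CLASSES:
        if low <= length_bp < high:
            return label
    return LARGEST_CLASS


def ladder_summary(lengths):
    """Return the fraction of fragments falling in each size class."""
    counts = Counter(classify_fragment(n) for n in lengths)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical fragment lengths for illustration only.
    example = [60, 80, 120, 200, 210, 380, 600, 90]
    for label, fraction in ladder_summary(example).items():
        print(f"{label}: {fraction:.2f}")
```

A library with clear laddering would be expected to populate several classes, whereas an over-tagmented library would be dominated by the sub-nucleosomal class.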

3.3 Data Analysis

Raw reads should be backed up in at least two separate locations, ideally one physical and one cloud- or server-based. The following steps are designed to guide the user from raw reads to a processed alignment file, which is the most common input file for most downstream applications. Further example code and details for read processing and other applications (including differential peak analysis, see Note 11) can be found at https://github.com/agehrke6/ATAC_processing_analysis_guide. Note that the example code given below is designed as a starting point for the beginner user, and the manuals for each bioinformatic tool should be consulted for full explanation and detail.

  1. Trim raw reads of adapters using NGmerge: NGmerge -a -e 20 -n 4 -1 <sample>.R1.fastq.gz -2 <sample>.R2.fastq.gz -o <sample>_trimmed. This command will output two files: <sample>_trimmed.R1.fastq.gz and <sample>_trimmed.R2.fastq.gz.
  2. Index the genome of interest with Bowtie2: bowtie2-build <genome>.fasta <build_name>.
  3. Map trimmed reads to the reference genome with Bowtie2: bowtie2 -x <build_name> -X 2000 -1 <sample>_trimmed.R1.fastq.gz -2 <sample>_trimmed.R2.fastq.gz -p 31 | samtools view -b - | samtools sort -o <sample>.bam - (the trailing dash tells samtools to read from standard input). This command will create a coordinate-sorted alignment file (<sample>.bam). Set -p to the number of threads available. Bowtie2 reports the overall alignment rate when it finishes; quality libraries have a high percentage of mapped reads (>80%).
  4. Index the .bam file: samtools index <sample>.bam.
  5. Remove PCR duplicates from the alignment (.bam) file using Picard: java -jar picard.jar MarkDuplicates I=<sample>.bam O=<sample>_nodups.bam M=<sample>_dups.txt REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=LENIENT.
  6. Remove reads mapping to the mitochondrial genome from the de-duplicated alignment (.bam) file using samtools: samtools view -h <sample>_nodups.bam | grep -v chrM | samtools sort -O bam -o <sample>_nodups_noMt.bam. Note that chrM should be changed to the designation of the mitochondrial genome in the input assembly (e.g., scaffoldX).
  7. Remove multi-mapped reads using samtools (see Note 12): samtools view -h -b -q 30 <sample>_nodups_noMt.bam > <sample>_nodups_noMt_nomulti.bam. The -q 30 option retains only alignments with a mapping quality of at least 30, and -b is required so that the output is written in .bam rather than .sam format.
  8. Retain only properly paired reads using samtools (see Note 12): samtools view -h -b -F 1804 -f 2 <sample>_nodups_noMt_nomulti.bam > <sample>_nodups_noMt_nomulti_filtered.bam.
  9. Sort the .bam file: samtools sort <sample>_nodups_noMt_nomulti_filtered.bam -o <sample>_nodups_noMt_nomulti_filtered_sorted.bam.
  10. Index the final file using samtools: samtools index <sample>_nodups_noMt_nomulti_filtered_sorted.bam.
  11. Use the “clean” .bam file for peak calling and downstream analysis (see Note 12) (Fig. 3).
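The numeric values passed to -F and -f in the properly-paired filtering step are bitwise combinations of the flags defined in the SAM format specification. As an illustration, the Python sketch below decodes a flag value into its component bits and mimics the keep/discard decision made by samtools view -f 2 -F 1804. The helper names are hypothetical and exist only for this example.

```python
# Flag bits as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(value):
    """List the names of all flag bits set in value."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def passes_filter(read_flag, require=2, exclude=1804):
    """Mimic samtools view -f <require> -F <exclude>.

    A read is kept only if every bit in require is set and no bit in
    exclude is set.
    """
    return (read_flag & require) == require and (read_flag & exclude) == 0


if __name__ == "__main__":
    print(decode_flag(1804))
    print(passes_filter(99))    # paired, proper pair, mate reverse, first in pair
    print(passes_filter(1123))  # the same read, but marked as a duplicate
```

Decoding 1804 shows that the filter discards unmapped reads, reads with an unmapped mate, secondary alignments, reads failing quality checks, and duplicates, while -f 2 keeps only reads mapped in a proper pair.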

Fig. 3

Standard bioinformatic workflow for processing and analyzing ATAC-seq data. Adapters are trimmed from raw reads, which are then aligned to the genome of interest with no additional quality trimming. Duplicate reads are removed, as are reads that map to the mitochondrial genome and reads that are not properly paired. We use Genrich to call peaks on each biological replicate, and then use IDR to call reproducible peaks between replicates. Finally, we use bedtools merge on the peaksets from all samples to create a non-overlapping set of peaks that represents a consensus peakset. For analysis, we use Diffbind to call differentially accessible peaks between samples, ChIPseeker to make peak-to-gene connections, and TOBIAS for footprinting and bound site calling. Results are best visualized first using IGV, and then with pyGenomeTracks to create publication-quality figures.
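To make the consensus-peakset step concrete, the following Python sketch illustrates the interval-merging logic that bedtools merge applies by default, in which overlapping and directly adjacent (book-ended) intervals on the same sequence are collapsed into one. It is a teaching example with hypothetical coordinates, not a replacement for bedtools, and it assumes BED-style (chromosome, start, end) tuples.

```python
def merge_peaks(peaks):
    """Merge overlapping or book-ended intervals into a consensus set.

    peaks is an iterable of (chrom, start, end) tuples in BED-style
    coordinates. The result is sorted by chromosome name and start.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical peaks pooled from two samples.
    pooled = [
        ("scaffold1", 100, 250),
        ("scaffold1", 200, 400),
        ("scaffold1", 400, 450),
        ("scaffold2", 50, 80),
        ("scaffold1", 900, 950),
    ]
    for peak in merge_peaks(pooled):
        print(*peak, sep="\t")
```

Sorting before merging matters: like this sketch, bedtools merge expects its input to be sorted by chromosome and start position.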

4 Notes

  1. Alternatively, a homemade tagment buffer can be made: 20 mM Tris(hydroxymethyl)aminomethane; 10 mM MgCl2; 20% (v/v) dimethylformamide [17]. Adjust pH to 7.6 with 100% acetic acid before adding dimethylformamide. Store at −20 °C for up to 6 months.
  2. Determining the optimal number of cells is the most crucial and variable aspect of the ATAC-seq protocol. As a first step, we recommend excising different tissue sizes, dissociating cells, and then counting to determine the size that most likely contains 50,000–100,000 cells. Once the appropriate general tissue size is identified, subsequent ATAC-seq experiments can be run without having to take the time-sensitive step of preparing and counting cells. It is likely that the immediate lysis of tissue described in this protocol is key to producing high-quality data by capturing chromatin state as quickly as possible. If the “direct lysis” method continues to give substandard libraries, we recommend the standard protocol of attaining a single-cell suspension of live cells, counting and attaining ~50,000 cells, then proceeding with the remainder of the protocol using 50 μL volumes of reagents.
  3. Design experiments so multiple samples can be processed at the same time, e.g., a 0 hour post amputation (hpa) sample can be processed alongside a 6 hpa sample that was cut 6 h prior (Fig. 4).
  4. We aim to include three biological replicates in each ATAC experiment, eventually choosing the best two samples to sequence based on the Tapestation trace. Due to the speed of the ATAC-seq protocol and depending on the timepoints desired, multiple samples of a regeneration time-course can be completed in a single day. We typically cut the appropriate number of animals at the beginning of the day, and process samples at the different timepoints of regeneration (e.g., 0 hpa, 1 hpa, 3 hpa, 6 hpa). Avoid processing more than ~4 samples at a time to ensure the speed of the assay.
  5. Due to the delicate nature of Hofstenia miamia tissue, we are able to attain single-cell suspensions in less than 30 s of gentle pipetting, and the cells are simultaneously made accessible to the transposase by performing this step in lysis buffer. Depending on the organism or tissue being used, attaining a single-cell suspension or lysis may be more challenging, and thus we suggest using species-specific protocols to attain single-cell suspensions if necessary.
  6. A variety of DNA purification kits are acceptable, including the Qiagen MinElute kit. After adding the buffer in the first step of the cleanup protocol, the solution can be frozen at −20 °C for purification at another time.
  7. In order to determine the correct number of PCR cycles to avoid saturation and PCR-induced artifacts, it is advisable to run the first PCR for five cycles and then subsequently perform a qPCR reaction. This protocol is provided in detail in [4]. We found that our libraries nearly always converged on 11 cycles as the optimal number, and thus adopted this number as part of the protocol. When starting ATAC-seq with a new system, the optimal number of PCR cycles should be explored using the qPCR instructions found in [4].
  8. The automated electrophoresis tool is used (in this case) to view the sizes of nucleic acids present in an ATAC-seq library. In a properly tagmented library, the transposase enzyme will insert into histone linker regions, creating a “nucleosomal ladder”: a rolling landscape in which peaks correspond to fragments spanning varying numbers of nucleosomes. A proper nucleosomal ladder is the best indicator of a successful ATAC-seq experiment, and examples of the most common traces of varying quality are shown in Fig. 2.
  9. Work with the sequencing facility or an Illumina representative to pool different barcoded ATAC libraries at the proper molar ratios for the desired model of sequencer. If desired, molar ratios can be adjusted to direct a larger share of reads to a library of particular interest.
  10. The ENCODE standard for acceptable libraries states that human paired-end ATAC-seq libraries must have 50 million non-duplicate, non-mitochondrial aligned reads (i.e., 25 million fragments). For footprinting, recommended sequencing depths are much higher (>200 million mapped reads). We have found that emerging footprinting software [18] can reliably detect footprints at substantially lower sequencing depth, although this needs to be empirically determined for the species of interest.
  11. With a clean alignment file in hand, a number of useful downstream applications can be performed. These include peak-calling with programs such as Genrich or MACS2, which will call sets of peaks for each replicate that can be narrowed down to reproducible peaks. In order to identify dynamic regions of chromatin (e.g., between different stages of regeneration), two common tools are Diffbind [19] and csaw [20]. Running motif-finding algorithms such as HOMER [21] or GimmeMotifs [22] on significant peaks, or assigning regions to the nearest gene using ChIPseeker [23], can be useful ways to define candidate regulators. In addition to scanning regions for motif matches, regions of the genome that are protected by bound TFs will leave “footprints” that can be determined bioinformatically (Fig. 1). We use TOBIAS [18] to make bound/unbound calls for all predicted sites of a particular TF in the genome, as well as to create aggregate plots that show overall binding differences between timepoints. See Fig. 4 for a genome browser example of these types of data combined.
  12. Depending on downstream analysis, removing multi-mapped reads and retaining only properly paired reads may be unnecessary or detrimental (too few reads retained). For instance, peak-calling with Genrich can utilize both multi-mapped and unpaired alignments in generating peak calls. We recommend checking the documentation of software for the desired downstream application to determine how best to handle these reads.
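The molar pooling mentioned in Note 9 relies on converting each library's mass concentration and mean fragment size into a molar concentration. The Python sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The sample names, concentrations, fragment sizes, and the 50 fmol unit share are hypothetical values for illustration; the sequencing facility's requirements take precedence.

```python
def molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration to nM.

    Uses the approximation of 660 g/mol per base pair:
    nM = (ng/uL * 1e6) / (660 * mean fragment length in bp).
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def pooling_volumes(libraries, fmol_per_share=50.0):
    """Volume (uL) of each library to add to the pool.

    libraries maps a name to (ng/uL, mean fragment bp, relative share).
    A share of 2.0 contributes twice the molar amount of a share of 1.0.
    Since 1 nM equals 1 fmol/uL, volume = fmol wanted / nM.
    """
    volumes = {}
    for name, (conc, size_bp, share) in libraries.items():
        volumes[name] = round(fmol_per_share * share / molarity_nm(conc, size_bp), 2)
    return volumes


if __name__ == "__main__":
    # Hypothetical libraries; the 6 hpa library is given a double share of reads.
    example = {
        "0hpa_rep1": (13.2, 500, 1.0),
        "6hpa_rep1": (6.6, 500, 2.0),
    }
    for name, volume in pooling_volumes(example).items():
        print(f"{name}: {volume} uL")
```

The mean fragment size can be read from the automated electrophoresis trace and the concentration from the fluorometric measurement, both of which are already collected in Subheading 3.2.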

Fig. 4

Expected results. (a) A ~15 kb region of the Hofstenia genome that encompasses a gene, runt, showing ATAC data for 0 h and 6 h, along with the consensus peakset, Diffbind-called regeneration peaks, and sites that are called as bound by the TF egr at 6 h by TOBIAS. There are two major regeneration-responsive peaks (shown in red), the first being the promoter, and the second in an intron. Both regeneration peaks have “bound” sites for egr. (b) A second region of the Hofstenia genome, this time showing an intergenic regeneration-responsive peak upstream of a putative target gene.