SNP-ChIP: a versatile and tag-free method to quantify changes in protein binding across the genome
Chromatin-immunoprecipitation followed by sequencing (ChIP-seq) is the method of choice for mapping genome-wide binding of chromatin-associated factors. However, broadly applicable methods for between-sample comparisons are lacking.
Here, we introduce SNP-ChIP, a method that leverages small-scale intra-species polymorphisms, mainly SNPs, for quantitative spike-in normalization of ChIP-seq results. Sourcing spike-in material from the same species ensures antibody cross-reactivity and physiological coherence, thereby eliminating two central limitations of traditional spike-in approaches. We show that SNP-ChIP is robust to changes in sequencing depth and spike-in proportions, and reliably identifies changes in overall protein levels, irrespective of changes in binding distribution. Application of SNP-ChIP to test cases from budding yeast meiosis allowed discovery of novel regulators of the chromosomal protein Red1 and quantitative analysis of the DNA-damage associated histone modification γ-H2AX.
SNP-ChIP is fully compatible with the intra-species diversity of humans and most model organisms and thus offers a general method for normalizing ChIP-seq results.
Keywords: Chromatin immunoprecipitation, ChIP-seq, Spike-in, Normalization, Chromosomal proteins, Post-translational modification, Meiosis, S. cerevisiae
BYTA: Buffered yeast extract/tryptone/acetate
ChIP-qPCR: ChIP followed by quantitative PCR
ChIP-seq: ChIP followed by next-generation DNA sequencing
DSB: DNA double-strand break
kb: kilo base pairs
O.D.600: Optical density at 600 nm
PCR: Polymerase chain reaction
RNA-seq: Next-generation DNA sequencing of RNA
S.E.M.: Standard error of the mean
SPMR: Sequence (fragment) pileup per million reads
γ-H2AX: Histone 2A phosphorylated on serine 129
Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) has emerged as the method of choice for mapping the genome-wide distribution of proteins and protein modifications and has led to important discoveries in both basic chromatin biology and disease states [1, 2]. A core result of ChIP-seq experiments is the generation of genome-wide target signal tracks, which are obtained from read pileups, typically normalized against a mock, non-immunoprecipitated control sample (input sample). Signal tracks are used for identification of regions with elevated numbers of mapped reads (peaks) as well as other downstream analyses. However, because of the necessary internal normalization procedures, signal tracks can only be used for comparisons between samples if a method for inter-sample normalization is available. This is a crucial, often overlooked caveat of ChIP-seq, as well as of other genome-wide biochemical analysis methods relying on next-generation sequencing.
For sparsely bound proteins, such as transcription factors, inter-sample normalization can often be achieved using statistical methods or ChIP followed by real-time quantitative PCR (ChIP-qPCR). These methods, however, assume either a constant global signal or a constant signal at selected genes as the basis for normalization, which is difficult to verify, in particular for more broadly distributed factors. One way to overcome this limitation is the addition of a “spike-in” reference sample [2, 6]. The spike-in procedure consists of adding a constant amount of exogenous material to all tested samples, ideally prior to any critical steps in the experimental protocol. Provided that the spike-in material contains a target that is bound by the antibody as efficiently as the study target and that the resulting sequencing reads can be distinguished from the test sample, the number of spike-in reads should be the same across all tested samples. The spike-in thus functions as an internal control against which to normalize the ChIP-seq results. Spike-ins are well established for RNA-seq analyses, where use of RNA from a different species allows simple sequence-based distinction between test sample and spike-in. The additional requirement for cross-reactivity of the antibody in ChIP-seq experiments, however, effectively restricts the applicability of inter-species spike-ins to a limited set of highly conserved proteins. For example, one previous work targeted subunits of RNA polymerases II and III in mouse chromatin and spiked with human chromatin. To ensure cross-reactivity, both antibodies were raised against peptides that are 100% conserved between mouse and human. Another study successfully measured global changes in post-translational demethylation of lysine 79 of histone H3 in human cells, using a Drosophila melanogaster cell spike-in.
Ways to broaden the applicability of ChIP spike-ins include either tagging proteins in the test and spike-in samples with a common epitope, or using a second, spike-in specific antibody against a natural or a synthetic target. These strategies, however, come with their own specific drawbacks. The use of protein tagging adds the potential for prohibitive disruption of protein function and is incompatible with the analysis of protein modifications. The use of a second, spike-in specific antibody, on the other hand, requires labor-intensive technical validation of the compatibility of the second antibody and no longer controls for biases in the immunoprecipitation step between samples.
Here, we show that these issues can largely be overcome by using spike-in material from the same species. This approach, which we name SNP-ChIP, enables reproducible semi-quantitative measurement of global protein levels and also works for protein modifications and fast evolving proteins.
Experimental rationale of SNP-ChIP
SNP-ChIP of a rapidly evolving chromosomal protein
To test the utility of intra-species spike-ins, we turned to chromatin analyses in yeast. We specifically focused on chromosomes in meiosis because this process involves many broadly distributed chromosomal proteins and post-translational modifications. One typical example is the axial-element protein Red1, which plays important roles in meiotic recombination. Red1 is broadly bound along chromosomes [13, 14, 15] but, like other meiotic factors, its sequence has diverged even in closely related species. Furthermore, like many proteins, Red1 cannot easily be tagged without disrupting protein function [15, 17]. These attributes mean Red1 is not amenable to standard spike-in approaches, making it a particularly suitable target for SNP-ChIP. Moreover, mutations that change the overall levels and chromosomal distribution of Red1 are available [15, 18, 19], providing benchmarks for evaluating the efficacy of SNP-ChIP.
SNP-ChIP of Red1 was performed using the SK1 genetic background as test strain and a meiosis-optimized variant of the widely used S288c reference strain as spike-in [21, 22]. For both genetic backgrounds, high-quality end-to-end genome assemblies are available. These assemblies differ by about 76,000 SNPs, spaced at an overall median distance of 70 bp (Additional file 1: Figure S1a) consistently across all chromosomes (Additional file 1: Figure S1b), which constitutes enough variation to allow unambiguous assignment of a large proportion of sequencing reads. To perform SNP-ChIP, test cells (SK1) were mixed with a constant fraction of meiotic spike-in cells (S288c) before subjecting the mixtures to a standard ChIP-seq protocol. The generated reads were aligned to a hybrid genome built by concatenating the genome assemblies of the test and spike-in strains. Reads were aligned under perfect-match conditions, excluding any reads aligning to more than one location. Consequently, any read overlapping at least one SNP was assigned to a specific genome and genomic location, while reads not overlapping a polymorphism mapped to both genomes and were thus discarded.
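The read-assignment logic described above can be illustrated with a toy example. This is only a sketch under simplifying assumptions, not the alignment procedure used in the study: the sequences, the names `find_all` and `assign_read`, and the two-entry hybrid genome are all hypothetical, and reverse-complement matches are ignored for brevity.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based start at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


def assign_read(read: str, hybrid: dict[str, str]) -> str:
    """Assign a read to one genome only if it has a single perfect match
    anywhere in the hybrid genome (forward strand only in this sketch)."""
    hits = [(name, pos)
            for name, seq in hybrid.items()
            for pos in find_all(seq, read)]
    if not hits:
        return "unaligned"
    if len(hits) > 1:
        # Matches both genomes (no SNP covered) or a repeated region: discard.
        return "ambiguous"
    return hits[0][0]


# Hypothetical hybrid genome: the two sequences differ by one SNP (index 8).
hybrid = {
    "test_SK1":      "ACGTTGCAGTCCATGA",
    "spikein_S288c": "ACGTTGCAATCCATGA",
}

for read in ("GCAGTCC", "GCAATCC", "ACGTTGC"):
    print(read, assign_read(read, hybrid))
```

The first two reads span the SNP and are each assigned to one genome; the third lies in identical sequence, matches both genomes, and is therefore discarded.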
We initially investigated the ability of SNP-ChIP to detect changes in chromatin association resulting from reduced protein production. The red1ycs4S allele is caused by a mutation in the promoter of RED1 that leads to a reduction of Red1 levels to about 20–25% of wild type and a near complete loss of cytologically observable axial elements. Importantly, traditional ChIP-seq analysis was unable to detect this change in protein abundance and produced indistinguishable Red1 profiles between wild type and red1ycs4S mutants. By contrast, when we applied SNP-ChIP to compare these two strains, the reduced Red1 binding levels were readily apparent (Fig. 1b). Calculation of a spike-in normalization factor based on the relative abundance of total sample and spike-in reads yielded a Red1 level in the red1ycs4S mutant of 28.8 ± 5.1% (S.D.) of the wild type, closely matching the reported change in Red1 levels obtained from western analysis. This normalization factor allowed appropriate signal scaling of ChIP-seq profiles for the two conditions (Fig. 1c).
SNP-ChIP was further validated by applying it to a Red1 dosage series, which consists of different combinations of RED1 alleles (RED1, red1ycs4S, red1Δ) yielding a stepwise decrease in Red1 levels (Fig. 1d). SNP-ChIP measurements of Red1 chromatin association in this series again closely matched previously published protein levels (Fig. 1d). In fact, SNP-ChIP measurements appeared more accurate than quantitative western analysis, which failed to resolve the expected reduction in protein levels between RED1/red1ycs4S and RED1/red1Δ cells. Taken together, these data show that SNP-ChIP accurately measures reductions in global Red1 binding over a wide range of target protein levels.
SNP-ChIP is robust to variation in sequencing depth and fraction of spike-in cells
Another condition that may affect the results of SNP-ChIP is the amount of spike-in material added to the samples. Spike-in normalization methods assume a linear relationship between the amount of spike-in material and the resulting proportion of spike-in reads in the immunoprecipitated sample. This condition is essential for the results to be independent of the amount of spike-in material. To verify this assumption, we prepared samples with spike-in cell proportions ranging from 5 to 30%. As test samples we used wild type and a strain with a single red1-pG162A promoter mutation that phenocopies the red1ycs4S allele. While red1ycs4S contains an introgressed genomic region with dozens of SNPs surrounding the RED1 locus, the red1-pG162A mutant was engineered to carry only the specific mutation responsible for the reduction in Red1 levels. As shown in Fig. 2c, the proportion of spike-in reads in the input samples (reflecting the amount of spike-in material added to the test sample) correlates linearly with the resulting proportion of spike-in reads in the immunoprecipitated sample, for both the wild type and the red1-pG162A sample. Moreover, the red1-pG162A sample yielded a very similar Red1 amount to the red1ycs4S allele (28.8% versus 28.1% of wild type, respectively, when using 20% of spike-in cells), further supporting the robustness of the method. Low spike-in cell percentages (5 and 10%) resulted in somewhat increased estimates of the Red1 amount (Fig. 2d), likely due to increased noise. These results suggest that spike-in material proportions of 15% and higher are appropriate for SNP-ChIP. All other experiments shown here used a spike-in proportion of 20%.
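A linearity check of this kind can be sketched as an ordinary least-squares fit of the spike-in read fraction in the ChIP sample against the fraction in the input sample. The function name `fit_line` is hypothetical, and the fractions below are invented for illustration only; they are not measurements from this study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented example values: spike-in read fractions in input and ChIP samples.
input_fraction = [0.05, 0.10, 0.15, 0.20, 0.30]
chip_fraction = [0.09, 0.17, 0.24, 0.33, 0.48]

slope, intercept, r_squared = fit_line(input_fraction, chip_fraction)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

An R² close to 1 across the tested range would support the linearity assumption; systematic deviations at low spike-in fractions would show up as larger residuals at those points.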
Finally, we investigated the impact of the method used to compute the spike-in normalization factor. The SNP-ChIP normalization factor calculated in the examples shown so far relies on total read counts aligned to the test and the spike-in genomes. An alternative method is to compute the scalar mean value of the aligned read pileup score. We tested the utility of this alternative by calculating the pileup score at (1) all genomic positions, (2) SNP positions only, or (3) SNP positions falling within called signal peaks (see Methods section). The last approach effectively excludes regions expected to hold only background signal, along with any false negative regions. We found very similar values and high concordance between all four methods in all cases (Additional file 2: Figure S2a), although read pileups consistently produce slightly lower values than the read count method (Additional file 2: Figure S2b). Overall, however, the difference is relatively small and we believe the read count-based method, which is computationally much simpler, represents an appropriate approximation, at least for broadly distributed proteins.
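The three pileup-based variants can be illustrated with a small sketch. The function `mean_pileup`, the half-open peak intervals, and the toy values are assumptions made for illustration; the analysis code actually used is available in the repository cited under Availability of data and materials.

```python
def mean_pileup(pileup, snp_positions=None, peaks=None):
    """Mean pileup score over a chosen set of positions.

    pileup:        per-base pileup scores for one genome (list of numbers)
    snp_positions: optional iterable of 0-based SNP positions; if given,
                   only these positions are considered
    peaks:         optional list of (start, end) half-open intervals; if
                   given, only positions inside a peak are considered
    """
    if snp_positions is None:
        positions = list(range(len(pileup)))
    else:
        positions = sorted(snp_positions)
    if peaks is not None:
        positions = [p for p in positions
                     if any(start <= p < end for start, end in peaks)]
    if not positions:
        raise ValueError("no positions left after filtering")
    return sum(pileup[p] for p in positions) / len(positions)


# Toy data (hypothetical): one short pileup track, three SNPs, one peak.
pileup = [1, 1, 4, 8, 8, 4, 1, 1, 1, 1]
snps = [1, 3, 8]
peaks = [(2, 6)]

print(mean_pileup(pileup))                       # (1) all positions
print(mean_pileup(pileup, snps))                 # (2) SNP positions only
print(mean_pileup(pileup, snps, peaks))          # (3) SNPs inside peaks
```

In a full analysis, the mean scores for the test and spike-in genomes would take the place of the total read counts when computing the normalization factor.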
Binding profiles obtained directly from SNP-ChIP experiments
The primary utility of SNP-ChIP is the generation of a normalization factor that allows scaling of profiles obtained by traditional ChIP-seq experiments run under the same conditions (Fig. 1c). Given the broad distribution of SNPs across the two analyzed genomes, we explored the possibility that SNP-ChIP could also directly yield informative binding profiles, even though this application is clearly limited by the available SNP density. Comparing a sample sequenced with spike-in to data obtained using a replicate, non-spiked sample shows that signal tracks of spiked samples closely mirror those of the non-spiked control, although some signal gaps can be seen in the spiked sample (Additional file 3: Figure S3a; examples indicated by the red arrows). Thus, as expected, the use of same-species spike-in causes some loss of information. This issue appears negligible for broad peaks, as called peaks show a very close agreement (Additional file 3: Figure S3b). Narrow peaks show more disagreement, with only about one third of the called peaks overlapping between the two samples. These data indicate that SNP-ChIP can also provide direct information about protein distribution, in particular for larger-scale binding patterns.
Global Red1 levels are reduced in cohesin and hop1∆ mutants
Hop1 is another important protein of the yeast axial element that physically interacts with Red1. Axial-element proteins are recruited in higher amounts to small chromosomes, but in the absence of Hop1, Red1 binding becomes less dependent on chromosome size. Previous work using in silico scaling suggested that this reduction resulted from a selective increase in Red1 recruitment to large chromosomes. That scaling approach, however, requires the definition of genomic regions that are unbound, which is difficult to ascertain with broadly distributed chromosomal proteins like Red1. Therefore, we reinvestigated this question by performing SNP-ChIP of Red1 in a hop1Δ mutant. SNP-ChIP reproduced the previously found weakening of chromosome-size bias. However, the spike-in normalization factor showed an overall decrease of Red1 recruitment to 71.9 ± 4.2% of the wild type Red1 amount (Fig. 3c, d). This decrease is stronger on small chromosomes (Fig. 3e). We note that mild loss of Red1 binding does not generally result in a loss of chromosome-size bias, because deletion of the histone methyltransferases Set1 and Dot1 causes similar ~20% reductions of overall Red1 recruitment levels but does not affect the distribution of Red1 binding among chromosomes (Additional file 4: Figure S4). These data suggest that loss of Hop1 leads to a general reduction of Red1 signal across all chromosomes that particularly affects the three smallest chromosomes.
γ-H2AX levels do not change in Red1 dosage strain series
To test if SNP-ChIP also allows quantitative analyses of protein modifications, we targeted phosphorylation of histone H2A on serine 129 (γ-H2AX). This modification is rapidly induced following the formation of DNA double-strand breaks (DSBs). In mitotic yeast, the γ-H2AX modification spreads about 50 kb on either side of a DSB [27, 28]. In addition, constitutive γ-H2AX is found near telomeres throughout the cell cycle. To analyze the distribution and DSB dependence of γ-H2AX in meiosis, we performed SNP-ChIP in a wild type strain, as well as the Red1 dosage series, which shows a mild (up to 30%) reduction in DSB levels, and a spo11-Y135F mutant, encoding a catalytically dead Spo11 protein, which does not form meiotic DSBs [30, 31].
Our data show that small-scale intra-species genetic polymorphisms can be leveraged for quantitative spike-in normalization of ChIP-seq results. Sourcing spike-in material from the same species largely preserves antibody cross-reactivity and thus will work with virtually any target in an organism’s proteome without the need for epitope tagging. It also ensures complete physiological coherence between the test and the spike-in cells, thereby avoiding biases at experimental steps such as chromatin fixation or cell lysis.
The primary output of SNP-ChIP is a normalization factor that can be used to appropriately scale ChIP-seq profiles. Because the normalization factor relies on combined measurements of thousands of SNPs, it is highly robust to variations in sequencing depth or changes in protein distribution between samples. In multiplexed libraries, SNP-ChIP can therefore be performed with relatively low sequencing coverage alongside traditional ChIP-seq experiments to yield the necessary scaling information.
SNP-ChIP can also provide substantial positional information, although this application is necessarily limited by the availability of high-confidence SNPs. Our experiments using yeast strains with ~ 0.7% sequence divergence and 100-nt long reads showed that the method generated sufficient resolution to recover genomic regions of Red1 enrichment. Moreover, preliminary experiments indicate that using longer reads further minimizes gaps (data not shown). Thus, SNP-ChIP can provide high-quality pilot information for subsequent ChIP-seq analyses at higher read depth.
The reliance on thousands of SNPs also means that SNP-ChIP will be particularly powerful for the quantitative analysis of broadly distributed proteins and chromatin marks. Applying SNP-ChIP to proteins that interact with chromatin in more specific, highly localized positions (e.g. transcription factors), will likely result in a disproportionate number of SNPs exhibiting background signal that will affect the calculation of the normalization factor. Indeed, preliminary experiments testing the budding yeast transcription factor Gal4 suggested that SNP-ChIP is not ready to handle such targets. While SNP-ChIP generated reliable signal track data, the normalization factor computation method does not work as-is and failed to detect differences in overall Gal4 binding (data not shown). SNP-ChIP would thus require further development to be usable with sparsely binding proteins. We note, however, that these are inherently more tractable targets for ChIP-qPCR, thus reducing the need for a spike-in method.
We conclude that SNP-ChIP provides a versatile method for normalizing the ChIP-seq results of broadly distributed chromosomal proteins and post-translational modifications. SNP-ChIP is fully compatible with the intra-species genetic diversity of humans and most model organisms and should be applicable to any experimental system for which a reliable collection of high-quality SNPs is available. In preliminary in silico experiments testing decreasing numbers of SNPs, the method generated stable normalization factors with as little as 0.01% sequence divergence (equivalent to about 1200 SNPs in the yeast genome; data not shown). Thus, we expect that SNP-ChIP will allow semi-quantitative mapping of a wide range of chromatin binding factors and modifications that have so far stood beyond the reach of quantitative ChIP-seq methods.
Best practice for SNP-ChIP
The chief prerequisite for successful SNP-ChIP normalization is the availability of high-quality genome assemblies for two different strains or cell lines of the same species, as well as a ChIP-grade antibody against the ChIP target. For optimal results, we recommend using a minimum of 15% of spike-in material and at least 100-bp sequencing read length. Using longer reads will increase the proportion of assigned reads and minimize signal gaps. In general, SNP-ChIP should be used alongside traditional ChIP-seq experiments. This setup retains the maximal spatial resolution provided by ChIP-seq while providing the necessary scaling factor for quantitative comparisons between samples. Relatively shallow sequencing coverage of the SNP-ChIP sample is sufficient for this purpose. In addition, SNP-ChIP also serves as a stand-alone method for exploratory purposes that do not require < 100 bp resolution.
Strains and meiotic time courses
All strains used are listed in Additional file 5: Table S1. The test-sample strains were of the SK1 background. The spike-in material used a meiosis-optimized S288c strain that carries three SK1-derived SNPs, which improve sporulation efficiency and meiotic synchrony of S288c. To further improve synchrony of the spike-in strain, auxotrophic markers were restored using plasmid insertions or PCR-based allele transfer. To induce meiosis, cells were pregrown in YPD for 24 h at room temperature, followed by inoculation in BYTA media at O.D.600 = 0.3 and growth for 16.5 h at 30 °C. Cells were washed twice with water and inoculated at O.D.600 = 1.9 in 0.3% potassium acetate (pH 7.0) to induce meiotic entry. Synchronous entry was confirmed by taking hourly samples for flow cytometry analysis of DNA content.
SNP-ChIP sample preparation
Samples were collected at 3 h for SK1 strains or at 6 h for the slower sporulating S288c spike-in sample. Cells were fixed in 1% formaldehyde for 30 min at room temperature and quenched by addition of glycine to a final concentration of 125 mM. For the experiments shown here, we fixed the spike-in cells in advance as a batch and kept frozen aliquots at − 80 °C. However, spike-in cells can also be prepared simultaneously with the sample cells. The number of cells in each sample was determined by counting on a hemocytometer. Unless indicated otherwise, cells from the test sample (SK1) were mixed with cells from the spike-in sample (S288c) at a ratio of 80%:20% before cell lysis and ChIP.
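The mixing step reduces to simple arithmetic on the hemocytometer counts. The sketch below shows one way to compute the culture volumes to combine; the function name `mixing_volumes`, the cell concentrations, and the total cell number are hypothetical values chosen for illustration, and only the 80%:20% ratio comes from the protocol above.

```python
def mixing_volumes(test_conc, spikein_conc, total_cells, spikein_fraction=0.20):
    """Volumes (ml) of test and spike-in cultures to combine.

    test_conc, spikein_conc: cell concentrations in cells/ml
    total_cells:             total number of cells wanted in the mixture
    spikein_fraction:        fraction of cells contributed by the spike-in
    """
    if not 0 < spikein_fraction < 1:
        raise ValueError("spikein_fraction must be between 0 and 1")
    if test_conc <= 0 or spikein_conc <= 0:
        raise ValueError("concentrations must be positive")
    spikein_cells = total_cells * spikein_fraction
    test_cells = total_cells - spikein_cells
    return test_cells / test_conc, spikein_cells / spikein_conc


# Hypothetical counts: 4e7 cells/ml (test), 2e7 cells/ml (spike-in), 1e9 cells total.
test_ml, spikein_ml = mixing_volumes(4e7, 2e7, 1e9)
print(f"test culture: {test_ml:.1f} ml, spike-in culture: {spikein_ml:.1f} ml")
```

Because the two cultures rarely reach the same density, the volumes generally differ from an 80:20 volume ratio even though the cell numbers follow it.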
Chromatin immunoprecipitation (ChIP) and Illumina sequencing
ChIP was performed as described previously. Samples were immunoprecipitated with 2 μl anti-Red1 serum (Lot#16440, kind gift of N. Hollingsworth) or 2 μl anti-phospho-H2A-S129 antibody (Abcam #ab15083) per sample. Library preparation was performed as described. Library quality was confirmed by Qubit HS assay and 2200 TapeStation. 100-bp single-end sequencing was performed on an Illumina NextSeq 500 instrument.
The generated reads were aligned to a hybrid genome built by concatenating recently published high-quality genome assemblies of the test and spike-in reference genomes (SK1 and S288c). Reads were aligned under perfect-match conditions, excluding any reads aligning to more than one location. Normalization of read density was completed as described. Where indicated, peaks of enrichment were called using MACS-2.1.0. Plots show an average of two replicates. To evaluate coverage relative to standard ChIP-seq profiles, we compared SNP-ChIP results to published datasets GSE69232 and GSE87060.
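Building the hybrid reference amounts to concatenating the two assemblies while keeping their chromosome names distinguishable. The following is a minimal sketch of that step, assuming plain FASTA text as input; the helper names `tag_fasta` and `build_hybrid` and the naming scheme (appending the strain name to each sequence name) are illustrative assumptions, not the conventions of the published pipeline.

```python
def tag_fasta(fasta_text: str, tag: str) -> str:
    """Append `_tag` to every sequence name in a FASTA-formatted string."""
    out = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Keep only the sequence name, dropping any free-text description.
            fields = line[1:].split(maxsplit=1)
            name = fields[0] if fields else "unnamed"
            out.append(f">{name}_{tag}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def build_hybrid(genomes: dict[str, str]) -> str:
    """Concatenate several assemblies into one hybrid FASTA string."""
    return "".join(tag_fasta(text, tag) for tag, text in genomes.items())


# Toy assemblies (hypothetical sequences) with identical chromosome names.
hybrid_fasta = build_hybrid({
    "SK1": ">chrI some description\nACGTTGCAGT\n",
    "S288c": ">chrI\nACGTTGCAAT\n",
})
print(hybrid_fasta, end="")
```

After alignment to such a reference, the strain suffix on each chromosome name is enough to count reads per genome.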
Calculation of the spike-in normalization factor
To obtain spike-in-normalized conditions, each condition is multiplied by the respective normalization factor value Nf. The extent to which QChIP differs from QInput in each experimental condition is determined by the amount of target protein and how much that differs from the amount of target protein in the spike-in. Since the latter is constant across all tested conditions, the result of the normalization is a semi-quantitative measure of the target protein amounts, yielding normalized conditions that can be compared directly to each other.
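As a rough illustration of this calculation, the sketch below assumes that Q is the ratio of spike-in reads to test reads within a sample and that Nf is the ratio of QInput to QChIP. Both definitions are assumptions made for this example, since the defining formulas are not reproduced in the text above; the read counts are invented, and the function names are hypothetical.

```python
def spikein_ratio(spikein_reads: int, test_reads: int) -> float:
    """Q (assumed definition): spike-in reads divided by test-genome reads."""
    if test_reads <= 0:
        raise ValueError("test_reads must be positive")
    return spikein_reads / test_reads


def normalization_factor(input_test: int, input_spikein: int,
                         chip_test: int, chip_spikein: int) -> float:
    """Nf (assumed definition): Q_Input / Q_ChIP for one condition."""
    q_input = spikein_ratio(input_spikein, input_test)
    q_chip = spikein_ratio(chip_spikein, chip_test)
    if q_chip == 0:
        raise ValueError("ChIP sample contains no spike-in reads")
    return q_input / q_chip


# Invented read counts for two conditions (not data from this study).
nf_wild_type = normalization_factor(8_000_000, 2_000_000, 8_000_000, 2_000_000)
nf_mutant = normalization_factor(8_000_000, 2_000_000, 5_000_000, 5_000_000)

# Target level in the mutant relative to wild type.
print(nf_mutant / nf_wild_type)
```

Under these assumptions, a condition with less target protein yields relatively more spike-in reads in the ChIP sample, a larger QChIP, and therefore a smaller Nf.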
We thank F. Winston for sharing strains and N. Hollingsworth for sharing antibodies. We are grateful to S. Ercan and S. Keeney for helpful discussions and the NYU Department of Biology Sequencing Core for technical assistance and debarcoding.
This work was supported in part by NIH grants R01 GM111715 and R01 GM123035 and research grant FY16–208 from the March of Dimes Foundation to A.H. The funders had no role in the design of the study, the collection, analysis and interpretation of data, or the writing of the manuscript.
Availability of data and materials
Data sets have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE115092. Code used for data analysis and producing figures is available on Github (https://github.com/hochwagenlab/SNP-ChIP).
L.A.V.-S. and A.H. conceived of the study. T.E.M. produced key resources. L.A.V.-S. conducted the experiments. L.A.V.-S. and A.H. analyzed the data and wrote the manuscript with input from T.E.M. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
15. Sun X, Huang L, Markowitz TE, Blitzblau HG, Chen D, Klein F, et al. Transcription dynamically patterns the meiotic chromosome-axis interface. eLife. 2015;4:e07424.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.