Introduction

The human body is made up of approximately 4 × 10¹³ cells, which need to work in coordination, both among themselves and with the 10¹⁴ cells constituting the microbiome, in order to perpetuate life (Trosko 2003). Each cell nucleus carries the genome, with heritable information stored in DNA and in the epigenetic components associated with it. Notably, the dimensions of the entire cell would not be sufficient to contain the DNA in a completely stretched form: the largest human chromosome is nearly 3000 times longer than the diameter of an average-sized cell. Therefore, efficient compaction of DNA is an essential prerequisite for cellular function, with maximal levels of DNA packaging reached during mitosis, when the chromosomes are compacted more than 10,000-fold on the linear scale (Belmont 2002).

Each cell needs to maintain a dynamic equilibrium between the demand to achieve a high degree of DNA packaging and the need to access its information for gene expression, DNA replication, repair, and recombination. In different cells or at different times of a cell’s life, different regions of the genome must be packed or released from constriction with high fidelity and in response to shifting needs of the system (Cavalli and Misteli 2013). In fact, it is becoming increasingly evident that chromatin organization within the three-dimensional nuclear space is itself a likely factor affecting gene regulation and the systemic control of expression of multiple gene loci. Local changes in chromatin conformation, such as those triggered by aberrant DNA methylation and histone modification, are an impetus for oncogenic transformation (Misteli 2010). Since local chromatin conformation both influences and is influenced by higher-order chromatin packaging, errors in chromatin packaging may disturb cellular homeostasis.

As the effect of 3D chromatin organization on various biological processes became apparent, the study of the 3D organization of chromatin has grown to an intense area of research (Fig. 1). While the basic information regarding the nature of chromatin organization was revealed more than half a century ago, the discovery of structural chromosomal components and the description of how the information in 3D chromatin arrangement is regulated have been ongoing for the last 30 years.

Fig. 1

Timeline showing the major steps that led to the current view of chromatin organization and ultimately to the development of chromosome conformation capture assays. The parallel evolution of different converging fields is indicated by colored arrows. The recent rapid development of methods to detect chromosome conformation is highlighted in the right part of the panel

The first insights into the nuclear organization came from microscopic observations, which revealed the presence of subnuclear structures like Cajal bodies and differentially stained interphase chromatin described as heterochromatin and euchromatin (Gall 2003; Heitz 1928). Further investigations suggested that individual chromosomes occupy specific regions within the nucleus, called chromosome territories (Cremer et al. 1982; Cremer and Cremer 2006). Later, high resolution in situ fluorescent hybridization assays allowed identifying the intermingling of chromosomes at the boundaries of chromosome territories (Branco and Pombo 2006). Furthermore, gene-poor regions were shown to localize close to the nuclear periphery, while gene-dense chromatin is more likely to be found at the center of nuclei (Cremer et al. 2003; Croft et al. 1999).

Over the last decade, significant improvements in microscopy techniques have led to the development of super-resolution microscopy approaches that describe chromosome components at a spatial resolution far beyond the limit of traditional confocal studies (Cattoni et al. 2015). In parallel, a molecular biology-based approach to study chromatin organization in 3D space was developed by Dekker et al. (2002). This assay, called chromosome conformation capture (3C), laid the foundations for the development of a plethora of derivative methods that enable principles of 3D chromosome organization to be inferred by describing the contacts made by each locus with the others (Fig. 1). The initial steps involving formaldehyde crosslinking of cell populations, restriction digestion, and proximity ligation constitute the backbone of most 3C-derived techniques (Fig. 2). As with the original 3C, all derivative assays depend on the interaction frequency of specific chromatin fragments and thus can theoretically provide information at a resolution of a few hundred base pairs or, roughly, at the nucleosomal level. In general, the proximity of interacting regions detected by 3C-based assays should be validated by microscopy-based fluorescence in situ hybridization (FISH) assays. 3C-based assays have the advantage of investigating interacting regions over a population of cells at high resolution, but they do not account for cell-to-cell variation and do not provide information on the 3D localization of chromosome loci within the nucleus. FISH assays, on the other hand, are hypothesis-driven, may be hindered by the high concentration of repeated DNA sequences at certain loci, and have a limited throughput. Thus, these two classes of assays complement each other.

Fig. 2

Schematic drawing depicting basic chromosome conformation capture assays and their high-throughput derivatives developed over the past decade. All 3C-based assays start with formaldehyde-crosslinked nuclei, followed (except in ChIA-PET) by restriction digestion. After this point, the methods differ in their subsequent ligation, enrichment, and sequencing- or hybridization-based detection steps, depending on the respective downstream goals. The main intermediates leading to the alternative methodologies 3C, 4C, 5C, Hi-C, and ChIA-PET are illustrated

In this review, we focus on the biochemical approaches for mapping chromatin interactions. We emphasize the technological advances made through successive improvements of the general approach and describe the biological insights gained from them. Although we mention various basic 3C steps and their pitfalls, we refer readers to other reviews for deeper methodological detail (Dekker 2006; Ferraiuolo et al. 2012; van de Werken et al. 2012a).

The birth and the maturation of 3C

The 3C assay was first reported in 2002 (Dekker et al. 2002). The assay estimates the contact frequencies between two chosen genomic sites in cell populations, allowing inferences to be made on the 3D organization of the points of interest (Fig. 3a). 3C has become the most frequently used method to demonstrate interactions between two unique loci. The first step of the method involves formaldehyde crosslinking of the cell population. The formaldehyde concentration varies between applications: for mammalian and yeast cells, fixation with 1 % formaldehyde for 10 min is preferred, whereas up to 3 % for 30 min has been used for Drosophila cells and 2 % for 5 min for Arabidopsis cells (Comet et al. 2011; Duan et al. 2012; Grob et al. 2013; Hagege et al. 2007). We note, however, that an extensive cross-sample comparison to find optimal conditions for a large spectrum of samples is still lacking. Crosslinking is followed by digestion of chromatin with a restriction enzyme. Both four-base cutters (such as Dpn II) and six-base cutters (such as Hind III) have been used (Comet et al. 2011; Dekker et al. 2002). For fine mapping of genomic interactions over short distances (a few kb to tens of kb), four-base cutters are preferred because they cut the genome much more frequently. The next step in the original 3C method is the ligation of restriction fragments under dilute conditions in order to encourage intramolecular rather than intermolecular ligation. It has been suggested that, under dilute conditions, solubilized fragmented chromatin undergoes intramolecular ligation in the soluble fraction of the mix. However, recent evidence suggests that the majority of chromatin remains in the insoluble fraction of the ligation mix, where it undergoes ligation, which calls into question the necessity of diluting ligation reactions (Comet et al. 2011; Gavrilov et al. 2013b; Nagano et al. 2015; Rao et al. 2014). The end product of this reaction is a one-dimensional template representing the local 3D environment at the point of crosslinking. This template is used for semiquantitative or quantitative PCR in order to describe short- and long-range chromatin interactions at the region of interest (Hagege et al. 2007). Like other complex, multistep procedures, 3C requires a number of controls, which were described elegantly in a previous review (Dekker 2006).
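The preference for four-base cutters in fine mapping follows from simple arithmetic. The sketch below is only an idealization: it assumes a random sequence with independent, equally frequent bases, which real genomes do not satisfy, and the function names are hypothetical. It uses the recognition sites of Dpn II (GATC) and Hind III (AAGCTT).

```python
def expected_fragment_length(site_length: int) -> int:
    """Mean spacing (bp) between recognition sites in an idealized random
    sequence with four equally frequent, independent bases."""
    return 4 ** site_length


def count_sites(sequence: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site."""
    count = 0
    start = 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1  # move one base forward to allow overlaps


if __name__ == "__main__":
    for enzyme, site in [("Dpn II", "GATC"), ("Hind III", "AAGCTT")]:
        print(enzyme, site, expected_fragment_length(len(site)), "bp on average")
```

Under this idealization, a four-base cutter cuts 16 times more often than a six-base cutter, which is why it resolves contacts over shorter distances.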

Fig. 3

a Key steps distinguishing 3C, 4C, and 5C assays. b Key steps of the Hi-C approach, which enables 3C to be extended to describe chromatin contacts genome-wide. c Modification of the 3C approach to precede ligation by immunoprecipitation of the chromatin components of interest, in order to obtain genome-wide information of chromatin contacts mediated by specific factors

As with all genomic assays, 3C gives an average interaction pattern of the population. Furthermore, 3C only informs about the relative contact frequencies, reflecting the proximity of two fragments in a 3D environment, but does not indicate the functional relevance of this proximity. Therefore, suitable functional assays are required in order to establish the functional relevance of 3C interactions.

Since its inception, 3C has been used extensively to demonstrate the presence of chromatin loops that represent interactions between various regulatory regions in different cell types. Although 3C is a relative assay, the analysis of synthetic DNA standards allows the quantification of absolute interaction frequencies. Comet et al. performed a real-time quantitative 3C assay with customized sequence controls, which showed that chromatin insulators can modulate the chromatin loops that Polycomb response elements establish in order to silence a downstream promoter (Comet et al. 2011). In a second report, Gavrilov et al. estimated actual contact frequencies between two strongly interacting points separated by around 50 kb at the beta globin gene locus and reported them to be ∼1 % of the ligation events. The authors suggest that this somewhat low frequency could have a technical cause or reflect a truly low frequency of in vivo interaction events (Gavrilov et al. 2013a). Another explanation may be linked to the fact that most of the interactions in 3C occur within a few kilobases on either side of the point of interest. Therefore, even if each of the events is only in the order of 1 %, the total number of contacts in the region of interest corresponds to the sum of all events, which could raise the frequency by an order of magnitude or more. Although several 3C contacts were previously validated by DNA FISH, a recent report showed that the contact frequencies from conformation capture assays sometimes do not correspond to 3D proximity (Williamson et al. 2014). This adds a note of caution concerning the interpretation of 3C results and suggests that alternative approaches such as microscopy should generally be used as a complementary source of information to better interpret them. More comparisons between various FISH methods and different flavors of “C” technologies will help to resolve the apparent contradiction noted in that work.

4C: one-to-all approaches to capture genome-wide interactions made by a single locus

The 3C sample used as a template to evaluate the frequency of interaction between two loci actually contains a very large collection of all possible chromatin interactions in the genome, the only limitations being that they must be close (in the Å range), stable enough to be captured by formaldehyde crosslinking, and that the restriction enzyme being used can generate genomic fragments amenable to ligation. Therefore, the development of higher throughput (testing also for long-distance interactions, ranging up to several Mb) assays was a natural step in the field. Four methods were developed independently in order to capture genome-wide interactions made by a locus of interest. They were based on similar molecular principles and were all called 4C, although they differ in the meaning of the acronym and in some specific steps. These 4C methods are 3C on chip, open-ended 3C, circular 3C, and olfactory receptor 3C (Lomvardas et al. 2006; Simonis et al. 2006; Wurtele and Chartrand 2006; Zhao et al. 2006). The main differences in these methods concern the design developed to enquire the interactions of the region of interest (bait) with the putative interacting regions (interactors), which can be in cis (4C is used to study short- and long-range interactions) or in trans (on different chromosomes). In the last decade, 4C methods were used in several studies that allowed gaining insights into the relationship of chromatin structure with gene function (Simonis et al. 2006; Zhao et al. 2006). The first step in these methods is formaldehyde crosslinking. While 3C on chip (from now on called 4C, as it is the most popular variant of this family of approaches, see Fig. 3a) employs two rounds of restriction digestion and ligation, circular 3C uses a single restriction digestion, generally a four-base cutter, followed by ligation. 
The reason for using two-step digestions in 4C is due to the fact that digestion of the crosslinked sample might yield hairball-like aggregates which, upon ligation, produce large circular DNA fragments (van de Werken et al. 2012a). The second round of restriction in these methods trims these large molecules into small PCR amplifiable fragments (Fig. 3a). In contrast, the use of four-base cutters in circular 3C virtually eliminates this necessity. Postligation, both of these methods use inverse PCR with primers that are specific to the first restriction junctions for 4C and the sole restriction junction for circular 3C. The amplified product in 4C is hybridized to a custom-made chip, while amplicons from circular 3C have been analyzed both by microarray hybridization and high throughput sequencing. 4C has been later modified by several groups. This allowed, for instance, the detection of extensive interaction networks of polycomb-repressed Hox genes in Drosophila or of active mouse globin genes in erythroid tissues (Bantignies et al. 2011; Schoenfelder et al. 2010). In both studies, 3C templates were generated and then amplified with a single biotinylated primer specific to the bait region. This amplified single strand was then reverse amplified with an adaptor and the products hybridized to a chip for detection (Bantignies et al. 2011; Schoenfelder et al. 2010). These results highlight the functional compartmentalization of the genome, where active and inactive regions tend to cluster (Simonis et al. 2006). As a further refinement, the 4C method (3C on chip) was adapted for NGS and called 4C-Seq (van de Werken et al. 2012b), allowing to uncover important principles of genome structure and function during development, cell differentiation, and reprogramming (Andrey et al. 2013; Apostolou et al. 2013; Ghavi-Helm et al. 2014).

A many-to-many approach: chromosome conformation capture carbon copy (5C)

Most of the chromatin architectural features obtained by 3C and 4C are not sufficient for the deduction of general principles of chromosome organization, since both approaches study interaction features of a single preselected fragment (or of several ones when multiplexing is used to study up to tens of regions in parallel). Chromosome conformation capture carbon copy or 5C allows obtaining information on the contacts established by multiple genomic fragments in a large genomic region (Fig. 3a). This approach involves a ligation-mediated amplification followed by detection of the standard 3C library (Dostie et al. 2006). This approach can thus be described as a “many-to-many” strategy. Its initial steps are the same as in a regular 3C. The 3C products are incubated with a complex mix of oligos (forward sense primers and reverse 5′ phosphorylated antisense primers) that are designed to anneal, each one exactly at one of the restriction sites of the genomic region of interest. If the restriction digestion and ligation (3C) works efficiently, then two designed oligos would face each other at the ligation junction. Taq ligase is subsequently added to ligate the primers. The synthetic ligation product is then PCR amplified with the help of universal sequences incorporated in the primers (Ferraiuolo et al. 2012).

The experimental design of 5C is dictated by the questions being addressed by the experiment. In the original 5C study, Dostie et al. analyzed the human β-globin locus and a 100-kb gene desert region (Dostie et al. 2006; Ferraiuolo et al. 2012). To study the beta globin locus, they have used a 5C scheme where reverse primers were designed at the region of interest while other primers spanned the surrounding regions. This scheme is useful to interrogate interactions among different regulatory elements spread all over a megabase-sized region. However, some a priori information of the regulatory sites is needed for this scheme. To study the gene desert region, the authors have used an alternating 5C primer scheme, where the forward and reverse primers are placed alternatively within the region of interest. This allows investigating the general 3D architecture of the region of interest. A mixed scheme with both types of primer designs could also be used. Two web-based programs, 5CPrimer and my5C platform, are available for 5C-specific primer design (Fraser et al. 2009; Lajoie et al. 2009).
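The two primer layouts described above can be sketched on a toy region in which restriction fragments are simply numbered. This is a minimal illustration under the assumption that a 5C ligation product forms only between one forward and one reverse primer; the function names and fragment indices are hypothetical and do not reproduce the output of 5CPrimer or my5C.

```python
from itertools import product


def alternating_scheme(n_fragments):
    """Forward primers on even-numbered fragments, reverse on odd ones."""
    forward = [i for i in range(n_fragments) if i % 2 == 0]
    reverse = [i for i in range(n_fragments) if i % 2 == 1]
    return forward, reverse


def anchored_scheme(n_fragments, anchors):
    """Reverse primers on the fragments of interest, forward elsewhere."""
    anchor_set = set(anchors)
    forward = [i for i in range(n_fragments) if i not in anchor_set]
    reverse = sorted(anchor_set)
    return forward, reverse


def interrogated_pairs(forward, reverse):
    """Fragment pairs that a forward/reverse primer pair can detect."""
    return list(product(forward, reverse))


if __name__ == "__main__":
    fwd, rev = alternating_scheme(6)
    print(len(interrogated_pairs(fwd, rev)), "pairs in the alternating scheme")
    fwd, rev = anchored_scheme(6, anchors=[2])
    print(len(interrogated_pairs(fwd, rev)), "pairs in the anchored scheme")
```

In this toy layout, the alternating scheme samples pairs across the whole region, whereas the anchored scheme interrogates only contacts involving the chosen fragments, which is why it needs prior knowledge of the regulatory sites.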

Therefore, 5C is an unbiased method allowing describing the 3D conformation of genomic regions up to few megabases in size. The unbiased nature of the 5C readout and the greater coverage area also enables tandem studies of complex relationship between 3D organization, epigenome, and transcriptome. 5C has been used to study human mitotic chromosomes and the 3D architecture of the Caulobacter crescentus genome (Naumova et al. 2013; Umbarger et al. 2011). It has also been used to study the organization of Hox clusters in human and mouse systems (Rousseau et al. 2014a; Wang et al. 2011). Employing 5C assays, Phillips-Cremins et al. have demonstrated that different classes of architectural proteins maintain constitutive and transient chromatin contacts in mouse embryonic stem cells (mESC) and mouse neural progenitor cells (Phillips-Cremins et al. 2013). This approach was also used to identify, in the X chromosome, the existence of topological-associated domains or TADs (Nora et al. 2012), which turned out to be a very general feature of genome organization (Dixon et al. 2012; Sexton and Cavalli 2015; Sexton et al. 2012). 5C also led to the deciphering of the HoxA conformation using a machine-learning approach, which allows classifying different leukemia cell subtypes (Fraser et al. 2009; Rousseau et al. 2014b).

These examples show the power of the 5C approach. However, like 3C, 5C is also hypothesis-driven, as some knowledge about the significance and the regulatory role of the region of interest is needed a priori.

Hi-C: all-to-all approaches to capture all interactions within the genome

It was the advent of the Hi-C assay by Lieberman-Aiden et al. that greatly expanded the potential of 3C-based technology, as the assay is both unbiased and unsupervised (Lieberman-Aiden et al. 2009). This “all-to-all” assay allows one to identify interactions both in cis (over more than a few Mb) and in trans simultaneously (Fig. 3b). Like other 3C-based assays, Hi-C follows the basic steps of 3C template generation, but it has a slightly modified ligation step. Following restriction digestion of crosslinked nuclear DNA, the DNA overhangs left by the restriction enzyme are filled in with dNTPs, one of which is biotinylated. The resulting blunt DNA ends are subsequently ligated. As in the original 3C, ligation in the initial protocol (dilution Hi-C) is done in highly diluted conditions, with the aim of reducing spurious ligation of non-crosslinked molecules (Lieberman-Aiden et al. 2009). Recently, however, a modification of the method introduced ligation in a small volume (in situ Hi-C) (Nagano et al. 2013, 2015; Rao et al. 2014). This modification does not alter the interaction readout qualitatively, but it greatly increases the percentage of usable reads, improving the efficiency and resolution of the assay. This method was used to generate ultra-deep Hi-C maps for human cells that allow contacts to be detected at up to 1 kb resolution. In situ Hi-C performs ligation in permeabilized nuclei, the rationale being that crosslinking freezes the chromatin within the nucleus, making it highly unlikely that molecules that were far apart within the nuclear space come close enough to be efficiently ligated. Following ligation, DNA is purified and sequenced using Illumina paired-end sequencing, as in regular Hi-C. In light of these improved results, one can expect that the advantages of in situ ligation over in-solution ligation might also improve the efficiency of other C-based assays like 3C, 4C, and 5C.

Only 6 years after the first publication (Lieberman-Aiden et al. 2009), several modifications of the Hi-C approach have been reported. Together with the increasing power of next-generation sequencing, the progressive refinement of these technologies has revealed finer details of genome organization. While the first report identified the partitioning of the human genome into two compartments (A (active) and B (inactive)) (Lieberman-Aiden et al. 2009), later work identified the widespread existence of TADs. TADs are regions ranging in size from a few tens of kb to 3 Mb (Dixon et al. 2012; Nora et al. 2012; Sexton et al. 2012; Vietri Rudan et al. 2015). They are characterized by a much higher frequency of interactions among regions within the same TAD than between regions in different TADs. Using a slightly modified protocol in Drosophila, where biotin filling was replaced with size selection of long products (∼800 bp), Sexton et al. showed that chromatin organization overlaps with epigenomic domains (Sexton et al. 2012). This initial characterization classified TADs into four main types, one represented by active chromatin and three corresponding to silent chromatin, either enriched in Polycomb proteins, enriched in heterochromatin, or without any specific mark (Sexton et al. 2012). More recently, in situ Hi-C further subdivided chromosomes into six compartments based on genomic interactions (Rao et al. 2014). Sexton et al. also showed that active chromatin tends to form more interchromosomal contacts than average, suggesting that active regions tend to locate closer to the periphery of chromosome territories (Sexton et al. 2012). This was also observed in human cells (Kalhor et al. 2012). Further studies revealed that TADs overlap with DNA replication domains, suggesting that they are not only structural but also regulatory compartments of the genome (Pope et al. 2014). Using very deep sequencing of Hi-C libraries, Jin et al. have shown that cell type-specific chromatin loops are hard-wired in the genome, supporting the conclusion that many chromatin contacts are relatively stable to changes in transcription (Jin et al. 2013). Superimposed on these stable contacts, regulatory contacts play a significant role in gene expression (Sexton and Cavalli 2015), and future work will certainly be aimed at discriminating stable structural chromosomal contacts from gene regulatory contacts.
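The defining property of TADs, a higher contact frequency within a domain than between domains, can be illustrated on a toy contact matrix. The sketch below assumes that domain boundaries are already known and that the matrix is symmetric and binned at a fixed resolution; the matrix values and the function name are invented for illustration and do not correspond to any published TAD-calling method.

```python
from bisect import bisect_right


def mean_intra_inter(matrix, domain_starts):
    """Mean contact count for bin pairs in the same domain versus pairs
    in different domains.

    matrix        -- symmetric square list of lists of contact counts
    domain_starts -- sorted first-bin index of each domain, beginning at 0
    """
    intra, inter = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle, diagonal excluded
            same_domain = (bisect_right(domain_starts, i)
                           == bisect_right(domain_starts, j))
            (intra if same_domain else inter).append(matrix[i][j])
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy map: four bins forming two domains (bins 0-1 and bins 2-3).
toy = [
    [0, 8, 1, 1],
    [8, 0, 1, 1],
    [1, 1, 0, 6],
    [1, 1, 6, 0],
]

if __name__ == "__main__":
    intra_mean, inter_mean = mean_intra_inter(toy, [0, 2])
    print("intra-domain mean:", intra_mean, "inter-domain mean:", inter_mean)
```

Real contact maps additionally show a strong decay of contact frequency with genomic distance, which this toy comparison ignores.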

Chromosome conformation capture-based methodologies rely on nuclear material pooled from a large number of cells. This raises the concern that heterogeneity in 3D chromatin organization within the cell population will not be reflected in the final readout. The coupling of an in situ Hi-C assay to the analysis of ligated DNA from single cell nuclei allowed to partially circumvent this problem and to show that the megabase-scale TAD organization is not the result of a cell population averaging but a distinctive feature of each individual cell (Nagano et al. 2013). Further applications of this method are expected to shed light into the cell-to-cell variability of specific chromatin contact patterns in cell physiology and differentiation.

Technical limitations and further improvements in Hi-C technology

Due to its unbiased approach and the decreasing cost of next-generation sequencing, Hi-C has been widely used for the analysis of organizational principles of prokaryotic and eukaryotic genomes, of mitotic chromosome structure and for detection of conformational changes in human disease (Dixon et al. 2012; Le et al. 2013; Marbouty et al. 2015; McCord et al. 2013; Naumova et al. 2013). Hi-C is the method of choice when one is looking for changes at the TAD or supra-TAD level in chromatin organization. However, this method is not ideal for the study of changes at few or individual loci, as most of the sequencing reads will not concern the loci of interest, reducing the resolution and the statistical power of the resulting contact maps. In these cases, 4C-seq or quantitative 3C is better suited for capturing the dynamics of chromatin contacts at specific loci.

Hi-C depends on restriction enzymes for chromatin fragmentation. This introduces a bias, since restriction sites are heterogeneously distributed over the genome. This limitation, along with the fact that nucleosomes reduce the efficiency of restriction digestion, limits the spatial resolution of contact mapping to around 1 kb or better when using 4-bp cutters and to about 4–10 kb when using 6-bp cutters. To circumvent this issue, two alternatives were developed. In the first, restriction enzymes were replaced by DNase I to digest crosslinked chromatin in two human cell lines, resulting in a Hi-C map with a resolution of up to 2 kb (Ma et al. 2015). Combining the DNase I Hi-C output with DNA sequence capture, the authors dissected the 3D interactome of lincRNA promoters in human H1 ES and K562 cells. In the second, Hsieh et al. used micrococcal nuclease digestion of crosslinked chromatin instead of restriction digestion, which yielded a Hi-C map at nucleosomal resolution. This modified protocol, called Micro-C (Hsieh et al. 2015), is very useful for studying chromatin interactions over short spans and allowed the identification of previously unknown small self-associating chromatin domains (2–10 kb) in budding yeast, which may be seen as analogous to mammalian TADs. However, due to the huge combinatorial number of possible ligation products, this method still seems too costly for the study of bigger genomes.

Dilution Hi-C generally has a relatively poor signal-to-noise ratio because of random intermolecular ligation under dilute ligation conditions (a problem that does not affect in situ Hi-C), and the ratio deteriorates further in experiments using four-base cutters. Kalhor et al. developed a modified Hi-C approach called tethered chromatin capture (TCC) (Kalhor et al. 2012). Here, the crosslinked, restriction-digested, and protein-biotinylated chromatin fragments are captured on a streptavidin surface, on which biotin filling at the ends of restriction fragments and ligation are performed. The authors reported an improved signal-to-noise ratio with this variation of the basic Hi-C scheme, and data analysis revealed more interchromosomal interactions among regions in the active genome compartment than among regions in the inactive compartment. This increase in performance might be due to the compartmentalization of the ligation process, which is a common theme in in situ Hi-C methodologies.

While Hi-C is the method of choice for deducing general principles of chromatin folding, the number of paired-end reads required scales quadratically with genome size, effectively prohibiting very high-resolution mapping of chromatin contacts in large genomes. A clever approach to generating high-resolution maps of a subfraction of the genome in these cases involves the enrichment of specific regions of interest from the Hi-C library. A number of such modifications have been published (Fig. 3b), called Capture C, Capture Hi-C (CHi-C), HiCap, and targeted chromatin capture (T2C) (Dryden et al. 2014; Hughes et al. 2014; Jager et al. 2015; Kolovos et al. 2014; Mifsud et al. 2015; Sahlen et al. 2015; Schoenfelder et al. 2015). Most of these assays are based on hybridization of sequencing adaptor-ligated 3C samples (Capture C) or Hi-C samples (CHi-C and HiCap), followed by pull-down using biotinylated RNA molecules (120 nucleotides). In terms of detecting genuine chromosomal interactions, CHi-C and HiCap performed better than Capture C, which has been suggested to reflect more effective capture of genuine interactions by Hi-C than by regular 3C (Sahlen et al. 2015; Schoenfelder et al. 2015). Sahlen et al. have further shown that Capture C-enriched fragments are strongly enriched for unligated fragments (>1 kb apart) (Sahlen et al. 2015). T2C, on the other hand, follows a 4C-like methodology of two restriction digestions followed by adaptor ligation and pull-down, either with oligos or on an array. Following pull-down, the samples are paired-end sequenced. For bioinformatic analysis of CHi-C data, Schoenfelder et al. and Mifsud et al. used an analytical pipeline called GOTHiC (Schoenfelder et al. 2015).
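The quadratic scaling mentioned above can be made concrete by counting the bin pairs in an all-to-all contact map at a fixed resolution. This is a back-of-the-envelope sketch: the genome sizes are rounded approximations (about 12 Mb for budding yeast and about 3.1 Gb for human), and the function names are hypothetical.

```python
def n_bins(genome_size_bp: int, resolution_bp: int) -> int:
    """Number of fixed-size bins needed to tile the genome (rounded up)."""
    return (genome_size_bp + resolution_bp - 1) // resolution_bp


def n_bin_pairs(genome_size_bp: int, resolution_bp: int) -> int:
    """Number of unordered bin pairs, including each bin with itself."""
    n = n_bins(genome_size_bp, resolution_bp)
    return n * (n + 1) // 2


if __name__ == "__main__":
    resolution = 1_000  # 1 kb bins
    for name, size in [("budding yeast", 12_000_000), ("human", 3_100_000_000)]:
        print(name, n_bin_pairs(size, resolution), "bin pairs at 1 kb")
```

A genome that is 10 times larger therefore has roughly 100 times more bin pairs to cover, which is the motivation for enriching only the regions of interest.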

These capture Hi-C methods combine the advantages of 4C/5C methods (high resolution at a relatively low sequencing depth) with those of the Hi-C methodology, producing genome-wide, unbiased contact data for the genomic regions of interest. The C method should be chosen carefully according to the objective of the experiment. 5C methods might fare better if one needs to decipher the chromatin structure at ultra-high resolution at a defined locus (a few hundred kb). However, where the goal is to obtain a high-resolution global view of the interactions made by the region(s) of choice, capture Hi-C might fare better. Capture Hi-C assays capture interactions both within the target region(s) and between the target region(s) and the rest of the genome (a many-to-all approach). Therefore, this assay can provide richer data from a biological perspective, at the expense of some loss of spatial resolution. Furthermore, for technical reasons, it is important in Capture Hi-C to analyze contacts made within the captured region separately from contacts between any point within the captured region and any other genomic region.

ChIA-PET: capture chromatin interactions in the context of chromatin interacting proteins

Chromatin has a tripartite composition, including DNA, RNA, and proteins. However, 3C-based methods only provide information on the DNA interactome, irrespective of which protein or RNA components may mediate chromatin interactions. The first effort to identify the protein component of a chromatin interactome module was made by Horike and coworkers, who pulled down MeCP2-bound chromatin by ChIP followed by proximity ligation and detection, allowing them to show a role of MeCP2 in silent chromatin looping in Rett Syndrome (Horike et al. 2005). Another method called 6C employed restriction-digested chromatin in immunoprecipitation reactions followed by ligation in dilute conditions and cloning of the ligated DNAs in bacteria. The clones were then selected for specific 3C products and sequenced (Tiwari et al. 2008).

Compared to these early attempts, a significant advancement in the field was the introduction of ChIA-PET (Fig. 3c), in which four strategies were combined, namely, ChIP, proximity ligation, paired-end ditag generation (PET), and next-generation sequencing (Fullwood et al. 2009). Like in ChIP, the procedure starts with formaldehyde crosslinking, cell lysis, sonication, and chromatin immunoprecipitation using the antibodies of choice. Immunoprecipitated chromatin is then divided into two aliquots. To each aliquot, a biotinylated half linker (A or B) containing a MmeI (type IIS restriction enzyme) site is added. The two aliquots are then pooled together and proximity-ligation is performed under dilute conditions. The two different half linkers are to ensure the ability to quantify intramolecular ligations (molecules sharing the same half linkers) over intermolecular ligation (molecules with different half linkers). Although the authors suggested the occurrence of chimeric reads (spurious ligation products—AB) to be 39 % and genuine ligations (AA and BB) to be 61 %, care must be taken in analyzing AA and BB pairs, which, in addition to genuine ligations, certainly contain spurious reads as well. Indeed, among these AA and BB ligation products, the same (i.e., 39 %) fraction as that of AB products is likely to reflect chimeric ligations. Following purification and MmeI digestion, the fragments are sequenced to obtain ditag sequences (∼18 bp). The mapping of these ditags to the reference genome reflects the two putative interacting positions where the protein of interest is binding.
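The caution about AA and BB pairs can be expressed as a small calculation. The sketch below follows one reading of the argument above and rests on an explicit assumption: the two aliquots are equal and chimeric ligation is blind to linker identity, so that about as many chimeras carry identical linkers as carry the AB combination. The function name and the toy counts are hypothetical.

```python
from collections import Counter


def summarize_linkers(pets):
    """Summarize linker composition of paired-end tags (PETs).

    pets -- iterable of (linker_end_1, linker_end_2), each 'A' or 'B'
    """
    # Sort each pair so that ('B', 'A') and ('A', 'B') both count as 'AB'.
    counts = Counter("".join(sorted(pair)) for pair in pets)
    hetero = counts["AB"]                # visibly chimeric products
    homo = counts["AA"] + counts["BB"]   # nominally genuine products
    # Assumption: hidden chimeras among AA/BB are as numerous as AB products.
    hidden = min(hetero, homo)
    return {
        "total": hetero + homo,
        "hetero": hetero,
        "homo": homo,
        "estimated_hidden_chimeras": hidden,
        "estimated_genuine": homo - hidden,
    }


if __name__ == "__main__":
    # Toy library reproducing the 39 % AB versus 61 % AA/BB split.
    toy_pets = ([("A", "B")] * 20 + [("B", "A")] * 19
                + [("A", "A")] * 30 + [("B", "B")] * 31)
    print(summarize_linkers(toy_pets))
```

Under this assumption, a substantial share of the AA and BB pairs would be chimeric, which is why clusters of overlapping PETs are more trustworthy than isolated ones.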

To overcome the pitfalls of the original ChIA-PET, a modified protocol called advanced or long-read ChIA-PET was published recently (Tang et al. 2015). This method replaces the two separate ligation reactions, each using a different biotinylated half linker, with the ligation of a single biotinylated linker. The decrosslinked and purified DNA is then fragmented and tagged with adaptors by Tn5 transposase in a single step. The DNA is then PCR amplified and sequenced. Advanced ChIA-PET is well suited to comprehensive 3D genome mapping, as it gives three types of output, each useful in a different context. The first type consists of self-ligation PET data, which can be used to identify the binding sites of the protein of interest (as in ChIP). The second consists of clustered interligation PET data, which are the true ChIA-PET results originating from interactions mediated by the ChIP-targeted protein factor. The third consists of singleton interligation data, in which one read comes from a ChIP peak and the second read comes from any genomic region interacting with it (Tang et al. 2015).
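
The three output types can be illustrated with a toy classifier. This is a minimal sketch under stated assumptions, not the pipeline used by Tang et al.: the span cutoff, the fixed-size anchor bins, and the minimum cluster support are hypothetical values chosen for illustration, and the sketch does not check whether an anchor overlaps a ChIP peak.

```python
from collections import Counter

# Hypothetical parameters, chosen only for illustration
SELF_LIGATION_MAX_SPAN = 8_000   # bp; shorter intrachromosomal PETs = self-ligation
ANCHOR_BIN = 5_000               # bp; bin size used to group nearby anchors
MIN_CLUSTER_PETS = 2             # PETs needed to call a clustered interaction


def is_self_ligation(pet):
    """A PET whose two tags map close together on the same chromosome."""
    chrom1, pos1, chrom2, pos2 = pet
    return chrom1 == chrom2 and abs(pos2 - pos1) <= SELF_LIGATION_MAX_SPAN


def anchor_pair(pet):
    """Order-independent pair of binned anchors for an interligation PET."""
    chrom1, pos1, chrom2, pos2 = pet
    bins = ((chrom1, pos1 // ANCHOR_BIN), (chrom2, pos2 // ANCHOR_BIN))
    return tuple(sorted(bins))


def classify_pets(pets):
    """Label each (chrom1, pos1, chrom2, pos2) PET with one of three types."""
    support = Counter(anchor_pair(p) for p in pets if not is_self_ligation(p))
    labels = []
    for pet in pets:
        if is_self_ligation(pet):
            labels.append("self_ligation")
        elif support[anchor_pair(pet)] >= MIN_CLUSTER_PETS:
            labels.append("clustered_interligation")
        else:
            labels.append("singleton_interligation")
    return labels


# Made-up coordinates, not real data
pets = [
    ("chr1", 10_000, "chr1", 10_400),    # tags 400 bp apart
    ("chr1", 12_000, "chr1", 250_000),   # supported by the next PET
    ("chr1", 13_500, "chr1", 251_200),   # same pair of anchor bins
    ("chr1", 40_000, "chr2", 90_000),    # unsupported interchromosomal PET
]
labels = classify_pets(pets)
for pet, label in zip(pets, labels):
    print(pet, label)
```

Published pipelines typically cluster PETs by overlapping extended anchors instead of fixed bins and assess each cluster statistically; the binning here only keeps the example short.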

Although ChIA-PET identifies both the protein and the DNA present at given loci, it cannot establish whether the protein is the cause of the interactions. ChIA-PET results therefore have to be verified by independent 3C and ChIP-PCR, and functional assays such as knockdown of the protein should be used to validate the functional relevance of protein binding. Because ChIA-PET combines ChIP with 3C-type proximity ligation, it allows the identification of candidate roles for a protein factor in the regulation of 3D chromatin folding. Using this assay, Fullwood et al. mapped the interactome of ERα in human cancer cells, suggesting a role for ERα in looping its binding sites to the site of transcription (Fullwood et al. 2009). Furthermore, advanced ChIA-PET allowed Tang et al. to show the effect of haplotype variants on chromatin topology and transcription (Tang et al. 2015). Similarly, mapping RNA Pol II (human cancer cells) and CTCF (murine pluripotent cells) via ChIA-PET methodology identified the role of transcription and chromatin boundaries in the compartmentalization of the human genome into distinct domains (Handoko et al. 2011; Sandhu et al. 2012).

Future perspectives

Thanks to the development of various 3C-based assays (see Table 1 for a summary of the current variants of these approaches) and to the progressive reduction of sequencing costs, there have been tremendous gains in the spatial resolution of chromatin contact maps. However, the results obtained by 3C-based methods should be verified by other molecular biology approaches and by microscopy approaches such as DNA FISH. Apart from the need for validation, another major caveat is the risk of artifacts in data interpretation due to incorrect normalization of sequencing data, in both Hi-C and ChIA-PET types of approaches. Addressing this issue requires the development of dedicated bioinformatic pipelines to analyze and compare these datasets. Although the Human Epigenome Browser at Washington University (http://epigenomegateway.wustl.edu/browser/) from the NIH Roadmap Epigenomics Consortium (Zhou et al. 2013), the 3dgenome browser (http://www.3dgenome.org), and the Juicebox tool (http://www.aidenlab.org/juicebox/) from the Aiden lab are freely available, the scarcity of user-friendly interfaces for the analysis and visualization of chromatin interaction data is still a severe obstacle to obtaining meaningful results from these assays. Therefore, the development of analytical methods for these technologies (Imakaev et al. 2012; Walter et al. 2014) is particularly important. Moreover, reductions in sequencing costs result in datasets of ever-increasing size, and the storage, manipulation, and use of these data in complex computations are proving to be a daunting task for many labs in the field. This burden may grow further in the future, owing to the need to integrate these data with other types of genomics resources. For instance, one of the large endeavors in genomic research in the past decade was the surge of genome-wide association studies (GWAS) aimed at identifying candidate genes for various human diseases.
Most of the top hits from these studies fall in the noncoding part of the genome (Maurano et al. 2012). By and large, the functional role of these hits could not be understood. It is likely that genetic variants outside coding regions play a regulatory role, but the target genes of these variants are difficult to identify, in particular when the hit lies very far from all neighboring genes. We may need to combine the wealth of genomic data (SNP, CNV, etc.) that are already available with chromatin interaction data in order to identify the putative target genes of these variants, which can then be validated by analyzing chromatin changes in the presence of these genomic aberrations. Integrative approaches involving epigenetic, genomic, and chromatin conformation analyses are beginning to deliver exciting results (Jager et al. 2015), and their widespread application may lead to the understanding of hidden principles of genome function and regulation, both in normal conditions and in disease. Finally, all the present methods capture bipartite interactions. However, multipartite interactions certainly exist in vivo, and the development of methods that can capture and analyze them will be a key advance in the field.
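
The proposed combination of variant data with chromatin interaction data can be sketched in a few lines. The example below is a hypothetical illustration and not an established pipeline: it assumes that interactions are available as pairs of anchor intervals and that promoters are given as intervals, and it reports a gene as a candidate target when a variant falls in one anchor and the gene's promoter overlaps the other. All identifiers and coordinates are made up.

```python
def intervals_overlap(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def candidate_targets(snps, loops, promoters):
    """Map each variant to genes contacted through a chromatin interaction.

    snps:      {snp_id: (chrom, pos)}
    loops:     [((chrom, start, end), (chrom, start, end)), ...]
    promoters: {gene: (chrom, start, end)}
    """
    targets = {snp_id: set() for snp_id in snps}
    for snp_id, (chrom, pos) in snps.items():
        snp_interval = (chrom, pos, pos + 1)
        for anchor_1, anchor_2 in loops:
            # Test both orientations: the variant may sit in either anchor
            for near, far in ((anchor_1, anchor_2), (anchor_2, anchor_1)):
                if not intervals_overlap(snp_interval, near):
                    continue
                for gene, promoter in promoters.items():
                    if intervals_overlap(promoter, far):
                        targets[snp_id].add(gene)
    return {snp_id: sorted(genes) for snp_id, genes in targets.items()}


# Hypothetical inputs, for illustration only
snps = {"snp_1": ("chr1", 105_000), "snp_2": ("chr1", 900_000)}
loops = [(("chr1", 100_000, 110_000), ("chr1", 400_000, 410_000))]
promoters = {
    "GENE_A": ("chr1", 402_000, 404_000),
    "GENE_B": ("chr1", 700_000, 702_000),
}
targets = candidate_targets(snps, loops, promoters)
print(targets)
```

Such an overlap only nominates candidates. As noted above, each putative target would still need validation, for instance by analyzing chromatin changes in the presence of the variant.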

Table 1 Chromosome conformation capture and its derivative assays