Cells
H9 (WA09; WiCell) human ES cells were maintained using Pluripro fully defined media and matrix (Cell Guidance Systems). Approximately 50 million cells (at passage 56) were harvested with Accutase (Life Technologies), suspended in Pluripro media and directly processed for fixation.
Mouse foetal livers were dissected from C57BL/6 mouse embryos at day 14.5 (E14.5) of development. Foetal liver cells were suspended in DMEM (Dulbecco's modified Eagle minimal essential medium; Life Technologies) supplemented with 10 % foetal bovine serum, filtered through a cell strainer (70 μm) and directly fixed by addition of formaldehyde.
Hi-C
Except for the ligation step, Hi-C was performed essentially as described in Lieberman-Aiden et al. [15], with some modifications.
Thirty to 50 million cells were fixed in 2 % formaldehyde for 10 min, quenched with 0.125 M glycine, spun down (400 × g, 5 min) and washed once with phosphate-buffered saline. The cells were incubated in 50 ml permeabilization buffer (10 mM Tris–HCl pH 8, 10 mM NaCl, 0.2 % Igepal CA-630, Complete EDTA-free protease inhibitor cocktail [Roche]) for 30 min on ice with occasional agitation, spun down (650 × g, 5 min, 4 °C), and the cell pellets were resuspended in 358 μl of 1.25× NEBuffer2 (NEB) per 5-million-cell aliquot. We added 11 μl of 10 % SDS to each aliquot, followed by an incubation at 37 °C for 60 min with continuous agitation (950 rpm). To quench the SDS, 75 μl of 10 % Triton X-100 was then added per aliquot, followed by an incubation at 37 °C for 60 min with continuous agitation (950 rpm). To digest chromatin, 1500 U of HindIII (NEB) was added per aliquot and incubated at 37 °C overnight with continuous agitation (950 rpm). After digestion, restriction sites were filled in with Klenow (NEB) in the presence of biotin-14-dATP (Life Technologies), dCTP, dGTP and dTTP (all 30 μM) for 60 min at 37 °C.
For in-solution ligation, 86 μl of 10 % SDS was added per aliquot and incubated at 65 °C for 30 min with continuous agitation (950 rpm), followed by addition of 7.61 ml of ligation mix (745 μl of 10 % Triton X-100, 820 μl of 10× T4 DNA ligase reaction buffer [NEB], 82 μl of 10 mg/ml bovine serum albumin [NEB] and 5.965 ml water) per aliquot and incubation at 37 °C for 60 min with occasional agitation. For in-nucleus ligation, 7.61 ml of ligation mix (820 μl of 10× T4 DNA ligase reaction buffer [NEB], 82 μl of 10 mg/ml bovine serum albumin [NEB] and 6.71 ml water) was added per aliquot (compared with the in-solution ligation, SDS addition and incubation at 65 °C were omitted). For the ligation reaction (both in-solution and in-nucleus variants), 50 μl of 1 U/μl T4 DNA ligase (Life Technologies) was added per aliquot, followed by incubation at 16 °C for 4 h.
The cross-links were reversed by adding 60 μl of 10 mg/ml proteinase K (Roche) per aliquot and incubating at 65 °C overnight. After overnight incubation, another 60 μl of proteinase K per aliquot was added, followed by incubation at 65 °C for an additional 2 h. RNA was removed by adding 12.5 μl of 10 mg/ml RNase A (Roche) per aliquot and incubating at 37 °C for 60 min. DNA was isolated by a phenol (Sigma) extraction, followed by a phenol/chloroform/isoamyl alcohol (Sigma) extraction and standard ethanol precipitation. The precipitated DNA was washed three times with 70 % ethanol and dissolved in 25 μl TE per aliquot. Subsequently, all aliquots were pooled and the Hi-C DNA was quantified (Quant-iT PicoGreen, Life Technologies).
Biotin was removed from non-ligated restriction fragment ends by incubating 30–40 μg of Hi-C library DNA with T4 DNA polymerase (NEB) for 4 h at 20 °C in the presence of dATP. After DNA purification (QIAquick PCR purification kit, Qiagen) and sonication (Covaris E220), the sonicated DNA was end-repaired with T4 DNA polymerase, T4 polynucleotide kinase, Klenow (all NEB) and dNTPs in 1× T4 DNA ligase reaction buffer (NEB). Double size selection of DNA was performed using AMPure XP beads (Beckman Coulter) before A-tailing with Klenow exo− (NEB) and dATP. Biotin-marked ligation products were isolated with MyOne Streptavidin C1 Dynabeads (Life Technologies) in binding buffer (5 mM Tris pH 8, 0.5 mM EDTA, 1 M NaCl) for 30 min at room temperature, followed by two washes in binding buffer and one wash in 1× T4 DNA ligase reaction buffer (NEB). Paired-end (PE) adapters (Illumina) were ligated onto Hi-C ligation products bound to streptavidin beads for 2 h at room temperature (T4 DNA ligase in 1× T4 DNA ligase reaction buffer [NEB], slowly rotating). After washes in wash buffer (5 mM Tris, 0.5 mM EDTA, 1 M NaCl, 0.05 % Tween-20) and binding buffer, the DNA-bound beads were resuspended in NEBuffer 2.
Bead-bound Hi-C DNA was amplified with 12 PCR amplification cycles using PE PCR 1.0 and PE PCR 2.0 primers (Illumina). The concentration and size distribution of the Hi-C library DNA after PCR amplification were determined from Bioanalyzer profiles (Agilent Technologies) and by quantitative PCR, and the Hi-C libraries were paired-end sequenced on Illumina HiSeq 1000 or MiSeq platforms.
Mapping and filtering
The FASTQ paired-end read data were mapped against the appropriate reference genome (hg19, mm9 or a combined hg19/mm9 genome) and then filtered to remove frequently encountered experimental artefacts using the HiCUP [16] analysis pipeline developed at the Babraham Institute. After filtering, we calculated, for each di-tag category, the ratio of invalid di-tags to uniquely mapped di-tags, and then the difference in this ratio between the in-nucleus ligation and in-solution ligation datasets. For each di-tag category, we performed a t-test of the null hypothesis that the mean of these differences is 0, that is, that the ligation step introduces no difference.
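The per-category test can be sketched as follows. This is a minimal illustration, not the HiCUP output format: the hypothetical helper `ratio_differences` assumes the in-nucleus and in-solution datasets are paired element by element (for example by replicate), and `one_sample_t` returns only the t statistic and degrees of freedom for the null hypothesis of zero mean difference.

```python
import math
from statistics import mean, stdev


def ratio_differences(in_nucleus, in_solution):
    """Per-pair difference of (invalid di-tags / uniquely mapped di-tags).

    Each argument is a list of (invalid_count, unique_count) tuples for one
    di-tag category; pairing the two lists element by element is an assumption.
    """
    return [
        inv_n / uni_n - inv_s / uni_s
        for (inv_n, uni_n), (inv_s, uni_s) in zip(in_nucleus, in_solution)
    ]


def one_sample_t(diffs, mu0=0.0):
    """t statistic and degrees of freedom for H0: mean(diffs) == mu0."""
    n = len(diffs)
    t = (mean(diffs) - mu0) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy counts for a single di-tag category (illustrative, not real data).
diffs = ratio_differences([(10, 100), (20, 100), (30, 100)],
                          [(0, 100), (0, 100), (0, 100)])
t, df = one_sample_t(diffs)
```

A two-sided p-value then follows from Student's t distribution with `df` degrees of freedom; `scipy.stats.ttest_1samp(diffs, 0.0)` computes the same statistic together with that p-value.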
Proportion of hybrid mouse-human di-tags in the hybrid samples
For the mouse-human hybrid samples, we calculated the expected proportion of hybrid mouse-human di-tags ($p_{hybrid}$) in the Hi-C library, assuming random ligation and complete restriction digestion:
$$ {p}_{hybrid}=\frac{2{n}_{fend}^{mouse}{n}_{fend}^{human}}{{\left({n}_{fend}^{mouse} + {n}_{fend}^{human}\right)}^2} $$
where $n_{fend}^{mouse}$ is the number of mouse fragment ends (the number of mouse cells multiplied by twice the number of HindIII fragments in the mouse genome, 823,379), and $n_{fend}^{human}$ is the number of human fragment ends (the number of human cells multiplied by twice the number of HindIII fragments in the human genome, 837,163). In a sample containing a 5:1 ratio of mouse:human cells, $p_{hybrid} = 0.281$.
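The value 0.281 can be reproduced directly from the formula for $p_{hybrid}$. In this sketch the function name `p_hybrid` is ours, and the cell numbers matter only through their ratio, so a 5:1 mixture is entered as 5 mouse cells and 1 human cell.

```python
def p_hybrid(mouse_cells, human_cells,
             mouse_fragments=823_379, human_fragments=837_163):
    """Expected fraction of mouse-human di-tags under random ligation and complete digestion."""
    n_mouse = mouse_cells * 2 * mouse_fragments  # mouse fragment ends
    n_human = human_cells * 2 * human_fragments  # human fragment ends
    return 2 * n_mouse * n_human / (n_mouse + n_human) ** 2


ratio_5_to_1 = p_hybrid(5, 1)  # approximately 0.2809, i.e. 0.281 to three decimals
```

For equal numbers of fragment ends from each species the expression reaches its maximum of 0.5.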
Power-law curves
We plotted the frequency of cis-chromosomal interactions at various genomic distances. The frequency density was obtained by binning the unique cis-chromosomal Hi-C di-tags, using 50 bins of equal size on a log10 genomic distance plot.
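A minimal sketch of the log-binned distance distribution is shown below. The text specifies 50 equal-width bins on a log10 distance axis; normalising each bin count by the total number of di-tags and by the bin width in base pairs (so that the density integrates to 1) is our assumption, as is the hypothetical function name.

```python
import math


def log_binned_density(distances, n_bins=50):
    """Bin cis di-tag distances (bp) into n_bins equal-width bins of log10(distance).

    Returns (left_bp, right_bp, density) per bin, where
    density = count / (total di-tags * bin width in bp).
    """
    logs = [math.log10(d) for d in distances if d > 0]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in logs:
        # The largest distance lies on the upper edge; keep it in the last bin.
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    total = len(logs)
    result = []
    for i, c in enumerate(counts):
        left, right = 10 ** (lo + i * width), 10 ** (lo + (i + 1) * width)
        result.append((left, right, c / (total * (right - left))))
    return result
```

Plotting density against the bin midpoint on log-log axes gives the power-law curve; the slope of the linear part is the scaling exponent.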
Bias calculation
We quantified the extent to which the fragment length and the GC content of the fragment ends affect the read coverage using the hicpipe software (version 0.93) [26] developed by Yaffe and Tanay [21]. For each HindIII restriction fragment end, we calculated the fragment length, the GC content of the last 200 bp of the fragment end, and the mappability of the fragment. For the di-tags we used a segment length threshold of 500 bp, that is, we filtered out any di-tags where the sum of the distances from the read positions to the fragment ends where the ligation occurred was greater than this threshold. The algorithm binned the fragment lengths into 20 equally sized bins according to increasing fragment length. In turn, a 20 × 20 interaction matrix of these fragment length bins was used to describe the interaction bias between any two fragment ends. Similarly, a 20 × 20 interaction matrix was constructed using the GC content of the fragment ends. By performing a maximum likelihood optimization using the trans-chromosomal data (at 100 kb, 500 kb, 1 Mb and 10 Mb bin resolutions), we obtained the 20 × 20 interaction bias matrices describing the fragment length bias and the GC content bias.
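The di-tag segment-length filter and the fragment-length binning that feed the 20 × 20 bias matrices can be sketched as below. This does not reproduce hicpipe's maximum-likelihood fit. The helper names are hypothetical, and reading "equally sized bins" as bins holding equal numbers of fragments (quantile bins) is an assumption; an equal-width variant would be a drop-in replacement for `quantile_bins`.

```python
def passes_segment_filter(dist1, dist2, max_segment=500):
    """True if the summed distance (bp) from each read to its ligated fragment end is <= max_segment."""
    return dist1 + dist2 <= max_segment


def quantile_bins(lengths, n_bins=20):
    """Bin index (0..n_bins-1) per fragment; bins hold roughly equal numbers of
    fragments in order of increasing length (our reading of 'equally sized')."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    bins = [0] * len(lengths)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(lengths)
    return bins


def bin_pair_counts(ditags, frag_bin, n_bins=20):
    """Symmetric n_bins x n_bins matrix of di-tag counts indexed by the two fragment-length bins."""
    m = [[0] * n_bins for _ in range(n_bins)]
    for f1, f2 in ditags:
        b1, b2 = frag_bin[f1], frag_bin[f2]
        m[b1][b2] += 1
        if b1 != b2:
            m[b2][b1] += 1
    return m
```

The same binning and counting applies to the GC content of the fragment ends; the observed bin-pair counts from trans di-tags are what the likelihood optimization compares against its bias model.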
Normalization of matrices
We calculated the coverage-corrected Hi-C matrices and the coverage-and-distance-corrected Hi-C matrices using the HOMER software [27] employing the algorithm described by Imakaev et al. [22]. It was assumed that the coverage of each bin should be the same in bias-free data, and that the observed Hi-C counts were the true counts multiplied by a factorizable bias (the factorizable bias of two interacting bins was the product of the bias contribution of the two individual bins).
The bias contribution vector and the true interaction matrix were optimized using an iterative approach, starting with the mapped and filtered Hi-C data from HiCUP [16]. We used 1-Mb and 10-Mb bin resolutions, excluding bins with coverage less than 20 % of the mean bin coverage or more than 4 standard deviations away from the mean bin coverage.
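The factorizable-bias model and the bin filter described above can be illustrated with a small iterative-correction sketch. It is not HOMER's implementation; the function names are hypothetical, the matrix is a plain list of lists, and the diagonal is kept for simplicity.

```python
from statistics import mean, pstdev


def keep_bins(matrix, min_frac=0.2, max_sd=4.0):
    """Keep bins with coverage >= min_frac * mean and within max_sd SDs of the mean."""
    cov = [sum(row) for row in matrix]
    mu, sd = mean(cov), pstdev(cov)
    return [c >= min_frac * mu and abs(c - mu) <= max_sd * sd for c in cov]


def iterative_correction(matrix, keep, n_iter=100, tol=1e-12):
    """Model O_ij = b_i * b_j * T_ij; return (T, b) with equal row sums over kept bins."""
    n = len(matrix)
    m = [[float(matrix[i][j]) if keep[i] and keep[j] else 0.0 for j in range(n)]
         for i in range(n)]
    bias = [1.0] * n
    for _ in range(n_iter):
        cov = [sum(row) for row in m]
        avg = mean(c for c, k in zip(cov, keep) if k)
        # Per-bin scaling factor; filtered (or empty) bins are left untouched.
        s = [c / avg if k and c > 0 else 1.0 for c, k in zip(cov, keep)]
        bias = [b * f for b, f in zip(bias, s)]
        m = [[m[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
        if max(abs(f - 1.0) for f in s) < tol:
            break
    return m, bias
```

For example, an observed matrix `[[1, 2], [2, 4]]` is the uniform matrix `[[1, 1], [1, 1]]` with biases 1 and 2, and the correction recovers a uniform matrix whose bias ratio is 2.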
Identification of compartments
We identified the compartments by calculating the first (or, for human samples, the first two) eigenvector(s) of the bin interaction profile correlation matrix for each chromosome, using the HOMER software [27]. The first eigenvector (or, for the human samples, the eigenvector related to the compartmental pattern as opposed to the chromosome arms) was aligned to active histone modification marks. This was done by multiplying the eigenvector by −1 if the Pearson correlation coefficient of the eigenvector and the H3K4me3 histone modification mark ChIP-seq [19, 28] profile was negative. The magnitude of the correlation coefficient was typically around 0.7. Chromosome bins with positive values in the eigenvector were considered to be in the A compartment, and bins with negative values to be in the B compartment. For human chromosome 4, there was no clear separation between the first and second eigenvector profiles, so reads on human chromosome 4 were omitted from further analyses.
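The eigenvector step and the sign alignment to H3K4me3 can be sketched for a single chromosome as follows. This assumes the correlation matrix has already been computed, uses plain power iteration in place of HOMER's eigendecomposition, and covers only the first eigenvector; `call_compartments` and the other names are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leading_eigenvector(mat, n_iter=1000, tol=1e-12):
    """Power iteration; converges to the top eigenvector of a symmetric PSD matrix."""
    n = len(mat)
    v = [1.0 + i for i in range(n)]  # non-uniform start, unlikely to be orthogonal to an A/B pattern
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def call_compartments(corr, h3k4me3):
    ev = leading_eigenvector(corr)
    if pearson(ev, h3k4me3) < 0:  # orient so that compartment A tracks the active mark
        ev = [-x for x in ev]
    labels = ['A' if x > 0 else 'B' if x < 0 else None for x in ev]
    return labels, ev
```

With a toy correlation matrix in which bins 1 and 2 anti-correlate with bins 3 and 4, and H3K4me3 enriched on the first two bins, the sketch labels the bins A, A, B, B.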
Compartment interaction bias among mouse–human hybrid reads
For the hybrid mouse–human di-tags, we assessed whether there were any compartment-dependent non-random interactions, for instance, whether mouse compartment A interacted preferentially with human compartment A. We counted the hybrid di-tags in which both reads mapped to either compartment A or compartment B, classified by the compartment of the mouse read and of the human read, and performed Fisher’s exact test on these counts.
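A two-sided Fisher's exact test on the resulting 2 × 2 table (rows: compartment of the mouse read; columns: compartment of the human read) can be sketched in the standard library. Log-space hypergeometric probabilities keep it usable for large counts; the function names are hypothetical, and `scipy.stats.fisher_exact` would be the usual choice in practice.

```python
from math import exp, lgamma


def _log_comb(n, k):
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows: mouse read in A or B; columns: human read in A or B.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def log_p(x):  # hypergeometric log-probability of x di-tags in the A/A cell
        return _log_comb(row1, x) + _log_comb(n - row1, col1 - x) - _log_comb(n, col1)

    obs = log_p(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(exp(lp) for lp in map(log_p, range(lo, hi + 1)) if lp <= obs + 1e-7)
    return min(p, 1.0)
```

For the textbook table `[[3, 1], [1, 3]]` this gives 34/70 ≈ 0.486, so the observed A/A enrichment would not be significant.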
Scatter plots and measures of matrix reproducibility
We calculated the Spearman correlation of all cis- and trans-chromosomal interactions between different Hi-C experiments at a 10-Mb bin resolution, as well as at the TAD level, using TADs as variable-sized bins. In addition, we plotted each binned interaction count in one dataset against the corresponding interaction count in a second dataset, and coloured the points according to the genomic distance of the interacting bins.
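The Spearman correlation between two binned matrices can be sketched as a Pearson correlation of average ranks. Here each matrix is assumed to be a dictionary keyed by bin pair; treating a pair missing from one dataset as a zero count is our assumption, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def paired_counts(m1, m2):
    """Align two {(bin_i, bin_j): count} dicts; pairs absent from one dataset count as 0."""
    keys = sorted(set(m1) | set(m2))
    return [m1.get(k, 0) for k in keys], [m2.get(k, 0) for k in keys]
```

The same functions apply at the TAD level once counts are aggregated per pair of TADs instead of per pair of 10-Mb bins.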
We subdivided the bin interaction count data according to the genomic distance of the interacting bins, and performed a linear fit on each of these datasets ($y = ax + b$, where $a$ is the slope and $b$ is the intercept). For each distance, we then corrected the slope for the Hi-C library sizes ($a_{corr} = a\,C_x/C_y$, where $C_x$ and $C_y$ are the total counts in the libraries shown on the x and y axes). The DES was then the angle between the corrected slope and the y = x line:
$$ \mathrm{DES} = \mathrm{atan}\left({a}_{corr}\right) - \mathrm{atan}(1). $$
A perfectly reproducible experiment would result in DES = 0 and a Spearman correlation R = 1.
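The DES calculation can be sketched per distance class as follows. Using ordinary least squares for the linear fit and reporting the angle in radians are assumptions, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def des(xs, ys, total_x, total_y):
    a, _ = linear_fit(xs, ys)
    a_corr = a * total_x / total_y  # correct the slope for library sizes
    return math.atan(a_corr) - math.atan(1.0)  # angle to the y = x line, in radians


def des_by_distance(points, total_x, total_y):
    """points: (distance, count_in_x, count_in_y) per bin pair; one DES per distance."""
    groups = defaultdict(lambda: ([], []))
    for dist, x, y in points:
        groups[dist][0].append(x)
        groups[dist][1].append(y)
    return {d: des(xs, ys, total_x, total_y) for d, (xs, ys) in groups.items()}
```

A library sequenced twice as deeply gives a raw slope of 2 but a corrected slope of 1, so its DES is 0.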
Calculation of TAD boundaries
We calculated TADs in our coverage-corrected Hi-C matrices using the Hi-C domain finding tool of the HOMER software [27]. The algorithm defined directionality indices (DIs) as described in [18], based on the ratio of upstream and downstream interaction counts. We quantified the number of upstream and downstream interactions within an interaction distance of 1 Mb, using 25-kb overlapping bins with a step size of 5 kb. Bins with coverage less than 15 % of the mean bin coverage or greater than 4 standard deviations above the mean were excluded. This resulted in DI values at an effective 5-kb resolution (at the centre of each 25-kb window), which were further smoothed using a running average over a ±25 kb window. Domain boundaries were then called where the smoothed DI was at a local extremum and at least 0.5 standard deviations away from the mean. Using the domains identified by HOMER, we called consensus TAD boundaries for in-solution ligation and in-nucleus ligation datasets, by keeping only TAD boundaries (rounded to the closest genomic position using a 25-kb resolution).
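The smoothing and boundary-calling rules can be sketched on a precomputed DI track sampled every 5 kb. HOMER's exact DI formula is not reproduced here: as a stand-in consistent with a DI "based on the ratio of upstream and downstream interaction counts", the sketch uses log2((downstream + 1)/(upstream + 1)). Computing those counts from 25-kb windows and the coverage-based bin exclusion are omitted, and all names are hypothetical.

```python
import math
from statistics import mean, pstdev


def directionality_index(up, down, pseudocount=1.0):
    """Assumed DI: log2 ratio of downstream to upstream counts (positive = downstream bias)."""
    return [math.log2((d + pseudocount) / (u + pseudocount)) for u, d in zip(up, down)]


def running_mean(values, half_width):
    """Running average over +/- half_width positions (truncated at the ends)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def call_boundaries(di, step_bp=5_000, smooth_bp=25_000, min_sd=0.5):
    """Indices where the smoothed DI is a local extremum >= min_sd SDs from its mean."""
    sm = running_mean(di, smooth_bp // step_bp)  # +/-25 kb at a 5-kb step = +/-5 positions
    mu, sd = mean(sm), pstdev(sm)
    calls = []
    for i in range(1, len(sm) - 1):
        # Strict on the left, non-strict on the right, so a two-point plateau is called once.
        is_max = sm[i] > sm[i - 1] and sm[i] >= sm[i + 1]
        is_min = sm[i] < sm[i - 1] and sm[i] <= sm[i + 1]
        if (is_max or is_min) and abs(sm[i] - mu) >= min_sd * sd:
            calls.append((i, 'max' if is_max else 'min'))
    return calls
```

Multiplying a called index by the 5-kb step gives its offset from the first window centre; rounding these positions to 25 kb is what allows boundaries from different datasets to be compared.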
Hi-C interactions around TAD boundaries
We plotted the interaction directionality profile around the TAD boundaries using the average of the standard scores of the un-smoothed DI values, as a function of distance from the domain boundary upstream or downstream. A random control included 9686 randomly selected genomic positions. In addition, we plotted the coverage- and distance-corrected Hi-C interaction profiles around the consensus TAD boundaries using HOMER [27] and 25-kb overlapping bins with a step size of 5 kb.
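The aggregate directionality profile can be sketched as the mean standard score of the un-smoothed DI at each offset from a set of boundary positions, with randomly drawn positions as the control. Computing the standard scores over the whole supplied track, rather than per chromosome, is an assumption, as are the function names.

```python
import random
from statistics import mean, pstdev


def zscores(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def aggregate_profile(di, boundary_idx, flank):
    """Mean DI z-score at each offset in [-flank, flank] around the given positions."""
    z = zscores(di)
    profile = {}
    for offset in range(-flank, flank + 1):
        vals = [z[b + offset] for b in boundary_idx if 0 <= b + offset < len(z)]
        profile[offset] = mean(vals) if vals else float('nan')
    return profile


def random_positions(n_positions, n, seed=0):
    """n distinct random positions as a control set (e.g. n = 9686)."""
    return random.Random(seed).sample(range(n_positions), n)
```

Running `aggregate_profile` once with the consensus boundaries and once with `random_positions` gives the boundary and control curves; the control is expected to stay close to zero.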
Availability of supporting data
The datasets supporting the results of this article are available in the Gene Expression Omnibus (GEO) repository under accession number [GEO:GSE70181] [29].