Abstract
Single-cell Hi-C (scHi-C) technologies allow for probing of genome-wide cell-to-cell variability in three-dimensional (3D) genome organization from individual cells. Computational methods have been developed to reveal single-cell 3D genome features based on scHi-C, including A/B compartments, topologically associating domains and chromatin loops. However, no method exists for annotating single-cell subcompartments, which is important for understanding chromosome spatial localization in single cells. Here we present scGHOST, a single-cell subcompartment annotation method using graph embedding with constrained random walk sampling. Applications of scGHOST to scHi-C data and contact maps derived from single-cell 3D genome imaging demonstrate reliable identification of single-cell subcompartments, offering insights into cell-to-cell variability of nuclear subcompartments. Using scHi-C data from complex tissues, scGHOST identifies cell-type-specific or allele-specific subcompartments linked to gene transcription across various cell types and developmental stages, suggesting functional implications of single-cell subcompartments. scGHOST is an effective method for annotating single-cell 3D genome subcompartments in a broad range of biological contexts.
Data availability
In this work, we used several public datasets. scHi-C data for the GM12878 cell line (ref. 12) were downloaded from the 4DN Data Portal (refs. 29,40) (4DNES4D5MWEZ, 4DNESUE2NSGS and 4DNESTVIP977) in FASTQ format and were processed into contact maps at 500-kb resolution using the processing pipeline recommended by the data source (https://github.com/VRam142/combinatorialHiC; ref. 41). The scHi-C dataset of the human prefrontal cortex (ref. 14) was downloaded from the Gene Expression Omnibus (GEO) (GSE130711) in contact-pairs format and was then transformed into contact maps at 500-kb resolution. The WTC11 scHi-C dataset was downloaded from the 4DN Data Portal (ref. 29) (4DNESF829JOW and 4DNESJQ4RXY5). All scHi-C datasets were imputed with Higashi (ref. 17) (https://github.com/ma-compbio/Higashi; ref. 42) with default parameters. The Dip-C developing mouse brain dataset (ref. 13) was downloaded from the GEO (GSE162511). The HiRES developing mouse embryo dataset (ref. 33) was downloaded from the GEO (GSE223917). We downloaded the following ENCODE datasets: ENCFF167NBF, ENCFF171MDW, ENCFF803DJF, ENCFF776OVW, ENCFF001GNK, ENCFF001GNN, ENCFF001GOA, ENCFF001GNX, ENCFF001GNT, ENCFF001GNR, ENCFF001GRA, ENCFF001GRD, ENCFF001GRQ, ENCFF001GRM, ENCFF001GRJ, ENCFF001GRG, ENCFF834HNV, ENCFF066MEE, ENCFF366BVS, ENCFF050ZTH and ENCFF519FHW. The imaging dataset (ref. 31) was obtained from Zenodo (https://doi.org/10.5281/zenodo.3928890; ref. 43). We also downloaded scRNA-seq data for multiple cortical areas of the human brain from the Allen Brain Map (refs. 44,45). The marker genes for astrocyte (Astro), oligodendrocyte (ODC), oligodendrocyte progenitor cell (OPC), endothelial cell (Endo), microglia (MG) and neuron cell types were identified using Seurat (refs. 46,47) with default parameters. For each cell type, the background was chosen as all remaining cell types. When identifying marker genes for neuron subtypes, the background was chosen as all remaining neurons. The genes were then ranked by the log fold change between a specific cell type and the background.
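The transformation of contact pairs into a contact map at 500-kb resolution can be illustrated with a minimal sketch. It is not the pipeline used in this work: the function name `bin_contact_pairs`, the toy coordinates and the 2-Mb chromosome length are hypothetical, and only intra-chromosomal pairs on a single chromosome are handled. Each genomic coordinate is assigned to a bin by integer division by the resolution, and each pair increments the corresponding symmetric matrix entry.

```python
from typing import Iterable, List, Tuple


def bin_contact_pairs(
    pairs: Iterable[Tuple[int, int]],
    chrom_length: int,
    resolution: int = 500_000,
) -> List[List[int]]:
    """Bin intra-chromosomal contact pairs into a symmetric count matrix.

    `pairs` holds (position_1, position_2) coordinates in base pairs.
    This helper is illustrative only; real pipelines typically use
    sparse matrices and handle all chromosomes.
    """
    # Ceiling division gives the number of bins covering the chromosome.
    n_bins = -(-chrom_length // resolution)
    contact_map = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        # Integer division maps a coordinate to its bin index.
        i, j = pos1 // resolution, pos2 // resolution
        contact_map[i][j] += 1
        # Mirror off-diagonal contacts so the map stays symmetric.
        if i != j:
            contact_map[j][i] += 1
    return contact_map


# Toy input: four contacts on a hypothetical 2-Mb chromosome.
toy_pairs = [
    (100_000, 700_000),
    (120_000, 650_000),
    (600_000, 650_000),
    (1_900_000, 1_950_000),
]
cmap = bin_contact_pairs(toy_pairs, chrom_length=2_000_000)
for row in cmap:
    print(row)
```

In this toy case the first two pairs fall into bins 0 and 1, so the off-diagonal entry between those bins receives a count of 2, while the last two pairs land on the diagonal.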
Code availability
The source code of scGHOST can be accessed at https://github.com/ma-compbio/scGHOST (ref. 48), which has also been deposited to Zenodo (https://doi.org/10.5281/zenodo.10141210; ref. 49).
References
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Xiong, K. & Ma, J. Revealing Hi-C subcompartments by imputing inter-chromosomal chromatin interactions. Nat. Commun. 10, 5069 (2019).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
Zheng, H. & Xie, W. The role of 3D genome organization in development and cell differentiation. Nat. Rev. Mol. Cell Biol. 20, 535–550 (2019).
Marchal, C., Sima, J. & Gilbert, D. M. Control of DNA replication timing in the 3D genome. Nat. Rev. Mol. Cell Biol. 20, 721–737 (2019).
Misteli, T. The self-organizing genome: principles of genome architecture and function. Cell 183, 28–45 (2020).
Ramani, V. et al. Massively multiplex single-cell Hi-C. Nat. Methods 14, 263–266 (2017).
Nagano, T. et al. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547, 61–67 (2017).
Tan, L., Xing, D., Chang, C.-H., Li, H. & Xie, X. S. Three-dimensional genome structures of single diploid human cells. Science 361, 924–928 (2018).
Kim, H.-J. et al. Capturing cell type-specific chromatin compartment patterns by applying topic modeling to single-cell Hi-C data. PLoS Comput. Biol. 16, e1008173 (2020).
Tan, L. et al. Changes in genome architecture and transcriptional dynamics progress independently of sensory experience during post-natal brain development. Cell 184, 741–758 (2021).
Lee, D.-S. et al. Simultaneous profiling of 3D genome structure and DNA methylation in single human cells. Nat. Methods 16, 999–1006 (2019).
Liu, H. et al. DNA methylation atlas of the mouse brain at single-cell resolution. Nature 598, 120–128 (2021).
Zhou, J. et al. Robust single-cell Hi-C clustering by convolution- and random-walk–based imputation. Proc. Natl Acad. Sci. USA 116, 14011–14018 (2019).
Zhang, R., Zhou, T. & Ma, J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat. Biotechnol. 40, 254–261 (2022).
Zhang, R., Zhou, T. & Ma, J. Ultrafast and interpretable single-cell 3D genome analysis with Fast-Higashi. Cell Syst. 13, 798–807 (2022).
Zhang, Y. et al. Computational methods for analysing multiscale 3D genome organization. Nat. Rev. Genet. 25, 123–141 (2023).
Zhou, T., Zhang, R. & Ma, J. The 3D genome structure of single cells. Annu. Rev. Biomed. Data Sci. 4, 21–41 (2021).
Yu, M. et al. SnapHiC: a computational pipeline to identify chromatin loops from single-cell Hi-C data. Nat. Methods 18, 1056–1059 (2021).
Belmont, A. S. Nuclear compartments: an incomplete primer to nuclear compartments, bodies, and genome organization relative to nuclear architecture. Cold Spring Harb. Perspect. Biol. 14, a041268 (2022).
Liu, Y. et al. Systematic inference and comparison of multi-scale chromatin sub-compartments connects spatial organization to cell phenotypes. Nat. Commun. 12, 2439 (2021).
Ashoor, H. et al. Graph embedding and unsupervised learning predict genomic sub-compartments from HiC chromatin interaction data. Nat. Commun. 11, 1173 (2020).
Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. In Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 855–864 (Association for Computing Machinery, 2016).
Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at https://arxiv.org/abs/1301.3781 (2013).
Trojer, P. & Reinberg, D. Facultative heterochromatin: is there a distinctive molecular signature? Mol. Cell 28, 1–13 (2007).
Zhu, C. et al. Joint profiling of histone modifications and transcriptome in single cells from mouse brain. Nat. Methods 18, 283–292 (2021).
Reiff, S. B. et al. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nat. Commun. 13, 2365 (2022).
Friedman, C. E. et al. Single-cell transcriptomic analysis of cardiac differentiation from human PSCs reveals HOPX-dependent cardiomyocyte maturation. Cell Stem Cell 23, 586–598 (2018).
Su, J.-H., Zheng, P., Kinrot, S. S., Bintu, B. & Zhuang, X. Genome-scale imaging of the 3D organization and transcriptional activity of chromatin. Cell 182, 1641–1659 (2020).
Perez, J. D. et al. Quantitative and functional interrogation of parent-of-origin allelic expression biases in the brain. eLife 4, e07860 (2015).
Liu, Z. et al. Linking genome structures to functions by simultaneous single-cell Hi-C and RNA-seq. Science 380, 1070–1076 (2023).
Zhou, T. et al. Concurrent profiling of multiscale 3D genome organization and gene expression in single mammalian cells. Preprint at bioRxiv https://doi.org/10.1101/2023.07.20.549578 (2023).
Tang, J. et al. LINE: large-scale information network embedding. In Proc. of the 24th International Conference on World Wide Web 1067–1077 (Association for Computing Machinery, 2015).
Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk: online learning of social representations. In Proc. of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 701–710 (Association for Computing Machinery, 2014).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. of the 3rd International Conference on Learning Representations (ICLR, 2015).
Satopaa, V., Albrecht, J., Irwin, D. & Raghavan, B. Finding a ‘kneedle’ in a haystack: detecting knee points in system behavior. In 2011 31st International Conference on Distributed Computing Systems Workshops 166–171 (IEEE, 2011).
Arvai, K. kneed. GitHub https://github.com/arvkevi/kneed (2020).
Dekker, J. et al. The 4D nucleome project. Nature 549, 219–226 (2017).
VRam142/combinatorialHiC. GitHub https://github.com/VRam142/combinatorialHiC (2017).
ma-compbio/Higashi. GitHub https://github.com/ma-compbio/Higashi (2022).
Su, J.-H., Zheng, P., Kinrot, S., Bintu, B. & Zhuang, X. Genome-scale imaging of the 3D organization and transcriptional activity of chromatin. Zenodo https://doi.org/10.5281/zenodo.3928890 (2020).
Hawrylycz, M. J. et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489, 391–399 (2012).
Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
ma-compbio/scGHOST. GitHub https://github.com/ma-compbio/scGHOST (2024).
Xiong, K., Zhang, R. & Ma, J. scGHOST. Zenodo https://doi.org/10.5281/zenodo.10116434 (2023).
Acknowledgements
This work was supported, in part, by National Institutes of Health Common Fund 4D Nucleome Program grant UM1HG011593 (J.M.); National Institutes of Health Common Fund Cellular Senescence Network Program grant UG3CA268202 (J.M.); and National Institutes of Health grants R01HG007352 (J.M.) and R01HG012303 (J.M.). J.M. was additionally supported by a Guggenheim Fellowship from the John Simon Guggenheim Memorial Foundation, a Google Research Collabs Award and a Single-Cell Biology Data Insights award from the Chan Zuckerberg Initiative. R.Z. was additionally supported by funding from the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard.
Author information
Contributions
K.X. and J.M. conceived the development of this work. K.X. and R.Z. developed the software tools. K.X., R.Z. and J.M. conducted data analysis and investigation. K.X., R.Z. and J.M. wrote the paper. J.M. acquired funding to support this work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Ming Hu, Fulai Jin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–32 and Supplementary Note.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiong, K., Zhang, R. & Ma, J. scGHOST: identifying single-cell 3D genome subcompartments. Nat Methods 21, 814–822 (2024). https://doi.org/10.1038/s41592-024-02230-9
DOI: https://doi.org/10.1038/s41592-024-02230-9
- Springer Nature America, Inc.