Abstract
Many genomic alterations associated with human diseases lie in noncoding regulatory elements far from the promoters they regulate, which makes it challenging to link noncoding mutations or risk-associated variants to their target genes. The range of action of a given set of enhancers is thought to be defined by insulator elements bound by the 11-zinc-finger nuclear factor CCCTC-binding protein (CTCF). Here we analyzed the genomic distribution of CTCF in various human, mouse and chicken cell types and demonstrated the existence of evolutionarily conserved CTCF-bound sites beyond mammals. These sites preferentially flank transcription factor–encoding genes, which are often associated with human diseases, and function as enhancer blockers in vivo, suggesting that they act as evolutionarily invariant gene boundaries. We then applied this concept to predict, and functionally demonstrate, that the polymorphic variants associated with multiple sclerosis located within the EVI5 gene impinge on the adjacent gene GFI1.
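The boundary idea in the last two sentences can be illustrated with a minimal sketch. If conserved CTCF-bound sites act as invariant gene boundaries, a noncoding variant is most plausibly linked to genes whose promoters lie in the same boundary-delimited interval as the variant. The function names (`domain_index`, `candidate_targets`), the use of transcription start sites (TSSs) as gene positions, and all coordinates and gene names below are hypothetical assumptions for illustration only, not the authors' pipeline or data.

```python
from bisect import bisect_right
from typing import Dict, List


def domain_index(position: int, boundaries: List[int]) -> int:
    """Return the index of the boundary-delimited interval containing `position`.

    `boundaries` must be sorted. Interval 0 lies before the first boundary site;
    interval k lies between boundaries[k-1] and boundaries[k]. A position that
    coincides exactly with a boundary is assigned to the interval on its right.
    """
    return bisect_right(boundaries, position)


def candidate_targets(variant_pos: int, gene_tss: Dict[str, int],
                      boundaries: List[int]) -> List[str]:
    """Hypothetical helper: genes whose TSS shares the variant's interval."""
    ordered = sorted(boundaries)
    target = domain_index(variant_pos, ordered)
    return sorted(name for name, tss in gene_tss.items()
                  if domain_index(tss, ordered) == target)


# Invented toy coordinates on a single chromosome (not real CTCF or gene positions).
BOUNDARIES = [100_000, 250_000, 400_000]  # assumed conserved CTCF-bound sites
GENE_TSS = {"GENE_A": 80_000, "GENE_B": 180_000,
            "GENE_C": 230_000, "GENE_D": 300_000}

if __name__ == "__main__":
    print(candidate_targets(210_000, GENE_TSS, BOUNDARIES))
```

Here a variant at position 210,000 falls between the sites at 100,000 and 250,000, so only GENE_B and GENE_C are returned; GENE_D is excluded because the site at 250,000 separates it from the variant. A real analysis would first have to decide which CTCF sites qualify as conserved boundaries; this sketch simply takes that list as given.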
Change history
03 June 2011
In the version of this article initially published, the affiliation for authors at the Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, Madrid, Spain, was incomplete. The full affiliation is "Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, CSIC, Madrid, Spain." The error has been corrected in the HTML and PDF versions of the article.
Acknowledgements
This research was supported by the following grants: BFU2007-60042/BMC, BFU2010-14839, Petri PET2007_0158, CONSOLIDER CSD2007-00008 (Spanish Ministerio de Ciencia e Innovación (MICINN)) and Proyecto de Excelencia CVI-3488 (Junta de Andalucía) to J.L.G.-S.; BFU2009-07044 (MICINN) and Proyecto de Excelencia CVI 2658 (Junta de Andalucía) to F.C.; FIS PI081636 (ISCIII) to F.M.; PN-SAF2009-11491 (MICINN) and Proyecto de Excelencia P07-CVI-02551 (Junta de Andalucía) to A.A.; BFU2008-00838, CONSOLIDER CSD2007-00008 (MICINN), Regional Government of Madrid (CAM S-SAL-0190-2006) and the Pro-CNIC Foundation to M.M.; BFU2006-12185 and BIO2009-12697 (MICINN) to L.M.; Dirección General de Asuntos del Personal Académico, Universidad Nacional Autónoma de México (IN209403, IN214407 and IN203811) and Consejo Nacional de Ciencia y Tecnología, México (CONACyT: 42653-Q, 58767 and 128464) to F.R.-T.; the Intramural Research Program of the US NCBI (NIH) to I.O.; and BIO2006-03380, CONSOLIDER CSD2007-00050 (MICINN) and RETICS RD07/0067/0012 (MICINN) to R.G. L.M. thanks A. Fernández for technical assistance and L. Barrios for statistical analysis. F.R.-T. thanks G.G. Avendaño for technical assistance.
Author information
Contributions
J.L.G.-S. and F.C. conceived the study, designed the experiments, interpreted the results and wrote the manuscript. D.M. devised the bioinformatics methods, carried out data analysis and wrote the paper. C.P., M.S. and M.A.B. conducted the mouse ChIP experiments. C.V.-Q., M.F.-M. and F.R.-T. carried out the chicken ChIP experiments. E.C.-M., E.M. and L.M. carried out the insulator assays. A.F.M. conducted the 3C experiments. F.M., A.A. and M.F. provided peripheral blood mononuclear cells (PBMCs) and carried out qRT-PCR, luciferase reporter assays of CNRA/CNRB activity, quantification of relative GFI1 expression in 108 PBMC samples, genotyping of EVI5 SNP rs11804321 and statistical analysis. O.D. carried out the high-throughput sequencing. O.B., L.T., I.O., S.C. and P.S.P. performed data analysis. M.M. and R.G. collaborated on the experimental design, the discussion of results and the writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7 and Supplementary Methods (PDF 4457 kb)
Supplementary Table 1
Genes flanking CTCF sites in human, mouse and chicken (XLS 21996 kb)
Supplementary Table 2
Gene Ontology of genes associated with CTCF sites (Biological Processes) (XLS 15459 kb)
Supplementary Table 3
Gene Ontology of genes associated with CTCF sites (Molecular Function) (XLS 4972 kb)
Supplementary Table 4
Primers used to amplify the human CTCF bound regions and in the 3C experiments (XLS 50 kb)
About this article
Cite this article
Martin, D., Pantoja, C., Miñán, A. et al. Genome-wide CTCF distribution in vertebrates defines equivalent sites that aid the identification of disease-associated genes. Nat Struct Mol Biol 18, 708–714 (2011). https://doi.org/10.1038/nsmb.2059
DOI: https://doi.org/10.1038/nsmb.2059
Springer Nature America, Inc.