Calcified Tissue International, Volume 94, Issue 4, pp 396–402

A ChIP-seq-Defined Genome-Wide Map of MEF2C Binding Reveals Inflammatory Pathways Associated with Its Role in Bone Density Determination

Authors

  • Matthew E. Johnson
    • Division of Human Genetics, The Children’s Hospital of Philadelphia Research Center
  • Sandra Deliard
    • Division of Human Genetics, The Children’s Hospital of Philadelphia Research Center
  • Fengchang Zhu
    • School of Veterinary Medicine, University of Pennsylvania
  • Qianghua Xia
    • Division of Human Genetics, The Children’s Hospital of Philadelphia Research Center
  • Andrew D. Wells
    • Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia
  • Kurt D. Hankenson
    • School of Veterinary Medicine, University of Pennsylvania
    • Division of Human Genetics, The Children’s Hospital of Philadelphia Research Center
    • Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania
Original Research

DOI: 10.1007/s00223-013-9824-5

Cite this article as:
Johnson, M.E., Deliard, S., Zhu, F. et al. Calcif Tissue Int (2014) 94: 396. doi:10.1007/s00223-013-9824-5

Abstract

Genome-wide association studies (GWAS) have demonstrated that genetic variation at the MADS box transcription enhancer factor 2, polypeptide C (MEF2C) locus is robustly associated with bone mineral density, primarily at the femoral neck. MEF2C is a transcription factor known to operate via the Wnt signaling pathway. Our hypothesis was that MEF2C regulates the expression of a set of molecular pathways critical to skeletal function. Drawing on our laboratory and bioinformatic experience with ChIP-seq, we analyzed ChIP-seq data for MEF2C available via the ENCODE project to gain insight into its global genomic binding pattern. We aligned the ChIP-seq data generated for GM12878 (an established lymphoblastoid cell line) and, using the analysis package HOMER, observed a total of 17,611 binding sites corresponding to 8,118 known genes. We then performed a pathway analysis of the gene list using Ingenuity. At 5 kb, the gene list yielded ‘EIF2 Signaling’ as the most significant annotation, with a P value of 5.01 × 10⁻²⁶. Moving further out, this category remained the top pathway at 50 and 100 kb, then dropped to second place at 500 kb and beyond, behind ‘Molecular Mechanisms of Cancer’. In addition, at 50 kb and beyond, ‘RANK Signaling in Osteoclasts’ was a consistent feature, which resonates with the main general finding from GWAS of bone density. We also observed that MEF2C binding sites were significantly enriched primarily near inflammation-associated genes identified from GWAS; indeed, a similar enrichment for inflammation genes has been reported previously using a similar approach for the vitamin D receptor, an established key regulator of bone turnover. Our analyses point to known connective tissue and skeletal processes but also provide novel insights into networks involved in skeletal regulation. The fact that a specific GWAS category is enriched points to a possible role of inflammation through which MEF2C impacts bone mineral density.

Keywords

ChIP-seq · MEF2C · GWAS · Bone · Inflammation

Introduction

In adults, bone mineral density (BMD) is the single best predictor of fragility fractures and is used as an important diagnostic criterion for osteoporosis [1]. It is now widely recognized from twin studies [2] that BMD is highly heritable, and recent discoveries resulting from genome-wide association studies (GWAS) of this trait further confirm a genetic component. To date, GWAS in adults have revealed 70 unique loci associated with BMD and/or fracture in populations of European ancestry [3–8].

As a consequence of these GWAS outcomes, there is compelling evidence that the gene encoding MADS box transcription enhancer factor 2, polypeptide C (MEF2C), a transcription factor known to operate via the Wnt signaling pathway, is associated with adult BMD and osteoporosis risk, primarily at the femoral neck. As such, defining the pattern of genomic binding of this transcription factor may offer a global insight into the process of human bone formation.

We previously carried out chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) with another GWAS-implicated member of the Wnt signaling family, namely TCF7L2, which is widely considered the most strongly associated locus for type 2 diabetes reported to date [9], in order to elucidate its genome-wide occupancy repertoire [10, 11] and to gain insight into its mechanism of action. Unexpectedly, and despite our use of the carcinoma cell line HCT116, our initial data suggested that the genes with TCF7L2 occupancy sites are strongly enriched in pathway categories related to metabolic functions and traits. We also noted that TCF7L2 binds to many more genes implicated in GWAS of metabolic and cardiovascular traits than would be expected by chance [10].

Following this notion, we aimed to elucidate such patterns for MEF2C in order to address our hypothesis that MEF2C regulates the expression of a set of molecular pathways critical to skeletal function. Drawing on our bioinformatic experience with TCF7L2 ChIP-seq, we analyzed ChIP-seq data for MEF2C derived from the B-lymphocyte-derived cell line GM12878, available via the ENCODE project, to similarly gain insight into its global genomic binding pattern.

Materials and Methods

The ENCODE MEF2C ChIP-seq and input raw sequence files were downloaded from the UCSC database (http://genome.ucsc.edu/cgi-bin/hgFileSearch) [12]. After aligning the reads to hg19 with Bowtie, binding sites were defined with the HOMER (Hypergeometric Optimization of Motif EnRichment) analysis package [13] at a false discovery rate of 1 %, requiring fourfold more experimental tags than in the control. The candidate target gene was the gene closest to the binding site, regardless of direction. In all cases, the transcription start site (TSS) of the aligned transcript was used as the anchor point for distance measurements.
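The nearest-gene assignment described above can be illustrated with a minimal Python sketch. It is not the HOMER implementation: the function names, the single-chromosome simplification, and the toy coordinates and gene names are all assumptions made for illustration, and the TSS list is assumed to be non-empty and sorted.

```python
from bisect import bisect_left

def nearest_gene(peak_pos, tss_positions, tss_genes):
    """Return (gene, distance) for the TSS closest to peak_pos.

    tss_positions must be sorted ascending and non-empty;
    tss_genes is the parallel list of gene names (hypothetical here).
    Direction (upstream or downstream) is ignored.
    """
    i = bisect_left(tss_positions, peak_pos)
    # Only the TSS just before and just after the peak can be the closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - peak_pos))
    return tss_genes[best], abs(tss_positions[best] - peak_pos)

def genes_within(peaks, tss_positions, tss_genes, max_dist=None):
    """Set of nearest genes whose TSS lies within max_dist bp of a peak.

    max_dist=None keeps every nearest gene (the 'All' setting).
    """
    genes = set()
    for pos in peaks:
        gene, dist = nearest_gene(pos, tss_positions, tss_genes)
        if max_dist is None or dist <= max_dist:
            genes.add(gene)
    return genes

# Toy, made-up coordinates on one hypothetical chromosome.
tss_positions = [10_000, 60_000, 400_000]
tss_genes = ["GENE_A", "GENE_B", "GENE_C"]
peaks = [12_000, 100_000, 395_500]

print(sorted(genes_within(peaks, tss_positions, tss_genes, max_dist=5_000)))
print(sorted(genes_within(peaks, tss_positions, tss_genes, max_dist=50_000)))
```

Applying the same function with increasing `max_dist` values gives nested gene lists of the kind analyzed at 5, 50, 100 and 500 kb.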

Pathway Analysis

Data were analyzed through the use of Ingenuity Pathways Analysis (Ingenuity Systems, Redwood City, CA, USA; http://www.ingenuity.com) specified for ‘Human.’ The genes that contain at least one function or pathway annotation in the Ingenuity knowledge base were eligible for the analysis. The P value associated with functions and pathways was calculated using the right-tailed Fisher Exact Test.
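For readers unfamiliar with the test, a right-tailed Fisher exact test on a gene list is equivalent to the upper tail of a hypergeometric distribution. The following Python sketch shows that calculation with the standard library only. The background size and counts are toy values chosen for illustration; the reference set and internal computation used by Ingenuity are not described here, so this is not a reproduction of the reported P values.

```python
from math import comb

def fisher_right_tailed(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background genes
    K: background genes annotated to the pathway
    n: genes in the input list
    k: input-list genes annotated to the pathway
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of every outcome at least as extreme as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy numbers (assumptions): 20 background genes, 5 in the pathway,
# 6 genes in the input list, 3 of which are in the pathway.
p = fisher_right_tailed(k=3, n=6, K=5, N=20)
print(round(p, 4))
```

A small P value indicates that the input list contains more pathway genes than expected from random draws of the same size.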

GWAS Data Analysis

We based our analysis on all GWAS genes summarized in the NHGRI GWAS catalog (http://www.genome.gov/gwastudies) as of February 19, 2013. We cross-compared the name of the nearest gene to a given occupancy site at a given distance with the gene names assigned to loci recorded in the GWAS catalog. Enrichment was investigated by chi-square analysis. To score the overlap between GWAS and ChIP-seq genes, we assigned 1 point to a GWAS region where all the genes in the region were found in our list; otherwise we assigned a fraction of a point, determined by the number of genes found in our gene list divided by the total number of genes in the GWAS region. Under this model each GWAS region carries the same weight as a single region, whether it contains one gene or eight.
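The fractional scoring scheme can be sketched in a few lines of Python. The function names and the toy gene and region lists are hypothetical; only the scoring rule itself (one point per region, split by the fraction of its genes found in the ChIP-seq list) comes from the description above.

```python
def region_score(region_genes, chip_genes):
    """Fraction of a GWAS region's genes that appear in the ChIP-seq gene list."""
    hits = sum(1 for g in region_genes if g in chip_genes)
    return hits / len(region_genes)

def overlap_summary(regions, chip_genes):
    """Sum the fractional scores; each region contributes at most one point.

    Returns (total score, score as a fraction of the number of regions).
    """
    total = sum(region_score(r, chip_genes) for r in regions)
    return total, total / len(regions)

# Toy, made-up data.
chip_genes = {"GENE_A", "GENE_C", "GENE_D"}
regions = [
    ["GENE_A"],                                # 1 of 1 genes found -> 1.0
    ["GENE_B", "GENE_C"],                      # 1 of 2 genes found -> 0.5
    ["GENE_E", "GENE_F", "GENE_G", "GENE_H"],  # 0 of 4 genes found -> 0.0
]

score, fraction = overlap_summary(regions, chip_genes)
print(score, fraction)  # 1.5 0.5
```

This is why the overlap counts in Table 1 can be non-integer, for example 14.5 of 55 regions.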

Results

We aligned the ChIP-seq data generated for GM12878 (an established lymphoblastoid cell line) by the ENCODE project to globally map DNA sequences bound by MEF2C. The antibody that was employed was MEF2C C-17 (sc-13268; Santa Cruz Biotechnology). A total of 17,611 binding sites were observed using the HOMER analysis package set at a false discovery rate of 1 %, corresponding to 8,118 known genes. The distribution of the binding sites was 39.8 % intronic, 35.5 % intergenic, 20.7 % promoter TSS, and the remaining 4.0 % in various other gene regions (Fig. 1).
Fig. 1. Genomic distribution of MEF2C binding sites in the GM12878 cell line. The ChIP-seq data set generated a total of 17,611 binding sites observed at a false discovery rate of 1 % using HOMER. The majority of binding sites were determined to be located in intergenic and intronic regions.

We went on to use the de novo motif discovery algorithm within HOMER to derive the consensus binding site for this MEF2C ChIP-seq data set. The 12-bp consensus was found in 26.0 % of all binding sites (Fig. 2). From a distribution point of view, the vast majority of occupancy (>97.9 %) fell within 500 kb of a RefSeq gene transcription start site.
Fig. 2. Overall 12-bp consensus motif generated from the nucleotide distribution of the in vivo MEF2C binding pattern. The MEF2C motif was defined by HOMER (P = 1.0 × 10⁻¹²⁵⁴); the 12-bp consensus was found in 26.0 % of binding sites.

The MEF2C-bound regions were within 5 kb (i.e., directly upstream or downstream) of 4,600 known genes. Indeed, approximately 31.1 % of the regions were within 5 kb of the transcriptional start sites of genes, with the remainder being more distant from the promoter, either upstream or within the gene of interest.

Next, we performed a comprehensive pathway analysis including 4,600 genes within 5 kb of a MEF2C binding site that were well annotated functionally. Data were analyzed through the use of Ingenuity Pathway Analysis specified for ‘Human.’ The genes that contain at least one function or pathway annotation in the Ingenuity knowledge base were eligible for the analysis. A total of 106 of these genes are known in the context of ‘EIF2 Signaling,’ which was ranked the most significant annotation with a P value of 5.01 × 10⁻²⁶. Moving further out, this category remained the top pathway at 50 and 100 kb, then dropped to second place at 500 kb and beyond, behind ‘Molecular Mechanisms of Cancer.’ ‘B Cell Receptor Signaling’ and ‘T Cell Receptor Signaling’ were also consistently high on these lists at all distances; in addition, at 50 kb and beyond, ‘RANK Signaling in Osteoclasts’ was a consistent feature, which is highly notable because it resonates with the main general finding from GWAS of bone density [3–8]. All these highly enriched categories easily survived correction for multiple testing (Supplementary Table S1). Furthermore, the category ‘Cell Morphology, Skeletal and Muscular Disorders, Skeletal and Muscular System Development and Function’ was ranked the most significantly enriched for ‘Associated Network Functions’ when considering all genes.

Because our intention was to execute a global analysis that was not hypothesis driven, our Ingenuity pathway approach was not restricted to sequences harboring the most common consensus binding site. The pathway analyses included all nearest genes that had a MEF2C occupancy site within the set distances, regardless of whether they harbored the common binding motif (Fig. 2). However, we also carried out a pathway analysis constrained on this key consensus binding site at the ‘All’ distance, where we had the greatest statistical power (see the top 20 annotations in Supplementary Table S2). This analysis revealed that some pathways strengthened while others weakened; indeed, ‘RANK Signaling in Osteoclasts’ substantially strengthened.

Motivated by the GWAS-implicated locus enrichment from our previous TCF7L2 work [10], we queried the results against all GWAS signals reported to date, as derived from the NHGRI GWAS catalog (February 19, 2013). Among the 4,600 genes with MEF2C binding sites within 5 kb, which represent 24.2 % of all RefSeq genes (n = 19,015), there was in fact a significant underrepresentation of GWAS loci (20.3 % of loci; P = 5.36 × 10⁻⁵) (Table 1). The individual disease categories reflected the same direction except for inflammation, which showed evidence of enrichment, albeit not significant at this distance. However, as we moved out, the inflammation category enrichment became increasingly evident (50 kb: 47.2 % of loci, P = 0.0013; 100 kb: 50.9 % of loci, P = 0.0011; 500 kb: 54.4 % of loci, P = 0.00083; ‘All’: 54.5 % of loci, P = 0.0011). At ‘All’, nominal evidence was also observed for the cardiovascular category (50.1 % of loci, P = 0.05). Regarding specific traits, the bone density/osteoporosis-related GWAS findings were generally enriched at all distances except 100 kb, albeit not significantly, as a result of a lack of statistical power. Marginal or no significant enrichment of GWAS signals for endocrine, cancer or neurological traits was observed at any distance.
Table 1. Enrichment of GWAS signals for nearest RefSeq genes to MEF2C binding sites at given distances from the nearest transcription start site

| MEF2C | % of total hg19 gene list | % ChIP-seq gene list | P (χ²) |
|---|---|---|---|
| 4,600 genes, 5 kb | | | |
| Endocrine | 24.2 % (4,600/19,015) | 21.5 % (191.2/888) | 0.15 |
| (Bone) | 24.2 % (4,600/19,015) | 26.4 % (14.5/55) | 0.77 |
| Cancer | 24.2 % (4,600/19,015) | 23.5 % (78.8/335) | 0.82 |
| Cardiovascular | 24.2 % (4,600/19,015) | 20.1 % (92.89/463) | 0.10 |
| Inflammation | 24.2 % (4,600/19,015) | 27.9 % (145.58/521) | 0.13 |
| Neuropsychiatricᵃ | 24.2 % (4,600/19,015) | 18.8 % (109.34/582) | 0.016 |
| Allᵃ | 24.2 % (4,600/19,015) | 20.3 % (731.35/3,607) | 5.36 × 10⁻⁵ |
| 6,978 genes, 50 kb | | | |
| Endocrine | 36.7 % (6,978/19,015) | 34.8 % (309.23/888) | 0.44 |
| (Bone) | 36.7 % (6,978/19,015) | 39.1 % (21.5/55) | 0.80 |
| Cancer | 36.7 % (6,978/19,015) | 39.3 % (131.71/335) | 0.51 |
| Cardiovascular | 36.7 % (6,978/19,015) | 37.3 % (172.56/463) | 0.86 |
| Inflammationᵇ | 36.7 % (6,978/19,015) | 47.2 % (246.01/521) | 0.0013 |
| Neuropsychiatric | 36.7 % (6,978/19,015) | 33.3 % (194.07/582) | 0.25 |
| All | 36.7 % (6,978/19,015) | 34.8 % (1,256.93/3,607) | 0.15 |
| 7,546 genes, 100 kb | | | |
| Endocrine | 39.7 % (7,546/19,015) | 39.2 % (348.15/888) | 0.85 |
| (Bone) | 39.7 % (7,546/19,015) | 39.1 % (21.5/55) | 0.95 |
| Cancer | 39.7 % (7,546/19,015) | 42.7 % (142.91/335) | 0.47 |
| Cardiovascular | 39.7 % (7,546/19,015) | 42.6 % (197.45/463) | 0.40 |
| Inflammationᵇ | 39.7 % (7,546/19,015) | 50.9 % (265.14/521) | 0.0011 |
| Neuropsychiatric | 39.7 % (7,546/19,015) | 37.5 % (218.10/582) | 0.48 |
| All | 39.7 % (7,546/19,015) | 39.1 % (1,409.29/3,607) | 0.65 |
| 8,058 genes, 500 kb | | | |
| Endocrine | 42.4 % (8,058/19,015) | 45.6 % (404.70/888) | 0.24 |
| (Bone) | 42.4 % (8,058/19,015) | 44.5 % (24.5/55) | 0.84 |
| Cancer | 42.4 % (8,058/19,015) | 47.7 % (159.93/335) | 0.22 |
| Cardiovascular | 42.4 % (8,058/19,015) | 49.3 % (228.19/463) | 0.065 |
| Inflammationᵇ | 42.4 % (8,058/19,015) | 54.4 % (283.5/521) | 0.00083 |
| Neuropsychiatric | 42.4 % (8,058/19,015) | 44.5 % (259.00/582) | 0.52 |
| All | 42.4 % (8,058/19,015) | 45.0 % (1,624.86/3,607) | 0.062 |
| 8,118 genes, all genes | | | |
| Endocrine | 42.7 % (8,118/19,015) | 46.2 % (410.65/888) | 0.191 |
| (Bone) | 42.7 % (8,118/19,015) | 44.5 % (24.5/55) | 0.86 |
| Cancer | 42.7 % (8,118/19,015) | 48.7 % (163/335) | 0.175 |
| Cardiovascularᵇ | 42.7 % (8,118/19,015) | 50.1 % (231.89/463) | 0.05 |
| Inflammationᵇ | 42.7 % (8,118/19,015) | 54.5 % (283.93/521) | 0.0011 |
| Neuropsychiatric | 42.7 % (8,118/19,015) | 45.2 % (263.2/582) | 0.45 |
| Allᵇ | 42.7 % (8,118/19,015) | 45.9 % (1,658.44/3,607) | 0.022 |

We based our analysis on all GWAS genes summarized in the NHGRI GWAS catalog (http://www.genome.gov/gwastudies) from February 19, 2013. Enrichment was investigated by chi-square analyses. Our method of scoring GWAS ChIP-seq gene overlap was to assign 1 point to a GWAS region where all the genes in the region were found in our list, and otherwise a fraction of a point determined by the number of genes found in our gene list divided by the total number of genes in the GWAS region. This analysis model weights a GWAS region with 1 gene the same as a region with 8 genes, each as a single region. The ‘Bone’ category is in brackets because it is a subset of ‘Endocrine’.

GWAS: genome-wide association studies; ChIP-seq: chromatin immunoprecipitation combined with massively parallel sequencing

ᵃ Underrepresentation

ᵇ Overrepresentation

To contrast with control data sets, we also generated a random list of 5,000 genes from the 19,015 RefSeq genes used by HOMER to determine the nearest gene lists described above, in order to ascertain whether our data analysis was biased. The randomly generated gene list showed no significant overrepresentation of GWAS genes; in fact, it showed a trend toward underrepresentation. To further validate our results, we generated a gene list from a histone methylation mark because, to date, no clear correlation between histone marks and GWAS genes has been established. We chose the histone mark H3K36me3 because it marks actively transcribed gene bodies and had been profiled by ChIP-seq in the HepG2 cell line as part of the ENCODE project. The H3K36me3-derived gene list likewise showed no significant overrepresentation of GWAS genes; in fact, it also showed a trend toward underrepresentation (data not shown).
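A random control list of this kind can be drawn as in the following Python sketch. The placeholder gene names, the fixed seed, and the toy GWAS list are assumptions for illustration only; the sketch shows the sampling step and a simple overlap fraction, not the full enrichment analysis.

```python
import random

def random_gene_control(background_genes, size, seed=None):
    """Draw `size` distinct genes uniformly at random from the background list."""
    rng = random.Random(seed)  # a seed makes the control list reproducible
    return set(rng.sample(background_genes, size))

def fraction_in_list(query_genes, gene_list):
    """Fraction of query genes that are present in gene_list."""
    return sum(1 for g in query_genes if g in gene_list) / len(query_genes)

# Hypothetical placeholder names standing in for the 19,015 RefSeq genes.
background = [f"GENE_{i}" for i in range(19_015)]
control = random_gene_control(background, 5_000, seed=0)

# Made-up stand-in for a list of GWAS-implicated genes.
toy_gwas_genes = background[::10]
print(len(control), round(fraction_in_list(toy_gwas_genes, control), 3))
```

For a truly random list, the overlap fraction should sit close to the list's share of the background (here 5,000/19,015), which is the baseline against which enrichment in the real gene lists is judged.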

Discussion

We have employed bioinformatic analyses of ENCODE-generated ChIP-seq data in order to shed light on the possible mechanism by which the MEF2C locus influences bone density determination and osteoporotic fracture pathogenesis.

We have identified 17,611 discrete MEF2C binding sites in the GM12878 cell line, with the derived consensus motif in strong agreement with previous reports. Strikingly, when we performed disease/function analyses using the gene list for MEF2C binding sites, we observed significant GWAS-implicated locus enrichment in a key disease-related category, namely inflammation. In addition, eukaryotic initiation factor 2 (EIF2) signaling, which is a strongly overrepresented pathway on our gene list, regulates proinflammatory cytokine expression [14]. The EIF2 pathway is also known to play a role in the pathogenesis of neuropsychiatric disease [15, 16], but the lack of enrichment for this GWAS disease category may suggest that this association is not relevant in the context of the current study. Although the pathway analysis approach we employed is knowledge based, we find the set of classical pathways that were enriched intriguing. Overall, we hypothesize that the genes in the vicinity of MEF2C binding sites that are not yet associated with these bone-related traits are good candidates for follow-up research from an inflammation point of view.

It has been shown that MEF2C is required for the expression of the osteocyte bone formation inhibitor SOST in vitro and that deletion of MEF2C in mice decreases the expression of SOST, leading to increased bone mass and cortical thickness [17]. In addition, the inhibition of SOST expression by PTH is mediated by MEF2C [18]. Furthermore, MEF2C is well established to play a role in the regulation of chondrocyte biology [19]. The findings made in the current study now also implicate MEF2C in mediating an effect on bone via inflammation; indeed, factors related to inflammation, in particular cytokines, are well known to play a key role in bone signaling [20]. It is also interesting to note that the MEF2C locus has been reported in GWAS of a number of other traits [21–24], and based on these ChIP-seq findings, it could be conferring its effect via inflammation in those disease settings as well.

Interestingly, a similar enrichment of GWAS-discovered genes has also recently been reported by an independent group studying vitamin D receptor (VDR) occupancy with a very comparable ChIP-seq approach. As in our findings, they discovered that VDR binding sites were significantly enriched near inflammation-associated genes identified from GWAS [25]. This is, of course, unlike our observations with TCF7L2, where there is enrichment of multiple GWAS categories, suggesting that VDR and MEF2C, both encoded by key genes in the bone field, act more specifically.

There are some limitations to this study. First, this ENCODE data set was generated from a B-lymphocyte-derived source, a tissue that may not reflect the full extent of MEF2C action with respect to bone and may lead to an overrepresentation of inflammation-related processes, given that it is a white blood cell line. However, in our work with TCF7L2 ChIP-seq, despite using a carcinoma cell line, we observed clear enrichment of type 2 diabetes categories [10, 11], suggesting that binding repertoires can be relatively robust across multiple tissue types. To elucidate whether this is the case with MEF2C, more ChIP-seq will need to be carried out in a battery of different tissues. Furthermore, ChIP-seq can be strongly influenced by antibody selection, which is challenging because one must determine which antibodies will be efficient for the process; additional antibodies targeted to different sites of the MEF2C protein product would provide a higher level of insight.

We observed an overall deficiency of GWAS-implicated loci at the 5 kb distance. In fact, a slight deficiency, as we also see in our control runs, is what one would expect, because locus nomenclature is not completely uniform in the NIH-curated GWAS catalog and so may not always correspond to a classical RefSeq annotation. As such, the fact that we subsequently see enrichment at greater distances in specific categories speaks to the clarity of our observations.

The use of these ENCODE data resonates with previous studies using such datasets that suggest an enrichment of GWAS-identified single nucleotide polymorphisms in transcriptionally relevant genomic locations [26]. Our bioinformatics approach in analyzing ENCODE-derived data points to known connective tissue and skeletal processes but also provides novel insights into networks involved in skeletal regulation. The fact that specific GWAS disease-associated categories are significantly enriched among genes bound by MEF2C not only speaks to its known cardiac role [27], because the cardiovascular GWAS category was nominally significantly enriched when ‘All’ genes were considered, but also points to a role of inflammation through which MEF2C impacts BMD.

Acknowledgments

This work was supported by Institutional Development Funds from the Children’s Hospital of Philadelphia and a Penn Center for Musculoskeletal Disorders pilot award.

Supplementary material

Supplementary material 1 (DOCX 56 kb)

Copyright information

© Springer Science+Business Media New York 2013