Introduction

Clubfoot is a common congenital abnormality characterized by rigid inward turning of the contracted foot toward the midline of the body [6]. Severe calf muscle hypoplasia can be present, even after corrective treatment [24]. Approximately 20% of clubfeet occur as part of a syndrome, whereas the remaining 80% occur as an isolated abnormality (nonsyndromic, with no other malformations), which affects one in 700 to 1000 newborns [5, 7, 9, 10, 30, 35].

Clubfoot is a complex birth defect in which both genetic and environmental factors play etiologic roles [11]. Despite overwhelming evidence for a genetic etiology, only a few genes have been implicated in clubfoot, including TBX4, PITX1, and several muscle contraction and apoptotic genes [2, 3, 13, 14, 20, 32, 43]. Therefore, most of the underlying genetic variation remains to be identified.

Candidate gene approaches traditionally have been the preferred method for identifying potential clubfoot genes. Two previous studies focused on genes involved in muscle development and function because calf muscle hypoplasia and abnormal muscle contraction are features of clubfoot [14, 43]. These included the HOXA and HOXD genes, which are involved in the synchronized development of muscles, tendons, and cartilage of the limb; mutations in these genes have been shown to cause various limb malformations but not clubfoot [1, 8, 12, 16, 22, 27, 29, 33, 36, 37, 40, 41, 45]. Additionally, muscle contraction genes (TPM2, MYH3, TNNT3, and TNNI2) were examined because mutations in these genes cause distal arthrogryposis syndromes 1, 2A, and 2B (OMIM 108120, 193700, and 601680). These autosomal dominant syndromes are characterized by multiple congenital joint contractures, clubfoot, and muscle hypoplasia. In these previous studies, a total of 94 single nucleotide polymorphisms (SNPs) spanning 35 genes involved in muscle patterning and contraction were interrogated in a large family-based clubfoot dataset [14, 43]. The strongest associations were found for SNPs in potential regulatory regions, suggesting a common mechanism of gene regulation.

This led us to hypothesize that changes in gene expression contribute to clubfoot. We, therefore, asked whether four SNPs, rs3801776/HOXA9, rs4075583/TPM1, rs2025126/TPM2, and rs2145925/TPM2, located in potential regulatory regions, play a functional role in gene regulation.

Materials and Methods

Electrophoretic Mobility Shift Assay

Nuclear extracts from undifferentiated and differentiated C2C12 mouse muscle cells were obtained from Active Motif (Carlsbad, CA, USA). Complementary single-stranded oligonucleotides (synthesized by Integrated DNA Technologies, Coralville, IA, USA) were annealed, and the forward strand was end-labeled with the corresponding α-32P-labeled nucleotides (PerkinElmer, Waltham, MA, USA) for use as probes. Probes were designed with approximately 10 bp of flanking sequence on either side of each candidate SNP (Supplemental Table 1; supplemental material is available with the online version of CORR®). Binding reactions were performed with or without unlabeled competitor oligonucleotides (5×, 10×, and 50× excess) using 4.5 µg (undifferentiated) or 5.0 µg (differentiated) C2C12 nuclear extract in 20 µL of 1× binding buffer (50 mmol/L KCl, 10% glycerol, 0.5 mmol/L EDTA, 0.5 mmol/L dithiothreitol, 0.05% NP-40, 1 mmol/L phenylmethylsulfonyl fluoride) containing 1 µg of deoxyguanylic/deoxycytidylic nonspecific competitor (Sigma-Aldrich, St. Louis, MO, USA) and 1 µL of radiolabeled probe; reactions were incubated for 1 hour at 4°C. Protein-DNA complexes were separated on 5% polyacrylamide gels, and the dried gels were exposed to radiographic film for autoradiography.
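
As an illustration of the probe layout described above, the following Python sketch builds forward and complementary strands carrying each allele of a SNP with about 10 bp of flanking sequence on either side. It is a minimal sketch, not the procedure used to design the probes in Supplemental Table 1: the function names, the 0-based SNP index, the assumption that the reference sequence carries the ancestral allele, and the example sequence are all hypothetical.

```python
# Sketch of allele-specific EMSA probe design (hypothetical helper names).
# Assumes `reference` is a plain DNA string carrying the ancestral allele
# and `snp_index` is the 0-based position of the SNP.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the strand that anneals to `seq`."""
    return seq.translate(_COMPLEMENT)[::-1]


def snp_probes(reference: str, snp_index: int, ancestral: str,
               alternate: str, flank: int = 10) -> dict:
    """Return {allele_label: (forward, complementary)} probe pairs for one SNP.

    `flank` bases are taken on each side of the SNP (fewer at sequence ends).
    """
    if reference[snp_index].upper() != ancestral.upper():
        raise ValueError("reference base does not match the ancestral allele")
    left = reference[max(0, snp_index - flank):snp_index]
    right = reference[snp_index + 1:snp_index + 1 + flank]
    probes = {}
    for label, allele in (("ancestral", ancestral), ("alternate", alternate)):
        forward = left + allele + right  # strand that would be end-labeled
        probes[label] = (forward, reverse_complement(forward))
    return probes


# Usage with a hypothetical sequence: SNP "G" at index 12.
example = "GG" + "ACGTACGTAC" + "G" + "TTGCATGCAA" + "CC"
for label, (fwd, comp) in snp_probes(example, 12, "G", "A").items():
    print(label, fwd, comp)
```

When the SNP has at least 10 bp on each side, each probe is 21 nt; the two allele versions differ only at the central position, which is what makes the competition assays allele-specific.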

Cell Culture Technique

C2C12 mouse muscle cells were obtained from the American Type Culture Collection (ATCC# CRL-1772, Manassas, VA, USA). Undifferentiated cells were cultured in Gibco® DMEM High Glucose medium (Life Technologies, Grand Island, NY, USA) supplemented with 10% fetal bovine serum. To induce differentiation, cells were washed with phosphate-buffered saline and then cultured for 5 days in Gibco® DMEM High Glucose medium supplemented with 2% horse serum.

Generation of TPM1 and TPM2 Promoter Constructs

Because the TPM1 and TPM2 promoters have not been characterized, we targeted the first 500 bp upstream of the transcriptional start site, a region known to contain the basal promoter [28, 34]. To obtain the TPM1 (−500 to −1 bp) and TPM2 promoter constructs, promoter sequences were amplified by standard PCR from bacterial artificial chromosome (BAC) clones (TPM1 [RP11-244F12] and TPM2 [RP-112J3]) using primers incorporating KpnI and XhoI sites (Supplemental Table 2; supplemental material is available with the online version of CORR®) and cloned into the linearized pGL4.10 basic vector (Promega, Madison, WI, USA) using the In-Fusion® HD Cloning System (Clontech, Mountain View, CA, USA). For TPM2, only the 461-bp region spanning −494 to −34 bp could be generated because secondary structure inhibited PCR amplification of the full region. The 400-bp HOXA9 luciferase construct was provided by one of the authors (CVP); its design was described by Trivedi et al. [42]. All constructs were verified by Sanger sequencing.
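
Because the promoter fragments were placed between KpnI and XhoI sites, one practical check (not described in the methods above, and shown here only as a hedged sketch) is whether either recognition sequence also occurs inside a fragment, since an internal site would be cut whenever the construct is digested with that enzyme. The recognition sequences GGTACC (KpnI) and CTCGAG (XhoI) are standard; the helper names and example sequences are hypothetical.

```python
# Sketch: scan a DNA fragment for internal KpnI or XhoI recognition sites.
# Both motifs are palindromic, so scanning one strand covers both strands.

RECOGNITION_SITES = {"KpnI": "GGTACC", "XhoI": "CTCGAG"}


def find_sites(seq: str, sites: dict = RECOGNITION_SITES) -> dict:
    """Return {enzyme: [0-based start positions]} for each recognition motif."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
        hits[enzyme] = positions
    return hits


def has_internal_sites(fragment: str) -> bool:
    """True if the fragment (without primer tails) contains KpnI or XhoI."""
    return any(find_sites(fragment).values())


# Usage with a hypothetical fragment: flag it before committing to a cloning plan.
fragment = "ATGCGGTACCTTAGC"
if has_internal_sites(fragment):
    print("internal site(s):", find_sites(fragment))
```

The scan should be run on the fragment itself, not on the tailed primers, because the primer tails are expected to carry the sites.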

Generation of Promoter Constructs With Regulatory SNPs

SNPs showing allele-specific nuclear protein binding in electrophoretic mobility shift assays were evaluated for their effect on promoter activity. Constructs were designed to incorporate the electrophoretic mobility shift assay probe sequences carrying the ancestral or alternate allele of each SNP, with KpnI and XhoI overhangs for direct ligation into the corresponding linearized promoter luciferase construct. The ancestral (G) allele of rs3801776, located at −206, was contained in the 400-bp HOXA9 construct. The HOXA9 alternate (A) allele construct was created by site-directed mutagenesis using QuikChange® II (Agilent Technologies, Santa Clara, CA, USA) following the manufacturer's protocol. All constructs were verified by sequencing.

Generation of TPM1 Haplotype Frequencies

To identify common TPM1 haplotypes, the 1774-bp region (−1714 to +60) containing eight TPM1 SNPs described by Savill et al. [38] was sequenced in 137 clubfoot probands: 64 non-Hispanic white probands (28 multiplex [+FH] and 36 simplex [−FH]) and 73 Hispanic probands (21 multiplex [+FH] and 52 simplex [−FH]). DNA was amplified by standard PCR using two primer sets: set 1 (940 bp; Tm, 63°C): forward primer, ACTCACCTGAAACTGACCTTCCCA; reverse primer, AAGTCACGCAGCAGGAAACTAGGA; set 2 (1,281 bp; Tm, 56°C): forward primer, ATGGGCCTCAGCCTGACTCTTAAA; reverse primer, AACGGGTGGTGTTGAGAAGGTTCT. Control TPM1 haplotype frequencies were obtained from the 1000 Genomes Project (http://browser.1000genomes.org/index.html), using the CEU population (Utah residents with Northern and Western European ancestry) to represent non-Hispanic whites and the MXL population (Mexican ancestry in Los Angeles) to represent Hispanics. Haploview was used to identify the common haplotypes in cases and controls [4].
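
Haploview infers haplotypes from unphased genotype data. The simpler counting step that follows phasing can be sketched in Python as below. The sketch assumes haplotypes are already phased and encoded as strings with one allele per SNP (two haplotypes per proband); the 5% cutoff for calling a haplotype "common" and the function names are hypothetical, not taken from this study.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return {haplotype: relative frequency} from phased haplotype strings.

    Each proband contributes two haplotypes (one per chromosome).
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


def common_haplotypes(haplotypes, min_freq=0.05):
    """Haplotypes with frequency >= min_freq, most frequent first.

    The 0.05 default is a hypothetical cutoff, not one taken from the study.
    """
    freqs = haplotype_frequencies(haplotypes)
    common = [(hap, f) for hap, f in freqs.items() if f >= min_freq]
    return sorted(common, key=lambda item: (-item[1], item[0]))


# Usage with hypothetical 4-SNP haplotypes from five probands (10 chromosomes).
sample = ["GATC"] * 6 + ["GGTC"] * 3 + ["AATT"]
for hap, freq in common_haplotypes(sample, min_freq=0.2):
    print(hap, round(freq, 2))
```

Running the same function separately on case and control haplotypes (for example, the non-Hispanic white probands and CEU) gives the side-by-side frequencies summarized in the haplotype tables.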

Generation of TPM1 Haplotype Construct

Luciferase constructs were designed for the four common TPM1 haplotypes (Table 1) to evaluate their effect on TPM1 promoter activity. Inserts containing TPM1 haplotypes 2 to 4 (Table 1) were obtained by XhoI and SacI double digestion of the TPM1 constructs described by Savill et al. [38] and ligated into the linearized TPM1 promoter luciferase construct. TPM1 haplotype 1 was generated by site-directed mutagenesis (QuikChange® II, Agilent Technologies) following the manufacturer's protocol. All constructs were verified by Sanger sequencing.

Table 1 TPM1 haplotype frequencies in non-Hispanic Whites

Luciferase Assays

C2C12 cells were seeded in 12-well plates (100,000 cells/well) 24 hours before transfection. For transfection, 1.12 µg of luciferase reporter construct and 0.048 µg of Renilla luciferase vector (internal control) were incubated with FuGENE® HD (Promega) in Opti-MEM® (Thermo-Scientific, Grand Island, NY, USA) following the manufacturer's protocol. All experiments were performed in triplicate and repeated at least three independent times. Luciferase activity in undifferentiated C2C12 cells was measured 48 hours after transfection using the dual-luciferase system (Promega). For differentiated C2C12 cells, the medium was replaced 48 hours after transfection with DMEM supplemented with 2% horse serum (Life Technologies), and luciferase activity was measured 5 days later. Luciferase activity was compared between constructs using unpaired t-tests.
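
Dual-luciferase data are commonly analyzed by dividing each well's firefly signal by its Renilla signal and then comparing constructs. The Python sketch below illustrates that workflow under stated assumptions: normalization to Renilla, expression as fold change over a basal-promoter construct, and a Student's (pooled-variance) unpaired t-test. The methods above state only that unpaired t-tests were used, so the pooled-variance form, the function names, and the example readings are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def normalized_activity(firefly, renilla):
    """Firefly/Renilla ratio per well, correcting for transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]


def relative_to_basal(values, basal_values):
    """Fold change of each value over the mean of the basal-promoter construct."""
    baseline = mean(basal_values)
    return [v / baseline for v in values]


def student_t(a, b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


# Usage with hypothetical readings (arbitrary light units) from triplicate wells.
basal = normalized_activity([100, 110, 95], [50, 55, 48])
ancestral = relative_to_basal(normalized_activity([300, 320, 310], [50, 52, 49]), basal)
alternate = relative_to_basal(normalized_activity([200, 190, 210], [51, 50, 52]), basal)
t, df = student_t(ancestral, alternate)
print(f"t = {t:.2f}, df = {df}")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b)` computes the same pooled-variance statistic (its default is `equal_var=True`) together with a two-sided p-value; passing `equal_var=False` gives Welch's test instead.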

Results

Electrophoretic mobility shift assays identified nuclear protein interactions for the ancestral alleles of rs3801776G/HOXA9 (Fig. 1A), rs4075583G/TPM1 (Fig. 1B), and rs2025126C/TPM2 (Fig. 1C) and for the alternate allele of rs2145925C/TPM2 (Fig. 1D). Competition assays confirmed that each of these DNA-protein complexes was allele-specific and revealed differences in binding affinity.

Fig. 1A–D

Electrophoretic mobility shift assays were performed to evaluate whether the associated single nucleotide polymorphisms (SNPs) affected DNA-protein interactions. The ancestral G allele of (A) rs3801776/HOXA9 and (B) rs4075583/TPM1 created DNA-protein interactions that were eliminated when the alternate A allele was present. The (C) rs2025126/TPM2 ancestral C allele created two DNA-protein interactions that were eliminated when the alternate T allele was present. The (D) rs2145925/TPM2 alternate C allele created a DNA-protein interaction that was absent with the ancestral T allele. The allele-specific DNA-protein interactions were confirmed through competitive assays using the corresponding unlabeled probes at 5×, 10×, or 50× excess. The arrow indicates the allele-specific DNA-protein complex.

Luciferase reporter assays were used to assess the effect of these allele-specific nuclear protein interactions on promoter activity (Figs. 2 and 3). The genomic sequences containing the TPM1 or TPM2 SNPs significantly increased promoter activity above the basal level, suggesting that they have enhancer activity (Figs. 2 and 3). For rs2025126/TPM2 (Fig. 1C), the ancestral allele, which binds nuclear protein, increased promoter activity compared with the alternate allele only in differentiated C2C12 mouse muscle cells (Fig. 2C); for rs2145925/TPM2, by contrast, it was the alternate allele, which binds nuclear protein, that increased promoter activity (Fig. 2D). Although the ancestral allele of rs4075583/TPM1 creates a unique nuclear protein interaction, promoter activity did not differ between the ancestral and alternate alleles (Fig. 3D). In contrast, the ancestral allele of rs3801776/HOXA9 increased promoter activity in undifferentiated muscle cells (Fig. 3C).

Fig. 2A–D

Luciferase assays were performed to assess the effect of the DNA-protein interactions on TPM2 promoter activity for rs2025126/TPM2 and rs2145925/TPM2. The TPM2 promoter construct incorporated approximately 460 bp upstream of the transcriptional start site, cloned into the pGL4.10 basic luciferase vector. The ancestral and alternate allele constructs were generated by ligating the electrophoretic mobility shift assay double-stranded oligonucleotides containing (A) rs2025126 or (B) rs2145925 in front of the TPM2 promoter construct. (C) The ancestral allele of rs2025126, which creates a DNA-protein interaction, caused a decrease in promoter activity. (D) The alternate allele of rs2145925, which creates a DNA-protein interaction, caused an increase in promoter activity.

Fig. 3A–D

Luciferase assays were performed to assess the effect of the DNA-protein interactions on TPM1 promoter activity for rs4075583 and on HOXA9 promoter activity for rs3801776. (A) The HOXA9 promoter construct containing rs3801776 is shown. (B) The rs4075583/TPM1 construct contained 500 bp upstream of the transcriptional start site for the TPM1 skeletal muscle isoform, ligated into the pGL4.10 basic luciferase vector. (C) The ancestral allele-specific DNA-protein interaction for rs3801776 increased promoter activity in undifferentiated mouse muscle cells. (D) The ancestral (G) and alternate (A) allele TPM1 constructs were generated by ligating the electrophoretic mobility shift assay double-stranded oligonucleotides in front of the TPM1 promoter construct (B). Although the ancestral allele of rs4075583/TPM1 creates a DNA-protein interaction (Fig. 1B), it did not significantly affect promoter activity in differentiated mouse muscle cells.

rs4075583/TPM1 was next evaluated in muscle cells in the context of the surrounding genomic region, which contains seven additional SNPs (Fig. 4A). We identified the common haplotypes by sequencing probands from 64 non-Hispanic white (Table 1) and 73 Hispanic (Table 2) families. Haplotype 1 was the most common in both groups, whereas the second most common was haplotype 2 in non-Hispanic whites and haplotype 3 in Hispanics, indicating ethnic differences in haplotype frequencies. Haplotype 1, containing the alternate allele rs4075583A, showed the highest promoter activity, whereas the haplotypes containing the ancestral allele (haplotypes 2 to 4) produced less promoter activity (Fig. 4B). These findings are of interest because none of the seven other SNPs creates a nuclear protein interaction (data not shown). Notably, haplotype 3, which differs from the most common haplotype (haplotype 1) at five SNP positions, the most allelic variation of any haplotype, showed the lowest promoter activity.
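
Describing a haplotype by the number of SNP positions at which it differs from the most common haplotype, as done for haplotype 3 above, is a position-by-position (Hamming) comparison. A minimal Python sketch follows; real input would be the eight-SNP haplotypes in Tables 1 and 2, and the helper names and example strings are hypothetical.

```python
def differing_positions(haplotype, reference):
    """0-based SNP positions at which `haplotype` differs from `reference`."""
    if len(haplotype) != len(reference):
        raise ValueError("haplotypes must cover the same SNPs")
    return [i for i, (a, b) in enumerate(zip(haplotype, reference)) if a != b]


def rank_by_divergence(haplotypes, reference):
    """Haplotype labels sorted from most to least different from `reference`."""
    return sorted(
        haplotypes,
        key=lambda name: len(differing_positions(haplotypes[name], reference)),
        reverse=True,
    )


# Usage with hypothetical 8-SNP haplotypes (one allele per SNP position).
ref = "GGCTAGCA"
haps = {"hap_a": ref, "hap_b": "AGCTAGCA", "hap_c": "AGTTCGTA"}
for name in rank_by_divergence(haps, ref):
    print(name, differing_positions(haps[name], ref))
```

A difference count says nothing by itself about function; it only identifies which haplotypes carry the most variation relative to the reference and are therefore candidates for the kind of combined effect discussed here.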

Fig. 4A–B

Luciferase assays were performed to evaluate whether specific TPM1 haplotypes affected TPM1 promoter activity. The 1774-bp TPM1 region containing rs4075583/TPM1 has been shown to influence expression depending on cell type and haplotype. Based on that finding, (A) constructs were designed to incorporate the four most common TPM1 haplotypes and the TPM1 promoter region. (B) The four common haplotypes produced varying degrees of promoter activity in differentiated mouse muscle cells, with the least promoter activity found for haplotype 3, which contains the alternate A allele of rs4075583; this allele eliminates the DNA-protein interaction observed with the ancestral G allele.

Table 2 TPM1 haplotype frequencies in Hispanics

Discussion

This study assessed the functionality of the potential regulatory variants rs3801776/HOXA9, rs2025126/TPM2, rs2145925/TPM2, and rs4075583/TPM1, previously found to be associated with clubfoot [43]. These genes initially were evaluated because of their roles in muscle development and/or function, which is abnormal in clubfoot [15, 17, 19, 21, 23–26, 31, 39, 44]. Three SNPs, rs3801776/HOXA9, rs2025126/TPM2, and rs2145925/TPM2, created allele-specific nuclear protein interactions that individually had significant functional consequences, altering promoter activity in muscle cells [14, 43]. Although rs4075583/TPM1 also creates an allele-specific nuclear protein interaction, this interaction alone was not sufficient to alter promoter activity. However, promoter activity did vary when rs4075583 was evaluated in the context of the surrounding approximately 1.7-kb genomic region. These results suggest that these variants may enhance or repress gene expression individually or in combination, which may affect muscle morphogenesis and/or contraction. This is consistent with the multifactorial inheritance model proposed for clubfoot, which predicts that changes in multiple genes are necessary to cause phenotypic consequences [11].

Muscle development and function are multifaceted processes that require multiple genes to be expressed at precise times and locations. Alterations in these processes could cause a cascade of muscle anomalies such as muscle hypoplasia and abnormal muscle contraction, both phenotypic characteristics of clubfoot. To capture these different developmental stages, we tested the functionality of our SNPs in undifferentiated and differentiated mouse muscle cells as a proxy for in vivo stages of development. During the early stages of myogenesis (represented by undifferentiated cells), muscle cell migration and patterning are highly regulated, with specific genes playing important roles in limb and foot development. HOXA9, a member of the homeobox A transcription factor gene cluster, plays a role in the synchronized patterning and differentiation of muscles, tendons, and cartilage in the embryonic forelimbs and hindlimbs. HOXA9 regulates other transcription factors such as LBX1, which is important in the migration of muscle precursor cells into the developing hindlimb bud. The decreased promoter activity observed with the alternate allele of rs3801776/HOXA9 (Fig. 3C) could lead to alterations in muscle structure, calf muscle size, and slow- and fast-twitch fiber composition, all of which have been reported in clubfoot [17, 24, 25].

The maturation of myoblasts into long, multinucleated muscle fibers (differentiated muscle cells) confers the functional properties of muscle such as contraction, a highly regulated process involving many proteins, including TPM1 in fast-twitch fibers, which contract quickly and fatigue easily, and TPM2 in slow-twitch fibers, which contract slowly over long periods [18]. Based on this information, we hypothesized that variants altering the binding of regulatory proteins could change TPM1 and TPM2 expression and thereby alter muscle contraction. We found that two TPM2 SNPs had functional activity that differed significantly depending on which allele was present. For example, rs2025126, located upstream of TPM2, showed decreased promoter activity with the alternate allele, which eliminates the nuclear protein interactions (Fig. 2C). Decreased TPM2 could allow continuous actin-myosin interaction, causing perpetual muscle contraction and resulting in a contracted foot. In contrast, the alternate allele of rs2145925, located in the first intron of TPM2, created a nuclear protein interaction that increased promoter activity (Fig. 2D). Increased TPM2 expression could increase inhibition of actin, limiting muscle contraction and foot movement and potentially causing muscle wasting, as manifested by calf muscle hypoplasia. An imbalance in these important muscle functions could contribute to clubfoot.

Previously, we found an association with rs4075583, a variant located in intron 1 of the TPM1 skeletal muscle isoform within a potential enhancer or suppressor regulatory region [43]. Although the ancestral allele of rs4075583/TPM1 created a nuclear protein interaction, this complex alone did not alter promoter activity (Fig. 3D). However, different levels of expression were found when rs4075583/TPM1 was evaluated in the context of the approximately 1.7-kb genomic region (Fig. 4B). Haplotype 3, the only haplotype containing the alternate allele, had the lowest activity, consistent with elimination of the nuclear protein interaction (Fig. 4B). These findings suggest that the effect of the ancestral allele's nuclear protein binding depends on the allelic composition of the surrounding genomic region, and they underscore the importance of evaluating each SNP in its genomic context to understand its functional role.

The lack of an animal model and of human muscle cell lines for functional studies is a limitation of this study. To circumvent this problem, we used C2C12 mouse muscle cells for the functional analyses because their phenotype can be manipulated to mimic different stages of muscle development, allowing us to assess effects that may occur at different times in development. For example, undifferentiated myoblasts are mononucleated, like the cells involved in early muscle patterning, in which HOXA9 plays an important role. In contrast, differentiated muscle cells are long, multinucleated myotubes that contain contractile proteins, including TPM1 and TPM2, with specific functions. Although this system may not perfectly mimic the in vivo state, the observed changes provide evidence that these SNPs play a functional role and that their effects may be apparent only at particular stages of development. This study therefore lays the groundwork for future studies.

In this study, we show that variants associated with clubfoot have functional consequences both individually and in the context of the surrounding genomic architecture. The latter finding is important because the lack of a positive functional result for an individual SNP may lead to the inappropriate exclusion of some variants or genes as contributors to clubfoot. Therefore, all variants need to be analyzed individually and then in combination with surrounding SNPs. The future challenge is to identify all the common and rare variants, along with the haplotypes, that confer increased risk for clubfoot. Next-generation sequencing approaches applied to family-based datasets should begin to identify the rare and common variants that will need to be functionally assessed. Knowledge of the genetic variation underlying clubfoot will provide important insights into its etiologic pathways, offer a means to assess genetic risk and provide accurate genetic counseling based on individual risk, and support the development of prevention programs and potentially new treatments.