Background

Height has been correlated with various disorders, including the observations that taller people are at a higher risk of developing cancer and shorter people are more likely to present with type 2 diabetes[1–3]. Determination of height in humans has long been considered to be largely influenced by genetic factors; indeed, twin and family studies have suggested that as much as 90% of variation in human height is genetically determined[4–8].

For many years, studies have attempted to identify genetic factors influencing human height in order to provide insights into human growth and development. Prior to 2007, genome-wide linkage and candidate-gene association studies had limited success in this regard; however, with the recent emergence of genome-wide association (GWA) studies, tens of common genetic variants influencing height have now been uncovered, primarily in adults[9–14].

Weedon et al. published the first GWA study of height using the Affymetrix GeneChip Human Mapping 500K platform on nearly 5,000 individuals of self-reported European ancestry[9]. They observed association with common variation in the high mobility group-A2 (HMGA2) oncogene. Follow-up analyses in approximately 19,000 more individuals (both adults and children) revealed strong replication of this observation. A subsequent GWA study uncovered another height locus, GDF5-UQCC, using data from the FUSION and SardiNIA cohorts[10].

These initial discoveries were followed by four meta-analyses with larger sample sizes, which collectively revealed 44 additional height loci[11–14]. However, some lack of overlap between the results of these GWA studies has been observed, which may be partly explained by the different statistical powers of the studies[15].

Although the causal variants at these loci have still to be elucidated, it has been shown that many of the implicated genes are involved in pathways influencing bone and cartilage development, including skeletal development signaling (PTCH1, HHIP, BMPs, GDF5), the extracellular matrix (ACAN, FBLN5, EFEMP1, ADAMTS17, ADAMTSL3), chromatin structure and regulation (DOT1L, SCMH1, HMGA2) and cell cycle regulation and mitosis (CDK6, ANAPC13, NCAPG)[15]. In addition, some of the loci were novel and are now a clear focus of attention in height biology.

In this study, we examined the variants from these initial and meta-analysis reports that were previously found to be genome-wide significant, testing them in a large European American pediatric cohort with height measurements in order to determine their relative impact on childhood stature. For this purpose, we leveraged genotyping data from our ongoing GWA study of height variation in children.

Methods

Study population

All subjects were consecutively recruited from the Greater Philadelphia area from 2006 to 2009 at the Children's Hospital of Philadelphia and its Primary Care Centers. Our study cohort consisted of 8,184 children of European ancestry with height information. All subjects were biologically unrelated and were aged between 0 and 18 years old. The basic characteristics of the study subjects are outlined in Table 1. This study was approved by the Institutional Review Board of the Children's Hospital of Philadelphia. Parental informed consent was given for each study participant for both the blood collection and subsequent genotyping.

Table 1 Basic characteristics of the study subjects, including sample size and mean height plus standard deviation (S.D.) for each age and gender separately

Genotyping

We performed high throughput genome-wide SNP genotyping using either the Illumina Infinium™ II HumanHap550 or Human 610 BeadChip technology in the same manner as our center has reported previously[16]. The SNPs analyzed survived the filtering of the genome-wide dataset for SNPs with call rates < 95%, minor allele frequency < 1%, missing rate per person < 2% and Hardy-Weinberg equilibrium P < 10⁻⁵.

Loci described from GWA studies published to date have been found using either the Affymetrix or Illumina platform. In the event a locus was reported using both the Illumina and Affymetrix arrays, we used the SNPs present on the Illumina array. In the event of a signal only being described on the Affymetrix array, we either already had that SNP on our Illumina array or we identified and used the best surrogate SNP available (see Additional file 1: Supplemental Table S1 for the surrogates employed).

Statistical analyses

Our height database was restricted to subjects identified as Caucasian by multi-dimensional scaling (MDS), as previously described[17–19], which resulted in a low genomic inflation factor. From this set, we eliminated height outliers using a 2% cutoff for each age category in order to remove potential measurement error. As height values vary widely across pediatric age groups and between genders, we calculated Z-scores using an inverse-normal transformation within each age (one-year bin) and gender category, and conducted the association analysis with the Z-scores as the outcome variable.
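The normalization step can be illustrated with a short sketch. It applies a rank-based inverse-normal transformation within each age (one-year bin) and gender category. This is a minimal illustration under assumptions rather than the pipeline used in this study: the function name `inverse_normal_z`, the record layout and the rank offset (Blom's 3/8) are hypothetical choices not specified in the text, ties are broken by input order, and the 2% outlier trimming is omitted.

```python
from collections import defaultdict
from statistics import NormalDist


def inverse_normal_z(records, c=3.0 / 8.0):
    """Rank-based inverse-normal Z-scores within each (age in years, gender) bin.

    records: list of (subject_id, age_years, gender, height_cm) tuples.
    c: rank offset (Blom's 3/8 here; an assumption, not stated in the text).
    Returns a dict mapping subject_id to its Z-score.
    """
    bins = defaultdict(list)
    for subject_id, age_years, gender, height_cm in records:
        # One-year age bins: 5.2 and 5.9 both fall in bin 5.
        bins[(int(age_years), gender)].append((height_cm, subject_id))

    std_normal = NormalDist()
    z_scores = {}
    for members in bins.values():
        n = len(members)
        # Sort by height only; ties keep their input order (stable sort).
        members.sort(key=lambda m: m[0])
        for rank, (_, subject_id) in enumerate(members, start=1):
            quantile = (rank - c) / (n - 2.0 * c + 1.0)
            z_scores[subject_id] = std_normal.inv_cdf(quantile)
    return z_scores


# Made-up records for illustration only.
records = [
    ("a", 5.2, "F", 108.0),
    ("b", 5.7, "F", 112.5),
    ("c", 5.9, "F", 104.0),
    ("d", 12.1, "M", 150.0),
    ("e", 12.8, "M", 158.0),
]
z = inverse_normal_z(records)
```

Because the transformation uses only ranks within a bin, a tall 5-year-old and a tall 12-year-old receive comparable Z-scores even though their raw heights differ widely.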

We queried the data for the indicated SNPs in our pediatric samples. All statistical analyses were carried out using the software package PLINK[20]. By treating the Z-score for height as a quantitative trait, association analysis for each SNP was carried out using linear regression with the SNP included as an independent variable (coded as 0, 1, and 2, counting the number of minor alleles at the SNP).
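The additive model described above can be sketched as an ordinary least-squares regression of the height Z-score on the minor-allele count. The analyses in this study were run in PLINK; the code below is only a minimal standalone illustration, and the function name `additive_linear_regression` and the toy data are hypothetical.

```python
from math import sqrt


def additive_linear_regression(allele_counts, z_scores):
    """Regress z on minor-allele count (0, 1 or 2) by ordinary least squares.

    Returns (beta, standard_error, t_statistic) for the slope.
    """
    n = len(allele_counts)
    if n != len(z_scores) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(allele_counts) / n
    mean_z = sum(z_scores) / n
    s_gg = sum((g - mean_g) ** 2 for g in allele_counts)
    if s_gg == 0:
        raise ValueError("SNP is monomorphic in this sample")
    s_gz = sum((g - mean_g) * (zv - mean_z)
               for g, zv in zip(allele_counts, z_scores))
    beta = s_gz / s_gg
    intercept = mean_z - beta * mean_g
    # Residual sum of squares around the fitted line.
    rss = sum((zv - (intercept + beta * g)) ** 2
              for g, zv in zip(allele_counts, z_scores))
    se = sqrt(rss / (n - 2) / s_gg)
    t_stat = beta / se if se > 0 else float("inf")
    return beta, se, t_stat


# Made-up genotypes and Z-scores for six subjects.
genotypes = [0, 0, 1, 1, 2, 2]
height_z = [-0.5, -0.1, 0.0, 0.2, 0.3, 0.7]
beta, se, t_stat = additive_linear_regression(genotypes, height_z)
```

Here `beta` is the estimated change in height Z-score per additional copy of the minor allele, and the T statistic is compared against a t distribution with n - 2 degrees of freedom to obtain the P-value.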

The results for Figure 1 were generated by summing the number of height-increasing alleles across all 16 height-associated SNPs in our study in order to produce a scatter plot showing the impact of the genotype score on the cumulative height Z-score.

Figure 1 Scatter plot of the association between height Z-score and the genotype score, obtained by summing the number of height-increasing alleles across all 16 height-associated SNPs.

Results

The 51 SNPs corresponding to the 46 previously reported height loci were investigated with respect to their association to normalized pediatric height in MDS-determined European Americans (Table 2; also Additional file 2: Supplemental Table S2 for analyses by age categories).

Table 2 Quantitative association results for the candidate loci in the European American height cohort (n = 8,184), sorted by chromosomal location.

In summary, sixteen of these SNPs yielded at least nominally significant association with height (P < 0.05), representing fifteen different loci with the same direction of effect as previously reported. Of these fifteen loci, variation at the EFEMP1-PNPT1 locus (rs3791679) yielded the strongest association, with P = 1.39×10⁻⁵.

Slightly weaker associations were observed at GPR126, with rs3748069 yielding P = 3.64×10⁻⁴; C6orf173 (also known as LOC387103), with rs1490388 yielding P = 7.20×10⁻⁴; SPAG17, with 118574711 yielding P = 7.27×10⁻⁴; and the Histone class 1 gene cluster, with rs10946808 yielding P = 9.57×10⁻⁴.

Overall, in addition to these loci, we found evidence for association at the HLA class III region, UQCC-GDF5, C6orf106, JAZF1, ZBTB38, PLAG1, C1orf19-GLT25D2, LCORL-NCAPG, CABLES1-RBBP8-C18orf45 and SCMH1 loci. One could argue that we have carried out multiple testing in our height cohort for these previously reported SNPs, albeit with orders of magnitude fewer tests than in a full GWA study. If we were to apply the strictest correction, i.e. the Bonferroni correction based on 51 SNPs, then EFEMP1-PNPT1, GPR126, C6orf173, SPAG17 and the Histone class 1 gene cluster would still be considered significant, and their effects are consistent with the outcomes of the adult GWA studies.
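The Bonferroni argument can be checked with simple arithmetic: with 51 tests and a nominal level of 0.05, the per-SNP threshold is 0.05 / 51, or approximately 9.8×10⁻⁴. The short sketch below applies that threshold to the five P-values reported above; the function name `bonferroni_significant` is a hypothetical helper, and the nominal level of 0.05 is taken from the nominal significance criterion used in this section.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Return the Bonferroni threshold and the entries that fall below it."""
    threshold = alpha / n_tests
    passing = {name: p for name, p in p_values.items() if p < threshold}
    return threshold, passing


# P-values as reported in the Results for the five strongest loci.
reported = {
    "EFEMP1-PNPT1": 1.39e-5,
    "GPR126": 3.64e-4,
    "C6orf173": 7.20e-4,
    "SPAG17": 7.27e-4,
    "Histone class 1 gene cluster": 9.57e-4,
}
threshold, passing = bonferroni_significant(reported, n_tests=51)
```

All five reported P-values fall below the threshold, with the Histone class 1 gene cluster the closest to it.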

It was also observed that SNPs residing at the 31 other loci did not reveal any evidence of association with height in our pediatric cohort, most notably HMGA2.

Finally, we investigated the sixteen significant SNPs further by testing for association between height Z-score and the genotype score, obtained by summing the number of height-increasing alleles across all these SNPs. The resulting P-value for the genotype score was < 2×10⁻¹⁶ (Figure 1). The genotype score explains 1.64% of the total variation in height Z-score. We also tested pair-wise interactions between the sixteen significant SNPs, but none of the interaction effects were significant, suggesting that these sixteen SNPs act additively on pediatric height.
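The genotype score and the proportion of variation it explains can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the toy data use three SNPs rather than sixteen, and the variance explained is computed as the R² of a simple linear regression of Z-score on the score, which is one common way to obtain such a figure and is not necessarily the exact procedure used in this study.

```python
def genotype_score(increasing_allele_counts):
    """Sum the height-increasing allele counts (0, 1 or 2 per SNP).

    Counts must already be oriented to the height-increasing allele,
    which is not always the minor allele.
    """
    return sum(increasing_allele_counts)


def variance_explained(scores, z_scores):
    """R^2 of a simple linear regression of z on score (squared Pearson r)."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_z = sum(z_scores) / n
    s_ss = sum((s - mean_s) ** 2 for s in scores)
    s_zz = sum((zv - mean_z) ** 2 for zv in z_scores)
    if s_ss == 0 or s_zz == 0:
        raise ValueError("score and Z-score must both vary")
    s_sz = sum((s - mean_s) * (zv - mean_z) for s, zv in zip(scores, z_scores))
    return s_sz ** 2 / (s_ss * s_zz)


# Made-up data: five subjects, three SNPs each.
subject_genotypes = [[0, 1, 0], [1, 1, 0], [1, 1, 1], [2, 1, 1], [2, 2, 2]]
subject_z = [-0.8, 0.1, -0.2, 0.5, 0.4]
scores = [genotype_score(g) for g in subject_genotypes]
r_squared = variance_explained(scores, subject_z)
```

An unweighted sum treats every SNP as contributing equally per allele, which is consistent with the additive behaviour suggested by the absence of significant pair-wise interactions.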

Discussion

We queried the existing dataset from our ongoing GWAS of pediatric height in European Americans for adult height loci uncovered in GWAS to date. We examined 51 single nucleotide polymorphisms (SNPs) corresponding to 46 genomic loci in 8,184 children with height measurements. Sixteen of these SNPs yielded at least nominally significant association to the trait, representing fifteen different loci.

One of the more notable results is the lack of association with HMGA2. This gene is one of the loci most strongly associated with adult height[9], so its lack of association with childhood stature in this study is striking. We previously published a replication attempt with this locus and pediatric height when our cohort was substantially smaller[21]; at that time, we observed nominal association, but it is clear that this signal has failed to strengthen as our cohort has grown. Given the wealth of evidence from adult GWA studies and from previous work with knock-out mouse models, it is surprising not to observe association with HMGA2. However, when considering the age bins presented in Additional file 2: Supplemental Table S2, the T statistic generally increases with age, with the strongest value in the 15-18 age group. Although none of these observations is significant, this may point to an age-specific effect at a particular point during childhood that goes undetected in the overall analysis; however, even our large cohort may not be sufficiently powered to tease out this effect.

The absence of any evidence for association at the remaining loci may be due to limited power, but could also indicate that these loci have a less pronounced role in a pediatric setting. In addition, only a portion of the published adult height loci have been independently and robustly replicated to date[22]. It should also be noted that childhood growth is an ongoing process in which developmental factors may cloud detection at certain loci, including at the two rapid growth stages: infancy, where nutrition plays a major role, and puberty, where hormone signaling has an impact. Our study may lack power to detect stage-specific association when using a mixed-age childhood cohort; however, we have presented the association results for specific age bins in Additional file 2: Supplemental Table S2.

From this analysis, it is clear that a number of loci previously reported from GWA analyses of adult height also play a role in our phenotype of interest. While these recently discovered loci unveil several new biomolecular pathways not previously associated with height, it is important to note that these well-established genetic associations with stature explain very little of the genetic contribution to this pediatric phenotype, suggesting the existence of additional loci whose number and effect size remain unknown.

Conclusions

Among 46 loci that have been reported to associate with adult height to date, at least 15 also contribute to the determination of height in childhood. Once our GWA study is complete, we will have the opportunity to look for other variants in the genome that are associated with height in childhood.