Background

Height is a growth phenotype that changes throughout the developmental period from infancy to adulthood and becomes relatively stable in adulthood [1,2,3]. Previous studies have reported that social and environmental factors, including educational attainment, smoking, alcohol consumption, and regular exercise, can influence height [3,4,5,6,7,8,9]. Higher levels of education [4, 5] and regular exercise [9] are associated with increased growth and height. In contrast, exposure to smoking or drinking may cause bone mass loss, reduced growth, and reduced height [10,11,12]. Height is also determined by polygenic inheritance under complex and multi-locus genetic regulation [13,14,15]. Genome-wide association studies (GWAS) of height have identified hundreds of genetic loci (or single nucleotide polymorphisms [SNPs]) with genome-wide significance, especially in individuals of European ancestry [15,16,17,18,19,20,21,22,23,24,25,26]. These genetic loci are associated with proteins of the tyrosine phosphatase family, insulin-like growth factors, proteins involved in skeletal development and mitosis, fibroblast growth factors, the Wnt/β-catenin pathway, Hedgehog signaling, and cancer-associated pathways. These findings highlight the polygenic, complex, and multilocus genetic regulation of height.

Height is associated with several health-related outcomes later in life [27,28,29,30,31,32,33,34]. For instance, taller people tend to have a higher risk of cancer [28] and cancer-related mortality [27] but a reduced risk of cardiovascular disease (CVD) [27, 29], CVD-related mortality [27, 29], and type 2 diabetes [34], as well as better retention of cognitive function [30,31,32] and healthier aging [33]. The genetic component of height can be summarized using a polygenic risk score (PRS). A PRS is the weighted sum of risk alleles across a set of independent SNPs, usually those with genome-wide significance, with weights derived from GWAS results [13, 35, 36]. The PRS serves as a genetic instrumental variable and can be used to assess associations with health-related outcomes with less susceptibility to confounding [37, 38]. Genetically determined taller height (in those with European ancestry) is also associated with an increased risk of cancers [39,40,41,42,43,44] and cancer-related mortality [45, 46] but a reduced risk of CVD [42, 47,48,49,50]. The precise genetic loci shared between height and health-related outcomes are yet to be elucidated, especially in individuals of Han Chinese ancestry. In addition, the mechanisms by which shared genetic loci contribute to both height and health-related outcomes remain unclear.

Therefore, this study aimed to identify the genetic architecture for height in individuals from the Taiwan Biobank—a community-based biobank in Taiwan. We also performed observational and genetic PRS analyses of height and health-related outcomes.

Methods

Taiwan Biobank

The Taiwan Biobank is a database for phenotypic and genomic measurements of the Taiwanese population that was established in 2012. The study recruited volunteers aged 30–70 years with no history of malignancy at enrollment (Twbiobank; https://www.twbiobank.org.tw/new_web/) [51, 52]. All volunteers were residents of Taiwan and provided informed consent. Participants completed questionnaires and underwent interviews, anthropometric measurements, and blood and urine tests to collect demographic, lifestyle, and genomic data.

Taiwan Biobank phenotypes

Anthropometric measurements, including height, body weight, waist circumference, hip circumference, and body fat percentage, were obtained from participants in the Taiwan Biobank (Additional file 1: Table S1). Body mass index (BMI) was calculated as BMI = body weight/body height². The waist-hip ratio (WHR) was calculated as WHR = waist circumference/hip circumference. Anthropometric measurements were stratified by sex and summarized using the mean and standard deviation (SD); data were normalized to one SD before further analysis.
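As a minimal sketch of the derived measures and the sex-stratified normalization described above, the following Python snippet computes BMI, WHR, and within-sex z-scores. The function names, the example values, and the units (kilograms and meters for BMI) are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean, stdev

def bmi(weight_kg, height_m):
    """BMI = body weight / body height^2 (units assumed: kg and m)."""
    return weight_kg / height_m ** 2

def whr(waist, hip):
    """Waist-hip ratio; both circumferences must share the same unit."""
    return waist / hip

def standardize_by_sex(values, sexes):
    """Mean-center and scale to one SD separately within each sex group."""
    groups = {}
    for value, sex in zip(values, sexes):
        groups.setdefault(sex, []).append(value)
    stats = {sex: (mean(g), stdev(g)) for sex, g in groups.items()}
    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, sexes)]

# Hypothetical example values (heights in cm).
heights = [170.0, 175.0, 180.0, 155.0, 160.0, 165.0]
sexes = ["M", "M", "M", "F", "F", "F"]
height_z = standardize_by_sex(heights, sexes)
```

Standardizing within each sex removes the average male-female height difference, so a z-score expresses how tall a participant is relative to others of the same sex.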

Blood pressure and lipid and glucose levels were quantitatively measured in participants in the Taiwan Biobank. Systolic blood pressure (SBP), diastolic blood pressure (DBP), total cholesterol (TC), triglyceride (TG), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), fasting glucose, and hemoglobin (Hb) A1c levels were obtained (Additional file 1: Table S1).

Participants were asked to report their health status using questionnaires and interviews. According to participants’ self-reported health status (comorbidities) in the Taiwan Biobank, 10 broad categories of 49 diseases were investigated in our study as follows (Additional file 1: Table S1): (1) orthopedic or joint disorders: osteoporosis, arthritis, rheumatoid arthritis, osteoarthritis, and gout; (2) lung and respiratory diseases: asthma and emphysema or chronic bronchitis; (3) cardiovascular diseases: valvular heart disease, coronary artery disease, heart arrhythmia, cardiomyopathy, congenital heart defect, other type of heart disease, hyperlipidemia, hypertension, and stroke; (4) diabetes: type 1 diabetes and type 2 diabetes; (5) digestive diseases: peptic ulcer disease, gastroesophageal reflux disease, and irritable bowel syndrome; (6) mental or emotional disorders: depression, bipolar disorder, postpartum depression, obsessive-compulsive disorder, alcohol addiction or drug abuse, and schizophrenia; (7) nervous system disorders: epilepsy, migraine, multiple sclerosis, Parkinson’s disorder, and dementia; (8) other types of disease: gallstones, kidney stones, kidney failure, and vertigo; (9) eye diseases: cataract, glaucoma, dry eye syndrome, retinal detachment, floaters, blindness, color blindness, and others; and (10) female diseases: severe menstrual cramps, uterine fibroids, ovarian cysts, endometriosis, and uterine/cervical polyps.

Study population

A total of 132,720 participants were selected from the Taiwan Biobank (Fig. 1). The exclusion criteria were as follows: (1) individuals who did not have GWAS data (N = 16,654), (2) individuals who did not pass the quality control (QC) and principal component analysis (PCA) of GWAS data (N = 19,555), (3) individuals who did not have height information (N = 25), (4) individuals whose height deviated from the mean by more than 4 SD (N = 23), (5) individuals without drinking information (N = 52), (6) individuals without smoking information (N = 12), and (7) individuals without regular exercise information (N = 38). Drinking was defined as current drinking for at least 6 months, smoking as current smoking for at least 6 months, and regular exercise as currently exercising regularly for at least 6 months.
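The exclusion counts above can be checked with simple arithmetic. The short Python sketch below assumes that the seven criteria were applied sequentially and that each excluded participant is counted under exactly one criterion; the dictionary labels are shorthand introduced here.

```python
initial = 132_720

# Exclusion counts as listed in the text; labels are shorthand.
exclusions = {
    "no GWAS data": 16_654,
    "failed QC/PCA": 19_555,
    "no height": 25,
    "height beyond 4 SD": 23,
    "no drinking info": 52,
    "no smoking info": 12,
    "no exercise info": 38,
}

remaining = initial
for reason, n in exclusions.items():
    remaining -= n
    print(f"{reason:<20} -{n:>6}  remaining {remaining:>7}")

final_n = remaining
```

Under these assumptions the running total ends at 96,361, which matches the size of the final study population reported in this section.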

Fig. 1

Flowchart of the study design and analysis process

Finally, 96,361 participants of Han Chinese ancestry were included in this study (Fig. 1) and assigned to the training, testing, and validation groups using a simple random sampling method (7:1.5:1.5 ratio). The training group (N = 67,452 participants) comprised 70% of the total study population and underwent GWAS based on height (Additional file 2: Table S2; Figs. 2 and 3). Before the height GWAS analysis, the measured height (phenotype) was stratified by sex and subsequently mean-centered and normalized to one SD. GWAS for height was then performed using a linear regression model with the assumption of additive allelic effects of SNP dosages, with adjusted covariates including age, sex, education, drinking, smoking, regular exercise, and the first 10 principal components (PCs) from the PCA (Additional file 4: Fig. S2), using the PLINK software (versions 1.9 and 2.0) [3,4,5,6,7,8,9, 20, 53,54,55,56]. A genome-wide significance threshold was used (p < 5.00E−8 for the additive test).
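To illustrate what an additive-model association test estimates for one SNP, the sketch below fits an ordinary least-squares regression of a standardized phenotype on allele dosage (0, 1, or 2) plus covariates and returns the dosage coefficient. It is a pure-Python illustration on simulated data, not the PLINK implementation; the function names, the simulated effect size of 0.1, and the two stand-in covariates are all assumptions.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def snp_beta(dosage, phenotype, covariates):
    """OLS coefficient of SNP dosage, adjusted for an intercept and covariates."""
    rows = [[1.0, d] + list(c) for d, c in zip(dosage, covariates)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(k)]
    return solve(xtx, xty)[1]

# Simulated data: allele frequency 0.3, true per-allele effect 0.1 SD.
random.seed(0)
n = 5000
dosage = [float((random.random() < 0.3) + (random.random() < 0.3)) for _ in range(n)]
covars = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
height_z = [0.1 * d + 0.2 * c[0] - 0.1 * c[1] + random.gauss(0, 1)
            for d, c in zip(dosage, covars)]
beta_hat = snp_beta(dosage, height_z, covars)
```

The estimated coefficient is the expected change in standardized height per additional copy of the effect allele, which is how the beta values reported for the lead SNPs can be read.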

Fig. 2

Manhattan plot for height with Han Chinese ancestry

Fig. 3

Regional plots for the independent signals at seven genetic loci for height in individuals with Han Chinese ancestry. A rs3791675 in EGF containing fibulin extracellular matrix protein 1 (EFEMP1). B rs76803230 in DIS3 like 3′-5′ exoribonuclease 2 (DIS3L2). C rs57345461 in zinc finger and BTB domain containing 38 (ZBTB38). D rs16895971 in ligand dependent nuclear receptor corepressor like (LCORL). E rs2780226 in high mobility group AT-hook 1 (HMGA1). F rs3816804 in citrate synthase (CS). G rs143384 in growth differentiation factor 5 (GDF5). Each plot shows the −log₁₀(p-value) on the y-axis for each SNP and the SNP position in the genome region on the x-axis. The top significant SNP is shown by a purple diamond; genes in its proximity are shown below each plot. LD with nearby SNPs is measured using r² values, according to the 1000 Genomes Project Phase 3 East Asian (EAS) data, and is indicated by the color of each circle

We also examined whether previously identified height-associated genetic variants replicated in the GWAS of the training group (https://www.ebi.ac.uk/gwas/efotraits/EFO_0004339). The reported GWAS height-associated genetic variants were mainly from individuals of European ancestry (excluding the genetic variants of infant or child height traits) and were downloaded from the GWAS catalog website. After removal of the repeated SNPs, 1722 reported GWAS body height-related SNPs were obtained from the GWAS catalog (Additional file 3: Table S3). We tested these SNPs in our cohort and found that 313 of them were associated with height (p < 0.05/1722 SNPs) (Additional file 3: Table S3).
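The replication threshold above is a Bonferroni correction over the 1722 catalog SNPs. As a small worked example in Python (the variable and function names are illustrative):

```python
alpha = 0.05
n_catalog_snps = 1722

# Bonferroni-corrected significance threshold for the replication analysis.
replication_threshold = alpha / n_catalog_snps

def is_replicated(p_value, threshold=replication_threshold):
    """True when a catalog SNP passes the corrected threshold."""
    return p_value < threshold

print(f"{replication_threshold:.3e}")
```

The threshold is roughly 2.9 × 10⁻⁵, which is far less stringent than the genome-wide threshold of 5 × 10⁻⁸ because only previously reported SNPs are tested.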

The testing group (N = 14,454 participants) comprised 15% of the total study population and was used to select the best-fit PRS and to investigate the association between genetically determined height (PRS251) and measured height (phenotype) using linear regression analysis (Fig. 4). The validation group (N = 14,455 participants) comprised 15% of the total study population and was used to determine the association between genetically determined height (PRS251) and measured height (phenotype) using linear regression analysis (Fig. 4). This study was approved by the Human Studies Committee of the China Medical University Hospital, Taichung, Taiwan (approval number: CMUH107-REC3-074).

Fig. 4

Association between genetically determined height (PRS251) and measured height (phenotype). Measured height (cm) and the calculated polygenic risk score (PRS) for height in the testing and validation groups were each stratified by sex, mean-centered, and normalized to one standard deviation (SD) (males, N = 10,919; females, N = 17,990). The normalized measured height is shown on the y-axis and normalized genetically determined height (PRS251) is shown on the x-axis

QC of the original data

Genomic DNA from the Taiwan Biobank was genotyped using Axiom genome-wide TWB1 (653,291 SNPs) or TWB2 (752,921 SNPs) array plates based on the Axiom genome-wide array plate system, according to the manufacturer’s instructions (Affymetrix Inc., Santa Clara, CA, USA). Genotyping was performed at the National Genotyping Center of Academia Sinica, Taipei, Taiwan (http://ncgm.sinica.edu.tw/affymetrix_tech_01.html) (https://www.biobank.org.tw/fd.php).

Genotypic data were then subjected to QC procedures (individual QC and SNP QC) in the Taiwan Biobank (https://www.biobank.org.tw/fd.php). The exclusion criteria for individual QC were as follows: (1) individuals with a missing call rate of > 5%, (2) a heterozygosity rate more than 5 SD from the mean, (3) an individual identity by descent (IBD) score of ≥ 0.125, and (4) individuals who did not fit the East Asian (EAS) ancestry PCA. Similar to the results of a previous study [51], most individuals from the Taiwan Biobank were of Han Chinese ancestry. The exclusion criteria for SNP QC were as follows: (1) SNPs with a missing call rate of > 5%, (2) SNPs with a Hardy–Weinberg equilibrium (HWE) p-value of < 1 × 10⁻⁵, and (3) SNPs with a minor allele frequency (MAF) of < 5%.

Imputation

Qualified genotype data were then subjected to an imputation procedure to maximize the number of SNPs in the Taiwan Biobank (https://www.biobank.org.tw/fd.php). First, the SNPs of the qualified genotype data were excluded based on the following criteria: (1) SNPs with a MAF of < 1%, (2) SNPs with a HWE p-value of < 1 × 10⁻⁵, and (3) SNPs with a missing call rate of > 5%, using the PLINK software (versions 1.9 and 2.0, http://zzz.bwh.harvard.edu/plink/). Second, SHAPEIT2 (v2.r790) was used to phase the genotypes into full haplotypes (https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html). Third, imputation was performed using IMPUTE2 (v2.3.1, https://mathgen.stats.ox.ac.uk/impute/impute_v2.html), according to the pooled reference panel [Taiwan Biobank (TWB) + East Asian (EAS)]. The pooled reference panel comprised 973 phased individuals in the TWB panel from the Taiwan Biobank [57] and 504 phased individuals in the EAS panel [57, 58] (Phase 3 1000 Genomes Project reference panel; The 1000 Genomes Project Consortium, 2010). The pooled reference panel with TWB and EAS ancestry groups was used to improve imputation accuracy [57, 58]. The following imputed SNPs were excluded: (1) SNPs with a missing call rate of > 5%, (2) SNPs with a MAF of < 0.01%, and (3) SNPs with an IMPUTE2 information score of < 0.3.

QC for this study

Imputed GWAS data were obtained from the Taiwan Biobank. In our study, SNP QC and individual QC procedures were applied before GWAS of height (Fig. 1). SNPs were excluded from the SNP QC based on the following criteria: (1) SNPs with a missing call rate of > 5%, (2) SNPs with a HWE p-value of < 1 × 10⁻⁶, and (3) SNPs with a MAF of < 0.01%. After SNP QC, the remaining SNPs were used to perform ancestry PCA for the population structure analysis. The exclusion criteria for individual QC were as follows: (1) individuals with a missing call rate of > 5%, (2) a heterozygosity rate more than 5 SD from the mean, (3) an individual IBD score of ≥ 0.125, and (4) individuals who did not fit the East Asian (EAS) ancestry PCA. Participants of non-Chinese ancestry, with evidence of relatedness, or with DNA contamination were excluded.
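As a rough sketch of how the call-rate and MAF filters above act on a single SNP, the Python function below takes a list of genotypes coded as alternate-allele counts and applies the two thresholds from this section. The data layout (0/1/2 with None for a missing call) and the function name are assumptions for illustration; the HWE filter is left out for brevity, and in practice these filters are applied by PLINK rather than by hand-written code.

```python
def snp_passes_qc(genotypes, max_missing=0.05, min_maf=0.0001):
    """Apply the missing-call-rate (> 5%) and MAF (< 0.01%) exclusion rules.

    genotypes: alternate-allele counts (0, 1, or 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False  # nothing was genotyped for this SNP
    missing_rate = 1 - len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return missing_rate <= max_missing and maf >= min_maf

# Hypothetical SNPs: one acceptable, one with too many missing calls,
# and one monomorphic.
good_snp = [0] * 60 + [1] * 30 + [2] * 7 + [None] * 3
sparse_snp = [0] * 60 + [1] * 30 + [None] * 10
monomorphic_snp = [0] * 100
```

Here `good_snp` passes both filters, `sparse_snp` fails on its 10% missing rate, and `monomorphic_snp` fails because its minor allele frequency is zero.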

PRS calculation

The PRS was calculated in the testing group using PLINK software (versions 1.9 and 2.0) [35, 53, 59], based on the statistical results of the 6941 SNPs in the training group (Fig. 1). The 6941 SNPs comprised SNPs with genome-wide significance (p < 5 × 10⁻⁸) and SNPs that were replicated from previously reported body-height GWAS SNPs (p < 0.05/1722 SNPs; Fig. 1).

The 6941 SNPs were then subjected to the clumping procedure (within 250,000 base pairs of the index SNP, SNPs were removed when r² > 0.1), according to the estimated linkage disequilibrium (LD) among the SNPs in the testing group (Additional file 4: Fig. S1A). After clumping, 251 SNPs were obtained. These 251 SNPs were used to select the “best-fit” PRS according to a series of cutoff values for height-associated p-value thresholds (including 5 × 10⁻¹⁵, 5 × 10⁻¹⁴, 5 × 10⁻¹³, 5 × 10⁻¹², 5 × 10⁻¹¹, 5 × 10⁻¹⁰, 5 × 10⁻⁹, 5 × 10⁻⁸, 5 × 10⁻⁷, 5 × 10⁻⁶, and 5 × 10⁻⁵) in the testing group. The p-value cutoff of 5 × 10⁻⁵ was adopted for the “best-fit” PRS because it gave the largest proportion of phenotypic variance explained by the PRS alone (PRS r² = 0.0712, SNP number = 251; Additional file 4: Fig. S1A). In total, 251 SNPs were obtained for the best-fit PRS calculations for all participants. For each participant, the genetically determined height (PRS value) was calculated [35, 53, 59] using the 251 SNPs obtained after the clumping protocol. The PRS values were also mean-centered and standardized.
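The clumping and scoring steps above can be sketched in a few lines of Python. This is a simplified, greedy illustration of LD clumping followed by a weighted allele sum, not PLINK's implementation; the SNP identifiers, positions, p-values, effect sizes, and the `ld_r2` lookup are hypothetical.

```python
def clump(snps, ld_r2, window=250_000, r2_max=0.1):
    """Greedy LD clumping.

    snps: list of dicts with keys "id", "chrom", "pos", "p".
    ld_r2: function (id_a, id_b) -> r^2 between two SNPs (hypothetical lookup).
    SNPs are visited from most to least significant; a SNP is dropped when it
    lies within `window` bp of an already kept index SNP and r^2 > r2_max.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        in_ld = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos"] - snp["pos"]) <= window
            and ld_r2(k["id"], snp["id"]) > r2_max
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept

def polygenic_score(dosages, weights):
    """Sum of effect-allele dosages weighted by their GWAS effect sizes."""
    return sum(weights[s] * d for s, d in dosages.items() if s in weights)

# Hypothetical toy data.
snps = [
    {"id": "a", "chrom": 1, "pos": 1_000, "p": 1e-20, "beta": 0.10},
    {"id": "b", "chrom": 1, "pos": 50_000, "p": 1e-10, "beta": 0.09},
    {"id": "c", "chrom": 1, "pos": 900_000, "p": 1e-9, "beta": 0.07},
    {"id": "d", "chrom": 2, "pos": 1_000, "p": 1e-12, "beta": 0.05},
]
ld_table = {frozenset(("a", "b")): 0.8}
ld_r2 = lambda x, y: ld_table.get(frozenset((x, y)), 0.0)

index_snps = clump(snps, ld_r2)
weights = {s["id"]: s["beta"] for s in index_snps}
score = polygenic_score({"a": 2, "b": 2, "c": 0, "d": 1}, weights)
```

In this toy example SNP b is removed because it is in strong LD with the more significant SNP a, so it contributes nothing to the score even though the participant carries two copies of its effect allele.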

Statistical analyses

Genotype and imputed genotype data were used for GWAS analysis, as previously described. The HWE for the SNPs in the controls was evaluated using chi-square (χ²) tests. Lewontin’s D and r² values were used to evaluate the inter-marker coefficient of LD for haplotype block analysis [60]. The confidence interval (CI) for LD was used to construct haplotype blocks by resampling [61, 62]. LocusZoom was used to plot the resulting significant loci [63].
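For readers unfamiliar with the HWE test mentioned above, the following Python sketch shows the standard one-degree-of-freedom chi-square version for a biallelic SNP, computed from genotype counts. It assumes a polymorphic SNP (both alleles observed) and is illustrative only; the genotype counts are invented, and software such as PLINK may use an exact test instead.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

in_equilibrium = hwe_chi_square(360, 480, 160)   # matches p = 0.6 expectations
no_heterozygotes = hwe_chi_square(500, 0, 500)   # extreme departure from HWE
```

A SNP whose genotype counts match the expected proportions gives a statistic near zero, whereas a complete absence of heterozygotes gives a very small p-value and would be excluded by the HWE filter.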

Measured height (phenotype) served as the exposure variable. Sixty-three health-related outcomes, including 14 traits and 49 diseases, were used as outcome variables. A multivariate linear regression model was fitted for continuous outcome variables (14 traits), with adjustments for age, sex, education, drinking, smoking, regular exercise, and the first 10 principal components (PCs) [3,4,5,6,7,8,9, 20]. Multivariate logistic regression analysis was performed for binary outcome variables (49 diseases), with the same adjustments [3,4,5,6,7,8,9, 20].

The genetically determined height (PRS251) also served as the exposure variable. Sixty-three health-related outcomes, including 14 traits and 49 diseases, were used as outcome variables. A multivariate linear regression model was fitted for continuous outcome variables (14 traits), with adjustments for age, sex, education, drinking, smoking, regular exercise, and the first 10 principal components (PCs) [3,4,5,6,7,8,9, 20]. Multivariate logistic regression analysis was performed for binary outcome variables (49 diseases), with the same adjustments [3,4,5,6,7,8,9, 20]. PLINK software (versions 1.9 and 2.0) and R packages for Windows were used for all statistical analyses.

Results

GWAS of the quantitative trait of height in Han Chinese in Taiwan

The Manhattan and QQ plots for the adult-height GWAS results are shown in Fig. 2. The GWAS identified 6843 SNPs in 89 genomic regions with genome-wide significance (p < 5 × 10⁻⁸; not shown). The top lead SNPs were selected in the 89 genomic regions with significant associations (p < 5 × 10⁻⁸) using an LD threshold of < 0.2 (Additional file 2: Table S2). Among these, 18 novel lead SNPs within 18 novel regions, 48 novel lead SNPs within 48 reported regions, and 23 previously reported lead SNPs within 23 reported regions were found (Additional file 2: Table S2). Moreover, among these 89 genomic regions, seven lead SNPs were located within seven genetic loci (Fig. 2). These seven genetic loci were located near the following genes: EGF-containing fibulin extracellular matrix protein 1 (EFEMP1), DIS3 like 3′-5′ exoribonuclease 2 (DIS3L2), zinc finger and BTB domain containing 38 (ZBTB38), ligand-dependent nuclear receptor corepressor like (LCORL), high-mobility group AT-hook 1 (HMGA1), citrate synthase (CS), and growth differentiation factor 5 (GDF5). The seven lead SNPs from these seven genetic loci are shown in Additional file 2: Table S2.

Regional plots of the lead SNPs and their neighboring SNPs from these seven genetic loci are shown in Fig. 3. Among them, two lead SNPs were novel (Fig. 3B, C; chromosome 2, rs76803230 in DIS3L2; chromosome 3, rs57345461 in ZBTB38), whereas the remaining five lead SNPs were previously reported (Fig. 3A, D–G). On chromosome 2, the lead SNP rs76803230 was located in the intronic region of DIS3L2 (risk allele: T, beta = 0.0681, [95% CI: 0.0583–0.0780], p = 7.47 × 10⁻⁴²; Additional file 2: Table S2; Fig. 3B). On chromosome 3, the lead SNP rs57345461 was located in the intronic region of ZBTB38 (risk allele: T, beta = 0.0723, [95% CI: 0.0619–0.0827], p = 2.54 × 10⁻⁴²; Additional file 2: Table S2; Fig. 3C).
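The reported effect sizes, confidence intervals, and p-values are mutually consistent, which can be checked with a short calculation. The Python sketch below recovers the standard error and z-statistic from a 95% CI and converts them to a two-sided p-value, assuming a normal approximation and a symmetric CI of beta ± 1.96 × SE. The function name is illustrative, and because the published values are rounded, the recovered p-value is only approximate.

```python
import math

def z_and_p_from_ci(beta, lower, upper):
    """Recover SE, z, and a two-sided normal p-value from a 95% CI."""
    se = (upper - lower) / (2 * 1.96)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values reported for rs76803230 in DIS3L2.
se, z, p = z_and_p_from_ci(0.0681, 0.0583, 0.0780)
print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

For rs76803230 this gives a z-statistic of about 13.6 and a p-value on the order of 10⁻⁴², in line with the reported p = 7.47 × 10⁻⁴².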

Of the previously reported SNPs on chromosome 2, only a handful reached genome-wide significance for height; among them, the lead SNP rs3791675 was located in the intronic region of EFEMP1 (risk allele: C; training group: beta = 0.0753, [95% CI: 0.0638–0.0869], p = 2.78 × 10⁻³⁷; Additional file 2: Table S2; Fig. 3A). This SNP has been associated with body height, BMI-adjusted waist circumference, pelvic organ prolapse, and BMI-adjusted WHR [23, 64,65,66]. On chromosome 4, the lead SNP rs16895971 was located in the 3′-untranslated region (3′-UTR) of LCORL (risk allele: T, beta = 0.1018, [95% CI: 0.0911–0.1125], p = 3.69 × 10⁻⁷⁷; Additional file 2: Table S2; Fig. 3D). This SNP has been associated with body height in East Asians [67]. On chromosome 6, the lead SNP rs2780226 was located in the 5′-untranslated region (5′-UTR) of HMGA1 (risk allele: C, beta = 0.0948, [95% CI: 0.0791–0.1104], p = 1.75 × 10⁻³²; Additional file 2: Table S2; Fig. 3E). This SNP has been associated with body height, BMI-adjusted waist circumference, and birth weight [15, 68, 69]. On chromosome 12, the lead SNP rs3816804 was located in the intronic region of CS (risk allele: C, beta = 0.1124, [95% CI: 0.0993–0.1255], p = 6.35 × 10⁻⁶³; Additional file 2: Table S2; Fig. 3F). This SNP has also been associated with body height in East Asians [70]. On chromosome 20, the lead SNP rs143384 was located in the 5′-UTR of GDF5 (risk allele: G, beta = 0.0738, [95% CI: 0.0629–0.0847], p = 3.61 × 10⁻⁴⁰; Additional file 2: Table S2; Fig. 3G). This SNP has been associated with body height, BMI-adjusted hip circumference, BMI-adjusted WHR, and body fat [15, 64, 71, 72].

Replication of previously reported GWAS-determined SNPs in the Taiwan Han population

The previously reported GWAS-determined SNPs for height were obtained from the GWAS catalog (https://www.ebi.ac.uk/gwas/efotraits/EFO_0004339) and used to replicate the reported SNPs in the training group using the linear regression model, as described previously. In this study, an association analysis identified 313 SNPs that were significantly associated with height (p < 0.05/1722 SNPs; Additional file 3: Table S3).

In this study, the 6843 GWAS-identified SNPs and the 313 replicated reported SNPs were combined. After removing duplicate SNPs, 6941 SNPs were associated with height (Fig. 1). These 6941 SNPs were then clumped to exclude SNPs in strong LD and used to select the best SNP combination for the best-fit PRS calculation in the testing group, using PLINK software (versions 1.9 and 2.0) [53]. This resulted in the identification of independent genetic signals for the best-fit PRS with 251 SNPs. These 251 SNPs included 168 GWAS-identified SNPs (Table 1) and 83 previously reported GWAS-determined SNPs (Table 2). These results show that 168 novel GWAS-identified SNPs and 83 reported SNPs were associated with height in individuals of Han Chinese ancestry in Taiwan.

Table 1 Newly identified SNPs associated with height in Taiwan
Table 2 Association of previously reported GWAS height SNPs with height in Taiwan

Association between the genetically determined height (PRS251) and the measured height (phenotype)

The association between genetically determined height (PRS251) and measured height (phenotype) was investigated in the testing and validation groups, where height was stratified by sex (male: N = 10,919; female: N = 17,990; Fig. 4). For males, the regression line indicated that a 1-SD increase in PRS251 was associated with a 0.257-SD increase in normalized measured height (slope = 0.257; p < 0.001; green line). For females, the regression line indicated that a 1-SD increase in PRS251 was associated with a 0.274-SD increase in normalized measured height (slope = 0.274; p < 0.001; red line).
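One way to read these slopes: when both variables are standardized and the regression contains no other covariates (an assumption here, since the model specification is not spelled out in full), the regression slope equals the Pearson correlation, so its square approximates the fraction of height variance explained by the PRS. The Python sketch below demonstrates the identity on made-up numbers and then squares the two reported slopes.

```python
import math
from statistics import mean, pstdev

def zscore(values):
    """Mean-center and scale to one (population) SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def ols_slope(x, y):
    """Slope of the simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data to demonstrate that the standardized slope equals r.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 1.9, 3.5, 3.9, 5.2]
standardized_slope = ols_slope(zscore(x), zscore(y))
r = pearson_r(x, y)

# Squared reported slopes for males and females.
r2_male = 0.257 ** 2
r2_female = 0.274 ** 2
```

Under this reading, the PRS explains roughly 6.6% of the variance in standardized height in males and 7.5% in females, values that bracket the PRS r² of 0.0712 reported for the best-fit PRS in the Methods.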

Furthermore, to assess the validity of our findings, we replicated the association between genetically determined height (PRS237) and measured height (phenotype) in another cohort, kindly provided by the Big Data Center in China Medical University Hospital (CMUH), Taichung, Taiwan (Additional file 4: Fig. S3). Only 237 of the 251 SNPs were available in this independent cohort (Additional file 5: Table S4). The measured height (phenotype) and genetically determined height (PRS237) were normalized (standardized) by sex. For males, the regression line indicated that a 1-SD increase in PRS237 was associated with a 0.0972-SD increase in normalized measured height (slope = 0.0972; p < 0.001; green line; Additional file 4: Fig. S3). For females, the regression line indicated that a 1-SD increase in PRS237 was associated with a 0.104-SD increase in normalized measured height (slope = 0.104; p < 0.001; red line; Additional file 4: Fig. S3).

Associations of height with health-related outcomes

In this study, we performed both observational (phenotype) and genetic PRS association analyses of height with 63 health-related outcomes using the Taiwan Biobank (Fig. 5). We examined the association of observational (phenotype) height with 63 health-related outcomes, including 14 traits and 49 diseases, in 67,452 individuals of Han Chinese ancestry (Fig. 5). Similar analyses of the association between the genetic PRS of height and these 63 health-related outcomes were performed (Fig. 5). The genetically determined height of PRS251 (251 SNPs) applied in this analysis was calculated from our GWAS results, consisting of 168 GWAS-identified SNPs (Table 1) and 83 previously reported GWAS-determined SNPs (Table 2). The estimated beta values (95% CI) for the 14 traits are shown in Fig. 5A–C. The estimated odds ratios (95% CI) for the 49 diseases are shown in Fig. 5D–M. After adjusting for age, sex, education, drinking, smoking, regular exercise, and the first 10 principal components (PCs), our analyses showed that observational (phenotype) height was associated with eight of the 14 traits (p < 0.05/[14 + 49]; Table 3). Further analyses confirmed that genetic (PRS251) height was also associated with these eight traits (Table 3). No significant associations were observed between either measured or genetic PRS height and the 49 diseases (p > 0.05/[14 + 49]; Fig. 5D–M). Among anthropometric traits, observational height was positively associated with body weight, waist circumference, and hip circumference but negatively associated with BMI, WHR, and body fat (Table 3). Genetic PRS height was associated with increased body weight (beta = 1.2182, 95% CI = 1.1405 to 1.2959), waist circumference (beta = 0.4462, 95% CI = 0.3754 to 0.5171), and hip circumference (beta = 0.6006, 95% CI = 0.5488 to 0.6523), and with decreased BMI (beta = −0.0837, 95% CI = −0.1110 to −0.0563), WHR (beta = −0.0008, 95% CI = −0.0012 to −0.0003), and body fat (beta = −0.1401, 95% CI = −0.1856 to −0.0946).

Fig. 5

Observational (phenotype) and genetic PRS251 associations of height with 63 health-related outcomes. Beta value and 95% confidence interval (CI) per standard deviation (SD) increase in height are shown for A anthropometric trait 1 (hip circumference, waist circumference, and body weight), B anthropometric trait 2 (body fat, waist-hip ratio, and body mass index), and C blood pressure, blood lipid level, and blood glucose level (including systolic blood pressure (SBP), diastolic blood pressure (DBP), total cholesterol (TC), triglyceride (TG), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), fasting glucose, and HbA1c). Odds ratio and 95% CI per SD increase in height are shown for D orthopedic or joint disorders (osteoporosis, arthritis, rheumatoid arthritis, osteoarthritis, and gout), E lung and respiratory diseases (asthma and emphysema or chronic bronchitis), F cardiovascular diseases (valvular heart disease, coronary artery disease, heart arrhythmia, cardiomyopathy, congenital heart defect, other type of heart disease, hyperlipidemia, hypertension, and stroke), G diabetes (type 1 diabetes and type 2 diabetes), H mental or emotional disorders (depression, bipolar disorder, postpartum depression, obsessive-compulsive disorder, alcohol addiction or drug abuse, and schizophrenia), I digestive diseases (peptic ulcer disease, gastroesophageal reflux disease, and irritable bowel syndrome), J nervous system disorders (epilepsy, migraine, multiple sclerosis, Parkinson’s disorder, and dementia), K other types of disease (gallstones, kidney stones, kidney failure, and vertigo), L eye diseases (cataract, glaucoma, dry eye syndrome, retinal detachment, floaters, blindness, color blindness, and others), and M female diseases (severe menstrual cramps, uterine fibroids, ovarian cysts, endometriosis, and uterine/cervical polyps)

Table 3 Significant association between phenotypic and genetically determined height with eight traits

Regarding blood pressure, lipid levels, and glucose levels, observational height was negatively associated with TC and LDL-C (Table 3). Genetic PRS height was associated with decreased TC (beta = −0.5869, 95% CI = −0.8530 to −0.3207) and LDL-C levels (beta = −0.6291, 95% CI = −0.8672 to −0.3910).

Discussion

We report a genetic profile for height in the Han Chinese population using genome-wide SNP analysis and a replication study in the Taiwan Biobank, a community-based database in Taiwan. This is the first large-scale study of the genetic basis for height and health-related outcomes in individuals of Han Chinese ancestry in Taiwan. Our results are consistent with the genetic profile of height observed mainly in individuals of European ancestry [15,16,17,18,19,20,21,22,23,24,25,26]. Our findings also agree with observational (phenotype) studies of height and its associated health-related outcomes [73,74,75,76,77,78,79,80,81].

In this study, we identified 6843 SNPs with genome-wide significance in 89 genomic regions, including 18 novel loci. Among these, we identified seven independent lead SNPs at seven genetic loci with genome-wide significance, two of which were novel (chromosome 2, rs76803230 in DIS3L2; chromosome 3, rs57345461 in ZBTB38). DIS3L2 encodes one of the subunits of the RNA exosome, and its genetic variants are associated with height in individuals of European ancestry [21, 82] and East Asian ancestry [83, 84]. ZBTB38 is a zinc finger transcriptional activator that binds methylated DNA and is associated with apoptosis. ZBTB38 genetic variants are associated with height in individuals of European ancestry [17, 18] and East Asian ancestry [20, 23]. The remaining five lead SNPs were reported previously [15, 23, 64,65,66,67,68,69,70,71,72]. The lead SNP rs3791675 lies in EFEMP1, which encodes an extracellular matrix glycoprotein of the fibulin family; this SNP has been associated with body height, BMI-adjusted waist circumference, pelvic organ prolapse, and BMI-adjusted WHR [23, 64,65,66]. The lead SNP rs16895971 lies in LCORL, which encodes a transcription factor involved in spermatogenesis; this SNP has been associated with height in East Asians [67]. The lead SNP rs2780226 lies in HMGA1, which encodes a chromatin-associated protein that regulates gene transcription and metastatic progression of cancer cells; this SNP has been associated with body height, BMI-adjusted waist circumference, and birth weight [15, 68, 69]. The lead SNP rs3816804 in CS has also been associated with height in East Asians [70]. The lead SNP rs143384 lies in GDF5, which encodes a secreted ligand of the transforming growth factor-beta superfamily that regulates the development of numerous tissue and cell types; this SNP has been associated with body height, BMI-adjusted hip circumference, BMI-adjusted WHR, and body fat [15, 64, 71, 72]. Taken together, our observations identify novel lead SNPs in the Han Chinese population and indicate that the genetic architecture of height in this population overlaps with that discovered mainly in individuals of European ancestry.

Our observational (phenotype) analyses showed that height was associated with eight traits. Furthermore, our PRS251 analyses confirmed that genetic height was also associated with these eight traits. Taller height was associated with decreased BMI, WHR, body fat, TC, and LDL-C, but with increased body weight, waist circumference, and hip circumference. For anthropometric traits, both observational (phenotype) and genetically determined height (PRS251) were associated with BMI, WHR, body fat, body weight, waist circumference, and hip circumference. Our findings are consistent with previous observational (phenotype) studies that reported an inverse association between height and obesity-related traits, including BMI, WHR, and body fat [73, 74]. As expected, taller adults had lower rates of obesity [73, 74]. Our findings also support previous observational (phenotype) studies that reported positive associations between height and body weight [73,74,75], waist circumference [74, 75], and hip circumference [75]. The results of our genetic PRS of height association were also in agreement with previous genetic correlation studies, mainly conducted in individuals of European ancestry [78,79,80,81]. There was a negative correlation between genetically determined height and BMI [78,79,80], and positive correlations between genetically determined height and waist and hip circumference [80, 81]. Our findings may reflect a partial genetic overlap between height and anthropometric traits including BMI, WHR, body fat, body weight, waist circumference, and hip circumference. However, genetic correlations in individuals of Han Chinese ancestry remain to be elucidated.

Regarding blood pressure, lipid levels, and glucose levels, both observational (phenotype) and genetically determined height (PRS251) were inversely associated with TC and LDL-C. Our findings are consistent with those of previous observational (phenotype) studies that reported an inverse association between height and blood lipid levels [76, 77, 85]. Taller adults have lower levels of TC and LDL-C [76, 77, 85]. The results of our genetic PRS of height are also in agreement with previous studies, mainly conducted in individuals of European ancestry [50, 80, 81]. Negative genetic correlations between height and TC were found [80, 81]. A higher genetic PRS for height was associated with lower LDL-C levels in individuals of European ancestry [50]. Our results also suggest that genetically taller individuals of Han Chinese ancestry have lower levels of TC and LDL-C.

Conclusions

This large-scale assessment of the genetic architecture of height in the Han Chinese population of Taiwan quantified the extent of the shared genetic basis with individuals of European ancestry. Our observational and genetic study supports the relevance of height to the etiology of various health-related outcomes, especially those regarding anthropometric traits and blood lipids.