Introduction

Osteoporosis is a bone disease that develops when bone mineral density (BMD) and bone mass decrease or when the structure and strength of bone change. This can increase susceptibility to fractures, especially in the hip, spine, and wrist [1]. Osteoporotic fractures can lead to significant morbidity, mortality, and healthcare expenses [2], with an estimated 2 million cases and $19 billion in costs annually in the United States alone [3, 4]. Given the global aging population, the incidence of osteoporosis is projected to increase [5], underscoring the importance of early identification of individuals at high risk of primary fractures.

The risk of osteoporotic fracture is highly heritable, with genetic liability of up to 46% [6], and genetic factors contribute substantially to fracture risk [7]. Genome-wide association studies (GWAS) over the past decade have identified single nucleotide polymorphisms (SNPs) associated with bone strength-related traits [7]. Around 43 genomic loci and thousands of SNPs are robustly associated with fractures [8], and many more genetic associations have been reported for fracture-related traits and risk factors [9,10,11].

Bone mineral density (BMD) is the most critical predictor of osteoporosis and fracture [12]. Polygenic scores (PGSs) derived from GWAS summary statistics for BMD have been used to quantify an individual’s genetic liability to fractures [13,14,15,16,17]. Previous studies have highlighted the potential of BMD-related PGSs for fracture risk prediction [13, 14]. Nevertheless, the clinical utility of PGSs in fracture prediction remains limited, because they add only marginally to clinical risk factors.

A multi-PGS extension, metaPGS, has been developed to improve predictive performance by combining multiple PGSs into one score [18]. It has been applied to many other complex diseases and was proven to significantly increase the predictive accuracy of coronary artery disease [18], ischemic stroke [19], type 2 diabetes [20], and breast cancer [21]. In fracture prediction, an individual’s estimated genetic propensity was typically derived based on the GWAS summary statistics of a single trait, BMD. Considering that fragility fracture is a multifactorial disease influenced by various physiological factors beyond BMD [22], PGS depending on only one trait may not be sufficient to capture the genetic components of fracture. If a particular disease/trait is causally involved in the etiology of fracture, the PGS for that disease/trait as a genetic proxy should predict fracture occurrence, and a metaPGS may be particularly useful in fracture prediction. Integrating genetic information of multiple fracture–related traits into metaPGS can improve predictive accuracy.

Therefore, this study aimed to develop and validate a multi-trait metaPGS that integrates genetic information from multiple fracture-related traits to improve predictive accuracy. We also examined the predictive value of the metaPGS beyond the existing fracture risk assessment tool (FRAX), an algorithm predicting 10-year probabilities of major osteoporotic fracture (MOF) and hip fracture (HF) based on 12 clinical risk factors [23]. By improving the accuracy of genetic risk prediction for osteoporotic fractures, metaPGS could aid in identifying high-risk individuals and implementing preventive measures.

Methods

Study cohort

The UK Biobank (UKB) is a large-scale population-based observational study comprising 502,617 individuals aged between 40 and 69 years who were recruited from the UK between 2006 and 2010 [24]. A standardized socio-demographic questionnaire, medical history, and other lifestyle factors were collected at recruitment. Individual records were linked to the Hospital Episode Statistics (HES) records and the national death and cancer registries. Because the underlying genetic models were developed and trained primarily using European ancestry samples, restricting the analysis to individuals of white British ancestry allowed for a better representation of the genetic architecture in that population and resulted in more accurate predictions. Thus, the current study only included individuals of white British ancestry to examine a relatively homogeneous group.

Fracture event ascertainment

Fracture cases were identified from two sources: the baseline questionnaire, which recorded self-reported fractures within the past 5 years, and the Hospital Episode Statistics, linked through NHS Digital, in which a hospital-based fracture diagnosis irrespective of mechanism was recorded within the primary or secondary diagnosis field (Supplementary Table 1). All incident fracture cases were identified through the Hospital Episode Statistics. Fractures of the skull, face, hands, and feet, pathological fractures due to malignancy, atypical femoral fractures, and periprosthetic and healed fractures were excluded from the analysis. Based on the date of the ICD-10 record, fractures sustained after the initial assessment visit were defined as incident cases (n = 13,623).

Data processing and quality control

A total of 488,251 participants were genotyped using Affymetrix arrays [25]. The genotype data were quality controlled and additionally imputed using the Haplotype Reference Consortium (HRC) [26] and the UK10K haplotype resources, yielding a total of 96 million imputed variants. SNPs were excluded from PGS construction if they had a minor allele frequency below 0.1%, were missing in a high fraction of subjects (> 0.01), or did not meet the Hardy–Weinberg equilibrium criterion (p value > 1 × \({10}^{-6}\)). Individuals with a high rate of genotype missingness (> 0.01) were also excluded. A total of 450,395 individuals and 11.5 million variants passed the quality control standards and remained for subsequent analysis.
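The variant-level filters above can be sketched as a simple predicate. This is only an illustration under stated assumptions: the `Variant` record, its field names, and the example values are hypothetical, and the direction of the Hardy–Weinberg filter (keep variants with p value above the threshold) is our reading of the criterion rather than something the text spells out.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the constant names are illustrative.
MAF_MIN = 0.001       # minor allele frequency of at least 0.1%
MISSING_MAX = 0.01    # at most 1% missing genotype calls
HWE_P_MIN = 1e-6      # assumption: variants are kept when the HWE p value exceeds this


@dataclass
class Variant:
    rsid: str
    maf: float            # minor allele frequency
    missing_rate: float   # fraction of subjects without a genotype call
    hwe_p: float          # Hardy-Weinberg equilibrium test p value


def passes_qc(v: Variant) -> bool:
    """Return True when the variant meets all three variant-level filters."""
    return (
        v.maf >= MAF_MIN
        and v.missing_rate <= MISSING_MAX
        and v.hwe_p > HWE_P_MIN
    )


# Four made-up variants, each failing a different filter except the first.
variants = [
    Variant("rs_a", maf=0.25, missing_rate=0.002, hwe_p=0.40),    # kept
    Variant("rs_b", maf=0.0005, missing_rate=0.002, hwe_p=0.40),  # too rare
    Variant("rs_c", maf=0.10, missing_rate=0.05, hwe_p=0.40),     # too many missing calls
    Variant("rs_d", maf=0.10, missing_rate=0.002, hwe_p=1e-9),    # fails HWE
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)
```

In practice such filters are usually applied by dedicated genotype tools; the sketch only makes the logic of the three thresholds explicit.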

Individual PGS tuning

GWAS summary statistics were available for 16 complex traits/diseases related to fracture risk. PGSs were generated with the estimated effect sizes from the most recent literature on large GWAS (Supplementary Table 2). To minimize the risk of over-fitting due to overlapping samples between the GWAS discovery set and the UKB validation set, the selected GWAS did not include UKB samples. GWASs for femoral neck BMD [27], total body BMD [28], hand grip strength (HGS) [9], appendicular lean mass (ALM) [10], whole body lean mass (WBLM) [10], vitamin D (VD) [11], serum calcium concentration (SCC) [29], homocysteine (HC) [30], thyroid stimulating hormone level (TSH) [31], fasting glucose (FG) [32], fasting insulin (FI) [32], type 1 diabetes (T1D) [33], type 2 diabetes (T2D) [34], rheumatoid arthritis (RA) [35], inflammatory bowel disease (IBD) [36], hip bone size (HBS) [37], and coronary artery disease (CAD) [38] were selected for individual PGS derivation.

We randomly selected 1000 fracture cases and 2000 non-fracture cases for individual PGS tuning. Based on GWAS summary statistics of 16 fracture-related phenotypes and a linkage disequilibrium reference panel of 503 European samples from 1000 Genomes (phase 3, version 5), a set of candidate PGSs was derived for each phenotype/trait using the Pruning and Thresholding (P + T) method and the LDpred2 computational algorithm [39].

Using the P + T method, 24 candidate PGSs were calculated for each trait from combinations of p value thresholds (1.0, 0.5, 0.05, \(5\times {10}^{-4}\), \(5\times {10}^{-6}\), and \(5\times {10}^{-8}\)) and \({r}^{2}\) thresholds (0.2, 0.4, 0.6, and 0.8). The grid mode of the LDpred2 algorithm was used to generate seven candidate PGSs based on seven hyper-parameter values of ρ (1, 0.3, 0.1, 0.03, 0.01, 0.003, and 0.001). PGS construction was restricted to HapMap3 variants only, as recommended for LDpred2 [29].
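As a quick check of the candidate counts, the two tuning grids can be enumerated directly. The sketch below only lists the parameter combinations quoted above; it does not compute any score, and the dictionary keys are illustrative names.

```python
from itertools import product

# P + T thresholds as listed in the text.
P_VALUE_THRESHOLDS = [1.0, 0.5, 0.05, 5e-4, 5e-6, 5e-8]
R2_THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
# LDpred2 grid values of the hyper-parameter as listed in the text.
LDPRED2_GRID = [1, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001]

# One P + T candidate per (p value, r^2) pair.
pt_candidates = [
    {"method": "P+T", "p_value": p, "r2": r2}
    for p, r2 in product(P_VALUE_THRESHOLDS, R2_THRESHOLDS)
]
# One LDpred2 candidate per grid value.
ldpred2_candidates = [{"method": "LDpred2", "rho": rho} for rho in LDPRED2_GRID]
candidates = pt_candidates + ldpred2_candidates

print(len(pt_candidates), len(ldpred2_candidates), len(candidates))  # 24 7 31
```

Six p value thresholds times four \({r}^{2}\) thresholds give the 24 P + T candidates, and adding the seven LDpred2 candidates gives 31 candidate PGSs per trait.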

For each of the 16 phenotypes, 31 candidate PGSs were derived for each individual in the UKB tuning set. Because the risk of fracture increases with age as bones weaken, and women are at higher risk of osteoporosis-related fractures than men, the association between each PGS and fracture was evaluated in terms of odds ratios (OR) per standard deviation of PGS using logistic regression adjusted for age, sex, genotyping array (BiLEVE/UKB), and the first four principal components (PCs). For each trait, the candidate PGS with the largest-magnitude odds ratio was selected as the representative PGS and carried forward into subsequent analyses.
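The selection rule can be illustrated with a small sketch. We assume here that "largest magnitude" means the odds ratio farthest from 1 on the log scale, so that protective and risk-increasing scores are treated symmetrically; the function name, candidate labels, and odds ratios are all hypothetical.

```python
import math


def select_representative(odds_ratios: dict[str, float]) -> str:
    """Return the candidate whose OR per SD is farthest from 1 on the log scale."""
    return max(odds_ratios, key=lambda name: abs(math.log(odds_ratios[name])))


# Made-up odds ratios per SD for three hypothetical candidates of one trait.
example_ors = {
    "pt_p0.05_r0.2": 1.04,
    "pt_p5e-8_r0.8": 0.93,
    "ldpred2_rho0.01": 0.88,
}
best = select_representative(example_ors)
print(best)  # ldpred2_rho0.01
```

Working on the log scale matters for traits such as BMD, where a higher score is expected to lower fracture risk and the odds ratio is therefore below 1.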

Derivation of the metaPGS

Each representative PGS determined from the previous step was standardized to have a zero mean and unit standard deviation. We then split the remaining UKB European ancestry dataset into a training set (n = 135,119) and a testing set (n = 315,276). Using the UKB training set, we employed elastic-net logistic regression [40] to model the association between the 16 PGSs and fracture, adjusting for age, sex, and the first four PCs. A range of models with different penalties was evaluated using tenfold cross-validation. The model with the highest area under the receiver operating characteristic curve (AUC) was selected as the final model to generate the metaPGS and was held fixed for validation in the UKB testing set. The metaPGS was calculated as a weighted average of the standardized individual PGSs:

$${{\text{PGS}}}_{{\text{i}}}^{{\text{meta}}}=\frac{{\alpha }_{1}{{\text{PGS}}}_{{\text{i}}1}+\dots +{\alpha }_{16}{{\text{PGS}}}_{{\text{i}}16}}{{\alpha }_{1}+\dots +{\alpha }_{16}}$$

where \({{\text{PGS}}}_{{\text{i}}1}\),…,\({{\text{PGS}}}_{{\text{i}}16}\) are the 16 standardized PGSs (zero mean and unit variance) for the \(i\)th individual, and \({\alpha }_{1}\),…,\({\alpha }_{16}\) are the coefficients (log odds ratios) for each of the 16 PGSs (Fig. 1).
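The weighted average in the formula above can be written as a short function. This is a minimal sketch of the arithmetic only: the function name and the toy numbers are hypothetical, and a PGS whose coefficient was shrunk to zero by the elastic net simply drops out of the score.

```python
def meta_pgs(standardized_scores: list[float], weights: list[float]) -> float:
    """Weighted average of one individual's standardized PGSs.

    The weights play the role of the elastic-net coefficients (log odds
    ratios); a weight of zero removes that PGS from the score.
    """
    if len(standardized_scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero; the weighted average is undefined")
    weighted_sum = sum(a * s for a, s in zip(weights, standardized_scores))
    return weighted_sum / total


# Toy example with three PGSs instead of 16; all numbers are invented.
scores = [1.0, -0.5, 2.0]
alphas = [0.2, 0.1, 0.0]  # the third PGS was shrunk to zero
print(meta_pgs(scores, alphas))
```

One property of the formula worth noting is that the denominator is the plain sum of the coefficients, so the scaling depends on their signs as well as their sizes; the sketch guards only against the degenerate case where that sum is exactly zero.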

Fig. 1

Study design and workflow. a Derivation of individual PGSs in the UKB training set (n = 135,119) using GWAS summary statistics for individual traits. b The metaPGS for fracture was then derived by integrating individual PGSs using elastic-net regression with cross-validation. c Validation of the metaPGS for fracture was performed in the UKB validation set (n = 315,276). PGS, polygenic score; FNBMD, femoral neck bone mineral density; TBBMD, total body bone mineral density; HGS, hand grip strength; ALM, appendicular lean mass; WBLM, whole body lean mass; VD, vitamin D; SCC, serum calcium concentration; HC, homocysteine; TSH, thyroid stimulating hormone level; FG, fasting glucose; FI, fasting insulin; T1D, type 1 diabetes; T2D, type 2 diabetes; RA, rheumatoid arthritis; IBD, inflammatory bowel disease; HBS, hip bone size; CAD, coronary artery disease

Statistical analyses

The demographic and clinical characteristics of the UKB testing set were described using mean and standard deviation (SD) for continuous variables and frequency and percentage for categorical variables. The primary outcome of this study was incident fracture. All PGSs in the UKB testing set were standardized to unit variance to facilitate interpretation. To illustrate the different cumulative incidences of fracture in individuals with distinct genetic predispositions, we grouped individuals into nine quantile ranges of the metaPGS: ≤ 1%, 1–5%, 5–20%, 20–40%, 40–60%, 60–80%, 80–95%, 95–99%, and > 99%. The cumulative incidence of fracture by metaPGS group was then derived using the cumulative incidence function (CIF), accounting for the competing risk of mortality.
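The grouping into nine quantile ranges can be sketched as follows. The handling of the group boundaries (each upper bound belongs to the lower group) and of ties is our assumption, since the text does not specify it, and the helper names and the 200 example scores are hypothetical.

```python
from bisect import bisect_left

# Upper percentile bounds of the first eight groups; anything above 99 is the last group.
UPPER_BOUNDS = [1, 5, 20, 40, 60, 80, 95, 99]
LABELS = ["<=1%", "1-5%", "5-20%", "20-40%", "40-60%",
          "60-80%", "80-95%", "95-99%", ">99%"]


def percentile_ranks(scores: list[float]) -> list[float]:
    """Percentile rank in (0, 100] of each score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = 100.0 * position / len(scores)
    return ranks


def assign_group(percentile: float) -> str:
    """Map a percentile rank to one of the nine metaPGS groups."""
    return LABELS[bisect_left(UPPER_BOUNDS, percentile)]


scores = [float(i) for i in range(1, 201)]  # 200 made-up metaPGS values
groups = [assign_group(p) for p in percentile_ranks(scores)]
print(groups.count("<=1%"), groups.count(">99%"))  # 2 2
```

With 200 individuals, the bottom and top groups each contain two people, which illustrates why the extreme groups need a very large cohort to give stable incidence estimates.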

The predictive value of each of the 16 trait-specific PGSs was examined separately by fitting a series of simple logistic regression models (single-PGS models). To account for multiple testing across these models, we used 10,000 permutations to determine the significance threshold that controls the false discovery rate. The 16 PGSs were then modeled jointly in the UKB training set using elastic-net logistic regression [40], adjusting for age, sex, and the first four PCs, and the metaPGS was derived for each individual in the UKB testing set from the individual PGSs selected by this model. Two previously developed BMD-related PGSs (PGS_FNBMD [13] and PGS_TBBMD [16]) were also included in the subsequent analysis for comparison purposes.

All scores (PGS_FNBMD, PGS_TBBMD, and metaPGS) were evaluated using logistic regression and Cox proportional hazard regression. The performance of models with and without PGSs in identifying individuals at risk of sustaining a fracture was evaluated using the AUC, and differences were tested for statistical significance using the DeLong test. Additionally, we examined fracture incidence according to PGS category in the UKB testing set. We compared individuals in the top percentiles (1%, 5%, 10%, and 20%) of each PGS with those in the remaining percentiles (99%, 95%, 90%, and 80%) using Cox proportional hazard models. All regression models were adjusted for age, sex, and the first four PCs.
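For readers less familiar with the AUC, the sketch below computes it as the probability that a randomly chosen fracture case has a higher predicted risk than a randomly chosen non-case. It is a plain pairwise illustration with invented data, not the software used in the study, and it does not implement the DeLong test.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random case scores higher than a random non-case.

    Ties count as one half. The pairwise loop is O(n_cases * n_controls),
    so this is meant for illustration rather than biobank-scale data.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented predicted risks and outcomes (1 = fracture).
predicted = [0.10, 0.40, 0.35, 0.80, 0.20]
outcome = [0, 0, 1, 1, 0]
print(auc(predicted, outcome))
```

In this toy example five of the six case/non-case pairs are ordered correctly, so the AUC is 5/6.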

We also investigated the predictive value of metaPGS beyond the existing fracture assessment tool and compared its performance with two previously developed BMD-related PGSs (PGS_FNBMD [13] and PGS_TBBMD [16]). The association between each PGS and fracture risk, adjusted for the FRAX risk factors, including age, BMI, previous fracture, current smoking, glucocorticoids, and rheumatoid arthritis, was assessed using Cox proportional hazard models. The model with only FRAX risk factors was set as the base model. Four models were formulated: (1) Model 1: base model; (2) Model 2: base model + \(PGS\_FNBMD\); (3) Model 3: base model + \(PGS\_TBBMD\); and (4) Model 4: base model + \(metaPGS\). The magnitude of the association between each PGS and fracture risk was assessed by the hazard ratio and its corresponding 95% confidence interval. Model comparison was performed using bootstrap resampling.

In addition, net reclassification improvement (NRI) was adopted to compare the reclassification ability of the models with PGSs to those without PGS. We designated “high risk” as the predicted MOF risk ≥ 20% and “low risk” as the predicted MOF risk < 20%, based on the National Osteoporosis Foundation’s recommended fixed intervention cutoff [41]. The integrated discrimination improvement (IDI) was also calculated to incorporate both the direction of change in the calculated risk and the extent of change.
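The two reclassification measures can be illustrated with a small sketch using the 20% cutoff described above. The formulas follow the standard definitions of the two-category NRI and the IDI; the function names and the six example individuals are hypothetical, and no confidence intervals are computed.

```python
def categorical_nri(old_risk, new_risk, events, threshold=0.20):
    """Two-category NRI: net upward moves in events plus net downward moves in non-events."""
    up_events = down_events = up_nonevents = down_nonevents = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for old, new, event in zip(old_risk, new_risk, events):
        old_high, new_high = old >= threshold, new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if event:
            up_events += moved_up
            down_events += moved_down
        else:
            up_nonevents += moved_up
            down_nonevents += moved_down
    return ((up_events - down_events) / n_events
            + (down_nonevents - up_nonevents) / n_nonevents)


def idi(old_risk, new_risk, events):
    """Change in mean predicted risk among events minus the change among non-events."""
    def mean(values):
        return sum(values) / len(values)
    event_old = [o for o, e in zip(old_risk, events) if e]
    event_new = [n for n, e in zip(new_risk, events) if e]
    nonevent_old = [o for o, e in zip(old_risk, events) if not e]
    nonevent_new = [n for n, e in zip(new_risk, events) if not e]
    return (mean(event_new) - mean(event_old)) - (mean(nonevent_new) - mean(nonevent_old))


# Invented predicted risks for six people (1 = fracture during follow-up).
base = [0.15, 0.25, 0.18, 0.22, 0.10, 0.30]
with_pgs = [0.21, 0.27, 0.17, 0.19, 0.09, 0.31]
fracture = [1, 1, 0, 0, 0, 1]
print(categorical_nri(base, with_pgs, fracture), idi(base, with_pgs, fracture))
```

In the toy data one of three fracture cases moves up across the 20% cutoff and one of three non-cases moves down, giving an NRI of 2/3, whereas the IDI also credits the small risk changes that do not cross the cutoff.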

The estimated BMD (eBMD) calculated based on the quantitative ultrasound index through the calcaneus is available for the majority of the subjects in the UKB. Given that eBMD is recognized as a predictor of fracture risk, we sought to enhance our analysis by conducting a sensitivity analysis. This additional investigation aimed to provide a more comprehensive understanding of the impact of PGSs in a model that incorporates both FRAX risk factors and eBMD. Furthermore, we extended our sensitivity analysis to evaluate the predictive ability of the developed metaPGS in the context of non-vertebral fractures. All statistical analyses were conducted using R version 4.0.3 software and SAS 9.4 (SAS Institute, Inc., Cary, NC, USA).

Results

The characteristics of the UKB testing set are shown in Supplementary Table 3. The overall UKB testing set consisted of 315,276 individuals, of whom 8787 were incident fracture cases and 306,489 were non-fracture cases. Supplementary Fig. 1 shows correlations between the 16 individual PGSs, with strong correlations observed between HC and SCC, SCC and CAD, CAD and IBD, ALM and WBLM, T1D and TSH, TSH and TBBMD, and TBBMD and RA. The metaPGS was derived based on 11 significant individual PGSs selected from the elastic-net regularized regression model (model weights are shown in Fig. 2).

Fig. 2

Associations of 16 trait-specific PGSs with the fracture outcome in the UKB derivation set. Estimates are per standard deviation increase of each individual PGS, evaluated in logistic regression (univariate) and elastic-net logistic regression adjusted for age and sex. “Inactive” indicates that the elastic-net estimated odds ratio was negligible (between 0.999 and 1.001, shown as a blue dot). CI, confidence interval

We assessed the crude 10-year cumulative fracture incidence across the nine metaPGS groups (Fig. 3). With competing mortality risk accounted for, significant differences in the 10-year fracture risk were observed across metaPGS groups (p < 0.0001). The top and bottom 1% of the metaPGS showed a substantial difference in the cumulative fracture incidence. A comparison of the metaPGS with the two BMD-related PGSs (PGS_FNBMD and PGS_TBBMD) is shown in Fig. 4. The metaPGS had a stronger association with fracture risk than the two individual PGSs. All three PGSs were strongly associated with incident fracture (p < 0.0001), with odds ratios (OR) ranging from 1.15 to 1.35. Compared with the baseline model, which incorporated only age and sex, models augmented with PGS_FNBMD, PGS_TBBMD, and metaPGS showed marginal improvements in the AUC from 0.643 to 0.647, 0.654, and 0.654, respectively. However, these improvements were not statistically significant. The metaPGS was associated with incident fracture with a hazard ratio (HR) of 1.22 (95% CI 1.19–1.27) per standard deviation of metaPGS, which was stronger than PGS_FNBMD (HR = 1.10, 95% CI 1.08–1.12) and PGS_TBBMD (HR = 1.15, 95% CI 1.12–1.18) (Fig. 4). Using Cox proportional hazard models, we also assessed the HRs for the top 1%, 5%, 10%, and 20% vs. the remaining percentiles of each PGS. The results showed that the bottom 1% of the population had a 1.36-fold (95% CI 1.15–1.61) higher fracture risk than the remaining population (Supplementary Table 4).

Fig. 3

Cumulative incidence function plot for fracture according to quantile group of the metaPGS in the UKB testing set. Shaded regions denote 95% confidence intervals

Fig. 4

Relative performance of PGS_FNBMD, PGS_TBBMD, and metaPGS for fracture. A Cox proportional hazard models; B multivariate logistic regression models. A separate logistic or Cox proportional hazard regression was conducted for each PGS; each estimate was adjusted for age, sex, and the first four principal components

The clinical utility of a PGS depends on its performance in combination with established risk factors and genetic risk models. We therefore evaluated the predictive value of metaPGS while adjusting for established risk factors, using the seven FRAX risk factors available in the UKB data. As expected, established risk factors were positively associated with incident fracture, with current smoking and sex being the strongest risk factors (Table 1). Adjusting for these risk factors only modestly attenuated the association of the metaPGS with incident fracture, and the metaPGS retained the strongest association among the three PGSs. The HRs of PGS_FNBMD, PGS_TBBMD, and metaPGS for incident fracture were 1.09 (95% CI, 1.07–1.12), 1.15 (95% CI, 1.12–1.18), and 1.21 (95% CI, 1.18–1.25), respectively. Models with PGS_FNBMD, PGS_TBBMD, and metaPGS had a slightly higher but statistically non-significant c-index than the base model (0.640, 0.644, 0.644 vs. 0.638) (Supplementary Table 5). Compared to the base model, the association between clinical risk factors and incident fracture risk was not attenuated in any of the models that included a PGS. In the sensitivity analysis, the effect sizes of the PGSs were attenuated but remained statistically significant: PGS_FNBMD (HR 1.06; 95% CI 1.03–1.09, p < 0.0001), PGS_TBBMD (HR 1.09; 95% CI 1.05–1.11, p < 0.0001), and metaPGS (HR 1.13, 95% CI 1.09–1.18, p < 0.0001) were significantly associated with incident fracture after adjustment for FRAX risk factors and estimated BMD (Supplementary Table 6). When further limited to non-vertebral incident fractures, the HRs of PGS_FNBMD, PGS_TBBMD, and metaPGS were 1.05 (95% CI, 1.01–1.09), 1.08 (95% CI, 1.04–1.12), and 1.12 (95% CI, 1.06–1.16), respectively (Supplementary Table 7).

Table 1 Hazard ratio for the hazard function for significant predictive variables for incident fractures in the base model and models with PGS_FNBMD, PGS_TBBMD, and metaPGS (n = 315,279)

In the reclassification analysis, compared to the base model, the models with PGS_FNBMD, PGS_TBBMD, and metaPGS improved the reclassification of fracture by 0.9% (95% CI, 0.04 to 1.58%), 1.36% (95% CI, 0.52 to 2.19%), and 1.41% (95% CI, 0.58 to 2.24%), respectively (Table 2). The metaPGS thus showed the largest improvement in reclassification. For the model that included metaPGS, 13,799 (6.9%) individuals were correctly reclassified up to the high-risk group, and 13,530 (4.3%) individuals who did not experience a fracture were correctly reclassified from the high-risk group to the low-risk group. The continuous NRI showed that the improvements in fracture reclassification contributed by PGS_FNBMD, PGS_TBBMD, and metaPGS were 10.1%, 15.9%, and 16.8%, respectively.

Table 2 Reclassification table of 10-year osteoporotic fracture stratified by event status. Results of reclassification analysis: percent of reclassification compared with FRAX base model (n = 315,279)

Discussion

The present study developed and evaluated a novel metaPGS for fracture risk prediction by combining genetic information from multiple fracture-related traits. The ability of the metaPGS to predict fracture risk was evaluated alone and in combination with the clinical risk score recommended by guidelines. The metaPGS demonstrated a significant association with incident fractures, with a hazard ratio of 1.22 per standard deviation of metaPGS, which was stronger than that of previously established BMD-related individual PGSs. The predictive power of the metaPGS was comparable to that of established risk factors such as age, body weight, and early menopause. Adding the metaPGS to the existing FRAX clinical risk factors improved the discrimination of fracture from non-fracture cases, suggesting that the metaPGS can help stratify fracture risk in the European population and inform personalized prevention strategies.

Our study contributes to using genomic information to stratify individuals for fracture risk. Pleiotropy, a phenomenon in which a single gene or genetic variant influences multiple traits or diseases, has been well-documented in previous research [42]. Since genetic variants can affect multiple traits simultaneously, independent PGSs for fracture risk are expected to overlap significantly. To overcome this challenge, we employed elastic net regularized regression to combine multiple PGSs and estimate their contributions to fracture risk prediction while minimizing collinearity. The resulting metaPGS combines genetic information from 11 of 16 bone-related traits and disorders, resulting in a robust and strongly associated predictor of fracture risk.

Compared to existing individual PGSs, the new metaPGS showed a stronger association with fracture and better risk discrimination. Moreover, the metaPGS has predictive power comparable to that of some established risk factors. By combining the metaPGS with the current fracture risk assessment tool, our findings suggest that the metaPGS adds value beyond established clinical risk factors. The predictive ability of the metaPGS was largely independent of established risk factors for fracture, implying that it captured residual risk not quantified by those risk factors. In addition, the reclassification analyses indicated that combining the metaPGS with the FRAX risk factors improved discrimination between fracture and non-fracture cases, and its fracture risk reclassification was better than that of the two previously developed BMD-related PGSs [43].

There are several limitations worth mentioning. Notably, the predictive performance of the metaPGS for fracture is limited when compared with that for certain diseases, such as CAD [18]. The reasons could be that fragility fracture is more heterogeneous than other diseases and that the GWAS sample size for mechanistically defined fracture is limited. Also, our investigation focused on fractures reported by participants and recorded in the electronic health records, potentially leading to an underrepresentation of asymptomatic vertebral fractures. This limitation likely contributes to the comparatively lower predictive performance of the metaPGS for fractures. Furthermore, the sample size of older individuals (> 75 years) in the UKB is relatively small, limiting our ability to model fracture risk in the age strata where most events occur, and the duration of follow-up in the UKB is relatively limited. Because of the limited covariates available in the UKB, we could not assess the predictive value of the metaPGS beyond the full FRAX model. Moreover, as the metaPGS was derived and tested primarily in individuals of European ancestry, it may not have equivalent predictive power for other ethnic groups due to variations in allele frequencies, linkage disequilibrium patterns, and effect sizes of common polymorphisms across different ancestries. The absence of a family history of fracture in the UKB precluded an examination of whether the association of the metaPGS with fracture risk is influenced by familial factors. Finally, we used only a subset of the risk factors included in FRAX and did not calculate the FRAX estimate. Therefore, the effect of the metaPGS beyond FRAX may not be sufficiently adjusted.

Our study developed and evaluated a novel approach for fracture risk prediction, the metaPGS, which combines genetic information from multiple fracture-related traits. Despite challenges in phenotypic heterogeneity and GWAS power, our study presents a powerful genomic risk score for fracture and assesses its potential for risk stratification in the context of established risk factors and clinical guidelines. The metaPGS provides added value to established clinical risk factors and has potential clinical utility for personalized prevention strategies. However, predictive models, including the metaPGS, inherently have limitations, and our findings suggest that not all fracture cases were accurately predicted. Future research could focus on incorporating additional variables, refining genetic markers, or exploring alternative methodologies to address these limitations. Future studies should also validate the metaPGS in other populations and evaluate its clinical utility. The metaPGS is a promising approach for fracture risk prediction that overcomes the limitations of single PGSs and represents a significant step towards using genomic information to help stratify individuals for fracture risk.