This study included five cohorts: the UK Biobank, the MrOS US, the MrOS Sweden, the SOF, and the CKB.
The UK Biobank comprises more than 500,000 participants who were recruited at multiple assessment centers in the UK and were enrolled between 2006 and 2010 at ages ranging from 40 to 69 years. Compared with the general population, UK Biobank participants have been reported to be healthier, less obese, and less likely to smoke and drink alcohol. Participants of the UK Biobank were genome-wide genotyped using Affymetrix arrays, and their genotypes were imputed to the Haplotype Reference Consortium (HRC) reference panel [25, 33]. Site-specific major osteoporotic fractures were defined based on ICD-10 codes of primary diagnoses (M8002, M8003, M8005, M8008, M8082, M8083, M8085, M8088, S220, S320, S422, S423, S424, S520, S521, S522, S523, S525, S526, S529, S720, S721, and S722) and self-reported medical history. Since the exact cause of fracture was not identifiable, these identified events may include traumatic fractures. Fracture risk factors were those captured at the baseline visit, and incident fractures were defined as those occurring after the baseline visit.
The MrOS is an international multicenter longitudinal study of elderly men. The MrOS US cohort recruited 5995 men aged ≥ 65 years at multiple assessment centers in the USA between 2000 and 2002. Baseline examinations were conducted upon recruitment, followed by a more extensive questionnaire after 2–2.5 years of follow-up. After a mean follow-up of 4.5 years, a second clinic visit was completed. Tri-annual questionnaires inquiring about the occurrence of incident falls, fractures, and fracture risk factors were collected during the intervening period. In this cohort, 11% of the participants were visible minorities. Among these 5995 men, 5130 were genotyped. After overlaying the first two genetic principal components with those from the 1000 Genomes Project, 4663 men of genetically determined European ancestry were included in this study. Their genotypes were imputed to the HRC reference panel. The MrOS Sweden cohort recruited 3014 men aged 69 to 81 years in three Swedish cities between 2001 and 2004. Participants were followed for up to 10 years after the baseline examination. Fractures were ascertained by searching digital X-ray archives and matching the records to MrOS Sweden participants using the unique personal identification number that all Swedish citizens have. Among these 3014 men, 1880 were genotyped and included in this study. Their genotypes were imputed to the HRC reference panel.
The SOF cohort recruited 9704 women at four assessment centers in the USA who were aged ≥ 65 years at enrollment between 1986 and 1988. Falls and fractures were monitored every 4 months. After baseline examinations, follow-up examinations took place approximately every 2 years during a mean follow-up of 14.5 years. Among these 9704 women, 3625 were genotyped, and 3615 women with a genetically determined European ancestry with reference to the 1000 Genomes Project were included in this study. Their genotypes were also imputed to the HRC reference panel.
To be eligible for the MrOS US, the MrOS Sweden, or the SOF cohorts, participants were required to (1) be able to walk without the assistance of another person, (2) not have had bilateral hip replacements, and (3) be able to provide self-reported data [26, 27, 28]. All reported fractures after baseline were confirmed by a physician review of the radiology report.
The CKB cohort recruited more than 500,000 adults aged between 30 and 79 years from 2004 to 2008 at 10 survey sites in China. After the baseline survey, long-term follow-up was conducted by accessing electronic records of mortality registries, morbidity registries, and all hospitalized events and procedures, which are available for the 98% of participants covered by the nationwide health insurance system. SOS and incident fracture data were collected for 25,034 participants during the second survey of the CKB between 2013 and 2014. These participants were genotyped, and their genotypes were imputed to a 1000 Genomes Project-based reference panel for East Asian populations.
Each study was approved by the institutional review boards of the participating institutions, and all participants provided informed consent to their respective studies.
Risk factor measurement
Clinically relevant risk factors were measured at the baseline visit for these cohorts, including FRAX clinical risk factors, such as age, sex, BMI (in kg/m²), prior fractures (hip fractures and other osteoporotic fractures), smoking, glucocorticoid use, rheumatoid arthritis, and femoral neck BMD based on DXA scan. Diagnosis of secondary causes of osteoporosis (type 1 diabetes, osteogenesis imperfecta in adults, untreated long-standing hyperthyroidism, hypogonadism or premature menopause, chronic malnutrition or malabsorption, and chronic liver disease) was available only in the UK Biobank. Parental history of fracture, at-risk drinking (≥ 3 units per day), and self-reported falls (a risk factor independent of FRAX probability) based on interviews or questionnaires were available in the MrOS US, the MrOS Sweden, and the SOF cohorts. Participants in cohorts where a risk factor was not measured were considered free of that risk factor for the derivation of FRAX probability, as suggested by the FRAX model (https://www.sheffield.ac.uk/FRAX/faq.aspx).
Development of a polygenic risk score
As described previously, we developed a polygenic risk score in the UK Biobank using a statistical learning approach to predict SOS, a risk factor for osteoporotic fracture. The polygenic risk score for FRAX is referred to here as “gSOS.”
There were 426,811 participants in the UK Biobank of white British ancestry who had measured SOS and had undergone genome-wide genotyping. We first assigned these individuals to a training dataset (N = 341,449, 80% of the total cohort), a model selection dataset (N = 5335, 1.25% of the total cohort), and a test dataset (N = 80,014, 18.75% of the total cohort). All 4717 individuals with femoral neck BMD measurements were assigned to the test dataset in order to compare the fracture predictive performance of gSOS with BMD. We performed a genome-wide association study on the training dataset using a linear mixed model adjusting for age, sex, assessment center, genotyping array, and the top 20 genetic principal components. We next performed a series of least absolute shrinkage and selection operator (LASSO) regressions in the training dataset using SOS as an outcome and SNPs as predictors, applying different p value thresholds (p ≤ 5 × 10⁻³, p ≤ 5 × 10⁻⁴, p ≤ 5 × 10⁻⁵, p ≤ 5 × 10⁻⁶, p ≤ 5 × 10⁻⁷, and p ≤ 5 × 10⁻⁸) after removing SNPs demonstrating linkage disequilibrium (r² > 0.05) with other SNPs. Each LASSO regression model thus yielded a polygenic risk score with tuned coefficients associated with different subsets of SNPs. We selected the best-performing model as that with the highest proportion of variance explained in the model selection dataset (Additional file 1: Table S1). This model included 21,717 activated SNPs with a p value ≤ 5 × 10⁻⁴ and was able to explain 23.18% (95% CI 22.66%–23.69%) of the total variance in the test dataset. Since all SNPs employed in the optimized polygenic risk score were present in all European ancestry cohorts under investigation, we obtained a genetically predicted SOS (gSOS) for each individual in the UK Biobank test dataset as well as the other three cohorts.
We did not perform downstream analyses on the UK Biobank training and model selection datasets, since these datasets were used to generate and select gSOS, respectively, and could therefore be prone to biased estimates due to model over-fitting.
Because the genotypes of CKB participants were imputed to a different reference panel, only 13,848 activated SNPs were available in the CKB. Using these available SNPs, we derived a gSOS estimate for each CKB participant. Differences in minor allele frequencies between the CKB and the UK Biobank are summarized in Additional file 1: Fig. S1.
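As an illustration only, the scoring and model selection steps described above can be sketched as follows. The sketch assumes that each candidate model is a mapping from SNP identifier to a fitted coefficient, that genotypes are allele dosages, that unavailable SNPs contribute zero to the score, and that the proportion of variance explained is taken as the squared Pearson correlation between the score and measured SOS. All function names, SNP identifiers, and threshold labels are hypothetical; this is not the pipeline used to derive gSOS.

```python
from statistics import mean


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages using per-SNP coefficients.

    SNPs absent from `dosages` contribute zero (an assumption of this sketch).
    """
    return sum(weight * dosages.get(snp, 0.0) for snp, weight in weights.items())


def variance_explained(predicted, observed):
    """Squared Pearson correlation between predicted and observed values."""
    mean_p, mean_o = mean(predicted), mean(observed)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


def select_best_model(models, genotypes, observed_sos):
    """Return the label of the model explaining the most variance, plus all R^2."""
    r2 = {}
    for label, weights in models.items():
        scores = [polygenic_score(g, weights) for g in genotypes]
        r2[label] = variance_explained(scores, observed_sos)
    best_label = max(r2, key=r2.get)
    return best_label, r2
```

In this sketch, `models` would hold one coefficient set per p value threshold, and `genotypes` and `observed_sos` would come from the model selection dataset rather than the training dataset.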
Derivation of FRAX-based risk scores
The country-specific FRAX tool (https://www.sheffield.ac.uk/FRAX/) incorporating clinically relevant risk factors was used to calculate the 10-year probability of experiencing a major osteoporotic fracture or a hip fracture. For individuals whose femoral neck BMD was available, a FRAX score including BMD was also calculated using this algorithm.
We also developed a clinical risk factor plus gSOS-based FRAX (FRAX-gSOS). The largest meta-analysis of osteoporosis cohorts to date, which did not include the UK Biobank, estimated that a one standard deviation decrease in SOS was associated with 1.42-fold increased odds of experiencing major osteoporotic fracture. Therefore, to generate a gSOS-adjusted FRAX score for major osteoporotic fracture, we first converted the FRAX algorithm-derived fracture probabilities to odds of fracture. We then divided the odds by 1.42 raised to the power of standardized gSOS, such that an individual with a standardized gSOS of 1 would have 1.42-fold decreased odds and an individual with a standardized gSOS of −1 would have 1.42-fold increased odds. The adjusted odds were then converted back to fracture probabilities. For hip fractures, the same transformation was performed, except that the increase in risk per standard deviation decrease in SOS was previously demonstrated to be 1.80-fold.
Thus, for each individual, three FRAX scores were generated: a clinical risk factor-based FRAX (FRAX-CRF), a clinical risk factor plus BMD-based FRAX (FRAX-BMD), and a FRAX-gSOS. This allowed a direct comparison of the predictive accuracy of each of these three methods.
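The gSOS adjustment described above (probability to odds, division by the per-standard-deviation odds ratio raised to the power of standardized gSOS, and conversion back to probability) can be written as a minimal sketch. The function name is hypothetical, and the sketch assumes that the FRAX probability is supplied as a fraction strictly between 0 and 1 rather than as a percentage.

```python
def frax_gsos(frax_probability, gsos_z, odds_ratio_per_sd=1.42):
    """Adjust a FRAX 10-year fracture probability by standardized gSOS.

    frax_probability: FRAX probability as a fraction in (0, 1) (assumed input scale).
    gsos_z: gSOS standardized to mean 0 and standard deviation 1.
    odds_ratio_per_sd: odds ratio per SD decrease in SOS
        (1.42 for major osteoporotic fracture, 1.80 for hip fracture).
    """
    if not 0.0 < frax_probability < 1.0:
        raise ValueError("frax_probability must be strictly between 0 and 1")
    odds = frax_probability / (1.0 - frax_probability)
    # Higher gSOS lowers the odds; lower gSOS raises them.
    adjusted_odds = odds / (odds_ratio_per_sd ** gsos_z)
    return adjusted_odds / (1.0 + adjusted_odds)
```

For example, a FRAX-CRF probability of 20% becomes approximately 15.0% for an individual with a standardized gSOS of 1 and approximately 26.2% for an individual with a standardized gSOS of −1, while a standardized gSOS of 0 leaves the probability unchanged.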
Due to the missingness of several FRAX clinical risk factors in the CKB cohort (Table 1), we did not generate FRAX scores for CKB participants. Analyses were thus restricted to testing the association (described below) between gSOS and incident major osteoporotic fracture or hip fracture risk adjusted for age and sex in the CKB cohort. We then compared the effect size of gSOS with those obtained in European populations based on the other cohorts.
Statistical analysis in European ancestry-based cohorts
We first standardized gSOS scores for each cohort separately to have a mean of zero and a standard deviation of one, so that effects could be compared across cohorts on a common scale. Since risk predictors are more useful if they are independent of known risk factors, we assessed linear correlations between gSOS and clinically relevant risk factors of fracture using the Pearson correlation coefficient. We next quantified incident major osteoporotic fractures and binomial proportion CIs among individuals grouped with respect to different quantile ranges of gSOS: ≤ 1%, 1–5%, 5–20%, 20–40%, 40–60%, 60–80%, 80–95%, 95–99%, and > 99%.
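The within-cohort standardization and quantile grouping can be sketched as below. This is an illustration under stated assumptions: the sample standard deviation is used, percentile ranks are computed as rank divided by cohort size with ties broken by input order, and the upper boundary of each range is treated as inclusive. The function names and bin labels are hypothetical.

```python
from bisect import bisect_left
from statistics import mean, stdev

# Upper boundaries (in percent) of all but the last quantile range.
BIN_EDGES = [1, 5, 20, 40, 60, 80, 95, 99]
BIN_LABELS = ["<=1%", "1-5%", "5-20%", "20-40%", "40-60%",
              "60-80%", "80-95%", "95-99%", ">99%"]


def standardize(values):
    """Rescale one cohort's scores to mean 0 and (sample) standard deviation 1."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


def percentile_ranks(values):
    """Percentile rank of each value within its cohort (rank / n * 100)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = 100.0 * position / len(values)
    return ranks


def quantile_bin(percentile):
    """Map a percentile rank to one of the nine gSOS quantile ranges."""
    return BIN_LABELS[bisect_left(BIN_EDGES, percentile)]
```

Standardization would be applied to each cohort separately before any pooling, and the fracture proportion in each range can then be obtained by counting events among the individuals assigned to that range.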
We tested the association between the risk of incident major osteoporotic fracture and gSOS as well as other clinically relevant risk factors using logistic regression adjusted for age and sex. We also tested the association between the risk of incident major osteoporotic fracture and each of the three FRAX scores using logistic regression without adjusting for age and sex, since they are included in the FRAX algorithm. Model comparison was performed by the likelihood ratio test. We assessed the predictive performance of each of the three FRAX scores for major osteoporotic fractures using the AUROC and the area under the precision-recall curve (AUPRC). AUROC and CIs were computed using the R package “pROC” version 1.15.3. DeLong’s test was performed to compare AUROC. AUPRC and bootstrapped CIs were computed using the R package “PRROC” version 1.3.1.
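For readers less familiar with the AUROC, the quantity can be illustrated as the probability that a randomly chosen individual with an incident fracture has a higher score than a randomly chosen individual without one, with ties counted as one half. The sketch below implements this pairwise definition directly; it is not the “pROC” implementation used in the analyses, and the function name is hypothetical.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    scores: predicted risk (for example a FRAX probability) per individual.
    outcomes: 1 for an incident fracture, 0 otherwise.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("both fracture and non-fracture individuals are required")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))
```

The pairwise loop scales with the product of the numbers of cases and controls, so it is suitable only for small illustrative datasets.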
Further, we evaluated the cumulative incidence of major osteoporotic fracture in the MrOS US and SOF cohorts separately by Kaplan-Meier estimates, censored at 90 years. These studies were selected because of their longer follow-up. We also assessed the cumulative risk of incident major osteoporotic fracture associated with gSOS, FRAX clinical risk factors, and FRAX-based scores by Cox proportional hazards regression using age at the time of fracture as the time scale. Timing of fracture was not available for other cohorts. The MrOS US and SOF cohorts were combined in the Cox models, which were then sex-stratified. We assessed the predictive performance of each model using the C-index (a generalization of the AUROC considering censored data in Cox models). Survival analyses were performed using the R package “rms” version 5.1-3.1 [41, 42]. C-indices and 95% CIs were computed using the R package “Hmisc” version 4.2-0 [41, 43]. The above analyses were repeated for incident hip fracture.
Lastly, we examined whether a FRAX-gSOS score can improve clinical screening using the net reclassification index (NRI) and the integrated discrimination index (IDI). The NRI measures how well a new predictive model correctly categorizes individuals into their corresponding groups, while the IDI quantifies changes in the average discriminative power. We compared the FRAX-gSOS score to the FRAX-CRF score. The clinical cutoffs were set at 20% for predicted 10-year major osteoporotic fracture risk and 3% for predicted 10-year hip fracture risk, above which pharmacological treatment is recommended by the National Osteoporosis Foundation. NRIs and IDIs were computed using the R package “PredictABEL” version 1.2-2.
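A two-category NRI at a single clinical cutoff can be sketched as follows. The sketch uses the standard definition (net proportion of individuals with events moved above the cutoff, plus net proportion of individuals without events moved below it) and assumes that risks are expressed as fractions and that a risk equal to the cutoff counts as high risk. It is not the “PredictABEL” implementation, and the function name is hypothetical.

```python
def net_reclassification_index(old_risk, new_risk, events, cutoff=0.20):
    """Categorical NRI for a new risk score versus an old one at one cutoff.

    old_risk, new_risk: predicted probabilities as fractions (assumed scale).
    events: 1 if the individual had an incident fracture, 0 otherwise.
    cutoff: treatment threshold (0.20 for major osteoporotic fracture, 0.03 for hip).
    """
    up_events = down_events = n_events = 0
    up_nonevents = down_nonevents = n_nonevents = 0
    for old, new, event in zip(old_risk, new_risk, events):
        moved_up = old < cutoff <= new      # low risk -> high risk
        moved_down = new < cutoff <= old    # high risk -> low risk
        if event == 1:
            n_events += 1
            up_events += moved_up
            down_events += moved_down
        else:
            n_nonevents += 1
            up_nonevents += moved_up
            down_nonevents += moved_down
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("both events and non-events are required")
    nri_events = (up_events - down_events) / n_events
    nri_nonevents = (down_nonevents - up_nonevents) / n_nonevents
    return nri_events + nri_nonevents
```

Here `old_risk` would correspond to FRAX-CRF and `new_risk` to FRAX-gSOS; a positive value indicates that the new score moves individuals across the treatment threshold in the correct direction on balance.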