Introduction

Osteoporosis is a major public health problem affecting millions of people, predominantly postmenopausal women [1,2,3,4]. Osteoporotic fractures are a major cause of hospitalizations, disability, long-term medical care, and mortality in urban societies with relatively old populations [5, 6]. Worldwide, the burden of osteoporosis is escalating as the frequency of osteoporotic fractures continues to rise due, in part, to an aging population [1, 3, 4, 7].

While efforts have been made to increase inclusion of a diverse population in clinical trials, it is often impractical to conduct randomized trials with sufficient statistical power to evaluate clinical endpoints in all possible subgroups [8, 9]. Clinical trials involving therapies for chronic conditions are of shorter duration and typically enroll lower-risk groups than patient populations treated in real-world clinical practice [10, 11]. In addition, the impact that factors such as adherence can have on treatment outcomes differs considerably in clinical trials compared with real-world settings [2, 12]. Real-world evidence may provide complementary knowledge of therapeutic effectiveness in a heterogeneous mix of patients reflecting intrinsic (e.g., age, sex, ethnicity, clinical characteristics, and treatment preferences) and extrinsic (e.g., environment, socioeconomic status, access to healthcare, healthcare policy, and regional regulations) factors occurring in practice [8, 11, 13].

To measure the effectiveness of denosumab for fracture prevention in real-world postmenopausal women with osteoporosis, the current study compared fracture risk between two real-world cohorts: a “treatment” cohort (patients initiating denosumab and continuing therapy) versus an “off-treatment” cohort (patients initiating denosumab and discontinuing therapy after a single dose). The latter cohort was selected to limit bias related to the initial treatment decision and to leverage the fact that a single dose of denosumab has no known sustained clinical benefit [14], thereby approximating the placebo arm of a clinical trial.

Materials and methods

Study design

We approached the objective of this retrospective cohort study within a framework to address potential issues of data quality, confounding biases, and reproducibility that threaten the validity of findings [15]. Hence, the methods include two sub-studies to validate the primary endpoint and to quantify the significance of unmeasured variables; a series of sensitivity analyses to assess confounding; quantitative bias analysis that estimates the direction and magnitude associated with systematic errors influencing measures of associations; and use of a common protocol applied by two different investigators in two different data systems to assess reproducibility and replicability.

Data sources

Data sources for this study were the medical care administrative databases available through the National Health Insurance Database in Taiwan and the Clinical Data Analysis and Reporting System in Hong Kong. The Taiwan Administration of National Health Insurance manages the single-payer health insurance system that provides medical and dental care for 99% of the 23 million Taiwanese enrolled in the system. The Hong Kong Hospital Authority serves a population of 7 million through 41 hospitals and more than 100 outpatient clinics. Approximately 80% of all hospital admissions in Hong Kong occur in public hospitals. Both data sources have been used extensively for research, and all individual data have been deidentified [16, 17].

Study protocol was approved by the National Cheng Kung University Institutional Review Board (HREC#107–008) in Taiwan and the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 19–154) in Hong Kong. For the Taiwan data, the study period was inclusive of ≥ 1 year prior to regulatory approval of denosumab in 2011 through December 2016, and for the Hong Kong data was inclusive of ≥ 1 year prior to regulatory approval of denosumab in 2012 through August 2018.

Study population

Study population included all patients who were: 1) new users of denosumab 60 mg (defined as receipt of ≥ 1 dose with no history of prior use); 2) postmenopausal women (defined as ≥ 55 years of age at the time of initial dose); and 3) free of any history of malignancy or Paget’s disease (see Supplementary Figures S1, A and B for the inclusion flows of the Taiwan and Hong Kong study populations). Of note, denosumab reimbursement criteria used in Taiwan do not require prior osteoporosis treatment and include patients with 1) T score ≤ -2.5 SD and a vertebral or hip fracture, or 2) T score between -1.0 SD and -2.5 SD plus two vertebral or hip fractures.

Treatment cohorts

As denosumab is administered subcutaneously every 6 months, two patient cohorts were created based on receipt of a second dose (Fig. 1). The treatment cohort included all patients who received a second dose by the expected administration date (i.e., 180 days after the first dose plus a 45-day grace period for administrative challenges such as scheduling appointments). The cohort entry date (i.e., index date) was 225 days after the initial dose, and follow-up continued through the earliest of denosumab discontinuation (i.e., 225 days since last administration), a fracture endpoint, death, or the end date of the data source. The off-treatment cohort included all patients who did not receive a second dose by the expected administration date. The index date for this cohort was also 225 days after the initial dose, and follow-up continued through the earliest of starting or re-starting any osteoporosis therapy, a fracture endpoint, death, or the end date of the data source.
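The cohort rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_cohort`, the input format (a list of dose dates per patient), and the choice to treat day 225 itself as inside the window are hypothetical and are not taken from the study protocol.

```python
from datetime import date, timedelta

DOSING_INTERVAL_DAYS = 180   # denosumab is given every 6 months
GRACE_PERIOD_DAYS = 45       # allowance for scheduling delays
WINDOW_DAYS = DOSING_INTERVAL_DAYS + GRACE_PERIOD_DAYS  # 225 days


def assign_cohort(dose_dates):
    """Classify one patient from her denosumab dose dates.

    Returns (cohort, index_date). The index date is 225 days after the
    first dose for both cohorts. Assumption: a second dose on day 225
    exactly still counts as on time.
    """
    doses = sorted(dose_dates)
    first_dose = doses[0]
    index_date = first_dose + timedelta(days=WINDOW_DAYS)
    has_second_dose = any(first_dose < d <= index_date for d in doses[1:])
    cohort = "treatment" if has_second_dose else "off-treatment"
    return cohort, index_date


# Hypothetical patients (dates are made up for illustration).
on_time = assign_cohort([date(2014, 1, 1), date(2014, 7, 5)])
single_dose = assign_cohort([date(2014, 1, 1)])
late_dose = assign_cohort([date(2014, 1, 1), date(2014, 10, 28)])
```

Because both cohorts share the same index date relative to the first dose, follow-up starts at a comparable point in each patient's treatment history.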

Fig. 1 Study design schema

Endpoints

Primary endpoint was hip fracture. Secondary endpoints included clinical vertebral fracture and nonvertebral fracture (hip, humerus, wrist, and distal forearm). To increase the likelihood of high specificity, albeit with decreased sensitivity, all fractures were identified from inpatient claims. In Taiwan, diagnosis codes were based on the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) through December 2015 and on the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) thereafter through the end of the study period. In Hong Kong, diagnosis codes were based on the ICD-9-CM for the entire study period. To increase the likelihood that fractures were due to osteoporosis, fractures concurrent with a motor vehicle accident on the same date were not included as endpoints (see Supplementary Table S1, A–E for codes and definitions of fracture endpoints).

Validity of the primary endpoint was assessed by comparing the operational definition of hip fracture versus medical chart review. In Taiwan, medical charts and radiographs of 300 randomly selected patients having either an ICD-9-CM or ICD-10-CM code for hip fracture at a large academic institution were reviewed by two physicians independently. Of these 300 patients, 298 were confirmed to have a hip fracture by the physicians (i.e., a positive predictive value = 99.3%). The validation report for hip fractures in Taiwan is included in Online Resource 1. In Hong Kong, the validity of ICD-9-CM code for hip fracture endpoint was previously reported by a similar process for 104 patients with hip fracture, of which all 104 were confirmed hip fractures (positive predictive value = 100%) [17].

Covariates

Covariates were used to describe the characteristics of the study population and to adjust for prognostic differences in fracture risk between cohorts. The assessment period for covariates was the 1 year before the cohort entry date, and covariates were identified through drug administration, drug dispensing, and diagnosis codes. The 59 covariates, all known to confer fracture risk, covered demographic characteristics and histories of comorbidities, medication use, and health-seeking behavior (see Supplementary Table S2 for the covariate list).

Bone mineral density (BMD) and body mass index (BMI) are two strong risk factors for osteoporotic fractures, and their values were not directly available in the data sources. Hence for the Taiwan population, an additional assessment (IRB #:201801493B0) was undertaken to inspect the balance in BMD and BMI between cohorts for a study subset having the information available in the electronic medical records of the Chang Gung Research Database (CGRD) [18]. CGRD contained 6.7% (n = 2,605 patients; mean age 76.5) of the study population in Taiwan (n = 38,906 patients; mean age 77.4). The assessment report for the population subset with BMD and BMI is included in Online Resource 2.

Smoking and alcohol use are two additional known risk factors for osteoporotic fracture, and their values were also not directly available in the data sources. However, the prevalence of alcohol and tobacco use among the elderly in Taiwan is low (< 5%); hence, these risk factors are unlikely to be common in the study population [19].

Statistical analysis

Propensity-score (PS) matching was used to control for confounding arising from measured variables. As a first step, the PS was calculated for each patient in the study population using multivariable logistic regression, conditional on all 59 baseline covariates. Next, the distribution of the PS in the treatment and off-treatment cohorts was described. The standardized mean difference (SMD) was used to assess differences in baseline covariates between cohorts, with an SMD > 0.1 (10%) taken to represent a meaningful imbalance. Using the greedy 8 → 1 digit algorithm, the PS was used to match treatment and off-treatment cohorts in a 1:1 ratio [20]. As the final step, the Kaplan–Meier method was used to plot survival curves, and the relative risk reduction was estimated using the Cox proportional hazards model adjusting for baseline covariates.
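To make the balance check and matching step concrete, the following is a minimal sketch of an SMD calculation and a greedy 8 → 1 digit match. It is not the study's code: the function names, the dictionary inputs, the use of rounding (rather than truncation) to define a match on a given number of digits, and the random tie-breaking are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute SMD for one covariate, using the average of the two sample variances."""
    diff = abs(mean(treated) - mean(control))
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / pooled_sd


def greedy_digit_match(ps_treated, ps_control, max_digits=8, seed=0):
    """1:1 greedy matching on the propensity score, 8 digits down to 1.

    ps_treated / ps_control map patient id -> propensity score.
    Pairs are formed first among patients whose scores agree to 8 decimal
    places, then 7, and so on; matched patients are removed at each pass.
    Assumption: agreement is defined by rounding to the given precision.
    """
    rng = random.Random(seed)
    free_treated = dict(ps_treated)
    free_control = dict(ps_control)
    pairs = []
    for digits in range(max_digits, 0, -1):
        # Group still-unmatched controls by their rounded score.
        buckets = defaultdict(list)
        for cid, ps in free_control.items():
            buckets[round(ps, digits)].append(cid)
        for ids in buckets.values():
            rng.shuffle(ids)  # random choice among equally close controls
        for tid in list(free_treated):
            candidates = buckets.get(round(free_treated[tid], digits))
            if candidates:
                cid = candidates.pop()
                pairs.append((tid, cid))
                del free_treated[tid]
                del free_control[cid]
    return pairs


# Hypothetical scores, chosen only to exercise the 8-digit and 1-digit passes.
pairs = greedy_digit_match(
    {"t1": 0.41234567, "t2": 0.80, "t3": 0.43},
    {"c1": 0.41234567, "c2": 0.44, "c3": 0.10},
)
imbalance = standardized_mean_difference([0, 0, 1, 1], [0, 0, 0, 1])
```

In this toy input, `t1` pairs with `c1` on the 8-digit pass, `t3` pairs with `c2` only on the 1-digit pass, and `t2` stays unmatched, which mirrors how greedy matching trades sample size for closeness of match.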

Sensitivity and subgroup analysis

Sensitivity and subgroup analyses evaluated the robustness of results. As an alternative to PS matching, inverse probability of treatment weights (IPTWs) were used to estimate the adjusted hazard ratio in the weighted cohort using the Cox proportional hazards model.
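As an illustration of how IPTWs are typically formed from the propensity score, the sketch below computes one weight per patient. The text does not state whether stabilized weights were used, so the `stabilized` option, the function name, and the input format are assumptions; the weights would then be passed to a weighted Cox model, which is not shown here.

```python
def iptw_weights(treated_flags, propensity_scores, stabilized=False):
    """Inverse probability of treatment weights.

    treated_flags: 1 for the treatment cohort, 0 for the off-treatment cohort.
    propensity_scores: estimated probability of being in the treatment cohort.
    Unstabilized weights are 1/PS (treated) and 1/(1 - PS) (untreated);
    stabilized weights multiply these by the marginal probability of the
    cohort the patient actually belongs to.
    """
    p_treated = sum(treated_flags) / len(treated_flags)
    weights = []
    for z, ps in zip(treated_flags, propensity_scores):
        if not 0.0 < ps < 1.0:
            raise ValueError("propensity score must lie strictly between 0 and 1")
        weight = 1.0 / ps if z else 1.0 / (1.0 - ps)
        if stabilized:
            weight *= p_treated if z else (1.0 - p_treated)
        weights.append(weight)
    return weights


# Hypothetical two-patient example.
plain = iptw_weights([1, 0], [0.5, 0.5])
stable = iptw_weights([1, 0], [0.5, 0.5], stabilized=True)
```

Unlike matching, weighting retains every eligible patient, which is why the IPTW analysis can use the full study population rather than the matched subset.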

Subgroup analyses repeated the primary analysis by age group, bisphosphonate use in the year prior to study entry, and fracture history in the year prior to study entry.

Assessment of residual confounding

The analytical strategy included three additional analyses to evaluate the potential for unmeasured or residual confounding to affect the interpretation of study results. First, a high-dimensional propensity score (HdPS) was used to examine the robustness of confounding control. The methodology and HdPS algorithm have been published elsewhere [21]. In brief, HdPS is based upon the selection of empirically derived covariates from among all the information in administrative claims data. The selection of variables to include within the HdPS model is based upon their confounding potential; that is, if a covariate is strongly related to both the outcome and the treatment and is prevalent, then it is deemed to be a potentially important confounder and is ranked higher for inclusion in the model. As these empirically derived covariates may collectively be proxies for unmeasured variables, the use of the HdPS can mitigate the likelihood of residual confounding. Second, the extent of residual confounding that would be required to refute an observed difference in fracture incidence between cohorts (i.e., rule-out method) was assessed. This quantitative bias analysis has been described previously [20], and we applied the algorithms that are publicly available [22, 23]. Third, as described and reported in the covariate section, we adopted an approach described previously [24] and inspected the balance of BMD and BMI between cohorts within the subset having available information to quantify their potential as a source of residual confounding.
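The logic behind this kind of quantitative bias analysis can be illustrated with the standard external-adjustment expression for a single unmeasured binary confounder. The sketch below is not the published algorithm cited above, only a simplified relative of it: the function names are hypothetical, the hazard ratio is treated as a relative risk, and the confounder prevalences in the example are invented for illustration.

```python
def confounding_bias_factor(p_c_treated, p_c_untreated, rr_cd):
    """Multiplicative bias from an unmeasured binary confounder.

    p_c_treated / p_c_untreated: confounder prevalence in each cohort.
    rr_cd: relative risk of the outcome associated with the confounder.
    """
    return (p_c_treated * (rr_cd - 1) + 1) / (p_c_untreated * (rr_cd - 1) + 1)


def externally_adjusted_rr(apparent_rr, p_c_treated, p_c_untreated, rr_cd):
    """Apparent relative risk divided by the bias factor."""
    return apparent_rr / confounding_bias_factor(p_c_treated, p_c_untreated, rr_cd)


def rr_cd_needed_to_null(apparent_rr, p_c_treated, p_c_untreated):
    """Confounder-outcome relative risk that would move the estimate to 1.0.

    Returns None when no positive relative risk can explain the result.
    """
    denominator = p_c_treated - apparent_rr * p_c_untreated
    if denominator == 0:
        return None
    rr_cd = 1 + (apparent_rr - 1) / denominator
    return rr_cd if rr_cd > 0 else None


# Hypothetical extreme case: a confounder present in 10% of the
# off-treatment cohort and absent from the treatment cohort.
needed = rr_cd_needed_to_null(0.62, p_c_treated=0.0, p_c_untreated=0.10)
```

Under these invented prevalences, the confounder would need to raise fracture risk roughly sevenfold to fully account for an apparent estimate of 0.62, which conveys why low-prevalence unmeasured factors are hard-pressed to explain a sizeable association.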

Replicability of findings

The International Society for Pharmacoepidemiology-International Society for Pharmacoeconomics and Outcomes Research (ISPE-ISPOR) Joint Task Force has highlighted that replicable findings can be tested by conducting multiple studies that evaluate the same research question but use different data, methodology, or operational decisions [25, 26]. To assess the reproducibility of these conclusions, the Hong Kong study principal investigator applied the same common protocol (i.e., same general methodology) to the Hong Kong dataset, while operational decisions were unique to each study site.

Results

Study population in Taiwan

Eligible study population included 38,906 female patients who initiated denosumab therapy. Mean age was 77 years and in the year prior to initiating denosumab, 17% had a hip fracture and 35% had received bisphosphonate therapy. Treatment cohort included 25,059 women (64% of study population) who received between 2 and 10 doses of denosumab (median, 4 doses) during the study period. Follow-up for fracture endpoints in this cohort while receiving therapy was a mean ± standard deviation (SD) of 16 ± 11 months for a total of 34,013 person-years of observation. The off-treatment cohort included 13,847 women (36% of study population) who received 1 dose of denosumab and then discontinued therapy. Follow-up for fracture endpoints in this cohort while receiving no therapy was a mean ± SD of 19 ± 12 months for a total of 22,153 person-years of observation.

Within the eligible study population, most (46 of 59) of the risk factors for fracture (i.e., covariates) were balanced between study cohorts (Fig. 2). The variables that were not balanced suggested that greater age, comorbidities, and use of other medications were associated with discontinuing therapy after 1 dose of denosumab. After PS matching, the two cohorts of 13,419 patients each were balanced on all covariates (Supplementary Table S3).

Fig. 2 Balance of covariates (N = 59) before and after propensity score matching in Taiwan. CCB = calcium channel blocker; ER = emergency room; SMD = standardized mean difference. SMD < 0.1 indicates a negligible difference between treatment and off-treatment cohorts. The covariates labeled in the figure are those with SMD > 0.1

Effectiveness of denosumab in Taiwan

Primary analysis in the PS-matched population included 554 hip fractures occurring in 183 patients in the treatment cohort and 371 patients in the off-treatment cohort. Crude incidence rate was 0.9 per 100 person-years in the treatment cohort and 1.7 per 100 person-years in the off-treatment cohort. After adjusting for any prognostic differences between the two cohorts, denosumab reduced the risk of hip fractures (hazard ratio [HR], 0.62; 95% CI, 0.52 to 0.75) (Table 1). Cumulative incidence of hip fracture is shown in Fig. 3. Similar risk reductions for hip fracture were observed across age groups, prior use of bisphosphonates, and fracture history (Fig. 4 and Supplementary Table S4). Similar risk reductions were observed for the secondary endpoints of clinical vertebral fractures and nonvertebral fractures (Table 1).

Table 1 Fracture risk in treatment and off-treatment cohorts (Taiwan population)

Fig. 3 Kaplan–Meier plot of cumulative incidence of hip fracture in patients in the treatment and off-treatment cohorts (Taiwan population)

Fig. 4 Results of subgroup analyses and sensitivity analysis for the primary endpoint of hip fracture (Taiwan population). BPs = bisphosphonates; HdPS = high-dimensional propensity score; HR = hazard ratio; IPTW = inverse probability of treatment weight

Sensitivity analysis and assessment of residual confounding in Taiwan

As an alternative analytical approach, IPTW was used to include the entire eligible study population of 38,906 patients that initiated denosumab therapy (see Supplementary Table S5 for baseline characteristics of the weighted population). To address the potential of residual confounding, HdPS included a broader set of covariates than in the primary analysis (see Supplementary Table S6 for baseline characteristics of the matched population). Both the IPTW and HdPS approaches identified a risk reduction for hip fracture in the treatment cohort similar to the primary analysis (Fig. 4).

As one approach to assess the potential extent of residual confounding from unmeasured risk factors, both BMD and BMI were available for 2,605 patients (6.7% of the eligible study population). In these patients with the additional available data, the BMD and BMI measures were well balanced between the two study cohorts at baseline (Online Resource 2). Specifically, at the femoral neck, mean ± SD BMD was -3.4 ± 1.0 and -3.5 ± 1.1 in the treatment and the off-treatment cohorts, respectively, and mean ± SD BMI was 24.3 ± 3.9 and 24.3 ± 4.0. The balanced distribution of BMD and BMI in this subset of the study population at baseline suggested that they were unlikely to be important confounders.

Quantitative bias assessment (i.e., rule-out method) was another approach to assess the extent of residual confounding that would be required to refute the observed difference in fracture incidence between cohorts (Supplementary Figure S2). Results of this analysis suggest that unmeasured variables with a prevalence < 10% in this population, such as alcohol and tobacco use [19], could not have a meaningful influence on results.

Analysis of Hong Kong population

Eligible study population included 2,835 patients who initiated denosumab therapy. Mean age was 78 years, and in the three years prior to initiating denosumab, 14% had a hip fracture and 37% had received bisphosphonate therapy. Treatment cohort included 2,379 women (84% of study population) who received between 2 and 13 doses of denosumab (median, 4 doses) during the study period, and the off-treatment cohort included 456 women (16% of study population). Baseline characteristics of each cohort, summarized in Supplementary Table S7, suggested that greater age, comorbidities, and use of other medications were more prevalent in the off-treatment cohort. After PS matching, the following variables remained unbalanced: number of emergency room visits, number of hospitalizations, diabetes history, fracture history, and diuretic medication use. These variables were therefore additionally included in the outcome models for adjustment.

After PS matching, there were a total of 33 hip fractures (treatment cohort, 26; off-treatment cohort, 7). The observed incidence rates of hip fracture were 0.98 and 1.71 cases per 100 person-years in the treatment and off-treatment cohorts, respectively. Risk reductions for fracture endpoints were in the same direction as the Taiwan analysis, though with much wider 95% CI for the HR that included 1.0 (Supplementary Table S8).

Discussion

Results from this large population-based cohort study showed clinically meaningful risk reduction for hip fracture, clinical vertebral fracture, and nonvertebral fracture with denosumab treatment. Within the study population in Taiwan, patients in the cohort persistent to denosumab experienced relative risk reductions of 38% for hip fracture, 37% for clinical vertebral fracture, and 38% for nonvertebral fractures, with a mean (± SD) on-treatment follow-up time of 16 ± 11 months. Although not directly comparable due to differences in study settings and population, these real-world study results were largely consistent with the results of the placebo-controlled, randomized trial, the Fracture REduction Evaluation of Denosumab in Osteoporosis every 6 Months (FREEDOM) study, which found relative risk reductions through month 36 of 40% (HR 0.60 [95% CI, 0.37 to 0.97]; p = 0.04) for hip fractures, 69% (HR 0.31 [95% CI, 0.20 to 0.47]; p < 0.001) for clinical vertebral fractures and 20% (HR 0.80 [95% CI, 0.67 to 0.95]; p = 0.01) for nonvertebral fractures [14]. Results of the second analysis in the smaller Hong Kong population were consistent with results in Taiwan and FREEDOM.
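The relative risk reductions quoted above follow directly from the hazard ratios, since the relative risk reduction is (1 − HR) × 100%. A small sketch makes the arithmetic explicit; the function name is hypothetical, and the hazard ratios are the ones reported in the text for hip fracture in Taiwan and for the three FREEDOM endpoints.

```python
def relative_risk_reduction(hazard_ratio):
    """Relative risk reduction in percent, computed as (1 - HR) * 100."""
    return (1.0 - hazard_ratio) * 100.0


# Hazard ratios as reported in the text.
taiwan_hip = relative_risk_reduction(0.62)
freedom = {
    "hip": relative_risk_reduction(0.60),
    "clinical vertebral": relative_risk_reduction(0.31),
    "nonvertebral": relative_risk_reduction(0.80),
}
```

The same conversion applied to the confidence limits (0.52 to 0.75 for hip fracture in Taiwan) gives the corresponding range for the relative risk reduction, with the upper HR limit yielding the lower bound of the reduction.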

Analysis of the Kaplan–Meier curves suggests that the benefits of fracture risk reduction with denosumab were pronounced in patients who remained on treatment. This is consistent with multiple studies showing adherence to osteoporosis therapy is associated with better outcomes, including a reduction in risk of hip fracture [2, 27,28,29]. In this analysis, a separation in the cumulative incidence of hip fractures between the treatment and off-treatment cohorts is first noticeable after approximately 12–15 months of treatment. In the FREEDOM trial, this effect is apparent 9–12 months into treatment [14].

The current study has several strengths. To our knowledge, it is the first large-scale study investigating the effectiveness of denosumab in postmenopausal women under real-world clinical settings with long-term follow-up. Clinical trials typically enroll a lower-risk group of patients [10, 11], may benefit from better adherence [2, 13], and may detect effectiveness within a shorter period of time [30] when compared to real-world studies where factors such as comorbidities, concomitant treatments, access, cost, and follow-up care influence both prescriber and patient behavior. As a result, while randomized trials are the gold standard in establishing the efficacy and safety of new therapies, real-world evidence is necessary to fully characterize treatment outcomes. The data sources in Taiwan and Hong Kong are highly representative of their respective populations, and the hip fracture record is highly valid. The study design helped ensure clinical homogeneity of patients included in the analysis. Health insurance utilization management criteria in Taiwan meant the entire study population was at high risk for fracture, which mitigates the likelihood of differences between study arms in both measured and unmeasured variables related to severity of osteoporosis. Because both cohorts initiated denosumab therapy, the potential for residual confounding related to the clinician’s decision to treat is limited [31]. Overall, this study design resulted in the majority of measured confounders being balanced between study arms before any statistical adjustment.

As is inherent to observational studies, we cannot fully rule out the potential impact of unmeasured confounders. Several analytical methods were used to both minimize and evaluate the significance of unmeasured variables in the primary analysis of the effectiveness of denosumab for reduction of hip fractures among postmenopausal women in Taiwan. The results of a series of pre-specified sensitivity analyses, as well as an HdPS algorithm, were consistent with the primary analysis, suggesting that meaningful residual confounding due to unmeasured confounders is unlikely. Quantitative bias analysis indicated it is highly unlikely that an unmeasured variable with a prevalence < 10% in the study population, such as smoking and alcohol use, could explain the observed results. An additional analysis undertaken to assess the potential for residual confounding associated with BMD and BMI found that both variables were balanced between the two study arms at baseline, therefore mitigating the concern of residual confounding due to the inability to include these parameters in the PS matching in the primary analysis. Another potential confounder could be an increased fracture risk in the off-treatment cohort due to rebound bone resorption after denosumab discontinuation [32]. However, this is unlikely, as it has been shown that rebound of bone turnover is absent in subjects receiving a single injection of denosumab [33, 34]. In addition, the apparent similarity in the cumulative risk of fracture in the Kaplan–Meier curves between the initial 6 months of discontinuation in the off-treatment cohort and the corresponding months in the treatment cohort suggests no detectable increased risk of fracture after discontinuation following 1 dose of denosumab. Finally, we did not know the reasons patients discontinued denosumab. However, our finding that 64% of patients received their second dose is within the range of published data, albeit at the lower end, likely due to the stricter grace period (i.e., 45 days) used in our study [35]. We also included several known covariates that are associated with adherence in the outcome models (e.g., age, health resource utilization, and socioeconomic level). Managing non-persistence is especially important in patients initially treated with reversible osteoporosis medications such as denosumab. Timely re-initiation of denosumab or transition to a different class of antiresorptives such as bisphosphonates may attenuate the potential adverse effects of discontinuing denosumab [36].

The increasing pace of population aging is a worldwide phenomenon that has fueled recognition of osteoporosis as an important public health issue, particularly in large aging populations such as in Asia [7, 37]. Evidence-based interventions aimed at minimizing the burden associated with osteoporosis and improving treatment adherence are needed to help maintain quality of life in affected patients. We found a clinically meaningful risk reduction for hip fracture among postmenopausal women who remained on denosumab therapy in a real-world setting. The risk reduction was consistent across subjects with a wide range of baseline characteristics and fracture risk categories and was similar to what was observed in the randomized controlled FREEDOM study.