Introduction

Osteoporosis therapeutic options have expanded greatly since the introduction of daily oral bisphosphonates in the mid-1990s [1,2,3,4]. Current options now include weekly, monthly, and delayed-release formulations of oral bisphosphonates, as well as zoledronic acid, denosumab, raloxifene, teriparatide, strontium, and romosozumab [1,2,3, 5, 6]. Randomized controlled trials (RCTs) provide the efficacy evidence required for drug approval and market entry [7, 8]. However, observational methods are critical to provide evidence of drug safety and effectiveness in real-world settings [9,10,11].

Healthcare administrative claims (hereafter “claims”) data are commonly utilized to identify outcomes in real-world settings. Claims data are produced when healthcare providers and organizations receive reimbursement for goods and services, and are often repurposed to estimate drug effects [9, 12]. Although claims are collated for billing rather than for estimating drug safety or effectiveness, investigators frequently access these data for research purposes; however, there are no standardized methods to define outcomes in observational research studies. In contrast, RCTs are carefully designed to estimate drug efficacy and safety: primary outcomes are adjudicated by a panel of experts, and RCT evidence therefore serves as the basis for drug approval. Still, once drugs are available on the market, observational studies that use claims data are essential to estimate drug effectiveness and safety in the real world. In particular, RCTs often restrict enrolment to relatively healthy patients, whereas many patients treated after market entry have comorbidities that may modify drug effects. In addition, RCT sample sizes are targeted at efficacy endpoints, making it difficult to identify rare safety concerns or long-term drug effects.

Claims data generated from clinical encounters among patients with osteoporosis often provide detail on the location of the fracture, concomitant diagnoses, and treatments provided. With the exception of vertebral fractures, which often go undiagnosed [13], most fractures tend to be identified in claims because they require prompt medical attention and treatment. Claims data are thus well suited to identifying fractures as an outcome in real-world settings [14]. Fractures can be identified from diagnostic and procedural codes recorded in emergency department, inpatient (hospitalization), and outpatient settings. Although fracture validation studies exist [15,16,17,18], we recently noted inconsistency in the use of fracture outcome definitions in real-world fracture outcome studies [19]. Indeed, we have modified our own definitions over time based on clinical expertise. For example, we initially followed a validation study and required both hip fracture diagnostic and procedural codes [15, 20]. Yet, upon discussion with orthopedic surgeons, we came to appreciate that requiring a procedural code would miss inoperable hip fractures (e.g., if a patient is too frail or dies before surgery), and we therefore stopped requiring an inpatient procedural code [21, 22].
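As a concrete illustration of how such codes might be queried, the minimal sketch below assumes a hypothetical long-format claims table; the column names and the ICD-10-CM diagnosis prefixes are illustrative assumptions, not the code lists of any reviewed study.

```python
# Minimal sketch: flag fracture diagnosis claims by care setting.
# The table layout and ICD-10-CM prefixes (S72 femur, S52 forearm, S42
# shoulder/upper arm) are assumptions for illustration only.
import pandas as pd

claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "service_date": pd.to_datetime(["2018-03-01", "2018-03-02", "2019-07-15", "2020-01-10"]),
    "setting": ["inpatient", "inpatient", "outpatient", "emergency"],
    "dx_code": ["S72.001A", "Z96.641", "S52.501A", "S42.201A"],
})

FRACTURE_DX_PREFIXES = ("S72", "S52", "S42")  # assumed illustrative prefixes

fracture_claims = claims[
    claims["dx_code"].map(lambda c: c.startswith(FRACTURE_DX_PREFIXES))
]
# Studies then restrict to specific settings, e.g., inpatient and emergency only
print(fracture_claims[fracture_claims["setting"].isin(["inpatient", "emergency"])])
```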

Differences in outcome definitions have led to calls for transparency and harmonization in studies that use claims data to increase the rigor of real-world evidence [23]. Consistent and accurate definitions are critical to minimizing outcome misclassification and, in turn, biased estimates of fracture risk [18]. We conducted this scoping review to better understand how fracture outcomes are defined in osteoporosis drug effects studies that use claims data. We aimed to answer the research question, “How do osteoporosis drug effects studies that use claims data in Canada and the United States of America (USA) define fracture outcomes?” [19]

Methods

This scoping review was conducted in accordance with JBI methodology [24], was registered on Open Science Framework [25], and followed an a priori protocol [19]. Detailed methods have been previously published [19]. In brief, we considered observational studies (e.g., cohort, case–control) that estimated the effects of osteoporosis drugs on fracture risk (Table 1). We targeted studies that utilized claims data from Canada or the USA because of the similarity of the coding systems used to define fractures in the two countries [19]. Studies were excluded for the following reasons: abstract-only publication; osteoporosis medications were not the primary exposure; fracture was not a main outcome; non-eligible study design (experimental or descriptive study); claims data were not used to define fractures; or claims data from Canada or the USA were not used.

Table 1 Eligibility criteria

We searched for studies published in English, the language of the authors and the most common official language in the USA and Canada, from January 1, 2000, to December 31, 2020, in MEDLINE (Ovid), Embase (Ovid), and CINAHL (EBSCO). An initial search was conducted on June 29, 2020, and updated on June 24, 2021, to include papers published through December 31, 2020 [26]. The search strategies are provided in the published protocol [19]. We also searched gray literature for pharmacovigilance studies to inform our safety outcome (atypical fracture of the femur) in the following sources: Food and Drug Administration (FDA) Sentinel [27], Canadian Agency for Drugs and Technologies in Health (CADTH) [28], American Society for Bone and Mineral Research (ASBMR) [29], Public Health Agency of Canada (PHAC) [30], and National Osteoporosis Foundation [31] websites. The reference lists of eligible articles were also screened for additional papers.

Two authors (AMR, NK) separately completed literature searches, screened abstracts, and extracted data [19]. Since the publication of our protocol [19], we modified our data extraction tool to enable us to record trauma codes [32]. Disagreements between the reviewers with respect to screening or data extraction were resolved through discussion or, if consensus was not reached, by a third reviewer (SMC). The extracted data included publication information and details about data sources (inpatient, outpatient, emergency department), fracture sites (e.g., hip, humerus), the number and types of codes (diagnostic or procedural) used to define fractures, the use of washout windows, trauma codes, and citations for chosen fracture definitions. Study data were collected and managed using REDCap electronic data capture tools [33, 34].

Study characteristics and fracture definitions were summarized in tabular form. We stratified results by fracture site and whether studies indicated the data sources utilized for each code (e.g., diagnostic code from inpatient data versus diagnostic code from emergency department data).

Results

We screened the titles and abstracts of 9728 unique publications for relevance; 345 full-text articles were then assessed, and 57 publications were included: 54 from our initial search and 3 from screening the reference lists of articles identified in the initial search (Fig. 1). We identified 147 additional sources by searching gray literature, yet none qualified for inclusion.

Fig. 1 Study inclusion and exclusion flow

Characteristics of included studies

Most studies (91%, n = 52) examined medication effectiveness, 2 (4%) evaluated safety, and 3 (5%) evaluated both (online resource: Appendix I). The majority of studies (n = 53, 93%) examined the effect of bisphosphonates (alendronate, etidronate, ibandronate, pamidronate, risedronate, or zoledronic acid) on fracture risk, and up to 30% considered other drugs: raloxifene (28%), calcitonin (30%), teriparatide (26%), denosumab (14%), and estradiol/estrogens or hormone replacement therapy (11%). No studies examined the effects of strontium or romosozumab.

Half (n = 29, 51%) of the studies did not provide a citation for their fracture definition. Among the 28 studies that included a citation for their fracture outcome definitions, 13 cited validation articles [15, 16, 18, 35, 36], 3 cited guidance on which fracture sites are due to osteoporosis [37], and 15 cited other primary research articles. Ray et al. [15] was the most commonly cited validation article for non-vertebral fractures (n = 10, 36%), while Curtis et al. [18] was the most commonly cited article for vertebral fractures (n = 7, 25%).

Fewer than half of the studies for each fracture site excluded traumatic fractures (hip: n = 21, 40%; humerus: n = 17, 45%; radius/ulna: n = 14, 41%; vertebra: n = 14, 40%; atypical fracture of the femur: n = 2, 40%). Washout windows ranged from 30 to 180 days across fracture sites and were most common for humerus fractures (n = 12, 32%), followed by vertebra (n = 9, 26%), radius/ulna (n = 8, 24%), and hip (n = 8, 15%). A washout window of 180 days was used for atypical fracture of the femur (n = 1, 20%). The methods used to implement washout windows were not described.
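Because no study described its implementation, the sketch below shows one plausible washout rule as an assumption for illustration (a same-site fracture claim counts as a new event only if no previously counted same-site event occurred within the window); the table layout and column names are hypothetical.

```python
# A minimal sketch of one plausible washout implementation (assumed, since the
# reviewed studies did not describe theirs): a same-site claim counts as a new
# fracture only if no counted same-site event occurred within the window.
from datetime import timedelta
import pandas as pd

def incident_fractures(fx_claims: pd.DataFrame, washout_days: int = 90) -> pd.DataFrame:
    """fx_claims: one row per fracture claim with patient_id, site, service_date."""
    fx_claims = fx_claims.sort_values(["patient_id", "site", "service_date"])
    kept_rows = []
    last_event = {}  # (patient_id, site) -> date of last counted event
    for row in fx_claims.itertuples(index=False):
        key = (row.patient_id, row.site)
        prior = last_event.get(key)
        if prior is None or (row.service_date - prior) > timedelta(days=washout_days):
            kept_rows.append(row)
            last_event[key] = row.service_date
    return pd.DataFrame(kept_rows, columns=fx_claims.columns)

# Example: two humerus claims 30 days apart collapse into one incident fracture
fx = pd.DataFrame({
    "patient_id": [1, 1],
    "site": ["humerus", "humerus"],
    "service_date": pd.to_datetime(["2019-01-05", "2019-02-04"]),
})
print(incident_fractures(fx, washout_days=90))
```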

Half (n = 29) of the studies did not indicate specific data sources for the codes in at least one of their outcome definitions. In fact, 4 studies (7%) did not provide any descriptions of their fracture definitions [22, 38,39,40]. The definitions and codes for each fracture site are shown in the online resource (Appendix II, III). In the following sections, we provide descriptions of the definitions used to identify each fracture site among studies that indicated the data sources utilized for each code.

Hip fractures

Of the 53 articles that studied hip fractures, approximately half (n = 29, 55%) indicated their data sources, and 76% (n = 22) of these used only inpatient data (Table 2). A total of 12 definitions were identified, the most common being 1 inpatient diagnostic code (41%) and 1 diagnostic plus 1 procedural code during the same inpatient stay (14%). Other definitions included 1 diagnostic code from inpatient or emergency department data (3%) and 1 inpatient or outpatient diagnostic code (3%).

Table 2 Summary of main fracture definitions among studies that indicated their data source(s)
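For illustration, the sketch below operationalizes the two most common hip fracture definitions described above against a hypothetical inpatient table with one row per coded diagnosis or procedure per stay; the column names are assumptions, and the diagnosis and procedure code lists would be supplied by the individual study rather than by this sketch.

```python
# Minimal sketch, assuming a hypothetical inpatient table with columns
# stay_id, code_type ("diagnosis"/"procedure"), and code; the code lists are
# placeholders to be supplied by the study, not validated definitions.
import pandas as pd

def hip_fx_stays_dx_only(inpatient: pd.DataFrame, dx_codes: set) -> set:
    """Definition A: stays with at least one qualifying hip fracture diagnosis."""
    is_dx = (inpatient["code_type"] == "diagnosis") & inpatient["code"].isin(dx_codes)
    return set(inpatient.loc[is_dx, "stay_id"])

def hip_fx_stays_dx_plus_proc(inpatient: pd.DataFrame, dx_codes: set, proc_codes: set) -> set:
    """Definition B: a qualifying diagnosis and a qualifying procedure on the same stay."""
    is_proc = (inpatient["code_type"] == "procedure") & inpatient["code"].isin(proc_codes)
    proc_stays = set(inpatient.loc[is_proc, "stay_id"])
    return hip_fx_stays_dx_only(inpatient, dx_codes) & proc_stays
```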

Humerus fractures

Of the 38 studies that considered humerus fractures, 17 (45%) indicated their data sources (Table 2). Among these 17 studies, 8 different definitions were used. The most common definitions were 1 inpatient or outpatient diagnostic code (29%) and 1 inpatient diagnostic code or 2 outpatient diagnostic codes within 90 days (18%). Additional definitions included an unspecified number of diagnostic code(s) from inpatient data (11%).
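As an illustration of how the "1 inpatient code or 2 outpatient codes within 90 days" pattern might be applied to a single patient (an assumed implementation, since the reviewed studies did not provide one):

```python
# Minimal sketch (assumed implementation) of the "1 inpatient diagnostic code OR
# 2 outpatient diagnostic codes within 90 days" pattern for one patient's
# qualifying fracture diagnosis claims.
import pandas as pd

def two_outpatient_codes_within(dates: pd.Series, max_gap_days: int = 90) -> bool:
    """True if any two outpatient diagnosis dates are within max_gap_days of each other."""
    gaps = dates.sort_values().diff().dt.days
    return bool((gaps <= max_gap_days).any())

def meets_definition(patient_claims: pd.DataFrame) -> bool:
    """patient_claims: one patient's qualifying diagnosis claims with 'setting' and 'service_date'."""
    if (patient_claims["setting"] == "inpatient").any():
        return True
    outpatient_dates = patient_claims.loc[patient_claims["setting"] == "outpatient", "service_date"]
    return len(outpatient_dates) >= 2 and two_outpatient_codes_within(outpatient_dates)
```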

Radius/ulna fractures

Of the 34 studies that considered radius/ulna fractures, 13 (38%) indicated their data sources (Table 2). Among these, 9 (69%) used codes from inpatient or outpatient data. Out of the 8 definitions used across the 13 studies, the most common was 1 inpatient diagnostic code or 2 outpatient diagnostic codes within 90 days (23%), followed by 1 inpatient or outpatient diagnostic code (15%), and an unspecified number of inpatient diagnostic code(s) (15%).

Vertebra fractures

Of the 35 studies that examined vertebral fractures, 15 (43%) indicated their data sources (Table 2). Among these, 12 (80%) used inpatient or outpatient data, and 9 definitions were identified. The most common definition was 1 inpatient diagnostic code or 2 outpatient diagnostic codes with a maximum of 90 days between outpatient diagnoses (20%). Additional definitions included 1 diagnostic code plus 1 procedural code (from inpatient or outpatient data) (13%) and 1 inpatient or outpatient diagnostic code (13%).

Atypical fractures of the femur

Of the 5 studies that considered atypical fracture of the femur, 4 (80%) indicated their data sources (Table 2). Among these 4, three definitions were used: 2 studies (50%) used 1 inpatient diagnostic code, 1 study (25%) used an unspecified number of inpatient diagnostic codes, and 1 study used 1 diagnostic code from inpatient or emergency data. Three studies provided a citation for their definition, with each providing a different source [16, 41, 42].

Discussion

We identified little transparency in osteoporosis fracture outcome studies that use healthcare claims data. First, fewer than half of the studies provided references for their fracture definitions. Among studies that did, the most commonly cited paper for non-vertebral fractures was Ray et al. [15]. Ray and coauthors assessed the validity of Medicare hospitalization (inpatient), outpatient, and emergency department data to identify fractures and developed definitions with positive predictive values ranging from 95 to 98% and sensitivities ranging from 90 to 97% for fractures of the hip, radius/ulna, and humerus [15]. The validation paper by Curtis and colleagues [18] was most commonly cited for vertebral fractures. Curtis' recommended definition identified vertebral fractures as a diagnosis followed by a procedural code for a spine imaging test within 10 days, or a hospitalization with a primary diagnostic code, and had a positive predictive value of 61% [18]. Although these validation articles were commonly cited, fracture definitions were not always applied as described in the original articles. It is possible, however, that this finding reflects a lack of detail in reporting. Detailed reporting of fracture definitions is critical to allow reproducibility and comparison in the field. In addition, several studies cited other primary research articles rather than validation papers, making it difficult to know the actual codes and data sources used, especially when the cited papers themselves cite other studies for their fracture definitions.

Second, only half of the studies identified the data sources for their fracture definitions. Among the studies that described their definitions in detail, we observed heterogeneity (e.g., 12 definitions used among 29 studies that considered hip fracture, with 15% using a washout window and 40% using trauma codes), making comparisons of drug effectiveness and safety between studies challenging. For example, a study requiring a diagnosis and a procedure for hip fractures from inpatient data will not capture frail, older patients who die before surgery, whereas a study requiring a diagnosis from inpatient or emergency department claims would theoretically capture all hip fracture patients who survive to reach the emergency department. Differences in outcome coding can lead to outcome misclassification and biased estimates of fracture risk [18]. Indeed, Curtis and colleagues found that definitions with low positive predictive values underestimate the true relative reduction in vertebral fracture risk by up to 50% [18].
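To make this mechanism concrete, the simplified calculation below uses assumed event and false-positive rates (chosen so the positive predictive value is roughly 61%, similar to the definition cited above); it is a sketch of how non-differential false positives attenuate an observed relative risk reduction, not a reanalysis of Curtis et al.

```python
# Simplified illustration with assumed rates (not data from any cited study):
# non-differential false-positive outcomes dilute the observed relative risk
# reduction when the outcome definition has a low positive predictive value.
true_rate_control = 0.020    # assumed true vertebral fracture risk, untreated
true_rate_treated = 0.010    # assumed true risk on treatment (true RRR = 50%)
false_positive_rate = 0.013  # assumed rate of falsely flagged outcomes in both arms

obs_control = true_rate_control + false_positive_rate  # 0.033
obs_treated = true_rate_treated + false_positive_rate  # 0.023
ppv_control = true_rate_control / obs_control           # ~0.61

true_rrr = 1 - true_rate_treated / true_rate_control     # 0.50
observed_rrr = 1 - obs_treated / obs_control              # ~0.30

print(f"PPV in the untreated arm: {ppv_control:.2f}")
print(f"True RRR: {true_rrr:.0%}; observed RRR: {observed_rrr:.0%}")
```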

Our study has strengths and limitations worth noting. Our review employed the robust JBI methodology for scoping reviews, which allowed us to conduct the review in accordance with an a priori published protocol [19]. Additionally, although numerous validation studies have been published, our review is the first to document the fracture outcome definitions used in osteoporosis drug effects studies. However, our study also has limitations. Although we were limited to English-language studies that utilized claims data from Canada or the USA, we expect the lack of transparency in fracture outcome definitions to be applicable to other regions and data sources. We also recognize limitations in our interpretation of fracture definitions due to ambiguity in reporting. We describe fracture definitions as reported by the authors of each paper. A notable example involves studies that specified the use of “diagnostic code(s)” to identify fracture without specifying the number of codes required. In these cases, the definition was recorded as either a single diagnostic code or “number of diagnostic codes not indicated,” based on whether a singular article (i.e., “the” or “a”) preceded the code description. We recommend that detailed fracture outcome coding always be made available in the text or supplemental material.

In conclusion, claims data are a rich resource for pharmacoepidemiologic research and allow drug outcomes to be observed in thousands of patients, which may not be possible with other data sources. However, we found large variation in the reporting of, and methods used for, identifying fractures. We provide specific examples of fracture outcome definitions from studies examining the effects of osteoporosis medications, yet similar inconsistencies may exist in studies examining the effects of other medications on fracture risk (e.g., diabetes medications). Consistency in fracture definitions across studies is key to making study results readily comparable, and the first step toward consistency is transparency [23]. The reporting of fracture definitions must be improved to enhance clarity and promote consistency [23]. Furthermore, our findings highlight that although a considerable body of literature is dedicated to developing and validating claims-based fracture definitions in osteoporosis, many osteoporosis drug effects studies do not use this literature to its fullest capacity, resulting in heterogeneity in the fracture definitions used across studies. Future studies that explore differences in fracture identification, and their impact on study results, using the definitions identified in this review are warranted.