Introduction

Cost-effectiveness research has emerged as an increasingly used tool for evaluating healthcare treatments in terms of both efficacy and cost [1]. Moreover, it provides a means of outcomes assessment when more rigorous scientific trials, such as randomized prospective trials, are difficult or impractical to perform [2]. As pressure to prove the efficacy of particular treatments continues to increase, cost-effectiveness research may help payers, providers, and policymakers invest in the healthcare treatments that are most efficient in terms of outcome and cost. In this era of increasing cost-consciousness, when cost containment is more important than ever, healthcare economic analyses have become more common in the medical [11] and orthopaedic surgery literature [4]. Some reviews have concluded that many operative interventions in orthopaedic surgery are cost effective [7, 13], whereas another questioned the cost-effectiveness of certain procedures [19].

Multiple factors affect the quality, and therefore the validity, of cost-effectiveness research, including, but not limited to, the quality of the primary source data used in the models and the tools used to measure cost-effectiveness. Primary data are relied on to assess the cost and efficacy of the treatment in question. Health economic models are most reliable when these data are derived from high-quality sources. These data ideally originate from peer-reviewed studies, but they may not always be available in a form that facilitates use in a decision tree [1]. Obtaining high-quality primary data is challenging on both fronts. Cost data can be difficult to obtain for multiple complex reasons, including variation in negotiated price points for certain goods and services, which goods and services are captured by reported costs, whether indirect costs such as lost productivity or personal resources are incorporated, and a general lack of transparency surrounding costs. In one review of economic evaluation studies in health care, costs were the factor most frequently cited as generating variability in economic evaluations between geographic locations [16]. In addition, high-level randomized controlled trials often are needed to establish reliable results for given treatment options, although such data often are not available. Without high-quality primary data, the results and conclusions of a cost-effectiveness study are less reliable.

In addition to the quality of primary data, the tools used to measure cost-effectiveness play an important role and can limit the quality of such studies. Quality-adjusted life years (QALYs), for example, frequently are used in health economics studies to determine cost-effectiveness, with USD 50,000 per QALY gained commonly used as the threshold below which an intervention is judged cost effective. Despite our historical reliance on this figure, however, it has been questioned and might not be the optimal way to judge the value of an intervention [12]. The extent to which these factors affect the overall quality of health economics research is unknown.
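For reference, such a threshold typically is applied to the incremental cost-effectiveness ratio (ICER); the decision rule below is the standard formulation, and the worked numbers are purely illustrative rather than drawn from any study discussed here:

$$\text{ICER} = \frac{C_A - C_B}{E_A - E_B} \leq \lambda$$

where $C_A$ and $C_B$ are the costs of interventions A and B, $E_A$ and $E_B$ are their effectiveness in QALYs, and $\lambda$ is the willingness-to-pay threshold (here, USD 50,000 per QALY). For example, if intervention A costs USD 12,000 and yields 1.5 QALYs while B costs USD 4,000 and yields 1.3 QALYs, the ICER is $(12{,}000 - 4{,}000)/(1.5 - 1.3) = 40{,}000$ USD per QALY, and A would be judged cost effective at this threshold.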

Given the increasing prominence and potential effect of such studies, it is imperative that published cost-effectiveness studies not only be relevant to current decision-making algorithms, but also be of high enough quality that their results can be safely incorporated into such decision-making. The strength of recommendations in recent orthopaedic cost-effectiveness studies is currently unknown. We therefore asked: (1) what are the strengths of recommendations in recent orthopaedic cost-effectiveness studies; (2) what are the reasons for weak recommendations; and (3) what methodologic reporting practices do these studies use?

Patients and Methods

The titles of all articles published in six orthopaedic journals spanning more than 10 years were scanned for initial inclusion. The journals searched were Journal of Bone & Joint Surgery (American volume), Journal of Bone & Joint Surgery (British volume)/The Bone & Joint Journal, Clinical Orthopaedics and Related Research®, The Journal of Arthroplasty, The American Journal of Sports Medicine, and Spine. We chose these journals because of their high impact and to achieve broad coverage across orthopaedic subspecialties in the United States, a strategy similar to that used in a prior study on the data quality of rotator cuff interventions [10]. Each issue of these journals from January 1, 2004, through April 1, 2014, was reviewed for the following terms: “cost”, “utility”, “economic”, “price”, “cost-effectiveness”, “cost-utility”, “decision-making”, “Medicare”, “Medicaid”, “reimburse”, and “cost-benefit”.

We included any original research study with a cost-effectiveness element comparing two different types of treatment or intervention, including operative and nonoperative management. Studies were excluded if no such analysis was performed or if the cost-effectiveness analysis focused on only one intervention (as might occur in studies of the cost-effectiveness to society of a particular treatment).

The literature search began with an initial title scan for key terms, which yielded a subset of candidate papers that were then included or excluded based on a full-title review (Fig. 1). After extraction for possible inclusion based on title, abstracts were retrieved for each study to determine final inclusion by two study authors (ECM, MES), who performed this assessment independently, with any conflicts resolved by consensus. For studies selected for final inclusion, the full text of each reference was reviewed.

Fig. 1

A flow diagram shows the search process used in this study. Refs = references; JBJS Am = Journal of Bone & Joint Surgery (American volume); JBJS Br/BJJ = Journal of Bone & Joint Surgery (British volume)/The Bone & Joint Journal; CORR® = Clinical Orthopaedics and Related Research®; JOA = The Journal of Arthroplasty; AJSM = The American Journal of Sports Medicine.

A total of 79 studies met the final inclusion criteria and were evaluated: 23 references from the Journal of Bone & Joint Surgery (American volume), five from the Journal of Bone & Joint Surgery (British volume)/The Bone & Joint Journal, eight from Clinical Orthopaedics and Related Research®, 13 from The Journal of Arthroplasty, five from The American Journal of Sports Medicine, and 25 from Spine (Table 1).

Table 1 Included references by journal

For all included studies (n = 79), the main endpoint was strength of recommendation, as determined by a subjective full-text review. A study was considered to provide weak recommendations if it lacked key primary data (clinical outcomes data, cost information, or utility values associated with the disease states in question), limiting its ability to make definitive recommendations about the cost-effectiveness of one treatment over another. Conversely, a study was classified as providing strong recommendations when key primary data were available, enabling its authors to draw definitive conclusions. Under this classification, a study was considered weak if its authors could not make a recommendation about the cost-effectiveness of one treatment over another owing to a lack of, or uncertainty surrounding, primary data. However, a study in which key primary data were neither lacking nor a source of uncertainty was classified as providing strong recommendations, even if treatments A and B were found to be equivalent in their cost-effectiveness. To further understand the barriers to providing stronger recommendations, the reasons for a weak classification were noted for each such study. After full-text review of the included articles, we found that these absent or uncertain data broadly fell into one of three categories: clinical outcomes, costs, or utility values. These underlying limitations were recorded to understand the extent to which each limited the ability of studies to generate stronger recommendations, as sketched schematically below.
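Stated schematically, the classification rule we applied reduces to the following sketch; the data structure and field names are hypothetical illustrations of the rule described above, not an instrument used in the study:

```python
# Schematic restatement of the strong/weak classification rule described in
# the text. The dictionary structure and field names are hypothetical.

KEY_PRIMARY_DATA = ("clinical_outcomes", "costs", "utility_values")

def classify_recommendation(study: dict) -> str:
    """Return 'weak' if any key primary datum is absent or uncertain;
    otherwise 'strong' (even if the compared treatments are equivalent)."""
    for field in KEY_PRIMARY_DATA:
        datum = study.get(field)
        if datum is None or datum.get("uncertain", False):
            return "weak"
    return "strong"

# Example: reliable outcomes and cost data, but uncertain utility values
example = {
    "clinical_outcomes": {"uncertain": False},
    "costs": {"uncertain": False},
    "utility_values": {"uncertain": True},
}
print(classify_recommendation(example))  # -> weak
```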

In addition, the study type (eg, surgical intervention versus nonoperative management) was recorded, and key methodologic reporting practices were noted for each reference. These comprised framing, cost, and results reporting practices, derived from the checklist and recommendations of the United States Panel on Cost-Effectiveness in Health and Medicine [17, 20] and adapted from Brauer et al. [5]. The framing variables included a clearly defined intervention, an adequate description of the comparator, a clearly stated study perspective, and a reported discount rate for future costs and QALYs. The cost variables included economic data collected alongside a clinical trial or another primary source and a clear statement of the year of monetary units. Finally, results reporting included whether a sensitivity analysis was performed. In addition to these parameters, which have been studied previously [5], studies were evaluated using the Quality of Health Economic Studies (QHES) instrument [15], a 16-item checklist with defined point values that sum to 100, which has been validated for evaluating the quality of health economics research [6] and has been used in the orthopaedic literature [14]. All analyses of included studies were performed independently by two of the authors (ECM, MES), with any discrepancies reconciled by mutual agreement after discussion.
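To make two of these reporting items concrete, the sketch below shows how a discount rate is applied to future costs and QALYs and how a simple one-way sensitivity analysis varies a single input. All parameter values (discount rate, horizon, costs, utilities) are hypothetical and are not taken from any included study:

```python
# Illustration of two reporting items noted above: a discount rate applied to
# future costs and QALYs, and a one-way sensitivity analysis.

def present_value(annual_values, rate=0.03):
    """Discount a stream of annual costs or QALYs to present value."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(annual_values))

def icer(cost_a, qaly_a, cost_b, qaly_b):
    """Incremental cost-effectiveness ratio of intervention A versus B."""
    return (cost_a - cost_b) / (qaly_a - qaly_b)

# Base case: a 5-year horizon for two hypothetical treatment strategies
cost_b = present_value([4000, 2000, 2000, 2000, 2000])
qaly_a = present_value([0.85] * 5)
qaly_b = present_value([0.80] * 5)

# One-way sensitivity analysis: vary treatment A's upfront cost by +/- 25%.
# A negative ICER here means A is both cheaper and more effective (dominant).
for factor in (0.75, 1.00, 1.25):
    cost_a = present_value([10000 * factor, 1000, 1000, 1000, 1000])
    print(f"upfront cost x{factor:.2f}: "
          f"ICER = {icer(cost_a, qaly_a, cost_b, qaly_b):,.0f} USD per QALY")
```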

In the current study, cost-effectiveness is used to refer to all health economics study types, including cost-effectiveness analysis, cost-utility analysis, cost-minimization analysis, and cost-benefit analysis. Broadly speaking, all of these methods measure cost in monetary units; the primary difference between them lies in how outcomes are measured. Cost-effectiveness analysis uses physical or “natural” units, such as cases treated or years of life gained; cost-utility analysis uses health state utility values such as QALYs; cost-minimization analysis considers only input costs, to identify the least expensive way to achieve the same outcome; and cost-benefit analysis measures outcomes in monetary units. The distinction between these analyses is described more fully elsewhere [1, 3, 4, 13].
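As a concrete illustration of how the outcome measure distinguishes these four analysis types, the sketch below applies each to the same pair of hypothetical interventions; every number is invented for illustration only:

```python
# How the outcome measure distinguishes the four analysis types described
# above, applied to two hypothetical interventions. Every number is invented.

cost_a, cost_b = 12000.0, 8000.0             # costs in monetary units (USD)

# Cost-effectiveness analysis: outcomes in natural units (eg, cases cured)
cured_a, cured_b = 90, 80
print(f"CEA: {(cost_a - cost_b) / (cured_a - cured_b):,.0f} USD per case cured")

# Cost-utility analysis: outcomes in QALYs
qaly_a, qaly_b = 6.2, 6.0
print(f"CUA: {(cost_a - cost_b) / (qaly_a - qaly_b):,.0f} USD per QALY gained")

# Cost-minimization analysis: outcomes assumed equal, so compare costs only
print(f"CMA: choose intervention {'A' if cost_a < cost_b else 'B'} (lower cost)")

# Cost-benefit analysis: outcomes monetized; report net benefit of A over B
benefit_a, benefit_b = 30000.0, 22000.0      # monetized outcomes (USD)
net_benefit = (benefit_a - cost_a) - (benefit_b - cost_b)
print(f"CBA: net benefit of A over B = {net_benefit:,.0f} USD")
```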

Results

Of the 79 articles included in this study, 50 (63%) provided strong recommendations, whereas 29 (37%) provided weak recommendations (Fig. 2). The number of cost-effectiveness studies published in the included journals is increasing, with four articles published in 2004 and 13 published in 2013, and the proportion of published studies classified as strong and weak remained relatively constant with time (Fig. 3). The 20 studies reported for 2014 represent an estimate of the number of articles to be published for the full year, based on 3 months of data from January 1, 2014, to April 1, 2014. The included studies comprised comparisons among surgical and nonsurgical interventions (Table 2). Of the 79 included studies, 31 (39%) compared the cost-effectiveness of two surgical interventions. A total of 34 (43%) references involved cost-effectiveness comparisons between nonsurgical strategies, with 22 comparing a nonsurgical intervention with nonsurgical controls and 12 comparing two nonsurgical interventions. Fourteen studies (18%) compared a surgical intervention with nonoperative management. Of the studies comparing nonsurgical interventions with nonsurgical controls, 77% (17 of 22) provided strong recommendations. Similarly, of the studies comparing two nonsurgical interventions, 75% (nine of 12) provided strong recommendations. Studies comparing two surgical interventions and those comparing a surgical intervention with nonoperative management provided strong recommendations in 52% (16 of 31) and 57% (eight of 14) of references, respectively.

Fig. 2

Sixty-three percent of included references provided strong recommendations, whereas 37% provided weak recommendations.

Fig. 3

The number of included references is shown by publication year and strength of recommendation. This figure suggests a recent trend of an increasing number of cost-effectiveness studies in orthopaedics, with the proportion of strong and weak studies remaining relatively constant with time. There were five total publications from January 1, 2014, through April 1, 2014; the figure shown for 2014 is a projection for the full year.

Table 2 Details on study type of included references

For references with weak recommendations, the inability to provide definitive conclusions regarding cost-effectiveness was primarily the result of the absence of, or uncertainty surrounding, three key variables: clinical outcomes, cost, and utility data (Fig. 4). Of the three variables, clinical outcomes data were cited in 26 (of 29) studies as insufficient to support definitive conclusions, whereas cost and utility data were cited in 13 (of 29) and seven (of 29) articles, respectively.

Fig. 4

Studies with weak recommendations are shown by the variable for which data were insufficient; each variable is depicted by the frequency with which it limited more conclusive recommendations. References could report insufficient data with respect to one or more of these variables.

Methodologic reporting practices varied, with mixed adherence to framing, costs, and results reporting (Table 3). Although all articles clearly defined the intervention and 95% (75 of 79) provided an adequate description of the comparator, reporting of the other framing variables varied, with 77% (61 of 79) clearly stating the study perspective and 57% (45 of 79) including a discount rate for future costs and QALYs. Cost reporting also varied, with 58% (46 of 79) of included references using economic data collected from a clinical trial or primary source and 73% (58 of 79) clearly stating the year of monetary units. Finally, 62% (49 of 79) of included articles reported a sensitivity analysis. When grouped by studies with strong and weak recommendations, most variables showed similar reporting rates. The major differences were in discount rate, which was reported in 50% (25 of 50) and 69% (20 of 29) of articles providing strong and weak recommendations, respectively, and sensitivity analysis, which was reported in 52% (26 of 50) and 79% (23 of 29), respectively. Grading the studies with the QHES instrument, the mean score for all studies was 79.7 (range, 47–100), with means of 77.3 and 83.9 for studies with strong and weak recommendations, respectively. The studies showed mixed adherence to the QHES items, with lower rates of adherence for variable estimates from the best available source, uncertainty testing, incremental analysis, adequate time horizon and discounting, and explicit discussion of the direction and magnitude of biases (Fig. 5). Strong and weak studies showed similar rates of adherence, with the most marked differences in uncertainty testing and adequate time horizon and discounting, both of which were reported at higher rates among studies providing weak recommendations.

Table 3 Methodologic reporting practices of included studies*
Fig. 5

Adherence to the methodologic reporting practices outlined by the Quality of Health Economic Studies (QHES) instrument is shown for all studies and by strength of recommendation. Overall there was good adherence to these practices, with lower rates of adherence for variable estimates from the best available source, uncertainty testing, incremental analysis, adequate time horizon and discounting, and explicit discussion of the direction and magnitude of biases. Strong and weak studies showed similar adherence, with the most marked differences in uncertainty testing and adequate time horizon and discounting, both of which were reported at higher rates among studies providing weak recommendations.

Discussion

Cost-effectiveness research is an increasingly used tool for assessing treatments in orthopaedic surgery, providing recommendations that consider both efficacy and cost. Considerable variability in primary data and a lack of high-quality sources can cause uncertainty and limit the strength of recommendations in these studies, resulting in assessments that are not useful or, worse, are harmful. We found that a substantial number of cost-effectiveness studies provide weak recommendations owing to insufficient clinical outcomes, cost, and/or utility data, and that methodologic reporting practices varied greatly.

This study had numerous limitations. First, not all orthopaedic surgery publications were reviewed for study inclusion. Rather, six high-impact journals were chosen, and a comprehensive 10-year search of all issues of these journals was performed by the study team, a strategy used previously to assess data quality [10]. By focusing on these journals across a broad range of subspecialties, we did not seek to perform a comprehensive review, but rather to identify high-quality studies as a way to broadly understand the strength of recommendations in the US health economic literature related to orthopaedics. This focus on the US healthcare system is itself a limitation, as we did not include more internationally recognized journals; doing so could be a subject for future research. Furthermore, some articles may have been missed, because study recruitment initially was based on title scans and we did not supplement the initial search of these journals with any other searches. By focusing on titles that highlighted cost-effectiveness as a major component of the study and using a detailed list of search terms, we sought to include only dedicated cost-effectiveness studies in our assessment. An additional limitation is that a study’s strength of recommendation was determined by subjective review by the study team. This assessment was based on full-text review of the included studies, and therefore certain instances of lacking or uncertain data could have been overlooked or misclassified. To limit this possibility, two investigators (ECM, MES) reviewed the data independently to minimize incorrect classifications of recommendation strength. In addition to the methodologic parameters reported by Brauer et al. [5], we included an assessment with the validated QHES instrument to score each study on its methodologic reporting practices.

The results indicate that of the 79 studies comparing two different treatment modalities, only 63% (n = 50) produced strong, definitive recommendations regarding the cost-effectiveness of the tested interventions. Given the importance of cost-effectiveness research in clinical decision-making and policymaking, understanding the proportions of strong and weak recommendations offered by these studies is imperative. This is especially true given the steadily increasing volume of such studies in the orthopaedic literature [3–5, 14], a trend also seen in our study. This approach enables an understanding of the proportion of cost-effectiveness analysis studies that lack, or carry uncertainty around, key data parameters, which limits the strength of the authors’ recommendations.

Of the 29 articles providing weak recommendations, insufficient clinical outcomes data were most commonly cited as limiting the strength of recommendations, followed by inadequate cost and utility data. These results are not surprising, given the difficulty of conducting randomized controlled trials to obtain high-quality data when comparing two surgical interventions [18]. Furthermore, Garber and Sox [8] stated that the long-term outcomes data required for cost-effectiveness analysis most often do not exist and therefore are an important source of model uncertainty. Others have found that cost data can be equally challenging to capture [9], limiting our ability to draw useful conclusions from cost-effectiveness analysis studies. Hamid et al. [9] reported that accurately capturing cost data is difficult owing to dissimilar accounting methods across institutions and the questionable accuracy of these methods. They suggested that cost data can be particularly difficult to obtain in orthopaedics, given proprietary restrictions imposed by manufacturers and payers [9]. These previous studies [8, 9, 18] reporting on the difficulties of obtaining accurate primary data lend support to our findings.

Our increasing reliance on cost-effectiveness studies demands that they incorporate sound methodologic practices. When examining the extent to which methodologic parameters were reported, we found substantial heterogeneity in framing, costs, and results reporting among the studies in our analysis. Brauer et al. [5] found similar methodologic flaws, noting a lack of clearly stated study perspectives (present in only 43% of studies), defined reporting of discount rates (49%), and establishment of explicit “cutoffs” for cost-effectiveness (49%). Another review [3] challenged the quality of orthopaedic cost-effectiveness studies, reporting that a majority of published studies contained significant methodologic flaws. Our study showed some improvement in reporting practices compared with prior reviews, particularly in framing and cost reporting. One possible explanation for this improvement is the more recent study period in our analysis; prior reviews have shown a tendency toward improved reporting with time [11, 14]. Furthermore, studies focused on only one intervention were excluded from our study, which could explain the increased adherence to framing practices we observed. When examining the studies with the validated QHES instrument, we found similarly mixed reporting, with a mean score of 79.7 and mean adherence rates for specific parameters ranging from 41% (explicit discussion of biases) to 100% (conclusions based on study results). These results are similar to those reported in a review of the orthopaedic sports literature (mean, 81.8) [14]. When analyzed by strength of recommendation, there appeared to be few differences in reporting practices. One possible explanation is that, although classified as weaker in our study because a lack of key data limited the strength of their recommendations, weak studies may not be of lower quality than their stronger counterparts.

The increasing popularity of, and reliance on, cost-effectiveness studies to inform clinical and policy decision-making require high-quality research. Given that a substantial portion of orthopaedic studies provide weak recommendations and vary in their methodologic practices, we suggest that clinicians read these studies with a critical eye, incorporating their recommendations into practice with considerable caution. This is particularly true for studies providing strong recommendations, which, although presenting clear and confident conclusions, are not necessarily of higher quality. The evidence that methodologic reporting is improving with time is encouraging. Nevertheless, readers should remain mindful of the methodologic quality of these studies, and could use validated instruments such as the QHES to evaluate them consistently. Future research could assess the effect of health economics studies on clinical practice, and whether recommendation strength has any effect on practice patterns.