Introduction

Healthcare systems around the globe face growing demand for hip and knee replacements in the coming years as a result of an aging population and the rising prevalence of symptomatic osteoarthritis [1,2,3]. For example, modeling suggests that as many as 935,000 to 1.26 million patients in the United States will undergo total knee arthroplasty (TKA) in 2030 alone [3]. Additionally, forecasts of the number of orthopaedic surgeons performing joint replacement suggest a future supply shortage [4, 5]. Concurrently, due to a variety of market dynamics, hospital networks and physician groups have seen substantial consolidation in recent years via mergers and acquisitions [6,7,8]. To manage this growing demand for joint arthroplasty in a healthcare environment deficient in orthopaedic surgeons and increasingly dominated by larger and more complex health systems, physicians and other musculoskeletal service line stakeholders will need to develop innovative solutions.

Conceptually, these solutions will need both to optimize the interaction between the supply of and demand for joint replacement surgeons and to decrease transactional friction. Specifically, in the context of healthcare service delivery, this means matching a patient with the optimal provider at the correct time in their disease course to achieve the optimal therapeutic outcome. For instance, it would be inefficient and unnecessarily costly for a patient to continue seeing a non-operative interventionist for hip arthritis if that patient had exhausted nonoperative modalities, would benefit from hip arthroplasty, and was interested in such a procedure. Conversely, it may not be ideal to have a patient see an orthopaedic surgeon for hip arthroplasty when they do not have radiographically confirmed advanced arthritis and have yet to explore any conservative treatment. Optimizing supply–demand logistics requires robust data analytics. Fortunately, there has been tremendous growth in the adoption and utilization of electronic health records (EHRs), which can facilitate this end [9, 10].

With this context in mind, it is important to understand what factors may influence treatment disposition and, ultimately, surgical indication within the population of patients with hip and knee osteoarthritis. Several prior investigations have suggested that age [11], comorbidities [11], patient-reported outcome measures [11, 12], willingness to consider total joint arthroplasty (TJA) [11], Kellgren–Lawrence grade of radiographic arthritis [12], physical function [13], body mass index (BMI) [14], and use of ambulatory assist devices [15] may be associated with surgical indications for joint arthroplasty. Nonetheless, to our knowledge no treatment algorithm has yet been published to identify who may or may not be a potential surgical candidate for joint arthroplasty based purely on information that could be readily available in the EHR (i.e. prior to in-person evaluation by an orthopaedic surgeon). This would be the ideal source for such information in a health ecosystem increasingly mismatched in the manner previously described.

The COVID-19 pandemic created an environment uniquely well-positioned for such an endeavor. Out of necessity during this timeframe, providers diagnosed patients and made procedural plans solely via telemedicine, without in-person patient interactions. Previous investigations revealed that when orthopaedic surgeons and interventionists made surgical or procedural recommendations within this setting, their specific plans (e.g. arthroscopic rotator cuff repair, total hip arthroplasty (THA), L4–5 transforaminal epidural steroid injection) rarely changed after meeting and examining patients in person [16,17,18]. Of particular note, our prior work within this domain revealed that no patients who were indicated for THA, TKA, or unicompartmental knee arthroplasty (UKA) by telemedicine and without in-person physical examination experienced a change to their surgical plans after an in-person examination was subsequently performed [17]. The proliferation of encounters like these throughout the pandemic has provided a unique opportunity, wherein all information and data generated to make a procedural indication (via history, diagnostic studies, and/or imaging) were gathered without any in-person patient interaction. Although gathered by physicians during their telemedicine appointments, such data are not unique to the in-person physician evaluation and, therefore, are particularly beneficial for the development of a potential screening algorithm.

Therefore, we sought to develop a machine learning algorithm for the prediction of patients who would be indicated for THA, TKA, or UKA following telemedicine encounter and without in-person evaluation.

Methods

Guidelines

We followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines as well as the Journal of Medical Internet Research Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research [22, 23].

Data source

Our institutional review board approved retrospective review of electronic medical records at two academic medical centers and three community hospitals. Inclusion criteria for the study were: (1) adult patients older than 18 years who had (2) a new patient visit (3) via telemedicine (4) in a lower extremity arthroplasty (total hip arthroplasty, total knee arthroplasty, unicompartmental knee arthroplasty) clinic (5) between March 1 and July 31, 2020. Exclusion criteria for the study were: (1) diagnoses other than osteoarthritis and (2) revision procedures.
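As an illustration only, the eligibility criteria above could be expressed as a simple filter. The sketch below is not the study's code: the `Visit` record, its field names, and the string labels are hypothetical, and treating both study dates as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout; field names and labels are assumptions,
# not the schema of the study's electronic medical record extract.
@dataclass
class Visit:
    age_years: int
    is_new_patient: bool
    modality: str      # e.g. "telemedicine" or "in_person"
    clinic: str        # e.g. "lower_extremity_arthroplasty"
    visit_date: date
    diagnosis: str     # e.g. "osteoarthritis"
    is_revision: bool


STUDY_START = date(2020, 3, 1)
STUDY_END = date(2020, 7, 31)


def is_eligible(v: Visit) -> bool:
    """Apply the five inclusion criteria, then the two exclusion criteria."""
    included = (
        v.age_years > 18
        and v.is_new_patient
        and v.modality == "telemedicine"
        and v.clinic == "lower_extremity_arthroplasty"
        and STUDY_START <= v.visit_date <= STUDY_END  # inclusive bounds assumed
    )
    excluded = v.diagnosis != "osteoarthritis" or v.is_revision
    return included and not excluded
```

Keeping inclusion and exclusion as separate expressions mirrors the way the criteria are reported and makes each one easy to audit.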

Outcome

The primary outcome was indication for operative intervention on the basis of the telemedicine visit.

Variables

The following variables were abstracted by retrospective review of electronic medical records: age (years), sex, body mass index (BMI) (kg/m²), race, Charlson Comorbidity Index (CCI), diabetes, chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), depression, opioid use in the year prior to evaluation, benzodiazepine use in the year prior to evaluation, current smoking status, degree of radiographic arthritis, prior intra-articular injection (of steroid or hyaluronic acid), prior trial of physical therapy, and current use of an ambulatory assistive device. Degree of radiographic arthritis was taken from the official report of a board-certified attending radiologist in the EHR and categorized as none, mild (or Kellgren–Lawrence grade 1–2), moderate (or Kellgren–Lawrence grade 3), or severe (or Kellgren–Lawrence grade 4). If no official report was available in the EHR (i.e. a patient underwent radiographs at an outside hospital), the radiographic read documented in the orthopaedic surgeon’s telemedicine note was utilized. Generally, knee radiographs included weight-bearing anteroposterior (AP), lateral, and skyline views, whereas hip radiographs included weight-bearing AP and frog leg lateral views.
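The radiographic categorization described above can be written as a small lookup. This is a minimal sketch rather than the abstraction procedure actually used; the function name is hypothetical, and mapping Kellgren–Lawrence grade 0 to "none" is an assumption, since the text does not state a grade for that category.

```python
def arthritis_category(kl_grade: int) -> str:
    """Map a Kellgren-Lawrence grade (0-4) to the four study categories."""
    if kl_grade == 0:
        return "none"      # assumption: grade 0 corresponds to "none"
    if kl_grade in (1, 2):
        return "mild"
    if kl_grade == 3:
        return "moderate"
    if kl_grade == 4:
        return "severe"
    raise ValueError(f"Kellgren-Lawrence grade must be 0-4, got {kl_grade}")
```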

Missing data

Rates of missing data were low across all variables, with none found to have greater than 30% missing data (Online Appendix Table 1). The variables with the highest amount of missing data were opioid use (n = 43; 27.2%) and benzodiazepine use (n = 43; 27.2%). Missingness for other variables ranged from 0.6 to 10.1%. The missForest multiple imputation method was used to impute missing data across these variables [24].
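The missingness figures above follow from simple proportions over the 158 included patients. The sketch below shows that arithmetic for opioid use and checks it against the 30% ceiling; the helper name and the placeholder values are hypothetical, and it does not reproduce the missForest imputation itself.

```python
from typing import Optional, Sequence


def missing_fraction(values: Sequence[Optional[object]]) -> float:
    """Fraction of entries recorded as missing (None)."""
    return sum(v is None for v in values) / len(values)


# 43 of 158 opioid-use entries were missing; the non-missing values here
# are placeholders, not study data.
opioid_use = [None] * 43 + [True] * 115
rate = missing_fraction(opioid_use)

print(f"{100 * rate:.1f}%")  # prints 27.2%
print(rate <= 0.30)          # prints True
```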

Model development

A stratified split (70:30) was undertaken to create training (n = 112) and testing (n = 46) sets. Recursive feature elimination with random forest algorithms was used to identify the predictors of indication for operative intervention. Five machine learning algorithms (stochastic gradient boosting, random forest, support vector machine, neural network, elastic-net penalized logistic regression) were developed on the training set to predict indication for operative intervention. Final algorithms were evaluated by ten-fold cross-validation of the training set and by evaluation in the independent testing set, which was not used for algorithm development. Algorithm performance was assessed by discrimination (area under the receiver operating characteristic curve [AUC]), calibration (calibration curve, calibration slope, calibration intercept), overall performance (Brier score), and decision curve analysis. The null model Brier score (the score for an algorithm that predicts a probability equal to the prevalence of the outcome for every patient) was calculated as a benchmark for the algorithms’ Brier scores.
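To make the discrimination and overall performance measures concrete, the sketch below computes the AUC, the Brier score, and the null model Brier score in plain Python. It is an illustration under stated assumptions, not the analysis code: the function names are hypothetical, the AUC is computed by direct pairwise comparison (ties counted as one half), and the final example uses the overall cohort prevalence (103 of 158) rather than the prevalence of any particular split.

```python
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def null_brier_score(y: Sequence[int]) -> float:
    """Brier score when every patient is assigned the outcome prevalence."""
    prevalence = sum(y) / len(y)
    return brier_score(y, [prevalence] * len(y))


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random positive is ranked above a random negative."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Null model benchmark at the overall cohort prevalence (103 of 158 indicated).
cohort = [1] * 103 + [0] * 55
print(round(null_brier_score(cohort), 2))  # prints 0.23
```

A Brier score below the null model value indicates that the predictions carry more information than the prevalence alone.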

Data analysis

The Anaconda Distribution (Anaconda, Inc., Austin, Texas), R (The R Foundation, Vienna, Austria), RStudio (RStudio, Boston, Massachusetts), and Python (Python Software Foundation, Wilmington, Delaware) were used for data analysis.

Results

Overall, 158 patients who underwent new patient telemedicine evaluation for consideration of THA, TKA, or UKA were included, of whom 65.2% (n = 103) were indicated for operative intervention for a diagnosis of osteoarthritis (Table 1). Of these 103 patients, 42.7% (n = 44) were indicated for total hip arthroplasty and 57.3% (n = 59) were indicated for total knee arthroplasty or unicompartmental knee arthroplasty. Among the 103 patients who were indicated for surgery, 95 (92.2%) ultimately underwent surgical intervention. The remaining 8 patients were lost to follow-up after surgical indication. The median age was 65 years (interquartile range 59–70).

Table 1 Baseline characteristics of study population, n = 158

Variables associated with an indication for operative intervention were radiographic degree of arthritis, prior trial of intra-articular injection, trial of physical therapy, current opioid use, and current tobacco use. On ten-fold cross-validation of the training set, the AUC ranged from 0.78 (stochastic gradient boosting) to 0.83 (support vector machine) (Table 2). The calibration intercept ranged from −8.12 (elastic-net penalized logistic regression) to 0.23 (support vector machine) and the calibration slope ranged from 3.73 (support vector machine) to 21.4 (elastic-net penalized logistic regression). The Brier score ranged from 0.15 to 0.16 compared to the null model Brier score of 0.23.
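For readers less familiar with calibration slope and intercept, the following sketch shows one common way to estimate them: the slope is the coefficient from a logistic regression of the outcome on the logit of the predicted probability, and the intercept is estimated with that slope fixed at 1. Whether the present analysis used exactly this convention is an assumption; the function names are hypothetical, and the fit uses a plain Newton-Raphson iteration that expects predictions strictly between 0 and 1.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def calibration_slope(y: Sequence[int], p: Sequence[float], iters: int = 50) -> float:
    """Slope b from the logistic model y ~ a + b * logit(p)."""
    x = [_logit(pi) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + b * xi) for xi in x]
        w = [mi * (1.0 - mi) for mi in mu]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Entries of the 2x2 information matrix X^T W X.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


def calibration_intercept(y: Sequence[int], p: Sequence[float], iters: int = 50) -> float:
    """Intercept a from y ~ a + logit(p), with the slope fixed at 1."""
    x = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + xi) for xi in x]
        gradient = sum(yi - mi for yi, mi in zip(y, mu))
        information = sum(mi * (1.0 - mi) for mi in mu)
        a += gradient / information
    return a
```

Under this convention, a slope of 1 and an intercept of 0 correspond to perfect calibration; a slope above 1 suggests predictions that are too close to the prevalence, and a slope below 1 suggests predictions that are too extreme.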

Table 2 Algorithm performance on cross-validation of training set, n = 112, mean (95% confidence interval)

In the independent testing set (n = 46), the stochastic gradient boosting algorithm achieved the best performance with AUC 0.83 (95% CI 0.67, 0.95) (Fig. 1). The model had calibration intercept −0.13 (95% CI −0.65, 0.92) and calibration slope 1.03 (95% CI 0.52, 1.86) (Fig. 2). For overall performance, the model achieved Brier score 0.15 (95% CI 0.09, 0.22) relative to a null model Brier score of 0.23 (Table 3). At all thresholds, management changes made on the basis of the model’s predictions resulted in higher net benefit than the default strategies of changing management for all patients or for no patients (Fig. 2B).
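The net benefit underlying a decision curve can be computed directly from predictions and outcomes. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × t/(1 − t) at threshold probability t, and compares it with the "all patients" and "no patients" default strategies. The function names and the toy data are hypothetical, and classifying a patient as positive when the prediction is greater than or equal to the threshold is an assumption.

```python
from typing import Sequence


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on predictions at or above the threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_all(y: Sequence[int], threshold: float) -> float:
    """Net benefit of changing management for every patient."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


NET_BENEFIT_NONE = 0.0  # changing management for no patients

# Toy data for illustration only; not study data.
y_toy = [1, 1, 0, 0]
p_toy = [0.9, 0.6, 0.4, 0.1]
for t in (0.2, 0.5):
    print(t, net_benefit(y_toy, p_toy, t), net_benefit_all(y_toy, t), NET_BENEFIT_NONE)
```

Evaluating these quantities over a grid of thresholds and plotting them yields the decision curve; a model is useful at a given threshold when its net benefit exceeds both default strategies.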

Fig. 1 Receiver operating curve (A) and global variable importance (B) of stochastic gradient boosting algorithm in testing set, n = 46

Fig. 2 Calibration curve (A) and decision curve analysis (B) of stochastic gradient boosting algorithm in testing set, n = 46

Table 3 Algorithm performance in independent testing set (95% confidence interval), n = 46

For a hypothetical patient (Fig. 3), we display the predicted likelihood of indication for surgical intervention in the setting of: (1) a previous trial of physical therapy, (2) severe radiographic arthritis, (3) prior intra-articular injection, (4) no tobacco use, and (5) opioid use. The predicted likelihood of indication for surgical intervention was 0.80. The previous trial of physical therapy, severe arthritis, prior injection, and absence of tobacco use increased this estimate, while opioid use decreased it.

Fig. 3 Example of individual patient-level explanation for prediction of the stochastic gradient boosting algorithm in a patient indicated for surgery via telemedicine

Discussion

Musculoskeletal care delivery will require innovative strategies beyond scaling service lines to meet the growing demand for orthopaedic surgery in an aging population. The forecasted supply shortage of arthroplasty surgeons [3,4,5] has likely been further exacerbated by the COVID-19 pandemic, wherein late career stage surgeons have retired earlier than planned and case backlogs grew [25,26,27]. Concurrently, healthcare delivery has become increasingly complex, with fiscal incentives driving consolidation of hospitals, clinics, providers, service lines, and physician organizations [6,7,8]. As systems grow in complexity, it is paramount to optimize supply–demand matching between providers and patients, meaning that a patient with a particular diagnosis is matched with the appropriate provider to initiate the optimal treatment or procedure. A majority of industries are heavily leveraging data analytics to predict consumer behavior and optimize supply–demand matching, but healthcare lags far behind in this area [9].

Several forces in our healthcare systems are creating environments more capable of generating predictive algorithms to direct patients to optimal providers and treatments, namely the widespread conversion to electronic health records and the consolidation of healthcare enterprises that manage care longitudinally and span from primary to specialty care. Thus, successful health systems in the future will rely on data analytics to match patients with the appropriate provider at the optimal time in their disease states. We believe this will increase the quality of care while also decreasing time to treatment and the cost associated with inefficient interactions. With this context in mind, we sought to develop a machine learning algorithm to predict the likelihood that a patient would be indicated for hip or knee arthroplasty based solely on data that could be obtained prior to in-person evaluation by an orthopaedic surgeon. Paramount to our ability to derive this predictive algorithm was a unique dataset generated during the COVID-19 pandemic, wherein providers evaluated patients via telemedicine encounters alone and made specific surgical recommendations (e.g. total hip arthroplasty) without in-person physical examination.

Overall, we were able to create an algorithm with strong diagnostic ability. This algorithm suggests that the most predictive factors in the determination of whether patients will be indicated for TJA or UKA during their subsequent orthopaedic surgery visit are: (1) degree of radiographic arthritis, (2) a trial of physical therapy (PT), (3) history of intra-articular injections, (4) smoking status, and (5) opioid use. While the degree of radiographic arthritis may be an intuitive factor in such a determination, the other components of our algorithm are important to highlight not only due to their predictive power, but also because they represent viable treatment options and modifiable risk factors to pursue prior to consideration of arthroplasty. For example, a trial of PT, which is strongly predictive in our algorithm, is also currently recommended with “strong evidence” as a nonoperative treatment modality by the American Academy of Orthopaedic Surgeons (AAOS) [28]. Similarly, intra-articular injections of corticosteroid (but not hyaluronic acid) are supported with the same confidence by the AAOS [28]. The Centers for Medicare & Medicaid Services also lists both of these modalities as examples of viable non-surgical treatments that should be attempted prior to its approval of arthroplasty [29]. As such, primary care providers, interventionists, and healthcare stakeholders may choose to prioritize these treatment options for patients with early stage hip and knee arthritis, though they should be cognizant of the several potential risks associated with injections [30,31,32,33,34,35].

This algorithm may also help patients and providers alike to understand that factors such as current smoking status and the use of opioids should be modified prior to evaluation, both for overall health benefits and for potential surgical candidacy. In their analysis of short-term complications following total hip and knee arthroplasty, Duchman et al. found that smokers experienced higher rates of wound complications, deep wound complications, and total complications than nonsmokers [36]. A systematic review has suggested similar findings and also noted an elevated mortality risk amongst smokers following TJA [37]. Similarly, preoperative opioid use has been associated with worse patient-reported outcomes [38], increased postoperative opioid use [39], greater complication rates [39], and higher rates of subsequent revision [40] following TJA. Given the elective nature of hip and knee arthroplasty, it is understandable that providers are reluctant to suggest surgery for patients in whom such risks are not modified. These also represent areas of potential inefficiency in specialty clinic visits, as modification of these risks may be better handled in the primary care setting prior to surgical evaluation.

This project is not without several important limitations. First, the number of patients included in our analysis is fewer than we would generally target in the development of an algorithm with so many potential predictors. Nonetheless, these patients represent all available new patients seen via telemedicine for hip or knee osteoarthritis within a large, integrated healthcare system during the height of the COVID-19 pandemic at a time when in-person evaluation was not permitted. This allowed us to conduct observational research that would otherwise be confounded by selection and indication bias if data were included from time periods when patients could equally be assessed through telemedicine and in-person encounters. As a result, we recognize that the algorithm we developed must be studied further using external data and via prospective analyses before it can be deemed ready for clinical application. Second, although we tried to include all evidence-based potential predictors of surgical candidacy, it is still possible that other factors not included in this analysis could prove informative. Examples of factors that we would have liked to include but could not due to data availability were patient-reported outcomes as well as a patient’s desire for surgery. These variables are important to consider in the future prospective work we envision. Additionally, although ten arthroplasty surgeons’ practices were included in analysis, they are all part of an integrated health system serving a similar population within a metropolitan area, which may limit heterogeneity and generalizability. Finally, it is important to note that these encounters occurred in the setting of a pandemic. While we believe this allows for the study of patient-specific factors that are not biased by in-person physical examination, this reality may have systematically affected indications for surgery in ways not currently characterized or understood.

In conclusion, we were able to create a machine learning algorithm to identify and quantify potential surgical candidacy for THA, TKA, or UKA in the setting of osteoarthritis without an in-person evaluation or physical examination. If externally validated, this algorithm could be deployed in a multitude of ways to optimize musculoskeletal care delivery. For example, patients, primary care providers, and health systems could utilize this algorithm to direct next steps in care for osteoarthritis and the need for surgeon or non-operative interventionist referral. In a future state of growing supply–demand mismatch for orthopaedic care, musculoskeletal service lines will need to increasingly leverage data analytics and similar algorithms to optimize patient-provider interactions to provide high quality, efficient, and cost-effective care.