Main text

The systematic review by te Molder and colleagues [1] summarized the various methods investigators have used to dichotomize outcomes of patients with knee arthroplasty (KA) as either good or poor. There are important reasons for wanting to know whether a patient’s KA outcome is good or poor. For example, interventions to improve outcome can be specifically designed and targeted to patients fitting the poor outcome phenotype. The dilemma with categorizing outcome, as te Molder et al. and others [2, 3] have noted, is that definitions of good versus poor outcome vary substantially across the many studies that have attempted to categorize outcomes following KA. This variation precludes consensus and prevents meaningful comparisons across study cohorts. We noted an additional problem with evidence classifying outcome as good or poor [4]: definitions of good versus poor outcome are grounded in the use of arbitrary cutoff values, whether based on final outcome score, percent or absolute change from baseline, or the Minimal Clinically Important Difference (MCID) family of change indicators.

The main conclusion of the study by te Molder and colleagues was that there was substantial heterogeneity in the 47 definitions of good versus poor KA outcomes. In our view, te Molder et al. should also have focused on the implications of the homogeneity of these 47 definitions: all studies in the review used the cutoff method to determine good versus poor outcome, and cutoff scores are, by definition, arbitrary. Supplemental file 3 in the study by te Molder et al. [1] provides a partial list of the definitions used to establish arbitrary cutoff scores (including two of our prior studies [5, 6]). For example, Brander and colleagues defined a poor pain outcome as a score of > 40 on a 0 (no pain) to 100 (worst pain imaginable) visual analogue pain scale [7]. This cutoff is arbitrary.

Over three decades ago, researchers and clinicians were warned about the arbitrary nature of the cutoff method for clinical decision making, and latent class analysis was proposed as a scientifically defensible alternative [8]. Recent methodological developments have also been extensively documented [9]. In 2011, we further elaborated on why the cutoff method should not be used to determine patient groupings in scientific research, developed methods originating from discrete latent variable modeling approaches to circumvent the problems associated with the arbitrary cutoff method, and provided multiple examples using real-life data to illustrate how these methods could be used to answer scientific questions [10]. In 2019, we used methods originating from a longitudinal discrete latent variable modeling framework to define poor versus good outcomes in KA [4]. For reasons that were unclear to us, given that it met the inclusion criteria set by te Molder and colleagues, our 2019 study [11] was not included in the review. This latent variable modeling method does not rely on biased good versus poor cutoffs but rather on statistical modeling that is free of arbitrary decision-making.
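
To make the contrast with cutoff-based classification concrete, a minimal sketch of the general idea follows. It fits a generic two-class Gaussian mixture model (scikit-learn) to simulated repeated pain scores and is only a simplified stand-in for the longitudinal discrete latent variable models cited above; the simulated data, the two-class assumption, and all variable names are illustrative and are not drawn from those studies.

```python
# Illustrative sketch: deriving outcome classes from repeated pain scores with
# a mixture model instead of an arbitrary cutoff. Simplified stand-in for the
# longitudinal latent variable models cited above; data and names are made up.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated 0-100 pain scores at baseline, 6, 12, and 24 months for 300
# patients: one group improves substantially, the other improves little.
improvers = np.clip(rng.normal([70, 35, 25, 20], 10, size=(240, 4)), 0, 100)
non_improvers = np.clip(rng.normal([70, 60, 58, 57], 10, size=(60, 4)), 0, 100)
trajectories = np.vstack([improvers, non_improvers])

# Fit a two-class mixture model to the full trajectories; class membership is
# estimated from the data rather than imposed by a pre-specified threshold.
model = GaussianMixture(n_components=2, random_state=0).fit(trajectories)
classes = model.predict(trajectories)

# Each patient also receives a posterior probability of belonging to each
# class, preserving uncertainty that a hard cutoff would discard.
posteriors = model.predict_proba(trajectories)
print(np.bincount(classes), posteriors[:3].round(2))
```

The point of the sketch is that class membership and its posterior probabilities are estimated from the observed trajectories rather than imposed through a pre-specified threshold.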

The cutoff method is an impediment to scientific progress. If we continue to overlook this homogeneity and fail to acknowledge that the existing evidence rests on arbitrary cutoff scores, researchers will keep using arbitrary cutoffs to define poor outcome in KA, more studies built on them will accumulate, and no progress will be made. In our view, the answer to the lack-of-consensus problem posed by te Molder et al. for defining good versus poor outcome in KA is not to continue relying on arbitrary cutoff scores. Instead, we should rely on an unbiased, statistical model-based approach to categorizing good versus poor outcome [11].

Once the cutoff method is replaced with model-based approaches, we suggest the following strategy: researchers should focus on the factors that matter most as sources of outcome variability. For example, what constitutes the KA outcome (e.g., self-reported knee pain, function, health-related quality of life)? Whose perspective(s) should be captured (e.g., patients, relatives, surgeons, or a combination)? What are the optimal time point(s) for measuring outcome (e.g., 2 weeks before and after KA, and four additional times over the subsequent 2 years)? What are the key predictors of good versus poor outcome classes? We contend that a coordinated, consensus-based strategy like the one described above is needed to shift the paradigm of this type of work and advance the science of good versus poor outcome identification in KA.

Abbreviations

KA: Knee arthroplasty; MCID: Minimal Clinically Important Difference

Acknowledgements

N/a

Authors’ contributions

DLR and LD both contributed to the original draft and the revisions, and both approved the final version.

Funding

No funding was obtained for the paper.

Availability of data and materials

N/a

Ethics approval and consent to participate

N/a

Consent for publication

N/a

Competing interests

The authors declare that they have no competing interests.

Authors’ response

Thank you for giving us the opportunity to write a response to the correspondence “Classification of good versus poor outcome following knee arthroplasty should not be defined using arbitrary criteria”.

We thank Riddle et al. for their interest in and critical assessment of our inventory review, in which we summarized definitions of poor response to total knee arthroplasty (TKA). Riddle et al. suggest that we should have focused on the implications of the arbitrary and homogeneous use of cutoff points; instead, they strongly recommend relying on a model-based approach to define poor response to TKA. Several model-based approaches are available to identify subgroups with different growth curves, and we acknowledge the value of those models. However, a major limitation of these types of models is that membership of the poor and good outcome classes can only be determined after the fact, and results regarding class membership cannot be transferred to other study populations.

We fully agree that a drawback of dichotomizing data is data reduction, and that a continuous measure is more sensitive to change and, therefore, more useful at the individual level and in clinical decision making. Mixture models can provide more in-depth insight into the course of outcome over time and its determinants. However, to allow comparisons of the prevalence of poor responders to TKA across hospitals, across countries, and over time, a strict definition with clearly defined criteria and thresholds is necessary. For this purpose, a dichotomous outcome is more appropriate, whereas mixture models are preferred when the aim is to gain insight into the factors underlying outcomes over time (a minimal sketch of this comparability argument follows below).
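
As a purely illustrative sketch of the comparability argument (the threshold of 40 and the cohort scores below are invented, not taken from our review or any study), a single pre-specified cutoff can be applied identically in any cohort to yield directly comparable prevalences, whereas classes from a mixture model are estimated anew within each study population.

```python
# Illustrative only: a fixed, pre-specified threshold gives a "poor responder"
# prevalence that is computed the same way in every cohort. Threshold and
# scores are hypothetical.
import numpy as np

POOR_PAIN_THRESHOLD = 40  # hypothetical: follow-up pain > 40 on a 0-100 scale

hospital_a = np.array([12, 55, 30, 70, 22, 18, 48, 35])
hospital_b = np.array([60, 25, 15, 80, 45, 10])

prevalence_a = np.mean(hospital_a > POOR_PAIN_THRESHOLD)
prevalence_b = np.mean(hospital_b > POOR_PAIN_THRESHOLD)
print(f"Poor responders: hospital A {prevalence_a:.0%}, hospital B {prevalence_b:.0%}")
```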

The second remark relates to the reason why the 2019 study by Dumenci et al. [4] was not included in the review. As the authors indicated in their correspondence, the inclusion criteria for our inventory review focused on predefined dichotomous cutoffs to define poor outcome; therefore, studies using model-based approaches (such as the study by Dumenci et al.) were not included.

Nevertheless, we thank Riddle et al. for their suggested strategy of focusing on the factors, perspective(s), and optimal time point(s) for measuring good versus poor outcome in TKA. The intended strategy of our project is exactly what Riddle et al. propose. We are using the results of our inventory review and an ongoing qualitative study among patients and health care providers to identify the relevant concepts underlying a poor response to TKA. Once the relevant concepts have been identified, we can start the discussion among the panelists of a subsequent Delphi study. The ultimate aim of our project is to reach consensus on a definition of poor response to TKA, after which we hope to be able to properly compare the prevalence of poor responders across hospitals and countries.