Introduction

Clinicians have long acknowledged that patients adapt to their health condition [1]. They find ways to be happy despite restrictions in ambulation, respiration, and energy [2]. They find meaning and purpose even as their life narrows in scope or activity [3]. They re-think what is important to them [4], what “good quality of life” (QOL) means [5], what “moderate fatigue” means [6]. All of these underlying and often unspoken changes mean that, although the same person completes patient-reported outcome measures repeatedly in a longitudinal study, they may be using different internal standards, referencing different values, or considering a different conceptualization of what the investigator is targeting [7,8,9]. These “response shifts” are critical to adaptation; without them, patients’ ability to process the vicissitudes of life, health, and aging would be impaired [10, 11].

While a substantial body of work postulates that response-shift effects may serve to hide intervention benefits [7, 8, 12, 13], much of the research concerns observational studies that are not randomized controlled designs. Response-shift effects are, however, likely of relevance to clinical-trials research. The intervention and control/placebo groups likely adapt differently, particularly if the intervention is effective. If they adapt differently, what are the implications for the treatment’s observed benefit? Does response shift play a role in non-inferiority trials? What can be learned in such trials if, for example, treatment arm differences are negligible? Is response shift ignorable by clinical trialists? By governmental agencies responsible for vetting new drugs?

We postulate that clinical trialists have largely ignored response-shift phenomena in pivotal trials, and that any such investigations have been relegated to secondary analyses. We believe this neglect may reflect concerns that response-shift studies will undermine pivotal trial findings, no matter how well-designed the control group. By ignoring this widely acknowledged human-adaptation process, however, trialists are co-existing with an “elephant in the room”: everyone knows it is there, and no one is talking about it.

The present work implemented a scoping review of the literature to find all clinical trials that addressed response shift phenomena, and to characterize how response-shift effects impacted trial findings. Such a review is an approach for evidence synthesis [14] that seeks to identify knowledge gaps, clarify concepts, or investigate research conduct [14]. A scoping review may be a precursor to a systematic review, the latter seeking to uncover and appraise international evidence using rigorous methods that minimize bias in synthesizing information related to a particular question, and to inform practice [14]. Accordingly, and consistent with a scoping-review approach [15], the present work seeks to describe the clinical-trials literature with regard to response-shift investigations. It does not seek to quantify average effect sizes, which would be more appropriate to a meta-analysis (e.g., [16]).

Methods

Data sources and search strategy

We implemented a scoping review of the medical literature from 1968 to 2021, focusing on response-shift research in the context of clinical trials. Using the search terms (i.e., keywords) “response shift” and “clinical trial,” we searched the following databases: PubMed, CINAHL Plus, Embase, PsycINFO, and Google Scholar. We then combined the search results and removed duplicate records.
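
A minimal sketch of this combine-and-deduplicate step, assuming CSV exports from each database search; the file and column names (e.g., title) are illustrative assumptions, not the authors’ actual workflow:

```python
import pandas as pd

# Combine hypothetical CSV exports from the five database searches.
# File and column names are assumptions for illustration only.
exports = ["pubmed.csv", "cinahl_plus.csv", "embase.csv",
           "psycinfo.csv", "google_scholar.csv"]
records = pd.concat([pd.read_csv(f) for f in exports], ignore_index=True)

# Normalize titles so trivial formatting differences do not defeat matching.
records["title_key"] = records["title"].str.lower().str.strip()

# Keep the first occurrence of each reference.
unique_refs = records.drop_duplicates(subset="title_key", keep="first")
print(f"{len(records)} records combined; {len(unique_refs)} unique references")
```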

Article selection and characterization

The research team met repeatedly in advance of the initial screening and selection of articles to ensure a shared understanding of the screening methodology. This approach was similar to an earlier project by this research team, in which we demonstrated that such collaboration was efficient and achieved a high level of reliability [11]. The first author (CES) then examined the list of search results and identified articles that explicitly examined response-shift effects in clinical-trials data. Articles were included at this initial stage if they were clinical trials that explicitly examined response-shift effects. Articles were excluded at this initial stage if they: were not a clinical trial (e.g., observational study, literature review, protocol only); were an abstract only, rather than a full report; or mentioned response shift only in the introduction and/or discussion sections of the paper.

The resulting set of clinical-trials articles was then divided among four raters (CES, ICH, GR, RS) for further characterization of: (a) the main research question; (b) the drug or intervention being evaluated; (c) the clinical-trial design; (d) the patient population; (e) the patient-reported outcome tools used; (f) the response-shift methods used; (g) whether there were hypotheses regarding response-shift effects; and (h) the response-shift findings. At this stage, further exclusions were made if the article focused only on non-randomized study participants or was not a clear example of response shift. Additionally, we excluded articles that relied exclusively on the then-test method, given the plethora of studies documenting the reliability and validity problems of this obsolete method [17,18,19]. All summary information about the final set of included articles was double-checked by all four raters to ensure the accuracy of the presented syntheses.
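
As an illustration of this characterization grid, items (a) through (h) might be recorded per article as follows; the field names are our shorthand, not the authors’ actual extraction form:

```python
from dataclasses import dataclass

@dataclass
class TrialArticle:
    """One characterized article. Field names are illustrative shorthand
    for items (a)-(h), not the authors' extraction form."""
    research_question: str        # (a) main research question
    intervention: str             # (b) drug or intervention evaluated
    trial_design: str             # (c) clinical-trial design
    population: str               # (d) patient population
    pro_tools: list               # (e) patient-reported outcome tools used
    rs_methods: list              # (f) response-shift detection methods used
    rs_hypotheses_stated: bool    # (g) a priori response-shift hypotheses?
    rs_findings: str              # (h) response-shift findings
```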

Synthesis of review results

We grouped the final set of retained articles by: (a) whether the study was a primary or secondary/post-hoc analysis; (b) whether the intervention was drug/medical device/surgery or psychosocial/behavioral/nursing intervention; and (c) whether the stated focus of the work was primarily methodological or on the clinical impact of response-shift effects.
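
A minimal sketch of this three-way grouping, using toy rows rather than the actual retained articles:

```python
import pandas as pd

# Toy rows for illustration only; not the actual 17 retained articles.
articles = pd.DataFrame([
    {"analysis": "secondary/post-hoc", "intervention": "drug/device/surgery",
     "focus": "methodological"},
    {"analysis": "primary", "intervention": "psychosocial/behavioral",
     "focus": "clinical impact"},
    {"analysis": "secondary/post-hoc", "intervention": "psychosocial/behavioral",
     "focus": "methodological"},
])

# Counts for each combination of groupings (a)-(c).
print(articles.groupby(["analysis", "intervention", "focus"]).size())
```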

We then examined whether response shift affected trial results as a function of the study’s statistical power for the specific response-shift detection method used, applying well-accepted guidelines for statistical-power considerations [20,21,22]. For example, using Cohen’s criteria for a comparison of means, a study would need at least 26 people per group or treatment arm to be powered to detect a large effect size [21]. Multivariate analytic methods require larger sample sizes to yield robust estimates [23]. For example, a rule of thumb for structural equation modeling is 200 people per group/time point being evaluated, which would enable adequately powered, robust estimates of loadings of 0.90 but not of 0.80 [24]. Although some recent studies indicate a range of sample-size goals for such modeling, the differences are primarily driven by the number of variables in the model. In studies investigating response-shift effects as measurement invariance within a structural equation modeling framework, we believe this general rule of thumb is a reasonable criterion. Ideally, studies would be powered to detect at least a medium effect size, since this corresponds to clinically significant change [25]. Being powered to detect small effect sizes would correspond to current estimates of response-shift effects in observational research [16], although a clinical trial of a potent intervention might be expected to yield at least medium response-shift effects. For the purposes of the current work, “adequate” was defined as 80% power, at α = 0.05, to detect at least a large effect size given the specific statistical method used. The point estimate for small, medium, and large effect sizes differs depending on the statistical method used, and the interested reader is referred to Cohen’s seminal paper for examples using common behavioral-science methods [21].
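
A minimal sketch of this power criterion for the simplest case (a two-arm comparison of means), using Python’s statsmodels; the thresholds follow Cohen’s d benchmarks, and the specific call is ours, not the authors’:

```python
import math
from statsmodels.stats.power import TTestIndPower

# Per-group sample size for a two-sample t-test at 80% power, alpha = 0.05,
# across Cohen's d benchmarks (0.2 small, 0.5 medium, 0.8 large).
analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    n = math.ceil(analysis.solve_power(effect_size=d, alpha=0.05, power=0.80))
    print(f"{label} effect (d = {d}): n = {n} per group")
# large -> 26 per group, matching the criterion cited above [21]
```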

For adequately powered studies, we evaluated whether the response-shift adjustment made the treatment look more or less effective. In other words, did the treatment(s) being assessed in the clinical trial have a more or less beneficial impact on outcomes when response-shift effects were considered?

Results

Descriptive characteristics of included articles

The database search yielded 2148 unique references. Of these, 116 were found using PubMed, CINAHL Plus, Embase, and PsycINFO; all 116 were duplicated in the Google Scholar search, which itself yielded 2148 articles. After excluding ineligible articles (2088 that were not clinical trials, and 35 that mentioned response shift only in the introduction or discussion), a set of 25 articles was included in the scoping review (Appendix Table 2) (Fig. 1). Further exclusions were made for articles that focused on a non-randomized sample (n = 1) [26], used the then-test exclusively (n = 5) [27,28,29,30,31], or were not clear examples of response shift (n = 2) [32, 33] (Table 1, Fig. 1). The remaining 17 articles included ten papers reflecting four trials; 11 of the 17 addressed a distinct response-shift hypothesis (Fig. 1).
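
The selection flow can be checked arithmetically; a trivial sketch using the counts reported above:

```python
# Arithmetic check of the selection flow summarized above (Fig. 1).
unique_refs = 2148
not_clinical_trials = 2088
intro_discussion_only = 35
included = unique_refs - not_clinical_trials - intro_discussion_only
assert included == 25  # articles included in the scoping review

non_randomized, then_test_only, unclear_rs = 1, 5, 2
retained = included - (non_randomized + then_test_only + unclear_rs)
assert retained == 17  # final set of retained articles
```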

Fig. 1 Flow chart of the article selection process for final set of retained articles

The 17 articles were predominantly secondary or post-hoc analyses (n = 12) [34,35,36,37,38,39,40,41,42,43,44,45], rather than studies explicitly designed to address response-shift effects (n = 5) [19, 46,47,48,49] (Fig. 2). Eight of the retained articles addressed drug, medical-device, or surgical interventions [11, 36, 37, 39, 40, 43, 44, 48], and nine addressed psychosocial, behavioral, or nursing interventions [34, 35, 38, 41, 45,46,47, 49, 50]. Ten of the articles focused on methodological development [19, 34,35,36,37,38, 40, 41, 50, 51], and seven on the clinical impact of response shift [11, 39, 43,44,45,46, 49].

Fig. 2 Characterization of clinical-trials articles included

Substantive findings

Of note, the articles documented that response shift affected trial results more often than not, and this impact was associated with the statistical power of the comparisons performed (Fig. 3). Among the 10 retained articles with adequate power, seven documented a clinically important response-shift effect that affected trial results [11, 37, 38, 43, 44, 46, 49], two did not [47, 50], and one did not address the clinical impact of response shift [41]. Among the seven retained articles with inadequate power, two documented a clinically important response-shift effect (one making the intervention look better [39], one worse [35]), and five documented no impact on the estimated intervention effect [36, 40, 45, 47, 48].

Fig. 3 Impact of power on response-shift effects on trial results

Considering only the ten adequately powered studies, seven revealed a clinically important response-shift effect that made the intervention look significantly better [11, 37, 38, 43, 44, 46, 49], none made it look worse, two documented no impact [34, 50], and one did not address the impact on the intervention effect [41] (Fig. 4). These findings support the long-standing presumption that response-shift phenomena may serve to obfuscate treatment benefits [7,8,9,10, 52, 53].

Fig. 4 Impact of response-shift adjustment on estimated treatment benefit

The studies that revealed a greater intervention effect did so after integrating response-shift-related changes. Consistent with intervention goals, there were changes in priorities/preferences after Advance Care Planning interventions [46, 49]; changes in conceptualization after post-stroke rehabilitation [34]; changes in internal standards for physical functioning after anti-hypertensive treatment [37]; and changes in internal standards resulting in increased honesty about risky drinking behavior after a motivational-interviewing intervention [38]. Larger differences in mental-health functioning were found between treatment arms after considering response-shift effects in a trial comparing a highly effective drug to placebo for patients with a chronic progressive neurological disease [11, 43]; and two effective treatments in a non-inferiority trial for a chronic blood disorder were found to yield “better than normal” QOL compared to the general population [44].

Table 1 Clinical Trials Articles Found After Initial Inclusion/Exclusion Criteria Applied

Discussion

The present scoping review documents an emerging literature on response shift in randomized controlled clinical-trials research. This literature draws on medical and behavioral interventions, and includes both methodological and clinical-impact studies. Among the ten (of 17) studies with adequate statistical power for the response-shift analyses implemented, it was eminently clear that response-shift phenomena affected trial results, predominantly in the direction of revealing more substantial treatment benefits. Most of the studies suggesting no impact of response-shift phenomena were underpowered for the response-shift analyses implemented, thereby undermining their conclusions. Thus, when the “elephant” was empowered to “speak,” the response-shift effects detected in adequately powered studies suggested greater treatment benefits than previously found in the pivotal trials.

The implications of the present work are substantial for clinical trialists. First, response-shift effects are likely important for better understanding treatment effects for both medical and behavioral interventions. Further, this better understanding is unlikely to diminish the estimated benefit of treatment, relative to analyses that do not explicitly consider response-shift effects. For example, clinical trials that do not document a positive impact on mental health in the context of a powerful treatment that modifies disease progression may well be “hiding” a response-shift effect that masks the greater benefit of the drug (e.g., [11, 43]). Uncovering such effects is an important and clinically relevant outcome.

A second implication is that clinical trialists interested in evaluating response-shift effects in their trial data should pay attention to statistical power when selecting the response-shift detection method. Our review suggests that underpowered studies were more likely to conclude that response-shift phenomena did not affect trial results. Recent developments in response-shift methods provide efficient and effective ways to examine response-shift effects even in the context of small samples [11, 43] or non-inferiority trials [44]. These include adaptations of random-effects modeling [54], equating [55], and case–control designs [56].

The present study also has implications for the Food and Drug Administration (FDA) and the European Medicines Agency (EMA) in their processes for considering new drug applications. If considering response-shift effects leads to an increased estimate of a treatment’s effect, then it should be standard practice to use methods that integrate response-shift effects in clinical-trials analyses. Of note, such methods should have a strong evidence base for reliability and validity, thus specifically excluding the then-test method. Additionally, treatments that enable response-shift effects are likely more desirable than those that do not. While this idea is apparent in the context of rehabilitative nursing interventions (e.g., [34]), it is also desirable in other contexts. A drug with severe toxicities may make it difficult for the patient to reprioritize or reconceptualize QOL, particularly in the control arm if the drug is the current standard of care, because so much of the patient’s time is spent in suffering. Explicitly requiring analyses that consider response-shift effects for new drug applications would be an appropriate policy implication of the present work. Further, trialists should be encouraged to evaluate response-shift effects separately for the intervention and control arms. If response-shift effects are detected, then the treatment findings should incorporate or adjust for them. Accordingly, the FDA and EMA should integrate response-shift evaluation into their guidelines and operational standards for future trials, particularly with regard to considerations of statistical power for the response-shift detection method(s) being used.

The present work had notable strengths, such as considering a large set of potential articles and reducing the set for further consideration based on clear and replicable criteria. It is possible, however, that this large set was limited by publication bias; that is, null results may not have been deemed publishable and were thus unavailable for inclusion. This source of bias may have distorted the findings of the present work.

Conclusions

In summary, this scoping review identified 25 randomized controlled clinical trials that addressed response-shift effects. A subset of 17 was retained after implementing exclusion criteria, of which 10 were adequately powered for the statistical methodology used. These papers generally documented a larger treatment effect after considering response-shift effects. This work thus demonstrates that response shift affects clinical-trial outcomes, and supports the recommendation that current and future researchers incorporate methods to detect response shift when reporting results, especially null results. The formal consideration of response-shift effects in clinical-trials research will thus not only improve estimation of treatment effects, but will also integrate the adaptation inherent in treatment and healing. Adaptation is part of a positive outcome process, and thus should be central to clinical-trials analyses.