Patient-Reported Outcome Measures (PROMs) of constructs such as Quality of Life (QoL) are important patient-centered outcomes that are used to evaluate healthcare interventions. Measurement requires standardization to be valid and reliable for estimating change. Longitudinal measurement invariance is considered a required condition for allowing comparisons of PROM scores over time. Whether this condition actually holds in longitudinal PROM data has been challenged, as illustrated by what were initially called “paradoxical and counter-intuitive findings”, such as reports of stable or improving QoL over time by patients with a life-threatening disease. Such findings suggest that the meaning of some constructs and items is time dependent and that patients understand them differently as they go through new life experiences. This suggestion is especially important when the instruments aim to be patient-centered [6, 7]. Evaluation-based self-reports (i.e., self-reports that involve judgment using idiosyncratic criteria, such as the item “How difficult is it to walk up a flight of stairs?”) are particularly prone to this change in meaning over time. This phenomenon is now known as response shift.
In the last 25 years, a growing body of literature has explored the intricacies of considering response shift in measuring constructs. Various definitions and theories were proposed to integrate response shift in explaining change in self-reports [3, 6, 9, 10]. Multiple methods were proposed to analyze response shift in PROM data [11, 12]. Response shift was evidenced in various conditions [13, 14]. These studies have helped to better understand occasional discrepancies between researchers’ or healthcare professionals’ expected assessments of patients’ health and patients’ self-reported health, by highlighting processes such as psychological adaptation to illness or the appraisal of PROM items. Thus, these insights have enriched the interpretation of PROM results [8, 15]. Meanwhile, fundamental debates continued, revolving around the definition of response shift [16,17,18,19,20,21,22,23,24,25,26,27,28,29], the act of measuring subjective constructs, and the relationships between response shift and related concepts [31,32,33,34].
Hence, a critical, comprehensive review and synthesis of the work on response shift was deemed crucial. In 2019, an international, interdisciplinary working group of 26 researchers, consisting of response shift experts, new investigators, and independent external experts, was formed to achieve this synthesis. They were divided into four teams [12, 14, 15], with the current team focusing on definition and theory.
The objectives are to: (1) outline extant definitions and theories of response shift and related concepts; (2) identify the predicaments encountered in the response shift definitions and theories; (3) propose a more specific, formal definition of response shift; and (4) illustrate it with a revised model addressing the identified predicaments. We also provide some examples of how specific parts of the proposed model can be tested (eText1), while acknowledging that details about operationalizations of model entities are beyond the scope of this paper.
Extant definitions, theories of response shift and related concepts (Supplementary eTable 1)
The concept of response shift dates back to research on organizational change, where, in 1976, Golembiewski proposed a typology of change that took into account that some intervals of a measurement continuum associated with a constant conceptual domain may be recalibrated (beta change) and that some domains may be reconceptualized (gamma change).
Independent of this work, in the field of education, the term “response shift” was coined by Howard et al. in 1979 as an explanation for an observed discrepancy between quantitative self-reports (an increase in self-reported dogmatism at the group level after an intervention designed to reduce dogmatism) and qualitative interviews (in which the intervention was considered beneficial). Howard et al. hypothesized that a change in people’s internal standards for measuring dogmatism explained this discrepancy. They proposed to extend the pretest–posttest research design with a retrospective self-assessment of the pretest level (called the “then-test”), administered immediately after the posttest assessment. The posttest minus then-test difference was considered a better estimate of the intervention-induced change, as both measurements were presumably taken within the same cognitive framework (that of the posttest perspective). Response shift was then defined as the mean difference between pretest and then-test self-report ratings.
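The then-test arithmetic described above can be sketched with a few lines of Python. This is a minimal illustration, not Howard et al.'s analysis: the ratings are invented, the 1 to 7 scale is hypothetical, and the sign convention for response shift (then-test minus pretest) is an assumption, since the definition above only speaks of the mean difference between the two ratings.

```python
from statistics import mean

# Hypothetical ratings for five respondents on an arbitrary 1-7 scale.
# These numbers are invented for illustration and are not from Howard et al.
pretest = [4, 5, 3, 4, 5]    # rated before the intervention
posttest = [5, 5, 4, 4, 6]   # rated after the intervention
thentest = [6, 6, 5, 5, 7]   # retrospective rating of the pretest level, given at posttest


def mean_difference(a, b):
    """Mean of the paired differences a[i] - b[i]."""
    return mean(x - y for x, y in zip(a, b))


# Conventional change score: posttest minus pretest.
conventional_change = mean_difference(posttest, pretest)
# Then-test adjusted change: both ratings come from the posttest perspective.
adjusted_change = mean_difference(posttest, thentest)
# Response shift under the assumed sign convention (then-test minus pretest).
response_shift = mean_difference(thentest, pretest)

print(conventional_change, adjusted_change, response_shift)
```

With these invented ratings the conventional change score is positive while the then-test adjusted change is negative, mirroring the kind of discrepancy described above. Under the assumed sign convention, the conventional change equals the adjusted change plus the response shift.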
Sprangers and Schwartz combined and expanded the two aforementioned definitions and proposed a working definition of response shift as a change in the meaning of one’s self-evaluation of a target construct as a result of three causes. First, recalibration, indicating a change in the respondent’s internal standard of measurement. For example, a person may rate his/her chronic back pain level on a Visual Analogue Scale as 5/10, with 10 being the worst pain imaginable. However, after experiencing an extreme acute pain such as renal colic, which provides a new experience of the worst pain imaginable, the patient may rate the back pain as 3/10 even though the level of pain is the same as before. Second, reprioritization, which is a change in the importance of component domains constituting the target construct. To illustrate, after a car crash that resulted in permanent motor deficits, social functioning and good relationships can become more important for one’s quality of life than physical functioning. Third, reconceptualization, which pertains to a redefinition of the target construct. For instance, after experiencing depressive disorder, mental health may be understood as including components previously related to physical health, such as exhaustion. A theoretical model was proposed where a catalyst (a salient health event, e.g., initiation of a medical treatment) may trigger psychological mechanisms (e.g., coping, social comparison) to accommodate the health change, which in turn may induce response shift that can affect the self-evaluation of the target construct (e.g., QoL). The kind of mechanisms an individual would adopt and the magnitude and type of response shift that would result were made dependent on dispositional characteristics that were termed antecedents.
In 2004, Rapkin and Schwartz proposed an updated model focusing on the previously insufficient differentiation of response shift from both mechanisms and outcomes. They contended that any self-report is a function of appraisal (i.e., the cognitive processes needed for answering survey questions). Four main types of appraisal processes were specified. Response shift was defined as changes in appraisal (e.g., a change in the standard of comparison for pain from “the worst pain I’ve ever had” to “what my doctor told me to expect”) that can account for unexpected changes in QoL that cannot be explained by “standard influences” (such as the impact of the catalyst).
In 2005, Oort adopted a different perspective in an attempt to enhance definitional clarity by proposing a formal definition of response shift as a special case of violation of the Principle of Conditional Independence (PCI). Conditional independence refers to the situation where a PROM provides the same results across different samples or over time, given that there are no differences or changes in the target construct. In 2009, Oort and colleagues used this definition to distinguish between two perspectives on response shift. From a measurement perspective, response shift occurs when change in the target construct is not fully reflected by the observed change in the measurement. From a conceptual perspective, response shift is viewed as an effect occurring when change in the construct is explained not only by “standard influences” (i.e., acknowledged explanatory variables) but also by other variables such as the impact of psychological mechanisms.
These laudable attempts to define response shift did not prevent people from attaching diverse meanings to the term [10, 16]. Table 1 lists a range of frequently employed concepts in the literature that are related to response shift. We defined these concepts and clarified their relationships to response shift. For example, in health psychology, post-traumatic growth can be viewed as a cause of response shift. In the context of measurement theory, concepts for which violations of conditional independence are used to identify systematic differences in indicators across time are clearly related to response shift (e.g., when investigating differential item functioning or non-invariance between measurements at different points in time in a longitudinal study). But those are only approaches to detect phenomena that could result from response shift, not necessarily the response shift itself (see Table 1).
Predicaments encountered in previous definitions and theories of response shift
Several predicaments were encountered during the review of the definitions and theories of response shift. First, in an attempt to reconcile different perspectives on response shift, Oort et al. proposed two definitions of response shift, from the measurement and the conceptual perspective. Each definition was formulated using the same (statistical) terminology, i.e., as a violation of conditional independence. However, this distinction has not been widely adopted, possibly because the conceptualization was too general, encompassing other instances of measurement bias, and because its statistical foundation may have been too complex. We therefore propose further specification and clarification of their response shift definition.
Second, as response shift is a time-dependent phenomenon related to change, the models of Sprangers and Schwartz and Rapkin and Schwartz are indeed focused on explaining change in the target construct. For example, in the Rapkin and Schwartz model, the processes are shown to drive “Change in Quality of Life”. By focusing on explaining change in the target construct rather than explaining the construct at each measurement occasion (with at least two time points as the simplest model), those models neglected the importance of using multiple time points to investigate response shift. Incorporating at least two time points in a theoretical model would enable a clearer explication of the chain of causality among the constituting components over time [3, 10].
Third, extant models do not explicitly discriminate the target construct (e.g., QoL) from its measure (e.g., PROM). Whereas the construct and its measure are closely related, by definition, response shift is a phenomenon addressing changes in their relationship. Explicitly distinguishing the construct and its measure enables better characterization of how response shift can occur.
A more specific formal definition of response shift
Usually, a PROM is designed to measure a construct defined with an a priori conceptual model of its component domain(s) and is used after it has been shown to yield sufficient psychometric quality [1, 40]. The interpretability and validity of a PROM lie, in part, in ensuring that patients understand the items in the same way the designers intended. However, as answering a PROM inherently involves a subjective process of appraisal [6, 37], a discrepancy can occur between the meaning inferred from this process and the meaning the designer wanted to convey. If respondents understood the items in the same way over time (intra-individual invariance of meaning over time), there would be no response shift. But circumstances may change, and that change may impact patients’ interpretations of the item(s). When that happens, it seems reasonable to assume that the a priori relationship between the target construct and its measure has also changed over time. Thus, a formal definition of response shift should encompass the occurrence of this discrepancy between measurement occasions.
To address the first predicament, we consider only the measurement perspective on response shift. Response shift is then the effect that occurs when circumstances cause people to change their interpretation of the measurement of the underlying target construct, e.g., as the result of accommodating a health change. Consequently, there is a discrepancy between the observed change (e.g., change in PROM scores) and the target change (i.e., change in the target construct). Response shift can therefore be more narrowly defined as a special case of violation of the PCI in which observed change is not fully explained by target change. This definition can lead to the operationalization of response shift at the group level as well as the individual level. Moreover, we assume this phenomenon to be the consequence of “a change in the meaning of one’s self evaluation of a target construct,” the phrase used in the working definition of response shift.
A possible translation of the definition into mathematical terms (i.e., a formal definition) at a statistical level is as follows: there is response shift if ψ1(Observed Change | Target Change) ≠ ψ2(Observed Change | Target Change, Other Variables), where ψ1 is the distribution of observed change (Observed Change; e.g., change in PROM scores) conditional on the change in the construct intended to be measured (Target Change; e.g., change in QoL), and ψ2 is the distribution of Observed Change conditional on both Target Change and any Other Variables (e.g., adaptation to or coping with a new health state).
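The inequality above can be made concrete with a small simulation. The sketch below is an assumption-laden toy and not a method proposed in this paper: it assumes a linear data-generating model, a single binary Other Variable (for example, having adapted to a new health state), and a Target Change that is directly available, whereas in practice the target change is latent. It also compares only conditional means rather than full distributions. The names `simulate`, `conditional_gap`, and `shift_effect` are hypothetical.

```python
import random
from statistics import mean


def simulate(shift_effect, n=20000, seed=1):
    """Draw (observed_change, target_change, other) triples.

    Assumed toy model: observed = target + shift_effect * other + noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        target_change = rng.gauss(0.0, 1.0)
        other = rng.choice([0, 1])   # hypothetical binary Other Variable
        noise = rng.gauss(0.0, 0.5)
        observed_change = target_change + shift_effect * other + noise
        rows.append((observed_change, target_change, other))
    return rows


def conditional_gap(rows, band=0.25):
    """Difference in mean observed change between other=1 and other=0,
    restricted to respondents whose target change is close to zero."""
    near = [(obs, other) for obs, tc, other in rows if abs(tc) < band]
    with_other = [obs for obs, other in near if other == 1]
    without_other = [obs for obs, other in near if other == 0]
    return mean(with_other) - mean(without_other)


# Conditional independence holds: the Other Variable adds nothing.
gap_without_shift = conditional_gap(simulate(shift_effect=0.0))
# Conditional independence is violated: the Other Variable moves the scores.
gap_with_shift = conditional_gap(simulate(shift_effect=0.8))

print(round(gap_without_shift, 2), round(gap_with_shift, 2))
```

When the simulated Other Variable has no effect, respondents with similar target change report similar observed change regardless of its value, so the gap is close to zero. When it has an effect, the gap approaches the assumed `shift_effect`, which is one way the two conditional distributions can differ.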
This more specific definition considers response shift as an effect but does not explain how this effect occurs. In the context of health care, we need a theoretical model depicting the components that can be understood as “Observed Change”, “Target Change” and especially “Other Variables” (e.g., catalyst, mechanisms, antecedents). Moreover, the model needs to illustrate the relationships between these components over time to unravel the potential pathways leading to response shift. Thus, the next step is to propose a model depicting these components and their relationships at two time points (addressing the second predicament), distinguishing both the target construct and its measure (addressing the third predicament).
A revised response shift model (Fig. 1, Tables 2 and 3)
The proposed model is a modified version of previous ones [3, 6]. This model makes an explicit distinction between the target construct (e.g., QoL) and its measure (e.g., PROM scores) and shows the conceptual components and their interrelationships at two points in time. It depicts the simplest longitudinal design but can be extended to more time points. It is a Structural Equation Metamodel, which means it depicts relationships between conceptual entities without any assumptions about the operationalization of such entities as variables or the mathematical form of the relationships among the entities. As the passage of time drives the relationships between entities, cause-effect relationships are proposed. The most plausible paths are depicted and explicitly labeled. Table 2 lists the underlying assumptions of the proposed model.
In addition to the target construct and its measure at each time point, three main interrelated components that featured in the previous models are also retained. First, the model is centered on a catalyst: a health event or life experience that can have an impact on the target construct (C2 path) at time 2. It can differ from person to person: it can be a distinctive event (e.g., a car accident), multiple events happening in a short period of time (e.g., diagnosis of cancer), or experience accumulated with the passage of time. The catalyst represents the necessary condition leading to change.
Second, antecedents are more or less stable characteristics related to personal (e.g., personality, comorbid conditions) or environmental factors (e.g., access to health care) that determine the context in which individuals live (see Fig. 1). Hence, the term is used more broadly than Sprangers and Schwartz did, as it also encompasses environmental factors. Several models have been proposed to classify these factors, including the International Classification of Functioning, Disability and Health and the Wilson-Cleary Model. In a given empirical situation, these antecedents need to be known because they influence the baseline condition, including the possible occurrence of a catalyst (A2) and the way someone will react to the catalyst (A4). Moreover, they may influence the target construct (A3) and the responses to PROM items (A1) at time 1. These influences can be carried to time 2 through the TC14 path (target construct at time 2) and the Me11 path (responses to PROM items at time 2).
Third, mechanisms are psychological processes triggered by the catalyst to accommodate the threat to one’s homeostasis. These processes may be adaptive or maladaptive and people can adopt more than one mechanism simultaneously to restore the balance (see Fig. 1, and for examples of psychological processes Table 1).
When all the pathways coming from the catalyst, either directly (C2 and C3) or mediated by the mechanisms (C1 then M1 and M2 paths), are equal to zero, the variability of the target construct and its measure are carried from time 1 to time 2 (TC14, and Me11). In that case there is no change. Otherwise, there is change in the construct and/or its measure.
According to this model, response shift occurs when the target construct cannot fully explain the variability of the PROM results at time 2 (i.e., paths other than TC21 and Me11 explain the measure at time 2). Two main pathways indicate the possible occurrence of response shift. First, a direct effect of a catalyst on the PROM at time 2 (e.g., an acute shock due to a narrow escape from a car accident, influencing the interpretation of a PROM administered immediately afterwards, where the limited passage of time makes the influence of mechanisms less likely). This effect will explain part of the variability in the PROM at time 2 (C3) and, as it is not explained by the variability of the construct (TC21), there will be response shift. Second, a more convincing response shift effect occurs when the catalyst impacts the PROM at time 2, mediated by the mechanisms (C1 then M2 paths). These paths depict the possibility that psychological adaptation to a situation can impact the way someone answers PROM items at time 2. Again, if this influence directly explains, in part, the variability of the PROM at time 2 (M2), then response shift has taken place.
Finally, apart from its baseline value (TC11) and the impact of the catalyst (C2), the target construct at time 2 can be explained by another pathway: the direct influence of mechanisms (M1). Nonetheless, we do not consider this as response shift because it impacts the target construct but not the PROM, so it will not lead to a discrepancy between observed and target change. A description and illustration of the individual paths in the model is presented in Table 3.
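The pathways described above can be traced numerically in a deliberately simplified sketch. The proposed model makes no assumption about the mathematical form of its relationships, so everything in this example is an assumption made for illustration only: the paths are treated as linear coefficients, antecedents and the Me11 carry-over are omitted, the target construct carries over unchanged from time 1, and the PROM reflects the target construct with the same loading at both time points. The class `Paths`, the function `propagate`, and all coefficient values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Paths:
    """Hypothetical linear coefficients for the labeled paths."""
    c1: float          # C1: catalyst -> mechanisms
    c2: float          # C2: catalyst -> target construct at time 2
    c3: float          # C3: catalyst -> PROM at time 2 (direct)
    m1: float          # M1: mechanisms -> target construct at time 2
    m2: float          # M2: mechanisms -> PROM at time 2
    tc21: float = 1.0  # TC21: target construct at time 2 -> PROM at time 2


def propagate(paths, catalyst, target_t1):
    # Assumption: same loading of the PROM on the target at both time points.
    prom_t1 = paths.tc21 * target_t1
    mechanisms = paths.c1 * catalyst
    # Target at time 2: baseline value, catalyst (C2), and mechanisms (M1).
    target_t2 = target_t1 + paths.c2 * catalyst + paths.m1 * mechanisms
    # Paths that reach the PROM without passing through the target construct.
    response_shift = paths.c3 * catalyst + paths.m2 * mechanisms
    prom_t2 = paths.tc21 * target_t2 + response_shift
    return {
        "target_change": target_t2 - target_t1,
        "observed_change": prom_t2 - prom_t1,
        "response_shift": response_shift,
    }


# Direct (C3) and mediated (C1 then M2) paths to the PROM are active.
with_shift = propagate(
    Paths(c1=1.0, c2=-2.0, c3=0.5, m1=0.5, m2=1.0), catalyst=1.0, target_t1=10.0
)
# Only C2 and M1 are active: the target changes, but no path bypasses it.
no_shift = propagate(
    Paths(c1=1.0, c2=-2.0, c3=0.0, m1=0.5, m2=0.0), catalyst=1.0, target_t1=10.0
)

print(with_shift)
print(no_shift)
```

In the first scenario the target construct declines while the PROM score stays stable, which resembles the seemingly paradoxical findings that motivated the concept. In the second scenario the mechanisms still act on the target construct through M1, yet observed change equals target change, so there is no response shift under this sketch.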
Response shift and the operational model are not just “armchair” phenomena and processes but refer to real-life experiences of people as they go through and try to make sense of health changes. Each of the components of the model has been experienced by people in their everyday lives. Supplementary eTable 2 presents how people have described these experiences in their own words.
Implications of the formal definition and its application to PROMs at two time points
The more specific formal definition and the revised model clarify that response shift is an effect. The revised model specifically explicates the chain of causality among the constituting components over time and the multiple pathways leading to both direct (i.e., impact of the catalyst) and mediated effects (i.e., by mechanisms) on the PROM indicating response shift. Several implications and assumptions warrant attention.
First, a major implication is that recalibration, reprioritization, and reconceptualization (3 Rs) have been removed from the definition of response shift. These concepts are not necessarily response shift in itself. Rather, they explain how response shift can occur, i.e. they add further explanation to the processes depicted by the model. The interaction between a catalyst, antecedents, and mechanisms may cause people to recalibrate the measurement scale they need to complete, reprioritize domains they value, and/or reconceptualize the underlying construct they need to rate, such that it will lead to a discrepancy between target change and observed change, hence a response shift.
Similarly, we also consider change in appraisal as an explanation of how response shift occurs rather than response shift itself. At each measurement occasion, appraisal is needed to arrive at a response to PROM item(s) [6, 37]. Appraisals are cognitive processes that come into play when a respondent evaluates him/herself with respect to a target construct and chooses a response option. When there is a change in appraisal, the meaning of the observed response changes. Rapkin and Schwartz showed how each of the four appraisal processes they adopted corresponds with the 3 Rs. The 3 Rs can thus be viewed as examples of changes in appraisal. It should be noted that changes in appraisal may not be limited to the 3 Rs, as more cognitive processes have been identified.
Third, in the model we depicted an extra box referring to theories that may explain why response shift could occur. These theories purport to explain why people adapt, cope, and try to regain balance after a disruptive event (see Table 1). These theories describe possible mechanisms that may induce response shift and can be considered the underlying theories explaining the main principles behind the model.
The proposed model both delineates the plausible paths explaining changes in the target construct between two times of measurement and offers numerous opportunities for strong predictions and empirical tests. We have adopted an agnostic approach, i.e., we have not specified how the depicted entities are operationalized nor how these are mathematically linked. At the stage of analyzing data, careful attention is needed for appropriate testing of response shift. For example, the target construct can be operationalized as a latent variable inferred from directly measured variables (e.g., scores) using Structural Equation Modeling, Item Response Theory or Rasch Measurement Theory. As these latent variable models allow the measurement model between the target construct (as latent variable(s)) and the measure (e.g., the items) to be formally specified and estimated using a set of equations, a test verifying whether this set of equations can be assumed equivalent at each time of measurement can be seen as a formal test of the violation of the PCI [45,46,47]. Sébille et al.’s critical review of the literature also demonstrated that there are other response shift methods that examine discrepancies between target change and observed change. To provide a starting point, a selection of approaches to test specific parts of the proposed model is presented in supplementary eText 1. It should be noted that these are mere examples, not intended to narrow the presented model or the range of potential statistical or psychometric methods. We anticipate that findings that either support or refute this revised model will require multiple studies employing a variety of methods.
As mentioned before, we assume that response shift, defined as a special case of violation of the PCI, is caused by “a change in the meaning of one’s self evaluation of a target construct”. Our formal definition has the advantage that it separates response shift from its possible causes. It also separates response shift from its methods of detection. Indeed, any method that can detect violations of the PCI in longitudinal data is able to detect response shift. However, as discussed by Sébille et al., violation of the PCI may be considered a necessary but not a sufficient condition for the occurrence of response shift. That is, violation of the PCI may not always imply change in the meaning of one’s self-evaluation. Hence, if we further restrict the definition of response shift by requiring that it must be caused by a change in the meaning of one’s self-evaluation, then alternative explanations need to be ruled out before the conclusion that response shift has occurred is warranted (Table 1).
Lastly, our definition and model rely on multiple epistemic, methodological and practical assumptions (Table 2). In our definition and model, response shift is understood to be an effect that occurs when the construct is not similarly measured over time. Thus, the model treats response shift as a discrepancy between a theoretical model where observed change is fully explained by target change at each time of measurement and what happens in reality. Our definition and model seem to conflict with some of the disability literature. Disability-positive testimonies and the disability pride movement advocate that QoL and functioning with disability can be good. These testimonies make a particular point of emphasizing that mechanisms such as coping transform constructs such as QoL and functioning. Put differently, disability-positive testimonies argue that these constructs are heavily idiosyncratic. This alternative conception can help to recognize that our definition and model are deeply connected with the idea of measuring a construct in a quantitative manner and are therefore possibly a better fit for a nomothetic approach to constructs using statistical modeling on empirical quantitative data.