We agree with Ubel, Peeters, and Smith [1] that distinct and distinguishable phenomena are subsumed under the term “response shift” and that the field would benefit from conceptual clarity. Without acknowledging the literature of the past 5 years, they argue that researchers should replace the term “response shift” with “recalibration” and “adaptation.” In this counterpoint, we will argue that (1) our understanding of the components and implications of response shift refutes part of their criticism; (2) their suggestion will unfortunately not solve the identified problems; and (3) the recently published approaches to disentangling the different components of response shift hold more promise for furthering the field.

The controversies now appearing in the literature are a clear sign that the field of response shift is maturing. Ubel, Peeters, and Smith [1] are to be commended for initiating a debate about the need for conceptual clarity in defining response shift. They contend that “the term response shift is currently being used to lump together distinct phenomena” [1; page 2]. We wholeheartedly agree with them. Under the heading “Is it the ‘Emperor’s new clothes’ or not?,” we noted in 2005 that “While a decade ago response shift was a rare term in QL research, it now seems to have evolved into a buzzword that is often used in circumstances where contradictory ‘findings we do not understand’ are described. If we allow response shift to encompass ‘everything’ it has lost its meaning, and loses its potential to further QL science” [2]. In the same chapter, we also noted that when we talk with colleagues about response shift, we are always struck by the diverse meanings associated with the term. Moreover, the operational definitions and detection procedures vary widely. Inspired by our conversations with Dr. Frans Oort, we summarized the interpretational dilemmas according to six polar descriptions, two examples of which are bias versus meaningful change and measurement versus subject characteristics. These two dimensions may capture what Ubel and colleagues mean by measurement error versus mechanisms by which people’s true QL changes [1; page 3].

Ubel and colleagues provide two enlightening hypothetical case studies to illustrate their concern about the term response shift. However, we interpret these differently ‘through the lens of response shift.’ In the first case study, the QL score of the man who became paraplegic as a result of an accident has risen over time. We do not doubt that this man genuinely experiences a better QL. However, this change cannot be equated with true change in a strict psychometric sense, i.e., mean change of an invariant construct. His QL outcome is affected by the shift in focus to what he physically can do instead of what he cannot do (reprioritization response shift) and by spirituality having become an important component of his QL, whereas it was irrelevant immediately prior to the accident (reconceptualization response shift). Oort proposed a procedure for the detection of response shifts in the measurement of true change through structural equation modeling. His operationalizations of response shifts are based on the idea that reconceptualization refers to a change in the meaning of the item content (i.e., a change in the pattern of common factor loadings); reprioritization refers to a change in the relative importance of the item as an indicator of the target construct (i.e., a change in the values of common factor loadings); and recalibration refers to a change in the meaning of the response options of the item (i.e., a change in intercepts and residual factor variances). Only if these three types of change have been accounted for or ruled out does a change in the common factor means reflect true change [3, 4]. In other words, this approach counters Ubel and colleagues’ claim that reprioritization and reconceptualization do not necessarily invalidate QL change. Thus, by Oort’s and our interpretation, all three aspects of response shift are validity threats to within- and between-person comparisons if they remain undetected and unadjusted.
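
The operationalization described above can be sketched with a linear common factor model measured at two occasions. This is a minimal illustration under stated assumptions, not a reproduction of the procedure in [3, 4]: the symbols (item scores x, intercepts τ, loadings Λ, common factor means κ, residual variances Θ) are illustrative notation, and residuals are assumed to have zero mean.

```latex
% Illustrative notation (assumed, not quoted from the cited work):
% x_t = observed item scores at occasion t (t = 1, 2); \xi_t = common factors.
% Requires amsmath.
\begin{align}
x_t &= \tau_t + \Lambda_t \xi_t + \delta_t, \qquad
E(\xi_t) = \kappa_t, \quad E(\delta_t) = 0, \quad
\operatorname{Cov}(\delta_t) = \Theta_t, \\
E(x_2) - E(x_1) &= (\tau_2 - \tau_1)
  + (\Lambda_2 - \Lambda_1)\,\kappa_2
  + \Lambda_1\,(\kappa_2 - \kappa_1).
\end{align}
```

Under these assumptions, the first term of the observed mean change reflects a change in intercepts (recalibration), the second a change in the pattern or values of the factor loadings (reconceptualization or reprioritization), and only the third, a change in the common factor means, corresponds to true change. Changes in the residual variances Θ do not enter the mean difference but alter the item variances, which is why they must be examined separately.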

In the second case, a woman reinterprets her pain level caused by a leg wound after experiencing a more painful bout of kidney stones. We agree with Ubel and colleagues that this woman recalibrated the pain scale while experiencing stable pain. We only want to highlight that recalibration may also be at play when pain levels genuinely change. In other words, recalibration is not restricted to unchanged outcomes. To provide another example, imagine a patient with prostate cancer who feels downhearted and blue at the beginning of radiotherapy because he is anxious about the treatment and his prospects in general. During radiotherapy, he meets other men who are doing much worse, both physically and psychologically. Based on downward comparison, he believes he is much better off and his spirits lift a little. While he has recalibrated the ‘emotional functioning scale’ since the start of radiotherapy, he may genuinely experience and report a better mood. Interestingly, this is another example of what Ubel and colleagues might want to call “emotional adaptation” that, in this case, induces recalibration response shift.

We agree with Ubel and colleagues when they state “Clearly, to understand the QL of people with chronic illness or disability, it is important not only to know what their overall QL is, but also to understand what they mean by QL.” [1; page 11]. This is exactly what motivated our research into response shift. We also could not agree more with the statement “Yet, when people find happiness by shifting their values, their high self-reported QL may simply reflect that they have a good QL!” [1; page 12]. The Rapkin and Schwartz [5] extension to our theoretical model explicitly proposes to measure four appraisal parameters that comprise what patients mean by QL. In their companion paper on the psychometric implications of response shift, they propose the standard of a “contingent true score” whereby scores are deemed comparable, conditional on sharing the same appraisal parameters [6].

Ubel and colleagues’ suggestion that response shift researchers do not trust high levels of reported QL is an unwarranted and misconceived representation. The key is that response shift is only an issue when QL scores are compared, either within individuals over time or between individuals who have different perspectives on QL (e.g., the young versus the old) at one point in time. Response shift is not invoked when interpreting a high or, for that matter, low QL score of an individual or group at one point in time.

A confusion that Ubel and colleagues highlight is worth mentioning. They rightly point out that changing values can be a mechanism by which people emotionally adapt to illness or disability, thereby exhibiting response shift. In other words, there is a logical circularity “if the operationalization of a mechanism is synonymous with the operationalization of response shift …” and “when the process of response shift becomes synonymous with the outcome of response shift” [7; pages 74–75]. The Rapkin and Schwartz model deals with this circularity by measuring appraisal and changes in appraisal and inferring response shift when changes in appraisal explain discrepancies between expected and observed QL. The distinction between the process (adaptation) and the outcome of response shift has also motivated work by Oort and colleagues [8], in which they propose two formal definitions of response shift: response shift according to the measurement perspective and response shift according to the conceptual perspective. Each definition was formulated using the same (statistical) terminology. They also revisited the six above-mentioned interpretational dilemmas and showed how these can be resolved with these formal definitions. By disentangling these two perspectives, the authors hope to facilitate response shift research by providing a consistent and clear terminology, formal definitions, and sound statistical approaches.

We sympathize with the difficulty Ubel and colleagues have with the term response shift. When we started our research into response shift, we vehemently discussed the usefulness of this term, as semantically it does not capture the phenomena it purports to describe. Not only may responses shift, but so may the underlying conceptualization and the relative importance of its constituent domains. However, since response shift was the term already in use and, although originally defined as scale recalibration, had evolved to include reconceptualization [9], we decided to adopt this term for use in QL research. We do not believe that either its misleading connotations or the equating of response shift with scale recalibration or measurement error warrants abandoning this concept. As indicated above, presenting the issue as “disentangling measurement error,” specifically scale recalibration, from “true change” is too simplistic. Additionally, using the terms “scale recalibration” versus “adaptation” instead of response shift is not likely to help. According to our understanding, adaptation is a mechanism or process, whereas scale recalibration and the other two types of response shift are the outcome. Thus, this distinction mixes two levels of abstraction at best and is a confusing dichotomy at worst.

We do agree, however, that research into response shift has been hampered by conceptual and operational confusion and that precise language and specific terminology are needed. We believe the field would be helped by explicitly distinguishing between recalibration response shift, reprioritization response shift, and reconceptualization response shift. We also believe that Oort’s distinction between response shift according to the measurement perspective and according to the conceptual perspective is a helpful step forward. We agree that the widely used then-test has significant limitations and have suggested guidelines to make research using this design approach more stringent [10]. Clearly, other methods are needed, and we feel encouraged by the novel analytic approaches built on sound scientific foundations that are being put forward in our field, such as structural equation modeling [11–13], latent trajectory analysis [14], and classification and regression tree analysis [15].

In summary, we agree with Ubel et al.’s basic concern that conceptual confusion surrounds response shift research and needs to be resolved. We believe that the confusion is in part due to the complexity of the phenomenon and in part due to the relatively early stage of the field. As response shift research has developed over the past decade, it has stimulated many further questions. The public debate initiated by Ubel and colleagues is timely and will hopefully accelerate progress. We are indebted to them for this initiative.