Time trade-off health state utility values for depression: a systematic review and meta-analysis

Purpose: This study aims to systematically review the literature on health utility in depression elicited with the time trade-off (TTO) method and to compare health state vignettes.
Methods: A systematic literature search was conducted following the PRISMA guideline in November 2020 (updated in March 2022) in PubMed, Web of Science, PsycINFO, and the Cochrane Database of Systematic Reviews. Random-effects meta-analysis was conducted to pool vignette-based utility values of mild, moderate, and severe depression and to compare the preferences of depressed and nondepressed populations.
Results: Overall, 264 records were found; after duplicates were removed, 143 were screened by title and abstract, 18 were assessed in full text, and 14 original publications were included. The majority of studies (n = 9) used the conventional TTO method, and most (n = 8) applied a 10-year time frame. Eight studies evaluated self-experienced health (own current depression). Six studies assessed vignette-based health states of remitted, mild, moderate, and severe depression; half of them applied a McSad-based health description. Altogether, 61 different utility values were cataloged; the mean utility of self-experienced depression states (n = 33) ranged from 0.89 (own current depression) to 0.24 (worst experienced depression). Pooled utility estimates for vignette-based mild, moderate, and severe depression were 0.75, 0.66, and 0.50, respectively. Meta-regression showed that severe depression (β = −0.16) and depressed sample populations (β = −0.13) significantly decreased vignette-based utility scores.
Conclusion: Our review revealed extensive heterogeneity in both TTO methodology and health state vignette development. Patients' perceptions of depression health states were worse than those of healthy respondents.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11136-022-03253-5.


Introduction
Depression is a common mental disorder worldwide, affecting more than 264 million people of all age groups [1]. It spans a wide range of severity levels and varies in intensity across its forms (e.g., major depression, bipolar disorder, affective disorder), and it is characterized by typical physiological and mental symptoms. It can cause sleep and eating dysfunctions, impair emotional and cognitive functions, harm individuals' self-assessments, and disrupt role functioning and ordinary behaviors. Depression is regarded as a chronic condition [2] that can cause functional impairment, leading to deterioration in health-related quality of life (HRQoL) and a decline in subjective perceptions of social, occupational, and health-related well-being [4,5].
Health economic evaluations frequently use the quality-adjusted life year (QALY) to quantify health gains. The QALY combines two elements: quality of life, measured by health state utility, and quantity of life, expressed as life expectancy. One year in full health equals one QALY [6]. Utility can be elicited either directly, using methods such as the time trade-off (TTO), the standard gamble (SG), the visual analog scale (VAS), or discrete choice experiments, or indirectly. Direct utility is obtained from people's preferences for a given health state. Indirect utility is derived from respondents' answers to generic (e.g., Health Utility Index, SF-6D) or disease-specific (e.g., Hamilton Depression Rating Scale, Patient Health Questionnaire-9) HRQoL questionnaires, whose scores can be transformed into utilities using societal or patient preference weights derived from direct utility assessments [3].
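As a minimal illustration of the QALY arithmetic described above, the sketch below multiplies the utility of each period by its duration in years and sums the products. The function name `qalys` and the example profile are hypothetical, and discounting, which many economic evaluations apply, is deliberately left out.

```python
from typing import Iterable, Tuple


def qalys(periods: Iterable[Tuple[float, float]]) -> float:
    """Return undiscounted QALYs for a sequence of (utility, years) periods."""
    return sum(utility * years for utility, years in periods)


# One year in full health (utility 1.0) equals one QALY.
print(qalys([(1.0, 1.0)]))  # 1.0

# Hypothetical profile: 2 years at utility 0.66, then 3 years at utility 0.75.
print(round(qalys([(0.66, 2.0), (0.75, 3.0)]), 2))  # 3.57
```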
The most frequently recommended generic HRQoL instrument for eliciting indirect utility is the EQ-5D questionnaire (EQ-5D-3L and/or EQ-5D-5L) [7], while TTO is a commonly and strongly recommended method for eliciting direct utility. Owing to its explicit relationship with the QALY, its relative simplicity, respondents' stronger preference for it compared with SG, and its better compliance with the theoretical axioms of economic evaluation compared with VAS, TTO has become one of the most popular direct health state preference elicitation techniques over the past 30 years [8].
The time trade-off task is designed to elicit the point at which a respondent is indifferent between living for a period of time 't' in a particular imperfect health state and living for a shorter period of time 'x' in full health [9]. The traded years 't−x' represent the amount ("price") of life that the respondent is willing to sacrifice for quality over quantity of life. Utility is calculated directly from the point of indifference ('x'), at which the respondent values the two alternatives equally, as U = x/t. A health utility of '1' equals full health, '0' indicates dead, and negative values represent health states worse than death (WTD) [10].
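The conventional TTO formula can be sketched as a pair of small functions, applied here to a respondent who is indifferent between 9 years in full health and 10 years in the evaluated state. The function names are hypothetical, and the sketch only covers states valued better than dead (0 ≤ x ≤ t); scoring of worse-than-death states follows protocol-specific formulas and is deliberately not implemented.

```python
import math


def tto_utility(x: float, t: float) -> float:
    """Conventional TTO utility U = x / t for a state valued better than dead.

    x: years in full health at the indifference point.
    t: years in the evaluated (imperfect) health state.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    if not 0 <= x <= t:
        raise ValueError("this sketch only covers 0 <= x <= t")
    return x / t


def traded_years(x: float, t: float) -> float:
    """Years of life given up (the 'price') to avoid the imperfect health state."""
    return t - x


u = tto_utility(9, 10)
print(u, traded_years(9, 10))  # 0.9 1
# At the indifference point both alternatives yield the same QALYs: x * 1.0 == t * U.
print(math.isclose(9 * 1.0, 10 * u))  # True
```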
For example, in a TTO exercise, a respondent chooses between living 10 years with mild depression and living 9 years in full health. If the respondent is indifferent between these two alternatives, the utility of mild depression is the ratio of the indifference point to the length of the time period in question: U = x/t = 9/10 = 0.9. If 1 year spent in full health equals one QALY, the two alternatives yield the same number of QALYs:

$$9\ \text{years} \times 1.0\ (\text{full health}) = 9\ \text{QALYs} = 10\ \text{years} \times 0.9\ (\text{mild depression})$$

TTO features a diverse methodology: the time frame, the iteration process, the smallest tradable amount, the assessed health state (current-own/self-experienced vs. vignette-based/hypothetical), the number and order of evaluated health states, and the method of data collection can all vary. Researchers must compromise between adapting the method to the specific attributes of their studies and following a standard protocol to ensure the comparability of results [11]. Many studies have used conventional, composite, or 'indifference in one answer' variants of TTO [12,13], but alternative forms such as waiting or sleep trade-offs have also appeared in certain papers [14]. Whether a conventional, composite, or other protocol is followed has a crucial impact on the resulting utilities [12,15,16]. Differences in time frame [17], valuing vignette-based vs. self-experienced health states [18,19], the responding population [20,21], the health state description system [22], the iteration process [23], and the anchor health state at 'utility = 1' [24,25] can all alter the resulting utilities. It is also important to note that, in addition to methodological attributes, the vignettes used to describe the disease and the clinical assessment tools employed can differ significantly.
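Because the iteration process and the smallest tradable amount are among the attributes that vary across TTO studies, the following sketch simulates one possible iteration scheme, a bisection search over a grid of tradable units, against a hypothetical respondent with a fixed latent utility. The bisection routine, the respondent model, and all names are assumptions for illustration only; they are not the protocol of any included study.

```python
from typing import Callable, Tuple


def bisect_tto(prefers_full_health: Callable[[float], bool],
               t: float = 10.0, step: float = 0.5) -> Tuple[float, float, float]:
    """Bracket the indifference point x on a grid of `step`-year units.

    Assumes the state lies between dead and full health, so the respondent
    prefers full health at x = t and the impaired state at x = 0.
    Returns (x_low, x_high, utility), where utility uses the bracket midpoint.
    """
    low, high = 0, round(t / step)  # bounds measured in grid units
    while high - low > 1:
        mid = (low + high) // 2
        if prefers_full_health(mid * step):
            high = mid  # full health still preferred: offer fewer healthy years
        else:
            low = mid   # impaired state preferred: offer more healthy years
    x_low, x_high = low * step, high * step
    return x_low, x_high, (x_low + x_high) / 2 / t


def make_respondent(latent_utility: float, t: float = 10.0) -> Callable[[float], bool]:
    """Hypothetical respondent: prefers x years in full health iff x > U * t."""
    return lambda x: x > latent_utility * t


print(bisect_tto(make_respondent(0.73), step=0.5))  # (7.0, 7.5, 0.725)
print(bisect_tto(make_respondent(0.73), step=1.0))  # (7.0, 8.0, 0.75)
```

With the same latent preference, a half-year grid yields 0.725 while a one-year grid yields 0.75, which illustrates how the smallest tradable amount alone can shift an elicited utility.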
Health state vignettes describe a given health state in the context of TTO tasks. A description of 'mild depression', for example, may differ in disease domains and severity across vignettes [26,27]. The aim of a health state vignette is to depict the disease as precisely as possible: its domains describe disease-specific burdens and attributes while differentiating the stages of disease severity [28]. Health descriptions are critical for obtaining accurate utilities, and differences between similar health state descriptions are possible sources of systematic differences in responses [29,30]. Several studies have reported that the presentation of the valuation task affects the values thus elicited [31-33]. Vignettes can be developed in different ways; typically, they are based on a literature or scoping review or on consultation with (health) professionals. Recent reviews have reached conflicting conclusions regarding the impact of vignettes on utility estimates [34,35].
Although a large number of empirical HRQoL outcome studies have focused on patients with depression, to the best of our knowledge only two systematic reviews summarizing HRQoL outcome studies in depression have been published. Mohiuddin et al. reviewed utilities derived from the EQ-5D and the standard gamble (SG) in unipolar depression, while Brockbank et al. examined studies reporting the effects of treatment of major depressive disorder [36,37]. Neither group searched specifically for studies that applied TTO utility measurement, although they did identify two empirical TTO studies [38,39]. The search methods used by previous reviews in depression all differ from the approach taken in the current review. To address this research gap, our study aims to systematically review all original articles that report direct, depression-related utility elicited by the TTO method and to describe the associated (1) study characteristics, (2) vignette development, and (3) cataloged utilities for depression-related health states. The secondary aim of this study is to (4) estimate the pooled utilities of depressed and healthy populations for mild, moderate, and severe depression-related health states based on vignettes.

Search strategy
A systematic literature search was conducted in November 2020 following the principles of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [40]. The databases searched were PubMed, Web of Science, PsycINFO, and the Cochrane Database of Systematic Reviews. No language or publication date restrictions were applied. The search was updated in March 2022 to ensure that the review was up to date. The keywords were selected through discussion among the authors. The search strategy combined the following terminological variants: 'time trade-off,' 'time tradeoff,' 'time trade off,' 'TTO,' and 'depression' (for the detailed search strategy, see Supplementary Material 1). Citation tracking was conducted by hand-searching the reference lists of eligible studies.
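The combination of term variants described above can be illustrated with a generic Boolean query: the TTO variants are joined with OR and then combined with the condition term via AND. This is a simplified sketch; the database-specific strategies (field tags, controlled vocabulary) are given in Supplementary Material 1 and are not reproduced here, and the helper names are hypothetical.

```python
TTO_TERMS = ["time trade-off", "time tradeoff", "time trade off", "TTO"]
CONDITION_TERMS = ["depression"]


def or_block(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(method_terms, condition_terms):
    """Combine the method block and the condition block with AND."""
    return f"{or_block(method_terms)} AND {or_block(condition_terms)}"


print(build_query(TTO_TERMS, CONDITION_TERMS))
# ("time trade-off" OR "time tradeoff" OR "time trade off" OR "TTO") AND ("depression")
```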

Study selection
After removing duplicate studies, two independent reviewers (PB, DE) screened the titles and abstracts of selected articles to determine their eligibility for the study (based on the exclusion and inclusion criteria). Disagreements were resolved via joint discussion among the authors (PB, DE, VB). Potentially relevant full-text articles were retrieved and screened in full; nonrelevant studies were excluded based on eight exclusion criteria.
The exclusion criteria for the title and abstract screening were as follows, listed in hierarchical order: no English abstract; English abstract of a non-English full-text paper; not a journal article (e.g., abstract, editorial, letter to the editor); animal, in vitro, in silico, or other preclinical study; abstract unrelated to the relevant disease (depression); review or secondary search; EQ-5D valuation study; and use of TTO to elicit utility for health states described by the EQ-5D descriptive system. The included articles were (1) empirical studies, (2) primary TTO studies focused on depression or depressed states as evaluated via TTO, and (3) studies of samples of patients, health professionals, or a general or combined population.
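Applying criteria "in hierarchical order" means that each excluded record is attributed to the first criterion it meets, so every record is counted under exactly one reason. The sketch below shows this rule under the assumption that reviewers tag each record with boolean flags; the flag names are hypothetical labels, not fields used by the authors.

```python
from typing import Dict, Optional

# Title/abstract exclusion criteria in the hierarchical order listed above.
# The flag names are hypothetical labels a reviewer would assign to a record.
EXCLUSION_CRITERIA = [
    ("no_english_abstract", "no English abstract"),
    ("non_english_full_text", "English abstract of non-English full-text paper"),
    ("not_journal_article", "not a journal article"),
    ("preclinical", "animal/in vitro/in silico or other preclinical study"),
    ("unrelated_disease", "unrelated to depression"),
    ("review", "review/secondary search"),
    ("eq5d_valuation", "EQ-5D valuation study"),
    ("tto_for_eq5d_states", "TTO for health states described by EQ-5D"),
]


def screen(record: Dict[str, bool]) -> Optional[str]:
    """Return the first exclusion reason that applies, or None if the record passes."""
    for flag, reason in EXCLUSION_CRITERIA:
        if record.get(flag, False):
            return reason
    return None


# A review on an unrelated disease is counted under the higher-ranked criterion.
print(screen({"review": True, "unrelated_disease": True}))  # unrelated to depression
print(screen({}))  # None
```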

Data extraction
This review summarizes general information regarding the articles, such as their authors, years of publication, study settings, countries, and main study objectives. Furthermore, the following information was extracted: (1) study population; (2) sample size; (3) proportion of women; (4) age; (5) data collection method; (6) TTO method; (7) time frame; (8) evaluated health state; (9) number of vignettes; (10) description of health states; (11) type of vignette used; (12) number of health-description domains; and (13) health state utility (mean and SD, if available). All included full-text articles were distributed equally between PB and DE, who independently extracted data from the articles. Unclear cases were resolved via joint discussion. Because there is no standard quality assessment protocol for TTO measurements, and given the age of the included studies, we evaluated the TTO studies against seven aspects of the earlier EuroQol valuation protocol, the Measurement and Valuation of Health (MVH): framework, time horizon, anchor state, iteration algorithm, mode of administration, method of data collection, and respondent training [41].

Statistical analysis
Meta-analysis was conducted using a random-effects model with restricted maximum likelihood (REML) estimation to pool mean utility estimates. Two eligibility criteria were applied: utilities were pooled only if they (1) referred to mild, moderate, or severe depression and (2) were elicited for health states described by vignettes, ensuring that comparisons were made only between vignette-based health states. Missing standard deviations (SDs) were replaced by the sample size-weighted average of the reported SDs [42]. The average utilities of the three vignette-based health states (mild, moderate, and severe depression) were compared between depressed and nondepressed populations. The effects of four binary-coded variables on utility, namely vignette type (McSad or other), population group (depressed or nondepressed), method of data collection (self-completed or interviewer-administered), and depression severity (mild or severe), were analyzed via meta-regression. Heterogeneity was assessed using I², the percentage (0-100%) of the total variation in utility estimates across studies that is attributable to between-study differences rather than chance. The dispersion between studies (the variance of true utilities) was estimated as τ² and its square root τ [43]. The meta-analysis forest plots and the meta-regression were produced using Stata 16.0 (StataCorp LLC).
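The pooling steps above can be sketched as follows: missing SDs are imputed with the sample size-weighted mean of the reported SDs, each study mean receives an inverse-variance weight, and a between-study variance τ² widens those weights. Note one deliberate substitution: the authors used REML in Stata, whereas this sketch uses the simpler closed-form DerSimonian-Laird moment estimator of τ², so its estimates can differ from REML results. All function names and the example data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def impute_missing_sds(sds: Sequence[Optional[float]], ns: Sequence[int]) -> List[float]:
    """Replace missing SDs with the sample size-weighted mean of the reported SDs."""
    reported = [(sd, n) for sd, n in zip(sds, ns) if sd is not None]
    weighted_mean = sum(sd * n for sd, n in reported) / sum(n for _, n in reported)
    return [weighted_mean if sd is None else sd for sd in sds]


def pool_random_effects(means: Sequence[float], sds: Sequence[float],
                        ns: Sequence[int]) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird random-effects pooling of study mean utilities.

    Returns (pooled mean, its standard error, tau^2, I^2 in percent).
    """
    v = [sd ** 2 / n for sd, n in zip(sds, ns)]   # sampling variance of each mean
    w = [1.0 / vi for vi in v]                     # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, means)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, means))  # Cochran's Q
    df = len(means) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, means)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2, i2


# Hypothetical data: three vignette-based utilities, one study without a reported SD.
ns = [100, 50, 100]
sds = impute_missing_sds([0.20, None, 0.30], ns)
pooled, se, tau2, i2 = pool_random_effects([0.70, 0.62, 0.80], sds, ns)
print(f"pooled={pooled:.3f} se={se:.3f} tau2={tau2:.4f} tau={math.sqrt(tau2):.3f} I2={i2:.1f}%")
```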

Study selection
Overall, 264 records were found across four databases (PubMed: 104; Web of Science: 107; PsycINFO: 32; Cochrane: 21). After duplicates were removed (n = 121), the titles and abstracts of 143 articles were screened, of which 125 were excluded; the most common reasons for exclusion were diseases/health states unrelated to depression and EQ-5D valuation studies. The remaining 18 articles were assessed in full text, and a further 4 were excluded (not in English = 1 [44]; not focused on depression = 2 [45,46]; design for an unrealized trial = 1 [47]). In total, 14 articles met the inclusion criteria [38,48-58,59,60]. Two publications from one study [59,60] were merged because they used the same sample population and TTO method. One additional empirical TTO publication was found by hand-searching reference lists, resulting in 14 included studies. The updated search found 42 new records (updated totals: PubMed: 121; Web of Science: 124; PsycINFO: 39; Cochrane: 22), none of which were eligible for inclusion (Fig. 1).
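The selection counts reported above can be checked arithmetically. The sketch assumes that the per-database figures given for the updated search are cumulative totals, which is the reading under which they imply 42 new records; the function name is hypothetical.

```python
def prisma_flow() -> dict:
    """Recompute the screening counts reported for the initial and updated searches."""
    initial = {"PubMed": 104, "Web of Science": 107, "PsycINFO": 32, "Cochrane": 21}
    # Assumption: the updated per-database figures are cumulative totals.
    updated = {"PubMed": 121, "Web of Science": 124, "PsycINFO": 39, "Cochrane": 22}
    records = sum(initial.values())       # records identified
    screened = records - 121              # after removing duplicates
    full_text = screened - 125            # after title/abstract exclusions
    articles = full_text - 4              # after full-text exclusions
    studies = articles - 1 + 1            # merge [59,60] into one study, add hand-searched study
    new_records = sum(updated.values()) - records
    return {"records": records, "screened": screened, "full_text": full_text,
            "articles": articles, "studies": studies, "new_records": new_records}


print(prisma_flow())
# {'records': 264, 'screened': 143, 'full_text': 18, 'articles': 14, 'studies': 14, 'new_records': 42}
```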

Study characteristics
The studies were heterogeneous in terms of study setting, patient characteristics, health status, data collection, and description of health status vignettes ( Table 1). All studies were published between 1991 and 2020, and they were conducted in 9 countries: 4 in the US [39,48,49,56], 2 in

Health state vignettes
The reviewed vignettes and their characteristics are compared in Table 3. Overall, six studies (43%) employed a vignette-based TTO task [51, 53-55, 57, 58] describing remitted, mild, moderate, and severe states of depression. The vignettes covered 11 dimensions, including emotions, physiology, mood, anxiety, cognition, behaviors, role function, social relations, usual activities, and self-appraisal. The number of dimensions used per study ranged from 1 to 6, with a mode of 6. The number of evaluated health states ranged from 1 to 8. Two studies (33%) evaluated mild, moderate, and severe depression [55,58]; one study (17%) evaluated no/in-remission, mild, moderate, and severe depression [51]; two studies investigated only severe depression [53,57]; and one study evaluated mild and severe depression separately, alongside three co-occurring diseases (cancer, diabetes, and heart disease) [54]. The number of vignettes designed for these studies ranged from 1 to 30. Almost all studies (83%) used one vignette per level of depression severity [51,53,54,57,58], whereas one study [55] designed 4 mild, 17 moderate, and 9 severe vignettes to differentiate vignette-based depression-related health states. Regarding presentation, three studies (50%) used scenarios interpreted from a third-person perspective [51,54,55], whereas three studies (50%) used statements as descriptions [53,57,58]. Disease attributes were covered in a similar manner: items on emotions, physiological functioning, and cognition appeared in five of the six descriptions. Only one research group used the same vignette design in two studies [54,55]; otherwise, descriptions differed notably across vignettes.
Three of the six descriptions used the McSad depression scale (originally developed for direct utility measurement), although the method of presentation differed across cases (statements vs. scenario) [54,55,57]. One study used 6 items of the SF-12 mental component summary (MCS), presented as statements, and altered the original response options to describe 4 levels of depression severity [51]. Another study used a single-dimension, single-statement description based on the first item of the mental health-specific Tolerability and Quality of Life (TooL) questionnaire [53]. One study did not disclose the basis of its description design; however, its description closely resembled the scenario-based McSad vignette [58].

Health state utilities
Overall, 61 utilities were extracted from 14 original studies. Nine studies (64%) [39, 48-50, 52, 53, 56, 58-60] reported utilities of depression patients (physician- or self-diagnosed), two studies focused on nondepressed populations (general population, health professionals) [51,54], and three included both depressed and nondepressed populations [38,55,57]. Supplementary Material 2 contains the utility catalog, which describes the included study populations alongside the health states and utilities examined (means, SDs). Eight studies (57%) [38, 39, 48-50, 52, 56, 59, 60] reported 36 utilities for 33 different self-experienced depression states, ranging from 0.89 (self-experienced health state of US depression patients) to 0.24 (worst own health state experienced by Canadian depression patients). Six studies (43%) [51, 53-55, 57, 58] reported 25 vignette-based utilities for no, mild, moderate, and severe depression, as well as for six comorbid conditions co-occurring with depression. These values ranged from 0.96 (reported depression as evaluated by health professionals) to 0.31 (patients' perceptions of severe depression resulting from mental illness).

Results of the meta-analysis
Our meta-analysis examined three different vignette-based depression-related health states (mild: n = 5; moderate: n = 4; severe: n = 9) derived from patients (n = 3), the general population of depressed/nondepressed persons (n = 4) and health professionals (n = 2). Utility estimates were pooled and compared between the study populations without depression (n = 11) and with depression (n = 7).
High I² values were observed for the pooled utilities of the moderate (92.1%) and severe (93.8%) depression subgroups. Between-study variance was low in all subgroups (τ² = 0.000-0.020). The meta-regression showed that evaluating a severe depression state (β = −0.16) and sampling a depressed population (β = −0.13) had small but significant (p < 0.05) negative effects on the resulting utilities (Table 4).
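The meta-regression coefficients reported above can be read as shifts in expected utility associated with each binary moderator. As a minimal sketch of how such coefficients arise, the code fits a weighted least-squares regression with random-effects weights 1/(vᵢ + τ²) on synthetic data (not the review's data). In practice the residual τ² is estimated from the data (e.g., by REML); here it is passed in as a fixed value for simplicity, and all names are hypothetical.

```python
import numpy as np


def meta_regression(y, v, covariates, tau2):
    """Weighted least-squares meta-regression with an intercept.

    y: study utilities; v: their sampling variances; covariates: (k, p) array of
    binary moderators; tau2: residual between-study variance, taken as given here.
    Returns (coefficients, standard errors), intercept first.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    w = 1.0 / (np.asarray(v, dtype=float) + tau2)   # random-effects weights
    cov = np.linalg.inv(X.T @ (w[:, None] * X))     # (X'WX)^-1
    beta = cov @ (X.T @ (w * y))                    # (X'WX)^-1 X'Wy
    return beta, np.sqrt(np.diag(cov))


# Synthetic data; moderator columns are (severe, depressed) indicators.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [0.80, 0.60, 0.70, 0.50]
beta, se = meta_regression(y, [0.001] * 4, X, tau2=0.0)
# Exact fit: intercept 0.80, severe -0.20, depressed -0.10.
print(np.round(beta, 3), np.round(se, 4))
```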

Discussion
In our review, all empirical studies reporting vignette-based or self-experienced TTO utilities for depression were collected to produce a catalog of reported utilities and to compare vignettes describing health states. The pooled mean utilities of different depression-related health states elicited from both patients and healthy individuals were estimated in a meta-analysis. Our review identified 14 articles reporting 36 self-experienced and 25 vignette-based utilities for depression-related health states. The utilities reported by two randomized controlled trials (measuring the effects of usual care vs. rehabilitation and enhanced psychotherapy & management care) [39,59,60] accounted for one third of the 61 health state utilities cataloged. The mean utility of depression patients' self-experienced health ranged from 0.89 to 0.24, while the vignette-based mean utilities of mild, moderate, and severe depression ranged from 0.91 to 0.66, from 0.79 to 0.49, and from 0.66 to 0.31, respectively. For comparison, previously reviewed SG and EQ-5D utilities in unipolar depression ranged from 0.92 to 0.09 and from 0.90 to 0.14, respectively [36].
Currently, little is known about the overall impact of different health state vignettes on elicited TTO utilities. Vignette designs are not standardized, and our findings support the use of a common approach to vignette development [26,29,61]. The overall pooled vignette-based TTO utilities for mild, moderate, and severe depression were 0.75, 0.66, and 0.50, respectively. Meta-regression revealed a significant negative impact of severe depression (β = −0.155) and of a depressed population (β = −0.128) on these pooled utility estimates. The high proportion of heterogeneity suggests additional subgroup or moderator effects, especially in cases of severe and mild depression. Our pooled TTO utilities were higher than the SG and EQ-5D utilities reported in a previous meta-analysis of studies of patients with unipolar depression (TTO, SG, and EQ-5D, respectively: mild 0.75, 0.69, and 0.56; moderate 0.66, 0.52, and 0.45; severe 0.50, 0.27, and 0.25) [36]. Our findings support the claim that patients report different health utilities from the general population [22,62]. Many studies have suggested that patients generally report higher utilities, mostly attributable to experience-based evaluations [63,64]. Our results do not support this assumption: in all comparable (N = 18) vignette-based health states, depressed groups had lower pooled mean utilities for mild, moderate, and severe depression.
Marked differences between utility elicitation methods have been reported [65-67]. Nine of the included studies also used methods other than TTO to elicit utilities for the same health state. Three studies [49,50,57] compared rating scale (RS), SG, and TTO and reported a consistent order of self-experienced depression utilities: RS < TTO < SG. The RS < TTO ordering was further supported by a vignette-based assessment [51]. Another three studies compared SG and TTO valuations, and SG yielded higher utility in all cases [39,48,56]. A vignette-based evaluation of mild, moderate, and severe depression comparing 3 × 3 mean utilities indicated the order EQ-5D < TTO < VAS, with considerable differences between the three approaches [58]. In the context of affective disorder [52], the EQ-5D-3L utility based on the German value set (0.79) was higher than the TTO utility (0.66), whereas the utility based on the UK value set was lower (0.63). These findings suggest that the utility elicitation method may substantially affect the utility values associated with depression.

This study faces certain limitations. First, the substantial heterogeneity of the included studies made it difficult to compare utilities across studies and to analyze the impact of TTO task attributes on utility values. Similarly, the heterogeneity of the studies included in the meta-analysis and

Conclusions
Despite the wide range of empirical studies that have examined HRQoL in depression using generic or disease-specific tools [3,69,70], studies measuring health state utility via the time trade-off method had not previously been reviewed. To our knowledge, this study is the first to compare health state vignettes for depression and to provide a comprehensive catalog of TTO utilities. Our review revealed substantial heterogeneity in both TTO methodology and the development of health state vignettes. The meta-regression showed that a severe level of depression and the inclusion of a depressed sample decrease utility. Interestingly, in contrast with the findings of previous TTO reviews [63,71], patients' perceptions of depression-related health states were worse than those of healthy respondents.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.