Introduction

Demand for health care services exceeds public resources throughout the world. As a result, both rich and poor countries face health care resource allocation problems. As a small Caribbean country, Trinidad and Tobago imports medical technology and medical procedures from highly industrialized regions such as the United States (US), Canada and Western Europe. The government pays for virtually all of the health care services and is interested in the cost-effectiveness of different interventions [1].

Preference-based health-related quality of life (HRQoL) can be used to summarize effectiveness in terms of a single number, where 0.0 is dead and 1.0 is best possible health. This number, along with information on survival and point-in-time health-related quality of life, is used to estimate Quality-Adjusted Life Years, and additional information on duration of life and survival can be added to estimate Quality-Adjusted Life Expectancy [24]. The Quality of Well-Being Scale (QWB) is a widely used HRQoL preference measure that produces such a number (the QWB score) [2]. The QWB measure has four components that assess different aspects of HRQoL: three subscales, namely mobility (MOB), physical activity (PAC) and social activity (SOC), and a list of symptom and problem complexes (CPX). The subscales have steps that range from no disability to severe disability. The MOB and PAC subscales have three steps each. The SOC subscale has five steps. Each subscale step and CPX item is assigned a weight that contributes to the QWB score. These weights were originally obtained by having judges from a general population rate case descriptions made up of combinations of the QWB items. The existing weights for the QWB were derived from a US sample. It is unclear whether it is appropriate to use preference weights from the US to value health states from another culture [5]. To date, the QWB has not been evaluated in any Caribbean country.

This study assesses the extent to which the QWB preference weights derived in the United States can be generalized to Trinidad and Tobago. This English-speaking country was chosen because it is very different from the United States in terms of climate, culture, racial composition and social development. It is made up of two islands that are 22 miles apart. The population size is about 1.3 million with approximately 40% Indo-Trinidadian, 40% Afro-Trinidadian and approximately 18% Mixed race. The remainder of the population is of indigenous, European, Middle-Eastern, Chinese, and Filipino descent. The Central Statistical Office of Trinidad and Tobago estimates the average life expectancy at 70 years. The literacy rate as of 1995 was 98% [6]. If climate, culture, racial composition and social development affect the preferences that are used to produce weights for the QWB, then the QWB with weights derived in the United States should not be transportable to Trinidad and Tobago.

Methods

QWB weights were elicited for Trinidad using the same procedures used in the United States [2].

Trinidad and Tobago Survey

Data for the study were obtained from in-person household interviews that were completed between February 2000 and July 2001. The thirteen interviewers included health professionals (such as physicians, nurses, and health policy analysts) and clerk-level persons (such as vector control inspectors and census takers). All interviewers except the census takers were trained in a four-and-a-half-day workshop. The census takers were familiarized with the survey in a single 4-hour session.

Sample

We collected data to test the QWB and preference-elicitation procedures on a sample of the non-institutionalized general population in three highly populated areas in Trinidad. Adults were sampled from Port of Spain, Chaguanas and San Fernando. Approximately 96% of the respondents contacted gave consent and were interviewed.

Data collection

The interview included three parts: (1) lists of chronic conditions, (2) demographic, household composition, health services access, and consumer assessment of health services using the pertinent sections of the 1994 Trinidad and Tobago National Health Interview Survey (TT-NHIS) and the QWB, and (3) material for the elicitation of QWB preferences. If there was more than one willing adult, the individual with the most recent birthday was chosen [7]. Each adult respondent signed a consent form approved by the Ministry of Health of Trinidad and Tobago and San Diego State University Committee for the Protection of Human Subjects #0005172X. Staff from the Directorate of Health Policy and Planning managed the data.

Elicitation of QWB preferences

Respondents used a vertically oriented visual analog scale (0 up to 10) to visualize their judgments. Each participating respondent rated a set of 65 vignettes, where the 0 was used for the low anchor of dead (or a state as bad as death) that day and 10 for “top state” representing optimal functioning with no symptoms or health problems. The 65 vignettes were drawn from the universe of QWB health states. Each vignette contained one of the symptom/problem complexes and a step from each of the three subscales and an age range. We selected a set of vignettes such that each age range and each component of the QWB was included at least once. The respondent saw a booklet of 80 vignettes with approximately nine on each page. The first 15 vignettes were intended as “warm-ups” to familiarize respondents with the task.

Analysis

The analysis was conducted in two stages. The first stage derived preference weights for Trinidad. The second stage calculated QWB scores from the Trinidad preference weights and compared them with the QWB scores from the US preference weights (using paired-sample t-tests). Multiple linear regression models were built to obtain the preference weights. A mixed model with random effects and an ordinary least squares model with a correction for clustering within respondents were constructed. Both produced the same parameter estimates and very similar standard errors. Therefore, the more parsimonious ordinary least squares model was used. The dependent variable was the preference rating (0–10), and the independent variables were the CPX items and the steps of the function scales, entered as indicator variables (1 if present; 0 otherwise). In the regression model, the function scale step that indicated no disability was used as the reference. We used the standard interpretation of parameter estimates, i.e. the larger the parameter estimate, the more important or severe was the step or symptom/problem complex (Table 1). We conducted a series of analyses to determine whether any conclusions would be different depending on the weights used. Calculation of the QWB score for the respondents used the following equation.

$$ \text{QWB score} = 1 - (\text{CPX} + \text{MOB} + \text{PAC} + \text{SOC})\ \text{weights} $$

Table 1 Characteristics of Sample by Rating Status

Among the CPX items, “excessive worry or anxiety” was used as the reference, but was assigned the mean of the two preceding CPX items for its weight, i.e. “General tiredness, weakness, or weight loss” and “Cough, wheezing, or shortness of breath with or without fever, chills, or aching all over” (the two nearest items). Another item (“Burn over large areas of face, body, arms, or legs”) was inadvertently left out of the vignettes. It was given the average of the Trinidad weights of the two adjacent CPX items, “fainting, or coma (out cold or knocked out)” and “Pain, bleeding, itching, or discharge (drainage) from sexual organs–does not include normal menstrual (monthly) bleeding”, with the CPX items in the order of the standard US table.
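As an illustration of the scoring equation and of the averaging used for the unrated item, the following is a minimal sketch rather than the study's own code. All weights, item keys and function names are hypothetical placeholders, not values from Table 2. It assumes weights are expressed as positive decrements and, following the standard QWB convention noted in the Discussion, that only the most undesirable reported CPX is scored.

```python
# All weights below are hypothetical placeholders, not estimates from this study.

def impute_from_pair(weights, target, first, second):
    """Return a copy of `weights` with `target` set to the mean of two rated items."""
    filled = dict(weights)
    filled[target] = (weights[first] + weights[second]) / 2
    return filled


def qwb_score(cpx, mob, pac, soc):
    """QWB score = 1 - (CPX + MOB + PAC + SOC) weights, with weights as positive decrements."""
    return 1.0 - (cpx + mob + pac + soc)


# Two rated CPX items that sit on either side of the unrated "burn" item.
cpx_weights = {"fainting_or_coma": 0.30, "pain_sexual_organs": 0.20}
cpx_weights = impute_from_pair(
    cpx_weights, "burn_large_areas", "fainting_or_coma", "pain_sexual_organs"
)

# A hypothetical respondent reporting two complexes: the largest decrement is scored.
reported = ["burn_large_areas", "pain_sexual_organs"]
worst_cpx = max(cpx_weights[name] for name in reported)
score = qwb_score(worst_cpx, mob=0.05, pac=0.06, soc=0.088)
print(round(score, 3))
```

The same helper covers the “excessive worry or anxiety” case by passing the two preceding items instead of the two adjacent ones.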

Results

In Trinidad, 235 respondents completed the list of chronic conditions, sections of the Trinidad and Tobago National Health Interview Survey and the QWB. However, only 119 (51%) respondents provided ratings for each of the 65 vignettes and these were used to derive the preference weights. Interestingly, all respondents who chose to participate in the section of the survey to rate the vignettes completed all 65 vignettes. No one stopped after rating a few vignettes. Thus, the regression was run on 7,735 (119 × 65) observations. The overall mean (SD) age of the respondents was 46 (17). The 116 respondents who declined to participate in rating the vignettes were demographically similar to the 119 who provided ratings. Mean age was similar (raters 46.3 (15.4) versus non-raters 45.0 (17.9), P = 0.54). The overall mean (SD) QWB score was 0.765 (0.129). The mean (SD) QWB score was not significantly different (raters 0.754 (0.125) versus non-raters 0.776 (0.133), P = 0.50). Gender had no effect on whether respondents rated the vignettes (P = 0.21). Among all the demographic factors, the only difference between raters and non-raters was marital status (P = 0.04) (Table 1).

Table 2 Regression Model of Ratings for QWB Scale

Sample characteristics

The numbers of respondents of African descent (95; 40.4%) and Indian descent (90; 38.3%) were approximately equal, and both groups outnumbered respondents of Mixed descent (50; 21.3%). Respondents from the different cities were equally likely to provide ratings (P = 0.86) (Table 1).

Rating evaluation

For MOB, PAC and SOC, the more severe the disability step, the more negative was the weight it was assigned. This was especially the case for MOB and PAC, but less so for SOC. However, there were some notable peculiarities in weighting. For example, in SOC, the most restrictive step (Performed no major role activity, health related, and limited in self-care activities, health related = 0.111) was weighted slightly less than the next most restrictive step (Performed no major role activity, but did perform self-care activities = 0.127). At the other end of the scale, the least restrictive step (Limited in other role activity, health related = 0.088) was weighted slightly more heavily than the next least restrictive step (Limited in major (primary) role activity, health related = 0.083).

All steps were significantly different from the respective reference step. The CPXs with the largest negative weights were ‘Trouble sleeping’ (0.541), ‘Taking medication or staying on a prescribed diet for health reasons’ (0.522), ‘Headache, or dizziness, or ringing in ears, or spells of feeling hot, or nervous, or shaky’ (0.411), and ‘Trouble learning, remembering, or thinking clearly’ (0.388) (Table 2).
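For readers unfamiliar with indicator-coded regression, the sketch below shows how weights of this kind come out of an ordinary least squares fit. It is an illustration under stated assumptions, not the study's analysis: the two factors, their levels and the effect sizes are hypothetical, the ratings are noiseless, and only point estimates are computed (the correction for clustering within respondents affects standard errors and is not shown).

```python
from itertools import product

# Hypothetical effects on the 0-10 rating scale; index 0 is the reference level.
TRUE_EFFECTS = {
    "function_step": [0.0, -0.8, -1.2],
    "symptom": [0.0, -2.0, -3.5],
}
TOP_RATING = 10.0  # rating of a vignette at the reference level of every factor


def design_row(step, symptom):
    """Intercept plus 0/1 indicators for every non-reference level."""
    row = [1.0]
    row += [1.0 if step == k else 0.0 for k in (1, 2)]
    row += [1.0 if symptom == k else 0.0 for k in (1, 2)]
    return row


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Every combination of the two hypothetical factors is rated once.
vignettes = list(product(range(3), range(3)))
X = [design_row(step, symptom) for step, symptom in vignettes]
y = [
    TOP_RATING + TRUE_EFFECTS["function_step"][step] + TRUE_EFFECTS["symptom"][symptom]
    for step, symptom in vignettes
]
coef = ols(X, y)  # [intercept, step 1, step 2, symptom 1, symptom 2]
print([round(c, 3) for c in coef])
```

Because the reference level of each factor is absorbed into the intercept, each remaining coefficient is read as the change in rating relative to that reference.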

QWB scores for respondents

The mean (SD) QWB scores for the respondents were 0.767 (0.128) using the weights from Trinidad and 0.765 (0.130) using US weights. The t-test of the difference in these QWB scores was not statistically significant (t = 0.13, df = 468, P = 0.89). Figure 1 presents a regression plot of US-QWB scores (US_QWB—dependent variable) on Trinidad scores (TNT_QWB—independent variable). The regression model is US-QWB = −0.003 + 1.002*TNT-QWB. The R-square and adjusted R-square are both approximately 0.98. These values suggest that most of the variance in scores with US weights is accounted for by the Trinidad weights. The regression line is virtually indistinguishable from the predicted 95% confidence intervals. The RMSE is very small at approximately 0.02.

Fig. 1 Regression of QWB Scores for respondents
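The fit statistics quoted above (intercept, slope, R-square, RMSE) can be reproduced for any pair of score vectors with a few lines of code. The sketch below is illustrative only: the six score pairs are hypothetical values built around the reported line US-QWB = −0.003 + 1.002*TNT-QWB, and the RMSE is assumed to be the residual standard error with n − 2 degrees of freedom, which may differ from the definition used by the authors' software.

```python
from math import sqrt


def fit_summary(x, y):
    """Simple linear regression of y on x: intercept, slope, R-square and RMSE."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1 - ss_res / ss_tot
    rmse = sqrt(ss_res / (n - 2))  # assumption: residual standard error
    return intercept, slope, r_square, rmse


# Hypothetical Trinidad-weighted scores and small deviations around the reported line.
tnt_qwb = [0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
deviation = [0.01, -0.01, 0.0, 0.0, -0.01, 0.01]
us_qwb = [-0.003 + 1.002 * t + d for t, d in zip(tnt_qwb, deviation)]

intercept, slope, r_square, rmse = fit_summary(tnt_qwb, us_qwb)
print(round(intercept, 3), round(slope, 3), round(r_square, 3), round(rmse, 3))
```

A slope near 1 and an intercept near 0 indicate that the two weight sets place respondents on almost the same scale, which is the pattern the reported model shows.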

We also examined associations of the QWB scores with individual characteristics of the respondents. A regression of the US-weighted QWB score on age yielded the following equation: US-QWB = 0.95 − 0.0041*AGE. Every decade was associated with a reduction in QWB score of 0.04 (adjusted R-square = 27%). The equation for the Trinidad-weighted QWB was very similar: TNT-QWB = 0.94 − 0.0039*AGE. Every decade was associated with a 0.04 reduction in QWB score (adjusted R-square = 25%).
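The per-decade figures follow directly from the reported slopes, as the short sketch below works through. The coefficients are the ones reported above; the function name and the choice of age 46 (the overall mean age of the sample) as an evaluation point are illustrative assumptions.

```python
def predicted_qwb(intercept, slope_per_year, age):
    """Predicted QWB score from a linear age model."""
    return intercept + slope_per_year * age


US_MODEL = (0.95, -0.0041)   # US-weighted: intercept, slope per year of age
TNT_MODEL = (0.94, -0.0039)  # Trinidad-weighted: intercept, slope per year of age

# Change in predicted score over ten years of age.
us_per_decade = 10 * US_MODEL[1]
tnt_per_decade = 10 * TNT_MODEL[1]

# Predicted scores at the sample's overall mean age of 46.
us_at_46 = predicted_qwb(*US_MODEL, 46)
tnt_at_46 = predicted_qwb(*TNT_MODEL, 46)

print(round(us_per_decade, 3), round(tnt_per_decade, 3))
print(round(us_at_46, 4), round(tnt_at_46, 4))
```

Both slopes round to a 0.04 reduction per decade, and the two models give nearly identical predictions at the mean age.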

QWB scores by various respondent characteristics are shown in Table 3. The characteristics that were associated with significantly different QWB scores based on US weights also showed significant differences using the Trinidad weights. Moreover, the trends were the same. For example, men had significantly higher scores than women with both US and Trinidad weights. The patterns were also significant and the same for marital status, education, employment status and household income.

Table 3 QWB Scores by Characteristics of Sample

Discussion

Leaders and health administrators in developing countries are interested in using HRQoL instruments from developed countries for a variety of reasons. First, using instruments that are already evaluated and recognized saves the effort of developing them from scratch. Furthermore, developing countries are less likely to possess the expertise and resources to construct and evaluate an instrument. Second, existing instruments produce results in terms and measures already recognized in the field [4]. However, the preferences of HRQoL instruments are based on valuations in the populations from their countries of origin. It is important to determine if the instruments are appropriate for each application [5, 8, 9]. The original QWB preference weights were derived from a probability sample of the non-institutionalized population of San Diego, California [2]. In addition to the general population of San Diego, the instrument has shown evidence of validity in other populations [9, 10]. Because the population of Trinidad was not included in the generation of these weights, it is important to elicit preferences in Trinidad before application of the measure [5, 8].

Although only 51% of the eligible respondents provided ratings, we speculated that the valuations and resulting preferences would remain the same even if all of the eligible respondents had provided ratings. Those who provided ratings did not differ in QWB scores or, apart from marital status, in demographic variables from those who did not (Table 1). Since we did not have the direct valuations from the San Diego sample, we calculated weights and compared them with the San Diego US weights. Balaban et al. [9] used the same technique (for the same reason) when they compared patients with arthritis with the sample of the US general population. Overall, we found that the US and Trinidad and Tobago weights were highly similar and that the choice of weights would lead to the same conclusions for most analyses.

Our method of eliciting valuations was similar, but not identical to the methods used previously. The QWB was updated during the intervening period from the version used by Patrick et al. [11] and the version used by Balaban et al. [9]. The most recent version is also different from the version used by Balaban et al. [9]. Specifically, the mobility scale has been condensed to three steps from five and the physical activity scale was condensed to three steps from four. This study compared the weights from the most recent version. The method of presenting the scenario was also different. The previous valuation studies had each “scenario” on a single card that the respondent sorted into slots [11]. Thus, the respondents did not have the opportunity to review previous scenarios. In this study, nine vignettes were listed on pages of a booklet. The respondent had the opportunity to review vignettes they previously rated and calibrate the current rating to their previous ratings. While no one changed an earlier rating, a few referred back to prior ratings while rating a vignette. For example, a respondent noted, “I rated vignette 12 with a 7. I think a person in the condition of vignette 20 is worse off. So I will give it 5.” This method of comparison did not appear to materially affect the preferences. There were other differences in calculating preference weights between this study and previous work. The CPX items were maintained in the order of the US standard CPX table, but assigned the Trinidad weights. We averaged the Trinidad weights of two adjacent CPX items for the unrated items. This is in contrast to the standard approach where the average of all CPX weights is assigned to four CPX items that are “not on the respondent’s card” in the United States. Because the most undesirable CPX score is used in calculating the QWB score, the standard approach pulls the QWB score toward the mean when any of the four unrated items end up being the most undesirable CPX. 
In either case, assigning the average of all CPX weights or the average of two adjacent items is unlikely to have much effect on the average QWB score for the sample. While preference weights for the mobility and physical activity scales had the same ordering with the US and Trinidad preferences, the social activity scale had a slightly different ordering. The US preference weights distinguished not performing major role activity and not performing self-care from the three other levels of the scale. In contrast, the Trinidad preferences distinguished limited role performance and not performing as separate groups, with the level of self-care (limited versus did perform) not affecting the preference score. Our findings suggest that elicitation methods can tolerate a small amount of modification [8, 9].

In this study and in the study by Balaban et al. [9], the QWB scores with the study weights were plotted against QWB scores with the original QWB weights. In both studies, the R-square was very high. It reached 0.99 for Balaban et al. [9] and 0.98 in this study. The similarity in the US and Trinidad weights was further evidenced by the similarity in associations of QWB scores with gender, income, education level, and religious affiliation.

Not all studies show these similarities in weights, and the differences that have been reported have important policy implications. Johnson et al. [8] reported that valuations for the EQ-5D were different for the United States and the United Kingdom. This research has led some observers to suggest that preference weights might be culture specific [12, 13]. Johnson et al. [8] used virtually the same time trade-off (TTO) method to elicit valuations in the United States as in the United Kingdom and concluded that the TTO method might have been problematic for several reasons including cultural differences: “… valuations of any health state were elicited in a specific cultural context.” (Johnson et al. [8], p. 227).

Trinidadians are of African descent, Indian descent and mixtures of the various races, and they live in close proximity on the island. They write standard English, but speak a distinctive mixture of British English, Spanish, Hindi and French, with Sango words and sayings from the Central African Republic. Ethnicity did not have a strong effect on preferences.

Our study further suggests that QWB weights from the US might be appropriate in at least one very different cultural setting. The correspondence between the QWB weights from Trinidad and the QWB weights from the United States in 1975 favors the null hypothesis that health preferences do not migrate as time elapses.

Several limitations were present in this study. First, the probability sample covered three locations rather than the entire country. Therefore, the results pertain to these locations. However, a probability sample is not required to test the validity of the QWB [9, 10]. In addition, the QWB was originally tested on a probability sample from San Diego and applied throughout the United States [2]. Second, the large number of interviewers with different levels of medical training added a substantial amount of complexity to a project of this size. The different levels of medical training served to determine the lowest level of medical training needed to successfully administer the QWB in Trinidad. Using clerk-level interviewers could be important for developing Caribbean countries. These workers are well educated and in relatively good supply. They can be used in various healthcare facilities to administer the QWB to patients. Third, the response rate for the preference ratings was low due to the length of the interview. The TT-NHIS section covered many aspects of the health care services and took a long time to complete. The duration ran up against other obligations for some of the respondents. The obligations included things like reacting to rainy weather that frequently causes flooding. In urban areas, flooding paralyzes the city and poses physical threats to adults and children [14]. In rural areas, flooding increases the occurrence of invasion by reptiles that pose threats to children and small animals [15]. Also, meals required longer preparation times because ingredients were not usually prepackaged. A meal that included chicken usually began with a live bird. Similarly, fish had to be scaled and cleaned. Thus, parents tended to terminate the interview when mealtimes approached or to secure their home and family during inclement weather. 
Conducting surveys in developing countries poses challenges not encountered when conducting surveys in developed countries, but the low response rate for preference ratings was not related to health status or demographics and was unlikely to introduce any bias in the preference ratings.

In summary, this study suggests that QWB weights are interchangeable between the United States and Trinidad. In most analyses, weights derived from the Trinidad and Tobago population would produce results similar to those using standard US weights. Leaders and health administrators in other Caribbean countries may use the QWB in their populations with greater confidence that the weights will be appropriate for their populations. While leaders and health administrators can use the QWB with Trinidad weights to assess the impact of disease or injury on the population, for the sake of uniformity and direct comparisons, they might opt to use the US weights.