Background

Urinary problems, particularly when accompanied by urinary incontinence (UI), have been shown to significantly impact different domains of health-related quality of life (HRQoL) such as emotional well-being, performance of daily activities and social interaction [1], and have also been associated with economic burden [2],[3] and lower productivity [1],[4]. Neurogenic detrusor overactivity (NDO) is an etiology of UI that is caused by conditions such as multiple sclerosis (MS) or spinal cord injury (SCI). Detrusor overactivity is an involuntary bladder contraction during the filling phase of cystometry [5]. As a result of a disruption in the regulation of the micturition reflex, NDO patients frequently suffer from urinary symptoms including urgency and urinary urgency incontinence, which negatively affect their HRQoL [6],[7].

A practical approach to evaluating the health states derived from a disease is through administration of existing generic preference-based instruments such as the EQ-5D [8],[9], the Health Utilities Index Mark 2 or Mark 3 (HUI2 or HUI3, respectively) [10],[11] or the SF-6D [12],[13]. These instruments are suitable across patient populations, regardless of the disease, allowing investigators to describe and compare important aspects of HRQoL and produce preference-based or utility scores. Although there is evidence to suggest that there is an underlying basic construct measured by the three generic instruments, it is well established that they produce different values and are not interchangeable [14]-[18]. Furthermore, there is controversy about their discriminative ability and sensitivity to detect clinically important changes in varying patient populations and, consequently, these measures may not be the best choice for certain conditions [19], including urinary incontinence-related problems [20]-[23].

There are a number of condition-specific instruments available for patients with lower urinary tract symptoms and UI. Some commonly-used measures in clinical trials and outcomes research are the Overactive Bladder Questionnaire (OAB-q) [24], the King's Health Questionnaire (KHQ) [25] and the Incontinence Quality of Life Questionnaire (I-QOL) [26]-[28]. These instruments have good psychometric properties in terms of reliability, construct and discriminant validity, and responsiveness [6],[24],[27],[29]. Recently, new efforts have been focused on estimating utilities related to health states derived from these tools by means of surveying different samples of patients or general population from Europe and the US [30]-[32]. However, out of all these measures, only the I-QOL questionnaire includes a specific module for NDO patients developed from the needs-based model [26],[27]. In addition, the validity of the I-QOL has been demonstrated in patients with neurogenic urinary incontinence [6]. Consequently, the overall aim of this research was to generate a preference-based measure from the I-QOL and its neurogenic set, the Incontinence Utility Index (IUI), by means of surveying a representative sample of the general population. This new instrument would represent a more comprehensive measure for valuing health states associated with urinary problems from a range of different etiologies.

Methods

Overview

The I-QOL Questionnaire and Neurogenic Module

The I-QOL is a self-administered disease-specific instrument comprised of 22 questions (5 point Likert scale) addressing three main domains: Avoidance and Limiting Behavior, Psychosocial Impact, and Social Embarrassment [26],[27]. A global scale score is obtained by summing up the responses to all items and transforming the raw total score to a 0–100 scale (0-worst/100-best HRQoL). As noted before, the I-QOL was developed and validated among patients with stress UI and overactive bladder (OAB) [27] and has since been successfully tested on other patient populations, such as NDO patients [6] or patients with urgency UI who had not been adequately managed with anticholinergic therapy [28]. The additional module for patients with neurogenic bladder consists of 5 items about limiting caffeine drinks, worry about long-term effects of catheterization, accessibility and privacy in public toilets, bother associated with catheterizing, and bother associated with the use of pads or diapers.

A 2-stage process was used to develop the IUI from the I-QOL. The first stage was to use Rasch analysis and classical psychometrical tests to derive an abbreviated health state classification from the I-QOL and Neurogenic Module that is suitable for preference elicitation. The second stage was to conduct a preference elicitation survey to allow the estimation and validation of a multi-attribute utility function (MAUF) for the IUI.

Deriving an abbreviated health state classification system from the I-QOL and the neurogenic module

Multi-Attribute Utility Theory (MAUT) is an approach that assigns utility weights to different outcomes by considering multiple attributes and the associated preferences reported by a given population, and then combining individual values into an overall utility measure. This process involves specifying a particular form for the utility function and the possible preference interactions among the attributes [10],[11],[33]. For MAUF estimation, it is important to include a range of aspects describing relevant consequences of a given disease on patients' lives to ensure accuracy and sensitivity to change. The number of attributes should not be too large, however, so as not to increase respondents' cognitive burden and make data collection impractical. The I-QOL, along with the Neurogenic Module, generates a total of 5^27 different health states (27 items with 5 response levels each). Hence, a psychometric analysis was required to extract a minimum but valid set of health states [34]. To this end, Rasch analysis and statistics from Classical Test Theory (CTT) were combined.

Rasch analysis is a scaling methodology that allows the examination of the hierarchical structure, unidimensionality and additivity of HRQoL measures [35]. Rasch methods may be used to identify and select items in an instrument that best cover the entire continuum of the underlying construct and remove redundant items [34],[36],[37]. Data dimensionality was investigated using the approach suggested by Linacre (1998) [38] and item responses were analysed using the Partial Credit Model [35], considering model fit, item and category locations, and differential item functioning (DIF) regarding sex, age group and etiology (i.e. MS or SCI). Model fit in the range 0.5-1.5 was considered acceptable, and items with centered locations and ordered categories as spread as possible were preferred. DIF was considered relevant when it was statistically significant and the difficulty difference between groups was over 0.5 logits. Additionally, an expert panel including the developer of the I-QOL and other experts in urology and psychometrics was convened to review the best items according to the results of the analyses. Finally, internal consistency (Cronbach's α > 0.8), criterion validity (statistically significant differences in HRQoL according to the reduction in the frequency of UI episodes) and agreement with the original I-QOL (intraclass correlation coefficient, ICC ≥ 0.75) were tested to ensure that the abbreviated version met an acceptable standard in these properties. The sample used in this first stage has been described elsewhere [39],[40]. Briefly, we pooled data from two randomized trials of onabotulinumtoxinA (BOTOX®) 200U or 300U vs. placebo in adult patients with UI due to NDO. A total of 691 patients were enrolled in these two trials, 44.9% of whom had SCI and 55.1% of whom had MS. The primary time point was week 6; patients could request a second treatment after week 12 and were followed up to week 52.
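The screening rules described above can be sketched as a small filter. This is only an illustration: the item records, field names and numbers are hypothetical, the fit statistic is assumed to be a single value per item, and the 0.05 significance level is an assumption. Only the thresholds (fit within 0.5-1.5; DIF relevant when significant and larger than 0.5 logits) come from the text.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    name: str            # hypothetical item label
    fit: float           # Rasch fit statistic for the item
    dif_contrast: float  # difficulty difference between groups, in logits
    dif_p: float         # p-value of the DIF test


def acceptable_fit(item, lo=0.5, hi=1.5):
    """Fit inside the 0.5-1.5 range is treated as acceptable."""
    return lo <= item.fit <= hi


def relevant_dif(item, min_logits=0.5, alpha=0.05):
    """DIF counts only if it is significant AND larger than 0.5 logits."""
    return item.dif_p < alpha and abs(item.dif_contrast) > min_logits


def screen(items):
    """Return the names of items that pass both screening rules."""
    return [it.name for it in items
            if acceptable_fit(it) and not relevant_dif(it)]


# Hypothetical values, for illustration only.
items = [
    ItemStats("item_a", fit=1.02, dif_contrast=0.12, dif_p=0.40),
    ItemStats("item_b", fit=1.70, dif_contrast=0.10, dif_p=0.60),  # misfit
    ItemStats("item_c", fit=0.95, dif_contrast=0.80, dif_p=0.01),  # relevant DIF
    ItemStats("item_d", fit=0.90, dif_contrast=0.70, dif_p=0.20),  # DIF not significant
]
kept = screen(items)
print(kept)
```

In the study itself, the statistical screen was followed by expert review of item content, which a filter of this kind does not capture.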

Weighting the health states defined by the new abbreviated health state classification system

Elicitation survey

A cross-sectional observational study was conducted between October and December 2012 to survey a representative sample (n = 442) of English-speaking, non-institutionalized adults from the general population in the United Kingdom (UK). Participants were eligible if they were willing to complete the interview process and to endorse their compliance with the quality standards of the survey. Written informed consent was obtained prior to participation. Exclusion criteria included cognitive impairment, suspected influence of alcohol or narcotics during the study visit, and any concurrent medical condition limiting their capacity to complete the evaluation.

Sample size was set to a minimum of 338 responders to enable the estimation of mean values to within ±0.032 points at a 95% confidence level (z-value = 1.96), assuming normally distributed scores with a standard deviation of 0.3 points [41]. However, given the complexity of the elicitation process and previous experiences [42], it was estimated that a maximum of 30% of the respondents would not be able to successfully complete all the proposed rating exercises. Hence, a total of 440 participants were interviewed.
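The arithmetic behind these figures can be reproduced with the standard formula for estimating a mean, n = (z·SD/E)². The sketch below assumes that the 30% allowance was applied as an inflation of the minimum sample (338 × 1.3); this reproduces the 440 reported but is not stated explicitly in the text. Function names are illustrative.

```python
import math


def min_sample_size(z, sd, half_width):
    """Smallest n so that a mean is estimated within +/- half_width."""
    return math.ceil((z * sd / half_width) ** 2)


def inflate(n, extra_fraction):
    """Inflate n to allow for respondents who do not complete all exercises.

    Assumption: the allowance is applied as n * (1 + extra_fraction).
    """
    return math.ceil(n * (1 + extra_fraction))


n_min = min_sample_size(z=1.96, sd=0.3, half_width=0.032)
n_target = inflate(n_min, 0.30)
print(n_min, n_target)  # 338 440
```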

Sampling was carried out in two steps: Cluster random sampling was first applied based on UK regions/postcodes. Subjects were then randomly selected from each region/postcode while monitoring other key socio-demographic variables (age, gender, education and employment status). The uniqueness of each participant's identity was verified at recruitment and before the interview using their name and address. Single visits for interviews were face to face and conducted using a Computer Assisted Personal Interviewing methodology, at either a central location in major hubs across the UK, or by visiting respondents at home at an agreed time and date. Respondents were offered £20 for participating in the survey, plus an additional £5 for travel expenses if they were asked to come to a central location to participate.

A total of 10 professional interviewers with required qualifications and relevant experience in conducting face to face interviews collaborated in this research. They received intensive instruction, including role-playing sessions to ensure the quality of interviews. Interviewers were required to record the time needed to complete each interview immediately following each survey, and also to rate the degree of understanding and cooperation from participants along with the overall quality of the interview.

Opinion Health© was in charge of data collection which was conducted according to the Code of Conduct of the Market Research Society [43], European Pharmaceutical Market Research Association [44] and qualitative recruitment best practice outlined by The Association of Qualitative Research [45].

All procedures and materials were tested in a pilot study (n = 13) to identify any practical problems with data collection and to validate instruments and materials used prior to the study. Following the pilot study, additional debriefing sessions were conducted with the interviewers in order to ensure that interviews would be conducted in a systematic way to minimize possible sources of bias.

Modelling space of the new abbreviated health state classification system

All health states were carefully chosen to allow estimation of the 5 single-attribute utility functions, each of the attribute weights in the multi-attribute utility function, and the interaction term (see MAUF estimation below). A total of 16 different states were evaluated (5 single-attributes and 11 multi-attributes).

Respondents were asked to assume a time horizon of 30 years to better reflect the chronic nature of the health states presented. This period of time is close to the average expected years of life for a middle-aged person according to UK life expectancy tables. The time horizon and the chronicity of states were discussed with all the participants before proceeding with the interview.

In addition to basic socio-demographic variables, each participant responded to the following evaluation exercises required for MAUF estimation:

  • Visual Analogue Scale (VAS) rating of single-attribute utility functions. For each attribute (n = 5), the intermediate level was rated on a feeling thermometer scale anchored at 0 (the least desirable or worst level of that attribute) and 100 (the most desirable or best level of that attribute).

  • VAS rating of multi-attribute health states: A total of 5 corner states, 3 intermediate or marker states and 3 anchoring states were rated. Intermediate or marker states were chosen to ensure the evaluation of a wide range of levels within the 5 targeted attributes, enhancing the precision of estimations. Anchoring references were 0 (the least desirable state, either the worst health state defined by the attributes, health state W, or dead) and 100 (the most desirable health state or perfect health, defined as the conjunction of the top level at each attribute, health state P) [11]. It is important to note that the lowest anchor state was chosen according to each participant's preferences. For those respondents who declared that being dead was preferable to health state W, being dead was measured on a scale ranging from 0 (health state W) to 1 (the most desirable health state/perfect health). In contrast, for those respondents who valued being dead as worse than health state W, health state W was valued on a scale ranging from 0 (dead) to 1 (the most desirable health state/perfect health).

  • Time Trade-Off rating (TTO) for power function estimation and for evaluation of MAUF predictive validity. A total of 6 different states were assessed: 3 corner states and all the intermediate or marker states previously described. During the elicitation process, a "ping-pong" presentation was used to converge on an indifference point between the alternatives (Figure 1). All participants were requested to think about the differences between the health states compared in each exercise, always keeping in mind that all other important, broader factors would remain constant (family, job, friends, income, etc.) under all the presented scenarios.

Figure 1. Presentation diagram of the time trade-off technique. Life P = the most desirable health state/full health/the best health state imaginable. Life A = a given health state derived from the abbreviated health state classification system.

MAUF estimation

A person-mean utility approach was used to estimate a general utility function based on community responses [10],[11],[42]. MAUF forms include the additive form, the multiplicative form and the multi-linear form [10]. A detailed introduction to the principles for MAUF estimation can be found in the literature [10],[11],[33]. Considering that the reduced version of the I-QOL comprises 5 health domains, the multiplicative MAUF is reasonable and has empirical support [10],[33]. Since it is easier for respondents to imagine corner states (with one attribute at its worst level and the rest of the attributes at their best level), this function is normally expressed in terms of disutility (ū), which is just the complement of the utility (ū = 1 − u):

$$\bar{u} = \frac{1}{c}\left[\prod_{j=1}^{5}\left(1 + c\,c_j\,\bar{u}_j\right) - 1\right] \tag{1}$$

where

$$1 + c = \prod_{j=1}^{5}\left(1 + c\,c_j\right) \tag{2}$$

c_j is equal to the disutility of the corner state for attribute j and represents the weight attached to that attribute. If the sum of all c_j equals 1, then the additive model holds. c is the interaction term and results from solving Equation 2. The methods applied in this research are similar to those followed to develop the HUI-3 [11]. A complete description of the statistical approach can be found in Additional file 1. Briefly, after confirming the consistency of respondents' ratings, the sample was split into 2 groups according to the health state they considered less preferable (dead or the worst health state possible in the abbreviated health state classification system). Hence, two separate power functions [46] were calculated to convert VAS values (v) into utilities (u), and the adjusted overall person-mean scores were calculated on a scale with dead = 0.00 and P = 1.00. Next, the relative weight of each parameter (c_j in disutility terms or w_j in utility terms) and its interaction form were studied for each group and for the overall sample and, finally, the IUI algorithm was estimated.
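As a minimal sketch of how Equations 1 and 2 work together, the code below solves Equation 2 for the interaction term c by bisection and then evaluates the multiplicative disutility. The weights used are the overall-sample c_j values reported in the Results; the function names and the choice of bisection are illustrative assumptions, not the procedure described in Additional file 1. The solver assumes every c_j lies in (0, 1) and that their sum exceeds 1, in which case the non-zero root lies between −1 and 0.

```python
import math


def solve_interaction(c_weights, tol=1e-12):
    """Non-zero root c of 1 + c = prod(1 + c * c_j), by bisection on (-1, 0).

    Assumes each c_j is in (0, 1) and sum(c_weights) > 1.
    """
    def f(c):
        return math.prod(1 + c * cj for cj in c_weights) - (1 + c)

    lo, hi = -1.0, -1e-9  # f(lo) > 0 and f(hi) < 0 under the assumptions
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def disutility(c, c_weights, single_disutilities):
    """Equation 1: multiplicative disutility of a multi-attribute state."""
    prod = math.prod(1 + c * cj * uj
                     for cj, uj in zip(c_weights, single_disutilities))
    return (prod - 1) / c


# Overall-sample disutility weights reported in the Results.
weights = [0.470, 0.484, 0.456, 0.590, 0.358]
c = solve_interaction(weights)

worst = disutility(c, weights, [1, 1, 1, 1, 1])   # all attributes at worst level
corner = disutility(c, weights, [1, 0, 0, 0, 0])  # only attribute 1 at worst level
print(round(c, 3), round(worst, 3), round(corner, 3))
```

With these inputs the solved c is close to the −0.951 reported for the overall sample, the all-worst state has a disutility of 1 on the W = 0/P = 1 scale, and a corner state returns its own weight c_j, as the definitions require.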

Predictive validity of the IUI

The accuracy of the algorithm was analysed by comparing estimated and directly measured utilities (TTO) on intermediate states. A number of statistics were computed:

  • Sum of total differences (Σ differences): Σ differences = Σ (predicted u_j − observed person-mean u_j)

  • Mean of differences (MD): MD = [Σ (predicted u_j − observed person-mean u_j)]/n_j

  • Mean of absolute differences (MAD): MAD = [Σ |predicted u_j − observed person-mean u_j|]/n_j

  • Overall standard deviation (OSD) of differences: OSD = [Σ (predicted u_j − observed person-mean u_j)²]/(n_j − 1)

  • ICC between estimated and directly measured scores [47].

For these metrics, values as close to zero as possible are preferred, except for the ICC, which is interpreted like any correlation coefficient (values closer to 1 indicate better agreement). The statistical packages WinSteps version 3.72.3 and Stata 10, along with the spreadsheet Excel 2007 (Microsoft), were used for the analyses presented in this manuscript [48],[49].
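The difference-based statistics listed above can be computed in a few lines. The sketch below follows the formulas as printed (in particular, OSD is the sum of squared differences divided by n − 1, with no square root); the predicted and observed values are hypothetical and serve only to show the calculation. The ICC is omitted because its specific form is not given in the text.

```python
def agreement_stats(predicted, observed):
    """Difference-based agreement statistics between two sets of utilities."""
    diffs = [p - o for p, o in zip(predicted, observed)]
    n = len(diffs)
    return {
        "sum_diff": sum(diffs),                     # sum of differences
        "MD": sum(diffs) / n,                       # mean of differences
        "MAD": sum(abs(d) for d in diffs) / n,      # mean absolute difference
        "OSD": sum(d * d for d in diffs) / (n - 1), # as printed, no square root
    }


# Hypothetical utilities for three marker states (illustration only).
predicted = [0.50, 0.60, 0.70]
observed = [0.52, 0.61, 0.69]
stats = agreement_stats(predicted, observed)
print(stats)
```

A negative sum of differences or MD indicates that the algorithm tends to predict lower utilities than those directly elicited.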

Results

I-QOL reduction

Outputs from Rasch analysis are presented in Table 1. No age related DIF was identified. Items 1, 3, 4 and 10 in the I-QOL and 2 and 4 in the Neurogenic Module had etiology related DIF. Item 10 from the I-QOL and 2 and 4 from the Neurogenic Module had sex related DIF. These six items were removed from the selection based exclusively on the results of the Rasch analysis. Additionally, as previously stated, an expert panel then proceeded to consider the results of the analysis jointly with the item content, to reach the final selection of 5 items considered to represent a set of complementary attributes. The 5 response categories were collapsed into 3 to simplify health state valuation, yielding the abbreviated health state classification system (Table 2). This final version proved to be internally consistent and valid for NDO patients according to the psychometric analyses presented in Table 3: at week 6, the abbreviated health state classification proved to have adequate ability to detect changes in those patients who showed a reduction in incontinence episodes (responders), and the association between daily incontinence episodes and health state classification scores was considered adequate. Furthermore, the level of agreement between the original I-QOL and the abbreviated health classification system was high (ICC = 0.90; 95% CI: 0.89-0.91) and statistically significant (p < 0.001).

Table 1 Summary of Rasch outputs
Table 2 The abbreviated health state classification system derived from the I-QOL and Neurogenic Module
Table 3 Psychometric properties of the abbreviated health state classification system in neurogenic detrusor overactivity patients

Weighting the health states derived from the I-QOL

Complete descriptions of the multi-attribute health states and the sample are presented in Tables 4 and 5. A total of 442 interviews were completed; however, 44 cases were withdrawn because they presented at least one inconsistency in their ratings: VAS values for a given corner state (health states A to E, Table 4) were lower than the VAS values of comparable marker states (M1 to M3, Table 4), n = 50 (please note that some participants provided more than one inconsistent answer); or the value of any corner or marker state was lower than the VAS value of the least desirable health state, n = 24. Only those participants successfully completing all the rating exercises were included (n = 398), generating a total of 2,388 TTO evaluations. With respect to interview quality, 97.7% were performed with full cooperation of the respondent, 84.7% of participants thought carefully before answering, and 84.9% experienced very little or no problems completing the survey. Moreover, mean time required to complete the survey was 30.2 minutes (standard deviation, SD, 10.9) and the vast majority of interviewers rated the quality as good or very good (94.2%), with less than 1% of interviews being considered of inferior quality.

Table 4 Multi-attribute health states used in preferences elicitation
Table 5 Description of participants in the elicitation survey (valid cases, n= 398)

A majority of respondents were female (60.1%); mean age was 44.75 years (SD 14.6); 60.8% had at least a diploma education (2 years of college) and a similar percentage were employed (59.8%). With respect to their health status, 31.7% reported a chronic illness and 8.8% an acute disease. Regarding previous experience, 36.4% declared they had suffered symptoms associated with OAB or UUI and 48.0% recognized some of these problems in their relatives or friends.

With respect to participants' preferences about the worst state described by the abbreviated health state classification system and dead, most of them (n = 294, 73.9%) stated they would prefer living the next 30 years in health state W (Group B), while the rest (n = 104, 26.1%) preferred being dead to living in health state W (Group A).

MAUF estimation and final algorithm of the Incontinence Utility Index

Trimmed values (10%) were fitted separately for each group based on power functions (Equation 3 in Additional file 1) and natural log transformations (Equation 4 in Additional file 1) to convert mean VAS (v) into utility scores (u). Regression models yielded good fit (R² group A = 0.923 and R² group B = 0.978) and the resulting power functions were: Group A, u = 1 − (1 − v)^1.229 and Group B, u = 1 − (1 − v)^0.841. Estimates of the relative weight of each attribute on the disutility scale (perfect health = 0 and worst state = 1) for Group A were: c1 = 0.393, c2 = 0.450, c3 = 0.387, c4 = 0.562 and c5 = 0.283 (Σc_j = 2.076; c = −0.911). For Group B: c1 = 0.636, c2 = 0.640, c3 = 0.616, c4 = 0.775 and c5 = 0.490 (Σc_j = 3.158; c = −0.994). Because Σc_j differed from 1 in both groups, the additive form did not hold and the multiplicative form was appropriate.
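The two fitted power functions can be applied directly to a VAS value. The sketch below assumes the VAS rating has already been rescaled from the 0-100 thermometer to a 0-1 value; the function and constant names are illustrative, and the exponents are those reported above.

```python
EXPONENT_GROUP_A = 1.229  # respondents who preferred dead to health state W
EXPONENT_GROUP_B = 0.841  # respondents who preferred health state W to dead


def vas_to_utility(v, exponent):
    """Convert a VAS value v (0-1 scale) into a utility: u = 1 - (1 - v)**exponent."""
    if not 0.0 <= v <= 1.0:
        raise ValueError("v must be on a 0-1 scale")
    return 1.0 - (1.0 - v) ** exponent


v = 0.5  # hypothetical VAS rating of 50 on the 0-100 thermometer
u_a = vas_to_utility(v, EXPONENT_GROUP_A)
u_b = vas_to_utility(v, EXPONENT_GROUP_B)
print(round(u_a, 3), round(u_b, 3))
```

An exponent above 1 (Group A) yields utilities higher than the VAS value, whereas an exponent below 1 (Group B) yields utilities lower than the VAS value; both functions leave the anchors 0 and 1 unchanged.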

Final utilities were calculated by weighting the person-mean values of Groups A and B by group size (both on the W = 0.00/P = 1.00 scale): u_j = (104 × Person-Mean A u_j + 294 × Person-Mean B u_j [re-scaled])/398. A positive linear transformation was applied to re-scale the utilities onto a dead = 0.00/P = 1.00 scale to facilitate comparisons with other utility measures. Table 6 shows utility weights estimated for the multi-attribute health states defined in Table 4. The disutility weights estimated for each attribute from the overall sample were: c1 = 0.470, c2 = 0.484, c3 = 0.456, c4 = 0.590 and c5 = 0.358 (Σc_j = 2.357; c = −0.951). Once again the results rejected the linear additive form and showed that all attributes were preference complements. The five single-attribute utility coefficients and the overall MAUF are presented in Table 7, with possible scores ranging from 0.036 (worst health state) to 1 (perfect health).
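The pooling and re-scaling steps can be sketched as follows. The group sizes (104 and 294) come from the text; the group-level utilities and the value assigned to dead on the W = 0/P = 1 scale are hypothetical, and the re-scaling shown is simply the generic positive linear transformation that maps dead to 0 while keeping P at 1, which is an assumption about the exact form used.

```python
N_GROUP_A = 104  # preferred dead to health state W
N_GROUP_B = 294  # preferred health state W to dead


def pooled_utility(u_a, u_b):
    """Group-size-weighted mean of the two person-mean utilities."""
    return (N_GROUP_A * u_a + N_GROUP_B * u_b) / (N_GROUP_A + N_GROUP_B)


def rescale_to_dead_zero(u, u_dead):
    """Positive linear transformation mapping u_dead -> 0 and 1 -> 1."""
    return (u - u_dead) / (1.0 - u_dead)


# Hypothetical person-mean utilities for one health state (illustration only).
pooled = pooled_utility(u_a=0.40, u_b=0.60)
rescaled = rescale_to_dead_zero(pooled, u_dead=-0.05)  # u_dead is hypothetical
print(round(pooled, 4), round(rescaled, 4))
```

Because Group B is almost three times larger than Group A, the pooled value lies closer to the Group B mean.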

Table 6 Estimated overall utility scores
Table 7 Single and Multi-attribute utilities

Predictive validity of the MAUF

Mean utility scores of marker states directly elicited on the TTO were compared against those estimated by the MAUF to test its predictive validity. The results were as follows: Σ differences = −0.038; MD = −0.013; MAD = 0.038; OSD = 0.004 and ICC (95% CI) = 0.928 (0.648-0.985). Thus, the calculated MAUF showed a very slight tendency to underpredict directly elicited utilities. Moreover, the level of agreement found between both methods (ICC) was good and only 7.2% of variability could not be attributed to subjects.

Discussion

In this study, a new utility index, the IUI, has been estimated from the abbreviated health state classification system derived from the I-QOL and its neurogenic module by eliciting preferences from a representative sample of the UK adult general population [50]. The abbreviated I-QOL version was internally consistent and able to capture clinically important differences in the clinical status of NDO patients with UI (i.e. changes in HRQoL according to reductions in the average number of UI episodes per week). Furthermore, a high level of agreement was found between the reduced version and the original I-QOL, confirming the appropriateness of the abbreviated health state classification system of 5 domains and its modelling space for utility estimation. Moreover, all the psychometric procedures undertaken to reduce the I-QOL have been successfully applied previously [30],[36],[51] and have been recently recommended [34].

Regarding the elicitation process, methods applied are consistent with those used to develop one of the most widespread and robust generic utility measures, the HUI [10],[11]. As has occurred in previous publications, the additive model was rejected in this study [10],[11],[42] and attributes were preference complements: for instance, the perceived limitation associated with being depressed and not having bladder control is greater than the separate effect of being depressed and not having bladder control, but smaller than the sum of these two problems.

In addition, the predictive validity of the IUI scoring algorithm was confirmed by comparing the directly elicited utility values with those estimated by the final algorithm. Although the IUI algorithm showed a slight tendency to underpredict the directly elicited utilities, the error size was small and comparable to the errors reported for other utility instruments [11]. What is more, the ICC between direct and indirect values showed an adequate level of agreement.

Generic preference-based indices have historically been the most commonly used means of estimating utilities across a variety of conditions. However, substantial research has been conducted which shows the limitations of these instruments in different conditions [19]-[22], as well as the lack of concordance between the utility values obtained from their application [14]-[16],[18],[52],[53]. As a result, the development of condition-specific preference-based measures has been gaining ground in recent years [30]-[32],[54].

There are published studies focused on obtaining utility scores from condition-specific instruments for urinary problems. An algorithm has been generated to derive utilities from the KHQ by eliciting preferences from a sample of UI patients [31]. Kay et al. (2013) [32] mapped EQ-5D utility scores from the I-QOL among patients with neurogenic and idiopathic OAB using cross-sectional data from Europe and the US. Finally, Yang et al. estimated a population preference-based index from the OAB-q, the OAB-5D [30]. Although the IUI was derived from the I-QOL and its specific module for neurogenic patients, the OAB-5D is the most similar instrument because its modelling space was also obtained by applying Rasch analysis, its preference elicitation involved TTO evaluations, and it also incorporates general population preferences. Nevertheless, relevant differences lie in the characteristics of the samples used in the reduction process (we specifically used NDO patients) and in the estimation models applied to derive the utility scores, since the OAB-5D followed the methods described previously for the SF-6D [12] whereas we computed a MAUF in accordance with the latest HUI versions [10],[11]. Despite these differences, mean absolute errors/differences in both measures are comparable (OAB-5D: 0.044 versus IUI: 0.038). Hence, additional research is needed to compare the performance of each measure in the same populations (i.e. criterion validity, responsiveness and influence on cost-effectiveness ratios).

Despite the fact that the MAUF has proven robust, there are a number of limitations in this research. It should be noted that we used TTO evaluations instead of the Standard Gamble (SG). Although SG is considered the preferred technique to collect subjects' preferences, TTO is a legitimate and extensively used technique, generally considered easier to understand and less time consuming [30],[55]. Preferences were elicited from a UK-specific population, so caution should be used before applying the algorithm to other countries, especially if the population is expected to perceive urinary problems differently. Additionally, as a condition-specific preference-based measure, the IUI may suffer from some potential risks in terms of comparability of results [34]. The risk of focusing effects (i.e. cognitive bias that occurs when participants place too much importance on the problems associated with the health states presented to them compared with other conditions) was obviated as best as possible by clearly stating throughout the preference elicitation process that, apart from the health states described by the reduced version of the I-QOL, other important aspects of life (i.e. family, economic situation, friends, job, etc.) would remain constant.

Another source of limitations referred to as anchoring (defining a specific upper anchor that could make comparability across other preference-based instruments problematic) was also anticipated. Consequently, the upper limits during the evaluation process were defined as the most desirable health state, the best health state imaginable, or full health to best facilitate comparisons with other scales. Finally, while the 30-year time horizon was set to illustrate the chronicity of health states, this time frame may not have been the most appropriate for participants under 30 years of age (18.3%) or, particularly for those older than 60 years (17.6%). Thus, this time horizon may result in some over/underestimations during TTO exercises with these subsamples [56],[57].

Conclusions

The I-QOL and the IUI are valid-in-population measures for measuring HRQoL and utilities, respectively, associated with urinary problems. Although the IUI is the first utility measure that has been developed for a specific subset of patients with urinary symptoms (NDO population), it is important to note that the final selection of attributes included in the IUI is from the original I-QOL, with no items utilized from the Neurogenic Module. Hence, investigators may test its applicability in other relevant subsamples. It is worth noting that the use of a representative sample of general population to value its health states may ease the application of this instrument in new subsets of patients suffering from urinary problems. New research is currently underway to confirm the soundness of the IUI modelling space on idiopathic OAB patients and to study the responsiveness and the minimally important differences of the IUI in both NDO and idiopathic OAB populations. These insights will be of value to future researchers using the IUI instrument which is intended to complement utility estimates provided by generic instruments to support decision-making with reliable, valid and understandable information presented on a similar scale.

Additional file