Background

Benign prostatic hyperplasia (BPH) is commonly found in aging men and is characterized by the presence of stromal and epithelial cell hyperplasia beginning in the periurethral zone of the prostate [14]. BPH becomes a clinical entity when associated with lower urinary tract symptoms (LUTS), the most common manifestation of BPH [1, 2]. Patients with BPH-LUTS experience a significant deterioration in quality of life because of their condition, reporting changes in sleep patterns, anxiety and embarrassment, altered mobility, and changes in leisure, daily-living, and sexual activities and in satisfaction with sexual relationships [3]. In some men, the progressive enlargement of the prostate may lead to worsening symptoms, acute urinary retention, and, consequently, surgical intervention [3].

The primary goal for treating men with BPH-LUTS is usually to relieve symptoms and the bother they cause [5]. In patients with moderate to severe bothersome symptoms, treatment options include medical therapies such as alpha (α) blockers or, in men with an enlarged prostate, 5-α-reductase inhibitors, as monotherapy or in combination [4-6].

The most widely employed and validated scoring system for quantifying and monitoring BPH-LUTS is the 7-item American Urological Association (AUA) Symptom Index, developed by the Measurement Committee of the AUA [7]. This instrument measures the severity of voiding and storage symptoms and comprises the first 7 items of the International Prostate Symptom Score, referred to in this article as the IPSS (see Appendix 1).

The AUA committee also developed the BPH Impact Index (BII) to assess the impact of BPH symptoms on patient health and functioning [8]. The BII is a self-administered questionnaire with 4 questions about urinary problems during the past month regarding physical discomfort, worry about health, how bothersome symptoms are, and whether the symptoms are interfering with doing usual activities (see Appendix 2).

The BII has successfully demonstrated responsiveness to change in patients with BPH-LUTS who were being treated with terazosin versus placebo [9] and dutasteride versus placebo [10, 11]. The BII has demonstrated the ability to detect significant differences between men with symptomatic benign prostatic obstruction (with and without indwelling catheter) before and after intervention [12]. Changes in BII scores were greater for BPH-LUTS patients who reported overall global improvement compared to those reporting only moderate, slight, or no improvement [13].

New classes of drugs are currently under investigation for the treatment of men with BPH-LUTS, one being tadalafil, a long-acting phosphodiesterase type 5 (PDE-5) inhibitor used for men with erectile dysfunction (ED). Several reports have shown possible links between BPH-LUTS and ED in epidemiologic, pathophysiologic, and treatment aspects [14].

The initial validation of the BII was carried out 15 years ago by Barry et al [8]. Since that time, the nature of study populations, study designs and treatment options have changed. The BII is an evaluative index useful in measuring the magnitude of change in the impact of BPH-LUTS within a person over time. Its usefulness in BPH treatment is dependent upon it being reliable, responsive to change, and valid. Therefore, it is important to evaluate the construct validity of the BII for patients receiving newer medical treatment such as tadalafil.

Methods

The BII was administered in 2 clinical studies. Study 1 was a proof-of-concept, randomized, double-blind, placebo-controlled, parallel-design 12-week dose-titration study. Study 2 was a randomized, 5-group, double-blind, placebo-controlled, parallel-design dose-finding 12-week study. Men who were at least 45 years of age, with moderate to severe LUTS due to BPH and evidence of bladder obstruction, were eligible to participate in both studies. Details of the study designs and populations have previously been published [15, 16]. The two studies were in compliance with the Helsinki Declaration.

In each study, subjects were screened at Visit 1. If necessary, they started a 4-week washout of BPH treatments; otherwise subjects returned in approximately 1 week. At Visit 2, subjects were required to have an IPSS ≥ 13 and a uroflowmetry measure of peak flow rate (Qmax) ≥ 4 to ≤ 15 mL/second on a voided volume of 125 mL to continue in the study. Each study included a 4-week, single-blind, placebo run-in period to assess treatment compliance and establish baseline levels at its conclusion. At Visit 3 (Week 0), baseline measures were obtained and subjects were randomly assigned to treatment.

Treatment in Study 1 was either tadalafil 5 mg for 6 weeks followed by tadalafil 20 mg for 6 weeks, or placebo for 12 weeks. Subjects returned on Visit 4 (Week 6), and Visit 5 (Week 12), which was the End-of-Study Visit.

Treatment in Study 2 was tadalafil 2.5, 5, 10, 20 mg, or placebo in a 1:1:1:1:1 ratio. The treatment period lasted 12 weeks. Subjects returned on Visit 4 (Week 4), Visit 5 (Week 8), and Visit 6 (Week 12), which was the End-of-Study Visit.

For both studies, subjects completed the BII at every visit starting at Visit 2. All questions asked were about problems over the past month. The first 3 questions of the BII were scored 0 to 3, while the fourth was scored 0 to 4. The sum of the questions produced a BII score that ranged from 0 to 13, with a higher score indicating a worse health impact of BPH symptoms (Appendix 2).
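The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the scoring software used in the studies; the function name `bii_total` and the example responses are hypothetical.

```python
# Maximum allowed score per BII item, as described in the text:
# items 1-3 are scored 0 to 3 and item 4 is scored 0 to 4.
BII_ITEM_MAX = (3, 3, 3, 4)


def bii_total(responses):
    """Sum four BII item responses into a total score from 0 to 13.

    Higher totals indicate a worse health impact of BPH symptoms.
    """
    if len(responses) != len(BII_ITEM_MAX):
        raise ValueError("the BII requires exactly 4 item responses")
    for number, (value, upper) in enumerate(zip(responses, BII_ITEM_MAX), start=1):
        if not isinstance(value, int) or not 0 <= value <= upper:
            raise ValueError(f"item {number} must be an integer from 0 to {upper}")
    return sum(responses)


# Hypothetical responses for one subject at one visit.
print(bii_total([1, 2, 0, 3]))  # 6
```

Validating each item against its own maximum catches data-entry errors, such as a 4 recorded for one of the first three items, before they reach the total.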

Other questionnaires assessing BPH-LUTS included the IPSS, IPSS-Quality of Life (QoL) and the Global Assessment Questionnaire (GAQ). The IPSS is a validated 7-item urinary symptom severity scale about symptoms occurring over the past month. Scores ranged from 0 to 35, with a higher score indicating more severe symptoms. The IPSS-QoL is a single question: "If you were to spend the rest of your life with your urinary condition just the way it is now, how would you feel about that?" with scores of 0 (delighted), 1 (pleased), 2 (mostly satisfied), 3 (mixed about equally satisfied and dissatisfied), 4 (mostly dissatisfied), 5 (unhappy), and 6 (terrible). The GAQ is a global measure of improvement and was asked at the End-of-Study Visit: "Has the treatment you have been taking during this study improved your urinary symptoms?" with patients responding yes or no.

Objective measures included peak urine flow rate (Qmax) and postvoid residual volume (PVR), both of which assess lower urinary tract function. Both measures are often included in clinical trials in men with BPH-LUTS; however, the AUA considers both optional following the initial evaluation of patients for BPH-LUTS [4].

Statistical Analyses

Construct validity is the ability of an instrument to measure the degree to which an individual possesses a hypothetical trait or quality. Construct validity may be measured by an instrument's strong relationship with other instruments that are intended to measure the same concept (convergent validity) and a lesser relationship with other instruments that measure different concepts (discriminant validity).

The data from each study were analyzed separately. For all analyses, unless explicitly mentioned, subjects were included regardless of what type of treatment they received.

Internal consistency reliability was assessed using Cronbach's alpha statistic [17]. Cronbach's alpha > .70 is considered acceptable, > .80 good and > .90 excellent [18]. Spearman rank correlation coefficients and Pearson correlation coefficients were computed between the BII, IPSS, IPSS-QoL, Qmax and PVR at each visit. Correlations demonstrating validity typically range from .30 to .80 [19]. Expectations were that BPH-LUTS severity (i.e., IPSS) and related bothersomeness (i.e., IPSS-QoL), which measure concepts similar to BII, would be more highly correlated with BII than the objective measures (i.e., Qmax and PVR), which measure different concepts. To demonstrate known-groups validity and test for differences between groups expected to be different after treatment, Wilcoxon two-sample tests and t-tests were used to compare BII scores at the End-of-Study Visit between subjects with global ratings of improvement versus no improvement on the GAQ, and between subjects taking tadalafil versus subjects taking placebo. The SAS 9.1 for Windows General Linear Models (GLM) Procedure was used for comparisons similar to the t-tests, with the addition of initial or pre-treatment BII score as a covariate. The percentage of men who received tadalafil and improved, and the percentage of men who received placebo and improved, were computed for each study. Effect size, standardized response mean, and Guyatt's responsiveness statistic were calculated using BII and IPSS change scores (End-of-Study Visit score minus Visit 2 score) to compare the responsiveness, or the ability to detect change, of the 2 measures [20]. Values of .20, .50, and .80 or greater indicate small, moderate, and large responsiveness, respectively [21].
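As an illustration of the internal consistency statistic named above, the following sketch computes Cronbach's alpha from item-level responses using the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total score). It is not the SAS code used in the studies; the function name `cronbach_alpha` and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-subject item-score lists.

    Each inner list holds one subject's responses to the k items of a scale
    (k = 4 for the BII). Sample variances (n - 1 denominator) are used.
    """
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every subject needs the same number (>= 2) of items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical BII item responses for five subjects (not study data).
example = [
    [0, 1, 1, 1],
    [1, 1, 2, 2],
    [2, 2, 2, 3],
    [3, 2, 3, 4],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(example), 2))
```

The function raises `ZeroDivisionError` if every subject has the same total score, because alpha is undefined when the total has no variance.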

Results

Study 1 Subjects

A total of 281 men met the entry criteria and completed the BII at Visit 2. The mean age among those randomized at Visit 3 was 61.5 years (SD = 8.8, range from 45 to 82). Participants were enrolled from 21 study sites across the United States. A total of 55.2% had BPH-LUTS for more than 3 years. After the placebo run-in period, 278 completed the BII at randomization Visit 3. At Visits 4 and 5, 270 and 259 men, respectively, completed the BII.

Study 2 Subjects

A total of 1053 men met the criteria and completed the BII at Visit 2. The mean age among those randomized at Visit 3 was 62.1 years old (SD = 7.9, range from 45 to 92). Participants for Study 2 were enrolled from 92 study sites across 10 countries. A total of 51.4% had BPH-LUTS for more than 3 years. After the placebo run-in period there were 1052 who completed the BII at Visit 3 at randomization. Visits 4 through 6 included 1016, 937, and 896 men, respectively, who completed the BII.

Reliability

Cronbach's alpha for BII was computed at each visit and ranged from .78 to .85 for Study 1 and .81 to .86 for Study 2. Test-retest reliability was not evaluated due to potential deterioration of the patient condition during the washout period. However, the first and second administrations of the BII occurred before and after the 4-week placebo run-in for all subjects. Spearman rank correlation coefficients between BII at Visit 2 and BII at Visit 3 were r_s = .63 for Study 1 and r_s = .64 for Study 2; Pearson correlation coefficients were r = .65 and r = .65, respectively.

Validity

For both studies, the initial BII, IPSS, IPSS-QoL, Qmax and PVR assessments used in the analyses were those at Visit 2. Table 1 displays the Spearman rank correlation coefficients between the BPH measures at each visit by study. The BII and IPSS-QoL are both subjective measures of symptoms and/or impact of BPH-LUTS. The IPSS is a urinary symptom severity scale that measures the severity of symptoms of BPH-LUTS. At each visit of Studies 1 and 2, the BII correlated well with IPSS (r_s = .39 to .67) and IPSS-QoL (r_s = .56 to .70). The other 2 variables were objective measures of BPH-LUTS (Qmax and PVR). At each visit of Studies 1 and 2, BII had very low correlations with Qmax (r_s = -.13 to .01) and PVR (r_s = .00 to .14). Pearson correlations were very similar.
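For readers unfamiliar with the two coefficients reported here, the sketch below shows how a Spearman rank correlation relates to a Pearson correlation: it is the Pearson coefficient computed on ranks, with tied values sharing their average rank. The function names and example scores are hypothetical, and this is not the software used in the studies.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired BII and IPSS totals for six subjects (not study data).
bii = [2, 5, 7, 3, 9, 5]
ipss = [10, 18, 22, 15, 30, 16]
print(round(spearman(bii, ipss), 2), round(pearson(bii, ipss), 2))
```

Because questionnaire totals are bounded integers with many ties, the average-rank handling matters in practice.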

Table 1 Spearman Rank Correlation coefficients among BPH Measures at each visit

To demonstrate known-groups validity for BII in each study, subjects who indicated on the GAQ that treatment had improved their symptoms at the End-of-Study Visit were compared to subjects who indicated no improvement. Table 2 displays the 25th percentiles, medians, 75th percentiles, means and standard deviations for improved and not improved subjects. Baseline scores are also shown. There was a significant difference between improved and not improved subjects for both studies (Study 1: Wilcoxon two-sample test z = 5.03, P < .0001, t-test (df = 256) = 5.16, P < .0001; Study 2: Wilcoxon two-sample test z = 10.24, P < .0001, t-test (df = 469) = 10.64, P < .0001). The GLM results were very similar to the t-tests (Study 1 least-squares means: Improved = 2.91, Not improved = 4.97; Study 2 least-squares means: Improved = 2.90, Not improved = 5.04).
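A known-groups comparison of this kind can be sketched as a two-sample t statistic. The text does not state which t-test variant was used, so the pooled-variance (Student) form below is an assumption, and the function name and example scores are hypothetical rather than study data.

```python
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance (an assumed variant).

    Returns (t, df), where df = n_a + n_b - 2.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    standard_error = (pooled_variance * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical End-of-Study BII totals (not study data).
not_improved = [5, 6, 4, 7, 5]
improved = [2, 3, 4, 2, 3]
t_statistic, degrees_of_freedom = pooled_t(not_improved, improved)
print(round(t_statistic, 2), degrees_of_freedom)
```

A P value would additionally require the t distribution, which is left to a statistics package.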

Table 2 BII Scores for subjects with GAQ Global Ratings of "Improved" versus "Not improved" at End-of-study Visit

To demonstrate differences between groups that are expected to be different after treatment, BII scores at the End-of-Study Visit for subjects taking tadalafil were compared with those for subjects taking placebo in each study. Table 3 displays the 25th percentiles, medians, 75th percentiles, means and standard deviations for tadalafil and placebo subjects. Baseline scores are also shown. There were significant differences between tadalafil and placebo subjects for both studies (Study 1: Wilcoxon two-sample test z = 2.84, P = .0045, t-test (df = 254) = 2.65, P = .0085; Study 2: Wilcoxon two-sample test z = 2.73, P = .0064, t-test (df = 892) = 2.68, P = .0076). Since tadalafil treatment started at Visit 3, GLM was used to compare the 2 groups while controlling for BII at Visit 3. Results were very similar to the t-tests (Study 1 least-squares means: tadalafil = 3.61, placebo = 4.39; Study 2 least-squares means: tadalafil = 3.46, placebo = 4.03).

Table 3 BII scores at End-of-Study visit for subjects taking Tadalafil or placebo

Responsiveness

Responsiveness statistics were calculated for the BII and the IPSS. The effect size, standardized response mean, and Guyatt's responsiveness statistic for BII were: .78, .79, and 1.02 (respectively) for Study 1 and .74, .75, and .82 (respectively) for Study 2. The corresponding response statistics for the IPSS were: 1.35, 1.15, and 1.31 for Study 1 and 1.39, 1.15, and 1.38 for Study 2. Although the IPSS values were greater than the BII, both measures appeared to be responsive to change.
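The three responsiveness statistics can be sketched as follows. The article does not give the formulas, so the commonly used definitions are assumed here: effect size = mean change / SD of baseline scores, standardized response mean = mean change / SD of change scores, and Guyatt's statistic = mean change / SD of change scores in a stable reference group. How the stable group was chosen in the studies is not stated, so `stable_changes` is a hypothetical input, as are the function name and example data.

```python
from statistics import mean, stdev


def responsiveness(baseline, final, stable_changes):
    """Effect size, standardized response mean, and Guyatt's statistic.

    Change is final minus baseline, as in the text. The magnitude of the
    mean change is used, so an improvement (a negative change) yields
    positive statistics.
    """
    changes = [after - before for before, after in zip(baseline, final)]
    mean_change = abs(mean(changes))
    return {
        "effect_size": mean_change / stdev(baseline),
        "standardized_response_mean": mean_change / stdev(changes),
        "guyatt": mean_change / stdev(stable_changes),
    }


# Hypothetical BII totals at Visit 2 and End-of-Study (not study data).
visit2 = [6, 8, 4, 6]
end_of_study = [3, 6, 3, 2]
# Hypothetical change scores from subjects assumed to be clinically stable.
stable = [-1, 0, 1]
for name, value in responsiveness(visit2, end_of_study, stable).items():
    print(name, round(value, 2))
```

The three statistics share a numerator and differ only in the standard deviation used as the yardstick, which is why they can diverge for the same instrument.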

Discussion

The BII is a BPH-specific measure that assesses the impact of BPH symptoms on patient health and functioning. Cronbach's alpha for BII ranged from .78 to .85 for Study 1 and .81 to .86 for Study 2, indicating high internal consistency. BII correlated well at each visit with IPSS and IPSS-QoL, measures of BPH symptoms and overall bother of BPH, respectively, which measure concepts similar to BII. These correlations support the convergent validity of the BII. In contrast, objective measures of BPH such as Qmax and PVR, which measure different concepts than BII, had very low correlations with BII at each visit. These low correlations offer evidence to support discriminant validity for the BII. It has been recognized for some time in the urological community that urodynamic measures such as Qmax and PVR are not associated with the symptoms of BPH-LUTS themselves [22].

Construct validity also may be established by comparing groups that are known to differ on the concept of interest (known-groups validity) or comparing groups that are expected to differ after experimental manipulation. BII differentiated between subjects who indicated treatment had improved their urinary symptoms and subjects with no improvement, lending support for known-groups validity. BII also differentiated between subjects taking tadalafil and subjects taking placebo, establishing that BII can distinguish groups that are expected to differ after therapeutic intervention. More subjects receiving tadalafil improved than subjects receiving placebo in both studies. The purpose of the analyses in this manuscript was not to identify who improved on placebo or different doses of tadalafil (2.5 mg, 5 mg, 10 mg or 20 mg), but to assess the construct validity of a BPH bother score tool. The 2 studies differ in dosing and in the duration of each dose, which makes them suitable for validity assessment but too heterogeneous to compare with each other.

Our results support 2 validation studies that found strong correlations between BII and IPSS [7, 8] and other studies that found strong correlations between BII and IPSS-QoL [11, 23, 24]. Our finding that BII can differentiate known groups is consistent with BPH treatment studies that showed BII differentiated between levels of global patient improvement [13], between drug therapy and placebo [10], and between treatment options [11].

The responsiveness statistics calculated for BII indicate that BII is responsive to change. BII should be able to detect when clinically relevant change occurs in a clinical trial or a treatment setting.

The IPSS was the primary efficacy measure in the 2 clinical trials. Given that the BII correlates well with the IPSS, one might question what additional benefit it offers. The correlations between BII and IPSS in Study 1 at Visit 5 (r_s = .64, r = .67) and in Study 2 at Visit 6 (r_s = .67, r = .68) were strong but they indicated that the IPSS accounted for only 45% and 46% of BII variance, respectively. The BII is a subjective measure and IPSS is a urinary symptom severity measure of BPH-specific health status. While the IPSS addresses specific symptoms that the patient may experience, the BII addresses how BPH symptoms impact the patient. The IPSS-QoL consists of a general question about the impact of BPH on the patient's quality of life. It also had strong correlations with BII in Study 1 at Visit 5 (r_s = .66, r = .69) and in Study 2 at Visit 6 (r_s = .70, r = .71) but the IPSS-QoL accounted for only 48% and 50%, respectively, of BII variance. The BII measures more specifically the impact of BPH symptoms, including physical discomfort, worry about health, how bothersome symptoms are, and whether the symptoms are interfering with doing usual activities.
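The shared-variance percentages quoted above follow from squaring a correlation coefficient. The short check below reproduces them from the Pearson coefficients reported in the text; the function name is hypothetical, and the assumption that the Pearson rather than the Spearman values were squared is inferred from the arithmetic.

```python
def percent_variance_explained(r):
    """Percentage of variance shared by two measures with correlation r."""
    return round(100 * r ** 2)


# Pearson coefficients reported in the text for the final visit of each study.
reported = {
    "BII vs IPSS, Study 1": 0.67,
    "BII vs IPSS, Study 2": 0.68,
    "BII vs IPSS-QoL, Study 1": 0.69,
    "BII vs IPSS-QoL, Study 2": 0.71,
}
for label, r in reported.items():
    print(f"{label}: {percent_variance_explained(r)}%")
```

Roughly half of the BII variance is therefore not shared with either measure, which is the basis for the argument that the BII captures additional information.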

When the primary goal for treating patients with clinical manifestations of BPH is to relieve bothersome symptoms, the BII could be used to measure symptom impact on patients. Future research will be necessary to determine if the BII is useful in predicting patient outcomes. This study demonstrates that BII is useful for clinical trials that evaluate drug therapy designed to affect the impact of BPH symptoms on patient health and functioning.

Conclusions

The results demonstrate that BII is reliable, shows responsiveness, and has construct validity. The BII is a valid instrument to assess the impact of BPH symptoms on health and functioning in clinical trial settings.

Appendices