Introduction

Urinalysis by dipstick and/or midstream urine culture (MSU) is the first investigation of the patient presenting with lower urinary tract symptoms (LUTS). Negative results are usually assumed to exclude infection and diagnoses such as overactive bladder (OAB) depend on negative tests. The MSU is still considered the gold standard diagnostic test for UTI. It relies on isolating 105 cfu ml−1 of a known urinary pathogen, using aerobic culture with a Enterobacteriaceae-selective media. This was first described by Kass [1] after studying patients with chills, fever, flank pain and dysuria. Kass never claimed to define a threshold for use with “cystitis”, i.e. frequency/dysuria. However, 105 cfu ml−1 has been widely adopted. The dipstick has been validated against a standard of 105 cfu ml−1 of a known urinary pathogen.

In fact, these urinalysis methods are not sensitive and are incapable of excluding UTI. Effort is being made to find improved urinalysis techniques to detect UTI [2], but currently the best performing method is microscopy of a fresh unspun, unstained, clean-catch specimen of urine in a counting chamber to enumerate the white cells [3].

Our past reliance on dipstick analysis and culture to diagnose UTI has now been rejected in the literature [4], we should therefore re-examine our understanding of the symptoms and signs involved in LUTS [5].

The analysis of symptoms presents different challenges to the measurement of change over time, when assessing treatment outcomes. Well-known validated scores of LUTS such as the International Consultation on Incontinence Questionnaire (ICIQ), [6] the Incontinence Quality of Life Questionnaire (ICI-QOL) [7] and Female Lower Urinary Tract Symptoms (F-LUTS) [8] are used to assess the impact of incontinence. They have performed well in measuring the effects of treatments in clinical trials. Their utility in single, cross-sectional analyses is less assured because they use adjectival scaling such as “bothersomeness”. People differ greatly in their semantic interpretation of adjectives. In longitudinal studies, the potential error from this variance can be avoided by the normalisation of within subject changes. This option is available to neither cross-sectional studies, nor to data collected at the time of diagnostic assessment. Additionally, scores such as the ICIQ were validated in patients after UTI had been excluded on the evidence of urine culture. Our interest lies in the symptoms associated with reliable indicators of UTI.

We aimed to design a symptom analysis method, free of adjectival scaling, covering the spectrum of LUTS. We wished to avoid selecting symptoms based on our assumptions. The symptom inventory was constructed from sets collected from patients who, unconstrained, described their experience of disease. We chose to use “Yes/No” responses to enquiry to avoid adjectival qualification. We have reported this approach in developing descriptive measures in several different circumstances. We have found that it is possible to achieve valid scaling by counting the different circumstances that aggravate a patient’s symptoms.

The study aim was to build a symptoms inventory from the patient’s descriptions of their experiences. Once assured that this could be applied by using dichotomised responses, we planned to validate the psychometric properties of the questionnaire. We then planned to examine the relationship to the pathological condition by regression analysis on microscopic pyuria, which is currently our best marker of UTI [3, 9]. We also arranged to test the responsive properties of the questionnaire during treatment.

Materials and methods

Ethics committee approval was obtained from the East London and the City Research Ethics Committee. All patient data were anonymized after collection.

The evolution of the symptom set

The first task was to identify symptoms that described, as thoroughly as possible, the experience of lower urinary tract infections (LUTS) from the patients’ perspective in patients referred with chronic LUTS to a tertiary referral centre. A computer programme was created that enabled a clinician to record symptoms while the patients narrated them. The software functioned to accumulate lists that were easy to add to. A core set of ubiquitous symptoms rapidly evolved, with rarer and idiosyncratic experiences being added as more patients participated. At the end of their narrative, patients were asked if they were confident that they had described everything. This method was deployed in our ordinary clinical service between 1991 and 1999.

In 1999, the data were analysed and we extracted all symptoms that occurred at least as frequently as dysuria (≥2% of respondents), because the latter is an archetypal symptom of cystitis. These questions were then grouped into the four main categories: storage and voiding symptoms, stress urinary incontinence (SUI) and pain. These questions were organised into an inventory that was used to elicit symptoms from patients by direct questioning, recording a “yes” or “no” response. The patients were always asked to supplement the information with additional material applicable to their experience. In 2004, the data were analysed again and a final inventory was constructed from those symptoms described by ≥2% of respondents. These 39 questions are listed in Table 1.

Table 1 Symptom inventory in the order in which the questions are asked

These two preparatory phases sampled males and females so as to avoid a gender bias in the structure of the final symptom inventory.

Validation of the symptom set

The 39-question set was written into a new database and from 2004 these questions were included in the assessment of all adult (≥18 years) patients with untreated LUTS, presenting to a secondary care facility attached to urology and urogynaecology departments. The observers presented the question set by using a script (see Appendix Table 4). The reported 24-h urinary frequency and incontinence were recorded. A group of normal control subjects also contributed data.

At the time of assessment, a clean-catch midstream urine sample was collected as described below. From this an immediately fresh, unspun, unstained specimen was examined in a haemocytometer as described below. An aliquot was despatched for routine culture, as described below.

The first part of the data analysis involved psychometric validation according to the methods described below.

Reliability

  1. 1.

    Internal consistency, which measured the within-group agreement, when the questions were collected into the following categories: storage symptoms, voiding symptoms, SUI symptoms and pain symptoms. This was assessed by calculation of Cronbach’s alpha coefficient.

  2. 2.

    Test–retest reliability, which measured the consistency of the symptom inventory when repeated in the same patient. This was done by allowing each patient to complete the symptom set twice within 2 days, so that the symptoms were unlikely to have changed. We used Cronbach’s alpha coefficient for the paired data.

  3. 3.

    Inter-observer reliability, which measured the consistency of the symptom inventory when performed by different observers of the same patient. This was carried out by two observers questioning the same patient independently during the same clinic attendance. We used Cronbach’s alpha coefficient for the paired data.

Thus, reliability was assessed according to the ICIQ guidelines [6].

Responsiveness

  1. 1.

    Internal responsiveness, the ability of the symptom set to detect change, was measured by comparing the symptoms over four successive follow-up visits during treatment. The patients were treated for LUTS symptoms, using bladder retraining, antimuscarinic drugs for OAB symptoms and antibiotics if microscopic pyuria was identified. The response was assessed using mixed model linear regression.

  2. 2.

    External responsiveness was analysed first by comparing of the symptom score with the patient’s grading of the treatment response on the scale: “worse”, “no change”, “mild improvement”, “moderate improvement”, “marked improvement”, and second by comparing the inventory with 24-h frequency and 24-h incontinence. The response was assessed using mixed model linear regression.

  3. 3.

    Construct validity analysis formed the crux of this study. We explored the inferences that could be drawn from symptom occurrence in relation to the probability of significant microscopic pyuria, the latter being our best indicator of UTI. To analyse this, we used mixed model linear regression.

Comparison with FLUTS

We collected data from 135 patients who completed an International Consultation on Incontinence Modular Questionnaire for female lower urinary tract symptoms (ICIQ-FLUTS) [8] and compared the FLUTS subgroup scores with the number of symptoms detected by our inventory grouped according to the four categories: urgency, stress, voiding and pain. We used Spearman’s correlation coefficient.

Urinalysis methods

The MSU samples were obtained by the midstream clean-catch method described elsewhere [10]. The microscopic leucocyte count was achieved as follows: a fresh aliquot of urine was split into two and examined by microscopy. 1 μl of urine was loaded into a clean Neubauer haemocytometer counting chamber [11] and the preparation examined by light microscopy (magnification, ×200). The white cells were counted per 1 μl. These data were collected by specially trained doctors and nurses. A routine MSU culture was performed as follows: an aliquot of all urine specimens collected was sent to the Whittington Hospital NHS Trust microbiology laboratory, as is routinely done, for culture using standard methods [10] The result was taken as positive if at least 105 colony-forming units (cfu) ml−1 of a known pathogen were present, after 18–24 h of culture.

Statistics

We used a mixed model linear regression analysis to scrutinise the log10 WBC as the response variable. This method is designed for repeated measures and copes with missing data cells. We had to address the occurrence of an excess of zero WBC counts and achieved this by using the glmmADMB procedure in R to construct a zero-inflated negative binomial regression model to accommodate over-dispersion of the data; the regression model was specified for the mean of the negative binomial. The zero WBC counts were expressed as very small numbers for the log10 transformation. The symptoms were treated as individual independent variables (39) questions.

To accommodate an interpretation, commensurate with common clinical practice, the patients were additionally grouped according to their pyuria counts: zero, pyuria 1–9 or pyuria ≥10, and then the symptoms were plotted against these groups (see Fig. 4) We did this because of the widespread use of pyuria ≥10 WBC μl−1 as the threshold for “significant” pyuria [12], although results that incriminate pyuria between 1 and 9 have been published [2].

To compare the voiding and pain symptoms among the three categories of pyuria, we used ordinal regression with the category as the dependent variable and the number of voiding symptoms and pain symptoms as covariates.

We used a two-tailed non-parametric test with an effect size (d) = 0.5, which means that an effect in either direction would be interpreted. The criterion for significance (probability of alpha error) was set at 0.050 and a power of 80% (1-beta error of probability = 0.8). With the proposed sample size of 170 for the two groups (43 controls and 127 patients), the study had a power of 80% to yield a statistically significant result with an effect size d = 0.5.

Cohen’s d effect size = 0.5 (Fig. 1) was selected, as this is the smallest effect that it would be important to detect, in the sense that any smaller effect would not be of clinical or substantive significance. It was also assumed that this effect size is reasonable, in the sense that an effect of this magnitude could be anticipated in this field of research. Cohen’s d is defined as the mean difference expressed in standard deviations. d = difference between means ÷ pooled standard deviations of the two groups.

Fig. 1
figure 1

Four-way Venn diagram of symptom overlap

The sample of at least 1,990 patients had sufficient power to analyse 39 predictor symptoms. Missing data were recorded as null fields.

Results

The first data set was collected between January 1991 and July 1999 and analysed when the software was re-written to accommodate the millennium bug threat. It contained data from 2,446 patients; 2,019 (82%) female and 427 (18%) male with a mean age of 53 (SD = 19). During 7,467 consultations, they provided 65,535 data entries. The first question set, made up of each symptom being described by ≥2% of respondents, consisted of those depicted in Table 1 apart from questions 29 to 33.

The second data set, collected between August 1999 and July 2004, was analysed once we had validated our method for counting pyuria. There were data on 2,109 patients; 1,836 (87%) female and 273 (13%) male with a mean age of 52 (SD = 30). There were 7,046 consultations that provided 54,159 data entries. This analysis resulted in modification of the description of bladder pain, which was expanded into a four-question set: bladder pain or discomfort on filling; filling bladder pain relieved by voiding; filling bladder pain partially relieved by voiding; (filling bladder pain unrelieved by voiding (questions 29 to 33 in Table 1). We did this because these symptom qualifications, referring to voiding, were described by ≥2% of respondents (Table 1).

During the study phase, the LUTS symptom inventory was administered to 2,050 female patients presenting between August 2004 and November 2011. Their mean age 52 (SD = 17) and the average number of symptoms was 10 (SD = 5.9; median = 9), with a mean duration of symptoms of 6 years (SD = 6). The considerable overlap between the symptom groups is illustrated in the four-way Venn diagram of Fig. 1.

A total of 1,305 patients (64%) had no pyuria at presentation, 356 patients (17%) had pyuria between 1 and 9 WBC μl−1 and 389 (19%) had pyuria of ≥10 WBC μl−1. Two hundred and forty-five patients (12%) demonstrated positive MSU cultures.

Reliability

Internal consistency

Cronbach’s alpha for urgency was 0.88 with 11 items, for pain it was 0.861 with 13 items, for stress incontinence 0.884 with 8 items and for voiding 0.882 with 7 items. When the symptoms were counted within each group, Cronbach’s alpha was 0.35 for the counts of urgency symptoms, stress incontinence symptoms, voiding symptoms and pain symptoms, confirming that the symptoms between groups measured different things.

The theoretical value of alpha varies from zero to 1. Higher values of alpha are more desirable. A reliability of 0.70 or higher is often required as a consensus before an instrument is used. 0.80–0.9 is good and >0.9 is excellent.

Test–retest reliability

Ten patients participated in the test–retest analysis. Cronbach’s alpha was 0.981 with an inter-item correlation coefficient of 0.974.

Inter-observer reliability

Ten patients participated in the inter-observer reliability analysis. Cronbach’s alpha was 0.995 with an inter-item correlation coefficient of 0.991.

Responsiveness

Internal responsiveness

A total of 2,050 patients provided data on follow-up over 4 visits (24 weeks); the patients were treated with a combination of antimuscarinics and antibiotics. Analysis showed a significant reduction in the total number of symptoms F = 221, p < 0.001. The tables from the mixed model analysis are shown in the Appendix (Fig. 2)

Fig. 2
figure 2

Internal responsiveness

External responsiveness

We examined external responsiveness by comparing the number of symptoms as the dependent variable using the following independent variables: the patients’ assessment of overall response; 24-h frequency; 24-h incontinence.

The results were as follows. Overall response: F = 359, df = 5, p < 0.001; 24-h frequency: F = 255, p < 0.001; 24-h incontinence: F = 320, p < 0.001. Figure 3 plots the number of symptoms against the patients’ overall assessment of response, the 24-h urinary frequency, and urinary incontinence.

Fig. 3
figure 3

External responsiveness

Construct validity

The zero-inflated negative binomial regression model parameter estimates and statistics, with pyuria (WBC μl−1) as the dependent variable; repeated measures identified by visit number; and with the independent covariates being age, 25-frequency and 24-h incontinence and the 39 symptoms (No = 0 Yes = 1) are shown in Table 2. Age, 24-h frequency, 24-h incontinence, urge incontinence, latchkey incontinence, bladder pain post-void, dysuria and reduced stream explained a significant proportion of the variance in log10 pyuria; exercise incontinence, passive incontinence and pre-cough guarding were negative explanatory variables.

Table 2 Zero-inflated negative binomial regression model parameter estimates and statistics

The clinical implications

Comparison with FLUTS

The correlation between the FLUTS subgroups and the grouped symptoms from our inventory are shown in Table 3. The coefficients range from 0.36 to 0.57 and it is interesting that these were greatest for the voiding symptoms.

Table 3 Correlation between female lower urinary tract symptoms (FLUTS) subgroups and grouped symptoms. Correlation matrix from comparison with International Consultation on Incontinence Modular Questionnaire (ICIQ)

Symptom implications

The core aim was to examine the relationship between the symptoms and microscopic pyuria as the best marker of UTI. The MSU cultures, only 12% positive, were not suitable for such analysis. An important finding was the prominence of voiding symptoms as indicators of pyuria. In particular, these seemed to be most discerning in the milder expressions of the disease. Figure 4 plots the voiding symptoms count (covariate) against the pyuria groups (dependent ordinal variable; zero, 1 to 9, ≥ 10) in patients not describing pain (χ2 = 88, df = 1, p < 0.001). The voiding symptoms count discriminated among all three categories. The pain symptom count only discriminated between zero pyuria and any pyuria (χ2 = 148, df = 1, p < 0.001; Fig. 5).

Fig. 4
figure 4

Voiding symptoms related to the pyuria group

Fig. 5
figure 5

Pain symptoms related to the pyuria group

Discussion

A most important finding from this study has been the discovery that in women, voiding symptoms play an important part in the complex associated with pyuria, pain features but is only evident for the higher levels of pyuria; it would seem that voiding symptoms are more discriminating.

There are many symptom scores reported in the literature that measure the experience of lower urinary tract disease, designed and validated by others. We would not have invested the effort required to develop this inventory without good reason. There is a growing realisation that the methods used to screen for urine infection in clinical practice are far from accurate and that their sensitivities do not justify the given status as arbitrators of the presence or absence of infection. We have recently reported that in patients with chronic LUTS, the dipstick test is at best 59% sensitive [3] and others have shown a diverse urinary microbiome in patients with incontinence and urgency when routine urine cultures are negative [9, 13]. In truth, we do not have reliable methods for achieving such decisive diagnoses. We must therefore rely on the symptoms and signs, which may be no bad thing. We must take a history, examine the patient and not delegate the diagnostic decision to a single test.

The questionnaire that is described in this paper was designed to measure the symptomatic manifestations of pyuria. We could not validate against proven UTI because there is no contemporary microbiological method capable of furnishing the necessary data with sufficient accuracy. Microscopic pyuria, measured by immediate microscopy of a fresh, unstained and unspun specimen of urine in a haemocytometer is the best marker of urinary infection that we have, despite its surrogacy [2].

The 39 questions that make up the questionnaire had their origin in the free texts collected when patients were asked to describe their condition in their own words. Many of the early years of this project were devoted to obtaining these data and fashioning them into an inventory requiring dichotomized responses. Fidelity to the original source data was maintained throughout so that the questionnaire was not altered to accommodate data from investigations or diagnostic categorization. The correlates show that the patients’ own descriptions of their states concur with the pathophysiology.

The symptoms score proved a good predictor of microscopic pyuria and monitor of disease progression. The dependent variable, pyuria, is the best surrogate marker of UTI available. The score correlated with measures of quality of life. Voiding symptoms were proved indicators of mild disease with pain indicating more severe inflammatory responses.