Background

Acute uncomplicated urinary tract infections (UTI) are among the most common bacterial infections in women presenting to primary care, with an annual incidence of 7% across women of all ages, peaking at 15-24 years and in women older than 65 years [1]. Approximately one third of all women have had at least one physician-diagnosed uncomplicated UTI by the age of 26 years [2].

The original reference standard for diagnosing UTI was the presence of significant bacteriuria, defined as the isolation of at least 10⁵ colony-forming units (CFU) per ml of a single uropathogen in a clean-catch or catheterised urine specimen [3]. However, this cut-off has been debated in recent years, resulting in the use of reduced diagnostic thresholds of 10² [4-7] and 10³ [8-11] CFU/ml.

The pre-test probability of asymptomatic bacteriuria in women of reproductive age is approximately 5% [12, 13]. However, the pre-test probability of an uncomplicated UTI has been shown to increase from 5% to 50% among women presenting with at least one symptom of an uncomplicated UTI [14]. Symptoms of an uncomplicated UTI include dysuria (painful voiding), frequency (frequent voiding of urine), urgency (the urge to void immediately), and hematuria (presence of blood in urine). In contrast, patients presenting with vaginal discharge or irritation have a decreased risk of an uncomplicated UTI [14]. The presence or absence of symptoms therefore functions as a useful diagnostic test. Near-patient testing in the form of urinary dipsticks is also commonly used in primary care to improve the precision of UTI diagnosis, providing immediate results which can be interpreted alongside patient symptoms.

Although empirical treatment of UTI is most cost-effective [15, 16], prescribing without confirmation of diagnosis contributes to the growing problem of resistance against uropathogens in primary care [17].

A previous systematic review established the diagnostic accuracy of symptoms and signs for UTI [14]; however, it remains unclear whether the diagnostic accuracy of symptoms and signs varies when alternative reference standards are applied. The aim of this systematic review is to determine the diagnostic accuracy of symptoms and signs of UTI in adult women across three different reference standards: 10², 10³ and 10⁵ CFU/ml. In addition, we aim to determine the diagnostic accuracy of symptoms and signs combined with dipstick test results.

Methods

The PRISMA guidelines for reporting systematic reviews and meta-analyses were followed in conducting this review (Additional file 1).

Search strategy

We performed a systematic search of three online databases: PubMed (1966 to April 2010), Embase (1973 to April 2010) and the Cochrane Library (1973 to April 2010). A combination of MeSH terms and text words was used, including: 'urinary tract infection/pyelonephritis/cystitis/urethritis', 'physical examination/medical history taking/professional competence', 'sensitivity and specificity', 'reproducibility of results/diagnostic tests, routine/decision support techniques/bayes theorem/predictive value of tests'. All combinations were restricted to 'women and female'. This search was supplemented by checking the references of filtered papers and searching Google Scholar [18]. No restrictions were placed on language.

Study selection

To be eligible for inclusion, the studies had to fulfil the following criteria:

  1. Have a study population of adult symptomatic women with suspected uncomplicated UTI presenting to a primary care setting.

  2. Use a cohort or cross-sectional study design. Case-control studies were excluded.

  3. Investigate the diagnostic accuracy of symptoms and signs of UTI using a urine culture from a clean-catch or catheterised urine specimen as the reference test, with a diagnostic threshold of ≥ 10² CFU/ml.

  4. Include sufficient data to allow for the calculation of sensitivity, specificity, negative and positive predictive values and the prevalence of uncomplicated UTI.

Data extraction

The number of true positives, false positives, true negatives and false negatives for each sign and symptom were extracted from each of the studies and a 2 × 2 table was constructed. Discrepancies were resolved by discussion between the two reviewers (LG and GC). Authors were contacted to provide further information when there was insufficient detail in an article to construct a 2 × 2 table.
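As an illustration of how the accuracy measures named in the inclusion criteria follow from a single 2 × 2 table, the sketch below computes them in Python. The function name and the counts are hypothetical and are not taken from any included study; the review itself pooled such tables with a bivariate model rather than analysing them one at a time.

```python
def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return standard diagnostic accuracy measures for one symptom.

    tp/fp/fn/tn are the four cells of a 2 x 2 table of symptom
    (present/absent) against urine culture (positive/negative).
    Assumes no row or column total is zero and specificity is below 1.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts, for illustration only.
example = accuracy_from_2x2(tp=80, fp=30, fn=20, tn=70)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

A likelihood ratio above 1 raises the probability of UTI when the symptom is present, and a negative likelihood ratio below 1 lowers it when the symptom is absent.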

Quality assessment

The methodological quality of the selected studies was evaluated independently by two reviewers (LG and GC) using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool, a validated tool for the quality assessment of diagnostic accuracy studies [19]. This tool was modified to ensure appropriateness to the present study and included twelve questions from the QUADAS tool with two additional questions extracted from a different review [20]. If no consensus was achieved, studies were evaluated by a third independent reviewer (TF).

Data synthesis and analysis

Summary estimates across different reference standards

We used the bivariate random effects model to calculate summary estimates of sensitivity and specificity and their corresponding 95% confidence intervals. This approach was used as it preserves the two-dimensional nature of the original data and takes into account both study size and heterogeneity beyond chance between studies [21]. In addition, the bivariate model estimates and incorporates the negative correlation which may arise between the sensitivity and specificity of a given sign or symptom as a result of differences in the reference standards used in different studies. These alternative thresholds are important when attempting to understand the diagnostic accuracy of symptoms and signs predicting uncomplicated UTI, as studies have used different thresholds: ≥ 10² CFU/ml, ≥ 10³ CFU/ml and ≥ 10⁵ CFU/ml. However, pooled estimates cannot be calculated using the bivariate model with fewer than four studies.

We plotted the individual and summary estimates of sensitivity and specificity for each symptom and sign at the different threshold levels in a receiver operating characteristic (ROC) graph, plotting a symptom's sensitivity (true positive rate) on the y axis against 1-specificity (false positive rate) on the x axis. We also plotted the 95% confidence region and 95% prediction region around the pooled estimates to illustrate the precision with which the pooled values were estimated (confidence ellipse around the mean value) and to illustrate the amount of between-study variation (prediction ellipse). We assessed heterogeneity visually using the summary ROC plots and statistically using the variance of logit-transformed sensitivity and specificity, with smaller values indicating less heterogeneity among studies.

Bayesian analysis and near patient testing (dipstick)

To examine the influence of threshold effects when considering alternative reference standards, we conducted subgroup analyses across the three thresholds: ≥ 10² CFU/ml, ≥ 10³ CFU/ml and ≥ 10⁵ CFU/ml. Using Bayes' theorem, the post-test odds of a UTI were estimated by multiplying the pre-test odds by the likelihood ratio, where the pre-test odds are calculated by dividing the pre-test probability by (1 - pre-test probability) and the post-test probability equals the post-test odds divided by (1 + post-test odds) [22]. Finally, the diagnostic accuracy of individual symptoms and signs combined with dipstick test results for nitrites, leucocyte-esterase, and combined nitrites and leucocyte-esterase was determined using data synthesised in a previous high-quality systematic review of the diagnostic accuracy of dipstick urinalysis [23].
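The odds conversion described in this section can be written compactly as follows. The symbols are notation introduced here for illustration, and the numbers in the worked line are hypothetical rather than results from the review.

```latex
% Requires amsmath for \text.
\[
O_{\text{pre}} = \frac{p_{\text{pre}}}{1 - p_{\text{pre}}}, \qquad
O_{\text{post}} = O_{\text{pre}} \times \mathrm{LR}, \qquad
p_{\text{post}} = \frac{O_{\text{post}}}{1 + O_{\text{post}}}
\]
% Hypothetical worked example: p_pre = 0.50 and LR = 2.
\[
O_{\text{pre}} = \frac{0.50}{0.50} = 1, \qquad
O_{\text{post}} = 1 \times 2 = 2, \qquad
p_{\text{post}} = \frac{2}{3} \approx 0.67
\]
```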

We used Stata version 10.1 (StataCorp, College Station, TX, USA), particularly the metandi commands, for all statistical analyses.

Results

Search Strategy

Two researchers (LG, GC) screened all potential articles and agreed that the full text of 51 articles should be examined. Nineteen relevant studies were identified by our search strategy [4, 6-10, 24-36]. Five additional studies [37-41] were found by citation searching and two studies by Google Scholar [42, 43]. Ten of the 26 studies reported all required data [8-10, 25, 37-42]. The authors of the remaining papers were contacted for additional data. Ten authors responded [4, 6, 7, 24, 26-28, 32, 35, 43, 44] and six studies were subsequently included [4, 6, 7, 24, 26, 43]. The flow diagram of our search strategy is presented in Figure 1.

Figure 1. Flow diagram of studies in the review.

Characteristics of included studies

The sixteen studies included 3,711 patients and were carried out in a primary care setting. One study was based in the USA [39], two in Canada [4, 6], one in New Zealand [38], eight in the UK [8, 9, 24, 25, 37, 40, 41, 43] and four in other European countries [7, 10, 26, 42]. The mean weighted prior probability of UTI is 65.1% using a reference test of ≥ 10² CFU/ml. The mean weighted prior probability using a reference test of ≥ 10³ CFU/ml and ≥ 10⁵ CFU/ml is 55.4% and 44.8%, respectively. Summary characteristics of each included study are presented in Table 1.

Table 1 Summary of included studies

Quality assessment

The summary diagram of the quality assessment is shown in Figure 2. The overall quality of the included studies ranges from moderate to good. It is important to note that several studies were conducted before the introduction of standards for reporting diagnostic accuracy studies [37-41]. Spectrum bias is identified as a potential source of bias in certain studies, which either included both complicated and uncomplicated patients [7, 38] or failed to report clearly whether the study focused on complicated or uncomplicated UTI [26, 40]. Partial verification bias is also noted in two studies, in which only a selected sample of patients had their symptoms verified by the reference test [24, 41]. Furthermore, un-interpretable test results and the blinding of symptoms and signs and of reference test results are poorly reported.

Figure 2. Quality assessment.

Included questions from the QUADAS tool [19]: 1. Was the spectrum of patients representative of the patients who will receive the test in practice? (Q1). 2. Were selection criteria clearly described? (Q2). 3. Is the reference standard likely to correctly classify the target condition? (Q3). 4. Is the time period between reference standard and index test short enough to be reasonably sure that the target condition did not change between the two tests? (Q4). 5. Did the whole sample or a random selection of the sample receive verification using a reference standard of diagnosis? (Q5). 6. Did patients receive the same reference standard regardless of their symptoms and signs? (Q6). 7. Were all signs and symptoms clearly defined? (Q7). 8. Was the execution of the urine culture described in sufficient detail to permit replication? (Q8). 9. Were signs and symptoms interpreted without knowledge of the results of urine culture? (Q9). 10. Were the results of the urine culture interpreted without knowledge of the symptoms and signs? (Q10). 11. Were uninterpretable/intermediate test results reported? (Q11). 12. Were withdrawals from the study explained? (Q12). Additional questions [20]: 13. Were the patients selected consecutively? (Q13). 14. Were statistical tests for main outcome adequate? (Q14).

Diagnostic test accuracy of symptoms and signs

Summary estimates across different reference standard thresholds

Sixteen studies examined the accuracy of ten different symptoms and signs of UTI. The pooled sensitivities, specificities and the respective variance of the logit-transformed sensitivity and specificity for individual symptoms and signs at each of the three reference standard threshold levels are presented in Tables 2, 3 and 4, respectively. Furthermore, the summary estimates of positive and negative likelihood ratios for individual symptoms and signs at each of the three threshold levels are presented in Table 5. Six symptoms are identified as having useful diagnostic value at a reference standard threshold of ≥ 10² CFU/ml, as their 95% confidence intervals do not cross the line of no effect. Presence of dysuria, frequency, hematuria, nocturia and urgency is found to increase the probability of UTI. Presence of vaginal discharge decreases the probability of UTI. Hematuria has the highest diagnostic utility (LR+ 1.72), with a specificity of 0.85 and a sensitivity of 0.25; thus hematuria, when present, is more useful in 'ruling in' UTI. In contrast, all other significant symptoms are identified as being more useful in 'ruling out'. A similar pattern of results emerges using a higher reference standard threshold of ≥ 10³ CFU/ml. Consistent with the findings at the lower thresholds, dysuria, frequency and urgency remain significant symptoms for ruling out a urinary tract infection at ≥ 10⁵ CFU/ml.

Table 2 Summary estimates of sensitivity and specificity using a bivariate random effects model (10²)
Table 3 Summary estimates of sensitivity and specificity using a bivariate random effects model (10³)
Table 4 Summary estimates of sensitivity and specificity using a bivariate random effects model (10⁵)
Table 5 Summary estimates of positive and negative likelihood ratios using a bivariate random effects model (10², 10³, 10⁵)

The individual and summary estimates of sensitivity and specificity, the 95% confidence region and the 95% prediction region for each symptom and sign at each of the threshold levels are presented in receiver operating characteristic graphs in Figures 3, 4 and 5. The 95% confidence region remains large for several symptoms and signs across the different diagnostic thresholds, with the exception of dysuria, frequency and hematuria. This indicates greater precision of the pooled estimates for dysuria, frequency and hematuria. The 95% prediction region (amount of variation between studies) is also wide for most symptoms and signs across the different diagnostic thresholds, as reflected in the large values for the variance of logit-transformed sensitivity and specificity, with the exception of hematuria.

Figure 3. Receiver operating characteristic graphs with 95% confidence region and 95% prediction region for each sign and symptom (10²).

Figure 4. Receiver operating characteristic graphs with 95% confidence region and 95% prediction region for each sign and symptom (10³).

Figure 5. Receiver operating characteristic graphs with 95% confidence region and 95% prediction region for each sign and symptom (10⁵).

Bayesian analysis and near patient testing (dipstick)

The post-test probabilities across the three threshold levels, calculated using Bayes' theorem, are presented in Table 6. Most notably, presence of hematuria increases the probability of UTI from 65.1% to 75.8% (95% CI 70.9 - 80.1) using ≥ 10² CFU/ml and from 55.4% to 67.4% (95% CI 60.6 - 73.6) using ≥ 10³ CFU/ml. Presence of vaginal discharge decreases the probability from 65.1% to 54.1% (95% CI 48.3 - 59.9). The probability of a UTI increases to 93.3% (≥ 10² CFU/ml) and 90.1% (≥ 10³ CFU/ml) when the presence of hematuria is combined with a positive dipstick test for nitrites (Table 7). Combining the presence of hematuria with a positive dipstick test for leucocyte-esterase increases the probability to 81% and 73.8%, respectively (Table 8). The post-test probability of UTI when the presence of dysuria, frequency, nocturia, hematuria or urgency is combined with either a positive dipstick test for leucocyte-esterase or a positive test for nitrites and leucocyte-esterase combined is lower than when positive symptoms are combined with nitrites alone (Tables 8 and 9). In contrast, the presence of vaginal discharge combined with a negative dipstick test result for nitrites reduces the probability of UTI to 38.4% (Table 7). The presence of vaginal discharge combined with a negative result for the combined nitrites and leucocyte-esterase dipstick test reduces the post-test probability further to 15% (Table 9).

Table 6 Post-test probability of significant symptoms across three different reference standards: 10², 10³ and 10⁵ CFU/ml
Table 7 Post-test probability of significant symptoms with a positive (LR 4.42) or negative (LR 0.53) dipstick test for nitrites [23]
Table 8 Post-test probability of significant symptoms with a positive (LR 1.36) or negative (LR 0.36) dipstick test for leucocyte-esterase [23]
Table 9 Post-test probability of significant symptoms with a positive (LR 2.57) or negative (LR 0.15) dipstick test for nitrites and leucocyte-esterase combined [23]
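The combined symptom and dipstick probabilities quoted in this section can be reproduced with a short sketch. It takes the post-symptom probabilities and the dipstick likelihood ratios stated in the text and table captions, and applies the odds form of Bayes' theorem. The function name is hypothetical, and multiplying the odds by a second likelihood ratio assumes that the dipstick result is conditionally independent of the symptom, which is an assumption of this illustration.

```python
def update_probability(prior: float, *likelihood_ratios: float) -> float:
    """Apply one or more likelihood ratios to a prior probability.

    Converts the prior to odds, multiplies by each likelihood ratio in
    turn (assuming conditional independence between tests), and converts
    the resulting odds back to a probability.
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Dipstick likelihood ratios as given in the captions of Tables 7 to 9.
NITRITE_POS, NITRITE_NEG = 4.42, 0.53
LE_POS = 1.36
COMBINED_NEG = 0.15

# Probabilities after the symptom alone, as reported in the text.
hematuria_102 = 0.758   # hematuria present, threshold >= 10^2 CFU/ml
hematuria_103 = 0.674   # hematuria present, threshold >= 10^3 CFU/ml
discharge_102 = 0.541   # vaginal discharge present, >= 10^2 CFU/ml

print(f"{update_probability(hematuria_102, NITRITE_POS):.1%}")   # 93.3%
print(f"{update_probability(hematuria_103, NITRITE_POS):.1%}")   # 90.1%
print(f"{update_probability(hematuria_102, LE_POS):.1%}")        # 81.0%
print(f"{update_probability(discharge_102, NITRITE_NEG):.1%}")   # 38.4%
print(f"{update_probability(discharge_102, COMBINED_NEG):.1%}")  # 15.0%
```

The printed values agree with the figures reported in the text for Tables 7 to 9.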

Discussion

Principal findings

Individual symptoms and signs suggestive of a UTI have modest diagnostic discriminative value when assessed against three reference standard thresholds for UTI. Dysuria, frequency and urgency have a higher sensitivity than specificity and, when absent, are more useful in ruling out a UTI diagnosis across all three reference standard thresholds: ≥ 10² CFU/ml, ≥ 10³ CFU/ml and ≥ 10⁵ CFU/ml. In contrast, hematuria has a higher specificity than sensitivity and, when present, is more useful in ruling in a diagnosis of UTI across the reference standard thresholds ≥ 10² CFU/ml and ≥ 10³ CFU/ml. Combining positive dipstick test results, particularly tests for nitrites, with symptoms increases the post-test probability of a UTI. In particular, presence of hematuria combined with a positive dipstick test result for nitrites increases the post-test probability from 75.8% to 93.3% at ≥ 10² CFU/ml and from 67.4% to 90.1% at ≥ 10³ CFU/ml. Similarly, presence of dysuria combined with a positive dipstick test result for nitrites increases the post-test probability from 51.1% to 82.2% at ≥ 10⁵ CFU/ml.

Context of previous studies

The findings of this systematic review are consistent with a previous systematic review which concluded that no sign or symptom on its own is powerful enough to 'rule in' or 'rule out' the diagnosis of UTI [14]. However, the relative diagnostic importance of individual symptoms and signs varies between this review and the previous systematic review [14]. The previous systematic review found that presence of dysuria, frequency, hematuria, back pain and costovertebral angle tenderness increases the probability of UTI using diagnostic thresholds ranging from ≥ 10² CFU/ml to ≥ 10⁵ CFU/ml. It also found that history of vaginal discharge, history of vaginal irritation and vaginal discharge on examination decrease the probability of a UTI. In this systematic review we found that dysuria and frequency increase the probability of UTI across the different reference standard thresholds: ≥ 10² CFU/ml, ≥ 10³ CFU/ml and ≥ 10⁵ CFU/ml. Hematuria is also significant in the present study using diagnostic thresholds of ≥ 10² CFU/ml and ≥ 10³ CFU/ml. However, in contrast to the previous systematic review, back pain is not significantly associated with UTI across the different reference standard thresholds. Vaginal discharge is identified as an important symptom for decreasing the probability of UTI in the present study.

Such differences may be an artefact of the different methodological approaches taken. Firstly, the previous systematic review pooled all studies irrespective of the reference standard threshold used, whereas the present study sought to determine the importance of individual symptoms and signs at different reference standard thresholds. In addition, our inclusion criteria were more conservative, excluding studies which involved self-diagnosis, case-control study designs and different healthcare settings (i.e. not primary care settings) where the prevalence of symptoms may differ and increase the chance of spectrum bias.

Strengths and limitations of this study

The systematic search, the conservative inclusion criteria, the inclusion of additional data from authors, and the quality assessment of the included studies can be seen as strengths of this study. In addition, given the lack of consensus regarding reference standard thresholds for UTI, the current study is the first to determine the diagnostic accuracy of symptoms and signs across the three thresholds ≥ 10² CFU/ml, ≥ 10³ CFU/ml and ≥ 10⁵ CFU/ml. Lastly, this study highlights the importance of using dipstick tests, particularly tests for nitrites, as an additional diagnostic tool when ruling in a UTI diagnosis based on particular symptomatology.

We acknowledge that this review has limitations. Variability of diagnostic accuracy estimates across studies is high. This may be due to the fact that we did not restrict the age of women included in the meta-analysis. It is known that the prevalence of UTI differs across age groups, peaking at 15-24 years and at greater than 65 years [45]. In addition, the definitions used to describe individual symptoms and signs vary across studies. For example, 'lower abdominal pain' has been defined as 'suprapubic pain' [4, 6, 7], 'suprapubic pressure' [42] or 'abdominal pain' [24]. Furthermore, because the bivariate random effects model was used, symptoms and signs were analysed only when at least four studies were available. Consequently, a few symptoms and signs are excluded from our meta-analysis, particularly at the higher reference standard threshold of ≥ 10⁵ CFU/ml. Finally, while the probability of UTI increases when the presence of certain symptoms is combined with positive dipstick test results, it is important to acknowledge that the predictive value of the dipstick test result was taken from a meta-analysis which included men and pregnant women [23].

Implications for practice

Individual symptoms and signs will modestly increase the post-test probability and cannot accurately 'rule in' or 'rule out' the diagnosis of a UTI. Subgroup analysis shows improved diagnostic accuracy using lower reference standards of ≥ 10² CFU/ml and of ≥ 10³ CFU/ml. In addition, combining nitrite dipstick test results with clinical symptoms and signs is useful for ruling in a UTI diagnosis and deciding on the optimal patient management strategy. More recently, formal clinical prediction rules for UTI that incorporate the independent effects of symptoms and signs into a "risk score" have been proposed as an alternative management strategy when considering antibiotic treatment [8]. This approach appears to be equivalent to alternative management strategies for UTI in women, including empirical immediate antibiotics, empirical delayed antibiotics, or use of near-patient testing with a dipstick, in terms of duration or severity of symptoms. However, in terms of antibiotic usage, use of a dipstick results in fewer antibiotics being prescribed when compared to immediate empirical antibiotics or use of a UTI "risk score" [46].

Future studies

The current approach of evaluating symptoms and signs as a diagnostic test is in general dichotomous (a symptom is recorded as present or absent) and ignores symptom severity [8, 9, 28]. In the future, focusing on the severity of symptoms may provide more valuable diagnostic information.

Conclusions

Individual symptoms and signs, independent of reference standard threshold, have a modest ability to 'rule in' or 'rule out' the diagnosis of UTI. Use of a dipstick test enhances diagnostic utility and reduces the chance of prescribing unnecessary antibiotics. Future studies should focus on refining diagnosis using information on the severity and duration of symptoms, alone, in combination and alongside dipstick testing.