In these studies, the diagnostic performance and user-friendliness of two RADTs for GAS were assessed by the intended users under real-life conditions in primary health care. The diagnostic sensitivity was 92% (95% CI 87–96%) and 72% (95% CI 65–79%), while the diagnostic specificity was 86% (95% CI 81–90%) and 98% (95% CI 96–99%) for QuickVue and DIAQUICK, respectively. Both QuickVue and DIAQUICK obtained acceptable assessments for user-friendliness and thus fulfilled SKUP's quality goal. Very few studies have previously reported results on the user-friendliness of RADTs [23].
For QuickVue, the diagnostic sensitivity in the present study was higher than the pooled estimates from meta-analyses for EIA (85% (95% CI 83–88%) for children [8] and 86% (95% CI 81–91%) for adults [10]), and fulfilled SKUP's quality goal for diagnostic sensitivity (> 80%). The diagnostic specificity for QuickVue, however, was lower than the pooled specificity in meta-analyses for EIA for children (96%, 95% CI 95–97%) and for adults (97%, 95% CI 96–99%) [10], and lower than that reported by the manufacturer of the test (94%, 95% CI 91–97%) [20]. QuickVue did not fulfill the quality goal set by SKUP for diagnostic specificity (> 95%). However, the specificity of QuickVue in our study was within the range of reported specificity for EIA (54–100%) [8, 11]. In our evaluation of QuickVue, 18 of the 27 (66%) false positive results were from two sites representing 39% (121/309) of all tests. The remaining five sites, representing 61% of the samples, accounted for 33% (9/27) of the false positive results, giving a diagnostic specificity of 92% for these sites. In addition, of the 27 false positive results, five (18%) were reported as weakly positive (weak test line) by the PHCCs. Thus, user-specific performance may partly explain the low specificity obtained for QuickVue in our study. Time to culture could also lead to “false positive” results, since GAS does not survive at 4 °C [24], which may affect the result of the culture. Nevertheless, the evaluations followed the standard procedure used in primary health care for sending throat samples to a laboratory for culturing.
For DIAQUICK, the diagnostic specificity in the present study was higher than the pooled estimates from meta-analyses for EIA for children (96%, 95% CI 95–97%) [8] and for adults (97%, 95% CI 96–99%) [10], and fulfilled SKUP's quality goal for diagnostic specificity (> 95%). The five false positive results for DIAQUICK were from one site, but we have no information on whether these were weakly positive. The site accounted for about 60% of the samples (208/351).
The diagnostic sensitivity for DIAQUICK in our study, however, was lower than the pooled sensitivity in meta-analyses for EIA in children (85%, 95% CI 83–88%) [8] and adults (86%, 95% CI 81–91%) [10], and did not fulfill SKUP's quality goal for diagnostic sensitivity (> 80%). The diagnostic sensitivity was also lower than that reported by the manufacturer of the test (97%, 95% CI 91–99%) [21], and lower than that found in a prospective study from 2008 performed in an outpatient setting with children (95.8%) [25]. However, that study had several limitations, including an unclear description of the comparison method, and it used an older version of DIAQUICK. Nevertheless, the diagnostic sensitivity of DIAQUICK in our study was within the range of reported sensitivity for EIA (59–96%) [9]. Since the evaluations were performed in a region with a low risk of serious complications caused by GAS, low sensitivity is less of a problem.
The laboratory staff were informed about the results of the RADTs. This may have introduced the opportunity for bias in the examination and interpretation of the cultures, which could have a slight impact on the figures for sensitivity and specificity.
There is no global consensus on the quality goals for diagnostic sensitivity and specificity for RADTs, since the importance of these parameters varies in different parts of the world. This might contribute to the wide range of diagnostic sensitivity for different RADTs. In high-income countries, the risk of serious complications caused by GAS is low and healthcare focuses on minimizing inappropriate use of antibiotics. High sensitivity might result in detection of GAS carriers and unnecessary treatment. Thus, high diagnostic specificity is more important than high sensitivity. The performance criteria set by SKUP for RADTs are based on previous studies performed in Scandinavian countries [16,17,18], and are also in line with the pooled estimates for diagnostic sensitivity and specificity from meta-analyses [9, 10]. Nevertheless, the studies included in the meta-analyses varied greatly in methodology, and the authors state that they do not have strong confidence in the estimates because of this high heterogeneity [9, 10].
Some studies indicate that the sensitivity of RADTs varies with the spectrum of disease [12, 26,27,28], and an increased number of modified Centor criteria [5] has been shown to be associated with increased RADT sensitivity [12, 28]. However, in a meta-regression, there were no significant associations between clinical severity (assessed by modified Centor criteria [5]) and sensitivity and/or specificity of the RADTs [8]. In our study, the same percentage of the patients (almost 60%) had three or four Centor criteria in the evaluation of both QuickVue and DIAQUICK. QuickVue fulfilled SKUP's criteria for diagnostic sensitivity, and the diagnostic sensitivity did not change after exclusion of patients with only two Centor criteria (data not shown). Thus, the low sensitivity for DIAQUICK can probably not be explained by a relatively high proportion of the patients having only two Centor criteria.
There is also a risk of false negative results from culturing and RADTs if the amount of secretion obtained from the throat samples is too small, and studies have shown that the sensitivity of RADTs increases considerably with inoculum size [13, 14]. In our evaluation of DIAQUICK, more than 83% of the false negative results displayed sparse or moderate growth of colonies, which may indicate a small amount of secretion in the samples and may thus contribute to the low sensitivity. For culturing, there is an additional risk of collecting normal upper respiratory tract flora with the throat samples, resulting in false negative results. Culture performance is also affected by the conditions used for plating and incubation of the cultures [29], and there is no consensus on the details of the methods for culture of S. pyogenes. In our study, the cultures were performed in an accredited laboratory, and all results from the internal and external analytical quality control were satisfactory. Nevertheless, if the comparison method is either relatively insensitive or too sensitive, the performance of the RADT may be evaluated erroneously. Neither the RADTs, culturing, nor any other method can distinguish between patients who are carriers of GAS and those who are actually infected with GAS.
In our study, about one third of the patients were < 10 years of age. However, this probably did not affect the results, since a meta-analysis found that the sensitivity and specificity of the RADTs when analyzed in pediatric studies alone were similar to the overall estimates [9]. Furthermore, the pooled sensitivity and specificity found in children and in mixed populations of children and adults are very similar [11].
The prevalence of GAS in the tested population affects the performance of RADTs. In our studies, the prevalence of GAS was 38% in the evaluation of QuickVue and 30% in the evaluation of DIAQUICK. Used in a population with a similar prevalence of disease, the positive predictive value (PPV) and the negative predictive value (NPV) for QuickVue are 80% and 95%, respectively. The corresponding numbers for DIAQUICK are 94% and 89%, respectively. In a population with a prevalence of 25% for GAS, the PPV and NPV would be 69% and 97%, respectively, for QuickVue. For DIAQUICK, the corresponding numbers are 92% and 91%, respectively.
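The dependence of PPV and NPV on prevalence follows from Bayes' theorem. The sketch below recomputes the predictive values from the rounded point estimates of sensitivity and specificity given in this section. The function name and the `TESTS` table are illustrative and not part of the study, and values computed from rounded inputs may differ slightly from values computed from the raw counts.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded point estimates from this section:
# (sensitivity, specificity, prevalence observed in the evaluation)
TESTS = {
    "QuickVue": (0.92, 0.86, 0.38),
    "DIAQUICK": (0.72, 0.98, 0.30),
}

for name, (sens, spec, observed_prev) in TESTS.items():
    # Evaluate at the observed prevalence and at a 25% prevalence scenario.
    for prev in (observed_prev, 0.25):
        ppv, npv = predictive_values(sens, spec, prev)
        print(f"{name}, prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

Lowering the prevalence lowers the PPV and raises the NPV for both tests; the shift in PPV is larger for the test with the lower specificity.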
The difference in the prevalence of GAS in the same population between 2015 and 2018 is within the expected variation between seasons. We have no indication that the prevalence has been affected by genetic changes leading to altered virulence of Streptococcus pyogenes and prevalence of GAS infections [30].
In conclusion, the diagnostic sensitivities were 92% and 72%, and the diagnostic specificities were 86% and 98%, for QuickVue and DIAQUICK, respectively, in primary health care. Both RADTs obtained acceptable assessments for user-friendliness and thus fulfilled SKUP's quality goal for user-friendliness. Several factors can affect the performance of RADTs, and these studies provide objective and supplier-independent information about analytical quality and user-friendliness when the tests are used under real-life conditions by the intended users.