Background

Diagnosis of childhood tuberculosis (TB) remains challenging [1]. Two commercial IFN-γ release assays (IGRAs), namely T-SPOT.TB and QuantiFERON®-TB Gold In-Tube (QFGT), have recently been developed. These methods show acceptable diagnostic accuracy for active TB in adults [2] and correlate better with TB exposure than the tuberculin skin test (TST) when used in contact tracing to reveal latent TB infection (LTBI) [3, 4]. However, pediatric data are limited, and studies evaluating the performance of IGRAs in children have been called for, especially in low-incidence countries [2, 5, 6].

In Finland the incidence of TB is low: 5.6/100 000 inhabitants in 2006 [7]. Only zero to six childhood TB diagnoses per annum have been registered during the last ten years [8]. Universal BCG vaccination of newborns was practiced until 2006 [9].

Between September 2004 and January 2007 our laboratory adapted the tests to the general needs in Finland. After completion of the field evaluation [10] the methods became available for routine diagnostics. We then evaluated the performance of IGRAs in children and compared them to the TST and clinical diagnoses.

Findings

Study Population

Between 15.9.2004 and 15.10.2007 we analyzed samples from 99 consecutive children and adolescents. The flow diagram of the study is presented in Fig. 1. We reviewed the medical records of the patients and collected data on demographics, clinical, microbiological and radiological examinations, and the BCG vaccination history (Table 1). This study was approved by the Ethical Committee of the Hospital for Children and Adolescents, University of Helsinki (Nr.164/E7/05).

Table 1 Characteristics of children enrolled in the study (n = 99)
Figure 1

The outline of the study design. The asterisk (*) indicates that immunoconversion was observed in one patient. In this case the Ly-TbSpot, the Mantoux test and the B-TbIFNγ were done in August and repeated in December; the first two were negative and the B-TbIFNγ was once borderline, but the repeated B-TbIFNγ test and the Mantoux test turned positive in February (for more details see Table 3).

Ly-TbSpot procedure and quality control

The Ly-TbSpot, a modified version of T-SPOT.TB (Oxford Immunotec, Oxford, UK) [11], was performed according to standard operating procedures (SOP). Firstly, the results were expressed as the number of reactive spots per million lymphocytes. The lymphocyte count of the isolated peripheral blood mononuclear cell (PBMC) preparation was determined with an automated hematology analyzer (Advia® 60, Bayer, Germany). Secondly, duplicate ELISPOT wells were used for each antigen and for the medium. Thirdly, Purified Protein Derivative (PPD) (Statens Serum Institut, Copenhagen, Denmark) was used as an additional positive control. Fourthly, we adopted a double cut-off policy: responses below 25 spots/million lymphocytes were considered non-reactive according to the manufacturer's guidelines, responses between 25 and 55 spots/million lymphocytes were considered borderline, and responses over 55 spots/million lymphocytes were interpreted as reactive. An internal control of cryopreserved cells was run with each new batch of reagents.
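The normalization and the double cut-off described above can be sketched in a few lines of Python. This is a minimal illustration and not the laboratory's SOP: the function names are hypothetical, and treating results that fall exactly on 25 or 55 spots/million lymphocytes as borderline is an assumption, since the text does not state to which category the boundary values belong.

```python
def spots_per_million(spot_count: float, lymphocytes_in_well: float) -> float:
    """Scale a raw ELISPOT spot count to spots per million lymphocytes."""
    if lymphocytes_in_well <= 0:
        raise ValueError("lymphocyte count must be positive")
    return spot_count * 1_000_000 / lymphocytes_in_well


def classify_ly_tbspot(spm: float, low: float = 25.0, high: float = 55.0) -> str:
    """Apply the double cut-off (25 and 55 spots/million lymphocytes)."""
    if spm < low:
        return "non-reactive"
    if spm <= high:  # assumption: both boundary values count as borderline
        return "borderline"
    return "reactive"


# Hypothetical well: 10 spots counted among 250 000 lymphocytes.
value = spots_per_million(10, 250_000)
print(value, classify_ly_tbspot(value))
```

Expressing the result per million lymphocytes, rather than per well, makes the same thresholds applicable to samples with different cell yields.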

B-TbIFNγ procedure and quality control

B-TbIFNγ is a modified version of the QuantiFERON®-TB Gold In-Tube (Cellestis Limited, Carnegie, Victoria, Australia) [12]. In accordance with the SOP of our laboratory, blood cells were stimulated in the manufacturer's tubes. However, IFN-γ levels were measured with the PeliKine Compact human EIA (Sanquin, Amsterdam, The Netherlands). This resulted in a steeper calibration curve and ensured more accurate result interpretation in the cut-off zone (Fig. 2). Two cut-off levels were applied. Samples showing a net reactivity (the reactivity of the sample minus the reactivity of the nil control) below 0.35 IU/ml were interpreted as non-reactive, those between 0.35 and 0.50 IU/ml as borderline, and those exceeding 0.50 IU/ml as reactive. For internal quality control, an artificially prepared IFN-γ solution simulating a low-reactive sample was used.
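As a rough sketch of the interpretation rule above, the net reactivity and the two cut-off levels could be coded as follows. The function name and the example readings are hypothetical, and assigning a net value of exactly 0.35 or 0.50 IU/ml to the borderline category is an assumption.

```python
def classify_b_tbifn(antigen_iu_ml: float, nil_iu_ml: float,
                     low: float = 0.35, high: float = 0.50) -> str:
    """Classify a sample by its net IFN-gamma reactivity in IU/ml."""
    # Net reactivity: antigen-stimulated value minus the nil control.
    net = antigen_iu_ml - nil_iu_ml
    if net < low:
        return "non-reactive"
    if net <= high:  # assumption: boundary values count as borderline
        return "borderline"
    return "reactive"


# Hypothetical readings (antigen tube, nil tube) in IU/ml.
for antigen, nil in [(0.30, 0.05), (0.50, 0.08), (1.20, 0.10)]:
    print(antigen, nil, classify_b_tbifn(antigen, nil))
```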

Figure 2

Improvement of the calibration curve for IFNγ-measurement. For measurement of IFN-γ levels, the original EIA reagent from Cellestis (circles) was substituted by that of PeliKine Compact human EIA (squares). This resulted in a steeper calibration curve and the use of the whole dynamic range of the photometer. Analytically, this means more accurate result interpretation in the cut-off zone. OD405 nm, optical densities measured at 405 nm.

Method of TST

TST was performed with two tuberculin units (TU) of purified protein derivative (PPD RT23, Statens Serum Institut, Copenhagen, Denmark) according to the Mantoux technique. The result was read after 48–72 h. The cut-off for positivity was defined as an induration of 10 mm or more in BCG-vaccinated children and 5 mm or more in non-vaccinated children. If a child had a negative TST result but was examined because of contact with an infectious TB case, the test was repeated [13].
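The vaccination-dependent cut-off can be written as a one-line rule. The sketch below is only an illustration of the thresholds stated above; the function name is hypothetical.

```python
def tst_positive(induration_mm: float, bcg_vaccinated: bool) -> bool:
    """Return True if the TST induration meets the cut-off for positivity."""
    # 10 mm or more in BCG-vaccinated children, 5 mm or more otherwise.
    cutoff_mm = 10.0 if bcg_vaccinated else 5.0
    return induration_mm >= cutoff_mm


# The same 7 mm induration is read differently depending on BCG status.
print(tst_positive(7, bcg_vaccinated=True), tst_positive(7, bcg_vaccinated=False))
```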

Definitions

For the evaluation of the methods we used the conventional definitions of definitive TB, probable TB and LTBI.

Exclusion criteria

The results of the tests were considered eligible if the reactivities to the mitogen were as suggested by the manufacturers. Seven samples were excluded from the ELISPOT analysis because of high non-specific background.

Statistical analysis

The concordance between the two IGRAs, and of each IGRA with the TST, was assessed in three categories by the proportion of agreement (PA) and by Cohen's kappa statistic with linear weighting. The 95% confidence intervals (CI) were calculated with the Wilson efficient-score method, corrected for continuity. Using a conservative approach, the cases interpreted as falling in the grey zone were counted as false positives or false negatives in the calculations of specificity, sensitivity and accuracy.
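For readers who wish to reproduce this kind of analysis, the two statistics named above can be sketched with the Python standard library. This is an illustrative implementation under stated assumptions, not the software used in the study: the function names are hypothetical, the kappa uses the usual linear weights 1 − |i − j|/(k − 1) for k ordered categories, and the interval follows the standard continuity-corrected Wilson score formula.

```python
from math import sqrt
from statistics import NormalDist


def linear_weighted_kappa(table):
    """Cohen's kappa with linear weights for a square k x k agreement table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1 - abs(i - j) / (k - 1)  # 1 on the diagonal, 0 at the corners
            observed += weight * table[i][j] / n
            expected += weight * row_tot[i] * col_tot[j] / (n * n)
    return (observed - expected) / (1 - expected)


def wilson_cc_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion, with continuity correction."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z * z)
    if successes == 0:
        lower = 0.0
    else:
        root = sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))
        lower = max(0.0, (2 * n * p + z * z - 1 - z * root) / denom)
    if successes == n:
        upper = 1.0
    else:
        root = sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))
        upper = min(1.0, (2 * n * p + z * z + 1 + z * root) / denom)
    return lower, upper


# Made-up 3 x 3 table (non-reactive / borderline / reactive), not study data.
example = [[20, 1, 0], [1, 2, 1], [0, 1, 10]]
print(linear_weighted_kappa(example))
print(wilson_cc_interval(33, 36))
```

Linear weighting gives partial credit when two tests differ by one category (for example borderline versus reactive), which suits a three-level interpretation scheme with a grey zone.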

Final diagnosis of the participants

Overall, 99 samples were analyzed, and their characteristics are shown in Table 1, Table 2, and Figure 1. The subjects were tested as follows: 52 with the Ly-TbSpot alone, 20 with the B-TbIFNγ alone, and 27 with both the Ly-TbSpot and the B-TbIFNγ (99 in total). Seven results were excluded from the calculations because non-specific background reactivity made it impossible to interpret the samples as definitely non-reactive. Of those tested with the Ly-TbSpot alone, six were positive and one was borderline; of those tested with the B-TbIFNγ alone, one was borderline and one was positive; of those tested with both methods, 11 were positive by both methods and two were interpreted as borderline with the B-TbIFNγ method only. The majority of the patients were either new immigrants or had at least one parent from a country with a high incidence of TB. The median age of the participants was nine years and the age range was from two weeks to adolescence.

Table 2 Final diagnoses of the patients

The major indication for performing an IGRA was recent contact with an infectious case of TB (62 of 99 patients). Thirty-seven children were examined because of symptomatic illness, with TB considered as a possible diagnosis. Ten children were diagnosed with TB: five had pulmonary TB, only one of them culture-positive, and five had extrapulmonary TB. One patient also had an HIV infection, and one patient was HIV-infected but had no TB.

Immunological conversion was observed in a 13-year-old boy who was examined several times because of repeated household contacts with TB. His patient history and the kinetics of the immunological responses in all three methods are presented in Table 3.

Table 3 Anamnestic data and the observed kinetics of immunological conversion in a 13-year-old boy.

The TST was performed in 87 of the 99 subjects; in eight the test was not performed, and in four the data were missing from the records. Of those tested, a positive TST was recorded in 25 cases, with indurations ranging from 10 to 30 mm. In one subject the TST converted from zero to 16 mm (Table 3). In another subject the TST caused a severe delayed-type hypersensitivity reaction, which resulted in a cosmetically unacceptable scar (Fig. 3A–C). A similar severe hypersensitivity reaction was observed in a third case that was not enrolled in the current study (Fig. 3D, E).

Figure 3

Exaggerated delayed-type hypersensitivity reaction in the TST leading to a permanent scar. The upper panel shows the results of the TST in a 14-year-old boy with LTBI: A taken three days, B two weeks and C three months after inoculation of 2 TU of PPD RT23. The original size of the induration was 20 mm. Photos: Courtesy of Dr. Peter Floman, Hospital of Porvoo. The lower panel shows the results of the TST in a 13-year-old girl with TB lymphadenitis: D taken three days and E two weeks after performing the TST. This child was not enrolled in the current study.

Performance characteristics of the IGRAs

Combined performance characteristics for the IGRAs are presented in Table 4. None of the samples tested with the Ly-TbSpot were classified as false positives, whereas three samples fell into the borderline category with the B-TbIFNγ (see Fig. 1). According to our interpretation criteria, three patients in the Ly-TbSpot cohort and one in the B-TbIFNγ cohort were grouped as false negatives. The first was a 7-year-old boy with tuberculous lymphadenitis who had a borderline Ly-TbSpot result by our definition, although his TST induration was 20 mm. The second was a 15-year-old boy with TB contact who had a TST induration of 13 mm but whose Ly-TbSpot results were repeatedly negative. The third was a 6-year-old girl with no symptoms whose sibling was diagnosed with TB. She had a TST conversion within a year and a TST induration of 10 mm at the time of assessment, but both IGRAs remained non-reactive. The fourth was a 9-year-old boy from a highly endemic area whose TST induration was 11 mm but whose B-TbIFNγ remained non-reactive. Because there is no gold standard for LTBI, the diagnosis in the last three cases was based primarily on extensive exposure to TB and a positive TST, in accordance with the recommendations in our country [13].

Table 4 Performance characteristics of the IGRA-methods.

Agreement between Ly-TbSpot, B-TbIFNγ and the TST

The calculated data are presented in Table 5. κ values above 0.75 imply good agreement, beyond what would be expected by chance.

Table 5 Agreement between Ly-TbSpot, B-TbIFNγ and the TST

Analysis of non-interpretable results

Non-specific background activation of cells for IFNγ production was observed in seven samples analyzed with the Ly-TbSpot. Of those, six samples were from members of a family newly immigrated from Ethiopia. These patients showed reactivity without stimulation with an added antigen, the size of the spots being very small. This non-specific reactivity was interpreted as the reactivity of other than effector T-cell populations, most probably of NK cells. All results obtained with the B-TbIFNγ-method were acceptable.

The influence of the cut-off levels on performance characteristics of the Ly-TbSpot

Theoretical values for sensitivity, specificity, and positive and negative predictive values were plotted against different cut-off points. The analysis showed that the accuracy of the test peaked at 0.97 in the range of 24 to 48 spots/million lymphocytes (Fig. 4). Hence, by preserving our current double cut-off policy and a conservative way of interpretation, we achieved an accuracy of 0.96. This means that any pediatric sample falling into the borderline category most probably represents a pathological condition.
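A cut-off analysis of this kind can be sketched as follows. The code is a minimal illustration with made-up data, not the study's data set or software: the function names are hypothetical, a result is taken as positive when it is at or above the cut-off (an assumption), and the second function applies the conservative rule that any borderline result counts as an error.

```python
def metrics_at_cutoff(samples, cutoff):
    """Test parameters for one cut-off; samples are (spots_per_million, infected)."""
    tp = sum(1 for v, d in samples if d and v >= cutoff)
    fn = sum(1 for v, d in samples if d and v < cutoff)
    fp = sum(1 for v, d in samples if not d and v >= cutoff)
    tn = sum(1 for v, d in samples if not d and v < cutoff)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(samples)),
    }


def conservative_accuracy(samples, low=25.0, high=55.0):
    """Accuracy under a double cut-off, counting every borderline result as wrong."""
    correct = sum(
        1 for v, d in samples
        if (d and v > high) or (not d and v < low)
    )
    return correct / len(samples)


# Made-up data: (spots per million lymphocytes, infection present).
toy = [(5, False), (10, False), (20, False), (30, True), (80, True), (120, True)]
for cutoff in (15, 25, 50):
    print(cutoff, metrics_at_cutoff(toy, cutoff)["accuracy"])
print(conservative_accuracy(toy))
```

In the toy data the sample at 30 spots/million lymphocytes is correctly called positive by a single cut-off of 25, but under the double cut-off it is borderline and therefore counted as a false negative, which illustrates why the conservative accuracy can be slightly lower than the peak accuracy.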

Figure 4

Test parameters of the Ly-TbSpot at different cut-off points (n = 72). Theoretical values for sensitivity (blue), specificity (red), positive (dashed purple) and negative predictive values (dashed grey) on the y-axis were plotted against the cut-off points on the x-axis. The accuracy (black) of the test peaked at 0.97 in the range of 24 to 48 spots/million lymphocytes. Hence, by preserving our current double cut-off policy (25 and 55 spots/million lymphocytes) we achieve an accuracy of 0.96.

We present the results of a retrospective, non-blinded study of two modified IGRAs for the diagnosis of childhood TB. The modifications were aimed at avoiding false positive interpretations. Because no method can guarantee 100% sensitivity and specificity, we made a pragmatic decision to offer the best possible specificity even at the cost of sensitivity.

Changing the cut-off levels of the commercial IGRA methods has been suggested by other investigators. Lee et al. [14] suggested lowering the cut-off in the QFGT method. This change would have resulted in a drop of specificity from 91.6% to 87.0%, which we, contrary to Lee, do not regard as a minimal loss. Arend et al. [15] also recommended lowering the cut-off of the QFGT to achieve a detection rate of potentially infected persons similar to that of the TST. This recommendation is not acceptable because the TST is by no means a gold standard. The sensitivity of the QFGT could instead be improved, e.g. by using more sensitive EIA techniques.

Although there was high agreement between the IGRAs and the TST in our study, three earlier studies found a lower agreement: two studies [16, 17] performed on pediatric samples in low-risk countries observed concordant results between the IGRAs but inferior specificity of the TST, and one large cohort study of TB disease in African children [18] found lower agreement with the TST. In another cohort study of contact tracing, performed on Gambian children, the clinical sensitivity of the TST was found to be superior to that of the ELISPOT in diagnosing LTBI [19]. In that study, however, the discordance between the tests was not significant [19]. In a recent study from Gambia [20] comparing the new IGRAs to the TST in contact tracing, the ELISPOT was found to be more sensitive than the QFGT in the diagnosis of TB disease but equally sensitive in the diagnosis of LTBI. That study showed no significant discordance between the IGRAs, but the results of the TST were influenced by the BCG vaccination status.

The observed discrepancies in the evaluation of the three methods are most probably attributable to differences in the study populations, their vaccination status and exposure to other mycobacteria, and, most importantly, to variability in the methods' performance when different threshold levels are used.

Our study raises several important issues. Firstly, when performing the TST, intradermal inoculation of PPD in children exposed to M. tuberculosis may produce an excessive delayed-type hypersensitivity reaction. Secondly, the test variability around the cut-off zone demonstrated in real laboratory settings shows that the interpretation of immunodiagnostic methods should take the imprecision of the method into account and utilize a grey zone.

Conclusion

The sensitivity and specificity of the IGRA methods compare well with those of the TST, and the IGRAs do not cause exaggerated delayed-type hypersensitivity reactions.