Phase I/2005 of the interlaboratory testing project—general aspects
Coded PBMC samples from four HLA-A*0201-positive and one HLA-A*0201-negative healthy donor (D1–D5) were included in this first testing phase. The thawing procedure for PBMC samples in the test centers was not standardized, and the recovery of viable cells varied widely, from 45 to 102% (mean 73%), across the 12 labs. However, the number of cells recovered was in all cases sufficient to perform the required analyses. When all the data from the tetramer staining and functional tests were combined, it became clear that subjects D1 and D5 had responded to the HLA-A*0201-restricted CMV-derived peptide, consistent with their CMV-seropositive status, and that subjects D1, D2, D3, and D5 had responded to influenza. In total, each laboratory should in theory have been able to measure six positive responses (2× CMV and 4× influenza).
Detection of antigen-specific T-cells by tetramer staining and IFNγ ELISPOT
The protocol required that all PBMC samples be analyzed by the 12 participants for the presence of HLA-A*0201-restricted CMV-specific and influenza-specific CD8+ T-cells using centrally prepared tetramers. The indicated frequencies of antigen-specific CD8+ cells generally represent the mean of two separate stainings with CD3 Ab/CD8 Ab/tetramer, except for centers Z1 (CD8/tetramer), Z7 (CD3/CD4/CD8/tetramer), Z5 and Z10 (one staining CD3/CD8/tetramer and one staining CD3/CD4/tetramer), and are based on the analysis and dot plots provided by each participant. As illustrated in Fig. 1, the absolute numbers of tetramer-positive T-cells were influenced by each participant's decision of where to set the gates and quadrant markers for the analysis. For example, the inclusion of the subset of T lymphocytes expressing CD8 at a low density influenced the number of CD8+ cells and consequently the frequency of tetramer+ cells. Moreover, non-specific binding of the tetramer (as seen on the CD8-negative subset) also varied between the different laboratories. For these reasons, not only the frequencies but also the appearance of the tetramer-positive populations was carefully examined. Two parameters were chosen for validation of “positive” results: (1) a clustered, but not diffuse, tetramer-binding population, and (2) strong intensity of tetramer staining, especially marked for the CMV-tetramer-binding population (Fig. 1). Table 1 shows: (I) the minimum, mean and maximum frequencies of antigen-specific CD8+ T-cells, (II) the results obtained from the individual centers Z1–Z12, and (III) the number and percentage of centers that detected a response. The high frequencies of CMV-specific CD8+ T-cells in donors D1 and D5 were readily detected by all participants (mean of 1 per 141 CD8 ± 113 in D1 and mean of 1 per 80 CD8 ± 24 in D5, respectively). For influenza-specific CD8+ T-cells, the results were more variable.
Influenza-tetramer+ cells in donor D3 were detected by all participants with a mean frequency of one cell in 1,014 CD8+ T-cells ± 355. In donor D5, 11 of 12 laboratories detected a mean of one tetramer-binding cell per 1,106 CD8+ T-cells ± 508. Influenza-specific cells were less numerous in healthy subjects D1 and D2 and were detected by only five and eight laboratories, respectively. No false-positive reactivity was reported by any of the participants.
Table 1 Overview of the tetramer results from phase I/2005 of the CIMT monitoring panel
Eleven laboratories analyzed the five PBMC samples for the presence of HLA-A*0201-restricted CMV-specific and influenza-specific IFNγ-producing T-cells by ELISPOT assay. Only one group (Z10) used intracellular cytokine staining as a functional test (data not shown because no comparison with other groups was possible). Table 2 shows (I) the minimum, mean and maximum frequencies of antigen-specific cells, (II) the results obtained from the individual centers Z1–Z12, and (III) the number and percentage of centers that detected each reactivity. As described in the “Materials and methods”, results of spot-forming cells per seeded PBMC were accepted as a positive reaction only when they passed statistical testing and when the number of antigen-specific spots exceeded the number of spots in the background wells by at least a factor of two. IFNγ-producing cells reactive against CMV were detected by 10 of the 11 laboratories in donor D1 (mean reactivity was 1 per 1,855 PBMC ± 825) but only by 8 of 11 in donor D5 (mean reactivity was 1 per 4,405 PBMC ± 3,762). The influenza-specific T-cells present in subject D3 were detected by six laboratories, while the responses in the healthy subjects with markedly lower numbers of peripheral specific T-cells (D1, D2 and D5) were detected by three laboratories only.
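The two-part acceptance rule for ELISPOT results can be sketched in Python as follows. This is only an illustration and not the panel's analysis code: the function names, the two-sided 5% significance level and the example spot counts are assumptions made for this sketch, since the exact test settings are given only in the “Materials and methods”.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom.
# Covers duplicates to quadruplicates (df = 2..6). The 5% level is an
# assumption of this sketch, not a value taken from the text.
T_CRIT_05 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447}


def pooled_t(test_wells, background_wells):
    """Return (t, df) of Student's t test for unpaired samples (pooled variance)."""
    n1, n2 = len(test_wells), len(background_wells)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(test_wells)
                  + (n2 - 1) * variance(background_wells)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(test_wells) - mean(background_wells)
    if se == 0:
        # All replicates identical within each group: avoid dividing by zero.
        return (float("inf") if diff > 0 else 0.0), df
    return diff / se, df


def is_positive(test_wells, background_wells):
    """Apply both criteria: significant t test and at least 2x background."""
    t, df = pooled_t(test_wells, background_wells)
    passes_t = t > T_CRIT_05[df]  # only test wells above background count
    passes_ratio = mean(test_wells) >= 2 * mean(background_wells)
    return passes_t and passes_ratio


# Hypothetical spot counts per well.
print(is_positive([40, 46, 43], [5, 7, 6]))   # clear response in triplicates
print(is_positive([12, 9, 15], [8, 10, 6]))   # less than twice the background
print(is_positive([30, 50], [5, 15]))         # duplicates, fourfold background
```

In the last call the mean is four times the background, yet the duplicate wells do not reach the critical value for two degrees of freedom, which shows how a small number of replicates can cause a response to fail the statistical criterion.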
Table 2 Overview of the IFNγ ELISPOT results from phase I/2005 of the CIMT monitoring panel
Subgroup analysis reveals that the number of CD8+ T-lymphocytes analyzed affects the sensitivity of the tetramer staining
Although the tetramer stainings were performed with centrally prepared reagents following set guidelines, centers were left free to select several parameters according to their own protocols, and this could have influenced the test results (see “Materials and methods”). Most of the participants used monoclonal antibodies specific for CD3 and CD8 to co-stain the cells. There were no obvious differences in the performance of the centers depending on which antibody clones, antibody combinations or cytometer were used (data not shown).
There was a high degree of variability in the number of CD8+ cells analyzed per staining, ranging from only 0.5 × 10⁴ to about 19 × 10⁴ (inter-center variation). In addition, a non-negligible intra-center variation was observed in the number of CD8+ cells counted. We therefore analyzed each individual staining independently of the center that performed it and focused on the number of CD8+ T-cells that had been counted. For the six different antigen-specific populations detectable, a total of 68 tests was performed by the group (see Table 1). Overall, antigen-specific T-cell reactivities were reported in 82% of the tests (56/68, mean of duplicate stainings). When fewer than 30,000 CD8+ T-cells were counted, only 70% of all responses were found. In contrast, 89% of all responses were detected when more than 30,000 CD8+ T-cells were counted (Fig. 2a). When antigen-specific T-cells were present at high frequency, the number of cells counted did not influence the result: CMV-specific T-cells from donors D1 and D5 were detected irrespective of the number of CD8+ T-cells in the test. However, for the influenza-specific cells, positivity was registered in only 75% of all tests performed (36 of 48 tests). Strikingly, we observed a marked difference between the tests involving fewer than 30,000 CD8+ T-cells (56% success in detection) and the tests performed with more than 30,000 CD8+ T-cells (84%).
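The per-staining subgroup comparison described here amounts to splitting the tests at 30,000 counted CD8+ T-cells and computing the detection rate in each group. A minimal Python sketch is given below; the record layout and the example records are hypothetical and do not reproduce the panel's data, and assigning a count of exactly 30,000 to the upper group is an assumption.

```python
def detection_rate(tests):
    """Percentage of tests in which the response was detected (None if empty)."""
    if not tests:
        return None
    return 100 * sum(1 for t in tests if t["detected"]) / len(tests)


def split_by_cell_count(tests, threshold=30_000):
    """Detection rates for tests below and at/above the CD8+ count threshold."""
    below = [t for t in tests if t["cd8_counted"] < threshold]
    above = [t for t in tests if t["cd8_counted"] >= threshold]
    return detection_rate(below), detection_rate(above)


# Hypothetical records, one per staining (not the panel's data).
example_tests = [
    {"cd8_counted": 8_000, "detected": False},
    {"cd8_counted": 21_000, "detected": True},
    {"cd8_counted": 45_000, "detected": True},
    {"cd8_counted": 120_000, "detected": True},
]
low_rate, high_rate = split_by_cell_count(example_tests)
print(low_rate, high_rate)

# Overall rates quoted in the text, recomputed from the stated counts.
overall_pct = round(100 * 56 / 68)     # all tests
influenza_pct = round(100 * 36 / 48)   # influenza-specific tests only
print(overall_pct, influenza_pct)
```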
In conclusion, the ability to detect antigen-specific T-cell reactivities by tetramer staining was mainly affected by the number of CD8+ T-cells stained and analyzed, especially when the antigen-specific T-cells were present at low or moderate frequencies. We therefore modified our guidelines for the tetramer assay and recommended staining at least 1 × 10⁶ PBMC and analyzing all cells in the tube. In addition, we provided an example of how optimal cell gates and dot-plot quadrants could be selected.
ELISPOT assays are heterogeneous and require standardization
The ELISPOT analyses were performed according to 11 more or less different protocols. The most discernible differences between these protocols concerned (1) the type of multi-screen plate, (2) the serum origin, (3) the use of duplicates, triplicates or quadruplicates, (4) the use of allogeneic APC, (5) the inclusion of a resting phase after thawing the PBMC, (6) the number of PBMC per well, (7) the type of antibodies used, (8) the type of spot-reader, and (9) the enzyme and substrate used for staining the spots. Each center also used a different plate layout (distribution of the wells, number of replicates, control tests).
The influence of each of these parameters on the number of positive responses was studied by further analysis in which the laboratories were divided into two subgroups. As a result, several criteria were identified which could help to improve the sensitivity and comparability of detection.
All data sets (duplicates, triplicates or quadruplicates) were first analyzed by Student's t test for unpaired samples (“Materials and methods”). In our panel, one center used quadruplicates, nine centers used triplicates and one center performed the ELISPOT analysis in duplicates. Because of these differing numbers of replicates, responses measured in duplicate wells failed to pass Student's t test more often than those measured in triplicates.
Overall, the 11 centers were able to detect 50% of all possible reactivities in this panel phase (Table 2; Fig. 2b). In a subgroup of three laboratories (Z5, Z6 and Z8), an allogeneic APC population (T2 or K562-A*0201 cells) was added for binding and presentation of the synthetic peptides. The three centers that used allo-APC detected only 28% of all responses, while the other centers detected 58% of all responses.
In five laboratories (Z3, Z4, Z7, Z8, and Z9) PBMC were thawed, and then incubated in culture medium at 37°C. After this resting phase of 2–20 h, living cells were washed, counted and seeded into ELISPOT plates. Laboratories using a resting phase detected 73% of the positive reactivities (22 out of 30 potentially positive tests). No significant difference in the ability to detect antigen-specific T cells was found using shorter or longer resting-times. In contrast, the laboratories that did not use a resting procedure detected only 30% of all positives (Fig. 2b).
Finally, the number of cells seeded per well differed considerably among participants and ranged from 1 × 10⁵ to 6 × 10⁵ PBMC. We divided the laboratories arbitrarily into two groups, those using more than 4 × 10⁵ PBMC (Z4, Z7, Z8, and Z9) and those using fewer than 4 × 10⁵ PBMC (Z1, Z2, Z3, Z11 and Z12). The first group detected 71% of all positive samples, whereas the second group was able to detect only 43% of all positives (Fig. 2b). Centers Z5 and Z6 used a defined number of separated CD8+ T-cells in the ELISPOT and were therefore not included in this subgroup analysis.
None of the other protocol variables listed above had any obvious impact on the detection of specific T-cells. Based on these results, four minimum requirements were formulated for the ELISPOT protocol: (1) perform triplicates for each test antigen, (2) do not use allo-APC, (3) add a resting time to increase the proportion of living cells seeded, and (4) use a minimum of 4 × 10⁵ PBMC per well.
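The four minimum requirements lend themselves to a simple protocol check. The Python sketch below is illustrative only: the class and field names are hypothetical, and treating quadruplicates as satisfying the triplicate rule is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ElispotProtocol:
    replicates: int        # wells per test antigen
    uses_allo_apc: bool    # allogeneic APC added for peptide presentation
    resting_hours: float   # resting phase after thawing; 0 means none
    pbmc_per_well: int     # PBMC seeded per well


def unmet_requirements(protocol):
    """Return the minimum requirements that a local protocol does not meet."""
    unmet = []
    if protocol.replicates < 3:
        unmet.append("perform triplicates for each test antigen")
    if protocol.uses_allo_apc:
        unmet.append("do not use allo-APC")
    if protocol.resting_hours <= 0:
        unmet.append("add a resting time after thawing")
    if protocol.pbmc_per_well < 400_000:
        unmet.append("use at least 4 x 10^5 PBMC per well")
    return unmet


# Hypothetical local protocols.
compliant = ElispotProtocol(replicates=3, uses_allo_apc=False,
                            resting_hours=2, pbmc_per_well=400_000)
non_compliant = ElispotProtocol(replicates=2, uses_allo_apc=True,
                                resting_hours=0, pbmc_per_well=100_000)
print(unmet_requirements(compliant))
print(len(unmet_requirements(non_compliant)))
```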
Phase II/2006 of the interlaboratory testing project—general aspects
To formally test whether the requirements formulated for tetramer staining and ELISPOT analysis increase the ability of the participants to detect antigen-specific CD8+ T-cells and reduce the inter-center variability, we decided to repeat the analysis in a second phase of the panel with the same participants (phase II/2006). In this round, all groups were asked to follow our modified guidelines for the tetramer and ELISPOT assays.
Again, all PBMC samples were prepared and pre-tested in one central lab, and peptide antigens and PE-conjugated tetramers were also provided from one source. As one investigator had meanwhile moved to another lab, we added a 13th center to the group. PBMC from seven selected healthy HLA-A*0201-positive donors and one HLA-A*0201-negative donor (D3) were to be analyzed for the presence of HLA-A*0201-restricted CMV-specific and influenza-specific T-cells. The mean number of recovered cells after thawing was sufficient to perform the tests. When all the data were combined, it became clear that subjects D2, D5 and D8 possessed CMV-specific CD8+ T-cell subsets, and D1, D2, D4, D6 and D7 possessed influenza-specific CD8+ T-cells. Therefore, each laboratory could theoretically have measured eight positives (3× CMV and 5× influenza) in this second phase.
Analysis of CD8+ T-cell tetramer binding using the new guidelines
In the second phase, a total of 104 tests were performed to detect the eight possible tetramer reactivities. Following the modified guidelines for tetramer staining, the mean number of CD8+ T-cells counted in each separate test increased markedly (+36%): a mean of about 49,000 CD8+ cells was analyzed in phase I (n = 68 tests) and a mean of 67,000 CD8+ T-cells in phase II (n = 104 tests). The number of cells per test ranged from 12,000 to 467,000 CD8+. In 81% (84 of 104) of the tests, >30,000 CD8+ were counted (compared to 66% of all relevant tests in the first phase). Table 3 shows (I) the minimum, mean and maximum frequencies of antigen-specific T-cells, (II) the results obtained from the individual centers Z1–Z13, and (III) the number and percentage of centers that detected each T-cell specificity. Donors D2, D5 and D8 showed very strong reactivities with the CMV-tetramer, with mean frequencies of 1/45 CD8+ T-cells, 1/37 CD8+ T-cells, and 1/19 CD8+ T-cells, respectively. All 13 laboratories were able to detect these populations (Table 3). All but one center detected the influenza-specific cells present at high frequencies in donors D6 (1/1,116 CD8+ T-cells) and D7 (1/347 CD8+ T-cells). Donors D1, D2 and D4 possessed fewer specific cells (1/3,739, 1/3,573 and 1/5,278 CD8+ T-cells), which were found by 12, 9 and 9 centers, respectively. Three laboratories also reported influenza tetramer-binding CD8+ cells in D5 or D8. Based on the results of the other centers as well as the ELISPOT (see below), these stainings were considered false positives (not shown). One center (Z13) was not able to detect any of the influenza-specific CD8+ T-cell reactivities. Finally, no tetramer+ cells were described in the HLA-A*0201-negative donor (D3).
Table 3 Overview of tetramer results from phase II/2006 of the CIMT monitoring panel
Analysis of CD8+ T-cell responses by ELISPOT following the introduction of a set of four rules
In this second phase, all laboratories performed ELISPOT analysis following local protocols, all of which conformed to the newly introduced minimum requirements. Table 4 shows (I) the minimum, mean and maximum frequencies of antigen-specific cells, (II) the results obtained from the individual centers Z1–Z13, and (III) the number and percentage of centers that detected the response. High frequency T-cell responses against CMV could readily be detected by all 13 centers in donors D5 and D8 and by 12 of 13 in donor D2. Failure of center Z4 to detect the CMV reactivity in donor D2 was due to a very high background of the medium control. The number of spots representing IFNγ-producing cells after influenza-peptide stimulation was generally lower, and consequently, the influenza-specific T-cell responses in subjects D1, D2, D4 and D6 were detected by fewer laboratories (four centers for D1, three centers for D2, two centers for D4 and ten centers for D6). The high numbers of influenza-specific T-cells present in D7 were detected by all 13 laboratories (Table 4).
Table 4 Overview of IFNγ ELISPOT results from phase II/2006 of the CIMT monitoring panel
Comparison of the results obtained in both phases
When the mean frequencies of all T-cell responses in both testing rounds were compared, it became clear that there was a difference in the distribution of reactivities (Fig. 3). In the tetramer assay, the mean T-cell frequency of the six possible positives in the first phase was 1 per 2,083 CD8+ T-cells. This value was 1 per 1,769 CD8+ T-cells for the eight possible positives in the second phase. Similarly, the mean T-cell frequency of the responses detected in IFNγ ELISPOT was 1 per 22,369 PBMC for Phase I/2005 but 1 per 14,653 PBMC for Phase II/2006. To allow a comparison of the overall performance in both phases of the panel, we therefore decided to define theoretical thresholds for high, moderate and low T-cell responses and then to compare data of the participating laboratories within these groups.
In order to define such thresholds for low, medium and high T-cell responses, we first displayed the probability of detecting each of the 14 different reactivities as a value in a coordinate system and inserted a trendline. For both the tetramer assay and the ELISPOT assay, we observed a clear correlation between the frequencies of antigen-specific T-cells and the number of participating centers that were able to detect these populations. We then calculated the theoretical frequencies at which 90% (y = 90) and 50% (y = 50) of all participants could detect a given response (Fig. 4a, b) and used these two thresholds to divide all reactivities into three distinct classes of T-cell responses (“high”, “moderate” and “low”).
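The threshold calculation can be illustrated with a short Python sketch. The type of trendline is not stated in this section, so the sketch assumes a logarithmic trend of the form y = a + b·log10(x), where x is the number of cells per antigen-specific cell and y is the percentage of centers detecting the response. The data points are hypothetical and the function names are introduced here for illustration only.

```python
from math import log10


def fit_log_trend(points):
    """Least-squares fit of y = a + b * log10(x) to (x, y) pairs."""
    xs = [log10(x) for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def cells_per_specific_cell_at(y, a, b):
    """Invert the trendline: x at which y percent of centers detect a response."""
    return 10 ** ((y - a) / b)


# Hypothetical (cells per specific cell, % of centers detecting) pairs.
points = [(1_000, 100), (10_000, 70), (100_000, 40)]
a, b = fit_log_trend(points)
x90 = cells_per_specific_cell_at(90, a, b)   # boundary of the "high" class
x50 = cells_per_specific_cell_at(50, a, b)   # boundary of the "low" class
print(round(x90), round(x50))
```

Responses more frequent than one specific cell per x90 cells would then fall into the “high” class and those rarer than one per x50 cells into the “low” class.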
For the tetramer assay, T-cell frequencies exceeding 1 per 1,200 CD8+ T-cells were therefore classified as “high”, whereas frequencies of less than 1 per 7,650 CD8+ were classified as “low” (Fig. 4a). Following the same rules for the ELISPOT assay, T-cell responses of at least one IFNγ spot per 2,850 PBMC can be considered as “high” and T-cell responses of less than one spot per 19,000 PBMC as “low” (Fig. 4b).
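Expressed as the number of cells per antigen-specific cell, these cutoffs give a simple three-way classification. The Python sketch below uses the thresholds stated above; the function and constant names are hypothetical, and the handling of values that fall exactly on a cutoff is an assumption.

```python
# (high cutoff, low cutoff), in cells per antigen-specific cell.
TETRAMER_CUTOFFS = (1_200, 7_650)    # CD8+ T-cells per tetramer+ cell
ELISPOT_CUTOFFS = (2_850, 19_000)    # PBMC per IFN-gamma spot


def classify_response(cells_per_specific_cell, cutoffs):
    """Classify a response as 'high', 'moderate' or 'low' for a given assay."""
    high_cutoff, low_cutoff = cutoffs
    if cells_per_specific_cell <= high_cutoff:
        return "high"        # at least 1 specific cell per high_cutoff cells
    if cells_per_specific_cell > low_cutoff:
        return "low"         # fewer than 1 specific cell per low_cutoff cells
    return "moderate"


# Mean frequencies reported in the text.
print(classify_response(347, TETRAMER_CUTOFFS))     # influenza, D7, phase II
print(classify_response(3_739, TETRAMER_CUTOFFS))   # influenza, D1, phase II
print(classify_response(1_855, ELISPOT_CUTOFFS))    # CMV, D1, phase I
print(classify_response(4_405, ELISPOT_CUTOFFS))    # CMV, D5, phase I
```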
With these calculated assay-specific thresholds for high, moderate and low T-cell responses, we compared the results obtained in the two phases. For the tetramer assay, the ability to detect high frequency T-cells (>1 per 1,200 CD8+) did not differ in the two phases, and was not influenced by the number of CD8+ analyzed, as previously seen for each of the two phases separately (Fig. 5a). However, for moderate and low T-cell frequencies, we found that they could be successfully detected in only 54% of cases in the first phase but this improved to 77% in the second phase. Moreover, here, the number of cells counted did have an impact on the ability to detect low frequency T-cells. In the first phase, only 14% were detected when less than 30,000 CD8+ were counted, as compared to 71% when more than 30,000 CD8+ T-cells were counted. The same trend was observed in phase II/2006, but in this case 40% of assays with less than 30,000 CD8+ successfully detected the moderate to low T-cell frequencies compared to 83% counting more than 30,000 CD8+ (Fig. 5a).
We then analyzed the capacity of the laboratories to measure either high T-cell responses (>1 per 2,850 PBMC) or low to moderate T-cell responses (<1 per 2,850 PBMC) in the ELISPOT assay. This analysis was performed for two defined subgroups of participants. The first subgroup included the five centers (Z3, Z4, Z7, Z8 and Z9) that already fulfilled three or four of the requirements in the first phase of the panel. These five centers had to introduce no changes, or at least no major changes, to their protocols for the repetition of the experiments in phase II. The second subgroup included the new center Z13 (led by a colleague who had been in a laboratory that fulfilled only one of the four requirements in phase I) and all others that had fulfilled only one or two of the four requirements in the first phase. All laboratories in this second group had to introduce marked changes to their locally established protocols. As in the tetramer analysis, the new requirements were not necessary to detect antigen-specific responses in the category of high T-cell frequencies in either the first or the second phase (Fig. 5b). However, applying the set of rules defined in phase I markedly improved the capacity of centers to detect the low to moderate T-cell responses. The first subgroup detected a total of 68% of the low to moderate reactivities in phase I, whereas the second subgroup detected only 20% (Fig. 5b). After harmonization of the protocols, both subgroups performed equally well. In addition, the inter-group variability in detecting positive responses was reduced in phase II (percentage of detected responses ranged from 38 to 88% with a mean of 67 ± 16%) as compared to phase I (percentage of detected responses ranged from 0 to 100% with a mean of 55 ± 33%).
Experience does not equal performance
Among the 13 centers that had participated in phase II, tetramer stainings had been performed for 1–8 years. Similarly, the experience in the ELISPOT technology varied between 1 and 10 years. For both techniques, we could not find any correlation between the years of experience and the ability to detect T-cell responses, not even among the subgroups of moderate or low T-cell responses (not shown).