Results and harmonization guidelines from two large-scale international Elispot proficiency panels conducted by the Cancer Vaccine Consortium (CVC/SVI)
- First Online:
- Cite this article as:
- Janetzki, S., Panageas, K.S., Ben-Porat, L. et al. Cancer Immunol Immunother (2008) 57: 303. doi:10.1007/s00262-007-0380-6
- 2.2k Downloads
The Cancer Vaccine Consortium of the Sabin Vaccine Institute (CVC/SVI) is conducting an ongoing large-scale immune monitoring harmonization program through its members and affiliated associations. This effort was brought to life as an external validation program by conducting an international Elispot proficiency panel with 36 laboratories in 2005, and was followed by a second panel with 29 participating laboratories in 2006 allowing for application of learnings from the first panel. Critical protocol choices, as well as standardization and validation practices among laboratories were assessed through detailed surveys. Although panel participants had to follow general guidelines in order to allow comparison of results, each laboratory was able to use its own protocols, materials and reagents. The second panel recorded an overall significantly improved performance, as measured by the ability to detect all predefined responses correctly. Protocol choices and laboratory practices, which can have a dramatic effect on the overall assay outcome, were identified and lead to the following recommendations: (A) Establish a laboratory SOP for Elispot testing procedures including (A1) a counting method for apoptotic cells for determining adequate cell dilution for plating, and (A2) overnight rest of cells prior to plating and incubation, (B) Use only pre-tested serum optimized for low background: high signal ratio, (C) Establish a laboratory SOP for plate reading including (C1) human auditing during the reading process and (C2) adequate adjustments for technical artifacts, and (D) Only allow trained personnel, which is certified per laboratory SOPs to conduct assays. Recommendations described under (A) were found to make a statistically significant difference in assay performance, while the remaining recommendations are based on practical experiences confirmed by the panel results, which could not be statistically tested. These results provide initial harmonization guidelines to optimize Elispot assay performance to the immunotherapy community. Further optimization is in process with ongoing panels.
KeywordsElispotProficiency panelValidationHarmonizationImmune monitoring
Cancer Vaccine Consortium of the Sabin Vaccine Institute
Peripheral blood mononuclear cells
Cytomegalie, Epstein Barr, and influenza virus
National Institute of Allergy and Infectious Diseases
Coefficient of variation
Standard operating procedure
Elispot is a widely used assay for immune monitoring purposes [6, 17, 25, 39]. Measuring immune responses has been accepted as an important endpoint in early clinical trial settings in order to prioritize further vaccine or other immunotherapy development [11, 22, 24, 26, 38]. Despite the overwhelming use of various immune assays for exactly that purpose, reported results are often met with skepticism, caused mainly by two reasons: (1) high variability among results from the same laboratories and/or among different laboratories, and (2) the lack of demand to report standardization, validation and training strategies as well as assay acceptance criteria by the laboratories conducting immune testing. This is surprising since the reporting of results for other clinical endpoints, e.g., side effects, has to follow strict guidelines and definitions.
To provide regulatory and sponsoring agencies with confidence that reported data are generated following necessary standards and rigor that supports product licensure.
To provide an external validation tool for individual labs.
To provide proof to patients and volunteers that necessary measures have been taken to allow successful study participation .
In contrast, immune monitoring approaches in the cancer vaccine field are more heterogeneous, based on the vast variety of vaccine design, type of cancer, and availability of antigen presenting cells. Standardization of the entire Elispot protocol across laboratories is therefore not feasible. We set out to devise a strategy to identify issues and deficiencies in current Elispot practices, and to identify common sources of assay variability within and between laboratories, with the extended goal of standardizing the identified factors in an assay harmonization effort across laboratories.
In 2005, the Cancer CVC/SVI initiated an Elispot proficiency panel program to achieve this goal. In addition to offering an external validation program, the CVC addressed the need for such strategy by comparing assay performance across the field, identifying critical protocol choices and gaining an overview of training and validation practices among participating laboratories.
For this program, predefined PBMCs from four donors with different ranges of reactivity against two peptide pools were sent to participants for Elispot testing. Laboratories had to further provide cell recovery and viability data, as well as respond to surveys describing their protocol choices and training and validation status.
In response to the survey results, the CVC/SVI established requirements for laboratories to participate in future proficiency panels, which included the existence of a Standard Operating Procedure (SOP) prior to joining the program. Further, individualized assay performance assessment was offered to all laboratories, together with suggestions for implementation of protocol optimization steps.
Materials and methods
Participants and organizational setup
All participants were members of the Cancer Vaccine Consortium or its affiliated institutions, the Ludwig Institute for Cancer Research (LICR) and the Association for Immunotherapy of Cancer (C-IMT). Laboratories were located in ten countries (Australia, Belgium, Canada, Germany, Italy, Japan, France, Switzerland, UK, and USA). Each laboratory received an individual lab ID number. Panel leadership was provided by a scientific leader experienced in Elispot, in collaboration with the CVC Executive office. SeraCare BioServices, Gaithersburg, MD, served as central laboratory, providing cells, pretesting and shipping services, as well as logistical services like blinding of panelists. IDs were not revealed to panel leader, CVC or statistician during the panel.
Thirty-six laboratories including the central lab participated in panel 1, 29 including the central lab in panel 2. Twenty-three laboratories participated in both panels. Six new panelists were added to the second testing round. Thirteen dropped out after the first panel. Main reasons for drop out were switch of assay priorities and not meeting criteria for panel participation. Various groups stated that one time participation fulfilled their need for external validation.
PBMCs and peptides
PBMCs from healthy donors were obtained from a commercial donor bank, manufactured and processed under GMP conditions using established Standard Operating Procedures at SeraCare. PBMCs were frozen using a rate controlled freezer and transferred to the vapor phase of liquid nitrogen. A three lot validation study was performed  in which the following was validated: cell separation procedure, freezing media, dispensing effect on function, freezing procedure, and shipping on dry ice. It was demonstrated that functionality and viability were maintained throughout the procedure. In addition, Elispot values from fresh and frozen PBMCs, shipped on dry ice, were nearly equivalent.
Each vial of PBMCs contained enough cells to ensure a recovery of 10 million cells or more under Seracare’s SOP.
PBMCs were pretested at the central laboratory for reactivity against the CEF  and CMV pp65 peptide pools . The CEF peptide pool was obtained through the NIH AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH. It contains 32 8-11mers known to elicit CD8-restricted responses, and serves as a widely used control in IFNγ Elispot assays . The CMV pp65 peptide pool was a generous gift of the NIAID and Becton Dickenson. This pool consists of 135 15mers overlapping by 11 amino acids, and has been shown to elicit CD8- and CD4-restricted responses . PBMCs were selected for no, low, medium and strong responses against both peptide pools, and repeatedly Elispot-tested at Seracare in order to confirm responder status. Response definition was set arbitrarily for the spot number/well (200,000 PBMC): no responder: average median/panel 1 + 2 = 1 (CMV and CEF); low responder: average median/panel 1 + 2 = 18 (CMV) and 40 (CEF); medium responder: average median/panel 1 + 2 = 98 (CMV) and 127 (CEF); and high responder: average median/panel 1 + 2 = 396 (CMV) and 398 (CEF).
Peptide pools were resuspended in DMSO and further diluted with PBS to a final concentration of 20 μg/ml. Aliquots of 150 μl of peptide pool were prepared for final shipment to participants. Corresponding PBS/DMSO aliquots for medium controls were also prepared. Participants were blinded to the content of these vials, which were labeled as “Reagent 1, 2 or 3”.
All cells and reagents sent to participants in both panels were obtained from the same batches.
Cells and reagent vials were shipped to all participants for overnight delivery on sufficient dry ice for 48 h. Shipping was performed by Seracare under their existing SOPs.
Participants received a plate layout template and instructions for a direct IFNγ Elispot assay which had to be performed in one Elispot plate. Each donor was tested in six replicates against three reagents (medium, CEF and CMV peptide pool). Further, 24 wells were tested for the occurrence of false positive spots by the addition of T cell medium only. About 200,000 PBMC/well were tested against 1 μg/ml peptide pool or the equivalent amount of PBS/DMSO. All other protocol choices were left to the participants, including choices about: Elispot plate, antibodies, spot development, use of DNAse, resting of cells, T cell serum, cell counting and spot counting method. All plates were reevaluated at ZellNet Consulting (Fort Lee, NJ) with a KS Elispot system (Carl Zeiss, Thornwood, NY), software KS Elispot 4.7 (panel 1) and KS Elispot 4.9 (panel 2) in blinded fashion. Each well in each plate was audited.
Since the focus of this study was on assay performance and protocol evaluation and not on the definition of a positive response, we prospectively defined the ability to detect a response (independent of magnitude) by using the following “empirical” method: the antigen-specific spot counts per 2 × 105 PBMCs had to be >10, and at least 3× as high as the background reactivity. Similar approaches have been described elsewhere [5, 8].
The following parameters were calculated for the overall panel and the individual participant’s performance, using either lab-specific counts or central recounts: the mean, standard deviation, and coefficient of variation (CV), the median, minimum, and maximum spot counts for each donor and reagent and the media only wells. Box plots were used to illustrate the distribution of spot counts across the panel per given test condition. Further, individual results were represented as box plots comparing lab counts with recounts, central lab results and overall panel results. For panel 2, results from repeating laboratories were also compared in box plot format for each donor and condition. The Fisher’s Exact test was used to compare the proportions of laboratories which missed to detect responses in each panel. For the comparison of recovery and viability, the Student’s t test was applied.
In the first proficiency panel, shipping and Elispot testing among 36 laboratories from 9 countries were conducted without delays. The success of this panel demonstrates the feasibility of such large international studies, the biggest of such format as of today, under the described organizational setup. The second panel with 29 participating laboratories from 6 countries followed the approach of panel 1. However, customs delays of dry ice shipments to some international sites required repeated shipments of cells and antigens to these destinations. Based on this experience, the use of dryshippers with liquid nitrogen is being implemented for international destinations in the third CVC panel round in 2007.
Recovery and viability of PBMCs in panel 1 and 2
Cell recovery and viability in both proficiency panels
Cell recovery in 106 cells
Interestingly, only 4/10 laboratories in panel 1, and 4/7 laboratories in panel 2 with recoveries below 8 million cells were from international locations. Similar, only 1/5 laboratories in panel 1 and 2/5 in panel 2 reporting viabilities less than 70% belonged to international sites. This clearly demonstrates that location for dry ice shipment had no effect on overall cell recovery and viability.
We also investigated whether low (<8 million) or very high (>20 million) cell recovery had an influence on spot counts, assuming that these were potential erroneous cell counts, leading to too low or too high cell dilutions, respectively, what in turn would lead to too high (in case of underestimating cell number) or too low (in case of overestimating cell number) spots counts. However, except for a few sporadic incidents, there was no correlation between cell counts and spot counts (data not shown) in either direction. Only one laboratory with low recovery and low viability was found to have peptide pool-specific spot counts for all donors much below the panel median.
The use of an automated cell counter (Guava) did not reveal any trend in recovery for this group (recovery ranged from 6.7 to 25.1 million cells), nor a difference compared to the overall panel recovery data. In contrast, the mean cell viability per donor measured by Guava counter users was significantly lower than the overall cell viability in users of a hemocytometer (P < 0.01, see table in Fig. 5).
Elispot results in panel 1
All 36 laboratories completed testing of all 4 donors against medium, CEF and CMV peptide pool. Spot appearance and size as well as occurrence of artifacts differed dramatically among laboratories (not shown). Four outlier laboratories were identified, which detected less than half of the responses correctly. In all four cases, detected responses were well below the panel median, and often, there was high background reactivity (up to 270 spots/well) in medium controls. No obvious protocol choices could be identified which could have been responsible for the suboptimal performance. One out of the four laboratories had little experience at the date of the panel. Another group reported a less experienced scientist performing the assay. A third outlier repeated the assay, and was able to perform adequately. No feed back was available from the fourth group.
A similar scenario was found for response detection against the CEF peptide pool. Thirteen laboratories missed to detect the response, one of which due to high reactivity against medium, and one due to inaccurate spot counting.
An interesting observation was that 23 groups reported false positive spots in a range of 1–26 spots/well. Reevaluation revealed that the actual number of groups with false positive spots was lower (12), and the false positive spot number range per well fell between 0 and 8.
Survey results about protocol
During the first panel, participants had to provide information about their protocol choices: plates, potential prewetting of PVDF, antibodies, enzyme, substrate, use of DNAse during PBMC thawing, resting of cells, serum used, cell counting, and plate reader. There was a wide range of protocol choices across the panel participants. The most common choices for the parameters listed above were as follows: use of PVDF plates (64%) prewetted with Ethanol (52%), coated with Mabtech antibodies (67%) at 0.5–0.75 μg/well (33%); spot development with HRP (53%) from Vector Laboratories (33%) using AEC (44%) from Sigma (42%); no use of DNAse when thawing cells (83%), and no resting period for cells prior to the assay (53%); use of human serum (64%); cell counting with trypan blue exclusion using a hemocytometer (78%); and plate evaluation with a Zeiss reader (36%).
There were some clear trends for international sites with preferred use of the AP/BCIP/NBT development system (83 via 29% in the US), the use of nitrocellulose plates (75 via 21% in the US), and the use of Mabtech antibodies (83 via 58% in the US). On the other hand, 7/8 laboratories using an automated cell counter were located in the US.
Survey results about validation and training practices
During the lively discussion of the results of panel 1 and its protocol survey at the Annual CVC meeting in Alexandria in November 2005, it was suggested that the level of experience, standardization and validation of participating laboratories might have been the cause for the variability and performances observed. In response, we conducted a survey among panelists, in which 30 laboratories participated. As expected, the experience and Elispot usage varied significantly. Some laboratories had the Elispot assay established less than one year before panel testing, whereas others used the assay for more than 10 years. The experience of the actual performer of the panel assay also varied widely.
Interestingly, even though 2/3 of participants reported to have specific training guidelines for new Elispot performers in place, more than 50% never or rarely checked on the scientist’s performance after the initial training.
Almost all laboratories indicated that they use an SOP that had been at least partially qualified and/or validated. Validation tools and strategies varied widely. Only 12 groups monitored variability, whereas 23 reported the use of external controls of some kind (e.g., T cell lines, predefined PBMC, parallel tetramer testing).
Thirteen groups were found to have some kind of criteria implemented for assay acceptance. Among the 20 different criteria reported, not one was described by more than one lab.
Mirroring these survey results, 20 laboratories believed that they need to implement more validation steps. All except one group expressed their strong interest in published guidelines for validation and training strategies for Elispot.
Elispot results in panel 2
Based on the experience from the first panel, acceptance criteria for participation in the program were redefined. Only laboratories with established SOPs were accepted. The second panel was repeated with identical experimental setup as the first panel, and with the same batches of PBMCs, peptides and control reagents. Twenty-nine laboratories participated including the central lab. This time, there were no outlier performers identified. The number of groups that did not detect the weak responder dropped dramatically (P < 0.01) from 47% (17/36 labs) in panel 1 to 14% (4/29) in panel 2 (Fig. 2). The overall panel median for the CMV-response in the weak responder increased from 14 spots in panel 1 to 21 spots in panel 2, and for the CEF response from 30 to 51, respectively.
Of the 23 labs repeating the panel, 8 had changed their protocol before panel 2, 3 of which as an immediate response to results from the first panel. One outlier lab from panel 1 participated in panel 2, and improved its performance by detecting all responses correctly as per reevaluation counts. Only their lab-specific evaluation did not detect the low CMV responder. Overall, 4/23 panel-repeating labs (17%) did not detect the low CMV-responder, 3 of which also did not detect this response in panel 1. About 10/23 groups missed the low CMV responder in panel 1, but 7 of these laboratories were able to detect it in panel 2. Only one repeater detected the low CMV response in panel 1, but not panel 2. This is a clear performance improvement for that group (47% missed this response in panel 1), and highlights the usefulness of multiple participation in panel testing as an external training program.
We ran an in-depth analysis of the results and previous survey responses, where available, from participants who missed the weak responder, as well as from laboratories with marginal detection of response, including personal communication. We were able to narrow down the possible sources for these performances. Two laboratories missed responses due to inaccurate evaluation, during which they either included artifacts into spot counts or simply did not count the majority of true spots, as central reevaluation revealed. One laboratory did not follow the assay guidelines. The majority of laboratories, however, followed common protocol choices, but had either very low response detection across all donors and antigens, or detected very high background reactivity in some or all donors. This pattern pointed to serum as the possible cause for suppressed reactivity or non-specific stimulation. Three of these laboratories shared with the CVC that retesting their serum choice indicated that they had worked with a suboptimal serum during the panel; and that they now successfully introduced a different serum/medium to their protocol with improved spot counts. Serum choices included human AB serum, FCS, FBS, and various serum-free media. There was no difference in assay performance detectable between these groups.
The CVC conducted two large international Elispot proficiency panels in which participants tested four batches of predefined donor PBMC for CEF and CMV peptide pool reactivity. Common guidelines like number of cells per well and amount of antigen had to be followed, in order to allow result comparison and reduce variability due to the known influence of cell numbers plated [2, 14]. A surprising finding from the first panel was that almost half of the participants were not able to detect the weak responder. The initial protocol survey did not allow the detection of common sources for this sub-optimal performance. In some cases, laboratories with identical protocol choices performed very differently, whereas others with distinct protocols had almost identical spot counts (Fig. 4). This observation supports the premise that many common Elispot materials and reagents (e.g., plates, antibodies, spot development reagents) perform equally or similarly well; and that there are other factors which influence the outcome of the Elispot assay.
Various laboratories missed the response detection by inaccurately evaluating their Elispot plate, independent of the reader used, as reevaluation revealed (Fig. 3). This observation is in contrast to results from the first NIAID proficiency panel, which describes good agreement in spot counts from participants and experienced, independent centers . Operator-dependent variability in Elispot evaluation results is a known phenomenon . Despite the availability of high resolution readers and software features for automated spot gating and other potentially helpful options, it is essential to employ well trained operators for spot counting, to audit all plates, and to implement changes of reading parameter in cases when well and spot appearances differ from the overall assay, typically for technical reasons. For that, SOPs used for plate evaluation might require revision. The use of available certification and training services can be helpful.
Cell counting is a protocol step known for introducing variability. Because of the wide range in PBMC recovery, we investigated whether cell counts could have been potentially erroneous, which would lead to wrong final cell dilutions, and can therefore lead to too high or low spot counts. Despite large differences in cell recovery (from 5.5 to 34.4 million cells per vial in panel 1), we did not find a correlation between cell recovery and spot counts. Very low (<8 million) and very high (>20 million) cell counts were consistently found for all donors among the same few laboratories. Both, hemocytometers and automated counters were used in those laboratories. Various automated cell counters have been introduced to the market, which do not only offer automated spot counting features, but also the discrimination of apoptotic, viable and dead cells. The use of such systems can potentially decrease variability in cell counts, and most importantly increase the accuracy of viable cell counts [4, 21]. Seven laboratories in panel 1 used the Guava counter, which allows the discrimination of apoptotic cells. Even though our panel did not reveal a difference in cell counts between Guava and hemocytometer users, it could be demonstrated that the overall cell viability reported by Guava users was significantly lower. This could likely be attributed to the ability to discriminate apoptotic cells with this method. Smith et al.  recently reported the usefulness of apoptosis acceptance criteria that allowed the separation of PBMC samples by their ability to respond to an antigenic stimulus or not.
The introduction of a resting period for thawed cells is known to be advantageous since apoptotic cells die during the resting period, and final dilutions for the assay are based on a more homogenous population of viable cells [16, 18]. In contrast, the addition of a mixture of viable and apoptotic cells, which are prone to die during assay incubation time, directly after thawing leads to lower spot counts. During panel 1, the central laboratory performed a side-by-side comparison of the influence of a 2 and 20 h resting period on final spot counts, and demonstrated that a 20 h resting period yielded significantly higher counts (Fig. 6). Proficiency testing results from the C-IMT also support the introduction of a cell resting period for Elispot assays . Even though almost half of participants in panel 1 let cells rest before addition to the plate, there was no clear correlation to the magnitude of peptide-specific spot counts. This might have been due to the variation of resting times between 0.5 and 20 h, and other protocol variables, which included the actual resting protocol. Factors like serum and serum concentration, cell concentration, and actual storage condition (e.g., tissue culture flasks or plates can lead to cell adherence and therefore loss of professional antigen-presenting cells) are known to influence the success of cell resting.
There are no scientific studies published about the effect of serum choices on immune assay results, one of the best known “secrets” in immunology . It is critical to choose serum that supports low background reactivity, but strong signals. The leading choice in this panel to use human AB serum reflects the historic preference for human immune assays. Each serum batch, however, is unique in its ability to support optimal assay resolution, and may potentially contain mitogenic or immune suppressive factors. There was some anecdotal evidence that the serum choice among our panelists was responsible for suboptimal performance. Interestingly, six laboratories preferred to work with serum-free medium. None of these groups observed high background reactivity, but two failed to detect the weak responder.
A survey conducted among participating laboratories shed light on validation and training practices. The wide experience range of participating laboratories, in combination with various levels and approaches to validation and training, correlated with the overall high variability in panel results. In response, new criteria for panel participation were introduced for panel 2. The outlier lab number decreased from 4 to 0 in the new panel, however 3/4 outliers from panel 1 did not participate in panel 2. The percentage of laboratories that did not detect the weak responder decreased significantly from 47 to 14% (P < 0.01). This improvement might have been partially due to the stricter participant selection in panel 2. However, 7/10 labs repeating the panel improved their performance by correctly detecting the low CMV responder in panel 2, while missed in panel 1. A striking finding was that 2/3 of all laboratories stated that they believed they needed to implement more validation. And all but one group expressed their wish for published guidelines for Elispot assay validation and training.
The outcomes from these two proficiency panels first provide Initial Elispot Harmonization Guidelines for Optimizing Assay Performance (Fig. 1) that can fulfill this need and may provide—if implemented widely—the grounds for substantial improvement of assay utility for research applications and development of immune therapies. This can be implemented in accordance with assay recommendations made for cancer immunotherapy clinical trials by the Cancer Vaccine Clinical Trial Working Group . Further optimization is aimed for through ongoing proficiency panel work conducted by the CVC.
Validation of assays is now a requirement for all endpoint parameters in clinical trials . There are an increasing number of publications available describing validated Elispot assays [1, 19, 25, 34]. These papers contain valuable scientific information, but only limited referral to FDA regulations. The FDA guidelines for validation of analytical procedures  describe validation as the process of determining the suitability of a given methodology for providing useful analytical data, which consists of analyzing or verifying the eight or nine assay parameters as described in the US pharmacopeia or the ICH guidelines [13, 37]. Only few publications address validation of bioassays and Elispot in FDA terms [10, 16, 29, 36]. And even these few publications give only limited advice on how to validate the Elispot assay in a given laboratory setting, not to mention specific training guidelines.
Furthermore, acceptance criteria for assay performance were only used by a limited number of laboratories, and each criterion was unique for the laboratory that used it.
These observations should be a wake-up call for the immune monitoring community, which does not only include the cancer vaccine field, but also the infectious disease and autoimmunity field and others. General assay practices for the detection of antigen-specific T cells are comparable across all fields. The CVC as part of the Sabin Vaccine Institute is intending to develop and tighten collaborations with groups from other research and vaccine development areas. Published documents with specific criteria for Elispot assay validation, assay acceptance criteria and training guidelines will be most valuable for the immune monitoring field, and are now being established as CVC guidelines as a result of the described studies. Continuous external validation programs need to be a part of these efforts in order to check upon the success of inter-laboratory harmonization including assay optimization, standardization and validation as well as of laboratory-specific implementation of guidelines and protocol recommendations. These efforts are essential to establish the Elispot assay and other immune assays as standard monitoring tools for clinical trials.
Special thanks to C. Wrotnowski for his work in initializing the proficiency panel program of the CVC. W. M. Kast is the holder of the Walter A. Richter Cancer Research Chair and is supported by the Beckman Center for Immune Monitoring.