Introduction

Tregs play a key role in the regulation of self-tolerance and the maintenance of tissue homeostasis. Several human diseases such as autoimmune and immunodeficient conditions, chronic infections, and cancer have been associated with alterations in Treg numbers or function, and these alterations may contribute to disease progression and impact patient survival [1–3]. In cancer patients, it is well established that accumulation of Tregs is associated with tumor progression, poor prognosis, and the suppression of anti-tumor immune effector functions. Treg-mediated immunosuppression is therefore considered a major obstacle for successful cancer immunotherapy [4–6]. Given their potential to affect the outcome of immunotherapy trials, Tregs are being studied extensively in this context. The multitude of Treg definitions in the reported studies and the lack of functional Treg testing in immunomonitoring of clinical trials, however, make correct interpretation of data and comparisons between studies difficult, especially since knowledge of overlap between the identified Treg populations is missing and the methods to detect these cells differ per laboratory. As a result, blurred pictures emerge with respect to associations between clinical outcome and Tregs [7]. So far, Tregs have been identified through a number of different (combinations of) markers including CD4pos, Foxp3pos/hi, CD25pos/hi, CD127neg/low, CTLA-4pos, CD45RApos/neg, Heliospos, CD39pos, and CD73pos/neg using several different gating strategies [8–15]. This variety of gating strategies may contribute substantially to the misinterpretation of data sets, since differences in gating strategies were found to be the biggest source of interassay variation in flow cytometry-based intracellular cytokine staining (ICS) assays [16, 17]. Similarly, a lack of adequate controls to guide the settings of gates may add another level of complexity to the analysis of Tregs.

To address these issues, the CIP organized a workshop on October 29, 2013 on the detection and functional testing of Tregs. This workshop, which hosted 40 researchers from seven countries in Europe and the USA, brought together leading experts in the field to (1) understand the state of the art of Treg research and to (2) define the most appropriate assays/markers to measure, quantify, and functionally assess Tregs within patient samples. As it became apparent during the workshop that a multitude of markers and combinations thereof is currently being used by the participants, a rationally composed ranking list of “Treg markers” was generated by the participants in the follow-up of the meeting. The preparation of this Treg marker list, subsequent data interpretation of the experiments performed at the LUMC, and subsequent discussions about and approval of the final conclusions were done through a series of circulating emails. Subsequently, the proposed Treg markers were tested in order to get insight into the overlap/differences between the most frequently used Treg definitions and their utility for Treg detection in various human tissues. This led to a context-dependent [i.e., peripheral blood/tumor/lymph node (LN)] essential marker set and robust gating strategy for the analysis of Tregs by flow cytometry.

Materials and methods

Cell samples

We acknowledge the concept of the minimal information about T cell assays (MIATA) reporting framework for human T cell assays [18]. Venous blood samples of healthy donors (HD) and recurrent ovarian cancer (OvCa) patients undergoing chemo-immunotherapeutic treatment (EM Dijkgraaf et al. submitted for publication) were drawn into sodium heparin collection tubes (Greiner Bio-one, Alphen a/d Rijn, the Netherlands) after signed informed consent had been obtained. PBMCs were isolated using Ficoll (LUMC pharmacy, Leiden, the Netherlands) density gradient centrifugation, washed with PBS (B. Braun, Melsungen, Germany), cryopreserved in 90 % fetal calf serum (FCS; PAA Laboratories, Pasching, Austria) and 10 % DMSO (Sigma-Aldrich, St. Louis, MO, USA), and stored in the vapor phase of liquid nitrogen until further use [19]. Tumor-draining lymph node (TDLN) and tumor samples were obtained from cervical cancer (CxCa) patients within the CIRCLE study after signed informed consent. The CIRCLE study investigates cellular immunity against HPV in HPV-induced (pre)malignant lesions and was approved by the Medical Ethical Committee of the LUMC [20]. Single-cell suspensions were prepared from TDLN and tumor samples using collagenase/DNase digestion or the gentleMACS procedure, respectively. First, TDLN and tumor samples were cut into small pieces. Single-cell suspensions of TDLN were prepared by incubating the TDLN pieces with 250 U/ml collagenase D (Roche, Almere, the Netherlands) and 50 µg/ml DNase I (Roche) for 1 h at 37 °C, after which the digested TDLN was put through a cell strainer [21].
Single-cell suspensions of tumor samples were prepared by incubating the tumor pieces for 30 min at 37 °C in IMDM/10 % human AB serum (Greiner) supplemented with 50 µg/ml gentamycin (Life Technologies, Bleiswijk, the Netherlands), 25 µg/ml Fungizone (Life Technologies), 10 % penicillin/streptomycin (Sigma), 1 mg/ml collagenase D, and 50 µg/ml DNase I (dissociation mix), followed by the gentleMACS dissociation procedure according to the manufacturer’s instructions. Next, cells were frozen and stored as above. The handling and storage of the PBMC, TDLN, and tumor samples were done according to the standard operating procedures (SOP) of the department of Clinical Oncology at the LUMC by trained personnel. The use of the above-mentioned patient materials was approved by the Medical Ethics Committee Leiden in agreement with the Dutch law for medical research involving humans.

Treg enumeration by flow cytometry

The cryopreserved cell samples were thawed according to SOPs and as described before [19], and Treg subsets were assessed by flow cytometry staining. To this end, one million PBMCs or ~250,000–750,000 TDLN or tumor sample cells were used per condition. Since it has been described that Foxp3 staining can be highly variable and depends on the choice of antibody (clone), buffer, and/or fluorochrome [22–24], and since the performance of a specific antibody is optimized by the manufacturer using their own permeabilization procedures, optimal Foxp3 staining was determined first. We selected four different Foxp3 antibodies on the basis of in-house availability and compatibility with the rest of our panel and with the LSR Fortessa optical configuration, and tested them with two different intranuclear staining kits. Optimal staining was determined by analysis of the percentage of positive cells and the strength of the positive signal (compared to the negative fluorescence minus one (FMO) signal). Antibodies and intranuclear staining kits used for Foxp3 staining setup were AF700-labeled Foxp3 (clone PCH101, eBiosciences), PE-labeled Foxp3 (clone PCH101, eBiosciences, and clone 206D, R&D systems), PE-CF594-labeled Foxp3 (clone 259D/C7, BD), AmCyan-labeled CD3 (clone SK7, BD), V500-labeled CD3 (clone UCHT1, BD), PE-CF594- or AF700-labeled CD4 (both clone RPA-T4, BD), PE-CY7-labeled CD25 (clone 2A3, BD), BV650-labeled CD127 (clone HIL-7R-M21, BD), the Foxp3/transcription factor staining buffer set (eBiosciences), and the BD Pharmingen Transcription Factor Buffer set (BD). Cell surface antibody staining was performed in PBS/0.5 % BSA/0.02 % sodium azide (PBA) buffer for 30 min at 4 °C. Intranuclear Foxp3 staining was conducted with the BD or eBiosciences Transcription Factor Buffer sets according to the manufacturers’ protocols. Analysis revealed that Foxp3 could be detected with all clones tested when using the eBiosciences kit.
Yet, staining intensity (and thus discrimination between negative and positive) was lower with the PCH101 clones when compared with the 206D (PE) clone (Supplementary figure 1a–c), which may be due to fluorochrome choice. Staining pattern and positive-to-negative signal ratio [i.e., staining index (SI)] of the 259D/C7 (PE-CF594) clone were optimal with the BD TF kit (not shown) and were comparable to the staining pattern of the 206D clone using this kit, indicating that both antibodies could be used in our Treg panel (Supplementary figure 1d–f). After selection of the best Foxp3 antibody and intranuclear staining buffer set, all additional antibodies in the final panel were titrated, and spillover profiles were generated to ascertain that there was no spectral overlap of the selected antibodies into the secondary detectors. Optimal antibody concentrations were determined based on the following criteria: (a) frequency and (b) highest SI (positive mean divided by negative mean), and spillover profiles were generated as described by Murdoch et al. [25]. Antibodies and kits used in the final panel were V500-labeled CD3 (clone UCHT1, BD), AF700-labeled CD4 (clone RPA-T4, BD), PE-CY7-labeled CD25 (clone 2A3, BD), BV650-labeled CD127 (clone HIL-7R-M21, BD), APC-H7-labeled CD45RA (clone HI100, BD), PerCP-Cy5.5-labeled CD8 (clone SK1, BD), PE-CF594-labeled Foxp3 (clone 259D/C7, BD), BV421-labeled CTLA-4 (clone BNI3, BD), FITC-labeled Ki67 (clone 20Raj1, eBiosciences), APC-labeled Helios (clone 22F6, Biolegend), PE-labeled CD39 (clone ebioA1, eBiosciences), LIVE-DEAD® Fixable yellow dead cell stain kit (Q-dot585, Life Technologies), and the BD Pharmingen Transcription Factor Buffer set. Stained cells were acquired on an LSR Fortessa (BD) and analyzed using DIVA software version 6.2. Events collected were generally >200,000 per sample, except for one tumor-infiltrating lymphocyte (TIL) sample (~35,000 cells).
In the latter, still adequate numbers (~400) of Tregs could be detected.
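The staining index used for titration is defined above as the positive mean divided by the negative mean. As a minimal illustration only, the Python sketch below computes that ratio and picks the dilution with the highest SI. The function names, the dilution labels, and all intensity values are hypothetical and are not taken from the study; criterion (a), frequency, is not modeled here.

```python
from statistics import mean


def staining_index(positive_events, negative_events):
    """SI as defined in the text: mean of the positive population
    divided by the mean of the negative population."""
    negative_mean = mean(negative_events)
    if negative_mean <= 0:
        raise ValueError("negative-population mean must be greater than zero")
    return mean(positive_events) / negative_mean


def dilution_with_highest_si(candidates):
    """Return the label whose (positive, negative) intensities give the highest SI.

    candidates: dict mapping a dilution label to a tuple
    (positive_events, negative_events) of fluorescence intensities.
    """
    return max(candidates, key=lambda label: staining_index(*candidates[label]))


# Hypothetical fluorescence intensities for three antibody dilutions.
titration = {
    "1:50": ([900.0, 1100.0], [100.0, 100.0]),
    "1:100": ([800.0, 1000.0], [50.0, 70.0]),
    "1:200": ([300.0, 500.0], [40.0, 60.0]),
}
best_dilution = dilution_with_highest_si(titration)
```

In practice the means would come from gated positive and negative populations exported from the acquisition software rather than from hand-entered lists.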

Treg definitions and gating strategies

Tregs were analyzed according to three commonly used Treg definitions in the literature: (1) the CD25posCD127lowFoxp3pos subset [definition 1 (def.1)] [9, 10], (2) the Foxp3posHeliospos Treg subset (def.2) [12, 26], and (3) the Foxp3hiCD45RAneg activated Treg (aTreg) and Foxp3intCD45RApos naïve Treg (nTreg) subsets (def.3) [8, 11]. Gating for CD25 and CD127 (def.1), Foxp3 and Helios (def.2), and Foxp3 and CD45RA (def.3) Tregs was done on CD3posCD4neg (i.e., CD8pos) T cells and CD3neg lymphocytes, respectively, and subsequently applied to CD3posCD4pos T cells (see also supplementary figure 2a, 3a, and 5a). Percentage of def.1, def.2, or def.3 Tregs is given as percentage within the CD4pos population.
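The three definitions can be read as Boolean combinations of per-cell marker calls. The Python sketch below is a hypothetical encoding of that logic for already-gated CD3posCD4pos events; the class, the function names, and the toy cells are illustrative assumptions. It assumes that "Foxp3pos" covers both Foxp3int and Foxp3hi cells, and it leaves Foxp3hiCD45RApos events unclassified under def.3 because the text does not assign them to a subset.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Cd4Cell:
    """One CD3pos CD4pos event with per-marker calls (hypothetical encoding)."""
    cd25_pos: bool
    cd127_low: bool
    foxp3: str  # "neg", "int" or "hi"; assumption: Foxp3pos means "int" or "hi"
    helios_pos: bool
    cd45ra_pos: bool


def is_def1(cell: Cd4Cell) -> bool:
    """def.1: CD25pos CD127low Foxp3pos."""
    return cell.cd25_pos and cell.cd127_low and cell.foxp3 != "neg"


def is_def2(cell: Cd4Cell) -> bool:
    """def.2: Foxp3pos Heliospos."""
    return cell.foxp3 != "neg" and cell.helios_pos


def def3_subset(cell: Cd4Cell) -> Optional[str]:
    """def.3: aTreg, nTreg or non-Treg; None for cells outside these gates."""
    if cell.foxp3 == "hi" and not cell.cd45ra_pos:
        return "aTreg"
    if cell.foxp3 == "int" and cell.cd45ra_pos:
        return "nTreg"
    if cell.foxp3 == "int" and not cell.cd45ra_pos:
        return "non-Treg"
    return None


def percent_of_cd4(cells: List[Cd4Cell], predicate: Callable[[Cd4Cell], bool]) -> float:
    """Frequency of matching cells as a percentage of all CD4pos cells."""
    return 100.0 * sum(1 for cell in cells if predicate(cell)) / len(cells)


# Ten hypothetical CD4pos cells: four Foxp3pos cells and six Foxp3neg cells.
cells = [
    Cd4Cell(True, True, "hi", True, False),     # def.1, def.2, aTreg
    Cd4Cell(True, True, "int", True, True),     # def.1, def.2, nTreg
    Cd4Cell(True, True, "int", False, False),   # def.1 only, non-Treg
    Cd4Cell(False, False, "int", True, False),  # def.2 only, non-Treg
] + [Cd4Cell(False, False, "neg", False, True)] * 6

def1_percent = percent_of_cd4(cells, is_def1)
def2_percent = percent_of_cd4(cells, is_def2)
atreg_percent = percent_of_cd4(cells, lambda cell: def3_subset(cell) == "aTreg")
```

The marker calls themselves would still come from gates set on the CD3posCD4neg and CD3neg reference populations, as described above; the sketch only covers the combination step.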

Statistical analysis

Nonparametric (Wilcoxon signed-rank or Mann–Whitney test for two samples and Friedman or Kruskal–Wallis with Dunn’s multiple comparison test for multiple samples) and parametric (paired or unpaired t test for two samples or RM one-way ANOVA or ordinary one-way ANOVA with Tukey’s multiple comparison test for multiple samples) tests were performed as appropriate. All statistical tests were performed at the 0.05 significance level, and 95 % confidence intervals were two-sided intervals. For survival analysis, the OvCa patients undergoing chemo-immunotherapeutic therapy were grouped into two groups according to the median (i.e., grouped into below or above the median of the total group for each parameter), after which survival was tested using Kaplan–Meier method, and statistical significance of the survival distribution was analyzed by log-rank testing. Statistical analyses were performed using SPSS for Windows version 20.0 (IBM, USA) and GraphPad Prism 6.02 (San Diego, USA).
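To make the survival procedure concrete, the following Python sketch splits patients at the median of a parameter and compares the two groups with a two-group log-rank test. It is an illustration under stated assumptions, not the SPSS or GraphPad Prism implementation used in the study: the function names are hypothetical, values equal to the median are assigned to the low group, and the p value comes from the chi-square approximation with one degree of freedom.

```python
from math import erfc, sqrt
from statistics import median


def split_by_median(values):
    """Return True for values above the median, False otherwise.
    Assumption: values equal to the median go to the low group."""
    cutoff = median(values)
    return [value > cutoff for value in values]


def logrank_p_value(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are True for death, False for censoring."""
    subjects = [(t, e, 0) for t, e in zip(times_a, events_a)]
    subjects += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in subjects if e})
    observed_a = expected_a = variance = 0.0
    for time in event_times:
        at_risk_a = sum(1 for t, _, g in subjects if t >= time and g == 0)
        at_risk_b = sum(1 for t, _, g in subjects if t >= time and g == 1)
        deaths_a = sum(1 for t, e, g in subjects if t == time and e and g == 0)
        deaths_b = sum(1 for t, e, g in subjects if t == time and e and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        observed_a += deaths_a
        expected_a += d * at_risk_a / n
        if n > 1:  # the hypergeometric variance is undefined for a single subject
            variance += at_risk_a * at_risk_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi_square / 2.0))


def median_split_logrank(parameter, times, events):
    """Group patients by the median of a parameter and run the log-rank test."""
    high = split_by_median(parameter)
    low_group = [(t, e) for t, e, h in zip(times, events, high) if not h]
    high_group = [(t, e) for t, e, h in zip(times, events, high) if h]
    return logrank_p_value(
        [t for t, _ in low_group], [e for _, e in low_group],
        [t for t, _ in high_group], [e for _, e in high_group],
    )
```

For example, `median_split_logrank(frequencies, survival_months, died)` would return the log-rank p value for patients below versus above the median of `frequencies`, where all three argument names are placeholders for per-patient lists of equal length.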

Results

Generation of a rationally ranked Treg marker list

During the CIP workshop, a number of Treg analysis methods were presented. These analyses were discussed, a number of questions were formulated, and during the follow-up of the meeting, a rationally composed ranking list of “Treg markers” was generated. All suggested markers and the rationale for using them are given in Table 1. To test these markers and get insight into the overlap/differences between the most frequently used human Treg definitions, we included markers 1–8, 10, and 11 for direct ex vivo analysis of peripheral blood samples from six HD and OvCa patients, and LN and tumor samples obtained from CxCa patients. Markers were included based on the number of participants opting for inclusion of the marker and/or their known association with Tregs. LAP/GARP (number 9) was excluded as this marker is only expressed >24 h following in vitro activation.

Table 1 Treg marker list generated after inquiry among workshop participants

Analysis of Tregs according to commonly used Treg definitions

Tregs were analyzed according to three commonly used Treg definitions in the literature [8–12, 26].

Definition 1: CD25posCD127lowFoxp3pos Tregs

Figure 1a shows the expression of the different markers in def.1 Tregs. The gating strategy for the CD25posCD127lowFoxp3pos def.1 Treg subset is given for a representative HD in supplementary Fig. 2a. Cells expressing Foxp3 comprised 78.7 % (range 70.5–85.1 %) of the CD25posCD127low subpopulation. Due to variability in CD127 expression (Supplementary figure 2b, c), enumerating def.1 Tregs solely based on CD25 and CD127 is highly variable between HD and most likely leads to an overestimation of the number of Tregs (mean 17.6 %, range 7.2–30.4 %). Inclusion of Foxp3 resulted in less variation in the percentage of def.1 Tregs (mean 6.9 %, range 4.6–8.8 %) as would be expected among a group of HD, suggesting that simultaneous staining with CD25, CD127, and Foxp3 is needed for reliable measurement of def.1 Tregs. Further characterization of the CD25posCD127lowFoxp3pos subset revealed that 75 % of these cells were Helios positive (Fig. 1a). Moreover, the majority of CTLA-4 and Ki67 expressing CD4pos T cells were found in the CD25posCD127lowFoxp3pos population (data not shown). These observations add to the notion that bona fide Tregs are detected when the CD25posCD127lowFoxp3pos def.1 subset definition for Treg enumeration is used.

Fig. 1

CD25posCD127lowFoxp3pos def.1, Foxp3posHeliospos def.2, and Foxp3hiCD45RAneg def.3 aTregs express high levels of Treg-associated markers, suggesting that they are bona fide Tregs. Phenotypic characterization of def.1, def.2, and def.3 Tregs was performed by flow cytometry. Gating of the three different Treg definitions was performed as described in supplementary figs. 2a, 3a, and 5a. Expression of the Treg-associated markers Helios, CD45RA, CTLA4, Ki67, and CD39 is depicted for a representative healthy donor (HD; left) and multiple HD (right; Helios/CD45RA/CTLA4/Ki67 for six and CD39 for two HD) for a CD25posCD127lowFoxp3pos def.1 Tregs, b Foxp3posHeliosneg and Foxp3posHeliospos def.2 Tregs, and c Foxp3hiCD45RAneg def.3 aTregs, Foxp3intCD45RApos def.3 nTregs, and Foxp3intCD45RAneg def.3 non-Tregs. Percentage Helios/CD45RA/CTLA4/Ki67/CD39 expression is given as percentage of the designated population in the upper right quadrant in the FACS plot for the representative HD (left) and as mean percentage for the six HD (right)

Fig. 2

Treg enumeration based solely on Foxp3 and Helios (def.2) or Foxp3 and CD45RA (def.3) led to an underestimation of CD25posCD127lowFoxp3pos def.1 Tregs through exclusion of def.1 Treg cells in the Foxp3posHeliosneg (def.2) or Foxp3intCD45RAneg non-Treg (def.3) populations. Overlap between the three most commonly used Treg definitions (def.1, def.2, and def.3) is given for a a representative HD and b, c six HDs. a Distribution of def.1 Tregs is shown in def.2 and def.3 populations (left), of def.2 Tregs is shown in def.1 and def.3 populations (middle), and of def.3 Tregs is shown in def.1 and def.2 populations (right). Percentage of Tregs analyzed via def.1, def.2, or a combination thereof b and via def.1, def.3, or a combination thereof c is depicted as percentage of CD4pos T cells. Overlap between the designated populations is calculated in relation to def.1 Tregs (set at 100) and given in the bar graph for each population

Definition 2: Foxp3posHeliospos Tregs

The gating strategy for the Foxp3posHeliospos def.2 Treg subset is given for a representative HD in supplementary figure 3a. Analysis revealed that 5.6 % of CD4pos T cells were Foxp3posHeliospos (range 4.1–7.1 %), and Foxp3posHeliosneg cells accounted for 2.9 % (range 1.9–4.4 %) of CD4pos T cells. Interestingly, Foxp3 expression of Foxp3posHeliosneg cells was significantly lower than that of Foxp3posHeliospos cells (Supplementary figure 3b, c). Further characterization of the def.2 Treg subsets revealed that the majority of the Foxp3posHeliospos cells (mean 88 %, range 84.7–90.7 %) were found inside the CD25posCD127low def.1 Treg subset (Supplementary figure 4). Moreover, 64 % of Foxp3posHeliosneg cells (range 52.0–72.9 %) could also be found within that CD25posCD127low gate. Expression levels of CTLA4 and CD45RA were found to be similar in Foxp3posHeliosneg and Foxp3posHeliospos cells (Fig. 1b). Together, this indicates that although probably polluted with Foxp3pos activated effector T cells, the population of Foxp3posHeliosneg cells also contains considerable numbers of Tregs according to definition 1. Interestingly, Foxp3posHeliospos cells expressed significantly more Ki67 compared with Foxp3posHeliosneg cells, suggesting that Foxp3posHeliospos cells, which also express higher levels of Foxp3, represent more recently activated Tregs (p = 0.03; Fig. 1b).

Definition 3: Foxp3hiCD45RAneg aTregs and Foxp3intCD45RApos nTregs

The gating strategy for the Foxp3hiCD45RAneg and Foxp3intCD45RApos def.3 Treg subsets is given for a representative HD in supplementary figure 5a. Foxp3hiCD45RAneg aTreg accounted for 1.1 % (range 0.8–1.6 %) and Foxp3intCD45RApos nTreg for 1.4 % (range 0.6–2.5 %) of CD4pos T cells. Remarkably, the so-called Foxp3intCD45RAneg non-Treg subset accounted for 6.0 % (range 4.2–7.4 %) of CD4pos T cells, and this was significantly more than the aTreg and nTreg frequencies detected (p < 0.001) (Supplementary figure 5b). Further characterization revealed that the majority of aTregs and nTregs could be found within the CD25posCD127low def.1 and Heliospos def.2 Treg populations. Yet, the so-called non-Treg population also comprised considerable numbers of the def.1 (77.3 %) and def.2 (56.2 %) Tregs, indicating that the def.3 non-Treg population still contained high numbers of Tregs according to the other definitions (Supplementary figure 6a, b, d). Moreover, the frequency of def.1 or def.2 Tregs within the non-Treg population was significantly higher than within the aTreg and nTreg populations (Supplementary figure 6a, c, e). As expected, the aTreg, but not the nTreg population, displayed an activated profile indicated by high levels of Ki67 and CTLA4 expressions (% and mean fluorescence intensity; Fig. 1c and data not shown, respectively) sustaining the notion that this preset profile is likely to accurately detect activated Tregs.

Expression of ectonucleoside triphosphate diphosphohydrolase-1 (CD39)

It has been described that human Tregs express CD39, an ectonucleotidase involved in adenosine triphosphate (ATP) breakdown and the production of immunosuppressive adenosine, thereby suggesting that CD39 may be a functional marker on Tregs [27, 28]. To study the expression of CD39 in relation to the three commonly used Treg definitions, CD39 was included in our flow cytometric marker panel, and two HD-derived PBMC samples were analyzed. Although the majority of CD39 can be found in CD25posCD127lowFoxp3pos def.1 Tregs (~70 %), expression of CD39 is not def.1 Treg exclusive (Supplementary figure 7a, d). Similar results were found for Foxp3posHeliospos def.2 Treg and the Foxp3CD45RA def.3 Treg subsets (Supplementary figures 7b, c, e, f). Interestingly and indeed suggestive of their functional potential, CD39 expression is much higher in Foxp3hiCD45RAneg aTregs than in Foxp3intCD45RApos nTregs. Thus, CD39 expression seems to be present especially on activated Tregs, but its expression is not Treg exclusive. Within the activated Treg populations, it identifies the same population of cells, that is, CD45RAneg and CTLA4pos. CD39 expression therefore falls into the category of markers for identifying the activated subset of Tregs. Of note, it has been demonstrated that CD39, when combined with CD25, can be used to identify and isolate Tregs with strong suppressive activity [29, 30]. Gating on the cell surface markers CD25pos, CD127low, and CD39pos yielded 75–80 % Foxp3pos cells in our hands (Supplementary figures 7 and 8).

Based on the expression of high levels of CD25, Helios, CTLA-4, and CD39, the CD25posCD127lowFoxp3pos def.1, Foxp3posHeliospos def.2, and Foxp3hiCD45RAneg def.3 T cells were classified as bona fide Tregs.

Overlap between the three Treg definitions

Next, the overlap between the def.1, def.2, and def.3 Tregs was determined (Fig. 2a–c). As expected, there was considerable overlap between the three Treg definitions. The overlap between the CD25posCD127lowFoxp3pos def.1 Tregs and the Foxp3posHeliospos def.2 Tregs is approximately 73 %, and thus, Treg enumeration based solely on Foxp3 and Helios may lead to an underestimation in Tregs of ~27 % through exclusion of CD25posCD127lowFoxp3pos cells in the Foxp3posHeliosneg population (range 20.2–35.3 % of CD25posCD127lowFoxp3pos Tregs; supplementary figure  4c). Furthermore, Treg measurements based solely on Foxp3 and CD45RA (def.3) led to an underestimation of the number of def.1 Tregs of 67.5 % through exclusion of the so-called Foxp3intCD45RAneg non-Tregs (range 61.0–73.5 % of CD25posCD127lowFoxp3pos Tregs; supplementary figure 6c).
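The overlap figures above are expressed relative to def.1 Tregs set at 100, so the underestimation is simply 100 minus the overlap. The Python sketch below shows that arithmetic on sets of cell identifiers. The function names and the identifier ranges are hypothetical; the toy numbers were chosen only to echo the approximately 73 % overlap reported for def.2 and are not study data.

```python
def overlap_relative_to_reference(reference_ids, other_ids):
    """Percentage of reference cells (set at 100) that also fall in the other gate."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference population is empty")
    return 100.0 * len(reference & set(other_ids)) / len(reference)


def underestimation(reference_ids, other_ids):
    """Percentage of reference cells missed when only the other definition is used."""
    return 100.0 - overlap_relative_to_reference(reference_ids, other_ids)


# Hypothetical event identifiers: 100 def.1 Tregs, 73 of which are also def.2 Tregs.
def1_ids = range(100)
def2_ids = range(27, 140)

overlap_percent = overlap_relative_to_reference(def1_ids, def2_ids)
missed_percent = underestimation(def1_ids, def2_ids)
```

Note that the measure is asymmetric: swapping the reference and the other population generally gives a different percentage, which is why the reference definition (def.1 here) has to be stated.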

Treg enumeration in PBMC, TDLN, and TIL of cancer patients

It has been described that the expression of CD25 and/or CD127 can be altered in (chronic) inflammatory/autoimmune diseases such as systemic lupus erythematosus (SLE) and type 1 diabetes, thereby influencing reliable Treg enumeration [12, 26, 31]. In addition, changes in CD25 and CD127 expression have also been observed in cancer patients undergoing immunotherapeutic interventions such as vaccination or ipilimumab treatment [32–35]. To study the possibility of analyzing Tregs by the different definitions under such conditions, we analyzed peripheral blood samples from patients with recurrent OvCa and TDLN and tumor samples from CxCa patients.

As shown in Fig. 3 for representative examples, the gating and enumeration of Tregs based on CD25, CD127, and Foxp3 (def.1) and on Foxp3 and Helios (def.2) is feasible in OvCa-derived peripheral blood, as well as in TDLN and tumor samples from CxCa patients, using the same gating strategy applied for HD-derived PBMC. Treg enumeration based on def.3 was feasible in peripheral blood and TDLN samples of patients but was not reliable in tumor samples due to the absence of the Foxp3intCD45RApos nTreg population, which is used for discrimination between Foxp3int and Foxp3hi cells in the gating strategy (see also Supplementary figure 5 for Foxp3 and CD45RA gating strategy). Figure 3b shows a summary of the detected Treg frequencies in all analyzed samples. Importantly, the overlap between the three Treg subsets was comparable between HD-derived and OvCa patient-derived peripheral blood, CxCa-derived TDLN, and tumor samples (Supplementary figure 9), indicating that CD25, CD127, and Foxp3 can also be used in tissues from cancer patients. Of note, the additional value of CD127 and CD25 in the def.1 Treg marker set becomes particularly clear upon exclusion of these markers when assessing Treg frequencies in these samples. Exclusion of CD127 and/or CD25 from the Treg panel resulted in an increase in the number of detected def.1 Tregs (supplementary figure 10). Although exclusion of CD127 alone led to a substantial increase in the frequency of def.1 Tregs (mean 21.5 %, range 14.7–29.3 %) in the PBMC of OvCa patients, exclusion of CD25 or CD25 and CD127 led to substantial increases in the frequency of these def.1 Tregs in PBMC of HD and OvCa patients as well as in TDLN or tumor samples from CxCa patients (17.9, 21.9, 24.0, and 30.8 % for CD25 exclusion and 37.6, 58.8, 40.4, and 43.3 % for CD25 and CD127 exclusion, see supplementary figure 10a).
This resulted from a less pure Treg detection as reflected by lower percentage of def.1 Tregs expressing markers such as CTLA-4 and reduced frequencies of Heliospos def.2 and Foxp3hiCD45RAneg def.3 aTreg cells among the def.1 Tregs (supplementary figure 10b), indicating that CD25 and CD127 are required for reliable assessment of def.1 Tregs.

Fig. 3

Treg gating based on Foxp3 and CD45RA (def.3) is subjective in TIL as it is difficult to distinguish between Foxp3hi versus FoxP3low cells due to the absence of Foxp3intCD45RApos population. Def.1, def.2, and def.3 Treg analyses were performed by flow cytometry. Treg analysis based on CD25 and CD127 (def.1), FoxP3 and Helios (def.2), and FoxP3 and CD45RA (def.3) is given for PBMC of a representative healthy donor (HD) and an ovarian cancer (OvCA) patient and for a TDLN and TIL sample of representative cervical cancer (CxCa) patient in a and for multiple donors in b. Gates were set as described in supplementary figures 2a, 3a, and 5a. Percentage of CD25posCD127low and CD25posCD127lowFoxp3pos def.1; Foxp3posHeliosneg and Foxp3posHeliospos def.2; and the def.3 Foxp3hiCD45RAneg aTreg, Foxp3intCD45RApos nTreg, and Foxp3intCD45RAneg non-Treg populations is given as percentage of CD3posCD4pos T cells. Example of the problem with gating based on Foxp3 and CD45RA in TIL is depicted by the arrow in a

The association between Tregs and survival

Treg accumulation in the tumor or peripheral blood is associated with tumor progression and poor prognosis [36]. To study the relation between the different Treg subsets and survival, we determined the frequencies of the def.1, def.2, and def.3 Tregs in the PBMC of recurrent OvCa patients undergoing chemo-immunotherapeutic treatment (EM Dijkgraaf et al. submitted for publication) and correlated these levels to the overall survival (OS). None of the pretreatment levels of def.1 Tregs, def.2 Tregs, or def.3 aTregs correlated with survival (Fig. 4a). However, when the pretreatment frequencies of Foxp3hiCD45RAneg or Ki67pos cells within def.1 Tregs (i.e., activated def.1 Tregs) were determined, a trend toward reduced OS was observed for patients with high frequencies of Foxp3hiCD45RAneg def.1 Tregs (p = 0.0643) and a significantly reduced OS for patients with high frequencies of Ki67pos def.1 Tregs (p = 0.0133; Fig. 4b). The latter suggests that, in particular, measurements of a more activated Treg pool may have prognostic or predictive value.

Fig. 4

High pretreatment frequencies of Foxp3hiCD45RAneg and Ki67pos def.1 Tregs (i.e., activated def.1 Tregs) are associated with reduced overall survival in OvCa patients undergoing chemo-immunotherapeutic therapy. The use of Ki67 and CD45RA provides additional information on the activation status of def.1 Tregs. Treg analysis was performed based on CD25, CD127, and Foxp3 (def.1), Foxp3 and Helios (def.2), and Foxp3 and CD45RA (def.3) in PBMC of 21 chemo-immunotherapy-treated ovarian cancer (OvCA) patients (EM Dijkgraaf et al., submitted for publication). Pretreatment values of def.1, def.2, and def.3 Tregs were determined, and overall survival (OS) of these patients following chemo-immunotherapy was plotted in Kaplan–Meier curves for pretreatment values of def.1 (left), def.2 (middle), and def.3 (right) Tregs in a. Activation status of def.1 Tregs was determined by measuring the frequency of Foxp3hiCD45RAneg and Ki67pos cells within the def.1 Tregs. Gating and Kaplan–Meier curves are depicted in b for pretreatment values of Foxp3hiCD45RAneg def.1 Tregs and c for pretreatment values of Ki67pos def.1 Tregs. Gates for Foxp3hiCD45RAneg and Ki67pos were set as shown in the FACS plots. Patients were grouped into two groups based on the median of the total population, i.e., into a group of patients with frequencies that were below the median (dotted line) or with frequencies above the median (solid line) for the indicated parameter, after which survival analysis was performed. Number of patients and corresponding OS for each group is given. Statistical analysis was performed by log-rank testing, and differences were considered significant when p < 0.05

Conclusion and discussion

The unambiguous enumeration of Tregs by flow cytometry is hampered by (a) the inability to directly measure their function and (b) the absence of an exclusive, highly specific marker. Reaching consensus on an essential marker set for Treg enumeration with the currently available markers involves a number of considerations. First, the essential marker set should be able to identify a population of cells that, in addition to the essential Treg-defining markers, also express other Treg-associated markers but do not produce IFNγ and IL-2 [12, 13, 26]. Second, as there are currently three Treg definitions used in the field [8–12, 26], the cell population identified should be highly specific and include at least the same population of Tregs by all three definitions. Third, the proposed marker set should allow for robust, undisputable, and context (tissue)-independent gating since differences in gating strategies have been found to be the biggest source of interassay variation in flow cytometry-based assays [16, 17]. Fourth, if possible, one should be able to assess their functionality.

Based on the data presented here and taking into account the above-mentioned considerations, we consider the use of the CD3, CD4, CD25, CD127, and Foxp3 markers as the minimally required markers to define human Tregs. We showed that this combination of markers allows for robust and undisputable gating of Tregs in the context of HD- and cancer patient-derived peripheral blood as well as TDLN and tumor samples (Supplementary figure 2 and Fig. 3). Although the latter also holds true for Foxp3posHeliospos def.2 Tregs (Supplementary figure 3 and Fig. 3), Treg measurement based solely on Foxp3 and Helios resulted in a ~25 % underestimation of the number of def.1 Tregs through exclusion of CD25posCD127low cells within the Foxp3posHeliosneg population in all tested tissues/compartments (supplementary figures 4 and 9). These observations were in line with findings from others, reporting that Helios expression was restricted to a subpopulation (approximately 70 %) of human Foxp3pos T(reg) cells [12, 13, 26]. Treg enumeration based on Foxp3 and CD45RA (def.3) yielded distinctive aTreg and nTreg populations in HD- and cancer patient-derived peripheral blood and TDLN, with high CD25, CTLA-4, and Ki67 expression levels in the aTreg and lower expression levels of these markers in the nTreg populations (Supplementary figure 6 and Figs. 1 and 3). Yet, in line with findings from others [12], the largest fraction of the CD25posCD127lowFoxp3pos (def.1; supplementary figure 6c) or Foxp3posHeliospos (def.2; supplementary figure 6e) populations was found in the so-called non-Treg population of Foxp3intCD45RAneg cells. While the population of Tregs based on definitions 1 or 2 may contain small fractions of non-Tregs, the measurement of Tregs based solely on Foxp3 and CD45RA (def.3) will lead to a ~60–70 % underestimation of Tregs. Importantly, def.3 Treg gating could not be done in a robust and undisputable fashion in tumor samples.
Although not unexpected and observed before [8], the absence of the Foxp3intCD45RApos T cell population in tumor samples precluded robust def.3 aTreg and nTreg gatings in this context. Notably, the apparent absence of naïve T cells at tumor effector sites and the preferential recruitment of activated Tregs or accumulation of locally activated Tregs does confirm the validity of the defined respective activated and naïve Treg definitions within definition 3 [8, 11]. Of note, this observation clearly emphasizes the need for validating/assessing the suitability of the flow cytometry panels in the intended context/tissue.

As shown, we used CD3posCD4neg (i.e., CD8pos) and CD3neg cells to define the limits of the positive (CD25, CD127, Helios, and CD45RA) gates as this has been described to form a more reliable gating strategy than using isotype control antibodies or FMO controls [23]. Omission of CD3 and CD8 antibodies from the essential marker set does affect our gating strategy resulting in less reliable/more disputable CD25, CD127, Helios, and CD45RA gating, and thus affecting the reliability of our results (data not shown). Furthermore, this gating strategy results in objective CD25pos gating rather than subjective CD25high gating, the latter being very important for harmonized and comparative Treg analysis.

There are a number of Treg-associated markers which we consider to be of interest, yet optional to the required minimal panel. Based on our data, we highly recommend extending the minimally required antibody panel to include Ki67 and CD45RA as they provide additional information on the Treg activation status (Table 2). Indeed, the addition of CD45RA and Ki67 to the marker panel proved very informative in that none of the def.1, def.2, or def.3 Treg frequencies as such were associated with worse survival of ovarian cancer patients, whereas the pretreatment frequencies of activated Foxp3hiCD45RAneg and Ki67pos def.1 Tregs were (Fig. 4). The measurement of activated Ki67pos Tregs has also been advocated by others [36, 37]. In one study, renal cell cancer patients undergoing multipeptide vaccination and cyclophosphamide treatment showed a significant reduction in the number of circulating Ki67pos Tregs and a trend toward prolonged OS following therapy [37]. Of note, as Ki67pos def.1 Treg detection was also feasible in TDLN and tumor samples (not shown), this strategy may also be useful to identify activated Tregs within def.1 Tregs in tumor samples, thereby circumventing the need for the subjective gating on Foxp3hi versus Foxp3int cells. While the activation markers CD39 and CTLA-4 [27, 28, 38, 39] have been described as functional markers to identify activated Tregs, they do not add information to a panel that already contains CD45RA, Ki67, and the minimally required antibody set. Helios may be of interest for identifying Tregs in autoimmunity such as SLE since these patients’ conventional T cells display high levels of CD25 resulting in overlap with Tregs [12]. In a recent trial where patients displayed a strong antigen-specific CD4pos T cell response to vaccination, we did not observe such problems for identifying Tregs using the currently proposed markers (EM Dijkgraaf et al. submitted for publication).
Based on our data, omission of CD25 as a marker is not recommended as this resulted in the identification of less pure Treg populations (Supplementary figure 10).

Table 2 Proposed marker set

In addition, there remain a number of markers, not tested in this study, which may offer benefits to identify specific subsets of Tregs. CD147 is a cell surface marker that is accessible directly ex vivo and can also be used to identify an activated and highly suppressive Treg subset [36, 40, 41]. Furthermore, LAP (membrane-bound active form of TGF-β) and GARP (membrane-anchoring molecule involved in latent TGF-β binding) may be particularly interesting in defining TGF-β-associated and activated Tregs in tumor samples [39, 42–45]. Moreover, the chemokine receptors CCR6, CXCR3, CCR4, and CCR10 were found to be useful for the identification of phenotypically and functionally distinct subsets of human Foxp3+ Tregs [46].

In summary, consensus was reached concerning the use of an essential marker set comprising antibodies to CD3, CD4, CD25, CD127, Foxp3, Ki67, and CD45RA and a corresponding robust gating strategy for the analysis of Tregs in human samples. This set will be used in proficiency panels to harmonize the phenotypic analysis of Tregs within laboratories participating in the CIP.