Toward harmonized phenotyping of human myeloid-derived suppressor cells by flow cytometry: results from an interim study

There is an increasing interest for monitoring circulating myeloid-derived suppressor cells (MDSCs) in cancer patients, but there are also divergences in their phenotypic definition. To overcome this obstacle, the Cancer Immunoguiding Program under the umbrella of the Association of Cancer Immunotherapy is coordinating a proficiency panel program that aims at harmonizing MDSC phenotyping. After a consultation period, a two-stage approach was designed to harmonize MDSC phenotype. In the first step, an international consortium of 23 laboratories immunophenotyped 10 putative MDSC subsets on pretested, peripheral blood mononuclear cells of healthy donors to assess the level of concordance and define robust marker combinations for the identification of circulating MDSCs. At this stage, no mandatory requirements to standardize reagents or protocols were introduced. Data analysis revealed a small intra-laboratory, but very high inter-laboratory variance for all MDSC subsets, especially for the granulocytic subsets. In particular, the use of a dead-cell marker altered significantly the reported percentage of granulocytic MDSCs, confirming that these cells are especially sensitive to cryopreservation and/or thawing. Importantly, the gating strategy was heterogeneous and associated with high inter-center variance. Overall, our results document the high variability in MDSC phenotyping in the multicenter setting if no harmonization/standardization measures are applied. Although the observed variability depended on a number of identified parameters, the main parameter associated with variation was the gating strategy. Based on these findings, we propose further efforts to harmonize marker combinations and gating parameters to identify strategies for a robust enumeration of MDSC subsets. Electronic supplementary material The online version of this article (doi:10.1007/s00262-015-1782-5) contains supplementary material, which is available to authorized users.


Introduction
Myeloid-derived suppressor cells (MDSCs) are a heterogeneous group of myeloid cells at different stages of differentiation that are often expanded in cancer patients and capable to suppress immune responses [1]. In recent years, recognition of the clinical relevance of MDSCs has steadily increased. MDSC levels correlate negatively with prognosis and overall survival in cancer patients, and the accumulation of this population promotes tumor progression and is linked to tumor stage and worse prognosis of cancer patients [2][3][4][5].
MDSCs were originally described in mice, on the basis of the co-expression of CD11b and Gr-1. However, subsequent studies elucidated that CD11b + /Gr-1 + cells are not a homogeneous cell population, but rather a heterogeneous group of myeloid cells, endowed with different suppressive ability. At present, it is well known that there are at least two main subsets of MDSCs present in tumorbearing mice, the monocytic (M-MDSC) and the granulocytic (G-MDSC) ones and that these cell populations can be properly identified by using a set of accessory markers. In particular, the differential expression of Ly6C and Ly6G, the two isoforms of Gr-1, allows the identification of G-MDSC as CD11b + /Gr-1 high /Ly6C − /Ly6G high and M-MDSC as CD11b + /Gr-1 int /Ly6C high /Ly6G − [6].
The phenotyping of human MDSCs is complicated by the fact that Gr-1 is not expressed on human leukocytes. Thus, multiple human MDSC subsets with different phenotypes have been documented in several types of tumors in the last 20 years (reviewed in [7]). All reported phenotypes are defined on the basis of a combination of myeloid markers and suffer from the lack of a specific marker. The initial assumption that MDSCs are solely constituted of immature myeloid cells [8][9][10] is now challenged by reports describing MDSCs with morphology and phenotype consistent with those of cells with more differentiated features. At present, it appears that all MDSC phenotypes can be allocated to one of the at least three main subsets, of which each contains more than one cell population. M-MDSCs share the morphology of monocytes and are characterized by the expression of CD14 and lack of CD15 [11,12]; G-MDSCs contain a cell population resembling granulocytes and are phenotypically characterized by CD15 and CD66b expression and the absence of CD14 [12][13][14][15][16]; immature MDSCs (i-MDSC) are cells with immaturity characteristics, defined essentially as lineage negative cells [3,17].
Although there is an increasing relevance of MDSCs for tumor immunology, there is lack of a uniformly used and broadly accepted classification of human MDSCs (in terms of both subsets types and identification markers). This motivated the design of a proficiency panel by the CIMT (Association for Cancer Immunotherapy) Immunoguiding Program (CIP) (http://www.cimt.eu/workgroups/cip/) to assess the heterogeneity in current human MDSC phenotyping approaches and harmonize them across different laboratories. The first step of the proficiency panel, which is the focus of this communication, did not include a functional assay, since the suppressive activity of most of these subsets was already reported [7]; hence, this study was mainly designed to assess intra-laboratory and inter-laboratory variance and to define a robust marker combination for the phenotypical identification of circulating MDSCs. The parameters associated with high variance identified from this first phase will be harmonized in the second step with the aim to decrease the variability of MDSC quantification.

Healthy donors
After given written informed consent, leukapheresis was performed for 23 healthy individuals. Peripheral blood mononuclear cells (PBMCs) were centrally prepared by Ficoll gradient and cryopreserved as described previously [4]. The healthy donors were screened by flow cytometry for the presence of the 10 putative MDSC subsets by the organizing committee, and three representative donors with the most significant expansion of these subsets were selected for the first phase of the proficiency panel.
In addition, guidelines indicated to use at least one million PBMCs per test, not to perform a resting period after thawing the cells and not to fix the stained cells.
Participants were asked to analyze the flow data using their own gating strategy and to report it in single layouts; moreover, they were asked to determine the absolute number of the requested cell populations with or without the presence of a DCM.

Central data analysis
Results from the 23 participating laboratories were centrally collected and analyzed. Absolute numbers of the 10 putative MDSC subsets were normalized to the count of lymphocytes + monocytes. Variance was calculated as follows: %CV = (standard deviation/mean)*100. Comparison between %CV or normalized frequencies of myeloid subsets identified by different groups was performed using the Wilcoxon signed-rank test. Significance level was set for P < 0.05.

Results
To start this project, we conducted a survey by sending out a questionnaire to groups that the organizers identified as having published at least one paper on human MDSCs. We explored their interest in participating to a harmonization process for the phenotyping of human MDSCs. Results of 89 analyzed questionnaires were discussed at the CIMT Annual Meeting 2012 and in two webinars hold in September 2012 with more than 50 participants. Most research groups expressed strong interest to join this activity and, in particular, to reach an agreement for the markers required to phenotype MDSCs. From the initial survey, it appeared that overall 32 markers were used to define MDSCs, but only a core set of 7 markers was commonly used by all groups.
For ethical and practical reasons, it was not feasible to obtain leukaphereses from cancer patients, and therefore, PBMCs of healthy donors were used for this harmonization panel. Participants were asked to phenotype the samples for the 10 predefined putative MDSC subsets. A total of 23 laboratories (15 European and 8 from the USA) participated to the proficiency panel and reported their results to the organizing committee.
Participants were asked to stain the PBMC with antibodies against HLA-DR, CD14, CD15, CD11b, CD33, lineage cocktail (defined as specified above) and CD124 plus a dead-cell marker (DCM). The participating laboratories chose gating strategy, antibody clones and fluorochromes, protocol for staining, negative controls and the inclusion of additional markers. An exemplary, but optional, gating strategy was provided in the guidelines (Fig. 1). We mandated that centers would use either one 8-color antibody panel or three 4-color antibody panels to combine the 8 proposed markers and to perform two experimental runs for each donor on different days. As readout, participants reported the number of total events, lymphocytes, monocytes and MDSC subsets in either the presence or absence of DCM.
For each MDSC subset and each donor, we calculated the intra-assay and inter-assay variance (%CV) of each reported MDSC subset frequency.
Intra-laboratory variance was as expected for multicolor phenotyping assays within a lower range (range 20-45 %, Fig. 2a). However, data analysis revealed large inter-laboratory variance among the 10 MDSC subsets analyzed, ranging from 50 to 300 %, with the highest variance among the granulocytic subsets compared to the immature and monocytic subsets (Fig. 2b). When analyzing the influence of data normalization, we found that normalizing MDSC frequencies (n% MDSC) to either lymphocytes or lymphocytes + monocytes generally reduced variance (Fig. 2c). Exemplary data of individual laboratories for three subsets are shown in Fig. 3 (panels a, b and c), and the descriptive statistics of the results is reported in Supplementary   In view of the large spread of the data, we set out to identify the potential sources of variability in the phenotyping of the 10 MDSC subsets. Accordingly, we analyzed the n% MDSC and the %CV obtained from the analysis performed with or without DCM. Results indicate that the use of DCM decreases the n% MDSC of G-MDSC (MDSC2, 5, 6 and 8), M-MDSC MDSC1 and i-MDSC (MDSC3 and 10) but not of monocytic subsets MDSC4 and 7 (Fig. 4a). The explanation for these results probably lies on the fragile nature of granulocytic cells, which are more prone to cell death during the cryopreservation and/or thawing procedure than either monocytic subsets or lymphocytes [6,21]. The use of a dead-cell marker did not significantly affect the variance of results obtained from two tests performed by the same laboratory (intra-laboratory; Fig. 2a), while it increased the inter-laboratory variance of results ( Fig. 2b; P < 0.0001, Wilcoxon matched-pair test).
Other parameters, such as antibody clones and the use of 8-color versus 4-color panels, were difficult to analyze, because the large number of subgroups identified for each variable led to numbers too small for statistical approaches. However, results suggest that the choice of negative controls, CD124 clone and composition of the lineage cocktail were associated with high inter-laboratory variation (data not shown).
It is well known that one of the sources of inter-laboratory variability arises from manual gating analysis of flow cytometry data [22][23][24], and, indeed, we observed a large variability in the gating strategy developed by the participating groups. For example, some participants put the gate for CD124 + cells only on highly fluorescent cells, while others gated positive cells including also dimly fluorescent cells; as a consequence, total %CV for MDSC1 is twofold higher than the %CV calculated within each of the two groups that use a similar gating strategy to identify CD124 + cells (data not shown). Also the gating of CD33 positive cells was differently performed including CD33 low and/or high cells, thereby increasing the variance. To dissect the influence of gating analysis on the result variance, we grouped the laboratories that followed the gating example provided in our guidelines (Fig. 1) for the identification of the 10 MDSC subsets. We observed that for those participant results following a homogeneous gating approach, the %CV was reduced (P = 0.0012, Wilcoxon matched-pair test) for most MDSC subsets, especially for G-MDSCs (Fig. 4b).

Discussion
Circulating levels of MDSCs have been correlated with tumor burden and overall survival in different types of cancer, and some studies demonstrated their prognostic role for the outcomes of different chemotherapeutic regimens (reviewed in [7]). Therefore, it is a promising biomarker now also for response to immunotherapy. However, Fig. 2 Analysis of the impact of a dead-cell marker on MDSC quantification. Black bars indicate average %CV from analyses performed without exclusion of dead cells, while white bars refer to analyses considering only live cells (n = 18 evaluable laboratories reporting data for the 8-color panel). Intra-laboratory variance is shown in panel a, while panel b reports inter-laboratory variance. c Average %CV (n = 14 evaluable laboratories reporting data for the 8-color panel) of the frequencies of the ten MDSC subsets without normalization (black bar) and normalized on the number of singlets (yellow bar), lymphocytes (blue bar) and lymphocytes + monocytes (red bar) the validation of the predictive significance of MDSCs in multicenter studies is complicated by the phenotypic complexity of human MDSCs, thereby creating a challenge in finding a consensus on the minimal requirements for MDSC monitoring. To address this challenge, we organized the first proficiency panel to harmonize human MDSC phenotyping.
The task was challenging because of the large number of participants, the complexity of the staining panel and the number of subsets to be identified. Under these circumstances, it is not surprising that the quantification of the 10 requested MDSC subsets was characterized by a high inter-laboratory variance, increasing from monocytic to immature and granulocytic subsets. The number and the international origin of the participating laboratories indicate that the high inter-laboratory variance observed in the panel could be similar as in independent studies on human MDSC published. In contrast, the intra-laboratory variance was in general not high, indicating that the participating laboratories were experienced in the field of multicolor flow cytometry.
The choice of reagents and the gating strategy were critical parameters that impacted on the quantification of MDSC. The participating laboratories had been allowed Frequencies were normalized on the count of lymphocytes + monocytes for each laboratory performing either one 8-color panel or three 4-color panels for MDSC identification; the two paired histograms refer to the two independent experimental runs performed by each laboratory (black bars = first run, gray bars = second run). Missing values, not reported by single laboratories, were labeled with "m" to decide on staining reagents. As a result, a considerable number of different antibodies were used to identify MDSC subsets using an 8-color staining panel. We observed that the clone of anti-CD124 antibody and the composition of the lineage cocktail significantly influenced the quantification of CD124 + MDSCs and immature MDSCs, respectively. The issue of the heterogeneity of reagents was also present in previous proficiency panels, in which a partial standardization of reagents had been suggested [25], especially for culturing of PBMCs intended to be used for functional T cell assays [23,26,27]. The MDSC proficiency panel introduced the use of a DCM, and, as expected, we observed that the percentages of several MDSC subsets, and in particular of granulocytic ones, were significantly reduced by dead-cell exclusion. Hence, these results raise the question whether dead MDSCs should also be evaluated in frozen samples to account for their presence in fresh samples, or in contrast, whether inclusion of dead cells would alter MDSC quantification due to unspecific binding of antibodies to other cell types.
The gating strategies used for identification of MDSC subsets had a significant influence on the variance of results. Inter-laboratory performance improved significantly when analysis was limited to only those laboratories, which applied a homogenous gating strategy similar to that proposed in the panel's guidelines. This was not unexpected since most proficiency panels based on flow cytometry similarly recommend harmonization of the gating strategies across laboratories. In a multimer-based proficiency panel, results were audited in order to exclude wet laboratories which regularly used a wrong gating procedure [24,25]. However, since identification of MDSC subsets derives from a complex combination of markers, we believe that a more adequate approach would be to train the participating laboratories to perform a properly gating pipeline through an in silico panel, as done in ICS proficiency panels [23,28]. Hence, we are currently setting up an in silico panel to measure the influence of manual gating analysis on the variance of the results. In this study, participants will be asked to re-analyze their own data, as well as those of a supplementary reference dataset, following guided instructions for sequential gating. The output will guide us to design the second step of the proficiency panel in which harmonization should be achieved following a wider set of mandatory requests, dealing not only with gating procedure but also with other parameters such as the use of a DCM, a unique lineage cocktail, and fluorescence-minus-one controls for dimly fluorescent antigens.

Conclusion
Overall, this interim report emphasizes the lack of a robust definition for human MDSC subsets and highlights the importance of finding an agreement on the harmonized staining and analysis strategy and of translating our findings in general guidelines for immune phenotyping of these cells. We demonstrated that it is feasible to conduct a proficiency panel including a large number of participants and identifying a high number of myeloid subsets using 8-color flow cytometry. In this first step, we provided individual feedback of performance for each laboratory and we identified a number of parameters that must be harmonized in the second step, hence establishing the foundation for the development of a robust assay for MDSC phenotyping. Fig. 4 a Average frequencies (n = 18 evaluable laboratories reporting data for the 8-color panel) of MDSC subsets normalized on lymphocytes + monocytes ( n%) identified either excluding or not the dead cells. Wilcoxon matched-pair test (***P < 0.001, ****P < 0.0001). b Average %CV (n = 18 evaluable laboratories reporting data for the 8-color panel) of the frequencies of MDSC subsets identified either using a homogenous gating strategy (white bar) or not (black bars)