Introduction

Patients with triple-negative breast cancer (TNBC) have a poorer prognosis than patients with other breast cancer subtypes and fewer treatment options due to the absent or low expression of estrogen and progesterone receptors and HER2 [1]. Exploration of alternative therapy options for TNBC patients is ongoing, and immune checkpoint inhibitors (ICIs) targeting the programmed death 1 (PD-1)/programmed death-ligand 1 (PD-L1) interaction are now approved in TNBC [2]. However, questions remain regarding the optimal selection of patients who might benefit from ICI treatment.

PD-L1 protein expression determined by immunohistochemistry (IHC) is currently the only clinically applied predictive biomarker for checkpoint inhibition in TNBC and is relevant in the unresectable locally advanced or metastatic setting. However, PD-L1 evaluation in breast cancer varies: each ICI comes with a companion/complementary IHC assay, and the antibodies, scoring systems, definitions of positivity and predictive thresholds differ across assays [3,4,5]. For optimal clinical use of PD-L1 as a biomarker, a unified and harmonized IHC workflow and scoring system should be developed. Several phase III clinical trials with different ICIs in TNBC have shown mixed results, but some have been promising in the metastatic and neoadjuvant settings. Commonly investigated ICIs in TNBC are atezolizumab together with the SP142 Ventana IHC assay (IMpassion trials) and pembrolizumab together with the 22C3 Dako IHC assay (Keynote trials), and these ICIs have been approved in TNBC in combination with chemotherapy [6,7,8,9,10,11,12,13,14,15,16,17,18,19]. SP142 has been found to be less sensitive for PD-L1 staining on tumor cells (TCs) than on immune cells (ICs) in TNBC, and the scoring system for SP142 is based on the proportion of tumor area occupied by PD-L1 expressing ICs [20, 21]. For SP142, a predictive threshold value of 1% was found when adding atezolizumab to nab-paclitaxel in metastatic TNBC in the IMpassion130 phase III trial, which led to the first accelerated approval of an ICI in TNBC. The combination is approved outside of the US but has been withdrawn by the FDA, since continued approval was contingent upon the results of the IMpassion131 trial, which failed to show a significant clinical benefit of atezolizumab in combination with paclitaxel [6, 11, 22]. On the other hand, the scoring system for the 22C3 antibody is a combined positive score (CPS) that is based on PD-L1 expression in TCs and ICs as a proportion of the total number of TCs. For 22C3, a predictive threshold value of CPS 10 has been found in the metastatic setting in the Keynote-355 phase III trial [7]. In contrast, atezolizumab and pembrolizumab have in phase III trials shown clinical benefit in the neoadjuvant setting irrespective of PD-L1 status by SP142 and 22C3, respectively [9, 10, 14].

The IC+ scoring method is not clinically applied for 22C3, and CPS is not clinically applied for SP142. Several studies have shown inter-assay variability and discordance between the SP142 and 22C3 assays, each detecting partially non-overlapping subpopulations of PD-L1 positive TNBC patients [21, 23,24,25,26,27,28,29]. However, these studies have not been consistent in their comparison of scoring methods and their prognostic impacts. To our knowledge, only a few studies have so far evaluated the agreement of the clinically established scoring algorithms of these assays in TNBC, reporting impaired concordance [21, 23, 25].

In our current study, we investigated the agreement between the SP142 and 22C3 assays in the context of their clinically used scoring systems in TNBC, assessed their correlation to PD-L1 expression at the mRNA level to investigate if the assays differ in their association with the mRNA status, and evaluated their prognostic value in a population-based early-stage TNBC cohort. The overall aim was to provide additional data about assay interchangeability to support PD-L1 analysis in TNBC and clinical decision making.

Material and methods

Patient cohort

The origin of our TNBC cohort has been previously described by Staaf et al. [30]. Briefly, a total of 408 TNBC patients were identified in Region Skåne between September 2010 and March 2015 through the Swedish National Breast Cancer Quality (NKBC) registry. Of those, 340 were enrolled in the Swedish Cancerome Analysis Network - Breast (SCAN-B) study (ClinicalTrials.gov ID NCT02306096), a population-based study in the southern health care region of Sweden for which all patients with primary breast cancer are eligible (https://www.scan-b.lu.se/) [31]. Eighty-four patients were thereafter excluded because of unclear TNBC status or insufficient tissue material. Of the 256 remaining patients included in our tissue microarray (TMA), 13 were excluded due to metastatic disease at diagnosis or prior to start of adjuvant chemotherapy (n = 8), bilateral breast cancer (n = 3), loss to follow-up before treatment start (n = 1), or non-TNBC status (n = 1). Clinicopathological characteristics and follow-up data were collected through clinical chart review, and the last date for recording events was 18 October 2019. An additional 11 patients were excluded because they only had TMA cores from residual disease after neoadjuvant chemotherapy (n = 6) or because their TMA cores were unevaluable for the 22C3 staining (n = 5). Of the remaining 232 patients, who all underwent primary surgery (mastectomy or partial mastectomy), 166 received chemotherapy (CT-cohort) according to national guidelines, of whom 155 received adjuvant and 11 neoadjuvant CT. Of these, 98.2% (163 of 166) received FEC or EC (5-fluorouracil, epirubicin, cyclophosphamide) based treatment with or without a taxane, and three patients (1.8%) received less than 50% of planned CT. The remaining 66 patients did not receive any (neo)adjuvant CT, most often due to age or comorbidities (non-CT-cohort). Checkpoint inhibitors were not given to the patients in the cohort. Adjuvant radiotherapy was given according to national guidelines. All of the 166 CT patients were eligible for overall survival (OS) analysis, 165 for invasive disease-free survival (IDFS) and 163 for distant relapse-free interval (DRFI). In the non-CT-cohort, 64 were eligible for OS, 65 for IDFS and 63 for DRFI (Fig. 1, study flowchart). Clinicopathological characteristics in the CT-cohort (prior to any (neo)adjuvant CT) and the non-CT-cohort are presented in Table 1. RNA sequencing data for gene expression profiling (GEX) were available for 84% of the patients (194 out of 232 patients) through the SCAN-B consortium [31].

Fig. 1

Study flowchart. Our final cohort consisted of 232 early-stage TNBC patients recruited from the population-based SCAN-B cohort. Abbreviations: TNBC: triple-negative breast cancer; NKBC: National Breast Cancer Quality registry; SCAN-B: Swedish Cancerome Analysis Network - Breast; TMA: tissue microarray; NACT: neoadjuvant chemotherapy

Table 1 Clinicopathological characteristics

PD-L1 immunohistochemistry (IHC) and tissue microarray (TMA)

PD-L1 expression was assessed by IHC in formalin-fixed, paraffin-embedded tumor samples in a TMA, using two different PD-L1 antibody clones and their IHC assays: SP142 on the Ventana BenchMark Ultra platform (Ventana Medical Systems, Inc., AZ, USA) and 22C3 on the Dako Autostainer Link 48 platform (Agilent, Inc., CA, USA). Preparation and staining were done according to the manufacturer's instructions. The TMA images were assessed in PathXL Philips Xplore (Koninklijke Philips N.V., NL). Each sample was represented by two TMA cores, each 1.0 mm in diameter. PD-L1 in the adjuvant treated patients and the non-CT-cohort was evaluated on TMA cores from the surgical specimen. For the neoadjuvant patients, PD-L1 was evaluated on TMA cores from core needle biopsies taken prior to neoadjuvant treatment.

PD-L1 IHC scoring

We evaluated PD-L1 staining according to two scoring methods: CPS and staining in ICs. CPS was defined as the combined number of PD-L1 stained TCs, tumor infiltrating lymphocytes (TILs) and macrophages (intratumorally and in adjacent stroma) divided by the total number of TCs, multiplied by 100. We evaluated CPS at thresholds of 1 and 10 according to PD-L1 evaluation in clinical phase III TNBC studies with pembrolizumab and the 22C3 assay [7, 8, 10]. The IC+ score was defined as the percentage of the tumor area (non-necrotic, non-sclerotic area) covered by PD-L1 stained tumor infiltrating ICs and was evaluated at a threshold of 1%, as performed in phase III TNBC trials with atezolizumab and the SP142 assay [6, 9, 11, 13]. The score from the TMA core with the highest value was set as the respective CPS and IC+ score for the tumor. PD-L1 expression in TCs (in CPS) included partial or complete membranous staining and in ICs (in CPS and IC+) membranous and/or cytoplasmic staining. Scoring of SP142 PD-L1 expression was done by a physician and a board-certified breast cancer pathologist, and consensus on non-matching scores had to be reached for 4.7% of the tumors (Additional file 1: Table S1). The 22C3 scoring was performed by a physician; in cases that were not clear-cut, a board-certified breast cancer pathologist was consulted and consensus was reached. IHC staining examples are illustrated in Fig. 2A. We scored CPS using the 22C3 assay and IC+ with both SP142 and 22C3 (note that IC+ scoring with 22C3 is experimental, since the IC+ scoring is not clinically applied for 22C3). We did not evaluate CPS for SP142 since it has been shown to have impaired sensitivity for PD-L1 staining in TCs in TNBC [20, 21]. When investigating the concordance between the assays, SP142 IC+ ≥ 1% and 22C3 CPS ≥ 10 were compared, as they are the only clinically established predictive cut-offs in TNBC. Moreover, since 22C3 CPS ≥ 1 has also been investigated in clinical trials, the concordance between SP142 IC+ ≥ 1% and 22C3 CPS ≥ 1 was evaluated. In addition, to compare under more similar, but explorative, scoring conditions, the concordance between SP142 IC+ and 22C3 IC+ was evaluated.
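The scoring rules described above can be illustrated with a minimal Python sketch. It is not part of the study workflow, in which scoring was done visually by the assessors; the class `CoreCounts`, its field names and the cell counts are hypothetical and serve only to show how the CPS formula, the highest-core rule and the three cut-offs combine.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CoreCounts:
    """Hypothetical tally for one TMA core; field names are illustrative only."""
    stained_tumor_cells: int    # TCs with partial or complete membranous staining
    stained_immune_cells: int   # PD-L1 stained TILs and macrophages
    total_tumor_cells: int      # all tumor cells in the core
    ic_area_percent: float      # % of tumor area covered by PD-L1 stained ICs


def cps(core: CoreCounts) -> float:
    """Combined positive score: stained TCs plus stained ICs per 100 tumor cells."""
    if core.total_tumor_cells <= 0:
        raise ValueError("core contains no tumor cells to score")
    stained = core.stained_tumor_cells + core.stained_immune_cells
    return 100.0 * stained / core.total_tumor_cells


def pdl1_status(cores: List[CoreCounts]) -> Dict[str, bool]:
    """Take the highest core value per tumor, then apply the three cut-offs."""
    tumor_cps = max(cps(c) for c in cores)
    tumor_ic = max(c.ic_area_percent for c in cores)
    return {
        "CPS >= 1": tumor_cps >= 1,
        "CPS >= 10": tumor_cps >= 10,
        "IC >= 1%": tumor_ic >= 1,
    }


# Two made-up cores from one tumor (all numbers invented for illustration).
cores = [CoreCounts(2, 4, 200, 0.5), CoreCounts(5, 7, 250, 2.0)]
status = pdl1_status(cores)
print(status)
```

In this invented example the tumor is positive by IC ≥ 1% and CPS ≥ 1 but negative by CPS ≥ 10, which is the type of discordance examined in the concordance analyses.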

Fig. 2

Tissue microarray immunohistochemical (IHC) images of PD-L1 staining and comparison of assays. A:i Negative PD-L1 staining with SP142. A:ii Positive PD-L1 staining in immune cells (ICs) with the SP142 antibody. A:iii Positive 22C3 staining, mostly in ICs. A:ii and A:iii are from the same tumor. A:iv Positive PD-L1 staining in tumor cells and in ICs with the 22C3 antibody. All images at 20 × magnification. Concordance analyses in the overall cohort (N = 232) between the SP142 and 22C3 assays with different scoring algorithms, where SP142 IC ≥ 1% is compared to 22C3 combined positive score (CPS) ≥ 10, 22C3 CPS ≥ 1 and 22C3 IC ≥ 1% in (B–D), respectively. Venn diagrams show the overlap between the assay IHC expressions, kappa values represent the measurement of the level of agreement and the concordance rate equals the overall percentage agreement

Evaluation of tumor infiltrating lymphocytes (TILs)

Abundance of stromal TILs was evaluated by a board-certified breast cancer pathologist on hematoxylin–eosin stained whole slides from surgical specimens before any adjuvant chemotherapy and from pre-treatment core needle biopsies for the neoadjuvant treated patients. Abundance was calculated as the percentage of the tumoral stromal area occupied by TILs, according to the international TILs working group (https://www.tilsinbreastcancer.org/) [32]. If more than one slide was available per patient, the average score was applied. The threshold for high versus low TILs (as a binary variable) was set to 30%, as in a previous pooled analysis of the prognostic value of TILs in early-stage TNBC patients [33]; this threshold was also near the mean value of TIL abundance in our cohort (27% in the overall cohort, 29% in the CT-cohort).

Clinical endpoints

In the survival analyses, OS, IDFS and DRFI were defined as endpoints with support of the STEEP criteria [34]. OS was the time from diagnosis of primary breast cancer to death from any cause. IDFS was the time from primary diagnosis to the diagnosis of a breast cancer related invasive event (locoregional or distant) or, if no relapse had occurred, to death from any cause. In the absence of an event in OS and IDFS, the case was censored at last follow-up. DRFI was defined as the time from diagnosis to the diagnosis of a distant relapse of breast cancer or to breast cancer related death; the case was censored at death from any other cause or at last follow-up if no DRFI event had occurred. Contralateral breast cancer and distant recurrences with uncertain origin were not included in DRFI but were included in IDFS. Follow-up time was defined as the time from diagnosis to date of death or to last follow-up.
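The endpoint definitions can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the procedure used for chart review: the record type `FollowUp`, its fields and the example patient are hypothetical, and times are taken to be months from diagnosis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FollowUp:
    """Hypothetical patient record; all times are months from diagnosis."""
    last_follow_up: float
    death: Optional[float] = None            # death from any cause
    death_from_breast_cancer: bool = False
    invasive_event: Optional[float] = None   # first invasive event counted in IDFS
    distant_relapse: Optional[float] = None  # distant relapse of breast cancer origin


def earliest(*times: Optional[float]) -> Optional[float]:
    """Earliest of the known times, or None if none is known."""
    known = [t for t in times if t is not None]
    return min(known) if known else None


def os_endpoint(p: FollowUp) -> Tuple[float, bool]:
    """(time, event): event is death from any cause, else censored at last follow-up."""
    return (p.death, True) if p.death is not None else (p.last_follow_up, False)


def idfs_endpoint(p: FollowUp) -> Tuple[float, bool]:
    """Event is the first invasive event or, without relapse, death from any cause."""
    t = earliest(p.invasive_event, p.death)
    return (t, True) if t is not None else (p.last_follow_up, False)


def drfi_endpoint(p: FollowUp) -> Tuple[float, bool]:
    """Event is distant relapse or breast cancer death; other deaths are censored."""
    bc_death = p.death if p.death_from_breast_cancer else None
    t = earliest(p.distant_relapse, bc_death)
    if t is not None:
        return (t, True)
    return (earliest(p.death, p.last_follow_up), False)


# Invented example: locoregional relapse at 14 months, death from another cause at 40.
patient = FollowUp(last_follow_up=40, death=40, invasive_event=14)
print(os_endpoint(patient), idfs_endpoint(patient), drfi_endpoint(patient))
```

For this invented patient the locoregional relapse counts as an IDFS event, the death counts as an OS event, and the case is censored for DRFI at the time of the non-breast cancer death.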

Statistical analyses and analyses of RNA sequencing data

Analyses of RNAseq data were performed in R (v 3.6.1), and all remaining statistical analyses in SPSS (v 26.0). The concordance rate (expressed as a percentage) was calculated to evaluate IHC inter-test reliability, and the kappa statistic was applied as a measure of the level of agreement. A kappa coefficient of ≥ 0.80 was interpreted as strong agreement, 0.60–0.79 as good, 0.40–0.59 as weak, 0.21–0.39 as minimal and ≤ 0.20 as no agreement [35]. Area-proportional Venn diagrams were drawn with https://www.biovenn.nl/ [36]. The chi-square test was applied when comparing categorical variables between groups (chi-square test for trend if more than two groups were compared). The nonparametric Mann–Whitney test was applied to compare non-categorical variables between two groups. Survival data were analyzed by Kaplan–Meier estimates along with the log-rank test and with Cox regression, reporting hazard ratios (HRs) and 95% confidence intervals (CIs). Multivariable Cox regression analyses were performed by including, aside from PD-L1 status, other traditional prognostic factors: age at diagnosis, tumor size, lymph node status, Nottingham histologic grade (NHG) and TIL abundance as binary covariates. Four multivariable regression analyses were performed, i.e., one for each PD-L1 scoring method: SP142 IC+, 22C3 CPS 10, 22C3 CPS 1 and 22C3 IC+. RNA sequencing data were matched against patient data, generating a list of 16,258 genes across 194 samples. FPKM values were log2-transformed, imputed (missing data set to 0), mean-centered and scaled (samples and genes). The correlation between PD-L1 gene expression (by RNAseq) and PD-L1 protein expression (analyzed by IHC) was estimated using the Spearman method and visualized with boxplots (the median is indicated by the central line, the upper and lower limits of the box represent the upper and lower quartiles, and the whiskers extend to 1.5 × the interquartile range). A p-value less than 0.05 was considered statistically significant and all tests were two-sided.
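As an illustration of the expression preprocessing and the Spearman correlation, the following Python sketch processes a single gene. The original analyses were run in R, so this is a stand-in and not the code used in the study. Several choices are assumptions: missing or non-positive FPKM values are set to 0 after the log2 step, scaling is shown per gene only, and the toy values and the names `cd274_fpkm` and `ihc_positive` are invented.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence


def log2_impute(fpkm: Sequence[Optional[float]]) -> List[float]:
    """log2-transform; missing or non-positive values become 0 (assumed handling)."""
    return [math.log2(x) if x is not None and x > 0 else 0.0 for x in fpkm]


def standardize(values: Sequence[float]) -> List[float]:
    """Mean-center and scale one gene to unit standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in order[i:j + 1]:
            out[k] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Invented toy data: CD274 FPKM for six samples and their binary IHC status.
cd274_fpkm = [0.5, None, 2.0, 8.0, 4.0, 16.0]
ihc_positive = [0, 0, 0, 1, 1, 1]

expression = standardize(log2_impute(cd274_fpkm))
rho = spearman(expression, ihc_positive)
print(f"Spearman rs = {rho:.2f}")
```

Because the Spearman coefficient depends only on ranks, the centering and scaling step does not change it; it is included here only to mirror the order of the preprocessing steps described above.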

Results

Frequency of PD-L1 IHC expression

A higher positive detection rate (Table 1) was observed for SP142 IC ≥ 1% than for 22C3 CPS ≥ 10, 50.9% (118/232) versus 27.2% (63/232), when using these clinically applied predictive cut-offs (from the advanced TNBC setting).

Since 22C3 CPS ≥ 1 has also been investigated in clinical trials, we analyzed the percentage of PD-L1 positivity using this lower cut-off for 22C3. As expected, this resulted in a higher positive detection rate (53.9% (125/232)) compared to 22C3 CPS ≥ 10.

In an explorative analysis, to evaluate 22C3 under conditions more similar to those used for SP142, we applied the IC+ scoring method to 22C3. The positive detection rate for 22C3 IC ≥ 1% was 41.8% (97/232).

Comparison between SP142 and 22C3

When comparing SP142 and 22C3 with the clinically applied scoring methods and cut-offs (IC ≥ 1% and CPS ≥ 10, respectively), a kappa value of 0.48 was obtained (interpreted as weak agreement). Approximately half of the tumors (47.8%; 111/232) were negative with both antibodies, whereas 60 tumors (25.9%) were positive with both antibodies, resulting in a concordance rate of 73.7%. Fifty-eight tumors (25%) were positive with SP142 but negative with 22C3, whereas three tumors (1.3%) showed the opposite pattern (Fig. 2B). Taken together, almost half of the tumors (49.2%; 58/118) that stained PD-L1 positive with SP142 were considered negative with 22C3 when using these clinically established predictive cut-offs.

The kappa value increased to 0.63 (interpreted as good agreement) and the concordance rate to 81.5% when a threshold of ≥ 1 for CPS was applied for 22C3 where 189 tumors (out of 232) showed concordant PD-L1 status (89 tumors negative with both antibodies and 100 tumors positive with both; Fig. 2C). A lower number of tumors that stained positive with SP142 but negative with 22C3 was found than when using the ≥ 10 cut-off for CPS (n = 18 vs. n = 58). The number of tumors with the opposite pattern (i.e. negative with SP142 but positive with 22C3) was increased from 3 to 25.

Next, we evaluated the concordance between the two antibodies when scored with the same scoring method and cut-off, i.e. IC ≥ 1% (Fig. 2D, note that IC+ is not normally employed for the 22C3 antibody). This comparison resulted in the best concordance rate of 86.6% (201 concordant tumors: 109 negative with both and 92 positive with both) and a kappa-value of 0.73 (interpreted as good agreement). Five tumors were negative with SP142 but positive with 22C3 and 26 showed the opposite pattern.
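The concordance rates and kappa values in this section can be recomputed from the reported 2 × 2 counts. The Python sketch below is only an arithmetic check (the study itself used SPSS); the function names are illustrative, and the interpretation bands follow the kappa categories given in the statistical methods.

```python
from typing import Tuple


def agreement(both_pos: int, a_only: int, b_only: int, both_neg: int) -> Tuple[float, float]:
    """Return (concordance rate, Cohen's kappa) for a 2 x 2 table of two assays."""
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = both_pos + a_only   # positives by assay A
    b_pos = both_pos + b_only   # positives by assay B
    # Agreement expected by chance from the marginal positive and negative rates.
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n ** 2
    return observed, (observed - expected) / (1 - expected)


def interpret(kappa: float) -> str:
    """Kappa bands as given in the statistical methods."""
    if kappa >= 0.80:
        return "strong"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "weak"
    if kappa >= 0.21:
        return "minimal"
    return "none"


# Counts from the text: (both positive, SP142 only, 22C3 only, both negative).
tables = {
    "SP142 IC >= 1% vs 22C3 CPS >= 10": (60, 58, 3, 111),
    "SP142 IC >= 1% vs 22C3 CPS >= 1": (100, 18, 25, 89),
    "SP142 IC >= 1% vs 22C3 IC >= 1%": (92, 26, 5, 109),
}
for name, counts in tables.items():
    rate, kappa = agreement(*counts)
    print(f"{name}: concordance {100 * rate:.1f}%, kappa {kappa:.2f} ({interpret(kappa)})")
```

With these counts the sketch reproduces the reported concordance rates of 73.7%, 81.5% and 86.6% and kappa values of 0.48, 0.63 and 0.73.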

Association of SP142 and 22C3 with PD-L1 (CD274) gene expression (mRNA)

We detected a significant positive association between PD-L1 IHC expression and PD-L1 (CD274) gene expression (mRNA) in the overall cohort. The Spearman correlation coefficients were similar between PD-L1 gene expression and all the two-categorical IHC scorings (rs = 0.59 for SP142 IC+, rs = 0.60 for both 22C3 CPS 1 and CPS 10, rs = 0.62 for 22C3 IC+; all p-values < 0.001; Fig. 3A–D). When stratifying the 22C3 CPS into three categories (i.e. < 1, 1–9 and ≥ 10), a positive stepwise association between PD-L1 (CD274) gene expression and PD-L1 protein expression was observed (Fig. 3E; rs = 0.67), establishing a good degree of association between transcript and protein measurements.

Fig. 3

Association of immunohistochemical (IHC) PD-L1 expression with PD-L1 (CD274) gene expression (mRNA) in the overall cohort. In A–D, the association of gene expression with SP142 PD-L1 staining in immune cells (ICs), 22C3 combined positive score (CPS) ≥ 1, 22C3 IC staining and 22C3 CPS ≥ 10, respectively, all at two-categorical IHC expressions. In E, the association of gene expression with 22C3 CPS at three-categorical IHC expression. The mRNA expression of the SP142 IC and 22C3 CPS concordant and discordant cases is shown in (F) and (G), with CPS thresholds of 1 and 10, respectively

We also investigated PD-L1 gene expression levels in SP142 IC and 22C3 CPS concordant and discordant groups, respectively (Fig. 3F, G). Here, transcript levels in the discordant groups (i.e. 22C3 CPS < 10 and SP142 IC ≥ 1% or 22C3 CPS ≥ 10 and SP142 IC < 1%) were found to be at an intermediate level between the concordant positive group and the concordant negative group. No significant difference in PD-L1 mRNA expression was found between the two discordant groups.

Clinicopathological features in the CT-cohort and the non-CT-cohort

Clinicopathological characteristics differed between patients receiving (neo)adjuvant CT and those not receiving CT. The patients in the CT-cohort were younger (p < 0.001), had higher median TIL abundance, more proliferative (p = 0.007) and higher-grade tumors (p = 0.004), and a higher rate of PD-L1 expressing tumors (p = 0.006 for SP142 IC and p = 0.028 for 22C3 CPS status), and tended to have fewer deaths (p = 0.059), but had a similar rate of relapses compared to the non-CT-cohort (Table 1). Due to these differences, we chose to evaluate clinicopathological features in relation to PD-L1 status and perform outcome analyses separately in the CT-cohort and the non-CT-cohort.

Association of PD-L1 status with clinicopathological features

In the CT-cohort, tumors with SP142 IC ≥ 1% were significantly associated with higher NHG (p = 0.004), higher Ki-67 proliferation index (p = 0.005), histological medullary features (p = 0.001) and increased stromal TIL abundance (p < 0.001), whereas age at diagnosis, tumor size and lymph node status were not significantly associated with PD-L1 status (Table 2). When using CPS ≥ 10 for 22C3, only medullary features and TIL abundance were significantly associated with PD-L1 status (both p values < 0.001) and the association between PD-L1 and NHG and Ki-67 did not reach statistical significance (Table 2). With the other cut-offs for 22C3 (CPS ≥ 1 and IC ≥ 1%; Additional file 2: Table S2), the results were similar to those obtained for SP142, with significant associations to NHG (p < 0.001 and p = 0.012, respectively), Ki-67 level (p = 0.009 and p = 0.006, respectively), medullary features (p = 0.002 for both 22C3 CPS 1 and 22C3 IC+) and TIL abundance (p < 0.001 for both CPS 1 and IC+).

Table 2 Clinicopathological features in the CT-cohort in relation to SP142 and 22C3 CPS 10 PD-L1 status

In the non-CT-cohort, a significant positive association between TIL abundance and PD-L1 status was observed, irrespective of PD-L1 IHC evaluation method (all p-values < 0.001 for SP142 IC+ , 22C3 CPS 1 and 22C3 IC+ ; for 22C3 CPS 10: p = 0.006 for median TIL score and p = 0.026 for TILs as binary covariate). Associations between the other clinicopathological parameters and SP142 IC ≥ 1%, 22C3 CPS ≥ 10 or 22C3 IC ≥ 1% did not reach significance (Additional file 3: Table S3). When using 22C3 CPS ≥ 1 cut-off, NHG was significantly associated with PD-L1 IHC expression (p = 0.045) and Ki-67 borderline significant (p = 0.051; Additional file 3: Table S3).

Association of PD-L1 with patient outcome in the CT-cohort

When using the clinically established cut-offs for both SP142 (IC ≥ 1%) and 22C3 (CPS ≥ 10) in univariable Cox regression analyses, a positive PD-L1 status was significantly associated with a better DRFI (HR = 0.47, 95% CI 0.22–1.00, p = 0.049 for SP142 IC+ and HR = 0.18, 95% CI 0.04–0.76, p = 0.019 for 22C3 CPS 10; Table 3 and Fig. 4). The HRs for IDFS and OS also indicated a better prognosis for patients with PD-L1 positive tumors (HRs ranging from 0.46 to 0.53), but reached significance only for IDFS and SP142 IC status (95% CI 0.26–0.89, p = 0.02). The results for 22C3 CPS ≥ 1 and 22C3 IC ≥ 1% showed a similar pattern, although reaching significance only for 22C3 CPS 1 and IDFS (HR = 0.53, 95% CI 0.29–0.98, p = 0.043; Table 3 and Additional file 3: Fig. S1).

Table 3 Univariable and multivariable regression analyses in the CT-cohort (N = 166)
Fig. 4

Kaplan–Meier survival analyses according to immunohistochemical PD-L1 status in the cohort receiving (neo)adjuvant chemotherapy. Invasive disease-free survival (IDFS), overall survival (OS) and distant relapse-free interval (DRFI) according to SP142 PD-L1 expression in immune cells (IC+) in panel (A) and according to 22C3 combined positive score (CPS) at a threshold of 10 in panel (B)

Next, we performed a subgroup analysis where we divided the 22C3 CPS 10 negative group (i.e., those with CPS < 10) into one group positive with SP142 (i.e. IC ≥ 1%; n = 47) and one group negative with SP142 (IC < 1%, n = 71). No significant difference in DRFI was observed between these two groups (log rank p = 0.562; Fig. 5). For the group with 22C3 CPS ≥ 10, a similar division was not meaningful since all the patients in the CT-cohort that had 22C3 CPS ≥ 10 also scored SP142 IC ≥ 1%. These results suggest that if information for PD-L1 status with 22C3 CPS 10 is available, SP142 does not add any further prognostic information for DRFI.

Fig. 5

Subgroup survival analysis of patients with 22C3 combined positive score (CPS) < 10. Kaplan–Meier estimates and log-rank p-value for distant relapse-free interval (DRFI) according to SP142 PD-L1 status in the (neo)adjuvant chemotherapy sub-cohort that had 22C3 CPS < 10 (N = 119)

In multivariable Cox regression analysis, PD-L1 status was not significantly associated with outcome for any of the clinical endpoints, irrespective of IHC assay and cut-off (Table 3). Of note, however, a trend towards better DRFI was observed for 22C3 CPS ≥ 10 staining (HR = 0.26, 95% CI 0.06–1.20, p = 0.084). Stromal TIL abundance was the only covariate showing an independent significant association with outcome in multivariable analyses: it was associated with improved IDFS irrespective of the PD-L1 assay and cut-off included in the analysis (HRs ranging from 0.24 to 0.27 and p-values from 0.003 to 0.007, Table 3A-D) and with a better DRFI in the multivariable model that included SP142 IC+ (HR = 0.33, 95% CI 0.11–0.99, p = 0.047; Table 3A).

Association of PD-L1 with patient outcome in the non-CT-cohort

The scarcity of patients in the non-CT-cohort did not allow for robust multivariable Cox regression analyses. PD-L1 status was not significantly associated with DRFI in univariable analysis (HRs ranging from 0.56 to 0.77, p-values not significant). For IDFS, the HRs for PD-L1 status were similar to those in the CT-cohort (HRs ranging from 0.53 to 0.65 compared with 0.46 to 0.57 in the CT-cohort), but in this small group of TNBC patients not treated with (neo)adjuvant CT and with few events, the p-values were not significant (Additional file 5: Table S4 and Additional file 6: Fig. S2). Stromal TIL abundance was not significantly associated with any of the clinical endpoints in univariable analyses (HRs ranging from 0.90 to 1.34). Age (as a continuous variable) was negatively associated with OS (HR = 1.07, 95% CI 1.01–1.14, p = 0.027) and IDFS (HR = 1.05, 95% CI 1.00–1.11, p = 0.043), tumor size was negatively associated with all the endpoints (HR = 2.41, 95% CI 1.08–5.39, p = 0.003 for IDFS; HR = 2.54, 95% CI 1.02–6.32, p = 0.045 for OS; HR = 4.63, 95% CI 1.00–21.53, p = 0.051 for DRFI) and lymph node status was negatively associated with DRFI (HR = 8.06, 95% CI 2.13–30.59, p = 0.002; Additional file 5: Table S4).

Discussion

To date, two different immune checkpoint inhibitors (ICIs) have been incorporated in the treatment of TNBC: pembrolizumab in both early-stage and metastatic TNBC and atezolizumab in the metastatic setting. Atezolizumab is still approved outside of the US but has been withdrawn by the FDA for metastatic TNBC. Each of these ICIs comes with a different PD-L1 IHC antibody assay (Dako 22C3 for pembrolizumab and Ventana SP142 for atezolizumab), with different scoring methods and cut-offs [17, 18]. It is of clinical interest to harmonize these assays in order to simplify the use of PD-L1 IHC expression as a predictive biomarker for response to checkpoint inhibition. In this context, it has been recommended that a concordance rate of at least 90% is needed for assays to be considered analytically equivalent [37]. In our analysis, the comparison between SP142 IC ≥ 1% and 22C3 CPS ≥ 10, the currently clinically applied scoring methods and predictive cut-offs, showed a concordance rate of only 73.7% and a kappa value of 0.48. These results indicate weak concordance, as previously reported [21, 23]. This low rate of concordance in our cohort was mainly driven by the low positive percentage agreement of only 50.8% (of the 118 tumors with SP142 IC ≥ 1%, 60 were also 22C3 CPS ≥ 10): SP142 IC ≥ 1% expression was much more frequent than 22C3 CPS ≥ 10, and 22C3 CPS 10 failed to identify almost half (49.2%) of the tumors that scored positive with SP142. Conversely, SP142 IC+ failed to identify 4.8% of the tumors that scored positive with 22C3 CPS 10. We found a better concordance rate of 81.5% (kappa value 0.63) when comparing SP142 IC ≥ 1% and 22C3 CPS ≥ 1, in line with two previously published studies [21, 25], though higher than the 63.5% reported by the IMpassion130 sub-study [23]. The 22C3 CPS 1 scoring failed to identify 15.3% of the tumors that scored positive with SP142 and, conversely, SP142 failed to identify 20.0% of the tumors that scored positive with 22C3 CPS 1. We observed the best concordance rate of 86.6% between the two assays when using the IC+ scoring for both (kappa value 0.73), which was in line with some previous results [25, 27, 28], but better than the 68.8% reported in the IMpassion130 sub-study [23]. The deviation of our findings from the IMpassion130 sub-study might be explained by the lower rate of 22C3 CPS 1 and 22C3 IC+ positivity in our study, which in turn led to a substantially better negative percentage agreement in our cohort, resulting in a higher concordance rate.
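The percentages quoted in this comparison follow directly from the 2 × 2 counts given in the Results. A brief Python sketch of the arithmetic is shown below; the helper name `missed_by_other` is hypothetical and the snippet is only a check, not part of the original analysis.

```python
def missed_by_other(positive_by_reference: int, positive_by_both: int) -> float:
    """Percentage of reference-positive tumors that the other assay scored negative."""
    return 100.0 * (positive_by_reference - positive_by_both) / positive_by_reference


# Counts from the Results: SP142 IC >= 1% positive in 118 tumors,
# 22C3 CPS >= 10 positive in 63 (60 shared), 22C3 CPS >= 1 positive in 125 (100 shared).
sp142_missed_by_cps10 = missed_by_other(118, 60)
cps10_missed_by_sp142 = missed_by_other(63, 60)
sp142_missed_by_cps1 = missed_by_other(118, 100)
cps1_missed_by_sp142 = missed_by_other(125, 100)

# Positive percentage agreement with SP142 as the reference assay.
ppa_cps10 = 100.0 * 60 / 118

print(f"PPA (SP142 reference, CPS >= 10): {ppa_cps10:.1f}%")
print(f"SP142-positive tumors negative by CPS >= 10: {sp142_missed_by_cps10:.1f}%")
print(f"CPS >= 10 positive tumors negative by SP142: {cps10_missed_by_sp142:.1f}%")
print(f"SP142-positive tumors negative by CPS >= 1: {sp142_missed_by_cps1:.1f}%")
print(f"CPS >= 1 positive tumors negative by SP142: {cps1_missed_by_sp142:.1f}%")
```

The asymmetry between the two directions (49.2% versus 4.8% at the CPS ≥ 10 cut-off) reflects the much higher positivity rate of SP142 IC ≥ 1% compared with 22C3 CPS ≥ 10.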

PD-L1 (CD274) gene expression (mRNA) showed a strong positive association with all the IHC scorings of PD-L1 expression, irrespective of antibody and cut-off. PD-L1 gene expression could not explain the difference between SP142 IC and 22C3 CPS, since both discordant groups (i.e. 22C3 CPS < 10 and SP142 IC ≥ 1%, or 22C3 CPS ≥ 10 and SP142 IC < 1%) had similar PD-L1 gene expression levels.

We found that PD-L1 expression was positively associated with TIL abundance, NHG, Ki-67 level and histological medullary features. We also investigated the prognostic value of the different PD-L1 IHC scorings and found that PD-L1 expression, when evaluated with SP142 IC+ and 22C3 CPS, had a significant protective effect in patients who received (neo)adjuvant CT. However, PD-L1 status was not independently prognostic in multivariable regression analyses when adjusting for TIL abundance and other traditional prognostic features, where only TILs had an independent effect on outcome. Of the four different PD-L1 scorings and the three clinical endpoints, the prognostic impact of PD-L1 was strongest for 22C3 CPS ≥ 10 and DRFI. When dividing the CT-subgroup that had 22C3 CPS < 10 into SP142 IC positive and SP142 IC negative, we found that the SP142 status did not add any further prognostic value regarding DRFI if information on PD-L1 status with 22C3 was available. It should be noted, however, that SP142 is relevant in predicting response to atezolizumab in the metastatic setting [6, 23]. It has previously been suggested that 22C3 is a better prognostic marker than SP142 in primary breast cancer patients [38], and our results suggest that 22C3 CPS at a threshold of 10 gives a better division into DRFI prognostic groups than SP142 IC+ in early-stage TNBC.

We chose to perform outcome analyses separately in the CT-cohort and the non-CT-cohort for several reasons. Older age and comorbidity (the primary reasons why (neo)adjuvant CT was not administered in the non-CT-cohort), and thereby non-breast cancer related deaths in the non-CT-cohort, act as competing risks for breast cancer specific events and dilute the OS results and, in part, the IDFS analyses. Moreover, TIL abundance and PD-L1 expression, both of which were lower in the non-CT-cohort than in the CT-cohort, are known to be positively associated with CT-response and prognosis in early TNBC [9, 10, 14, 33, 39, 40]. This in turn might partly explain why the prognostic impact of TILs and PD-L1 status was weaker in the non-CT-cohort than in the CT-cohort and not significant.

The population-based cohort is the main strength of our study, as it represents PD-L1 and TIL status in an early-stage TNBC population. A weakness is the small tissue cores in the TMA, potentially leading to inaccurate evaluations of PD-L1 expression due to intra-tumoral PD-L1 heterogeneity when compared to scoring on histological whole sections [21, 41,42,43,44,45]. Notably, neoadjuvant CT is increasingly administered in TNBC and is becoming a standard of care in preference to adjuvant CT. In these patients, PD-L1 would be evaluated on core needle biopsies instead of whole sections, as is often the case for metastatic lesions [46]. A core needle biopsy is more comparable in size to a TMA core than to a whole section slide, and this aspect needs to be taken into consideration in the clinical setting when choosing thresholds for PD-L1 expression. Another caveat of our study is that we scored PD-L1 in primary TNBC tumors, in which PD-L1 expression has been found in a meta-analysis to differ from that in metastatic lesions [47]. We have explored the analytical concordance of the SP142 and 22C3 assays. Unfortunately, we cannot explore the predictive value of the interchangeability of these assays due to the retrospective, non-randomized nature of our study, in which the patients did not receive immune checkpoint blockade. Further studies addressing that issue are warranted.

In summary, the PD-L1 IHC staining concordance between the clinically validated scoring algorithms for SP142 (IC ≥ 1%) and 22C3 (CPS ≥ 10) was impaired in our early-stage TNBC cohort. The concordance was better when evaluated with 22C3 CPS ≥ 1 or the same IC+ scoring method for both assays. The SP142 assay is better at identifying 22C3 positive tumors than the 22C3 assay is at identifying SP142 positive tumors. PD-L1 expression was of positive prognostic value in patients treated with (neo)adjuvant CT where it was strongest for DRFI and 22C3 CPS ≥ 10. However, PD-L1 status was not independently prognostic when adjusting for TIL abundance in multivariable analyses. Our findings suggest that these two antibody assays, with their respective clinically established scoring method and cut-offs, detect partially non-overlapping subpopulations of TNBC patients in the early-stage setting and are not substitutable with one another regarding PD-L1 detection and prognostic value. Further studies are warranted to investigate the predictive value of the interchangeability of these assays.