Introduction

Head and neck squamous cell carcinomas (HNSCC) comprise a behaviorally diverse group of cancers united by their common localization to the head and neck region [1, 2]. Clinical problems such as early metastatic behavior and serial recurrences due to field cancerization are frequently encountered. Especially intriguing phenomena are the unexpected aggressiveness of some small tumors and, conversely, the surprisingly good treatment response of some large tumors. The current therapy stratification of HNSCC is based on the overall state of the patient and clinical observations about the tumor [3, 4].

The site and extent of the tumor do not, however, have a decisive effect on patient prognosis [5, 6]. Attempts to explain the clinical diversity of HNSCC by genetic and molecular analysis have thus far proven unsuccessful, leaving the determination of patient prognosis uncertain. A multitude of biomarkers has been suggested, with little success in translating findings to clinical practice [7]. The enthusiastically awaited inclusion of p16/HPV in the staging of oropharyngeal HNSCC has not met all expectations [8]. Some reasons for the lack of success may be found in the uneven inclusion of patients, especially in small retrospective patient cohorts, in biased inclusion criteria, and in poor definition of the clinical questions to be tackled [7, 9].

The Northern European healthcare system offers an intriguing prospect for unbiased patient sampling, because cancer patients in need of oncological treatment are referred to regional tertiary centers independent of their insurance or socioeconomic status. In addition, based on EUROCARE-5 data, the results of head and neck cancer treatment in the Nordic countries, and especially in Finland, are remarkably superior to those of other regions in Europe [10].

In this study, a population-based cohort of all new HNSCC patients treated between 2005 and 2010 in the Southwest Finland region, covering one sixth of Finland’s population, was collected. This cohort corresponds to the real-life succession of HNSCC patients treated at our institute. Tumor samples were retrieved, sampling bias was assessed, and a panel of immunohistochemical biomarkers was analyzed.

Thus, we re-evaluated the real-life capability of a panel of immunohistochemical biomarkers to prognosticate patient 5-year overall survival (OS) when identified clinical prognostic variables are taken into account. All of these biomarkers have previously been reported to function as prognostic markers in HNSCC. The biomarkers included loss of tumor suppressor p53 expression, which is associated with p53 mutations, the most frequently encountered mutations in HNSCC and themselves associated with metastatic behavior and radioresistance [11]. EGFR overexpression has been the focus of intense study in HNSCC, as EGFR inhibitors are available [12]. p16 has a clinical application as a prognosticator in oropharyngeal cancer [13, 14]. CIP2A is an mTOR- and MYC-associated inhibitor of the tumor suppressor protein phosphatase 2A [15]. MET and Oct4 are associated with a stemness phenotype [16, 17], and NDFIP1 was listed among the top three unfavorable HNSCC biomarkers in the Protein Atlas database [18].

Materials and methods

Primary HNSCC patient cohort

The HNSCC patient cohort was formed by identifying and including all patients treated for new HNSCC in the Turku University Hospital (TUH) region in 2005–2010. Tumors were staged according to the TNM criteria applicable at the time of diagnosis. Treatment protocols were decided in a multidisciplinary Tumor Board for head and neck cancer. OS was defined from end-of-treatment to end-of-follow-up or death. Age-standardized OS rates were calculated using International Cancer Survival Standards weights.
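
The age standardization described above amounts to a weighted average of age-specific survival estimates. The following is a minimal sketch of that arithmetic only. The function name `age_standardized_survival`, the age bands, the weights, and the survival values are all illustrative assumptions; they are not the International Cancer Survival Standards weights and not results from this cohort.

```python
def age_standardized_survival(survival_by_band, weights_by_band):
    """Weighted average of age-band survival using standard-population weights."""
    if set(survival_by_band) != set(weights_by_band):
        raise ValueError("survival and weight tables must cover the same age bands")
    total_weight = sum(weights_by_band.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(
        survival_by_band[band] * weights_by_band[band] for band in survival_by_band
    )
    return weighted_sum / total_weight


# Placeholder inputs for illustration only (NOT the published standard weights).
example_survival = {"15-44": 0.80, "45-54": 0.70, "55-64": 0.60, "65-74": 0.50, "75+": 0.40}
example_weights = {"15-44": 10, "45-54": 20, "55-64": 30, "65-74": 25, "75+": 15}

standardized = age_standardized_survival(example_survival, example_weights)
```

Because the weights are fixed by the standard population rather than by the cohort's own age structure, estimates standardized this way can be compared across regions with different age distributions.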

The use of human tissue samples was approved by the Finnish national authority for medicolegal affairs (V/39706/2019), the regional ethics committee of the University of Turku (51/1803/2017), and the Auria Biobank scientific board (AB19-6863). Patient formalin-fixed, paraffin-embedded (FFPE) samples were acquired from pathology archives through Auria Biobank. Final tissue microarray (TMA) blocks of duplicate 0.6 mm cores were made in TMA Grand Master (3D Histech) according to annotations on scanned HE slides. Samples of normal liver were included in each block for orientation.

Immunohistochemistry (IHC)

FFPE blocks were cut into 6 µm sections. CIP2A IHC was carried out after protocol optimization in a Ventana BenchMark XT automated stainer (Ventana Medical Systems, Inc) using mouse monoclonal anti-CIP2A antibody (1:25, 2G10-3B5, sc-80659, Santa Cruz). p16, p53, and EGFR IHC were carried out in Ventana in the clinical pathology laboratory. Oct4 IHC was performed as previously described with anti-Oct4 antibody sc-5279 (1:200 mouse monoclonal, Santa Cruz Biotechnology) [17]. NDFIP1 IHC was carried out with anti-NDFIP1 antibody HPA009682 (1:1000 rabbit polyclonal, Atlas Antibodies). MET staining was performed as previously reported [16].

Immunohistochemical stainings were analyzed by two authors independently, and differences were discussed until consensus was reached. p53 staining was analyzed using the established 3-tier system. Cytoplasmic/membranous EGFR, MET, and CIP2A expression was scored semiquantitatively based on the intensity of the staining on a scale of 1–3. Nuclear Oct4 was scored positive when a subpopulation of strongly positive nuclei was present. p16 immunostaining was regarded as positive when at least 70% of cells demonstrated strong nuclear and cytoplasmic staining intensity. Nuclear NDFIP1 staining was regarded as positive when strong, uniform nuclear staining was present. For all statistical analyses, dichotomous cutoffs were applied.
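
The scoring rules above can be written down as simple predicates. This is only a sketch: the 70% threshold for p16 comes from the text, whereas the cutoff used to dichotomize the 1–3 intensity scale is not stated in this section, so the `high_from=2` default below is an assumption, as are all function names.

```python
def p16_positive(fraction_strong_cells):
    """p16 is called positive when at least 70% of cells stain strongly."""
    if not 0.0 <= fraction_strong_cells <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return fraction_strong_cells >= 0.70


def dichotomize_intensity(score, high_from=2):
    """Collapse a 1-3 intensity score into 'low' or 'high'.

    high_from is an ASSUMED cutoff; the source only says that dichotomous
    cutoffs were applied, not where they were placed.
    """
    if score not in (1, 2, 3):
        raise ValueError("intensity score must be 1, 2 or 3")
    return "high" if score >= high_from else "low"


def needs_consensus(score_observer_a, score_observer_b):
    """Flag cores where the two independent observers disagree."""
    return score_observer_a != score_observer_b
```

Keeping the cutoff as an explicit parameter makes it straightforward to check how sensitive a result is to where the dichotomy is placed.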

Statistical analysis

Patient data and staining results were entered into SPSS 24 software (SPSS, IBM). For Cox hazards models, the proportionality of hazards was tested using log-minus-log plotting and by plotting Schoenfeld residuals against survival time, when appropriate. For all multivariable analyses, a stepwise approach with the backward LR method was applied, if not otherwise indicated, with p value limits for inclusion and exclusion at 0.05 and 0.10, respectively. For Kaplan–Meier survival estimation, significance was analyzed using the log-rank method. To test the prognostic potential of biomarkers, their combinations, and their interactions, Cox regression was used by first entering the prognostic clinicopathological variables and then, in another block, the biomarker combinations. p values of less than 0.05 were considered significant.
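
For readers unfamiliar with the two univariate methods named above, the following is a minimal pure-Python sketch of the Kaplan–Meier product-limit estimator and a two-group log-rank test. It is not the SPSS implementation used in the study, the function names are hypothetical, the toy follow-up times are made up, and Cox regression is deliberately left out.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 df."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Toy data for illustration only.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
chi_square, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at observed deaths.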

Results

Southwest Finland regional cohort corresponds with Nordic EUROCARE-5 population

An electronic database screen was made to include all HNSCC patients treated in the Southwest Finland region during the years 2005–2010 (Fig. 1a). Altogether, the records of 952 patients were accessed. After initial evaluation, the final cohort included 476 patients diagnosed and treated for a new HNSCC tumor (Table 1). Two-hundred and thirty-two patients (49%) were diagnosed with early stage HNSCC, 164 patients (34%) had nodal metastasis at presentation, and five patients (1.1%) were diagnosed with distant metastasis. Only 1.3% (6/476) of patients were lost to follow-up during the first year.

Fig. 1

a Principle of the population-validated TMA. First, a background population was screened for comprehensive inclusion of all patients treated for HNSCC in Southwest Finland during the period 2005–2010. This background population was used to assess clinical prognostic factors. All available samples were included in the TMA. The representativeness of the TMA was analyzed with logistic regression analysis for multiple variables. After the representativeness was confirmed, the TMA was considered a population-validated TMA (PV-TMA). b Overall survival and c disease-specific survival of the patients included in the PV-TMA were slightly lower than those of patients not included in the PV-TMA. In multivariable analysis, there was no difference in survival.

Table 1 Clinicopathological variables of the patient cohort. Univariate (left panels) and multivariable (right panels) survival analysis of HNSCC cohort

OS was influenced by previously acknowledged risk factors: patient age, advanced T class, nodal positivity, and alcohol use (Table 1). Interestingly, T class proved to be a better prognosticator than TNM stage in all major subsites of HNSCC (Fig. 2a–h and Table 1). However, inadequate prognostic resolution between T1 and T2, as well as between T3 and T4, was noted, especially in laryngeal cancer (Fig. 2d). Thus, for multivariable analysis, T class was dichotomized into T0-2 vs T3-4, providing a highly significant prognostic stratification (Table 1; HR 0.27, 95% CI 0.17–0.44, p < 0.001). While the primary tumor site had no decisive impact on patient OS, inclusion of primary tumor site in the subsequent multivariable models was deemed appropriate.
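
The hazard ratio and confidence interval quoted above can be related back to the underlying Cox coefficient, since HR = exp(β) and a Wald-type 95% interval is exp(β ± 1.96·SE). The sketch below back-calculates β, SE, and an approximate p value from the rounded published figures. It assumes a Wald interval, which the source does not state, and the function name is hypothetical, so the output is a consistency check rather than a reproduction of the original model.

```python
import math


def wald_summary_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Back-calculate (beta, se, z, p) from a hazard ratio and its 95% CI.

    Assumes the interval is a symmetric Wald interval on the log scale.
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


# Rounded values reported for T0-2 vs T3-4.
beta, se, z, p_value = wald_summary_from_hr(0.27, 0.17, 0.44)
```

With these inputs β is roughly -1.31 and SE roughly 0.24, giving a p value well below 0.001, which agrees with the reported significance level.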

Fig. 2

Overall survival was highly affected by tumor T class in both a HNSCC overall and the three main subsites, b oral cavity, c oropharynx, and d larynx. e–h TNM stage was an inferior prognosticator as compared to tumor T class in HNSCC overall and the three main subsites, especially in oropharynx, where the prognostic resolution was virtually non-existent. In oral cancer, TNM stage offered minimal prognostic resolution between stage 2 and stage 3.

One-hundred and seventy-two patients (36%) were given only surgical treatment (Table 1). Ninety-seven and 191 patients were treated with radiotherapy and chemoradiotherapy, respectively. Fifteen patients were offered no treatment. In a multivariable model fitting age at diagnosis, primary tumor site, T class, nodal status, and alcohol consumption, no treatment type proved clearly superior with regard to OS, although surgical treatment was associated with a statistically significant improvement in prognosis.

Survival data were compared to the results of the EUROCARE-5 study (summarized in Table 2). In comparison to the average survival of head and neck cancer patients in Finland, in Northern Europe, and in Europe as a whole, the observed survival rates in the Southwest Finland region were higher, especially in elderly patients and in hypopharyngeal cancer.

Table 2 Survival rates in TUH HNSCC patient cohort compared with EUROCARE-5 data for Northern Europe

Construction of representative population-validated tissue microarray (PV-TMA)

Altogether, tumor samples from 264 patients were available for the TMA (Fig. 1a). A thorough analysis of TMA construction biases was carried out (Table 3). Compared to the clinical data of the background population, that is, HNSCC patients treated in the Southwest Finland region in 2005–2010, the established PV-TMA was shown to be representative in terms of age distribution, tobacco and alcohol exposure, and especially TNM class, whereas an uneven site distribution was observed.
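
The inclusion-bias analysis above asks whether a patient characteristic changes the odds of a sample ending up in the TMA. A single-variable analogue of that logistic regression is the odds ratio from a 2×2 table, sketched below with a Wald 95% interval. The function name and the counts are made up for illustration and are not taken from Table 3.

```python
import math


def inclusion_odds_ratio(included_with, included_without,
                         excluded_with, excluded_without, z_crit=1.96):
    """Odds ratio and Wald 95% CI for TMA inclusion given a binary trait."""
    cells = (included_with, included_without, excluded_with, excluded_without)
    if any(count <= 0 for count in cells):
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (included_with * excluded_without) / (included_without * excluded_with)
    se_log_or = math.sqrt(sum(1.0 / count for count in cells))
    low = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    high = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts: patients with / without a trait, by TMA inclusion.
odds_ratio, ci_low, ci_high = inclusion_odds_ratio(60, 40, 50, 50)
```

An interval that spans 1, as in this toy example, gives no evidence that the trait influenced inclusion; the multivariable model in the study extends the same idea to several traits at once.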

Table 3 Univariate (left panels) and multivariable (right panels) analysis of TMA inclusion bias

Importantly, TMA inclusion was not a significant predictor of 5-year OS or disease-specific survival in either univariate analysis or a multivariable survival model adjusted for established clinical risk factors (Fig. 1b, c). In conclusion, the PV-TMA constructed for this work can be considered well representative of HNSCC patients treated in the Southwest Finland region in 2005–2010.

Analysis of representative HNSCC patient TMA demonstrates poor performance of putative biomarkers for prognostication

Using this exceptionally representative PV-TMA material, we analyzed the prognostication capability of multiple biomarkers (p53, EGFR, p16, CIP2A, MET, Oct4, and NDFIP1) previously shown to function as prognostic markers in HNSCC (Fig. 3). The prognostic information of CIP2A and p16 reached significance in univariate analysis (Fig. 3i, o, respectively). However, regardless of the hypothesis-based selection of the candidate biomarkers and their previous association with poor prognosis in HNSCC, none of the biomarkers showed significant prognostic value in multivariable analysis using PV-TMA material (Table 4).

Fig. 3

Representative immunohistochemical stains and prognostic trends (estimates using Kaplan–Meier method and log-rank method for significance) of the investigated biomarkers in HNSCC. a–c p53, d–f EGFR, g–i CIP2A, j–l Oct4, m–o p16, p–r NDFIP1, s–u MET

Table 4 Prognostic performance of investigated biomarker staining intensities

The possible prognostic value of the biomarkers for oral cavity, oropharyngeal, or laryngeal cancer patients was further investigated using a multivariable model entering the clinical prognosticators identified above. None of the investigated biomarkers provided statistically significant prognostic information in the three main subsites of HNSCC (Supplementary Table 1). Furthermore, no combination or interaction of the investigated biomarkers provided significant prognostic potential in multivariable survival regression when clinical prognostic variables were included in the models (data not shown).

Discussion

Our study demonstrates that in a non-biased HNSCC patient population treated with optimal results, the putative biomarkers failed to offer significant prognostic information. In order to improve retrospective as well as future prospective studies, a population-based analysis should be mandatory to appreciate the potential biases in patient selection. Further, the recent failures of significant prospective drug trials in HNSCC [19, 20, 21] suggest that optimization of retrospective studies is an underappreciated step in the discovery of biomarkers for patient treatment stratification.

This study emphasizes the need for thorough exploration of inclusion bias, since some exclusion of patients due to loss of samples and inadequate sample size is unavoidable. In our patient cohort, this is achieved by analysis of the population giving rise to the TMA cohort, the Southwest Finland HNSCC patients from 2005 to 2010. The statistical analysis reveals that our PV-TMA is an exceptionally representative and unbiased study environment for retrospective analysis of biomarkers. The population-validation approach thus improves the robustness and reliability of data analysis.

A high risk of bias is present in patient inclusion in both retrospective and prospective cohorts [22, 23]. Inclusion biases include unequal recruitment of patients with low socioeconomic status or limited insurance coverage, who supposedly have a poor prognosis, and, on the other hand, of patients with small tumors and a good prognosis. Moreover, variance in the given cancer treatments between different hospitals and between individual clinicians can also be a confounding factor in the analysis of treatment outcomes. Clinical validation of our patient cohort is made possible by the referral system in Northern Europe, leading to an unbiased, institutional patient population, which serves as a representative cross-section of the regional population. Thus, this dataset represents the real-life patient succession observed in the clinic and is, in this respect, superior to recruited prospective cohorts. Furthermore, loss to follow-up is virtually non-existent due to the Nordic public health care system and electronic databases.

The particularly good head and neck cancer treatment results in the Nordic countries increase the interest of this dataset [10]. Interestingly, in our regional data, the Southwest Finland patient prognosis was even better than in the Finnish EUROCARE-5 data. This may be due to more widespread use of cisplatin radiosensitization and, most importantly, the long-standing multidisciplinary tumor board practice, guaranteeing optimized protocols, meticulous treatment planning, and impartial response monitoring. Of special clinical interest is also the superior prognostic resolution afforded by T class in comparison to complete TNM stage. However, the observed 34% survival rate of T1-2 patients provides rationale for biomarker-based prognostication.

Particularly interesting are our results when putative biomarkers with an auspicious publication history for prognostication of HNSCC were tested in the PV-TMA. Importantly, we failed to recognize significant prognostic factors when clinical prognosticators were taken into account, either in the patient material as a whole or in any major subsite. Surprisingly, no combination or interaction of biomarkers proved useful in prognostication of our patient material. The more complex statistical analyses used in previous studies to create prognostic biomarker panels [24, 25] could not be applied in this study, which concentrated on an unselected patient population. Despite the disappointing failure of the biomarkers, our approach highlights the value of unbiased cross-sectional regional control of patient inclusion in biomarker discovery.

Immunohistochemistry for p16 is the only clinically approved biomarker for HNSCC and is applied in oropharyngeal cancer staging. In our study, p16 was a surprisingly poor prognosticator of HNSCC patients’ OS, in contrast to earlier reports [14, 26, 27]. Whether this is attributable to better overall prognosis of HPV-negative patients or to the widespread use of cisplatin radiosensitization remains an intriguing question. The failure of recent p16 deintensification trials seems, however, to demonstrate a need for better understanding of the role of p16 in both radio- and chemoradioresistance [8, 20]. Thus, our finding cautions against p16-based deintensification with regard to current treatment guidelines in Finland.

The main strength of this study is the impartial inclusion of all HNSCC patients treated in our regional referral center. Thus, the patient cohort is representative of the real-life population encountered in routine clinical practice, increasing the applicability of our results to clinical decision-making. Despite the crucial representativeness of our patient cohort, there are weaknesses in this study as well. The patient number remains relatively low, especially in site-specific analysis. Further, the patient numbers do not readily allow for more complex statistical approaches, such as multivariable analysis of biomarker combinations and more detailed analysis of staining cut-offs, including integration of data on subcellular localization changes.

In conclusion, we demonstrate the value of population-validation methodology for retrospective biomarker studies, and wish to emphasize the need for population-level evaluation of inclusion biases. Impartial cancer patient selection, comprehensive patient registers available for researchers, and exceptionally good cancer treatment outcomes offer optimal possibilities for retrospective analysis of biomarkers. A similar approach should be applied to the design of future prospective trials in molecularly diverse cancers.