Introduction

Head and neck squamous cell carcinomas (HNSCC) are affecting more than 600,000 patients per year [1, 2]. HNSCC arises from the mucosal lining of the upper aerodigestive tract, including the oral cavity (OSCC), oropharynx (OPSCC) and larynx (LaSCC). Tobacco smoking, alcohol consumption, betel nut chewing and human papillomavirus (HPV) infection are the main risk factors for the development of HNSCC [3,4,5,6]. As a result of multi-modal therapy, 5-year survival rates have increased up to 67% for all stages during the last decades [7, 8]. However, survival rates of laryngeal cancer decreased over the past decades from 66 to 63% [9], with an unfavorable prognosis for advanced stages [10, 11]. This is important, as most patients suffering from laryngeal carcinoma present with advanced disease (stage III/IV) at the time of diagnosis [12]. This emphasizes the need for innovations in the field of risk stratification with subsequent therapy adjustments.

Surgical therapy plays a fundamental role in the treatment of HNSCC and is complemented by radiation and chemotherapy in advanced cases (stage III/IV) [8]. Relevant prognostic factors in HNSCC are clinical and histological characteristics, such as tumor size and localization, lymph node extracapsular spread (ECS), and perineural (PNI) and angiolymphatic invasion (ALI) [13,14,15,16]. Two of these histological characteristics have been investigated at a molecular level. An 11 gene expression profile for extracapsular spread in oral squamous cell carcinoma serves as a prognosticator of outcome in patients without nodal metastases [17]. In terms of PNI, a gene expression profile revealed a subset of HNSCC at risk of post-surgical recurrence [18]. However, little is known about the molecular mechanisms related to ALI and a molecular signature for ALI has not been identified so far. ALI implies tumorous invasion of lymphatic vessels and intratumoral vascular invasion. It is correlated with higher rates of lymph node metastases and poor prognosis in oral squamous cell carcinoma [19].

The main objective of this study was to explore the molecular mechanisms of ALI and to identify a gene expression signature associated with ALI. Although ALI is a common histopathological characteristic and predictor of outcome in HNSCC, histopathological examination implies a level of uncertainty. It depends on a single pathological examination, and extensive specimens are needed for robust classification. Especially, small biopsy specimens do not meet these demands.

Therefore, a molecular indicator for patients at high risk of recurrence may support histopathological examination, alter therapeutic strategies and improve patient’s prognosis. We investigated transcriptome data of HNSCC in relation to the occurrence of ALI and patient’s survival based on a global gene expression analysis using TCGA data. We also confirmed newly identified candidate genes in an independent cohort of LaSCC. The whole genome expression data of this cohort were collected in collaboration with the German cancer research center and the Heidelberg Center for Personalized Oncology (DKFZ-HIPO).

Material and methods

Data acquisition

TCGA cohort

A publicly available dataset containing clinical and pathological information on patients with HNSCC was published by the TCGA Research Network. All patients were treated by surgery, followed by radiotherapy or radio-chemotherapy if needed. mRNA count data containing 19,750 genes of 500 patients with HNSCC as well as general clinical and pathological characteristics were downloaded from https://portal.gdc.cancer.gov on January 25, 2019. Updated survival data were retrieved from https://gdc.cancer.gov/about-data/publications/pancanatlas on March 14, 2019. Patients with missing subsite information were excluded from further analysis, resulting in a total of 497 patients.

HIPO-HNC cohort

Patients with histological confirmed LaSCC (n = 13) of the HIPO-HNC cohort were treated at the Heidelberg University Hospital between 2012 and 2016. The seldom expression data of LaSCC were collected in collaboration with the German cancer research center and the Heidelberg Center for Personalized Oncology (DKFZ-HIPO). Patient samples were obtained under the protocol S-206/2011, approved by the Ethics Committee of Heidelberg University, with written informed consent from all participants. This study was conducted in accordance with the Declaration of Helsinki. Sample processing and generation of array-based expression data have been described previously [20]. These data have been uploaded to NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE117973. The clinical and pathological information are summarized in Supplemental table S1. Formalin-fixed paraffin-embedded tissue specimens were retrieved from the tissue bank of the National Center for Tumor Diseases, Heidelberg, Germany.

Data analysis

Differential gene expression (DGE)

The DGE analysis was performed on the TCGA cohort in R version 3.5.1 using the packages DESeq2 and limma [21, 22]. Genes with an FDR less than 0.05 and a log2-fold-change greater 1.5 were considered as significantly differentially expressed. FDR values were adjusted using Benjamini–Hochberg correction. The results of DESeq2 and limma were compared, and consistent genes were used for further analysis.

Unsupervised hierarchical clustering

Transcript data of selected genes were used for unsupervised hierarchical clustering and to construct a heatmap via ClustVis, a web tool for visualizing multivariate data. Hierarchical clustering was performed using Pearson correlation [23].

Chi-squared test and univariate and multivariate analysis

Chi-squared test was used for comparisons of categorical data and a p value < 0.05 was considered as statistically significant. Univariate and multivariate analysis was carried out using the Cox regression model. All analyses were performed in R version 3.5.1.

Survival analysis

Overall and progression-free survival was estimated using the Kaplan–Meier method. Log-rank test was used to compare the differences between groups. p values less than 0.05 were considered statistically significant. Overall survival was defined as a time period from the date of diagnosis to the last date when the patient was known to be alive (censored) or date of death for any reason (uncensored). Progression-free survival was measured from the time of diagnosis to the day of the last follow‐up examination in which the patient was progression free (censored), or to the date of local recurrence of the disease or occurrence of regional or distant metastases (uncensored). Survival analysis was performed in R version 3.5.1 using the packages survminer, survival and ggplot2.

Immunohistochemistry

Immunohistochemical staining was carried out on 4 μm-thick sections from FFPE tissue specimens of the HIPO-HNC cohort. IHC staining was done as described previously [24]. GLI2 was detected by mouse monoclonal antibodies (Santa Cruz Biotechnology, Dallas, USA, clone sc-271786, 1:100 in PBS) and DSG1 was detected by mouse monoclonal antibody (Progen, clone 18107; 1:10 in PBS). The analysis was carried out by two independent observers.

Results

Association between ALI and progression free or overall survival

A histological ALI status was available for 336 out of 497 (68%) patients of the TCGA-HNSC cohort. The impact of ALI on survival was assessed by Kaplan–Meier analysis and log rank test to confirm a statistically significant shorter progression-free (p = 0.018, log rank test) and overall survival (p = 0.008, log rank test) for cases with ALI as compared to ALI-negative counterparts (Fig. 1a, b). The subgroup analysis demonstrated that the association between ALI and recurrent disease or overall survival was evident for LaSCC (PFS: p = 0.047, OS: p = 0.036, log rank test) and OSCC (PFS: p = 0.021, OS: p = 0.02, log rank test), but not for OPSCC (PFS: p = 0.98, OS: p = 0.61, log rank test) (Suppl. Fig. S1A–F).

Fig. 1
figure 1

a Overall survival of patients in the TCGA-HNSCC cohort according to angiolymphatic invasion status. b Progression-free survival of patients in the TCGA-HNSCC cohort according to angiolymphatic invasion status

ALI-related gene expression signature

A differential gene expression analysis was performed for subgroups with positive and negative histopathological ALI status (n = 336) and significant candidate genes (FDR < 0.05 and − 1.5 < log2-fold-change > 1.5) were considered for further analysis. 93 genes were identified by the DESeq2 package (60 with increased mRNA expression level, 33 with decreased mRNA expression level), and 17 genes were identified by the limma package (two with increased mRNA expression level, 15 with decreased mRNA expression level). To reduce the number of genes and to identify robust candidates, consistent candidate genes were selected. In total, six differentially expressed genes (SHH and SLC18A3 with increased mRNA expression level; LCE3E, LCE2B, LCE3D and DSG-1 with decreased mRNA expression level) were identified by both packages and selected for further analysis (Suppl. tables S2–3).

We hypothesized that the transcript levels of the six candidate genes are sufficient to predict the risk of ALI in all patients of the TCGA cohort (n = 497). Indeed, unsupervised hierarchical clustering using Pearson correlation revealed two main clusters (Suppl. Fig. S2). While tumors in cluster A (264 out of 497; 53%) were enriched for a negative ALI status, most tumors in cluster B (233 out of 497; 47%) had a positive ALI status. These data demonstrated the potential of the six-gene expression signature to stratify tumors reliably by ALI status (p = 2.451e−09, Chi-squared test, Suppl. tables 4–5). Furthermore, previously unclassified patients (n = 161) could be categorized to either cluster A or cluster B by use of the presented gene signature. Therefore, the six-gene expression signature enabled segregation of cases for which histological information on the ALI status was not available.

ALI-related gene expression signature predicts tumor recurrence and unfavorable survival in laryngeal carcinoma

As the six-gene signature stratifies HNSCC reliably according to the ALI status, we addressed the question whether patients in clusters A and B differ in prognosis. All patients of the TCGA-HNSCC (n = 497) cohort were considered, including both patients with and without available information on the histological ALI status. Kaplan–Meier plots revealed a significant shorter overall survival for patients in cluster B (p = 0.02, log rank test) (Fig. 2a, b; Supplemental tables 6–7). The subgroup analysis considering distinct subsites of HNSCC demonstrated that differences in overall survival were most prominent for patients with LaSCC (p = 0.004, log rank test). Moreover, LaSCC patients in cluster B shared a significant shorter progression-free survival (p = 0.002, log rank test) (Fig. 3a, b, Supplemental table 8–9). Interestingly, 20 out of 41 (49%) LaSCC patients with a negative ALI status as determined by histological inspection were part of cluster B according to the six-gene expression signature. These patients showed unfavorable PFS (Fig. 4a, b), indicating a substantial amount of patients at risk with a lacking histomorphologic ALI-like phenotype. In contrast, only 4 out of 34 (12%) LaSCC patients with a histologically positive ALI status were part of cluster A according to the six-gene signature (Supplemental table 10).

Fig. 2
figure 2

a Overall survival of patients in the TCGA-HNSCC cohort according to gene expression signature. b Progression-free survival of patients in the TCGA-HNSCC cohort according to gene expression signature

Fig. 3
figure 3

a Overall survival of patients in the TCGA-LaSCC cohort according to gene expression signature. b Progression-free survival of patients in the TCGA-LaSCC cohort according to gene expression signature

Fig. 4
figure 4

a Overall survival of patients with confirmed negative histopathological ALI status in the TCGA-LaSCC cohort according to gene expression signature. b Progression-free survival of patients with confirmed negative histopathological ALI status in the TCGA-LaSCC cohort according to gene expression signature

In conclusion, the newly identified expression signature consisting of six genes is closely associated with ALI and stratifies HNSCC patients at a higher risk for unfavorable progression-free and overall survival, in particular for the subgroup of LaSCC patients, even in histologically unclassified patients.

Univariate and multivariate analysis of laryngeal subsite

Univariate analysis and multivariate Cox regression models were conducted to identify independent risk factors regarding overall and progression-free survival in patients with LaSCC (Table 1). Cluster B, females, positive resection margins, extracapsular spread and perineural invasion were parameters significantly associated with unfavorable OS and PFS in the univariate analysis. In addition, a positive ALI status and non-smoking status were significantly associated with poor OS. ALI status was excluded from the multivariate analysis as it is not independent from the presented gene expression signature. Extracapsular spread and perineural invasion were excluded from the multivariate analysis due to missing information in a large proportion of patients. Non-smoking status (HR 0.11, p < 0.001), females (HR 3.47, p = 0.010) and positive resection margins (HR 5.64, p < 0.001) were associated with poor OS in multivariate analysis. Cluster B (HR 2.5, p = 0.016) and positive resection margins (HR 2.46, p = 0.026) were the only parameters associated with unfavorable PFS in multivariate analysis.

Table 1 Univariate and multivariate analysis of overall and progression-free survival of LaSCC (n = 111)

Validation in an independent LaSCC cohort

The independent validation cohort consisted of 13 patients with LaSCC, who were treated between 2012 and 2016 at the University Hospital Heidelberg (HIPO-HNC). Eleven patients (85%) were male and two patients were female (15%). Patients’ age ranged from 49 to 86 years with a median age of 61 years. All patients had a positive smoking history. 6 out of 13 (46%) patients reported alcohol consumption. Histopathological ALI status was positive in 3 out of 13 (23%) patients (Supplemental table 11). For all tumors, global gene expression data were available and used to conduct unsupervised hierarchical clustering based on the six-gene expression signature. Again, the six-gene expression signature revealed two clusters, and all cases with a positive ALI status based on histological examination were part of cluster B (Suppl. Figure 3). Four out of ten patients with a histologically negative ALI status were part of cluster B, indicating specimens lacking histomorphological ALI-like phenotype. However, due to the low number of patients, no survival analysis was conducted for the HIPO-HNC cohort.

Immunohistochemical detection of GLI2 and DSG1

To confirm altered expression of selected candidate genes of the six-gene expression signature on protein level, GLI2 and DSG-1 protein expression was detected by immunohistochemical staining on tumor sections of the HIPO-LaSCC cohort. GLI2 is well established as a downstream target of SHH and was used as a surrogate marker for active SHH signaling. IHC staining confirmed high GLI2, but low DSG1 protein expression in cancer cells of LaSCC attributed to cluster B and an opposite staining pattern for cases in cluster A (Suppl. Figure 4).

In conclusion, the newly identified gene expression signature is associated with histological ALI status in patients with HNSCC and segregates both patients with and without available histological ALI status. It serves as a prognosticator of poor OS in HNSCC and predicts short progression-free and overall survival in LaSCC. In addition, patients at risk with lacking ALI-like phenotype could be identified. Findings could be confirmed in an independent cohort of LaSCC.

Discussion

The purpose of this study was to identify a specific gene expression signature associated with ALI in HNSCC. Therefore, we analyzed publicly available data of the TCGA-HNSCC cohort. Microscopically determined ALI was associated with a significant shorter progression-free and overall survival in the investigated cohort. This is consistent with current evidence as ALI is a known prognosticator of outcome and patients with ALI-positive HNSCC are recommended to undergo adjuvant radiotherapy [25, 26]. The conducted gene expression analysis unveiled a six-gene expression signature, which was closely related to ALI status and predicted tumor recurrence as well as unfavorable survival in HNSCC, in particular for the subsite of LaSCC. Previous studies reported gene expression signatures in association with perineural invasion and extracapsular spread in different subpopulations of HNSCC [17, 18]. To our knowledge, this is the first study on an ALI-related gene expression signature in the field of HNSCC.

With regard to clinical relevance and in line with the close association to ALI, the newly identified six-gene expression signature served as a risk factor for unfavorable prognosis. Out of the TCGA-HNSCC cohort, 233 patients (47%) could be identified at a significantly higher risk of mortality. Moreover, the application of this gene signature on subsites of HNSCC revealed patients with LaSCC at a high risk of recurrence and poor overall survival. It is important to emphasize that patients showed recurrence in the presence of this gene signature even in ALI-negative tumors, according to pathological assessment. In addition, patients lacking histological ALI status were classified to either cluster A or cluster B by the presented gene signature. This underlines the additional benefit of the presented signature, as some specimens lack an ALI-like phenotype, while patients experience unfavorable survival.

Pathological examination includes examination of extensive and heterogeneous specimens in patients treated by tumor resection. Traditional risk stratification of patients treated by primary radio(chemo)therapy is challenging, as only small biopsy specimens are available for pathological examination. By use of the presented six-gene signature, biopsy specimen could be analyzed for further characterization of this subgroup of patients. In addition, molecular profiling of circulating tumor cells may be useful in the course of liquid biopsies in the future.

The identified six-gene expression signature includes two genes (SHH, SLC18A3) with increased mRNA expression levels and four genes (LCE3E, LCE2B, LCE3D, DSG1) with decreased mRNA expression levels, suggesting a gain in SHH signaling accompanied by a loss of epithelial differentiation is one underlying principle of ALI. The activation of hedgehog signaling has been described as a prognostic indicator in oral squamous cell carcinoma [27]. An increase in SHH signaling results in cellular proliferation and transformation and it has been described as an indirect angiogenic factor [28, 29]. Angiogenesis as a consequence of SHH function results in robust neovascularization by the induction of an array of angiogenic growth factors, including all isoforms of VEGF [28]. High lymphatic vessel density is correlated with poor OS and lymph node metastasis in HNSCC [30, 31].

SLC18A3 encodes a vesicular acetylcholine transporter (VAChT). The presence of VAChT in epidermal melanocytes and keratinocytes has been demonstrated [32]. Vesamicol, an VAChT antagonist, induces potent apoptosis in bronchioalveolar carcinoma [33]. As cholinergic signaling is upregulated in SCC and blocking cholinergic signaling limits growth of squamous cell lung carcinoma, cholinergic signaling is a potential target in the treatment of HNSCC [34].

All genes in the signature with an inverse correlation to ALI are linked to epithelial differentiation. As a member of desmosomal cadherins, desmoglein-1 (DSG1) is important in epithelial differentiation and intercellular junction mediating a strong cell–cell adhesion [35]. A loss of DSG1 is associated with an invasive phenotype, metastasis and decreased survival in squamous cell carcinoma [36,37,38,39,40].

We confirmed high GLI2, but low DSG1 protein expression in tumors of cluster B by immunohistochemical staining, which was enriched for ALI-positive cases. The presence of a different expression pattern on protein level could support standard histological examination to determine the risk for ALI.

Our results have some limitations. We relied on publicly available data which are subject to multiple errors, including data collection in multiple centers, pathologic examination by different pathologists and recall bias. We are aware of a limited number of patients in our in-house cohort. However, we could not identify a larger cohort of LaSCC containing all six genes of the described gene signature. Moreover, the availability of tissue specimen is a huge advantage of this cohort. While we showed that both low expression levels of genes affecting epidermal differentiation and high expression level of SHH are related to ALI, further studies may clarify whether SHH affects transcription of these epidermal markers.

The presented gene expression signature identifies histological ALI-negative patients at risk of disease progression and poor overall survival. This gene expression signature may serve as a molecular surrogate for ALI, as it reliably stratifies patients by ALI status. Furthermore, it enables risk stratification of patients lacking ALI status and predicts outcome in cases with missing ALI-like phenotype. Hereby, adjuvant therapy might be escalated or de-escalated according to the presented signature.

Conclusion

We identified a novel gene expression signature related to ALI status in HNSCC patients. This gene expression signature identifies HNSCC patients at risk of poor overall survival and predicts tumor recurrence and unfavorable survival in laryngeal carcinoma, even if histological ALI status was negative or unknown. However, as our independent validation cohort is small, further validation in a large prospective cohort is a major objective for further studies. In the future, implementation of additional stratification tools could assist in decision making in adjuvant therapy resulting in better prognosis of patients in need.