Mononuclear phagocyte system-related multi-omics features yield head and neck squamous cell carcinoma subtypes with distinct overall survival, drug, and immunotherapy responses

Zhang, Cong; Deng, Jielian; Li, Kangjie; Lai, Guichuan; Liu, Hui; Zhang, Yuan; Xie, Biao; Zhong, Xiaoni

doi:10.1007/s00432-023-05512-5

Research
Open access
Published: 27 January 2024

Volume 150, article number 37, (2024)

Journal of Cancer Research and Clinical Oncology

Cong Zhang¹^na1,
Jielian Deng¹^na1,
Kangjie Li¹,
Guichuan Lai¹,
Hui Liu¹,
Yuan Zhang¹,
Biao Xie¹ &
…
Xiaoni Zhong¹

Abstract

Background

Recent research has reported that the mononuclear phagocyte system (MPS) contributes to immune defense, but a machine-learning classification of head and neck squamous cell carcinoma (HNSCC) patients based on MPS-related multi-omics features has been lacking.

Methods

In this study, we obtained marker genes for MPS through differential analysis at the single-cell level and used the “similarity network fusion” and “MoCluster” algorithms to cluster patients’ multi-omics features. Subsequently, based on the corresponding clinical information, we investigated the differences in prognosis, drug response, immunotherapy response, and biology between the subtypes. A total of 848 patients were included in this study, and the results obtained from the training set were verified in two independent validation sets using “nearest template prediction”.

Results

We identified two subtypes of HNSCC based on MPS-related multi-omics features, with CS2 exhibiting a better predicted prognosis and drug response. CS2 showed better xenobiotic metabolism and higher levels of T and B cell infiltration, while the biological functions of CS1 were mainly enriched in coagulation, extracellular matrix, and the JAK-STAT signaling pathway. Furthermore, we established a novel and stable classifier called “getMPsub” to classify HNSCC patients, which demonstrated good consistency with the clustering result in the training set. External validation sets classified by “getMPsub” showed similar differences between the two subtypes.

Conclusions

Our study identified two HNSCC subtypes by machine learning and explored their biological differences. Notably, we constructed a robust classifier with excellent classification performance, providing new insight into the precision medicine of HNSCC.

Introduction

Head and neck cancer (HNSC) is the sixth most common malignant tumor worldwide, with head and neck squamous cell carcinoma (HNSCC) accounting for the majority of cases (Sung et al. 2021). Although some progress has recently been made in the treatment of HNSCC, many patients experience significant declines in swallowing or speech, and the 5-year overall mortality rate remains at 60% (Murdoch 2007). Immunotherapy has emerged as a cancer treatment modality that modulates the immune system to fight tumor cells and mitigates resistance to other treatment modalities, and thus has a significant impact on the survival and quality of life of cancer patients (Chen and Mellman 2013). Preclinical data indicate that HNSCC is a deeply immunosuppressive disease characterized by abnormal secretion of pro-inflammatory cytokines and dysfunction of immune effector cells (Gavrielatou et al. 2020). Recent clinical practice has shown that monoclonal antibodies inhibiting the PD-1/PD-L1 checkpoint are effective in the treatment of various cancer types, including HNSCC. Combination therapies using checkpoint inhibitors with radiotherapy and/or chemotherapy, as well as cytokine-based and/or adoptive T-cell therapies, have also shown some effectiveness (Wallis et al. 2015). However, only a small number of HNSCC patients have actually benefited from the immunotherapy strategies widely used in clinical practice (Seiwert et al. 2016), so it is increasingly important to discover new biomarkers for personalized treatment.

The mononuclear phagocyte system (MPS) is an important component of the body’s immune defense (Zhang and Zhang 2020; Ren et al. 2021) and the primary executor of nanoparticle clearance. Previous research has shown that the MPS is the first and most significant obstacle preventing drug carriers from reaching their target sites after entering the body, as it clears the majority of circulating nanomaterials (Lu et al. 2023). The MPS, consisting of monocytes, macrophages, and dendritic cells (DCs), plays a role in innate immunity through pathogen sensing and phagocytosis and serves as a cellular component of adaptive immunity by presenting antigens to T cells (Geissmann et al. 2010). Monocytes are immune effector cells equipped with chemokine and adhesion receptors; they produce inflammatory cytokines and phagocytose cells and toxic molecules (Geissmann et al. 2010). They can differentiate into inflammatory DCs or macrophages during inflammation but may do so less efficiently in the steady state (Auffray et al. 2009). Macrophages are phagocytic cells that can eliminate malignant cells by phagocytosis or by producing soluble factors that induce tumor cell apoptosis. In addition to their direct cytotoxic abilities, macrophages play an important role in regulating tumor progression through mechanisms such as angiogenesis, fibrosis, and immune surveillance (Long and Beatty 2013). The secretion products of plasmacytoid DCs (pDCs), a subtype of DC, have both immunogenic and tolerogenic functions in tumor immunity (Mitchell et al. 2018; Koucký et al. 2019).

The MPS is part of the tumor immune microenvironment. Previous studies have shown that the proportion of MPS cells varies among HNSCC patients and is usually associated with patient survival and other phenotypes (Balm et al. 1982, 1984a, b). However, in the exploration of biomarkers for HNSCC, previous research has not yet focused on this important component of the immune system. Instead, more attention has been paid to biomarkers associated with several hot topics, such as PD-L1 expression (Dong et al. 2002; Freeman et al. 2000; Topalian et al. 2012), tumor mutational burden/neo-antigens (Charoentong et al. 2017), the interferon-γ gene signature (Woo et al. 2015), and the tumor microenvironment (Ager and May 2015). To our knowledge, no previous study has used the MPS to search for biomarkers and to classify HNSCC patients accurately based on these biomarkers in order to achieve more precise treatment. Therefore, a differentiated patient classification based on MPS biomarkers is feasible and can provide a reference for future personalized treatment of HNSCC.

Based on MPS biomarkers obtained from single-cell sequencing analysis, this study aimed to identify HNSCC subtypes with distinct overall survival, drug, and immunotherapy responses. Notably, when classifying HNSCC patients we considered not only gene expression but also gene methylation and mutation, to allow a more comprehensive analysis. In addition, to make our results potentially useful in practice, we constructed a robust classifier based on genes specifically expressed in each subtype, which can classify patients with a certain degree of accuracy even when only gene expression data are available. The classifier has been uploaded to GitHub (https://github.com/CQMUZC/getMPsub).

Materials and methods

Data source

A single-cell RNA sequencing (scRNA-seq) profile, GSE195832, was obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo). Because the aim was to explore marker genes of tumor-infiltrating MPS cells, four raw scRNA-seq samples, GSM5851565, GSM5851567, GSM5851569, and GSM5851571, were included in our analysis. TCGA HNSCC multi-omics features, comprising RNA-seq, methylation, and mutation data, were obtained from UCSC Xena (https://xenabrowser.net/) as the training dataset used to define the HNSCC subtypes. GSE41613 and GSE65858 were used as validation datasets to confirm whether the subtypes generalized well and to verify the effectiveness of the classifier.

Single-cell analysis

We first assessed the global quality of the merged data, including the mitochondrial percentage, the number of cells expressing each gene, and the number of genes expressed per cell. Only cells with a mitochondrial percentage below 25% and between 600 and 7500 expressed genes were included in the analysis. In addition, only genes expressed in more than 1000 cells were considered. Given the batch effect between samples, we employed “harmony” integration to reduce the systematic error (Korsunsky et al. 2019; Azizi et al. 2018). We normalized and scaled the preprocessed expression matrix so that the gene expression levels of each cell could be compared and further analyzed. Principal component analysis (PCA) was adopted to reduce data noise, and uniform manifold approximation and projection (UMAP) was used to depict the cell clusters clearly. Guided by the results of “clustree” and the specific biological targets (Zappia et al. 2018), cells were clustered by integrating the results of graph-based clustering and shared nearest-neighbor clustering. We then annotated the cell clusters with two methods that cross-validated each other: automatic annotation with the R package “SingleR” using “celldex” references, and manual annotation with known marker genes. According to the results, the R function “FindAllMarkers” was applied to identify differentially expressed genes (DEGs) between each cell cluster and all others. Through the above analyses, we identified the MPS clusters and detected their marker genes. To confirm the identity of these marker genes, Enrichr (https://maayanlab.cloud/Enrichr/) was used to identify the cell types in which these markers were enriched, and KEGG pathway enrichment analysis was employed to characterize their biological function.
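The quality-control thresholds described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the study was carried out in R); the function name, the dictionary-based input format, the strict "below 25%" comparison, the inclusive 600 to 7500 bounds, and the choice to count cells per gene after cell filtering are all assumptions made for illustration.

```python
def filter_cells_and_genes(counts, mito_genes, max_mito_pct=25.0,
                           min_genes=600, max_genes=7500, min_cells=1000):
    """Hypothetical QC helper.

    counts: dict mapping cell id -> dict mapping gene -> raw count.
    mito_genes: set of mitochondrial gene names.
    Returns (sorted list of kept cell ids, set of kept genes).
    """
    kept_cells = {}
    for cell, expr in counts.items():
        total = sum(expr.values())
        if total == 0:
            continue  # empty droplet, nothing to evaluate
        mito = sum(c for g, c in expr.items() if g in mito_genes)
        n_genes = sum(1 for c in expr.values() if c > 0)
        mito_pct = 100.0 * mito / total
        # Assumed reading of the text: mito% strictly below the cutoff,
        # number of detected genes within the inclusive range.
        if mito_pct < max_mito_pct and min_genes <= n_genes <= max_genes:
            kept_cells[cell] = expr

    # Count, among the retained cells, how many cells express each gene.
    cells_per_gene = {}
    for expr in kept_cells.values():
        for gene, c in expr.items():
            if c > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1

    # Keep genes expressed in more than `min_cells` cells.
    kept_genes = {g for g, n in cells_per_gene.items() if n > min_cells}
    return sorted(kept_cells), kept_genes
```

With the defaults, the function applies the thresholds stated in the text; smaller values can be passed for toy data.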

Clustering analysis in multi-omics features

MPS-related multi-omics features were identified by intersecting the gene variables with the markers from the single-cell analysis. The R package “MOVICS” was used to characterize the HNSCC subtypes by unsupervised clustering (Lu et al. 2021). First, the R function “getElites” filtered features that met stringent requirements, with “Cox” used for RNA-seq and methylation data and “freq” for binary omics data. The optimal number of clusters was then determined with reference to the Cluster Prediction Index (CPI) and Gap statistics (Chalise and Fridley 2017). Considering the silhouette score and the final survival difference, SNF (Wang et al. 2014) and MoCluster (Meng et al. 2016) were used for consensus clustering to identify the HNSCC subtypes. The overall nominal P-value was calculated by the log-rank test, and Kaplan–Meier (KM) curves were plotted to show the survival difference between the HNSCC subtypes. Finally, based on the genes specifically expressed in the two subtypes, we aimed to develop a classifier that can predict the subtypes of other HNSCC patients using only RNA-seq data.
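As an illustration of the survival comparison mentioned above, the following is a minimal sketch of a two-group log-rank test in plain Python. It is a textbook formulation, not the code used in the study; the function name and input format are hypothetical, and the P-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative sketch).

    times_*: follow-up times; events_*: 1 if the event (death) was observed,
    0 if the observation was censored. Returns (chi-square statistic, P-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 d.f.
    return chi2, p_value
```

Two groups with identical follow-up give a statistic of zero, while groups whose events occur at clearly different times give a small P-value.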

Drug sensitivity

Genomics of Drug Sensitivity in Cancer (GDSC) is a public database containing the drug sensitivity of cancer cells and the molecular markers corresponding to the applied drugs. We compared the response of the patient subtypes to four small-molecule compounds: Paclitaxel, 5-Fluorouracil, Erlotinib, and Pazopanib. Because Abraxane, a nanoparticle formulation of paclitaxel, was included in the GDSC drug database, we tested the differential response of the two clusters to nanomedicines. 5-Fluorouracil, as the main clinical treatment for HNSCC, was used to test whether the subtypes differed in sensitivity to conventional treatment. Erlotinib and Pazopanib were used to test whether there was a response to the specific targets EGFR and CSF1R. Given that these drugs are used in combination with radiotherapy, we subsequently tested the differential drug response of patients with records of radiation therapy in the two clusters. An independent-samples t-test was performed to determine the differences between two clusters, and the Kruskal–Wallis rank sum test was performed for multiple subtypes.

Tumor immune microenvironment

The “CIBERSORT” algorithm (https://cibersort.stanford.edu/) was used to evaluate the infiltration of 22 immune cell types in the two subtypes. EPIC was used to estimate, from the expression matrix, the infiltration proportions of eight cell types, including B cells, cancer-associated fibroblasts (CAFs), CD4⁺ T cells, CD8⁺ T cells, endothelial cells, macrophages, and NK cells, all of which are important components of the immune microenvironment. TIMER uses a deconvolution algorithm to infer the abundance of tumor-infiltrating immune cells from gene expression profiles. The immune infiltration of the different subtypes could thus be corroborated across the results of these algorithms. Additionally, the “TIDE” algorithm was employed to evaluate the potential clinical efficacy of immunotherapy in the different subtypes; it reflects the underlying ability of tumors to escape the immune system. The TIDE score evaluates the potential response to immune checkpoint therapy, with a higher score indicating a poorer response to this treatment and a possible need for alternative therapies. The Exclusion score evaluates the degree of infiltration of immune-suppressive cells in the tumor microenvironment; a higher score indicates more severe infiltration of immune-suppressive cells and a poorer response to immune checkpoint therapy. The Dysfunction score evaluates the functional state of immune cells in the tumor microenvironment; a higher score indicates that immune cell function is suppressed, leading to a poorer response to immune checkpoint therapy. In addition, a higher MSI score corresponds to a higher level of immune cell infiltration and a stronger immune response.

MPS-related analysis

The abundance of MPS cells, including macrophages, DCs, and monocytes, was calculated with “IOBR” (Zeng et al. 2021). Several targets related to tumor-associated macrophages (TAMs), immunotherapy in HNSCC, and efferocytosis (CSF1R, TLR8, EGFR, CXCR4, ABCA1, MFGE8, CD47, and CX3CL1) were examined to explore whether their expression differed between the subtypes.

Functional analysis

GO is a database established by the Gene Ontology Consortium that aims to describe gene and protein function. Through GO enrichment analysis, genes can be broadly annotated and classified according to biological process, molecular function, and cellular component. KEGG is a database for the systematic analysis of gene function that links genomic information with functional information. The Hallmark gene sets are a collection in the Molecular Signatures Database (MSigDB) that summarize well-defined biological states and processes, and they can be used to identify and analyze differences between biological states. In this study, GSEA enrichment analysis was performed using the three reference gene set collections, GO, KEGG, and Hallmark, to validate the specific biological differences exhibited by the two subtypes. Pathways enriched in two or more reference collections were considered to represent biological functions specific to the subtype.

Integrated metabolism analysis

In addition to characterizing the immune microenvironment of the subtypes, the R package “IOBR” was used to evaluate metabolic differences between the two subtypes. This study assessed the metabolic levels of the subtypes from three perspectives: metabolism, fatty acid metabolism, and cholesterol metabolism.

The integrated analysis of copy number variation

Considering that subtype-specific mutations might be promising therapeutic targets, this study compared the mutational frequency among the subtypes. The R package “MOVICS” offers two functions to measure genomic alterations that potentially affect immunotherapy, namely the quantification of total mutation burden (TMB) and of fraction genome altered (FGA). The function “compFGA” calculates and compares not only FGA but also the fraction of genome gained (FGG) or lost (FGL) per sample within each subtype. To measure the consistency of the current subtypes with pre-existing classifications, “MOVICS” offers the function “compAgree”, which generates an alluvial diagram visualizing the agreement of two other phenotypes with the current subtype as a reference.
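To make the FGA, FGG, and FGL quantities concrete, the sketch below computes them for one sample from copy-number segments, using a commonly used definition: the total length of segments whose log2 copy ratio exceeds a cutoff in absolute value, divided by the total length of all profiled segments. The function name, the input format, and the cutoff of 0.2 are illustrative assumptions and are not taken from the “compFGA” implementation.

```python
def genome_fractions(segments, cutoff=0.2):
    """Fraction of genome altered, gained, and lost for one sample.

    segments: list of (segment_length_in_bases, log2_copy_ratio) tuples.
    cutoff: illustrative threshold on the log2 ratio (an assumption).
    """
    total = sum(length for length, _ in segments)
    if total == 0:
        raise ValueError("no segments with positive length")
    gained = sum(length for length, ratio in segments if ratio > cutoff)
    lost = sum(length for length, ratio in segments if ratio < -cutoff)
    return {
        "FGA": (gained + lost) / total,  # altered = gained or lost
        "FGG": gained / total,
        "FGL": lost / total,
    }
```

For example, a sample with one gained segment of 100 bases, one lost segment of 50 bases, and one neutral segment of 50 bases has three quarters of its profiled genome altered.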

Statistical analysis

All data processing and analyses were executed in R software (Version 4.2.2). The t-test and Wilcoxon test were used to compare quantitative variables, while the Chi-square test was employed for qualitative variables. The Spearman correlation test was used to explore the relationships between variables. The Kappa coefficient was used to measure the level of agreement between the classifier results and the actual classifications. A Kappa value below 0.4 indicated poor agreement, 0.4–0.6 moderate agreement, 0.6–0.8 good agreement, and above 0.8 excellent agreement. P < 0.05 was considered statistically significant throughout.
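The Kappa coefficient and the agreement bands above can be sketched as follows. This is a generic implementation of Cohen's kappa in Python, not the code used in the study; the function names are hypothetical, and because the text does not say which band the boundary values 0.6 and 0.8 belong to, their assignment here is an assumption.

```python
from collections import Counter

def cohen_kappa(actual, predicted):
    """Cohen's kappa between two label sequences of equal length."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    count_a, count_p = Counter(actual), Counter(predicted)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(count_a[k] * count_p.get(k, 0) for k in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both sequences contain a single identical label
    return (observed - expected) / (1.0 - expected)

def agreement_level(kappa):
    """Map kappa to the bands used in the text (boundary handling assumed)."""
    if kappa < 0.4:
        return "poor"
    if kappa < 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "good"
    return "excellent"
```

Applied to the subtype labels from clustering and the labels predicted by a classifier, `cohen_kappa` returns 1.0 for perfect agreement and values near 0 for agreement no better than chance.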

Results

Data processing

The main workflow of this study, including the analyses involved, is shown in Fig. 1. Considering integrity and commonality, 848 HNSCC patients were included as the working data after patients with missing information were excluded. Among them, 481 TCGA-HNSCC patients were used to train the classifier, while 270 HNSCC samples in GSE65858 and 97 HNSCC samples in GSE41613 were used as two validation datasets. The basic characteristics of each dataset, including the origin, form, and some clinical characteristics of the data, are displayed in Table 1.

Table 1 The summary characteristics of the included samples in this study

Single-cell analysis recognized marker genes

After the batch effect was reduced, the four raw scRNA-seq samples showed a uniform and random distribution under the UMAP reduction (Figure S1A). Twenty-five cell clusters were obtained, with lower-numbered clusters containing more cells (Figure S1B). Several genes that have been confirmed to be specifically expressed in the MPS, CXCL8, AIF1, C1QC, C1QA, CD68, C1QB, CD83, CD86, CD14, and LYZ, were used as manual annotation markers, and their expression was examined in the 25 cell clusters. Clusters 1, 13, and 16 were the cell populations with high expression of these genes (Fig. 2A). The other cell types were likewise identified from the expression of their corresponding marker genes. Based on these expression levels, each cell cluster was ultimately annotated as a specific cell population, and clusters with high expression of several sets of marker genes, or of none, were defined as “unknown”. In addition, the R package “SingleR” annotated the cells with reference to “celldex”. The corresponding heatmap (Fig. 2B) depicts the scores of the various reference cell types in the 25 clusters. Finally, we present the annotation results under the UMAP reduction, showing the specific clusters (Fig. 2C). Almost all cell clusters received the same label from both methods; the exceptions were “unknown” and “Fibroblasts”, for which the “SingleR” heatmap nevertheless showed patterns similar to the manual annotation. For instance, the heatmap indicated that the fibroblast reference scored highly in clusters 2, 20, and 11, which SingleR defined as “tissue stem cells”, consistent with our manual annotation. Notably, clusters 1, 13, and 16 were manually annotated as mononuclear phagocytes, including monocytes, macrophages, and DCs; that is, manual annotation and SingleR reached the same determination for the targeted cells. A total of 863 marker genes of mononuclear phagocytes were found by the R function “FindAllMarkers” (Table S3) and were confirmed by Enrichr to be enriched in macrophages, monocytes, and DCs (Table S1). KEGG analysis showed that the pathway enriched with the most genes was “Phagosome”, indicating that our marker genes indeed characterized phagocytosis-related biological functions (Fig. 2D).

Clustering analysis recognized HNSC subtypes

Given the integrated results of the CPI and Gap statistics, the inferred optimal number of clusters was 2 (Fig. 3A). The consensus heatmap depicts robust pairwise similarities for the two subtypes and shows how a stable clustering result was obtained by hierarchical clustering (Fig. 3B). The genome-wide heatmap was used to reveal how the samples cluster together and to provide insight into potential sample biases or other artifacts (Fig. 3C). It also shows the differences in phenotypes such as age and clinical stage between the two subtypes. Based on the subtype-specific biomarkers (Table S2), we established an HNSC classifier using nearest template prediction (NTP) to predict the likely subtype of each sample. The Kappa value, which evaluates the performance of the HNSC subtype classifier, showed good consistency between the actual and predicted subtypes of HNSC samples in the training dataset (Fig. 3D). Consensus clustering yielded HNSC subtypes with distinct survival in the training dataset, in which samples in cluster 2 were more likely to have a better prognosis (Fig. 3E). GSE41613 and GSE65858 were used as validation datasets to confirm the effectiveness of the HNSC classifier. The KM curves indicated that cluster 2 identified by the classifier had a longer overall survival time (Fig. 3F, G). Notably, for the convenience of future research, we have packaged the NTP-based classifier into an R package called “getMPsub” and uploaded it to GitHub for easy access.
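The idea behind nearest template prediction can be sketched in a few lines of Python. This is a simplified illustration and not the “getMPsub” implementation: each subtype gets a binary template over the pooled marker genes (1 for its own markers, 0 otherwise), and a sample is assigned to the template with the highest cosine similarity. The permutation-based significance testing that NTP normally includes is omitted, the input is assumed to be already standardized per gene, and the function names and gene names (G1 to G4) are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_template(sample, markers):
    """Assign one sample to the subtype with the most similar template.

    sample: dict mapping gene -> standardized expression (assumed z-scores).
    markers: dict mapping subtype name -> set of subtype-specific marker genes.
    Returns (predicted subtype, cosine similarity to its template).
    """
    genes = sorted(set().union(*markers.values()))
    profile = [sample.get(g, 0.0) for g in genes]  # missing genes count as 0
    best_subtype, best_similarity = None, -math.inf
    for subtype in sorted(markers):
        template = [1.0 if g in markers[subtype] else 0.0 for g in genes]
        similarity = cosine_similarity(profile, template)
        if similarity > best_similarity:
            best_subtype, best_similarity = subtype, similarity
    return best_subtype, best_similarity
```

A sample with high expression of the hypothetical CS1 markers and low expression of the CS2 markers is assigned to CS1, and vice versa; only expression data are needed, which matches the stated purpose of the classifier.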