Introduction

Esophageal cancer is a prevalent malignancy, ranking seventh in global incidence and sixth in cancer-related mortality [1]. It comprises two primary subtypes, esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC), which differ in etiology and epidemiological distribution. China has the highest incidence of esophageal cancer globally, with ESCC accounting for over 90% of cases [2, 3]. Unfortunately, ESCC is often asymptomatic in its early stages, leading to late diagnosis and missed opportunities for conventional treatment [4]. Hence, there is an urgent need to identify novel and reliable biomarkers to improve early ESCC diagnosis and screening.

Invasion and metastasis are the leading causes of cancer-related morbidity and mortality [5, 6], constituting fundamental 'hallmarks of cancer' as described by Hanahan and Weinberg [7, 8]. Basement membranes (BMs) are intricate layers of extracellular matrix (ECM) that act as a barrier between tumor cells and the stroma, playing a crucial role in preventing malignant cell invasion and metastasis [9]. The ECM is a vital non-cellular component of the tumor microenvironment (TME), with the capacity to influence immune cell recruitment and tumor progression. The breaching of the epithelial BM by tumor cells during the metastatic phase is a critical point in their dissemination to distant organs [10]. The alteration of ECM architecture by neoplastic invasion is crucial to tumorigenesis, malignant cell metastasis, and the remodeling of the TME [11]. BMs are primarily composed of laminin, which facilitates cellular interaction, and collagen IV, which maintains membrane structure [12, 13]. They also harbor growth factors, heparan sulfate proteoglycans, and nidogen [14]. This complex composition is essential for sustaining cell polarity, promoting cell adhesion, and enabling cell migration. Notably, the protein expression and density of BMs differ between normal and pathophysiological conditions such as cancer. Furthermore, the upregulation of matrix metalloproteinases (MMPs), enzymes that can degrade BMs, is closely linked to tumor invasion and metastasis [15]. Extensive studies have uncovered significant correlations between BM-related genes (BMRGs) and a variety of malignancies, including breast cancer [16], lung adenocarcinoma [17], bladder cancer [18] and renal cell carcinoma [19].

ESCC is a multifaceted disease characterized by a complex, multi-step progression. It initiates in normal epithelial cells, evolves through basal cell hyperplasia, and escalates through multiple phases of intraepithelial neoplasia, culminating in an aggressive, invasive cancer [20]. This progression encompasses the epithelial-mesenchymal transition (EMT) within the basement membrane, facilitating metastasis through the lymphovascular system [21]. Spatial transcriptomic analyses on multistage ESCC samples have elucidated that aberrant interactions among epithelial cells in the basal layer initiate EMT and expedite [22]. In the realms of diagnosis and treatment, comprehending the role of BMs is indispensable. Carcinoma in situ, confined to the epithelium, can be managed effectively. However, once the disease breaches the BMs, it infiltrates deeper tissue. The relevance of BMs is further underscored in therapeutic approaches such as radiotherapy, chemotherapy, and immunotherapy. Consequently, identifying genes associated with BMs is paramount for precise diagnosis and treatment, averting the dire metastatic outcomes in ESCC.

In the current report, we commenced by procuring ESCC single-cell RNA-sequencing (scRNA-seq) data from the Gene Expression Omnibus (GEO) database, employing a set of hallmark genes to compute the BM scores. We then performed differential expression analysis of BM-related genes, contrasting ESCC with normal tissues, utilizing gene expression profiles from GEO and TCGA databases. Subsequently, we harnessed three distinct machine learning algorithms to pinpoint the seven potential biomarkers, around which we constructed a diagnostic nomogram model for the prediction of ESCC. The expression profiles and diagnostic accuracy of these seven hub genes were corroborated using external databases. Building upon preliminary evaluations that included assessments of immune cell infiltration, immune expression signatures and their correlation with EMT, we advanced to functionally delineate the role of the BGN gene in ESCC carcinogenesis through a series of experimental approaches.

Our study unveils novel diagnostic biomarkers for ESCC by identifying previously unreported BMRGs. The limitations of bulk technologies in discerning signals from heterogeneous cell populations are well recognized. In contrast, scRNA-seq offers unprecedented resolution in mapping cellular compositions within complex microenvironments. Leveraging this technology, we adopted an interdisciplinary strategy to identify candidate genes with enhanced diagnostic efficacy and specificity, surpassing the reliability of biomarkers identified in prior studies that did not undergo ROC analysis for diagnostic efficiency assessment [23, 24]. Furthermore, we have pinpointed a novel therapeutic target for ESCC through predictive drug sensitivity and in vitro cellular assays, laying the groundwork for personalized therapeutic strategies in ESCC.

Materials and methods

Data collection

We acquired four microarray datasets (GSE53625, GSE44021, GSE23400, GSE20347) and one scRNA-seq dataset (GSE188900) from the GEO database. Additionally, we obtained RNA expression profiles of ESCC samples (n = 81) from the TCGA database.

Analysis of scRNA-seq data and calculation of BM-related gene module score

The R package “Seurat” was used to filter, normalize and cluster the raw scRNA-seq data. We employed the uniform manifold approximation and projection (UMAP) method for dimensionality reduction. To annotate cell clusters, we referred to cell markers obtained from the CellMarker and PanglaoDB databases. The AUCell and ssGSEA algorithms were utilized to calculate the BM scores. A total of 222 BM-related genes have been reported elsewhere, including genes encoding proteins of the BM matrix and components of the BM zone (Supplementary Table S1) [25]. However, 10 of these genes were not detected in the scRNA-seq data. The BM-related gene module score was therefore computed for each cell type using the remaining 212 hallmark genes. Additionally, Monocle was employed for trajectory analysis to predict cell differentiation and visualize gene expression profiles within each cell state.

Screening of BM-related biomarkers by machine learning

The R package “limma” was used to identify differentially expressed genes (DEGs) between the tumor and normal groups, applying thresholds of adjusted P < 0.05 and |log2 FC| > 1. Initially, we screened for the intersection of BM-related DEGs in TCGA, GSE53625, and GSE44021. Then, three machine learning algorithms, namely the least absolute shrinkage and selection operator (LASSO) algorithm [26], the support vector machine-recursive feature elimination (SVM-RFE) algorithm [27], and the random forest (RF) algorithm [28], were employed to identify characteristic BMRGs. LASSO analysis was conducted using ten-fold cross-validation of the penalty parameter via the R package “glmnet”, with the minimal lambda value considered optimal. The SVM-RFE classifier was implemented with the minimum cross-validation error using the “e1071” package. RF analysis was executed via the “randomForest” package, and genes with a Mean Decrease Gini value greater than two were designated as characteristic genes. Finally, receiver operating characteristic (ROC) analysis was performed using the “pROC” R package, and the area under the curve (AUC) was used to estimate the diagnostic efficacy of candidate BM genes in both training and validation GEO datasets.
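The DEG thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which relied on the R package “limma”; the function name select_degs, the placeholder gene names, and the toy values are assumptions introduced purely for illustration.

```python
def select_degs(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return genes with adjusted P < p_cutoff and |log2 FC| > lfc_cutoff.

    `results` maps a gene symbol to (log2_fold_change, adjusted_p_value),
    e.g. as exported from a differential expression table.
    """
    return sorted(
        gene
        for gene, (log2_fc, adj_p) in results.items()
        if adj_p < p_cutoff and abs(log2_fc) > lfc_cutoff
    )


# Hypothetical values for illustration only; not results from the study.
toy_results = {
    "GENE_A": (2.3, 0.001),   # passes both thresholds (upregulated)
    "GENE_B": (0.4, 0.0005),  # significant, but fold change too small
    "GENE_C": (-1.8, 0.01),   # passes both thresholds (downregulated)
    "GENE_D": (3.0, 0.20),    # large fold change, but not significant
}

degs = select_degs(toy_results)
print(degs)
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant adjusted P value is excluded, as is a significant gene with a small fold change.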

Functional enrichment analysis

The R package “clusterProfiler” was employed to perform Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis [29]. The GO categories encompassed three sections: biological processes (BP), molecular functions (MF), and cellular components (CC), with a significance threshold set at an adjusted P-value < 0.05 for the enrichment pathways. GSCALite database (http://bioinfo.life.hust.edu.cn/web/GSCALite/) was employed to investigate the cancer pathway activity associated with candidate BM genes. Additionally, the EMTome database (http://www.emtome.org/) was used to analyze the correlation between gene expression and EMT signature in pan-cancer. The GeneMANIA (http://www.genemania.org) was employed as a flexible online resource to visualize the interaction network of key BM genes and predict gene function and interactions [30].

Drug sensitivity analysis

The “calcPhenotype” function of the “oncoPredict” R package was selected to calculate predicted sensitivity values for each tumor sample, leveraging the training drug sensitivity data sourced from the Genomics of Drug Sensitivity in Cancer database and training expression data from the Cancer Therapeutics Response Portal [31]. Tumor samples were stratified into high and low expression groups based on the median expression level of each gene. A sensitivity score was used to assess drug responsiveness between these groups, where a lower score indicated a higher sensitivity to potential drugs.
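The grouping step described above (median split followed by a comparison of predicted sensitivity scores) might be sketched as follows. This is a simplified Python illustration, not the oncoPredict workflow itself; the function name split_by_median, the sample identifiers, the numeric values, and the choice to place samples equal to the median in the low group are all assumptions.

```python
from statistics import mean, median


def split_by_median(expression):
    """Split sample IDs into (high, low) groups around the median expression.

    Samples strictly above the median go to the high group; samples at or
    below it go to the low group (a tie-handling assumption).
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return high, low


# Hypothetical per-sample values for illustration only.
expression = {"S1": 2.0, "S2": 5.5, "S3": 3.1, "S4": 7.2}
sensitivity = {"S1": 4.0, "S2": 2.5, "S3": 3.6, "S4": 1.9}

high, low = split_by_median(expression)
mean_high = mean(sensitivity[s] for s in high)
mean_low = mean(sensitivity[s] for s in low)

# A lower score indicates higher predicted sensitivity to the drug.
more_sensitive = "high" if mean_high < mean_low else "low"
print(high, low, more_sensitive)
```

In this toy example the high-expression group has the lower mean score and would therefore be read as the more drug-sensitive group.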

Immune cell infiltration analysis

To evaluate the correlation between BM genes and the tumor immune microenvironment, we employed the R package “estimate” to calculate the stromal score, immune score, and estimate score for each ESCC patient [32]. The CIBERSORT algorithm was used to estimate the abundance of 22 distinct immune cells based on gene expression profiles [33]. The MCPcounter algorithm was applied to quantify the relative abundance of eight immune cells and two stromal cells within a tissue sample derived from transcriptomic data [34]. In addition, we also examined the association between BM genes and 40 immune checkpoint genes. In this study, the expression profile of candidate genes in patients was acquired from GSE53625.

Consensus clustering analysis

The unsupervised consensus clustering approach was applied to divide ESCC patients into the optimal number of clusters based on BM gene expression, utilizing the R package “ConsensusClusterPlus” [35]. Subsequently, Gene Set Variation Analysis (GSVA) was conducted with the gene set “c5.all.v2023.2.Hs.symbols” as input. Differentially enriched GO terms between the two subtypes were then identified from the GSVA scores using the R package “limma”, with |t value| > 2 serving as the threshold for significance.

Cell culture

The human normal esophageal epithelial cell line HEEC and the ESCC cell line TE-1 used in this study were obtained from the Cell Bank of the Chinese Academy of Sciences (Shanghai, China). The cells were cultured in Dulbecco’s modified Eagle medium (DMEM, ThermoFisher Scientific, USA) and RPMI-1640 medium (Gibco, USA), respectively. Both media were supplemented with 10% fetal bovine serum (FBS, Gibco, USA), and the cells were maintained at 37 °C in a humidified environment with 5% CO2. The cells were routinely checked for mycoplasma contamination.

Cell transfection and drug treatment

Two small interfering RNA (siRNA) sequences targeting human BGN were synthesized by Hanbio Biotechnology (Shanghai, China). A scrambled siRNA was also synthesized as a control. The siRNAs were then transfected into the TE-1 cell line using Lipofectamine® 3000 (ThermoFisher Scientific, USA) according to the manufacturer’s instructions, and the cells were incubated at 37 °C for 24 h. Docetaxel, paclitaxel, oxaliplatin and OSI-027 were purchased from MCE, USA.

Western blot analysis

The cells were treated with ice-cold RIPA lysis buffer (Beyotime, China) to extract total protein. The protein concentration was measured using the BCA Protein Assay Kit (Beyotime Biotechnology, Shanghai, China). Subsequently, the proteins were separated by 10% sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) and transferred onto a polyvinylidene difluoride (PVDF) membrane (Millipore, USA). After blocking with 5% (w/v) non-fat milk for one hour at 25 °C, the PVDF membranes were incubated overnight at 4 °C with appropriately diluted primary antibodies against BGN (ab109369, 1:1000 dilution, Abcam, USA), GAPDH (60004–1, 1:3000 dilution, Proteintech, China), N-cadherin (4061S, 1:1000 dilution, CST, USA) and E-cadherin (ab314063, 1:1000 dilution, Abcam, USA). The PVDF membranes were then washed several times with TBST buffer (10 mM Tris–HCl pH 8.0, 150 mM NaCl, 0.05% Tween 20) and incubated with diluted HRP-conjugated IgG secondary antibodies (ab205719, 1:10,000 dilution, Abcam, USA) for two hours at 25 °C. Finally, protein bands were detected using enhanced chemiluminescence detection reagents (Thermo Pierce, Cramlington, UK) and quantified using the Bio-Rad Image Lab software.

Cell proliferation assay

Cell proliferation was assessed using the Cell Counting Kit-8 (CCK8) and plate colony formation assay. For the CCK8 assay, cells were seeded in 96-well plates at a volume of 100 µL per well. After incubation for one day, three days, five days and seven days, 10 μL of CCK8 (MCE, USA) was added to each well and incubated at 37 °C for two hours. The optical density (OD) values were then measured using a microplate reader (BioTek, USA) at 450 nm. For the colony formation assay, cells in the logarithmic growth phase were selected and dissociated with trypsin to obtain a single-cell suspension. These cells were then inoculated into a 12-well plate and cultured for 10 days, with regular monitoring of their growth. Then, all colonies were fixed using 4% paraformaldehyde, stained with crystal violet, and captured using a digital camera.

Wound‑healing assay

The transfected TE-1 cells were seeded into six-well plates at a density of 1 × 10^6 cells per well and cultured for 24 h. Once the cells had adhered, the monolayer was scratched with a 200 μL pipette tip. The cells were then washed twice with PBS and cultured in serum-free medium. Images were captured at 0 h and 24 h.

Transwell assay

The transfected TE-1 cells were seeded into the upper chamber of a 24-well Transwell plate (8 μm pore size, Corning, USA). The cells in the upper chamber were cultured in 200 μL of serum-free RPMI-1640 medium, while the lower chambers were supplemented with 600 μL of RPMI-1640 medium containing 20% FBS. After incubation, the migrated cells were fixed with 4% paraformaldehyde and stained with crystal violet. The cells above the membrane in the upper chamber were removed with cotton swabs. Finally, a quantitative analysis was conducted by counting the stained cells in three randomly selected fields.

Flow cytometry

In brief, the cells were collected, washed, and re-suspended in a binding buffer to be monitored. Cells were processed with Annexin V-fluorescein isothiocyanate (FITC) and propidium iodide (PI) for apoptosis detection according to the protocol (Vazyme Biotech, China). Stained cells were detected and analyzed using a CantoII flow cytometer (BD Biosciences, Franklin Lakes, NJ, USA).

Statistical analysis

All experiments were conducted independently at least three times, yielding consistent outcomes, unless specified otherwise in the accompanying figure legends. All statistical analyses were performed using R software version 4.3.1. Comparisons between two groups were carried out using the t-test. Correlations were evaluated using Spearman's test. Statistical significance was set at P < 0.05 and is indicated by asterisks (* P < 0.05, ** P < 0.01, *** P < 0.001, and **** P < 0.0001); ns denotes P > 0.05.
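For readers less familiar with these conventions, the following minimal Python sketch shows how a Spearman coefficient can be computed from average ranks and how a P value maps onto the asterisk notation used in the figures. The statistical analyses in this study were performed in R; the function names average_ranks, spearman_rho and significance_label are hypothetical, and the sketch computes only the coefficient, not its P value.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def significance_label(p):
    """Map a P value to the asterisk notation used in the figure legends."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


# Hypothetical values for illustration only.
rho = spearman_rho([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
print(rho, significance_label(0.003))
```

Because the coefficient depends only on ranks, any monotonic transformation of either variable (for example a log transformation of expression values) leaves it unchanged.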

Results

Single-cell RNA-seq profiling demonstrates a significant correlation between ESCC occurrence and BMs

We analyzed scRNA-seq data using the UMAP method, revealing 18 distinct clusters. These clusters were classified into eight major cell types: T cells, B cells, myeloid cells, endothelial cells, epithelial cells, mast cells, fibroblasts, and smooth muscle cells (Fig. 1A). Expression profiles of marker genes for each cell type are depicted in the dot plot (Fig. 1B). Of the 222 BM genes analyzed, 212 were detected in our single-cell data. Utilizing these genes, we calculated the BM-related module score. The AUCell (Fig. 1C) and ssGSEA (Fig. 1D) algorithms consistently indicated elevated BM scores in most ESCC cell types compared to normal cells, with fibroblasts exhibiting the highest scores. This finding suggests an association between the occurrence of ESCC and altered BM-related gene activity.

Fig. 1

Single-cell analysis revealed abnormally elevated BM-related module scores in ESCC cells. (A) The scRNA-seq data was analyzed using the UMAP algorithm, resulting in eight cell types. (B) The identification and classification of cell types were performed utilizing a set of specific cell markers that enabled accurate annotation of each cell type. (C) The BM-related module score was calculated using the AUCell algorithm. (D) Computation of the BM-related module score utilizing the ssGSEA algorithm. ** P < 0.01, *** P < 0.001, **** P < 0.0001, ns, no significance

Machine learning-based screening of characteristic genes for clinical diagnosis of ESCC

To identify characteristic genes, we determined DEGs between the ESCC group and the control group in the TCGA, GSE53625, and GSE44021 datasets via the R package “limma”. Venn diagrams revealed 22 genes shared by DEGs and 220 BMRGs (Fig. 2A). Subsequently, we conducted functional analysis on these 22 genes. GO analysis indicated their involvement in the extracellular matrix organization, collagen-containing extracellular matrix, basement membrane, and matrix structural constituent terms (Supplementary Fig. S1A-C). Additionally, KEGG analysis showed enrichment in ECM-receptor interaction, focal adhesion and the PI3K-Akt signaling pathway (Supplementary Fig. S1D), with most genes enriched in basement membrane-related terms being upregulated in the ESCC group.

Fig. 2

Identification of BM-related diagnostic biomarkers for ESCC via machine learning methods. (A) Venn diagram showing the overlap between DEGs and 220 BMRGs. (B) Optimal λ values for hub genes identified by the LASSO logistic regression algorithm. (C) The correlation between the number of trees and the error rate in Random forests. (D) Gene selection using the SVM-RFE algorithm. (E) Venn diagram of diagnostic markers shared by LASSO, random forest, and SVM-RFE algorithms. (F) Split violin plot demonstrating the differential expression of nine hub genes between ESCC patients and controls in the GSE53625 dataset. *** P < 0.001

We further employed three machine learning algorithms to narrow down the diagnostic genes. The LASSO algorithm identified 13 characteristic genes according to the optimum λ value (Fig. 2B). Their coefficient profile is shown in Supplementary Fig. S1E. Using random forests, we optimized the model to achieve a minimum error rate (Fig. 2C) and selected genes with a Mean Decrease Gini (MDG) > 2 (Supplementary Fig. S1F), resulting in the identification of 16 characteristic genes. Meanwhile, the SVM-RFE algorithm screened 16 genes with the minimum error (Fig. 2D). After intersecting the results from LASSO, RF and SVM-RFE, nine key diagnostic genes were identified (Fig. 2E). We examined the expression patterns of these nine genes between ESCC and control samples using the GSE53625 dataset (Fig. 2F). Finally, we selected seven upregulated hub genes in tumor tissues for further analysis, including BGN, COL4A1, LAMB3, LUM, MMP1, NELL2, and SPARC.
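The intersection step described above can be sketched in a few lines of Python. The gene sets below are hypothetical placeholders rather than the actual LASSO, random forest and SVM-RFE selections, and the function name consensus_features is an assumption made for this illustration.

```python
def consensus_features(*selections):
    """Return the features chosen by every selection method, sorted by name."""
    if not selections:
        return []
    shared = set(selections[0])
    for chosen in selections[1:]:
        shared &= set(chosen)
    return sorted(shared)


# Hypothetical placeholder gene sets, one per feature selection method.
lasso = {"G1", "G2", "G3", "G5"}
random_forest = {"G1", "G2", "G4", "G5"}
svm_rfe = {"G1", "G3", "G5", "G6"}

hub_genes = consensus_features(lasso, random_forest, svm_rfe)
print(hub_genes)
```

Requiring agreement across all three methods is a conservative choice: a gene favored by only one or two algorithms is dropped, which reduces the chance that a candidate reflects the bias of a single method.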

Validating potential biomarkers for ESCC diagnosis

The seven selected genes showed a distinct co-expression pattern in the GSE53625 dataset (Fig. 3A). Analysis using the GeneMANIA network revealed strong interactions among these genes in various biological aspects (Supplementary Fig. S2A). Based on these seven genes, we constructed a nomogram exhibiting excellent predictive performance, with an AUC value of 1.0 (Fig. 3B, C). The calibration curves demonstrate that the nomogram-based diagnostic model closely aligns with the ideal model, indicating a highly accurate predicted probability (Supplementary Fig. S2B). We further assessed the diagnostic efficacy of each gene in predicting the occurrence of ESCC. The results are summarized as follows: BGN (AUC 0.981, CI 0.967–0.992), COL4A1 (AUC 0.918, CI 0.885–0.948), LAMB3 (AUC 0.962, CI 0.939–0.981), SPARC (AUC 0.965, CI 0.943–0.984), LUM (AUC 0.925, CI 0.893–0.952), MMP1 (AUC 0.981, CI 0.964–0.993), and NELL2 (AUC 0.923, CI 0.891–0.953) (Fig. 3D).
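To make the AUC values above easier to interpret, the following minimal Python sketch computes an empirical AUC for a single marker as the probability that a randomly chosen tumor sample has a higher value than a randomly chosen control, with ties counted as one half. This rank-based form is equivalent to the area under the empirical ROC curve. The study itself used the “pROC” R package; the function name roc_auc and the expression values are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """Empirical AUC: P(positive score > negative score), ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values for illustration only.
tumor = [7.1, 6.4, 8.0, 5.9]
normal = [5.2, 6.4, 4.8, 5.0]

auc = roc_auc(tumor, normal)
print(auc)
```

An AUC of 1.0 therefore means that every tumor sample scores above every control sample, whereas 0.5 corresponds to no discrimination.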

Fig. 3

Development and validation of a diagnostic nomogram model. (A) Correlation matrix of the seven gene expression profiles from the GSE53625 dataset, highlighting inter-gene relationships. (B) Construction of a predictive nomogram incorporating characteristic genes for ESCC. Each gene is assigned a score, with the total score representing the sum of individual gene scores. (C) ROC curve analysis of the nomogram model, demonstrating its diagnostic accuracy. (D) Comparative ROC curves for the individual predictive genes within the GSE53625 dataset. (E) ROC curves for the corresponding genes in the GSE20347 dataset illustrate their predictive performance. (F) Split violin plot contrasting gene expression levels between ESCC patients and controls within the GSE20347 dataset, underscoring the differential expression patterns. *** P < 0.001

To confirm the expression levels and diagnostic precision of these signature genes, we utilized two independent external datasets, GSE20347 and GSE23400. In the GSE20347 dataset, the AUC values for BGN, LAMB3, SPARC, MMP1, LUM, COL4A1, and NELL2 were 0.886, 0.955, 0.869, 0.986, 0.917, 0.938, and 0.920, respectively (Fig. 3E). Moreover, the expression levels of these seven genes were significantly elevated in tumor samples compared to the control group (Fig. 3F). In the GSE23400 dataset, the AUC values for BGN, LAMB3, SPARC, MMP1, LUM, COL4A1, and NELL2 were 0.921, 0.927, 0.859, 0.968, 0.827, 0.844, and 0.868, respectively (Supplementary Fig. S2C). Additionally, all seven genes exhibited higher tumor expression levels than controls (Supplementary Fig. S2D).

Analysis of gene expression and tumor microenvironment in ESCC

Studies indicate that the tumor microenvironment modulates extracellular matrix components to foster tumorigenesis [36, 37]. We then conducted an in-depth analysis to elucidate the interplay between the tumor microenvironment and the seven BM gene expressions in ESCC patients. Utilizing a heatmap, we depicted the tumor microenvironment scores alongside the expression profiles of seven key BM genes, offering a visual synopsis of the molecular landscape in ESCC patients (Fig. 4A). Notably, the BGN, MMP1, LUM and SPARC genes exhibited positive associations with the immune score, while COL4A1, LAMB3 and NELL2 demonstrated negative correlations, suggesting a dichotomous role in immune modulation (Fig. 4B).

Fig. 4

Tumor microenvironment characterization and expression of seven BM genes in ESCC patients. (A) Heatmap representation displaying the tumor microenvironment score and the expression levels of the seven BM genes in ESCC patients. (B) Scatter plots illustrating the correlations between the expression of the seven BM genes and the immune score (left panel) and the estimate score (right panel), highlighting the association between gene expression and immune activity. (C) Bar graphs showing the differences in immune-related functional scores between normal and tumor groups, indicating the impact of the tumor microenvironment on immune response. (D) Correlation matrices displaying the relationships between the expression of the seven BM genes and 22 immune-related cell types, elucidating the interaction between these genes and the immune system. (E) Correlation matrices depicting the associations between the expression of the seven BM genes and immune checkpoint genes, revealing potential regulatory mechanisms in the tumor microenvironment. * P < 0.05, ** P < 0.01, *** P < 0.001

Further, we employed ssGSEA to scrutinize the variations in 13 immune-related functions between normal and tumor tissues. Our findings indicated that 11 of these immune function scores were significantly elevated in tumor tissues, underscoring the presence of an aberrant immune response in ESCC (Fig. 4C). To delve into the cellular composition of the immune response, we applied the CIBERSORT algorithm to quantify the abundance of 22 distinct immune cell populations. We discovered that M0 and M1 macrophages, CD4 memory-activated T cells, and activated dendritic cells were positively correlated with the expression of all seven BM genes. Conversely, Tregs, CD8 T cells, activated NK cells, and naive B cells exhibited negative correlations, highlighting a complex interplay between tumor-associated genes and immune cell dynamics (Fig. 4D).

Leveraging these insights, we performed an integrated analysis of the immunological profiles of candidate BMRGs. This analysis revealed significant positive correlations with a spectrum of immune checkpoint genes, particularly CD276, CTLA4 and TNFRSF family members, indicating a potential immunological signature that may be harnessed for therapeutic development (Fig. 4E).

Molecular subtyping and immunophenotyping of ESCC reveal distinct tumor microenvironments

To delineate the molecular subtypes of ESCC and elucidate their associated molecular features, we performed a consensus clustering analysis based on the expression patterns of the seven BMRGs in the GSE53625 dataset. The optimal number of clusters was determined to be two (k = 2), which stratified 179 ESCC samples into two distinct subtypes: subtype A (n = 54) and subtype B (n = 125) (Fig. 5A). All seven BMRGs were elevated in subtype B compared to subtype A (Fig. 5B). To elucidate the biological differences between these subtypes, we performed GSVA to assess GO enrichment. The results showed that subtype B exhibited upregulation of various GO terms, including gap junction assembly, metalloendopeptidase activity, keratinocyte proliferation and migration, collagen metabolic process, collagen catabolic process, lamellipodium membrane and extracellular matrix disassembly (Fig. 5C, D). Considering the central role of basement membranes in the tumor microenvironment, we further evaluated the differences between the two subtypes. Subtype B had higher stromal and immune scores than subtype A, suggesting a more complex and potentially aggressive tumor microenvironment (Fig. 5E). Additionally, we utilized the MCPcounter algorithm to quantify the relative abundance of different infiltrating immune and stromal cells. Notably, four infiltrating immune cell types (cytotoxic lymphocytes, NK cells, monocytic lineage cells and neutrophils) and two stromal cell types (endothelial cells and fibroblasts) were more abundant in subtype B. Among these, fibroblasts were identified as the predominant cell component, indicating their potential role in the tumorigenic process (Fig. 5F). This finding aligns with the scRNA-seq analysis, in which fibroblasts had the highest BM-related gene module scores in ESCC (Fig. 1C, D). These results suggest a significant association between fibroblasts and the pathophysiology of basement membrane lesions.

Fig. 5

Characterization of molecular subtypes in ESCC and their associated molecular features. (A) Consensus clustering matrix demonstrating the optimal partitioning into two distinct subtypes when k = 2. (B) Box plots illustrating the mRNA expression levels of the seven BMRGs across the two identified subtypes. (C) Heatmap depicting the GSVA enrichment scores of GO pathways, highlighting the differences between the two subtypes. (D) Corresponding bar plot providing a detailed view of the GSVA enrichment differences in GO pathways between the subtypes. (E) Violin plot displaying the comparative stromal, immune, and estimate scores for the two subtypes, indicating variations in the tumor microenvironment. (F) Box plots showing the infiltration levels of eight immune cell types and two stromal cell types in the two subtypes, revealing the cellular composition of the tumor microenvironment. * P < 0.05, ** P < 0.01, *** P < 0.001

EMT pathway activation and fibroblast heterogeneity in ESCC tumor progression

In our analysis of ESCC scRNA-seq data, we detected the expression of six of the seven BMRGs; LAMB3 was absent from this dataset. Notably, BGN, LUM, SPARC, and COL4A1 exhibited high expression levels in fibroblasts and smooth muscle cells (Fig. 6A). This observation corroborated our previous cluster analysis and suggested a significant role for these BMRGs in cancer-associated fibroblasts. To further elucidate the dynamics of fibroblast differentiation, we extracted fibroblasts and performed a pseudotime analysis using Monocle2, which delineated six distinct states along the cell fate trajectories based on gene expression patterns (Fig. 6B). The differential expression of BGN, LUM, SPARC, and COL4A1 throughout fibroblast differentiation implies that these genes may exert distinct influences on tumor progression (Fig. 6C).

Fig. 6

Molecular characterization and functional analysis of fibroblasts in ESCC. (A) Tissue-specific expression patterns of BMRGs including BGN, MMP1, LUM, SPARC, COL4A1, and NELL2 across various cell types. (B) Pseudotime trajectory analysis of fibroblasts, illustrating differentiation stages (left) and progression along pseudotime (right). (C) Expression dynamics of BGN, COL4A1, LUM, and SPARC across different pseudotime points and differentiation stages. (D) The GSCALite platform was used to analyze the cancer-related pathway activity. (E) Examination of the correlation between BGN gene expression and EMT in pan-cancer using the EMTome database, revealing the widespread impact of BGN on cancer progression

Expanding the scope of our investigation to a broader oncological context, we harnessed the GSCALite platform to explore the correlation between gene expression profiles and the activation of cancer-related signaling pathways in a comprehensive dataset encompassing 33 distinct malignancies from TCGA. Our analysis uncovered that BGN exhibited the most significant association with the activation of the EMT pathway among the seven evaluated genes, shedding light on the molecular mechanisms driving tumorigenesis (Fig. 6D). This finding was further substantiated by a focused analysis of the correlation between BGN gene expression and EMT in pan-cancer using the EMTome database, which aligned with the results from GSCALite (Fig. 6E). Recognizing the critical role of EMT in processes such as tumor initiation, progression, and metastasis, we resolved to investigate the specific function of BGN in ESCC.

BGN suppression attenuates the migratory capacity of ESCC cells and augments apoptotic activity

To investigate the role of BGN in ESCC pathophysiology, we utilized RNA interference to diminish endogenous BGN expression. Our preliminary assessment of BGN expression levels revealed markedly higher expression in the TE-1 cell line than in the HEEC cell line (Supplementary Fig. S3). Consequently, TE-1 was selected for subsequent investigation. Two distinct small interfering RNAs (siRNAs) and their corresponding scrambled controls were transfected into TE-1 cells, with knockdown efficacy confirmed by western blot analysis (Fig. 7A and Supplementary Fig. S4A). EMT, a process pivotally associated with cancer invasion and metastasis, involves the transformation of polarized epithelial cells into cells with mesenchymal phenotypes, characterized by the loss of cell adhesion and the acquisition of migratory properties. Following BGN knockdown, there was a notable increase in the expression of the epithelial cell marker E-cadherin, coupled with a significant decrease in the mesenchymal cell marker N-cadherin (Fig. 7B and Supplementary Fig. S4B, C). Notably, BGN suppression led to a significant reduction in cell proliferation and colony formation in vitro (Fig. 7C, D). Flow cytometry analyses indicated that BGN knockdown enhanced apoptosis in TE-1 cells (Fig. 7E). Additionally, the migratory capabilities of TE-1 cells were substantially diminished upon BGN inhibition (Fig. 7F, G). Collectively, these findings underscore the tumor-suppressive effects of BGN inhibition in ESCC.

Fig. 7

Impact of BGN knockdown on ESCC cell phenotypes. (A) Western blot analysis confirming the significant suppression of BGN expression following siRNA transfection in ESCC cells. (B) BGN knockdown led to a significant downregulation of the mesenchymal marker N-cadherin, accompanied by an upregulation of the epithelial marker E-cadherin. (C) CCK-8 assay depicting the attenuated proliferation of TE-1 cells upon BGN knockdown. (D) Plate cloning assay illustrating the reduced clone formation capacity of TE-1 cells with BGN depletion. (E) Flow cytometry analysis revealing the influence of BGN knockdown on apoptosis in TE-1 cells. (F-G) Transwell assays (F) and wound-healing assays (G) quantifying the compromised migration capabilities of TE-1 cells post-BGN knockdown. Error bars, ± SD from at least three biological replicates. * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001. (H) In the left panel, the AlphaFold-predicted structure of the BGN protein (P21810) is depicted in a cartoon representation, with O-linked glycosylation sites at S42, S47, S180 and S198, as well as N-linked glycosylation sites at N270 and N311, illustrated using a ball-and-stick model. The 12 leucine-rich repeat (LRR) domains are delineated within dashed squares. The right panel presents a 180° rotated view of the charge distribution, with elliptical dashed lines highlighting the positively charged regions at the head and tail of the 'palisade' structure

BGN, a class I member of the small leucine-rich proteoglycan (SLRP) family, features 12 leucine-rich repeat (LRR) domains (Fig. 7H). AlphaFold's predicted structure of BGN (P21810) reveals an orderly arrangement of β-strands on the cell membrane, creating a 'palisade' configuration that likely mediates key protein interactions. Given the established role of glycosylation in protein function [38, 39], the O-linked glycosylation sites [40] at S42, S47, S180 and S198 and the N-linked sites at N270 and N311 on BGN may critically influence its trafficking, localization, signaling, and immunological properties, providing a foundation for future research. Additionally, charge analysis identifies positively charged domains within the head (e.g., K139, K209, K238) and bottom (e.g., K82, K128, R266) regions of the LRR 'palisade', which may serve as crucial sites for cofactor recruitment and protein interaction, suggesting potential targets for drug design (Fig. 7H).

BGN expression dictates sensitivity to chemotherapeutic agents in metastatic ESCC

In the therapeutic landscape of metastatic ESCC, chemotherapy regimens incorporating platinum and fluoropyrimidine/paclitaxel continue to set the standard of care [41]. Chemotherapy sensitivity refers to the extent of vulnerability of cancer cells to chemotherapeutic agents. Highly sensitive tumor cells are more likely to be eradicated during treatment, thereby enhancing the therapeutic efficacy of the drugs. We first stratified ESCC patients into high- and low-expression cohorts according to the median expression level of BGN. Subsequently, we assessed the differential drug sensitivity between these cohorts to identify candidate therapies for patients with elevated BGN expression. Notably, ESCC patients exhibiting high BGN expression demonstrated enhanced sensitivity to docetaxel, paclitaxel, and oxaliplatin, all established chemotherapeutic agents in ESCC management, as well as to the mTOR inhibitor OSI-027 (Fig. 8A). To validate this prediction, we conducted cell proliferation assays to assess the chemosensitivity of cells exhibiting varying levels of BGN expression to a panel of clinically relevant drugs. The proliferation of TE-1 cells, which endogenously express high levels of BGN, was markedly suppressed upon treatment with docetaxel, paclitaxel, oxaliplatin and OSI-027, particularly when compared to cells with BGN knockdown (Fig. 8B–E). Altogether, our data suggest that BGN expression may serve as a predictive biomarker to guide clinical drug selection in ESCC treatment.
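The median-based stratification described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the patient identifiers, expression values, and predicted sensitivity scores are toy numbers invented for illustration, the function names are hypothetical, the treatment of samples lying exactly at the median (assigned to the low group) is an assumption, and the sensitivity metric is assumed to be an IC50-like score in which lower values indicate greater sensitivity.

```python
from statistics import mean, median


def stratify_by_median(expression):
    """Split sample IDs into high/low groups at the median expression value.

    Assumption: samples exactly at the median are assigned to the low group.
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return cutoff, high, low


def mean_score_by_group(scores, high, low):
    """Return the mean sensitivity score of the high and low groups."""
    return mean(scores[s] for s in high), mean(scores[s] for s in low)


# Toy data (hypothetical): BGN expression and an IC50-like predicted score.
bgn_expression = {"P1": 2.1, "P2": 5.4, "P3": 3.3, "P4": 7.8, "P5": 1.2, "P6": 6.0}
predicted_ic50 = {"P1": 0.9, "P2": 0.4, "P3": 0.8, "P4": 0.2, "P5": 1.1, "P6": 0.3}

cutoff, high, low = stratify_by_median(bgn_expression)
high_mean, low_mean = mean_score_by_group(predicted_ic50, high, low)

print(f"median cutoff: {cutoff:.2f}")
print(f"high-BGN group: {high}, mean score {high_mean:.2f}")
print(f"low-BGN group:  {low}, mean score {low_mean:.2f}")
```

In a real analysis, the difference between the two groups would be evaluated with an appropriate statistical test rather than by comparing group means alone.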

Fig. 8

BGN expression and its correlation with drug sensitivity in ESCC. (A) Predicted sensitivity profiles for docetaxel, paclitaxel, oxaliplatin and OSI-027, stratified by BGN expression levels. (B-E) Proliferation assays showing the effects of docetaxel (B), paclitaxel (C), oxaliplatin (D) and OSI-027 (E) on BGN knockdown TE-1 cells, suggesting a potential role of BGN expression in dictating therapeutic susceptibility. Error bars, ± SD from three biological replicates. * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001

Discussion

ESCC is characterized by its aggressive phenotype, typically presenting at advanced stages with a pronounced tendency for metastasis and exhibiting resistance to chemotherapeutic agents. BMs provide structural support and tissue compartmentalization, facilitating dynamic cellular interactions that regulate behaviors such as proliferation and migration [13]. Additionally, BMs act as a physical barrier, typically impeding primary tumor dissemination during the in situ phase of cancer [42]. The degradation of BMs is therefore a critical initial step for ESCC cells to invade and metastasize. Investigating the early diagnosis and mechanisms of ESCC may enhance diagnostic precision and guide stratified therapeutics, addressing a critical research gap. With this in mind, we aimed to investigate the clinical relevance of BM-associated genes in ESCC. Using single-cell data, we initially estimated the BM score in individual cells based on the expression profiles of related genes. Our integrative analysis identified a set of 22 differentially expressed genes enriched in BM functions, which were further refined to seven key diagnostic genes (BGN, COL4A1, LAMB3, LUM, MMP1, NELL2, and SPARC) through machine learning algorithms. These genes showed distinct co-expression patterns and high diagnostic accuracy in ESCC, as validated across independent datasets, underscoring their potential as robust biomarkers for early detection and stratified treatment strategies. This approach holds potential for diagnosing and treating a spectrum of solid tumors in future oncology applications.

Among the seven biomarkers, MMP-1, known for its BM-degrading capabilities, has been implicated as a potential diagnostic and prognostic biomarker in ESCC [43]. COL4A1, a key component of collagen IV, has been shown to promote tumorigenesis in various cancers and is consistently upregulated in ESCC [44]. Laminins, whose β3 subunit is encoded by LAMB3, are glycoproteins integral to BMs and have been linked to cancer cell proliferation and migration [45]. SPARC, a matricellular glycoprotein, is associated with aggressive metastatic tumors and is highly expressed in ESCC, where it may promote tumor progression through interactions with the tumor microenvironment [46]. LUM, a small leucine-rich proteoglycan, and NELL2, a secreted protein with potential roles in cancer development, were also identified as differentially expressed in ESCC. Biglycan (BGN) is a vital extracellular matrix constituent implicated in various malignancies [47,48,49,50]. It can be secreted by prostate cancer cells and regulates the migration of myeloid-derived suppressor cells (MDSCs) through the Akt/mTOR and MNK/eIF4E pathways [51].

In the context of advanced cancers, tumor-associated macrophages (TAMs) are known to undermine immune surveillance by adopting immunosuppressive attributes that impede the antitumor efficacy of CD8+ T cells. These TAMs exhibit an upregulation of PD-L1, which interacts with PD-1 receptors on T cells, precipitating T cell exhaustion. Furthermore, they secrete IL-10, TGF-β, and arginase, as well as initiate collagen deposition and TME metabolic reprogramming, which collectively dampen the activity of CD8+ T cells [52]. This immunosuppressive environment propels the differentiation of CD4+ T cells into regulatory T cells (Tregs), thereby exacerbating the suppression of antitumor responses. Treg cells and TAMs are engaged in a reciprocal positive feedback mechanism that fosters an immunosuppressive phenotype [53]. This study revealed that the seven BM genes significantly correlate with M0 macrophages, M1 macrophages, and CD8+ T cells, indicating their role in the tumor immune microenvironment (Fig. 4D). In addition, cancer-associated fibroblasts (CAFs) and TAMs interact through the SDF-1/CXCR4 signaling axis, leading to the activation of PI3K/Akt and NF-κB pathways, thereby reinforcing a sophisticated network of immune evasion [54, 55]. CAFs are critical in modulating the biomechanical properties of the ECM, contributing to its stiffness and degradation. These cells actively engage with a diverse array of cell types within the TME, thereby orchestrating ECM remodeling to facilitate tumorigenesis [36]. Our analysis delineates a dichotomy in BM gene expression within the ESCC tumor microenvironment, pointing to the complexity of the immune response and to opportunities for therapeutic targeting. ESCC molecular subtypes, distinguished by differential BMRG expression, reveal subtype B as more aggressive, with enriched immune and stromal scores, highlighting the role of fibroblasts in tumorigenesis.

BGN is hypothesized to enhance the migratory and invasive properties of malignant cancer cells [56, 57]. Its modulation by the TGF-β signaling pathway, a vital driver of EMT, highlights the importance of this interplay [58]. Furthermore, BGN has emerged as a biomarker that marks the progressive transition from normal colonic mucosa to adenoma and ultimately to adenocarcinoma in colorectal cancer [59].

BGN's significant correlation with EMT pathway activation across cancers suggests its pivotal role in tumor progression, warranting further investigation into its specific function in ESCC (Fig. 6D, E). Although elevated BGN expression has been reported to correlate with poorer overall survival in ESCC, the underlying mechanisms remain to be elucidated [24]. In vitro experiments from our study corroborate these findings, showing that RNA interference-mediated BGN suppression in TE-1 ESCC cells significantly impeded cell proliferation, induced apoptosis, and attenuated cell migration and invasion, revealing a tumor-promoting role for BGN in ESCC pathophysiology.

Cancer represents a multifaceted systemic disorder characterized by the dynamic interplay between malignant cells, the ECM, and the diverse cellular constituents of the TME. The ECM's pivotal role in tumor progression marks it as a prime target for oncological therapies. Tyrosine kinase inhibitors (TKIs) have revolutionized the treatment of non-small cell lung cancer (NSCLC) [60] and chronic myeloid leukemia (CML) [61] by targeting oncogenic kinases such as mutant EGFR, partially via inhibition of discoidin domain receptor (DDR) signal transduction. There is an urgent need to identify tumor-specific ECM targets that can suppress cell proliferation, migration, and angiogenesis to curb tumor progression without harming healthy tissues. Intratumoral ECM has been proposed as an antigen for tumor vaccines and CAR therapy, with studies demonstrating that fibronectin EDA domain-targeted vaccines improved macrophage infiltration, inhibited angiogenesis, and reduced metastasis in a breast cancer mouse model [62, 63]. Despite these advances, challenges persist in ECM-targeted cancer therapies, notably the absence of systematic assessments of ECM variability across cancers, which are essential for tailored treatment strategies and sensitivity profiling. Furthermore, the scarcity of ECM-specific drug libraries may stem from inadequate in vitro and in vivo models [37]. Our stratification of ESCC patients by BGN expression levels identified a heightened sensitivity to standard chemotherapies in the high-expression cohort. TE-1 cells with intrinsically high BGN levels showed significantly reduced proliferation following treatment with docetaxel, paclitaxel, oxaliplatin and OSI-027, in accordance with the drug sensitivity analysis. These findings position BGN as a promising biomarker for personalized ESCC chemotherapy and highlight its potential role in guiding the selection of targeted therapeutics.

Conclusion

In conclusion, our research has identified seven characteristic genes and developed a diagnostic nomogram integrating these genetic markers with clinical metrics, offering a robust framework for the early detection of ESCC. These genes may also serve as prognostic indicators and therapeutic targets, facilitating personalized clinical strategies and medical decision-making for ESCC management. Given BGN's crucial involvement in the etiology and progression of ESCC, therapeutic targeting of this protein could confer a significant clinical benefit. In particular, for patients with high BGN expression, the adjunctive use of chemotherapeutic agents such as docetaxel, paclitaxel, and oxaliplatin, in tandem with current therapeutic strategies, may bolster the clinical efficacy of ESCC treatments.