Introduction

Long non-protein coding RNAs (lncRNAs) have been recognized as crucial cellular regulators influencing tissue homeostasis, along with playing significant roles in pathological conditions such as cancer initiation, growth, and metastasis1. These lncRNAs, typically with a transcript length of 200 nucleotides or longer, form a diverse group of regulatory RNAs capable of interacting with DNA, proteins, or other RNAs2. Their precise spatial and temporal regulation lends them versatility as regulators of cellular processes in various cellular compartments, presenting them as interesting targets for therapeutic interventions in pathological conditions3.

Keratinocyte-derived cutaneous squamous cell carcinoma (cSCC) is the most common metastatic skin cancer globally, with its incidence on the rise due to lifestyle shifts and aging populations4. The metastasis rate of primary cSCC is approximately 3–5%, contributing to at least 20% of skin cancer-related deaths5,6. Cumulative exposure to UV radiation is a primary risk factor for cSCC and underlies the high mutation rate of these tumors7. Mutations in the tumor protein 53 (TP53) gene occur early in keratinocyte carcinogenesis8,9,10, followed by subsequent genomic alterations in oncogenes like NOTCH1, HRAS, CDKN2A, and EGFR that further drive cSCC progression11,12. Since epidermal keratinocytes in chronically sun-exposed normal skin already carry several mutations in genes linked to cSCC progression, changes in the tumor microenvironment play a critical role in the initiation and progression of cSCC13,14.

While the role of lncRNAs in cSCC has been subject to extensive investigation, their specific impact on cSCC progression remains largely unexplored15. We have previously characterized and named three lncRNAs that are upregulated in cSCC and regulate its progression through different mechanisms: PICSAR (p38 inhibited cutaneous squamous cell carcinoma-associated lincRNA)16,17, PRECSIT (p53 regulated carcinoma-associated STAT3-activating long intergenic non-protein coding transcript)18, and SERLOC (super enhancer and ERK1/2-regulated long intergenic non-protein coding transcript overexpressed in carcinomas)19. Additionally, PVT1 was identified as an lncRNA that promotes cSCC progression by suppressing cellular senescence, inhibiting CDKN1A expression, and preventing cell cycle arrest20,21.

In this study, to elucidate the role of lncRNAs in cSCC, RNA sequencing (RNA-seq) was performed on patient-derived cSCC cell lines and normal human epidermal keratinocytes (NHEK). Correlation analysis of differentially expressed lncRNAs and protein-coding genes identified six clusters. Cluster 1 was upregulated in cSCC cells and contained genes related to cell motility. Further investigation of the lncRNAs in this cSCC cell motility cluster revealed that they were also upregulated in patient-derived head and neck SCC (HNSCC) cells. Notably, higher expression of HOTTIP and LINC00543 was associated with worse prognosis in the HNSCC and lung SCC (LUSCC) cohorts in the TCGA data, respectively. These findings uncover a novel set of lncRNAs involved in cSCC cell locomotion, which may serve as biomarkers and potential therapeutic targets in advanced cSCC.

Materials and methods

Ethical issues

The use of tumor-derived SCC cell lines and NHEKs, as well as the collection of cSCC tissues, was approved by the Ethics Committee of the Hospital District of Southwest Finland (187/2006, 138/2007). All participants provided written informed consent, and the study was conducted with the permission of Turku University Hospital, in accordance with the Declaration of Helsinki.

Cell culture

NHEKs (n = 4) were isolated from the skin of healthy individuals undergoing mammoplasty16. Additional keratinocyte cultures (NHEK-PC, NHEK (adult pooled), NHEK (adult single donor), HEK and NHEK) were from PromoCell (Heidelberg, Germany), while HEK and human oral keratinocytes (HOK) were acquired from ScienCell Research Laboratories (Carlsbad, CA, USA). Primary non-metastatic (UT-SCC12A, UT-SCC91, UT-SCC105, UT-SCC111, and UT-SCC118) and metastatic (UT-SCC7, UT-SCC59A, and UT-SCC115) cSCC cell lines were established from surgically removed cSCCs at Turku University Hospital22,23. The authenticity of cSCC cell lines was verified by short tandem repeat profiling as described previously22. HNSCC cell lines (n = 29) were established from surgically removed HNSCCs at Turku University Hospital24,25. The immortalized non-tumorigenic human keratinocyte–derived cell line (HaCaT) and three Ha-ras-transformed tumorigenic HaCaT cell lines (A5, II-4, and RT3) were kindly provided by Dr Norbert Fusenig (Deutsches Krebsforschungszentrum, Heidelberg, Germany). A5 cells form benign tumors, II-4 cells invasive tumors and RT3 cells metastatic tumors in nude mice in vivo26. Cells were cultured as described previously18. For growth factor treatment, cSCC cell lines (n = 2–8) were maintained in serum-free medium for 24 h and then treated with transforming growth factor-β1 (5 ng/mL; Sigma Aldrich, St Louis, MO) for 24 h.

Tissue samples

Normal human skin samples (n = 6) were obtained from the upper arm of healthy volunteers or during mammoplasty operation at Turku University Hospital. Primary cSCC samples (n = 6) were collected from surgically removed tumors in Turku University Hospital16.

RNA sequencing

RNA was isolated from cSCC cell lines (n = 8) and NHEKs (n = 4) using miRNeasy Mini Kit (Qiagen), and the RNA-seq analysis was performed using Illumina HiSeq2500 system using paired-end sequencing chemistry with 100 bp read length (Illumina, San Diego, CA) at the Finnish Functional Genomics Centre, University of Turku, Åbo Akademi University and Biocenter Finland. The number of reads was ~ 60 M or above for all samples (Supplementary Table S1). Fastq read files from the same sample were merged to obtain one fastq file pair per sample. Before the reads were pooled, the quality of all read files was inspected using FastQC software v0.12.1 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)27 to ensure consistent quality of the reads (Supplementary Fig. S1). The reads were aligned against the human GRCh38 (hg38) genome assembly with the Homo_sapiens.GRCh38.88.gtf transcript annotation file using STAR aligner, version 2.6.1 (http://code.google.com/p/rna-star/)28 (Supplementary Fig. S2, Supplementary Table S2; GeneVia Technologies, Tampere, Finland). Gene-level read counts were obtained simultaneously with the alignment process. Read count data were filtered to exclude genes with very low read counts (fewer than ten reads in total across all samples). The obtained read counts were normalized using the regularized log transformation function of the DESeq2 R package, v. 1.20.0 (https://bioconductor.org/packages/release/bioc/html/DESeq2.html)29, which transforms the count data to the log2 scale in a way that minimizes differences between samples for rows with small counts and also normalizes the data with respect to library size. Differential expression (DE) analysis was performed using DESeq2, v. 1.20.0, which takes raw read counts as input and performs internal normalization using the median of ratios method29. P-values were adjusted using the Benjamini–Hochberg multiple testing adjustment method30. Genes with absolute log2 fold change > 1 and adjusted p < 0.05 were considered significantly differentially expressed. Differentially expressed genes were annotated with HGNC gene symbols, gene description and gene biotype using biomaRt, v. 2.36.1 (http://www.bioconductor.org/biocLite.R)31. The results were split into differentially expressed protein-coding genes and lncRNAs based on the gene biotype annotations, using Ensembl biotype definitions as guidance. RNA sequencing data (accession number GSE252347) have been deposited in the public database GEO (Gene Expression Omnibus, NCBI; http://www.ncbi.nlm.nih.gov/geo/).
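The filtering and significance criteria described above can be illustrated with a short sketch. The study used DESeq2 in R; the Python code below is only an assumed, simplified stand-in that shows the low-count filter, the Benjamini–Hochberg adjustment and the fold change and adjusted p-value thresholds. The function names and the toy inputs are hypothetical and do not come from the study.

```python
def filter_low_counts(counts, min_total=10):
    """Keep genes whose summed read count across all samples is at least min_total.

    `counts` maps a gene name to a list of per-sample read counts (toy structure).
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(results, lfc_cutoff=1.0, alpha=0.05):
    """Select genes with |log2 fold change| > lfc_cutoff and adjusted p < alpha.

    `results` maps a gene name to a (log2_fold_change, raw_p_value) tuple.
    """
    genes = list(results)
    padj = benjamini_hochberg([results[g][1] for g in genes])
    return {
        g: (results[g][0], q)
        for g, q in zip(genes, padj)
        if abs(results[g][0]) > lfc_cutoff and q < alpha
    }
```

The sketch omits the DESeq2 statistical model itself (dispersion estimation and Wald testing), so it takes raw p-values and fold changes as given.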

Correlation and clustering analysis

A combined dataset of significantly differentially expressed lncRNAs and protein coding genes was generated and pairwise Pearson’s correlations of the expression profiles of each gene were computed (GeneVia Technologies). Correlation distances (1 − correlation coefficient) were used as input for k-means clustering with 6 centres, generating clusters of co-expressed genes, i.e. genes with similar expression profiles across all samples. The correlation distance matrix was subjected to classical multidimensional scaling (MDS), or principal coordinates analysis, to visualize the clustering result in a two-dimensional plot generated using ggplot2 1.0 (https://ggplot2.tidyverse.org/articles/ggplot2.html)32. Correlation densities within each of the clusters were visualized as histograms using basic R functions and custom R scripts (http://www.R-project.org/). Distributions of protein coding gene and lncRNA expression values in each cluster were examined using box plots generated using ggplot2 1.0 (https://ggplot2.tidyverse.org/articles/ggplot2.html)32. Gene expression values of all six clusters were also visualized as a heatmap using R package pheatmap33.
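As an illustration of the distance measure used for clustering, the following sketch computes pairwise Pearson correlation distances (1 − correlation coefficient) between expression profiles. It is a Python approximation of a step that was performed in R in the study; the function names and the dictionary layout are assumptions, and the k-means and MDS steps are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric profiles.

    Profiles with zero variance are not handled and would raise ZeroDivisionError.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distances(profiles):
    """Return {(gene1, gene2): 1 - Pearson r} for every ordered gene pair.

    `profiles` maps a gene name to its expression values across samples.
    """
    genes = list(profiles)
    return {
        (g1, g2): 1.0 - pearson(profiles[g1], profiles[g2])
        for g1 in genes
        for g2 in genes
    }
```

With this measure, perfectly co-expressed genes have a distance of 0 and perfectly anti-correlated genes a distance of 2, so genes with similar profiles across samples end up close together regardless of their absolute expression level.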

Functional analysis

Enrichment of Gene ontology (GO)34,35 terms in the genes in k-means clusters was studied using R package clusterProfiler, v.3.8.1 (http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html)36. The p-values of enrichment analysis were corrected for multiple testing using Benjamini–Hochberg multiple testing adjustment procedure30. For the cluster genes, the enrichment analysis was conducted using both lncRNAs and mRNAs.

Ingenuity Pathway Analysis (IPA) (QIAGEN, Germantown, MD; QIAGEN Inc., https://www.QIAGENbioinformatics.com/products)37 was performed separately for the k-means gene cluster including both lncRNAs and mRNAs (GeneVia Technologies). As before, genes with absolute log2 fold change > 1 and adjusted p < 0.05 were considered significantly differentially expressed and were included in the analyses. P-values were corrected for multiple testing using Benjamini–Hochberg method30. IPA also calculated a z-score for each pathway, indicating predicted pathway activation if the z-score > 2 or inhibition if z-score < -2.

DAVID Bioinformatics Resources (https://david.ncifcrf.gov/tools.jsp) was used to study the genes in k-means clusters38,39. It calculates the probability that particular GO annotations are overrepresented in a given gene list using a Fisher exact probability test. Molecular function and biological process GO terms with a p value < 0.05 containing at least three genes were considered significant.
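The overrepresentation test described above can be sketched as a one-sided Fisher exact (hypergeometric) calculation. This is a generic textbook version written in Python for illustration; the exact variant implemented by DAVID may differ, and the function name and arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, list_size, term_size, background_size):
    """One-sided Fisher exact p-value for overrepresentation of a GO term.

    hits            -- genes in the gene list annotated with the term
    list_size       -- total genes in the gene list
    term_size       -- genes in the background annotated with the term
    background_size -- total genes in the background
    Returns P(X >= hits) when list_size genes are drawn without replacement.
    """
    max_hits = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total
```

For example, with a background of 20 genes of which 5 carry a term, drawing a list of 5 genes that contains 3 annotated genes gives a p-value of 1126/15504, or about 0.073.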

Transcription factor annotation

Experimentally validated transcription factors (TFs) from Transcriptional Regulatory Element Database (TRED)40 via RegNetwork41, Encyclopedia of DNA Elements (ENCODE) and ChIP-X Enrichment Analysis (CHEA) database42 were collected (GeneVia Technologies). The data from the three sources were combined and redundant entries removed to generate a list of unique TF-target gene (TG) pairs consisting of 376 TFs and 23,430 TF targets. For each TG, a list of potential TFs was then extracted, and the list was used for annotating DE genes with their potential upstream TFs. The enriched TFs in Cluster 1 were visualized using https://www.wordclouds.com.
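The combination of TF-target pairs and the annotation of DE genes can be sketched as follows. The code is a minimal Python illustration under the assumption that each source is available as a collection of (TF, target gene) tuples; the function names are hypothetical and the actual database formats are not reproduced.

```python
from collections import defaultdict


def merge_tf_target_pairs(*sources):
    """Combine (TF, target) pairs from several sources and drop duplicates."""
    unique_pairs = set()
    for source in sources:
        unique_pairs.update(source)
    return unique_pairs


def tfs_by_target(pairs):
    """Index the unique pairs as target gene -> set of potential upstream TFs."""
    index = defaultdict(set)
    for tf, target in pairs:
        index[target].add(tf)
    return index


def annotate_genes(de_genes, index):
    """Attach a sorted list of potential upstream TFs to each DE gene.

    Genes without any known TF receive an empty list.
    """
    return {gene: sorted(index.get(gene, ())) for gene in de_genes}
```

Counting how often each TF appears across the annotated genes of a cluster would then give the frequencies needed for a word cloud style summary.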

Real-time quantitative PCR

Total RNA was extracted from cultured NHEKs and cSCC cells using RNeasy mini kit (Qiagen, Germantown, MD, USA), and 1 µg of total RNA was reverse transcribed into cDNA with random hexamers and M-MLV Reverse Transcriptase H Minus (both from Promega, Madison, WI, USA) for real-time quantitative reverse transcriptase-PCR (qRT-PCR) analysis. Primers and probes for LINC01361 (Hs03084701_cn, Cat. No. 4400291), LINC01558 (Hs00205026_m1, Cat. No. 4351372), LINC01460 (Hs03405179_cn, Cat. No. 4400291), LINC00543 (Hs00905601_s1, Cat. No. 4426961), LINC00702 (Hs05184596_cn, Cat. No. 4400291) and HOTTIP (Hs04965481_cn, Cat. No. 4400291) were purchased from Thermo Fisher Scientific (Waltham, MA, USA). Primers and probes for MMP13 and β-actin (ACTB) were designed as previously described25. qRT-PCR reactions were performed using the QuantStudio 12 K Flex (Thermo Fisher Scientific) at the Finnish Functional Genomics Centre in Turku, Finland. qRT-PCR amplification was executed with the following protocol: hold stage for 2 min at 50 °C, 10 min at 95 °C, and PCR stage for 40 cycles of 0.15 min at 95 °C and 1 min at 60 °C. Samples were analyzed using the standard curve method in 2–3 parallel reactions with threshold cycle values < 5% of the mean threshold cycle.
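The standard curve method can be illustrated with a small calculation. The sketch below fits a line of threshold cycle (Ct) against log10 of the template quantity, converts sample Ct values back to quantities, and normalizes a target to a reference gene such as ACTB. All function names are hypothetical, and the replicate check reflects one possible reading of the 5% criterion (each replicate deviating from the mean Ct by less than 5% of that mean), not a confirmed description of the procedure used.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the template quantity for a Ct."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_ct, target_curve, reference_ct, reference_curve):
    """Target quantity divided by reference gene quantity, each from its own curve."""
    target_qty = quantity_from_ct(target_ct, *target_curve)
    reference_qty = quantity_from_ct(reference_ct, *reference_curve)
    return target_qty / reference_qty


def replicates_consistent(ct_replicates, tolerance=0.05):
    """True if every replicate Ct lies within tolerance * mean Ct of the mean (assumed criterion)."""
    mean_ct = sum(ct_replicates) / len(ct_replicates)
    return all(abs(ct - mean_ct) < tolerance * mean_ct for ct in ct_replicates)
```

A slope close to −3.32 corresponds to a doubling of product in every cycle, which is why dilution series with tenfold steps are typically spaced about 3.3 cycles apart.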

GEO dataset analyses

Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) was utilized to investigate the expression of lncRNAs in cSCCs in vivo. The gene expression profile dataset (GSE111582)8 of recessive dystrophic epidermolysis bullosa-associated cSCC (RDEBSCC) was downloaded from the publicly available GEO database and used to analyze the expression of LINC00702, LINC01361 and, as a control gene, MMP13 in RDEBSCC (n = 8) and non-SCC (n = 10) samples. A second dataset (GSE45216)43 included samples from immunosuppressed patients (cSCC, n = 21; AK, n = 7) and from patients with sporadic cSCC (n = 9) and AK (n = 3). The GEO2R online tool was employed to analyze the gene expression profiles.

Gene expression profiling interactive analysis

The online Gene Expression Profiling Interactive Analysis (GEPIA; http://gepia.cancer-pku.cn/) analysis tool was utilized to analyze the expression of HOTTIP and LINC00543 in SCCs in the database and the relationship between HOTTIP and LINC00543 expression and prognosis of HNSCC, LUSCC, and esophageal carcinoma (ESCA) in The Cancer Genome Atlas (TCGA) data44,45.

Statistical analysis

The R package Limma was used for statistical analysis of RNA-seq data. Statistical analyses of qRT-PCR results were performed using SPSS Statistics software for Windows, version 27 (IBM, Armonk, NY). To determine the significance of differences between two sample groups, the Mann–Whitney U-test was used. TCGA data were analyzed using the GEPIA tool46.
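For readers unfamiliar with the Mann–Whitney U-test, the following sketch shows how the U statistic and an exact two-sided p-value can be obtained for small samples. The study used SPSS; this Python version is an independent illustration that enumerates all possible group assignments, which is only practical for small group sizes. The function names are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, exact two-sided permutation p-value)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    offset = n_a * (n_a + 1) / 2
    u_a = sum(ranks[:n_a]) - offset
    expected = n_a * n_b / 2
    observed = abs(u_a - expected)
    extreme = 0
    total = 0
    # Enumerate every way of choosing which pooled observations form group A.
    for subset in combinations(range(n_a + n_b), n_a):
        u = sum(ranks[i] for i in subset) - offset
        total += 1
        if abs(u - expected) >= observed - 1e-9:
            extreme += 1
    return u_a, extreme / total
```

With two groups of three completely separated values, U is 0 and the exact two-sided p-value is 2/20 = 0.1, which shows why very small groups cannot reach p < 0.05 with this test.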

Declaration of AI use

ChatGPT-4 (https://chat.chatbotapp.ai/) was used to revise the language of some paragraphs during the final round of editing the revised manuscript.

Results

The expression profiles of lncRNAs and protein-coding genes in cSCC cells and NHEKs

To determine differentially expressed lncRNAs and protein-coding genes, RNA-seq was performed on primary non-metastatic (n = 5) and metastatic (n = 3) cSCC cell lines, as well as NHEKs. Genes were considered significantly differentially expressed if the adjusted p-value (FDR) was less than 0.05 and the absolute log2 fold change in expression level was greater than 1. Genes in two categories—lncRNAs or protein-coding genes—were further characterized. The number of differentially expressed lncRNA genes was 723, while the number of differentially expressed protein-coding genes was 1667. To illustrate the distribution of differentially expressed lncRNAs (Supplementary Fig. S3A) and protein coding genes (Supplementary Fig. S3B), volcano plots were constructed. The top 50 most significantly differentially expressed lncRNAs and protein coding genes are presented in a heatmap (Fig. 1A). As previously shown in cSCC, LINC00520 was among the top downregulated lncRNAs47 (Fig. 1A) and PRECSIT (LINC00346)18 was among the top upregulated lncRNAs (Fig. 1A). Additionally, AIM2, an inflammasome component48, was among the top upregulated mRNAs (Fig. 1A).

Figure 1

Profile of the differentially expressed lncRNAs and mRNAs in cSCC cells compared to NHEKs. RNA-seq was performed on patient-derived cSCC cell lines (n = 8) and normal human epidermal keratinocytes (NHEK, n = 4). (A) Heatmaps and the hierarchical clustering of the top differentially expressed lncRNAs and mRNAs in cSCC cells compared to NHEKs. (B) The number of differentially expressed lncRNAs and mRNAs based on fold changes (FC log2). (C) Biotype classification of upregulated and downregulated lncRNAs. All biotypes identified in the data are shown.

Regarding lncRNA expression with more than an eightfold change (log2), there were 11 upregulated genes, 5 of which were novel transcripts (Fig. 1B, Supplementary Table S3). One antisense lncRNA, ZNF667-AS1 (ZNF667 antisense RNA 1) was > 10 FC (log2) downregulated (Fig. 1B, Supplementary Table S3). LINC01361, HOTTIP, and LINC01558 were among the upregulated lncRNAs (FC (log2) > 5 and > 4, respectively) (Fig. 1B, Supplementary Table S3). The majority of the differentially expressed genes belonged to the 2–3 FC (log2) group, including LINC00543, LINC00702 and LINC01460 (Fig. 1B, Supplementary Table S3). Similarly, for differentially expressed protein coding genes, 5 upregulated genes (e.g. MMP13) and 4 downregulated genes were identified with > 10 FC (log2), and the group with most differentially expressed protein coding genes exhibited a 2–3 FC (log2) (Fig. 1B, Supplementary Table S4).

Classification of differentially expressed lncRNA biotypes

Altogether 568 lncRNAs were significantly upregulated (adjusted p-value < 0.05, FC (log2) > 1) in cSCC cells. The majority of these were either antisense RNAs (48%) or lincRNAs (41%) (Fig. 1C). In contrast, 155 lncRNAs were significantly downregulated (adjusted p-value < 0.05, FC (log2) < −1) in cSCC cells, with the majority being antisense RNAs (55%) (Fig. 1C). Additionally, 35% of the significantly downregulated lncRNAs were lincRNAs (Fig. 1C).

Co-expression networks of the lncRNAs and protein-coding genes

The differentially expressed protein-coding genes and lncRNAs (adjusted p-value < 0.05, absolute FC (log2) > 1) were combined and pairwise Pearson’s correlation coefficients were computed for their expression values across the samples (Fig. 2). k-means clustering, using six centers, was performed based on the correlation distances. The clustering of the genes is visualized as an MDS plot with different colors representing the different clusters (Fig. 2A). Pairwise Pearson’s correlations of the expression values across the samples within each cluster were calculated to determine the expression profile similarity among the cluster genes, and the correlation distributions were visualized using histograms (Fig. 2B). Average correlations for the clusters were also calculated (Supplementary Table S5). The within-cluster correlations showed that Cluster 6 had the highest average correlation and the narrowest distribution of correlation values (Supplementary Table S5). In contrast, Cluster 5 exhibited a bimodal distribution of correlation values. Additionally, in the MDS plot, the genes in Cluster 5 did not form a clearly defined group, suggesting that these genes had variable expression profiles that lacked clear correlation with the expression profiles of the other genes (Fig. 2A,B, Supplementary Table S5).

Figure 2

Construction of the co-expression network of lncRNAs and mRNAs. Differentially expressed lncRNAs and mRNAs (adjusted p-value < 0.05, absolute FC (log2) > 1) in cSCC cells compared to NHEKs were joined and pairwise Pearson’s correlation coefficients were computed using their expression values. k-means clustering with six centers was performed based on the correlation distances. (A) Visualization of gene clustering in a multidimensional scaling (MDS) plot. Genes are represented as dots with their principal coordinates based on pairwise correlation distances colored by clusters. (B) Calculation of pairwise Pearson’s correlations within each cluster to assess expression profile similarities among the clustered genes. Correlation distributions were visualized using histograms. (C) Visualization of gene expression profiles in a heatmap displaying expression values of all clustered genes across the samples. (D) Box plot presentation illustrating the distribution of expression values in cSCC cell lines (n = 8) and NHEKs (normal, n = 4) for lncRNAs and mRNAs within one cluster.

Genes in Clusters 1 and 2 are upregulated in cSCC cells

The gene expression profiles of the clusters were visualized by generating a heatmap of the expression values of all clustered genes across the samples (Fig. 2C). Genes in Clusters 1 and 2 were upregulated in cSCC cells, whereas genes in Clusters 3, 4, 5 and 6 were downregulated compared to NHEKs (Fig. 2C). To further inspect the expression values of the cluster genes, box plot visualizations were generated. These box plots display the expression value distributions in cSCC cells and NHEKs separately for the protein-coding genes and lncRNAs within the same cluster (Fig. 2D).

Genes in Cluster 1 are involved in cSCC cell motility

The upregulated Cluster 1 (Fig. 3) was studied in more detail. Pathway analysis of the genes associated with Cluster 1 revealed significant upregulation of the IPA biofunction categories Cell movement, Migration of cells and Invasion of cells (Fig. 3A). This association was also supported by GO analysis. GO terms related to Biological process associated with Cluster 1 genes included Cell migration and Locomotion (Fig. 3B). GO terms related to Cellular component associated with Cluster 1 genes included Cell periphery and Plasma membrane (Fig. 3C). DAVID analysis of Cluster 1 genes showed the association of the protein coding genes with terms such as Endopeptidase activity, Cell migration and Extracellular matrix disassembly (Supplementary Table S6). A network of the Cluster 1 genes related to the term Extracellular space (Supplementary Fig. S4) was created by integrating protein–protein functional interaction data from the STRING database (https://string-db.org/, August 16, 2023)49. Interestingly, the previously known cSCC cell invasion related matrix metalloproteinase, MMP-13, was associated with this term (Supplementary Fig. S4)50.

Figure 3

Genes within Cluster 1 play a role in cSCC cell motility. Summary of (A) significantly regulated IPA biofunctions, (B) the Gene ontology (GO) Biological process and (C) GO Cellular component associated with genes in Cluster 1. (D) Biotype classification of lncRNAs within Cluster 1 displaying all biotypes discovered in the data. Heatmap visualization of the top upregulated (E) lncRNAs and (F) mRNAs in cSCC cell lines (n = 8) compared to NHEKs (n = 4).

Classification of differentially expressed lncRNA biotypes in Cluster 1

LncRNA biotypes in Cluster 1 were studied in more detail. Altogether, there were 208 lncRNAs in Cluster 1, the majority of which were antisense RNAs or lincRNAs (45% and 44%, respectively) (Fig. 3D). Among the top regulated lncRNAs in Cluster 1, there were several novel transcripts and previously uncharacterized lincRNAs, as well as some previously characterized ones, such as HOTTIP and LINC00491, which have been shown to be involved in cancer cell invasion51,52 (Fig. 3E) (Supplementary Table S7). Interestingly, Cluster 1 included novel transcripts such as LINC01361, LINC01460, and LINC01558, as well as RNA transcripts LINC00543 and LINC00702, which have not previously been identified in SCCs (Supplementary Table S7). The RNA-seq expression of these selected Cluster 1 lncRNAs, both novel and not previously characterized in cSCC, is shown as a heatmap for cSCC cells and NHEKs (Supplementary Fig. S5). Notably, the cSCC invasion-related MMP13 and complement factor I (CFI) were among the top upregulated protein-coding genes in Cluster 1 (Fig. 3F) (Supplementary Table S7).

Cluster 1 genes were also annotated with potential upstream regulator transcription factors (TFs) (Supplementary Fig. S6, Supplementary Table S8). Only experimentally validated TFs were included in the annotations. Myc was shown to be an upstream regulator for 377 genes and EZH2 for 368 genes (Supplementary Fig. S6). These TFs were shown to activate the transcription of genes coding for molecules potentially related to cancer progression, such as Myc regulating CFI and EZH2 regulating MMP13 and CFI (Supplementary Table S8).

Validation of differentially expressed lncRNAs in cSCC

The expression of selected lncRNAs in cSCC cells was verified using qRT-PCR (Fig. 4A). HOTTIP, LINC00543, LINC00702, LINC01361, LINC01460, and LINC01558 were significantly upregulated in cSCC cells compared to NHEKs (Fig. 4A). As a control, the mRNA levels of MMP13 were also observed to be upregulated in cSCC cells (Fig. 4A).

Figure 4

Validation of differentially expressed lncRNAs in cSCC and HNSCC cell lines. The RNA expression levels of LINC00543, LINC00702, LINC01361, LINC01460, LINC01558, HOTTIP and MMP13 were determined using qRT-PCR in cSCC (A) and HNSCC (B) cells and NHEKs (A, n = 10; B, n = 11). The levels of β-actin (ACTB) mRNA were used as a reference gene. Statistical significance was determined by the Mann–Whitney U-test; *p < 0.05; **p < 0.01, ***p < 0.001.

The expression of lncRNAs in HNSCC cell lines

The expression of these selected lncRNAs was also investigated in patient-derived HNSCC cells using qRT-PCR (Fig. 4B). Significant overexpression of LINC00543, LINC01361, LINC01460, LINC01558 and HOTTIP was detected in HNSCC cells compared to normal keratinocytes (Fig. 4B). Additionally, MMP13 was significantly upregulated in HNSCC cells, as previously shown24,25 (Fig. 4B).

The expression of LINC00702, LINC01558 and HOTTIP in Ha-ras-transformed tumorigenic HaCaT cells

To further elucidate the role of the invasion cluster related lncRNAs during the progression of cSCC, the expression of LINC00702, LINC01558, HOTTIP and MMP13 was determined in an immortalized non-tumorigenic keratinocyte–derived cell line (HaCaT) lacking functional p53, as well as in three Ha-ras-transformed tumorigenic HaCaT-derived cell lines with varying levels of tumorigenicity in vivo (A5, II-4, and RT3)26. Other selected lncRNAs were not expressed in this in vitro model of cSCC progression. LINC00702 was expressed only in RT3 cells, the most aggressive ras-transformed tumorigenic HaCaT-derived cell line (Fig. 5A). The expression of LINC01558 was low in HaCaT and A5 cells, whereas markedly higher levels were observed in II-4 and RT3 cells (Fig. 5A). The expression of HOTTIP and MMP13 was increased in A5 compared to HaCaT cells, but was low in II-4 and RT3 cells (Fig. 5A).

Figure 5

The regulation of lncRNA expression in Ha-ras-transformed HaCaT and cSCC cells. (A) Evaluation of LINC00702 (n = 2–3), LINC01558 (n = 3), HOTTIP (n = 2–3) and MMP13 (n = 3) expression levels in HaCaT cells and tumorigenic Ha-ras-transformed HaCaT cell lines (A5, II-4, and RT3) with qRT-PCR normalized to β-actin (ACTB) mRNA levels. (B) Cutaneous SCC cell lines (n = 7–8) were treated with transforming growth factor-β1 (TGF-β1; 5 ng/mL) for 24 h. Subsequently, LINC00543 and LINC00702 levels were determined using qRT-PCR and normalized to β-actin (ACTB) mRNA levels. *p < 0.05, Mann–Whitney U-test.

Regulation of lncRNAs LINC00543 and LINC00702 by TGF-β in cSCC cells

Transforming growth factor-β (TGF-β) signaling has been recognized as a regulator of cSCC cell invasion53,54,55,56. Therefore, the effect of TGF-β on the regulation of cell motility-related lncRNAs was examined. Cutaneous SCC cell lines were treated with TGF-β, and the regulation of the lncRNAs was assessed using qRT-PCR. Treatment with TGF-β resulted in the upregulation of LINC00702 and a significant downregulation of LINC00543 in most SCC cell lines (Fig. 5B). In contrast, LINC01361 and LINC01558 were downregulated, while LINC01460 was upregulated in cSCC cells following TGF-β treatment (Supplementary Fig. S7).

The expression of lncRNAs in cSCC tumors in vivo

Overexpression of LINC00543 was detected in cSCC tumor samples in vivo compared to normal skin using qRT-PCR (Fig. 6A). Additionally, qRT-PCR analysis indicated an upregulation of the expression of the protein-coding gene MMP13 in cSCC tumors in vivo (Fig. 6B)57. Furthermore, by analyzing the GSE111582 dataset8 we observed that LINC00702, LINC01361 and MMP13 were upregulated in recessive dystrophic epidermolysis bullosa-associated cSCC (RDEBSCC), an aggressive form of cSCC, compared to non-tumor samples (Supplementary Fig. S8). Using another dataset (GSE45216)43 we detected a slight increase in the expression of LINC01361 in immunocompetent patients with moderately and poorly differentiated cSCC compared to actinic keratoses (AKs), premalignant epidermal lesions (Supplementary Table S9). In cSCCs from immunosuppressed patients, we observed a slight increase in LINC00702 expression when all cSCCs were compared to AKs. The expression of LINC00702 was further upregulated when moderately and poorly differentiated cSCCs were specifically compared to AKs (Supplementary Table S9).

Figure 6

LINC00543 and HOTTIP are linked to a poor prognosis in SCC. (A, B) The expression of (A) LINC00543 and (B) MMP13 was determined in primary non-metastatic (non-met) and metastatic (met) cSCC tumor samples (n = 6) in vivo compared to normal skin (n = 4–5) using qRT-PCR. Statistical significance (*p < 0.05) was determined using the Mann–Whitney U-test. (C) Analysis of The Cancer Genome Atlas (TCGA) data was conducted using the GEPIA tool to investigate the expression of LINC00543, HOTTIP and MMP13 in squamous cell carcinomas within TCGA database (head and neck SCC (HNSCC), esophageal carcinoma (ESCA) and lung SCC (LUSCC)). (D) Evaluation of the impact of LINC00543 and HOTTIP on the overall survival of SCC patients was performed using the GEPIA tool with available TCGA data.

The expression of LINC00543 and HOTTIP in other SCCs is associated with a poor prognosis

Analysis of data from The Cancer Genome Atlas (TCGA) was conducted to study the expression of lncRNAs and MMP13 in SCCs within the TCGA database, including HNSCC, ESCA and LUSCC44,45. Cluster 1-related LINC00543, HOTTIP and MMP13 were upregulated in the investigated SCCs (Fig. 6C). Furthermore, the impact of lncRNA gene expression on the overall survival of SCC patients was assessed using available TCGA data44,45. Elevated expression of LINC00543 and HOTTIP was found to correlate with a poor prognosis, resulting in shorter overall survival in LUSCC and HNSCC, respectively (Fig. 6D).

Discussion

The prognosis for metastatic cSCC is poor, with no established biomarkers to predict metastatic risk or specific therapeutic targets for advanced or metastatic cSCCs. Therefore, it is important to characterize cSCC progression at the molecular level to uncover dysregulated pathways and to identify key drivers of the disease. While lncRNAs are being extensively studied in cSCC, their precise role in the progression of cSCC remains largely unknown15. Several recent studies have identified lncRNAs, such as MALAT1, PVT1 and LINC00319, that play roles in cSCC progression20,21,58,59. Additionally, our previous studies have highlighted the role of lncRNAs PICSAR, PRECSIT and SERLOC in cSCC16,17,18,19.

In this study, deep RNA-seq analysis of patient-derived cSCC cells and NHEKs was conducted to discover novel lincRNAs involved in cSCC progression. Among the identified lncRNAs and mRNAs that were most significantly downregulated, some were previously associated with cSCC progression. For instance, LINC00520, known for its inhibitory effects on cSCC progression, was among the top downregulated lncRNAs47. Conversely, the upregulated lncRNA PRECSIT (LINC00346) was previously shown to promote cSCC progression18. Additionally, the upregulation of AIM2, a protein-coding gene encoding an inflammasome component associated with promoting cSCC growth and invasion, further supported the findings of this RNA-seq analysis in accordance with previous research on cSCC cells48.

To gain additional insight into cSCC progression, we utilized the deep RNA-seq data for integrative analysis of lncRNAs and protein-coding genes to predict their potential roles in cSCC progression. Integrative analysis is a commonly employed approach to predict the functions of lncRNAs in various biological processes60. Previously a potential reference set of lncRNAs, mRNAs and circular RNAs in cSCC was identified using whole transcriptome sequencing61. Furthermore, microarray analysis of cSCC precursor AK samples has revealed a lncRNA implicated in JAK-STAT3 signaling pathway in AK62. However, to our knowledge no integration analysis based on whole transcriptome sequencing of lncRNAs and protein coding genes has been conducted for genes regulated in cSCC samples. The integration analysis revealed six gene clusters, two of which were upregulated. We further investigated one of the upregulated clusters, specifically Cluster 1. Bioinformatics analysis of gene enrichment revealed that the genes in Cluster 1 were significantly enriched in terms and pathways related to cell motility. Therefore, we assigned the name “cSCC motility cluster” to Cluster 1.

We focused on cSCC motility cluster-related lincRNAs, which were the second largest biotype of lncRNAs after antisense RNAs. Previous studies in other cancers have also highlighted that the two predominant categories of lncRNAs are antisense RNAs and lincRNAs63. The top regulated lncRNAs in Cluster 1 included several novel transcripts and previously uncharacterized lincRNAs, along with some previously characterized cancer cell motility-related lncRNAs such as HOTTIP51 and LINC0049152. HOTTIP has been shown to regulate the invasion of prostate cancer cells64. Additionally, LINC00491 has been shown to regulate the migration and invasion of colon adenocarcinoma cells65. Furthermore, the matrix metalloproteinase MMP-13, previously linked to cSCC cell invasion, and the complement inhibitor complement factor I (CFI), a serine protease, were found in this cluster of genes, emphasizing its functional significance50,66. Myc and EZH2 were identified as the most frequent TFs in the cell motility cluster. These findings are in accordance with previous studies showing that Myc can induce poor differentiation of cSCC, suggesting its role in the development of a more aggressive tumor phenotype67. EZH2 has been observed to enhance the progression of cSCC68. Interestingly, EZH2 has also been shown to interact with Myc and the coactivator p300 via a cryptic transactivation domain (TAD), thereby inducing gene activation and oncogenesis69.

We further investigated the expression of selected lncRNAs and one mRNA, MMP13, in patient-derived cSCC and HNSCC cells. The upregulation of previously uncharacterized lncRNAs, such as LINC01361, LINC01460 and LINC01558, as well as the already characterized HOTTIP, LINC00543 and LINC00702, was confirmed in cSCC cells compared to NHEKs using qRT-PCR. These lncRNAs, except LINC00702, were also upregulated in HNSCC cells. Additionally, the expression of LINC00702, LINC01558 and HOTTIP in an in vitro model of cSCC progression confirmed their upregulation during epidermal keratinocyte carcinogenesis. Notably, HOTTIP has been demonstrated to regulate cancer progression by promoting the invasion of prostate cancer cells, while LINC00543 promotes the metastasis of colorectal cancer70. On the other hand, LINC00702 has been previously reported to be downregulated in gastric and bladder cancer71,72 but to promote the pathogenesis of meningioma73.

The TGF-β signaling pathway is involved in the invasion of SCC cells53,54,55. A decrease in TGF-β signaling has been linked to an increased invasion depth of cSCC56. In this study, we demonstrate that LINC00543 and LINC00702 are regulated by TGF-β. As TGF-β appears to play an important role in cell motility, our results support the notion that these lncRNAs are part of the cell motility cluster.

The expression of selected lncRNAs was also assessed in cSCC tumor samples in vivo using qRT-PCR. LINC00543 was found to be upregulated in cSCC tumor samples in vivo compared to normal skin. Moreover, qRT-PCR analysis demonstrated a significant upregulation of MMP13 in cSCC compared to normal skin. Analysis of data from the GEO database revealed that the expression of LINC00702 and LINC01361 was elevated in a more aggressive form of cSCC, RDEBSCC, compared to non-SCC samples. In immunosuppressed patients with cSCC, LINC00702 was upregulated in moderately and poorly differentiated cSCCs compared to AK samples in vivo. These findings confirm the role of these lncRNAs as markers for aggressive and advanced disease. Additionally, TCGA data were used to investigate the expression of lncRNAs and MMP13 in other SCCs. LINC00543, HOTTIP and MMP13 were found to be upregulated in HNSCC, ESCA, and LUSCC. Furthermore, higher expression levels of LINC00543 and HOTTIP were associated with shorter overall survival in LUSCC and HNSCC, respectively. These results suggest that upregulation of LINC00543 and HOTTIP may serve as an indicator of poor prognosis in SCC patients. Further studies with a larger cSCC sample set are required to validate these results, as cSCC expression data are not included in the TCGA database.
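The association between expression level and overall survival discussed above is typically examined by splitting patients into high- and low-expression groups and comparing their survival curves. The following is a minimal sketch of that general approach, assuming a median split and a plain Kaplan-Meier estimator; it is not the analysis performed on the TCGA data in this study, and the function names and all patient values are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival_probability)] at each event time.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_by_expression(expression, times, events):
    """Split patients at the median expression and estimate a curve per group."""
    cutoff = median(expression)
    groups = {"high": ([], []), "low": ([], [])}
    for value, t, e in zip(expression, times, events):
        label = "high" if value > cutoff else "low"
        groups[label][0].append(t)
        groups[label][1].append(e)
    return {label: kaplan_meier(ts, es) for label, (ts, es) in groups.items()}


# Hypothetical toy cohort: expression level, follow-up in months, death observed.
expression = [9.1, 8.4, 7.9, 2.2, 1.8, 3.0]
times = [3, 5, 8, 10, 14, 20]
events = [1, 1, 1, 1, 0, 0]

curves = survival_by_expression(expression, times, events)
for label, curve in curves.items():
    print(label, curve)
```

In this toy cohort the high-expression group reaches zero estimated survival by month 8, whereas the low-expression group stays at two thirds. A real analysis would additionally test the difference between the curves, for example with a log-rank test, which is omitted here for brevity.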

In summary, we employed deep RNA-seq data for the integrative analysis of lncRNAs and mRNAs to predict their potential functions in cSCC progression. Based on these results, a set of lncRNAs was identified as belonging to the cSCC cell motility cluster. The investigated lncRNAs were observed to be upregulated in patient-derived cSCC and HNSCC cell lines. Additionally, LINC00543 and HOTTIP were found to be overexpressed in tumor samples in vivo, and higher expression of LINC00543 and HOTTIP was associated with a poor prognosis for patients with LUSCC or HNSCC. The findings of this study indicate that the cell motility cluster-related lncRNAs HOTTIP, LINC00543, LINC00702, LINC01361, LINC01460 and LINC01558 may serve as potential biomarkers and therapeutic targets in advanced and metastatic cSCC.