Characterization of two transcriptomic subtypes of marker-null large cell carcinoma of the lung suggests different origin and potential new therapeutic perspectives

Pulmonary large cell carcinoma (LCC) is an undifferentiated neoplasm lacking morphological, histochemical, and immunohistochemical features of small cell lung cancer, adenocarcinoma (ADC), or squamous cell carcinoma (SCC). The available molecular information on this rare disease is limited. This study aimed to provide an integrated molecular overview of 16 cases by evaluating the mutational status of 409 genes and the transcriptomic profiles of 20,815 genes. Our data showed that TP53 was the most frequently inactivated gene (15/16; 93.7%), followed by RB1 (5/16; 31.3%) and KEAP1 (4/16; 25%), while CRKL and MYB were each amplified in 4/16 (25%) cases and MYC in 3/16 (18.8%) cases. Transcriptomic analysis identified two molecular subtypes, a Pure-LCC and an adenocarcinoma-like LCC (ADLike-LCC), characterized by different activated pathways and cells of origin. In the Pure-LCC group, POU2F3 and FOXI1 were distinctive overexpressed markers, and this subtype was associated with a tuft cell-like profile and with enrichment of a replication stress signature, particularly involving ATR. In contrast, ADLike-LCCs were characterized by an alveolar-cell transcriptomic profile and by association with an AIM2 inflammasome complex signature. In conclusion, our study splits histologically marker-null LCC into two transcriptomic entities, with POU2F3, FOXI1, and AIM2 as differentially expressed markers that might be probed by immunohistochemistry for the differential diagnosis between Pure-LCC and ADLike-LCC. Finally, the signatures linked to replication stress in Pure-LCC and to the inflammasome complex in ADLike-LCC could be useful for designing new therapeutic approaches for these subtypes.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00428-023-03721-4.


Introduction
In the last decade, the combination of pathologic, genomic, and clinical advances has led to the reclassification of large cell carcinomas (LCC) of the lung into more specific pathologic entities [16]. Accordingly, the 2021 World Health Organization (WHO) classification defines pulmonary LCC as a rare undifferentiated carcinoma that lacks the cytological, architectural, immunohistochemical, and histochemical features of small cell lung cancer, adenocarcinoma (ADC), or squamous cell carcinoma (SCC) [17]. Specifically, a lung cancer with large cell morphology that expresses immunohistochemical markers of pneumocytes, such as thyroid transcription factor 1 (TTF-1) and NapsinA, is considered ADC. Conversely, if squamous markers such as p40, CK5/6, or p63 are expressed, the tumour is defined as SCC. If it is positive for the neuroendocrine markers synaptophysin and chromogranin, it is considered large cell neuroendocrine carcinoma (LCNEC). Therefore, LCC is a diagnosis of exclusion in a surgically resected non-small cell carcinoma (NSCC) that lacks expression of the aforementioned immunohistochemical markers and is negative for mucin stains [1,17].
Michele Simbolo and Giovanni Centonze shared first authorship; Aldo Scarpa and Massimo Milione shared last authorship.
Identification of molecular drivers and potential therapeutic targets in LCC would result in clinically meaningful adjustments in disease management. However, the molecular characterization of these tumours remains challenging because of their rarity. The available information on genomic alterations comes from three studies that used different targeted next-generation sequencing gene panels on 12 (26 genes analysed) [4], 25 (166 genes analysed) [1], and 7 (425 genes analysed) [8] cases, all of which identified TP53 as the most frequently mutated gene. Furthermore, only one gene expression analysis has been carried out, on 12 cases, suggesting the presence of two molecular profiles: one linked to mitogenic processes and the other similar to that of ADC [5].
The present study aimed to gather further information on this rare disease entity by providing an integrated molecular overview of 16 cases of LCC based on the mutational status of 409 genes and the transcriptomic profiles of 20,815 genes.

Cases
The clinical databases of three Italian hospitals (Fondazione IRCCS Istituto Nazionale dei Tumori, Milan; ASST Spedali Civili di Brescia, Brescia; Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan) were queried for cases diagnosed as "large cell carcinoma" between 2010 and 2020. Twenty-eight cases were identified and reviewed by six pathologists (C.C., M.M., A.S., A.F., L.B., G.S.). Twelve cases were excluded: three because only biopsy or cytology material was available, and nine after immunostaining. Of the latter, six positive for TTF1 and NapsinA were defined as ADC with a solid pattern, two positive for p40 were defined as non-keratinizing SCC, and one that expressed chromogranin A and synaptophysin was defined as LCNEC. The remaining 16 cases met all the LCC criteria of the WHO 2021 classification [17] (Table 1); all 16 were negative for Alcian and PAS histochemical stains.
In addition, 17 ADC and 11 LCNEC cases were used for comparative transcriptomic profiling.
The study was performed according to the clinical standards of the 1983 Declaration of Helsinki and was approved by the Ethics Committee of Fondazione IRCCS INT (No. INT 171/16).

Immunohistochemistry
Immunostaining for the 10 markers listed in Table 2 was performed on an automated immunostainer (Dako Autostainer System). Antibodies against POU class 2 homeobox 3 (Pou2f3), absent in melanoma 2 (Aim2), and forkhead box I1 (Foxi1) were tested to validate the transcriptomic findings, and their expression was scored as the percentage of positive cells according to Yamada et al. [24].

Tumour mutational load and mutational signatures
Tumour mutational load (TML) and mutational spectrum for each sample were evaluated using the Oncomine TML 5.10 plugin on IonReporter (Thermo Fisher Scientific) as detailed in Supplementary Methods.

FISH validation of MYB gene amplification
A FISH assay was carried out to assess MYB (6q23.3) amplification using the XL 6q21/6q23/6cen locus-specific probe (MetaSystems srl, Italia). An orange fluorochrome hybridizes to the MYB gene on 6q23 and an aqua fluorochrome to the centromere.
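The methods do not state the scoring rule used to call MYB amplification. A minimal sketch of one common approach, assuming that orange (MYB) and aqua (centromere) signals are counted per nucleus and that amplification is called from the pooled MYB/centromere ratio with a hypothetical cut-off of 2.0 (not taken from this study), could look like this:

```python
from dataclasses import dataclass


@dataclass
class Nucleus:
    myb_signals: int         # orange MYB (6q23) spots counted in one nucleus
    centromere_signals: int  # aqua chromosome 6 centromere spots


def myb_ratio(nuclei: list) -> float:
    """Pooled MYB/centromere signal ratio over all scored nuclei."""
    total_myb = sum(n.myb_signals for n in nuclei)
    total_cen = sum(n.centromere_signals for n in nuclei)
    if total_cen == 0:
        raise ValueError("no centromere signals were scored")
    return total_myb / total_cen


def is_amplified(nuclei: list, ratio_cutoff: float = 2.0) -> bool:
    # ratio_cutoff is a HYPOTHETICAL threshold, not reported in the study
    return myb_ratio(nuclei) >= ratio_cutoff


# Hypothetical counts for a diploid and an amplified case (50 nuclei each)
diploid = [Nucleus(2, 2) for _ in range(50)]
amplified = [Nucleus(8, 2) for _ in range(50)]
print(myb_ratio(diploid), is_amplified(diploid))      # 1.0 False
print(myb_ratio(amplified), is_amplified(amplified))  # 4.0 True
```

Pooling counts over nuclei, rather than averaging per-nucleus ratios, avoids division by zero in nuclei where a centromere signal was missed.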

Fusion genes and splice variant detection
ALK, RET, and ROS1 rearrangements and MET exon skipping were investigated using an automated real time polymerase chain reaction (RT-PCR) approach (Easy PGX platform, Diatech Pharmacogenetics, Jesi, Italy).

Expression analysis by next-generation sequencing
RNA was extracted using the ReliaPrep FFPE Total RNA Miniprep System (Promega, Milan, Italy), quantified using the Qubit RNA HS Assay Kit (Thermo Fisher), and its quality was assessed by RIN analysis with the Agilent RNA 6000 Nano Kit on an Agilent 2100 Bioanalyzer (Agilent Technologies). RNA with RIN > 5 and a concentration above 10 ng/µl was considered suitable. The Ampliseq Transcriptome Human Gene Expression Kit (Thermo Fisher Scientific, MA, USA) was used to analyse the expression of 20,815 human RefSeq genes (Supplementary Methods). Expression data were subjected to quality control using the workflow defined by Law et al. [6].
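The RNA acceptance criteria can be expressed as a simple filter. This is a sketch only: the `RnaSample` record and the sample identifiers are hypothetical, and the thresholds are the two stated above (RIN > 5, concentration > 10 ng/µl), both applied as strict inequalities:

```python
from typing import NamedTuple


class RnaSample(NamedTuple):
    sample_id: str         # hypothetical identifier
    rin: float             # RNA integrity number (Agilent 2100 Bioanalyzer)
    conc_ng_per_ul: float  # concentration from the Qubit RNA HS assay


MIN_RIN = 5.0          # RIN must be strictly greater than 5
MIN_CONC_NG_UL = 10.0  # concentration must be strictly above 10 ng/µl


def passes_qc(sample: RnaSample) -> bool:
    return sample.rin > MIN_RIN and sample.conc_ng_per_ul > MIN_CONC_NG_UL


def split_by_qc(samples):
    """Return (suitable, unsuitable) lists, preserving input order."""
    suitable = [s for s in samples if passes_qc(s)]
    unsuitable = [s for s in samples if not passes_qc(s)]
    return suitable, unsuitable


samples = [
    RnaSample("S1", 6.2, 25.0),  # passes both criteria
    RnaSample("S2", 5.0, 40.0),  # fails: RIN is not > 5
    RnaSample("S3", 7.1, 8.0),   # fails: too dilute
]
ok, failed = split_by_qc(samples)
print([s.sample_id for s in ok])  # ['S1']
```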

Gene set enrichment analysis (GSEA)
To identify the biological processes differentially enriched across clusters, we used the GAGE R package [10] and ssGSEA scores [20]. Cluster-specific enriched gene sets were identified using pathways from MSigDB [9,20].
We computed the ssGSEA score of each pathway for each sample and z-score-normalized these scores (Supplementary Methods). A z-score > 0 indicates a positive association between the sample and the pathway. Only differentially enriched pathways (Benjamini-Hochberg-adjusted p-value < 0.05) were considered. All samples were grouped according to their molecular class.
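A minimal sketch of the two normalization steps, assuming that each pathway's ssGSEA scores are standardized across samples (the exact scheme is in the Supplementary Methods) and using the standard Benjamini-Hochberg step-up adjustment. The input scores are hypothetical, and `zscore_rows` and `benjamini_hochberg` are illustrative helpers, not functions of the GAGE package:

```python
import statistics


def zscore_rows(scores: dict) -> dict:
    """Standardize each pathway's ssGSEA scores across samples.

    scores maps pathway name -> list of per-sample ssGSEA scores.
    A z-score > 0 means the sample scores above the cohort mean.
    """
    out = {}
    for pathway, values in scores.items():
        mu = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
        out[pathway] = [(v - mu) / sd for v in values]
    return out


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


z = zscore_rows({"HYPOTHETICAL_PATHWAY": [1.0, 2.0, 3.0]})
print(z)  # {'HYPOTHETICAL_PATHWAY': [-1.0, 0.0, 1.0]}
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(v, 4) for v in q])  # [0.04, 0.0533, 0.0533, 0.2]
```

Pathways would then be retained when their adjusted value falls below 0.05.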

Statistical analysis
The association between immunophenotypical and molecular features and their correlation with the LCC groups (ADLike-LCC vs. Pure-LCC) were assessed using the Fisher exact test for categorical variables and the nonparametric Wilcoxon test for continuous variables. Data analysis was
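For 2 × 2 tables such as marker positivity by LCC group, the two-sided Fisher exact p-value can be computed directly from the hypergeometric distribution. A self-contained sketch, assuming the common "sum of all tables no more probable than the observed one" definition of two-sidedness (the convention used by R's `fisher.test`); the counts in the example are hypothetical:

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, cell a follows a hypergeometric distribution;
    the p-value sums the probabilities of every table that is no more
    probable than the observed one.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # P(cell a == x | fixed margins)
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts: marker-positive / marker-negative in two groups
print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```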
Morphologically, the tumours consisted of solid neoplastic tissue composed of large polygonal cells with prominent nucleoli; necrosis was present in 13 cases (81.3%).
All 16 cases harboured mutations in at least one gene (Fig. 1A, Supplementary Table 2). A total of 35 mutations in 14 genes were identified: 20 missense, 4 nonsense, 5 splice site, 5 frameshift, and 1 small deletion (Supplementary Table 2).
TML, mutational spectrum, and COSMIC signatures were computed for each sample (Supplementary Table 3). The median TML across all LCCs was 4.4 mutations per Mb (range 0.8-12.7), similar to that of lung adenocarcinomas [7]. No specific mutational signature pattern emerged.
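TML is a rate: somatic mutations divided by the size of the sequenced territory in megabases. The sketch below illustrates only the arithmetic and the median summary; the 1.5 Mb footprint and the mutation counts are hypothetical placeholders, and the variant filtering performed by the Oncomine plugin is not modelled:

```python
import statistics


def tumour_mutational_load(n_mutations: int, covered_mb: float) -> float:
    """Mutations per megabase of sequenced (callable) territory."""
    if covered_mb <= 0:
        raise ValueError("covered_mb must be positive")
    return n_mutations / covered_mb


# HYPOTHETICAL inputs: per-sample mutation counts and a 1.5 Mb footprint
counts = [2, 6, 19]
tml = [tumour_mutational_load(n, covered_mb=1.5) for n in counts]
print([round(t, 2) for t in tml])  # [1.33, 4.0, 12.67]
print(statistics.median(tml))      # 4.0
```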
Based on the chromosomal position of each gene, the status of chromosome arms was inferred (Supplementary Fig. 1).The major alterations were gains in chromosomes 3, 5, 6, 8, and 20, while losses were observed in chromosomes 3, 5, 13, and 15.
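One simple way to infer arm-level status from gene-level copy-number calls is a per-arm majority vote. This is a sketch under stated assumptions: the thresholds (`min_genes`, `min_fraction`), the input format, and the example calls are all hypothetical, and the procedure actually used in the study may differ:

```python
from collections import Counter, defaultdict


def arm_status(gene_calls, min_genes: int = 3, min_fraction: float = 0.6) -> dict:
    """Summarize gene-level copy-number calls into chromosome-arm calls.

    gene_calls: iterable of (arm, call) pairs, e.g. ("8q", "gain"),
    where call is "gain", "loss" or "neutral".
    An arm is called "gain" or "loss" only if it has at least min_genes
    genes and at least min_fraction of them share that call.
    """
    by_arm = defaultdict(list)
    for arm, call in gene_calls:
        by_arm[arm].append(call)
    result = {}
    for arm, calls in by_arm.items():
        top_call, count = Counter(calls).most_common(1)[0]
        if (len(calls) >= min_genes and top_call != "neutral"
                and count / len(calls) >= min_fraction):
            result[arm] = top_call
        else:
            result[arm] = "neutral"
    return result


# HYPOTHETICAL gene-level calls
calls = [("8q", "gain"), ("8q", "gain"), ("8q", "gain"), ("8q", "neutral"),
         ("13q", "loss"), ("13q", "loss"), ("13q", "loss"),
         ("3p", "gain"), ("3p", "loss"), ("3p", "neutral")]
print(arm_status(calls))  # {'8q': 'gain', '13q': 'loss', '3p': 'neutral'}
```

Working at arm level is what allows the same chromosome (for example 3 or 5) to show a gain on one arm and a loss on the other.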

Fusion genes and splice variants
No fusion genes or splice variants were detected in ALK, RET, ROS1, or MET.

Comparison of marker-null LCC expression profiles with lung adenocarcinomas and large cell neuroendocrine cancers
We investigated the transcriptomic relationship between marker-null LCC, ADC, and LCNEC, which represent the other non-keratinizing large cell histotypes of lung cancer. Unsupervised clustering of the 16 LCC, 17 ADC, and 11 LCNEC samples was performed using the highly variable genes (HVGs) that together explained 70% of the total variance, 2109 genes in all. Consensus clustering [22] was applied to identify the optimal number of clusters, which was k = 3 (Supplementary Fig. 2).
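The HVG criterion can be read as: rank genes by expression variance and keep the smallest top-ranked set whose summed variance reaches 70% of the total. The sketch below implements that reading, assuming log-scale expression values and population variance; the toy matrix is hypothetical and the exact criterion used in the study may differ:

```python
import statistics


def select_hvgs(expr: dict, variance_fraction: float = 0.70) -> list:
    """Return the most variable genes that jointly reach a target fraction
    of the total variance (sum of per-gene variances).

    expr maps gene -> list of log-scale expression values, one per sample.
    """
    variances = {g: statistics.pvariance(v) for g, v in expr.items()}
    target = variance_fraction * sum(variances.values())
    selected, cumulative = [], 0.0
    for gene in sorted(variances, key=variances.get, reverse=True):
        if cumulative >= target:
            break
        selected.append(gene)
        cumulative += variances[gene]
    return selected


# HYPOTHETICAL toy matrix: 4 genes x 2 samples
expr = {"G1": [0.0, 10.0], "G2": [0.0, 4.0], "G3": [0.0, 2.0], "G4": [1.0, 1.0]}
print(select_hvgs(expr))        # ['G1']
print(select_hvgs(expr, 0.90))  # ['G1', 'G2']
```

The sum of per-gene variances equals the trace of the covariance matrix, which is the same "total variance" that a principal component analysis decomposes.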
An expression-based molecular map was built with the UMAP method to visualize the topological relationships between samples [11]. Eleven marker-null LCC samples formed a standalone group (named Pure-LCC), while the remaining 5 cases fell within the cluster enriched for ADC histology and were named adenocarcinoma-like LCC (ADLike-LCC; Fig. 2A). To explore the relationship of each sample with the others, we applied hierarchical clustering, which grouped the samples as follows (Fig. 2B): cluster 1 (CL1; Pure-LCC), including 11 marker-null LCC samples; cluster 2 (CL2; named LCNEC), including 11 LCNEC samples; and cluster 3 (CL3; named ADC/ADLike-LCC), including the remaining 22 samples, namely 17 ADCs and 5 marker-null LCCs (ADLike-LCC). The main clinicopathological characteristics of patients according to their expression profile are summarized in Table 3.
Differential expression (DE) analysis between clusters highlighted 121 genes specifically overexpressed in Pure-LCC. FOXI1 was the most representative overexpressed marker of the Pure-LCC group, followed by POU2F3, MYB, and KIT; these genes showed the lowest adjusted p-values and the highest logFC (Supplementary Table 4). Immunostaining was performed for the proteins encoded by the two most representative genes, Foxi1 and Pou2f3 (Fig. 3A). Both proteins were more highly expressed in Pure-LCC than in ADLike-LCC (p = 0.035 and p = 0.043, respectively) (Supplementary Fig. 3A, Supplementary Table 5).
Next, we performed gene set enrichment analysis (GSEA) to identify the main molecular pathways characterizing the Pure-LCC cluster. We observed a positive association with biological processes related to DNA repair by homologous recombination, including the Fanconi, ATM, and ATR pathways (Fig. 3B). Alpha and beta defensin signalling was also enriched exclusively in this cluster, together with cell proliferation and division processes. Consistently, Pure-LCC showed a higher mitotic count than ADLike-LCC (p = 0.009, Table 3). Furthermore, Pure-LCC showed a strong similarity to the tuft cell profile described by Yamada et al. [24] (Fig. 3D), driven by overexpression of tuft cell markers such as FOXI1, GFI1B, HEPACAM2, and POU2F3.
Five of the 16 marker-null LCCs showed an ADLike-LCC expression profile. Although these cases were negative for NapsinA by immunohistochemistry, their transcriptomic profile resembled that of the ADC samples, with overexpression of NAPSA, FOS, surfactant, S100A11, and YAP1 genes, whereas none of the 121 Pure-LCC specific genes was overexpressed. In the NAPSA-overexpressing cases, NapsinA immunostaining localized the protein to non-neoplastic lung tissue, within hyperplastic pneumocytes and macrophages (Supplementary Fig. 3B). DE analysis identified 4 genes specifically overexpressed in ADLike-LCC: AIM2, DKK1, S100A8, and SERPINB4. Immunohistochemistry confirmed Aim2 expression in at least 60% of neoplastic cells in all five ADLike-LCC cases but in only 3 of the 11 Pure-LCC samples (Supplementary Fig. 3A, Supplementary Table 1). ADLike-LCC cases were also distinguished from Pure-LCC by a lower TML (median 1.4 mut/Mb vs. 4.4 mut/Mb; p = 0.04). GSEA showed a positive correlation with several pathways related to the inflammatory response, including the AIM2 inflammasome complex but not PD-L1 (Supplementary Table 6). Of interest, PD-L1 immunostaining was negative in both ADLike-LCC and Pure-LCC. We then performed a deconvolution analysis comparing Pure-LCC with ADLike-LCC cases. As shown in Fig. 3D, this analysis revealed that ADLike-LCC was characterized by a strong immune infiltrate including macrophages, B lymphocytes, and dendritic cells. Finally, we investigated the cellular origin of ADLike-LCC using the 2 signatures described by Nakamura et al. [12], which comprise specific markers of lung alveolar and bronchial cells. According to GSEA scoring, ADLike-LCC showed an expression profile compatible with an alveolar origin (Fig. 3E), owing to overexpression of several alveolar markers, including HIGD1B and RFTN, and the lack of bronchial markers (Fig. 3F).
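For small groups such as 5 ADLike-LCC vs. 11 Pure-LCC cases, a Wilcoxon rank-sum comparison (like the TML comparison above) can be computed exactly by enumerating every way of assigning the pooled ranks to the smaller group (C(16, 5) = 4368 assignments). The sketch below uses midranks for ties and is offered only as an illustration of the test named in the Statistical analysis section; the input values are hypothetical, and it is not claimed to be the software used by the authors:

```python
from itertools import combinations


def midranks(values: list) -> list:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def exact_rank_sum_test(x: list, y: list) -> float:
    """Two-sided exact Wilcoxon rank-sum p-value by full enumeration."""
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n, total = len(x), len(pooled)
    expected = n * (total + 1) / 2  # mean rank sum of x under H0
    observed_dev = abs(sum(ranks[:n]) - expected)
    extreme = count = 0
    for idx in combinations(range(total), n):
        count += 1
        dev = abs(sum(ranks[i] for i in idx) - expected)
        if dev >= observed_dev - 1e-9:  # tolerance for float rank sums
            extreme += 1
    return extreme / count


# HYPOTHETICAL, perfectly separated groups of three
print(exact_rank_sum_test([1, 2, 3], [4, 5, 6]))  # 0.1
```

When ties are present, R's `wilcox.test` falls back to a normal approximation, so its p-value may differ slightly from this exact permutation value.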

Discussion
The present genomic and transcriptomic analysis of 16 marker-null LCCs showed that (i) TP53 was the most frequently inactivated gene (15/16; 93.7%), followed by RB1 (5/16; 31.3%) and KEAP1 (4/16; 25%), while CRKL and MYB were amplified in 4/16 (25%) cases and MYC in 3/16 (18.8%) cases, and (ii) transcriptomic analysis identified two molecular subtypes, Pure-LCC and adenocarcinoma-like LCC (ADLike-LCC), characterized by different activated pathways and cells of origin. A schematic representation of the main findings of the present study is depicted in Fig. 4.

Fig. 2 Gene expression analysis of LCC, ADC and LCNEC. Transcriptome sequencing data of 16 marker-null LCC, 17 ADC and 11 LCNEC are represented using two approaches. A Uniform manifold approximation and projection (UMAP) based on the highly variable genes (HVGs; explaining 70% of the total variance), which were 2109 genes. ADC, adenocarcinoma; LCNEC, large cell neuroendocrine carcinoma; Pure-LCC, pure large cell carcinoma; ADLike-LCC, adenocarcinoma-like large cell carcinoma. B Heatmap resulting from hierarchical clustering analysis using the 2109 HVGs, in which tumour samples are arranged in columns, grouped according to their expression clustering class and annotated for histological subtype. High and low expression of the 2109 genes are shown in red and blue, respectively.
To date, only three studies have reported a genomic characterization of marker-null LCC, in 12 [4], 25 [1], and 7 cases [8], respectively. Karlsson et al. reported that 11/12 (91.7%) LCCs had TP53 mutations and 1/12 (8.3%) an activating mutation in MET, while none had KRAS or RB1 alterations [4]. Chan et al. identified TP53 mutations in 24/25 (96%) cases, while KRAS and RB1 were each mutated in 4/25 (16%) cases [1]. Liang et al. found TP53 alterations in 4/7 (57.1%) cases and RB1 and KRAS alterations each in 3/7 (42.8%) cases [8]. Our study confirmed TP53 as a key driver of LCC, as well as the frequent involvement of RB1, and identified KEAP1 alterations in 25% of cases. Moreover, we report for the first time the amplification of CRKL and MYB in 4/16 (25%) cases and a median TML of 4.4 muts/Mb.
Our comparative expression analysis identified two LCC transcriptomic entities, Pure-LCC and ADLike-LCC, which respectively overlap with the marker-null LCC and the LCC-AC-like subtypes reported by the only gene expression study performed on 12 marker-null LCC [5].
Transcriptomic analysis of lung marker-null LCC by Karlsson et al. showed an expression profile distinct from those of LCNEC and ADC [5], characterized by gene ontology processes such as DNA replication, cell division, cellular response to stress, and oxidation-reduction. Our study confirmed these observations, defining Pure-LCC as a molecular class distinct from LCNEC and ADC, characterized by a higher mitotic count than ADLike-LCC and by a series of biological processes related to DNA repair in response to replication stress. These processes are part of the "replication stress signature" described by Dreyer et al. [3] in pancreatic cancer and by Thomas et al. [21] in SCLC. Part of this signature is the ATR pathway, which showed a highly enriched score in Pure-LCC, suggesting a [...]
In contrast, the transcriptomic analysis of ADLike-LCC showed a distinctive overexpression of NAPSA and surfactant family genes, typical of adenocarcinomas, together with the exclusive overexpression of AIM2, DKK1, S100A8, and SERPINB4. NAPSA overexpression was associated with NapsinA immunopositivity in non-neoplastic lung tissue, within hyperplastic pneumocytes and intra-alveolar macrophages, as previously described [13]. In this respect, GSEA showed that the ADLike-LCC group had the highest proportion of macrophages compared with the ADC, LCNEC, and Pure-LCC groups, and the deconvolution analysis highlighted a strong leukocyte infiltrate, consistent with a "hot tumour" profile in ADLike-LCC in contrast to the "cold tumour" profile of Pure-LCC samples. GSEA also showed a positive correlation with several pathways related to the inflammatory response, including the AIM2 inflammasome complex. AIM2 was described as a tumour suppressor in early studies [2], but in NSCLC it appears to promote tumour growth as an oncogene in an inflammasome-dependent manner [25]. A recent study correlated the AIM2 inflammasome complex signature with sensitivity to the compounds AICAR, AT-7519, bosutinib, DMOG, and Z-LLNLE-CHO [15], suggesting potential therapeutic options for these tumours.
From a clinicopathological point of view, the two molecular subgroups differed significantly in age at diagnosis (p = 0.023) and mitotic count (p = 0.009), both higher in Pure-LCCs. The higher mitotic count may suggest a more aggressive behaviour of Pure-LCC, in which deaths were also more frequent. However, the limited number of cases analysed does not allow definitive statistical conclusions to be drawn.
Transcriptomic analysis also suggested a different cell of origin for the two LCC molecular subtypes: alveolar cell for ADLike-LCC and tuft cell for Pure-LCC.Indeed, GSEA showed that ADLike-LCC had an expression profile close to that of the alveolar epithelium, while Pure-LCC expression profile was similar to that of the tuft cell-like profile described by Yamada et al. and characterized by co-expression of POU2F3 and FOXI1 genes [24].Of note, a recent study on the transcriptional mechanism of the tuft cell lineage identified a critical transcriptional complex composed of POU2F3, OCA-T1, and OCA-T2; these interactions may become an important target for pharmacological blockade in tuft cell-like carcinomas [23].
In conclusion, our study splits the histologically marker-null LCC category into two transcriptomic entities, with POU2F3, FOXI1, and AIM2 as differentially expressed markers that might be probed by immunohistochemistry for the differential diagnosis between Pure-LCC and ADLike-LCC. GSEA revealed a profile compatible with a tuft cell-like origin for Pure-LCC and an alveolar cell origin for ADLike-LCC. Finally, the identification of several signatures linked to replication stress in Pure-LCC and to the inflammasome complex in ADLike-LCC could be useful for designing new therapeutic approaches for these subtypes.
Fig. 4 Schematic representation of the results of the study. Marker-null LCCs were defined as cancers negative for immunohistochemical markers of lung adenocarcinoma (TTF-1, NapsinA), squamous cell carcinoma (p40, CK5/6, p63), and large cell neuroendocrine carcinoma (ChgA, Syn) and for mucin histochemical staining (Alcian-PAS). Genomic analysis showed common (TP53, RB1 and KEAP1 mutations) and differential (amplification of CRKL, MYB and MYC; TML, tumour mutational load) alterations. Transcriptomic analysis identified two molecular subtypes, Pure-LCC and ADLike-LCC, characterized by different overexpressed genes (red arrows) and potentially targetable enriched pathways (ATR pathway and AIM2 inflammasome complex). The transcriptomes also revealed differences in the composition of the tumour microenvironment (TME: cold and hot) and in the cell of origin.

Fig. 1 Genomic features of LCC and FISH validation of MYB gene amplification. A The upper histogram shows the tumour mutational load of each sample, defined as the number of mutations per megabase (muts/Mb). The central matrix shows the 19 genes found altered by sequencing analysis, listed according to the frequency of alterations. B Representative images of the FISH validation for the MYB gene in a diploid case (left) and an amplified case (right). Red spots mark the MYB gene, while spectrum green spots label the centromere of chromosome 6.

Fig. 3 Immunohistochemical and gene set enrichment analysis (GSEA). A Differential immunostaining for Aim2, Foxi1 and Pou2f3 in the pure large cell carcinoma (Pure-LCC) and adenocarcinoma-like LCC (ADLike-LCC) molecular subtypes. HE, haematoxylin and eosin. Heatmaps of B relevant gene sets from MSigDB collections; C immune subpopulations inferred by gene expression of immune metagenes significantly enriched in each of

Table 1 Clinicopathological features of 16 large cell carcinomas (LCCs). FU, follow-up; DOD, dead of disease; AWD, alive without disease; cluster, according to transcriptional profile