Abstract
Background
Lung squamous cell carcinoma (LUSC) is a major subtype of non-small cell lung cancer with a high mortality rate. Identifying causal plasma proteins associated with LUSC could provide new insights into the pathophysiology of the disease and potential therapeutic targets. This study aimed to identify plasma proteins causally linked to LUSC risk using proteome-wide Mendelian randomization (MR) and colocalization analyses.
Methods
Proteome-wide MR analysis was conducted using data from the UK Biobank Pharma Proteomics Project and deCODE genetics. Summary-level data for LUSC were obtained from the ILCCO Consortium, the FinnGen study, and a separate GWAS study. A total of 1,046 shared protein quantitative trait loci (pQTLs) were analyzed. Sensitivity analyses included the HEIDI test for horizontal pleiotropy and colocalization analysis to validate the causal associations.
Results
MR analysis identified six plasma proteins associated with LUSC risk: HSPA1L, PCSK7, POLI, SPINK2, TCL1A, and VARS. HSPA1L (OR = 0.47; 95% CI: 0.34–0.65; P = 4.89 × 10–6), SPINK2 (OR = 0.68; 95% CI: 0.58–0.80; P = 3.17 × 10–6), and VARS (OR = 0.44; 95% CI: 0.31–0.63; P = 5.94 × 10–6) were associated with a decreased risk of LUSC. Conversely, PCSK7 (OR = 1.37; 95% CI: 1.21–1.56; P = 1.40 × 10–6), POLI (OR = 4.50; 95% CI: 2.25–9.00; P = 2.13 × 10–5), and TCL1A (OR = 1.72; 95% CI: 1.34–2.21; P = 1.89 × 10–5) were associated with an increased risk. The SMR analysis and HEIDI test confirmed the robustness of these associations. HSPA1L, SPINK2, and VARS showed significant inverse associations, with strong colocalization evidence for TCL1A (PPH4 = 0.817).
Conclusions
This study identified six plasma proteins potentially causal for LUSC risk. HSPA1L, SPINK2, and VARS are associated with decreased risk, while PCSK7, POLI, and TCL1A are linked to increased risk. These findings provide new insights into LUSC pathogenesis and highlight potential targets for therapeutic intervention.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
Introduction
Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small cell lung cancer (NSCLC) comprising approximately 85% of all lung cancer cases. NSCLC is further divided into several subtypes, with lung squamous cell carcinoma (LUSC) being the second most common subtype after lung adenocarcinoma (LUAD) [1, 2]. LUSC accounts for about 30% of NSCLC cases and is strongly associated with smoking. Despite advances in surgical techniques and the development of targeted therapies and immunotherapies, the prognosis for LUSC remains poor, with a five-year survival rate of less than 20% for advanced stages [3]. Current treatment options for LUSC are limited, and there is a critical need for identifying new therapeutic targets and understanding the molecular mechanisms underlying its pathogenesis [4].
Recent advances in omics technologies have enabled comprehensive analyses of various molecular layers, such as genomics, transcriptomics, proteomics, and metabolomics, providing deeper insights into cancer biology and identifying potential biomarkers and therapeutic targets. Several studies have explored the associations between different types of omics data and LUSC risk. For instance, Hu et al. [5] analyzed multi-omics differences in LUSC patients with high and low PD1/PDL1 expression, finding significant immune-related gene expression differences that could influence the effectiveness of immunotherapy [5]. Another study by Qu et al. [6] examined the prognostic and therapeutic significance of CD93 in LUSC, revealing its role in promoting tumor cell stemness and resistance to treatments [6]. Xu et al. [7] conducted a multi-omics analysis at the epigenomics and transcriptomics levels, identifying prognostic subtypes of LUSC based on DNA methylation and copy number variations, which provided new insights into the molecular mechanisms of LUSC and potential biomarkers for early diagnosis [7].
Mendelian randomization (MR) is a powerful epidemiological method that uses genetic variants as instrumental variables to assess the causal relationships between exposures and disease outcomes [8]. This approach leverages the random assortment of genes at conception, which mimics the conditions of a randomized controlled trial and helps to minimize confounding and reverse causation. MR has been increasingly applied in various fields of biomedical research to identify potential causal risk factors and therapeutic targets. By integrating genetic data with protein quantitative trait loci (pQTL) information, MR can provide insights into the causal effects of circulating proteins on disease risk [9]. This approach has been particularly useful in uncovering novel biomarkers and drug targets for complex diseases such as cardiovascular diseases, diabetes, and cancers.
To date, there have been limited studies applying MR to investigate the causal role of plasma proteins in LUSC. Most researches have focused on other subtypes of lung cancer or other diseases, leaving a gap in our understanding of the proteomic influences on LUSC [10, 11]. This study aims to fill this gap by conducting a systematic proteome-wide MR analysis to identify plasma proteins causally associated with LUSC. By leveraging large-scale genomic and proteomic datasets, this study seeks to provide new insights into the molecular underpinnings of LUSC and to identify potential biomarkers and therapeutic targets that could improve clinical outcomes for patients.
Methods
Study design
An overview of the study design is shown in Fig. 1. All analyses were based on summary-level data. The proteome-wide MR and colocalization analyses included data from the UK Biobank Pharma Proteomics Project (54,306 participants, 2,958 proteins) and deCODE genetics (35,559 participants, 4,906 proteins). After removing plasma proteins without genetic instruments, the MR analysis included 1,046 shared pQTLs. The outcome data for LUSC were obtained from the ILCCO Consortium (3,275 cases, 15,038 controls) for discovery, and from the FinnGen Study (1,510 cases, 314,193 controls) and a GWAS study (7,426 cases, 55,627 controls) for replication.
Data sources for plasma proteins
Shown in Table 1, we selected index cis-SNPs associated with plasma protein levels at the genome-wide significance level (P < 5 × 10–8) as instrumental variables from two large-scale GWASs: the UK Biobank Pharma Proteomics Project (UKB-PPP) and the deCODE Health study. Cis-SNPs were defined as SNPs within 1 Mb of the gene encoding the protein, with linkage disequilibrium estimated based on the 1000 Genomes European panel.
The UKB-PPP conducted proteomic profiling on blood plasma samples from 54,306 participants using the Olink platform, collecting data on 2,958 proteins [12]. For the two-sample MR analysis, we selected index cis-SNPs as instrumental variables for 2,002 proteins. Similarly, the deCODE Health study identified index cis-SNPs for 1,820 plasma proteins, using 4,906 aptamers measured among 35,559 Icelanders with the SomaScan platform [13]. There were 1046 proteins with index cis-SNPs that overlapped between the two studies.
Given that colocalization analysis could only be performed for proteins from the deCODE Health study due to the availability of summary-level data, we used genetic instrumental variables selected from the deCODE Health study for these overlapping proteins in the main results. We also presented results for the overlapping proteins with genetic instruments from both studies to examine the consistency of results based on different proteomic profiling platforms.
Data source for LUSC
As illustrated in Table 1, data on the associations of protein-associated SNPs with LUSC were obtained from three major sources. The primary dataset was provided by the International Lung Cancer Consortium (ILCCO), which included 3,275 LUSC cases and 15,038 controls of European ancestry. The data can be accessed through the GWAS database at https://gwas.mrcieu.ac.uk/datasets/ieu-a-984/. Additional data were obtained from the FinnGen study, which included 1,510 LUSC cases and 314,193 controls of European ancestry. This data is available for download at https://storage.googleapis.com/finngen-public-data-r10/summary_stats/finngen_R10_C3_NSCLC_ADENO_EXALLC.gz. Furthermore, we utilized data from a study by McKay JD et al., which included 7,426 LUSC cases and 55,627 controls of European ancestry [14]. This data can be accessed through the GWAS catalog at https://www.ebi.ac.uk/gwas/studies/GCST004750.
In our MR analysis, we treated the ILCCO data as the discovery dataset and the FinnGen study as the replication dataset. To increase the statistical power, we performed a meta-analysis of the ILCCO and FinnGen datasets. Colocalization analysis was also performed based on the meta-analysis data to confirm the robustness of our findings. The GWAS meta-analysis was conducted using the METAL software. Genetic variants associated with LUSC at P < 5 × 10–8 in the GWAS meta-analysis, and with low linkage disequilibrium (r2 < 0.001), were selected as instrumental variables for the MR analysis.
Proteome-wide MR analysis
pQTLs from the UK Biobank Pharma Proteomics Project and the deCODE Health study were used to select genetic instruments. The platform ID for each protein from each study was mapped to the gene symbol and unified based on annotations provided by the original studies and manual review (https://biodbnet-abcc.ncifcrf.gov/db/db2db.php). SNPs were mapped to human genome Build 37 (NCBI GRCh37) to unify genomic coordinates. The following criteria were used to select instruments and proteins: (i) SNPs associated with any protein were selected (P < 5 × 10–8); (ii) SNPs and proteins within the Major Histocompatibility Complex (MHC) region (chr6:25.5–34.0 Mb) were excluded due to their complex linkage disequilibrium (LD) structure; (iii) LD clumping was conducted to identify independent pQTLs for each protein (r2 < 0.001); (iv) the R2 and F-statistic were used to estimate the strength of genetic instruments. For duplicate proteins among studies, the protein with the largest sum of R2 was selected. Instruments were classified as cis or trans pQTLs based on the following criteria: a pQTL was defined as cis if the leading SNP in the region was located within 1 Mb of the transcription start site of the protein-coding gene. This resulted in a total of 3,919 instruments and 1,046 unique plasma proteins included in the analysis. Instrument variables are presented in Table 1.
The "TwoSampleMR" package was used to perform MR analysis. For proteins with only one instrument, the Wald ratio method estimated the log odds change in LUSC risk per standard deviation increment of circulating protein levels as proxied by the instrumental variables. The inverse-variance weighted (IVW) method was used for proteins with more than one instrument to obtain MR effect estimates. A heterogeneity test assessed the genetic instruments' heterogeneity based on the Q statistic. Additional analyses, including simple mode, weighted mode, weighted median, and MR-Egger, were performed to account for horizontal pleiotropy. MR-Egger results were used only when the intercept indicated horizontal pleiotropy. Bonferroni correction was applied for multiple testing, with P < 4.78 × 10–5 (0.05/1,046) as the significance level.
Replication MR analysis was conducted for the identified proteins based on LUSC GWAS summary data from the FinnGen study and the McKay study. A P value < 0.05 was defined as the significance level for replication. The estimates for each protein from the ILCCO meta-GWAS, FinnGen, and McKay studies were combined using a random-effects meta-analysis method. In a stratified analysis by tumor site, associations of the identified protein markers with specific lung cancer subtypes were tested.
Sensitivity analysis
To ensure the robustness of our findings, we carried out several sensitivity analyses. Initially, we applied the Cochran Q test to assess the heterogeneity among genetic variants. A P-value greater than 0.05 from this test suggested no significant heterogeneity. Next, we utilized MR-Egger's intercept test to evaluate horizontal pleiotropy. Results indicating a P-value greater than 0.05 implied an absence of horizontal pleiotropy. Additionally, we employed the MR Pleiotropy Residual Sum and Outlier (MR-PRESSO) method to detect influential outliers due to pleiotropy. In this method, a P-value greater than 0.05 in the Global test indicated the presence of horizontal pleiotropic outliers.
Summary-data-based Mendelian randomization (SMR) analysis
SMR analysis was conducted as a complementary method to verify the causal associations between proteins and LUSC. The heterogeneity in dependent instruments (HEIDI) test, using multiple SNPs in a region, was employed to distinguish proteins associated with LUSC risk due to a shared genetic variant rather than genetic linkage. The SMR and HEIDI tests were performed using SMR software (SMR v1.3.1) [15].
A Bonferroni adjusted-P value < 0.05was defined as the significance level for SMR. The P value of the HEIDI test > 0.10 indicated that the association between the protein and LUSC was not driven by linkage disequilibrium. After identifying LUSC-related proteins, we conducted a comprehensive literature search to define novel protein markers for LUSC. Proteins that had not been previously reported to be associated with LUSC in terms of gene polymorphisms, mRNA levels, or protein levels were classified as novel protein markers for LUSC.
Bayesian co-localization analysis
We conducted colocalization analysis using the R package "coloc" to determine if the associations between the identified proteins and LUSC resulted from linkage disequilibrium. Bayesian co-localization assigns posterior probabilities to five hypotheses: PP.H0, no association with either plasma protein or LUSC; PP.H1, association with plasma protein only; PP.H2, association with LUSC only; PP.H3, association with both plasma protein and LUSC, but with distinct causal variants; and PP.H4, association with both plasma protein and LUSC, with the same causal variant. A posterior probability of H4 (PPH4) greater than 0.80 is deemed significant, corresponding to a false discovery rate of less than 5% [16].
Ethical statement
The research protocols for this study were reviewed and approved by the Ethical Review Committee of Shanghai Chest Hospital. The study adhered strictly to the ethical standards established in the 1964 Declaration of Helsinki and its subsequent amendments. Participant confidentiality was rigorously maintained throughout the research process, ensuring full compliance with data protection regulations.
Results
MR analysis identified six plasma proteins associated with LUSC risk
These proteins include HSPA1L, PCSK7, POLI, SPINK2, TCL1A, and VARS (Fig. 2). Specifically, HSPA1L (OR = 0.47; 95% CI: 0.34–0.65; P = 4.89 × 10–6), SPINK2 (OR = 0.68; 95% CI: 0.58–0.80; P = 3.17 × 10–6), and VARS (OR = 0.44; 95% CI: 0.31–0.63; P = 5.94 × 10–6) were associated with a decreased risk of LUSC. Conversely, PCSK7 (OR = 1.37; 95% CI: 1.21–1.56; P = 1.40 × 10–6), POLI (OR = 4.50; 95% CI: 2.25–9.00; P = 2.13 × 10–5), and TCL1A (OR = 1.72; 95% CI: 1.34–2.21; P = 1.89 × 10–5) were associated with an increased risk of LUSC.
Sensitivity analysis for LUSC causal proteins
The results of sensitivity analyses confirmed the robustness of our primary MR analyses for LUSC causal proteins (Table 2). No evidence of heterogeneity was found in the association of the six identified proteins, as measured by Cochran Q statistics (P_heterogeneity > 0.05). Additionally, there was no indication of horizontal pleiotropy in the instrumental variables (IVs), as assessed by MR-Egger intercept (P_pleiotropy > 0.05). The high support of colocalization evidence was observed for several proteins, which strengthens the confidence in the identified associations. Specifically, PCSK7, SPINK2, and TCL1A showed consistent results across various sensitivity tests, underscoring their potential causal roles in LUSC. These findings suggest that the observed causal relationships between the identified proteins and LUSC are robust and not likely confounded by pleiotropic effects or other biases.
Validation in replication datasets and meta-analysis
The validation of the six identified proteins associated with LUSC was conducted in replication datasets, including the FinnGen study and a separate GWAS study, followed by a meta-analysis (Fig. 3). HSPA1L was consistently associated with a decreased risk of LUSC across all datasets, with a combined OR of 0.51 (95% CI: 0.41–0.63, P = 5.92 × 10–10). SPINK2 also showed a robust inverse association with LUSC, with a combined OR of 0.71 (95% CI: 0.63–0.79, P = 9.41 × 10–10). Conversely, PCSK7 and TCL1A did not show consistent associations across the datasets, with significant heterogeneity observed (I2 = 84% and I2 = 73%, respectively). The combined ORs for PCSK7 and TCL1A were 1.14 (95% CI: 0.90–1.43, P = 0.273) and 1.35 (95% CI: 0.96–1.89, P = 0.0850), respectively, indicating potential variability in their associations with LUSC risk. POLI and VARS displayed mixed results across the studies, with POLI having a combined OR of 1.27 (95% CI: 0.31–5.19, P = 0.738) and VARS showing a strong inverse association with a combined OR of 0.48 (95% CI: 0.38–0.60, P = 7.05 × 10–10). These results highlight the need for further investigation into the roles of PCSK7, POLI, and TCL1A in LUSC, while confirming the potential causal roles of HSPA1L, SPINK2, and VARS.
SMR analysis, HEIDI test and colocalization analysis
The SMR analysis and HEIDI test provided further validation for the causal associations between plasma protein levels and LUSC risk (Fig. 4). HSPA1L, SPINK2, and VARS were significantly associated with a decreased risk of LUSC. HSPA1L had an OR of 0.33 (95% CI: 0.22–0.49; P = 4.73 × 10–7) and passed the HEIDI test (P = 0.012), although no colocalization evidence was found (PPH4 < 0.001). SPINK2 showed an OR of 0.69 (95% CI: 0.58–0.81; P = 8.20 × 10–5) with a non-significant HEIDI test (P = 0.293) and moderate colocalization support (PPH4 = 0.274). VARS demonstrated a strong inverse association (OR = 0.29; 95% CI: 0.19–0.46; P = 6.73 × 10–7), failed the HEIDI test (P = 0.008), but lacked colocalization evidence (PPH4 < 0.001). Conversely, PCSK7 showed no significant association with LUSC (OR = 1.08; 95% CI: 1–1.16; P = 0.216) and no colocalization support (PPH4 < 0.001). POLI was significantly associated with an increased risk (OR = 4.5; 95% CI: 2.07–9.76; P = 8.42 × 10–4) but failed the HEIDI test (P = 0.002) and lacked colocalization evidence (PPH4 < 0.001). TCL1A had a significant positive association (OR = 1.72; 95% CI: 1.33–2.21; P = 1.84 × 10–4), passed the HEIDI test (P = 0.032), and showed strong colocalization evidence (PPH4 = 0.817).
Discussion
In this study, we conducted a comprehensive proteome-wide MR analysis to identify plasma proteins causally associated with LUSC risk. Our analysis identified six proteins—HSPA1L, PCSK7, POLI, SPINK2, TCL1A, and VARS—as having significant associations with LUSC risk. Specifically, HSPA1L, SPINK2, and VARS were found to be associated with a decreased risk, while PCSK7, POLI, and TCL1A were associated with an increased risk of LUSC. The robustness of these findings was confirmed through SMR and HEIDI tests, and Bayesian colocalization analysis provided further evidence for the shared genetic basis of these associations. These results highlight potential novel biomarkers and therapeutic targets for LUSC, offering new insights into the disease's molecular mechanisms and avenues for future research.
Previous studies have explored the causal relationships between various biomarkers and lung cancer subtypes using MR methods. For instance, Li et al. [17] conducted a proteome-wide MR analysis and identified several plasma proteins associated with lung cancer risk, specifically LUAD. Their study highlighted proteins such as ALAD, FLT1, and ICAM5, which exhibited protective effects against LUAD, while others like MDGA2 and RNASET2 were associated with increased risk. In contrast, our study focused specifically on LUSC and identified six novel proteins—HSPA1L, PCSK7, POLI, SPINK2, TCL1A, and VARS—that are causally linked to LUSC risk. Ji et al. [18]conducted a bidirectional MR study to explore the causal association between mitochondrial proteins and different pathological types of lung cancer, including LUSC. Their study identified significant causal relationships for several mitochondrial proteins with overall lung cancer, LUSC, and small cell lung carcinoma (SCLC), but found no causality with LUAD. This aligns with our findings in emphasizing the importance of protein biomarkers in LUSC, though our study identified different proteins using a proteome-wide approach. Additionally, Chen et al. [19]conducted a two-sample MR study to evaluate the causal relationships between plasma metabolites and the risk of various cancers, including lung cancer. Their study identified significant associations between specific plasma metabolites and lung cancer risk, demonstrating the utility of MR in uncovering novel biomarkers for cancer risk. However, their study did not focus on specific lung cancer subtypes, whereas our study provides targeted insights into LUSC.
Among the proteins identified, HSPA1L, SPINK2, and VARS showed significant associations with LUSC risk, suggesting their potential roles in the disease's pathogenesis. HSPA1L has been implicated in enhancing cancer stem cell-like properties by activating IGF1Rβ and regulating β-catenin transcription, which are critical pathways in tumor progression and resistance to therapy [20]. In LUSC, HSPA1L could contribute to the aggressive nature and poor prognosis of the disease by promoting epithelial-mesenchymal transition (EMT) and maintaining the cancer stem cell phenotype, making it a potential therapeutic target for disrupting these pathways [21, 22].
SPINK2, a serine protease inhibitor, is highly expressed in certain cancers and plays a role in tumor progression by regulating protease activity and intercellular communication within the tumor microenvironment [23]. Studies have shown that SPINK2 expression is associated with poor prognosis and therapy resistance in acute myeloid leukemia (AML) [24]. In the context of LUSC, the elevated expression of SPINK2 might similarly contribute to tumor growth and resistance to conventional treatments, suggesting that targeting SPINK2 could be a novel strategy to enhance treatment efficacy and improve patient outcomes [25]. VARS, a valyl-tRNA synthetase, is part of the aminoacyl-tRNA synthetase (ARS) family, which has been associated with various cancers due to its non-canonical functions in signal transduction and cellular homeostasis [26]. Research indicates that VARS and other ARSs are involved in cancer-associated activities, including promoting survival and proliferation of cancer cells [27]. In LUSC, the significant association of VARS with decreased risk suggests a potential protective role, which could be leveraged to develop new therapeutic approaches aimed at modulating ARS activity to inhibit tumor growth.
PCSK7, POLI, and TCL1A were also significantly associated with LUSC risk in our study. The MR analysis revealed that PCSK7 was associated with an increased risk of LUSC (OR = 1.37; 95% CI: 1.21–1.56; P = 1.40 × 10–6). The SMR analysis supported this finding, but the HEIDI test did not indicate colocalization, suggesting that the observed association might not be driven by a single causal variant. Previous studies have highlighted the role of PCSK7 in cancer progression, including its involvement in the activation of matrix metalloproteinases which facilitate tumor invasion and metastasis [28].
POLI, a member of the low-fidelity Y-family of DNA polymerases, was found to be associated with an increased risk of LUSC (OR = 4.50; 95% CI: 2.25–9.00; P = 2.13 × 10–5) in our MR analysis. This was further supported by the SMR analysis, although the HEIDI test suggested the association might be due to linkage disequilibrium. POLI has been previously implicated in promoting the migration and invasion of lung cancer cells, indicating its potential role in LUSC progression [29]. TCL1A, identified through MR analysis as being associated with an increased risk of LUSC (OR = 1.72; 95% CI: 1.34–2.21; P = 1.89 × 10–5), also showed strong evidence of colocalization (PP.H4 = 0.817) in the SMR analysis. TCL1A is known to act as an oncogene in various cancers, including its role in modulating the Akt/mTOR signaling pathway, which is crucial for cell growth and survival [30]. This makes TCL1A a promising target for therapeutic interventions in LUSC.
Several studies have previously developed risk models to predict the prognosis of LUSC patients. For instance, a study constructed a cell death-related prognostic signature using genes associated with autophagy, apoptosis, and necrosis, which showed significant predictive value for patient survival [31]. Another study developed a model based on immunogenic cell death-related lncRNAs, demonstrating prognostic significance and correlation with PET/CT imaging features [32]. Despite these advancements, our study is the first to report on the causal associations of the specific plasma proteins HSPA1L, PCSK7, POLI, SPINK2, TCL1A, and VARS with LUSC risk. These findings provide new insights into the molecular mechanisms of LUSC and highlight potential biomarkers and therapeutic targets that have not been previously reported in the context of LUSC risk models. Future studies should aim to validate these markers in real-world cohorts and explore their potential clinical applications.
This study has several limitations. Interpreting the PPH4 in colocalization requires caution, as a low PPH4 may not conclusively indicate the absence of colocalization, particularly when PPH3 is also low due to limited statistical power. Enhancing the power of colocalization methods through improved analytical techniques, fine-mapping, and explicit modeling of varying linkage disequilibrium (LD) patterns could address this issue. A key assumption of MR is the "no horizontal pleiotropy" assumption, which presumes that the instrumental variable influences the outcome solely through the exposure. Despite our efforts to mitigate horizontal pleiotropy through exclusion of trans-associated loci, Steiger filtering, and sensitivity analyses, completely eliminating this bias is challenging. Weak instrument bias in MR can lead to inaccurate estimates and reduced statistical power. Although we selected strong genetic instruments and used various statistical methods and sensitivity analyses, some bias may still persist. All data used in our analyses were downloaded from online databases, such as the UK Biobank Pharma Proteomics Project and deCODE genetics. This reliance on existing databases may introduce biases related to the specific populations and methodologies used in these studies. To address this limitation, future research should incorporate real-world cohorts to validate the findings of our study. Including diverse and representative populations in real-world settings will help ensure that the observed associations are robust and generalizable. Such validation efforts are crucial for translating our proteome-wide Mendelian randomization and colocalization analysis findings into clinical applications and therapeutic strategies. Lastly, our findings are based primarily on data from individuals of European ancestry, limiting their generalizability to other ethnicities. Further research is needed to validate these findings across diverse populations.
Conclusions
This study identifies plasma proteins causally associated with LUSC risk, providing new insights into its molecular mechanisms. The robust evidence from proteome-wide MR and Bayesian colocalization analyses underscores the potential of these proteins as biomarkers and therapeutic targets. Future research should validate these findings in diverse populations and explore the functional roles of these proteins in LUSC pathogenesis. Enhancing analytical techniques and expanding datasets will improve the precision of MR and colocalization analyses. Collaborative efforts to gather stratified data will facilitate more detailed analyses, advancing our understanding and treatment of LUSC.
Availability of data and materials
Not applicable.
Abbreviations
- LUSC:
-
Lung Squamous Cell Carcinoma
- NSCLC:
-
Non-Small Cell Lung Cancer
- LUAD:
-
Lung Adenocarcinoma
- MR:
-
Mendelian Randomization
- pQTL:
-
Protein Quantitative Trait Loci
- SMR:
-
Summary-data-based Mendelian Randomization
- HEIDI:
-
Heterogeneity in Dependent Instruments
- PPH4:
-
Posterior Probability of Hypothesis 4
- GWAS:
-
Genome-Wide Association Study
- OR:
-
Odds Ratio
- CI:
-
Confidence Interval
- LD:
-
Linkage Disequilibrium
- IVW:
-
Inverse-Variance Weighted
- ILCCO:
-
International Lung Cancer Consortium
- UKB-PPP:
-
UK Biobank Pharma Proteomics Project
- SNP:
-
Single Nucleotide Polymorphism
- ARS:
-
Aminoacyl-tRNA Synthetase
- EMT:
-
Epithelial-Mesenchymal Transition
- AML:
-
Acute Myeloid Leukemia
References
Nooreldeen R, Bach H. Current and future development in lung cancer diagnosis. Int J Mol Sci. 2021;22(16):8661.
Thai AA, Solomon BJ, Sequist LV, Gainor JF, Heist RS. Lung cancer. Lancet. 2021;398(10299):535–54.
Lau SCM, Pan Y, Velcheti V, Wong KK. Squamous cell lung cancer: current landscape and future therapeutic options. Cancer Cell. 2022;40(11):1279–93.
Niu Z, Jin R, Zhang Y, Li H. Signaling pathways and targeted therapies in lung squamous cell carcinoma: mechanisms and clinical trials. Signal Transduct Target Ther. 2022;7(1):353.
Hu Z, Bi G, Sui Q, Bian Y, Du Y, Liang J, et al. Analyses of multi-omics differences between patients with high and low PD1/PDL1 expression in lung squamous cell carcinoma. Int Immunopharmacol. 2020;88: 106910.
Qu J, Lin L, Fu G, Zheng M, Geng J, Sun X, et al. The analysis of multiple omics and examination of pathological images revealed the prognostic and therapeutic significances of CD93 in lung squamous cell carcinoma. Life Sci. 2024;339: 122422.
Xu Y, She Y, Li Y, Li H, Jia Z, Jiang G, et al. Multi-omics analysis at epigenomics and transcriptomics levels reveals prognostic subtypes of lung squamous cell carcinoma. Biomed Pharmacother. 2020;125: 109859.
Sanderson E, Glymour MM, Holmes MV, Kang H, Morrison J, Munafò MR, et al. Mendelian randomization. Nat Rev Methods Primers. 2022;2:2.
Zheng J, Haberland V, Baird D, Walker V, Haycock PC, Hurle MR, et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nat Genet. 2020;52(10):1122–31.
Sun T, Chen X, Yan H, Liu J. The causal association between serum metabolites and lung cancer based on multivariate Mendelian randomization. Medicine (Baltimore). 2024;103(7): e37085.
Li Q, Wei Z, Zhang Y, Zheng C. Causal role of metabolites in non-small cell lung cancer: Mendelian randomization (MR) study. PLoS One. 2024;19(3): e0300904.
Sun BB, Chiou J, Traylor M, Benner C, Hsu YH, Richardson TG, et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature. 2023;622(7982):329–38.
Ferkingstad E, Sulem P, Atlason BA, Sveinbjornsson G, Magnusson MI, Styrmisdottir EL, et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet. 2021;53(12):1712–21.
McKay JD, Hung RJ, Han Y, Zong X, Carreras-Torres R, Christiani DC, et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat Genet. 2017;49(7):1126–32.
Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48(5):481–7.
Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10(5): e1004383.
Li H, Du S, Dai J, Jiang Y, Li Z, Fan Q, et al. Proteome-wide Mendelian randomization identifies causal plasma proteins in lung cancer. iScience. 2024;27(2):108985.
Ji T, Lv Y, Liu M, Han Y, Yuan B, Gu J. Causal relationships between mitochondrial proteins and different pathological types of lung cancer: a bidirectional mendelian randomization study. Front Genet. 2024;15: 1335223.
Chen Y, Xie Y, Ci H, Cheng Z, Kuang Y, Li S, et al. Plasma metabolites and risk of seven cancers: a two-sample Mendelian randomization study among European descendants. BMC Med. 2024;22(1):90.
Choi SI, Lee JH, Kim RK, Jung U, Kahm YJ, Cho EW, et al. HSPA1L enhances cancer stem cell-like properties by activating IGF1Rβ and regulating β-catenin transcription. Int J Mol Sci. 2020;21(18):6957.
Go G, Lee SH. The cellular prion protein: a promising therapeutic target for cancer. Int J Mol Sci. 2020;21(23):9208.
Lee JH, Han YS, Yoon YM, Yun CW, Yun SP, Kim SM, et al. Role of HSPA1L as a cellular prion protein stabilizer in tumor progression via HIF-1α/GP78 axis. Oncogene. 2017;36(47):6555–67.
Barresi V, Di Bella V, Lo Nigro L, Privitera AP, Bonaccorso P, Scuderi C, et al. Temporary serine protease inhibition and the role of SPINK2 in human bone marrow. iScience. 2023;26(6):106949.
Chen X, Zhao L, Yu T, Zeng J, Chen M. SPINK2 is a prognostic biomarker related to immune infiltration in acute myeloid leukemia. Am J Transl Res. 2022;14(1):197–210.
Chen T, Lee TR, Liang WG, Chang WS, Lyu PC. Identification of trypsin-inhibitory site and structure determination of human SPINK2 serine proteinase inhibitor. Proteins. 2009;77(1):209–19.
Kim YW, Kwon C, Liu JL, Kim SH, Kim S. Cancer association study of aminoacyl-tRNA synthetase signaling network in glioblastoma. PLoS One. 2012;7(8): e40960.
Kobayashi D, Tokuda T, Sato K, Okanishi H, Nagayama M, Hirayama-Kurogi M, et al. Identification of a specific translational machinery via TCTP-EF1A2 interaction regulating NF1-associated tumor growth by Affinity Purification and Data-independent Mass Spectrometry Acquisition (AP-DIA). Mol Cell Proteomics. 2019;18(2):245–62.
Demidyuk IV, Shubin AV, Gasanov EV, Kurinov AM, Demkin VV, Vinogradova TV, et al. Alterations in gene expression of proprotein convertases in human lung cancer have a limited number of scenarios. PLoS One. 2013;8(2): e55752.
Li L, Tian H, Cheng C, Li S, Ming L, Qi L. siRNA of DNA polymerase iota inhibits the migration and invasion in the lung cancer cell A549. Acta Biochim Biophys Sin (Shanghai). 2018;50(9):929–33.
Hao J, Mei H, Luo Q, Weng J, Lu J, Liu M, et al. TCL1A acts as a tumour suppressor by modulating gastric cancer autophagy via miR-181a-5p-TCL1A-Akt/mTOR-c-MYC loop. Carcinogenesis. 2023;44(1):29–37.
Mao G, Yang D, Liu B, Zhang Y, Ma S, Dai S, et al. Deciphering a cell death-associated signature for predicting prognosis and response to immunotherapy in lung squamous cell carcinoma. Respir Res. 2023;24(1):176.
Han Y, Dong Z, Xing Y, Zhan Y, Zou J, Wang X. Establishment of a prognosis prediction model for lung squamous cell carcinoma related to PET/CT: basing on immunogenic cell death-related lncRNA. BMC Pulm Med. 2023;23(1):511.
Acknowledgements
We thank all participants and investigator involved in the McRae AF. et al. genome-wide association analysis on methylation, the eQTLGen consortium, the Genotype-Tissue Expression (GTEx) project, Ferkingstad E. et al. genome-wide association analysis on proteome, the TRICL, the FinnGen study for sharing data. We thank Dr. Lv and Dr. Liu for their help in instruction for SMR and Colocalization analysis.
Funding
This study was supported by The Medical Engineering Cross Research Funding of Shanghai Jiaotong University "Star of Jiaotong University" Program (24X010301595).
Author information
Authors and Affiliations
Contributions
QW, XX, and XL contributed equally to this work and should be considered co-first authors. QW was involved in the conception and design of the study, data collection, and manuscript writing. XX conducted the data analysis and interpretation, and contributed to the drafting and critical revision of the manuscript. YK L and XL participated in data collection, analysis, and manuscript drafting. SW provided critical insights during the study design, supervised the analysis, and was involved in manuscript revision. GL is the corresponding author and was responsible for overseeing the entire study, providing guidance on the study design and data interpretation, and critically revising the manuscript for important intellectual content. All authors have read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing Interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, Q., Xue, X., Ling, X. et al. Causal associations of plasma proteins with lung squamous cell carcinoma risk: a proteome-wide Mendelian randomization and colocalization analysis. CCB 3, 18 (2024). https://doi.org/10.1007/s44272-024-00024-w
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s44272-024-00024-w