Construction and validation of a novel prognostic nomogram for predicting overall survival in lung adenocarcinoma patients with different patterns of metastasis

Objective Metastasis of lung cancer is an important factor affecting survival. The present study aimed to establish and validate a nomogram for predicting overall survival (OS) in lung adenocarcinoma (LUAD) patients with different patterns of metastasis. Methods A total of 9727 patients diagnosed with metastatic LUAD from 2010 to 2015 were enrolled from the Surveillance, Epidemiology, and End Results (SEER) database and randomly divided into training and validation cohorts, and 136 patients from our Cancer Center were enrolled as the external validation cohort. Univariate and multivariate analyses were performed to evaluate the prognostic impact of clinical variables on OS. A prognostic nomogram was constructed and evaluated by C-index, calibration curves, decision curve analysis (DCA), and a risk stratification system. Results Ultimately, 6809 and 2918 patients diagnosed with metastatic LUAD were enrolled in the training and validation cohorts, respectively. Male sex, later T and N stage, larger tumor size, absence of surgery, chemotherapy or radiotherapy, and metastatic site were found to be independent risk factors for worse OS in LUAD patients and were incorporated into the nomogram. Bone metastasis was the most frequent, and among single-site metastases, liver metastasis had the worst prognosis. Two-site metastasis was more common than three-site and four-site metastasis, and co-metastasis led to a worse survival outcome. The C-index of the nomogram for predicting OS was 0.798, 0.703 and 0.698 in the training, internal validation and external validation cohorts, respectively. The calibration curves for 6-month, 1-year and 2-year OS showed good agreement between nomogram predictions and actual observations. The DCA curves indicated that the nomogram was more beneficial than the AJCC TNM stage. Patients were further divided into low-risk and high-risk groups according to nomogram-predicted scores to develop a survival risk classification system. Conclusions Our prognostic nomogram is expected to be an accurate and individualized clinical tool for predicting OS in LUAD patients with different patterns of metastasis.


Ying Xiong and Feifei Gu contributed equally to this study.

Introduction
Lung cancer is one of the most common malignant tumors threatening human life, with high morbidity and mortality, accounting for approximately one tenth (11.4%) of diagnosed cancer cases and one fifth (18.0%) of cancer deaths (Siegel et al. 2022). Non-small cell lung cancer (NSCLC) accounts for about 80% of lung cancer, of which lung adenocarcinoma (LUAD) is the most common subtype (Chen et al. 2014). Early lung cancer can be asymptomatic or present atypically, so by the time of diagnosis many patients have already progressed to advanced disease with distant metastasis, which largely determines the treatment strategy and the possibility of long-term survival (Nasim et al. 2019). With the development of gene detection, targeted therapy has made remarkable progress (Jones and Baldwin 2018). In addition, the study of the tumor microenvironment has promoted the progress of immunotherapy to some extent (Hu et al. 2019). Thus, for patients with advanced disease who cannot be treated with surgery, the combined use of chemotherapy, radiotherapy, targeted therapy and immunotherapy is recommended to further reduce metastatic risk (Abu Rous et al. 2023). Despite this, the prognosis is still not ideal, with a 5-year survival rate of less than 20%, mainly because of metastatic lesions (Xie et al. 2021).
LUAD is a malignant tumor whose major metastatic sites include bone, brain, liver and lung, and metastasis affects the survival rate (Hendriks et al. 2015). At present, TNM staging is still the gold standard for predicting the prognosis of LUAD. However, patients with the same stage often have different prognoses after receiving similar treatment, probably because some clinical characteristics are not considered in the staging system. Thus, a more accurate prognostic model is needed to inform clinical decisions in patients with metastatic LUAD. Recently, the nomogram, a statistical prediction model, has shown good application value in many kinds of cancer (Iasonos et al. 2008), including nasopharyngeal carcinoma (Tang et al. 2018), esophageal cancer (Liu et al. 2021a), colorectal cancer (Liu et al. 2021b), hepatocellular cancer (Liu et al. 2020) and small cell lung cancer (Yang et al. 2021). A previous study (Pang et al. 2022) constructed a nomogram for predicting distant metastasis in invasive LUAD. However, that study only considered whether patients had distant metastasis; specific metastatic sites and different metastatic patterns were not analyzed further, and few effective risk stratification tools have been established to optimize the prognostic role of metastasis in LUAD survival.
Therefore, in this study, we aimed to explore different metastatic patterns of LUAD and their effects on prognosis based on the Surveillance, Epidemiology, and End Results (SEER) database, and then to evaluate the reliability and feasibility of the resulting model in independent validation cohorts, in order to improve on the predictive effectiveness of traditional methods and guide clinical decision-making.

Patient selection
This retrospective study used the SEER database, which includes cancer incidence, survival and treatment information from multiple registries (http://seer.cancer.gov/). Data on patients diagnosed with metastatic LUAD from 2010 to 2015 were extracted using SEER*Stat version 8.4.1 (username: 25736-Nov2021). Inclusion criteria were the following: (1) patients diagnosed with primary cancer from 2010 to 2015; (2) histological codes according to the International Classification of Diseases for Oncology, third edition (ICD-O-3): 8140/3, 8141/3, 8143/3, 8144/3, 8146/3, 8147/3, 8149/3, 8250/3, 8251/3, 8255/3, 8260/3, 8310/3, 8323/3, 8480/3, 8481/3, 8570/3, 8574/3; (3) data such as year of diagnosis, sex, age at diagnosis, race (white, black, other), grade, TNM stage at the time of diagnosis, tumor size, treatment, patterns of metastatic sites, survival months, and vital status were collected from the SEER database. Exclusion criteria were: (1) patients with more than one primary malignant cancer; (2) patients aged < 18 years; (3) patients with survival time less than 1 month; (4) patients with unknown or missing clinical information. The specific inclusion and exclusion processes are shown in Fig. 1. We also enrolled patients with metastatic LUAD diagnosed from 2013 to 2016 at Union Hospital Cancer Center as the external validation cohort. Cases with insufficient clinical information or incomplete follow-up information were excluded. Written consent was obtained from all enrolled patients and the study was approved by the Cancer Center of Union Hospital, Tongji Medical College, Huazhong University of Science and Technology.

Data collection and clinical endpoints
The following demographic and clinicopathological data were extracted: age, sex, race, grade, tumor stage, nodal stage, tumor size, surgery, chemotherapy, radiotherapy, metastatic site and follow-up data. The primary outcome of this study was overall survival (OS), defined as the time between first treatment and death or last follow-up. Patients whose event had not occurred by the last follow-up date were recorded as censored. All patients were followed up through records of regular clinic visits or by telephone.

Statistical analysis
All analyses were conducted using SPSS 26.0, GraphPad Prism 9.0 and R software v4.2.3. Patients were divided into training and validation cohorts at a ratio of 7:3 using the "createDataPartition" function in the R "caret" package to ensure that the outcome events were randomly distributed. The Chi-squared test was used to assess the baseline balance between the internal and external cohorts. Correlations between variables were assessed using the Pearson correlation coefficient. Survival curves were estimated using the Kaplan-Meier method and compared by the log-rank test. Univariate and multivariate Cox proportional hazards regressions were conducted to evaluate the prognostic significance of variables with respect to OS. The nomogram was constructed with the "rms" package of R software. The concordance index (C-index) was calculated to assess the discriminative performance of the nomogram, a calibration curve (1000 bootstrap resamples) was used to test calibration, and decision curve analysis (DCA) was used to evaluate clinical utility. In addition, the "survminer" package was used to obtain the cutoff value of the scores predicted by the nomogram, and the cohorts were then divided into different risk groups for Kaplan-Meier analysis. A two-tailed P value less than 0.05 was considered statistically significant.
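To make the discrimination metric concrete, the following minimal sketch computes Harrell's C-index for right-censored survival data. It is written in plain Python purely for illustration; the analysis described above was performed in R, and the function name, argument names and handling of ties here are assumptions rather than the authors' implementation.

```python
def harrell_c_index(times, events, risk_scores):
    """Illustrative Harrell's C-index (hypothetical helper, not the rms code).

    times       : follow-up time for each patient
    events      : 1 if death was observed, 0 if censored
    risk_scores : higher score = higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died strictly before
            # patient j's last observed time (tied times are skipped,
            # which is a simplification).
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the C-index values reported in this study should be read.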

Basic patient characteristics in the training and validation cohorts
Ultimately, a total of 9727 patients diagnosed with metastatic LUAD were enrolled in the study, with 6809 and 2918 patients in the training and validation cohorts, respectively. The baseline characteristics of the patients are displayed in Table 1. The training cohort comprised 3344 females (49.1%) and 3465 males (50.9%), with similar proportions in the validation cohort. In addition, metastatic LUAD patients in the two cohorts tended to have a larger tumor size (54.5% in the training and 53.8% in the validation cohort), a later T stage (T3-T4) (63.9% vs 64.1%) and a later N stage (N2-N3) (67.7% vs 67.7%). At the end of the study period, 681 (10.0%) and 293 (10.0%) patients had died in the two cohorts, respectively.
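The 7:3 partition behind these cohort sizes (9727 × 0.7 ≈ 6809) can be sketched as a split stratified on the outcome. The snippet below is a plain-Python illustration under that assumption; it is not the "caret" implementation used in the study, and the function name and seed are hypothetical.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Illustrative stratified split (hypothetical helper).

    labels : outcome label per patient (e.g. 1 = died, 0 = censored)
    Returns two sorted lists of row indices: (training, validation).
    """
    rng = random.Random(seed)
    # Group row indices by outcome so each class is split separately.
    by_label = {}
    for index, label in enumerate(labels):
        by_label.setdefault(label, []).append(index)

    train, valid = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        valid.extend(indices[n_train:])
    return sorted(train), sorted(valid)
```

Splitting within each outcome class keeps the event rate nearly identical in the two cohorts, which matches the equal death proportions reported for the training and validation cohorts.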
In the training cohort, the numbers of patients with solitary bone, brain, liver and lung metastasis were 1083 (15.9%), 1083 (15.9%), 182 (2.7%) and 987 (14.5%), respectively. Among patients with two metastatic sites, the combination of bone and lung metastasis (6.4%) was the most frequent. Among patients with three metastatic sites, bone, brain and lung metastasis (2.8%) was the most common pattern. In addition, 98 LUAD patients (1.4%) had metastasis to all four sites (bone, brain, liver and lung). The distribution was similar in the validation cohort.
Table 1 also compares the baseline parameters between the two cohorts. No significant differences were found in age, sex, race, grade, T stage, N stage, tumor size, surgery, chemotherapy, radiotherapy or metastasis, indicating that the two cohorts were comparable. In addition, we enrolled a total of 136 patients with metastatic LUAD at Union Hospital Cancer Center as the external validation cohort. The characteristics of the internal and external validation cohorts are compared in Table 2. Similarly, there was no significant difference in the above clinical factors except race (all patients in the external cohort were Chinese).

Kaplan-Meier method and log-rank test
To further investigate the prognostic effect of metastasis in LUAD patients, OS curves for the different metastatic sites were estimated in the training and validation cohorts using the Kaplan-Meier method and compared by the log-rank test (Fig. 2). In the training cohort, bone metastasis was the most frequent (39.1%), and among solitary-site metastases, liver metastasis had the worst prognosis (P < 0.001, Fig. 2a, c). Two-site metastasis (22%) was more common than three-site (7.5%) and four-site metastasis (1.4%). Among cases with multiple metastatic sites, those with bone, brain and liver metastasis had the worst clinical outcomes (P < 0.001, Fig. 2b, d), and outcomes were worse than those with single-site metastasis. These results were similar in the validation cohort (Fig. 2).
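For readers less familiar with the method, the Kaplan-Meier (product-limit) estimate behind such survival curves can be sketched in a few lines. This is a plain-Python illustration with a hypothetical function name and toy inputs, not the software used in the study.

```python
def kaplan_meier(times, events):
    """Illustrative Kaplan-Meier estimator (hypothetical helper).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    death_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in death_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        # Multiply by the conditional probability of surviving past t.
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is how incomplete follow-up is handled without discarding those patients.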

Univariate and multivariate analysis
In the univariate Cox regression analysis, age, sex, T stage, N stage, tumor size, treatment (surgery, chemotherapy and radiotherapy) and metastatic sites other than solitary lung metastasis were identified as potential factors affecting OS in the training and validation cohorts (Tables 3, 4). Parameters that reached significance in the univariate analysis were further analyzed to account for the influence of confounding factors. As shown in the tables, all of the above variables except age were retained in the multivariate Cox regression analysis. Male sex, later T and N stage, larger tumor size, absence of surgery, chemotherapy or radiotherapy, and metastatic site remained independent risk factors for worse OS in LUAD patients in the two cohorts (all P values < 0.05).

Construction of nomogram
To further analyze the prognostic value of the risk factors, we created a nomogram model that incorporated all factors that were significant in the multivariate Cox regression analysis (Fig. 3). As shown in the nomogram, male sex, T4 or N3 stage, tumor larger than 4 cm, no surgery, no chemotherapy or radiotherapy, and bone, brain and liver metastasis were associated with a worse prognosis. The 6-month, 1-year and 2-year OS was predicted in this nomogram based on the chosen variables.
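As a rough illustration of how a nomogram turns Cox regression output into a total score, the sketch below rescales log hazard ratios so that the predictor with the widest effect spans 0 to 100 points, then sums the points for one patient. This scaling convention is assumed here, and every coefficient, variable name and function name is hypothetical; none of the values are taken from the study's model.

```python
def nomogram_points(coefficients):
    """Illustrative point assignment (hypothetical helper).

    coefficients : {variable: {level: log hazard ratio vs. reference}}
    Returns {variable: {level: points}} on a 0-100 scale.
    """
    # Effect range of each variable on the log-hazard scale.
    ranges = {
        variable: max(levels.values()) - min(levels.values())
        for variable, levels in coefficients.items()
    }
    widest = max(ranges.values())
    if widest == 0:
        raise ValueError("all coefficients are identical")

    points = {}
    for variable, levels in coefficients.items():
        lowest = min(levels.values())
        points[variable] = {
            level: 100.0 * (beta - lowest) / widest
            for level, beta in levels.items()
        }
    return points


def total_points(points, patient):
    """Sum the points for one patient's levels across all variables."""
    return sum(points[variable][patient[variable]] for variable in points)


# Hypothetical coefficients for illustration only (not from this study).
example_coefficients = {
    "sex": {"female": 0.0, "male": 0.2},
    "surgery": {"yes": 0.0, "no": 0.8},
}
example_points = nomogram_points(example_coefficients)
example_total = total_points(example_points, {"sex": "male", "surgery": "no"})
```

The total is then read against the survival-probability axes of the nomogram; a higher total corresponds to a lower predicted OS.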

Calibration and validation of the nomogram
The C-index was used to quantify the discriminative ability of the nomogram model, and the values were 0.798, 0.703 and 0.698 in the training, internal validation and external validation cohorts, respectively. Calibration curves for 6-month, 1-year and 2-year OS were then constructed in each cohort and demonstrated a high degree of consistency between the predicted and observed survival probabilities (Fig. 4). DCA curves were developed to assess the clinical benefit and utility of the nomogram compared with the TNM staging system. In all cohorts, the DCA curves demonstrated that the nomogram was more beneficial in predicting OS than the 7th edition American Joint Committee on Cancer (AJCC) TNM stage, showing net benefit across threshold probabilities at the different time points (Fig. 5). To sum up, the above results showed that the nomogram we

Risk stratification of prognosis by the nomogram model
Finally, a risk stratification system for predicting OS based on the total nomogram scores was developed. First, the total risk score for each case was calculated in the training cohort, and a cut-off value of 236 was determined with R software (Fig. 6). Based on this cut-off value, the included patients were divided into low-risk and high-risk groups. Further analysis in the internal and external cohorts demonstrated that, among metastatic LUAD patients, those in the low-risk group (total risk score < 236) had a better prognosis than those in the high-risk group (total risk score ≥ 236) (P < 0.0001) (Fig. 7).
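The stratification step can be sketched as follows: each patient is assigned to a group using the reported cut-off of 236, and the two groups are compared with a log-rank chi-squared statistic. This is a plain-Python illustration with hypothetical function names and toy data; it does not reproduce how the cut-off itself was selected.

```python
def assign_risk_group(total_score, cutoff=236):
    """High risk if the total nomogram score is at or above the cut-off."""
    return "high" if total_score >= cutoff else "low"


def logrank_chi2(times, events, groups, reference="high"):
    """Illustrative two-group log-rank statistic (hypothetical helper).

    Compares observed deaths in the reference group with the number
    expected if both groups shared the same hazard.
    """
    observed = expected = variance = 0.0
    death_times = sorted({t for t, e in zip(times, events) if e})
    for t in death_times:
        n_total = sum(1 for ti in times if ti >= t)
        n_ref = sum(1 for ti, g in zip(times, groups)
                    if ti >= t and g == reference)
        d_total = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d_ref = sum(1 for ti, e, g in zip(times, events, groups)
                    if ti == t and e and g == reference)
        observed += d_ref
        expected += d_total * n_ref / n_total
        if n_total > 1:
            # Hypergeometric variance of deaths in the reference group.
            share = n_ref / n_total
            variance += (d_total * share * (1 - share)
                         * (n_total - d_total) / (n_total - 1))
    if variance == 0:
        raise ValueError("groups cannot be compared")
    return (observed - expected) ** 2 / variance
```

Under the null hypothesis of equal survival, the statistic approximately follows a chi-squared distribution with one degree of freedom, from which the P value for the comparison of the two risk groups is obtained.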

Discussion
Metastasis is the main cause of lung cancer-related death (Nichols et al. 2021), with 40-60% of lung cancer patients having already developed metastasis at diagnosis, leading to a poor prognosis (Stein et al. 2021). At present, the gold standard for predicting the prognosis of lung cancer is the TNM staging system, and all lung cancer patients with metastasis are classified as stage IV, for whom first-line treatment is still a combination of chemotherapy, radiotherapy, immunotherapy and targeted therapy (Gadgeel et al. 2012). Unfortunately, patients with the same stage often have a heterogeneous prognosis, as some potential prognostic factors are not considered in the TNM staging system. Previous studies have also shown that there are great differences in tumor characteristics, metastasis patterns and prognosis among different histological types of lung cancer (Wang et al. 2020). NSCLC accounts for about 80% of lung cancer, of which LUAD is the most common subtype, with a 5-year survival rate of less than 20% (Torre et al. 2016). Therefore, early identification of high-risk metastatic LUAD patients is important for guiding treatment decision-making, long-term survival assessment and follow-up frequency.
At present, the nomogram is considered a useful tool for assessing risk by integrating important pathological and clinical features related to oncological outcomes (Zuo et al. 2021). Some studies (Ouyang et al. 2022) have constructed nomograms to predict the survival of patients with lung cancer by integrating different clinical factors, and have shown good reliability and feasibility. In addition, for metastatic LUAD patients, Pang et al. (2022) established a nomogram incorporating factors such as histological type, surgical approach and metastatic status to predict prognosis accurately. However, that study only considered whether patients had distant metastasis, and did not further analyze specific metastatic sites or different metastatic patterns. Going a step further, several studies have constructed nomograms for LUAD patients with a single metastatic site. Meng et al. (2022) investigated pretreatment peripheral blood indexes in advanced LUAD with bone-only metastasis and developed a nomogram model to estimate survival. Wang et al. (2021) constructed a nomogram for predicting brain metastasis in EGFR-mutated LUAD patients and estimating the efficacy of therapeutic strategies. Similarly, Wang et al. (2022a) designed a nomogram model based on easily accessible clinical factors, which demonstrated excellent performance in predicting the individual cancer-specific survival of NSCLC patients with liver metastasis. From the above studies, we found that tumor metastasis is considered an important prognostic factor in LUAD patients, but its impact in patients with multiple metastatic sites has not been comprehensively analyzed and predicted. Thus, we aim to explore different …
In this study, we demonstrated that clinical parameters including sex, T stage, N stage, tumor size, and treatment including surgery, chemotherapy and radiotherapy had a significant impact on patient survival, which was consistent with previous studies (Deng et al. 2018; Wang et al. 2022b). However, age is not an independent factor affecting … (Wang et al. 2023). To our knowledge, for patients with liver metastasis, chemotherapy and radiotherapy are the standard treatment. However, the inability to tolerate chemotherapy due to liver insufficiency caused by liver metastasis may also lead to a poor prognosis (Nakagawa et al. 2008). This may also be explained by the fact that the liver is an immunosuppressive organ, which hinders the immune surveillance of other metastatic organs when liver metastasis occurs (Ham et al. 2015). Therefore, whenever a patient has liver metastasis, regardless of which other sites are involved, the prognosis tends to be poorer than in patterns without liver metastasis, which was also confirmed in our study. Nevertheless, a significant difference in patients with single lung metastasis was not found in the nomogram. Interestingly, in the subgroup analysis by Wang et al. (2023) evaluating the risk of major organ metastasis in different histological types of lung cancer, small-cell lung carcinoma (SCLC), large-cell carcinoma (LCLC) and squamous-cell carcinoma (SCC) were found to increase or decrease the risk of lung metastasis to varying degrees, but there was no correlation in LUAD, which may explain why we did not obtain the expected result.
Moreover, apart from single-site metastasis, we further integrated multi-site metastasis into the nomogram. In our data, two-site metastasis was more common than three-site and four-site metastasis, cases with bone, brain and liver metastasis had the worst survival prognosis, and co-metastasis led to a worse survival outcome than single-site metastasis. The nomogram in our study also showed better predictive accuracy and discriminative ability for the survival of metastatic LUAD patients. Further, DCA curves confirmed that the nomogram was more beneficial than the 7th edition AJCC TNM stage in predicting 6-month, 1-year and 2-year OS. Finally, patients were divided into two risk groups according to the total nomogram score, and significant survival differences were found in the Kaplan-Meier curve evaluation.
Nevertheless, our study has some limitations. First, the SEER database lacks some important clinicopathological factors, such as specific treatment regimens, tumor markers and gene mutations. Second, as this was a retrospective study, selection bias is inevitable. Third, although we fully considered metastatic sites, information on the number of metastatic lesions was not available, so we were unable to incorporate this important factor into the model. Therefore, multicenter clinical studies with larger sample sizes are needed to further confirm the model (Table 4).

Conclusion
In summary, based on the SEER database, we constructed a nomogram that includes different metastatic patterns to predict the OS of LUAD patients. We found that bone metastasis was the most frequent, and among single-site metastases, liver metastasis had the worst prognosis. Two-site metastasis was more common than three-site and four-site metastasis, and co-metastasis led to a worse survival outcome. Moreover, the nomogram and risk model were more accurate than the TNM staging system in predicting OS for LUAD patients with metastasis. The risk stratification system based on the nomogram is also a useful tool to guide decision-making in metastatic LUAD and to predict clinical outcomes.

Fig. 1 Flow chart of patient selection from the Surveillance, Epidemiology, and End Results (SEER) database

Fig. 2 Kaplan-Meier survival curves of overall survival according to single-site (a, c) and multiple-site metastasis (b, d) in the training and validation cohorts. Log-rank test, P < 0.05 was considered statistically significant

Fig. 3 Nomogram for predicting 6-month, 1-year, and 2-year overall survival in LUAD patients with metastasis in the training cohort. The metastatic site on the nomogram from left to right is: unknown; lung; bone; bone and lung; brain; brain and lung; liver; bone, brain and

Fig. 4 Calibration curves for the nomogram in the training cohort (a-c), internal validation (d-f) and external validation (g-i) cohorts for 6-month, 1-year, and 2-year

Fig. 5 DCA curves for the nomogram in the training cohort (a-c), internal validation (d-f) and external validation (g-i) cohorts for 6-month, 1-year, and 2-year

Fig. 6 Risk stratification system based on nomogram-predicted scores and the cut-off value determined by R software in the training cohort

Fig. 7 Kaplan-Meier survival curves categorized into low-risk (total risk score < 236) and high-risk groups (total risk score ≥ 236) based on the cut-off value according to the prognostic score of the nomogram in

Table 1
Baseline characteristics in the SEER training and validation cohorts

Table 2
Baseline characteristics in the SEER and external validation cohorts
… that had a hazard ratio. All the prediction parameters have corresponding values in the nomogram. Add all these values and locate the sum on the total score scale to calculate the survival probability.

Table 4
Univariate and multivariate Cox analyses of variables for the prediction of overall survival in the validation cohort