Background

Lung cancer is the leading cause of cancer death among men and the second leading cause of cancer death among women worldwide [1]. At present, the eighth edition of the non-small cell lung cancer (NSCLC) tumor-node-metastasis (TNM) staging system, developed and validated by the International Association for the Study of Lung Cancer (IASLC) staging project, is considered to be the most significant prognostic predictor and the main guide for postoperative supplementary treatment [2]. The following factors were incorporated into the IASLC system: histological grade, gender, age, and performance status. No molecular prognostic factors are used in the clinic because of the lack of cross-validation. Even the new biomarker programmed cell death protein 1 ligand (PD-L1) is a predictive marker of good response to immunotherapy drugs but a poor prognostic indicator of survival [3]. However, clinicians know that outcomes vary among resected NSCLC patients with the same TNM stage and similar clinical features. Some die early after surgical treatment, while others remain alive, even living longer than expected. Therefore, subgroups of T and N staging and additional clinicopathological features should be considered in prognostic risk and survival prediction.

Recently, there have been many studies on the prognostic factors for patients with resected NSCLC [4,5,6,7]. Prognostic factors can be divided into clinical factors, tumor-related factors, and treatment-related factors. TNM stage, gender, age, number of examined regional lymph nodes (NELNs), number of positive regional lymph nodes (NPLNs), surgery type, histological grade, histology, and marital status have been reported to be prognostic factors for patients with resected NSCLC [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]. Few studies have examined T and N staging subgroups as prognostic factors. Meanwhile, prognostic prediction tools such as nomograms, scores, and survival models for patients with resected NSCLC have been reported in many published studies [23,24,25,26,27]. Unfortunately, for clinicians who are busy with clinical work, the TNM staging system and tools whose results are inaccurate or vague are inconvenient to use. Therefore, we aimed to develop software that can conveniently, specifically, and accurately predict the prognostic risk and survival of patients with T1-3N0–2M0 NSCLC. In building the model, T and N staging subgroups and additional clinical features were analyzed as prognostic factors.

Implementation

We collected information on patients from the Surveillance, Epidemiology, and End Results (SEER) database, which provides cancer statistics for U.S. patients. In total, 6886 patients were included in this study. The eligibility criteria were as follows: (1) histological diagnosis of NSCLC; (2) only a single primary NSCLC in their lifetime, occurring between 2004 and 2014; (3) treatment by resection only; (4) definitive surgical information; (5) survival time equal to or greater than one month; and (6) age ≥20 years. Patients were excluded from the study if they met any of the following criteria: (1) M1 stage or no definitive information on M stage; (2) no definitive information on primary site, laterality, or histological grade; (3) T4>7 or no definitive information on tumor size; (4) T4 Inv or T4 Ipsi Nod, or no definitive information on tumor extension; (5) N3 stage or no definitive information on N stage; (6) no definitive information on the number of examined and positive regional lymph nodes; (7) unknown marital status or race. Figure 1 shows the flow chart of the process used to screen patients according to the inclusion and exclusion criteria. Clinicopathological characteristics and follow-up information were collected, as shown in Table 1, including gender, age, laterality, race, N stage, NELNs, NPLNs, surgery type, primary site, histological grade, histology, marital status, tumor extension, tumor size, survival months, and survival status.

Fig. 1 Flow chart of patient screening according to the inclusion and exclusion criteria. a NSCLC: non-small cell lung cancer. b According to the eighth edition of the American Joint Committee on Cancer (AJCC)/Union for International Cancer Control (UICC) stage classification for NSCLC

Table 1 The clinicopathological characteristics of patients with resected T1-3N0–2M0 NSCLC

First, approximately 70% of the patients in this data set were randomly assigned to the training set (4821 patients), and the remaining patients formed the test set (2065 patients). The training set was used to build the model, and the test set was used to verify it. Second, based on the training set, the Cox proportional hazard regression model was used to identify independent prognostic factors and their model coefficients. Third, we obtained a prognostic index (PI) equation, in which the PI is the sum of the products of the value of each independent prognostic factor and its corresponding regression coefficient. Fourth, according to the quantiles of the PI value, patients were divided into three risk groups: the low-, intermediate-, and high-risk groups, which had significantly different survival rates according to Kaplan-Meier analysis and the log-rank test. We also obtained the mean and median survival times and the 1–5-year survival rates of the three risk groups. Finally, we developed a software program named RNSCLC-PRSP to predict the prognostic risk and survival of patients with resected T1-3N0–2M0 non-small cell lung cancer from their clinicopathological features. The software is freely available on the web at http://www.rnsclcpps.com and supports all major browsers. Clinicians register and log in, select the clinicopathological characteristics of a patient, and receive the predicted prognostic risk and survival outcome.

We used SPSS software (version 16.0; SPSS Inc., Chicago, IL, USA) for all statistical calculations, and P<0.05 was considered significant. The tree model analysis method was also used to rank the importance of each variable for prediction.
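The random assignment of approximately 70% of patients to the training set can be illustrated with a minimal Python sketch. This is not the authors' procedure, which was carried out with statistical software and is not described in detail; the function name `split_patients`, the fixed seed, and the choice to round the training-set size up are all assumptions made for illustration. Rounding up happens to reproduce the reported set sizes for 6886 patients.

```python
import math
import random


def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into a training set and a test set."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    rng.shuffle(ids)
    # Rounding up is an assumption; the paper only says "approximately 70%".
    n_train = math.ceil(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical patient IDs 0..6885 stand in for the 6886 SEER records.
train_ids, test_ids = split_patients(range(6886))
print(len(train_ids), len(test_ids))  # 4821 2065
```

The two lists are disjoint and together cover every patient, so the test set stays unseen while the model is fitted.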

Results

Univariate analysis of prognostic factors

Variable codes and assignment methods for the clinicopathological characteristics are provided in Additional file 1: Table S1. In the univariate analysis, the results of which are presented in Table 2, gender, age, N stage, NELNs, NPLNs, surgery type, primary site, histological grade, histology, marital status, tumor extension, and tumor size were significant prognostic factors (P<0.05).

Table 2 Univariate analysis of the Cox proportional hazard regression model of resected T1-3N0–2M0 NSCLC

Multivariate analysis of prognostic factors

In the multivariate analysis, the results of which are shown in Table 3, gender, age, N1 stage, NELNs (6<N ≤ 12, N>12), NPLNs (1 ≤ N ≤ 3, N ≥ 4), lobectomy (LET), primary site (UL, ML), histological grade (II, III, IV), histology (AC, S, ASC), marital status (married, divorced), tumor extension (T3 Inv, T3 Satell), and tumor size (T2a>3–4 (3<T ≤ 4), T2b>4–5 (4<T ≤ 5), T3>5–7 (5<T ≤ 7)) were identified as independent prognostic factors.

Table 3 Multivariate analysis of the Cox proportional hazard regression model of resected T1-3N0–2M0 NSCLC

The tree model analysis

The tree model analysis method was used to rank the importance of each variable for prediction. The results are shown in Table 4. The third column is standardized importance. The first 12 variables were selected into the model, which was consistent with the Cox regression results.

Table 4 The importance of each variable for prediction

Prognostic risk model construction and software development

Using the Cox proportional hazard regression model, we obtained the PI equation:

$$
\begin{aligned}
\mathrm{PI} = \sum_i \beta_i x_i ={} & 0.379X_{1} - 0.403X_{2} - 0.267X_{51} - 0.167X_{61} - 0.298X_{62} + 0.460X_{71} + 0.617X_{72} \\
& - 0.344X_{81} - 0.105X_{91} - 0.243X_{92} + 0.305X_{101} + 0.508X_{102} + 0.754X_{103} \\
& + 0.143X_{111} + 0.170X_{112} + 0.434X_{113} - 0.327X_{122} - 0.247X_{123} \\
& + 0.517X_{133} + 0.340X_{134} + 0.457X_{143} + 0.419X_{144} + 0.407X_{145}
\end{aligned}
$$

Using the PI equation, we obtained the PI value of every patient. As shown in Table 5, we obtained PI ranges for the training and test sets. According to the quantiles of the PI value, we divided patients in the training and test sets into three risk groups, using the quantile ranges 0–50%, 50–90%, and above 90%. These ranges correspond to the low-, intermediate-, and high-risk groups, respectively. For each risk group, we obtained the mean and median survival times and the 1–5-year survival rates in the training and test sets (Tables 6 and 7, respectively). Using K-M curves and log-rank tests, we found that survival rates in both the training and test sets worsened stepwise from the low- to the intermediate- to the high-risk group (P<0.001) (Fig. 2). Verification on the test set indicated that the model performed well.
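The PI calculation and the quantile-based grouping described above can be sketched in Python as follows. This is an illustrative sketch, not the RNSCLC-PRSP implementation. The coefficients are copied from the PI equation and the keys follow its variable names; how each variable is coded is defined in Additional file 1: Table S1, which is not reproduced here, so treating each variable as a 0/1 indicator is an assumption. The function names and the handling of values that fall exactly on a cutoff are also assumptions.

```python
import statistics

# Regression coefficients copied from the PI equation (keys are its variable names).
COEFFICIENTS = {
    "X1": 0.379, "X2": -0.403, "X51": -0.267, "X61": -0.167, "X62": -0.298,
    "X71": 0.460, "X72": 0.617, "X81": -0.344, "X91": -0.105, "X92": -0.243,
    "X101": 0.305, "X102": 0.508, "X103": 0.754, "X111": 0.143, "X112": 0.170,
    "X113": 0.434, "X122": -0.327, "X123": -0.247, "X133": 0.517, "X134": 0.340,
    "X143": 0.457, "X144": 0.419, "X145": 0.407,
}


def prognostic_index(values):
    """Return PI = sum(beta_i * x_i); variables absent from `values` count as 0."""
    unknown = set(values) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown variables: {sorted(unknown)}")
    return sum(COEFFICIENTS[name] * x for name, x in values.items())


def risk_cutoffs(training_pis):
    """Return the 50th and 90th percentiles of the training-set PI values."""
    deciles = statistics.quantiles(training_pis, n=10, method="inclusive")
    return deciles[4], deciles[8]


def risk_group(pi, low_cut, high_cut):
    """Assign a risk group; a PI equal to a cutoff goes to the higher group (assumption)."""
    if pi < low_cut:
        return "low"
    if pi < high_cut:
        return "intermediate"
    return "high"


# Hypothetical patient with two indicators set to 1 (coding assumed, see above).
example_pi = prognostic_index({"X1": 1, "X72": 1})
print(round(example_pi, 3))  # 0.996
```

In practice the cutoffs would be computed once from the training-set PI values and then reused unchanged for the test set and for new patients, so that group membership does not depend on who else is being scored.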

Table 5 PI ranges of the training and test sets
Table 6 (training-set) Three risk groups and their corresponding mean and median survival times and 1–5-year survival rates
Table 7 (test-set) Three risk groups and their corresponding mean and median survival times and 1–5-year survival rates
Fig. 2 Kaplan-Meier survival curve of PI ranges
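For readers unfamiliar with how the survival curves and the median and 1–5-year survival figures are derived, the Kaplan-Meier product-limit estimate can be sketched in a few lines of Python. This is a generic textbook sketch, not the SPSS procedure used in the study; the function names and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, month):
    """Estimated survival probability at `month` (step function, 1.0 before first death)."""
    prob = 1.0
    for t, s in curve:
        if t > month:
            break
        prob = s
    return prob


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data: four patients, the second one censored at month 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)                   # [(1, 0.75), (3, 0.375), (4, 0.0)]
print(median_survival(curve))  # 3
```

Applied separately to each risk group, `survival_at(curve, 12)` through `survival_at(curve, 60)` would give the 1–5-year survival rates of the kind reported in Tables 6 and 7.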

We developed a software program named RNSCLC-PRSP to predict the prognostic risk and survival of patients with resected T1-3N0–2M0 NSCLC.

Discussion

We have developed a novel tool to predict the prognosis of patients with resected T1-3N0–2M0 NSCLC. We determined the independent risk factors and obtained a prognostic risk model, risk groups, and their corresponding survival times. This paper highlights that a comprehensive and more refined analysis incorporating clinicopathological factors is capable of predicting the prognosis of resected T1-3N0–2M0 NSCLC.

To access the program, clinicians can enter the URL http://www.rnsclcpps.com in a browser to reach the login screen of the software. At the bottom of the interface is a brief introduction to the software and an explanation of the relevant abbreviations. Above it is the login box. New users can click the register button in the login box to register. After successful registration, users can return to the login box, enter their username and password, and click the login button to enter the software interface. The first line of the interface is the title, "Prognostic risk and survival prediction software RNSCLC-PRSP for resected T1-3N0–2M0 NSCLC (according to the eighth edition AJCC/UICC stage classification)". Operational tips (notes) are located under the title, an explanation of the relevant abbreviations follows the notes, and the selectable options are located under the abbreviations. Guided by the notes and the explanation of abbreviations, clinicians first need to determine the clinicopathological characteristics of the patient. Taking a patient with resected T1-3N0–2M0 (according to the eighth edition of the AJCC/UICC stage classification) non-small cell lung cancer as an example, the clinicopathological characteristics of a representative patient were gender (man), age (≤65), N stage (N0), NELNs (N>12), NPLNs (N ≥ 4), surgery type (LET), primary site (UL), histological grade (III), histology (S), marital status (married), tumor extension (T3 Inv), and tumor size (T2b>4–5 (4<T ≤ 5)). Clinicians choose the appropriate option for each of these characteristics. If there is no corresponding option, clinicians should choose "none". They then click the submit button, and the prognostic risk and survival prediction results are shown on the next page. The results for the representative patient are as follows: high-risk group (PI ≥ 0.79), mean and median survival times of 42.93 and 24.0 months, respectively, and 1-, 2-, 3-, 4-, and 5-year survival rates of 68.6%, 49.7%, 41.6%, 32.6%, and 26.8%, respectively.

The RNSCLC-PRSP software is based on the actual needs of clinicians who must predict the prognosis of patients with resected NSCLC. Clinicians are very busy with clinical work, and the prognosis of resected NSCLC patients is affected by many factors, so there is little time to evaluate every factor to reach a more accurate prognosis. We provide quantitative analysis software with which clinicians can conveniently and quickly obtain each patient's prognostic risk and predicted survival simply by selecting the clinicopathological features. We expect that the RNSCLC-PRSP software will be well received by clinicians. At present, to our knowledge, there are no comparable prognostic prediction software programs for resected T1-3N0–2M0 NSCLC. Pilotto et al. developed clinicopathological prognostic nomograms for resected squamous cell lung cancer based on clinicopathological factors including age, T descriptor (according to the seventh edition of the TNM classification), lymph node status, and grading; every patient was assigned a prognostic score [28]. Guerrera et al. designed a prognostic model predicting 5-year survival after surgical resection for stage I non-small cell lung cancer based on clinical, pathological, and surgical covariates [25]. Compared with these two tools, our software includes more clinicopathological features in greater detail, covers a broader range of patients with resected non-small cell lung cancer, and is more convenient and practical for clinicians.

Although we have established predictive software using relevant prognostic factors, more clinicopathological factors may need to be analyzed to improve it, and further research will be conducted. Potentially valuable prognostic factors, such as smoking status, performance status, comorbidity, molecular biological factors, biochemical and biomarker test results, lung function, tumor vascular or lymphatic invasion, surgical method (minimally invasive or open), and surgical margins, could not be determined or studied using the currently available database. As databases expand, further research will be carried out, and our software can be updated and improved to provide better service.

Conclusions

Using the SEER database and the Cox proportional hazard model, we identified the independent prognostic factors and the corresponding PI value of patients with resected T1-3N0–2M0 NSCLC. According to different PI ranges, three prognostic risk groups (the low-, intermediate-, and high-risk groups) were determined, and their corresponding survival times were obtained. We developed the RNSCLC-PRSP software so that clinicians can conveniently and practically predict the prognosis of patients with resected T1-3N0–2M0 NSCLC to guide further treatment. The software offers a new predictive method in this field.

Availability and requirements

Project name: My bioinformatics project.

Project home page: http://www.rnsclcpps.com

Operating system(s): Platform independent.

Programming language: Java.

Other requirements: no.

License: no.

Any restrictions to use by non-academics: no.