Introduction

Lung cancer (LC) is the second most common tumor and remains by far the leading cause of cancer-related death worldwide [1]. LC is commonly classified into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC); adenocarcinoma is the main subtype of NSCLC and the most common type of LC. With the increasing use of low-dose spiral computed tomography (CT) in health screening and disease diagnosis, the incidence of lung cancers ≤ 2 cm has been increasing [2]. For early-stage lung adenocarcinoma, more thoracic surgeons are accepting segmental or subsegmental resection with selective lymph node dissection as the optimal treatment [3, 4]. However, in some patients, lymph node metastasis (LNM) occurs early in the course of the disease: the reported incidence of LNM in lung cancers ≤ 2 cm in diameter is about 10% [5, 6]. Emerging evidence suggests that LNM is a risk factor for poor prognosis in patients with early-stage lung adenocarcinoma [7]. Unfortunately, the accuracy of CT for preoperative lymph node staging is only 45%-79% [8,9,10,11,12]. Preoperative mediastinoscopy and endobronchial ultrasound-guided transbronchial needle aspiration are not routinely used in patients with clinical stage I disease, and both methods yield a considerable number of false-negative results [13,14,15]. Because complete clearance of metastatic lymph nodes during surgery plays a key role in improving disease-free and overall survival [16], accurate preoperative assessment of lymph node metastasis in NSCLC is necessary.

Adenocarcinomas with micropapillary and solid growth patterns have been shown to be more aggressive and to have a poorer prognosis [17, 18]. In addition, blood inflammatory markers and tumor markers can be used to predict lymph node metastasis in lung cancer [19,20,21,22]. CT remains the most widely used tool for assessing tumor and lymph node involvement in patients with early-stage non-small cell lung cancer [8,9,10,11]. Some researchers consider frozen-section analysis a key guide to the choice of resection [23] and report that histological subtypes and other pathological features can feasibly be reported during surgery [24, 25].

To date, many studies have explored independent predictors of lymph node metastasis [26,27,28,29,30,31,32]. These include carcinoembryonic antigen (CEA) [26], tumor size [26], maximum standardized uptake value (SUVmax) [27], female sex [28], never-smoking status [28], adenocarcinoma histology [28], positive N1 lymph nodes on positron emission tomography (PET) [29], blood inflammation biomarkers [30], neutrophil-to-lymphocyte ratio (NLR) [31], and consolidation-to-tumor ratio (CTR) [32], among others. However, few studies have developed comprehensive models that predict lymph node metastasis from radiological features, clinical information, and hematological parameters together.

In this study, we explored the risk factors for lymph node metastasis in a cohort of patients with early-stage invasive lung adenocarcinoma and developed a nomogram for predicting the risk of lymph node metastasis based on clinical information, hematologic indicators, imaging features, and pathologic findings. The aim was for the nomogram to predict the risk of lymph node metastasis quickly and accurately before or during surgery, giving surgeons a quantitative tool for intraoperative decision-making.

Materials and methods

Patients

This study was approved by the Ethics Committee of Qilu Hospital, Shandong University (registration number: KYLL-202008–023-1), and all patients signed an informed consent form for the use of their clinical information prior to the procedure.

Patients with invasive lung adenocarcinoma treated at Qilu Hospital of Shandong University between January 2020 and December 2021 were retrospectively evaluated.

The inclusion criteria were: (1) a single intrapulmonary nodule on chest CT within 1 month before surgery; (2) a maximum nodule diameter ≤ 20 mm on CT; (3) pulmonary resection (lobectomy or sublobar resection) with systematic lymph node dissection; (4) complete pathological data with a pathological diagnosis of invasive lung adenocarcinoma; (5) no neoadjuvant chemotherapy or radiotherapy before surgery; and (6) no atelectasis or active inflammatory lesions on lung imaging. The exclusion criteria were: (1) age < 18 years; (2) open-heart surgery; (3) incomplete perioperative data; (4) a history of malignant disease within 5 years; (5) a concomitant acute infectious disease that could alter systemic inflammatory markers; and (6) distant metastasis.

A total of 2213 patients were screened. After the above criteria were applied, 522 patients with invasive lung adenocarcinoma ≤ 2 cm were enrolled in the study. Figure 1 shows the patient-selection flow chart.

Fig. 1

Flow chart of this study

Clinical data of patients

Clinicopathological information was collected from the patient record management system, including age, gender, preoperative comorbidities [hypertension, diabetes mellitus, and chronic obstructive pulmonary disease (COPD)], smoking history, body mass index (BMI), forced expiratory volume in one second as a percentage of the predicted value (FEV1% predicted), maximum voluntary ventilation as a percentage of the predicted value (MVV% predicted), and American Society of Anesthesiologists (ASA) score.

Hematological test

The following hematologic parameters, measured within 2 weeks before surgery, were recorded. (1) Routine blood tests: neutrophils, basophils, eosinophils, lymphocytes, monocytes, red blood cells, platelets, albumin, hemoglobin, blood glucose, and blood type. (2) Serum enzymes: serum 5'-nucleotidase (5'-NT), serum amylase (SA), and lactate dehydrogenase (LDH). (3) Tumor markers: carbohydrate antigen 125 (CA125), neuron-specific enolase (NSE), carcinoembryonic antigen (CEA), pro-gastrin-releasing peptide (ProGRP), cytokeratin 19 fragment (CYFRA 21-1), and squamous cell carcinoma antigen (SCC). (4) Inflammatory markers: serum complement C1q and the derived indices neutrophil–lymphocyte ratio (NLR), platelet–lymphocyte ratio (PLR), monocyte–lymphocyte ratio (MLR), derived neutrophil–lymphocyte ratio (dNLR), neutrophil–lymphocyte and platelet ratio (NLPR), systemic inflammation response index (SIRI), aggregate index of systemic inflammation (AISI), and systemic immune-inflammation index (SII). These derived indicators were calculated as follows.

  • NLR = neutrophils/lymphocytes.

  • PLR = platelets/lymphocytes.

  • MLR = monocytes/lymphocytes.

  • dNLR = neutrophils/(leukocytes − neutrophils).

  • NLPR = neutrophils/(lymphocytes × platelets).

  • SIRI = (neutrophils × monocytes)/lymphocytes.

  • AISI = (neutrophils × monocytes × platelets)/lymphocytes.

  • SII = (neutrophils × platelets)/lymphocytes.
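The index definitions above can be computed directly from a single blood count. The Python sketch below is illustrative only: `BloodCount` and `inflammatory_indices` are hypothetical names, and the sketch assumes all counts are absolute values in one common unit (for example ×10^9/L). Because PLR, NLPR, AISI, and SII combine platelet and leukocyte counts, their numeric values depend on the unit convention chosen.

```python
from dataclasses import dataclass


@dataclass
class BloodCount:
    """Absolute pre-operative counts; one common unit (e.g. x10^9/L) is assumed."""
    neutrophils: float
    lymphocytes: float
    monocytes: float
    platelets: float
    leukocytes: float  # total white-cell count, needed only for dNLR


def inflammatory_indices(b: BloodCount) -> dict:
    """Derived inflammatory indices, following the formulas listed above."""
    n, l, m, p, wbc = b.neutrophils, b.lymphocytes, b.monocytes, b.platelets, b.leukocytes
    if l <= 0 or p <= 0:
        raise ValueError("lymphocyte and platelet counts must be positive")
    if wbc <= n:
        raise ValueError("leukocytes must exceed neutrophils for dNLR")
    return {
        "NLR": n / l,
        "PLR": p / l,
        "MLR": m / l,
        "dNLR": n / (wbc - n),
        "NLPR": n / (l * p),
        "SIRI": n * m / l,
        "AISI": n * m * p / l,
        "SII": n * p / l,
    }


# Hypothetical example values, not patient data from this study.
example = inflammatory_indices(BloodCount(neutrophils=4.0, lymphocytes=2.0,
                                          monocytes=0.5, platelets=200.0, leukocytes=7.0))
```

Keeping the raw counts in one record and deriving every index from it avoids inconsistencies that arise when each ratio is computed separately from different test dates.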

Imaging analysis

The following CT morphological features were assessed: location (central or peripheral), shape (regular or irregular), spiculation, calcification, cavity sign, bronchial sign, lobulation sign, pleural adhesion sign, vascular penetration sign, pleural effusion sign, maximum tumor diameter, lymph node enlargement sign, and consolidation-to-tumor ratio (CTR). Two radiologists evaluated each imaging feature independently. A third radiologist with more than 20 years of experience in chest radiology reassessed any discrepancies, which were then resolved by consensus.

A central location was defined as a nodule located in the main, lobar, or segmental bronchi, and a peripheral location as a nodule located distal to the tertiary bronchi. Spiculation was defined as strands extending from the nodule margin into the lung parenchyma without reaching the pleural surface. Calcification was defined as any of the following patterns on CT: laminated, central, diffuse, or popcorn. Cavitation was defined as a gas-filled space appearing as a lucent or low-attenuation area. The bronchial sign was defined as direct bronchial involvement by the nodule on CT. Lobulation was defined as a wavy or fan-shaped portion of the lesion surface, with strands extending from the nodule margin into the lung parenchyma. The pleural adhesion sign was defined as linear attenuation extending from the nodule toward the pleura or the major or minor fissure. The vascular penetration sign was defined as a pulmonary artery passing through the nodule on CT. The pleural effusion sign was defined as blunting of the costophrenic angle on CT. The lymph node enlargement sign was defined as enlarged mediastinal lymph nodes on CT. CTR was defined as the ratio of the diameter of the solid component of the nodule to the maximum diameter of the nodule.
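The CTR definition above reduces to a single ratio. The Python sketch below (hypothetical function name, diameters assumed in millimetres) adds the range checks the definition implies: the solid component cannot be larger than the whole nodule.

```python
def consolidation_tumor_ratio(solid_diameter_mm: float, max_diameter_mm: float) -> float:
    """CTR = solid-component diameter / maximum nodule diameter.

    0.0 means no solid component on CT; 1.0 means an entirely solid nodule.
    """
    if max_diameter_mm <= 0:
        raise ValueError("maximum diameter must be positive")
    if not 0 <= solid_diameter_mm <= max_diameter_mm:
        raise ValueError("solid diameter must lie between 0 and the maximum diameter")
    return solid_diameter_mm / max_diameter_mm


# Hypothetical nodule: 18 mm overall with a 9 mm solid component.
ctr = consolidation_tumor_ratio(9.0, 18.0)
```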

Histological evaluation

All pathological specimens were fixed in formalin, stained with hematoxylin–eosin, and examined under a light microscope by two experienced lung pathologists. All specimens were classified according to the International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society classification of lung adenocarcinoma [33]. Pathological lymph node status was determined according to the 8th edition of the TNM classification for lung cancer.

The percentage of each histological component (mucinous, lepidic, acinar, papillary, micropapillary, and solid) was recorded in 5% increments, and each tumor was classified according to its predominant pattern. A pattern was considered present if it accounted for ≥ 5% of the tumor.
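The two classification rules above (the predominant pattern, and a pattern counted as present at ≥ 5%) can be sketched in Python as follows. `summarize_histology` is a hypothetical helper. The text does not say how ties for the predominant pattern are broken, so the tie-break used here (first pattern in the listed order) is an assumption.

```python
PATTERNS = ("mucinous", "lepidic", "acinar", "papillary", "micropapillary", "solid")


def summarize_histology(percent: dict) -> tuple:
    """Return (predominant pattern, patterns present at >= 5%) for one tumor."""
    unknown = set(percent) - set(PATTERNS)
    if unknown:
        raise ValueError(f"unknown pattern(s): {sorted(unknown)}")
    if any(v < 0 or v % 5 for v in percent.values()):
        raise ValueError("percentages are recorded in 5% increments")
    if sum(percent.values()) != 100:
        raise ValueError("component percentages must sum to 100")
    present = [p for p in PATTERNS if percent.get(p, 0) >= 5]
    # Ties are broken by the order of PATTERNS (an assumption; no rule is given).
    predominant = max(PATTERNS, key=lambda p: percent.get(p, 0))
    return predominant, present


# Hypothetical tumor, not a case from this study.
pred, present = summarize_histology({"acinar": 60, "lepidic": 25, "micropapillary": 10, "solid": 5})
```

In this example the tumor is acinar-predominant, but it still records micropapillary and solid components. This distinction matters because those patterns are associated with more aggressive behavior.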

DNA purification and quantification

All formalin-fixed paraffin-embedded (FFPE) specimens were cut into 5–8 μm sections. DNA and RNA were then extracted from 5–30 tissue sections containing at least 2% tumor cells using the FFPE DNA/RNA Nucleic Acid Extraction Kit (No. 8.0223601X036G, Xiamen Diagnostics, Xiamen, China). After isolation, DNA and RNA concentrations were measured with a micro-spectrophotometer. RNA concentrations ranged from 10 to 500 ng/μL, and DNA concentrations were > 2 ng/μL.

Immunohistochemistry Validation in Resected Patients

All immunohistochemical (IHC) staining was performed in the clinical immunohistochemistry laboratory of our hospital's pathology department. Briefly, specimens were sectioned at 5 μm, dewaxed, and incubated with the primary antibody. Staining characteristics, including the intensity and distribution of the staining pattern, were reviewed. A case was considered positive if more than 5% of tumor cells showed the appropriate staining pattern, and negative otherwise. IHC staining was performed for CK5/6, CK7, Napsin A, MUC-AC, P63, Ki-67 (positivity rate), CyclinD1, EMA, CD31, D2-40, and other markers.
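The IHC positivity rule above uses a strict "more than 5%" threshold, whereas the histological-pattern rule uses "≥ 5%". The hypothetical Python helper below makes that boundary explicit. It assumes a stained-cell count and a tumor-cell count are available. In routine practice pathologists usually estimate this proportion visually rather than counting cells.

```python
def ihc_call(stained_cells: int, tumor_cells: int, threshold: float = 0.05) -> str:
    """'positive' only if MORE than `threshold` of tumor cells show the appropriate pattern."""
    if tumor_cells <= 0 or not 0 <= stained_cells <= tumor_cells:
        raise ValueError("need tumor_cells > 0 and 0 <= stained_cells <= tumor_cells")
    return "positive" if stained_cells / tumor_cells > threshold else "negative"
```

For example, a tumor with exactly 5 of 100 cells stained is called negative under this rule, and one with 6 of 100 is called positive.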

Special staining in resected patients

Three special stains commonly used in histology laboratories were performed: periodic acid–Schiff (PAS), PAS with diastase (PAS-D), and elastic fiber staining. Each staining reaction was classified as positive or negative by three blinded observers.

Statistical analysis

All statistical analyses were performed using SPSS 26.0 (SPSS Inc., Chicago, Illinois, USA) and R statistical software (Windows version 4.2.1, http://www.r-project.org/). The “rms” package was used to plot the nomogram, “pROC” to plot the ROC curves, and “rmda” to plot the DCA curves. Categorical variables were compared using Pearson's chi-square test or Fisher's exact test. Normally distributed continuous variables were expressed as mean ± standard deviation (SD) and compared using Student's t-test. Non-normally distributed continuous variables were expressed as median (interquartile range [IQR]) and compared between the two groups using the Mann–Whitney U test. A two-sided P value < 0.05 was considered statistically significant.
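The text does not specify when Fisher's exact test was used in place of Pearson's chi-square test. A common convention is to switch to Fisher's test when any expected cell count is below 5. The Python sketch below implements only that assumed decision rule and does not compute either test. Both function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("empty table")
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_categorical_test(table, min_expected=5.0):
    """Assumed rule: Fisher's exact test if any expected count < min_expected."""
    if any(e < min_expected for row in expected_counts(table) for e in row):
        return "Fisher's exact test"
    return "Pearson's chi-square test"
```

With an outcome as uncommon as lymph node metastasis, small expected counts are likely for rare categorical features. Those features would then be analyzed with Fisher's test under this rule.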

Random assignment was performed in R: all enrolled patients were randomly split into training and validation cohorts at a 7:3 ratio. The training cohort was used to develop the prediction nomogram, and the validation cohort was used to verify its performance.
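A reproducible 7:3 random split can be sketched as follows. The authors used R, so this Python version is only an illustration. The function name and seed are hypothetical. The training-set size is rounded to the nearest integer, which gives 365/157 for 522 patients rather than the 366/156 reported in the Results. This sketch is therefore not expected to reproduce the published allocation.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2022):
    """Randomly split patient IDs into (training, validation) lists."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


train, valid = split_cohort(range(1, 523))  # 522 hypothetical patient IDs
```

Fixing and reporting the seed makes the split auditable, and the rounding rule decides which cohort receives the leftover patient.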

Predictive model development and validation

Construction of nomogram

Univariate logistic regression analysis was first performed on the training cohort data to identify potential risk factors. Factors with P < 0.05 in the univariate analysis were entered into multivariate logistic regression analysis, and the predictive model was built from the independent risk factors (P < 0.05 in the multivariate analysis). The nomogram was created using R statistical software (Windows version 4.2.1, http://www.r-project.org/). Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC) was calculated. Each variable was assigned a score based on the regression model, and the predicted probability of lymph node metastasis in small-sized non-small cell lung cancer was derived from the sum of the variable scores.
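The score-summing step above can be illustrated with one common nomogram scaling convention. The variable with the widest effect (|coefficient| × axis range) spans 0–100 points, and every other variable is scaled by the same factor. Summed points then convert back to the linear predictor and, through the inverse logit, to a probability. This Python sketch assumes that convention, the five coefficients printed in the model equation in the Results, and hypothetical axis ranges. It does not reproduce the authors' actual Fig. 3.

```python
import math
from dataclasses import dataclass


@dataclass
class Predictor:
    name: str
    beta: float  # logistic-regression coefficient
    lo: float    # smallest value drawn on the axis (hypothetical)
    hi: float    # largest value drawn on the axis (hypothetical)

    def low_end(self) -> float:
        # Contribution of the axis end that adds least to the linear predictor.
        return min(self.beta * self.lo, self.beta * self.hi)


def points_scale(predictors) -> float:
    """Points per unit of linear predictor; the widest-effect variable spans 0-100."""
    widest = max(abs(p.beta) * (p.hi - p.lo) for p in predictors)
    return 100.0 / widest


def points(pred: Predictor, x: float, scale: float) -> float:
    return scale * (pred.beta * x - pred.low_end())


def probability_from_total(total_points, predictors, intercept, scale) -> float:
    # Undo the scaling and shift to recover the linear predictor, then apply the inverse logit.
    lp = intercept + sum(p.low_end() for p in predictors) + total_points / scale
    return 1.0 / (1.0 + math.exp(-lp))


# Coefficients from the printed equation; axis ranges are illustrative assumptions.
PREDICTORS = [
    Predictor("age", -0.068, 30, 80),
    Predictor("SA", 0.025, 0, 200),
    Predictor("CA125", 0.098, 0, 50),
    Predictor("mucinous", 0.547, 0, 1),
    Predictor("CK5/6", 2.927, 0, 1),
]
INTERCEPT = -13.972
```

Because points are a linear rescaling of the regression terms, reading the probability from the total-points axis gives the same result as evaluating the logistic equation directly.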

Nomogram performance

The performance of the predictive nomogram was assessed in terms of discrimination, calibration, and clinical utility. Discrimination is the ability of a model to correctly distinguish events from non-events, and it was evaluated with ROC curves [34]. Calibration describes how well the predicted probabilities agree with the observed outcomes. It was assessed with the Hosmer–Lemeshow test, where P > 0.05 indicates satisfactory calibration [35], and with calibration plots, which were internally validated using 1000 bootstrap resamples [36]. The clinical utility of the nomogram was evaluated using decision curve analysis (DCA), which estimates the net benefit across a range of threshold probabilities [37]. The optimal cutoff was defined as the value that maximized the Youden index (sensitivity + specificity − 1) on the ROC curve of the training cohort.
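The two ROC-based quantities above can be sketched without any statistics library. The AUC is the probability that a randomly chosen patient with LNM receives a higher predicted risk than a randomly chosen patient without LNM, with ties counting one half. The Youden cutoff is the threshold that maximizes sensitivity + specificity − 1. In this Python sketch, a patient is called positive when the predicted risk is ≥ the cutoff, and the lowest cutoff wins ties. Both are assumptions, since packages such as pROC may place cutoffs between observed values.

```python
def auc_rank(scores, labels):
    """AUC as P(random event scores higher than random non-event); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # strict '>' keeps the lowest cutoff on ties
            best = (j, c, sens, spec)
    return best[1:]
```

The double loop is O(n²), which is fine for a few hundred patients. Dedicated packages use rank-based formulas that scale better.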

Results

Patient characteristics

A total of 522 patients were enrolled in this study. The overall incidence of lymph node metastasis was 13.23% (61/461). Of all patients enrolled, 284 were women and 138 were men. The median age was 61 (range: 31–81) years, and the median tumor size on CT was 1.2 (range: 0.3–2) cm. Demographic characteristics and variable data for both cohorts are shown in Table 1. The training cohort included 366 (70.1%) patients, and the validation cohort included 156 (29.9%). The two cohorts were comparable: apart from MVV% predicted, no variable differed significantly between them (all other P > 0.05). Detailed clinical characteristics of the training and validation cohorts are shown in Table 2.

Table 1 Patients’ characteristics of the training cohort and validation cohort
Table 2 Clinical characteristics of patients in the training and validation cohorts

Identifying risk factors for lymph node metastasis

Univariate and then multivariate logistic regression analyses were performed in the training cohort to investigate independent risk factors for lymph node metastasis, and the results of the logistic regression analyses are shown in Table 3.

Table 3 Univariate and multivariate logistic regression analysis of LNM factors in the training cohort

Univariate analysis identified as many as 30 potential risk factors for lymph node metastasis in early-stage small lung adenocarcinoma (P < 0.05). Multivariate logistic regression analysis then identified six indicators independently associated with lymph node metastasis: age [odds ratio (OR) = 0.934; 95% confidence interval (CI): 0.871–0.996; P < 0.001]; SA (OR = 1.025; 95% CI: 0.937–1.109; P = 0.008); CA125 (OR = 1.103; 95% CI: 1.021–1.189; P = 0.042); mucinous component (yes vs. no; OR = 1.729; 95% CI: 0.371–7.519; P = 0.003); Napsin A (yes vs. no; OR = 2.704; 95% CI: 0.489–15.541; P = 0.007); and CK5/6 (yes vs. no; OR = 18.668; 95% CI: 2.938–154.991; P = 0.042). The multivariate logistic regression results for the 30 screened factors are shown in the forest plot (Fig. 2).
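In logistic regression, each coefficient is the natural logarithm of its odds ratio. The odds ratios above can therefore be checked against the coefficients printed in the model equation in the "Nomogram construction" section. This Python sketch performs that check. Small differences (for example 0.548 vs. 0.547 for the mucinous component) are consistent with the odds ratios having been rounded before reporting. The printed equation has no Napsin A term, although its OR of 2.704 corresponds to ln(2.704) ≈ 0.995.

```python
import math

# (odds ratio, coefficient printed in the model equation), as reported in the text.
REPORTED = {
    "age": (0.934, -0.068),
    "SA": (1.025, 0.025),
    "CA125": (1.103, 0.098),
    "mucinous": (1.729, 0.547),
    "CK5/6": (18.668, 2.927),
}


def coefficient_from_or(odds_ratio: float) -> float:
    """A logistic-regression coefficient is the natural log of its odds ratio."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio)


for name, (odds_ratio, printed) in REPORTED.items():
    print(f"{name:8s} ln(OR) = {coefficient_from_or(odds_ratio):+.4f}   printed = {printed:+.3f}")
```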

Fig. 2

Forest plot of the multivariate logistic regression analysis. PNI, prognostic nutritional index; PLR, platelet-lymphocyte ratio; SA, serum amylase; CA125, carbohydrate antigen 125; BMI, body mass index; FEV1, forced expiratory volume in one second; TTF, thyroid transcription factor 1; PAS, periodic acid-Schiff reaction; PAS-D, periodic acid-Schiff reaction with diastase; CK 5/6, Cytokeratin 5/6; CK 7, Cytokeratin 7; MUC-AC, mucin-AC

Frequency of targeted gene alterations

Of the 522 patients, 46 underwent genetic alteration analysis using ARMS-PCR, and gene mutations were detected in 37 (80.4%) of these samples. The mutation frequencies of EGFR and KRAS were 71.7% (33/46) and 8.7% (4/46), respectively. EGFR mutations were the most common alteration: 39.1% (18/46) of patients had mutations in exon 21, 26.1% (12/46) in exon 19, 2.2% (1/46) in exon 18, 2.2% (1/46) in exon 20, and 2.2% (1/46) had double mutations in exons 18 and 20. All 4 KRAS mutations (8.7%, 4/46) were in exon 2. Of the 37 patients with gene mutations, 4 had lymph node metastases and 33 did not. Because mutation status was unknown for patients who were not tested, gene mutations were not included in the univariate or multivariate analyses and are reported descriptively only.
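The percentages above can be re-derived from the raw counts. The short Python check below also confirms that the EGFR exon-level counts add up to the EGFR total. It assumes no sample carried both an EGFR and a KRAS mutation, which is consistent with 33 + 4 = 37.

```python
TESTED = 46
COUNTS = {
    "any mutation": 37,
    "EGFR": 33,
    "KRAS (exon 2)": 4,
    "EGFR exon 21": 18,
    "EGFR exon 19": 12,
    "EGFR exon 18": 1,
    "EGFR exon 20": 1,
    "EGFR exon 18 + 20": 1,
}


def percent(k: int, n: int = TESTED) -> float:
    """Percentage of tested patients, rounded to one decimal as in the text."""
    return round(100 * k / n, 1)


egfr_subtypes = [v for k, v in COUNTS.items() if k.startswith("EGFR exon")]
assert sum(egfr_subtypes) == COUNTS["EGFR"]  # 18 + 12 + 1 + 1 + 1 = 33
assert COUNTS["EGFR"] + COUNTS["KRAS (exon 2)"] == COUNTS["any mutation"]  # assumes no co-mutations
for label, k in COUNTS.items():
    print(f"{label:18s} {k:2d}/{TESTED} = {percent(k)}%")
```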

Nomogram construction

All six independent risk factors for lymph node metastasis in small invasive lung adenocarcinoma within 2 cm were entered into a logistic regression model. The probability (p) of lymph node metastasis in small invasive lung adenocarcinoma could be calculated with the following formula: ln[p/(1 − p)] = −0.068 × age + 0.025 × SA + 0.098 × CA125 + 0.547 × mucinous (no = 0; yes = 1) + 2.927 × CK5/6 (no = 0; yes = 1) − 13.972. Based on this equation, a nomogram of the predicted probability of lymph node metastasis in invasive lung adenocarcinoma within 2 cm was plotted using R statistical software (Fig. 3). The nomogram has 9 axes, and axes 2–7 represent the six variables in the prediction model. For each variable, a vertical line drawn from the patient's value up to the points axis (the top axis) gives that variable's score, and the scores are summed to obtain a total score. The total score axis is then used to read off the predicted probability of lymph node metastasis, which may in turn help guide the surgical approach.
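The printed equation can be evaluated directly. The Python sketch below uses the coefficients exactly as printed. The text does not state the units of SA and CA125, so inputs are assumed to be in the units of the hospital's laboratory reports. The printed equation contains no Napsin A term, so the hypothetical `napsin_coef` parameter defaults to 0.0, and any other value is the caller's own assumption.

```python
import math

# Coefficients exactly as printed in the equation above.
INTERCEPT = -13.972
COEF = {"age": -0.068, "SA": 0.025, "CA125": 0.098, "mucinous": 0.547, "CK5/6": 2.927}


def linear_predictor(age, sa, ca125, mucinous, ck56, napsin_a=0, napsin_coef=0.0):
    """ln[p/(1 - p)] from the printed equation.

    mucinous, ck56 and napsin_a are coded 0 (no) or 1 (yes). napsin_coef is a
    hypothetical extension: the printed equation has no Napsin A term.
    """
    return (INTERCEPT
            + COEF["age"] * age
            + COEF["SA"] * sa
            + COEF["CA125"] * ca125
            + COEF["mucinous"] * mucinous
            + COEF["CK5/6"] * ck56
            + napsin_coef * napsin_a)


def lnm_probability(*args, **kwargs) -> float:
    """Predicted probability of LNM: the inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(*args, **kwargs)))
```

For a hypothetical 50-year-old with SA = 100, CA125 = 10, a mucinous component, and CK5/6 positivity, the linear predictor is −10.418. Holding the other inputs fixed, switching CK5/6 from 0 to 1 multiplies the odds by e^2.927 (about 18.7).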

Fig. 3

Nomogram for predicting the probability of LNM in small invasive lung adenocarcinoma. SA, serum amylase; CA125, carbohydrate antigen 125; CK 5/6, Cytokeratin 5/6. The nomogram has 9 axes, and axes 2–7 represent the six variables in the prediction model. For each variable, a vertical line drawn from the patient's value up to the points axis (the top axis) gives that variable's score, and the scores are summed to obtain a total score. The total score axis is then used to read off the predicted probability of lymph node metastasis in invasive lung adenocarcinoma, which may in turn help guide the surgical approach

Predictive performance and validation of the nomogram

The discriminative ability of the prediction model and nomogram was assessed with ROC curves (Fig. 4). The AUC was 0.843 (95% CI: 0.779–0.908) in the training cohort and 0.838 (95% CI: 0.748–0.927) in the validation cohort, indicating good discrimination. In the training cohort, the optimal cutoff on the ROC curve was 0.089, with a sensitivity of 0.795 and a specificity of 0.786 (Table 4). Calibration was assessed with the Hosmer–Lemeshow test and calibration plots. The Hosmer–Lemeshow P values were 0.0613 in the training cohort and 0.8628 in the validation cohort, indicating no significant difference between predicted and observed probabilities. The calibration plots for the training cohort (Fig. 5A) and the validation cohort (Fig. 5B) also showed good calibration of the prediction nomogram. The bias-corrected C-index was 0.8444 for the training cohort and 0.8375 for the validation cohort, further supporting the model's discriminative performance.
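The Hosmer–Lemeshow statistic referred to above can be sketched as follows. Patients are sorted by predicted risk and split into g equal-size groups (10 by default). Observed and expected event counts are compared within each group, and the sum is referred to a chi-square distribution with g − 2 degrees of freedom. The text does not say which software function or grouping the authors used, and implementations differ in how they form groups. For even degrees of freedom, the chi-square tail probability has a closed form, so this Python sketch needs no statistics library. It supports only an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability for an even number of degrees of freedom.

    Closed form: P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (HL statistic, P value) using `groups` equal-size groups of sorted risk."""
    if len(probs) != len(outcomes) or len(probs) < groups:
        raise ValueError("need matching inputs and at least one patient per group")
    # Sort by predicted risk only; the sort is stable, so ties keep input order.
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / n_g
        if not 0.0 < mean_p < 1.0:
            raise ValueError("group mean risk must lie strictly between 0 and 1")
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A large P value means the test found no evidence of miscalibration. It does not prove good calibration, which is why calibration plots are usually reported alongside it.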

Fig. 4

Results of ROC curve in the training and validation cohorts

Table 4 Results of ROC curve for training cohort
Fig. 5

A, B Calibration curves of the prediction nomogram in the training cohort (A) and validation cohort (B). The x-axis represents the probability predicted by the nomogram, and the y-axis represents the observed probability of LNM in invasive lung adenocarcinoma within 2 cm. The black dashed line represents the ideal curve, the blue solid line represents the apparent (uncorrected) curve, and the red solid line represents the bias-corrected curve obtained by bootstrapping (B = 1000). LNM, lymph node metastasis

Clinical utility of the predictive nomogram

As shown in Fig. 6A and B, DCA was used to assess the clinical utility of the prediction nomogram. In both the training and validation cohorts, the nomogram provided a greater net benefit than the treat-all and treat-none strategies across a broad range of threshold probabilities for predicting lymph node metastasis in invasive lung adenocarcinoma within 2 cm, indicating that it is clinically useful. Figure 7A and B show the clinical impact curves (CIC) for the training and validation cohorts, respectively. The curves show a high benefit ratio at threshold probabilities between 0.2 and 1.0. These results suggest that the model can be used clinically to predict the probability of lymph node metastasis in small invasive lung adenocarcinoma.
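The decision and clinical impact curves above are built from a few counts at each threshold probability pt. Net benefit = TP/n − (FP/n) × pt/(1 − pt), where a patient counts as "treated" when the predicted risk is ≥ pt. The treat-none strategy has a net benefit of 0, and the treat-all strategy replaces TP and FP with the whole cohort. This Python sketch is an independent illustration and is not the rmda package that the authors used. Its function names are hypothetical.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight of a false positive vs. a true positive
    return tp / n - fp / n * odds


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy that assumes every patient has LNM."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def high_risk_counts(probs, outcomes, threshold):
    """Clinical-impact counts: (number classified high risk, number of those with LNM)."""
    flagged = [y for p, y in zip(probs, outcomes) if p >= threshold]
    return len(flagged), sum(flagged)
```

Evaluating these functions over a grid of thresholds, for example 0.01 to 0.99, produces the decision curve and the two lines of a clinical impact curve.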

Fig. 6

A, B Decision curve analysis of the prediction nomogram in the training cohort (A) and validation cohort (B). The y-axis shows the net benefit. The black line represents the assumption that no patient with invasive lung adenocarcinoma within 2 cm has lymph node metastasis (treat none), and the gray line represents the assumption that all such patients have lymph node metastasis (treat all). The blue line in Fig. 6A represents the training cohort, and the red line in Fig. 6B represents the validation cohort

Fig. 7

A, B Clinical impact curves of the prediction nomogram in the training cohort (A) and validation cohort (B). The x-axis shows the threshold probability, and the y-axis shows the number of patients. The blue line indicates the number of patients classified by the model as having lymph node metastasis at each threshold probability. The red line indicates how many of these high-risk patients actually had lymph node metastasis. The cost:benefit ratio at each threshold probability is shown at the bottom

Discussion

In this retrospective study, we developed a nomogram to predict the risk of lymph node metastasis. Age, SA, CA125, mucinous component, CK5/6, and Napsin A were identified as independent risk factors for lymph node metastasis. Genetic testing showed that EGFR mutation was the most common alteration. The nomogram showed consistent discriminatory performance and satisfactory calibration. In 2012, Terumoto Koike et al. identified four predictors of mediastinal lymph node metastasis: age ≥ 67 years, CEA ≥ 3.5 ng/ml, tumor size ≥ 2.0 cm, and CTR ≥ 89% [26]. Age was a common predictor in both studies. Among the hematologic components, our study identified SA and CA125 as predictors, whereas CTR and tumor size were not associated with lymph node metastasis. The inclusion of immunohistochemical markers among the predictors is an innovative aspect of our study. These previously unpublished observations have potential implications for the management of early-stage lung adenocarcinoma, because the nomogram may help predict lymph node status before the end of surgery and guide surgeons in planning lymph node dissection.

Many studies have examined the effect of age on lymph node metastasis in non-small cell lung cancer [26, 38,39,40,41,42,43,44,45,46]. Some found that younger patients had a higher risk of lymph node metastasis [26, 41,42,43], whereas others found no significant effect of age [44,45,46]. These discrepancies may reflect differences in patient populations, sample sizes, and analytical methods, so the divergent conclusions of previous studies are explainable. Based on our findings, younger patients with invasive lung adenocarcinoma are at greater risk of lymph node metastasis and may require more thorough and meticulous lymph node dissection.

To date, several case reports have associated elevated SA levels with lung cancer [47,48,49]. In these reports, the amylase isoenzyme pattern in serum and tumor tissue showed a predominance of salivary-type amylase, and amylase levels were higher in tumor tissue than in normal lung tissue. Immunohistochemistry localized the amylase to tumor cells, and electron microscopy revealed electron-dense granules in their cytoplasm. These findings suggest that the amylase was produced by the lung cancer itself, and the authors raised the possibility that serum amylase may be a highly sensitive marker for lung cancer. In our study, lung adenocarcinoma patients with higher serum SA concentrations had a higher risk of lymph node metastasis.

CA125 has long been recognized as a classical tumor marker that is associated not only with the presence of lung cancer but also with tumor invasion and metastasis. CA125 has been confirmed to be associated with lymph node metastasis in lung cancer [50, 51], and it is valuable for judging the extent of metastasis and monitoring disease progression. This study demonstrated the importance of CA125 in determining whether lymph node metastasis is present in lung cancer patients. Surgeons should therefore be especially careful with lymph node dissection during lung cancer surgery in patients with high serum CA125 levels.

Mucin is thought to play a key role in the development of cancer, as mucinous adenocarcinoma in many organs is associated with lymph node metastasis and poorer prognosis [52,53,54,55,56]. Histologically, the mucinous glandular component of the tumor consists of goblet and tall columnar epithelial cells that produce mucin. The mucinous subtype is considered more malignant than other common subtypes of lung adenocarcinoma, such as the lepidic and acinar subtypes [57,58,59]. Some reports with small sample sizes describe a low rate of lymph node metastasis in invasive mucinous adenocarcinoma [60,61,62,63], whereas other studies have reached the opposite conclusion. Zhu et al. reported that the mucinous subtype is a risk factor for distant metastasis of lung adenocarcinoma [64]. Our findings suggest that the mucinous component is one of the risk factors for lymph node metastasis.

Napsin A is a human aspartic protease related to pepsin, gastricsin, renin, and cathepsin [65]. IHC studies have shown that Napsin A is expressed in normal human type II pneumocytes and alveolar macrophages [66], and strong cytoplasmic Napsin A staining has been observed in up to 87% of lung adenocarcinomas [67,68,69,70,71]. In contrast, CK5/6 is a sensitive and relatively specific marker of squamous differentiation [72,73,74]. The novelty of our study is that it links lymph node metastasis to these two immunohistochemical markers for the first time, showing that CK5/6 and Napsin A can be used to predict lymph node metastasis in invasive adenocarcinoma. However, the mechanisms by which CK5/6 and Napsin A predict lymph node metastasis remain to be explored.

Our study has several strengths compared with other studies. First, it is the first to include CK5/6, Napsin A, and the mucinous component as predictors of lymph node metastasis in a prediction model. Second, the factors in our model are common and readily available in clinical practice. Third, the model showed good discrimination, calibration, and clinical utility. The model is easy to use in clinical practice, and the associated nomogram may help surgeons quickly select an optimized surgical approach.

Our study has several limitations. First, the analysis was based on retrospective data from a single institution, so selection bias cannot be ruled out, and the results need to be validated at other centers. Second, mutation testing was performed according to the patients' wishes. As a result, only a subset of the cohort underwent genetic testing, which made it difficult to include mutation status in the multivariate regression analysis. Third, the limited number of cases may introduce bias, especially in the histological subtype analysis.

Conclusion

In this study, a clinical prediction model incorporating six risk factors was proposed. In invasive lung adenocarcinoma within 2 cm, age, SA, CA125, mucinous component, CK5/6, and Napsin A are important risk factors associated with lymph node metastasis. Using this nomogram, surgeons may be able to predict lymph node status before the end of surgery.