Lymphatic Node Metastasis Risk Scoring System: A Novel Instrument for Predicting Lymph Node Metastasis After Thymic Epithelial Tumor Resection

Background The authors aimed to create a novel model to predict lymphatic metastasis in thymic epithelial tumors. Methods Data of 1018 patients were collected from the Surveillance, Epidemiology, and End Results database from 2004 to 2015. To construct a nomogram, the least absolute shrinkage and selection operator (LASSO) regression model was used to select candidate features in the training cohort (2004 to 2013). A simple model called the Lymphatic Node Metastasis Risk Scoring System (LNMRS) was constructed to predict lymphatic metastasis. Using patients from 2014 to 2015 as the validation cohort, the predictive performance of the model was determined by receiver operating characteristic (ROC) curves. Results The LASSO regression model showed that age, extension, and histology type were significantly associated with lymph node metastasis, and these variables were used to construct the nomogram. The nomogram achieved an area under the curve (AUC) of 0.80 (95 % confidence interval [CI] 0.75–0.85) in the training cohort and 0.82 (95 % CI 0.70–0.93) in the validation cohort, and its calibration curves were close to the standard curve. Based on the nomogram, the authors constructed the LNMRS model, which had an AUC of 0.80 (95 % CI 0.75–0.85) in the training cohort and 0.82 (95 % CI 0.70–0.93) in the validation cohort. The ROC curves indicated that the LNMRS had excellent predictive performance for lymph node metastasis. Conclusion This study established a nomogram for predicting lymph node metastasis. The LNMRS model, constructed to predict lymphatic involvement of patients, was more convenient than the nomogram. Supplementary Information The online version contains supplementary material available at 10.1245/s10434-021-10602-0.

need exists for a model to predict lymph node metastasis in thymic epithelial tumors to assist in clinical diagnosis and treatment.
Predictive models for lymph node metastasis have been developed for many cancer types, such as squamous non-small cell lung cancer, esophageal squamous cell carcinoma, and colorectal cancer. [7][8][9] Nevertheless, a model for predicting lymphatic involvement of thymic epithelial tumors is hard to construct, and its construction faces two main challenges. First, the incidence of the disease is low. The overall incidence of thymoma is 0.13 per 100,000 person-years in America 10 and 0.09 to 0.23 per 100,000 person-years in Europe. 11 Second, until recently no lymph node map for thymic epithelial tumors existed as a public reference for lymphatic resection. Therefore, a new lymph node map was proposed by the International Thymic Malignancy Interest Group (ITMIG) and published in the 8th edition of the tumor-node-metastasis (TNM) stage classification system for thymic malignancies. 12,13 Using the Surveillance, Epidemiology, and End Results (SEER) database, this study aimed to develop and validate a predictive model for lymph node metastasis status after thymic epithelial tumor resection. The results of this study can be conveniently implemented in clinical work and can contribute to further guidance and optimization of treatment strategies for thymic epithelial tumors.

Patients and Study Design
The population data on thymic epithelial tumors were extracted from the SEER database of the American National Cancer Institute. The Incidence-SEER 18 Regs Research database, based on the November 2017 submission, was accessed through SEER*Stat software version 8.3.6 (Information Management Services, Inc., Calverton, MD).
As shown in Fig. 1, patients with a diagnosis of thymic epithelial tumors between 2004 and 2015 were selected for the study from the public-use SEER database. The patients were divided into two cohorts. Those who received surgery between 2004 and 2013 formed the training cohort, and those who received surgery between 2014 and 2015 formed the validation cohort.
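The temporal split described above can be sketched in a few lines. This is only an illustration in Python (the study itself was carried out in R); the `Patient` record, its field names, and the toy records are hypothetical and are not taken from the SEER extract.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str      # hypothetical identifier
    surgery_year: int    # calendar year of surgery
    node_positive: bool  # lymph node metastasis status


def split_by_year(patients, train_years=(2004, 2013), valid_years=(2014, 2015)):
    """Split records into training and validation cohorts by surgery year.

    Both year ranges are inclusive, matching the periods given in the text.
    Records outside both ranges are dropped.
    """
    train = [p for p in patients if train_years[0] <= p.surgery_year <= train_years[1]]
    valid = [p for p in patients if valid_years[0] <= p.surgery_year <= valid_years[1]]
    return train, valid


# Toy records for illustration only (not study data).
records = [
    Patient("a", 2004, False),
    Patient("b", 2013, True),
    Patient("c", 2014, False),
    Patient("d", 2015, True),
    Patient("e", 2016, False),
]
train, valid = split_by_year(records)
print(len(train), len(valid))
```

Splitting by calendar period rather than at random means the validation cohort consists of later patients, which tests the model on data it could not have seen during development.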
The inclusion criteria specified the following: (1) The histopathologic diagnosis had to be included, with the histologic type coded according to the International Classification of Diseases for Oncology, third revision (ICD-O-3), using the codes corresponding to the 2015 World Health Organization Classification of Tumors of the Thymus. 14 The exclusion criteria ruled out patients whose race, marital status, lymphatic metastases, tumor size, tumor extension, histology type, or distant metastasis was unknown.

Variable Definition
The candidate variables in the analysis were age at diagnosis, sex, race, marital status, tumor size, tumor extension, histologic type, and histologic grade. Race was separated into white, black, and Asian (Asian Indian, Pakistani, Chinese, Filipino, Japanese, Kampuchean, Korean, Laotian, and Vietnamese). Marital status was grouped as single (divorced, separated, single, or unmarried or domestic partner) and married. Extension of tumor included four subgroups: location (CS Extension code 100 or 300 and CS Mets at Dx code 00 or 10), adjacent connective tissue (CS Extension code 400 and CS Mets at Dx code 00 or 10), adjacent organs/structures (CS Extension code 600 and CS Mets at Dx code 00 or 10), and distance (two states according to the SEER manual: (1) CS Extension code 100, 300, 400, or 600 and CS Mets at Dx code 40 or 50 and (2) ). 15 Thymic epithelial tumors were classified into low-risk thymomas (type A, AB, and B1), high-risk thymomas (type B2 and B3), and thymic carcinomas (type C). 16

Statistical Analysis
Continuous data are described using median (interquartile range [IQR]), and categorical data are described as counts and percentages. Least absolute shrinkage and selection operator (LASSO) regression was performed on the training cohort using the lars package (https://mirrors.tuna.tsinghua.edu.cn/CRAN/web/packages/lars/lars.pdf), and after feature selection, three variables with non-zero coefficients were retained for the final prediction model. 17 Nomograms were plotted for visual analysis using the rms package of R. 18 To reduce overfitting bias, we used the area under the receiver operating characteristic curve (AUC) and calibration with 1000 bootstrap samples to measure the predictive performance of the nomogram. For convenience of clinical use, a novel scoring model was established to make clinical prediction simpler. To estimate the performance of the scoring model, we used AUC, sensitivity, specificity, and accuracy.
All statistical test results were considered significant when p was lower than 0.05. All statistical analyses were performed in R version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria). 19
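As a rough illustration of how an AUC and a bootstrap interval can be obtained from 1000 resamples, the sketch below uses plain Python. It is an assumption-laden stand-in, not the authors' procedure: the study used R, and the percentile bootstrap shown here is one common choice that the text does not confirm. The function names and toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    auc(labels, scores)  # fails early if a class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(round((alpha / 2) * (n_boot - 1)))
    hi_idx = int(round((1 - alpha / 2) * (n_boot - 1)))
    return stats[lo_idx], stats[hi_idx]


# Toy data for illustration only (1 = lymph node metastasis).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [2, 5, 7, 9, 9, 12, 15, 17]
point_estimate = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(point_estimate, ci_low, ci_high)
```

The pairwise definition is quadratic in the number of patients, which is acceptable for a cohort of about a thousand but would be replaced by a rank-based formula for larger data.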

Patient Characteristics
As shown in Table 1, the statistical analysis included 1018 eligible patients divided into a training cohort (808 patients) and a validation cohort (210 patients). Men accounted for about half of the cohorts (52.4 %), and the median age was 59.0 years (IQR, 48.0-67.0 years). The median tumor size was 64 mm (IQR, 45.0-89.8 mm), and local invasion was mainly tumor invasive extension (35 %). Thymomas accounted for most of the thymic epithelial tumors (74.8 %), and low-risk thymoma was the largest group (40.5 %). Lymph node metastasis was found in 9.5 % of the study cohort.

Feature Selection Based on LASSO
Using LASSO regression with 10-fold cross-validation, a lambda (λ) value of 4.79, with a log(λ) of 0.68, was chosen by the one-standard-error (1-SE) criterion, and features with non-zero coefficients were retained as risk factors for lymphatic involvement in thymic epithelial tumors, as shown in Fig. 2. Of the eight candidate features, three were selected: age, extension, and histology type.
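The 1-SE criterion can be illustrated with a short Python sketch: among the candidate λ values, take the largest one whose mean cross-validated error lies within one standard error of the minimum. The function name and the grid of errors below are hypothetical and only show the selection rule; they are not the cross-validation results of this study. The last line checks that the reported pair (λ = 4.79, log λ = 0.68) is consistent with a base-10 logarithm.

```python
import math


def one_se_lambda(lambdas, cv_mean, cv_se):
    """Return the largest lambda whose mean CV error is within one
    standard error of the minimum mean CV error."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= limit]
    return max(eligible)


# Hypothetical cross-validation summary (illustration only).
lambdas = [0.5, 1.0, 2.0, 4.0, 8.0]
cv_mean = [0.300, 0.295, 0.298, 0.305, 0.340]
cv_se = [0.010, 0.012, 0.011, 0.012, 0.015]

chosen = one_se_lambda(lambdas, cv_mean, cv_se)
print(chosen)

# Consistency check of the reported values, assuming a base-10 logarithm.
print(round(math.log10(4.79), 2))
```

Choosing the larger λ trades a small amount of cross-validated error for a sparser model, which fits the aim of keeping only a few predictors.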

Construction of the Prognostic Model
As shown in Fig. 3, a nomogram was constructed from the three selected variables.
Based on the score of each variable in the nomogram, a simpler and more generalizable model, called the Lymphatic Node Metastasis Risk Scoring System (LNMRS), was constructed, with detailed scores shown in Table 2. The AUC of the LNMRS was 0.80 (95 % CI 0.75-0.85) for the training cohort and 0.82 (95 % CI 0.70-0.93) for the validation cohort, as shown in Table 3. The receiver operating characteristic (ROC) curve is shown in Fig. 4B. The calibration curves were close to the standard curve, as shown in Fig. 5.
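Besides the AUC, the Methods list sensitivity, specificity, and accuracy as performance measures for the scoring model. A minimal Python sketch of these three quantities at a given cut-off follows. The function name, the rule that a score above the cut-off counts as a positive prediction, and the toy data are assumptions made for illustration; they are not values from Table 3.

```python
def classification_metrics(labels, scores, threshold):
    """Sensitivity, specificity, and accuracy when a score strictly above
    the threshold is treated as a positive prediction."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s > threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy data for illustration only (1 = lymph node metastasis).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [2, 5, 7, 9, 9, 12, 15, 17]
metrics = classification_metrics(labels, scores, threshold=13)
print(metrics)
```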
We scored the entire cohort population using the LNMRS model and plotted the scores of both cohorts on a kernel-density map based on the incidence of lymph node metastasis. We determined a score of 13 to be the optimal threshold, whereby patients with a score lower than 13 have a low risk of metastasis and those with a score higher than 13 have a high risk of metastasis, as detailed in Fig. 6. For example, if a 40-year-old patient has pathologic thymic carcinoma and an extension of adjacent organs/structures, then this person has an LNMRS score of 17, which indicates a high risk of lymph node metastases based on the kernel-density map.
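The cut-off rule in this paragraph can be written as a small Python function. This is a sketch under stated assumptions: the function name is hypothetical, the per-variable points of Table 2 are not reproduced here, and because the text does not say how a score of exactly 13 is handled, that case is returned as "unspecified" rather than guessed.

```python
def classify_lnmrs(score, threshold=13):
    """Map a total LNMRS score to a risk group using the reported cut-off."""
    if score > threshold:
        return "high"
    if score < threshold:
        return "low"
    # The source does not state how a score equal to the cut-off is classed.
    return "unspecified"


# The worked example from the text: a total score of 17.
print(classify_lnmrs(17))
```

In practice the total score would first be obtained by summing the points assigned to age, tumor extension, and histologic type in Table 2.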

DISCUSSION
Currently, no predictive model exists for lymph node metastasis in thymic epithelial tumors. In this study, we developed a simple nomogram-based model called the Lymphatic Node Metastasis Risk Scoring System (LNMRS), which includes age, tumor extension, and histologic type. This prediction model had an AUC of 0.80 (95 % CI, 0.75-0.85) for the training set and an AUC of 0.82 (95 % CI, 0.70-0.93) for the validation set, with good discrimination and calibration. With only three variables, our model was not only objective and accurate, but also easier to generalize to clinical studies.
Some research showed that lymph node status was a significant prognostic factor for patients with thymic epithelial tumors. 2,3,20,21 Findings suggested that nodal sampling or lymph node dissection can be performed to acquire accurate staging and prediction of prognosis. 2,22 Our analysis of 1018 patients found that lymph node metastasis is related to age, pathologic type, and tumor extent. This conclusion also was reached in another study. 23 In addition, we noted that some patients with negative lymph node findings had higher postoperative scores, which raises the question of whether preventive treatment measures, such as adjuvant radiotherapy and individualized postoperative follow-up assessment, could be used for this group of patients. The National Comprehensive Cancer Network (NCCN) suggests that patients with R0 resection need not be treated with chemotherapy or radiotherapy, but should be surveilled for recurrence with an annual chest computed tomography (CT) scan. However, lymph node status could not be shown clearly for patients with R0 resection. 24 In our study, the probability of lymph node metastasis was calculated based on a nomogram with personal clinical information. Patients with R0 resection who had a high predicted probability of lymphatic metastasis were more likely to experience lymph node metastasis. Nevertheless, no study exists to support postoperative adjuvant therapy for such patients. Therefore, further research is needed.
The SEER database contains a large amount of clinical information that supports a wide range of clinical studies. Based on the role of lymph node metastasis in disease progression, lymph node prediction using relevant variables from the SEER database has been performed for different malignancies with proven results. [25][26][27] However, some limitations of the SEER database are inevitable. First, the SEER database gathers only patients' clinical information, and neither the consistency nor the standardization of patient treatment could be ensured. Second, the follow-up period for the validation set was not long enough, so survival rates could not be assessed. Finally, some variable categories are not specific to the neoplasm in question (e.g., invasive carcinoma confined to gland of origin in CS Extension) and therefore do not distinguish Masaoka stage 1 from stage 2 neoplasms.