Segmentation of mediastinal lymph nodes in CT with anatomical priors

Purpose Lymph nodes (LNs) in the chest have a tendency to enlarge due to various pathologies, such as lung cancer or pneumonia. Clinicians routinely measure nodal size to monitor disease progression, confirm metastatic cancer, and assess treatment response. However, variations in their shapes and appearances make it cumbersome to identify LNs, which reside outside of most organs. Methods We propose to segment LNs in the mediastinum by leveraging the anatomical priors of 28 different structures (e.g., lung, trachea) generated by the public TotalSegmentator tool. The CT volumes from 89 patients available in the public NIH CT Lymph Node dataset were used to train three 3D off-the-shelf nnUNet models to segment LNs. The public St. Olavs dataset containing 15 patients (out-of-training-distribution) was used to evaluate the segmentation performance. Results For LNs with short axis diameter ≥ 8 mm, the 3D cascade nnUNet model obtained the highest Dice score of 67.9 ± 23.4 and lowest Hausdorff distance error of 22.8 ± 20.2. For LNs of all sizes, the Dice score was 58.7 ± 21.3 and this represented a ≥ 10% improvement over a recently published approach evaluated on the same test dataset. Conclusion To our knowledge, we are the first to harness 28 distinct anatomical priors to segment mediastinal LNs, and our work can be extended to other nodal zones in the body. 
The proposed method has the potential for improved patient outcomes through the identification of enlarged nodes in initial staging CT scans. Supplementary Information The online version contains supplementary material available at 10.1007/s11548-024-03165-4.


Introduction
Lymph nodes and the lymphatic system are an integral part of the body's natural defense mechanisms and play a vital role in maintaining a person's health. Abnormalities of the lymphatic system can result in enlarged lymph nodes (lymphadenopathy) [1,2], with etiologies ranging from infection and autoimmune disease to malignancy. Identifying the cause of nodal enlargement and distinguishing metastatic from non-metastatic LNs is critical for clinicians in determining the correct treatment [1][2][3][4]. Frequently, radiologists use a systematic approach to identify suspicious nodes through nodal size measurement with the help of established guidelines, such as the tumor, node, and metastasis (TNM) criteria [4]. In particular, the presence of enlarged LNs in the setting of cancer not only dictates the staging and extent of disease, but is also vital to treatment and management.
In clinical practice, radiologists routinely identify, manually measure, and describe the features of lymph nodes on CT and MRI to identify areas of pathology. Among the various imaging features for lymphadenopathy, nodal size is the most widely used criterion [1][2][3][4] to determine benign versus malignant status when paired with clinical data. A node is considered enlarged if its smallest diameter (along the short axis) is greater than 10 mm on an axial CT slice [1][2][3][4][5]. However, this assessment can be cumbersome and time-consuming, especially at initial staging and while comparing multiple sites of metastasis during the evaluation of treatment response in follow-up imaging. To help relieve this laborious process, automated LN measurement can augment radiology workflows by aiding in the identification of LNs in specific regions of the body, such as the mediastinum.
In this paper, we present an approach to segment mediastinal LNs in CT studies of the body. Fig. 1 shows an overview of the pipeline. We used the LN labels for 89 CT volumes from the public NIH CT Lymph Node dataset, and combined them with the labels for 28 distinct structures in the body obtained through the public TotalSegmentator tool. Three nnUNet segmentation models were trained end-to-end with this data, and evaluated on a test dataset comprising 15 patients from an external institution. Our results indicated a performance improvement (measured through Dice scores) over the current state-of-the-art method evaluated on the same test dataset.

Methods

Patient Population
We used datasets from two distinct institutions for the purposes of training and testing the 3D nnUNet models. The public NIH CT Lymph Node dataset [7,21] was used for training, and it comprised a total of 176 CT series from 176 patients. Of these, 90 CT volumes were obtained at the level of the chest (mediastinum), and segmentation masks for 388 nodes with a short axis diameter (SAD) ≥ 1 cm were provided. The remaining 86 CT volumes were acquired at the abdomen with 595 abdominal LNs annotated. To our knowledge, no underlying disease causes or demographics were provided for the patients in the NIH dataset, and LNs that were smaller than 1 cm were left unannotated. However, upon visual inspection, only 89 of the 90 mediastinal CT volumes had a field-of-view centered around the thorax [12]. Additionally, Bouget et al. [12] provided the ground truth annotations for all the mediastinal LNs in these 89 volumes. In particular, the authors adopted a "conservative" annotation approach and segmented all suspicious regions as lymph nodes. They incorporated nodes with any short-axis measurement, including nodes smaller than the suggested RECIST criterion for malignancy of 1 cm. We used these labels for the 89 volumes for training our models.
The test dataset was a public dataset released from St Olavs Hospital in Trondheim, Norway [9,22] that comprised 15 patients with a confirmed lung cancer diagnosis. A total of 384 lymph nodes were annotated in this dataset with 131 nodes having a SAD ≥ 8 mm and 238 nodes with SAD < 8 mm. The dimensions of the CT volumes in the test set ranged from (487 ∼ 512) × (441 ∼ 512) × (241 ∼ 829) voxels. To our knowledge, this is the largest publicly available dataset in which all mediastinal LNs have been segmented. Since LNs with a SAD ≥ 8 mm are suspicious for metastasis [3,19,20], we considered these nodes as clinically significant in this work. Furthermore, in contrast to prior works, we focused solely on LN segmentation and not on station mapping.
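The size-based split described above can be sketched as follows. This is a minimal illustration only: the function name `partition_by_sad` and the example measurements are hypothetical, and the sketch assumes that each node's SAD has already been measured in millimetres.

```python
SAD_THRESHOLD_MM = 8.0  # nodes at or above this SAD are treated as clinically significant


def partition_by_sad(sads_mm, threshold_mm=SAD_THRESHOLD_MM):
    """Split short axis diameters (mm) into (significant, small) groups."""
    significant = [s for s in sads_mm if s >= threshold_mm]
    small = [s for s in sads_mm if s < threshold_mm]
    return significant, small


# Made-up measurements, used only to show the boundary behaviour at 8 mm
significant, small = partition_by_sad([4.5, 8.0, 12.3, 7.9])
print(significant, small)
```

Note that a node measuring exactly 8 mm falls into the clinically significant group, matching the "≥" in the text.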

Anatomical Priors
Inspired by prior literature [6,8,9,[11][12][13] on LN segmentation with anatomical priors, we utilized the public TotalSegmentator [23] that was designed to segment over 117 distinct classes in CT volumes. The tool is of tremendous use for various applications, such as personalized risk assessment through body composition analysis [24,25]. TotalSegmentator was developed using a training set of 1,204 CT exams and encompasses a diverse array of scanners, institutions, and protocols to ensure its versatility and robustness in different clinical settings. We utilized the segmentation labels generated by this tool for 28 different structures (e.g., trachea, pulmonary artery) in the body and combined them with the lymph node labels, resulting in a total of 29 distinct classes for training. A complete list of the segmentation classes provided by TotalSegmentator is presented in Supplementary Material Table 2. Incorporation of anatomical priors helped to disambiguate anatomical regions of the body that are of similar intensity as the LNs, such as the heart and esophagus, and it also reduced the effect of severe class imbalances between the lymph node labels (foreground) and the background. Furthermore, since we were solely interested in the segmentation of LNs, the 28 anatomical classes were used only for training, and their predictions were discarded at test time.
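The combination of the 28 anatomical labels with the lymph node labels into a single 29-class label map can be sketched as below. This is an illustrative sketch under stated assumptions, not the exact procedure used in this work: the function name `merge_labels`, the choice of 29 as the lymph node class index, and the rule that lymph node voxels overwrite any overlapping prior label are all hypothetical.

```python
import numpy as np

LN_CLASS_ID = 29  # hypothetical index: 0 = background, 1..28 = priors, 29 = lymph node


def merge_labels(prior_labels: np.ndarray, ln_mask: np.ndarray) -> np.ndarray:
    """Overlay a binary lymph node mask on a label map of anatomical priors."""
    if prior_labels.shape != ln_mask.shape:
        raise ValueError("label volumes must share the same shape")
    merged = prior_labels.astype(np.uint8, copy=True)  # copy so the input stays untouched
    merged[ln_mask.astype(bool)] = LN_CLASS_ID  # assumption: LN voxels take precedence
    return merged


# Toy 1 x 2 x 3 volume standing in for a full CT label map
priors = np.array([[[0, 1, 1], [5, 0, 28]]])
nodes = np.array([[[0, 0, 1], [0, 1, 0]]])
merged = merge_labels(priors, nodes)
print(merged.tolist())
```

Keeping all classes in one integer label map is what allows a single multi-class model to be trained end-to-end on the priors and the lymph nodes together.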

3D nnUNet
The self-configuring nnUNet segmentation framework [26] was employed to train different configurations for the task of LN segmentation in CT. The nnUNet model is currently the de-facto standard for segmentation, and it can be adapted for various datasets and modalities, including CT and MRI. The framework automatically determined the optimal hyper-parameters for training a segmentation model and learned to segment target structures of interest. In this work, we trained 3D low-resolution, 3D full-resolution, and 3D cascade nnUNet configurations, and compared their performance.
During training, each configuration of the 3D nnUNet took as input the CT (unwindowed) volume and the corresponding ground-truth masks for 29 different structures. Training used five-fold cross-validation with different initializations of the trainable parameters, for a total of 1000 epochs. Distinct subsets of training and validation data from the 89 CT volumes were automatically created for each fold. The model learned to segment the target structures of interest in the volume, and iteratively refined its predictions by minimizing a loss function. The loss function used by the model was an equally weighted combination of binary cross-entropy and soft Dice losses. This loss function computed a segmentation error that measured the overlap between the prediction and ground-truth. It was optimized using the Stochastic Gradient Descent (SGD) optimizer with an initial learning rate of $10^{-2}$ and a batch size of 1.
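To make the loss concrete, the following NumPy sketch combines a cross-entropy term and a soft Dice term with equal weights for a single foreground class. It is not the nnUNet implementation: the function names, the smoothing constants, and the choice to sum the two terms with unit weights are assumptions made for illustration.

```python
import numpy as np


def soft_dice_loss(prob, target, eps=1e-6):
    """1 - soft Dice overlap between predicted probabilities and a binary target."""
    intersection = np.sum(prob * target)
    denom = np.sum(prob) + np.sum(target)
    return float(1.0 - (2.0 * intersection + eps) / (denom + eps))


def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def combined_loss(prob, target):
    """Equally weighted sum of the two terms (unit weights are an assumption)."""
    return bce_loss(prob, target) + soft_dice_loss(prob, target)


# Flattened toy example: four voxels, two of them foreground
target = np.array([1.0, 0.0, 1.0, 0.0])
good = np.array([0.9, 0.1, 0.8, 0.2])
bad = np.array([0.4, 0.6, 0.3, 0.7])
print(combined_loss(good, target) < combined_loss(bad, target))
```

The cross-entropy term penalizes each voxel independently, while the soft Dice term depends on the overall overlap, which makes it less sensitive to the large number of background voxels.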
At test time, the 3D nnUNet predicted the segmentation masks for the structures in the held-out test CT volumes. Since our primary objective was to segment LNs, we discarded the remaining classes at test time. The best model with the lowest loss from each of the 5 folds was used for inference on the test CT volume, and predictions from these five folds were ensembled together. All experiments were run on a desktop running Ubuntu 20.04 LTS with an NVIDIA V100 GPU with 32 GB RAM.
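The test-time procedure can be sketched as follows, assuming that the fold predictions are combined by averaging their per-class probability maps before taking the most likely class at each voxel. The averaging rule, the function name `ensemble_lymph_node_mask`, and the class index are assumptions for illustration; the text above only states that the fold predictions were ensembled.

```python
import numpy as np


def ensemble_lymph_node_mask(fold_probs, ln_class_id=29):
    """Average per-fold class probabilities and keep only the lymph node class.

    fold_probs: list of arrays shaped (num_classes, D, H, W), one per fold.
    ln_class_id: hypothetical index of the lymph node class.
    """
    mean_probs = np.mean(np.stack(fold_probs, axis=0), axis=0)
    labels = np.argmax(mean_probs, axis=0)  # most likely class at each voxel
    return labels == ln_class_id  # all other classes are discarded


# Toy example: 3 classes, a 1 x 1 x 2 volume, 2 folds; the LN class is index 2 here
fold_a = np.array([[[[0.1, 0.7]]], [[[0.1, 0.2]]], [[[0.8, 0.1]]]])
fold_b = np.array([[[[0.2, 0.6]]], [[[0.5, 0.3]]], [[[0.3, 0.1]]]])
mask = ensemble_lymph_node_mask([fold_a, fold_b], ln_class_id=2)
print(mask.tolist())
```

In the toy example, fold_b alone would assign the first voxel to class 1, but the averaged probabilities favour the lymph node class, so the voxel is kept in the final mask.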

Experiments and Metrics
The 3D nnUNet models in our work were trained with the data acquired at the NIH and tested on data obtained at an external institution (out-of-training distribution). Therefore, our primary experiment was the comparison against the slab-wise UNet designed by Bouget et al. [12], which was evaluated on the same test dataset of 15 patients. Additionally, we also wanted to determine the performance of the individual configurations of the 3D nnUNet on the test set. In order to quantify this, we used the Dice coefficient as the metric. Contrary to prior works [9,12,13], we did not apply any post-processing techniques to the predictions from our segmentation models. We only partitioned the LNs based on their SAD, and computed results for each component in the division.
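For reference, the Dice coefficient between a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|). The sketch below computes it for binary masks and summarizes a set of per-case scores as a mean and standard deviation in percent. The function names, the handling of two empty masks, and the use of the population standard deviation are assumptions, not details taken from this work.

```python
import numpy as np


def dice_score(pred, gt):
    """Dice coefficient in [0, 1] between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return float("nan")  # undefined when both masks are empty (convention assumed)
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def summarize(scores):
    """Mean and standard deviation of Dice scores in percent, ignoring NaNs."""
    percent = 100.0 * np.asarray(scores, dtype=float)
    return float(np.nanmean(percent)), float(np.nanstd(percent))


# Toy masks: one of two predicted voxels overlaps one of two ground-truth voxels
half = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
full = dice_score([1, 0, 1, 0], [1, 0, 1, 0])
mean_pct, std_pct = summarize([half, full])
print(half, full, mean_pct, std_pct)
```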

Results
Table 1 provides a summary of the Dice scores across the different 3D nnUNet configurations. It also shows our model's performance in contrast to prior work [12] on the same test set. Fig. 3 contains box plots that show the distribution of the Dice scores across the different nnUNet configurations. In contrast to the Dice score of 44.8 ± 13.5 obtained by Bouget et al. [12], our results show a marked improvement with a 10-point increase in Dice (54.8 ± 23.8 across all LNs for the 3D cascade nnUNet). These results are corroborated by the box plots in Fig. 3, which show that the median values of the Dice score distributions also steadily increase across the various configurations. Due to the low number of testing cases (n = 15), a non-parametric Wilcoxon signed-rank statistical test did not yield statistically significant differences. However, given the clear improvements in the Dice score, we believe that the addition of more data would provide clearer insights into any performance differences.

Discussion and Conclusion
In this work, we trained various configurations of the 3D nnUNet model to segment lymph nodes in mediastinal CT volumes with anatomical priors. As evidenced by prior works [6,8,[11][12][13], the utilization of 28 anatomical priors ameliorated the severe class imbalance problem and reduced false positive incidences as the model had supervision in the previously uncertain anatomical regions. The 3D cascade nnUNet obtained reasonable segmentation Dice scores for all LNs and those with SAD ≥ 8 mm; these scores tend to be lower for structures that are smaller in size in contrast to larger organs (e.g., liver). Presently, it is impossible for an automated approach to obtain voxel-perfect segmentations of LNs due to technical challenges in the CT acquisition process. The timing and uptake of contrast material can fluctuate, causing adjacent regions (e.g., azygos vein) to be iso-intense with the LNs that straddle the mediastinum as seen in Fig. 2(a), which can obscure their shape and size. Additionally, the manual annotations done by trained radiologists or residents for LNs in CT may not always be complete. For example, in Fig. 2(b), a lymph node was not annotated in the ground truth, but the nnUNet model correctly segmented this missed lymph node. Incomplete ground truth could also reduce the segmentation Dice scores as the correctly detected LN would be incorrectly considered as a false positive instead of a true positive.
Furthermore, the true metastatic nature of a node can only be determined through an invasive biopsy procedure for diagnosis. However, this may not be clinically feasible due to small sizes or anatomic locations. Thus, CT, PET/CT, or ultrasound imaging markers are among the few non-invasive ways to assess malignancy [12]. Utilizing PET/CT can provide complementary information on metastatic nodes based on their metabolic activity; higher SUV values (regardless of the nodal size) are suspicious for metastatic disease. However, PET/CT is not the initial diagnostic test and is generally performed after CTs first identify a malignancy and areas of metastatic disease; to that end, the initial CT exam must be exhaustively used to derive biomarkers.
One of the main limitations of our work is the inability to disambiguate collocated LNs in the CT volumes due to the diversity of LN shapes and appearances. As pointed out by Bouget et al. [12], this task is often difficult even for an experienced radiologist, and it is expected that the task would be equally, if not more, challenging for an automated method. Additionally, we do not tackle the problem of station mapping in this work. Furthermore, the test dataset that we used in this work is relatively small with only 15 patients. Due to clear imbalances in the station-level distributions [12], an extensive data collection and annotation process would be required to address both these issues. This would also enable any statistical differences to be extracted across the different nnUNet model configurations.
As localization of lymph nodes and measurement of suspicious nodes are routine tasks that clinicians perform on a day-to-day basis, our end-to-end anatomical prior-guided approach to segmenting lymph nodes would potentially alleviate the cumbersome nature of the measurement task. Since the models were trained with data that was presumably acquired with a variety of imaging scanners and exam protocols, it is fair to note that our 3D cascade model was particularly effective at identifying LNs with SAD ≥ 8 mm. It holds promise as a tool to report automated measurements, differentiate metastatic from non-metastatic nodes, and flag any concerning LNs that were missed by the reading radiologist.
In summary, the segmentation of mediastinal lymph nodes in CT was explored in our work through the use of anatomical priors. In addition to the LN labels for 89 volumes from the public NIH CT Lymph Node dataset, 28 different structures were also used to train different configurations of 3D nnUNet segmentation models in an end-to-end manner. Without any post-processing steps, our 3D cascade model achieved a segmentation Dice score of 72.2 ± 22.3 for clinically significant LNs with SAD ≥ 8 mm. Our results show an improvement of 10 points over the current state-of-the-art method that was evaluated on the same test dataset. Mining additional LNs in unannotated CT exams would enable the segmentation performance to be improved over time. Our approach has immense potential for improved patient outcomes through the identification of enlarged nodes in initial staging CT exams, while also determining the best options for next steps, whether that be diagnostic biopsy or therapeutic treatment.

Fig. 1: Flowchart of the proposed approach to segment mediastinal lymph nodes in CT using anatomical priors. First, the public TotalSegmentator tool was used to segment 28 structures in 89 mediastinal CT volumes from the public NIH CT Lymph Node dataset. Next, these labels were combined with the manual annotations for mediastinal LNs, and used to train a 3D nnUNet segmentation model. At test time, the 3D nnUNet was executed on CT volumes of 15 patients in the public St Olavs dataset. Green labels in the prediction correspond to the predicted LNs. The figure is best viewed in color in the PDF.

Fig. 2: Results from our approach to detect mediastinal LNs in CT volumes. Left column: A slice of the original CT volume, Middle column: GT annotation, Right column: Prediction from the nnUNet Cascade model. The different colors in the GT correspond to the different stations of the LNs, but for evaluation purposes, they were all considered to belong to one class based on their short axis diameter. Notice that in (b) for patient #7, the model was able to partially capture the large metastatic node (blue), while it also identified an unmarked node in the GT (middle). In (c), the model missed the node in blue.

Fig. 3: Box plots of the different 3D nnUNet model configurations for the segmentation of mediastinal lymph nodes in the St Olavs dataset. Results are shown for lymph nodes with short axis diameters ≥ 8 mm.

Table 1: Comparison of the different nnUNet models for the LN segmentation task. Bold font indicates best results. "-" stands for unreported results.
It can be seen from Table 1 that the 3D low-resolution nnUNet fares the worst amongst all the configurations. The 3D full-resolution nnUNet yielded respectable Dice scores across all LNs and for those LNs with SAD ≥ 8 mm. However, the 3D cascade nnUNet models performed the best. Of note, the 3D cascade nnUNet with the first-stage predictions from the full-resolution model demonstrated the best Dice scores across all LN size categories. This model achieved a Dice score (mean and standard deviation) of 54.8 ± 23.8 across all LNs regardless of their short axis diameters, 49.6 ± 23.5 for LNs with SAD < 8 mm, and 72.2 ± 22.3 for LNs with SAD ≥ 8 mm.

Table 2: Complete list of all organs and structures used for training the 3D nnUNet models in this work. 28 classes were generated by TotalSegmentator when it was executed on the 89 CT volumes in the NIH CT Lymph Node dataset. These were combined with the lymph node labels from the NIH CT Lymph Node dataset to yield the final 29 classes for training.