Background

Cervical lymphadenopathy (CLA) is a common condition in patients of all ages, with an annual incidence of 0.6–0.7% in the general population [1, 2]. The most common etiologies are reactive hyperplasia (38–79%) and tuberculous lymphadenitis (4–34%) among benign cases, and metastatic carcinoma (50–94%) and lymphoma (5–41%) among malignant cases [3,4,5,6]. Referral patterns and treatment strategies differ across the types of CLA; thus, accurate identification of the specific etiology is essential for subsequent medical management [1, 7]. However, the differential diagnosis of CLA is challenging, especially in patients without a reliable medical history or characteristic symptoms, a situation common in underdeveloped areas of developing countries [8]. Because there is no universally accepted protocol for investigating lymphadenopathy, some patients with unexplained CLA may experience an average delay of 3 to 6 months from the initial presentation of symptoms to the diagnosis of malignancy [9]. Recently, specialized lymph node diagnostic clinics have been established in several developed countries and advanced medical institutions to provide patients with unexplained CLA with rapid, agile, and scheduled care pathways, but similar interventions remain impractical in many countries and regions with underdeveloped healthcare, which together serve a huge population worldwide [1, 10, 11].

Imaging methods, including ultrasound imaging (US), computed tomography (CT), and magnetic resonance imaging (MRI), are the main tools for detection, diagnosis, and follow-up monitoring in patients with unexplained CLA. Compared with other imaging modalities, US is more convenient, economical, and radiation-free and has better resolution for characterizing cervical lymph nodes (CLNs). It comprises two basic modalities, B-mode ultrasound (BUS) and color Doppler flow imaging (CDFI): BUS reliably shows the size, shape, borders, and internal echoes of a CLN, while CDFI complements BUS by detecting blood vessels and assessing the vascular distribution of the CLN in real time [12]. The importance of duplex BUS and CDFI ultrasound in patients with CLA is well recognized, and this method is recommended as the first-line diagnostic tool for unexplained CLA [11, 13]. However, the diagnostic performance of dual-modality US relies strongly on the clinical and professional expertise of radiologists [14, 15]. Subjective image interpretation, lack of effective quantification, and persistent intra- and inter-observer variability remain the main challenges of US examination. Consequently, a significant proportion of patients with unexplained CLA are misdiagnosed and subsequently undergo unnecessary investigations and inappropriate treatment [16].

To enable timely and accurate diagnosis of unexplained CLA with less demand for clinical expertise, one promising approach is artificial intelligence (AI). AI-based radiomics mines high-throughput quantitative features from image data to characterize disease and follows two main strategies: traditional machine learning and deep learning (DL). Some inherent characteristics of ultrasound images, including limited image quality and susceptibility to operator influence, make the manual definition and extraction of image features less reliable, which limits the performance of traditional machine learning. DL, by contrast, has gradually become a mainstream method for ultrasound image analysis, using deep neural networks and data-driven learning to automatically extract and quantify image features that are imperceptible to the naked eye [17,18,19]. Recent studies have shown that DL performs well in diagnosing thyroid nodules [20], classifying parotid gland tumors [21], identifying extranodal extension of head and neck squamous cell carcinoma [22], predicting the prognosis of oral cancer [23], and detecting COVID-19 pneumonia [24]. However, its application to lymph node imaging is still rare: only a few studies have reported that DL applied to BUS images could determine whether the draining lymph nodes of breast [25], thyroid [26], and lung cancer [27] were metastatic. To the best of our knowledge, DL has not yet been used to characterize unexplained CLA.

In this study, we developed a cervical lymphadenopathy hierarchical diagnosis model (CLA-HDM) based on DL radiomics. It uses BUS and CDFI dual-modality images to establish a two-level diagnostic structure for unexplained CLA. Mimicking clinical diagnostic logic, CLA-HDM divides the characterization task into three sub-tasks: it first classifies unexplained CLA as benign or malignant and then determines the specific etiology within each category. The model was trained and validated (both internally and externally) in multi-center cohorts of patients with unexplained CLA. The performance of radiologists with and without CLA-HDM assistance was compared. The model was made available for external validation.

Methods

Study cohorts

This multi-center diagnostic study, conducted from June 1, 2018, to November 31, 2021, was approved by the ethics committee of the Second Hospital of Lanzhou University, and the requirement for individual consent for this retrospective analysis was waived. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines.

A total of 1906 patients with definitive pathological findings of CLA, obtained by US-guided needle and/or excisional biopsy, were collected from three hospitals located in underdeveloped areas of China (hospital 1: Lanzhou University Second Hospital; hospital 2: Gansu Provincial Cancer Hospital; hospital 3: People's Hospital of Ningxia Hui Autonomous Region). Excisional biopsy was required only when the needle biopsy result was inconclusive. The inclusion criteria were as follows: (a) no obvious infectious etiology or clinical symptoms (e.g., tenderness and fever), (b) no history of malignancy or chemoradiation, and (c) available BUS and CDFI images. The exclusion criteria were as follows: (a) incomplete pathological or clinical information and (b) poor-quality BUS or CDFI images. The patient selection flowchart is shown in Fig. 1.
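The selection criteria above amount to a record filter: keep a patient only if every inclusion criterion holds and no exclusion criterion does. The sketch below illustrates that logic under stated assumptions; the `Patient` record and all of its field names are hypothetical and do not describe the study's actual data schema.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields for illustration only; not the study's data schema.
    patient_id: str
    has_pathology: bool            # definitive pathology (needle and/or excisional biopsy)
    obvious_infection: bool        # obvious infectious etiology or symptoms (e.g., tenderness, fever)
    prior_malignancy_or_crt: bool  # history of malignancy or chemoradiation
    has_bus: bool                  # BUS image available
    has_cdfi: bool                 # CDFI image available
    records_complete: bool         # complete pathological and clinical information
    images_adequate: bool          # BUS and CDFI image quality acceptable


def meets_inclusion(p: Patient) -> bool:
    return (p.has_pathology
            and not p.obvious_infection
            and not p.prior_malignancy_or_crt
            and p.has_bus and p.has_cdfi)


def meets_exclusion(p: Patient) -> bool:
    return not p.records_complete or not p.images_adequate


def select_cohort(patients):
    """Keep patients who satisfy all inclusion criteria and no exclusion criterion."""
    return [p for p in patients if meets_inclusion(p) and not meets_exclusion(p)]
```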

Fig. 1

Patient selection flowchart. CLA, cervical lymphadenopathy; US, ultrasound

Image acquisition

All examinations (ultrasound and lymph node biopsy) at the three participating hospitals were performed by radiologists with more than 10 years of ultrasound experience, and the ultrasound images were acquired with 14 different diagnostic ultrasound instruments (the image collection process is described in Additional file 1: Methods 1; details of the instruments used at each hospital are given in Additional file 1: Table S1). In accordance with clinical practice, the lymph node selected for biopsy at each hospital was the most suspicious lymph node on imaging; when multiple suspicious lymph nodes were present, the largest was selected [12, 28]. Baseline characteristics of the patients and selected lymph nodes (sex, age, longitudinal node diameter, location, neck level, and method of pathological diagnosis) were obtained from electronic medical records and biopsy reports.
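The node-selection rule can be written as a small function. This is a sketch that assumes the radiologist's suspicion judgement is recorded as a binary flag and that "largest" means the longitudinal diameter; the `LymphNode` type is hypothetical, and returning `None` when no node is flagged is an assumption because the text does not cover that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LymphNode:
    # Hypothetical record for illustration only.
    node_id: str
    suspicious: bool         # radiologist's judgement, assumed binary here
    longitudinal_mm: float   # longitudinal diameter in millimetres


def select_biopsy_node(nodes: Sequence[LymphNode]) -> Optional[LymphNode]:
    """Return the largest suspicious node, or None when no node is flagged (assumption)."""
    candidates = [n for n in nodes if n.suspicious]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.longitudinal_mm)
```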

Model development

CLA-HDM consisted of three task-specific classification sub-models: sub-model 1 for differentiating benign from malignant unexplained CLA, sub-model 2 for differentiating tuberculous lymphadenitis from reactive hyperplasia among benign candidates, and sub-model 3 for differentiating metastatic carcinoma from lymphoma among malignant candidates (Fig. 2). Each sub-model had a dual-branch, late-fusion structure with two attention blocks. The two branches took BUS and CDFI images as inputs, respectively. A channel attention block built on a fully connected layer reweighted the R, G, and B channels to highlight the important color information in CDFI. The BUS images and the channel-reweighted CDFI images were then fed into their respective feature extractors (ResNet-50 [29]). Modality fusion attention was applied to the features of the CDFI branch, with weights derived from the BUS branch features through global average pooling and a fully connected layer; this mimics radiologists, who read CDFI images primarily in light of the corresponding BUS images. The three task-specific sub-models shared the same structure but not their parameters (Fig. 2a).
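To make the data flow of one sub-model concrete, the following numpy sketch runs a forward pass with the two attention blocks and late fusion described above. It is not the authors' implementation: ResNet-50 is replaced by a toy extractor (2×2 average pooling plus a 1×1 projection) so the example stays self-contained, the BUS image is assumed to be replicated to three channels, and sigmoid gating, the feature width `k`, and the random weights are all hypothetical choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def channel_attention(cdfi, w, b):
    """Reweight the R, G, B channels of a CDFI image (shape 3 x H x W)."""
    pooled = cdfi.mean(axis=(1, 2))        # global average pooling -> (3,)
    weights = sigmoid(w @ pooled + b)      # fully connected layer + sigmoid gate -> (3,)
    return cdfi * weights[:, None, None], weights


def toy_extractor(img, proj):
    """Stand-in for ResNet-50 (assumption): 2x2 average pooling, then 1x1 projection + ReLU."""
    c, h, w = img.shape
    img = img[:, : h // 2 * 2, : w // 2 * 2]
    pooled = img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return np.maximum(np.einsum("kc,chw->khw", proj, pooled), 0.0)


def modality_fusion_attention(bus_feats, cdfi_feats, w):
    """Gate CDFI feature channels with weights derived from the BUS branch."""
    g = bus_feats.mean(axis=(1, 2))        # global average pooling of BUS features -> (K,)
    a = sigmoid(w @ g)                     # fully connected layer + sigmoid gate -> (K,)
    return cdfi_feats * a[:, None, None]


def sub_model_forward(bus, cdfi, params):
    cdfi_rw, _ = channel_attention(cdfi, params["ca_w"], params["ca_b"])
    f_bus = toy_extractor(bus, params["bus_proj"])
    f_cdfi = toy_extractor(cdfi_rw, params["cdfi_proj"])
    f_cdfi = modality_fusion_attention(f_bus, f_cdfi, params["mf_w"])
    # Late fusion: concatenate pooled features from both branches, then classify.
    z = np.concatenate([f_bus.mean(axis=(1, 2)), f_cdfi.mean(axis=(1, 2))])
    return softmax(params["cls_w"] @ z)


def init_params(k=8, n_classes=2, seed=0):
    """Random, untrained weights (hypothetical sizes) just to exercise the forward pass."""
    rng = np.random.default_rng(seed)
    return {
        "ca_w": rng.normal(size=(3, 3)), "ca_b": np.zeros(3),
        "bus_proj": rng.normal(size=(k, 3)), "cdfi_proj": rng.normal(size=(k, 3)),
        "mf_w": rng.normal(size=(k, k)), "cls_w": rng.normal(size=(n_classes, 2 * k)),
    }
```

Because the three sub-models share structure but not parameters, the same forward function would be called with three separately trained parameter sets.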

Fig. 2

Proposed deep learning-based hierarchical diagnostic model (CLA-HDM) for non-invasive assessment of unexplained CLA. a Each sub-model takes BUS and CDFI images as inputs, assigns weights to the color channels in the CDFI branch, and, via an attention mechanism, attends to specific CDFI features under the guidance of the BUS branch. b For each test case, the model takes dual-modality ultrasound images as inputs and outputs the predicted probabilities for the hierarchical diagnostic tasks together with the corresponding heatmaps, which are compared with and used to assist radiologists. CLA, cervical lymphadenopathy; BUS, B-mode ultrasound; CDFI, color Doppler flow imaging; AI, artificial intelligence

In the training stage, the three sub-models were trained independently on the training cohort. In the testing stage, we first evaluated the performance of each sub-model individually. The three sub-models were then assembled into CLA-HDM to diagnose every case in the testing cohorts. Whether CLA-HDM output the diagnostic probability of sub-model 2 or sub-model 3 was determined automatically by the result of sub-model 1 (Additional file 1: Method S2). Layer-CAM [30, 31] was applied to the final-stage feature maps of the feature extractors to visualize heatmaps (Additional file 1: Method S5). Further methodological details, including data preprocessing and model development, are given in Additional file 1: Methods S1, S3, and S4 [32,33,34,35,36,37,38,39,40,41].
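The routing step can be sketched as follows: sub-model 1 decides the first level, and only the sub-model for the predicted category is consulted for the second level. The exact rule is given in Additional file 1: Method S2 and is not reproduced here; the 0.5 threshold, the positive class assigned to each sub-model, and the `cla_hdm_predict` and `HierarchicalResult` names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: each sub-model maps a (BUS, CDFI) pair to P(positive class).
SubModel = Callable[[object, object], float]


@dataclass
class HierarchicalResult:
    level1: str        # "benign" or "malignant"
    level2: str        # specific etiology
    p_malignant: float
    p_level2: float    # probability reported by the routed sub-model


def cla_hdm_predict(bus, cdfi, sub1: SubModel, sub2: SubModel, sub3: SubModel,
                    threshold: float = 0.5) -> HierarchicalResult:
    """Route one case through the two-level hierarchy.

    Assumed outputs: sub1 -> P(malignant), sub2 -> P(tuberculous | benign),
    sub3 -> P(metastatic | malignant). The threshold is an assumption.
    """
    p_mal = sub1(bus, cdfi)
    if p_mal >= threshold:
        p2 = sub3(bus, cdfi)   # only the malignant-branch sub-model is consulted
        label = "metastatic carcinoma" if p2 >= threshold else "lymphoma"
        return HierarchicalResult("malignant", label, p_mal, p2)
    p2 = sub2(bus, cdfi)       # only the benign-branch sub-model is consulted
    label = "tuberculous lymphadenitis" if p2 >= threshold else "reactive hyperplasia"
    return HierarchicalResult("benign", label, p_mal, p2)
```

For example, with stub sub-models returning 0.8, 0.9, and 0.3, a case is routed to the malignant branch and labeled lymphoma.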

Radiologist study

A two-stage radiologist study was conducted to evaluate the diagnostic performance of CLA-HDM and its clinical value. Six radiologists with an average of 10 years of US experience (range, 3–20 years) participated and were divided into three groups according to experience: senior-level (radiologist 1 [F.N.], 20 years; radiologist 2 [Y.D.], 14 years), middle-level (radiologist 3 [Y.Y.J.], 9 years; radiologist 4 [T.T.D.], 8 years), and junior-level (radiologist 5 [Y.F.W.], 5 years; radiologist 6 [X.F.], 3 years). The testing cohorts were shuffled before being submitted to the radiologists, each of whom interpreted the cases blindly and independently.

In the first stage of the radiologist study, the BUS images, CDFI images, and baseline characteristics of each patient were available to the radiologists. Each radiologist first classified each case of unexplained CLA as benign or malignant and then determined the specific etiology. In the second stage (AI-assisted radiologist study), the corresponding hierarchical lymph node diagnostic heatmaps and AI probabilities were provided to the radiologists, who were allowed to change or maintain their initial diagnoses before giving their final diagnostic conclusions (Fig. 2b).

Statistical analysis

All statistical analyses were performed using SPSS software (version 26.0) and Python (version 3.8.10). Continuous variables were expressed as means ± standard deviations and compared between two groups using the Mann-Whitney U test or Student's t-test. Categorical variables were expressed as numbers and percentages and compared between two groups using the chi-squared test or Fisher's exact test. ROC analysis was used to evaluate the diagnostic performance of the model in the training and testing cohorts (micro-averaging was used to plot multi-class ROC curves [42]). 95% confidence intervals (CIs) were calculated by bootstrapping with 1000 resamples. Differences in performance between CLA-HDM and each of the six radiologists, and between each radiologist's readings without and with AI assistance, were assessed using McNemar's test. Diagnostic performance was compared between CLA-HDM and the three radiologist groups of different experience levels, and among these groups, using a permutation test. Statistical significance was set at P < 0.05.
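A percentile bootstrap CI for an AUC can be sketched with the standard library alone. This is an illustration under assumptions, not the authors' SPSS or Python code: the AUC is computed as the Mann-Whitney probability that a random positive case outranks a random negative one (ties count 0.5), cases are resampled with replacement 1000 times by default, and resamples that lack one class are redrawn.

```python
import random


def auc(labels, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample cases with replacement and recompute the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # redraw resamples that lack one of the two classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return auc(labels, scores), lo, hi
```

The pairwise AUC loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred cases such as those reported here.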

Results

A total of 763 patients with unexplained CLA were enrolled in this multi-center study (Fig. 1); detailed pathological diagnoses are shown in Additional file 1: Table S2. Of these, 566 cases from hospital 1 were used as the primary cohort to reduce overfitting or bias in the analysis. Within the primary cohort, cases from before 2021 formed the training cohort (n = 395) for model development, and cases from 2021 formed the internal testing cohort (n = 171) to simulate prospective conditions. Cases from hospitals 2 (n = 105) and 3 (n = 92) were used as external testing cohorts 1 and 2, respectively. There were no significant differences in baseline characteristics between the training cohort and the three testing cohorts (P > 0.05; Additional file 1: Table S3), and all testing cohorts were used for the radiologist-machine comparison.
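The cohort assignment is a temporal split within hospital 1 plus one external cohort per other hospital. A minimal sketch, assuming each case is a dict with hypothetical keys `"hospital"` (1, 2, or 3) and `"exam_date"`, and taking January 1, 2021 as the cutoff for "before 2021":

```python
from datetime import date


def split_primary_cohort(cases, cutoff=date(2021, 1, 1)):
    """Temporal split of hospital 1 cases plus external cohorts from hospitals 2 and 3.

    The dict keys "hospital" and "exam_date" are hypothetical.
    """
    primary = [c for c in cases if c["hospital"] == 1]
    train = [c for c in primary if c["exam_date"] < cutoff]          # before 2021
    internal_test = [c for c in primary if c["exam_date"] >= cutoff]  # 2021 onwards
    external = {h: [c for c in cases if c["hospital"] == h] for h in (2, 3)}
    return train, internal_test, external
```

Splitting by date rather than at random keeps later cases unseen during training, which is what lets the internal test approximate prospective use.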

Sub-model performance evaluation

The performance of the three sub-models was tested independently. In the internal testing cohort and external testing cohorts 1 and 2, sub-model 1 achieved AUCs of 0.932, 0.963, and 0.896; accuracies of 86.0%, 87.6%, and 82.6%; sensitivities of 89.5%, 83.3%, and 81.8%; and specificities of 78.9%, 96.9%, and 83.8% for differentiating benign from malignant unexplained CLA. Sub-model 2 achieved AUCs of 0.922, 0.857, and 0.872; accuracies of 84.2%, 75.8%, and 78.4%; sensitivities of 85.7%, 76.2%, and 71.4%; and specificities of 80.0%, 75.0%, and 87.5% for differentiating tuberculous lymphadenitis from reactive hyperplasia. Sub-model 3 achieved AUCs of 0.852, 0.847, and 0.827; accuracies of 86.0%, 86.1%, and 83.6%; sensitivities of 87.9%, 88.7%, and 87.2%; and specificities of 73.3%, 70.0%, and 62.5% for differentiating lymphoma from metastatic carcinoma (Table 1 and Fig. 3).
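For reference, the three threshold-dependent metrics reported above follow from the confusion matrix as sketched below. Which class is "positive" for each sub-model is an assumption here (label 1, e.g., malignant for sub-model 1). A useful consistency check is that accuracy equals the prevalence-weighted average of sensitivity and specificity.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity with label 1 as the positive class.

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }


# Example: 3 positives and 2 negatives -> accuracy 0.6, sensitivity 2/3, specificity 0.5.
m = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```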

Table 1 Performance of sub-models and CLA-HDM in the diagnosis of unexplained CLA
Fig. 3

Diagnostic performance of the three task-specific sub-models and their assembled model (CLA-HDM) in the training cohort, the internal testing cohort, and external testing cohorts 1 and 2

CLA-HDM performance evaluation

After the three sub-models were integrated, CLA-HDM, designed to diagnose four common etiologies of unexplained CLA (reactive hyperplasia, tuberculous lymphadenitis, lymphoma, and metastatic carcinoma), achieved overall AUCs of 0.873 (95% CI, 0.838–0.908), 0.837 (95% CI, 0.789–0.889), and 0.840 (95% CI, 0.789–0.898) in the three testing cohorts, respectively (Table 1 and Fig. 3). More specifically, the AUCs for reactive hyperplasia were 0.718 (95% CI, 0.595–0.856), 0.875 (95% CI, 0.793–0.967), and 0.812 (95% CI, 0.691–0.952); for tuberculous lymphadenitis, 0.883 (95% CI, 0.830–0.939), 0.860 (95% CI, 0.795–0.938), and 0.897 (95% CI, 0.828–0.976); for lymphoma, 0.816 (95% CI, 0.685–0.964), 0.670 (95% CI, 0.518–0.843), and 0.936 (95% CI, 0.884–1.006); and for metastatic carcinoma, 0.855 (95% CI, 0.811–0.906), 0.825 (95% CI, 0.758–0.894), and 0.804 (95% CI, 0.730–0.882), respectively (Additional file 1: Fig. S1).
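The overall and per-etiology AUCs above correspond to two views of the same four-class scores. The sketch below assumes each case already has a four-class probability vector (how CLA-HDM composes it from the sub-models is described in Additional file 1 and is not reproduced here). Per-class AUCs treat one etiology against the rest; the micro-average pools every (case, class) pair into a single binary problem, as in micro-averaged multi-class ROC analysis.

```python
def binary_auc(labels, scores):
    """P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class order is an assumption for illustration.
CLASSES = ["reactive", "tuberculous", "lymphoma", "metastatic"]


def one_vs_rest_auc(y_true, prob_rows):
    """Per-class AUC: class k against all others, scored by its predicted probability."""
    return {c: binary_auc([y == k for y in y_true], [row[k] for row in prob_rows])
            for k, c in enumerate(CLASSES)}


def micro_average_auc(y_true, prob_rows):
    """Micro-average: flatten one-hot labels and scores into one binary problem."""
    labels, scores = [], []
    for y, row in zip(y_true, prob_rows):
        for k, p in enumerate(row):
            labels.append(y == k)
            scores.append(p)
    return binary_auc(labels, scores)
```

Because micro-averaging weights every case equally, it is dominated by the most frequent etiology (metastatic carcinoma in this dataset), whereas the per-class AUCs expose weaker classes such as lymphoma.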

Heatmaps for interpreting CLA-HDM decision-making

Visualizing the decision-making of CLA-HDM with heatmaps revealed clearly different patterns for the four etiologies in BUS and CDFI images (Fig. 4). To distinguish benign from malignant CLA, the model tended to focus on the intranodal region in BUS, the same region radiologists examine when making a diagnosis. In CDFI, CLA-HDM concentrated on intranodal rather than surrounding vessels for benign CLA, whereas for malignant CLA it focused more on peripheral or mixed vascularity. Furthermore, the CDFI focus tended towards the most abundant intranodal vessels for reactive hyperplasia but towards peripheral vessels for tuberculosis. In contrast, when CLA-HDM correctly identified lymphoma, it focused on the area of intense hilar vascularity in CDFI, whereas for true-positive diagnoses of metastatic carcinoma it attended to the surrounding peripheral area, forming a lollipop shape in CDFI. This information was provided to the radiologists to assist diagnosis in this study.
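The heatmaps described above were produced with Layer-CAM on the final-stage feature maps. As a sketch of that technique (not the authors' code), Layer-CAM weights each activation element by the positive part of the class-score gradient at the same location, sums over channels, and applies a ReLU. The activations and gradients are assumed to come from an autodiff framework; max scaling and integer-factor nearest-neighbour upsampling are simplifying assumptions for display.

```python
import numpy as np


def layer_cam(activations, gradients, out_size=None):
    """Layer-CAM heatmap for one image from a single layer.

    activations, gradients: arrays of shape (K, H, W); gradients are d(class score)/dA.
    """
    weights = np.maximum(gradients, 0.0)                          # element-wise ReLU of gradients
    cam = np.maximum((weights * activations).sum(axis=0), 0.0)   # channel sum, then ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                                     # scale to [0, 1] for display
    if out_size is not None:
        # Nearest-neighbour upsampling; assumes out_size is an integer multiple of (H, W).
        fy, fx = out_size[0] // cam.shape[0], out_size[1] // cam.shape[1]
        cam = np.repeat(np.repeat(cam, fy, axis=0), fx, axis=1)
    return cam
```

Unlike Grad-CAM, which averages gradients into one weight per channel, this element-wise weighting keeps spatial detail, which matters when the regions of interest are small structures such as hilar or peripheral vessels.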

Fig. 4

Examples of heatmaps generated by CLA-HDM for each etiology of unexplained CLA. When the BUS and CDFI images of a case (first row) are input into CLA-HDM, it first generates first-level diagnostic heatmaps to distinguish benign from malignant CLA (second row) and then second-level diagnostic heatmaps to identify the specific etiology of benign or malignant CLA (third row). Generally, the heatmaps reveal a characteristic pattern for each pathological category. CLA, cervical lymphadenopathy; BUS, B-mode ultrasound; CDFI, color Doppler flow imaging

First stage of the radiologist study

In the first stage, the diagnoses of six radiologists without AI assistance were compared with those of CLA-HDM. CLA-HDM achieved systematically better accuracy, sensitivity, and specificity than each individual radiologist in the three testing cohorts, except for radiologist 1 (a senior radiologist) in external testing cohort 1, whose performance was equivalent to that of CLA-HDM (P > 0.05, Fig. 5 and Table 2). Moreover, CLA-HDM showed significantly better accuracy, sensitivity, and specificity than some of these radiologists in different testing cohorts (P < 0.05, Table 2). Compared with the three radiologist groups of different experience levels, CLA-HDM also achieved systematically better accuracy, sensitivity, and specificity, with significant differences in at least one testing cohort (P < 0.05, Fig. 5 and Table 3).
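The group comparisons used a permutation test whose exact variant is not specified in the text. One common choice for two readers scored on the same cases is a paired, two-sided permutation test on per-case correctness, sketched below as an assumption: under the null hypothesis the two readers' results are exchangeable within each case, so each permutation randomly swaps them. How a radiologist group's readings are pooled into one correctness vector is left open here.

```python
import random


def paired_permutation_test(correct_a, correct_b, n_permutations=10000, seed=0):
    """Two-sided paired permutation test for a difference in accuracy.

    correct_a, correct_b: per-case 0/1 correctness of two readers on the same cases.
    Returns (observed accuracy difference, Monte Carlo p-value).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("readers must be scored on the same cases")
    n = len(correct_a)
    observed = abs(sum(correct_a) - sum(correct_b))   # difference in number of correct cases
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        diff = 0
        for a, b in zip(correct_a, correct_b):
            # Under H0, swap the two readers' results for this case with probability 0.5.
            diff += (b - a) if rng.random() < 0.5 else (a - b)
        if abs(diff) >= observed:
            extreme += 1
    # The add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed / n, (extreme + 1) / (n_permutations + 1)
```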

Fig. 5

Comparison between CLA-HDM and radiologists and between radiologists without and with AI assistance to identify four common etiologies for unexplained CLA. Radiologists 1 and 2 represent senior-level experience, radiologists 3 and 4 represent middle-level experience, and radiologists 5 and 6 represent junior-level experience. ROC, receiver operating characteristic curve; AI, artificial intelligence; CLA, cervical lymphadenopathy

Table 2 Comparison of diagnostic performance between CLA-HDM and six radiologists, and between radiologists with and without AI assistance
Table 3 Comparison of diagnostic performance between the groups of radiologists at different levels

Second stage of the radiologist study

In the second stage, all radiologists in the three testing cohorts achieved higher accuracy, sensitivity, and specificity with AI assistance, except for radiologist 1 (a senior radiologist) in external testing cohort 2, whose performance decreased slightly but not significantly (P > 0.05, Fig. 5 and Table 2). Specifically, with AI assistance each radiologist had equivalent or slightly higher specificity, while accuracy and sensitivity improved significantly in at least one testing cohort (P < 0.05, Table 2). Overall, in all three testing cohorts, CLA-HDM helped most radiologists improve their original diagnoses, especially for reactive hyperplasia and metastatic carcinoma. Positive and negative examples from the two-stage AI-assistance study are illustrated in Additional file 1: Figs. S2 and S3.
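McNemar's test, used for the with/without-AI comparisons, depends only on the discordant cases. The sketch below applies it to per-case correctness; restricting the inputs to malignant or benign cases would give the sensitivity or specificity comparison. Whether the authors used the exact or the chi-squared form is not stated, so the switch at 25 discordant pairs is an assumed rule of thumb.

```python
import math


def mcnemar(correct_without, correct_with, exact=None):
    """McNemar's test on paired per-case correctness (1 = correct, 0 = incorrect).

    b = correct without AI but wrong with AI; c = wrong without AI but correct with AI.
    Returns (b, c, two-sided p-value).
    """
    b = sum(1 for x, y in zip(correct_without, correct_with) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(correct_without, correct_with) if x == 0 and y == 1)
    n = b + c
    if n == 0:
        return b, c, 1.0
    if exact is None:
        exact = n < 25  # assumed rule of thumb for choosing the exact test
    if exact:
        # Exact two-sided binomial test of b ~ Binomial(n, 0.5).
        tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
        return b, c, min(1.0, 2 * tail)
    # Chi-squared approximation with continuity correction, 1 degree of freedom:
    # the chi2(1) survival function equals erfc(sqrt(stat / 2)).
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    return b, c, math.erfc(math.sqrt(stat / 2))
```

For example, if a radiologist's AI-assisted reading corrects 10 cases and worsens none, the exact test gives b = 0, c = 10, and P = 2/1024 ≈ 0.002.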

Analyzed by radiologist group, AI assistance improved accuracy, sensitivity, and specificity in all three testing cohorts, especially for the junior-level and middle-level groups, whose AI-assisted performance was comparable to that of the senior-level group without AI assistance (P > 0.05, Fig. 5 and Table 3). Moreover, reductions in the false-positive rate (0.7–3.1%) and false-negative rate (2.2–10%) were observed in all three groups (Additional file 1: Fig. S4a). When only benign versus malignant differentiation of CLA was considered, the false-negative rate of the radiologist groups with AI assistance decreased by 3.5–13.2% and the false-positive rate by 7.6–14.8% (Additional file 1: Fig. S4b).

Discussion

In this multi-center study, we proposed a DL model, CLA-HDM, that diagnoses unexplained CLA by integrating BUS and CDFI images. In both internal and external independent validation, it proved effective in assisting radiologists, systematically improving their diagnostic accuracy in classifying unexplained CLA as reactive hyperplasia, tuberculous lymphadenitis, metastatic carcinoma, or lymphoma. It was especially helpful for radiologists with junior-level and middle-level experience, whose AI-assisted diagnoses reached a level similar to that of senior radiologists. To the best of our knowledge, this is the first study to use a DL-based radiomics model with medical images to characterize unexplained CLA. In total, 763 patients from three hospitals were included, which supports the credibility of the findings and provides a good basis for larger-scale prospective investigations in the future.

CLA-HDM not only provided a diagnosis of unexplained CLA but also visualized its decision-making with key feature-based heatmaps. Interpreting these heatmaps with senior physicians, we found that they often showed distinct and recognizable patterns for different etiologies. In BUS images, two locations were valuable for CLA-HDM's diagnosis of unexplained CLA: the lesion margins and the internal echoes of the lymph nodes; in CDFI images, the model focused on the locations of the vasculature. This is consistent with clinical experience and relevant studies [12, 43,44,45]. Specifically, malignant CLA is typically associated with distinct features, such as well-defined sharp margins, extensive intranodal structural variation (for example, intranodal necrosis is common in metastases and reticulation is common in lymphomas), and abundant peripheral vascularity [43, 44]; the regions highlighted in the heatmaps helped identify these representative characteristics of malignant CLA. In contrast, in most benign CLA, the margins were ill-defined and blurry, the intranodal structure changed only slightly, and vessels were either rarely detected or confined to intranodal regions (for example, avascular or hilar vascular flow is common in reactive CLA, and displaced vessels are common in tuberculosis) [45, 46]. As a result, the entire lymph node and its peripheral areas on BUS images, and the intranodal vessels on CDFI images, are important in the AI interpretation of benign CLA. These patterns are also consistent with the biological or pathological characteristics of each etiology, which suggests a direction for further investigation, although such speculation still requires direct evidence. Nevertheless, the heatmaps were useful in guiding radiologists, especially when they faced challenging cases with non-negligible uncertainty, and all participating radiologists confirmed that this assistance was effective.

Compared with other studies that classified metastasis in draining lymph nodes [47, 48], our study addresses a more complex clinical scenario, yet the proposed model still achieved good performance in both binary and four-class classification of unexplained CLA. More importantly, it proved to be a useful tool for helping radiologists improve their overall diagnostic accuracy. It shows considerable potential to help radiologists avoid subjective bias related to professional experience, which may reduce unnecessary investigations and inappropriate or delayed treatment. This is especially significant for CLA patients in underdeveloped countries and regions.

This study has several limitations. First, the dataset used for model development was imbalanced across the etiologies of unexplained CLA, mainly because of differences in the prevalence and clinical management of each etiology. When clinicians consider a patient with unexplained CLA to have a benign condition, they generally choose follow-up rather than invasive procedures; as a result, benign CLA accounted for a relatively small proportion (34.4%) of the included cases. In addition, the lower prevalence of lymphoma compared with metastatic carcinoma caused a marked class imbalance within the malignant group. These factors affect the diagnostic performance of the model to some extent, and addressing this issue with more and broader data will be an important direction for future work. Second, the retrospective design of this study introduces inevitable bias; our future research will incorporate the AI system into routine clinical workflows for prospective validation. Finally, all patients in this study were from medically underdeveloped regions of China, so the proposed model needs to be evaluated across multiple regions for a more comprehensive investigation.

Conclusions

The proposed CLA-HDM based on dual-modality ultrasound images showed systematically better accuracy, sensitivity, and specificity in the diagnosis of four common etiologies of unexplained CLA than skilled radiologists. It helped to narrow the gap between radiologists with different levels of experience in classification, which is potentially of great significance for CLA patients in underdeveloped countries and regions.