Background

Tuberculosis (TB) remained the most common cause of death from a single infectious pathogen worldwide in 2019 [1]. An estimated 10 million people developed TB disease and 1.4 million TB patients died in 2019 [1]. Tuberculous pleural effusion (TPE) is a common clinical manifestation of extra-pulmonary TB, which accounts for 25–30% of total TB cases in TB-endemic regions, including China [2–4]. Early and accurate diagnosis of TPE is critical for the management of the disease. Currently, the gold standards for TPE diagnosis are based on the detection of acid-fast bacilli (AFB) in sputum, pleural fluid, or pleural biopsy tissues through Mycobacterium tuberculosis (M. tuberculosis) culture, or on thoracoscopy [4, 5]. However, the limited sensitivity, low accuracy, and invasiveness of these diagnostic tools compromise their value in clinical practice [6–8]. Alternative diagnostic methods, including the tuberculin skin test (TST), adenosine deaminase (ADA), and interferon-gamma release assays (IGRAs), have improved the speed of TPE diagnosis in recent years [4, 9–11]. However, the sensitivity and/or specificity of these methods remains insufficient for separating TPE from other types of pleural effusion (PE), such as malignant pleural effusion (MPE) and parapneumonic pleural effusion (PPE) [9–11].

Therefore, there is an urgent need for a highly sensitive, accurate, and less invasive diagnostic marker or method for patients with TPE. The aim of this study was to construct a scoring system based on a nomogram to distinguish TPE from non-TB benign pleural effusion (BPE). We also retrospectively validated the diagnostic performance of the scoring system in an internal validation set from our hospital and an external validation set from another hospital.

Materials and methods

Patients and study design

This was a retrospective study of individuals older than 18 years who were admitted to Ningbo First Hospital with newly diagnosed PE between January 2014 and March 2021. A flow diagram of patient selection is presented in Fig. 1. We retrospectively reviewed all consecutive patients recorded with the keywords ‘PE (J94.804 and J90. × 00)’ and ‘tuberculous pleurisy (A16.500)’ in the clinical electronic record system of Ningbo First Hospital. All patients were first admitted to our hospital because of pleural effusion. All PE samples and concomitant blood samples were tested for cell counts and biochemical parameters. Data from the first PE and blood samples obtained from each patient were used for analysis. The related demographic, laboratory, and clinical information for each patient was extracted from the clinical electronic record system. Finally, a total of 909 patients with BPE were enrolled in this study. Patients were randomly divided into the training set (n = 651) and the internal validation set (n = 258) at a 7:3 ratio. A cohort of 110 patients with PE at the Affiliated People Hospital of Ningbo University from August 2020 to November 2021 was used as the external validation set. Among the 909 patients, BPE was caused by tuberculous pleurisy (TBP) in 414 patients and by other conditions, including parapneumonic effusion (PPE), chronic heart failure (CHF), empyema, and parasitic infection, in 495 patients. Patients who met all of the following criteria were included: (i) PE was diagnosed by ultrasonography, chest CT, or X-ray; (ii) PE was evaluated by cytology, thoracentesis, or pleural biopsy, with follow-up of at least 6 months. The exclusion criteria were as follows: (i) patients diagnosed with MPE; (ii) age < 18 years old; (iii) pregnant women; (iv) patients with incomplete clinical data; (v) unknown etiology of PE.
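The random division into training and internal validation sets can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `split_cohort`, the fixed seed, and the use of integer patient identifiers are assumptions made for illustration, and the set sizes are simply the counts reported above.

```python
import random


def split_cohort(patient_ids, n_train, seed=0):
    """Randomly split patient IDs into a training list and a validation list."""
    ids = list(patient_ids)
    if not 0 <= n_train <= len(ids):
        raise ValueError("n_train must be between 0 and the cohort size")
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Hypothetical integer IDs standing in for the 909 enrolled patients.
train_ids, validation_ids = split_cohort(range(909), n_train=651)
```

Fixing the seed is a design choice that lets the same split be regenerated when the analysis is rerun.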

Fig. 1

The flowchart of patient selection. A Ningbo First Hospital set. B The Affiliated People Hospital of Ningbo University set. MPE malignant pleural effusion, PE pleural effusion, BPE benign pleural effusion, TB tuberculosis

The primary aim of the present study was to develop a scoring system with high predictive accuracy to differentiate TPE from non-TPE. The training set, comprising 70% of the patients with PE from Ningbo First Hospital, was used to develop a novel scoring system based on a nomogram to distinguish patients with TPE from patients with non-TPE. The internal validation set, comprising the remaining 30% of the patients with PE from Ningbo First Hospital, was used to validate the diagnostic performance of the scoring system. The external validation set, comprising 110 patients with PE from the Affiliated People Hospital of Ningbo University who were independent of the patients from Ningbo First Hospital, was used to further validate the predictive model.

This study was approved by the Ethics Committees of Ningbo First Hospital and the Affiliated People Hospital of Ningbo University and was conducted in accordance with the Helsinki Declaration. The requirement for written informed consent was waived because of the retrospective nature of the study.

Diagnostic criteria for BPE and TPE

BPE was diagnosed based on the following criteria: (a) no tumor cells were found in the PE; (b) the PE had a known etiology, such as TPE or parapneumonic PE, and resolved after optimal treatment; (c) no signs of malignant disease developed during follow-up. Patients with TPE who were first diagnosed and treated in our hospital were included in our study. TPE was diagnosed based on any of the following criteria: (a) culture of the pleural effusion or pleural tissue was positive for M. tuberculosis; (b) granulomatous inflammation was present in the pleural biopsy on histologic examination and M. tuberculosis was isolated from other sites; or (c) granulomatous inflammation was present in the pleural biopsy on histologic examination and there was a clinical response to anti-TB treatment [12–14].

Data collection

The following clinical and laboratory data were acquired from the clinical electronic record system: age, gender, smoking history, effusion routine [effusion white blood cell (WBC), neutrophil count, and lymphocyte count], effusion biochemical indexes [total protein, glucose, ADA, and lactate dehydrogenase (LDH)], blood routine (WBC, neutrophil count, and lymphocyte count), blood indexes [high-sensitivity C-reactive protein (hsCRP), erythrocyte sedimentation rate (ESR), ADA, and LDH], and carbohydrate antigen 125 (CA125) and carbohydrate antigen 19–9 (CA19-9) in PE and serum.

Statistical analysis

Continuous variables were presented as median and interquartile range (IQR, 25th–75th), and were compared using either a t-test or the Mann–Whitney U test, as appropriate. Categorical variables were presented as number and percentage (n, %), and were compared using the Chi-square (χ2) test or Fisher’s exact test. Univariate logistic regression analysis was used to screen candidate factors in the training set, and all variables meeting the selection threshold [area under the curve (AUC) > 0.6] were entered into the multivariate logistic analysis. Then, stepwise selection using the Akaike information criterion (AIC) in the multivariable logistic regression models determined the statistically significant variables. Odds ratios (ORs) were estimated and presented with 95% confidence intervals (CI). Selected variables were incorporated into the nomogram to construct the scoring system using the rms package of R. Calibration curves and decision curve analysis (DCA) were also performed. Receiver operating characteristic (ROC) curves and the corresponding AUCs were calculated to determine the discrimination capacity of the models in distinguishing TPE from non-TB BPE. In addition, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were calculated to assess the diagnostic accuracy of the nomogram in the training set and validation sets. All statistical analyses were performed using R (packages rms, MASS, OptimalCutpoints, pROC, and rmda; version 4.0.5; http://www.r-project.org) and SPSS 22.0 (SPSS Inc., Chicago, IL USA). Two-sided P < 0.05 was considered significant.
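For readers less familiar with the accuracy measures listed above, the following sketch shows the standard textbook definitions computed from a 2 × 2 table, together with the AUC expressed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The study itself used R packages; this Python version, the function names, and the example counts are illustrative assumptions and do not reproduce any study data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2 x 2 table.

    Assumes each denominator (tp + fn, tn + fp, tp + fp, tn + fn) is non-zero.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # PLR = sensitivity / (1 - specificity); infinite when specificity is 1
        "plr": sens / (1 - spec) if spec < 1 else float("inf"),
        # NLR = (1 - sensitivity) / specificity; infinite when specificity is 0
        "nlr": (1 - sens) / spec if spec > 0 else float("inf"),
    }


def auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up counts for illustration only.
example = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
```

The pairwise AUC is quadratic in sample size, which is acceptable for a sketch; dedicated packages such as pROC use faster rank-based computations.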

Results

Baseline characteristics

A total of 909 patients with PE from Ningbo First Hospital were included in the present study and were randomly divided into the training set (n = 651) and the internal validation set (n = 258). In addition, 110 patients from the Affiliated People Hospital of Ningbo University were included in the external validation set. The demographic, clinical, and laboratory characteristics of the patients in the three groups are presented in Table 1.

Table 1 The clinical characteristics of the training set, internal validation set, and external validation set

Univariate and multivariate logistic regression analyses in patients with TPE and non-TB BPE

Additional file 1: Table S1 compares the demographic, clinical, and laboratory variables between TPE and non-TB BPE in the training set. The cutoff values of those variables were calculated using the Youden index. As shown in Additional file 2: Table S2, most of the included variables differed significantly between patients with TPE and those with non-TB BPE. The results of the univariate logistic analysis are also shown in Additional file 2: Table S2. All 24 variables showed statistical significance. To establish an accurate prediction model, the 16 variables with an AUC > 0.6 were entered into the multivariate regression analysis. Stepwise selection using the AIC in the regression model identified the six most valuable variables for distinguishing TPE from non-TB BPE. Table 2 summarizes the results of the multivariate logistic regression analysis. The results were as follows: age (OR (95%CI), 0.419 (0.232–0.755)), effusion lymphocyte (OR (95%CI), 3.229 (1.824–5.715)), effusion ADA (OR (95%CI), 7.258 (3.745–14.066)), effusion LDH (OR (95%CI), 6.626 (2.894–15.172)), effusion LDH/ADA (OR (95%CI), 0.189 (0.097–0.370)), and serum WBC (OR (95%CI), 0.331 (0.173–0.634)) (Table 2).
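As an illustration of how a cutoff can be chosen with the Youden index (J = sensitivity + specificity − 1), the sketch below scans every observed value of a single variable and keeps the threshold that maximizes J. It assumes that higher values favour TPE and that a value at or above the cutoff is called positive; the function name `youden_cutoff` and the toy data are hypothetical and are not taken from the study.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J for one continuous variable.

    labels: 1 = TPE, 0 = non-TB BPE. Assumes both classes are present and that
    higher values favour TPE, so a value >= cutoff is classified as positive.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # the lowest cutoff wins when several reach the same J
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data for illustration only.
cutoff, j_value = youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

For a variable where lower values favour TPE (an OR below 1 in Table 2), the same function can be applied to the negated values.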

Table 2 Multivariate logistic regression analysis of the clinical characteristics in the training set

Development and validation of the nomogram prediction model

A nomogram based on the above six variables was developed and is presented in Fig. 2A. The calibration curve of the nomogram showed that the predicted line overlapped well with the reference line, indicating good performance of the diagnostic nomogram in the training set (Fig. 2B). In addition, DCA was applied to assess the net benefit of the diagnostic nomogram in order to verify the clinical utility of the model. The results showed that use of the nomogram would provide a greater net benefit than the “treat-all” or “treat-none” strategy when the threshold probability was > 0.4 (Fig. 2C).
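The net benefit plotted in a decision curve follows a standard formula: the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. The sketch below shows that formula for a model and for the “treat-all” strategy (the net benefit of “treat-none” is zero by definition). The function names and example counts are illustrative assumptions, not values from the study, which used the rmda package in R.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability.

    tp, fp: true and false positives when patients with predicted
    probability >= threshold are treated; n: total number of patients.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1 - threshold)  # odds of the threshold probability
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of treating everyone: all cases are TP, all non-cases are FP."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up counts for illustration only.
model_nb = net_benefit(tp=40, fp=10, n=100, threshold=0.5)
treat_all_nb = net_benefit_treat_all(prevalence=0.45, threshold=0.5)
```

A model is preferred at a given threshold when its net benefit exceeds both the treat-all value and zero.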

Fig. 2

Development of the diagnostic nomogram. A Diagnostic nomogram for distinguishing TPE from non-TB BPE in the training set. B Calibration curve of the nomogram. C Decision curve analysis of the nomogram

Diagnostic performance of the scoring system in the training set and validation sets

In the training set, effusion LDH/ADA had the largest impact on the discrimination of TPE from non-TB BPE in the model, with 10 points (Fig. 2A). The other five variables were then converted to integer points: age (5 points), effusion lymphocyte (5 points), effusion ADA (8 points), effusion LDH (7 points), and serum WBC (6 points) (Table 3). The optimal cutoff value for the total score was calculated using ROC analysis. At a cutoff value of 23 points, the scoring system showed good discriminative performance in distinguishing TPE from non-TB BPE, with an AUC of 0.937 (95%CI, 0.917–0.957, Fig. 3A and Table 4). The corresponding specificity, sensitivity, PLR, NLR, PPV, and NPV values were 89.0%, 89.5%, 8.5, 0.12, 87.2%, and 91.2%, respectively (Table 4).
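The way such a point-based score is applied can be sketched as follows. The point values and the 23-point cutoff are those reported above. Everything else is an assumption made for illustration: the dictionary keys, the function names, the representation of each variable as a boolean flag meaning "this patient falls in the category that earns the points" (the categories themselves are defined in Table 3), and the choice to treat a total equal to the cutoff as positive.

```python
# Points per variable as reported in the text; key names are hypothetical.
POINTS = {
    "effusion_ldh_ada_ratio": 10,
    "effusion_ada": 8,
    "effusion_ldh": 7,
    "serum_wbc": 6,
    "age": 5,
    "effusion_lymphocyte": 5,
}
CUTOFF = 23  # reported optimal cutoff for the total score


def total_score(flags):
    """Sum the points of every variable whose flag is True.

    flags maps each key of POINTS to True when the patient falls in the
    point-earning category of that variable (assumed, see Table 3).
    """
    return sum(points for name, points in POINTS.items() if flags.get(name, False))


def suggests_tpe(flags, cutoff=CUTOFF):
    """Assumption: a total at or above the cutoff is read as favouring TPE."""
    return total_score(flags) >= cutoff


# Hypothetical patient earning points for the LDH/ADA ratio, ADA, and age.
patient = {"effusion_ldh_ada_ratio": True, "effusion_ada": True, "age": True}
```

Keeping the points in a single table makes it easy to audit the score against the published values.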

Table 3 Diagnostic nomogram score calculation for the training set
Fig. 3

Discrimination and calibration of the scoring system for distinguishing TPE from non-TB BPE. A–C ROC curves of the scoring system in the training set, internal validation set, and external validation set. D–F Calibration curves of the scoring system in the training set, internal validation set, and external validation set

Table 4 Diagnostic performance of the scoring system based on nomogram in differentiating TPE from non-TB BPE in the training set and validation sets

The scoring system also exhibited good discriminative performance in distinguishing TPE from non-TB BPE in the internal validation set and external validation set, with AUCs of 0.934 (95%CI, 0.902–0.966, Fig. 3B and Table 4) and 0.941 (95%CI, 0.891–0.991, Fig. 3C and Table 4), respectively. The specificity, sensitivity, PLR, NLR, PPV, and NPV values in the internal validation set were 88.7%, 90.3%, 9.1, 0.13, 89.4%, and 89.6%, respectively (Table 4). The specificity, sensitivity, PLR, NLR, PPV, and NPV values in the external validation set were 93.6%, 87.5%, 7.5, 0.07, 90.6%, and 91.3%, respectively (Table 4). Furthermore, the calibration curves of the scoring system showed good agreement in all three datasets (Fig. 3D–F).

Discussion

Early diagnosis and prompt therapy for patients with TPE are critical to prevent severe complications (pleural thickening, empyema, calcification, etc.) and mortality. Despite the availability of various diagnostic methods, the early differential diagnosis of TPE from MPE and other non-TB BPE remains challenging in clinical practice. In addition, the paucibacillary nature of the disease, inappropriate and inadequate test samples, ineffective conventional microbiological techniques, and the lack of thoracoscopy equipment all contribute to the difficulty of diagnosing TPE.

The presence of M. tuberculosis on culture or pleural pathology showing caseating granuloma is the gold standard for diagnosing TPE; however, these tests are time-consuming and have a low positive rate [8, 11]. The tuberculin skin test (TST) and interferon-gamma release assays (IGRAs) are two common methods for diagnosing TPE, but inaccuracy, inconsistent sensitivity, and time to diagnosis have limited their efficacy [11, 15, 16]. In these circumstances, thoracoscopy appears to provide higher sensitivity (93–100%) and accuracy for diagnosing TPE; however, it is an invasive and expensive diagnostic method with a reported complication rate of 2–6% [8, 17, 18]. Common complications include bleeding, fever, empyema, pneumonia, and prolonged air leak [18]. In addition, some patients with progressive underlying disease and elderly patients cannot tolerate the examination.

In recent years, Xpert MTB/RIF (Xpert) and the next-generation Xpert MTB/RIF Ultra (Xpert Ultra), two nucleic acid detection methods endorsed by the World Health Organization (WHO), have been increasingly used to diagnose pulmonary TB, rifampicin (RIF) resistance, and extra-pulmonary TB in various types of clinical specimens [19, 20]. However, a meta-analysis indicated that the pooled sensitivity of Xpert in diagnosing TPE was only 51.4% [21]. This low sensitivity, which might be attributed to the number of mycobacteria and the performance of amplification techniques, has compromised its diagnostic capacity for TPE. Therefore, an effective and noninvasive diagnostic method is urgently needed for the diagnosis and management of TPE.

A nomogram is a graphical representation of a complex mathematical formula and is widely used in medicine to estimate diagnosis and prognosis for a variety of diseases by integrating clinical, biologic, and/or genetic variables [22]. Previously, we and other investigators reported the application of nomograms in differentiating MPE from BPE [23, 24]. In the present study, we developed a scoring system based on a nomogram to distinguish TPE from non-TB BPE. We initially integrated 26 variables, including not only primary clinical and laboratory variables but also calculated ratios. We selected the six most significant variables (age, effusion lymphocyte, effusion ADA, effusion LDH, effusion LDH/ADA, and serum WBC) by multivariate regression analysis to construct a predictive model. Our model showed good diagnostic performance in distinguishing TPE from non-TB BPE in the derivation and validation sets. The six integrated indexes are inexpensive, routinely tested, and readily available in most hospitals; therefore, our model is convenient to apply in clinical practice.

Effusion ADA has long been used to diagnose TPE in numerous studies [11, 15]. Michot et al. indicated that effusion ADA at an optimal value of 41.5 U/L might be a useful biomarker to differentiate TPE from non-TPE, with a sensitivity of 97.1% and a specificity of 92.9% [25]. A study conducted by Garcia-Zamalloa et al. showed a similar cutoff value of effusion ADA of 40 U/L [26]. However, a recent study from China showed that the best cutoff value of effusion ADA for TBP was 27 U/L, with a sensitivity of 81% and a specificity of 78% [27]. A similar cutoff value of effusion ADA was also found in our study (22.75 U/L). Therefore, the optimal cutoff value remains controversial because of differences in disease prevalence, sample sizes, test methods, and HIV co-infection [11]. In addition, a similar or even higher level of effusion ADA has been reported in PPE, especially in patients with empyema [28, 29]. Effusion LDH has been recommended to assist in the classification of patients with complicated parapneumonic effusion (CPPE) [30]. However, effusion LDH is elevated in TPE, PPE, and MPE, and its low sensitivity and specificity in differentiating TPE from PPE limit its utility in clinical practice [30].

The effusion LDH/ADA ratio has also been assessed for differentiating TPE from PPE. Wang et al. indicated that the effusion LDH/ADA ratio might be a useful biomarker for diagnosing TPE at a cutoff level of 16.20, with a sensitivity of 93.62% and a specificity of 93.06% [31]. Another study from New Zealand also showed that the effusion LDH/ADA ratio at a cutoff value of 15 had high sensitivity and specificity in distinguishing TPE from non-TB effusion [32]. Similarly, our study showed a cutoff value of 17.07 for effusion LDH/ADA. Further prospective investigations are needed to validate these results.

To our knowledge, this is the first study to evaluate a scoring system based on a nomogram for distinguishing TPE from non-TB BPE. The scoring system appeared reliable and accurate in distinguishing TPE from non-TB BPE, as assessed by sensitivity, specificity, PLR, NLR, PPV, and NPV in the training and validation sets. Our study incorporated the most common and valuable indexes into the predictive model to differentiate TPE from non-TB BPE, and the model performed better than any single variable alone. The six variables are easily accessible, inexpensive, and routinely tested in most hospitals. Therefore, our diagnostic model for differentiating TPE from non-TB BPE could be easily used in clinical practice in most hospitals, especially in primary hospitals.

Our study had some limitations. First, the present study had a retrospective design, and only routine biomarkers in serum and PE were included. Several newer potential biomarkers, such as interleukin 27 (IL-27) and tumor necrosis factor-α (TNF-α), might provide better diagnostic accuracy. Second, the external validation was performed at a single center with a small sample size. Third, our nomogram did not incorporate imaging data into the scoring system, which might be useful. In addition, because the data were unavailable, we did not compare the diagnostic accuracy of our scoring system with that of other diagnostic tests, such as IGRAs and Xpert Ultra. Finally, this study was conducted in Chinese patients. Since the incidence of TB differs from country to country, the results of this study might not be applicable to patients in other countries. Further multicentric and prospective investigations with comprehensive data are needed to validate our results.

Conclusions

Taken together, the present study developed a novel scoring system based on a nomogram with six clinical and laboratory variables to aid the differential diagnosis of TPE and non-TB BPE. Our novel scoring system showed good diagnostic performance and calibration in distinguishing TPE from non-TB BPE in the training set and the validation sets. Further multicentric and prospective investigations are needed to validate this accessible and non-invasive nomogram.