Background

Since their introduction in routine clinical practice in the 1920s, chest radiographs have been used as a primary tool to diagnose and manage pulmonary tuberculosis (PTB) [1,2,3]. To date, despite their limitations and the availability of computed tomography, they remain the most commonly used tool in PTB diagnosis and management worldwide [4,5,6]. The chest x-ray (CXR) has been used not only as a diagnostic tool, but also to estimate disease severity in multiple TB studies and clinical trials [7,8,9].

There are several methods of grading the radiological severity of disease by estimating the extent of lung field that is ‘abnormal’, including the WHO grading system [10] or the US National Tuberculosis and Respiratory Disease Association classification [11]. In 2010, Ralph et al. [12] created a simple validated scoring system (using a score out of 140) from findings that the proportion of lung fields affected by disease at diagnosis of PTB was associated with a greater acid fast bacilli (AFB) smear grade and that the presence of cavitation (but not number or size of cavitation), along with the percentage of lung field affected on CXR, predicted 2-month smear positivity on treatment. This has gained some currency in studies describing radiological severity [13,14,15] and in a subsequent study that validated this approach [16].

The relationship between radiological appearance and disease severity has been assessed by comparison with measures of bacterial load such as smear microscopy and culture [17,18,19,20,21]. At diagnosis, the presence of cavitation visible on CXR has been associated with a higher sputum AFB smear grade [12, 17, 22]. The time taken for specimens in automated liquid culture to signal positive is inversely related to bacterial load [13, 23, 24]. Using this concept, a study of 95 images showed that the presence of cavitation on CXR was associated with a shorter time to positivity (TTP) [21]. More recently, some studies have shown that patients with cavitation have a higher bacterial load as judged by TTP in liquid culture [23, 24]. Another study of 244 patients with radiographic assessment of cavitation found that the colony forming units per milliliter were significantly higher in those with cavitation; this was also true using TTP as a marker for bacterial load [22].

In his seminal review of post mortem examinations of patients with TB, Canetti [25] described a difference in the number of bacteria in lung tissue of samples with cavities compared to those with caseous necrotic tissue only and areas of alveolitis. He found that tubercle bacilli were abundant in the inner layer of a cavity, abundant but less so in a solid area of caseous tissue, and rare within areas of inflammatory tissue. As Canetti described bacilli-rich areas as well as areas of inflammatory change, one would expect the host’s inflammatory response, in addition to the bacterial load, to affect the findings on the CXR prior to treatment. To investigate which host factors affect radiological severity, we examine a number of patient factors that may affect this host response and that have been associated with radiological findings in other studies, such as HIV status [26], diabetes mellitus status [27, 28], age [29], ethnicity [30, 31], and gender [32]. Hypoalbuminemia at diagnosis of PTB and low body mass index (BMI) are surrogates of disease severity known to lower survival rates [33, 34]. TB symptoms at diagnosis have been associated with worse burden of disease [35].

Although much weight is placed on the extent of radiological findings, little is known about what this extent reflects. We use the REMoxTB database [36] of patients from Africa and Asia with PTB to determine whether radiological extent of disease judged by the CXR severity score correlates with M. tuberculosis bacterial load as measured by Mycobacteria-Growth-Indicator-Tube (MGIT) TTP.

Methods

Study sites and patients

Data were collected from the REMoxTB clinical trial, which compared two 4-month moxifloxacin-containing regimens with the standard 6-month first-line treatment for PTB [36]. Between 2007 and 2012, 1931 patients were enrolled from 51 sites across 8 countries in Africa and Asia. The protocol mandated pre-treatment postero-anterior CXRs, sputum sampling for AFB smear and culture, and routine blood tests (including liver function tests, albumin levels, and HIV testing). Patients were excluded from the trial if they had severe medical comorbidities or were already taking antiretroviral treatment for HIV prior to study enrollment. In this study, all patients were adults aged 18 years or more who had smear- and culture-positive PTB confirmed by molecular speciation.

CXR scoring

The CXR images were taken at the clinical sites by a radiographer and either uploaded as a digital image (DICOM file) or presented to the clinical site staff as a plain film. Plain films were digitized by digital photography using a standard protocol to ensure that images were of adequate quality. An early assessment of ‘readability’ was performed and, where films were judged poor, sites were asked to re-take the images. All images were converted into DICOM files for evaluation.

The digital images were read independently by two clinicians (SHG and SEM) using the OsiriX medical imaging software on Apple iMac computers with screens of at least 1920 × 1080 pixels. Readers were encouraged to take regular breaks during the reading process. Images were sent to readers by study site and were read in the same order.

Both readers followed standardized criteria to establish whether an image was of sufficient quality for analysis (Table 1). If deemed satisfactory, the image was assessed for the presence of cavitation and for the percentage of abnormal lung field. In the case of discrepant results on readability or presence of cavitation, a third reader (FC), blinded to the primary assessment, reviewed the film. Only images that the first two readers agreed were readable, or that the third reader deemed readable, were used in the final analysis. The final result for cavity presence was based on agreement between the primary readers or, if discrepant, the majority result including the third reading. The percentage of lung field affected was calculated using the method described by Ralph et al. [12], where the reader divides the lung fields into quadrants and, by observation, scores each quadrant by its percentage of abnormal opacification. The scores are then added together and divided by four to produce a total percentage of lung field affected by disease.
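The scoring and adjudication rules above can be illustrated with a minimal sketch. The function names and the example values are hypothetical and are not taken from the trial; the sketch only assumes that each quadrant is scored from 0 to 100 and that a cavity reading is recorded as present or absent.

```python
def percent_lung_affected(quadrant_scores):
    """Average four per-quadrant percentages (0-100) into one whole-lung percentage."""
    scores = list(quadrant_scores)
    if len(scores) != 4:
        raise ValueError("expected exactly four quadrant scores")
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("each quadrant score must lie between 0 and 100")
    return sum(scores) / 4


def final_cavity_call(reader1, reader2, reader3=None):
    """Return the adjudicated cavity result (True = cavitation present)."""
    if reader1 == reader2:
        return reader1  # primary readers agree; no third read is needed
    if reader3 is None:
        raise ValueError("discrepant primary reads need a third reading")
    # With two discordant primary reads, the majority of three readings is
    # whichever result the third reader gives.
    return reader3


# Illustrative values only, not trial data.
print(percent_lung_affected([40, 10, 20, 0]))         # 17.5
print(final_cavity_call(True, False, reader3=True))   # True
```

Because the two primary readings are binary, a discrepant pair is always resolved by the third reading, which is why the majority rule reduces to returning the third reader's result.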

Table 1 Inclusion and exclusion criteria for deeming an image of sufficient quality for reading

Microbiological and clinical data

Sputum samples and demographic data were collected as part of the clinical trial protocol at screening and baseline visits, prior to starting treatment. Sputum samples were either early morning samples or spot samples, none of which were induced. The samples were processed by standard methodology and graded as described in the trial report. Samples that were re-treated due to contamination were not included in the analysis as this process altered the calculated TTP and, thus, could not guarantee an accurate quantification result. As part of the pre-treatment assessment, participants were tested for HIV and were asked about a history of diabetes mellitus. In addition, a series of questions about symptoms were asked and symptoms graded by severity using the modified Division of AIDS system [37] (Table 2).

Table 2 Division of AIDS (DAIDS) grading of adverse event (AE) severity (modified version). This describes the grading system referred to in this study to describe the severity of TB symptoms such as cough, night sweats, weight loss, and hemoptysis

Statistical analysis

Inter-reader variability was presented on a Bland–Altman plot using the final severity scores from readers 1 and 2. The average of the two readers’ estimates of the percentage of lung field affected was used together with the final result of the cavity assessment. Images where readers disagreed by 1.96 standard deviations or more were not included in the analysis to ensure accuracy in the average percentage value. The presence or absence of cavitation was plotted against TTP, and a Wilcoxon rank sum test was used to compare TTP between the two groups. The average percentage area of lung field affected was compared to log10TTP using linear regression and plotted on a scatterplot.
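As a rough illustration of the Bland–Altman approach, the sketch below computes the mean difference between two readers, the limits of agreement, and the images falling on or outside those limits. It assumes that "disagreed by 1.96 standard deviations or more" means a difference at or beyond the mean difference ± 1.96 SD, and that the sample standard deviation is used; both are assumptions, and the scores are invented for illustration.

```python
from statistics import mean, stdev


def bland_altman(reader1, reader2, z=1.96):
    """Return (bias, sd, (lower, upper), indices on or outside the limits)."""
    if len(reader1) != len(reader2) or len(reader1) < 2:
        raise ValueError("need two equally long score lists with at least 2 images")
    diffs = [a - b for a, b in zip(reader1, reader2)]
    bias = mean(diffs)          # mean difference between readers
    sd = stdev(diffs)           # sample SD of the differences (assumption)
    lower, upper = bias - z * sd, bias + z * sd
    # Assumed exclusion rule: difference at or beyond a limit of agreement.
    outside = [i for i, d in enumerate(diffs) if d <= lower or d >= upper]
    return bias, sd, (lower, upper), outside


# Invented percentage scores for eight images, not trial data.
r1 = [20, 30, 10, 50, 40, 25, 15, 35]
r2 = [18, 28, 12, 45, 38, 60, 14, 30]
bias, sd, limits, outside = bland_altman(r1, r2)
print(outside)  # [5]
```

In this invented example only the sixth image (index 5), where the readers differ by 35 percentage points, would be excluded before averaging the two readers' scores.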

Baseline clinical and biochemical findings (age, sex, ethnicity, BMI, serum albumin, number of grade 3 or 4 TB symptoms, HIV status and type II diabetes status) and radiological severity score were included in a univariable regression analysis. Those found to be significant (p < 0.05) were used to create a multivariable regression model to determine the relationship of these characteristics with the radiological severity score. For this process, the participants were put into two groups: those with cavitation and those without. Wilcoxon rank sum tests and χ2 tests were used to compare the two groups. All statistical analysis was performed using R statistical software [38].

Ethical approval

This study was performed within the scope of the approvals provided for the REMoxTB clinical trial [36].

Results

Out of 1931 patients randomized for the trial, 1837 had CXRs taken within the required protocol time frame. Following the three-reader quality assessment, 1713 images were deemed readable. Taking into account available data required for analysis, including non-retreated culture results with TTP data, the total number of cases was 1354 from 47 study sites (Fig. 1). The baseline characteristics and findings for the 1354 cases with available matching data are shown in Table 3 and the breakdown of participants by site in Table 4. A comparison of the characteristics of the included and excluded cohorts is also shown to assess whether sampling bias was an issue (Table 5).

Fig. 1 Flow diagram showing breakdown of final cohort for analysis

Table 3 Baseline characteristics of final 1354 subjects
Table 4 The 47 sites across 8 countries where the participants (1354) were recruited
Table 5 A comparison of the included and excluded cohorts. P values are from χ2 tests and Wilcoxon rank sum tests

Reader agreement

There was agreement for 1394 (76%) of the 1837 images available for either their readable quality or the presence or absence of cavitation. Agreement between the two readers on cavitation presence was 0.495 by Cohen’s Kappa score (95% CI 0.45–0.54, p < 0.001), where a value of < 0.4 is poor, 0.4–0.75 is fair to good, and > 0.75 to 1 is excellent [39]. The level of agreement when assessing the percentage of the area of lung field affected was illustrated using a Bland–Altman plot (Fig. 2).
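For readers unfamiliar with Cohen’s Kappa, the following sketch shows how it is computed from a 2 × 2 table of two readers’ cavity calls and how a value maps onto the interpretation bands quoted above. The counts are hypothetical and the function names are illustrative; this is not the code used in the study.

```python
def cohens_kappa(both_yes, r1_yes_r2_no, r1_no_r2_yes, both_no):
    """Cohen's Kappa for two readers making a yes/no call on the same images."""
    n = both_yes + r1_yes_r2_no + r1_no_r2_yes + both_no
    if n == 0:
        raise ValueError("no images supplied")
    observed = (both_yes + both_no) / n            # observed agreement
    r1_yes = (both_yes + r1_yes_r2_no) / n         # reader 1 'yes' rate
    r2_yes = (both_yes + r1_no_r2_yes) / n         # reader 2 'yes' rate
    expected = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)  # chance agreement
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Bands as quoted in the text: <0.4 poor, 0.4-0.75 fair to good, >0.75 excellent."""
    if kappa < 0.4:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "excellent"


# Hypothetical counts for 100 images, not trial data.
print(round(cohens_kappa(40, 10, 5, 45), 3))  # 0.7
print(interpret(0.495))                       # fair to good
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why it can be considerably lower than the percentage of images on which readers agree.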

Fig. 2 Bland–Altman plot demonstrating the level of agreement between readers 1 and 2 in scoring the 1713 images for radiological severity (x axis: the mean score of readers 1 and 2 for each image; y axis: the difference in scores between readers 1 and 2 for each image). Horizontal lines show the mean difference and the mean ± 1.96 standard deviations: 3.34 (−16.44 to 23.11; SD = 10.10)

Cavity presence and bacterial load

The number of images confirmed to have cavitation visible was 1049 (77.5%) of 1354. The median TTP for MGIT samples from all 1354 patients was 117 h (4.88 days) with an interquartile range of 89 h (3.7 days) to 153 h (6.4 days). Figure 3 shows a boxplot of the distribution of TTP in those with and without cavitation on CXR at baseline. The median TTP was 26 h longer in patients without cavitation than in those with cavitation (95% CI 16–30, p < 0.001, Wilcoxon rank sum test).

Fig. 3 Boxplot of TTP distribution comparing subjects without and those with cavitation present on CXR. Thick black horizontal lines represent the median values, and the horizontal edges of the boxes mark the interquartile range. The whiskers show the overall range outside the boxes, with extreme outliers plotted above

Extent of radiological disease and bacterial load

The median percentage of lung fields affected on the chest radiographs was 18.1% (interquartile range 11.3–27.5%). Figure 4 shows a scatterplot of the percentage of lung field affected against the sputum culture log10TTP values for the 1354 patients. By linear regression, for every 10-fold increase in TTP the area affected decreased by 11.4% (95% CI 7.9–14.9%, p < 0.001).
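Because the regression is on log10TTP, the reported slope applies to a 10-fold change in TTP, which is far larger than the spread seen in practice. The short sketch below, which assumes a simple linear model of area against log10TTP with the reported slope of −11.4, converts that slope into the expected change for smaller fold changes. The function name is illustrative.

```python
import math

SLOPE_PER_LOG10_TTP = -11.4  # reported change in % area per 10-fold rise in TTP


def change_in_area(fold_change_in_ttp, slope=SLOPE_PER_LOG10_TTP):
    """Predicted change in % lung field affected for a given fold change in TTP."""
    if fold_change_in_ttp <= 0:
        raise ValueError("fold change must be positive")
    return slope * math.log10(fold_change_in_ttp)


print(round(change_in_area(10), 1))        # -11.4 (10-fold increase)
print(round(change_in_area(2), 2))         # -3.43 (doubling of TTP)
print(round(change_in_area(153 / 89), 1))  # -2.7 (across the reported IQR of TTP)
```

On these assumptions, moving from the lower to the upper quartile of TTP (89 h to 153 h) corresponds to a difference of under 3 percentage points in the area affected.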

Fig. 4 Scatterplot showing the log10TTP (hours) from baseline sputum cultures against the percentage of lung field affected on the CXRs

Multivariable regression model: pre-treatment factors and baseline radiological severity

The percentage of lung fields affected was compared to other parameters in two groups: those with cavitation and those without. Characteristics of these two groups are shown in Table 6. The groups differed significantly in albumin levels (lower in the cavity group), ethnicity (a higher proportion of African participants had cavitation and a higher proportion of Asian participants did not), TTP (shorter in the cavity group), and percentage of lung field affected (greater in the cavity group). In the group with cavitatory disease, HIV status, diabetes status, culture TTP (log10TTP), serum albumin, number of grade 3 or 4 TB symptoms, BMI, age, and ethnicity had a statistically significant effect on the radiological severity of chest images by univariable analysis (Table 7). In the non-cavitatory disease group, only ethnicity, serum albumin, and number of grade 3 symptoms were statistically significant on univariable analysis. When these significant variables were entered into a multivariable regression model (Table 8), the factors with a significant effect on the area of lung field affected in patients with cavitatory disease were BMI, serum albumin, and log10TTP. In those without cavitatory disease, the significant factors were the number of grade 3 and 4 symptoms and serum albumin.

Table 6 Characteristics of those with and without cavitation used in the analysis comparing other baseline factors and radiological severity on CXR at diagnosis
Table 7 Results of univariable analysis. The β-coefficient represents the change in percentage area of lung field affected for every 1 unit increase in variable. For log10TTP, this is the change in percentage area affected for every 10-fold increase in TTP
Table 8 Multivariable regression analysis using variables found significant in univariable analysis. The β-coefficient represents the change in percentage area affected for every 1 unit rise in albumin, BMI, number of grade 3/4 symptoms, and age. For log10TTP, this represents the change in percentage area affected for a 10-fold increase in TTP. For ethnicity, HIV, and diabetes, this indicates the percentage difference in area affected between the two groups (for example, Asians had 0.67% less area affected on the CXR than the African cohort)

Discussion and conclusions

The REMoxTB study provided a unique opportunity to address important questions about the role of radiology in the diagnosis and evaluation of severity of TB infection in a large group of smear- and culture-positive patients with PTB that spanned two continents. The subjectivity of CXR interpretation has been a longstanding concern in clinical practice and there have been multiple attempts to develop methods to standardize image reading in order to reduce reader variability [12, 40,41,42]. This study shows that agreement between readers in cavity assessment was moderate (Kappa score 0.495), comparable to other studies, in which Kappa agreement on cavity presence ranged from 0.24 to 0.7 [43,44,45,46,47,48]. The clustering of scores along the x-axis around the ‘0’ line of full agreement on the Bland–Altman plot (Fig. 2) indicates that the assessment of area of lung field affected is reproducible.

A high proportion of patients in this study had radiological evidence of cavitation (78%), compared with 72% in a study of 800 Turkish patients [20], 53.1% in a study of 893 USA-based patients [48], and 51% in a recent multicenter trial of 1692 patients in African sites [49]. Previous reports have suggested that the presence and number of cavities are related to bacterial load [12, 19,20,21,22, 50], but most of these studies were small, with an average of 138 patients (range 61–244). Using this large sample of patients, we were able to show a statistically significant reduction in TTP (our surrogate for bacterial load) in patients with cavities compared to those without, with a median reduction of 26 h (p < 0.001). The large number of patients in this study provides the statistical power to demonstrate this unequivocally. It would, however, be reasonable to assume that such a reduction in TTP is of modest clinical significance, given that the replication time of M. tuberculosis is approximately 14–24 h.
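To put the 26 h difference in context, the back-of-envelope sketch below expresses it as a number of bacterial generations, taking the 14–24 h figure quoted above as a generation time. The fold difference it prints additionally assumes constant exponential growth in liquid culture, which is a simplification introduced here for illustration and not a result of the study.

```python
def doublings(ttp_difference_h, doubling_time_h):
    """Number of generation times that fit into a TTP difference."""
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return ttp_difference_h / doubling_time_h


def fold_difference(ttp_difference_h, doubling_time_h):
    """Implied fold difference in starting load, ASSUMING pure exponential growth."""
    return 2 ** doublings(ttp_difference_h, doubling_time_h)


for doubling_time in (24, 14):  # slow and fast ends of the quoted range
    print(doubling_time,
          round(doublings(26, doubling_time), 2),
          round(fold_difference(26, doubling_time), 2))
# 24 1.08 2.12
# 14 1.86 3.62
```

Under these assumptions, a 26 h difference corresponds to roughly one to two generations, that is, about a two- to four-fold difference in starting load, which is small relative to the several orders of magnitude over which sputum bacterial load can vary.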

Comparing the two groups, those with cavities had lower albumin, shorter TTP, and a greater area affected, suggesting that patients with cavities also have other markers of ‘severe’ disease.

Proportionately more cavities were found in the African cohort than in the Asian cohort. This raises the question, addressed in previous studies [29,30,31], of whether ethnicity plays a role in cavity formation and the immune response to TB. A recent study suggests that the pattern of radiological presentation at diagnosis is associated with certain inflammatory profiles in patients [30]. Significant differences in cytokine response have been demonstrated between African and Eurasian patients, and these were related to ethnicity rather than to Mycobacterial strain type [30, 31]. This may be a contributing factor to the radiological severity of patients at presentation, as we also noted a small but significant difference in radiological score between patients of African origin and those of south and southeast Asian origin on univariable analysis, although this was lost in the multivariable analysis.

The study also shows a relationship between the overall area of lung field affected on the radiograph and bacterial load, although the association seen on the scatterplot (Fig. 4) is very shallow. A 10-fold increase in TTP was associated with a decrease of only 11% in the area affected, suggesting that patients with a higher bacterial load do have greater radiological severity but that the size of this effect is small.

Our study addresses the effect of variables such as ethnicity, initial bacterial load, nutritional status, HIV status, sex, age, symptom severity, and diabetic status on radiological severity by multivariable regression analysis. When weighted against each other in a model, bacterial load does not have a statistically significant effect on the degree of diseased lung field on CXR in the group with non-cavitatory disease and has only a modest effect in those with cavitation. This fits with the autopsy findings that Canetti described, where higher bacillary burden was found within cavities and their surrounding tissues but was much lower within the inflammatory, non-cavitating tissue alone [25].

The only variable found to be related to the severity of the CXR at diagnosis in both the cavity and non-cavity groups was the serum albumin level. Poor nutritional status of patients with TB, using pre-treatment albumin levels and BMI as surrogate markers of nutritional status [33, 34, 51], has been associated with poorer treatment outcomes and death. In our study, low serum albumin concentration at diagnosis (the lowest level being 15 g/L) contributed to radiological severity, but again to a modest degree, with a 0.65% and 0.48% decrease in area affected for every 1 g/L increase in serum albumin at baseline in the cavitatory and non-cavitatory groups, respectively (Fig. 5).

Fig. 5 Scatterplot of the serum albumin levels (x-axis) and the percentage of lung field affected on the CXR (y-axis) for all 1354 participants. A linear regression line shows a steady decrease in serum albumin as more area is affected by disease

Our analysis suggests that the appearance of the radiograph is likely to be multifactorial, reflecting host parameters such as ethnicity, age, and co-morbidities, as well as the bacterial load and degree of disease progression. The interaction of the factors affecting the inflammatory response of an individual to PTB infection is being explored in other research.

We included those with HIV and type II diabetes mellitus in our cohort and found no significant effect on the radiological severity. This may be because our HIV cohort was a select group with CD4 counts > 250 at PTB diagnosis and no preceding antiretroviral treatment, and because the clinical trial required participants with diabetes to have less severe disease. These groups may, therefore, not reflect the full spectrum of morbidity and its effects on CXR severity.

In summary, our study is the largest review of radiology in a well characterized patient group with smear- and culture-positive PTB and suggests that, although CXR is a valuable tool for diagnosis, its use for judging the bacterial burden of disease has limited value. This is not unexpected, as the radiological image appears to be a composite of the interaction of disease pathology caused by the organism, the severity of the immune response, and the nutritional status of the patient. The statistical power of this large study has enabled us to precisely measure the associations between CXR severity and other factors measured. The effect of serum albumin level on the radiological severity serves as an indicator that hypoalbuminemia is a marker of disease severity, as shown in other studies where it has been indicated to predict poor outcome in PTB [52]. The full value of CXR as a prognostic marker is yet to be seen and warrants further analysis. The associations between CXR severity and other factors conform with our expectation that patients with higher bacterial burden have more extensive disease. However, the effect is small and, in a multivariable model, it is outweighed by other patient factors in those without cavitation and is modest in those with cavitation. Clinicians should therefore be cautious about over-interpreting the cause of radiological disease extent at diagnosis.