Introduction

The differential diagnosis of pulmonary lesions can be challenging, particularly when it becomes necessary to distinguish infections from manifestations of the underlying condition in hematological patients [1]. Morphological findings are usually considered nonspecific, especially those of pulmonary lymphoma and invasive bronchopulmonary aspergillosis [2,3,4]. The identification of primary pulmonary lymphoma can be particularly challenging, as the incidence is very low, i.e., < 1% of all non-Hodgkin lymphomas and 0.5% of all primary pulmonary malignancies [5, 6]. In a retrospective study, 13 of 19 patients with pulmonary lymphoma manifestations were initially misdiagnosed as having pneumonia, lung cancer, or tuberculosis [7].

Histopathologic workup is usually required to establish the correct diagnosis, but transthoracic biopsy is often difficult and requires the patient to be able to cooperate [8]. In hematological patients, thrombocytopenia could further increase the risk of complications of invasive diagnostic procedures, and, if sedation or general anesthesia is required, patients are put at an additional risk [9]. Therefore, noninvasive tools to improve the differentiation of unclear pulmonary lesions are desirable. Due to its excellent soft tissue contrast, MRI is being investigated in this context, e.g., by using DWI [10,11,12] or DCE-MRI [13, 14]. A straightforward approach proposed by Nagel et al shows encouraging results for differentiating infectious and noninfectious pulmonary lesions using simple signal intensity quotients (referred to as “nonenhanced imaging characterization quotients,” NICQs) from 3-T MR images [15]. However, the best parameters did not exceed an AUC of 80%. Theoretically, the diagnostic performance of 3-T MRI may be further enhanced by using texture analysis [16]. For this purpose, freely available software exists, e.g., HeterogeneityCAD and PyRadiomics [17, 18].

The aim of the present study was to evaluate texture analysis of nonenhanced MR imaging at 3 T for differentiating fungal infiltrates and pulmonary lymphoma manifestations in hematological patients. The diagnostic performance was further compared with that of NICQs.

Materials and methods

Patients

This monocentric prospective study was approved by the local ethics committee (EA4/017/14). All patients were consecutively included and gave written informed consent. Data on NICQs in this patient collective have been published recently [19].

The main inclusion criteria were an underlying hematological disease and the presence of at least one solid pulmonary lesion in a current, clinically indicated chest X-ray or CT scan; patients with contraindications to MRI were excluded. Pulmonary lymphoma manifestations had to be histopathologically proven or show unequivocal response to antineoplastic treatment during follow-up. Fungal infections had to be at least “probable” according to the European Organization for Research and Treatment of Cancer/Invasive Fungal Infections Cooperative Group and the National Institute of Allergy and Infectious Diseases Mycoses Study Group (EORTC/MSG) Consensus Group [20]. The final diagnosis was based on all available clinical data and established by a senior consultant oncologist (S.S.). All patients were included and scanned between April 2014 and July 2018. Sixteen of the initial 51 patients were excluded: 2 patients whose poor general condition prevented completion of the examination, 2 patients because of poor image quality, 2 patients in whom the findings could not be reliably attributed to lymphoma or fungal infection, and 10 patients with neither fungal infection nor lymphoma manifestation (Fig. 1).

Fig. 1

Flow chart displaying the inclusion and exclusion of patients in this study

MRI technique

All MRI examinations were performed on a 3-T scanner (Magnetom Skyra, Siemens Healthineers). The patients were imaged in supine position using an MRI protocol derived from Biederer et al and Attenberger et al with a surface coil on the chest [21, 22].

Texture analysis was performed using imaging data acquired with a T2-weighted (T2w) single-shot fast spin-echo sequence (time of echo, 27 ms; time of repetition, 500 ms; refocusing flip angle, 160° after initial 90° excitation pulse; matrix size, 256 × 320; slice thickness, 5 mm) and a T1-weighted (T1w) gradient-recalled echo sequence (time of echo, 2.04 ms; time of repetition, 5.39 ms; flip angle, 9°; matrix size, 180 × 320; slice thickness, 3 mm), both in axial plane. A multi-breath-hold regimen was applied, aiming for a maximum breath-hold time of 8–10 s. An additional T2w single-shot fast spin-echo sequence in coronal plane was acquired for planning purposes, but not considered in the analysis. This protocol has been shown to be suitable for immunocompromised patients [23].

Image analysis

Image analysis was performed by two readers: a board-certified radiologist (S.N.) with more than 7 years of experience in cross-sectional imaging and, to evaluate interrater variability, by a radiology resident (D.K.) with more than 3 years of experience in MRI. For determination of intrarater variability, the less experienced reader repeated the image analysis in 9 cases of fungal infection and 9 cases of lymphoma manifestations.

ROIs were drawn in T1w and T2w using the freely available software “3D Slicer” (version 4.10) [24]. Every lesion was marked on every slice where it was clearly visible, resulting in a volume that was considered in the further analysis. Only solid parts were marked; i.e., blood vessels, bronchi, and the perilesional spaces were excluded. For illustration, examples of ROIs are given in Fig. 2. In case of several lesions, these were considered in the same order in T1w and T2w. The readers were blinded to any diagnostic or clinical data.

Fig. 2

Upper row: 27-year-old male patient with acute myeloid leukemia and focal Aspergillus infiltrate in the left upper lobe; a T2w overview, b T1w, and c T2w with zoom on the lesion. Note the surrounding halo in (a), which was ignored when drawing the ROI. Scale indicates 2 cm. Lower row: 58-year-old female patient with gastric lymphoma originating from mucosa-associated lymphatic tissue (MALT) and manifestation in the right lower lobe; e T2w overview, f T1w, and g T2w with zoom on the lesion. The ROI encloses only the solid part of the lesion; i.e., a small bronchus was spared in this case. Scale indicates 2 cm. (c) shows confluent areas of patchy hypointensities in a rather geographic distribution, while (g) shows small hypointense spots in a more repetitive pattern. (d) and (h) are concepts of a pixelwise representation of the structure of the two lesions. The images show differences between the two lesions in terms of the distribution of white, light gray, dark gray, and black pixels: in (h), pixels are homogeneously distributed throughout the image, while in (d), black and darker gray pixels are clustered in the upper right part of the image

First-order statistics were extracted from the original images using HeterogeneityCAD (Commit 27ade9a) [17] in a first approach and, subsequently, PyRadiomics version 2.1.2 [18] to crosscheck the results for entropy and uniformity, after they did not show the expected inverse behavior.

The default settings for HeterogeneityCAD were left unchanged. For PyRadiomics, as recommended to make results more comparable, images were normalized and a voxel array shift was applied in the analysis. The configuration was set with the resampling adjusted to the slice thickness as follows: imageType: Original: {} \ featureClass: firstorder: \ setting: normalize: true, normalizeScale: 100, interpolator: “sitkBSpline”, resampledPixelSpacing: [2, 2, 2] for T1w or resampledPixelSpacing: [3, 3, 3] for T2w, binWidth: 5, voxelArrayShift: 300.
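For readers who want to reproduce the settings, the configuration quoted above can be written as a nested dictionary. The following is a minimal sketch only: the helper name `firstorder_params` is hypothetical, the dictionary simply mirrors the keys and values stated in the text, and how it is handed to PyRadiomics is not shown here.

```python
import json


def firstorder_params(sequence):
    """Build a parameter dictionary for one sequence ("T1w" or "T2w").

    Hypothetical helper: keys and values mirror the configuration quoted
    in the text; nothing here calls PyRadiomics itself.
    """
    # Isotropic resampling grid in mm, as stated in the text per sequence.
    spacing = {"T1w": [2, 2, 2], "T2w": [3, 3, 3]}[sequence]
    return {
        "imageType": {"Original": {}},
        # Empty value, as in the quoted configuration (no feature subset named).
        "featureClass": {"firstorder": None},
        "setting": {
            "normalize": True,
            "normalizeScale": 100,
            "interpolator": "sitkBSpline",
            "resampledPixelSpacing": spacing,
            "binWidth": 5,
            "voxelArrayShift": 300,
        },
    }


if __name__ == "__main__":
    # Print the T1w configuration for inspection.
    print(json.dumps(firstorder_params("T1w"), indent=2))
```

Keeping the two sequences in one helper makes it explicit that only the resampling grid differs between T1w and T2w.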

ROIs to calculate NICQs were placed as reported in a previous study [15]: T2NICQs were calculated from signal intensities of the lesion, muscle, and fat \( \left(\left(\frac{{\mathrm{SI}}_{\mathrm{Lesion}}-{\mathrm{SI}}_{\mathrm{Muscle}}}{{\mathrm{SI}}_{\mathrm{Fat}}-{\mathrm{SI}}_{\mathrm{Muscle}}}\right)\times 100\right) \), and T1Qmean from signal intensities of the lesion and muscle \( \left(\frac{{\mathrm{SI}}_{\mathrm{Lesion}}}{{\mathrm{SI}}_{\mathrm{Muscle}}}\right) \). For T2NICQ90th, the 90th percentile of the signal intensity of the lesion was used; for all other measurements, the mean signal intensity was used.
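The two quotients can be illustrated with a few lines of code. This is a sketch of the formulas as stated above, not the software used in the study; the function names and the example signal intensities are invented for illustration.

```python
def t2_nicq(si_lesion, si_muscle, si_fat):
    """T2 NICQ: lesion signal scaled between muscle (0) and fat (100).

    si_lesion may be the mean or the 90th percentile of the lesion ROI,
    depending on which quotient is wanted.
    """
    denominator = si_fat - si_muscle
    if denominator == 0:
        raise ValueError("fat and muscle signal intensities must differ")
    return (si_lesion - si_muscle) / denominator * 100


def t1_q(si_lesion, si_muscle):
    """T1 quotient: lesion signal relative to muscle signal."""
    if si_muscle == 0:
        raise ValueError("muscle signal intensity must be non-zero")
    return si_lesion / si_muscle


if __name__ == "__main__":
    # Invented example intensities (arbitrary units).
    print(t2_nicq(300.0, 100.0, 500.0))  # lesion halfway between muscle and fat
    print(t1_q(150.0, 100.0))
```

With these invented values, a lesion whose signal lies halfway between muscle and fat yields a T2 NICQ of 50, and a lesion 1.5 times as bright as muscle yields a T1 quotient of 1.5.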

Statistical analysis

Statistical tests were performed on the results of the first reading by S.N. In case of multiple lesions in a patient, the largest lesion was defined as the leading lesion. Categorical parameters are given as frequencies. All metric data were tested for normal distribution using the Shapiro-Wilk test. For normally distributed data, descriptive statistics are given as mean and standard deviation. If no normal distribution was found, median and interquartile range are given. The Mann-Whitney U (MWU) test was used to test for significant differences of the parameters and ROC analysis was used to determine the diagnostic performance based on the leading lesions and on all lesions. For lesion-based ROC testing, an additional adjustment for clustering according to Obuchowski was done [25]. Resulting AUCs were rated (70–80% acceptable, 80–90% excellent, 90–100% outstanding) [26] and compared with those of the NICQs using the DeLong test [27].
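The link between the MWU test and the AUC can be made concrete: the AUC equals the U statistic divided by the number of pairs formed between the two groups. The sketch below illustrates this relation and the rating scale quoted above. It is not the SPSS or R code used in the study, it does not include the clustering adjustment or the DeLong test, and the function names, the label for values below 70%, and the treatment of boundary values (lower bound inclusive) are assumptions.

```python
def auc_from_groups(positives, negatives):
    """AUC as the Mann-Whitney U statistic divided by the number of pairs."""
    if not positives or not negatives:
        raise ValueError("both groups need at least one value")
    u = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                u += 1.0  # positive case has the higher parameter value
            elif p == n:
                u += 0.5  # ties count half
    return u / (len(positives) * len(negatives))


def rate_auc(auc):
    """Rate an AUC on the scale quoted in the text (boundaries assumed)."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "not rated"  # the text names no category below 70%


if __name__ == "__main__":
    # Invented parameter values for two small groups.
    auc = auc_from_groups([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
    print(round(auc, 3), rate_auc(auc))
```

An AUC below 0.5 from this function only means that the parameter is lower in the first group, so the direction of the cutoff has to be reported together with the value.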

For assessment of interrater agreement, ICC estimates and their 95% confidence intervals were calculated based on a mean-rating (k = 2), absolute-agreement, 2-way random effects model. For assessment of intrarater agreement, ICC estimates and their 95% confidence intervals were calculated based on a mean-rating (k = 2), absolute-agreement, 2-way mixed effects model. Intra- and interrater reliability was rated (ICC < 0.5 poor, 0.5–0.75 moderate, 0.75–0.9 good, > 0.9 excellent) [28].
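As an illustration of the mean-rating, absolute-agreement ICC, the sketch below computes the point estimate from a two-way ANOVA decomposition, ICC = (MSR − MSE) / (MSR + (MSC − MSE) / n), with MSR, MSC, and MSE denoting the mean squares for subjects, raters, and residual error. This is a simplified sketch under stated assumptions: it is not the SPSS procedure used in the study, it omits the confidence intervals, and the function names are hypothetical. The point estimate is commonly computed with the same formula for the random and the mixed effects model; the models differ in interpretation.

```python
def icc_absolute_agreement_mean(ratings):
    """Mean-rating, absolute-agreement ICC for an n x k table.

    ratings: list of n rows (lesions), each with k values (raters or readings).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with n >= 2 and k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares of the two-way decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)


def rate_icc(icc):
    """Rate reliability on the scale quoted in the text (boundaries assumed)."""
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Invented example: the second reader measures one unit higher throughout.
    icc = icc_absolute_agreement_mean([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
    print(round(icc, 2), rate_icc(icc))
```

In the invented example, the two readers rank the lesions identically, yet the constant offset between them lowers the ICC to 0.8, which is the behavior that distinguishes absolute agreement from consistency.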

Statistical analysis was performed using SPSS (SPSS Statistics, version 25.0, IBM Corp.) and R (version 3.5.1). For all tests, a p value < 0.05 was considered statistically significant.

Results

Study group

Sixteen patients with fungal infections and 19 patients with pulmonary lymphoma manifestations were included in the analysis. Further details are provided in Table 1. The mean scan duration was 5:24 min (range 2:48–8:08 min).

Table 1 Patient demographics and data of the included lesions. Mean values and standard deviation are provided for age; median and interquartile range for lesion volumes

Texture analysis parameters

Data were continuous, but not normally distributed according to Shapiro-Wilk tests. Table 2 summarizes the analysis for the leading lesion and Table 3 for all lesions including adjustment for clustered data.

Table 2 Patient-based ROC analysis of the leading lesions. Results based on analysis of 16 fungal lesions and 19 lymphoma manifestations and results of PyRadiomics. Additionally, the median and IQR of HeterogeneityCAD results are listed. Cutoff value and direction are only specified in case of significant differences. IQR, interquartile range; MWU, Mann-Whitney U; NICQ, non-enhanced imaging characterization quotient
Table 3 Non-clustered ROC and clustered ROC analysis of all lesions. Results based on analysis of 33 fungal lesions and 38 lymphoma manifestations and results of PyRadiomics. Additionally, the median and IQR of HeterogeneityCAD results are listed. Cutoff value and direction are only specified in case of significant differences. IQR, interquartile range; MWU, Mann-Whitney U; NICQ, non-enhanced imaging characterization quotient

The following statistical analysis was performed on the results of the PyRadiomics evaluation only, since they showed the expected inverse behavior for entropy and uniformity, contrary to the results obtained with HeterogeneityCAD. Still, the results for HeterogeneityCAD are presented to illustrate how the results by different algorithms can diverge.

For the leading lesion, the MWU test showed significant differences between fungal infiltrates and pulmonary lymphoma manifestations for T1w uniformity, energy, and entropy, as well as for T2w energy.

For the leading lesion, T1w entropy showed the best diagnostic performance, followed by T2w energy, T1w uniformity, and T1w energy with only slightly inferior results. Considering all lesions, the overall performance was slightly inferior with T2w energy showing the best performance, followed by T1w energy, T1w entropy, and T1w uniformity.

The AUCs of T1w uniformity, T1w energy, and T1w entropy as well as of T2w energy furthermore exceeded those of NICQs. For the leading lesion, the DeLong test showed a significantly different AUC only for T1w entropy vs. T1Qmean (p < 0.05), with other differences being only close to significance, i.e., T1w energy vs. T1Qmean (p = 0.09), T2w energy vs. T1Qmean (p = 0.06), T2w energy vs. T2NICQmean (p = 0.07), and T2w entropy vs. T2NICQ90th (p = 0.06). Considering all lesions, significantly different AUCs were observed for T1w and T2w energy vs. T2NICQ90th and for T2w energy vs. T1Qmean (p < 0.05 each) with other differences being only close to significance, i.e., T1w energy vs. T1Qmean (p = 0.07) and T2w energy vs. T2NICQmean (p = 0.06).

Intra- and interrater reliability

For T1w, intrarater reliability and interrater reliability were good to excellent for entropy and uniformity (ICC ≥ 0.86; p < 0.001). Except for the moderate intrarater agreement for energy, which was lowest of all with ICC = 0.64 (p = 0.05), and for kurtosis with ICC = 0.73 (p < 0.05), the remaining parameters in T1w also showed almost consistently good to excellent agreement (ICC ≥ 0.75).

For T2w, again entropy and uniformity showed the best intra- and interrater agreement (ICC ≥ 0.81; p < 0.01), but values were lower than for T1w except for a slightly higher ICC for entropy (ICC = 0.95; p < 0.001). All other parameters in T2w showed good-to-excellent intra- and interrater agreement (ICC ≥ 0.76; p < 0.05). Details are shown in Table 4.

Table 4 Intraclass correlation coefficient testing for intra- and interrater reliability of skewness, kurtosis, entropy, and uniformity for T1w and T2w. Results of interrater testing based on analysis of 33 fungal lesions and 38 lymphoma manifestations; results of intrarater testing based on analysis of 9 fungal lesions and 9 lymphoma manifestations. For both intrarater reliability and interrater reliability testing, results of PyRadiomics were considered. ICC, intraclass correlation coefficient

Discussion

This study shows that first-order statistics of texture analysis from 3-T MR images provide good overall diagnostic accuracy and useful supplementary information to enhance the differentiation of fungal infiltrates and pulmonary lymphoma manifestations in hematological patients.

The imaging data can be acquired with a speed-optimized MRI protocol including fast T1- and T2-weighted sequences that have been shown to be suitable for immunocompromised patients and have also been used to evaluate NICQs [15, 23]. The present results show that first-order statistics can improve the diagnostic performance of nonenhanced pulmonary MRI while maintaining a short examination time. It is relevant to keep the examination time as short as possible, since this group of patients is unlikely to tolerate prolonged MRI examinations [1]. Reliable classification and knowledge of the underlying entity of pulmonary lesions is essential, as it can spare patients invasive procedures and allows earlier initiation of appropriate treatment, such as antifungal therapy as opposed to antineoplastic treatment in patients with pulmonary lymphoma manifestation [29].

In our results based on the analysis of PyRadiomics, T1w uniformity, entropy, and energy along with T2w energy showed the best performances for differentiating pulmonary lymphoma and fungal pneumonia.

Uniformity is a measure of the homogeneity with greater values implying a smaller range of discrete signal intensity values. Entropy specifies the uncertainty/randomness in the image values and measures the average amount of information required to encode the image values. Energy is a measure of the magnitude of the histograms’ voxel values and correlates with the variation on the brightness levels [30].
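To make these three definitions tangible, the sketch below computes them for a list of voxel intensities. It follows the commonly used first-order formulas (energy as the sum of squared intensities, entropy as −Σ p·log2 p, and uniformity as Σ p² over the discretized histogram), but the function name and the simple fixed-width binning are assumptions; the bin-edge alignment of PyRadiomics may differ in detail, so absolute values need not match.

```python
import math
from collections import Counter


def first_order(values, bin_width=5.0, shift=0.0):
    """Energy, entropy, and uniformity of a list of voxel intensities.

    bin_width and shift play the roles of binWidth and voxelArrayShift;
    the binning here is a plain floor(value / bin_width).
    """
    if not values:
        raise ValueError("ROI is empty")
    # Energy: sum of squared (shifted) intensities; grows with ROI size.
    energy = sum((v + shift) ** 2 for v in values)
    # Discretize the intensities and turn bin counts into probabilities.
    counts = Counter(math.floor(v / bin_width) for v in values)
    probs = [c / len(values) for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    uniformity = sum(p * p for p in probs)
    return {"energy": energy, "entropy": entropy, "uniformity": uniformity}


if __name__ == "__main__":
    # Invented ROIs: one perfectly homogeneous, one spread over four bins.
    print(first_order([10.0, 10.0, 10.0, 10.0]))
    print(first_order([0.0, 5.0, 10.0, 15.0]))
```

In the invented example, the homogeneous ROI gives an entropy of 0 and a uniformity of 1, whereas the ROI spread evenly over four bins gives an entropy of 2 bits and a uniformity of 0.25, which illustrates the inverse behavior of the two parameters.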

When considering the leading lesion, the diagnostic performance of T1w entropy was rated excellent and was also significantly superior to that of T1Qmean. Although not all comparisons yielded significant differences, the performance of T1w entropy, uniformity, and energy along with T2w energy was generally higher than that of NICQs. T1w and T2w energy performed similarly whether all lesions or only the leading lesion were considered, while T1w entropy and T1w uniformity showed better results when considering the leading lesion. It is also noteworthy that, of the four parameters with the best diagnostic performance, three were T1w parameters and only one was a T2w parameter.

A brief insight into the pathology would suggest lymphoma manifestations to present with higher uniformity and lower entropy than fungal infiltrates: Fungal infections in neutropenic patients are most commonly caused by invasive Aspergillus and Candida species [31]. Histologically, these invasive infections tend to show a suppurative inflammatory response with intra-alveolar inflammatory fluid and a granulomatous inflammatory response, occasionally surrounded by fibrosis [32]. Likewise, hemorrhage and necrosis are observed in invasive fungal lung infections [32, 33]. Cavitation is another finding in patients with pulmonary fungal infection [33].

In contrast, both primary and secondary lymphoma manifestations are histologically characterized by extra-alveolar interstitial infiltrates containing mainly densely packed mass-forming, neoplastic lymphoid cells [34, 35]. Thus, unlike inflammatory lesions, pulmonary lymphomas form homogeneous masses.

However, uniformity was lower and entropy was higher in lymphoma manifestations. It should be kept in mind that MR images are a macroscopic representation of the examined tissue and do not necessarily allow conclusions about the exact histologic tissue composition. Regarding entropy, our results are in line with those of a study by Suo et al, in which the heterogeneity of malignant and inflammatory pulmonary lesions in CT images was assessed. Though neither pulmonary lymphomas nor fungal pneumonias were included, the absolute entropy in each region was also larger in the cancer than in the inflammatory lesion group. However, this difference was not significant (all p > 0.05) [36].

The comparison of the lesion size was not the aim of the study, but there are obvious differences with larger volumes in the lymphoma group. However, since readers were not required to mark the lesion as a whole, sizes were not considered in the analysis. Nevertheless, size differences themselves might contribute to differentiating between fungal infections and pulmonary lymphoma manifestations. Moreover, size differences of the lesions have to be considered in the evaluation of the energy of the lesions: This parameter is volume-confounded [30]; thus, a difference in size can already lead to a difference in energy. This may also explain the low intrarater agreement of energy: If the ROIs are not equally sized, variability can arise both from the raters and from volume-confounding. Thus, energy would generally have to be considered an unreliable parameter as soon as the ROI sizes are variable.
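The volume-confounding of energy can be shown with a toy example. In the sketch below, an ROI is compared with a second ROI that contains the same intensity distribution but twice as many voxels. The per-voxel normalization is shown only for contrast; it is an assumption for illustration and was not part of the analysis in this study.

```python
def energy(values, shift=0.0):
    """Sum of squared (shifted) voxel intensities."""
    return sum((v + shift) ** 2 for v in values)


def energy_per_voxel(values, shift=0.0):
    """Energy divided by the number of voxels (illustrative normalization)."""
    if not values:
        raise ValueError("ROI is empty")
    return energy(values, shift) / len(values)


if __name__ == "__main__":
    small_roi = [1.0, 2.0, 3.0]   # invented intensities
    large_roi = small_roi * 2     # same distribution, twice the voxels
    print(energy(small_roi), energy(large_roi))
    print(energy_per_voxel(small_roi), energy_per_voxel(large_roi))
```

Doubling the number of voxels doubles the energy although the tissue signal is unchanged, while the per-voxel value stays the same. A reader who draws a slightly larger ROI on the second reading will therefore obtain a different energy even for an identical lesion.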

A further interesting point when looking at the results is that entropy and uniformity provide a means of intrinsic quality assurance: These parameters should behave inversely [37]; i.e., increasing entropy should be associated with decreasing uniformity and vice versa. This was also the reason for us to finally base the analysis on the results of PyRadiomics. Independently of this, the results of the two algorithms also diverge in the other parameters. Since numerous parameters can be set for the analysis (e.g., resampling, normalization, or bin size to name a few), the results will easily deviate. The congruency and behavior of individual parameters may then remain understandable, but the effects on the analysis as a whole will become more complex. This will likely still be the case although the source code can be viewed by any user: It must be assumed that most clinicians and clinical investigators will not be familiar with the underlying algorithms. Thus, texture analysis and radiomics in a broader sense likely remain a “black box” for most users, which could produce results separating entities without the investigators fully understanding how. This also means that inconclusive results may easily be adopted uncritically. Nevertheless, radiomics is considered an emerging means of image evaluation and a powerful tool expanding human capabilities.

This study has the following limitations:

First, texture analysis was confined only to solid portions of the target lesions. Further studies should focus on analyzing different regions of a lesion as in the study of Suo et al, who analyzed and compared edge and core regions [36]. Such a study design could also provide further insights into the perilesional space of fungal nodules and the halo sign known from CT, even when it is not visible to the naked eye on MR images. Second, histopathologic proof of lymphoma or proof of the pathogen causing fungal infection was not available in all patients; therefore, the clinical response to treatment had to be used as a standard of reference in such cases.

In conclusion, T1w entropy, uniformity, and energy and T2w energy showed the best performances for differentiating pulmonary lymphoma from fungal pneumonia and outperformed NICQs. Results of the texture analysis should be checked for their intrinsic consistency to identify possible incongruities of single parameters.