Introduction

18F-FDG PET/CT is used in the staging of patients with non-small cell lung cancer (NSCLC) due to its superior accuracy in the detection of nodal involvement and metastatic disease compared to CT [1].

Meta-analyses of nodal involvement per patient and per node in NSCLC demonstrated pooled sensitivities of 0.76 and 0.65, and pooled specificities of 0.88 and 0.95 [2], suggesting that there is scope for improvement in both sensitivity and specificity. There is also a lack of evidence for the use of semi-quantitative analysis, with variable maximum standardised uptake value (SUVmax) thresholds used for differentiating benign from malignant involvement [2], in part due to the limitations of different image reconstruction parameters, which are known to affect the accuracy of standardised uptake value (SUV) measurements [3, 4].

Iterative methods are commonly used for the reconstruction of PET data because of their improved signal-to-noise ratios (SNR) [3–5]. The most widely used iterative algorithm is ordered subset expectation maximisation (OSEM) [6]. OSEM aims to find the most likely image through repeated iterations, with each iteration giving an image with a greater likelihood of describing the measured data. However, it is not possible to run this algorithm to full convergence as the image noise increases with each iteration, becoming visually unacceptable well before full convergence is reached [3, 5]. OSEM is therefore stopped after a stipulated number of iterations, resulting in an under-converged image and underestimation of SUVs.
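The iterative update behind OSEM can be illustrated with a toy example. The following is a minimal sketch only, assuming a small dense system matrix and noise-free data; it omits time of flight, attenuation, scatter and randoms corrections, and it is not the vendor implementation. The function name `osem` and its arguments are hypothetical.

```python
def osem(system_matrix, measured, n_iterations, n_subsets):
    """Toy ordered subset expectation maximisation on a dense system matrix.

    system_matrix[i][j] is the contribution of voxel j to measurement bin i.
    All names here are illustrative; this is not a clinical reconstruction.
    """
    n_bins = len(system_matrix)
    n_voxels = len(system_matrix[0])
    image = [1.0] * n_voxels  # uniform, strictly positive starting estimate

    # Interleaved subsets of measurement bins: {0, s, 2s, ...}, {1, s+1, ...}
    subsets = [range(start, n_bins, n_subsets) for start in range(n_subsets)]

    for _ in range(n_iterations):
        for subset in subsets:
            # Forward-project the current estimate onto this subset's bins.
            expected = {
                i: sum(a * x for a, x in zip(system_matrix[i], image))
                for i in subset
            }
            for j in range(n_voxels):
                sensitivity = sum(system_matrix[i][j] for i in subset)
                if sensitivity == 0.0:
                    continue  # voxel not seen by this subset
                # Back-project the measured/expected ratio for voxel j.
                back = sum(
                    system_matrix[i][j] * measured[i] / expected[i]
                    for i in subset
                    if expected[i] > 0.0
                )
                image[j] *= back / sensitivity
    return image


# Hypothetical two-voxel object seen by four measurement bins.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.5]]
truth = [2.0, 5.0]
y = [sum(a * x for a, x in zip(row, truth)) for row in A]

early = osem(A, y, n_iterations=2, n_subsets=2)
late = osem(A, y, n_iterations=100, n_subsets=2)
```

In this noise-free toy case the estimate after two iterations still underestimates the hotter voxel, while many iterations recover it. With real, noisy data the later iterations would also amplify noise, which is the trade-off described above.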

Bayesian penalised likelihood (BPL) is an iterative PET reconstruction algorithm that includes point spread function (PSF) modelling, recently developed by GE Healthcare (Q.Clear, GE Healthcare, Milwaukee, WI, USA) [7]. As described elsewhere [7–10], BPL includes a relative difference penalty [11] which is a function of the difference between neighbouring voxels as well as a function of their sum [12]. This penalty function acts as a noise suppression term, which allows an increased number of iterations without the noise usually seen in OSEM [7]. It is controlled by the penalisation factor (beta), which is the only user-input variable. Modified block sequential regularised expectation maximisation is used as an optimiser for this BPL algorithm, which, due to the penalty function, allows effective convergence to be achieved in images, potentially providing a more accurate SUV [12, 13]. We have previously shown the improvement this algorithm provides over OSEM in phantom studies [8], lung nodules [9] and colorectal cancer liver metastases [10].
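To make the dependence on both the difference and the sum of neighbouring voxels concrete, the sketch below evaluates one commonly cited form of a relative difference penalty on a one-dimensional row of voxels. It is an illustration under stated assumptions rather than the vendor's code: the edge-preservation parameter `gamma`, its default value, the restriction to nearest neighbours and the function name are all assumptions not taken from this paper.

```python
def relative_difference_penalty(image, beta, gamma=2.0):
    """Penalty for a 1D row of non-negative voxel values.

    Each neighbouring pair contributes (a - b)^2 / (a + b + gamma * |a - b|),
    so the same absolute difference is penalised less at higher activity.
    gamma is an assumed edge-preservation parameter (hypothetical default).
    """
    total = 0.0
    for left, right in zip(image, image[1:]):  # each neighbour pair counted once
        denominator = left + right + gamma * abs(left - right)
        if denominator > 0.0:
            total += (left - right) ** 2 / denominator
    return beta * total


# Same absolute step of 2 between neighbours, at low and at high activity.
low_activity = relative_difference_penalty([1.0, 3.0], beta=1.0, gamma=0.0)
high_activity = relative_difference_penalty([11.0, 13.0], beta=1.0, gamma=0.0)
uniform = relative_difference_penalty([4.0, 4.0, 4.0], beta=400.0)
```

A larger `beta` scales the whole term, so the reconstruction is pushed harder towards locally smooth images; a uniform region contributes no penalty at all.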

Aim

The aim of this study was to test whether using BPL increases signal-to-background (SBR), SNR and SUVmax when evaluating nodal disease in patients with lung cancer compared to OSEM.

Materials and methods

Patient selection

All patients who underwent 18F-FDG PET/CT at our institution between October 2011 and April 2013 for the staging of NSCLC, with subsequent nodal station histopathological diagnosis, were retrospectively identified. Institutional review board approval is not required for retrospective analyses of this nature in our hospital.

18F-FDG PET/CT imaging protocol

PET/CT scans were performed on a GE Discovery 690 PET/CT system (GE Healthcare) in 3D mode. The patients were fasted for at least 6 h prior to their scan. Their blood glucose was measured and recorded on the radiology information system prior to intravenous injection of 4 MBq/kg of 18F-FDG. Imaging commenced approximately 90 min post-injection (mean ± one standard deviation for this patient group 92 ± 6 min) and covered the skull base to upper thighs. The PET/CT images were acquired under normal tidal respiration for 4 min per bed position. The CT was performed using a pitch of 0.984, 120 kV, automA with a noise index of 25.

PET reconstructions

PET images were reconstructed using two different algorithms with the same normalisation correction factors and both using the CT scan for attenuation correction. The standard of care PET reconstruction algorithm used is time of flight (ToF) OSEM (VPFX, GE Healthcare). This was used with two iterations, 24 subsets and 6.4-mm Gaussian filter. The sinograms generated at the time of scanning were retrospectively processed using the ToF BPL reconstruction algorithm with a penalisation factor (beta) of 400, the only user-input variable for this algorithm, as this has been shown to optimise signal detection [8].

Imaging analysis

Semi-quantitative analysis

The pre-existing PET images (reconstructed using OSEM) and the new PET images reconstructed using BPL were analysed fused with the CT component of the original study (viewed on mediastinal windows).

If there was more than one node in a single station, the analysed node was chosen based on the highest FDG-avidity on OSEM reconstruction, being the standard of care reconstruction used at time of analysis. If all the nodes within the station demonstrated background FDG uptake, the largest node was chosen.

The SUVmax of each node was recorded using a standard volume of interest (VOI) tool. Background SUVs were measured in the right lobe of the liver and in the descending aorta at the level of the carina, with 3.0-cm and 1.0-cm diameter spherical VOIs, respectively. SUVmax, SUVmean and standard deviation (SUVsd) within the VOI were recorded for both reference organs. In addition, liver SUVpeak was recorded. Signal-to-background ratio (SBR) for each node was calculated as node SUVmax divided by the descending aorta SUVmean. SNRs for both node and background were calculated using the SUVsd in a reference VOI as a measure of noise. The node SNR was defined as node SUVmax divided by descending aorta SUVsd and the liver background SNR as liver SUVmean divided by liver SUVsd.
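The three ratios defined above can be written out as a short sketch. The class and function names are hypothetical and the numbers are invented for illustration; they are not study data.

```python
from dataclasses import dataclass


@dataclass
class ReferenceVOI:
    """SUV statistics from a reference volume of interest (hypothetical type)."""
    suv_mean: float
    suv_sd: float


def node_metrics(node_suv_max, aorta, liver):
    """Return SBR, node SNR and liver background SNR as defined in the text."""
    return {
        # Node SUVmax over descending aorta SUVmean
        "sbr": node_suv_max / aorta.suv_mean,
        # Node SUVmax over descending aorta SUVsd (noise estimate)
        "node_snr": node_suv_max / aorta.suv_sd,
        # Liver SUVmean over liver SUVsd
        "liver_snr": liver.suv_mean / liver.suv_sd,
    }


# Invented example values, not taken from the study.
metrics = node_metrics(
    node_suv_max=5.0,
    aorta=ReferenceVOI(suv_mean=2.0, suv_sd=0.25),
    liver=ReferenceVOI(suv_mean=2.5, suv_sd=0.25),
)
```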

To assess the effect of BPL on nodes with background FDG uptake (on OSEM), nodes were classified as FDG-positive (above-background) or FDG-negative (at-background). Background uptake was patient-specific and set as the descending aorta SUVmean on the OSEM images.
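As a rough sketch of this classification rule: the text does not state which node measurement is compared with the background or whether the comparison is strict, so the use of node SUVmax and a strict inequality below are assumptions, as is the function name.

```python
def classify_fdg_uptake(node_suv_max, aorta_suv_mean_osem):
    """Label a node against the patient-specific OSEM blood-pool background.

    Assumption: node SUVmax is the compared value and the inequality is strict.
    """
    if node_suv_max > aorta_suv_mean_osem:
        return "FDG-positive"  # above-background
    return "FDG-negative"      # at-background


# Invented values: two nodes in a patient with an OSEM aortic SUVmean of 1.8.
avid_node = classify_fdg_uptake(2.4, 1.8)
background_node = classify_fdg_uptake(1.6, 1.8)
```

The same OSEM-derived background is applied to measurements from both reconstructions, so a change in classification reflects a change in the node measurement only.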

Visual analysis

Visual analysis of the OSEM and BPL PET images, fused with the CT component of the original study (viewed on mediastinal windows), was performed by a senior radiology resident with four years of radiology (including one year of PET/CT) experience, on the same workstation. Nodes were scored according to the degree of FDG uptake (above-background or at-background). The reference organ for background uptake was the descending aorta. The scorer reviewed the cases in a randomised order, blinded to the clinical outcome and outcome of prior scoring for each case. The scorer was not blinded to the nature of the reconstruction algorithm used.

Statistical analysis

Statistical analyses were performed using R [14] and IBM SPSS Statistics 22.0 (IBM Corporation, New York, NY, USA), with p values less than 0.05 considered statistically significant. Differences in background SUVmean, SUVmax and SNR across the entire cohort were analysed using paired t-tests. Differences in node SUVmax, SBR and SNR between the two reconstructions were analysed using Wilcoxon signed-rank tests, as the measurements are paired. The percentage difference in node SUVmax (%ΔSUVmax) was also calculated. The %ΔSUVmax of histopathologically positive and negative nodes was compared using the Mann-Whitney U test.
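The paired quantities in this analysis can be sketched as follows. The formula for %ΔSUVmax is not given in the text, so the definition used here (change relative to the OSEM value, per node) is an assumption; the function names and the example values are hypothetical. Only the paired t statistic is computed; obtaining p values and the rank-based tests would need a statistics package.

```python
import math
import statistics


def percent_delta(osem_values, bpl_values):
    """Per-node percentage change from OSEM to BPL (assumed definition)."""
    return [100.0 * (b - o) / o for o, b in zip(osem_values, bpl_values)]


def paired_t_statistic(first, second):
    """t statistic for paired samples: mean difference over its standard error."""
    differences = [b - a for a, b in zip(first, second)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.mean(differences) / standard_error


# Invented SUVmax values for three nodes, not study data.
osem_suv_max = [2.0, 4.0, 5.0]
bpl_suv_max = [2.5, 5.0, 6.0]

deltas = percent_delta(osem_suv_max, bpl_suv_max)
t_value = paired_t_statistic(osem_suv_max, bpl_suv_max)
```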

Diagnostic performance

The performance of both algorithms to detect malignant nodes was assessed using both semi-quantitative and visual criteria. For semi-quantitative criteria, receiver operating characteristic (ROC) curves were plotted, and area under the curve (AUC) values calculated. The areas under both ROC curves were compared using the method described by DeLong et al. [15]. The optimal SUV threshold for the diagnosis of malignancy was defined as the point on the curve closest to the upper left corner of the ROC space. Sensitivity, specificity and accuracy for malignancy detection were calculated for these thresholds, and an SUVmax threshold of 2.5. For visual criteria, nodes scored as above-background were designated malignant and nodes at-background were designated benign. Sensitivity, specificity and accuracy for malignancy detection were then calculated.
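The threshold selection and the performance measures described above can be sketched in a few lines. This is an illustration under assumptions: a node is called malignant when its SUVmax is greater than or equal to the threshold (the text does not specify the inequality), the AUC is computed from pairwise comparisons, and the DeLong comparison of two curves is not implemented. All names and data are hypothetical.

```python
import math


def diagnostic_performance(suv_max, malignant, threshold):
    """Return (sensitivity, specificity, accuracy) at a given SUVmax threshold."""
    pairs = list(zip(suv_max, malignant))
    true_pos = sum(1 for s, m in pairs if m and s >= threshold)
    true_neg = sum(1 for s, m in pairs if not m and s < threshold)
    n_pos = sum(1 for m in malignant if m)
    n_neg = len(malignant) - n_pos
    return true_pos / n_pos, true_neg / n_neg, (true_pos + true_neg) / len(pairs)


def area_under_curve(suv_max, malignant):
    """AUC as the probability that a malignant node outranks a benign one."""
    pos = [s for s, m in zip(suv_max, malignant) if m]
    neg = [s for s, m in zip(suv_max, malignant) if not m]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def optimal_threshold(suv_max, malignant):
    """Threshold whose ROC point lies closest to the upper left corner."""
    def distance(threshold):
        sens, spec, _ = diagnostic_performance(suv_max, malignant, threshold)
        return math.hypot(1.0 - sens, 1.0 - spec)
    return min(sorted(set(suv_max)), key=distance)


# Invented SUVmax values and histopathology labels, not study data.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
labels = [False, False, False, True, False, True, True]

best = optimal_threshold(scores, labels)
sensitivity, specificity, accuracy = diagnostic_performance(scores, labels, best)
auc = area_under_curve(scores, labels)
```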

Results

Clinical characteristics

Forty-seven patients (29 male, 18 female, mean age 69 years, range 36–82 years) met the inclusion criteria. Within the cohort, 27 had squamous cell carcinoma, 18 had adenocarcinoma and two had adenosquamous carcinoma. A total of 112 nodal stations were included for analysis, of which 25 stations in 18 patients were histopathologically positive. Histopathological diagnosis was obtained by surgical or mediastinoscopic sampling in the majority of the stations (n=97), the remainder by transbronchial needle aspiration (n=15). The mean nodal short-axis diameter was 9 mm (range 3–27 mm). When stratified according to histopathological status, the mean short-axis diameter was 11 mm (range 5–27 mm) in positive stations and 8 mm (range 3–15 mm) in negative stations.

Background analysis

The average background SNR on OSEM was 10.4 (range 7.6–14.0), increasing to 12.4 on BPL (range 8.2–16.7, p<0.0001). There was no statistically significant difference in liver SUVmax and descending aorta SUVmax between OSEM and BPL (p=0.35 and 0.07, respectively), and there were very small, albeit statistically significant, differences in liver SUVmean, liver SUVpeak, liver SUVsd, descending aorta SUVmean and descending aorta SUVsd (Table 1). The largest difference was in liver SUVmean, with a mean difference of 0.17 (95 % confidence interval (CI) 0.11–0.22).

Table 1 Background standardised uptake value (SUV) analysis

Semi-quantitative analysis

SUVmax, SNR and SBR

On comparison of BPL with OSEM, there was a statistically significant difference in node SUVmax (mean difference 0.8, p<0.0001), SNR (mean difference 13.2, p<0.0001) and SBR (mean difference 0.4, p<0.0001). The %ΔSUVmax was 16 %. The results of the node analysis are summarised in Table 2.

Table 2 Summary of maximum standardised uptake value (SUVmax), signal-to-noise (SNR), signal-to-background (SBR) and percentage difference in SUVmax across the entire cohort and classified according to histopathology

Analysis by histopathology

In the 25 histopathologically positive nodes, there was a statistically significant difference in node SUVmax (mean difference 1.8, p<0.0001) and SBR (mean difference 0.9, p<0.001). The mean %ΔSUVmax was 23.7 %. There was a relatively lower increment in SUVmax (mean difference 0.5, mean percentage difference 14.3 %) and SBR (mean difference 0.2) in histopathologically negative nodes although the differences were also statistically significant. The results are summarised in Table 2 and Fig. 1. The differences in %ΔSUVmax between positive and negative nodes were also statistically significant (p=0.032).

Fig. 1

Error bar chart plotting the mean maximum standardised uptake value (SUVmax) ± 1 SD on both reconstructions in the entire cohort and according to histopathology

Visual analysis of FDG uptake

On visual analysis of FDG uptake on OSEM compared to BPL (Table 3), scores were concordant in 100 nodes (89 %). All of the nodes with discordant scores had a higher score on BPL (two histopathology positive, ten negative).

Table 3 Results of visual analysis of FDG uptake compared to semi-quantitative criteria

With regard to semi-quantitative analysis, there was concordance in classification of FDG uptake in 110 nodes (98 %). The remaining two nodes had negative histopathology and were FDG-positive (above-background) on OSEM, but classified as FDG-negative (at-background) on BPL (Table 3, footnote).

Diagnostic performance

ROC curves were plotted to evaluate the usefulness of OSEM and BPL to detect histopathologically positive nodes. The AUC values were 0.711 (p=0.001) and 0.697 (p=0.003), respectively (Fig. 2), with no statistically significant difference between the two algorithms (p=0.256).

Fig. 2

Receiver operating characteristic (ROC) curves for detection of histopathologically positive lymph nodes with Bayesian penalised likelihood reconstruction (BPL) compared to ordered subset expectation maximisation (OSEM). Optimum maximum standardised uptake value (SUVmax) thresholds for malignancy detection are indicated as SUVmax (sensitivity, specificity)

The optimum SUVmax threshold for detection of malignancy was 3.0 and 4.0 for OSEM and BPL, respectively (Fig. 2). The sensitivities, specificities and accuracies at these thresholds as an entire cohort are summarised in Table 4. Across these groups, a minor decrease in sensitivity (by 4.0 %) and increase in specificity (by 2.3 %) between OSEM and BPL was observed, with little change in accuracy (increase by 0.9 %). Conversely, using either an SUVmax threshold of 2.5 or visual criteria, a divergence in sensitivity and specificity between OSEM and BPL was observed, with sensitivity increasing and specificity decreasing with BPL (Table 4).

Table 4 Diagnostic performance of ordered subset expectation maximisation (OSEM) and Bayesian penalised likelihood reconstruction (BPL) in detecting malignant nodes based on (A) semi-quantitative analysis using a maximum standardised uptake value (SUVmax) threshold of 2.5, (B) optimum SUVmax threshold (3.0 and 4.0, respectively) and (C) visual analysis

To investigate whether nodal size influenced diagnostic performance, the calculation of sensitivity, specificity and accuracy at the optimum SUVmax thresholds of 3.0 and 4.0 for OSEM and BPL, respectively, was repeated with the dataset dichotomised into two groups: >10 mm and ≤10 mm (Table 5). The results in nodes >10 mm (n=24) were identical between the two algorithms. In nodes ≤10 mm (n=88), BPL showed a drop in sensitivity of 7.2 %, an increase in specificity of 2.7 % and an increase in accuracy of 1.2 %. These marginal changes reflect the prior results when analysed as a cohort.

Table 5 Diagnostic performance of ordered subset expectation maximisation (OSEM) and Bayesian penalised likelihood reconstruction (BPL) in detecting malignant nodes based on size, using optimum maximum standardised uptake value (SUVmax) threshold (3.0 and 4.0, respectively)

Discussion

This study demonstrated a significant increase in SBR and SUVmax of mediastinal nodes using BPL compared to OSEM in patients with NSCLC; examples are illustrated in Figs. 3 and 4. While SBR and SUVmax increased in both histopathologically positive and negative nodes, the %ΔSUVmax in histopathologically positive nodes was significantly higher (p=0.033, SUVmax mean difference 1.8 vs. 0.4, respectively).

Fig. 3

A histopathologically proven involved 8-mm node in station 5 in adenocarcinoma of the left upper lobe. Ordered subset expectation maximisation (OSEM) maximum standardised uptake value (SUVmax) 3.8, increasing to 5.0 on Bayesian penalised likelihood reconstruction (BPL). The blood pool SUVmean difference was 0.1. All positron emission tomography (PET) images are displayed on SUV scale 0–6

Fig. 4

A histopathologically proven involved 13-mm node in station 10L in a patient with squamous cell carcinoma. Ordered subset expectation maximisation (OSEM) maximum standardised uptake value (SUVmax) 5.4, increasing to 7.7 on Bayesian penalised likelihood reconstruction (BPL), with a visually appreciable decrease in noise of the mediastinal blood pool. All positron emission tomography (PET) images are displayed on SUV scale 0–6

There were also significant increases in node SNR using BPL, which is of relevance as OSEM algorithms can underestimate lesion activity when located in a relatively FDG-avid background [7]. In the context of mediastinal nodes, the surrounding blood pool may render faintly FDG-avid abnormalities less visually conspicuous. The resultant effect of BPL is demonstrated by the increase in sensitivity based on visual analysis (76 % from 68 %, Table 4). Some of the increase in SUVmax can be attributed to PSF modelling being included within the BPL and not within OSEM. PSF modelling incorporates information about the PET detector response into the reconstruction algorithm which leads to an improved image, especially for small lesions. While comparison between OSEM with PSF modelling (SharpIR on GE systems) and BPL might seem more appropriate as BPL includes PSF modelling, OSEM with PSF has not been adopted as standard of care in our centre. This is due to the increased intervoxel covariance seen with the PSF modelling [9, 16] which causes images to appear very heterogeneous. We have seen here that background metrics (Table 1, Figs. 3 and 4) remain very similar when moving from OSEM to BPL despite the addition of PSF modelling.

Despite improved definition of nodal FDG uptake and a relatively higher SUVmax increment in histopathologically positive nodes using BPL, ROC analysis of the usefulness of SUVmax as a single semi-quantitative parameter between both algorithms showed that BPL did not significantly improve the performance for diagnosing nodal disease (for example, accuracy 72.3 % (OSEM) to 73.2 % (BPL) at optimum SUVmax thresholds, Table 4). The observation of two OSEM FDG-negative histopathologically positive nodes remaining FDG-negative on BPL lends weight to this although the numbers are small (these two nodes were also sub-centimetre).

Interestingly, repeating the same analysis with the dataset dichotomised according to nodal size (>10 mm, ≤10 mm) demonstrated the same pattern of (small) change in diagnostic performance from OSEM to BPL only in nodes ≤10 mm (Table 5). This may suggest that the difference in performance is size dependent, although the effect may be due to the considerably larger number of nodes ≤10 mm (n=88) compared to nodes >10 mm (n=24). It is important to stress that the difference is clinically negligible, so no conclusion can be drawn from this observation as to whether BPL confers an advantage or a disadvantage in the evaluation of subcentimetre nodes using semi-quantitative analysis.

On applying the widely used SUVmax threshold of 2.5 [17] to both reconstructions, there was an expected increase in sensitivity and decrease in specificity using BPL compared to OSEM. On visual analysis, there was increased detection where a small, but relatively greater, proportion of nodes were ‘upgraded’ on BPL, which also resulted in a divergence of sensitivity and specificity. There was also a small decrease in accuracy from 62 % to 55 %.
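The size of this divergence on visual analysis can be checked with simple arithmetic from the counts reported in the Results: 25 histopathologically positive and 87 negative stations, with 12 discordant visual scores (two positive, ten negative) that were all higher on BPL. The sketch below assumes, as in the methods, that above-background nodes are called malignant; the function name is hypothetical.

```python
def upgrade_effect(n_positive, n_negative, upgraded_positive, upgraded_negative):
    """Change in percentage points when nodes move from at- to above-background."""
    total = n_positive + n_negative
    return {
        # Upgraded positive nodes become true positives.
        "sensitivity": 100.0 * upgraded_positive / n_positive,
        # Upgraded negative nodes become false positives.
        "specificity": -100.0 * upgraded_negative / n_negative,
        # Net change in correctly classified nodes over all nodes.
        "accuracy": 100.0 * (upgraded_positive - upgraded_negative) / total,
    }


change = upgrade_effect(
    n_positive=25, n_negative=87, upgraded_positive=2, upgraded_negative=10
)
```

Under these assumptions sensitivity rises by 8 percentage points (68 % to 76 %) and accuracy falls by about 7 points (62 % to 55 %), in line with the reported values, while specificity falls by about 11.5 points.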

Importantly, the diagnostic performance was not significantly improved with BPL, even though the increments in SBR, SNR and SUVmax contributed to increased visual detection. This is likely to be due to the increase in visualisation and semi-quantitative measurements occurring independently of aetiology; glucose utilisation is better detected in both malignant and granulomatous nodes. Furthermore, when a semi-quantitative method of analysis is applied, a higher SUVmax threshold than is commonly used may be appropriate when using BPL, although there is no universally agreed level at present for OSEM [18–20]. Similarly, a different threshold will have to be adopted when using visual analysis. As PET technology (hardware and software) evolves, for example with BPL reconstruction, radiologists and physicians will need to adapt and re-learn to account for the improving quality of images. The need to potentially amend current parameters for disease detection and reporting is arising in other areas of imaging due to improved imaging technology. Improved nodule detection using low-dose CT, as seen in the National Lung Screening Trial, where low-dose CT demonstrated high sensitivity but low positive predictive values in lung cancer detection, has resulted in the definition of a positive screening result being refined [21].

There are a number of limitations of this study. Firstly, there is inherent sampling error as direct radiological-pathological correlation was not possible for individual nodes and the unit of analysis had to be based on each nodal station. Where there were multiple nodes in a single station on CT, a single node was chosen for analysis based on the degree of FDG-avidity followed by size, and both of these criteria may contribute to false-positive results. This was thought not to be of overall significance as primary evaluation of the data was centred on the difference due to the methods of PET reconstruction.

Finally, the cohort was skewed with a majority proportion of histopathologically negative nodes and small absolute number of positive nodes. This may explain the relatively small differences in semi-quantitative diagnostic performance between BPL and OSEM. The small number of positive nodes may also in part explain the relatively poor AUC values derived from semi-quantitative analysis. This was based on SUVmax as a standalone predictor of nodal positivity but was to some extent an expected finding. It is also concordant with the wider observations of 20–25 % false-negative and false-positive rates in PET for mediastinal nodal involvement [22, 23], confirming the importance of tissue sampling in mediastinal staging to determine nodal involvement [24].

Conclusion

BPL, an iterative reconstruction technique using a Bayesian penalised likelihood reconstruction algorithm, increases SBR, SNR and SUVmax of mediastinal nodes in NSCLC, compared to OSEM, the current standard of care. This led to an improvement in visual sensitivity using BPL. However, this did not improve the accuracy for determining nodal involvement, and suggests that the limitations of 18F-FDG PET/CT in nodal analysis in NSCLC are due to the inherent non-specificity in the reasons for FDG-avidity, and are likely to be unchanged by improvements in techniques in its detection.