Artificial intelligence-based PET denoising could allow a two-fold reduction in [18F]FDG PET acquisition time in digital PET/CT

Purpose We investigated whether artificial intelligence (AI)-based denoising allows PET acquisition time to be halved in digital PET/CT.

Methods One hundred ninety-five patients referred for [18F]FDG PET/CT were prospectively included. Body PET acquisitions were performed in list mode. The original PET (90 s/bed position, “PET90”) was compared with PET reconstructed from half of the acquisition (45 s/bed position) with and without AI-based denoising (“PET45AI” and “PET45”, respectively). Denoising was performed by SubtlePET™ using deep convolutional neural networks. Global visual image quality (IQ) was scored on a 3-point scale, and lesion detectability was evaluated. Lesion maximal and peak standardized uptake values normalized to lean body mass (SULmax and SULpeak), metabolic volumes (MV), and liver SULmean were measured, including both standard and EARL1 (European Association of Nuclear Medicine Research Ltd)-compliant SUL. Lesion-to-liver SUL ratios (LLR) and liver coefficients of variation (CVliv) were calculated.

Results PET45 showed mediocre IQ (poor in 8% and moderate in 68% of scans) and a lower lesion concordance rate with PET90 (88.7%). IQ scores of PET45AI were similar to those of PET90 (P = 0.80): good in 92% and moderate in 8% for both. The lesion concordance rate between PET90 and PET45AI was 836/856 (97.7%), with 7 lesions (0.8%) detected only on PET90 and 13 (1.5%) only on PET45AI. Lesion EARL1 SULpeak did not differ significantly between the two PET series (P = 0.09). Lesion standard SULpeak, standard and EARL1 SULmax, LLR, and CVliv were lower in PET45AI than in PET90 (P < 0.0001), whereas lesion MV and liver SULmean were higher (P < 0.0001). Intraclass correlation coefficients (ICC) between PET90 and PET45AI were good to excellent for lesion SUL and MV (ICC ≥ 0.97) and for liver SULmean (ICC ≥ 0.87).

Conclusion AI allows [18F]FDG PET acquisition time in digital PET/CT to be halved while restoring the degraded image quality of half-duration PET. Future multicentric studies, including other PET radiopharmaceuticals, are warranted.
Supplementary Information The online version contains supplementary material available at 10.1007/s00259-022-05800-1.


Introduction
Recent research in PET has focused on reducing noise and increasing the signal-to-noise ratio (SNR) [1]. Digital PET with silicon photomultipliers (SiPM) has improved timing, energy, and spatial resolution, as well as effective time-of-flight (TOF) sensitivity [2-5]. This has enabled faster scanning with lower injected activity [1]. Despite these advances, however, the demand for PET scans keeps increasing, which can cause significant delays in scheduling examinations and in patient management.
Deep learning (DL)-based denoising can either be integrated into PET reconstruction or be applied as a post-reconstruction tool. SubtlePET™ (Subtle Medical, Stanford, US, provided by Incepto, France) is a post-reconstruction PET denoising software that has been approved by the Food and Drug Administration and validated by the European Commission for [18F]FDG PET [22]. The algorithm is based on deep convolutional neural networks (DCNN), the most common DL architecture [23,24]. SubtlePET™ uses a multi-slice 2.5D encoder-decoder U-Net DCNN, which takes each pixel's neighborhood into account to reduce noise and improve image quality. Using a residual learning approach optimized for both a quantitative loss (L1 norm) and structural similarity (SSIM), it has learned to separate and suppress noise components while preserving non-noise components.
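The residual learning scheme described above can be illustrated with a minimal NumPy sketch. SubtlePET™ is proprietary, so nothing here reproduces its implementation: the number of neighboring slices (`context`), the loss weight `alpha`, and the single-window (global) SSIM are illustrative assumptions, and all function names are hypothetical. The sketch shows how 2.5D inputs stack adjacent axial slices as channels, how a predicted noise residual is subtracted from the noisy slice, and how an L1 term and an SSIM term can be combined into one training loss.

```python
import numpy as np

def make_25d_stacks(volume, context=2):
    """Stack each axial slice with `context` neighbors on each side as channels.

    volume: array of shape (n_slices, H, W). Border slices are replicated.
    Returns an array of shape (n_slices, 2 * context + 1, H, W).
    """
    padded = np.pad(volume, ((context, context), (0, 0), (0, 0)), mode="edge")
    width = 2 * context + 1
    return np.stack([padded[i:i + width] for i in range(volume.shape[0])])

def residual_denoise(noisy_slice, predicted_noise):
    """Residual learning: the network predicts the noise, which is subtracted."""
    return noisy_slice - predicted_noise

def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """SSIM over the whole image (one window); real implementations use local windows."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return num / den

def combined_loss(denoised, target, alpha=0.5):
    """Weighted L1 + (1 - SSIM) loss; alpha is an illustrative weight."""
    l1 = np.abs(denoised - target).mean()
    ssim = global_ssim(denoised, target, data_range=float(target.max() - target.min()))
    return alpha * l1 + (1 - alpha) * (1 - ssim)
```

In an actual network, `predicted_noise` would be the U-Net output for the 2.5D stack centered on that slice, and SSIM would normally be computed over local windows rather than the whole slice.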
Recently, SubtlePET™-processed [18F]FDG PET images obtained with 33% less injected activity showed visual and quantitative performance similar to that of native PET on analog PET/CT without TOF [25]. Promising results were also reported when the reconstructed PET acquisition time was reduced by 75% on analog and digital PET/CT (with or without TOF), in a smaller study population with a substantially higher original PET time-activity product [26]. Our group demonstrated the stability of most [18F]FDG PET radiomics features when this software was applied without count reduction [27].
In this prospective study, we aimed to evaluate the feasibility of halving PET acquisition time in a routine clinical setting by using SubtlePET™ while preserving visual and semi-quantitative PET performance in digital TOF PET/CT.

Patient selection
One hundred ninety-five adult patients referred to our comprehensive cancer center for initial or follow-up [18F]FDG PET/CT from late January to late February 2021 were prospectively included in this study. The only exclusion criterion was a specific acquisition protocol with a longer acquisition time per bed position over the head and neck or liver.
This non-interventional clinical study was approved by the local institutional review board of the François Baclesse Comprehensive Cancer Center and was registered with the French Health Data Hub under reference N° F20210720123322 on 20 January 2021. All patients provided informed consent for the use of their data.

Imaging protocol and processing
All examinations were performed in accordance with the EANM imaging guidelines [28] on a digital SiPM PET/CT system (VEREOS, Philips Healthcare). After a 6-h fasting period, patients were injected intravenously with 3 MBq/kg [18F]FDG.
Before each PET scan, a low-dose non-contrast-enhanced CT scan was acquired for attenuation correction and anatomical reference. CT parameters were: 100-140 kV; variable mAs according to a dose right index of 14; iterative reconstruction (iDose level 4); 64 × 0.625-mm slice collimation; pitch 0.83; rotation time 0.5 s; 3D modulation; matrix 512 × 512; and voxel size 0.97 × 0.97 × 3 mm³.
PET acquisition started 1 h post-injection and was recorded in list mode. The field of view covered at least the skull base to the upper thighs and was extended to a total-body acquisition if needed. Two PET reconstructions were performed: one for routine clinical purposes using the full acquisition time of 90 s per bed position ("PET90"), and a second one using 45 s per bed position for the purpose of this study ("PET45"). Both reconstructions used 3D ordered subset expectation maximization (3D-OSEM) with point spread function (PSF) modeling, 4 iterations and 4 subsets (4i4s), a 2 × 2 × 2 mm³ voxel size, and a 288 × 288 matrix. Scatter, attenuation, and randoms corrections were applied. PET45 images were processed by SubtlePET™ and are hereafter referred to as "PET45AI". A fully automatic workflow handled both image transfer and denoising. SubtlePET™ ran on a common, affordable NVIDIA 1080 GPU.
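Because PET45 was reconstructed from the list-mode data of the same acquisition, halving the time per bed position halves the time-activity product and, to a close approximation, the collected counts. The sketch below makes this arithmetic explicit. It assumes that PET45 used the first 45 s of each 90-s bed position (the text does not say which 45 s were kept) and ignores dead time and randoms; function names are hypothetical. It also shows that [18F] decay during a bed position shifts the count fraction only slightly above 50%.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes

def time_activity_product(activity_mbq_per_kg, seconds_per_bed):
    """Injected activity x time per bed position (MBq.s/kg), a rough proxy for counts."""
    return activity_mbq_per_kg * seconds_per_bed

def decay_weighted_fraction(sub_s, full_s, half_life_min=F18_HALF_LIFE_MIN):
    """Fraction of a bed position's counts recorded in its first `sub_s` seconds.

    Counts over [0, T] are proportional to (1 - exp(-lambda * T)) / lambda,
    so the ratio for two durations does not depend on the bed start time.
    """
    lam = math.log(2) / (half_life_min * 60.0)  # decay constant in 1/s
    return (1 - math.exp(-lam * sub_s)) / (1 - math.exp(-lam * full_s))
```

For example, `time_activity_product(3, 90)` gives 270 MBq·s/kg for PET90 and `time_activity_product(3, 45)` gives 135 for PET45, while `decay_weighted_fraction(45, 90)` is about 0.501.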

Visual analysis
Blinded original PET90 and PET45AI were reviewed side-by-side by five experienced nuclear medicine physicians on a Syngo.via viewing server (version VB 30A, Siemens Healthcare). Each reader interpreted a distinct subset of the study population (all images per patient) and did not review PET/CT scans they had previously seen in clinical practice.
Readers assigned a global, whole-image quality (IQ) score to each PET series: 1 = poor; 2 = moderate; 3 = good. The score was based on global and hepatic image noise and on normal tissue contrast.
All lesions with increased [18F]FDG uptake were recorded on each PET series. For each lesion, readers specified the PET series preferred for detection (related to the contrast-to-background ratio), its presumed nature (malignant [primary tumor, local recurrence, (nodal) metastasis], benign, or indeterminate), and its location.
Additionally, to evaluate the incremental value of AI-based denoising, PET45 was compared with PET90 and PET45AI in 146 patients (the remaining patients were excluded from this comparison because of missing data).

Semi-quantitative analysis
Lesions were independently and semi-automatically segmented on each PET series, using the 50% 3D-isocontour of the maximal pixel value.
In each lesion volume of interest (VOI), the following standardized uptake values normalized to lean body mass (LBM) were measured: SULmax and SULpeak. LBM was estimated using the Janmahasatian formula [29].
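SUL replaces body weight with lean body mass in the SUV normalization. A minimal sketch of the Janmahasatian LBM estimate and of a decay-corrected SUL calculation follows. It assumes a tissue density of 1 g/mL and decay correction of the injected activity to scan time, and all function names are hypothetical; clinical software such as Syngo.via may differ in details, for example in how injection and scan times are recorded.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def lbm_janmahasatian(weight_kg, height_m, sex):
    """Lean body mass (kg) from the Janmahasatian formula."""
    b = bmi(weight_kg, height_m)
    if sex == "M":
        return 9270 * weight_kg / (6680 + 216 * b)
    if sex == "F":
        return 9270 * weight_kg / (8780 + 244 * b)
    raise ValueError("sex must be 'M' or 'F'")

def decay_correct(activity_mbq, elapsed_min, half_life_min=109.77):
    """Injected activity decayed to the scan time (fluorine-18 by default)."""
    return activity_mbq * 0.5 ** (elapsed_min / half_life_min)

def sul(conc_kbq_per_ml, injected_mbq, lbm_kg, elapsed_min):
    """SUL (g/mL): tissue concentration / (decay-corrected activity per gram of LBM)."""
    activity_kbq = decay_correct(injected_mbq, elapsed_min) * 1000.0
    return conc_kbq_per_ml / (activity_kbq / (lbm_kg * 1000.0))
```

For a 70-kg, 1.75-m male, `lbm_janmahasatian(70, 1.75, "M")` returns about 55.9 kg, so the SUL of a given voxel is roughly 20% lower than its body-weight SUV.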
The metabolic volume (MV) of each lesion and, when feasible, its short and long axes on the associated CT were measured.
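The 50% isocontour segmentation and the derived lesion metrics can be sketched as follows. This is an illustration under stated assumptions, not the Syngo.via algorithm: thresholding is applied inside a lesion VOI without a connectivity check, the voxel size is the 2 × 2 × 2 mm³ used for reconstruction, and SULpeak is simplified to the mean in a 1-mL sphere centered on the hottest voxel. A PERCIST-style SULpeak instead searches for the sphere position with the highest mean, and the text does not specify which SULpeak definition was used. Function names are hypothetical.

```python
import numpy as np

def segment_50pct(sul_voi):
    """Boolean mask of voxels >= 50% of the VOI maximum (no connectivity check)."""
    return sul_voi >= 0.5 * sul_voi.max()

def sul_peak_centered(sul_voi, voxel_mm=(2.0, 2.0, 2.0), volume_ml=1.0):
    """Mean SUL in a sphere of `volume_ml` centered on the hottest voxel (simplified)."""
    radius_mm = (3 * volume_ml * 1000 / (4 * np.pi)) ** (1 / 3)  # ~6.2 mm for 1 mL
    center = np.unravel_index(np.argmax(sul_voi), sul_voi.shape)
    grids = np.indices(sul_voi.shape).astype(float)
    # Squared distance (mm^2) of every voxel center from the hottest voxel.
    dist2 = sum(((g - c) * s) ** 2 for g, c, s in zip(grids, center, voxel_mm))
    return float(sul_voi[dist2 <= radius_mm ** 2].mean())

def metabolic_volume_ml(mask, voxel_mm=(2.0, 2.0, 2.0)):
    """Number of segmented voxels times the voxel volume, converted from mm^3 to mL."""
    voxel_ml = np.prod(voxel_mm) / 1000.0
    return float(mask.sum() * voxel_ml)
```

SULmax is then simply `sul_voi.max()`. With 2-mm voxels, each segmented voxel adds 0.008 mL to MV, which is one reason small lesions are sensitive to single-voxel changes in the isocontour.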
In addition, the reference liver SULmean and its standard deviation (SD) were measured in a 3-cm-diameter VOI in the right liver lobe, placed identically on each PET series.
Both standard and EARL1 (European Association of Nuclear Medicine Research Ltd) SUL were analyzed. EARL1 SUL was obtained numerically by Gaussian post-filtering within Syngo.via (EQ.PET filter [30]), with a full width at half maximum (FWHM) of 7.2 mm for all PET series. Our center is EARL-accredited, and we use EARL1 SUL for quantification in routine practice because it is transferable across PET systems and reconstructions [31].
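The sketch below illustrates only the operation underlying EARL1 harmonization, a separable 3D Gaussian filter whose FWHM of 7.2 mm is converted to a standard deviation via sigma = FWHM / (2·sqrt(2·ln 2)); it does not reproduce EQ.PET itself. The zero padding at the borders, the 3-sigma truncation, and the function names are assumptions, and each volume dimension must be at least as long as the kernel (11 voxels for 2-mm voxels) for `np.convolve(..., mode="same")` to preserve the shape.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # ~0.4247

def gaussian_kernel_1d(fwhm_mm, voxel_mm, truncate=3.0):
    """Normalized 1D Gaussian kernel sampled on the voxel grid."""
    sigma_vox = fwhm_mm * FWHM_TO_SIGMA / voxel_mm
    half = int(np.ceil(truncate * sigma_vox))
    x = np.arange(-half, half + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma_vox) ** 2)
    return k / k.sum()  # unit sum, so total activity is preserved

def gaussian_postfilter(volume, fwhm_mm=7.2, voxel_mm=(2.0, 2.0, 2.0)):
    """Separable 3D Gaussian smoothing, one axis at a time (zero padding at borders)."""
    out = volume.astype(float)
    for axis, vox in enumerate(voxel_mm):
        k = gaussian_kernel_1d(fwhm_mm, vox)
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, out)
    return out
```

Smoothing lowers the maximum of a small hot spot while preserving its total activity, which is consistent with the smaller PET45AI versus PET90 differences reported for EARL1 than for standard SULmax.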
Lesion-to-liver ratios (LLR) were calculated as lesion SUL / liver SULmean, and the liver coefficient of variation (CVliv) as liver SD / liver SULmean.
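The two ratios defined above translate directly into code. In this sketch (hypothetical function names), the liver SD is taken as the sample standard deviation of the voxel values in the liver VOI; the text does not state whether a sample or population SD was used.

```python
import statistics

def lesion_to_liver_ratio(lesion_sul, liver_sul_mean):
    """LLR = lesion SUL / liver SULmean."""
    return lesion_sul / liver_sul_mean

def liver_cv(liver_voxels):
    """CVliv = SD / mean of the liver VOI voxel values (sample SD assumed)."""
    mean = statistics.fmean(liver_voxels)
    return statistics.stdev(liver_voxels) / mean
```

Because CVliv divides the SD by the mean, a denoiser that lowers the SD while slightly raising the liver SULmean lowers CVliv on both counts.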

Statistical analysis
The Shapiro-Wilk test showed that all quantitative variables except denoising processing time and SUL differences were non-normally distributed; these variables are therefore expressed as median and interquartile range (IQR).
IQ scores between two PET series were compared by the Wilcoxon signed-rank test with continuity correction for paired data. Concordance rates of lesion detection between PET90 and PET45AI and between PET90 and PET45 were compared by the chi-squared test. Differences in continuous quantitative variables (semi-quantitative PET measures) between two PET series were statistically analyzed by the Wilcoxon signed-rank test for paired data.
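The analyses were run in R 4.0.2. As an illustration of the paired test used for IQ scores and SUL measures, the sketch below implements a Wilcoxon signed-rank test in plain Python with the normal approximation, tie correction, and continuity correction. It drops zero differences and, unlike R's `wilcox.test`, never switches to an exact test for small samples without ties, so p-values can differ slightly from R's in those cases. Function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(x, y):
    """Paired two-sided test; returns (W+, p) using the normal approximation."""
    d = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = n * (n + 1) / 4
    tie_sizes = {}
    for r in ranks:  # tied |d| values share one average rank
        tie_sizes[r] = tie_sizes.get(r, 0) + 1
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t ** 3 - t for t in tie_sizes.values()) / 48
    z = max(abs(w_plus - mean) - 0.5, 0.0) / math.sqrt(var)  # continuity correction
    p = math.erfc(z / math.sqrt(2))  # two-sided p-value
    return w_plus, min(p, 1.0)
```

With ordinal IQ scores (1 to 3), most paired differences are zero or tied, which is why the zero-dropping and tie correction matter in practice.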
Bland-Altman plots were used to display absolute SUL differences between PET90 and PET45AI, with limits of agreement (LOA) computed as the mean difference ± 1.96 × SD. Univariable and multivariable logistic regression analyses were performed to identify predictors of a decrease of more than 10% in lesion SULmax on PET45AI compared with PET90.
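Bland-Altman limits of agreement are simple to compute. In this sketch (hypothetical function name), differences are taken as PET45AI minus PET90 and the SD is the sample standard deviation of the differences; the pairwise means, (PET90 + PET45AI) / 2, would form the x-axis of the plot.

```python
import statistics

def bland_altman(reference, test):
    """Return (bias, lower LOA, upper LOA) for differences test - reference."""
    diffs = [t - r for r, t in zip(reference, test)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A negative bias for lesion SULmax would correspond to the lower values observed on PET45AI.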
This 10% threshold corresponds to the required SUL calibration accuracy (within 10%) for the VEREOS PET according to AAPM Report 126 [33]. A Bonferroni-corrected significance level was used in the univariable logistic regression analysis; elsewhere, P-values < 0.05 were considered statistically significant. Analyses were conducted with R version 4.0.2.
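The binary outcome for the logistic regression and the Bonferroni-adjusted threshold can be written as follows. This sketch assumes that the relative difference is computed as (PET45AI − PET90) / PET90 and that "over 10%" means strictly below −10%; the number of univariable tests passed to `bonferroni_alpha` is hypothetical, since the text does not state it. Function names are hypothetical, and fitting the regression itself, which would require a statistics package, is not shown.

```python
def relative_difference_pct(pet45ai, pet90):
    """Relative difference of PET45AI versus PET90, in percent."""
    return 100.0 * (pet45ai - pet90) / pet90

def decrease_over_10pct(pet45ai, pet90, threshold_pct=10.0):
    """Binary outcome: 1 if SULmax fell by more than `threshold_pct` percent."""
    return int(relative_difference_pct(pet45ai, pet90) < -threshold_pct)

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests
```

For instance, a lesion going from SULmax 5.0 on PET90 to 4.4 on PET45AI (−12%) is coded 1, whereas 4.5 (exactly −10%) is coded 0; with a hypothetical 10 univariable tests, the per-test level would be 0.005.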

Patient population and image processing
The main characteristics of the 195 patients included in this study are shown in Table 1. All half-count PET series (PET45) were successfully processed by the denoising software, with a mean processing time of 90 s (range, 45-122 s).
Regarding lesion detection, 33 of the 195 patients had a normal examination that was concordant on both PET series. In the remaining 162 patients, a total of 856 lesions were detected.
Of these, 836 lesions were visualized on both original PET90 and denoised PET45AI, giving a lesion concordance rate of 97.7%. Seven of the 856 lesions (0.8%), all small with low uptake, were detected only on PET90, in 6 patients (Table 2). Thirteen foci (1.5%) were detected only on PET45AI, in 10 patients, mostly corresponding to indeterminate liver lesions. An illustration is shown in Fig. 1.
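As a quick check of the detection figures, the sketch below (hypothetical function name) recomputes the concordance and discordance rates from the lesion counts, assuming each of the 856 lesions falls into exactly one of the three categories.

```python
def concordance_summary(concordant, ref_only, test_only):
    """Percentages of concordant and series-specific lesions, rounded to 0.1%."""
    total = concordant + ref_only + test_only
    groups = [("concordant", concordant), ("PET90 only", ref_only), ("PET45AI only", test_only)]
    return {name: round(100 * n / total, 1) for name, n in groups}

print(concordance_summary(836, 7, 13))
# {'concordant': 97.7, 'PET90 only': 0.8, 'PET45AI only': 1.5}
```

The two discordant categories together account for 20 of 856 lesions (2.3%).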
For 86% of lesions, readers had no preferred PET series for detection; original PET90 was preferred for 12% and PET45AI for 2%.

Statistical comparison of standard values
Lesion SULmax and SULpeak, LLR, and CVliv were significantly lower on denoised PET45AI than on original PET90 (P < 0.0001) (Table 3). In contrast, lesion MV and liver SULmean were higher on PET45AI than on PET90 (P < 0.0001). Lesion SUL, MV, LLR, and liver SULmean showed good-to-excellent correlation between the two PET series (ICC ranging from 0.873 to 0.998).
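The ICC model is not specified in the text. As an assumption, the sketch below computes two common Shrout and Fleiss forms from a two-way ANOVA on an n × k table (rows = lesions, columns = PET series): ICC(2,1), which penalizes systematic offsets such as the lower SULmax on PET45AI, and ICC(3,1), which measures consistency only. The function name is hypothetical.

```python
def icc_two_way(data):
    """Return (ICC(2,1), ICC(3,1)) for an n x k list of lists (rows = subjects)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Absolute agreement: systematic column (PET series) effects lower the ICC.
    icc21 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    # Consistency: column effects are ignored.
    icc31 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc21, icc31
```

With a constant offset between series, e.g. `[[x, x + 1] for x in range(1, 6)]`, ICC(3,1) is 1 while ICC(2,1) drops to about 0.83, which is why the choice of model matters when one series is systematically biased.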

Statistical comparison of EARL 1 values
Lesion EARL1 SULpeak did not differ significantly between the two PET series (P = 0.09). Otherwise, the comparison of EARL1 SUL and derived measures between PET90 and PET45AI gave results similar to those for standard measures.

Absolute and relative differences
Bland-Altman plots (Fig. 2) show the absolute differences between the two PET series in SULmax and SULpeak (both …). The other mean absolute differences were close to 0. The mean ± SD relative differences of PET45AI compared with PET90 were −9.48 ± 11.50% for standard SULmax, −3.41 ± 7.17% for standard SULpeak, −3.74 ± 7.34% for EARL1 SULmax, and −1.37 ± 5.71% for EARL1 SULpeak of lesions. For liver SULmean, the mean relative difference was +5.64 ± 4.75% and +5.88 ± 3.93% for standard and EARL1 measures, respectively. Table 4 shows lesion characteristics (size and uptake) according to detectability. Most discordant lesions, and most lesions for which one PET series was preferred, had low-to-moderate uptake and size.

Predictors of a decrease of over 10% in lesion SULmax
Multivariable logistic regression analysis identified two independent predictors of a SULmax decrease of over 10% in PET45AI compared with PET90, namely SULmax in PET45AI (P < 0.0001) and CT long axis (P = 0.01) (Table 5). Supplementary Fig. 3 shows that the smaller the lesion size on CT and the lower the SULmax, the greater the probability of a negative SULmax bias of over 10%.

Fig. 1 Two concordant and two discordant PET images between PET90 and PET45AI. In a, several hepatic metastases (oblique red arrows) and a spinal bone metastasis (vertical upward red arrows) in a female patient with breast cancer were detected on both original PET90 and denoised PET45AI. In b, a concordantly negative PET. In c, a low-uptake, subcentimetric left axillary lymph node (oblique red arrows) in a patient referred for left breast cancer staging, classified as indeterminate and detected only on original PET90. In d, an indeterminate liver focus annotated only on PET45AI (vertical downward red arrows) in a male patient scanned for advanced lung cancer staging.

Visual analysis
IQ scores were lower in PET45 (median: 2) than in both PET90 and PET45AI (median: 3), P < 0.0001. Poor IQ scores (= 1) were found only in PET45 scans (n = 12; 8%). IQ was scored moderate (= 2) in 99 (68%) PET45 examinations vs 13 (9%) PET90 and 16 (11%) PET45AI examinations, the remainder being of good image quality.

Semi-quantitative analysis
Lesion standard SULmax was significantly higher in PET45 than in PET90 (P ≤ 0.0001), with a mean ± SD relative bias of +3.30 ± 10.34%. Lesion standard SULpeak, EARL1 SULpeak, and EARL1 SULmax were similar in PET90 and PET45.

Discussion
This prospective study shows good visual and semi-quantitative performance of AI-denoised half-count PET compared with original PET on a digital PET/CT. We simulated a two-fold reduction in PET acquisition time and then applied a commercially available PET denoising software based on a U-Net DCNN. All PET series were successfully denoised within approximately 2 min (maximum 122 s) in an automatic workflow using a common GPU card, which is compatible with routine clinical use. Visually, global image quality scores were similar between PET90 and PET45AI, but lower and clinically insufficient in half-count PET45 because of high noise. We found few discordances (2.3%) between original PET90 and denoised PET45AI in the detection of 856 lesions.
Overall, 0.8% of lesions, in 3% of patients, were detected only on PET90. These were subcentimetric or small lesions with SULmax values of at most 3.1 g/mL. Of these "PET90-only" lesions, i.e. false negatives on PET45AI, 71% were classified as malignant and 29% as indeterminate. In all but one of these patients, other concordant malignant lesions were detected.
Overall, 1.5% of lesions, in 5% of patients, were visualized only on denoised PET45AI. These "false positives" were predominantly located in the liver and interpreted as indeterminate or benign foci. For most lesions, readers had no preferred PET series for detection. However, original PET was preferred for a minority of lesions (12%), and denoised PET less frequently (2%). On both original and denoised PET, preferred lesions showed variable uptake and size, mostly low-to-moderate. Greater reader experience with these new denoised PET images could further improve reading accuracy and comfort.
A higher lesion detection discordance rate (> 10%) was found between PET90 and half-duration PET45 than between PET90 and PET45AI, notably with additional false positives on PET45. This further indicates that non-denoised half-count PET is not suitable for routine clinical use. Similar results, including reduced diagnostic confidence when acquisition time was halved, were reported in [21].
Comparing lesion SUL measures between PET90 and PET45AI, only harmonized EARL1 SULpeak did not differ significantly when the same Gaussian post-filter was applied to both PET series. Standard SULpeak and standard and EARL1 SULmax were lower on denoised PET45AI than on original PET90. The mean relative difference remained below 10% for all lesion SUL measures. Greater SUL biases occurred especially in lesions with moderate size and uptake, mostly "non-target and non-evaluable lesions" according to PERCIST criteria [34,35]. In our quantitative study, all lesions were taken into account, and the overrepresentation of small, low-uptake lesions inflated the quantitative differences between the two PET series.

Table 5 Univariable and multivariable logistic regression analysis for predicting a negative ΔSULmax above 10% in PET45AI compared with PET90. * Statistically significant. ¹ Of PET45AI. OR, odds ratio; BMI, body mass index; MV, metabolic volume; CVliv, coefficient of variation in the liver; CVliv_Ratio = CVliv(PET45AI) / CVliv(PET90). PET45AI SULmax values were used to build the predictive model, focusing on the end result, namely denoised rather than original PET; the same results were obtained with original PET90 SULmax. A negative ΔSULmax above 10% concerned 383 lesions (46%). Few lesions showed an increase above 10% in SULmax on PET45AI vs PET90 (n = 9; 1.0%); these were not further analyzed.
In contrast, the reference liver SULmean was slightly higher (on average +6%) on PET45AI than on original PET90, while its noise level, as reflected by its standard deviation, was lower (on average −12% for standard CVliv). This decrease in CVliv highlights the denoising efficacy even when counts are halved.
Other research groups have reported even lower SUV biases despite larger count reductions, notably with CycleGAN architectures [16,17] or with SubtlePET™ (U-Net) [26]. However, their studies were performed on different and/or smaller cohorts.
A pilot study of 10 small lung nodules suggested that a fully 3D U-Net may offer better lesion quantitative performance than a 2.5D U-Net, as used in our study, with similar visual image quality [19]. However, a 2.5D U-Net is well suited to routine clinical practice owing to its shorter computation time and lower processing requirements.
Nevertheless, probably more important than these differences in semi-quantitative measures was their correlation between original PET90 and PET45AI, in particular for lesion SUL.
This inter-PET correlation was very high for lesion SUL and MV (ICC ≥ 0.97) and high for liver SULmean (ICC ≥ 0.87), attesting to the stability and reliability of these measures after PET count reduction and denoising.
A strength of our study is the large number of lesions of very different sizes, uptake, nature, and location.
This study has limitations. First, the side-by-side reading methodology could have enhanced detection accuracy on PET45(AI). Second, the clinical impact of denoised PET has not been established. Third, the unlimited number of lesions per patient may have introduced statistical bias through the overrepresentation of dependent lesions within the same patients. Fourth, the effect of AI-based denoising on image artifacts was not studied. A final minor drawback is the use of harmonized EARL1 SUL measures, which are still widely used, rather than the more recent EARL2 values [36].
Our study thus supports the routine use of SubtlePET™ combined with a two-fold faster PET acquisition.
The benefits of a shorter PET acquisition, namely reduced waiting times for appointments and help for patients who experience discomfort during scanning, outweigh the minor decrease in performance. Although not formally studied here, our findings could also support a reduction in injected activity or a combination of both (activity and time). An Italian group previously reported similar performance of SubtlePET™-processed PET with 33% less injected [18F]FDG activity compared with native PET on non-TOF analog PET/CT [25].
Further research should explore ways to improve performance, e.g. by optimizing the DL model and/or adapting the acquisition time over the liver and other regions of interest. Furthermore, large multicentric studies with different PET systems, reconstruction parameters, and various reductions in the [18F]FDG PET time-activity product are necessary. Striking the optimal balance between performance and time savings is essential. Research with other PET radiopharmaceuticals is also warranted.

Conclusion
This prospective study demonstrates satisfactory preservation of [18F]FDG PET image quality and quantification when AI-based denoising is applied to half-duration PET, compared with original full-duration PET. AI restored the degraded and clinically insufficient image quality of half-duration PET. It paves the way for a substantial reduction in acquisition time and for optimized use of PET imaging equipment in routine clinical practice.

…ration and statistical analysis), Cyril Jaudet, and Aurélien Corroyer-Dulmont (material preparation and data analysis). The first draft of the manuscript was written by Kathleen Weyts, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding We benefitted from a 1-month free trial period of SubtlePET™.
Data availability The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.