Quantitative evaluation of a deep learning-based framework to generate whole-body attenuation maps using LSO background radiation in long axial FOV PET scanners

Attenuation correction is a critically important step in data correction in positron emission tomography (PET) image formation. The current standard method involves conversion of Hounsfield units from a computed tomography (CT) image to construct attenuation maps (µ-maps) at 511 keV. In this work, the increased sensitivity of long axial field-of-view (LAFOV) PET scanners was exploited to develop and evaluate a deep learning (DL) and joint reconstruction-based method to generate µ-maps utilizing background radiation from lutetium-based (LSO) scintillators. Data from 18 subjects were used to train convolutional neural networks to enhance initial µ-maps generated using a joint activity and attenuation reconstruction algorithm (MLACF) with transmission data from LSO background radiation acquired before and after the administration of 18F-fluorodeoxyglucose (18F-FDG) (µ-mapMLACF-PRE and µ-mapMLACF-POST, respectively). The deep learning-enhanced µ-maps (µ-mapDL-MLACF-PRE and µ-mapDL-MLACF-POST) were compared against MLACF-derived and CT-based maps (µ-mapCT). The performance of the method was also evaluated by assessing PET images reconstructed using each µ-map and computing volume-of-interest based standardized uptake value measurements, percentage relative mean error (rME) and relative mean absolute error (rMAE) relative to the CT-based method. No statistically significant difference was observed in rME values between µ-mapDL-MLACF-PRE and µ-mapDL-MLACF-POST in fat-based soft tissue, water-based soft tissue or bones, suggesting that the presence of radiopharmaceutical activity in the body had negligible effects on the resulting µ-maps. The rMAE values of µ-mapDL-MLACF-POST were reduced by a factor of 3.3 on average compared to the rMAE of µ-mapMLACF-POST. Similarly, the average rMAE values of PET images reconstructed using µ-mapDL-MLACF-POST (PETDL-MLACF-POST) were 2.6 times smaller than the average rMAE values of PET images reconstructed using µ-mapMLACF-POST.
The mean absolute errors in SUV values of PETDL-MLACF-POST compared to PETCT were less than 5% in healthy organs, less than 7% in brain grey matter and 4.3% for all tumours combined. We describe a deep learning-based method to accurately generate µ-maps from PET emission data and LSO background radiation, enabling CT-free attenuation and scatter correction in LAFOV PET scanners.


Introduction
Attenuation correction of PET emission data is one of the essential corrections in PET image formation for accurate quantification. In the early generation of PET scanners, attenuation of 511 keV annihilation photons was measured from a separate transmission scan using an external radionuclide-based source (e.g. germanium-68) [1] and an attenuation map (µ-map) was generated. Although this method was able to directly measure the attenuation factors at the same energy as the annihilation photons, it suffered from noisy data and long acquisition times [2]. With the introduction of combined PET/CT systems [3], linear attenuation coefficients (LACs) at 511 keV are estimated from CT images (µ-map CT) using a bilinear relationship with Hounsfield unit values [4,5].
Recently introduced long axial field-of-view (LAFOV) PET/CT systems have enabled total-body PET imaging using a single bed position [6,7]. In addition to large anatomical coverage that includes major body organs without the need for any bed movement, these systems markedly increase system sensitivity and noise equivalent count rates compared to standard axial FOV (SAFOV) PET scanners [8][9][10][11]. Furthermore, LAFOV systems were shown to provide PET images with superior image quality compared to SAFOV systems. These technological advancements can be utilized in a clinical setting by reducing the activity of the injected radiotracer without compromising the image quality and quantification accuracy [12,13] and by reducing the PET examination time [14][15][16]. However, the benefits of low-dose PET examinations using LAFOV PET systems can be hindered by the dose associated with the CT scans performed for attenuation correction. While the CT provides important additional diagnostic information and accurate anatomical localization of PET findings, there are potentially numerous situations in which the requirement for CT can be waived: for example, where an anatomical CT scan is available from previous examinations performed during the work-up of the patient. Furthermore, CT-less protocols could be desirable in low-dose PET/CT examinations for screening, in paediatric scans to minimize the risks of ionizing radiation to the health of young patients, or in research protocols.
The development of lutetium-based scintillators, such as lutetium oxyorthosilicate (LSO) scintillators [17], and the introduction of silicon-based photomultipliers (SiPM) [18] resulted in substantial improvements in coincidence timing resolution with values close to 200 ps [10,19], increasing the accuracy and robustness of the PET image reconstruction process with time-of-flight (TOF) PET reconstruction algorithms [20,21]. These advances also increased the potential of methodologies which seek to jointly estimate the activity and attenuation from TOF-PET data [22][23][24], such as maximum likelihood estimation of attenuation and activity (MLAA) or maximum likelihood estimation of activity and attenuation correction coefficients (MLACF). Previous work has shown that incorporation of prior information, such as anatomical information derived from magnetic resonance imaging (MRI) data or other sources, can be used to improve the robustness of joint reconstruction methods, scatter correction in particular, by providing initial conditions [25][26][27]. Hwang et al. have shown that MLAA-derived µ-maps from a PET/MRI scanner can be used as input data to a deep learning-based method to synthesize more accurate attenuation maps [28].
The radioisotope lutetium-176 (176Lu) found in the LSO scintillators of PET detectors decays with a half-life of 38 billion years, emitting gamma rays with energies of 307, 202, and 88 keV during the process [29]. We have previously demonstrated that this LSO background radiation can be detected using a high sensitivity LAFOV PET scanner and developed a method to generate µ-maps using the MLACF algorithm with LSO transmission (LSO-TX) data (µ-map MLACF) [30]. In this paper, we extend the previous method by incorporating a deep learning-based model to synthesize enhanced whole-body µ-maps (µ-map DL-MLACF) based on µ-map MLACF images. First, we perform a quantitative comparison of µ-maps generated using the proposed deep learning-enhanced MLACF method against µ-maps generated using the MLACF and CT-based methods. Second, we evaluate the performance of the proposed method using pre- and post-injection LSO-TX measurements. Finally, we compare the PET images reconstructed using µ-maps based on MLACF-, DL-MLACF-, and CT-based methods and assess their quantitative performance on healthy and malignant tissues.

Patient population
Within this study, 18

Imaging protocol
The data used within this work were acquired using a dynamic PET protocol where the tracer administration was performed in the scanner. The protocol is illustrated in Fig. 1. Before the administration of 18F-fluorodeoxyglucose (18F-FDG) activity, a 5-min LSO-TX acquisition was performed using a special acquisition protocol with open energy (160-725 keV) and coincidence timing windows (6.64 ns). Subsequently, 18F-FDG was injected into the left or right arm (average activity: 234.6 ± 54.9 MBq, target dose: 3 MBq/kg) and list-mode PET data were acquired for 65 min using a Biograph Vision Quadra (Siemens Healthineers, Hoffman Estates, IL, USA) LAFOV PET/CT system. In this work, we only used the PET emission data from 55 to 65 min post injection. After the PET acquisition, a second set of LSO-TX list-mode data was acquired for 5 min (65 to 70 min post injection). At the end of the study, low-dose CT (pitch factor: 1, maximum voltage: 120 kV, maximum tube current 90 mAs, CareDose4D, CarekV) data were acquired as part of the clinical examination. The CT images were reconstructed with a voxel size of 1.52 × 1.52 × 1.65 mm³. Figure 2 depicts the different methods used to generate attenuation maps in this work. CT-based µ-maps were generated by converting the Hounsfield units (HU) of the reconstructed CT images to linear attenuation coefficients at 511 keV using a bi-linear transformation [5]. These µ-maps were resampled to a 440 × 440 × 645 matrix with a voxel size of 1.65 × 1.65 × 1.65 mm³, as used in standard PET reconstructions.
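The bi-linear HU-to-µ conversion mentioned above can be sketched as follows. This is only an illustration of the general shape of such a transformation, not the calibration used in this work or in [5]: the water coefficient, the break point and the bone slope are placeholder assumptions, and the function name `hu_to_mu` is hypothetical.

```python
# Illustrative bi-linear mapping from CT Hounsfield units to 511 keV
# linear attenuation coefficients (LACs). All constants are assumptions.
MU_WATER_511 = 0.096  # cm^-1, commonly quoted LAC of water at 511 keV


def hu_to_mu(hu, bone_slope=5.1e-5, break_hu=0.0):
    """Return an approximate LAC (cm^-1) at 511 keV for one HU value.

    Below the break point the voxel is treated as an air/water mixture;
    above it, a shallower slope accounts for bone, whose attenuation at
    CT energies rises faster than at 511 keV. `bone_slope` (cm^-1 per HU)
    and `break_hu` are placeholders that depend on tube voltage in practice.
    """
    if hu <= break_hu:
        mu = MU_WATER_511 * (1.0 + hu / 1000.0)
    else:
        mu_at_break = MU_WATER_511 * (1.0 + break_hu / 1000.0)
        mu = mu_at_break + bone_slope * (hu - break_hu)
    return max(mu, 0.0)  # clamp small negative values from noisy air voxels


if __name__ == "__main__":
    for hu in (-1000, -100, 0, 500, 1000):
        print(hu, round(hu_to_mu(hu), 4))
```

The two segments meet at the break point, so the mapping is continuous; a real implementation would take the slopes from a scanner- and kVp-specific calibration.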

MLACF-derived µ-maps
The list-mode data acquired with wide-open energy and coincidence-timing windows were post-processed and two LSO-TX sinograms, corresponding to LSO-TX at 307 keV and 202 keV, were generated by extracting events using energy windows of 275 to 355 keV and 165 to 247 keV, respectively. Two initial µ-maps were reconstructed from these sinograms using the maximum likelihood for transmission tomography method [31] with 8 iterations and 3 subsets. These µ-maps were mapped to 511 keV, and then averaged and smoothed using a Gaussian filter with a full width at half maximum (FWHM) of 4 mm. The resulting LSO-TX derived µ-map and the TOF emission sinogram were used as inputs to the MLACF algorithm to jointly reconstruct a PET image and an MLACF-derived attenuation map using 20 global iterations [30]. Two sets of MLACF-derived attenuation maps were generated by using the LSO-TX data acquired pre- and post-18F-FDG injection, referred to as µ-map MLACF-PRE and µ-map MLACF-POST, respectively, in the rest of the paper. To minimize the effects of motion artefacts, MLACF-derived µ-maps were co-registered to CT-derived µ-maps by applying a combination of rigid and non-rigid registration using the NiftyReg package [32]. The bending energy weight was set to 0.1% to constrain the degrees of freedom of the non-rigid deformation [33].
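The energy-window extraction described above can be illustrated with a short sketch. It assumes, for simplicity, that each event carries a single photon energy in keV; real list-mode LSO-TX events are coincidences with more structure, and the names `WINDOWS_KEV` and `sort_events` are hypothetical.

```python
# Sort events into the two LSO-TX energy windows quoted in the text.
# Simplifying assumption: one energy value (keV) per event.
WINDOWS_KEV = {
    "lso_tx_307": (275.0, 355.0),  # window around the 307 keV gamma line
    "lso_tx_202": (165.0, 247.0),  # window around the 202 keV gamma line
}


def sort_events(energies_kev):
    """Group event energies by window; events outside both are dropped."""
    sorted_events = {name: [] for name in WINDOWS_KEV}
    for energy in energies_kev:
        for name, (low, high) in WINDOWS_KEV.items():
            if low <= energy <= high:
                sorted_events[name].append(energy)
                break  # the windows do not overlap, so one match suffices
    return sorted_events


if __name__ == "__main__":
    demo = [150.0, 200.0, 300.0, 400.0, 511.0]  # made-up energies
    print(sort_events(demo))
```

Each group would then be histogrammed into its own sinogram before the transmission reconstruction.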

Deep learning based µ-maps
Convolutional neural networks (CNNs) were trained to enhance the MLACF-derived µ-maps using the paired µ-map CT as target images. To achieve this, we used a three-dimensional U-Net architecture [34,35] with five down-sampling and five up-sampling layers, with the parametric rectified linear unit (PReLU) as the activation function.
Multiple patches with a matrix size of 64 × 64 × 64 were used in the training. The input images were normalized to zero mean and unit variance. We performed data augmentation by randomly applying ± 20% image scaling and ± 10% image rotation. We trained and tested the networks using fivefold cross-validation, where for each fold, the data were split into 14 training (78% of the data) and 4 testing sets (22% of the data). Separate models were trained using µ-map MLACF-PRE and µ-map MLACF-POST images as input images and the same cross-validation folds were used across these models. The predicted attenuation maps from models trained using µ-map MLACF-PRE and µ-map MLACF-POST images are referred to as µ-map DL-MLACF-PRE and µ-map DL-MLACF-POST, respectively.
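The input preparation described above (normalization, random 64 × 64 × 64 patches and random scaling) might look like the following minimal sketch. It is not the training pipeline used in this work: the volume shape is taken from the resampled µ-map matrix as an assumption, the sampling distributions are assumed to be uniform, and all function names are hypothetical. Rotation is omitted because the text does not state what the ± 10% refers to.

```python
import random
import statistics

VOLUME_SHAPE = (440, 440, 645)  # assumed: matrix of the resampled mu-maps
PATCH = 64                      # patch edge length quoted in the text


def normalise(values):
    """Rescale a flat list of voxel values to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant image: nothing to scale
    return [(v - mean) / std for v in values]


def sample_patch_origin(rng, shape=VOLUME_SHAPE, patch=PATCH):
    """Pick a random corner so that the patch fits inside the volume."""
    return tuple(rng.randrange(0, dim - patch + 1) for dim in shape)


def sample_scale(rng, max_fraction=0.20):
    """Draw a scaling factor within +/- 20% (uniform draw is an assumption)."""
    return 1.0 + rng.uniform(-max_fraction, max_fraction)


if __name__ == "__main__":
    rng = random.Random(0)
    print(normalise([0.0, 0.086, 0.096, 0.13]))
    print(sample_patch_origin(rng), sample_scale(rng))
```

Using a seeded `random.Random` instance keeps the augmentation reproducible across the separately trained pre- and post-injection models.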

PET image reconstruction
The PET emission data from 55 to 65 min post injection were reconstructed using µ-map MLACF-PRE, µ-map MLACF-POST, µ-map DL-MLACF-PRE, µ-map DL-MLACF-POST and µ-map CT for each subject. The PET images reconstructed using the different µ-map methods are referred to as PET MLACF-PRE, PET MLACF-POST, PET DL-MLACF-PRE, PET DL-MLACF-POST, and PET CT. The PET images were reconstructed with a PSF + TOF algorithm using 4 iterations and 5 subsets in a dedicated image reconstruction software prototype (e7-tools, Siemens Healthineers). The emission data were corrected for decay, randoms, and scatter. The image matrix was set to 440 × 440 × 645 with a voxel size of 1.65 × 1.65 × 1.65 mm³. A Gaussian post-reconstruction filter was applied with a FWHM of 2 mm.
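For readers implementing a comparable post-reconstruction filter, the FWHM has to be converted to a Gaussian standard deviation in voxel units. The sketch below shows that conversion and a normalized one-dimensional kernel; it is an illustration only, and the truncation radius and function names are assumptions rather than settings of the reconstruction software.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert FWHM to standard deviation: sigma = FWHM / (2*sqrt(2*ln 2))."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(fwhm_mm, voxel_mm, truncate=4.0):
    """Build a normalized 1D Gaussian kernel sampled on the voxel grid.

    `truncate` (kernel half-width in units of sigma) is an assumed value.
    A separable 3D filter applies this kernel along each axis in turn.
    """
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    radius = max(1, math.ceil(truncate * sigma_vox))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # 2 mm FWHM on the 1.65 mm voxel grid quoted in the text
    print(round(fwhm_to_sigma(2.0), 4), gaussian_kernel_1d(2.0, 1.65))
```

With a 2 mm FWHM and 1.65 mm voxels the standard deviation is roughly half a voxel, so the filter is a very mild smoothing.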

Data analysis
The generated attenuation maps and PET images were evaluated using regional analyses. The percentage relative mean error (rME) and relative mean absolute error (rMAE) values were calculated using Eqs. 1 and 2, where I_x represents µ-maps generated using MLACF- or deep learning-based methods or PET images reconstructed using these µ-maps. Similarly, I_ref represents µ-map CT or PET CT. The µ-map CT images were segmented into 3 VOIs: water-based soft tissue, fat-based soft tissue and bones using a thresholding algorithm. Bones were segmented by only including voxels with a LAC greater than 0.105 cm⁻¹ followed by a flood-fill operation to include the bone marrow in the segmentations. Fat- and water-based soft tissue segmentations were obtained by retaining voxels with LAC values within the 0.080-0.090 cm⁻¹ range and the 0.090-0.105 cm⁻¹ range, respectively. Furthermore, three-dimensional segmentations of liver, lungs, kidneys, spleen, grey and white matter of the brain were obtained using a semi-automatic approach [36,37]. Hypermetabolic tumour lesions (n = 24) were delineated by a qualified nuclear medicine physician using an isocontour tool (PMOD 4.1, threshold set to 50% of max value).
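Because the explicit forms of Eqs. 1 and 2 are not reproduced here, the sketch below assumes one common voxel-wise definition: rME is the mean of 100 × (I_x − I_ref) / I_ref over the voxels of a VOI, and rMAE is the mean of the absolute value of the same quantity. The tissue thresholds follow the values quoted above; assigning the shared boundary of 0.090 cm⁻¹ to water-based tissue, omitting the flood-fill step for bone marrow, and all function names are assumptions.

```python
def classify_lac(mu):
    """Label a reference LAC (cm^-1) with the thresholds quoted in the text."""
    if mu > 0.105:
        return "bone"
    if 0.090 <= mu <= 0.105:
        return "water"  # boundary value 0.090 assigned here by assumption
    if 0.080 <= mu < 0.090:
        return "fat"
    return None  # air, lung and other voxels are left unlabelled


def relative_errors(test, ref):
    """Return (rME, rMAE) in percent under the assumed voxel-wise definition."""
    diffs = [100.0 * (x - r) / r for x, r in zip(test, ref) if r != 0]
    if not diffs:
        raise ValueError("no voxels with a non-zero reference value")
    rme = sum(diffs) / len(diffs)
    rmae = sum(abs(d) for d in diffs) / len(diffs)
    return rme, rmae


def errors_by_tissue(test, ref):
    """Compute (rME, rMAE) per tissue class defined on the reference map."""
    groups = {}
    for x, r in zip(test, ref):
        label = classify_lac(r)
        if label is not None:
            xs, rs = groups.setdefault(label, ([], []))
            xs.append(x)
            rs.append(r)
    return {label: relative_errors(xs, rs) for label, (xs, rs) in groups.items()}


if __name__ == "__main__":
    ref = [0.085, 0.085, 0.096, 0.096, 0.13]        # made-up reference LACs
    test = [0.0935, 0.0765, 0.1056, 0.0864, 0.13]   # made-up test LACs
    print(errors_by_tissue(test, ref))
```

The toy data illustrate why both metrics are reported: errors of +10% and −10% cancel in rME but not in rMAE.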

Statistical tests
Nonparametric two-sided Wilcoxon signed-rank tests were used to assess differences between different µ-maps and reconstructed PET images. Differences were considered statistically significant for P-values lower than 0.05. Spearman's rank correlation was used to assess any potential relationship between the accuracy of the method and patient BMI; Spearman's rank coefficient (r_s) and P-values are reported.
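As an illustration of the rank correlation used for the BMI analysis, the following sketch computes Spearman's coefficient as the Pearson correlation of average ranks. It does not compute a P-value, the example numbers are invented, and the function names are hypothetical; in practice a statistics package would normally be used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r_s: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    bmi = [21.0, 24.5, 27.0, 32.5, 42.0]   # made-up BMI values (kg/m^2)
    rmae = [2.1, 1.9, 2.6, 3.0, 9.5]       # made-up rMAE values (%)
    print(spearman_r(bmi, rmae))
```

Working on ranks makes the coefficient insensitive to the magnitude of a single extreme value, which matters when one subject is a clear outlier.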

Results
Attenuation maps generated using the CT-based (µ-map CT), MLACF (µ-map MLACF-PRE and µ-map MLACF-POST) and deep learning-enhanced MLACF (µ-map DL-MLACF-PRE and µ-map DL-MLACF-POST) methods and corresponding rME maps for a representative subject are shown in Fig. 3. There were no visual differences between µ-map MLACF-PRE and µ-map MLACF-POST, or between µ-map DL-MLACF-PRE and µ-map DL-MLACF-POST. The µ-maps generated using the MLACF-based method had some artefacts, where the attenuation coefficients in the skull, skin, and bladder of the patient were overestimated. These artefacts were markedly reduced in the µ-maps generated using the deep learning-based method. Overall, the deep learning-enhanced MLACF method produced µ-maps with less noise and a good visual resemblance to µ-map CT.
These findings were further validated with the quantitative VOI-based assessments shown in Fig. 4, in which outliers are plotted as individual points. One patient had a BMI of 42 kg/m² and was an outlier in terms of rME and rMAE values. The µ-maps of this patient and of another larger patient with a BMI of 32.5 kg/m² are illustrated in supplementary Figs. 1 and 2, respectively. Figure 5 shows PET images of a representative subject reconstructed using CT-, MLACF-, and deep learning-enhanced MLACF µ-maps together with their rME maps. The PET DL-MLACF-PRE and PET DL-MLACF-POST images closely resembled the PET CT images. The VOI-based rMAE results showed a 3.0-times reduction in fat-based soft tissue, a 2.4-times reduction in water-based soft tissue and a 2.5-times reduction in bones in PET DL-MLACF-POST compared to PET MLACF-POST images (Fig. 6). Similar to the µ-map results, no significant difference was observed between VOI-based rMAE values of PET DL-MLACF-POST and PET DL-MLACF-PRE images (P = 0.78 in fat-based soft tissue, P = 0.91 in water-based soft tissue and P = 0.98 in bones). Figure 7 illustrates the average percentage error in SUV mean values in organs of interest and brain grey and white matter. PET DL-MLACF-POST achieved an average absolute error of less than 4% in the liver and spleen, 4.7% in the lungs, 6.7% in the grey matter, and 5.6% in the white matter of the brain. Figure 8 shows the absolute errors in SUV mean of tumour lesions, grouped by anatomical location. Bone lesions showed a 3.2-times absolute error reduction for PET DL-MLACF-POST compared to PET MLACF-POST, while thoracic lesions demonstrated a 2.7-times absolute error reduction. On average, PET DL-MLACF-POST achieved an absolute percentage error of 3.6% in abdominal, 2.9% in bone, 4.4% in pelvic, and 4.8% in thoracic lesions. We observed larger errors in cervical lesions for all methods, where the mean absolute error was 12.7% for PET DL-MLACF-POST and 12.9% for PET DL-MLACF-PRE.
However, it should be noted that these results were highly influenced by the values from the patient with an outlier BMI of 42 kg/m², who had seven cervical lesions. Excluding this subject, the average absolute error was 4.2% for PET DL-MLACF-POST and 3.5% for PET DL-MLACF-PRE. The absolute error, averaged across all tumours excluding the outlier patient, was reduced from 9.6% for PET MLACF-POST to 4.3% for PET DL-MLACF-POST, with a statistically significant difference between methods (P < 0.001). With all tumours combined, no significant difference was observed between PET DL-MLACF-PRE and PET DL-MLACF-POST (P = 0.23).
Fig. 5 Top row: PET images of a representative subject reconstructed using the CT-, MLACF-, and deep learning-enhanced MLACF-based attenuation maps. PET images reconstructed using MLACF- and DL-based µ-maps generated using pre- and post-injection LSO-TX data are shown separately. Bottom row: voxelwise maps of relative error distribution of PET images relative to the PET image reconstructed using the CT-based µ-map
As also described above, we observed relatively larger errors in µ-maps and reconstructed PET images of the

Discussion
The introduction of LAFOV PET scanners with increased system sensitivity compared to SAFOV PET scanners opens opportunities for low-dose PET imaging protocols. Although the risks of the equivalent dose associated with nuclear medicine imaging are modest [38], there remains sufficient concern to warrant a number of studies exploring the potential for lower activity PET scans without compromising image quality via a number of approaches [39,40]. However, the value of low-dose PET imaging protocols can be hindered by the CT scans required for attenuation correction. Although CT is a critical part of most clinical PET/CT studies and delivers important anatomical and diagnostic information to the interpreting physician, further reductions in the patient dose through omission of the CT component could find utility in some specific clinical scenarios. For instance, a CT-less method for PET attenuation correction might be desirable in longitudinal or follow-up PET scans where a CT scan is already available from the patient's work up. The higher sensitivity of LAFOV systems can be exploited for acquisition of images at later time points [41], dual-time-point studies [42,43], or as part of abbreviated dynamic imaging protocols [44]. It can also be used for dose reduction in neuroimaging studies where an MR scan is often available for anatomical information or to reduce radiation exposure in cancer screening and paediatric studies.
In this work, we exploited the high sensitivity of a LAFOV PET system to detect LSO-TX events and used a joint reconstruction and deep learning-based method to construct attenuation maps from the LSO-TX data. Qualitative and quantitative analyses indicate that the deep learning-enhanced MLACF method was able to generate µ-maps with better resemblance to CT-based µ-maps than the µ-map MLACF, particularly reducing the overestimation of the attenuation coefficients in the skin and skull of the patients, addressing the crosstalk issues around the bladder, and reducing the noise present in µ-map MLACF. PET images reconstructed with µ-map DL-MLACF-PRE and µ-map DL-MLACF-POST showed rME values within −3.6% in fat-based soft tissue, water-based soft tissue, and bones. Furthermore, mean organ and tumour SUV values calculated from PET DL-MLACF-PRE and PET DL-MLACF-POST images had less than 7% absolute error compared to mean SUV values from PET CT images. Quantitative VOI-based comparisons showed no significant differences between µ-map DL-MLACF-PRE and µ-map DL-MLACF-POST. These results indicate that the presence of PET activity had negligible effect on the quality of LSO-TX images and that the proposed method achieved comparable performance with pre- and post-injection LSO-TX data. The LSO-TX data can also be acquired simultaneously with PET emission data, in our case reducing the total scan duration to five minutes.
The use of deep learning-based methods in PET attenuation correction has become increasingly popular, particularly in PET/MRI imaging where the lack of CT-based attenuation maps introduced significant challenges to accurate PET quantification [45]. In previous work, CNNs were trained using coregistered MR and CT images to generate pseudo-CT based µ-maps for head [46,47] and pelvis [48][49][50], which were shown to be more accurate compared to vendor-provided atlas-based µ-maps. However, supervised deep learning techniques such as CNNs have limited performance in generating whole-body µ-maps, as these techniques require perfectly aligned MR and CT whole-body images, which are not straightforward to obtain. As an alternative, unsupervised methods with a cycle-consistent GAN architecture were used to generate attenuation-corrected PET images from non-attenuation-corrected PET images [51,52]. Most related to our work, Hwang et al. [28] generated whole-body µ-maps using a CNN and initial µ-maps generated using the MLAA joint reconstruction algorithm with TOF emission data. However, the lack of initial attenuation and emission images can cause challenges in scatter correction during the joint reconstruction process and can lead to unscaled µ-maps with inaccurate attenuation factors [27]. In a more recent work, Hwang et al. proposed incorporating non-attenuation-corrected PET images in their method to estimate the scatter distribution [53]. Here, we suggest the use of an LSO-TX-derived µ-map to provide initial conditions for scatter correction in the MLACF joint reconstruction algorithm.
In this work, we used CT-based µ-maps as target images during the training and evaluation of the methodology. While CT-based PET attenuation correction is often considered the gold standard, it can also suffer from some limitations. Truncation or beam-hardening artefacts can be introduced to CT images when the patient's arms are present in the field-of-view [54]. This is particularly an issue for patients with large BMIs [3]. Previous work has also shown that the use of CT-based AC can lead to some bias in linear attenuation coefficients of cancellous and compact bones, albeit the minor PET quantification errors caused by this might only be clinically significant in quantitative bone studies [55]. Furthermore, potential patient motion and the respiratory movement of the chest between PET and CT acquisitions can lead to spatial mismatch of images which can lead to incorrect PET attenuation correction factors [56]. In this work, to make a fair comparison, MLACF and DL-MLACF based µ-maps were co-registered to their CT pairs. However, it can be argued that the proposed method is less prone to misregistration errors when the LSO-TX data is simultaneously acquired with the PET emission data. Further evaluation with phantom data is required to assess the performance of our method in such scenarios.
Another limitation of this study was the relatively small sample size of our training set. In this work, we used cross-validation to train and test our method using data from 18 subjects. Since one of the aims of this work was to evaluate the accuracy of the proposed method with LSO-TX data acquired pre- and post-tracer administration, the data used in this study were acquired using a dynamic 18F-FDG protocol where the tracer administration was performed in the scanner. The logistical challenges of these lengthy dynamic scan protocols limited the size of our study cohort. In principle, the size of the training set can be increased in future studies using only post-injection LSO-TX data. We observed larger errors for one subject whose BMI was 41% above the average population BMI, suggesting that the proposed method might have limited performance in very large patients (i.e. BMI > 40 kg/m²). This can be addressed in future work by enlarging the data pool and including a more diverse population of patient data (e.g. larger patients) in the model training. Furthermore, the MLACF-based µ-maps, which are used as the only input to our model, were jointly reconstructed using LSO-TX and 18F-FDG emission data. Further investigation is needed to assess the performance of our method with other PET radiotracers. Finally, the introduction of LAFOV PET/CT systems demonstrates great potential in reducing patient dose in PET examinations. Further work includes evaluation of the method in PET scans with lower injected activities of radiopharmaceuticals.

Conclusion
We present the development and initial validation of a deep learning-based method to synthesize CT-free attenuation maps using information from LSO transmission and PET emission data. We demonstrated that the proposed method was able to generate accurate attenuation maps, independent of the timing of the LSO-TX scan, with strong correlation to CT-based attenuation maps. Results presented in this work suggest that the proposed method can enable CT-free quantitative PET imaging which might be beneficial in certain clinical scenarios and research studies.

Declarations
Ethics approval The local Institutional Review Board approved the study (KEK 2019-02193) and written informed consent was obtained from all patients. The study was performed in accordance with the Declaration of Helsinki.
Conflict of interest HS is a full time employee of Siemens Healthineers. MT, VP, DP, and MC are full time employees of Siemens Medical Solutions USA, Inc. AR has received research support and speaker honoraria from Siemens Healthineers. No other conflicts of interests were reported.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.