Introduction

Radioembolization with yttrium-90 (90Y) glass microspheres (TheraSphere™, Boston Scientific Corporation, Marlborough, MA, USA) is a well-established locoregional treatment option for hepatocellular carcinoma (HCC) [1]. In this procedure, glass microspheres of diameter 15–35 μm containing radioactive 90Y are administered to liver tumours via catheterization of the hepatic artery. Since tumour vasculature is supplied almost exclusively by the hepatic artery, the microspheres are flushed into the target tissue, where they lodge within tumour arterioles and deliver a localised radiotherapeutic dose via emitted beta radiation. The targeted administration of the microspheres facilitates an optimised dose distribution, in which tumours are maximally dosed and normal tissue is largely spared.

Studies evaluating the relationship of dose and clinical outcomes demonstrate that, where personalised dosing is used to achieve a maximised tumour absorbed dose, outcomes (including response and survival) are optimised [2,3,4]. Personalised dosimetry may be implemented by using a pre-treatment scintigraphy (planar, SPECT or SPECT/CT) study, from which anticipated tumour absorbed dose, normal tissue dose and lung dose can be calculated for each individual. Single-photon emission computed tomography (SPECT) and computed tomography (CT) hybrid imaging (SPECT/CT) using 99mTc-macroaggregated-albumin (MAA) is the accepted standard for the pre-treatment or ‘scout’ procedure, which forms the basis of predictive dosimetry calculations. It has been widely evidenced that [99mTc]Tc-MAA is a viable surrogate agent [5, 6] and is consequently recommended in best practice guidelines as an essential step in the 90Y treatment workup [1].

Tumour absorbed dose and lung shunt fraction are two of the most frequently cited quantitative metrics derived from multi-compartment analysis of [99mTc]Tc-MAA SPECT/CT [7] and represent important indicators of the expected efficacy and safety of the therapy. However, quantitative values in [99mTc]Tc-MAA SPECT/CT can exhibit variability due to technical factors including gamma ray attenuation in the patient, scatter in the patient and detector, imaging system resolution (partial volume effects), and image noise [8,9,10,11]. These factors can substantially impact quantitative accuracy. Comparable performance that allows for reliable comparison of quantitative values between systems is essential in multi-centre studies. To leverage [99mTc]Tc-MAA SPECT/CT as a dosimetry tool with minimal bias and variability, harmonization strategies must be implemented.

The TheraSphere Advanced Dosimetry Retrospective Global Study Evaluation in Hepatocellular Carcinoma Treatment (TARGET) study focused on the exploration of multi-compartment dosimetry based on [99mTc]Tc-MAA SPECT/CT in an international, retrospective multi-centre study [12]. Images were generated independently by each participating centre, and therefore a range of different imaging devices and protocols were used in the generation of the dataset. A broad spectrum of SPECT/CT systems were included, ranging from first generation scanners to those incorporating contemporary technologies with advanced acquisition and reconstruction capabilities. Similarly, a range of imaging protocols were included with varying acquisition and reconstruction parameters. These protocols made reference to international guidelines [13] and so shared some commonalities, but were largely distinct.

Conducting inter-site comparisons between centres using different devices, and where there was no uniform approach to imaging procedures, results in variability which must be characterised. In addition, widespread clinical application of the resulting dose thresholds beyond the study requires a harmonized method to quantify the SPECT images.

The aim of this phantom study was twofold: firstly, to use multicentre phantom data to evaluate inter-site variability of dosimetry based on [99mTc]Tc-MAA imaging, and secondly, to implement and evaluate a standardised imaging protocol.

Materials and methods

Phantom images were used to identify the impact of acquisition and reconstruction related factors on dosimetry via measurement of simulated Lung Shunt Fraction (LSF) and tumour to normal tissue (T:N) ratio (evaluated via Contrast Recovery Coefficient (CRC)).

Phantom preparation

Each participating centre was provided with two phantoms: a National Electrical Manufacturers Association (NEMA) NU-2 image quality (IQ) phantom (Data Spectrum Corporation, NC) and a hollow cylinder (diameter 20 ± 2 cm).

NEMA IQ phantom

For the liver model a NEMA IQ phantom compliant with NU2 2007 standards [14] was utilised, which contained 6 spheres (representing tumoral liver) of diameter 10, 13, 17, 22, 28 and 37 mm and a cylindrical insert of diameter 50 mm inside a 9.7 L background volume (representing non-tumoral liver). Spherical inserts were filled with [99mTc]Tc-pertechnetate of activity concentration 0.24 MBq/ml, and the background volume with a concentration of 0.03 MBq/ml, such that there was an approximate 8:1 sphere to background ratio and a total activity of 300 MBq, per the TARGET protocol. The selected activities exceed those typically used clinically, due to the substantially larger volume of the NEMA phantom compared to a standard liver.
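
As a rough plausibility check of the fill values quoted above, the following sketch recomputes the sphere-to-background concentration ratio and the approximate total activity from the nominal diameters, concentrations and background volume. It is an illustration only: the helper name `sphere_volume_ml` is hypothetical, the spheres are treated as ideal with the stated diameters as inner diameters, and any activity in the cylindrical insert is ignored.

```python
import math

# Nominal values quoted in the text; everything else here is illustrative.
SPHERE_DIAMETERS_MM = [10, 13, 17, 22, 28, 37]
SPHERE_CONC_MBQ_PER_ML = 0.24
BACKGROUND_CONC_MBQ_PER_ML = 0.03
BACKGROUND_VOLUME_ML = 9700.0  # 9.7 L background compartment


def sphere_volume_ml(diameter_mm: float) -> float:
    """Volume of an ideal sphere in ml (1 ml = 1000 mm^3)."""
    radius_mm = diameter_mm / 2.0
    return (4.0 / 3.0) * math.pi * radius_mm ** 3 / 1000.0


# Activity in the six spheres and in the background compartment (MBq).
sphere_activity = SPHERE_CONC_MBQ_PER_ML * sum(
    sphere_volume_ml(d) for d in SPHERE_DIAMETERS_MM
)
background_activity = BACKGROUND_CONC_MBQ_PER_ML * BACKGROUND_VOLUME_ML
total_activity = sphere_activity + background_activity

# Nominal sphere-to-background concentration ratio.
nominal_ratio = SPHERE_CONC_MBQ_PER_ML / BACKGROUND_CONC_MBQ_PER_ML

print(f"sphere activity     ~ {sphere_activity:.1f} MBq")
print(f"background activity ~ {background_activity:.1f} MBq")
print(f"total activity      ~ {total_activity:.1f} MBq")
print(f"nominal ratio       ~ {nominal_ratio:.1f}:1")
```

Under these assumptions the background compartment carries almost all of the activity, which is consistent with the stated total of approximately 300 MBq.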

Cylindrical phantom

For the lung model a cylindrical phantom of diameter 20 ± 2 cm was utilised, with length sufficient to cover the axial field of view (FOV). The exact dimensions and volume of the phantom were recorded by each site. The phantom was filled with water and 30 MBq was added, such that there was an approximate 10:1 ratio between the NEMA phantom activity and cylindrical phantom activity.

Each site provided details of injected activity, residual activity, volume measurements, date/time of measurements and dose calibrator information.

Phantom positioning

The phantoms were positioned adjacent to each other on the scanner couch (Fig. 1). The relative positioning of the phantoms was dependent on the given site's acquisition protocol and could be either feet-first or head-first.

Fig. 1

NEMA and cylindrical phantom positioned for a feet-first protocol

Phantom imaging

Site-specific protocols

Each participating centre was asked to perform planar and SPECT acquisitions based on their site-specific clinical protocol used for pre-treatment [99mTc]Tc-MAA imaging. Planar images of the cylindrical phantom (representing the lungs) and NEMA IQ phantom (representing the liver) were used in the LSF evaluation. SPECT images of the NEMA IQ phantom were used for CRC evaluation. If patient data included in the TARGET clinical study was generated with more than one imaging protocol (i.e. an old and updated protocol), imaging was to be repeated for every protocol that had been utilised to scan patients.

Standardised protocol

Centres were then asked to perform a second acquisition and reconstruction using a standardized protocol (parameters shown in Table 1), which was identical for every centre. Reconstructions were centrally reviewed to confirm whether they matched the standard protocol. If a participating centre was unable to fully apply the standardized protocol due to technical reasons, a protocol deviation was noted within the final dataset, and the affected data were excluded. The TARGET phantom study standardised acquisition and reconstruction protocol was designed to ensure widespread utility on a variety of scanner types and was not optimised for lesion detection or image quality.

Table 1 Acquisition and reconstruction parameters for the TARGET standardised protocol

Data collection

Each participating centre was required to submit full activity measurements and dose calibrator information, as well as a complete description of the site-specific planar and SPECT acquisition protocols (field of view positioning, acquisition parameters etc.) and site-specific reconstruction protocols (attenuation correction, scatter correction, point spread function modelling).

Planar data, SPECT projection data and reconstructed SPECT images generated with the site-specific and standardised protocol respectively were provided by each centre, together with the co-registered CT scan.

Image analysis

Lung shunt fraction investigation

The LSF was measured based on planar images, which is standard in clinical practice. Regions of interest (ROIs) were drawn around the liver and lungs using the NEMA IQ phantom and cylindrical phantom, respectively. The LSF was calculated as a percentage based on the counts identified in these ROIs, using Eq. 1.

$$LSF = \frac{{Counts_{Lung} }}{{Counts_{Liver} + Counts_{Lung} }}$$
(1)
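
A minimal sketch of Eq. 1, expressed as a percentage as described in the text. The function name and the input validation are illustrative choices, and the count values in the usage lines are hypothetical.

```python
def lung_shunt_fraction(counts_lung: float, counts_liver: float) -> float:
    """Return the lung shunt fraction (Eq. 1) as a percentage."""
    total = counts_lung + counts_liver
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * counts_lung / total


# Hypothetical ROI counts with a 1:10 lung-to-liver ratio.
example_lsf = lung_shunt_fraction(counts_lung=1000.0, counts_liver=10000.0)
print(f"LSF = {example_lsf:.1f}%")
```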

ROIs were defined automatically according to the following steps. First, a scatter window was subtracted from the photopeak window using a k-factor of 0.5, with a correction for any difference in window widths. A geometric mean image was then constructed by blurring the anterior and posterior views, multiplying them, and taking the square root on a pixel-by-pixel basis. ROIs were automatically defined around the liver phantom and the lung phantom separately [15], and both ROIs were dilated to ensure all counts were included. For data acquired using the standardised protocol, scatter correction was applied retrospectively and centrally.
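
The scatter subtraction and geometric mean steps can be sketched as follows. This is not the study's implementation: it assumes the width correction scales the scatter image by the ratio of photopeak to scatter window widths, that negative pixel values are clamped to zero, and that the posterior view has already been mirrored to align with the anterior view. Blurring, ROI definition and dilation are omitted, and all function names are hypothetical.

```python
import math


def scatter_corrected(photopeak, scatter, k=0.5, peak_width=1.0, scatter_width=1.0):
    """Subtract a width-normalised scatter image from a photopeak image.

    Images are lists of rows of counts. Negative results are clamped to
    zero, which is an assumption made here for illustration.
    """
    scale = k * peak_width / scatter_width
    return [
        [max(p - scale * s, 0.0) for p, s in zip(p_row, s_row)]
        for p_row, s_row in zip(photopeak, scatter)
    ]


def geometric_mean(anterior, posterior):
    """Pixel-by-pixel geometric mean of two aligned planar views."""
    return [
        [math.sqrt(a * b) for a, b in zip(a_row, p_row)]
        for a_row, p_row in zip(anterior, posterior)
    ]


# Hypothetical 1 x 2 pixel views, used only to show the arithmetic.
anterior = [[100.0, 4.0]]
posterior = [[64.0, 9.0]]
gm_image = geometric_mean(anterior, posterior)

# Hypothetical 15% photopeak and 10% scatter windows.
corrected = scatter_corrected([[100.0]], [[40.0]], k=0.5,
                              peak_width=15.0, scatter_width=10.0)
```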

Tumour dosimetry investigation

Contrast recovery coefficients were calculated and regarded as a proxy measure of tumour to normal tissue ratio. SPECT images of the NEMA IQ phantom were analysed by registering a volume of interest (VOI) template over the spherical inserts and in the background compartment. Contrast recovery coefficients were calculated for each sphere size by dividing the measured contrasts by the nominal, true, contrasts determined from the reported injected activities, as shown in Eq. 2.

$$CRC = \frac{{C_{s} /C_{B} - 1}}{R - 1}$$
(2)

where Cs is the mean measured signal in the spherical VOIs, CB the mean measured signal in the background, and R the imposed sphere-to-background activity concentration ratio. Sphere VOIs were defined automatically using a fitting routine in which a template of the spheres was registered to the SPECT and the mean signal in the VOIs was simultaneously optimised. To estimate the background signal, an annular VOI was placed onto a 5 cm stack of slices adjacent to those containing the spheres. This method deviates from the NEMA standard (which stipulates that many circular VOIs be placed around the spherical VOIs). However, this was deemed necessary because the reconstructed activity distribution in images generated without attenuation correction or scatter correction becomes inhomogeneous, with hotter regions near the edge of the phantom, and consequently the background signal would be overestimated.
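
Eq. 2 can be written as a short function. This is a sketch under the definitions given above; the function name, the input checks and the example values are illustrative and not taken from the study.

```python
def contrast_recovery_coefficient(mean_sphere: float,
                                  mean_background: float,
                                  true_ratio: float) -> float:
    """Contrast recovery coefficient as defined in Eq. 2.

    mean_sphere     -- mean measured signal in the sphere VOI (Cs)
    mean_background -- mean measured signal in the background VOI (CB)
    true_ratio      -- imposed sphere-to-background concentration ratio (R)
    """
    if mean_background <= 0:
        raise ValueError("background signal must be positive")
    if true_ratio == 1:
        raise ValueError("true ratio must differ from 1")
    return (mean_sphere / mean_background - 1.0) / (true_ratio - 1.0)


# Hypothetical measurement: measured ratio 5.2:1 against a true ratio of 8:1.
example_crc = contrast_recovery_coefficient(5.2, 1.0, 8.0)
print(f"CRC = {example_crc:.2f}")
```

A CRC of 1 corresponds to full recovery of the imposed contrast, and a CRC of 0 to a sphere indistinguishable from the background.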

Statistical methods

Summary statistics were calculated for the standardised protocol and all site-specific protocols (including from centres with multiple site-specific protocols and excluding data obtained where deviations from the standard protocol were noted), for the endpoints given in Table 2.
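
Median and interquartile range, the summary statistics reported throughout the results, could be computed as sketched below. The quartile convention is an assumption (the text does not state which was used), and the input values are hypothetical.

```python
import statistics


def median_and_iqr(values):
    """Return (median, lower quartile, upper quartile).

    Uses the 'inclusive' quartile method of the standard library; other
    conventions give slightly different quartiles for small samples.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


# Hypothetical per-protocol values, for illustration only.
summary = median_and_iqr([1.0, 2.0, 3.0, 4.0, 5.0])
print("median %.1f (IQR %.1f-%.1f)" % summary)
```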

Table 2 Statistical endpoints

The analysis of the difference in LSF and CRC between centres constituted a camera-by-camera comparison using a paired t test, where the standardised protocol was compared with the site-specific protocol for the same camera (NB. Only sites which contributed data for the standardised and site-specific subgroups were included in this paired analysis).
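
The paired comparison described above rests on the per-camera differences between the two protocols. The sketch below computes the paired t statistic and its degrees of freedom using only the standard library; obtaining a p-value additionally requires the Student t distribution, which is not included here. The function name and the example values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(standardised, site_specific):
    """Return (t, degrees of freedom) for paired samples.

    Each index must refer to the same camera under both protocols.
    """
    if len(standardised) != len(site_specific) or len(standardised) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(standardised, site_specific)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical paired values for four cameras.
t_value, dof = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])
```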

Processing of the data was performed centrally, using the Python programming language.

Results

General overview

Data was collected over 4 years from 2016 to 2020. A total of 23 SPECT systems (25 site-specific SPECT protocols and 16 site-specific planar protocols) across 14 sites were used for imaging within the TARGET study and included in the current analysis (an overview is given in Table 3). The datasets were 39% (n = 9) from Siemens, 52% (n = 12) from GE and 9% (n = 2) from Philips.

Table 3 Summary of SPECT/CT systems and site-specific imaging protocols per system

There were 18 hybrid SPECT/CT scanners, which included a CT scan for attenuation correction (AC); 3 systems were SPECT only and relied on either a post-registered CT or Chang's correction method [16] for attenuation correction; and 2 systems did not supply any attenuation-corrected reconstructed data. Scatter correction (SC) was implemented in 19 of the site-specific protocols, and point spread function (PSF) modelling in 14 (Table 3).

Lung shunt fraction investigation

Phantom filling

The ratio between activity in the cylindrical ‘lung’ phantom and the NEMA IQ ‘liver’ phantom as required by the protocol was 1:10, corresponding to a ‘true’ lung shunt fraction of 1/(1 + 10), or approximately 9.1%. Based on the activities reported by the participating centres, the median ‘true’ lung shunt fraction was 9.1% (interquartile range (IQR): 8.9–9.3%). These values are summarised in Table 4, and the raw activity data are given in Supplemental file table S1.
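
The reference value can be reproduced from the nominal fill activities (30 MBq and 300 MBq). The sketch below also defines a relative error helper; that definition (measured minus true, divided by true) is an assumption about how an LSF error might be expressed, and the measured value used is hypothetical.

```python
def true_lsf_percent(lung_activity_mbq: float, liver_activity_mbq: float) -> float:
    """Reference lung shunt fraction (%) from the filled activities."""
    return 100.0 * lung_activity_mbq / (lung_activity_mbq + liver_activity_mbq)


def relative_error_percent(measured: float, true: float) -> float:
    """Signed relative error (%) of a measured value against a reference."""
    return 100.0 * (measured - true) / true


nominal_lsf = true_lsf_percent(30.0, 300.0)            # nominal 1:10 fill
example_error = relative_error_percent(10.0, nominal_lsf)  # hypothetical measurement
print(f"nominal true LSF = {nominal_lsf:.1f}%")
```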

Table 4 Summary of true Lung Shunt Fraction (LSF) values based on injected activities for LSF investigation and the measured Lung Shunt Fraction values for the site-specific and standardised protocols

Site-specific protocols

After correcting for partial and missing entries and excluding datasets where the phantom was not fully visible in the field of view, data was available for 16 cameras and 16 protocols. Of these, 8 cameras and 8 protocols had both standardised and site-specific data and were used for the paired analysis. Across the 8 considered protocols, the primary emission window was consistent, being centred on 140 keV with a window width of 10–20%. Feet-first positioning was used in 100% of protocols.

The median measured lung shunt fraction for the site-specific protocols was 9.9% (9.6–10.1%), corresponding to a median LSF error of 8.8%.

Standardised protocol

The median measured LSF for the standard protocol was 8.6% (8.4–9.0%) and LSF error was 5.0%. The LSF values as measured on the site-specific and standardised protocols were significantly different (p < 0.001).

The median LSF for the site-specific and standardised protocols is shown in Fig. 2 and a summary of measured LSF is given in Table 4.

Fig. 2

LSF as measured on site-specific protocols and the standardised protocol. Boxplots summarizing the lung shunt fraction values for the site-specific (left) and standardised (right) protocols. The box represents the 25th to 75th percentile range and the central horizontal line represents the median value. The whiskers represent the range. The dashed line represents the true LSF value as determined from injected activities

Tumour dosimetry investigation

Phantom filling

According to protocol, the intended ratio between activity in the spherical inserts and background compartment was 8:1. From the activities reported by the participating centres, the median ‘true’ sphere to background was 7.9:1 (7.8–8.1). These values are summarised in Table 5, and the raw activity data is given in Supplemental file table S1.

Table 5 Summary of sphere to background values based on injected activity for tumour dosimetry investigation and the CRC values for the 37 mm sphere for the site-specific and standardised protocols

Site-specific protocols

Data was available for 22 cameras and 25 site-specific protocols, of which 11 cameras and 11 protocols supplied both standardised and site-specific data and were used for the paired analysis. Attenuation correction using a co-registered CT was performed in the majority of protocols (20/25, 80%). Scatter correction was less uniformly applied, being performed in 17/25 (68%) protocols. A full overview of the site-specific protocols is given in Supplemental file table S2. For the paired subgroup, 100% of the protocols used CT-based attenuation correction and 90% used scatter correction.

Figure 3 shows the distribution of sphere recoveries of all submitted data for the site-specific protocols. A large range in sphere recoveries was observed. CRC generally increased with sphere size, due to the larger impact of the partial volume effect on smaller structures. However, there were large differences for equal sphere sizes between centres/cameras. As an example, for the largest insert (37 mm diameter) the median CRC was 0.61 (0.53–0.69), and CRCs ranged from 0.35 to 1.01. The median S:B ratio was 5.28 (4.63–5.80) and S:B error was -2.64. A full overview of CRC for each insert, camera and protocol is given in Supplemental file table S3. Applying both attenuation correction and scatter correction yielded the highest average CRC: the median CRC for the 37 mm sphere for the subset of site-specific protocols applying both corrections was 0.68 (0.61–0.69). By comparison, in the subset where neither attenuation correction nor scatter correction was applied, CRC was lowest and showed a larger variance, with a median CRC for the 37 mm sphere of 0.51 (0.41–0.53).

Fig. 3

CRCs by sphere diameter for inserts 1 to 6 for site-specific acquisition and site-specific reconstruction protocols. A scatter plot of contrast recovery coefficient by sphere diameter for the 25 site-specific protocols

Within the paired subgroup, the site-specific median CRC for the 37 mm sphere was 0.63 (0.61–0.69), and CRCs ranged from 0.41 to 0.77 (Table 5). The CRCs for the site-specific protocols within the paired subgroup are shown in Fig. 4A.

Fig. 4

CRCs by sphere diameter for inserts 1 to 6 for site-specific acquisition and site-specific reconstruction protocols (paired subgroup) and for the standardised protocol. A A scatter plot of contrast recovery coefficient by sphere diameter for the 11 site-specific protocols which were included in the paired analysis. Protocols include both AC and SC, with the exception of the ‘GE Infinia’ protocol which includes AC and no SC. B A scatter plot of contrast recovery coefficient by sphere diameter for the 11 cameras that provided data acquired via the standardised protocol

Standardised protocol

Data was available for 11 cameras and 11 protocols from 6 different centres. Figure 4B shows the CRCs obtained when the standardized acquisition and processing protocol was imposed.

Taking the largest sphere as an example, the median CRC was 0.78 (0.61–0.82), and the range was from 0.45 to 0.92. The median S:B ratio was 6.15 (5.19–6.85) and S:B error was -1.86. The CRC values from all (n = 25) site-specific protocols and the standardised protocol were significantly different (unpaired t-test p = 0.046). There was no significant difference between the CRC values of the site-specific paired subgroup (n = 11) and the standardised protocol (paired t-test p = 0.09).

A summary of measured CRC values for the site-specific (paired subgroup) and standardised protocols is given in Table 5.

Figure 5 highlights the variability in CRC for each sphere diameter using all site-specific protocols within the paired subgroup and the standardised protocol from all systems. The standardised protocol was associated with a higher CRC on average but greater variability.

Fig. 5

CRCs by sphere diameter for inserts 1 to 6 for site-specific protocols (paired subgroup) and the standardised protocol. The dashed lines represent the mean CRC across site-specific protocols (paired subgroup) and sites that provide data for the standardised protocol. Boxplots summarizing the CRC range for the site-specific (left) and standardised (right) protocols are included for each sphere diameter. The box represents the 25th to 75th percentile range and the central horizontal line represents the median value. The whiskers represent the range

Discussion

The TARGET phantom sub-study aimed to evaluate inter-site variability in [99mTc]Tc-MAA imaging and, specifically, to report on variability in two dosimetric quantities (LSF and T:N) due to technical and procedural differences between sites. To fully leverage the benefits of [99mTc]Tc-MAA scout imaging, quantitative metrics derived from [99mTc]Tc-MAA SPECT should be comparable between scanners, sites and studies. This study demonstrates that the use of key image corrections, specifically AC and SC, significantly reduced inter-system variability, whilst standardization of other reconstruction parameters (iterations, subsets and post-filtering) did not improve consistency.

Of the two metrics considered in this study, the LSF investigation exhibited better consistency between sites. Results showed the LSF was overestimated by approximately 8.8% using the site-specific protocols and the LSF IQR between different protocols was 9.6–10.1%. Greater variability was noted in the tumour to normal tissue investigation, where CRC was demonstrated to vary substantially when site-specific imaging protocols were used. As an example, for the largest sphere in the NEMA IQ phantom, the CRC IQR was 0.5–0.7 and the two most differing sites recorded CRCs of 0.35 and 1.01, respectively, meaning that if the same patient were scanned in two participating centres, the outcome in apparent tumour absorbed dose could differ by more than a factor of two. This demonstrates the potential variability that can be expected when sites use different imaging systems and methods.
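
To illustrate the scale of the difference between the two extreme CRC values, Eq. 2 can be inverted to give the apparent sphere-to-background ratio, S:B = 1 + CRC × (R − 1). The sketch below assumes the nominal ratio R = 8 and is an illustration of the arithmetic only; translating an apparent ratio into tumour absorbed dose involves further steps that are not modelled here. The function name is hypothetical.

```python
def apparent_ratio(crc: float, true_ratio: float) -> float:
    """Apparent sphere-to-background ratio obtained by inverting Eq. 2."""
    return 1.0 + crc * (true_ratio - 1.0)


NOMINAL_RATIO = 8.0  # nominal 8:1 fill, assumed for this illustration

lowest = apparent_ratio(0.35, NOMINAL_RATIO)   # site with the lowest CRC
highest = apparent_ratio(1.01, NOMINAL_RATIO)  # site with the highest CRC
factor = highest / lowest

print(f"apparent S:B: {lowest:.2f} vs {highest:.2f} (factor {factor:.2f})")
```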

An additional aim of the study was to investigate the impact of a harmonization strategy involving implementation of a standardized imaging protocol. Firstly, considering the LSF investigation, the LSF measured with the standardised protocol was underestimated by 5.0%, a smaller error than that noted for the site-specific protocols; however, the IQR was not reduced (8.4–9.0%). This indicates that the standardised protocol did not have a positive impact on variability but did improve accuracy. For the tumour dosimetry investigation, imposing the standardized protocol improved average performance. For the majority of centres (70%), the average CRC of the largest sphere showed a positive bias relative to the site-specific protocols, and the S:B error was reduced, indicating that accuracy was improved. The IQR, however, was not improved: for example, the largest-sphere CRC IQR increased from 0.16 (0.53–0.69) for the site-specific protocols, where no specific corrections were imposed, to 0.21 (0.61–0.82) for the standardized protocol. It is evident that implementing a single, standardised protocol does not necessarily reduce variability, as it is still necessary to account for the different properties of the collimators and the reconstruction algorithms of the various cameras. A key finding was that eliminating sources of possible variation in image corrections substantially reduced inter-system variability in quantification. For the subset of sites that provided both site-specific and standardised datasets, and which largely included attenuation and scatter correction as part of their site-specific protocols, the initial large variability in recovery coefficients was reduced. By removing inconsistency in only two parameters (AC and SC), the IQR in CRC for the 37 mm sphere halved from 0.2 to 0.1.

Our results demonstrate that technical factors have a non-negligible impact on [99mTc]Tc-MAA image-based dosimetry, and dose targets reported in multi-centre trials should be interpreted in this context. The benefit of defining specific dose targets when imaging practice remains markedly inconsistent is inherently limited. Efforts to define accurate dose thresholds must be matched by efforts to standardise imaging practice to maximise efficacy.

A simple step that may be taken to maximise consistency between centres is to ask centres to perform key image corrections (i.e. scatter correction and attenuation correction). Since the site-specific subgroup of images, all of which applied AC and SC, was in fact more consistent than the images obtained from the standardised protocol, there is merit in investigating an approach that requires image corrections and procedural guidelines but leaves the specific application to the discretion of individual centres, which have greater insight into their own imaging systems.

Whilst the standardized protocol did require centres to perform these corrections, the protocol was prescriptive and did not leave much room for centres to optimise, leading to some inconsistent behaviours between different imaging systems. Several factors could have introduced inconsistencies. For example, the standardized protocol stipulated that a 5 mm Gaussian filter be used for post-filtering; however, these filters may be implemented differently in the various reconstruction algorithms used by the respective vendors. Similarly, the energy window width in the standardized protocol was required to be 15% (140 keV ± 7.5%). Many centres had a default window width of 20% and thus, in order to adhere to the standardized protocol, the energy window settings were changed. For some scanners this would also require the scanner to be peaked, and skipping the peaking procedure would result in poorer quality images. Finally, by tuning the number of iterations, site-specific protocols typically balanced image noise and reconstruction time at the expense of a lower CRC. Since convergence rates vary between the reconstruction algorithms of individual imaging systems, the standardized protocol utilized a relatively high number of iterations to assure full convergence and optimize the CRC. However, this resulted in high noise levels in several cases. A key takeaway, therefore, is that close agreement between sites via implementation of a standardised protocol for SPECT is only partly relevant, and potentially only pertinent for the acquisition parameters. Due to the different reconstruction algorithms and collimator specifics, a better harmonization approach could involve focusing on the CRC metric itself and tuning reconstruction parameters accordingly, leading to different settings for different cameras. Ideally, this should be facilitated through a central entity that analyses the data, an approach which has been successfully demonstrated in the EARL initiative. A similar methodology, of implementing a standardised acquisition protocol and performing reconstruction centrally, has been successfully implemented for SPECT quantification of 177Lu [17] and 99mTc [18] in phantoms.

This study has several limitations. Firstly, site-specific differences in phantom preparation and activity measurement likely contributed to variability in the results; however, as a specific preparation protocol was made available and the selected phantoms are standard in the field, this variance is expected to be small compared with the variability under investigation (i.e., that due to acquisition and reconstruction). Secondly, the LSF was inferred from two phantoms that were both homogeneously filled with activity and water, representing the ‘liver’ and ‘lungs’. In reality, lungs are much less dense, activity is non-uniformly distributed, and the liver and lungs often overlap in the planar view, which in practice results in substantial overestimation of the LSF. For this reason, many centres consider SPECT/CT as an alternative to planar imaging (alternative methods using SPECT/CT have been investigated and proven to be superior [19,20,21]). Finally, the standardized protocol for this phantom sub-study was designed considering a system from a specific vendor. Individual cameras from the various other imaging vendors implement different reconstruction algorithms, and thus a standardized protocol based on one imaging system was not capable of encompassing all scanner processes.

The dataset collected in this work encompasses a wide range of system types and [99mTc]Tc-MAA SPECT/CT imaging protocols and thus provides a representative insight into the large variability evident in the field. Future research should build on existing works [12] to establish an evidence-based standardised practice of acquiring and reconstructing [99mTc]Tc-MAA SPECT/CT images for the purpose of pre-treatment dosimetry. The harmonization of imaging procedures is now endorsed by several professional societies and organizations [22,23,24]. Much focus has been given to reducing variability of PET image quantification in multi-centre settings, e.g. the EARL initiative [25], and more recently the AAPM launched a scheme to enhance consistency in 90Y Bremsstrahlung imaging, again based on quality control procedures using phantoms for standardisation [26]. However, as yet there is no standardisation programme for pre-treatment [99mTc]Tc-MAA SPECT/CT dosimetry. Investment in an accreditation scheme similar to EARL for [99mTc]Tc-MAA SPECT/CT dosimetry would be a valuable future endeavour to help advance the use of quantitative [99mTc]Tc-MAA SPECT/CT imaging. In the interim, publications on dose–effect relationships and reported dose thresholds should comment on their centre-specific imaging factors (e.g., system type, protocol parameters, image quality as measured via CRC) so that other centres may put results into context before implementing them clinically.

In conclusion, this study shows that quantification of [99mTc]Tc-MAA SPECT/CT is feasible in a multi-centre phantom study, and that high-quality, clinically relevant data can be obtained. Over the range of cameras and site-specific planar protocols investigated, comparable performance was noted in the lung shunt investigation, which suggests suitability for quantitative analysis of LSF in a scenario analogous to that of pre-treatment dosimetry work-up. Site-specific SPECT protocols included in this study were not capable of consistently reconstructing [99mTc]Tc-MAA activity distributions, and there were large differences in CRC between different protocols for the same size structure. By eliminating sources of difference in image corrections between protocols, variation in quantification was reduced. A subset of site-specific protocols that implemented key image corrections (AC and SC) had a reduced range compared to the full site-specific dataset. The standardised protocol did not improve consistency between sites in either the LSF investigation or the tumour dosimetry investigation but did improve accuracy.