Introduction

Positron emission tomography (PET) can measure in vivo biological processes at the molecular level. In clinical practice, PET is an essential imaging modality for diagnosis of various diseases [1]. Furthermore, PET can be used as a research tool to elucidate human physiological and pathological processes. The advantage of PET imaging is its high quantitative accuracy [2]. Various quantitative metrics are used according to the purpose of each PET imaging. The most popular metric is a standardized uptake value (SUV), which is normalized by body weight. The SUV is calculated by the following equation:

$$\mathrm{SUV}= \frac{{\mathrm{AC}}_{\mathrm{VOI}} (\mathrm{kBq}/\mathrm{mL})}{\mathrm{ID} \left(\mathrm{MBq}\right) / \mathrm{BW} (\mathrm{kg})},$$

where ACVOI is the average (or the maximum) activity concentration in the specified volume of interest (VOI), ID is the injected dose of radiopharmaceuticals, and BW is the body weight. The SUV is a unitless metric based on the assumption that human tissue density is equal to the density of water (1 g = 1 mL). Tracer uptakes in specified regions can be quantitatively evaluated with SUVs.

In treatment response assessment studies, quantitative metrics such as SUVs and their percentage change can be a surrogate marker to assess the therapeutic response [3,4,5]. For assessing the therapeutic response using pre-therapy and follow-up FDG-PET, changes in tumor SUVs have been used as primary and secondary endpoints in numerous studies and trials [4, 6, 7]. In EORTC criteria and PERCIST [8, 9], the tumor response is classified into four categories: complete metabolic response (CMR), partial metabolic response (PMR), stable metabolic disease (SMD), and progressive metabolic disease (PMD). In such cases, SUVs would be one of the key biomarkers for clinical decision-making.

However, quantitative values derived from PET are variable depending on technical, biological, and physical factors [10,11,12]. In multicenter studies using quantitative metrics, the variability of the metrics may have a significant impact on the study outcomes [13, 14]. Table 1 summarizes main factors affecting SUVs. Appropriate scanner calibration and quality control effectively reduce technical errors. Standardization of imaging protocols including patient preparation can reduce the unwanted biological bias and variability. Finally, any necessary harmonization strategies should be implemented to minimize the inter-scanner physical variability. These three-steps are key to get reliable (repeatable) and comparable (reproducible) quantitative values. Figure 1 presents the expected merits of harmonization. Harmonization leads to precise outcomes even with small datasets, because the non-pathophysiological bias and variation can be removed [15]. To appropriately use PET as an imaging biomarker, the benefits, limitations, and remaining challenges for harmonization of PET should be known. This review summarizes harmonization strategies for quantitative PET, including whole-body PET in oncology, brain PET in neurology, PET/MR, and non-18F emerging applications.

Table 1 Summary of factors affecting the SUV
Fig. 1
figure 1

Expected merits of harmonization of quantitative values in PET. True changes were + 10% (the blue point) for the placebo group and − 25% (the red point) for the treatment group. Bias and variability in quantitative values are reduced by harmonization

Harmonization strategies for whole-body PET in oncology

Several organizations have constructed scanner qualification programs for whole-body FDG-PET imaging in oncology. All programs involve standardization of imaging protocols and image quality, and some programs focus more on harmonization of quantitative metrics based on phantom experiments. Figure 2 shows representative phantoms used for harmonization of PET and Table 2 summarizes studies related to PET harmonization [13, 14, 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. While earlier studies focused only on the variability in SUVs, recent studies have assessed clinical outcomes for various cancers using various quantitative metrics such as textural features. PET harmonization can drive the use of PET in clinical oncology studies.

Fig. 2
figure 2

© SNMMI

Photographs and PET images of representative phantoms used for harmonization: ACR phantom and its PET image (A), SNMMI-CTN chest phantom and its PET images (B), and NEMA NU-2 image quality phantom and its PET image (C). The PET image of the ACR phantom is reprinted from the paper by DiFilippo et al. [34]. The photograph and PET images of the SNMMI-CTN chest phantom are reprinted from the paper by Sunderland et al. [36]. These studies were originally published in JNMT and JNM, respectively.

Table 2 Publications relating to harmonization of PET

The American College of Radiology Imaging Network (ACRIN) has verified the accuracy of average SUVs using a uniform cylindrical phantom (5.92–8.88 or 8.14 kBq/mL; 6283 or 9293 mL). A 90% circular region-of-interest (ROI) of the interior diameter was applied to the phantom image, and the acceptable average SUV was 1.0 ± 0.1. Scheuermann et al. [35] reported that 12% of scanners (12/101) they tested failed due to incorrect SUV or normalization calibrations. This result suggested that verification of accurate SUV calibration is very important in multicenter studies.

The Society of Nuclear Medicine and Molecular Imaging (SNMMI) Clinical Trial Network (CTN) uses an anthropomorphic chest phantom with fillable spheres for validating the quantitative performance of PET/CT scanners [36, 37]. The standard radioactivity concentration ratio between the spheres and background are 4:1. SUVmax for small spheres and SUVmean for background regions were measured to assess the scanner calibration accuracy and quantitative performance for small lesions [36]. The acceptance criterion for the SUVmean of the uniform background was 1.0 ± 0.1.

In 2010, the European Association of Nuclear Medicine (EANM) Research Ltd. (EANM/EARL) launched a PET/CT scanner accreditation program [38]. The program uses a NEMA NU-2 image quality phantom to measure SUV recovery coefficients in relation to sphere size (diameters: 10–37 mm). The phantom is filled with 18F solutions, and the sphere-to-background radioactivity concentration ratio is 10 (20 and 2 kBq/mL) [22]. Maximum, mean, and peak recovery coefficients are measured to check whether they are within pre-specified upper and lower limits. High-resolution scanners sometimes required down-smoothing to fit this range. Consequently, the first standard limits EARL1 were updated to EARL2 according to the progress in PET instrumentation and reconstruction technology (Fig. 3A) [38, 39].

Fig. 3
figure 3

Harmonization ranges for maximum recovery coefficients, which were proposed by EANM (A) and JSNM (B). The range of the 5th-to-95th percentiles of SUVmax data for 23 scanners is shown here as the JSNM-WG proposed range

The Radiological Society of North America/Quantitative Imaging Biomarker Alliance (RSNA/QIBA) created the FDG-PET/CT profile to characterize and reduce the variability in SUVs. The QIBA profile comprehensively covers acquisition, reconstruction and post-processing, and analysis and interpretation to obtain SUVmax with a within-subject coefficient of variation (wCV) of 10–12% [40, 41]. Conforming to this profile supports the claim that an increase in SUVmax of 39% or more, or a decrease of 28% or more, indicates that a true biological change has occurred with 95% confidence.

The Japanese Society of Nuclear Medicine (JSNM) has published standard PET imaging protocols and phantom test procedures and criteria to standardize the methods that would affect image quality and quantification accuracy. A NEMA NU-2 image quality phantom is used with a sphere-to-background radioactivity concentration ratio of 4. For SUV harmonization, the SUVmax of hot spheres must satisfy the specified range (Fig. 3B) [33, 42]. A post-smoothing process was sometimes needed for high-resolution systems to meet this range, just as for the EARL limits. In 2022, a JSNM working group proposed new upper and lower ranges for SUVmax recovery curves [33]. This SUV harmonization range was proposed based on NEMA NU-2 image quality phantom data measured with 23 PET/CT scanners. The working group also suggested image quality criteria to ensure the 10 mm sphere visibility and to reduce the intra-scanner variability of quantitative metrics: the contrast-to-noise ratio (QH,10 mm/N10mm) should be ≥ 2.5 and the coefficient-of-variance in the background (CVBG) should be ≤ 10% (or 14.1%).

Since recent PET systems with advanced reconstruction algorithms may provide higher quantitative values, downgrading image resolution may be needed to harmonize quantitative PET data to those of older PET systems. Such a harmonization process may lose the superior imaging capability of the newer PET systems. One solution to retain the high image quality provided by newer PET systems has been to make a second set of images to provide comparable quantitative values [43]. However, this introduced a cumbersome PET data handling issue, and a better strategy was needed.

To overcome the double dataset handling issue, Kelly et al. [44] developed software that can calculate the harmonized SUVs in the background while displaying PET images reconstructed by an advanced algorithm. It is possible to provide the harmonized SUVs based on a single dataset without losing the advantage of high-resolution PET images. Quak et al. [17] successfully harmonized SUVs of 517 cancer patients using the commercial software, EQ.PET (Siemens). This software can apply the harmonization method of Kelly et al. to clinical data. The harmonization process of the EQ.PET is summarized as follows. First, a maximum recovery coefficient is measured for each sphere in the NEMA NU-2 image quality phantom. These values are then compared to a reference recovery curve to calculate the root mean square error (RMSE). This comparison is repeated while increasing the full width at half maximum (FWHM) of a Gaussian filter. The FWHM value at which the minimum RMSE is obtained then is applied to PET images for harmonization.

Free vendor-neutral software is desirable for widespread application of this harmonization method. Daisaki et al. [29] successfully harmonized SUVs using the “RC Tool for Harmonization” (Nihon Medi-Physics) which semi-automatically calculates an optimal FWHM of the Gaussian filter and RAVAT (Nihon Medi-Physics) which calculates QIBA Profile-compliant quantitative values. They harmonized SUV recovery curves of 15 PET systems to the JSNM harmonization range (Fig. 4). These vendor-neutral software programs can be freely applied to PET images, irrespective of scanner models. Two datasets are not required, because RAVAT applies a pre-determined additional smoothing filter to PET images. As mentioned below, several clinical studies have used RAVAT software for harmonization of quantitative PET [45, 46].

Fig. 4
figure 4

SUVmax recovery curves before and after harmonization using RAVAT. The inter-scanner coefficient-of-variance (CV) is reduced by harmonization

Harmonization strategies for brain PET

For brain PET, there are many kinds of radiotracers that can measure metabolisms, protein aggregates, transporters, and receptors [47]. Depending on radiotracer characteristics, various quantitative metrics are used to assess tracer uptakes. For harmonization, brain phantoms are generally used, because image characteristics such as tracer distribution, uptake contrast ratio, and image noise levels for brain PET are different from those for whole-body PET.

In the multicenter Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, Joshi et al. [48] proposed the two-step harmonization method with the Hoffman 3D brain phantom. Using the digital Hoffman phantom image smoothed with the 8 mm FWHM Gaussian filter as the target resolution, they selected an FWHM of the Gaussian filter for each scanner model. Figure 5 shows representative PET images applied with and without a scanner-specific smoothing filter. Joshi et al. reported that the scanner-specific smoothing approach effectively reduced the inter-scanner variability, while the low frequency correction was not effective. Ikari et al. [49] also used the image resolution of 8 mm FWHM as a reference level. In the Japanese-ADNI (J-ADNI) study, the image resolution was harmonized by a scanner-specific smoothing filter [50]. There are some reports using gray matter contrast recovery (RCGM) and gray-to-white matter contrast (GMWMr) of the Hoffman 3D brain phantom [49, 51, 52]. Like the harmonization approach for whole-body PET, Verwer et al. [51] proposed upper and lower limits for RCGM and GMWMr to harmonize image contrast.

Fig. 5
figure 5

Copyright © 2009 Elsevier Inc. All rights reserved

Hoffman phantom images of five PET scanner models with pre- and post-smoothing. After application of the smoothing filter, the visual difference in resolution was reduced. The images are reprinted with modification from the paper by Joshi et al. Neuroimage. 2009; 46: 154–159 [48].

Even though the Hoffman 3D brain phantom is widely used, it can only simulate the distribution pattern of FDG in the brain [53]. Other phantoms might be better model for radioactivity distributions other than those of FDG. Hoye et al. [54] harmonized 11C-raclopride brain PET images measured by HRRT (high-resolution research tomograph, Siemens) and HR + (a standard clinical scanner, Siemens) using a 3D brain phantom (the Iida phantom [55]). Fahey et al. [56] evaluated image uniformity, spatial resolution, and image quality of 13 PET scanners using the SNMMI CTN brain phantom, which has a uniform section, a resolution section, and a clinical brain simulation section.

Clinical studies with harmonized PET

Quantitative values in PET can be used as a biomarker in clinical studies once harmonizing of PET data is completed. In oncology fields, PET imaging has been involved in many therapeutic studies [57,58,59,60,61,62]. Ito et al. [45] assessed anti-PD-1 therapy response to non-small cell lung cancer (NSCLC) using harmonized FDG-PET. They used nine PET/CT scanners and harmonized SUVs based on the JSNM harmonization strategy before applying EORTC criteria and PERCIST. Changes in harmonized SUVs correlated well with overall survival of NSCLC patients. Kitajima et al. [63] investigated relationships of prognosis for stage I–III breast cancer patients and pre-treatment FDG-PET/CT-derived quantitative metrics. They harmonized maximum SUVs of the image quality phantom among five PET/CT scanners according to the JSNM harmonization method. Primary tumor and nodal maximum SUVs and total lesion glycolysis (TLG), which are derived from pre-treatment FDG-PET/CT, were associated with recurrence-free and overall survivals in patients with operable breast cancer.

Among brain PET studies, Sevigny et al. [64] reported that Aducanumab, a human monoclonal antibody that selectively accumulates with Aβ aggregates, reduced Aβ plaques with amyloid PET as an adjunct marker for Aβ pathology. They harmonized amyloid PET images to be a uniform spatial resolution of 6.5 mm in-plane and 7.5 mm axially [65]. Change in amyloid PET SUVR values was used as a surrogate marker for treatment response. Senda et al. [66] performed a multicenter observational study on potential preclinical and prodromal Alzheimer’s disease. Image reconstruction parameters for each PET scanner in the observational study were determined with the Hoffman 3D brain phantom and the uniform cylindrical phantom so that all the scanners met previously established image quality criteria [49]. Senda et al. conducted brain FDG, amyloid and tau PET imaging, and measured AD t-sum values for FDG images and SUVRs for amyloid and tau images.

Post-reconstruction data-driven harmonization methods for PET

Adjusting reconstruction settings and applying additional smoothing filter are major approaches for PET harmonization. These methods are categorized into “image-based” harmonization approach, because “harmonized images” need to be generated before measuring quantitative metrics in PET images. On the other hand, several data-driven harmonization methods have been applied for quantitative metrics in PET [67].

The centiloid scale (CL) is an example of popularly used data-driven harmonization methods. The CL is a harmonized quantitative metric for amyloid PET proposed by Klunk et al. [68]. The standard CL is calculated from the standardized uptake value ratio (SUVR) of amyloid PET as follows:

$$\mathrm{Centiloid \,scale }\left(\mathrm{CL}\right)=\frac{{\mathrm{SUVR}}_{\mathrm{IND}}-{\mathrm{SUVR}}_{\mathrm{YC}-0}}{{\mathrm{SUVR}}_{\mathrm{AD}-100}-{\mathrm{SUVR}}_{\mathrm{YC}-0}} \times 100,$$

where SUVRIND is an SUVR value of an individual, SUVRYC-0 is the mean SUVR of 34 young healthy controls, and SUVRAD-100 is the mean SUVR of 45 AD patients. The standard image datasets and the VOI template are available online (https://www.gaain.org). SUVR values measured using various scanners and different amyloid tracers were converted to the unified 0 to 100 scale [69,70,71]. The CL scale has been applied to a tau PET SUVR conversion by Yamao et al. [72].

Another representative data-driven approach is the ComBat harmonization method [23], which was originally proposed to reduce “batch effects” in the field of genomics [73]. ComBat directly applies to quantitative values derived from PET images; therefore, additional image data processing and any phantom data acquisition are not mandatory. Orlhac et al. [74] provided a practical guide and list of limitations when applying ComBat to image-derived quantitative metrics. Figure 6 shows simulated data before and after using ComBat. In a multicenter study, Dissaux et al. [75] applied ComBat to numerous FDG-PET/CT radiomic features derived from four different scanners. They reported two radiomic features associated with local control in NSCLC patients undergoing stereotactic body radiation therapy. Hotta et al. [76] also used ComBat to evaluate FDG-PET/CT textural features of primary tumors that were acquired by three different scanners. They evaluated the prognostic value of pretreatment FDG-PET/CT for patients with surgically treated rectal cancer. Gray-level co-occurrence matrix entropy, which presents intra-tumoral metabolic heterogeneity, was associated with overall survival and progression-free survival.

Fig. 6
figure 6

© SNMMI

Box plot and quantitative value distributions for data of three virtual sites. After applying the ComBat harmonization method, data on the three sites showed similar distributions. The figures are reprinted with modification from the paper by Orlhac et al. J Nucl Med. 2022; 63: 172–179 [74].

PET/MR harmonization initiatives

Hybrid PET/MR systems can provide superior soft tissue contrast by MR and functional information by PET at the same position. Such combined information is useful in various fields including neurology, neuro-oncology, cardiology, and oncology [77,78,79]. The first commercial clinical PET/MR was the Ingenuity TF PET/MR (Philips), a separated-type system [80]. The Ingenuity TF PET/MR has a sequential configuration that connects two subsystems spatially separated by a shared patient bed, so simultaneous PET and MR imaging is not possible. Subsequent integrated PET/MR systems such as the Biograph mMR (Siemens), SIGNA PET/MR (GE), and uPMR790 (United Imaging) allow simultaneous imaging [81,82,83].

One of the key challenges of PET/MR imaging is MR-based attenuation correction (MR-AC). A segmentation-based method has been proposed to provide a four-compartment body segmentation including air, lung, fat, and soft tissue using the Dixon MR-sequence [84]. However, the segmentation-based methods do not account for bone structures, because a near-zero signal is obtained from bone regions due to bone having both a low spin density and a rapid T2 relaxation rate [85, 86]. Inaccurate bone consideration leads to large underestimation of PET uptakes, especially in near-bone tissues [87]. To overcome this problem, ultrashort echo-time (UTE) and zero echo-time (ZTE) MR-sequences have been developed to capture bone information, and these sequences provide more accurate attenuation maps [88, 89]. More recently, deep learning-based techniques are offering the option to improve accuracy of attenuation correction without CT images [90,91,92].

Although phantom measurements are necessary for scanner quality control and harmonization of quantitative metrics, there are no standard phantoms for PET/MR hybrid imaging. Figure 7 overviews MR-AC problems in phantom measurements. Ziegler et al. [93] reported that a strong artifact was caused by an inhomogeneous radiofrequency excitation MR signal due to a large amount of water as the phantom fluid. Errors in segmentation result in incorrect recognition of water, fat, and air. Boellaard et al. [94] noted that the lung insert was missing in some MR-AC maps. Then, it was found that phantom walls could not be visualized in standard MR-sequences, because they were made of plastic or glass materials [95]. These issues should be solved for valid phantom assessment. Although adding NaCl and NiSO4 solutions may improve the homogeneity of MR signals [93], it is still difficult to solve the problem of the missing phantom wall. Ziegler et al. [95] suggested that CT-based AC was suitable for accurate PET performance measurements of PET/MR systems when using the NEMA NU-2 image quality phantom.

Fig. 7
figure 7

© SNMMI (A), Boellaard et al. [94]. © Am. Assoc. Phys. Med. (B), and Ziegler et al. [95]. Copyright © 2015, Ziegler et al. CC BY 4.0

Overview of MR-AC problems: inhomogeneous MR signal (A), missing phantom walls (B), and segmentation errors (C). The images are reprinted with a modification from the following papers: Ziegler et al. [93].

Even though MR-AC accuracy was not verified, Laforest et al. [96] harmonized contrast recovery coefficients (CRCs) between the SIGNA PET/MR and Biograph mMR using the NEMA NU-2 image quality phantom. Six customized spheres (diameters: 8.5, 11.5, 15, 25, 32.5, and 44 mm) were used in addition to the standard six spheres (diameters: 10, 13, 17, 22, 28, and 37 mm). The CT-based AC methods were used and CRCs were evaluated together with the root-mean squared discrepancy (RMSD) to determine harmonized reconstruction parameters. Laforest et al. achieved high CRCs within the limits of EARL1 and minimized the variation between two different PET/MR scanners. Jentzen et al. [97] evaluated quantitative accuracy of 124I PET/MR images for patients with differentiated thyroid cancer using serial PET/CT data as a reference. After harmonization processing, activity concentrations of lesions in the neck were comparable between images measured with PET/CT (CT-AC) and PET/MR (MR-AC).

For evaluation of brain PET imaging, the Hoffman 3D brain phantom has been widely used as mentioned above. Since the phantom is composed of acrylic plates, MR-AC accuracy cannot be evaluated as well as for the NEMA NU-2 image quality phantom. Ribeiro et al. [98] evaluated PET image quality of PET/MR systems using the Hoffman 3D brain phantom with CT-based template AC. On the other hand, Teuho et al. [99] used another 3D brain phantom (the Iida phantom [55]) which contains the gray matter, white matter, skull, and tracheal structures with a realistic head contour. Teuho et al. evaluated the difference between CT-AC and MR-AC on visual and quantitative differences using multiple PET/MR and PET/CT systems. They noted that regional differences between PET/MR and PET/CT systems were minimized using CT-AC. Because white matter of the 3D brain phantom is composed of polymer, which is not visible in MR-AC, the Iida phantom is still insufficient for evaluation of MR-AC accuracy [100].

CT-AC is the current standard method in PET/MR phantom studies for evaluating image quality and quantitative accuracy, thereby aiming at harmonization. In human PET/MR studies, however, MR-AC is generally used to reconstruct PET images. A PET/MR multimodal phantom, which mimics the electron density and MR contrast of human tissue, is required to evaluate MR-AC accuracy, and facilitate PET/MR harmonization [100]. Harries et al. [101] developed a realistic phantom of the human head using water-saturated gypsum plaster, silicone, agarose gel, etc. Rausch et al. [102] developed a cylindrical phantom with a 3D printable MR-visible polymer. Preliminary evaluations of MR-AC accuracy may be possible using these phantoms although they still cannot provide PET signals. On the other hand, Canata et al. [103] proposed to use a patient as a phantom to assess the MR-AC accuracy using CT as the reference. PET/MR harmonization will be facilitated if a standard method that can verify the MR-AC accuracy is established.

Non-18F PET harmonization initiatives

Many novel radioisotopes have emerged for diagnostic PET imaging, theranostics, and immuno-PET imaging. One of these radioisotopes for PET is 68Ga. 68Ga-labelled tracers targeting somatostatin receptor, prostate-specific membrane antigen (PSMA), and fibroblast activation protein inhibitor (FAPI) have been used in recent clinical studies [104,105,106]. These tracers are often used in combination with 177Lu and 225Ac as part of theranostics [105, 107, 108]. 124I has been used to evaluate lesion dosimetry prior to radioiodine treatment in patients with differentiated thyroid cancer [97, 109]. Immuno-PET is a promising tool for predicting the outcome of monoclonal antibody-based cancer therapy using such radioisotopes as 64Cu, 74Br, 86Y, 89Zr, and 124I [110, 111].

It is essential to consider the physical properties of each radioisotope for harmonization of non-18F PET imaging. Table 3 lists the physical properties of typical positron emitting isotopes [112,113,114,115]. Long positron ranges due to the high energy of emitted positrons lead to blurring of the source distribution and loss of spatial resolution [116, 117]. A low positron branching ratio may result in inferior image quality and errors in quantitative uptake measurements due to the low count statistics and the cross-calibration error. In addition, some radioisotopes emit prompt or non-prompt γ rays in the decay process (Table 3). If such associated γ rays have contaminated coincidence count data in energy and coincidence timing windows, they will lead to degradation of image quality and quantitative accuracy [113].

Table 3 Properties of typical positron emitting radioisotopes

Some studies have investigated PET image characteristics of different radioisotopes [114, 118]. Soderlund et al. [113] evaluated the image quality and spatial resolution for a set of radioisotopes (18F, 11C, 89Zr, 124I, 68Ga and 90Y). Figure 8 shows PET images of the NEMA NU-2 image quality phantom filled with the various radioisotopes. 124I and 68Ga, which have longer positron ranges, showed slight degradation of contrast recovery and resolution in small spheres compared to 18F. Other reports using small animal PET systems showed considerable image blurring due to the positron range [119, 120]. Especially for 124I, the image quality may be deteriorated due to prompt γ-rays having the energy (603 keV) that is similar to 511 keV of annihilation photons.

Fig. 8
figure 8

© SNMMI

PET images of the NEMA NU-2 image quality phantom filled with various radioisotopes. The images were acquired by Biograph mCT (Siemens). The images are reprinted with modification from the paper by Soderlund et al. [113].

Harmonization studies on non-18F PET imaging have been conducted by several groups. Huizing et al. [121] performed 18F and 68Ga PET phantom acquisitions using 13 PET/CT systems and evaluated quantitative recovery coefficients according to the EARL strategy. While 18F recovery coefficient curves for all PET/CT systems satisfied the range for EARL1 standards, 68Ga curves were located near the lower limit of the range. After correcting the difference between 68Ga and 18F cross-calibrations, 68Ga recovery coefficient curves for most scanners satisfied the EARL1 standards. Some investigators worked on multicenter 89Zr PET harmonization studies. Makris et al. [122] and Kaalep et al. [123] used the NEMA NU-2 image quality phantom with a 10:1 sphere-to-background ratio according to the EARL protocol. Results from both studies showed that inaccuracy of the local dose calibrator cross-calibration would be a large source of bias in quantitative values. Christian et al. [124] used an anthropomorphic chest oncology phantom filled with 89Zr at a clinically relevant activity level and a 4:1 sphere-to-background ratio. They investigated optimal reconstruction parameters based on visual lesion detectability and SUVpeak recovery coefficients.

Minimizing measurement errors and cross-calibration errors is essential for harmonization of non-18F PET imaging. In most PET/CT systems, cross-calibration between the dose calibrator and a PET system is performed using 18F, which may result in inaccurate quantification with radioisotopes other than 18F. Accurate quantification for non-18F radioisotopes requires proper correction of such physical properties as branching ratio, half-life, and prompt γ rays. Bailey et al. [125] and Sanderson et al. [126] noted most PET systems underestimated 68Ga SUVs which was caused by overestimation of 68Ga radioactivity using dose calibrators with a default calibration factor setting. Appropriate 68Ga calibration factor setting is important to provide accurate quantitative values. Beattie et al. [127] recommended the use of appropriate dose calibration factors for measuring 89Zr and 124I. For 124I, they proposed to use a copper filter to remove the contribution of X-rays emitted by 124I in the 20–40 keV range.

As mentioned earlier, additional care is required for harmonization of non-18F PET. Appropriate quality control for PET systems, dose calibrators, and other associated devices should be performed considering their physical properties. Ideally, scanner validation and harmonization are conducted with the radioisotopes being used in the clinical protocol [124].

Future challenges and summary

The major harmonization approach is adjusting image reconstruction parameters and smoothing filters so that quantitative values are comparable among different scanners. It should be noted that such harmonization methods may spoil small lesion detectability. Aide et al. [128] presented a representative case before and after harmonization, which used a state-of-the-art SiPM-based PET/CT scanner for measurements (Fig. 9). To mitigate deteriorating small lesion detectability, harmonization criteria should be regularly updated in step with scanner performance improvements [11]. Because novel PET systems such as SiPM-based scanners [129, 130], total-body scanners [131, 132], and brain-dedicated systems [133, 134] have been increasingly applied to clinical studies, harmonization methods should adapt to such imaging systems.

Fig. 9
figure 9

Copyright © 2021, Aide et al. CC BY 4.0

18F-fluorocholine PET images acquired by Biograph Vision scanner (Siemens). The patient was referred for restaging after a liver graft for hepatocellular carcinoma. PET data were reconstructed with an EARL-compliant harmonized parameter (9 mm Gaussian filter) (upper) and with point-spread function modeling and 0.9 mm voxel size (lower). The red arrow indicates a tiny lung metastasis. The images are reprinted with modification from the paper by Aide et al. [128].

Many organizations have built harmonization strategies; however, international methodology harmonization is necessary to achieve harmonization of quantitative PET, because clinical studies and trials are conducted worldwide. International harmonization of the standards remains as an issue to ensure the comparability of quantitative values in PET.

This review discussed harmonization strategies for quantitative PET. To make use of PET as a biomarker, quantitative values derived from PET should be comparable between scanners, sites, and studies. By minimizing the bias and variability due to technical, biological, and physical issues, quantitative PET can precisely highlight physiological and pathological changes even with small datasets. It is expected that this review will facilitate harmonization of quantitative PET and PET-derived biomarkers contribute to research and development in medicine.