Quantitative imaging biomarkers in nuclear medicine: from SUV to image mining studies. Highlights from Annals of Nuclear Medicine 2018
Quantification in medical imaging is one of the main goals in research and clinical practice since it allows immediate understanding, objective communication, and comparison. Our aim was to summarize relevant investigations on quantification in nuclear medicine studies published in the volume 32 of Annals of Nuclear Medicine.
In this article, we summarized the data of 14 selected papers from international research groups that were published between January and December 2018. This is a descriptive review with an inherently subjective selection of articles.
We discussed the role of parameters ranging from standardized uptake value to ratios, to flow within a region of interest, to volumetric parameters and to texture indices in different clinical scenarios in oncology, cardiology, and neurology.
In all the medical disciplines in which nuclear medicine examinations play a role, quantification is essential both in research and in clinical practice. Standardization and high-quality protocols are crucial for the success and reliability of imaging biomarkers.
Keywords: Standardized uptake value, Quantification, Radiomics, Artificial intelligence, Volumetric parameters
In 2016, the European Journal of Nuclear Medicine and Molecular Imaging (EJNMMI) and Annals of Nuclear Medicine (ANM) embarked on a joint initiative to publish highlights from studies of high interest for both European and Japanese readers and to strengthen the scientific cooperation between Europe and Japan. Since then, insightful papers have been published annually within this collaborative project [1, 2, 3, 4]. These contributions have addressed impactful topics in preclinical studies, translational research, and clinical applications.
Currently, nuclear medicine research is focused mainly on new radiopharmaceuticals targeting specific biological processes, on the search for reliable imaging modalities for diagnosis, prognostication, and treatment response evaluation, and on innovative approaches to image and treat patients in a personalized manner. In particular, interest in image-derived biomarkers and advanced image analysis is growing, and this warrants a review designed to describe the key findings from a range of relevant investigations on quantification.
The use of any type of measurement and/or related mathematics falls into the category of quantification. Imaging is one of the domains that most require quantification. In fact, image interpretation is somewhat operator dependent since it is influenced by personal experience, skills, and expertise, as well as by geographical location. Therefore, quantification is a necessity to increase consistency in image assessment and thereby guide medical decisions appropriately. Quantification in the digital age implies easier and automatic tools for image analysis, a myriad of biomarkers potentially derivable from any type of image (from pictures to medical images), and, consequently, a strong basis for personalized treatment decisions.
Biomarkers are defined as “measurable and quantifiable biological parameters (…) which serve as indices for health- and physiology-related assessments”. The main advantage of biomarkers is that they are, or should be, easier and cheaper to measure than the clinical endpoint itself. Three crucial aspects make a biomarker good: high reproducibility, a sizeable signal-to-noise ratio, and a swift change in response to events (e.g., treatment). In this regard, biomarkers derived from images have the potential to be among the most suitable.
The aim of the present review, within the joint EJNMMI/ANM initiative, is to provide an overview on quantification research in nuclear medicine focused on imaging biomarkers with an increasing degree of complexity: from SUV to radiomics. We subjectively selected the 14 most relevant investigations on imaging quantification from among all articles published in the Annals of Nuclear Medicine volume 32 (2018). We chose studies on the role of image-derived parameters in different clinical scenarios in oncology [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], cardiology [21, 22], and neurology [23, 24].
Quantitative imaging parameters
The most common (semi)-quantitative parameter used in clinical practice for the assessment of positron emission tomography (PET) images is the standardized uptake value (SUV) and, particularly, the maximum SUV (SUVmax). SUVmax provides a measurement of the tracer uptake within the single pixel/voxel with the maximum decay counts among those belonging to a target region or volume, normalized to the injected activity and, typically, to body weight. SUVmax quantitation is affected by various patient- and image-related factors, including blood glucose level, concomitant medications, artifacts related to metal implants and devices, positive contrast, lesion size, scanner calibration, and acquisition time [25, 26, 27, 28, 29, 30]. Abdel Gawad et al. proposed a 3D mathematical formulation of the SUV recovery coefficients. This formulation accounted for the most influential factors in SUV quantitation, including lesion size (volume or diameter) and lesion contrast (ratio of lesion concentration to background radiation level). Good performance of the proposed mathematical models has been reported in both phantom and clinical studies. In view of these preliminary results, the employed formulations are expected to allow more reproducible SUV measurements.
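The SUV definition above can be made concrete with a short calculation. The following is a minimal sketch, assuming the common body-weight normalization, a tissue density of 1 g/mL, and decay correction of the injected activity to scan time; the function names and the example values are illustrative and are not taken from any of the reviewed papers.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 in minutes


def activity_at_scan(injected_bq, minutes_since_injection,
                     half_life_min=F18_HALF_LIFE_MIN):
    """Injected activity decay-corrected to the time of the scan."""
    return injected_bq * math.exp(
        -math.log(2) * minutes_since_injection / half_life_min)


def suv_bw(concentration_bq_per_ml, injected_bq, weight_kg,
           minutes_since_injection=0.0, half_life_min=F18_HALF_LIFE_MIN):
    """Body-weight SUV (assumption: tissue density of 1 g/mL)."""
    activity = activity_at_scan(injected_bq, minutes_since_injection,
                                half_life_min)
    return concentration_bq_per_ml / (activity / (weight_kg * 1000.0))


def suv_max(voxel_concentrations_bq_per_ml, injected_bq, weight_kg,
            minutes_since_injection=0.0):
    """SUVmax: the SUV of the hottest voxel in the target region."""
    return suv_bw(max(voxel_concentrations_bq_per_ml), injected_bq,
                  weight_kg, minutes_since_injection)


# Hypothetical region values in Bq/mL, for illustration only.
roi = [3200.0, 5100.0, 12500.0, 9800.0]
print(round(suv_max(roi, injected_bq=350e6, weight_kg=70.0,
                    minutes_since_injection=60.0), 2))
```

Because SUVmax depends on a single voxel, any error in the inputs (weight, injected activity, timing, calibration) propagates directly into the result, which is one reason the factors listed above matter.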
The increasing clinical application of PET/magnetic resonance (MR) has led to comparative studies aiming inter alia to assess the reproducibility of metrics between PET/CT and PET/MR systems. In this regard, Ringheim et al. prospectively assessed the reproducibility of SUVmax in 30 prostate cancer patients with biochemical failure imaged by sequential 68Ga-PSMA-11 PET/CT and PET/MR. They found good agreement, with a linear correlation between the SUVmax of same-day randomized 68Ga-PSMA-11 PET/CT and PET/MR scans. However, the Bland-Altman analysis estimated a mean percentage difference of 18.7% (i.e., the PET/CT SUVmax was on average about 20% higher than the PET/MR SUVmax). Interestingly, based on their findings, they were able to propose a formula to compute PET/CT SUVmax from PET/MR SUVmax and vice versa: PET/CT SUVmax = 0.75 + 1.00 × PET/MR SUVmax. Overall, their results indicated that SUVmax from 68Ga-PSMA-11 PET/CT and PET/MR should not be used interchangeably in either patient management or follow-up.
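The two calculations mentioned above, a Bland-Altman analysis of percentage differences and the linear conversion between scanners, can be sketched as follows. The intercept and slope are those quoted in the text; the paired values are invented for illustration, and the percentage difference is assumed to be taken relative to the mean of the two measurements, which may differ from the authors' exact definition.

```python
from statistics import mean, stdev


def percent_differences(petct, petmr):
    """Paired differences expressed as a percentage of the pair mean."""
    return [200.0 * (c - m) / (c + m) for c, m in zip(petct, petmr)]


def bland_altman(petct, petmr):
    """Mean percentage difference (bias) and 95% limits of agreement."""
    diffs = percent_differences(petct, petmr)
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def petct_from_petmr(suvmax_mr, intercept=0.75, slope=1.00):
    """Conversion quoted in the text: PET/CT = 0.75 + 1.00 x PET/MR."""
    return intercept + slope * suvmax_mr


def petmr_from_petct(suvmax_ct, intercept=0.75, slope=1.00):
    """Inverse of the conversion above."""
    return (suvmax_ct - intercept) / slope


# Invented paired SUVmax values, for illustration only.
ct = [6.1, 12.4, 4.8, 9.0]
mr = [5.2, 11.0, 4.1, 8.3]
print(bland_altman(ct, mr))
print(petct_from_petmr(4.0))
```

A non-zero bias with narrow limits of agreement is exactly the situation in which two measurements can correlate well and still not be interchangeable.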
The target-to-background ratio (TBR) is a simple method to normalize tracer uptake within a region of interest (ROI) that can be used in planar scintigraphy, single-photon emission computed tomography (SPECT), and PET imaging. Particularly in the case of PET, the above-mentioned shortcomings of SUV have prompted a search for more robust alternatives, and the TBR, computed against a reference tissue, is a simple (semi)-quantitative way to normalize the SUVmax. The liver, spleen, muscle, and blood have been proposed as reference organs. In the case of [18F]FDG studies, the tumor-to-blood standard uptake ratio (SUR) may be used as a surrogate for the metabolic rate. Moreover, the SUR has been reported to correlate better than SUV with the metabolic rate of [18F]FDG, i.e., SUR is the better surrogate [30, 31]. The TBR is not dependent on tracer plasma clearance but, like SUV, it depends on the timing of image acquisition. However, assuming the kinetics of [18F]FDG in the reference tissue to be irreversible and the shape of the arterial input function to be constant, both SUV and TBR may be accurately corrected for scan time variability.
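As a minimal sketch of the ratio metrics described above, the target uptake can be divided by the mean uptake in a reference region. The choice of the mean as the reference statistic and the example values are assumptions made for illustration; individual studies may define the reference uptake differently.

```python
from statistics import mean


def uptake_ratio(target_suv, reference_voxels_suv):
    """Ratio of target uptake to the mean uptake in a reference region."""
    reference = mean(reference_voxels_suv)
    if reference <= 0:
        raise ValueError("reference uptake must be positive")
    return target_suv / reference


# Hypothetical values: lesion SUVmax with blood-pool and liver reference ROIs.
lesion_suvmax = 9.6
blood_pool = [1.5, 1.7, 1.6]
liver = [2.3, 2.5, 2.4]

sur = uptake_ratio(lesion_suvmax, blood_pool)  # tumor-to-blood ratio
tlr = uptake_ratio(lesion_suvmax, liver)       # target-to-liver ratio
print(round(sur, 2), round(tlr, 2))
```

The same function covers any reference organ; only the region supplying the denominator changes.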
The correlation between the target-to-liver ratio (TLR) and SUR has been investigated in order to assess whether the liver can be assumed to be a surrogate of arterial radiotracer uptake. However, the imperfect correlation indicated that the liver is unsuitable as a surrogate of arterial tracer supply for SUV normalization via TLR computation. Nevertheless, the prognostic role of SUR and TLR has been described in both solid tumors [33, 34, 35, 36] and Hodgkin lymphoma. Interestingly, Annunziata et al. found that TLR also had a prognostic role in follicular lymphoma. The authors compared the TLR with the five-point Deauville scale and with the International Harmonization Project criteria in 89 patients with follicular lymphoma treated with induction immuno-chemotherapy. The TLR and the five-point Deauville scale were found to be comparable in predicting progression-free survival at 5 years (PPV = 80% for both, NPV = 88% and 86%, respectively), with a better performance than the International Harmonization Project criteria. Accordingly, the TLR calculated using end-of-treatment [18F]FDG PET/CT might be used as a prognostic index in follicular lymphoma, with the advantage over the five-point Deauville scale that it is independent of the subjective judgment of the observer.
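The positive and negative predictive values quoted above follow directly from a two-by-two table of test result against outcome. The sketch below assumes that a "positive" test means a TLR above some cut-off and that the outcome is progression within the follow-up window; the patient flags are invented for illustration.

```python
def predictive_values(test_positive, progressed):
    """Return (PPV, NPV) from paired boolean test results and outcomes."""
    tp = fp = tn = fn = 0
    for positive, event in zip(test_positive, progressed):
        if positive and event:
            tp += 1
        elif positive:
            fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Invented cohort of ten patients, for illustration only.
tlr_above_cutoff = [True, True, True, True, True,
                    False, False, False, False, False]
progression = [True, True, True, True, False,
               False, False, False, False, True]
print(predictive_values(tlr_above_cutoff, progression))
```

Unlike sensitivity and specificity, both values depend on how common progression is in the cohort, so they should be compared only between tests evaluated on the same patients.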
As mentioned above, TBR may be computed with any radiopharmaceutical, including 68Ga-PSMA, as shown by Komek et al. These authors provided evidence that SUV ratio cut-off values can identify metastases with poor or favorable outcomes in patients with advanced prostate cancer.
Additionally, TBR may be used for neuroreceptor quantification and in patients with cardiovascular diseases. Habert et al. aimed to develop and validate a method for the quantification of amyloid burden on the basis of 18F-florbetapir PET scans. In 53 cases (26 normal elderly controls, 11 patients with mild cognitive impairment, and 16 patients with Alzheimer’s disease), they calculated the TBR in ROIs defined in the native space of the untransformed PET images, applying correction for the partial volume effect. They identified a positivity threshold and then validated the cut-off by using in-house software to correlate it with other approaches. The method was applied to an independent cohort of 318 cognitively normal subjects to assess the amyloid status, with positive results. This method may be useful in clinical practice, especially when the optimal cut-off cannot be obtained from healthy subjects.
Flow within ROIs
Dynamic image acquisition, analysis of multiple blood and/or urine samples, and extensive protocols for image and data post-processing yield highly accurate measurements, albeit at the cost of time. Indeed, in PET, the net influx constant (Ki) computed from dynamic images by Patlak analysis describes the [18F]FDG metabolic rate better than SUR, which is only a surrogate of the metabolic rate of glucose consumption. Interestingly, Lebasnier et al. evaluated the role of cardiac dynamic [18F]FDG PET/CT in patients with suspected cardiac sarcoidosis. They analyzed images of 28 patients, computing both global and segmental Ki. The heterogeneity of glucose metabolism was estimated by the normalized coefficient of variation and used to differentiate patients with and patients without cardiac sarcoidosis, with promising results (AUC = 0.96). Despite the retrospective study design, the paper had several strengths, including patient selection. In fact, the physiological myocardial uptake of [18F]FDG requires specific protocols for patient preparation and image analysis. Accordingly, two patients who did not meet the minimum criteria for adequate preparation were excluded from the analysis. Moreover, the combined evaluation of [18F]FDG and myocardial perfusion images, which is the recommended radionuclide method for the evaluation of cardiac sarcoidosis, was a further strength. Lastly, whole-body PET/CT images provided additional findings in more than half of the patients. These results could have an important impact on patient management.
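Patlak analysis estimates Ki as the slope of a straight line: tissue activity divided by plasma activity is plotted against the running integral of plasma activity divided by plasma activity, and a line is fitted to the frames acquired after an equilibration time t*. The sketch below is a generic illustration with synthetic curves, trapezoidal integration, and an ordinary least-squares fit; it is not the pipeline used in the cited study, and the coefficient of variation shown is the plain ratio of standard deviation to mean, which may differ from the authors' normalized definition.

```python
from statistics import mean, stdev


def cumulative_trapezoid(times, values):
    """Running trapezoidal integral of values over times, starting at 0."""
    running = [0.0]
    for i in range(1, len(times)):
        step = times[i] - times[i - 1]
        running.append(running[-1] + 0.5 * step * (values[i] + values[i - 1]))
    return running


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def patlak_ki(times_min, plasma, tissue, t_star_min):
    """Return (Ki, intercept) fitted on frames with time >= t_star_min."""
    integral = cumulative_trapezoid(times_min, plasma)
    xs, ys = [], []
    for t, cp, ct, area in zip(times_min, plasma, tissue, integral):
        if t >= t_star_min and cp > 0:
            xs.append(area / cp)   # "stretched time"
            ys.append(ct / cp)     # tissue-to-plasma ratio
    if len(xs) < 2:
        raise ValueError("need at least two frames after t*")
    return least_squares_line(xs, ys)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Synthetic curves built with Ki = 0.05 and intercept 0.3 (illustration only).
times = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
plasma = [10.0] * len(times)
areas = cumulative_trapezoid(times, plasma)
tissue = [0.05 * a + 0.3 * cp for a, cp in zip(areas, plasma)]
print(patlak_ki(times, plasma, tissue, t_star_min=20.0))
```

Applying `patlak_ki` to each myocardial segment and then `coefficient_of_variation` to the resulting list would give one heterogeneity value per patient, under the assumptions stated above.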
The volumetric parameters used in PET/CT, namely metabolic tumor volume (MTV) and total lesion glycolysis (TLG), have been proposed for prognostication, radiation oncology planning, and treatment response assessment in a variety of tumors imaged by [18F]FDG PET/CT. MTV and TLG represent the metabolically active burden and the level of glucose uptake within the total metabolically active volume, respectively. In the case of non-[18F]FDG tracers, other definitions are used depending on the biological process traced by the radiopharmaceutical (e.g., functional volume, proliferative volume). Anwar et al. retrospectively tested the prognostic role of baseline SUVmax, MTV, and TLG in 49 stage I NSCLC patients. All patients underwent staging [18F]FDG PET/CT prior to any treatment, followed by complete tumor surgical resection (i.e., R0). Both 1- and 3-year disease-free survival were shorter in patients with higher MTV or TLG. MTV showed a higher prognostic value compared with TLG and SUVmax (AUC = 0.82, AUC = 0.79, and AUC = 0.72, respectively). Accordingly, baseline volumetric PET parameters may be useful in stage I NSCLC pre-treatment risk stratification, identifying patients who could benefit from wider resection and postoperative adjuvant therapy. These findings warrant further investigations since, if confirmed in independent cohorts, they might influence clinical practice.
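MTV and TLG can be computed from the voxels of a segmented lesion: MTV is the number of voxels inside the segmentation multiplied by the voxel volume, and TLG is MTV multiplied by the mean SUV of those voxels. The sketch below assumes a fixed threshold at 41% of SUVmax, which is one commonly used segmentation rule; the reviewed studies may have used other thresholds, and the voxel values are invented.

```python
def mtv_and_tlg(suv_voxels, voxel_volume_ml, fraction_of_max=0.41):
    """Return (MTV in mL, TLG) using a threshold relative to SUVmax.

    The 41% default is an assumption for illustration, not a value
    taken from the studies discussed in the text.
    """
    cutoff = fraction_of_max * max(suv_voxels)
    inside = [v for v in suv_voxels if v >= cutoff]
    mtv = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    return mtv, mtv * suv_mean


# Invented lesion voxels (SUV) with 0.5 mL voxels, for illustration only.
lesion = [10.0, 8.0, 5.0, 4.0, 1.0]
print(mtv_and_tlg(lesion, voxel_volume_ml=0.5))
```

Since both parameters depend on where the lesion boundary is drawn, values obtained with different thresholds are not directly comparable.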
Similarly, findings with a potential clinical impact were reported by Yoo et al. in invasive ductal breast cancer. The authors analyzed baseline [18F]FDG PET/CT images to test whether volumetric parameters extracted within the primary tumor were predictors of axillary lymph node status (i.e., metastatic or not) in 135 patients with clinically negative axillary lymph nodes. All patients received breast surgery (breast-conserving surgery or mastectomy) with sentinel lymph node(s) biopsy and/or axillary lymph node dissection. The TLG was found to be an independent predictor of axillary lymph node status. Patients with a TLG greater than 5.74 had a risk of axillary lymph node metastases 17.36 times higher than patients with a TLG ≤ 5.74.
Albano et al. compared the prognostic value of visual assessment, SUVs, and volumetric parameters in 52 patients with primary brain lymphoma imaged by [18F]FDG PET/CT. Patients with a low-to-moderate MTV and TLG (< 9.8 cm³ and < 94, respectively) exhibited longer progression-free and overall survival. Neither qualitative visual assessment nor SUVs were associated with outcome. As addressed by the authors, the high physiological [18F]FDG brain uptake may mask or interfere with brain lesion assessment, and [18F]FDG may not be the most suitable tracer for PET imaging of brain tumors.
Recently, a study investigated the ability of 4-borono-2-[18F]-fluoro-phenylalanine ([18F]FBPA) PET/CT to differentiate radiation-induced necrosis from recurrent brain tumor and to select patients with brain tumors for boron neutron capture therapy. Twelve patients (gliomas = 9, malignant meningioma = 1, hemangiopericytoma = 1, brain metastases from lung cancer = 1) previously treated with external beam radiotherapy were included. SUVs and volumetric parameters were calculated. Patients who experienced tumor recurrence (n = 6) showed significantly higher values than patients with radiation-induced necrosis. Interestingly, the total lesion FBPA uptake was the only parameter without any overlap between the two groups (121.01 ± 50.48 and 12.36 ± 9.70, p = 0.0029).
Radiomics and artificial intelligence
Texture is a general term to describe the appearance of a surface or a volume. Traditionally, in radiology, it refers to visual characteristics, with lesions being classified as homogeneous or heterogeneous. More recently, the concept of texture in medical imaging has evolved such that it now refers to objective quantitation of lesion homogeneity/heterogeneity based on the extraction of parameters through mathematical methods. Texture analysis may be performed using different approaches (e.g., histogram, wavelet, or fractal based), software (in-house, free, or commercial), and imaging modalities (e.g., slides, planar X-ray, SPECT and SPECT/CT, PET/CT, CT, and MRI). Moreover, it may be combined with AI algorithms in mixed approaches; alternatively, AI algorithms may be used as stand-alone tools. Radiomics, which relies on texture analysis, and AI-based algorithms, together referred to as image mining, have been investigated as biomarkers with a diagnostic, prognostic, or predictive role.
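To illustrate the histogram-based approach mentioned above, the sketch below discretizes the voxel intensities of a lesion into a fixed number of bins and derives two first-order heterogeneity features, entropy and uniformity. The bin count, the equal-width binning between the minimum and maximum intensity, and the example values are assumptions; standardized feature definitions (e.g., those of the IBSI) should be consulted for real studies.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map intensities to equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def histogram_features(values, n_bins=64):
    """First-order entropy (bits) and uniformity of the intensity histogram."""
    bins = discretize(values, n_bins)
    total = len(bins)
    probabilities = [count / total for count in Counter(bins).values()]
    entropy = -sum(p * math.log2(p) for p in probabilities)
    uniformity = sum(p * p for p in probabilities)
    return {"entropy": entropy, "uniformity": uniformity}


# Invented lesions: one perfectly homogeneous, one spread over four levels.
print(histogram_features([3.0, 3.0, 3.0, 3.0], n_bins=4))
print(histogram_features([0.0, 1.0, 2.0, 3.0], n_bins=4))
```

A homogeneous lesion gives zero entropy and a uniformity of one, whereas intensities spread evenly over the bins give the highest entropy and the lowest uniformity. These features ignore the spatial arrangement of voxels, which is what wavelet- and fractal-based methods try to capture.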
Iwabuchi et al. compared the fractal dimension (FD), a quantitative feature of tracer distribution, with the conventional quantitative specific binding ratio (SBR), which has been widely used to measure the total striatal [123I]ioflupane uptake. The authors first performed a phantom study, which confirmed the relationship between FD and SBR, and then proved that FD could detect dot and diffuse deterioration patterns. Thereafter, they moved to the clinical setting, retrospectively analyzing [123I]ioflupane SPECT images of 150 patients, including 110 with parkinsonian syndrome. Here, FD performed better than SBR (AUC 0.94 and 0.90, respectively) in distinguishing between patients with and without parkinsonian syndrome. As expected, due to their complementarity, the performance of SBR improved significantly when it was combined with FD (AUC 0.90 vs 0.96, p < 0.001).
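Comparisons such as the one above rest on the area under the ROC curve (AUC), which equals the probability that a randomly chosen patient with the disease receives a higher score than a randomly chosen patient without it. The sketch below computes the AUC by counting such pairs, with ties counted as one half. It assumes that higher scores indicate disease (for an index that decreases with disease, negate the values first), and the scores are invented.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(scores_positive) * len(scores_negative))


# Invented scores, for illustration only.
with_disease = [0.9, 0.8]
without_disease = [0.1, 0.85]
print(roc_auc(with_disease, without_disease))
```

The pairwise count is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; it does not by itself provide the significance test needed to compare two AUCs.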
Molina-García et al. found that textural features derived from baseline [18F]FDG PET/CT were able to predict pathological complete response to neoadjuvant chemotherapy and outcome (in terms of progression-free and overall survival) in 68 patients with locally advanced breast cancer. Conversely, in 82 patients with aggressive B cell lymphoma, textural features derived from baseline [18F]FDG PET/CT failed to predict first-line therapy response, though some were correlated with disease-free and overall survival.
So far, AI and texture analysis applications in medical imaging have received the same level of attention from the scientific community. However, it should be stressed that data validation is an essential requirement for image mining studies. In fact, the significance, goodness, and strength of the results rely on generalizability, which is proved only by independent data validation. The study published by Nakajima et al. is representative of the value of validation. They tested the diagnostic performance of version 1.1 of an artificial neural network (ANN) applied to myocardial perfusion imaging in order to detect perfusion defects and ischemia and subsequently compared the results with those obtained using the previous version (v1.0). ANN v1.0 was trained on data from 1051 Swedish patients, while ANN v1.1 was retrained on data from 1001 Japanese patients. The cohort used in this study to validate v1.1 was the same as that used for the validation of v1.0 (106 Japanese patients). Results showed that the slight improvement in the AUC obtained by v1.1 compared with v1.0 in detecting stress defects was not statistically significant (0.95 and 0.93, respectively; p = 0.27), while v1.1 performed better than v1.0 in detecting induced ischemia (0.98 and 0.88, respectively; p = 0.0093).
Our overview confirms that quantification permeates every type of nuclear imaging approach, providing helpful functional information regardless of the clinical setting (e.g., diagnosis, staging, prognostication) and the disease (e.g., neurological, cardiological, oncological).
Quantification makes the observations objective and is the prerequisite for the standardization of any type of assessment. It allows easier exchange of information and encourages medical debate at both the research and the clinical practice level [5, 6]. The identification of image-based biomarkers derived from nuclear medicine and molecular imaging techniques is one of the main topics of investigation. Several initiatives, aiming to change patient care by making imaging a more quantitative and reliable science, have been introduced, such as the Quantitative Imaging Biomarkers Alliance® (QIBA) and the Image Biomarker Standardisation Initiative (IBSI). Adherence to the procedural and clinical guidelines is an essential prerequisite for biomarker development and clinical implementation.
Risk factors (e.g., heavy smoker), signs, symptoms (e.g., severe pain, light fever, high blood pressure, abnormal heart rate), and disease severity (e.g., end-stage heart failure, multiple metastatic tumor lesions, mild cognitive impairment) may also be quantified using a scale. Each level/stage within a scale is defined according to specific criteria, which generally refer to quantitative parameters or range. However, risk factors, signs and symptoms, and diseases may be scaled differently according to different definitions or guidelines; therefore, even if the categorizations or adjectives used to characterize a disease are the same, their meaning may change (e.g., oligometastatic disease may be defined as a tumor with up to three or up to five metastases). Moreover, different scales may be available to quantify the same condition (e.g., five-point vs ten-point scale). Therefore, whatever quantification method is used, the scale to which it refers needs to be mentioned. Collectively, quantification has great value and is easily applicable, keeping in mind that it should be appropriately used and that quantitative parameters constitute just one piece of the entire puzzle.
All the above-described quantification approaches have some advantages and limitations. In fact, the simple methods may fall short in terms of accuracy. SUVmax is commonly used in both oncological and non-oncological conditions with different purposes (e.g., diagnosis and prognostication), thanks to its easy computation that is independent of the clinical condition (health or disease), and with different tracers ([18F]FDG or non-[18F]FDG), acquisition protocols (e.g., whole-body, total-body, dual-point, dynamic), and scanners (PET/CT or PET/MRI). However, to guarantee reproducibility and comparability of measurements across patients’ scans and between centers, SUV should be used within specific standardization programs (e.g., the EARL initiative).
Kinetic modeling provides the most accurate estimates of a condition, but it requires time-consuming protocols for image acquisition and analysis. Further studies on the development of algorithms for easier computation of kinetic biomarkers are of paramount importance.
The image mining (i.e., radiomics and artificial intelligence) workflow requires extensive protocols for image pre-processing and data analysis as well as large datasets. Additionally, the pathophysiological relationship between the image mining biomarkers and the relevant clinical endpoints needs to be determined. Finally, even if artificial intelligence powers digital medicine, data reproducibility in biomarker discovery appears to be one of the main concerns. Nonetheless, the available results encourage further investigations. In parallel with investigations of technical and clinical matters, issues concerning the “black box”, legal responsibility, and acceptance should be addressed. Since the adoption of innovations depends not only on the scientific evidence (“logos”, one of the components of the rhetorical triangle) but also on the integrity (“ethos”) and the values, necessities, and emotions (“pathos”) of the recipients, all these aspects need to be addressed before a large-scale implementation [49, 50].
In conclusion, in all the medical disciplines in which nuclear medicine examinations play a role, including especially oncology, cardiology, and neurology, quantification is essential. Both in research and in clinical practice, imaging findings expressed in terms of measures allow immediate understanding, objective communication, and comparison. Standardization and high-quality protocols in research and clinical routine are essential for the success and reliability of imaging biomarkers.
MK PhD scholarship was funded by the AIRC grant IG-2016-18585. We thank all colleagues from the Nuclear Medicine Department of Humanitas Clinical and Research Center for their collaboration and Prof. Carlo Stella for cooperation with the Hematology Department.
MS and MK conceptualized the study, MK performed data selection, MS and FB drafted the paper, FB commented on the paper, and MK reviewed the paper; all the authors approved the manuscript.
Compliance with ethical standards
In view of the nature of the present article (i.e., review), ethical approval was considered unnecessary. Figures are based on anonymized images, taken from an existing research database, published with patient consent.
Conflict of interest
The authors declare that they have no conflict of interest.
- 5. Shryock RH. The history of quantification in medical science. Isis. The University of Chicago Press. The History of Science Society. 1961;52:215–37.
- 6. Weisz G. Body counts: medical quantification in historical and sociological perspective. In: Weisz G, Jorland G, Opinel A, editors. Body Counts Med Quantif Hist Sociol Perspect Hist Sociol sur la Quantif médicale. Montreal, Kingston, London, Ithaca: McGill-Queen’s University Press; 2005. p. 377–93.
- 7. Biomarkers [Internet]. Available from: https://www.ncbi.nlm.nih.gov/mesh/68015415. Accessed 26 Aug 2019.
- 8. Aronson JK, Ferner RE. Biomarkers – a general review. Curr Protoc Pharmacol. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2017;9.23.1–9.23.17.
- 16. Beshr R, Isohashi K, Watabe T, Naka S, Horitsugi G, Romanov V, et al. Preliminary feasibility study on differential diagnosis between radiation-induced cerebral necrosis and recurrent brain tumor by means of [18F]fluoro-borono-phenylalanine PET/CT. Ann Nucl Med. 2018;32:702–8.
- 18. Molina-García D, García-Vicente AM, Pérez-Beteta J, Amo-Salas M, Martínez-González A, Tello-Galán MJ, et al. Intratumoral heterogeneity in 18F-FDG PET/CT by textural analysis in breast cancer as a predictive and prognostic subrogate. Ann Nucl Med. 2018;32:379–88.
- 23. Habert MO, Bertin H, Labit M, Diallo M, Marie S, Martineau K, et al. Evaluation of amyloid status in a cohort of elderly individuals with memory complaints: validation of the method of quantification and determination of positivity thresholds. Ann Nucl Med. 2018;32:75–86.
- 31. Hofheinz F, van den Hoff J, Steffen IG, Lougovski A, Ego K, Amthauer H, et al. Comparative evaluation of SUV, tumor-to-blood standard uptake ratio (SUR), and dual time point measurements for assessment of the metabolic uptake rate in FDG PET. EJNMMI Res. 2016;6:53.
- 41. Slart RHJA, Glaudemans AWJM, Lancellotti P, Hyafil F, Blankstein R, Schwartz RG, et al. A joint procedural position statement on imaging in cardiac sarcoidosis: from the Cardiovascular and Inflammation & Infection Committees of the European Association of Nuclear Medicine, the European Association of Cardiovascular Imaging, and the American Society of Nuclear Cardiology. Eur Heart J Cardiovasc Imaging. 2017;18:1073–89.
- 43. Sollini M, Antunovic L, Chiti A, Kirienko M. Towards clinical application of image mining: a systematic review on artificial intelligence and radiomics. Eur J Nucl Med Mol Imaging. 2019. https://doi.org/10.1007/s00259-019-04372-x.
- 45. Zwanenburg A, Leger S, Vallières M, Löck S, for the Image Biomarker Standardisation Initiative. Image biomarker standardisation initiative. 2016.