A weather forecast stating that it will rain tomorrow is a deterministic, dichotomic prediction of either sunshine or rain, whereas a forecast that gives a chance of rain is probabilistic. When the probability of rain is 40%, precautions could include bringing an umbrella, wearing a raincoat before going out, both, or neither. The final choice of a subsequent action depends on having appropriate information. Although an approach based on probability is essentially included in the process of any medical decision, specific thresholds of normal and abnormal values are often applied. For example, anemia can be defined as hemoglobin values of < 12 g/dL in women and < 13 g/dL in men (though these values might differ slightly among laboratories). Heart-to-mediastinum (H/M) ratios of 1.6, 1.68, and 1.75 in 123I-metaiodobenzylguanidine (mIBG) images might signal a poor prognosis for patients with heart failure.1,2,3 Such thresholds are straightforward and help to guide subsequent actions in clinical practice; however, the question remains as to how appropriate thresholds are determined.

In this issue of the Journal of Nuclear Cardiology, Roberts and colleagues described using 123I-mIBG to discriminate dementia with Lewy bodies (DLB) from Alzheimer disease (AD) and compared their findings with those of persons aged ≥ 60 years in a UK study4 based on the H/M ratio, which is widely applied to discriminate normal from abnormal cardiac 123I-mIBG uptake in clinical practice. The H/M ratios were corrected using a phantom-based method to overcome variations among camera-collimator combinations.5,6 The UK threshold to discriminate AD and DLB was lower than that in a Japanese multicenter study,7 and it was suggested that the threshold might be even lower in the USA than in the UK. Several factors are involved in establishing appropriate thresholds, such as the demographics of patients including background, age, and comorbidity; therefore, how to understand whether or not a threshold is “appropriate” is addressed here.

Heart-to-Mediastinum Ratio Depends on Age and Background

The effects of aging on H/M ratios have been investigated, and some studies have found that the ratio declines with age, whereas others have not.8,9,10,11 Since most 123I-mIBG studies have included relatively few patients in control groups, whether or not a slight decline of the H/M ratio with age is a truly physiological change has not been confirmed. In particular, elderly persons often have several comorbidities, such as ischemic heart disease, hypertension, and diabetes mellitus, and take medications that affect cardiac 123I-mIBG accumulation, all of which could be causes of decreased 123I-mIBG uptake. Persons with confirmed cardiac diseases or with comorbidities requiring medical management were excluded from a multicenter 123I-mIBG database compiled by the Japanese Society of Nuclear Medicine (JSNM) working group.12 We can reference the H/M ratios in the JSNM database, because it is considered to have been derived from near-normal individuals. All 123I-mIBG H/M ratios were standardized to the conditions of a medium-energy (ME) general-purpose collimator in the JSNM database, because the type of collimator also significantly influences H/M ratios.5 More specifically, calculated H/M ratios are highest with ME low-penetration collimators, followed in descending order by ME general-purpose, low-medium energy, low-energy (LE) general-purpose, and LE high-resolution collimators. The most recent Japanese neurological studies regarding Parkinson disease and dementia with Lewy bodies have calculated H/M ratios using the JSNM standard conditions, so that data can be readily compared among medical centers irrespective of camera-collimator combinations and types.13 Notably, the H/M ratio did not correlate with age in the JSNM 123I-mIBG database before collimators were standardized, but significantly and age-dependently declined thereafter.11

A comparison of normal databases with groups of patients aged < 60 and ≥ 60 years11 shows lower H/M ratios for older than for younger patients (Figure 1). Moreover, although the UK data were also standardized to ME collimator conditions based on phantom experiments, the average H/M ratio was lower for their elderly patients than for age-matched Japanese patients.4 This is probably because persons with comorbidities were included in the UK database but excluded from the Japanese database.

Figure 1

123I-mIBG heart-to-mediastinum (H/M) ratios in databases from the UK and Japan. A and B show early and delayed H/M ratios from the Japanese Society of Nuclear Medicine normal database (n = 62)12 and control data from the UK (n = 29).4 The age groups are < 60 (n = 28) and ≥ 60 (n = 34) years in the JSNM database. The H/M ratios decreased age-dependently when age groups were compared. C and D show UK and Japanese patients with AD (n = 15 and 31, respectively) as well as DLB (n = 17 and 30, respectively).4,7 The H/M ratio for DLB did not differ between the UK and Japanese databases, whereas that of AD was higher in the Japanese database. Original UK data were provided by Dr. Gemma Roberts (Newcastle University, UK). AD, Alzheimer disease; DLB, dementia with Lewy bodies; H/M, heart-to-mediastinum ratio; JP, Japan; UK, United Kingdom

What is an Appropriate Threshold?

Various thresholds have been applied in nuclear cardiology practice. If normal databases based on selection criteria are accumulated, a range comprising the mean ± 1.96 (or ± 2) standard deviations (SD) can serve as lower and upper limits that include 95% of control persons (the 2.5% to 97.5% quantiles). For example, the early H/M ratio in the Japanese 123I-mIBG database is 3.10 ± 0.43, with lower and upper limits of 2.2 and 4.0, respectively.11,12 When the data are more varied and the SD is large, ± 1.5 SD (87% of data included) or ± 1.0 SD (68% of data included) could be selected as thresholds. When some outlier values are included, the 10%-90% quantiles (80% of the data included) might also be selected, depending on the situation. The selection criteria for control patients could definitely influence the mean and SD in clinical investigations because normal healthy persons cannot be recruited for such studies.
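The arithmetic behind these limits can be sketched as follows. This is a minimal illustration that assumes the H/M ratios are normally distributed; the function names are hypothetical and are not taken from any cited study or software.

```python
import math

def reference_limits(mean, sd, k=1.96):
    """Return (lower, upper) limits computed as mean - k*SD and mean + k*SD."""
    return mean - k * sd, mean + k * sd

def coverage(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

# Early H/M ratio reported for the Japanese database: 3.10 +/- 0.43.
lower, upper = reference_limits(3.10, 0.43, k=2)
print(f"Limits with 2 SD: {lower:.1f} to {upper:.1f}")

# Share of control values expected inside each range (normality assumed).
for k in (1.0, 1.5, 1.96):
    print(f"{k} SD covers {coverage(k):.0%}")
```

Under the normality assumption, ± 2 SD reproduces the limits of 2.2 and 4.0 quoted above, and the coverage values agree with the 68%, 87%, and 95% figures in the text.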

Another approach to thresholds is based on receiver operating characteristic (ROC) analysis. When two groups of patients are analyzed using ROC curves, candidate thresholds move along the curve because each data point plots sensitivity (true-positive rate) against 1 − specificity (false-positive rate) (Figure 2A). Thus, the point with the highest value of sensitivity + specificity − 1, the Youden index,14 is a candidate for an appropriate threshold (data point P in Figure 2A). Thresholds were determined using this approach in the Japanese multicenter database of AD and DLB and in the UK database. Another method is to calculate the point that is closest to the upper left corner (coordinates [0,1]) (data point Q in Figure 2A). The best point can be determined as a single point, but several adjacent points can be candidates if their distances are similar.
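The two ROC-based criteria can be compared with a short sketch. The H/M values below are invented for illustration only and do not come from the Japanese or UK databases; the function names are likewise hypothetical. A case is called positive (DLB) when its H/M ratio is at or below the cutoff.

```python
import math

def roc_points(diseased, reference):
    """Return (cutoff, sensitivity, specificity) for every observed value.

    A value is classified as positive when it is <= cutoff, because a low
    H/M ratio is the abnormal finding.
    """
    points = []
    for c in sorted(set(diseased) | set(reference)):
        sens = sum(v <= c for v in diseased) / len(diseased)
        spec = sum(v > c for v in reference) / len(reference)
        points.append((c, sens, spec))
    return points

def youden_cutoff(points):
    """Cutoff that maximizes sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)[0]

def closest_to_corner_cutoff(points):
    """Cutoff whose ROC point lies nearest to (0, 1), the upper left corner."""
    return min(points, key=lambda p: math.hypot(1 - p[2], 1 - p[1]))[0]

# Hypothetical delayed H/M ratios (illustration only, not study data).
dlb = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 2.0, 2.1, 2.4, 2.9]
ad = [1.7, 1.8, 1.9, 2.2, 2.3, 2.5, 2.6, 2.7, 2.8, 3.0]

points = roc_points(dlb, ad)
print("Youden index cutoff:", youden_cutoff(points))
print("Closest-to-corner cutoff:", closest_to_corner_cutoff(points))
```

With these invented values the two criteria select different cutoffs, which mirrors the distinction between data points P and Q: the Youden index favors the largest sum of sensitivity and specificity, whereas the distance criterion favors a more balanced pair.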

Figure 2

ROC analysis of patients in Japan and the UK who have dementia with Lewy bodies and Alzheimer disease. A Calculations based on delayed H/M ratios. Data point P is farthest from the line of identity, corresponding to the highest sensitivity + specificity − 1. Data point Q is closest to the upper left corner. Corresponding probability and delayed H/M ratios are shown. B Probability of DLB calculated by logistic function based on Japanese and UK databases.4,7 Thresholds of 2.5 for early H/M and 2.2 for delayed H/M are shown in blue unfilled and filled circles, respectively, and those of 1.8 and 1.65 are shown in red unfilled and filled circles, respectively. Corresponding probabilities of DLB and AD can be estimated using these curves. AD, Alzheimer disease; DLB, dementia with Lewy bodies; H/M, heart-to-mediastinum ratio; JP, Japan; ROC, receiver operating characteristic; UK, United Kingdom

Viewpoint of Probability

The optimal point can also be viewed in terms of probability using logistic curves. The probability of DLB in the UK and Japanese studies cited above4,7 is displayed schematically using the logistic function shown in Figure 2B. The thresholds of early and delayed H/M ratios in the UK study were 1.8 (1.77-1.80) and 1.65 (1.61-1.70), which corresponded to 67% and 72% probabilities of DLB, respectively. The thresholds in the Japanese multicenter study were 2.5 and 2.2, which corresponded to 50% and 62% probabilities of DLB, respectively. If 2.2 was applied to both early and delayed H/M ratios, the probabilities of DLB would be 52% and 48%, respectively, in the UK database vs. 73% and 64%, respectively, in the Japanese database.
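The relation between an H/M threshold and a probability of DLB can be sketched with a single-variable logistic function. The intercept and slope below are hypothetical values chosen for illustration; they are not the coefficients fitted in the UK or Japanese studies, which are not given here.

```python
import math

def dlb_probability(hm_ratio, intercept, slope):
    """Logistic function: p = 1 / (1 + exp(-(intercept + slope * hm_ratio)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * hm_ratio)))

def hm_ratio_at(probability, intercept, slope):
    """Invert the logistic function to find the H/M ratio for a probability."""
    logit = math.log(probability / (1.0 - probability))
    return (logit - intercept) / slope

# Hypothetical coefficients (assumption): the slope is negative because a
# lower H/M ratio makes DLB more likely.
INTERCEPT, SLOPE = 5.0, -2.5

for hm in (1.6, 2.0, 2.4):
    p = dlb_probability(hm, INTERCEPT, SLOPE)
    print(f"H/M {hm:.1f} -> probability of DLB {p:.0%}")

print(f"H/M at 70% probability: {hm_ratio_at(0.70, INTERCEPT, SLOPE):.2f}")
```

Reading a curve of this kind in either direction shows why the same H/M threshold can correspond to different probabilities in different databases: each database has its own fitted intercept and slope.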

An early H/M ratio of 2.5 lies within the normal range of the JSNM working group database, which makes it inconvenient as a threshold in clinical practice. Therefore, we decided to use 2.2 as the optimal threshold for both early and delayed images of patients with neurological disorders.11 As noted above, we understand that the diagnostic probability of DLB based on this threshold will be 60%-70%.

Different Thresholds for Patients and Purposes

The selection of optimal thresholds depends on the patient and the purpose. For example, the thresholds between AD and DLB, between DLB and non-DLB, and between controls and DLB might not be identical. Another concern is whether high sensitivity or high specificity is expected. Since various thresholds have been presented in clinical studies, some adjustment is required to fit studies at individual institutions. In particular, most multicenter studies have specific inclusion and exclusion criteria to obtain clear-cut results and avoid confounding factors. For example, heart diseases, diabetes mellitus, and some medications decrease H/M ratios, resulting in lower thresholds for discriminating DLB. In addition, body stature or the proportion of obese persons in study populations might be important factors for creating control databases. A comparison of databases from Japan and the USA uncovered population-specific changes in the diagnostic accuracy of myocardial perfusion imaging and left ventricular function.15,16

Importance of Technical Standardization

Apart from the clinical variations discussed above, methodology such as data acquisition and processing should be standardized before common procedures are used. The method of setting region(s) of interest should be simple and reproducible, preferably using a standardized location and size, or semi-automated as applied in Japan.17 Phantom experiments are ongoing in Japan and Europe to overcome camera and collimator differences among 123I-mIBG studies.6,18 Procedural guidelines, normal databases, and normal values created by academic medical societies are also convenient for clinical practice.12,19

Deterministic and Probabilistic Approaches

Lastly, various thresholds can be calculated based on statistical methods. However, fundamental or clinical considerations should be included to determine appropriate thresholds. To better understand the meaning of thresholds, the probability of specific diseases, pathophysiological conditions, and prognostic outcomes as discussed herein might also be useful. A definite diagnostic threshold is a convenient deterministic approach in which only the most likely diagnosis is applied, but a probabilistic approach might also be beneficial to understand the nature of definite, probable, and equivocal situations that are obscured behind dichotomic thresholds.