Introduction

Visual and quantitative perfusion and function assessments of myocardial perfusion SPECT (MPS) are important in the detection and evaluation of coronary artery disease (CAD).1 Reproducibility analysis of these parameters is essential for understanding the significance of changes seen in serial studies. Most previous studies have reported the inter-/intra-observer agreement for one MPS test or the reproducibility of measures for retrospective studies.2-6 A recent study has reported the reproducibility of visual perfusion derived by current automated software for test–retest MPS studies obtained with the same radiopharmaceutical injection.7 The aim of this study is to comprehensively evaluate the reproducibility of both visual and automated quantifications of perfusion and function parameters from MPS stress and rest scans that were repeated in the same scanning session.

Materials and Methods

Patients

One hundred studies with repeated stress and rest MPS acquisitions were obtained from July 2007 to March 2008 at the Nuclear Cardiology Diagnostic Imaging Laboratory, University of Ottawa Heart Institute, Ottawa, Canada, to evaluate the reproducibility of MPS measures.

The population age was 56 ± 11 years and included 20 studies with prior myocardial infarction, 25 cases with percutaneous coronary intervention, and 8 with coronary artery bypass graft. The clinical characteristics for all the patients in this study are summarized in Table 1. This study was approved by the University of Ottawa’s Human Ethics Review Board.

Table 1 Study demographics (N = 100)

Image Acquisition and Reconstruction Protocol

The details of image acquisition and tomographic reconstruction have been previously described.7 In brief, each study was performed with a 99mTc-tetrofosmin rest/stress 1-day protocol.8 For rest images, the injected dose of 99mTc-tetrofosmin was, on average, 340 MBq (range 301-453 MBq). For stress images, the average injected dose was 1053 MBq (range 930-1281 MBq). All subjects were imaged at rest 45 min after tetrofosmin injection, followed by stress imaging 30-60 min after radiopharmaceutical injection during either treadmill testing or dipyridamole infusion. Patients were repositioned between stress and rest, but not between repeat resting scans or repeat stress scans. All patient studies were acquired with a 90° rotation of a dual-detector integrated SPECT/CT camera (Infinia Hawkeye 4; GE Healthcare) using a 15% energy window centered on the 140-keV photopeak of 99mTc and low-energy high-resolution collimators. Both rest and stress scans were obtained with 8 gated frames per cardiac cycle.

All acquisitions were reconstructed on a Xeleris 2.0 workstation with a 64 × 64 matrix (pixel size = 6.8 × 6.8 mm²) using the same reconstruction algorithm (filtered back projection). Butterworth filters were applied to the images. The filter parameters were an order of 10 and a cutoff of 0.4 cycles/cm for ungated MPS images, and an order of 10 and a cutoff of 0.35 cycles/cm for gated data. For each study, two separate SPECT scans were acquired at both rest and stress, the second immediately following the first with the same acquisition parameters. No attenuation correction was applied. All scans were randomized and de-identified. Both stress and rest images were used for generating the perfusion and function measures.

Image Preprocessing

All images were first segmented by QPS/QGS. All image contours were then reviewed case by case by an experienced technologist, who was blinded to the pairing of the randomized patient identification codes, and were individually adjusted if required. A contour QC flag was derived.9

Visual Analysis

Visual interpretation of MPS images was based on short-axis and vertical long-axis tomograms divided into 17 segments. Each segment was scored by an expert with more than 20 years of experience in nuclear cardiology (S.H.) using a 5-point continuous scoring system for perfusion variables (0, normal; 1, mildly abnormal; 2, moderately abnormal; 3, severely abnormal; and 4, absence of segmental uptake), a 6-point scale for wall motion (0, normal; 1, mild hypokinesia; 2, moderate hypokinesia; 3, severe hypokinesia; 4, akinesia; 5, dyskinesia), and a 4-point scale for wall thickening (0, normal; 1, mild; 2, moderate to severe; 3, absent). During visual scoring, no clinical information, such as patient history, was taken into account. The expert was also blinded to any computer-generated myocardial perfusion quantification results and to the pairing of the randomized patient identification codes. However, all available image data, including raw projections and gated stress and resting non-corrected scans, were considered during scoring. The observer could also modify the default assignment of segments to each specific vascular territory. In addition, to eliminate visual bias, each image from the first scanning session was presented to the observer after 6 months to test intra-observer reproducibility. Subsequently, the visual summed stress score (VSSS) and visual summed rest score (VSRS) were calculated by summing the respective segmental perfusion scores. The visual summed difference score (VSDS) was computed as the difference between VSSS and VSRS (VSSS − VSRS). Visual motion (VM) and thickening (VT) scores were obtained from the stress images by summing all corresponding segmental scores.

The previously established abnormality threshold for VSSS (≥3) was assumed for observer scoring.10 For direct comparison with a recently introduced measure, total perfusion deficit (TPD),10,11 all visual perfusion scores were also normalized to percent of the total myocardium (the summed scores divided by 68 [4 × 17] and multiplied by 100). Local summed and normalized scores for each vascular territory were also obtained.
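The summed-score and normalization arithmetic described above can be sketched as follows. This is only an illustration: the function names and the segment scores are hypothetical and are not taken from the study data or from any scoring software.

```python
from typing import Sequence

N_SEGMENTS = 17        # 17-segment model used for visual scoring
MAX_SEGMENT_SCORE = 4  # top of the 0-4 perfusion scale


def summed_score(segment_scores: Sequence[int]) -> int:
    """Sum the 17 segmental perfusion scores (each 0-4)."""
    if len(segment_scores) != N_SEGMENTS:
        raise ValueError(f"expected {N_SEGMENTS} segment scores")
    if any(not 0 <= s <= MAX_SEGMENT_SCORE for s in segment_scores):
        raise ValueError("perfusion scores must lie between 0 and 4")
    return sum(segment_scores)


def percent_myocardium(summed: int) -> float:
    """Normalize a summed score to percent of the total myocardium (score / 68 * 100)."""
    return 100.0 * summed / (N_SEGMENTS * MAX_SEGMENT_SCORE)


# Hypothetical segment scores for one patient (not study data).
stress_scores = [0] * 12 + [1, 2, 2, 3, 0]
rest_scores = [0] * 12 + [0, 1, 1, 1, 0]

vsss = summed_score(stress_scores)  # visual summed stress score
vsrs = summed_score(rest_scores)    # visual summed rest score
vsds = vsss - vsrs                  # visual summed difference score

print(vsss, vsrs, vsds)
print(round(percent_myocardium(vsss), 1))
```

In this made-up case VSSS is 8, which meets the ≥3 abnormality threshold and corresponds to roughly 11.8% of the myocardium after normalization.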

Automated Quantification

QPS software computed the TPD score10,11 by integrating the hypoperfusion severities below normal limits in polar map coordinates. The normal-limit threshold was defined as 3.0 mean absolute deviations (approximately equivalent to 2.5 standard deviations, SD) for each polar map sample. TPD was measured independently for each stress and rest scan. Ischemic TPD (ITPD) was defined as the difference between stress TPD (STPD) and rest TPD (RTPD) (STPD − RTPD). Functional measures included ejection fraction (EF), ventricular wall motion (WM) (mm), and wall thickening (WT) (%), which were obtained from gated stress/rest scans using QGS software.12 TPD, WM, and WT values were also calculated for each vascular territory.

Statistical Analysis

Continuous variables are denoted as mean ± SD. A Bonferroni-corrected paired t-test was used to compare mean values of continuous variables between test and retest studies, and the χ² test was used to analyze categorical variables. Reproducibility between the same assessments of the two repeated studies was evaluated with linear regression, Pearson correlation analysis, and the Bland–Altman method. To assess intra-observer reproducibility of visual scoring, the reading scores in the first scanning session were compared with the visual scores in the second scanning session. The repeatability coefficient (RC), defined as 1.96 × SD of the differences between pairs of repeated measures, was also determined.13 A P < .05 was considered statistically significant.
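As a minimal sketch of the Bland–Altman and RC calculations named above, the following uses only the Python standard library. Several details are assumptions made for illustration rather than facts from the study: the function names, the sign convention (retest minus test), the use of the sample SD with n − 1 in the denominator, and the example values, which are invented.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(test: Sequence[float], retest: Sequence[float]) -> Dict[str, float]:
    """Bias, SD of differences, 95% limits of agreement, and RC for paired measures."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [b - a for a, b in zip(test, retest)]  # assumed convention: retest - test
    bias = mean(diffs)
    sd = stdev(diffs)  # assumed: sample SD (n - 1 denominator)
    return {
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "rc": 1.96 * sd,  # repeatability coefficient as defined in the text
    }


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented stress TPD values (%) for five test-retest pairs, not study data.
test_tpd = [2.0, 5.0, 10.0, 0.0, 7.0]
retest_tpd = [3.0, 4.0, 11.0, 1.0, 6.0]

result = bland_altman(test_tpd, retest_tpd)
r = pearson_r(test_tpd, retest_tpd)
print(result, r)
```

A smaller RC means that a smaller change between serial scans can be distinguished from measurement variability.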

Results

Of the 800 contours generated by QPS/QGS, 160 (gated images: 21%; ungated images: 19%) needed adjustment, as determined by the experienced technologist. Most of the adjustments were subtle, only rearranging the valve-plane positioning. Only 13 cases were adjusted by supplying a corrected initial ellipsoid. Eleven cases with incorrect ellipsoid determination had a high shape quality control flag value (≥4.11).9

Perfusion Results

For the visual perfusion scores, 17–18% of patients had ≥5% myocardium abnormal at stress in both studies (test: 17%; retest: 18%), and 10% of the patients had ≥5% myocardium abnormal at rest or ischemic myocardium in both studies. TPD showed a similar distribution of abnormality as compared to visual scores. The distribution differences between test and retest studies in either visual scores or quantitative scores were not significant. Table 2 shows the reproducibility of perfusion results (global visual perfusion score, TPD, and corresponding results in vessel territories). Table 2A shows the reproducibility of global variables and the correlation between test and retest studies. The correlations of visual perfusion interpretation were excellent for VSSS and VSRS (both r = 0.95) and good for VSDS (r = 0.81). Similar correlation values were also obtained for stress, rest, and ischemic TPD with r = 0.98, r = 0.99, and r = 0.85, respectively. As compared to visual measures, the correlation between test and retest for stress and rest quantitative perfusion parameters was significantly higher (0.98 vs 0.95: P = .001; 0.99 vs 0.95: P < .0001).

Table 2 Global (A) and regional (B) reproducibility of perfusion parameters between test–retest studies in each vascular territory

The reproducibility of global perfusion quantification analyzed by the Bland–Altman method is also shown in Table 2A and Figure 1. In general, the bias, 95% limits of agreement, and SD were smaller for quantitative perfusion measures (stress, rest, and ischemic) compared to visual findings. In addition, Table 2A shows that both quantitative and visual perfusion parameters were highly reproducible with small repeatability coefficients (RC) (RC = 4.8% for VSSS%; RC = 3.8% for VSRS%; RC = 4.3% for VSDS%; RC = 3.3% for STPD; RC = 1.8% for RTPD, and RC = 3.2% for ITPD), all P ≤ .002. Among abnormal cases, Figure 2 shows that the RC of the stress or ischemic quantitative parameter was smaller than that of the corresponding visual parameter for all abnormality thresholds.

Figure 1

Differences of global perfusion defects. The left column shows the differences of visual perfusion results between test and retest; right column shows the differences of TPD results between test and retest. The first row shows the differences of normalized visual stress perfusion and STPD between test and retest; the second row shows the differences of normalized visual rest perfusion and RTPD between test and retest; the third row shows the differences of normalized visual ischemic perfusion and ITPD between test and retest

Figure 2

RC of stress and ischemic perfusion measures for all studies and for abnormal studies. Note: CAD includes all cases with prior myocardial infarction, percutaneous coronary intervention, or coronary artery bypass graft. Abnr defined by STPD ≥ 3% includes abnormal cases with STPD ≥ 3%, Abnr defined by STPD ≥ 5% includes abnormal cases with STPD ≥ 5%, VSSS% means normalized visual summed stress score, and VSDS% means normalized visual summed difference score

Most of the correlations between test and retest studies for visual and automated perfusion measures in each vessel territory were either excellent or good (≥0.7) (Table 2B). In general, the correlations of both visual and automated perfusion measures in the LAD were high (>0.95), the correlations of both methods in the LCX were good, and the correlations of both methods in the RCA were also high (>0.80). In addition, the automated measurements in the RCA correlated better than the visual measures (P < .0001).

As in the correlation analyses, the Bland–Altman results (Table 2B) for automated perfusion measures were similar to those for visual measures. Both quantitative and visual perfusion parameters in each vessel territory were also highly reproducible with small repeatability coefficients (all RC ≤ 4.1%). The RC for STPD and RTPD in the RCA was significantly lower than that for normalized summed visual rest perfusion (P < .001).

Functional Results

Global functional measures (Table 3A) showed high repeatability, especially for thickening (VT: r = 0.94; WT: r = 0.91). Visual scoring was more reproducible for thickening scores than for motion scores (P < .0001). In addition, stress EF had a lower RC than rest EF (P = .0048). However, after removing 35 cases with operator-adjusted contours, there were no significant differences between stress EF and rest EF (stress EF: r = 0.94, −1.88 ± 3.6%, RC = 6.4% and rest EF: r = 0.91, −0.66 ± 4.4%, RC = 8.7%).

Table 3 Global (A) and regional (B) reproducibility of functional parameters for each vascular territory

Visual functional measures (VM and VT scores) and automated raw motion and thickening in each vessel territory were also analyzed (Table 3B). Table 3B shows that most functional measures in each vessel territory have high or good correlation (≥0.72), except the VM score in the LCX and RCA territories. The visual thickening score was more reproducible than the visual motion score in each vessel territory (P ≤ .001). In addition, the RC in each vessel territory was smaller than 2 mm for automated WM and smaller than 10.4% for automated WT.

Discussion

This is the first study to comprehensively evaluate and compare intra-observer visual and quantitative reproducibility of a wide range of function and perfusion parameters in MPS studies obtained with the same injection of radiopharmaceutical. We included most of the visual and quantitative parameters used in clinical practice, including TPD, WM, and thickening. It should be noted that in this study an experienced attending physician read blinded scans in two consecutive sessions within a 6-month interval with the same display defaults and presets. Therefore, this represents the best possible case for visual intra-observer reproducibility. We demonstrated that the software analysis showed equivalent or better reproducibility than this best possible (expert intra-observer) visual analysis.

In this study, the reproducibility of visual and quantitative MPS measurements was evaluated in patients who underwent two separate gated stress/rest MPS studies. Most correlations and reproducibility measures of perfusion and functional assessments of stress and rest MPS between these two consecutive studies were high by both quantitative and visual analyses. As compared with visual assessments, automated stress and rest perfusion measures had significantly higher correlations and smaller repeatability coefficients. Our results also showed that VT scores were more reproducible than VM scores both globally and in local vessel territories.

Several previous serial studies have evaluated the variability of MPS assessments.2-6 Mahmarian et al3 showed a high correlation of myocardial perfusion defect size from 18 serial exercise thallium-201 SPECT images. Prigent et al6 also reported high reproducibility of the quantitative measure of percent hypoperfused myocardium by exercise thallium-201 SPECT images. Recently, our laboratory compared visual with automated assessments from 31 stable patients with abnormal stress MPI and suggested that automated quantification for abnormal perfusion was more reproducible than visual evaluation.5 In that study, the RC of ITPD was found to be 5.7% for studies repeated within 9-22 months, whereas it is 3.2% in our study with same-injection repeated studies. However, in abnormal studies, the RC of ITPD is slightly higher at 4.3% (see Figure 2). In another recent study, the reproducibility of ischemic quantitative measurement was reported as 0.25 ± 3.81%.14 Therefore, the estimated RC at 95% confidence would be 7.5% (1.96 × 3.81%). The values reported in our study are lower, even in abnormal studies, which may be related to the differences in the software.
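The RC comparison in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that the reported ± 3.81% is the SD of the paired differences, as the text's estimate implies; the variable and function names are illustrative only.

```python
def repeatability_coefficient(sd_of_differences: float) -> float:
    """RC = 1.96 x SD of the differences between repeated measures."""
    return 1.96 * sd_of_differences


# Assumption: 3.81% is the SD of test-retest differences reported in reference 14.
rc_reference_14 = repeatability_coefficient(3.81)

# RC values for ischemic TPD quoted in the text (percent of myocardium).
itpd_rc = {
    "this study, all cases": 3.2,
    "this study, abnormal cases": 4.3,
    "prior study, 9-22 month interval": 5.7,
    "reference 14 (estimated)": round(rc_reference_14, 1),
}

for label, rc in sorted(itpd_rc.items(), key=lambda item: item[1]):
    print(f"{label}: RC = {rc}%")
```

Sorted this way, the same-injection values from the present study fall below both earlier estimates, which is the comparison the paragraph draws.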

Compared to previous studies, this study utilized repeated stress and rest studies obtained on the same day with the same isotope injection to evaluate the reproducibility of perfusion and function parameters obtained from MPS. By using studies obtained on the same date with the same injection, we eliminated the effects of disease progression, changing patient medications, and other variations in the clinical state, as well as variability in the level of exercise. Therefore, from the point of view of testing the reproducibility of visual or software analysis, our data are not contaminated by these unrelated components and represent a true evaluation of analysis reproducibility.

This study also showed good correlation for visual and automated perfusion/functional assessments of SPECT MPI between these repeated studies, and higher reproducibility for some automated perfusion measures in stress and rest images. The RC of stress or ischemic quantitative parameters in abnormal cases considered separately was significantly smaller than that of the corresponding visual parameters. Besides the evaluation of global parameters, quantification in each vessel territory showed that quantitative measures in the RCA territory had some superiority as compared with visual measures. This might be related to slight patient movement and variations in artifacts in this location between the two repeated studies, which could affect the expert's evaluation during scoring.

There are several limitations in this study. The major limitation is the small number of patients with abnormal perfusion and/or function included. In this study, we only compared software reproducibility to intra-observer reproducibility. Presumably, inter-observer reproducibility would be worse (with higher RC values) than intra-observer reproducibility. However, even in the case of intra-observer variability, we could demonstrate the superiority of the software, supporting the overall conclusion of software analysis superiority as compared with visual analysis. In addition, although our data acquired during continuous sessions are not affected by disease progression, patient medications, or other changes in clinical state, or by variability in the level of exercise, the functional parameters such as EF, WM, and WT could potentially be affected by the timing of the acquisition. In particular, stress function could be affected by stunning and rest function could be affected by variable image quality (e.g., increased extra-cardiac uptake). Furthermore, not all of the standard variables generated by our software were evaluated for reproducibility. Some parameters, such as transient ischemic dilation (TID) or diastolic function parameters, were not evaluated due to the nature of the imaging data obtained with a single injection. TID would be associated with error in sequential imaging acquisitions, and diastolic function parameters were only validated for 16-gate data, whereas the data studied here were obtained with 8 gates.

Conclusion

This study demonstrates that standard perfusion and function parameters derived by expert visual or quantitative analysis are highly reproducible, with significant advantages for the quantitative approach, especially for the stress and ischemic perfusion variables.