Introduction

A considerable proportion of people in Western societies is at risk of suffering a cardiovascular event during life. Atherosclerosis is one of the main underlying processes. Non-invasive assessment of atherosclerosis is important since it allows studies into the etiology and consequences of early and advanced atherosclerosis in populations at large [1]. Over the last two decades, measurement of coronary artery calcification (CAC) using computed tomography (CT) has been used to assess coronary atherosclerosis non-invasively. The presence and, more importantly, the quantity of CAC correlate well with the overall severity of the atherosclerotic process [2]. Several studies have demonstrated a strong relation between coronary calcium burden and the incidence of myocardial infarction, a relation that was independent of age [3, 4].

Most of the evidence on determinants and consequences of coronary calcium is based on data obtained with electron beam CT (EBCT) [5–7]. The availability of EBCT scanners is modest, whereas Multi Detector-Row CT (MDCT) scanners are more widely available and also allow for detection of coronary calcium. Current data suggest that EBCT and MDCT give comparable results [8, 9]. In contrast to EBCT, however, data on the reproducibility of CAC measurements using MDCT images are not widely available [10, 11], although this information is relevant. Furthermore, due to technical improvements, the slice thickness of the images has become smaller, which may affect the likelihood of detecting coronary calcium and hence its reproducibility.

We set out to study inter-scan reproducibility of coronary calcium measurements from MDCT images and to evaluate whether reproducibility is affected by different measurement protocols, slice thickness, selected cardiovascular risk factors and technical variables.

Materials and methods

Participants were recruited from the PROSPECT study [12], a cohort of 17,357 healthy breast-cancer screening participants, aged 49–70 years, living in Utrecht and surroundings, enrolled between 1993 and 1997. Between October 2002 and December 2004, a random selection of 1,996 women was invited by mail, and 1,000 (50.1%) who were postmenopausal and did not use contraceptives or hormone replacement therapy answered positively. Of these 1,000 women, a random selection of 573 underwent an MDCT examination during a single visit and 76 of them were scanned twice. The Medical Ethical Committee of the University Medical Center Utrecht approved the study and written informed consent was obtained from all participants.

Current cardiovascular drug use (blood pressure lowering, lipid lowering and glucose lowering drugs) was assessed by asking women to bring all packages to the study centre. Smoking behavior, medical history and cardiovascular family history were assessed by a questionnaire. Height and weight were measured and body mass index was calculated as weight divided by height squared (kg/m²). Waist-to-hip ratio (WHR) was assessed. Systolic and diastolic blood pressures were measured at both arms with an automated and calibrated blood pressure device (DINAMAP™ XL, Critikon, Johnson & Johnson, Tampa, Florida, USA) with the subject in supine position. A venous blood sample was drawn after an overnight fast of at least eight hours. Plasma total cholesterol, plasma triglycerides, and plasma glucose were measured using standard enzymatic procedures. High-density lipoprotein (HDL) cholesterol was measured by the direct method (inhibition, enzymatic). Low-density lipoprotein (LDL) cholesterol was calculated using the Friedewald formula.

Coronary imaging and calcium measurements

The amount of calcium in the coronary arteries was assessed with a Multi Detector-Row CT (MDCT) scanner (Mx 8000 IDT 16, Philips Medical Systems, Best, The Netherlands). Subjects were positioned within the gantry of the MDCT scanner in supine position. During a single breath hold, images of the heart, from the level of the tracheal bifurcation to below the base of the heart, were acquired using prospective ECG triggering at 50–80% of the RR-interval, depending on the heart rate. Scan parameters were 16 × 1.5 mm collimation, 205 mm field of view (FOV), 0.42 s rotation time, 0.28 s scan time per table position, 120 kVp and 40–70 mAs (patient weight <70 kg: 40 mAs; 70–90 kg: 55 mAs; >90 kg: 70 mAs). Scan duration was approximately 10 s, depending on heart rate and patient size. For the repeat scan, we had the participant get up from the table and lie down again, since in studies on change in CAC over one year it is not realistic to assume exactly the same position of the participant on both occasions. Our participants therefore sat up and moved slightly between scans, to mimic two separate scan runs.

From the acquired raw data, the whole volume was reconstructed with an intermediate reconstruction algorithm in non-overlapping data sets of 1.5 mm and 3 mm thick sections. Quantification of coronary calcium was performed on a separate workstation with software for calcium scoring (Heartbeat-CS, EBW, Philips Medical Systems, Best, The Netherlands). All regions with a density over 130 Hounsfield units were identified as potential calcifications.

After completing a training program, one scan reader (AR), who was unaware of the scores of the first scan, scored the second scan of the participants by manually selecting the calcifications within one of the coronary arteries (left main, left anterior descending, left circumflex, right coronary artery, and posterior descending artery (PDA)). To reduce the influence of noise, the minimum size of a calcified lesion was set at 0.5 mm². The peak density in Hounsfield units and the area in mm² of each selected region were calculated. The Agatston calcium score [13] was obtained by multiplying the area by a weighting factor that depends on the peak signal anywhere in the lesion. The scores of individual lesions were added to obtain the Agatston calcium score for the entire coronary tree. The calcium volume of a lesion was calculated by multiplying the area of the calcified lesion (measured in square millimeters) by section thickness (1.5 mm or 3.0 mm). The calcium volume for each coronary vessel was computed by summing the volumes of the lesions in that vessel over all sections. Finally, the sum of the volumes from all vessels gave the calcium volume for a subject. The mass method uses volumetric and density information and a calibration phantom of hydroxyapatite to calculate the actual mass of the calcified plaques [14].
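
The Agatston and volume computations described above can be sketched as follows. This is a minimal illustration, not the Heartbeat-CS implementation: the `Lesion` container and function names are hypothetical, the density weights (1 to 4 for peak values of 130–199, 200–299, 300–399 and ≥400 Hounsfield units) are the conventional Agatston weights rather than values stated in this paper, and no slice-thickness normalisation is applied.

```python
from dataclasses import dataclass

THRESHOLD_HU = 130    # detection threshold stated in the text
MIN_AREA_MM2 = 0.5    # minimum lesion area stated in the text


@dataclass
class Lesion:
    """One calcified region in one axial section (hypothetical container)."""
    area_mm2: float
    peak_hu: float


def agatston_weight(peak_hu: float) -> int:
    """Conventional Agatston density weight (assumed, not given in the text)."""
    if peak_hu < THRESHOLD_HU:
        return 0
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions) -> float:
    """Sum of area x density weight over all lesions above the size cut-off."""
    return sum(
        lesion.area_mm2 * agatston_weight(lesion.peak_hu)
        for lesion in lesions
        if lesion.area_mm2 >= MIN_AREA_MM2
    )


def calcium_volume_mm3(lesions, section_thickness_mm: float) -> float:
    """Sum of area x section thickness over all lesions above the size cut-off."""
    return sum(
        lesion.area_mm2 * section_thickness_mm
        for lesion in lesions
        if lesion.area_mm2 >= MIN_AREA_MM2
    )


# Illustrative values only; the third lesion is below the 0.5 mm² cut-off.
example = [Lesion(2.0, 150), Lesion(1.0, 450), Lesion(0.3, 500)]
example_agatston = agatston_score(example)
example_volume = calcium_volume_mm3(example, 1.5)
```

Passing the same lesions with a section thickness of 3.0 instead of 1.5 doubles the volume of each lesion, which is one reason the two reconstructions are analysed separately.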

In addition, information on breathing artifact (inconsistency of the sternum in the sagittal section, in mm), noise (standard deviation of enhancement in a fixed cardiac area of 212 mm²) and mean heart rate (beats/min) during scan acquisition was collected.

Data analysis

The means and standard deviations (SD) of coronary calcium were calculated for all scoring methods separately. Because of the skewed distribution of scores, medians were also computed. The intra-class correlation coefficient (ICC) was estimated for the between-scan data, for the 1.5 and 3.0 mm slice thicknesses separately. The mean difference in score between scans was calculated, as well as the absolute and relative differences.
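
As an illustration of the intra-class correlation for two scans per subject, the sketch below computes a one-way random-effects, single-measure ICC. The text does not state which ICC form was requested from SPSS, so this choice, like the function name `icc_oneway`, is an assumption.

```python
def icc_oneway(scan1, scan2):
    """One-way random-effects, single-measure ICC for two scans per subject.

    Assumed ICC form; the paper does not specify which variant was used.
    """
    n = len(scan1)
    if n != len(scan2) or n < 2:
        raise ValueError("need two equally long lists with at least 2 subjects")
    k = 2  # number of scans per subject

    subject_means = [(a + b) / 2 for a, b in zip(scan1, scan2)]
    grand_mean = sum(subject_means) / n

    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2
        for a, b, m in zip(scan1, scan2, subject_means)
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_between - ms_within) / denominator
```

Identical scores on both scans give an ICC of 1, while scores that differ as much within subjects as between them push the ICC towards zero or below.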

To distinguish between random and systematic differences, information on both the mean difference and the mean absolute difference is needed. One may assume a priori a non-differential misclassification in the calcium scores, but this has to be shown with the results. When the chance of the second result being higher or lower is equal, one would expect a mean difference of zero, with some standard deviation. The mean absolute difference will not be zero, since the sign of every difference is discarded, but the mean difference is expected to be much smaller than the mean absolute difference. If, however, the chance of a higher or lower value in the second scan is not equal, the mean difference will be plus or minus a certain value. In addition, the magnitude of the mean difference will then be close to the mean absolute difference. Therefore we need both parameters.
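
The reasoning above can be illustrated with a small sketch. The numbers are invented for illustration and are not study data; the function name `difference_summary` is hypothetical.

```python
from statistics import mean, stdev


def difference_summary(scan1, scan2):
    """Mean, SD and mean absolute value of the per-subject differences."""
    diffs = [b - a for a, b in zip(scan1, scan2)]
    return {
        "mean_difference": mean(diffs),
        "sd_difference": stdev(diffs),
        "mean_absolute_difference": mean(abs(d) for d in diffs),
    }


first_scan = [10, 50, 100, 200]  # invented scores

# Random error: differences of +2, -5, +8, -5 cancel out on average.
random_error = difference_summary(first_scan, [12, 45, 108, 195])

# Systematic error: every second score is 5 units higher.
systematic_error = difference_summary(first_scan, [15, 55, 105, 205])
```

In the random case the mean difference is 0 while the mean absolute difference is 5; in the systematic case both equal 5, which is the pattern that would signal a systematic shift between scans.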

To estimate a weighted kappa as a measure of agreement for categorical variables, subjects were divided into four groups according to the mean Agatston score as proposed by Rumberger et al. [15]: A: 0–9 (absent-minimal), B: 10–99 (mild), C: 100–399 (moderate) and D: ≥400 (severe degree of calcification). This categorization is specific to the calcium scoring method according to Agatston. We therefore additionally categorized the scores of all scoring methods into quartiles, to calculate kappa as a measure of agreement for all scoring methods.
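
A minimal sketch of the categorisation and a weighted kappa between two scans follows. Two points are assumptions rather than details from the text: each scan's score is categorised separately, and linear disagreement weights are used, since the weighting scheme is not stated. The function names are hypothetical.

```python
def rumberger_category(agatston: float) -> int:
    """Map an Agatston score to category index 0..3 (A, B, C, D)."""
    if agatston < 10:
        return 0  # A: absent-minimal
    if agatston < 100:
        return 1  # B: mild
    if agatston < 400:
        return 2  # C: moderate
    return 3      # D: severe


def weighted_kappa(cats1, cats2, n_categories: int = 4) -> float:
    """Cohen's weighted kappa with linear weights (assumed weighting)."""
    n = len(cats1)
    if n == 0 or n != len(cats2):
        raise ValueError("need two equally long, non-empty lists")
    k = n_categories

    # Observed joint proportions of (scan 1 category, scan 2 category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(cats1, cats2):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: |i - j| / (k - 1).
    obs_dis = sum(
        abs(i - j) / (k - 1) * observed[i][j]
        for i in range(k) for j in range(k)
    )
    exp_dis = sum(
        abs(i - j) / (k - 1) * row[i] * col[j]
        for i in range(k) for j in range(k)
    )
    if exp_dis == 0:
        raise ValueError("kappa is undefined when all subjects share one category")
    return 1.0 - obs_dis / exp_dis
```

For the quartile-based analysis, the same `weighted_kappa` function could be applied after replacing `rumberger_category` with a quartile assignment for the scoring method in question.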

The relation between risk factors, technical variables and measurement error was assessed using Spearman correlation coefficients. In a similar manner, the relation between calcium level and measurement error was examined. Since logarithms of coronary calcium scores have generally been used in statistical analyses in other papers, we also studied the reproducibility of the log-transformed calcium score. Logarithmic analysis of coronary calcium scores was performed by calculating the natural log of the coronary calcium score plus 0.001 (ln (CCS + 0.001)), because the logarithm of the coronary calcium score alone excludes all subjects with zero scores [16]. We defined the relative difference as the absolute difference divided by the mean calcium level, multiplied by 100 and expressed as a percentage. Data analysis was performed with SPSS for Windows, version 12.0. A statistically significant difference was assumed when the two-sided P-value was less than 0.05.
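
The log transformation and the relative difference defined above can be written out as a short sketch. The function names are hypothetical, and returning `None` when both scans score zero is an assumption, since the text does not say how a zero mean was handled.

```python
import math

LOG_OFFSET = 0.001  # offset stated in the text, so that zero scores are kept


def log_score(ccs: float) -> float:
    """Natural log of the coronary calcium score plus the offset."""
    return math.log(ccs + LOG_OFFSET)


def relative_difference_percent(scan1: float, scan2: float):
    """Absolute difference divided by the mean of both scans, in percent.

    Returns None when the mean is zero (assumed handling, not from the text).
    """
    mean_level = (scan1 + scan2) / 2
    if mean_level == 0:
        return None
    return 100.0 * abs(scan2 - scan1) / mean_level
```

For example, scores of 90 and 110 on the two scans have a mean of 100 and an absolute difference of 20, giving a relative difference of 20%.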

Results

Mean age was 67.3 ± 5.2 years. Fifty-five participants (72.4%) had an Agatston score above zero (1.5 mm slice thickness). Table 1 shows the general characteristics of the 76 women who had two MDCT scans.

Table 1 Characteristics of studied population (N = 76)

Table 2 presents information on calcium distributions by the various scoring techniques and the reproducibility results, by slice thickness. Overall, calcium scores were higher when based on the 1.5 mm slice thickness than when based on the 3.0 mm slice thickness. The kappa agreement and intra-class correlation coefficients between the two scans were high for all scoring methods, indicating that all three methods perform well with respect to ranking of subjects. In addition, the mean differences in scores were relatively small compared to the absolute differences for all measurements, suggesting no systematic measurement errors. Finally, results for the scans with 1.5 mm slice thickness were similar to those for the 3.0 mm slice thickness (Table 2).

Table 2 Characteristics of different coronary calcium scoring methods; effect of slice thickness on inter-scan reproducibility

Table 3 presents the relation of cardiovascular risk factors with the inter-scan mean difference. No consistent relations were found between risk factor levels and measurement error. Importantly, however, neither the calcium level nor the logarithm of the coronary calcium score was related to the mean difference between scans, whereas both were significantly related to the absolute and relative differences (Table 4, Figs. 1 and 2). These observations suggest that measurement error increases with increasing CAC levels, yet that this occurs in a random way.

Table 3 Relationship between cardiovascular risk factors and inter-scan mean difference of coronary calcium scoring methods by MDCT (Slice thickness 1.5 mm)
Table 4 Relationship between cardiovascular risk factors and inter-scan absolute and relative difference of coronary calcium scoring methods by MDCT (Slice thickness 1.5 mm)
Fig. 1 Relation between mean calcium score and inter-scan difference in mean calcium scores (Bland-Altman plots)

Fig. 2 Relation between mean calcium score and inter-scan absolute difference

Discussion

With respect to ranking of subjects, the inter-scan reproducibility of coronary calcium measurements by MDCT using Agatston, volume and mass scoring algorithms is excellent. The inter-scan reproducibility showed no major differences between scoring methods. The slice thickness did not affect reproducibility, nor did heart rate and technical parameters. Measurement error was related to increased coronary artery calcification, although our findings suggest that the error in the measurements is a random phenomenon.

Our finding of no major differences between scoring methods is in contrast with several reports on reproducibility based on EBCT scanning. Direct comparison of our findings with those of others is difficult, since the parameters used to indicate reproducibility differ between studies. Furthermore, the prevalence of CAC and its extent may affect reproducibility, as our findings suggest that measurement error increases with increasing CAC levels. The studies also differ in size, which affects reproducibility results. However, our results are similar to those of Rumberger and Kaufman [17], who compared these three methods and did not find any one method preferable to another in terms of reproducibility of results from consecutive scans in a patient.

Although the correlation between inter-scan measurements is excellent [18, 19], it still occurs that subjects with small deposits of calcium in the first scan have larger deposits of calcium in the second scan, which leads to a proportionally larger error in reproducibility. This has led other studies on reproducibility [20] to suggest that “the variability is partially a function of the absolute calcium score and inversely related to it”, implying that low coronary calcium scores may not be reproducible. However, our results could not confirm this.

Besides different algorithms for calcium scoring, slice thickness has been reported to affect the reproducibility of scoring protocols. In our study, the reproducibility of the coronary calcium measurements by MDCT was similar for the 1.5 mm and 3.0 mm slice thicknesses, and similar for Agatston, volume and mass measurements, confirming the results of Rumberger and Kaufman [17].

The implications of our main findings depend on the research question asked in studies using CAC measurements. When the interest is in using CAC measurements for prognostic studies, our results for kappa and ICC show that ranking of subjects is adequate based on one CT scan, so there is no need for a duplicate CAC scan. The fact that measurement error increases with increasing CAC values is not of major importance in prognostic studies, since the categorization of individuals seems adequate. When the interest is in etiologic studies using CAC as outcome parameter, our findings show that risk factor relations will be validly estimated, since none of the risk factors relates to measurement error. When the interest is in using CAC as a risk factor for future events (assessment of relative risks), it is most likely that in analyses with CAC as a continuous variable the estimated association of high CAC levels with events underestimates the true magnitude. The direction of the relation will not change since, based on our results, measurement error is random, leading to random misclassification of the exposure variable. When the interest is in the diagnostic value of CAC measurements, which is usually assessed in categories of CAC, the relations will again be valid given our high kappa coefficients. Although our study was performed in healthy postmenopausal women, we expect that the findings will also apply to men.

Our findings are important in light of the wider availability of MDCT than EBCT across countries. One reason for this is lower equipment cost. Other suggested advantages of MDCT over EBCT are less quantum noise, thinner section thickness, and simultaneous acquisition of four sections (with 16-slice or with 64-slice), which is reported to reduce misregistration artifact.

In conclusion, our findings demonstrate that coronary calcium measurements by MDCT are highly reproducible and are not affected by scoring protocols, slice thicknesses and technical factors.