Introduction

Access to imaging data is becoming more widespread as regions/nations implement Electronic Health Record (EHR) solutions. Expectations for accessibility are changing with physicians requiring access to images dating back many years in near real-time. To satisfy these expectations, EHR projects are driving the deployment of imaging repositories that host data online for the legal retention period and in some instances for the lifetime of the patient.

New technological advancements including multislice CT scanners and functional MRI have dramatically increased the size and number of digital images generated by medical imaging departments. CT scan images represent up to 50 % of Picture Archiving and Communication System (PACS) storage space, with each CT study consisting of hundreds of slices and easily more than 1,000 images. Storage needs are predicted to increase further when departments are required to archive 3D volume renderings and other complex advanced reformats of the original images.

Although the cost of storage is dropping, the savings are more than offset by the increasing volume of data being generated. In addition, the cost of operation remains high, with mandatory data migration and a legal retention period of 7 years for adult patients and 7 years past the age of consent for adolescents. Given that Canada generates 1.5 PB of radiology imaging data annually, the potential impact at the national level is significant [4].

While local area network bandwidth within a hospital is adequate for timely access to imaging data, efficiently moving the data between institutions requires wide area network bandwidth, which has a limited availability at a national level. As such, EHR networks cannot support timely access and distribution of large medical images. Conservative levels of data compression can address the storage and transmission needs by enabling more efficient distribution and optimizing archiving of imaging data.

The goal of this study was to determine levels at which irreversible compression can be confidently used in diagnostic imaging applications for thin slice CT images (less than 3 mm thickness). The compression levels investigated were based on the accepted levels in the Canadian Association of Radiologists (CAR) Standards for Irreversible Compression, published in June 2008 after a large-scale pan-Canadian evaluation study [2] whose scope did not include thin slice CT images less than 5 mm.

It has been demonstrated that thin slices are less tolerant to JPEG 2000 compression than thick slices: they have a smaller peak signal-to-noise ratio (PSNR) and more visible artifacts, because fewer photons contribute to the reconstruction of each voxel and the images therefore contain more random noise [5]. Woo et al. [6] stated that the perceived artifacts, more pronounced in thin-section images, have a blurring effect that mainly alters the structures of the solid organs and soft tissues. Yamamoto et al. [7] reported that 2-mm thick high-resolution lung CT images are more vulnerable to compression than 10-mm thick conventional lung CT images on the basis of a comparison of PSNR and subjective rating of image quality. Siddiqui et al. [8] and Siegel et al. [9] reported that thinner sections are less compressible, based on their observations of the PSNR and a computer-based perceptual metric [10] in five chest CT data sets compressed with the JPEG 2000 algorithm. Thus, there is ample evidence that image compression may have a differential effect on thin slice CT images.
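For reference, PSNR is conventionally defined as 10·log10(MAX²/MSE), where MSE is the mean squared error between the original and compressed pixel values and MAX is the largest possible pixel value. The following is a minimal Python sketch of that conventional definition. The function name, the flat pixel-list representation, and the 12-bit peak value of 4095 are illustrative assumptions and are not taken from the cited studies, which may have computed PSNR differently.

```python
import math


def psnr(original, compressed, max_value):
    """Peak signal-to-noise ratio (dB) between two equally sized pixel lists."""
    if len(original) != len(compressed) or not original:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no error at all
    return 10 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by one grey level (hypothetical values).
# max_value=4095 assumes 12-bit pixel data, which is an assumption here.
example = psnr([100, 200, 300, 400], [101, 199, 301, 399], max_value=4095)
```

A lower PSNR means a larger pixel-level error, which is the sense in which the studies above describe thin slices as less tolerant to compression.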

One study has directly examined the effect of digital compression on thin slice CT. Ringl et al. [3] asked their readers, for each image pair, to select the image with the best image quality then rate the difference between the compressed and uncompressed image. Their approach was to select the ideal compression ratio as the one with no visual difference between the original and compressed image. However, the relation between visually detectable image degradation and diagnostic accuracy is unknown, as the authors admit.

In the present study, we explicitly examined this question. We began with a large variety of subtle conditions in diverse body parts. Each lesion was then reproduced under varying compression conditions, then diagnosed by experienced radiologists to determine if compression would have an impact on diagnostic accuracy. We also asked radiologists to rate the degree of degradation and their confidence.

Materials and Methods

Materials

The study considered the effects of compression in four radiological areas in thin slices of CT: body, chest, musculoskeletal (MSK), and neurological. Two hundred eighty retrospective examinations were selected from the Hamilton Health Sciences PACS and used to retrieve 4,200 anonymized and uncompressed images. The images were acquired using Toshiba Aquilion 16 Slice, Toshiba Aquilion 64 Slice (Toshiba Medical Systems Corporation, Tochigi, Japan), and GE VCT 64 Slice (General Electric Healthcare, Milwaukee, WI, USA). Great care was taken to ensure that all the images were exported at original size in uncompressed format.

We retrieved 15 slices for each selected exam. All retrieved images had a slice thickness of up to 3 mm. The body cases had a slice thickness between 2 and 3 mm, chest (including high resolution and vascular) cases between 1 and 2 mm, musculoskeletal cases between 1 and 2 mm, and neuro cases between 1 and 3 mm. Reporting was performed by all radiologists on a pair of 2-megapixel flat-panel Digital Imaging and Communications in Medicine (DICOM) monitors (EIZO NANAO Corporation, Ishikawa, Japan).

As much as possible, we selected subtle abnormalities for each radiological area, which could be overlooked if the image was somewhat altered or overcompressed. We also tried to restrict the abnormalities to three to four possibilities; however, because of the nature of certain radiological areas, including the numerous types of abnormalities encountered, this was not always possible. The full list of abnormalities per radiological area is as follows:

  • Body abnormalities: abscess, collection, cyst, gallstone, hemangioma, liver contusion, liver lesion, mass, perihepatic fluid, scar, and subcutaneous emphysema

  • Chest abnormalities: atelectasis, bronchiectasis, fibrosis, ground glass opacity, mass, nodule, pulmonary embolism, pleural plaque, and tree in bud

  • MSK abnormalities: bone lesion and subtle fracture

  • Neuro abnormalities: aneurysm, arteriovenous malformation, hemorrhage, infarct, ischemia, mass, pneumocephalus, sinus thrombosis, stroke, and subgaleal hematoma

Images were randomly assigned one of seven possible compression ratios: uncompressed, JPEG low, JPEG medium, JPEG high, JPEG 2000 low, JPEG 2000 medium, or JPEG 2000 high. Compression ratios were based on values determined in the CAR Standards for Irreversible Compression in Digital Diagnostic Imaging.

We investigated the effects of compression on thin slice CT images in four anatomical areas of interest (body, chest, MSK, and neuro). The evaluation consisted of four independent collections of images, or worklists, each of which was viewed by a minimum of five readers.

Compression Technique

Ensuring diagnostic quality in compressed images is imperative. Significant advances have been achieved in compression algorithms, moving from JPEG, to wavelet, and now to JPEG 2000. Irreversible compression allows a greater size reduction, ideally with no significant loss of visual quality; the severity of any degradation depends on the compression ratio. JPEG is the most widely accepted compression tool, but it has been shown that the JPEG 2000 algorithm may provide higher compression levels than JPEG at an equivalent or higher image quality [11]. The new features in JPEG 2000 compared to JPEG are [7]:

  1. Efficient irreversible and reversible compression within a single unified coding framework

  2. Progressive transmission and spatial scalability (thumbnails)

  3. Superior image quality; broad range of image types

  4. Support for region of interest coding

  5. Support for continuous tone and bilevel compression (BW and color)

  6. Robustness to bit errors (wireless communication applications)

  7. Avoids excessive memory usage

To provide a fair assessment of existing compression tools, we tested and compared the two most commonly adopted DICOM compression algorithms: JPEG and JPEG 2000. We used the compression ratios recommended in the CAR Standards for each modality and specialty, as shown in Table 1. As in our previous study, we used a widely adopted commercial compression package, PICTools JPEG 2000 from Pegasus Imaging (Accusoft Pegasus, Tampa, FL, USA) [12].

Table 1 Recommended compression ratio by modality and specialty

Compression Ratios

Based on results from our previous study, we tested both the JPEG and JPEG 2000 compression algorithms at three compression ratios: 6:1 (low), 9:1 (medium), and 12:1 (high). Because the study included both JPEG and JPEG 2000, we were able to compare the performance of the two algorithms. In addition to compressed images, each set of images also contained uncompressed images for evaluation. Each worklist included 70 cases, each consisting of a stack of CT images, with a ratio of six compressed images to one uncompressed image. The entire set of images was randomized, and no reader was shown the same image twice. Images were collected from other collaborating hospitals in the Hamilton Health Sciences network to ensure that our findings could be generalized to images generated by different brands of acquisition equipment.
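The worklist arithmetic described here (70 cases, seven conditions, six compressed for every uncompressed) can be illustrated with a short Python sketch. It assumes a balanced design of ten cases per condition, which is consistent with the stated ratio but is not spelled out in the text. The case identifiers, function name, and seed are hypothetical, and this is not the randomization procedure actually used in the study.

```python
import random

# One uncompressed condition plus two algorithms at three ratios each.
CONDITIONS = [
    "uncompressed",
    "JPEG 6:1", "JPEG 9:1", "JPEG 12:1",
    "JPEG 2000 6:1", "JPEG 2000 9:1", "JPEG 2000 12:1",
]


def assign_conditions(case_ids, seed=None):
    """Randomly give each case one condition, keeping conditions balanced."""
    if len(case_ids) % len(CONDITIONS) != 0:
        raise ValueError("case count must be a multiple of the condition count")
    per_condition = len(case_ids) // len(CONDITIONS)
    labels = CONDITIONS * per_condition  # e.g. 10 copies of each label
    rng = random.Random(seed)            # seeded for reproducibility
    rng.shuffle(labels)
    return dict(zip(case_ids, labels))


# Hypothetical worklist of 70 cases (identifiers are made up).
worklist = assign_conditions([f"case-{i:02d}" for i in range(1, 71)], seed=0)
n_uncompressed = list(worklist.values()).count("uncompressed")
n_compressed = len(worklist) - n_uncompressed
```

With 70 cases this yields 10 uncompressed and 60 compressed cases, matching the six-to-one ratio described above.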

Measurement

Our methodology included two recognized evaluation techniques: a subjective assessment based on comparison of original and compressed images, and an objective assessment of diagnostic accuracy. In the subjective assessment, readers compared original and compressed images and rated the degree of degradation. In the objective assessment, readers were presented with a mix of normal and abnormal cases (one normal for every four abnormal).

  1. Diagnostic accuracy evaluation: Images consisting of a mix of normal cases and identified pathologies were presented to reviewers for their assessment. Using normal images was important because it (1) allowed us to examine the possibility that artifacts may simulate abnormalities in a normal image and (2) controlled for a possible bias that could have occurred if readers knew that every image contained an abnormality. Images were displayed with a grid overlay, which divided the image into four equal quadrants. The grid could be toggled on and off by the reader during the evaluation. Using the grid and drop-down menus, readers were required to identify the type and location of the lesion. Radiologists then provided a confidence rating of 0–5 (0 is definite absence of lesion and 5 is definite presence).

     The diagnoses provided by radiologists were then used to compute sensitivity (true positive rate), specificity (true negative rate), and overall accuracy.

  2. Forced choice evaluation technique: Each compressed image was paired with its original, and the observer was asked to compare both images and rate the perceptible difference on a scale from 0 to 5, where 5 represents no visible difference and 0 is unacceptable.

Study Design

Sample of Readers

We enrolled 25 readers across Canada to participate in our study. All were certified by the Royal College of Physicians and Surgeons of Canada or an equivalent specialty body. All had between 5 and 25 years of experience. To ensure an adequate sample size, we required a minimum of five readers for each of our reading sessions, representing a minimum of 20 radiologists for our four reading sessions. Radiologists were asked to read in a manner consistent with their daily experience. They were blinded to the level and type of compression in each image.

The data from each radiologist in each condition, defined by body area (body, chest, MSK, neuro), compression type (JPEG, JPEG 2000), and compression ratio (low, medium, high), were first used to compute an overall sensitivity (true positives/total positives), specificity (true negatives/total negatives), and accuracy ((true positives + true negatives)/total). The mean perceived difference between compressed and uncompressed images (six-point scale, where 5 = no difference, 0 = large difference) and the mean confidence in diagnosis (six-point scale, where 5 = 100 % confidence, 0 = no confidence) were also computed.
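The three ratios defined above can be sketched in a few lines of Python. The sketch assumes each reading has already been reduced to a binary call (lesion reported or not) against a binary ground truth. How a reading with the wrong lesion type or location was scored is not described in the text, so that step is left out. The function and variable names are illustrative.

```python
def diagnostic_metrics(truth, calls):
    """Sensitivity, specificity, and accuracy from paired boolean lists.

    truth[i] is True when case i contains a lesion;
    calls[i] is True when the reader reported a lesion.
    """
    if len(truth) != len(calls) or not truth:
        raise ValueError("truth and calls must be non-empty and the same length")
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return {
        # None when the denominator is zero (no such cases in the sample).
        "sensitivity": tp / positives if positives else None,
        "specificity": tn / negatives if negatives else None,
        "accuracy": (tp + tn) / len(truth),
    }


# Toy example with four abnormal cases and one normal case (made-up reads).
metrics = diagnostic_metrics(
    truth=[True, True, True, True, False],
    calls=[True, True, True, False, False],
)
# metrics == {"sensitivity": 0.75, "specificity": 1.0, "accuracy": 0.8}
```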

Analysis

The data were then subjected to an analysis of variance. First, an overall analysis of variance (ANOVA) was conducted across all body parts. There were three factors in this analysis: body region (four levels), compression type (0 = uncompressed, 1 = JPEG, 2 = JPEG 2000), and compression level (1 = low, 2 = medium, 3 = high; only for types 1 and 2). Overall correlations between accuracy, confidence, and perceived difference were calculated. The ANOVA was then repeated for each body region. Separate ANOVAs were performed for sensitivity, specificity, accuracy, confidence, and perceived difference.
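To illustrate the idea behind an analysis of variance, the following Python sketch computes the F statistic for a one-way ANOVA, that is, for a single factor only. This is a deliberate simplification: the analysis described above has three factors, and the statistical software used for it is not stated. The function name and the toy scores are made up for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of observation groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy scores for three hypothetical conditions (not study data).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
# f_stat == 3.0 with (2, 6) degrees of freedom
```

A large F means the group means differ by more than the within-group scatter would suggest; the p value is then read from the F distribution with the two degrees of freedom returned.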

Results

The data on sensitivity, specificity, accuracy, confidence, and perceived difference, averaged across body systems, are shown in Table 2. Results are displayed in histogram form for each of the five dependent variables, with p values noted.

Table 2 Results of the study for sensitivity, specificity, accuracy, confidence and perceived difference averaged over all body systems

Sensitivity

Mean value for the uncompressed images was 0.70, for JPEG was 0.60, for JPEG 2000 was 0.62. There was no significant difference by type (p = 0.566) or by level of compression (p = 0.629). The type by level interaction was significant (p = 0.037), reflecting that, at medium compression, sensitivity was low for JPEG but high for JPEG 2000. Because there was no overall significant difference by type or level and no consistent trend, we conclude that compression had no significant effect on sensitivity.

Specificity

Mean value for the uncompressed images was 0.61, for JPEG was 0.61, for JPEG 2000 was 0.67. There was no significant difference by type (p = 0.467) or by level of compression (p = 0.072) and no interaction (p = 0.761). Because there was no overall significant difference by type or level and no consistent trend, we conclude that compression had no significant effect on specificity.

Accuracy

Mean value for the uncompressed images was 0.69, for JPEG was 0.60, for JPEG 2000 was 0.62. There was no significant difference by type (p = 0.478) or by level of compression (p = 0.541) and no interaction (p = 0.566). Because there was no overall significant difference by type or level and no consistent trend, we conclude that compression had no significant effect on accuracy, despite the apparent reduction overall.

Confidence

Mean value for the uncompressed images was 4.1, for JPEG was 4.03, for JPEG 2000 was 4.12. There was no significant difference by type (p = 0.430) or by level of compression (p = 0.250) and no interaction (p = 0.143). Because there was no overall significant difference by type or level and no consistent trend, we conclude that compression had no significant effect on confidence, despite the apparent reduction overall.

Perceived Difference

Mean value for the uncompressed images was 4.68, for JPEG was 4.16, for JPEG 2000 was 4.53. There was a significant difference by type (p = 0.05), but not by level of compression (p = 0.185) and no interaction (p = 0.453). JPEG images were perceived as significantly worse than JPEG 2000 or uncompressed images.

We then analyzed the correlation between confidence rating, perceived difference, and accuracy. If image compression results in a significant loss of clarity (as measured by perceived difference), this should be reflected in lower accuracy and lower confidence. In fact, the correlations among the measures were very low. Correlation between the perceived difference and accuracy was −0.125 (n.s.). Similarly, the correlation between perceived difference and accuracy was −0.055. The relation between confidence and perceived difference was 0.263 (p = 0.001). Thus, while image degradation was possibly detectable (a greater perceived difference for JPEG, but not for JPEG 2000 images), this did not affect confidence or diagnostic accuracy.
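The type of correlation coefficient is not stated in the text. Assuming the common Pearson product-moment coefficient, the computation can be sketched in Python as follows. The function name and the toy input values are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy check: a perfectly linear relation gives r = 1 (or -1 if decreasing).
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
```

Values near zero, such as those reported above between perceived difference and accuracy, indicate that the two measures vary almost independently.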

Finally, we repeated the analysis within body system, as shown in Tables 3, 4, 5, 6, and 7. To control for type I error rate, we used a Bonferroni correction on the p value. Since there were a total of 75 computed p values, we used an alpha of 0.05/75 = 0.0007. Under these circumstances, although 12 of the 75 p values were less than 0.05, none achieved a significance level of 0.0007. Moreover, on inspection, there was no consistent pattern except that, as noted in the overall analysis, JPEG was rated lower on perceived difference for all four regions, and the difference was significant and relatively large for chest (3.58 vs 4.53 for JPEG 2000 and 4.67 for uncompressed).
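The Bonferroni arithmetic above is simple enough to check directly. The following Python sketch reproduces the corrected threshold for 75 tests. The function names and the example p values are hypothetical, since the individual p values are reported only in the tables.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes_bonferroni(p_values, n_tests, alpha=0.05):
    """Flag each p value that falls below the corrected threshold."""
    threshold = bonferroni_alpha(alpha, n_tests)
    return [p < threshold for p in p_values]


threshold = bonferroni_alpha(0.05, 75)  # about 0.00067, reported as 0.0007

# Made-up p values: below 0.05 but above the corrected threshold.
flags = passes_bonferroni([0.04, 0.01, 0.002], n_tests=75)
# flags == [False, False, False]
```

This shows how a p value can fall below the conventional 0.05 level and still fail the corrected criterion, as happened for the 12 p values mentioned above.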

Table 3 Analysis of sensitivity by body system
Table 4 Analysis of specificity by body system
Table 5 Analysis of accuracy by body system
Table 6 Analysis of confidence by body system
Table 7 Analysis of perceived difference by body system

Conclusions

Although some level of degradation was visually detectable on JPEG images, the overall analysis showed that the compression (level/type) had no effect on subjective assessment or diagnostic accuracy for the anatomical areas investigated.

For diagnostic accuracy, there was no significant difference by type or by level of compression. Because there was no overall significant difference by type or level and no consistent trend, we conclude that compression had no significant effect on accuracy, despite the apparent reduction overall. Similarly, although the highest level of compression appeared to show decreased confidence for both JPEG and JPEG 2000, this was not statistically significant.

Assessment of perceived difference showed a significant difference by type (uncompressed, JPEG, and JPEG 2000) but not by level of compression (6:1, 9:1, and 12:1). However, visually, there appeared to be an approximate downward trend with increased compression. JPEG images were perceived as significantly worse than JPEG 2000 or uncompressed images.

As a general recommendation, we suggest that, at the levels of irreversible compression tested, irreversible JPEG and irreversible JPEG 2000 are both appropriate for use on thin slice (less than 3 mm) CT images.