Background

Highly consistent, reproducible and standardized response criteria are essential for evaluating the efficacy of new anti-cancer drugs in multicenter trials [1]. The World Health Organization (WHO) criteria and the Response Evaluation Criteria in Solid Tumors (RECIST) have been widely used and are the only imaging biomarkers presently approved by the United States Food and Drug Administration (FDA) for drug testing, although functional imaging methods such as the Positron Emission Tomography (PET) Response Criteria in Solid Tumors (PERCIST) compensate for the limitations of anatomic methods in treatment response assessment in terms of biological relevance and prognostic information [2–4]. The criteria require either two-dimensional (WHO criteria: sum of the products of the greatest perpendicular dimensions in the transverse plane over all target lesions) or one-dimensional (RECIST: sum of the single longest dimensions in the transverse plane for up to five lesions per organ and up to ten lesions per patient) tumor size measurements [1, 4–12]. Three-dimensional radiologic assessment of tumor burden has also been performed using volumetric techniques [13, 14].

Several quality assurance (QA) phantoms for anatomic measurement have been developed for assessment with computed tomography (CT) and magnetic resonance imaging (MRI) [15–22]. However, the development of these phantoms has predominantly focused on clinical assessment. In recent years, there has been an emphasis on improving preclinical anti-cancer drug testing by incorporating longitudinal imaging of tumor models with preclinical scanners specially designed for small rodents [23–25]. A tumor measurement QA phantom for preclinical studies in rodent models could be used to identify and correct biased tumor size measurements obtained with different imaging modalities in multiple laboratories or institutions [26]. In addition, verifying the radiologic assessment of tumor size change with this QA phantom would allow imaging protocols to be standardized prior to animal studies, thus potentially reducing the number of animals required, increasing study efficiency and decreasing cost.

This Technical Advance describes the evolution in design, construction and testing of a multimodality QA phantom for use with preclinical scanners. Initial design attempts modified commercial phantoms available for human testing. By using the results from these early versions, the UTHSCSA multimodality tumor measurement QA phantom was successfully constructed for further quality assurance testing of tumor size in rodent models.

Methods

Gammex/UTHSCSA Mark 1 phantom

In 2007, the Gammex 404 GS LE phantom was modified to construct a first-generation tumor measurement phantom, designated the Gammex/UTHSCSA Mark 1 phantom (Figure 1 a and 1b, Table 1). The phantom was composed of four sets of measurement calibration standards: A. Image Caliper - stainless steel wires at 2 mm vertical intervals and 3 mm horizontal intervals, B. Volume - two spheres with volumes of 179.59 and 523.59 mm³, C. Diameter - long cylinder (5 to 10 mm in 0.5 mm intervals) and D. Diameter Depth Dependence - 2 mm cylinders from 2 to 15 mm depth.

Figure 1

Gammex/UTHSCSA Mark 1 phantom. (a) Design, (b) photograph, (c) image caliper, volume, diameter and diameter depth dependence of US images, (d) CT and (e) T1 weighted MR images of the phantom are displayed. Visualsonics US unit with 35 MHz frequency, clinical Philips CT and clinical Philips 3T MRI units were used. The size of the phantom was too large to fit into the bore of all preclinical CT and MR scanners.

Table 1 Comparison of Gammex/UTHSCSA Mark 1 phantom, Gammex/UTHSCSA Mark 2 phantom and UTHSCSA multimodality tumor measurement phantom

US, CT and MR images of the phantom were obtained following the imaging protocols listed in Table 2 (Figure 1 c-e). The size of test objects in the phantom was measured three times independently. For US, visual measurements were made using a measurement tool in Vevo 770 v.2.2.3 software (Visualsonics Inc., Toronto, ON, Canada). For MRI, a full width at half maximum (FWHM) method was used in ImageJ software (Version 1.42q, National Institutes of Health, Bethesda, MD). RECIST and WHO analyses were performed according to their definitions. For volume accuracy, the equation V = (π/6)·a·b·c, where a, b and c are the diameters in three perpendicular dimensions, was used [27].
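The three size metrics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sign convention for percent error ((measured - design)/design) and the repeated measurements are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def recist_sum(longest_diameters_mm):
    """RECIST-style 1D burden: sum of the single longest diameter of each lesion."""
    return sum(longest_diameters_mm)


def who_sum(diameter_pairs_mm):
    """WHO-style 2D burden: sum of the products of two perpendicular diameters."""
    return sum(a * b for a, b in diameter_pairs_mm)


def ellipsoid_volume(a_mm, b_mm, c_mm):
    """V = (pi/6) * a * b * c for three perpendicular diameters (mm -> mm^3)."""
    return math.pi / 6.0 * a_mm * b_mm * c_mm


def percent_error(measured, design):
    """Signed percent error relative to the design value (assumed convention)."""
    return 100.0 * (measured - design) / design


# Hypothetical example: three independent measurements (a, b, c in mm)
# of a sphere whose design diameter is 7 mm.
design_diameter = 7.0
repeats = [(7.1, 6.9, 7.0), (7.0, 7.0, 6.9), (7.2, 7.0, 7.1)]

design_volume = ellipsoid_volume(design_diameter, design_diameter, design_diameter)
errors = [percent_error(ellipsoid_volume(*r), design_volume) for r in repeats]

print(f"design volume: {design_volume:.2f} mm^3")
print(f"volume error: {mean(errors):.2f} +/- {stdev(errors):.2f} %")
```

For a 7 mm sphere the formula gives a design volume of about 179.59 mm³, which matches the smaller of the two volume standards in the Mark 1 phantom.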

Table 2 Imaging modalities and protocols used in this study

Gammex/UTHSCSA Mark 2 phantom

After testing the Gammex/UTHSCSA Mark 1 phantom in multiple imaging modalities, the phantom was redesigned based on the number of target lesions (five per organ) required by RECIST and the dimensions necessary for use in preclinical scanners (Figure 2 a and 2b, Table 1). The phantom consisted of five tumor-simulating test objects of different diameters that served as measurement calibration standards: A. Diameter - low contrast spheres (2, 4, 7, 10 and 14 mm) and B. Volume - low contrast spheres with volumes of 4.2 to 1436.8 mm³. The sizes of the test objects were chosen for the following reasons: 2 mm is the smallest tumor size in rodent models that can be readily palpated, and 14 mm is the maximum tumor size tolerated without perturbing the physiology of the host animal.

Figure 2

Gammex/UTHSCSA Mark 2 phantom. (a) Design, (b) photograph, (c) US, (d) micro-CT and (e) T1 weighted MR images of the phantom are shown. Visualsonics US unit with 35 MHz frequency, Gamma-Medica Ideas micro-CT unit and clinical Philips 1.5 T MRI unit were used. Sphere distortion in US images and poor contrast in CT images show limitations of the phantom (red arrow).

US, CT and MR images were acquired and the size of test objects in the phantom for the US and MR images was measured to calculate RECIST, WHO and volume as described for the Mark 1 phantom (Figure 2 c-e, Table 2).

UTHSCSA multimodality tumor measurement phantom

A new multimodality tumor measurement phantom was constructed to improve on the contrast and geometry of the Gammex/UTHSCSA Mark 1 and 2 phantoms (Figure 3 a and 3b, Table 1). The phantom had five test objects of 2, 4, 7, 10 and 14 mm, as in the Gammex/UTHSCSA Mark 2 phantom, but was constructed with smaller dimensions (length × width × depth of 11.5 cm × 3.8 cm × 2.4 cm) so that it would fit any preclinical scanner. The phantom was made in house from tissue mimicking (TM) materials based on methods developed in Dr. Ernest L. Madsen's laboratory at the University of Wisconsin-Madison [28] (see additional file 1: technical appendix with detailed description of phantom construction; additional file 2: Table S1 summarizing the phantom ingredients; additional file 3: Figure S1 describing silicone mold preparation; additional file 4: Figure S2 describing silicone mold procedures; and additional file 5: Figure S3 describing phantom assembly procedures).

Figure 3

UTHSCSA multimodality tumor measurement phantom. (a) Design, (b) photograph, (c) US, (d) micro-CT and (e) T2 weighted MR images of the phantom are shown. Visualsonics US unit with 35 MHz frequency, Gamma-Medica Ideas micro-CT unit and Bruker 7.0 T MRI unit were used. Artifacts, size and contrast problems noted with Gammex/UTHSCSA Mark 1 and Mark 2 phantoms were solved with this phantom.

US, CT and MR images of the phantom were acquired (Figure 3 c-e, Table 2). The size of test objects in the phantom was measured to calculate RECIST, WHO and volume for the US and MR images as described for the Mark 1 phantom. For CT, an FWHM method was used in ImageJ software. For CT and MRI, contrast was calculated using the equation C = (S_background - S_object)/S_background and expressed as a percentage.
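A minimal sketch of the two image measurements mentioned above, percent contrast and an FWHM width, is given below. It is not the ImageJ procedure used in the study. It assumes a one-dimensional intensity profile drawn through a test object that is darker than the background, takes the width at half of the maximum signal drop, and interpolates linearly at the two edges. All names and the sample profile are hypothetical.

```python
def percent_contrast(s_background, s_object):
    """C = (S_background - S_object) / S_background, expressed in percent."""
    return 100.0 * (s_background - s_object) / s_background


def fwhm_dark_object(positions_mm, profile, background):
    """Width of a dark object at half of its maximum signal drop.

    The profile is inverted (background - signal) so the object becomes a
    positive peak; edge positions are found by linear interpolation.
    """
    depth = [background - s for s in profile]
    half = max(depth) / 2.0
    above = [i for i, d in enumerate(depth) if d >= half]
    first, last = above[0], above[-1]

    def crossing(i_out, i_in):
        # Interpolate between a sample below half depth and one at or above it.
        x0, x1 = positions_mm[i_out], positions_mm[i_in]
        d0, d1 = depth[i_out], depth[i_in]
        return x0 + (half - d0) * (x1 - x0) / (d1 - d0)

    left = crossing(first - 1, first) if first > 0 else positions_mm[first]
    right = (crossing(last + 1, last)
             if last < len(depth) - 1 else positions_mm[last])
    return right - left


# Hypothetical profile sampled every 1 mm across a dark test object.
positions = list(range(11))
profile = [100, 100, 100, 90, 75, 75, 75, 90, 100, 100, 100]

width = fwhm_dark_object(positions, profile, background=100.0)
contrast = percent_contrast(100.0, 75.0)
print(f"FWHM width: {width:.2f} mm, contrast: {contrast:.1f} %")
```

In practice the background level would be estimated from a region of interest outside the test object rather than passed in as a constant.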

Statistical analysis

Linear regression analysis was performed on design volume (NIST traceable gold standard) as a function of measured volume of test objects for Gammex/UTHSCSA Mark 2 phantom and the UTHSCSA multimodality tumor measurement phantom using GraphPad Prism software (Version 5.01, GraphPad Software Inc, San Diego, CA). Analysis of RECIST and WHO for all three phantoms and that of volume for Gammex/UTHSCSA Mark 1 phantom could not be performed because two data points were insufficient for statistical analyses. A p-value < 0.05 was considered statistically significant.
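The regression itself was run in GraphPad Prism. As a rough equivalent, the sketch below fits an ordinary least-squares line in plain Python and reports the slope, intercept, their standard errors and R². It assumes that measured volume is the dependent variable (y) and design volume the independent variable (x), and the measured values are hypothetical. The p-value is omitted because it would need a t distribution from a statistics library.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot

    # Standard errors from the residual variance with n - 2 degrees of freedom.
    residual_variance = ss_res / (n - 2)
    se_slope = math.sqrt(residual_variance / sxx)
    se_intercept = math.sqrt(residual_variance * (1.0 / n + mean_x ** 2 / sxx))

    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
        "r_squared": r_squared,
    }


# Design volumes (mm^3) of spheres with diameters 2, 4, 7, 10 and 14 mm.
design = [math.pi / 6.0 * d ** 3 for d in (2, 4, 7, 10, 14)]
# Hypothetical measured volumes (mm^3), for illustration only.
measured = [4.0, 34.5, 181.0, 515.0, 1410.0]

fit = linear_fit(design, measured)
print(f"y = ({fit['slope']:.3f} +/- {fit['se_slope']:.3f})x "
      f"+ ({fit['intercept']:.3f} +/- {fit['se_intercept']:.3f}), "
      f"R^2 = {fit['r_squared']:.4f}")
```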

Results

Multimodality images of Gammex/UTHSCSA Mark 1 phantom, Gammex/UTHSCSA Mark 2 phantom, and UTHSCSA multimodality tumor measurement phantom

Table 1 summarizes the size, weight, material, components and problems of the Gammex/UTHSCSA Mark 1 phantom, the Gammex/UTHSCSA Mark 2 phantom, and the UTHSCSA multimodality tumor measurement phantom. Figure 1 c - e depicts US, CT and MR images of the Gammex/UTHSCSA Mark 1 phantom. During testing of the Mark 1 phantom, several problems became evident. First, although the Mark 1 phantom was initially designed to fit into preclinical scanners, it was too large to fit into the bore of preclinical mouse CT and MR units. Second, CT contrast in the two spheres for volume accuracy needed to be improved for accurate size measurement. Third, reverberation at the surface of the phantom interfered with US imaging.

In the Gammex/UTHSCSA Mark 2 phantom, the design was simplified based on the number (five per organ) of target lesions required by RECIST as shown in Figure 2. The UTHSCSA multimodality tumor measurement phantom had the same structure as the Mark 2 phantom but the geometry and contrast of the phantom were improved by reducing the size of the phantom and adding contrast agents to the TM materials as displayed in Figure 3. Tumor-simulating test objects appeared darker than background in all three images and the contrast between test objects and background (CT: 9.67% and MRI: 25.15%) was sufficient to distinguish test objects and measure their size. Except for a small reverberation close to the surface in the US images, no artifacts were evident for the UTHSCSA multimodality tumor measurement phantom.

Size measurement in Gammex/UTHSCSA Mark 1, Gammex/UTHSCSA Mark 2 phantom, and UTHSCSA multimodality tumor measurement phantom

RECIST, WHO and volume analyses for two spheres from US and MR images of the Gammex/UTHSCSA Mark 1 phantom are displayed in Table 3. For the Mark 1 phantom, smaller errors were determined for RECIST for both US (1.73 ± 0.44%) and MRI (-2.65 ± 3.74%) compared with WHO (US, -4.75 ± 1.30%; MRI, -7.56 ± 6.52%), with MRI errors larger than for US by both RECIST and WHO. For volume analysis, MRI errors were larger than for US for both the 7 mm and 10 mm test objects. RECIST, WHO and volume analyses for CT were not determined due to inadequate CT contrast.

Table 3 RECIST, WHO and volume analyses from US and MR images of Gammex/UTHSCSA Mark 1 phantom

Table 4 shows RECIST, WHO and volume analyses for five test objects from US and MR images of the Gammex/UTHSCSA Mark 2 phantom. Measurements from CT images were not determined for the same reasons as mentioned for the Mark 1 phantom (Figure 2 d). For US, RECIST (5.66 ± 1.41%) had larger errors than WHO (-0.16 ± 1.32%). For MRI, RECIST and WHO analyses showed small errors ranging from 0.39 ± 2.54% for RECIST to -2.05 ± 2.79% for WHO. Volumes calculated from US images had larger errors (range of -5.69 ± 1.59% to 7.29 ± 5.65%) for smaller test objects (2, 4 and 7 mm), which improved with the analyses of the 10 mm (3.99 ± 2.03%) and 14 mm (1.21 ± 0.66%) test objects. Volume analysis from MR images showed similar features to that for US but had much larger errors (range of -21.81 ± 66.60% to 11.86 ± 21.62%) for smaller test objects (2, 4 and 7 mm). For the Mark 2 phantom, tumor volume measured by US and MRI correlated (p < 0.0001) with design volume (Table 4). The best fits for US and MRI versus design volume were the lines y = (1.014 ± 0.009)x + (-0.152 ± 6.341) (R² = 0.9998; p < 0.0001) and y = (0.962 ± 0.011)x + (-6.665 ± 7.357) (R² = 0.9996; p < 0.0001), respectively.

Table 4 RECIST, WHO and volume analyses from US and MR images of Gammex/UTHSCSA Mark 2 phantom

RECIST, WHO and volume analyses for the UTHSCSA multimodality tumor measurement phantom are displayed in Table 5. Unlike results for the Mark 2 phantom, RECIST and WHO calculations showed reduced errors (range of -1.47 ± 0.25% to 1.69 ± 0.33%) for all three modalities. RECIST analysis showed smaller errors than WHO analysis except for CT. For volume analysis, errors were no larger in magnitude than -2.84 ± 2.49% except for the 10 mm test object in MRI (-5.34 ± 0.76%) and the smallest test object (2 mm), with errors ranging from -18.30 ± 10.65% to 5.72 ± 0.60% for CT and MRI, respectively. For the UTHSCSA multimodality tumor measurement phantom, US-, CT- and MRI-measured tumor volume also correlated (p < 0.0001) with design volume (Table 5). US-, CT- and MRI-measured volume versus design (NIST traceable gold standard) volume had the best-fit lines y = (0.980 ± 0.003)x + (2.277 ± 2.261) (R² = 1.000; p < 0.0001), y = (1.011 ± 0.004)x + (0.413 ± 3.052) (R² = 0.9999; p < 0.0001) and y = (0.977 ± 0.008)x + (-1.013 ± 5.613) (R² = 0.9998; p < 0.0001), respectively. These results indicate that technical personnel using the phantom could quickly verify that the data from all three modalities are acceptable over the entire range of sizes, within error limits set by the study designer, by comparing the slope and intercept values from a simple regression analysis (Table 5).
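The acceptance check described above could look like the following sketch. The slope and intercept values are those reported for the UTHSCSA multimodality tumor measurement phantom. The tolerances are hypothetical placeholders for limits a study designer would choose, and the pass rule (slope close to 1, intercept close to 0) is an assumption rather than a procedure defined in this study.

```python
# Best-fit slope and intercept (mm^3) reported for measured versus design volume.
fits = {
    "US": {"slope": 0.980, "intercept": 2.277},
    "CT": {"slope": 1.011, "intercept": 0.413},
    "MRI": {"slope": 0.977, "intercept": -1.013},
}

# Hypothetical error limits; a real study would set its own.
SLOPE_TOLERANCE = 0.05          # allowed deviation of the slope from 1
INTERCEPT_TOLERANCE_MM3 = 10.0  # allowed deviation of the intercept from 0


def within_limits(fit, slope_tolerance, intercept_tolerance):
    """True if the fitted line is close enough to the identity line y = x."""
    slope_ok = abs(fit["slope"] - 1.0) <= slope_tolerance
    intercept_ok = abs(fit["intercept"]) <= intercept_tolerance
    return slope_ok and intercept_ok


results = {
    modality: within_limits(fit, SLOPE_TOLERANCE, INTERCEPT_TOLERANCE_MM3)
    for modality, fit in fits.items()
}

for modality, passed in results.items():
    print(f"{modality}: {'acceptable' if passed else 'outside limits'}")
```

A stricter check might also take the reported uncertainties of the slope and intercept into account instead of the point estimates alone.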

Table 5 RECIST, WHO and volume analyses from US, CT and MR images of UTHSCSA multimodality tumor measurement phantom

Discussion

Previous QA phantoms constructed for size measurement had various tumor shapes and focused predominantly on measurement of test objects from CT and MR images using measurement protocols unique to their institution [15–17]. This study focused on construction of a phantom with a simple spherical test object design based on an FDA-approved imaging biomarker (WHO criteria, RECIST) for use with multiple preclinical imaging devices. As summarized in Table 1, the Gammex/UTHSCSA Mark 1 and Mark 2 phantoms were too large to fit into the bore of some preclinical CT and MR scanners. Since certain components of the Mark 1 phantom, such as the image caliper and depth dependence standards, were not required for QA of tumor size measurement, these features were removed from the Mark 2 phantom, whose design was based on RECIST (Figure 2). Composite aluminum poly film (Figures 1 c and 2 c) on the Mark 1 phantom surface caused reverberation artifacts in the US images, which were corrected in later phantoms by using thin composite polyethylene terephthalate/aluminum/linear low density polyethylene (PET/AL/LLDPE). In addition, test objects in US images of the Mark 2 phantom did not appear as perfect spheres compared with those in MR images (Figure 2c). Beam dispersion in the region deeper than the focal depth distorted the spheres (diameter overestimated in the horizontal direction and underestimated in depth). The contrast in CT images of both phantoms was not sufficient to make size measurements (Figure 1 d and 2 d).

In the UTHSCSA multimodality tumor measurement phantom, size, distortion and contrast problems were solved for the images acquired with all three modalities (Figure 3 c-e). First, the diameter of the tumor measurement phantom was reduced to fit within the bore of all preclinical scanners. Second, the center of test objects was designed to be set above the focal depth (10 mm for 35 MHz transducer) to avoid distortion. Third, barium sulfate was used for pronounced CT contrast. As a result, test object measurements were improved for the UTHSCSA multimodality tumor measurement phantom. For all imaging modalities, RECIST and WHO errors were reduced for the UTHSCSA multimodality tumor measurement phantom (at most 1.69 ± 0.33% in magnitude) compared with both the Mark 1 (up to -7.56 ± 6.52%) and Mark 2 (up to 5.66 ± 1.41%) phantoms.

RECIST values were more accurate than WHO values for the UTHSCSA multimodality tumor measurement phantom except for CT. This result is consistent with the observation that WHO criteria carry a higher risk of measurement error and overestimation of response rates [9]. Volume calculation of the smallest test object (2 mm) in the UTHSCSA multimodality tumor measurement phantom had the largest errors of -15.34 ± 0.04% and -18.30 ± 10.65% for US and CT, respectively, and errors were reduced for larger test objects (no larger in magnitude than -2.84 ± 2.49%) except for the 10 mm sphere by MRI (-5.34 ± 0.76%) (Table 5). This is consistent with the difficulty of measuring tumors of 2 mm or smaller with high accuracy in preclinical and clinical tumor models.
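One way to see why the smallest test object is the most error-prone is that volume scales with the cube of diameter, so a fixed absolute error in diameter produces a much larger relative volume error for a small sphere. The sketch below illustrates this with a hypothetical 0.1 mm overestimate applied equally to all three diameters. It is an illustration only, not an analysis performed in this study.

```python
def relative_volume_error_pct(diameter_mm, diameter_error_mm):
    """Percent volume error of a sphere when each diameter is off by a fixed amount.

    With V = (pi/6) * d^3, the constant cancels and the relative error is
    ((d + e) / d)^3 - 1.
    """
    ratio = (diameter_mm + diameter_error_mm) / diameter_mm
    return 100.0 * (ratio ** 3 - 1.0)


# Hypothetical fixed measurement error of 0.1 mm in every dimension.
ASSUMED_ERROR_MM = 0.1

for diameter in (2, 4, 7, 10, 14):
    error_pct = relative_volume_error_pct(diameter, ASSUMED_ERROR_MM)
    print(f"{diameter:>2} mm sphere: volume error of about {error_pct:.1f} %")
```

Under this assumption the 2 mm sphere shows a volume error of roughly 16%, while the 14 mm sphere stays near 2%.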

Conclusions

The UTHSCSA multimodality tumor measurement phantom design and construction methods provide adequate image quality for validating tumor size measurement in three commonly used preclinical imaging modalities (US, CT and MRI). This tumor measurement phantom provides a potential QA tool for monitoring radiologic assessment of tumor size change in future multi-institutional studies requiring integration of data from disparate sources and devices.