Introduction

The wide availability and technical improvement of magnetic resonance imaging (MRI) have led to an increase in detection of vestibular schwannoma (VS) at an early stage [1]. Accordingly, the incidence of diagnosed VS has increased [2, 3]. The natural development of VS remains uncertain, as growth percentages between 30% and 90% have been reported, depending at least in part on the length of the observation period [2, 4]. So far, no clinical parameters have been identified that correlate with VS growth [5–7], and therefore, VS growth is established by performing consecutive MRI [2, 5]. If growth is found on MRI, an intervention may be chosen, such as surgical resection or radiation therapy [6, 8]. Most patients therefore enter the so-called wait and scan policy for a certain period, in which the audiovestibular symptoms are monitored regularly and VS growth is measured on consecutive MRI [2].

Radiologists generally use two-dimensional measurements to assess VS growth, although volume measurements seem to provide more accurate growth assessment [1, 9, 10], since VS shows asymmetric growth in all directions. Little information has been published about the diameter increase or volume increment between subsequent images that constitutes VS growth beyond measurement error [9–12]. Usually, when an increase in tumor diameter or volume is found, the tumor is considered to be growing, but validation of this observation is lacking. Especially when invasive treatment decisions are based on these observations, it is of great importance to find the most suitable method to assess growth of VS on MRI and to provide a definition of growth beyond measurement error.

This study focuses on the accuracy and reproducibility of VS volume measurements compared to two-dimensional measurements to determine VS growth on MRI. The hypothesis is that measurement of tumor volume with specific area tracing software is a more accurate tool compared to two-dimensional measurements for determining tumor growth.

Materials and methods

All patients who received an MRI scan of the cerebellopontine angle (CPA) between January 2003 and March 2008 in our tertiary referral center were analyzed retrospectively. Patients were included in this study if a radiological diagnosis of a VS was made, resulting in 102 patients. Thirty patients were excluded who had been treated by surgery or radiotherapy, resulting in 72 patients. Four patients who had an intralabyrinthine schwannoma were also excluded. There were no patients with neurofibromatosis type 2. MRI images of 68 patients, 32 (47%) men and 36 (53%) women, age range of 36–84 years, median age of 63.5 years, were available; one scan was available in 21 patients, two scans were available in 22 patients, three scans in ten patients, four scans in eight patients, five scans in four patients, and six scans in three patients, resulting in a total of 165 scans suitable for analysis. In patients with more than one scan, mean follow-up was 21.8 months (SD 15.7).
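The cohort figures above can be cross-checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the variable names are illustrative and not taken from the study.

```python
# Cohort tally using the counts stated in the text (names are illustrative).
diagnosed = 102
after_treatment_exclusion = diagnosed - 30   # treated by surgery or radiotherapy
included = after_treatment_exclusion - 4     # intralabyrinthine schwannoma

# Number of available scans per patient -> number of patients
scan_distribution = {1: 21, 2: 22, 3: 10, 4: 8, 5: 4, 6: 3}

patients = sum(scan_distribution.values())
scans = sum(n_scans * n_patients for n_scans, n_patients in scan_distribution.items())
patients_with_follow_up = patients - scan_distribution[1]

print(included, patients, scans, patients_with_follow_up)
```

The tally reproduces the 68 included patients and 165 scans, and shows that the mean follow-up applies to the 47 patients with more than one scan.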

All examinations were performed at 1.5 T (Gyroscan, Powertrack 6000, Philips, Best, The Netherlands) using a head–neck coil (Philips, Best, The Netherlands). The MR protocol consisted of axial 2D SE T1-weighted images (TR/TE, 550/15 ms; slice thickness, 3 mm; interslice gap, 0.3 mm; number of slices, 12; FOV, 180 mm (RFOV 80%); and matrix 256 × 256), axial 3D TSE T2-weighted images (TR/TE, 3,000/250 ms; slice thickness, 0.35 mm; number of overcontiguous slices, 30; FOV, 130 mm (RFOV 80%); and matrix 256 × 256) covering the skull base, and contrast-enhanced (gadolinium 0.2 ml/kg body weight) axial 3D ISO T1-weighted images (TR/TE, 8.9/4.6 ms; slice thickness, 1 mm; FOV, 256 (RFOV 80%); and matrix 256 × 256) covering the entire skull base and cranium. All patients underwent the same MRI protocol with similar parameters and planes of acquisition to ascertain an optimal correlation in serial scans.

Two readers, experienced in head and neck imaging, independently performed the measurements on contrast-enhanced T1-weighted images (CE T1-WI) and on the corresponding T2-weighted images (T2-WI). Both observers were blinded to each other’s MR assessments and clinical information.

For two-dimensional assessment of VS, the maximum diameter was measured in three directions: anteroposterior (AP), mediolateral (ML) [including the portion in the internal auditory canal (IAC)], and craniocaudal (CC) (Fig. 1a, b). To establish these dimensions, a digital submillimeter ruler was used. Volume assessment was done on a stereotactic radiotherapy treatment planning station, fitted with iPlan® RT image version 3—Advanced Contouring Workstation (BrainLAB Oncology Solutions, Feldkirchen, Germany). MR images were uploaded into this system, and area tracing software was used to outline the VS on each MR image that contained tumor tissue (Fig. 2a). If there was a sharp contrast with surrounding tissue, the auto brush function (surrounding the VS automatically) was used. Manual segmentation was necessary in cases in which differentiation from surrounding tissue was difficult because of the high sensitivity of this autotracer. Each segmentation result was checked visually. By tracing the VS surface on all slices, the software was able to calculate VS volume (Fig. 2b). Volumes were expressed in cubic centimeters.

Fig. 1

a, b Contrast-enhanced T1-weighted image with a vestibular schwannoma in the cerebellopontine angle (CPA) on the right side. a Measurements in the axial plane: X is the maximum mediolateral, and Y the maximum anteroposterior dimension; b in the coronal plane, the Z demonstrates the maximum craniocaudal dimension

Fig. 2

a Example of area tracing with volume software. Axial contrast-enhanced T1-weighted image shows a right-sided vestibular schwannoma (asterisk) with a large cerebellopontine angle component. The red line is the result of the autotracer, which outlines the vestibular schwannoma. b Three-dimensional representation of a vestibular schwannoma (VS), integrating the surface of all slice intervals. The small intracanalicular (A) and large extracanalicular (B) portion of the VS can easily be identified

To compare reproducibility of the measurements in different size categories, VS were classified into four stages, as defined by Hasegawa et al. [13]: stage A, intracanalicular VS; stage B, VS extending into the CPA; stage C, VS compressing the brain stem; and stage D, VS deviating the fourth ventricle.

Statistical analysis

Reproducibility measures consist of agreement and reliability parameters [14]. These parameters were evaluated for measurements by two different readers at one point in time. SPSS 15.0 statistical software (SPSS, Chicago, IL, USA) was used to perform the statistical calculations. For evaluation of interobserver reproducibility, we used the baseline measurements of both readers on the MRI scans of the 68 patients; these can be considered independent observations, whereas consecutive measurements within patients are correlated.

Agreement parameters

Agreement parameters measure the ability to achieve the same value in two measurements and give an indication of the size of measurement errors [14]. The agreement between two readers was evaluated using Bland and Altman plots [15]. The Bland and Altman plot is the most robust method to quantify agreement in clinical measurements. Here, the differences between measurements (on the Y-axis) are plotted against the mean of two measurements (on the X-axis). The visual representation of agreement illustrates the magnitude and range of the differences, bias or outliers, and the relation between the magnitude of the differences and the magnitude of the mean values [15]. This method also assesses 95% limits of agreement. These limits of agreement are used to define the smallest detectable difference (SDD) as 1.96 × the standard deviation (SD) of the mean difference in measurements between the two readers. SDD represents the change that can be detected beyond measurement error [15]. We can define a tumor to have grown when the difference between two measurements falls outside this interval. In this way, it will be possible to discriminate between stable tumors and growing tumors, according to our measurements. For both two-dimensional and volume measurements, the SDD for absolute differences and for differences relative to baseline [SDD (%)] was calculated. Differences relative to baseline were calculated using the following formula: (A − B)/[(A + B)/2] × 100, in which A is the result of reader A and B the result of reader B. The SDD and SDD (%) were presented for four VS stages: A, B, C, and D.
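The agreement calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in the study: it assumes that the SD in question is the sample SD of the paired reader differences, and the function names and example readings are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reader_a, reader_b):
    """Bias, SD of the paired differences, SDD, and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = mean(diffs)
    sd = stdev(diffs)      # assumption: sample SD (n - 1 denominator)
    sdd = 1.96 * sd        # smallest detectable difference
    return {"bias": bias, "sd": sd, "sdd": sdd,
            "limits": (bias - sdd, bias + sdd)}


def relative_differences(reader_a, reader_b):
    """(A - B) / [(A + B) / 2] x 100 for each pair of readings."""
    return [(a - b) / ((a + b) / 2) * 100 for a, b in zip(reader_a, reader_b)]


def sdd_percent(reader_a, reader_b):
    """SDD of the relative differences, in percent."""
    return 1.96 * stdev(relative_differences(reader_a, reader_b))


# Made-up diameters in mm, for illustration only (not study data).
a_mm = [10.0, 12.0, 14.0]
b_mm = [9.0, 12.0, 15.0]
result = bland_altman(a_mm, b_mm)
print(result["bias"], result["sdd"], sdd_percent(a_mm, b_mm))
```

Because the relative difference is dimensionless, the same `sdd_percent` function applies to diameters in millimeters and to volumes in cubic centimeters.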

Use of the SDD (%) enables comparison of the different units involved in the two measurement techniques (millimeter and cubic centimeter). Because, in clinical practice, all three diameter measurements (CC, AP, and ML) are equally essential in assessing VS progression, we considered the diameter with the lowest agreement as the limiting factor in diameter measurements. We compared the SDD (%) of this diameter with the SDD (%) of volume measurements.

Reliability parameters

Reliability parameters assess whether measurements can be used to distinguish patients from each other despite measurement error [14]. A parameter of reliability is the intraclass correlation coefficient (ICC). The ICCs are defined as the ratio of the variance among patients (patients variability) over the total variance (among patients, among readers, plus the error variance), which is expressed as a dimensionless number, being one (perfect reliability) in the most ideal case. ICC was calculated for interobserver diameter and volume VS measurement and was also presented for four VS stages: A, B, C, and D.
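The variance ratio described above corresponds to an absolute-agreement ICC. The sketch below uses the standard two-way random-effects, single-measurement form computed from ANOVA mean squares. The text does not state which ICC model was selected in SPSS, so the choice of this form is an assumption, and the function name and example ratings are hypothetical.

```python
def icc_agreement(ratings):
    """Two-way random-effects, absolute-agreement, single-measurement ICC.

    ratings: one row per patient, one column per reader.
    """
    n = len(ratings)         # patients
    k = len(ratings[0])      # readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between readers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up ratings, for illustration only (not study data).
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]   # reader B always 1 unit higher
print(icc_agreement(identical), icc_agreement(offset))
```

In this form, a systematic offset between readers lowers the ICC even when the readers rank patients identically, because reader variance is counted in the denominator.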

Results

Bland and Altman plots were constructed using data of the baseline MR images of the 165 scans from 68 patients (Figs. 3 and 4). The SD for each reader (A and B) and the SD of the mean difference between readers are presented in Table 1. The SDD and SDD (%) for absolute differences and differences relative to baseline MR images, respectively, are presented in Table 2. In Table 3, we present the ICC with 95% confidence intervals.

Fig. 3

Bland and Altman plot of baseline two-dimensional maximum craniocaudal (CC) dimension measurements on contrast-enhanced T1-weighted images (CE T1-WI). The values on the Y-axis represent the measurement differences between the two readers and their mean difference (thin line). The values on the X-axis represent the mean of both measurements. The thick black lines represent the 95% limits of agreement. Interobserver differences are larger in smaller vestibular schwannomas

Fig. 4

Bland and Altman plot of baseline volume measurements on contrast-enhanced T1-weighted images (CE T1-WI). The values on the Y-axis represent the measurement differences between the two readers and their mean difference (thin line). The values on the X-axis represent the mean of both measurements. The thick black lines represent the 95% limits of agreement. Interobserver differences are larger in smaller vestibular schwannomas

Table 1 Interobserver agreement parameters based on baseline T1-weighted, contrast-enhanced images (CE T1-WI) and T2-weighted images in 68 patients.
Table 2 SDD and SDD (%) for measurements performed on T1-weighted, contrast-enhanced (CE T1-WI) images and T2-weighted images.
Table 3 Intraclass correlation coefficients (ICC) for two-dimensional and volume measurements on T1-weighted, contrast-enhanced images (CE T1-WI) and T2-weighted images with their 95% confidence intervals.

Contrast-enhanced T1- versus T2-weighted images

Two-dimensional and volume assessments of VS were performed on both CE T1-WI (CC dimension, T1CC; AP dimension, T1AP; ML dimension, T1ML; volume, T1VOL) and T2-WI (CC dimension, T2CC; AP dimension, T2AP; ML dimension, T2ML; volume, T2VOL). The dimension with highest SDD (%) was taken as a limiting factor, when comparing both imaging modalities.

With two-dimensional measurements, the SDD (%) for T1CC was nearly equal to that for T2CC (40.3 and 40.1). However, the T2AP dimension showed a higher SDD (%) than T1AP: 34.3 versus 28.3, respectively. Therefore, CE T1-WI showed the highest agreement in two-dimensional measurements. In addition, the ICC was consistently higher in the T1CC, T1AP, and T1ML directions than in the T2CC, T2AP, and T2ML directions (0.947, 0.974, and 0.978 versus 0.943, 0.961, and 0.948), reflecting higher reliability for CE T1-WI in two-dimensional measurements. For volume measurements, similar results were obtained: SDD (%) for T1VOL was 19.7 compared to 30.1 for T2VOL. ICC values for volume measurements for both sequences were 0.999.

Volume measurements versus two-dimensional measurements

The SDD (%) values for T1CC, T1AP, and T1ML were 40.3, 28.3, and 20.9, respectively (Table 2). All three dimensions are equally essential in estimating VS growth. Because the T1CC dimension, with the lowest agreement, is the limiting factor in these two-dimensional measurements, it was used to compare the two-dimensional measurements with volume measurements. The SDD (%) for T1CC was considerably higher than the SDD (%) for volume measurements (T1VOL): 40.3% versus 19.7%. The SDD (%) decreased with increasing tumor size from stage A to stage D VS, in both two-dimensional and volume measurements (Table 2). In all tumor stages, volume measurements were associated with a smaller SDD (%) than CC dimension measurements. The ICC revealed that both two-dimensional and volume measurements showed high interobserver reliability. However, volume measurements revealed a higher reliability than the three diameter measurements (0.999 versus 0.947, 0.974, and 0.978). Reliability increased with tumor size, and volume measurements were more reliable in all VS stages (Table 3).

Smallest detectable differences in two-dimensional and volume measurements

SDD for two-dimensional measurements on CE T1-WI varied from 2.12 to 2.98 mm. For volume measurements on CE T1-WI, SDD was 0.11 cm³ (Table 2).

Discussion

VS are benign neoplasms originating from the neurilemmal sheath of the eighth cranial nerve and are predominantly found in the CPA and in the internal auditory canal (IAC). The incidence of VS varies from 1 to 2/100,000 [2, 12, 16, 17], although postmortem histopathological examinations show a higher incidence of about 2.7% [18]. This discrepancy indicates that the vast majority of VS never become symptomatic, reflecting very slow or arrested growth.

Therefore, the wait and scan policy has gained popularity as an alternative or prelude to surgery and radiation therapy. This can be justified, as growth is known to be extremely variable, with most VS remaining stable or showing minimal growth for many years [16, 19]. The goal of this regimen is to minimize therapeutic risks and complications and to preserve an optimal quality of life in selected patients. Because no single reliable clinical feature exists that predicts tumor growth [5–7], MRI is the mainstay in the conservative management of VS [2, 5]. It is essential to use a measuring method that provides reliable measurements with a high interobserver agreement, as change in size is, besides the clinical presentation, the most relevant parameter.

Various ways of describing VS tumor growth have been proposed. The conventional method of assessing VS is by performing two-dimensional measurements. Some authors use the largest AP and/or ML dimension, sometimes combined with the CC dimension [5, 8, 19–24]. Others use the guidelines for measuring VS described by the American Academy of Otolaryngology, Head and Neck Surgery (AAO-HNS) [6, 7, 25]. The usefulness of two-dimensional measurements could be doubted. Firstly, two-dimensional measurements assess VS growth in at most three directions, while a VS shows asymmetric growth in all possible directions; therefore, a two-dimensional assessment does not approach “real” tumor growth. Secondly, in VS with large diameters, a small increase in diameter corresponds to a much larger increase in VS volume than a similar increase in diameter in a small VS [1]. Volume measurements can be performed in several ways; some authors consider VS to be ellipsoid and calculate the volume using a mathematical formula [11, 21, 26, 27]. However, this method has been shown to produce a large overestimation of VS volume [1, 11]. Others have performed true volumetric measurements by using (semi)automatic software to calculate VS volume [1, 9, 11, 12, 21, 28, 29]. According to the few studies comparing two-dimensional with non-formula-based volume measurements, VS volume measurements are more accurate than two-dimensional measurements [1, 9, 10]. Other authors disagree with this [16, 21, 27], and in clinical practice, most clinicians keep relying on two-dimensional measurements. However, the results of this study indicate that VS volume measurements, especially on CE T1-WI, produce better interobserver agreement and reliability than two-dimensional measurements. This study therefore indicates that CE T1-WI volume measurements should replace two-dimensional measurements in evaluating VS growth.
The difference in interobserver agreement and reliability between the two measurement methods is of clinical significance because invasive treatment decisions are based on these observations. Therefore, the measurement method with the highest agreement and reliability is necessary in assessing VS growth.
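The geometric point made above, that the same diameter increase adds far more volume to a large tumor than to a small one, can be illustrated with the ellipsoid formula V = π/6 × AP × ML × CC. The sketch below assumes an idealized sphere purely for illustration. As noted in the text, formula-based volumes overestimate true VS volume, so this shows only how volume scales with diameter. The function names are hypothetical.

```python
import math


def ellipsoid_volume_cm3(ap_mm, ml_mm, cc_mm):
    """Ellipsoid volume in cm^3 from three full diameters in mm."""
    return math.pi / 6 * ap_mm * ml_mm * cc_mm / 1000.0


def volume_gain_cm3(diameter_mm, increase_mm=1.0):
    """Volume added to an idealized sphere when every diameter grows by increase_mm."""
    before = ellipsoid_volume_cm3(diameter_mm, diameter_mm, diameter_mm)
    grown = diameter_mm + increase_mm
    after = ellipsoid_volume_cm3(grown, grown, grown)
    return after - before


small_gain = volume_gain_cm3(10.0)   # 10 mm -> 11 mm
large_gain = volume_gain_cm3(30.0)   # 30 mm -> 31 mm
print(round(small_gain, 2), round(large_gain, 2))
```

Under this idealization, a 1-mm increase adds roughly 0.17 cm³ to a 10-mm sphere and roughly 1.46 cm³ to a 30-mm sphere.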

An exception may be made concerning small (stage A) VS. In these small intracanalicular VS, the CC dimension does not play an important role, since the diameter of the IAC is usually quite constant. When one uses the AAO-HNS guidelines, the CC direction is not taken into account at all. Then, the AP direction is the limiting factor of the remaining AP and ML directions (Table 2). In stage A VS, under these conditions, we found that the measurement error [SDD (%)] of the two-dimensional and volume measurement techniques is comparable (AP 28.0, ML 27.1, and VOL 28.9). Therefore, both measurement techniques can be used to evaluate stage A VS. In larger VS (stages B and C), both AP and ML dimensions show higher measurement error [SDD (%)] than volume measurements. In stage D VS, the SDD (%) of the ML dimension is similar to the SDD (%) of volume measurements: 5 versus 5.7, respectively. This can be explained by the fact that the ML dimension is the longest distance in two-dimensional VS measurements, and measurement error will decrease when measurement distances increase. This occurs both in two-dimensional and in volume measurements and should be taken into account when evaluating growth in different VS stages (Table 2 and Figs. 3 and 4). Other authors confirm this phenomenon [1, 11, 30]. However, apart from this low SDD (%) of the ML dimension in stage D VS, the far worse SDD (%) of the AP dimension should also be taken into account in evaluating VS growth with two-dimensional measurements, which argues for the use of volume measurements in the assessment of stage B, C, and D VS.

Overall, contrast-enhanced images are necessary, both in volume and two-dimensional measurements, to maintain a high reproducibility, since contrast enhancement facilitates differentiation of VS from surrounding tissue.

The calculated agreement [SDD and SDD (%)] and reliability (ICC) values were compared with findings reported in the literature. Reports on the change in length and volume that exceeds measurement error (SDD) are inconclusive. In two-dimensional VS measurements, many authors simply state a 1- or 2-mm increase in subsequent scans as evidence of growth without any statistical justification [2, 6–8, 20, 22, 31]. According to the present study, SDD varies between 2.12 mm for the ML dimension and 2.98 mm for the CC dimension on CE T1-WI (Table 2). This indicates that there is a measurement error in two-dimensional measurements that is not recognized when a measured increase in size of only 1 or 2 mm between follow-up scans is considered as tumor growth. Treatment decisions based on these arbitrary criteria should therefore be made with caution.
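The consequence for the common 1- or 2-mm criterion can be made concrete with a small check against the SDD values quoted above. This is only a sketch: the function name and the example diameters are hypothetical, and a real assessment would use the SDD appropriate to the tumor stage and measurement direction.

```python
# SDD values on CE T1-WI as quoted in the text (mm).
SDD_ML_MM = 2.12
SDD_CC_MM = 2.98


def exceeds_sdd(baseline, follow_up, sdd):
    """True if the measured increase is larger than the smallest detectable difference."""
    return (follow_up - baseline) > sdd


# Hypothetical diameters in mm, for illustration only.
print(exceeds_sdd(20.0, 21.0, SDD_ML_MM))   # 1-mm increase, ML dimension
print(exceeds_sdd(20.0, 22.0, SDD_ML_MM))   # 2-mm increase, ML dimension
print(exceeds_sdd(20.0, 23.5, SDD_CC_MM))   # 3.5-mm increase, CC dimension
```

With these thresholds, neither a 1-mm nor a 2-mm increase is distinguishable from interobserver measurement error, even in the dimension with the best agreement.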

In volume measurements, an absolute increase above which one can consider a VS to have grown was not found in the literature. This study revealed SDD ranging from 0.05 to 0.26 cm³ (stages A–D) on CE T1-WI (Table 2). Volume increase expressed as SDD (%) varied from 15 to 89% in the literature [9–12]. However, the numbers of patients in these studies were small [9, 11, 12], and generally, not only baseline measurements of the VS were taken into account [9, 10, 12]. Therefore, the percentages reported in the literature could be questioned. In this study, SDD (%) in CE T1-WI volume measurements was 19.7%. ICC for two-dimensional measurements was not found in the literature. One study calculated ICC for volume measurements: Luppino et al. [9] calculated ICC for two different types of volume measurement. Their “contour method,” similar to our volume method, had an interobserver reliability of 0.96, which is comparable to the ICC of 0.99 in this study.

This study and technique also harbor some limitations. Firstly, the assessments of reproducibility parameters were based on interobserver differences and not on intraobserver differences. It was assumed that this approach better reflects clinical practice, where it is usual that different clinicians assess subsequent scans.

Secondly, BrainLAB volume software is not widely available in radiology departments.

Thirdly, performing these volume measurements takes a little more time compared to the conventional two-dimensional measurements. In our experience, VS contouring and volume calculation typically took a few minutes. This could be a limiting factor in introducing this method for VS in daily clinical practice. In the literature, 10–25 min for manual segmentation has been described, although these calculations were performed with different software and on older systems [10, 11].

Fourthly, the volume measurement method used is semi-automatic, so there is still a human component that is responsible for interobserver differences. It is desirable to develop software able to perform even better automated volume measurements in order to further diminish reader-related measurement error.

Conclusion

CE T1-WI volume measurements show better interobserver agreement and reliability compared to two-dimensional measurements for the assessment of growth of VS. Small intracanalicular VS form an exception. When evaluating VS growth, one has to take VS baseline characteristics into account because SDD (%) strongly depends on VS size. The 1- or 2-mm difference commonly used to define growth of VS in consecutive scans in two-dimensional measurements lies within measurement error and should not direct clinical practice.