Introduction

Pelvic organ prolapse (POP) is a major health care problem: 11% of women undergo surgery for POP and/or urinary incontinence during their lifetime, and 30% of these women have repeat surgery [1]. Failure to identify all of the involved compartments may result in incomplete surgical repair, with subsequent persistence or recurrence of the prolapse [2, 3].

Since the introduction of fast imaging sequences, magnetic resonance (MR) imaging has become a promising diagnostic tool in the assessment of pelvic floor dysfunction. Many protocols for performing and interpreting MR imaging in POP have been introduced. Until now, however, only two studies have addressed the reliability of prolapse staging on dynamic MR imaging [4, 5]. Moreover, the reference lines and anatomical landmarks that should be used to assess the presence and the quantitative staging of the prolapse are a subject of ongoing debate.

The aim of the present study was to determine the intra- and interobserver reliability of dynamic MR staging in POP patients.

Materials and methods

Dynamic MR images of the female pelvic floor were reviewed in a cross-sectional observational study. MR imaging had been performed as part of routine clinical practice, at the request of the Department of Obstetrics and Gynecology at the Radboud University Nijmegen Medical Centre, a national tertiary referral centre for POP patients, in patients with recurrent prolapse, especially in the posterior compartment, and in cases where the patient's complaints did not correspond with the clinical findings. The study was submitted to the Institutional Review Board and deemed exempt (1 August 2007).

Thirty of the 63 patients who underwent dynamic MR imaging between September 2005 and March 2007 were selected by means of a computer-generated list and included in the study. None of their images were regarded as unsuitable for assessment.

The MR datasets were assessed by two independent observers: an experienced radiologist (JF, 3 years of experience) and a novice observer (SB, half a year of experience). One of the observers (SB) repeated the assessment of the set of images at least 1 month later to determine the intraobserver reliability. Prior to the study, ten MR datasets not included in the study had been assessed jointly to reach consensus on the interpretation of the MR images. A case record form with definitions of all measurement points and reference lines was used (see below). The observers were blinded to the clinical findings and to the previous assessment of the images.

Imaging protocol

The dynamic MR imaging examination was performed with the patient in the supine position with the legs parallel and slightly flexed. Patients were requested not to void for 1–2 h prior to their examination. The rectum was opacified using 100–150 ml of ultrasound gel. The urethra, bladder, and vagina were not opacified. No premedication was given. MR images were acquired using a 3T MR scanner (TIM TRIO, Siemens Medical, Germany) and an eight-channel body phased-array coil. MR images were obtained in the sagittal plane using a Half-Fourier acquisition single-shot turbo spin-echo sequence (2000 ms/90 ms repetition time/echo time; 150° flip angle). During the MR examination, the patient was asked to relax her pelvic floor muscles, to contract the muscles slowly, to relax again, and then to increase the intra-abdominal pressure and strain as if to defecate. To ensure that the patient followed the instructions given, all images were viewed online on the MR console. The MR examination time was 35 min. The images were analyzed offline at a later stage on a console with zoom facilities and electronic calipers. The midsagittal images were used to assess the prolapse.

Reference lines

The reference lines used to assess POP are shown in Fig. 1. In accordance with the literature, the pubococcygeal line was defined as a straight line between the inferior rim of the pubic bone and the last visible coccygeal joint [5, 6], the H-line as a straight line between the inferior rim of the pubic bone and the posterior wall of the anal canal at the level of the impression of the puborectal sling [7], and the mid-pubic line as a line drawn through the longitudinal axis of the pubic bone, passing through its midequatorial point [8].

Fig. 1

MR image obtained at rest. Dynamic midsagittal Half-Fourier acquisition single-shot turbo spin-echo (2000/90; 150°) through the pelvis of a 62-year-old woman with symptoms of pelvic organ prolapse. The image shows the reference lines used. PCL pubococcygeal line, H-line, MPL mid-pubic line

Anatomical landmarks and clinical measurement points

A whirl of urine in the bladder and/or a dent in the cranial portion of the bladder, seen during straining on the sagittal images, indicated adequate straining. The MR images at rest and during maximal straining were assessed for POP using the various anatomical landmarks in all three compartments in relation to the previously mentioned reference lines. The anatomical landmarks used for each compartment were the bladder base and bladder neck for the anterior compartment, the distal portion of the cervix or the vaginal vault for the middle compartment, and the anorectal junction and the most anteriocaudal point of the anterior rectal wall for the posterior compartment. The distance from each anatomical landmark to each reference line was measured perpendicular to that line.
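The perpendicular measurement described above can be illustrated with a minimal sketch. It assumes that each landmark and each reference line endpoint is available as an (x, y) coordinate in centimetres on the midsagittal image; the function name and the coordinate convention are hypothetical and are not taken from the study. The sign of the result indicates on which side of the line the landmark lies, and which sign corresponds to "below" depends on the image orientation.

```python
import math


def signed_perpendicular_distance(point, line_start, line_end):
    """Signed perpendicular distance from `point` to the infinite line
    through `line_start` and `line_end` (all given as (x, y) tuples).

    The magnitude is the perpendicular distance; the sign flips when the
    point lies on the other side of the line. Which sign means "below the
    reference line" is an assumption that depends on image orientation.
    """
    (px, py), (ax, ay), (bx, by) = point, line_start, line_end
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("a reference line needs two distinct points")
    # 2D cross product of the line direction with the vector to the point,
    # divided by the length of the line direction.
    return (dx * (py - ay) - dy * (px - ax)) / length


# Hypothetical example: a reference line from (0, 0) to (10, 0)
# and a landmark at (3, 4).
distance_cm = signed_perpendicular_distance((3.0, 4.0), (0.0, 0.0), (10.0, 0.0))
```

Using the infinite line through the two endpoints, rather than the segment between them, is a deliberate choice in this sketch: a landmark may project onto the extension of a reference line.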

Besides the aforementioned anatomical landmarks, we also introduced clinical measurement points on MR images to approximate points Ba, C, and Bp of the pelvic organ prolapse quantification (POP-Q) system, which refer to the most descended edge of the anterior vaginal wall, the cervix/vaginal vault, and the posterior vaginal wall, respectively [9]. For the anterior compartment, we used the most posteriocaudal point of the anterior vaginal wall, for the middle compartment the most distal point of the cervix or the vaginal vault, and for the posterior compartment the most anteriocaudal point of the posterior vaginal wall. At rest, we additionally assessed the total vaginal length, measured from the posterior fornix or vaginal vault, following the contour of the vagina, to its intersection with the mid-pubic line. We assessed these measurement points in relation to the mid-pubic line because this line was introduced by Singh et al. as a reflection of the hymenal remnants, which are the reference structure in the POP-Q system [8].

Qualitative staging is the most widely used method of prolapse staging. In addition to the quantitative staging, we therefore also assessed the reliability of qualitative staging of prolapse. A measurement point that lay below a reference line was scored positive, and a point that lay above it was scored negative.
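As an illustration of this dichotomous scoring, the sketch below converts signed distances into positive or negative scores and reports the raw percentage of cases in which two assessments agree. It assumes that a positive signed distance means the point lies below the reference line and that a point exactly on the line is scored negative; both conventions, the function names, and the example values are hypothetical. Raw percent agreement is used here purely for illustration and is not necessarily the statistic used in the study.

```python
def score_descent(signed_distance_cm):
    """Score a measurement point as 'positive' when it lies below the
    reference line (assumed here to be a signed distance > 0), and as
    'negative' otherwise."""
    return "positive" if signed_distance_cm > 0 else "negative"


def percent_agreement(first_assessment, second_assessment):
    """Percentage of patients for whom two assessments give the same
    dichotomous score. Inputs are equally long lists of signed distances."""
    if len(first_assessment) != len(second_assessment) or not first_assessment:
        raise ValueError("assessments must be non-empty and of equal length")
    matches = sum(
        score_descent(a) == score_descent(b)
        for a, b in zip(first_assessment, second_assessment)
    )
    return 100.0 * matches / len(first_assessment)


# Hypothetical signed distances (cm) for four patients, two assessments.
first = [1.0, -0.5, 0.2, -2.0]
second = [0.8, 0.3, 0.1, -1.5]
agreement = percent_agreement(first, second)  # the second patient is scored differently
```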

Rectocele, enterocele, perineal descent, and genital hiatus

Furthermore, Fig. 2 shows lines A and B, which are defined as an extension of the anterior border of the anal canal [10–12] and the expected margin of the normal anterior rectal wall [13–15], respectively. These lines were applied only in the presence of an outpouching of the anterior rectal wall. In that case, additional measurements were performed: the depth to the most anteriocaudal point of the anterior rectal wall, and the area and perimeter of the outpouching in relation to lines A and B. Complete or incomplete evacuation of the outpouching during defecation was also evaluated.
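One possible way to obtain an area and a perimeter from a traced outline is sketched below. It assumes that the outpouching, closed off by the chosen reference line (A or B), has been traced as a simple polygon of (x, y) points in centimetres; this tracing procedure and the function name are assumptions for illustration, since the measurements in the study were made with the electronic tools of the viewing console. The area follows from the shoelace formula and the perimeter from the summed edge lengths.

```python
import math


def polygon_area_and_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple closed polygon.

    `vertices` is a list of (x, y) tuples in order along the outline; the
    last vertex is joined back to the first automatically.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


# Hypothetical outline: a 3 cm by 2 cm rectangle.
area_cm2, perimeter_cm = polygon_area_and_perimeter(
    [(0.0, 0.0), (3.0, 0.0), (3.0, 2.0), (0.0, 2.0)]
)
```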

Fig. 2

MR image obtained during straining. Dynamic midsagittal Half-Fourier acquisition single-shot turbo spin-echo (2000/90; 150°) through the pelvis of a 58-year-old woman with symptoms of pelvic organ prolapse. The image shows reference lines A and B, which were applied in the presence of any outpouching of the anterior rectal wall

An enterocele was defined as any outpouching of the peritoneal sac, containing omentum and/or small bowel loops, into the rectovaginal space. The distances from the most distal point of the peritoneal sac to the vaginal vault and to each of the three reference lines were measured.

Perineal descent was measured as the perpendicular distance between the pubococcygeal line and the anterior margin of the anal sphincter muscle [16, 17]. The dimension of the genital hiatus was defined as the distance between the inferior rim of the pubic bone and the posterior wall of the anal canal at the level of the impression of the puborectal sling [7, 18].

Statistical methods

The sample size calculation was performed before the start of this study and was based on the precision of the reliability estimate. Twenty-three patients were needed to obtain a relative precision of 15% in the SE. In order to reach the number of 23 for all measurements of POP, a total of 30 patients were needed.

Intraobserver reliability was assessed by comparing the first and second measurements of one observer, and interobserver reliability by comparing the first measurements of this observer with the measurements of the second observer. The intraclass correlation coefficient (ICC) was calculated to measure the reliability of the quantitative MR imaging measurements [19]. A linear mixed model was used to calculate the ICC of each specific measurement of POP separately, with separate models for the intraobserver and the interobserver ICC. The dependent variable was the specific measurement of POP. The independent random variable was “patient,” and the independent fixed variable was the assessment of observer one (first, second) in the intraobserver model and the observer (one, two) in the interobserver model. The mean difference between the two categories of the fixed variable and the ICC, each with its 95% confidence interval, are presented. An ICC of more than 0.8 denotes excellent agreement, between 0.6 and 0.8 good agreement, between 0.4 and 0.6 moderate agreement, and below 0.4 poor agreement [20]. SPSS version 14.0 (SPSS, Chicago, IL, USA) was used to perform the statistical analysis.
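The quantities reported in this analysis can be approximated with the following minimal sketch. It does not reproduce the SPSS linear mixed model: instead it uses the classical two-way analysis of variance estimate of a consistency-type ICC, which for complete, balanced data (every patient measured once in each category) should coincide with the ratio of patient variance to patient plus residual variance from a model with a random patient effect and a fixed observer effect. The function names and the handling of values that fall exactly on a threshold are assumptions, and confidence intervals are not computed.

```python
from statistics import mean


def icc_consistency(ratings):
    """ANOVA-based consistency ICC for complete, balanced data.

    `ratings` holds one tuple per patient, with one measurement per
    category of the fixed factor (e.g. first and second assessment).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between categories
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_err
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_rows - ms_err) / denominator


def mean_difference(ratings):
    """Mean of (first category minus second category) across patients."""
    return mean(a - b for a, b in ratings)


def agreement_label(icc):
    """Map an ICC to the agreement categories used in the text.
    Treating exactly 0.8 and 0.6 as the lower category is an assumption."""
    if icc > 0.8:
        return "excellent"
    if icc > 0.6:
        return "good"
    if icc >= 0.4:
        return "moderate"
    return "poor"


# Hypothetical paired measurements (cm) for four patients.
pairs = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (4.0, 4.0)]
icc = icc_consistency(pairs)
difference = mean_difference(pairs)
```

Because the fixed observer effect is removed before the residual variance is estimated, a constant offset between two assessments lowers the mean difference but not this ICC, which is why both quantities are reported side by side.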

Results

Thirty women with POP were included. The median age was 52 years (range 32–76). The median parity was two (range 1–5). The median position of the most descended point of prolapse during gynecological examination, according to the POP-Q system, was +1 cm (range −2 to +7). Twenty-four women (80%) had a history of gynecological surgery, consisting of surgery for pelvic organ prolapse (n = 22) and urinary incontinence (n = 2). Twenty patients had undergone previous hysterectomy, of whom two had a subtotal hysterectomy.

The results show systematic differences in the intra- and interobserver reliability in relation to the mid-pubic line, but also in relation to the pubococcygeal line and the H-line. These differences were generally less than 0.5 cm, which seems clinically irrelevant.

Table 1 shows the ICC and mean difference of the within and between observer measurements of pelvic organ prolapse at rest and straining by anatomical landmarks in relation to the pubococcygeal line, the H-line, and the mid-pubic line and by clinical measurement points in relation to the mid-pubic line.

Table 1 ICC and mean difference, with 95% CI, of the within and between observer measurements of pelvic organ prolapse by anatomical landmarks and clinical measurement points in relation to three reference lines at rest and straining (n = 30)

The ICCs of the anatomical landmarks were excellent to good, with some exceptions where reliability was inferior. More specifically, the interobserver ICC of the anorectal junction during straining using the H-line (ICC = 0.45), the intra- and interobserver ICCs in the anterior compartment at rest (ICC = 0.54, 0.50, and 0.51, respectively), and the interobserver ICCs during straining (ICC = 0.60 and 0.49) using the mid-pubic line were moderate. The median interobserver ICC of the anatomical measurement points (Bl, Bn, C/V, ARJ, and Rec) was highest for the pubococcygeal line, both at rest and during straining, compared with the median interobserver ICCs for the H-line and the mid-pubic line (0.83 and 0.83 compared with 0.76, 0.69, 0.65, and 0.77, respectively). In conclusion, the interobserver agreement using the H-line and the mid-pubic line was somewhat disappointing. The median ICC for the pubococcygeal line at rest and during straining was substantially higher than the median ICCs for the H-line and the mid-pubic line, especially with regard to interobserver reliability.

Table 2 displays the ICC and mean difference of the within and between observer measurements of any outpouching of the anterior rectal wall in relation to a straight line through the anterior border of the anal canal (line A) or in relation to the expected margin of the normal anterior rectal wall (line B). Both methods had a good to excellent intra- and interobserver reliability (ICC range 0.73–0.93).

Table 2 ICC and mean difference, with 95% CI, of the within and between observer measurements of any outpouching of the anterior rectal wall in relation to two different reference lines (n = 30)

The observers agreed on the presence of an enterocele in ten out of 30 patients. One additional small enterocele was identified by only one of the observers and was disregarded in the analysis. Table 3 presents the ICC and mean difference of the within and between observer measurements of enteroceles during straining in relation to the pubococcygeal line, the H-line, and the mid-pubic line. The intraobserver reliability of the quantitative assessment for the three reference lines was excellent (ICC range 0.91–0.97), but the interobserver reliability for the pubococcygeal line and the H-line was only moderate (ICC = 0.47 and 0.45, respectively).

Table 3 ICC and mean difference, with 95% CI, of the within and between observer measurements of enteroceles during straining in relation to three reference lines (n = 10a)

Table 4 shows the ICC and mean difference of the within and between observer measurements of perineal descent in relation to the pubococcygeal line and the genital hiatus. Overall, the reliability of these measurements was good to excellent (ICC range 0.72–0.89), with the exception of the interobserver reliability at rest of perineal descent (ICC = 0.52).

Table 4 ICC and mean difference, with 95% CI, of the within and between observer measurements of perineal descent in relation to the pubococcygeal line and the genital hiatus at rest and straining (n = 30)

Discussion

In the present study, we determined the intra- and interobserver reliability of dynamic MR staging in POP patients, using the three most common reference lines at rest and during straining. The intra- and interobserver reliability of quantitative prolapse staging on dynamic MR imaging was generally excellent to good. However, systematic differences and inferior reliability were mainly seen in relation to the mid-pubic line.

To our knowledge, only two studies have examined the reliability of dynamic MR imaging measurements of the pelvic floor. Morren et al. assessed nulliparous female volunteers, using the pubococcygeal line as the reference line and different anatomical reference points to evaluate the position of the pelvic organs [5]. Subsequently, Fauconnier et al. assessed women with POP and used the mid-pubic line and a newly introduced reference line, the perineal line [4]. Both studies used the ICC to assess reliability. Morren et al. reported that most measurement error was due to intraobserver reliability, which was not reproduced in this study. Fauconnier et al. also reported excellent intra- and interobserver reliability of the MR imaging measurements.

Furthermore, Fauconnier et al. have introduced three measurement points on dynamic MR imaging similar to the clinical staging by the POP-Q system [4, 9]. In contrast with their findings, the reliability of these measurements was only moderate to poor on external validation in our study, with the exception of the middle compartment (i.e., the most descended point of the cervix or the vaginal vault).

In the present study, the cine-loop of images was used, with very satisfactory reliability. Intra- and interobserver reliability can be assessed either in a cine-loop of images, in which each rater selects his or her own image for assessment, or in a single predefined image. Although this difference may have an important influence on the results, the method used in the two previous studies on reliability has not been described.

The presence of POP on dynamic MR imaging has frequently been defined by whether or not pelvic organs descend below a certain reference line during the Valsalva maneuver [12, 18, 21]. This is probably the most widely used method in clinical practice as well. In the present study, however, the reliability of this dichotomous scoring was overall poor or, as was to be expected, showed high agreement. These high agreements were due to the fact that all anatomical landmarks lay far from the reference lines and were thus not discriminating. Consequently, a diagnostic test for POP based merely on descent below a certain reference line seems to be an invalid method for the interpretation of dynamic MR images.

In the literature, four different methods have been described to report on the presence or staging of an enterocele on MR imaging: (1) an enterocele was diagnosed in case of any outpouching of the peritoneal sac and its contents into the rectovaginal space [3, 7, 14, 15], (2) when the sac descended into the rectovaginal space for more than one third of the proximal vagina [12, 22–24], (3) when the sac descended more than halfway down the vagina [25], or (4) when the peritoneal sac descended below a certain reference line [10, 11, 13, 26]. In the present study, the reliability of the qualitative assessment (i.e., the first three methods) was excellent to good. The interobserver reliability of the quantitative assessment for the pubococcygeal line and the H-line, however, was only moderate.

For the full validation of a method, data on, for example, validity and applicability need to be assessed in addition to reliability. In the present study, reliability was the only parameter assessed. This limitation needs to be taken into account in the interpretation of the results.

Another issue that needs to be considered is the ease of use of the reference lines. The pubococcygeal line is the most reliable to use, probably because this line is drawn between two fixed bony points. In addition, the pubococcygeal line is the most widely used reference line in the available literature, which is another advantage. Future studies are needed to establish whether the pubococcygeal line is also the preferable reference line with regard to validity, for example, compared with symptoms and the clinical staging of pelvic organ prolapse.

Dynamic MR imaging is mostly performed in the supine position, as open-magnet MR units, which are needed for prolapse assessment in the sitting position, are not widely available. Bertschinger et al. concluded, in a study comparing closed-magnet unit dynamic MR imaging with open-magnet unit dynamic MR imaging (i.e., supine vs. sitting position), that the presence and severity grade of POP were concordant in the majority of patients [27]. Based on their conclusion, we think that our findings on reliability also apply when open-magnet unit dynamic MR imaging is used to assess POP.

The assessment of the MR images was performed by an experienced radiologist (JF) and a novice observer (SB). Our data show that, given clear definitions of the reference lines and measurement points, only a short training period was needed to assess dynamic MR images for pelvic organ prolapse in a reliable manner.

In conclusion, the intra- and interobserver reliability of quantitative prolapse staging on dynamic MR imaging was generally good to excellent. The pubococcygeal line appears to be the most reliable to use, since the median ICCs are lower for the H-line and the mid-pubic line than for the pubococcygeal line.