Introduction

Parastomal hernia (PSH) is a common complication following stoma formation and can cause discomfort, pain, strangulation, and incarceration of intestines, as well as difficulties with stoma care [1]. The exact incidence of PSH remains unclear, but most studies report rates above 30%, especially after colostomy [1, 2]. Still, reported rates vary widely in the literature, ranging from 0 to 86% [1, 3, 4]. This variability depends on several factors, such as the length of follow-up, patient and surgical characteristics (including the type of stoma and the method of stoma construction), and the definition of PSH used [5,6,7,8].

Moreover, several diagnostic modalities can be used to diagnose PSH, and the choice of modality itself affects the reported incidence rate. In practice, clinical examination is the first method used to assess the presence or absence of a PSH. When the diagnosis is in doubt, or to help plan the surgical approach and management, an imaging modality can be chosen, such as ultrasonography (US), computed tomography (CT), or magnetic resonance imaging (MRI).

In addition, the lack of a clear definition and the use of several different classifications of PSH is a significant problem in PSH research [9]. Some studies use imaging to confirm the diagnosis of PSH, whereas others only use imaging in clinically unconvincing cases [10, 11]. Due to these differences, protocols often deviate between clinical practice and the research setting, as well as between clinical studies.

In 2014, the European Hernia Society (EHS) proposed a classification based on the defect size and the presence of a concomitant incisional hernia [9, 12]. Because it allows different studies to be compared correctly and thus supports uniform research reporting, the EHS recommends this classification for use in PSH research [9]. However, these guidelines also emphasize the uncertainty about the accuracy of clinical and imaging diagnoses of PSH.

Therefore, the aim of this systematic review is to evaluate the accuracy of the different modalities used to identify PSH after stoma construction or after PSH repair. The secondary objectives are to assess the inter-observer variation and the correlations between (a)symptomatic PSH and imaging or surgical findings, and to identify the different definitions and classifications used for the diagnosis of PSH.

Methods

The study protocol was registered in PROSPERO (CRD42018112732; International Prospective Register of Systematic Reviews). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy (PRISMA-DTA) statement was followed [13]. Moreover, the article by Wille-Jørgensen et al. on systematic reviews and meta-analyses in coloproctology was used for methodological guidance [14].

Systematic literature search

A systematic search was performed by a biomedical information specialist instructed by the first author (G.S.). Embase, MEDLINE, Cochrane, Web of Science, and Google Scholar databases were searched on March 5, 2019. Full search strategies and results per database are presented in Appendix 1. There was no limit on date of publication. After duplicate removal, studies were reviewed independently by two researchers (D.L. and G.S.) on title and abstract, followed by full-text review using EndNote X9®. Differences in article selection were discussed, and articles were included or excluded after consensus was reached.

Inclusion and exclusion criteria

Studies were included if they met the following criteria: (1) inclusion of patients who underwent stoma construction (ileal conduit, ileostomy, or colostomy) or PSH repair surgery; (2) assessment of the performance of a diagnostic modality (clinical examination, CT, US, MRI, or diagnosis at surgery) used for the diagnosis of PSH. Only (non-)randomized controlled trials and prospective or retrospective cohort or case-control studies were included. The following were excluded: studies reporting on pediatric patients (< 18 years of age), studies reporting only on gastrostomies, oesophagostomies, or duodenostomies, studies in which no data on diagnostic modalities were described, and studies with a diagnostic work-up so unclear that diagnostic data could not be extracted. Studies not written in English, case reports, letters, comments, abstracts, and posters were also excluded.

Data extraction

Data from included studies were extracted by one researcher (G.S.) and checked independently by another researcher (S.H.) using standard forms covering study characteristics (year, journal, study design, level of evidence, and risk of bias), patient characteristics (number of patients, sex, age, body-mass index, and follow-up), surgical characteristics (indication for surgery, acute or elective, laparoscopic or open abdominal surgery, reoperation, stoma type, use of mesh, location of mesh), and outcome characteristics (definition and classification of PSH, diagnostic modalities and corresponding incidence of PSH, and inter-observer variation). Since there is no gold standard modality for diagnosing a PSH, the detection rates of the different diagnostic modalities were compared within each study. The available absolute data and incidence rates of modalities are presented and compared in contingency tables. Intra-class correlation coefficients and Kappa values for inter-observer variation were extracted and presented. Inter-modality agreement was expressed as a Cohen’s Kappa value for each study where possible. The statistical level of agreement per Cohen’s Kappa value range is presented in Supplemental table 1. The pooled Cohen’s Kappa value was calculated in a random effects model using the inverse variance method, with the meta package for R version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria).
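As an illustration of the agreement statistics described above, the following Python sketch computes Cohen’s Kappa from a 2×2 contingency table and pools several Kappa values with inverse variance weights in a random effects model. It is a sketch under stated assumptions, not the analysis performed in this review: the review used the meta package for R, whereas the between-study variance estimator (DerSimonian-Laird), the large-sample standard error, the function names, and all counts below are assumptions chosen for illustration only.

```python
import math


def cohen_kappa(both_pos, ct_only, clinical_only, both_neg):
    """Return Cohen's kappa and an approximate standard error for a 2x2 table."""
    n = both_pos + ct_only + clinical_only + both_neg
    # Observed agreement: both modalities give the same result.
    p_obs = (both_pos + both_neg) / n
    # Agreement expected by chance, from the marginal totals.
    ct_pos = both_pos + ct_only
    clin_pos = both_pos + clinical_only
    p_exp = (ct_pos * clin_pos + (n - ct_pos) * (n - clin_pos)) / n ** 2
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Simple large-sample approximation of the standard error (assumption).
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return kappa, se


def pool_random_effects(estimates, std_errors):
    """Pool estimates by inverse variance with a DerSimonian-Laird tau^2."""
    weights = [1 / se ** 2 for se in std_errors]
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0
    # Random effects weights add the between-study variance to each variance.
    re_weights = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = math.sqrt(1 / sum(re_weights))
    return pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se, tau2


# Hypothetical 2x2 tables (both+, CT only, clinical only, both-), one per study.
tables = [(20, 5, 10, 15), (30, 8, 4, 58), (12, 6, 2, 40)]
results = [cohen_kappa(*t) for t in tables]
pooled, ci_low, ci_high, tau2 = pool_random_effects(
    [k for k, _ in results], [se for _, se in results]
)
print(f"pooled kappa = {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Studies with perfect agreement have a standard error of zero under this approximation and would need separate handling before pooling; the sketch does not cover that case.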

Study quality assessment

Two researchers (S.H. and G.S.) independently assessed the quality of included studies by assessing the level of evidence according to the Oxford Centre for Evidence-based Medicine Levels of Evidence [15] and the possible risk of bias using the Cochrane Collaboration’s tool for assessing risk of bias [16] and the QUADAS-2 tool [17] with RevMan 5.3 (Cochrane Centre, Denmark).

Results

Search and study characteristics

A PRISMA flow diagram of the complete search results is shown in Fig. 1. After removal of duplicates, 1495 articles were screened on title and abstract of which 192 articles were selected for full-text reading. Finally, 29 articles were judged eligible and were included.

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram

An overview of study characteristics is shown in Table 1. The methodological quality of all included studies per outcome measure is summarized in Fig. 2. Overall, a high risk of bias was present in the included studies (Fig. 3). Applicability concerns were present in 10–20% of the review sample (Fig. 3). Specific methodological concerns per included study are outlined in Appendix 2 Table 8.

Table 1 Overview of included studies

Fig. 2 Risk of bias and applicability concerns summary

Fig. 3 Overall risk of bias and applicability concerns

Definition and classification of PSH

The definition of PSH was reported in eighteen (62%) of the included studies [2, 11, 19,20,21,22, 25, 27, 28, 31, 33, 34, 37, 39, 41,42,43]. Some studies used two different definitions, one for clinical and one for radiologic examination [20, 21, 27, 31]. Therefore, a total of nineteen different definitions were used (Appendix 3 Table 9). For clinical examination, most definitions combined the terms “bulge” or “protrusion” with “around” or “in the vicinity of” the stoma. Some studies also specified the position of the patient’s body (supine and/or erect) during examination and the use of the Valsalva maneuver. For radiological examination, the terms “defect,” “fascia,” and “hernia sac” were often incorporated in the definition. Five studies did not describe the definition of PSH or the diagnostic approach [18, 23, 24, 29, 35].

The classification of PSH was reported in thirteen (45%) of the included studies [2, 10, 22, 26, 30, 32,33,34, 36, 38,39,40, 43]. Two classifications were used: one developed and introduced by the European Hernia Society [12] and one by Moreno-Matias [34] (Appendix 4 Table 10).

Inter-observer variation

Three of the included studies reported on inter-observer variation [25, 31, 42]. Each study investigated different modalities examined by different observers. An overview of the methods and results of these studies is summarized in Table 2. Gurmu et al. reported a low inter-observer reliability when diagnosing PSH by clinical examination with disagreement rates of 35 and 54% between three surgeons and 18% between two surgeons [25]. Jänes et al. reported a strong agreement between three surgeons after diagnosing PSH by clinical examination with a Kappa value of 0.85 [31]. Also, the inter-observer reliability was higher among radiologists when patients underwent a CT in prone position as compared with patients in supine position with Kappa values of 0.85 and 0.82, respectively [31]. Strigård et al. investigated inter-observer reliability and learning curve of three-dimensional ultrasonography (3D US) in 40 patients. They found an overall inter-observer agreement of 72% with a Kappa value of 0.59, which is classified as “weak.” The learning curve reached its top at around 30 patients with an inter-observer agreement of 80% for the last ten examined patients [42].

Table 2 Inter-observer variation

CT versus clinical examination

The incidence rates of PSH after CT and clinical examination were reported in nineteen studies including a total of 1369 patients [2, 18,19,20,21, 23, 26,27,28, 30, 32, 33, 36,37,38,39, 43]. PSH incidence rates, disagreement percentages, and Kappa values are presented in Table 3. Study quality and clinico-radiological concordance are presented in Supplemental table 2. Fifteen studies (79%) reported a higher incidence rate and two studies (11%) reported a lower incidence rate when diagnosing PSH using CT as compared with clinical examination. When comparing CT to clinical examination, the relative difference in incidence rates ranged from 0.64 to 3.0. Disagreement between diagnoses by CT versus clinical examination could be obtained in fifteen studies and ranged from 0 to 37.3%. The pooled inter-modality agreement Kappa value for all fourteen studies with contingency tables was 0.64 (95% CI 0.52–0.77), which is classified as “substantial agreement.”
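The relative difference and disagreement percentage reported per study can be derived from a 2×2 contingency table, as in the following Python sketch. It assumes that the relative difference is the ratio of the CT incidence to the clinical incidence and that disagreement is the share of patients with discordant results; both are interpretations of the reported measures rather than definitions confirmed here. The function name and the counts are hypothetical.

```python
def compare_modalities(both_pos, ct_only, clinical_only, both_neg):
    """Summarise a 2x2 table of CT versus clinical examination results."""
    n = both_pos + ct_only + clinical_only + both_neg
    ct_incidence = (both_pos + ct_only) / n
    clinical_incidence = (both_pos + clinical_only) / n
    # Ratio of incidences (assumed meaning of "relative difference");
    # undefined when clinical examination found no hernias.
    if clinical_incidence > 0:
        relative_difference = ct_incidence / clinical_incidence
    else:
        relative_difference = None
    # Discordant pairs: positive on one modality and negative on the other.
    disagreement = (ct_only + clinical_only) / n
    return {
        "ct_incidence": ct_incidence,
        "clinical_incidence": clinical_incidence,
        "relative_difference": relative_difference,
        "disagreement": disagreement,
    }


# Hypothetical study of 100 patients, not taken from any included study.
summary = compare_modalities(both_pos=20, ct_only=10, clinical_only=5, both_neg=65)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under this reading, a value above 1 means that CT detected more hernias than clinical examination, and a value below 1 means the reverse.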

Table 3 CT versus clinical examination

Ultrasonography versus clinical examination

The incidence rates of PSH after US and clinical examination were reported in one study, which included 43 patients with peristomal bulging (Table 4, Supplemental table 3) [41]. Sjödahl et al. reported a lower incidence rate with US than with clinical examination, with a relative difference of 0.58. The disagreement between these modalities was 53.5%.

Table 4 Ultrasonography versus clinical examination

CT versus ultrasonography

Studies comparing PSH incidence of CT to regular US were not identified. One study by Näsvall et al. [35] investigated intrastomal 3D US as an alternative to CT and included twenty patients that were indicated for surgical revision due to stoma-related symptoms. The PSH incidence was higher when using CT (80%) as compared with 3D US (75%) (Table 5, Supplemental table 4).

Table 5 CT versus ultrasonography

Peroperative diagnosis

Näsvall et al. compared 3D US and CT to findings at surgery in twenty patients [35]. For both imaging modalities a high sensitivity of 83% was found. A positive predictive value (PPV) of 94% and a negative predictive value (NPV) of 75% were reported for diagnosis with CT. For diagnosis with 3D US, a PPV of 100% and an NPV of 60% were reported. Also, Fleshman et al. reported peroperative findings in thirteen patients who were diagnosed with PSH at clinical examination, of which eleven were confirmed by CT and two were confirmed operatively [24]. Study quality, PSH incidence rates, and surgico-radiological concordance of the two studies are presented in Table 6 and Supplemental table 5.
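For readers less familiar with these measures, the following Python sketch shows how sensitivity, specificity, PPV, and NPV follow from a 2×2 table in which the finding at surgery is taken as the reference. The function name and the counts are hypothetical and do not reproduce the data of Näsvall et al.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures of an imaging test against findings at surgery.

    tp: imaging positive, hernia found at surgery
    fp: imaging positive, no hernia at surgery
    fn: imaging negative, hernia found at surgery
    tn: imaging negative, no hernia at surgery
    """

    def ratio(numerator, denominator):
        # Return None when a measure is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # hernias detected by imaging
        "specificity": ratio(tn, tn + fp),  # non-hernias correctly ruled out
        "ppv": ratio(tp, tp + fp),          # positive scans that are correct
        "npv": ratio(tn, tn + fn),          # negative scans that are correct
    }


# Hypothetical counts for twenty operated patients.
measures = diagnostic_accuracy(tp=9, fp=1, fn=3, tn=7)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

With small samples such as these, each measure rests on a handful of patients, so a single reclassified case shifts the percentages considerably.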

Table 6 Peroperative diagnosis

Imaging versus clinical examination

Two studies reported on clinical examination, CT, and MRI for the diagnosis of PSH. These studies did not subdivide the incidence rate per type of imaging modality [11, 22]. Study quality, PSH incidence rates, and clinico-radiological concordance of the studies are presented in Table 7 and Supplemental table 6. Donahue et al. reported a higher incidence rate when using imaging, with a relative increase of 1.47, and found no patients with clinically detected but radiologically occult PSH [22]. Hansson et al. found three symptomatic PSHs in 60 patients who were clinically examined. A CT or MRI was performed in 27 of the 60 patients, of whom nineteen had an asymptomatic hernia. Hotouras et al. reported 25 (58%) PSHs diagnosed with CT. Eleven (44%) of these 25 patients with radiologically confirmed PSH were symptomatic as reported by the patients.

Table 7 Imaging versus clinical examination

Imaging after clinical suspicion of parastomal hernia

Brandsma et al. and Fleshman et al. performed CT only when there was clinical suspicion of PSH. In the study of Brandsma et al. [10], sixteen out of nineteen clinical PSHs (14.3%) were confirmed by CT, two by MRI, and one by US. Fleshman et al. found thirteen (13%) clinical PSHs, of which eleven (11%) were confirmed by CT and two peroperatively. Hansson et al. performed a CT or MRI when there were doubts about the diagnosis of PSH during clinical examination [11]. One participating center performed imaging routinely (Table 7). The incidence after clinical examination was 5% (3/60) and after imaging 7% (4/61).

Discussion

Today, in both clinical practice and research there is no gold standard modality to examine patients for the presence of PSH. The literature on this subject is diverse and inconclusive. Facilitating comparison between studies on PSH remains challenging, due to, among others, the number of existing definitions, imaging modalities, and classifications. Indeed, this systematic review shows a great variance in detection rates of PSH between different diagnostic modalities.

Most included studies compared CT with clinical examination. The majority of these studies found higher incidence rates by using CT [2, 19,20,21, 23, 26, 28, 30, 32,33,34, 37,38,39,40]. However, some studies showed contradictory results in favor of clinical examination [24, 27, 36]. This discrepancy between studies could be explained by technical differences in the examination of the patient’s abdominal wall, bearing in mind that a patient’s body position and the use of the Valsalva maneuver during examination might affect detection rates [31]. The Valsalva maneuver can also be used in patients undergoing CT imaging. However, this is rarely reported in studies.

Gurmu et al. found a low inter-observer reliability when patients were clinically examined by surgeons, indicating that PSH is difficult to diagnose by clinical examination [25]. This was also stated by Sjödahl et al. who found poor correlation between US and findings at clinical examination [41]. If these examinations are performed correctly, the use of dynamic modalities such as US and clinical examination may have some advantages compared with the more static and expensive CT or MRI. However, the inter-observer variation and diagnostic accuracy of US have not been investigated thoroughly. In contrast, more evidence is available on the diagnostic performance of clinical examination and CT. For research purposes, the combined use of these two modalities might be recommended since multiple studies found significant disagreements in detection rates between both modalities [27, 33, 34].

This is the first review to date that provides a complete overview of the available literature on different diagnostic modalities for PSH diagnosis. Nevertheless, it is important to note that this systematic review covers studies that investigate PSH incidence rates in the setting of a research protocol that might not always fully reflect standard clinical practice. Also, only a minority of the included studies had the accuracy of the diagnostic modality as the primary outcome [20, 23, 25, 31, 34, 35, 41, 42]. In clinical practice the main goal is to identify symptomatic PSHs that might require treatment, and for asymptomatic patients a full diagnostic workup seems unnecessary. Therefore, the clinical approach might differ from that in a research setting. In general, patients with stoma problems such as pain, appliance leakage, bowel obstruction, or symptoms of incarceration first undergo clinical examination by a stoma nurse and/or clinician. When PSH is identified clinically or the diagnosis is inconclusive, the clinician can consider an imaging modality to confirm the diagnosis, taking into account patient safety, patient comfort, availability, and costs. For research purposes, factors such as costs and availability might play a less important role in the choice of imaging modality.

Intrastomal 3D US is a relatively new imaging modality for diagnosing PSH or other stoma-related pathology [44]. 3D US seems to be an accurate imaging modality, with a sensitivity of 83% when compared with peroperative diagnosis [35]. With this imaging modality it is possible to examine patients in erect position and without the use of radiation, providing potential advantages over CT. There is, however, too little evidence available for this technique to consider it a standard imaging modality for diagnosing PSH.

In contrast to its use in diagnosing incisional hernia, traditional two-dimensional ultrasonography (2D US) is not often used for diagnosing PSH in either research or clinical settings. However, 2D US is the most patient-friendly, inexpensive, and practical of all imaging modalities. This systematic review included only one study comparing 2D US to clinical examination for diagnosing PSH. To make any recommendations on 2D US, future studies would need to compare ultrasonography with other imaging modalities.

Another important aspect of clinical practice with regard to the use of diagnostic modalities is that many stoma patients have a stoma created after oncological resection, and for these patients a CT is routinely performed during follow-up to detect potential cancer recurrence. Although some PSHs occur many years after stoma construction, most PSHs develop within the first years after stoma construction and are thus likely to be identified with follow-up CT [5]. This is one of the main reasons why most included studies used CT instead of MRI or US. However, with the patient in supine position a CT is not a reliable tool for diagnosing PSH, and a CT with the patient in prone position is associated with higher inter-observer agreement and an increase in sensitivity [31]. By using CT routinely for cancer follow-up, asymptomatic PSHs will appear more frequently. Although this distinction is not insignificant, studies often do not distinguish between symptomatic and asymptomatic PSH when reporting incidence rates.

Evidently, patient-reported outcomes are of paramount importance in the context of stoma-related complications. Patients know their own bodies in a way no physician possibly can, and have to take care of the stoma several times a day, whereas the physician examines the patient’s stoma once or maybe twice. Any physical change of the stoma will be noticed by the patient, which probably makes patient self-report more reliable than the studied modalities for establishing the existence of bulging at some time point during follow-up. Currently, prospective cohort studies, such as the PROPHER and CIPHER studies (ISRCTN17573805; ISRCTN registry), are assessing the value of subjective and objective outcomes after stoma construction or for parastomal hernia treatment, respectively.

Despite the increased interest in PSH care and research in recent decades, there is still no consensus regarding the definition of PSH or a gold standard for diagnosis [9]. Although many definitions consisted of similar terms and contexts, some definitions differ considerably, which can lead to discrepancies in detection rates. Moreover, the fact that five included studies did not describe the definition of PSH at all emphasizes the need for uniform reporting in studies regarding PSH [18, 23, 24, 29, 35]. This heterogeneity in diagnostic procedures makes it difficult to compare studies and to determine an accurate incidence of PSH. Therefore, a clear and standard definition and diagnosis of PSH is of paramount importance. The European Hernia Society (EHS) acknowledged this problem and proposed to use the definition of PSH introduced by Muysoms et al. [45]: “An incisional hernia through the abdominal wall defect created during placement of a colostomy, ileostomy or ileal conduit stoma”. Furthermore, the EHS proposed a new classification for PSH, which might help to facilitate more uniform reporting of outcomes in PSH research (Appendix 4 Table 10) [12].

Limitations

This systematic review has some limitations. Firstly, the low level of evidence of included studies is an important limitation. Eleven studies have a retrospective study design, which is prone to selection and information bias. Also, most studies presented small study populations. Nevertheless, to give a complete overview of diagnostic accuracy and variation of the different modalities, studies of low quality or studies with small samples were not excluded, and a comprehensive overview of study characteristics and study quality assessment was provided.

Secondly, significant heterogeneity between studies was demonstrated, as operation and stoma types, use of mesh reinforcement, patient characteristics (e.g., age and BMI), and follow-up duration differed between included studies. Besides the choice of diagnostic modality, all these factors also influence the PSH incidence rates. Although it was not possible to account for this, these factors would be of less importance for the within-study diagnostic performance, since diagnostic modalities were only compared within each study. However, some studies did not investigate the PSH incidence rate or the accuracy of the diagnostic modalities as primary objective. As a result, the incidence could easily be underestimated. Accordingly, the results of diagnostic performance may also be affected.

Conclusion

In conclusion, this review shows great variance in the accuracy of different modalities for the detection of PSH. The use of CT increases the PSH detection rate, indicating that this is a more accurate modality compared with clinical examination. However, the evidence on the accuracy of the other imaging modalities, and of patient-reported outcome measures, is scarce and warrants further investigation. There are significant differences in diagnostic methods between clinical practice and research protocols, as well as between clinical studies. In order to compare studies correctly and increase transparency among studies, a more detailed report of the diagnostic method and a detailed and preferably uniform definition are required in future research. It might be of added value to develop a standard and validated protocol in which self-report, clinical examination, and imaging are combined.