Introduction

Incisional hernia (IH) is the most frequent complication after open abdominal surgery. IH prevalence rates in published cohorts vary substantially: rates between 10 and 32% have been reported [1, 2]. Several factors have been proposed to explain this variability, such as age, obesity, abdominal aortic aneurysms, and previous abdominal surgery [1]. Most studies investigating the treatment or prevention of IH use IH prevalence as their primary endpoint. The diagnostic modality, observer, definition, and diagnostic protocol used for the diagnosis of IH are infrequently identified as factors associated with the IH prevalence rate. However, all four of these elements regularly differ within and between studies.

Many diagnostic modalities are used for the diagnosis of IH, including physical examination, ultrasound, computed tomography scan (CT-scan), magnetic resonance imaging (MRI), and per-operative diagnosis. In IH research, the use of imaging modalities is considered important to achieve more reliable results. This is underlined by the recommendation in the ‘European Hernia Society guidelines on the closure of abdominal walls’ to use ultrasound or CT-scan in the follow-up of prospective studies [3]. This approach deviates from everyday clinical practice, in which clinicians mainly focus on the diagnosis of symptomatic IHs that might require treatment [4].

In general, it is believed that the use of radiologic imaging will increase the detection rate of IH compared to physical examination alone. However, not all published cohorts show this trend [3,4,5,6].

The choice of diagnostic modality is often dictated by multiple factors such as cost, availability, safety, and, especially in a research setting, detection rate and reliability. However, the latter two remain unclear, as the evidence concerning these factors is limited and sometimes contradictory [7, 8]. In IH research, the IH definition is not always uniform. The definition of IH as stated by Korenkov et al. [9]: ‘any abdominal wall gap with or without bulge in the area of a post-operative scar perceptible or palpable by clinical examination or imaging’, is acknowledged in the European Hernia Society (EHS) classification of primary and incisional abdominal wall hernias [9, 10]. Although IH is usually defined as an ‘abdominal wall gap or fascial defect’, variations on this definition circulate, as the term ‘abdominal wall weakness’ may also be used. Furthermore, bulging or a positive Valsalva maneuver may or may not be considered a diagnostic sign [11, 12]. The place of imaging techniques within the diagnostic protocol often differs: some studies use a more clinical approach, reserving imaging techniques for cases with an inconclusive physical examination, whereas other studies only consider ‘radiologically confirmed’ diagnoses [2, 13, 14].

We hypothesize that the use of different diagnostic modalities, observers, definitions, and diagnostic protocols might influence the number of IHs identified. The objective of our systematic review is to evaluate the diagnostic accuracy of the different modalities used to identify IH after open abdominal surgery and after IH repair surgery. We provide a qualitative synthesis of the available data on the diagnostic accuracy of physical examination, CT-scan, and ultrasound for the identification of IH.

Methods

The study protocol was registered in the PROSPERO database (International Prospective Register of Systematic Reviews, http://www.crd.york.ac.uk/prospero) prior to the start of the systematic review with the registration number CRD42017062307. All aspects of the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-analyses) were followed [15].

Search strategy

Embase, Medline Ovid, Web of Science, Cochrane, PubMed publisher, and Google Scholar databases were searched on 28 March 2017. Full search details and syntax are presented in Appendix 1. The syntax construction and database search were performed in collaboration with a medical librarian specialized in conducting systematic reviews.

Studies reporting on IH diagnosis after primary laparotomy and after IH repair surgery were included. There was no limit in language or date of publication.

Studies were first evaluated for inclusion based on title and abstract by two independent researchers (LK and DS) and finally evaluated independently based on full text. Differences in article selection were discussed and articles were included or excluded after reaching agreement. Studies were included if they met the following criteria:

  1. Inclusion of patients that underwent abdominal or IH repair surgery that were followed for the development of IH.

  2. Studies assessing the performance of a diagnostic modality (physical examination, abdominal CT-scan, abdominal MRI scan, abdominal ultrasound, or surgery) used for the diagnosis of IH.

Studies assessing only laparoscopy patients, non-consecutive patient populations (e.g., patients with prior IH diagnosis), Spigelian, or occult hernias were excluded. Discrepancies in inclusion were resolved by discussion between reviewers and a senior author (JFL or FM).

Data collection

Data collection was performed independently by two researchers (LK and DS) using standard forms covering study characteristics (study design, year, location, and level of evidence) and patient baseline characteristics (type of intervention, number of patients, age, sex, open or laparoscopic surgery, duration of follow-up, and reason for surgery). Outcome characteristics concerning diagnostic performance comprised: definition of IH, inter-observer variation, CT-scan versus ultrasound, CT-scan versus physical examination, ultrasound versus physical examination, diagnostic modalities versus per-operative diagnosis, and diagnostic performance in obese patients. Extracted data consisted of absolute data in two by two contingency tables, prevalence rates, kappa values, or intra-class correlation coefficients.

Assessment of study quality

The level of evidence of each paper was established according to the Oxford Centre for Evidence-based Medicine levels of evidence [16]. The possible risk of bias was assessed using the Cochrane Collaboration’s tool for assessing risk of bias [17]. Risk of bias was assessed separately for each outcome, since the quality of different outcomes in papers with a wide scope might differ.

Results

Search and study characteristics

The PRISMA flow diagram of the complete search strategy is shown in Fig. 1. The initial search resulted in 4855 articles (3010 after removal of duplicates). After screening, 135 articles were selected for full-text reading. After full-text reading, 15 articles were selected for inclusion [2, 4,5,6,7,8, 11, 12, 14, 18,19,20,21,22,23]. Characteristics of included studies are summarized in Table 1.

Fig. 1 Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram

Table 1 Overview of included studies

Study quality

Risk of bias and applicability concerns of included studies per outcome are summarized in Fig. 2. Overall major concerns in patient selection, execution, and comparison of diagnostic tests and patient flow were present in 25–50% of the review sample (Fig. 3). Major applicability concerns were present in 10% of the review sample (Fig. 3). Specific methodological concerns are presented in Appendix 2.

Fig. 2 Risk of bias and applicability concerns summary

Fig. 3 Overall risk of bias and applicability concerns

Definition of IH

A clear definition for IH was reported in seven of the included studies (Appendix 3) [2, 4, 7, 11, 12, 20, 22]. IH was defined as any ‘abdominal wall gap’ or ‘defect’ in the proximity of the post-operative scar by five out of seven studies [2, 4, 7, 12, 22]. Two of these studies included ‘a protrusion of abdominal contents’ in the definition and incorporated the terms ‘weakness’ as well as ‘defect’ of the abdominal wall in their definition [12, 22]. One study defined IH as a ‘palpable protrusion’ under the laparotomy scar [11]. One study defined IH as ‘fascial defect’ in the proximity of the scar [20]. Three studies referred to a proposed universal definition [2, 4, 12]. One study that did not clearly define IH reported that, when two or more observers disagreed, the disagreement was due to the lack of a clear definition among the observers in 35% of the patients (n = 42) [23].

Inter-observer variation

Inter-observer variation was reported in five of the included studies concerning a total of 698 patients [8, 12, 18, 20, 23]. Four out of five studies included in this comparison had one or more methodological concerns [12, 18, 19, 23]. Results obtained by these studies are summarized in Table 2. Reported disagreement between two observers ranged from 11.2 to 14.4%; corresponding kappa values ranged from 0.71 to 0.74 (n = 578) [8, 12, 18]. One study comparing the inter-observer variation in a group of six radiologists and three surgeons reported disagreement rates of 69 and 27%, respectively (kappa: 0.38 and 0.62; n = 100) [23]. One other study used a panel of five independent surgeons and reported an intra-class correlation coefficient of 0.85 (n = 20) [20]. The inter-observer variation of ultrasound was assessed in one study that used a panel of three independent surgeons, and an intra-class correlation coefficient of 0.79 (n = 17) was reported [7].
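As an illustration of how agreement statistics such as the kappa values above are derived, the following minimal Python sketch computes Cohen's kappa for two observers from a two by two agreement table. The function name and all counts are hypothetical and are not taken from any of the included studies.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two observers rating each patient as IH / no IH.

    both_pos: both observers diagnose IH
    a_only:   only observer A diagnoses IH
    b_only:   only observer B diagnoses IH
    both_neg: neither observer diagnoses IH
    """
    n = both_pos + a_only + b_only + both_neg
    if n == 0:
        raise ValueError("agreement table is empty")

    # Observed proportion of patients on which the observers agree.
    observed = (both_pos + both_neg) / n

    # Agreement expected by chance, from each observer's positive rate.
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")

    return (observed - expected) / (1 - expected)


# Hypothetical counts for 100 patients (illustration only).
kappa = cohen_kappa(both_pos=20, a_only=5, b_only=7, both_neg=68)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical table the observers disagree in 12% of patients, which shows how a disagreement rate of this order can coexist with a kappa well below 1.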

Table 2 Inter-observer variation

CT-scan versus ultrasound

The prevalence rate of IH after ultrasound and CT-scan was reported in two studies concerning a total of 221 patients [7, 8]. The study by Beck et al. [7] had methodological problems concerning patient selection and patient flow. Results obtained by these studies are summarized in Table 3. These two studies obtained contradictory results. Den Hartog et al. [8] reported a higher prevalence rate when using CT-scan, whereas Beck et al. [7] reported a slightly lower prevalence rate. The ratio of the prevalence rate with CT-scan to that with ultrasound was 1.41 and 0.93, respectively. Disagreement between ultrasound and CT-scan was reported in 7/40 (17.5%) and 12/181 (6.6%) cases, respectively.

Table 3 CT-scan versus ultrasound

CT-scan versus physical examination

The prevalence rates of IH after CT-scan and physical examination were reported in six studies concerning a total of 1378 patients [5, 6, 11, 12, 14, 23]. Five out of six studies included in this comparison had one or more methodological concerns [5, 11, 12, 14, 23]. Results obtained by these studies are summarized in Table 4. Four studies reported higher prevalence rates and two studies reported lower prevalence rates when using CT-scan for the diagnosis of IH. The relative increase in prevalence rates when comparing CT-scan to physical examination ranged from 0.92 to 1.8 (n = 1378). Disagreement between diagnosis by CT-scan and by physical examination was quantifiable in four studies and ranged from 7.8 to 32% (n = 770). Between 15 and 48% of the reported IH diagnoses were established solely with use of CT-scan (n = 267) [5, 6, 14, 23].

Table 4 CT-scan versus physical examination

Ultrasound versus physical examination

The prevalence rate of IH after ultrasound and physical examination was reported in four studies concerning a total of 1013 patients [2, 4, 7, 14, 21]. All studies included in this comparison had one or more methodological concerns [2, 4, 7, 14, 21]. Results obtained by these studies are summarized in Table 5. Three studies reported higher prevalence rates and one study reported a similar prevalence rate when using ultrasound for the diagnosis of IH. The relative increase in prevalence rates when comparing ultrasound to physical examination ranged from 1 to 2.4 (n = 1013). Disagreement between diagnoses by ultrasound and by physical examination was quantifiable in three studies. Disagreement between the two modalities was reported in 41/456 (9%), 44/338 (13%), and 15/38 (39%) of the cases. IH diagnosis was established solely with use of ultrasonography in 21/103 (20%), 41/87 (47%), and 15/26 (58%) of IH diagnoses [2, 4, 21].
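The disagreement percentages above follow directly from the reported counts. A minimal Python sketch of that arithmetic is given below; the function and variable names are hypothetical, and only the discordant and total counts quoted in the text are used.

```python
def disagreement_rate(discordant, total):
    """Percentage of patients in whom two modalities give different diagnoses."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * discordant / total


# (discordant cases, patients assessed) as quoted in the text for
# ultrasound versus physical examination.
reported = [(41, 456), (44, 338), (15, 38)]

for discordant, total in reported:
    rate = disagreement_rate(discordant, total)
    print(f"{discordant}/{total}: {rate:.1f}%")
```

The same function can be applied to any pair of modalities for which discordant and total counts are available.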

Table 5 Ultrasound versus physical examination

Per-operative diagnosis

The diagnosis obtained through physical examination or CT-scan was compared to the per-operative findings in three studies concerning 80 patients [6, 22, 23]. Results obtained by these studies are summarized in Table 6. Only one of the studies included in this comparison was of good methodological quality. All reports on this outcome were limited by small sample sizes. Gutiérrez de la Peña et al. [6] reported a true positive rate of 100% and a true negative rate of 98% (n = 50) for diagnosis with CT-scan. For the diagnosis with physical examination, a true positive rate of 75% and a true negative rate of 90% (n = 50) were reported [6].
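The true positive and true negative rates above are derived from a two by two table of index test result against per-operative finding. The following Python sketch shows the calculation under the assumption that the per-operative diagnosis is treated as the reference standard; the function name and the counts are hypothetical and do not reproduce the data of any included study.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (true positive rate, true negative rate) for an index test.

    tp: test positive, IH found per-operatively
    fp: test positive, no IH found per-operatively
    fn: test negative, IH found per-operatively
    tn: test negative, no IH found per-operatively
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and one without IH")
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity


# Hypothetical counts (illustration only).
sens, spec = sensitivity_specificity(tp=18, fp=7, fn=6, tn=69)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

The false positive rate is the complement of the true negative rate (1 minus specificity), so the two should not be confused when reading such tables.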

Table 6 Per-operative diagnosis

Impact of obesity

The impact of obesity on the diagnosis of IH was reported in three studies concerning two different patient populations [4, 14, 19]. Baucom et al. [14] compared CT-scan as diagnostic modality to physical examination in obese and non-obese patients. The disagreement rate between the two modalities was 21% (n = 96) in obese patients compared to 13% in non-obese patients (n = 85) [14]. Bloemen et al. [4] compared ultrasound as diagnostic modality to physical examination in patients with a body mass index (BMI) > 25 and in patients with a BMI < 25. The disagreement rate between the two modalities was 10% (n = 228) in the BMI > 25 patients compared to 8% in BMI < 25 patients (n = 228) [4]. One other study compared the mean surface area of incisional hernias detected with ultrasound in obese and non-obese patients and did not find a significant difference between the two [19].

Discussion

In this systematic review on diagnostic modalities for IH diagnosis, great variance between modalities and between different studies was found. The diagnosis of IH remains challenging, as no objective gold standard is present.

All included studies were of retrospective design, had multiple methodological concerns, or presented a small sample of patients (GRADE quality: low or very low). Therefore, the results of included studies should be interpreted with caution. Compared to per-operative diagnosis, CT-scan seemed to be reasonably accurate in one study presenting a small sample of patients [6]. However, considerable inter-observer variability has been reported [8, 12, 18, 20, 23]. Moreover, multiple studies report considerable discrepancy between CT-scan and physical examination and between CT-scan and ultrasonography results [2, 4,5,6,7, 11, 12, 14, 23]. No study compares ultrasound to the per-operative diagnosis. Two studies compare ultrasound to CT-scan and find contradictory results [7, 8]. Inter-observer variability for ultrasound and physical examination has not been assessed thoroughly; however, we may assume that inter-observer variability will be present due to the dynamic nature of these diagnostic modalities.

One prospective study of decent methodological quality provides a comparison between physical examination and the per-operative diagnosis in a small sample of 50 patients. Although the sample size was limited, this is the only report that provides some reliable insight into the sensitivity and specificity of physical examination, reporting a sensitivity of 75% and a specificity of 90% [6]. Considerable discrepancies were reported between diagnoses by physical examination and ultrasound or CT-scan [2, 4,5,6,7, 11, 12, 14, 23]. Most studies report higher prevalence rates when using imaging modalities for the diagnosis of IH. However, not all studies show this trend [4, 6]. Relative increase in IH prevalence compared to physical examination ranged from 0.92 to 1.8 for CT-scan and 1 to 2.4 for ultrasound [2, 4,5,6,7, 11, 12, 14, 23]. Strikingly, studies that report similar prevalence rates for physical examination and ultrasound or CT-scan still show considerable disagreement between the two diagnostic modalities [4, 6]. The diagnostic performance of CT-scan has been investigated more thoroughly than that of physical examination and ultrasound. CT-scan will likely provide the most sensitive and reproducible diagnosis of IH, followed by ultrasound and physical examination. The definition of IH differed slightly among those studies that reported a definition. No study reported an IH definition specifically adapted for the diagnostic modality used. Disagreement between observers might in part be due to lack of consensus with regard to the IH definition [23].

It is important to stress that all the above-mentioned concerns relate to the research setting. For clinical studies, objective comparable measures should be used to report endpoints. The choice of diagnostic modality in a clinical setting might be relatively straightforward as most clinicians are mainly focused on identifying symptomatic incisional hernias that might require treatment. Therefore, in asymptomatic patients, a full diagnostic workup would often not be necessary. For a surgeon, detection rate is not the only argument to choose one modality over the other. In this case, costs, availability, patient safety, and patient comfort are important factors to take into account. It is understandable that a stepwise incremental approach is often chosen, in which physical examination will be the first modality used, followed by imaging in case of doubt.

In IH research, the diagnostic follow-up is challenging as no diagnostic gold standard exists and imaging will often be applied for non-IH related indications or in patients with an inconclusive physical examination, potentially causing selection bias. The choice of diagnostic modality and the number of observers might influence the IH prevalence found. When different modalities and observers are unequally distributed over study cohorts, internal study validity could be compromised. This is especially of concern in studies of observational retrospective design, since many observers and different diagnostic modalities are present in everyday clinical practice. Moreover, the aims of the clinician (identifying symptomatic IHs) often deviate from the aims of the researcher (identifying all IHs). Varying definitions for IH among observers are likely to cause a part of the observed disagreement [23].

Use of a universal definition, such as the definition proposed by Korenkov et al. [9]: ‘any abdominal wall gap with or without bulge in the area of a post-operative scar perceptible or palpable by clinical examination or imaging’, might be imperative. Based on current data, restricting the definition of IH to radiologically confirmed hernias only is not advisable, as illustrated by the substantial inter-observer variation in CT-scan examinations and reports of false negative and false positive CT-scan diagnoses [6, 8, 18, 22, 23]. Although our knowledge with regard to inter-observer variation in IH diagnosis is mainly based on diagnosis by CT-scan, we may assume that these variations are of even more concern when applying ultrasound or physical examination, due to the more dynamic nature of these diagnostic modalities and the fact that in both modalities, subjectivity plays a larger role. The series presented by Holihan et al. [23] (CT-scan only) suggested that at least part of the observed inter-observer variation was due to subtle differences in the applied definition and methodology of operators. An IH definition specifically adapted to the (radiologic) diagnostic modality in use, accompanied by a standardized systematic approach, might further improve the accuracy and consistency of IH diagnosis [7, 23]. For ultrasound examination, a systematic approach in which the midline area is examined first, followed by the abdominal areas next to the midline, and finally, the more lateral abdominal areas, as suggested by Beck et al. [7], could be considered. This approach could be applied similarly for abdominal palpation. Since the diameters of the fascial defect and hernia sac enlarge significantly during a Valsalva maneuver, routine use of the Valsalva maneuver during physical examination and radiologic evaluation of the post-operative scar might be of added diagnostic value [24].

The clinical relevance of IHs detected solely by radiologic imaging remains unclear. Only one study to date attempts to answer this question. Bloemen et al. [4] reported discomfort in 26/103 IH patients; 3/26 of these IHs were detected by ultrasound alone, and 1/13 IHs that were treated surgically was detected by ultrasound alone. Based on current literature, the proportion of IHs solely detected by radiologic imaging that requires treatment or will progress over time remains unclear. Future research concerning the diagnosis of IHs should place more emphasis on these factors.

Limitations

Our systematic review has some limitations. First, all included studies were of low quality: most were of retrospective design, and some studies presented small samples. Therefore, the data should be interpreted with caution. We assume that between-study variation is present: follow-up, indication for abdominal surgery, BMI, and age differed between studies. In addition, some studies included a small proportion of laparoscopic patients [7, 12, 14, 18,19,20]. IH prevalence rates in patients operated laparoscopically differ from those in patients undergoing open abdominal surgery. Therefore, the proportion of patients operated laparoscopically will influence the total IH prevalence. Although these factors influence the comparability of reported IH prevalence, they might be of less concern when assessing the diagnostic accuracy. The majority of included studies had multiple methodological concerns. Risks of either reporting or selection bias were found frequently (Appendix 2). Most methodological concerns will mainly influence the overall prevalence rates; however, the diagnostic accuracy will be influenced by the prevalence rate to some degree. In addition, a number of studies did not compare the diagnostic modalities in a blinded fashion, potentially diluting the presented results and diminishing generalizability [2, 4, 5, 11, 12, 18].
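One way in which prevalence can affect measures of diagnostic performance is through the predictive values of a test. The Python sketch below illustrates this using the sensitivity (75%) and specificity (90%) reported for physical examination [6] and the two ends of the prevalence range cited in the Introduction (10 and 32%). It assumes, for illustration only, that sensitivity and specificity remain constant across populations; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")

    # Expected fractions of the whole population in each cell of the table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)

    return tp / (tp + fp), tn / (tn + fn)


# Assumption: fixed sensitivity and specificity at both prevalence levels.
for prevalence in (0.10, 0.32):
    ppv, npv = predictive_values(0.75, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions, the positive predictive value rises and the negative predictive value falls as prevalence increases, even though the test itself is unchanged.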

Conclusion

Great variance between different diagnostic modalities and between different observers was found. Use of imaging modalities will usually yield additional IH diagnoses and increase the IH prevalence compared to use of physical examination alone. When comparing different imaging modalities, CT-scan appears to provide the most accurate diagnosis. Lack of consensus with regard to the IH definition among observers might in part explain the inter-observer variation. The observer, diagnostic modality, and diagnostic approach could be additional factors explaining variability in IH prevalence and should, therefore, be reported in detail in IH research. To achieve internally valid study results, proper distribution of different observers and diagnostic modalities across study cohorts is imperative.