Background

Surgical techniques for the treatment of ureteropelvic junction obstruction (UPJO) have advanced significantly over the last decades. Open approaches were followed by laparoscopic techniques and endopyelotomy (EP) in the 1990s, and robot-assisted approaches were introduced in 2002 [1, 2]. Currently, all of these techniques are clinically applied, although laparoscopic and robot-assisted pyeloplasty (LP and RP) are often claimed to be superior to other approaches [3,4,5,6,7]. Several studies reported that minimally invasive treatment options outperform open pyeloplasty (OP) with respect to early recovery and complication rates, whereas operating time is shorter for OP [3, 4]. Comparing LP and RP, length of hospital stay, operating time and suturing time seem to be shorter for RP [8,9,10,11]. In one meta-analysis, the operative success rate was significantly higher for RP [9]. While open, laparoscopic and robot-assisted approaches require abdominal surgery, EP is probably the least invasive [12]. Even though EP has been replaced by other approaches in many institutions, several experts in the field still advocate this technique [13], especially as little literature is available summarizing the evidence on all approaches. Although meta-analyses have been published comparing two treatment options each, these provide limited guidance in the current situation, in which multiple approaches for UPJO are available and must be evaluated against one another. The aim of this network meta-analysis is to provide a comprehensive overview of the most frequently used techniques for the treatment of UPJO and to compare their effectiveness regarding various clinically important outcomes.

Materials and methods

Search strategy

This study was registered a priori at PROSPERO (CRD42018085917). The systematic literature search of EMBASE, MEDLINE and the Cochrane Library was performed in January 2018 without restrictions on publication language or date. In addition, the reference lists of reviews and included articles as well as conference proceedings were searched. Furthermore, we searched a clinical trial registry (www.clinicaltrials.gov) to identify potentially unpublished studies. The full search strategy is available in Additional file 1.

Criteria for study inclusion and exclusion

We included retrospective and prospective studies comparing at least two of the following surgical approaches for the treatment of UPJO: OP, LP, RP and EP. Studies had to assess at least one of the following outcomes: operative success, operating time, suturing time, estimated blood loss, transfusion rate, conversion rate, re-operation rate, intra- and postoperative as well as overall complications, urinary leakage, deaths, length of hospital stay, time to return to normal activities, renal function, pain, analgesia requirement, survival time and costs. No minimum follow-up time was required for study inclusion. Of these outcomes, operative success was examined as the total number of successes as defined by the authors, and rates of transfusion, conversion, re-operation, intra- and postoperative complications, and urinary leakage were likewise evaluated as total numbers. Reviews, meta-analyses, and studies focusing on children or animals were excluded. In addition, we excluded studies that reported insufficient data on measures of dispersion or that pooled outcome data of two surgical approaches. If more than one publication reported on the same patient cohort, the more comprehensive one was selected in order to meet the assumption of independence for meta-analyses.

Definition of operative success

The authors of the included papers used varying definitions of operative success. Objective success was reported by all authors and mostly comprised a patent ureteropelvic junction confirmed by radionuclide diuretic renogram or intravenous urography (IVU) and, in some studies, a decrease in the severity of hydronephrosis. Subjective success was often defined as absence of symptoms or "significant" improvement without further specification. If separate measures of subjective and objective success were reported instead of combined values, only the objective success rates were included in the statistical analysis, to account for the subjectivity of perceived pain [14,15,16]. Measures of procedural success, e.g. whether a laparoscopic operation could be completed without conversion, were not taken into account.

Data extraction

Publication titles identified via the literature search were independently screened by 3 blinded authors; the resulting selection then underwent abstract and full-text screening by 2 independent blinded authors. Data extraction from the publications suitable for inclusion was performed in the same manner. Disagreement was resolved by consultation of a third author and majority vote. As suggested by Rothman et al., a study was considered prospective if data collection on interventions and covariates took place before the outcome occurred [17].

Assessment of study quality

Study quality was assessed with the Downs and Black instrument by two independent blinded authors, with disagreement resolved by consensus involving a third author [18]. The Downs and Black instrument rates the quality of randomized and non-randomized studies on a scale from 0 to 32 points (0 points for the worst and 32 for the best study quality) using a catalogue of 27 items. Each item yields 1 point, except for the description of the distribution of principal confounders in each group of subjects, for which a maximum of 2 points can be reached, and the evaluation of study power, for which a maximum of 5 points can be reached. For the power item, studies were credited 1 to 5 points for sample sizes of < 15, 15 to 44, 45 to 59, 60 to 100, and > 100 patients, respectively, according to the quartiles of sample sizes of the included studies. Study quality was labeled "low" (1–10 points on the Downs and Black instrument), "moderate" (11–21 points) and "high" (22–32 points). The Newcastle-Ottawa Scale was used for additional study quality assessment. As suggested by the Cochrane handbook, study quality was assessed separately for each outcome [19].
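The scoring rules above can be sketched as two small helper functions. This is a minimal illustration assuming integer patient counts; the names `power_points`, `quality_label` and `MAX_SCORE` are hypothetical and are not part of the Downs and Black instrument or of the authors' workflow.

```python
# Hypothetical helpers illustrating the Downs and Black scoring rules above.

def power_points(n_patients: int) -> int:
    """Points (1-5) for the study-power item, using the sample-size bins
    < 15, 15-44, 45-59, 60-100 and > 100 patients."""
    if n_patients < 0:
        raise ValueError("sample size must be non-negative")
    if n_patients < 15:
        return 1
    if n_patients <= 44:
        return 2
    if n_patients <= 59:
        return 3
    if n_patients <= 100:
        return 4
    return 5


def quality_label(total_score: int) -> str:
    """Map a Downs and Black total (0-32) to the quality bands used here."""
    if not 0 <= total_score <= 32:
        raise ValueError("Downs and Black totals range from 0 to 32")
    if total_score <= 10:  # band stated as 1-10; a total of 0 is assumed "low"
        return "low"
    if total_score <= 21:
        return "moderate"
    return "high"


# Maximum total: 25 one-point items + 2 (confounder item) + 5 (power item)
MAX_SCORE = 25 * 1 + 2 + 5
```

For example, the median score of 14 points reported in the study quality results falls into the "moderate" band, and a study with 60 patients earns 4 power points.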

Statistical analyses

For dichotomous outcomes, odds ratios (OR) were calculated in the standard manner from the reported absolute numbers or percentages. Continuous outcomes such as operating time, estimated blood loss and length of hospital stay were compared as median or mean with standard deviation. If available, data from multivariable models were preferentially used [20, 21]. When a study reported two similar interventions separately, e.g. endopyelotomy and Acucise endopyelotomy, the results were pooled [22]. If only the median and interquartile range (IQR) were reported for continuous outcomes, and a large sample size indicated a high probability of an underlying normal distribution, the standard deviation was calculated by dividing the IQR by 1.35 [16]. All outcomes were compared with endopyelotomy as the reference group. A random-effects network meta-analysis was used for comparison. All outcomes were ranked by p-scores, which estimate the certainty that a single treatment outperforms the average of the competing interventions; the p-score ranges from 0 to 1, with 1 indicating the highest possible certainty [23]. Study heterogeneity was evaluated with Higgins's I², considering values below 25% as of potentially low relevance, 26 to 50% as "moderate", 51 to 75% as "substantial", and 76 to 100% as "considerable" heterogeneity [19]. The consistency assumption was evaluated by visual assessment of net heat plots and by Cochran's Q statistic. All pairwise comparisons with more than 10 studies were tested for publication bias using weighted linear regression of the treatment effect on its standard error [24]. Sensitivity analyses were performed including only studies with at least 12 months of follow-up. The statistical analysis was performed using R version 3.4.2 with the packages "meta" (including its function metabias) and "netmeta", and RStudio version 1.1.383.
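The basic effect-size and heterogeneity calculations described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' R code: the odds ratio uses the standard Woolf (log-scale) 95% confidence interval, the IQR/1.35 conversion assumes approximate normality, I² is computed as max(0, (Q − df)/Q), and all function names and the example counts are hypothetical.

```python
import math

# Two-sided 95% standard normal quantile
Z_95 = 1.959963984540054


def odds_ratio_ci(events_a, total_a, events_b, total_b):
    """Odds ratio (arm A vs arm B) with a Woolf log-scale 95% CI.

    Assumes all four 2x2 cells are non-zero; zero cells would need a
    continuity correction, which is not implemented here.
    """
    a, b = events_a, total_a - events_a  # arm A: events, non-events
    c, d = events_b, total_b - events_b  # arm B: events, non-events
    if min(a, b, c, d) <= 0:
        raise ValueError("zero cell: continuity correction required")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - Z_95 * se),
            math.exp(log_or + Z_95 * se))


def sd_from_iqr(q1, q3):
    """Approximate SD as IQR / 1.35 (reasonable only for near-normal data)."""
    return (q3 - q1) / 1.35


def i_squared(q, df):
    """Higgins' I^2 in percent from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def heterogeneity_label(i2):
    """Bands used in this study; values between the stated bands
    (e.g. 25-26 %) are assigned to the higher band (an assumption)."""
    if i2 < 25:
        return "low"
    if i2 <= 50:
        return "moderate"
    if i2 <= 75:
        return "substantial"
    return "considerable"


# Hypothetical counts: 20/25 successes in arm A vs 10/25 in arm B
or_, lower, upper = odds_ratio_ci(20, 25, 10, 25)
```

With these hypothetical counts the odds ratio is 6.0. The reported I² values of 13.7% for overall complications and 95.2% for operating time fall into the "low" and "considerable" bands, respectively.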
All p-values were two-sided, and p < 0.05 was considered statistically significant.
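In contrast to these two-sided tests, the p-score ranking is built from one-sided comparisons. A sketch assuming the common frequentist definition: the p-score of treatment i is the mean, over all competitors j, of Φ((d_i − d_j)/SE_ij), i.e. the certainty that i is better than j. In R this ranking is available through netmeta's `netrank()`; the Python function `p_scores` below is a hypothetical stand-in, and the network estimates in the example are invented.

```python
from statistics import NormalDist


def p_scores(effects, se_diff, larger_is_better=True):
    """P-score of each treatment: mean over all competitors j of
    Phi((d_i - d_j) / se_ij).

    effects:  treatment -> network estimate against a common reference
              (e.g. log odds ratio or mean difference)
    se_diff:  (t1, t2) -> standard error of effects[t1] - effects[t2];
              either ordering of the pair may be supplied
    """
    phi = NormalDist().cdf
    sign = 1.0 if larger_is_better else -1.0
    names = list(effects)
    scores = {}
    for i in names:
        total = 0.0
        for j in names:
            if i == j:
                continue
            se = se_diff[(i, j)] if (i, j) in se_diff else se_diff[(j, i)]
            total += phi(sign * (effects[i] - effects[j]) / se)
        scores[i] = total / (len(names) - 1)
    return scores


# Hypothetical log odds ratios for success (higher = better) vs treatment "A"
estimates = {"A": 0.0, "B": 0.5, "C": 1.0}
ses = {("A", "B"): 0.25, ("A", "C"): 0.25, ("B", "C"): 0.25}
example = p_scores(estimates, ses)
```

Because Φ(x) + Φ(−x) = 1, the p-scores of n treatments always sum to n/2, so their average is 0.5. For outcomes where smaller values are better, such as operating time or complication odds ratios, `larger_is_better=False` reverses the direction.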

Results

Study characteristics

The systematic literature search yielded 3008 studies published between 1995 and 2017, of which 26 fulfilled the inclusion criteria [14,15,16, 20,21,22, 25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. Figure 1 depicts the study selection process, and Table 1 details the study characteristics. In total, 3143 patients were analyzed: 556 receiving OP, 1540 receiving LP, 798 receiving RP, and 249 receiving EP. Operative success and complications were evaluated by 24 studies each, operating time by 22 studies and length of hospital stay by 21 studies. Estimated blood loss was evaluated by 13 studies, postoperative complications by 10 studies, conversion rates by 9 studies and re-operation rates by 8 studies. Transfusion rates, intraoperative complications, urinary leakage and analgesia requirement were reported by 7 studies each. Four studies evaluated death rates, and suturing time and time to return to normal activities were reported by 3 studies each. Pain, renal function, and costs were evaluated by 1 study each, whilst none of the included studies reported on survival time. Six studies were three-armed [14, 21, 25, 28, 30, 40]; the remaining 20 studies compared two of the four approaches. Figure 2 details the number of comparisons for each end point. Only 3 congress abstracts provided sufficient data to meet the inclusion criteria [30, 31, 38]. Study design was retrospective in 18 studies, 1 study included both prospective and retrospective data [40], and 5 studies explicitly stated a prospective design [27, 36, 38, 41, 44]. For 2 studies, no information was available [31, 32]. Only one study had a randomized controlled design [38], and two further studies used multivariable adjustment methods to account for confounding [20, 21]. Mean follow-up ranged from 1 month to more than 6 years, with the majority of studies reporting follow-up times of more than 1 year.
The study population consisted of adult patients in 15 of the included publications, whereas 8 studies reported on mixed cohorts of adults and children in which, where reported, the majority of patients were older than 18 years [30, 32,33,34, 41,42,43,44]. For 3 studies, the inclusion criteria with respect to patient age were unclear [21, 31, 38]. The percentage of female patients ranged from 32 to 77%, with 6 studies omitting this information [15, 16, 25, 26, 31, 38]. The study population came from Asia in 8 studies [22, 30, 32, 34, 37, 40, 41, 43], from Europe in 7 studies [15, 29, 31, 33, 35, 38, 42], and from North America in 10 studies [16, 20, 21, 25,26,27,28, 36, 39, 44]. One study reported on a mixed Asian and North American population [14].

Fig. 1 Flow diagram of study inclusion process

Table 1 Study characteristics of included studies
Fig. 2 Number of comparisons for each end point. a) Comparative studies on success. b) Comparative studies on complications. c) Comparative studies on urinary leakage. d) Comparative studies on re-operation. e) Comparative studies on transfusion probability. f) Comparative studies on operating time after sensitivity analyses. g) Comparative studies on length of stay

Study quality

The median study quality was 14 points (range 6 to 25 points). The moderate quality resulted from missing randomization, allocation concealment and blinding as well as insufficient confounder adjustment and loss to follow-up in almost all studies. Nevertheless, many studies clearly described the study hypothesis, main outcomes and findings as well as patient characteristics, and selected participants representative of the source population. Table 2 details the study quality separately for the two most important end points, operative success and complications. Comparable results were obtained using the Newcastle-Ottawa Scale (Additional file 1: Table S3).

Table 2 Study quality according to the Downs and Black instrument for success and complications

Network meta-analyses for different outcomes

Network meta-analysis of operative success

The analysis of operative success included 34 pairwise comparisons from 24 studies. Compared with RP, EP and LP showed lower success rates, with OR = 0.09 (95% CI 0.05–0.19; p < 0.001) for EP and OR = 0.51 (95% CI 0.31–0.84; p = 0.008) for LP. No statistically significant difference was evident between OP and RP (OR = 0.69, 95% CI 0.34–1.4; p = 0.306). Table 3 presents all pairwise comparisons in a league table, and the associated p-scores are presented in Table 4. There was no evidence of study heterogeneity (I² = 0%, p = 0.9041). Neither Cochran's Q (Q = 1.98; p = 0.9213) nor the net heat plot indicated inconsistency. Forest plots comparing OP, RP, LP, and EP are shown in Fig. 3a. A sensitivity analysis including only studies with at least 12 months of follow-up (n = 13) yielded comparable results (Additional file 1: Table S1). A total of 15 studies provided information on primary versus secondary UPJO [14, 16, 21, 22, 25, 27,28,29, 35,36,37, 39,40,41, 44]. Of these, only 9 studies explicitly included secondary UPJO [16, 21, 22, 25, 27, 28, 35, 37, 44], and only 2 studies compared operative success between primary and secondary UPJO [25, 30]. Baldwin et al. reported that for LP, operative success was higher in the group with secondary UPJO (100% vs 89%), whereas for EP it was higher among patients with primary UPJO [25]. Calvert et al. reported higher success rates among patients with primary UPJO for both LP (98% vs. 57%) and OP (96% vs 67%) [29].
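The reported p-values can be cross-checked against the confidence intervals. A minimal sketch, assuming the intervals are symmetric 95% Wald intervals on the log scale for odds ratios (and on the natural scale for mean differences): the standard error is recovered as the interval width divided by 2 × 1.96, and a two-sided p-value follows from the normal distribution. The function name `p_from_ci` is hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def p_from_ci(estimate, lower, upper, log_scale=True):
    """Approximate two-sided p-value from an estimate and its 95% CI.

    Assumes a symmetric Wald interval on the analysis scale: the log scale
    for ratios (log_scale=True) or the natural scale for differences.
    The null value is 0 on the analysis scale (OR = 1 or difference = 0).
    """
    transform = math.log if log_scale else (lambda v: v)
    z_crit = _STD_NORMAL.inv_cdf(0.975)  # about 1.96
    se = (transform(upper) - transform(lower)) / (2 * z_crit)
    z = transform(estimate) / se
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


p_lp = p_from_ci(0.51, 0.31, 0.84)  # LP vs RP, operative success
p_op = p_from_ci(0.69, 0.34, 1.4)   # OP vs RP, operative success
```

For LP versus RP (OR = 0.51, 95% CI 0.31–0.84) this gives p ≈ 0.008, matching the reported value. For OP versus RP (OR = 0.69, 95% CI 0.34–1.4) it gives p ≈ 0.304 rather than 0.306, a small discrepancy that is expected because the published interval limits are rounded.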

Table 3 League table showing pairwise comparisons for all surgical approaches and end points included in the quantitative network meta-analysis
Table 4 P-scores ranking the surgical approaches for every outcome based on the amount of certainty that a single treatment outperforms the average of competing interventions. The p-score ranges from 0 to 1, the latter indicating the highest certainty possible
Fig. 3 Pooled estimates for each endpoint. a) operative success. b) complications. c) urinary leakage. d) re-operation. e) transfusion. f) operating time after sensitivity analyses. g) length of stay

Network meta-analysis of overall complications

The network meta-analysis of overall complications included 31 pairwise comparisons from 23 studies. Compared with OP, both LP (OR = 0.62; 95% CI 0.41–0.95; p = 0.027) and RP (OR = 0.41; 95% CI 0.22–0.79; p = 0.007) had a statistically significantly lower risk of complications. No statistically significant difference was detected between EP and OP. All pairwise comparisons are depicted in Table 3, and the associated p-scores are presented in Table 4. Heterogeneity was low and potentially irrelevant (I² = 13.7%, p = 0.1416). Neither Cochran's Q (Q = 1.02; p = 0.9064) nor the net heat plot indicated inconsistency. Figure 3b depicts comparisons for OP, RP, LP, and EP.

Network meta-analysis of urinary leakage

The network meta-analysis of urinary leakage included 9 pairwise comparisons from 7 different studies. Compared with RP, none of the other surgical treatment options had a statistically significantly higher or lower risk of urinary leakage. Table 3 summarizes these findings, and Table 4 depicts the associated p-scores. There was no evidence of study heterogeneity (I² = 0%, p = 0.5161). Neither Cochran's Q (Q = 0.58; p = 0.4471) nor the net heat plot indicated inconsistency. Forest plots comparing OP, RP, LP, and EP are shown in Fig. 3c.

Network meta-analysis of re-operation

The analysis of re-operation was based on 10 pairwise comparisons from 8 studies. Compared with RP, none of the other surgical treatment options had a statistically significantly higher or lower risk of re-operation. All pairwise comparisons are depicted in Table 3, and the associated p-scores are presented in Table 4. There was no evidence of study heterogeneity (I² = 0%, p = 0.3001). Neither Cochran's Q (Q = 0.02; p = 0.9897) nor the net heat plot revealed inconsistency. Figure 3d depicts comparisons for OP, RP, LP, and EP. A sensitivity analysis including only studies with at least 12 months of follow-up (n = 13) yielded comparable results, which are presented in Additional file 1: Table S2 [8, 14, 15, 21, 22, 28, 30, 33,34,35, 38, 40, 42]. Five studies stated whether concomitant stones were present at the time of pyeloplasty [15, 16, 27, 28, 42], but only 2 studies evaluated their effect on any of the outcomes. Lucas et al. reported that the presence of urolithiasis did not affect the rate of secondary interventions [16].

Network meta-analysis of transfusion rate

The analysis of transfusion rates included 11 pairwise comparisons from 7 studies. Compared with RP, none of the other surgical treatment strategies showed a statistically significant difference in transfusion rates (Table 3). The associated p-scores are given in Table 4. Study heterogeneity was "moderate" but not statistically significant (I² = 42.3%, p = 0.4396). There was statistically significant inconsistency (Cochran's Q = 6.64, p = 0.0361), the sources of which could not be identified by visual assessment of the net heat plot. Therefore, sensitivity analyses excluding specific studies were not possible. Comparisons for OP, RP, LP and EP are depicted in Fig. 3e.

Network meta-analysis of operating time

A total of 14 pairwise comparisons from 12 studies were included in the analysis of operating time. Compared with EP, RP had a statistically significantly longer operating time (mean difference 102.87 min; 95% CI 41.79–163.95 min; p < 0.001). Further statistically significant differences in operating time were detected for LP versus EP (mean difference 115.13 min; 95% CI 65.63–164.63 min; p < 0.001) and OP versus EP (mean difference 91.96 min; 95% CI 32.33–151.58 min; p = 0.003). Comparisons among RP, LP, and OP showed no statistically significant differences. Heterogeneity was "considerable" and statistically significant (I² = 95.2%, p < 0.001). The net heat plot suggested that the study design comparing EP, LP, and OP contributed most to network inconsistency (Q = 101.59, p < 0.001). Consequently, a sensitivity analysis excluding the study of Chen et al. was conducted, resulting in 11 pairwise comparisons from 11 studies. Again, RP had a statistically significantly longer operating time than EP (mean difference 115.39 min; 95% CI 55.58–175.19 min; p < 0.001). The same held for LP versus EP (mean difference 127.51 min; 95% CI 75.19–179.83 min; p < 0.001) and OP versus EP (mean difference 76.16 min; 95% CI 9.47–142.85 min; p = 0.025). In addition, LP had a statistically significantly longer operating time than OP (mean difference 51.35 min; 95% CI 10–92.7 min; p = 0.015). Table 3 depicts all pairwise comparisons of the reduced analysis, and the associated p-scores of the reduced model are detailed in Table 4. The "considerable" heterogeneity persisted (I² = 92.4%, p < 0.001) but was not explored further in order to maintain adequate numbers of comparisons. As only direct comparisons remained in the reduced analysis, inconsistency could not be evaluated. Bird et al. reported that concurrent treatment of nephrolithiasis did not affect operating time [27]. Figure 3f depicts comparisons for OP, RP, LP, and EP after sensitivity analyses.
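Because all relative effects in a network meta-analysis are derived from a common set of basic parameters, the point estimates are consistent by construction. As a worked check on the reduced operating-time model, the LP versus OP difference equals the difference between the two EP-referenced estimates reported above:

```latex
\hat{d}_{\mathrm{LP,OP}} = \hat{d}_{\mathrm{LP,EP}} - \hat{d}_{\mathrm{OP,EP}}
  = 127.51\,\text{min} - 76.16\,\text{min} = 51.35\,\text{min}
```

Only the point estimates combine this simply; the standard error of the difference also depends on the covariance between the two EP-referenced estimates, so its 95% CI cannot be reconstructed from the reported intervals alone.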

Network meta-analysis of length of stay

The analysis of length of stay included 8 pairwise comparisons from 8 studies. Compared with RP, none of the surgical treatment options had a statistically significantly shorter or longer length of stay. All pairwise comparisons are depicted in Table 3, and Table 4 shows the associated p-scores. Heterogeneity was considerable (I² = 96.7%, p < 0.001). Because the available study designs provided direct comparisons only, inconsistency could not be evaluated. Again, no further subgroup analyses were performed because of the low number of comparisons. Forest plots comparing OP, RP, LP, and EP are shown in Fig. 3g.

Publication bias

Assessment of publication bias was possible for operative success and complications, each for the comparisons LP versus OP and LP versus RP. On visual assessment, slight funnel plot asymmetry was evident for operative success in the comparison of LP versus OP, with fewer studies reporting ORs < 0.74; statistical evaluation did not reveal significant publication bias (p = 0.9118). Visual assessment of operative success for RP compared with LP also suggested slight publication bias, with fewer studies reporting ORs > 1.94; again, statistical evaluation did not reveal significant publication bias (p = 0.1519). Moreover, visual assessment of LP versus OP with respect to complications gave the impression of slight publication bias, with fewer studies reporting ORs < 0.62, which was not statistically significant (p = 0.365). Visual assessment and statistical evaluation of the comparison of LP and RP yielded no evidence of publication bias (p = 0.4808). Figure S1 in Additional file 1 illustrates the funnel plots.
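The regression test for funnel-plot asymmetry can be sketched with Egger's classical formulation: the standardized effect (effect/SE) is regressed on precision (1/SE) by ordinary least squares, and an intercept far from zero indicates asymmetry. This is algebraically equivalent to the weighted regression of the treatment effect on its standard error (inverse-variance weights) named in the methods, whose slope equals Egger's intercept. The sketch uses only the Python standard library and returns the t statistic without a p-value (which would come from a t distribution with k − 2 degrees of freedom); the function name `egger_test` and the example data are hypothetical.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE)
    by ordinary least squares. Returns (intercept, se_intercept, t_stat);
    the t statistic has k - 2 degrees of freedom.
    """
    k = len(effects)
    if k != len(std_errors) or k < 3:
        raise ValueError("need at least 3 studies with matching SEs")
    x = [1.0 / s for s in std_errors]                 # precision
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar                 # asymmetry estimate
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (k - 2)  # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, se_intercept, t_stat


ses = [0.1, 0.2, 0.3, 0.4, 0.5]
asymmetric = egger_test([0.3 + 0.4 * s for s in ses], ses)
symmetric = egger_test([0.3] * len(ses), ses)
```

In the example, effects that grow with their standard error by construction (0.4 per unit SE, mimicking a small-study effect) yield an intercept of 0.4, whereas effects independent of their standard error give an intercept of about 0.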

Narrative meta-analysis of other outcomes

Only three studies reported suturing time for RP versus LP [27, 33, 34]; in all cases, suturing time was shorter for RP. Estimated blood loss was reported by 13 studies, of which only 4 provided measures of dispersion, which did not allow for meta-analysis. In most studies, EP had the lowest blood loss, followed by LP and RP. The highest blood loss was reported for OP in all studies. Owing to the nature of the interventions, conversion rates were reported only for RP and LP; low event rates precluded reliable pooling, as was also the case for death rates. Analgesia requirement was reported by 7 studies using different medications such as morphine equivalents, diclofenac, pethidine or tramadol, mostly without measures of dispersion. Overall, the studies reported EP to have the lowest analgesic requirement, followed by LP and OP. Only one study compared RP and LP and described a lower need for analgesic medication with RP. For time to return to normal activity, renal function, and costs, only 1 to 3 studies reported estimates, with heterogeneous outcome definitions; therefore, no meta-analysis was possible for these outcomes.

Discussion

Several surgical techniques have been developed for the treatment of UPJO, each with specific advantages and potential limitations. Although RP and LP are often claimed to be superior, these claims are based on pairwise meta-analyses that did not evaluate all available techniques at once. In contrast, our study provides a comprehensive overview of OP, EP, LP and RP, comparing their performance with respect to crucial clinical outcomes. Our results indicate that RP is the technique with the highest rates of operative success, the lowest overall complication rates, the shortest hospital stay, and the lowest re-operation and transfusion rates. EP, on the other hand, yields the lowest rates of urinary leakage and the shortest operating times. Robot-assisted surgery is known for its minimally invasive nature, which is associated with less postoperative pain and earlier recovery and probably explains the shorter hospital stays. In addition, robot-assisted surgery allows for high-precision movements with articulated arms and provides magnified 3-D vision for the surgeon [45]. This might explain the high operative success rates of RP. The low transfusion rates of RP are a consequence of the minimally invasive nature of robotic surgery, which allows immediate and precise reaction to local bleeding [46]. The short operating time of EP is probably explained by the fact that accessing the kidney takes longer in abdominal surgery than in this endoscopic procedure. In addition, EP is less complex: although the percutaneous approach involves a short flank incision and suturing, both are of low complexity [47, 48]. Another advantage of EP is its low rate of urinary leakage, which might be due to the limited extent of manipulation compared with pyeloplasty approaches. However, the reduced invasiveness of EP comes at the cost of high recurrence rates.
Urothelial scarring probably explains these differences, since EP only involves a urothelial incision, as opposed to surgical approaches in which the strictured tissue is resected [49, 50]. When evaluating different treatment approaches, costs have to be taken into account as well. Only one of the included studies evaluated treatment costs [36]: Link et al. reported 2.7 times higher costs for RP ($5323.80) than for LP ($1989.87). More literature is available comparing LP, RP and OP: Yu et al. found RP to be associated with the highest median costs ($11,829), followed by OP ($9520) and LP ($8291) [51]. Gettman et al. published costs for EP ranging between $3842 and $5297, compared with higher costs for LP ($7026) and OP ($7119) [52]. No decision tree analyses have been published to evaluate whether the differences in these expenditures outweigh the benefits of the approaches. Overall study quality was moderate due to limitations in study design, such as lack of randomization, concealment of treatment allocation and blinding, and omission of multivariable analyses. Still, given the nature of the interventions and outcomes assessed in this meta-analysis, it is questionable whether higher-quality trials would yield relevant changes in the observed effects. Our study is not devoid of limitations, which are mainly inherent to the published studies that form its data source: most publications did not adjust for confounding, and only one randomized controlled trial could be included. Therefore, the pooled estimates might deviate slightly from the true effects. Visual assessment of publication bias yielded minor asymmetries for some of the funnel plots. Nevertheless, statistical tests did not return statistically significant evidence of publication bias, which does not completely exclude such bias but suggests a low impact.
The network meta-analyses of operating time and length of stay showed statistically significant heterogeneity, which was not explored by subgroup analyses in order to maintain adequate numbers of comparisons. Therefore, the pooled estimates might not be generalizable to specific patient subpopulations. Finally, the inferiority of EP might reflect differing failure patterns: failures are mainly attributed to missed crossing vessels in EP, and to inadequate spatulation or incomplete excision of the diseased segment in RP, LP and OP. EP studies were nevertheless included because they also contributed indirect evidence for the comparison of the other surgical approaches. Despite these limitations, our findings are based on a total of 26 included studies, which makes this the largest meta-analysis on this topic published so far and the first to compare more than two interventions simultaneously. The novel network meta-analysis approach further combines direct and indirect evidence, strengthening comparisons of treatment approaches that were previously underpowered [53,54,55].

Conclusions

Comparing OP, EP, LP and RP for UPJO in a comprehensive network meta-analysis, our study found that RP has the highest rates of operative success and that both RP and LP have lower complication rates than OP. Operating time is shortest for EP, followed by OP, RP, and LP. Surgeons should consider these findings when selecting the optimal treatment method for individual UPJO patients. Further research should aim to improve study quality and to provide decision tree analyses based on associated costs.