Background

Bladder cancer (BC), the 10th most common form of cancer worldwide, has become a major global public health issue [1]. Approximately 75% of BCs do not invade the muscle wall of the bladder [2]. Timely and effective treatment of non-muscle-invasive bladder cancer (NMIBC) can achieve good outcomes, potentially reducing recurrence and preventing progression to muscle-invasive bladder cancer (MIBC) [3].

To optimize patient care, unnecessary medical interventions should be minimized and cost-effectiveness improved. Many national and international organizations have therefore developed clinical practice guidelines (CPGs) for NMIBC.

According to the Institute of Medicine (IOM), a trustworthy CPG should “be developed via a transparent process by a group of multidisciplinary experts (including patient representatives) screened for minimal potential bias and conflicts of interest, and supported by a systematic review (SR) of the evidence” [4].

With the spread of the evidence-based medicine paradigm and concerns about the quality of care and rising healthcare costs, the proliferation of CPGs for NMIBC has been accompanied by growing concern about variation in guideline quality and recommendations.

The management of NMIBC, whose clinical course is variable and complex, has been the subject of considerable debate. Nevertheless, NMIBC guidelines show substantial consensus in most areas despite some variation [5].

To our knowledge, NMIBC guidelines have not yet been systematically identified and appraised for quality. Therefore, to help clinicians and patients make decisions about appropriate healthcare in specific clinical circumstances, we reviewed NMIBC guidelines published within the past 5 years, evaluated their quality, summarized their recommendations on the management of NMIBC, and identified consistencies and discrepancies among them.

Methods

Strategy for NMIBC guideline search

A comprehensive search covering January 12, 2014 to January 12, 2019 was performed in PubMed, Embase, Web of Science and four major Chinese academic databases, using a combination of free-text terms and their corresponding MeSH terms. The search strategy for PubMed is outlined in Additional file 1.

We also searched the websites of guideline development organizations and professional societies. A list of the websites with potential NMIBC guidelines is provided in Additional file 2.

Identification of guidelines for NMIBC

All guidelines on NMIBC published in English or Chinese were considered. A document was considered a guideline if it met the following criteria: (1) it provided explicit recommendations on the management of NMIBC, including recommendations on transurethral resection of bladder tumour (TURBT) and intravesical therapy; (2) it was evidence based, which we determined by checking whether it reported a search strategy, literature quality appraisal or data extraction, and whether it classified the level of evidence (LOE) and graded the strength of recommendation (SOR); and (3) it was the most recently updated version. Single-author overviews, consensus statements, translations of CPGs and adapted CPGs were excluded.

Evaluation of NMIBC guidelines

Four reviewers (J.Z., H.W., Y.Y.W. and Q.H.), comprising urologists and methodologists with extensive experience in evaluating CPGs, independently appraised the eligible guidelines using the AGREE II instrument. AGREE II consists of 23 key items organized within 6 domains (scope and purpose, stakeholder involvement, rigour of development, clarity of presentation, applicability, and editorial independence) [6].

Each domain captures a distinct dimension of guideline quality, and each item is rated on a 7-point scale from 1 (strongly disagree) to 7 (strongly agree). For each domain, we summed the item scores of all reviewers and scaled the total using the following formula: (obtained score - minimum possible score)/(maximum possible score - minimum possible score) × 100% [6].
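The scaling can be checked with a small worked example. In the Python sketch below the ratings are hypothetical, and the minimum and maximum possible scores follow the AGREE II convention of multiplying the lowest (1) and highest (7) rating by the number of items and appraisers. With four appraisers and the three items of 'scope and purpose', possible totals run from 12 to 84, so an obtained total of 62 scales to 69.44%.

```python
def scaled_domain_score(ratings):
    """Scaled AGREE II domain score (%).

    ratings[a][i] is the 1-7 rating given by appraiser a to item i of a
    single domain; every appraiser is assumed to rate every item.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(row) for row in ratings)
    min_possible = 1 * n_items * n_appraisers  # every item rated 1
    max_possible = 7 * n_items * n_appraisers  # every item rated 7
    return (obtained - min_possible) / (max_possible - min_possible) * 100


# Hypothetical ratings: four appraisers x three items ('scope and purpose')
ratings = [
    [5, 5, 6],
    [5, 5, 5],
    [5, 5, 6],
    [5, 5, 5],
]
print(f"{scaled_domain_score(ratings):.2f}%")  # (62 - 12) / (84 - 12) * 100, prints 69.44%
```

Because the score is rescaled against the possible range, domains with different numbers of items (for example, 8 items for 'rigour of development' versus 2 for 'editorial independence') become directly comparable on a 0 to 100% scale.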

Data collection

Two reviewers (T.D., D.Q.W.) independently extracted guideline characteristics, such as target disease, guideline developer, and the LOE and SOR grading systems used, together with the relevant recommendations. The records of the two reviewers were compared, and any disagreement was resolved by a third reviewer (F.H.).

Because different guidelines used different grading systems for LOE and SOR, we agreed by consensus on a composite grading system (Additional file 3) so that evidence and recommendations could be presented and summarized statistically.

Synthesis of guideline recommendations for NMIBC

We conducted a textual descriptive synthesis to analyse the scope, content, and consistency of the included recommendations on the management of NMIBC. The synthesis was divided into the following sections: (1) TURBT and re-TURBT; (2) immediate postoperative instillation of intravesical chemotherapy; (3) measures to optimize chemotherapy administration; (4) induction and maintenance intravesical chemotherapy or immunotherapy; and (5) side effects of and contraindications to Bacille Calmette-Guérin (BCG). Only recommendations with an assigned grade were extracted.

Statistical analysis

A descriptive statistical analysis was performed by calculating each domain score and scaled domain score. The data for each AGREE II domain are presented as medians and interquartile ranges (IQRs).
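As a sketch of how a domain's median and IQR can be summarized across the nine guidelines, the Python below uses the standard-library `statistics.quantiles`. The scores are hypothetical placeholders, not the values in Table 2, and the `"exclusive"` quartile method (which interpolates at position (n + 1)p in the sorted data) is an assumption: quartile definitions differ between software packages, so results may not match SPSS exactly.

```python
import statistics


def median_iqr(scores):
    """Return (median, Q1, Q3) of scaled domain scores (percentages)."""
    # method="exclusive" interpolates at position (n + 1) * p of the sorted data
    q1, median, q3 = statistics.quantiles(scores, n=4, method="exclusive")
    return median, q1, q3


# Hypothetical scaled scores (%) for one domain across nine guidelines
example_scores = [60, 20, 90, 40, 100, 30, 70, 50, 80]
median, q1, q3 = median_iqr(example_scores)
print(f"median {median:.2f}% (IQR: {q1:.2f} to {q3:.2f}%)")
# prints: median 60.00% (IQR: 35.00 to 85.00%)
```

With only nine guidelines, the choice of quartile method can shift the reported IQR noticeably, which is one reason to state it alongside the summary statistics.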

Agreement among the four reviewers was assessed for each domain using the intraclass correlation coefficient (ICC) with its 95% confidence interval (CI). According to the scale proposed by Fleiss, an ICC of 0.00 to 0.40 was deemed poor, 0.41 to 0.75 fair to good, and above 0.75 excellent [7]. Statistical analyses were conducted using SPSS version 19.0 (SPSS Inc., Chicago, IL, USA).
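The text does not state which ICC form was computed in SPSS. As an assumption, the Python sketch below implements the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) of Shrout and Fleiss, from the ANOVA mean squares, and maps the value onto the Fleiss categories above. Rows are rated targets (for example, AGREE II items within a domain) and columns are appraisers; the 95% CI is omitted. The sample matrix is the six-target, four-rater example commonly used to illustrate ICCs, for which ICC(2,1) is about 0.29.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows: one row per rated target, one column per rater.
    """
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into targets, raters and residual
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def fleiss_category(icc):
    """Fleiss scale: <= 0.40 poor, up to 0.75 fair to good, above 0.75 excellent."""
    if icc <= 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(f"ICC(2,1) = {icc:.2f} ({fleiss_category(icc)})")
# prints: ICC(2,1) = 0.29 (poor)
```

On this scale, the ICCs of 0.81 to 0.97 reported in the Results fall in the 'excellent' band. An absolute-agreement form is a conservative choice here because systematic differences between appraisers (for example, one reviewer scoring consistently higher) lower the ICC rather than being ignored.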

Results

The flow chart in Fig. 1 shows how we screened and selected the guidelines. Nine guidelines ultimately met the inclusion criteria [3, 8,9,10,11,12,13,14,15]. For each included guideline, we collected all accompanying technical and supporting materials to inform our assessments [16, 17]. The characteristics of the eligible guidelines are listed in Table 1.

Fig. 1 Flow chart of the identification process of CPGs for NMIBC

Table 1 Characteristics of the identified guidelines on management of NMIBC

Quality assessment of guidelines

The ICC values for appraisal of the identified guidelines ranged from 0.81 to 0.97, indicating excellent agreement among appraisers. The overall quality of the included CPGs was moderate, with the domain ‘clarity of presentation’ receiving the highest score and the domain ‘applicability’ the lowest (Table 2, Additional file 4).

Table 2 AGREE II domain scores of included CPGs for NMIBC

Scope and purpose

The median score for this domain was 69.44% (IQR: 35.42 to 85.42%). The highest score in this domain (86.11%) was achieved by a guideline that clearly defined its scope and overall objectives and specified the relevant clinical field and target population [9].

Stakeholder involvement

The appraised guidelines received the second lowest scores for stakeholder involvement (median, 41.67%; IQR: 30.56 to 75.00%). Six guidelines (66.67%) scored lower than 50% in this domain [3, 8, 10, 11, 13, 15]. The panels of the other three guidelines were multidisciplinary, comprising clinicians [9, 12, 14], methodologists [9, 12, 14], pharmacists [14] and administrative staff [14]. Two guidelines involved patients or their representatives in guideline development to take account of the preferences of the target population [9, 14].

Rigour of development

The median score for the domain ‘rigour of development’ was 48.96% (IQR: 27.08 to 65.63%). Five guidelines (55.56%) scored lower than 50% [8, 10, 11, 13, 15], probably in part because some of them did not report systematic methods for searching for or appraising the evidence [8, 11, 13]. Only one guideline described how final decisions were reached [14]. Among the four guidelines that clearly presented their body of evidence, SRs accounted for approximately 11.27% [10], 12.78% [3], 14.39% [12] and 14.73% [9] of the evidence.

Clarity of presentation

The domain ‘clarity of presentation’ received the highest median score (80.56%; IQR: 66.67 to 93.06%), and all guidelines scored above 60%, reflecting that the key recommendations in every guideline were easy to identify and carried an explicit SOR and LOE.

Applicability

The domain ‘applicability’ received the lowest median score (34.38%; IQR: 22.92 to 40.63%). In general, there was little information on potential organizational barriers, cost implications or tools for application, except in the NICE guideline [9], which scored 81.25%. Derivative products, such as pathways [9], summaries for the public [9], a quick reference document [12] and translated versions [12], could facilitate application. Cost-effectiveness was considered only in the NICE guideline, which included health economists on the guideline panel, incorporated health economic evidence and discussed the budget implications of its recommendations [9].

Editorial independence

The greatest range of scores was observed in the domain ‘editorial independence’ (IQR: 35.42 to 85.42%). Although all guidelines disclosed conflicts of interest (COI), the quality of disclosure was suboptimal: little information was given, in either tabular or narrative form, on how COI were managed. Only one guideline presented a complete summary of the process for identifying, managing and reporting COI during guideline development [14].

Synthesis of recommendations

Of the nine guidelines, one did not report the LOE underpinning its recommendations [11]; the remaining eight used six different grading systems to rate the LOE and seven to rate the SOR (Additional file 5).

A total of 177 recommendations on the management of NMIBC were extracted for analysis (Additional file 6). Three guidelines often supported a single recommendation with more than one type of evidence, so the number of evidence items did not correspond to the number of recommendations [9, 10, 12]. Most recommendations were rated grade A (33.9%) or grade B (49.7%), together 83.6%, whereas most of the supporting evidence was rated level 2 (48.1%) or level 3 (20.9%), together 69.0%.

To illustrate differences between the included guidelines, we extracted and summarized their key recommendations on the management of NMIBC (Tables 3, 4 and 5; Additional files 7, 8 and 9). Although the recommendations showed substantial consensus in most areas, there were some noteworthy discrepancies.

Table 3 Recommendations on TURBT and re-TURBT
Table 4 Recommendations on intravesical therapy for low- and intermediate-risk patients
Table 5 Recommendations on intravesical therapy for high-risk patients

Discussion

The rigour of CPG development needs to be improved in the future

Rigour of development is an important domain for judging the credibility of guidelines. The most effective CPGs should incorporate the current best evidence and place it in the context of local settings. Some guidelines still failed to use SRs to support their recommendations or to link recommendations explicitly to the supporting evidence.

When recommendations are made, their strength should be linked directly to the balance of benefits and harms. Research on intervention safety should be conducted, and safety outcomes should be treated as key outcomes when weighing benefits against harms. A transparent process for reaching consensus is vital for guideline validity, and the processes by which evidence was appraised and recommendations were formulated should be documented in detail.

Consumer involvement in cancer-related guidelines

Consumers are broadly defined as recipients of health care who provide a layperson’s perspective. They can help the panel reach consensus on appropriate ratings, present recommendations in ways that are understandable to patients and respectful of their needs, and act as a safeguard against conflicts of interest [18].

For example, a patient might judge that the potential survival benefit of a treatment is not worthwhile given its potentially serious, even life-threatening, side effects. Therefore, it is important to consider patient views and expectations in cancer-related treatment recommendations.

BCG instillation has more pronounced side effects than intravesical chemotherapy, so the balance between its benefits and harms deserves special attention when recommendations are made, especially when assigning the SOR.

The need to improve the implementation of guidelines during the development process

The strikingly low score in the applicability domain suggests that guideline panels treated the development and implementation of guidelines as separate activities and paid insufficient attention to potential facilitators of and barriers to guideline dissemination [19].

To facilitate implementation, guideline panels should consider publication type and format when reporting guidelines. Derivative products tailored to target users include summaries, algorithms and wall charts [20]. Other resources, such as commissioning support and audit, measurement and benchmarking tools, may also be needed [16].

Furthermore, the healthcare resources available in different settings vary enormously. Most included CPGs were developed for fully resourced settings and therefore assume the highest level of costs, which limits their applicability elsewhere. Cost-effectiveness analyses are needed to make sensible recommendations, especially for developing countries. Economic evaluation should begin during the scoping of a guideline. A health economist should be available to advise on which questions are likely to require economic assessment, to conduct those assessments, and to report the results before recommendations are formulated [21].

Recommendations varied in detail for a variety of reasons

Although most CPGs recommended TURBT and intravesical therapy, they differed in some details, such as the indications for re-TURBT and the use of chemotherapy agents and BCG in intermediate- and high-risk NMIBC.

The reasons for these differences are undoubtedly multifactorial and may partly reflect the fact that the guidelines were produced by organizations working in different contexts and settings. Some discrepancies may also have arisen from limitations in the evidence available to guideline panels to support their recommendations. In addition, without a transparent process for formulating recommendations, the same evidence may have been interpreted differently because panels gave different weight to certain outcomes during decision making.

Notably, most recommendations were based on low- or moderate-quality evidence, yet most were rated strong or moderate. The lack of high-quality evidence may have increased the role of panel members’ opinions in framing the recommendations. Beyond improving development methodology, guideline panels need to keep pace with the growing body of evidence.

Issues that need to be resolved to optimize the treatment

Although the recommendations covered most aspects of managing NMIBC, several guidelines identified unresolved issues that need to be addressed to optimize treatment.

The first important issue is whether a second TURBT should be performed after the intravesical therapy that follows the initial TURBT, and whether intravesical therapy should be offered before pathology reports are available. The ESMO guidelines described re-TURBT after intravesical therapy as a reasonable option for high-risk NMIBC, although the recommendation was graded low (III) [8]. Further research is clearly needed.

Another acknowledged issue is which BCG strain is the safest and most effective option [3, 10, 12,13,14]. Clinical studies comparing different BCG strains have suggested that the strain may influence the response to BCG and the antitumour immune response [22]. However, the trial did not reach statistical significance for progression-free survival, and none of the CPGs made a related recommendation. Further evaluation in prospective trials may be needed [12, 23].

Various combinations of BCG, chemotherapeutic agents and interferon have been evaluated, such as interferon plus BCG [24], interferon plus epirubicin [25], BCG plus mitomycin C (MMC) [26], and BCG plus isoniazid [27]. None of the CPGs recommended an optimal combination, probably because of insufficient evidence: no significant reduction in recurrence or progression has been demonstrated for any of these combination therapies [3, 9, 10, 12, 14].

Despite the disappointing results of combination therapy to date, device-assisted therapies have shown some promising results. Several studies have evaluated hyperthermia as a means of improving the penetration of chemotherapy agents into the bladder wall, potentially improving outcomes [28]. Electromotive drug administration (EMDA) has been shown to reduce recurrence rates and prolong disease-free intervals [29]. However, additional studies are needed to validate the efficacy of these approaches as first- and second-line treatments before definitive conclusions can be drawn [10, 12].

Limitations and strengths

Our study has some limitations. First, the use of various grading systems for LOE and SOR made it difficult to compare them across guidelines. Second, we did not extract recommendations on BCG relapse or radical cystectomy, so our presentation and synthesis of recommendations on the management of NMIBC may be incomplete.

Nonetheless, our study has several strengths. First, a systematic literature search was conducted to identify eligible CPGs. Second, the reviewers applied the AGREE II criteria to each CPG and achieved excellent interrater agreement. Third, to our knowledge, this is the first attempt to systematically synthesize and appraise CPGs for NMIBC management.

Conclusions

The quality of NMIBC guidelines published in the past 5 years was moderate. The included guidelines often failed to meet the methodological criteria for ideal development and implementation described by AGREE II. Despite many consistencies, the recommendations sometimes differed in detail; the extent to which this was attributable to the underlying development process remains unclear.