Introduction

Most neurosurgical procedures are the result of continuous improvement and evolution of existing practices, and are rarely compared with non-operative management. The randomized controlled trial (RCT) is commonly regarded as the pinnacle of trial design and is thought to produce the highest quality evidence to prove effectiveness of interventions [21]. Conducting a randomized controlled trial in neurosurgery is regarded as challenging due to difficulties with patient inclusion, surgical selection bias, finding an appropriate control group, defining clinically relevant outcomes, perceived lack of equipoise, and providing a conclusive answer to its initial question [3, 22]. Most innovation in neurosurgery takes place without formalized oversight, which some justify given the unique nature of surgery, an idea referred to as “surgical exceptionalism” [15]. Perhaps as a result, RCTs in neurosurgery are conducted relatively infrequently, and their quality has been suggested to be poor [4, 12, 18]. This may be especially true for trials comparing neurosurgical procedures to non-operative management, rather than to a different neurosurgical procedure or the use of a medical device [7, 11, 22]. In many other surgical fields, including ophthalmologic surgery and vascular surgery, RCT quality seems to be poor, even though the quality of surgical RCTs seems to be improving [2, 5, 26].

Neurosurgical trial quality, registration, and reporting have been questioned as well [17, 18]. These factors may affect reported outcomes and complicate their interpretability and relevance to neurosurgical care. In this systematic review, the literature was evaluated for RCTs that compared a neurosurgical procedure with non-operative management. In addition to evaluating neurosurgical RCT design, quality, conduct, and reported outcomes, this review aims to assess which trial characteristics are associated with a reported surgical benefit.

Methods

A systematic search of the PubMed and Embase databases was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [24] to identify all potentially relevant trials published between 2000 and 2017. The search string was drafted with the help of a professional librarian using search terms related to “neurosurgery” together with specific neurosurgical procedures and synonyms of “randomized trial.” The search was restricted to RCTs published after 2000 to identify relatively recent trials. The exact search syntaxes for PubMed and Embase are shown in Supplementary Table S1.

Studies were included if they described data from a randomized controlled trial that compared any form of surgery to a non-surgical group. Only incisional surgery was regarded as surgical treatment; sham surgery was regarded as non-surgical. Papers were excluded if they (1) were not part of a trial whose results had already been published, (2) had no full text available, or (3) were not written in English, Dutch, German, or French. The initial review was carried out by four independent authors (EM, IM, JS, AD). Disagreements were resolved by discussion involving one additional author (MB).

The number of published papers per trial was recorded, including published designs/protocols, pilot studies, and early results. Data were extracted from the first published paper on main results. These included (a) trial start and end dates, (b) neurosurgical subspecialty, (c) countries involved, (d) number of countries involved, (e) number of participating centers, (f) funding source (non-industry, industry, or not reported), (g) total number of anticipated and included patients, (h) patients per study arm, (i) masking, and (j) whether the outcome favored surgery or non-operative treatment. Scopus was consulted for the number of times the first results of the study were cited. The journal impact factor was taken as the journal’s reported impact factor for 2016. A Jadad score was calculated for each trial to measure study quality [8]. The Jadad scale is the most widely used tool for assessing the methodological quality of a clinical trial, giving scores from zero (very poor) to five (rigorous) based on randomization, blinding, and the description of withdrawals and dropouts.
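To make the scoring concrete, the following is a minimal sketch of the commonly described Jadad point scheme: one point each for reporting randomization, double blinding, and withdrawals, plus one point added or deducted depending on whether the randomization and blinding methods are appropriate. The class and field names are hypothetical, and the sketch does not claim to reproduce how the individual items were judged in this review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JadadItems:
    """Answers to the Jadad questions for one trial (hypothetical structure)."""
    randomized: bool                           # trial described as randomized
    randomization_appropriate: Optional[bool]  # None = method not described
    double_blind: bool                         # trial described as double-blind
    blinding_appropriate: Optional[bool]       # None = method not described
    withdrawals_described: bool                # withdrawals and dropouts reported


def jadad_score(items: JadadItems) -> int:
    """Return a Jadad score between 0 (very poor) and 5 (rigorous)."""
    score = 0
    if items.randomized:
        score += 1
        if items.randomization_appropriate is True:
            score += 1   # appropriate method described
        elif items.randomization_appropriate is False:
            score -= 1   # inappropriate method described
    if items.double_blind:
        score += 1
        if items.blinding_appropriate is True:
            score += 1
        elif items.blinding_appropriate is False:
            score -= 1
    if items.withdrawals_described:
        score += 1
    return max(score, 0)


# Hypothetical open-label trial with adequate randomization and reported dropouts.
open_label = JadadItems(True, True, False, None, True)
print(jadad_score(open_label))  # 3
```

Under this scheme an open-label trial cannot score above 3, because the two blinding points are unavailable to it.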

Four trial registries (ClinicalTrials.gov, EudraCT, ISRCTN, and ICTRP) were also searched with synonyms of “neurosurgery” and neurosurgical procedures. All randomized trials comparing a neurosurgical treatment with a non-surgical treatment were included. Registry data and published protocols were used to determine whether, and which, changes were made to primary and secondary outcome measures between the protocol and the first published trial results. Additionally, the anticipated accrual of patients was evaluated to determine whether it was met. The current status of registered trials was also noted.

Methodological characteristics (as listed above) were evaluated for association with benefit for either the surgical or non-surgical arm by univariate logistic regression. Statistical analyses and data visualization were conducted using R version 3.4.3 (R Core Team, Vienna, Austria, 2017).
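For a single binary trial characteristic, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2×2 table, and the Wald confidence interval follows from the standard error of the log odds ratio. The sketch below illustrates this in Python rather than R; the function name and the counts are hypothetical and are not taken from the included trials.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: trials with the characteristic that reported a surgical benefit
    b: trials with the characteristic that did not
    c: trials without the characteristic that reported a surgical benefit
    d: trials without the characteristic that did not
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only for illustration.
or_, lo, hi = odds_ratio_wald(a=10, b=20, c=30, d=15)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that lies entirely below 1 corresponds to a significant negative association at the chosen level. With sparse cells, exact or penalized methods may be preferable to the Wald interval.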

Results

After removal of duplicates, a total of 11,469 citations were identified in the PubMed and Embase databases. Six hundred four potentially relevant articles were selected through title/abstract screening, of which 193 articles were selected for qualitative synthesis after full-text screening (Fig. 1). A total of 82 individual RCTs were identified (Table 1, Supplementary Table S2). By searching trial registries, a total of 84 RCTs were found.

Fig. 1 Flowchart depicting study selection

Table 1 RCT demographics per subspecialty

Study characteristics

Of all included randomized trials, 40 (48.8%) could be categorized as spine surgery, 19 (23.2%) as neurovascular surgery and neurotrauma, 11 (13.4%) as functional neurosurgery, 10 (12.2%) as peripheral nerve surgery, and 2 (2.4%) as pituitary surgery (Table 1). Overall, a median of two papers (IQR 1–3) were published per trial, with the spinal (2, IQR 1–4) and functional (2, IQR 1–2) subspecialties having the most publications per RCT. The proportion of registered trials was highest in vascular neurosurgery and neurotrauma (68.4%) and lowest in spine surgery (37.5%). Twenty RCTs were multicentre trials, but this was only the case in 20% of peripheral nerve surgery trials (n = 2). Median time to trial inclusion completion was 42 months (IQR 27.8–68.0). RCTs in peripheral nerve surgery had the lowest median time to study completion (18 months, IQR 12.5–36.5). Overall, the median number of patients included in an RCT was 95 (IQR 50–175), with relatively smaller populations in functional neurosurgery trials (48, IQR 35–118). Patients were generally distributed evenly across study arms (Table 1). Overall, most trials were open-label (59.8%) and double-blind trials were relatively rare (8.5%). Double-blind trials were most common in functional neurosurgery (36.4%). Funding was usually from non-industry parties (58.5%). However, the funding source was not reported in 25.6% of RCTs. The median Jadad score was 3 (IQR 2–3).

Factors associated with trial outcome

The majority of trials reported a favorable outcome for surgical intervention (63.4%) (Table 1). Only 3.7% of all trials reported a beneficial effect of the non-surgical intervention, while the rest (32.9%) did not find a statistically significant difference. High Jadad scores (≥ 4) were negatively associated with the demonstration of a surgical benefit (OR 0.10, 95% CI 0.01–0.89). None of the other trial characteristics showed a significant relationship to surgical benefit (all P values > 0.05, Table 2).

Table 2 Univariate analysis of trial outcome

Changes in primary and secondary outcome measures

Only registered trials (n = 38) were available for assessment of changes in primary and secondary outcomes. Of these RCTs, 13.2% changed their primary outcome measure between registration and publication (n = 5, Fig. 2). Of these changes, 60% were simple changes to the primary outcome measure (n = 3), 20% added a primary outcome measure (n = 1), and 20% removed one of the primary outcome measures (n = 1, Table 3). Secondary outcome measures were changed in 34.2% of all RCTs (n = 16). Of these, 50% were simply changed (n = 8), 37.5% had an additional secondary outcome measure (n = 6), and 12.5% removed one or more of their secondary outcome measures (n = 2).

Fig. 2 Changes in outcome measures per subspecialty

Table 3 Changes in primary and secondary outcome measures

Trial continuation and anticipated accrual of patients

Of the registered RCTs, 65.9% were completed and 26.8% were still ongoing (Table 4), while 7.3% had been terminated. Termination was most commonly due to slow recruitment or meeting a pre-specified futility boundary. The initial anticipated accrual was lowered by more than 10% in 41.9% of all RCTs, with a mean reduction of 58.5% (SD 25.1%). In 12.9% of trials, accrual surpassed 110% of the initially planned patient enrolment (mean added percentage 41.2%, SD 36.0%).
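The accrual comparison can be sketched as a simple classification of each trial by the ratio of its final to its initially planned enrolment, using the 10% margin described above. The function name, category labels, and patient numbers below are hypothetical and serve only to illustrate the arithmetic.

```python
def accrual_change(planned: int, actual: int, tolerance: float = 0.10):
    """Classify a trial by how its accrual compares with the initial plan.

    Returns a (category, percent_change) tuple, where percent_change is the
    relative difference between actual and planned enrolment in percent.
    """
    if planned <= 0:
        raise ValueError("planned enrolment must be positive")
    ratio = actual / planned
    percent_change = (actual - planned) / planned * 100
    if ratio < 1 - tolerance:
        category = "lowered"      # more than 10% below the plan
    elif ratio > 1 + tolerance:
        category = "exceeded"     # more than 110% of the plan
    else:
        category = "as planned"   # within the 10% margin
    return category, percent_change


# Hypothetical trials: (planned, actual) patient numbers.
for planned, actual in [(100, 50), (100, 150), (100, 105)]:
    print(planned, actual, accrual_change(planned, actual))
```

Averaging `percent_change` within the "lowered" and "exceeded" groups would give summary figures of the kind reported above.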

Table 4 Trial registration data

Academic impact

The median number of citations per study was 95 (IQR 21.8–296.0, Table 1). Peripheral nerve surgery and pituitary trials had the lowest median number of citations (48, IQR 3.3–86.5 and 40, IQR 26.0–54.0, respectively). The median impact factor of the journal in which the study was published was 6.1 (IQR 2.4–39.3). Functional neurosurgery trials had the highest median impact factor at 23.5 (IQR 8.9–48.6). Overall, the median number of citations and the impact factor did not differ by trial outcome (P = 0.33 and P = 0.73, respectively, Table 5). Post-hoc analyses also did not reveal any significant difference in number of citations or impact factor between trial outcomes (all P > 0.05).

Table 5 Average academic impact per outcome

Discussion

The aim of this study was to evaluate trial outcomes in recent neurosurgical RCTs comparing surgery to non-operative treatment. Most studies found superior outcomes for surgery, while non-operative treatment rarely resulted in superior outcomes. The considerable academic impact of the studies indicates that the results of neurosurgical RCTs seem to be of value to the neurosurgical community. However, their clinical impact remains difficult to determine, and it is uncertain to what extent neurosurgical practice changed in response to the findings of neurosurgical RCTs. It has been suggested that the absence of a surgical benefit promotes non-operative management.

The authors of the identified RCTs are to be applauded for their considerable continuous efforts, given that many trials were registered and had published their protocol. However, this study identified several challenges common among neurosurgical RCTs. The overall quality of the identified studies based on the Jadad score could be considered poor. Also, funding sources were not reported consistently among all studies identified and many trials were not registered. Changes to primary or secondary outcome measures occurred frequently but were not shown to influence whether surgery was found to be superior to a non-operative treatment.

Trial registration and outcome measurement

Results of previous studies have suggested that differences between registered and published outcomes are common among RCTs in general surgery and that these differences are not related to funding sources [10, 23]. This is in line with the results of this study. Interestingly, it has been shown that 91.7% of surgical trials that changed outcome measures published significant results [13]. This is similar to findings in cardiology, rheumatology, and gastroenterology [20]. Furthermore, a recent study of RCTs in spine surgery showed that statistical findings could be considered weak, as the addition of only a few events or non-events would have changed the significance of the reported finding [4].

Trial quality

This study found a generally poor quality of RCTs based on Jadad scores. These results are in line with two previous studies of neurosurgical RCTs [12, 18]. The study by Mansouri et al. also identified that trials that evaluated surgical procedures met their target inclusion less often than trials that evaluated drugs or medical devices [18]. This may imply that conducting a trial of surgical procedures is more difficult, but it may also be the result of bias. Kiehna et al. showed that studies published in high-impact journals had higher mean CONSORT and Jadad scores [12]. Importantly, superiority of the surgical approach did not affect academic impact. It should, however, be noted that both the CONSORT and Jadad scores have limitations and do not incorporate all potential (methodological) challenges and limitations of RCTs, especially of surgical RCTs.

Strengths and limitations

This is the first study that sought to evaluate which trial characteristics were associated with the identification of surgical superiority over non-operative treatment in neurosurgical RCTs. Both bibliographic databases (PubMed and Embase) and trial registries were extensively evaluated. The findings offer insight into the frequency of trial cessation, adjustment of trial design, and quality of reporting, which may be useful for future neurosurgical RCTs.

There are also several limitations to this study. The databases and registries only provided a relatively small number of RCTs. There is a possibility that unregistered or unpublished trials were not identified. This may have caused selection bias influencing the findings in this analysis of studies. Selection bias by reviewers and publication bias may have occurred for studies that did not find statistically significant results, or an outcome favoring surgery. In addition, most trials were conducted by surgeons, which may have introduced an inherent bias toward preferred outcomes. This may explain why only a very low number of studies were identified that found a neurosurgical procedure to be associated with inferior outcomes. Only RCTs published after 2000 were included, which further limits the number of trials included. Analysis to determine which trial characteristics may be associated with a surgical benefit was complicated because only a minority of the published trials had also been registered and had their protocol available. Therefore, it was not possible to evaluate whether protocols were changed for unregistered studies, which may have provided additional valuable insights. This study is also limited by the sole inclusion of RCTs that compared a surgical procedure with non-operative management. This mainly has implications for oncologic RCTs, as these often compare different radiation and medical regimens rather than a surgical procedure [17]. Moreover, although the Jadad score is the most commonly used assessment tool for trial quality, it does not take allocation concealment into account. This may potentially bias results. Lastly, non-quantifiable trial characteristics that were not compared in this study may influence these findings.

Future studies on neurosurgical RCTs could examine subspecialty-specific trial characteristics in more depth, along with their influence on trial quality and findings. Also, investigating trials that compare a novel neurosurgical procedure to the current standard of practice, in a similar fashion to this study, may provide information on how to better interpret their results. Finally, evaluation of neurosurgical RCTs could be aided by the introduction of a trial registry that is specific to neurosurgery and takes into account the unique challenges of a neurosurgical RCT.

Implication for future neurosurgical RCTs

The findings of this study regarding trial registration, patient accrual, trial completion, publication, and alteration of outcome measures provide suggestions for improvement of future neurosurgical RCTs. Neurosurgical RCTs should seek to answer questions that matter to the neurosurgical community and can be answered by an RCT. This requires true equipoise, the availability of patients, and sufficient funding, among other things. Other trial designs, such as a prospective observational study, should be considered if they are more suitable to answer unresolved controversies in neurosurgery [16].

Most journals nowadays require an RCT to be registered, to disclose its funding sources, and to publish a protocol to increase transparency. The protocol should ideally be published in a neurosurgical journal to give a neurosurgical readership the opportunity to suggest alterations to the trial design that improve trial quality and make the potential findings as relevant as possible. Alterations to outcome measures should always be disclosed to readers together with a reason for the alteration. Investigators should be realistic about inclusion and exclusion criteria in order to meet the estimated number of patients to be included and should optimize the inclusion process. Similar to our findings, another study found trial discontinuation to be common in neurosurgical trials in general, most commonly due to slow recruitment [9]. A pilot study to evaluate the patient inclusion process that also provides an estimate of the outcome measure may prevent inadequate recruitment [14]. Others found that telephone reminders to non-responders, opt-out procedures, and financial incentives may help patient inclusion [25].

Although conducting a neurosurgical RCT may be considered burdensome, such trials should, in the end, provide answers of the highest possible quality that are relevant to the neurosurgical community. A well-designed and well-conducted trial helps ensure that the effort and funding invested do not go to waste. A trial registry specific to neurosurgery might help address some of the issues affecting the quality of RCTs in neurosurgery. Alternatively, comparative effectiveness research (CER) or pragmatic RCTs may also provide valuable insights and have been suggested to be of great use in spine surgery [6, 19]. Furthermore, “big data” may prove an important tool for identification of trial-worthy innovations. The digitization of medical records, introduction of patient outcome measures, and increasing computational capacity have resulted in the availability of the most comprehensive pre-trial data yet, despite varying quality. These data sets could be of high value in themselves in cases where RCTs are not feasible [1].

Conclusion

RCTs comparing surgical to non-operative treatment are rare in neurosurgery, and the majority identify a benefit for surgical treatment. The quality of RCTs is generally low and outcome measures frequently change. Only about half of all RCTs are registered, and funding sources are not always reported. Furthermore, the anticipated accrual of patients was often greater than the number of included patients. Success of future neurosurgical RCTs could be improved by trial and protocol registration prior to patient inclusion, pilot studies, and use of big data.