Machine learning in neurosurgery: a global survey

Background: Recent technological advances have led to the development and implementation of machine learning (ML) in various disciplines, including neurosurgery. Our goal was to conduct a comprehensive survey of neurosurgeons to assess the acceptance of and attitudes toward ML in neurosurgical practice and to identify factors associated with its use.
Methods: The online survey consisted of nine or ten mandatory questions and was distributed in February and March 2019 through the European Association of Neurosurgical Societies (EANS) and the Congress of Neurological Surgeons (CNS).
Results: Of the 7280 neurosurgeons who received the survey, 362 responded (response rate 5%), mainly from Europe and North America. In total, 103 neurosurgeons (28.5%) reported using ML in their clinical practice, and 31.1% in research. Adoption rates of ML were relatively evenly distributed, with 25.6% for North America, 30.9% for Europe, 33.3% for Latin America and the Middle East, 44.4% for Asia and Pacific and 100% for Africa (only two responses). No predictors of clinical ML use were identified, although academic settings and the subspecialties neuro-oncology, functional, trauma and epilepsy predicted use of ML in research. The most common applications were predicting outcomes and complications and interpreting imaging.
Conclusions: This report provides a global overview of the neurosurgical applications of ML. A relevant proportion of the surveyed neurosurgeons reported clinical experience with ML algorithms. Future studies should aim to clarify the role and potential benefits of ML in neurosurgery and to reconcile these potential advantages with bioethical considerations.


Introduction
Recent years have witnessed the rise of machine learning applications in the scientific literature, both in basic science and clinical medicine [18,26]. Neurosurgical practice has always relied on the individual experience of surgeons to carefully balance surgical indications, operative risk and expected outcome [30]. The advent of evidence-based medicine has framed the surgical decision-making process into guidelines based on the results of high-quality data and of randomized controlled clinical trials, which are themselves not free of design flaws [19]. This approach, despite remaining the gold standard, is limited by its oversimplification of patients' individual characteristics, which often precludes patient-specific analytics. With the exponential growth of data in the era of big data, it is increasingly important to provide clinicians with tools for integrating individual patient data into reliable prediction models. Such models primarily aim to enhance surgical decision-making and potentially improve outcomes, but predictive analytics also harbour the potential to reduce unnecessary health-care costs [21,29,31,34,36,37,41].
It is often difficult for clinicians to integrate the many described risk factors and outcome predictors into a single workable prognosis [3]. Neurosurgical research and clinical practice are ideal for the application of machine learning (ML), which harbours the potential for predictive analytics to integrate all relevant patient factors in a way that is often too complex for natural intelligence [28,40]. Moreover, ML can be used to extract deep features from data such as radiological and histological images, or genomic data [16,38-40,43].
At present, the neurosurgical literature is increasingly focusing on substituting traditional statistical models with more complex ML models with the aim of improving predictive power [29,31]. For example, ML has been used in neurosurgery to predict post-operative satisfaction [2], early post-operative complications [41] or cerebrospinal fluid leaks [37]. Despite this encouraging trend and recent reviews of the large body of publications on ML in neurosurgery [28-30], data on the worldwide adoption and perception of ML in our specialty are currently lacking. Our aim was to carry out a worldwide survey among neurosurgeons to assess the adoption of ML algorithms into neurosurgical clinical practice and research and to identify factors associated with their use.

Sample population
The survey was distributed via the European Association of Neurosurgical Societies (EANS) and the Congress of Neurological Surgeons (CNS) in January, February and March 2019. The EANS is the professional organization that represents European neurosurgeons. An email invitation was sent through the EANS newsletter on January 28, 2019. Furthermore, the membership database of the CNS was searched for email addresses of active members and congress attendees. The CNS is a professional, United States (US)-based organization that represents neurosurgeons worldwide. At the time of the search, the database contained 9007 members from all continents. A total of 7280 neurosurgeons had functioning email addresses and were recipients of the survey. The survey was hosted by SurveyMonkey (San Mateo, CA, USA) and sent by email alongside an invitation letter. Reminders were sent to non-responders after 2 and 4 weeks to increase the response rate. To limit answers to unique site visitors, each email address was allowed to fill in the survey only once. All answers were captured anonymously. No incentives were provided.

Survey content
The online survey was made up of nine or ten compulsory questions, depending on whether participants reported having used ML in their neurosurgical practice. A complete overview of survey questions and response options is provided in Table 1. The order in which potential reasons for use/non-use were displayed was randomized to avoid systematic bias. The definition of ML applications provided within the survey was: "Any form of artificial intelligence (AI)-based or algorithm-based assistance, including but not limited to (online) prediction models, automated radiographic analysis (i.e. segmentation, classification), diagnostic models, ML-based scoring systems, etc. Logistic and linear regressions are also considered ML. Other common ML algorithms include (deep) neural networks, random forests, decision trees, gradient boosting machines and naïve Bayes classifiers." The survey was developed by the authors based on prior, similar surveys carried out in a similar population [9,10]. This report was constructed according to the Checklist for Reporting Results of Internet E-Surveys (CHERRIES) guidelines [8].

Statistical analysis
Continuous variables are given as means ± standard deviations (SD), whereas categorical variables are reported as numbers (percentages). Using multivariable logistic regression models, we identified independent predictors of adoption of ML algorithms into clinical practice and research, respectively. Countries were grouped by region (Europe/North America/Latin America/Asia and Pacific/Middle East/Africa) according to a previous worldwide survey by Härtl et al. [10], and response rates per region were calculated. Fisher's exact test was applied to compare ML implementation rates among regions. The importance of reasons for use or non-use of ML was compared among regions using Kruskal-Wallis H tests. When calculating the proportion of respondents who had applied ML in research, we included in the denominator both respondents who had never used ML in their research and those who do not participate in medical research. All analyses were carried out using R version 3.5.2 (the R Foundation for Statistical Computing, Vienna, Austria). A p-value ≤ 0.05 was considered statistically significant in two-sided tests.
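As an illustration of the regional comparison described above, the following is a minimal sketch of a two-sided Fisher's exact test for a single 2×2 table, written in plain Python rather than the R environment used for the actual analysis. It is a simplification: the survey compared six regions, which requires the r×2 generalization of the test. The function name and the example counts are hypothetical and are not taken from the survey data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell when all
        # row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the
    # observed one (small relative tolerance guards against float ties).
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts (NOT survey data): ML users / non-users in two regions.
users_a, nonusers_a = 3, 1
users_b, nonusers_b = 1, 3
p = fisher_exact_two_sided(users_a, nonusers_a, users_b, nonusers_b)
print(f"two-sided p = {p:.4f}")
```

With counts this small the test has very little power, which is also why an exact test rather than a chi-square approximation is the natural choice for sparsely populated regions.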

Response rate and respondent characteristics
A total of 7280 CNS/EANS members were sent the survey, and 362 complete or incomplete answers were received for analysis (response rate 5%). The descriptive data of respondents are provided in Table 2. The most represented age range was 30-40 (32.6%), and 89.2% of the answers were from male participants. The most common subspecialty among the surveyed neurosurgeons was spine surgery (36.2%). As far as the work setting was concerned, more than two-thirds of the neurosurgeons were practicing in an

Machine learning in clinical practice and research
A total of 28.5% and 31.1% of the surveyed population responded positively when asked about the use of ML in clinical practice and in clinical research, respectively. Stratified by region, the use of ML in clinical practice was 25.6% in North America, 30.9% in Europe, 33.3% in Latin America and the Middle East, 44.4% in Asia and Pacific and 100% in Africa (only two responses). Figure 1 illustrates the worldwide clinical use of ML. We also asked respondents to list the kinds of applications that they employed ML for (Table 4). The most frequently reported uses of ML were for prediction of outcome (60.2%) and complications (51.5%), as well as to interpret or quantify medical imaging (50.5%). In addition, neurosurgeons applied ML to better inform their patients (38.8%), to grade disease severity (37.9%) and for diagnostic analytics (19.4%).
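The proportions reported here can be cross-checked with simple arithmetic. The sketch below recomputes the response rate and the clinical adoption rate from the counts given in this report, and back-calculates the respondent counts implied by the application percentages. It assumes that those percentages use the 103 clinical ML users as their denominator, which this section does not state explicitly. The implied counts are therefore derived values, not reported data, and the application labels are shorthand.

```python
# Counts quoted in this report
recipients = 7280      # neurosurgeons who were sent the survey
responses = 362        # complete or incomplete answers received
clinical_users = 103   # respondents reporting clinical ML use

response_rate = 100 * responses / recipients
clinical_share = 100 * clinical_users / responses
print(f"response rate: {response_rate:.1f}%")
print(f"clinical ML use: {clinical_share:.1f}%")

# Application percentages as reported in the text
reported_application_pct = {
    "outcome prediction": 60.2,
    "complication prediction": 51.5,
    "imaging interpretation": 50.5,
    "patient information": 38.8,
    "severity grading": 37.9,
    "diagnostic analytics": 19.4,
}

# ASSUMPTION: each percentage is relative to the 103 clinical ML users.
implied_counts = {
    name: round(pct / 100 * clinical_users)
    for name, pct in reported_application_pct.items()
}

# Each implied count should reproduce the reported percentage
# to within one-decimal rounding.
consistent = all(
    abs(100 * implied_counts[name] / clinical_users - pct) < 0.05
    for name, pct in reported_application_pct.items()
)

for name, count in implied_counts.items():
    print(f"{name}: about {count} of {clinical_users}")
print("percentages consistent with a denominator of 103:", consistent)
```

Under this assumption every reported percentage corresponds to a whole number of respondents, which supports reading the application figures as shares of ML users rather than of all 362 respondents.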

Predictors of machine learning use
Multivariable logistic regression analysis (Table 5) was used to investigate independent predictors of ML use in clinical practice and research. None of the studied variables was associated with increased or decreased use of ML in clinical practice, suggesting a wide and homogeneous adoption of ML globally. On the other hand, surgeons specialized in neuro-oncology (odds ratio (OR) = 2.76, 95% confidence interval (CI) = 1.28 to 6.05, p = 0.010), functional neurosurgery (OR = 2.79, 95% CI = 1.03 to 7.47, p = 0.040), trauma (OR = 3.8, 95% CI = 1.44 to 10.02, p = 0.007) and epilepsy (OR = 3.8, 95% CI = 1.14 to 12.9, p = 0.030) were significantly more likely than the reference group to apply ML for research purposes. Also, compared with neurosurgeons working in academic hospitals, those working in non-academic centres (OR = 0.23, 95% CI = 0.08 to 0.57, p = 0.003) or in private practice (OR = 0.36, 95% CI = 0.14 to 0.85, p = 0.026) were significantly less likely to engage in ML-based research.
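The relationship between an odds ratio, its 95% CI and the p-value can be made explicit. On the log-odds scale a Wald interval is symmetric, so the standard error can be recovered from the CI width and a two-sided p-value follows from the resulting z statistic. The sketch below applies this to the values quoted above. It assumes Wald-type intervals, which the text does not confirm, and it works from rounded inputs, so only approximate agreement with the reported p-values should be expected. The function name is illustrative.

```python
from math import erfc, log, sqrt

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided p-value implied by an OR and its 95% Wald CI."""
    # On the log scale the interval is log(OR) +/- 1.96 * SE,
    # so its width equals 2 * 1.96 * SE.
    se = (log(ci_high) - log(ci_low)) / (2 * Z_975)
    z = log(odds_ratio) / se
    # Two-sided tail area of the standard normal distribution.
    return erfc(abs(z) / sqrt(2))


# (label, OR, CI low, CI high, reported p) as quoted in the text
reported = [
    ("neuro-oncology", 2.76, 1.28, 6.05, 0.010),
    ("functional", 2.79, 1.03, 7.47, 0.040),
    ("trauma", 3.8, 1.44, 10.02, 0.007),
    ("epilepsy", 3.8, 1.14, 12.9, 0.030),
    ("non-academic centre", 0.23, 0.08, 0.57, 0.003),
    ("private practice", 0.36, 0.14, 0.85, 0.026),
]

for label, or_, low, high, p_reported in reported:
    p = wald_p_from_or_ci(or_, low, high)
    print(f"{label}: recomputed p = {p:.3f} (reported {p_reported:.3f})")
```

Small discrepancies are expected, because the published ORs and interval bounds are rounded and the original software may have used a different interval construction.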

Attitudes towards machine learning in neurosurgery
The surveyed population was also asked to rate the importance of the factors that encouraged or prevented the application of ML in neurosurgical clinical practice (Table 6). Among the surgeons who had already adopted ML into their clinical practice, the most important reasons for this choice were improved preoperative surgical decision-making/treatment selection (3.27 ± 0.86), followed by objectivity in diagnosis/grading/risk assessment (3.22 ± 0.84), improved anticipation of complications (3.13 ± 0.92) and improved shared decision-making/patient information (3.07 ± 0.90), while less importance was given to potential time savings (2.62 ± 1.07). These attitudes towards the benefits of ML in clinical practice were compared among regions, with no significant differences apart from the anticipation of complications (p = 0.048).
On the other hand, when asked to rate reasons for not using ML, lack of skilled resources (staff, equipment) to develop a model received the highest score (3.11 ± 0.98), followed by time limitations restricting ML application in clinical practice (2.85 ± 0.96), lack of available ML models for the indications of interest (2.84 ± 1.00), uncertainty concerning which processes may benefit most from application of ML algorithms (2.75 ± 0.96) and, less importantly, lack of data quantity/quality to develop an ML model (2.67 ± 0.99). The lack of personal conviction of the added value of ML scored last (2.04 ± 1.05). The only difference among regions concerned the affordability of ML applications: this reason for non-use of ML was rated significantly higher in the Middle East and Latin America (p = 0.034).
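The regional comparisons of importance ratings rely on the Kruskal-Wallis H test, which ranks all ratings jointly and asks whether the mean ranks differ between groups. Because ordinal ratings contain many ties, the tie correction matters. The following is a minimal plain-Python sketch of the statistic with tie correction and a chi-square p-value for integer degrees of freedom. The function names and the example ratings are hypothetical and do not reproduce the survey data or the R routine used by the authors.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Closed form for even degrees of freedom.
        return exp(-half) * sum(half**k / gamma(k + 1) for k in range(df // 2))
    # Closed form for odd degrees of freedom.
    series = sum(half ** (k - 0.5) / gamma(k + 0.5) for k in range(1, (df + 1) // 2))
    return erfc(sqrt(half)) + exp(-half) * series


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, using average ranks for ties."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                               # size of this tie block
        rank_of[pooled[i]] = (i + 1 + j) / 2    # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    rank_sum_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical importance ratings for three regions (NOT survey data).
ratings_by_region = [
    [3, 4, 3, 2, 4, 3],
    [2, 3, 3, 2, 3, 4],
    [4, 4, 3, 4, 3, 4],
]
h_stat, p_value = kruskal_wallis(ratings_by_region)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

The chi-square approximation is only approximate for small groups, so p-values from sparsely represented regions should be read with caution.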

Discussion
No prior published data exist on the worldwide adoption of ML in neurosurgery. This global survey reached a diverse cohort of neurosurgeons at different levels of training.
Our results indicate that ML has already quickly gained wide acceptance in the neurosurgical community, without notable heterogeneity in its global distribution. Almost a third of neurosurgeons reported having applied ML in either clinical practice or research, a value that exceeded expectations. Furthermore, the most common applications of ML in neurosurgery were for prediction of complications and outcomes, as well as to interpret or automatically quantify imaging. No predictors of clinical ML use were identified, again stressing that the availability and acceptance of readily developed ML tools are not bound by socio-demographic factors. On the other hand, among research-active neurosurgeons, some subspecialties as well as academic surgeons appear to apply ML more frequently for their research.
Our study is the first to our knowledge to provide a worldwide overview of the implementation of ML in neurosurgical clinical practice and research. To our surprise, almost a third of respondents stated making use of ML, and this was true for both clinical practice and research. This can be partially explained by response bias, since academic surgeons active in the EANS and CNS were targeted and the response rate was likely higher among surgeons interested in ML. Nevertheless, our results still indicate that ML is quickly becoming one of the foremost technologies in neurosurgical practice. Importantly, the heterogeneity in adoption rates among regions was relatively low, and adoption of ML into clinical practice was not apparently influenced by limitations in costs or socioeconomic status, as is the case with other less accessible technologies such as robotics [33,35]. While the development of ML models can often be expensive and resource-intensive, the application of readily trained ML algorithms does not usually require especially high technological standards or expenses. Many ML applications are web-based [25]. For this reason, we expect that ML will increasingly enable enhanced diagnostic, prognostic and predictive analytics around the world, even in the most rural areas. After controlling for potential confounding factors, we could not identify factors associated with increased or decreased use of ML in clinical practice. This again suggests that ML use is homogeneously distributed among the neurosurgical community. On the other hand, subspecialists in neuro-oncology, functional neurosurgery, trauma and epilepsy were significantly more likely to apply ML in their research.
As expected, surgeons working in non-academic centres and private practice were less likely to engage in ML-based research, consistent with the development of ML models currently being largely confined to academic institutions possessing the resources, protected time, expertise, extensive databases and computational power to create and distribute algorithms. However, the development of ML-based prediction models, for example, has been greatly eased by free software packages released by the major technology companies, which nowadays enable training of simple ML models on even the most basic notebook computers. Still, the development of models may be limited by a lack of high-quality, structured datasets [24].
In fact, ML has already been broadly applied to several subspecialties in neurosurgery, including cranial [1,7,39], vascular [15,32] and spinal surgery [5,11,13,25,31,36] as well as radiosurgery, among others [23,41]. Several examples of how ML outperforms traditional statistics and prognostic indexes commonly applied in clinical practice are already available in the medical literature. For example, a recent study by van Niftrik et al. reported the use of a gradient boosting machine to predict early post-operative complications after intracranial tumour surgery [41]. The authors were able to show improved performance with respect to conventional statistical modelling based on logistic regression and, interestingly, observed that features that were not taken into account in the statistical model, such as histology, anatomical localization or surgical access, in fact contributed strongly in the ML model [41]. Oermann et al. also showed that artificial neural networks performed better at 1-year survival prediction than more traditional models in patients with brain metastases treated with radiosurgery [22]. The same group was also able to show an improvement in predictions of arteriovenous malformation radiosurgery outcomes [23]. Staartjes et al. found that a deep learning approach was significantly better than logistic regression at predicting intraoperative cerebrospinal fluid leaks and gross total resection in pituitary surgery, while no predictors could be identified using traditional inferential statistics for the former outcome [34,37]. In spinal neurosurgery, applications of ML have included prediction of outcome in patients with lumbar disc herniation and lumbar spinal stenosis [2,31,36] and prediction of complications following elective adult spinal deformity procedures [14]. For example, Khor et al. developed a prediction model from a state-wide database to predict clinically relevant improvement after lumbar spinal fusion and integrated their model into a freely available web app, which was then externally validated [13,25]. Again, this shows that while it may be resource-intensive to develop such models, they can be rolled out to clinicians and patients around the world for free using simple interfaces.
Radiological applications are ideally suited to machine learning algorithms given the magnitude and complexity of data extractable from examinations such as CT and MRI scans. Interestingly, ML models can establish a hidden relationship between deep radiological features ("radiomics") and outcomes of the pathology of interest. Lao et al., for example, were able to stratify patients into different prognostic subgroups based on radiomic features [17]. Similarly, it has been shown that it is possible to identify IDH mutation status in gliomas from radiomic features alone [4]. Finally, more unconventional applications of ML in neuroradiology include the generation of synthetic CT images from cranial MRI that are practically indistinguishable from actual CTs [6,42].
Despite these positive results, many present and future potential ML applications remain unknown to the majority of neurosurgical specialists. Our study determined that the factors deterring the use of ML were, in decreasing order, lack of skilled resources (staff, equipment) to develop a model, time limitations restricting ML application in clinical practice, lack of ML models for the indications of interest, uncertainty concerning which processes may benefit most from the application of ML algorithms and, less importantly, lack of data to develop a model and lack of personal conviction of the added value of this new technology.
Our results warrant some considerations. First, once an ML model with clinical relevance has been developed and externally validated [25], the focus has to shift to making it easy to implement and widely available in clinical practice. Web-based apps that are clinician- or patient-friendly are ideal [12,13,25]. Second, while a large proportion of neurosurgeons may already be applying ML in their clinical practice, it is important to foster ML literacy in the neurosurgical community. As with randomized studies forming the basis of evidence-based practice, clinicians should be able to make an informed decision as to which published ML models are likely valid and have applied good methodology, and which ones should probably not be trusted in clinical practice. Lastly, ML relies on the availability of "big data" for algorithm training and subsequent validation [21,24]. A wide and complete collection of patient data in the sense of population-based databases enables more representative ML models. The integrated databases with automated comprehensive data collection that are necessary for such applications are currently few and far between, preventing the development of highly generalizable models [20,21,24,27].

Limitations
Survey-based studies, while able to provide important insights, have inherent limits because of several potential biases. During survey distribution, selection and response bias are frequent. Time constraints on responders may have limited their ability to answer with maximal accuracy, and in fact, concerning the adoption of ML into clinical research, we obtained several incomplete or blank answers. The data are mostly based on subjective impressions of surgeons. Bias could therefore arise from the fact that surgeons who are more exposed to neurosurgical ML may value it more positively than those who do not routinely make use of it, and vice versa. However, the perceived advantages and disadvantages were specifically captured separately for users and non-users. Additionally, the geographic distribution of respondents was skewed in favour of western countries, limiting the sensitivity of our survey with regard to regions such as Asia and Pacific, South America and in particular Africa, with only two responses.

Conclusions
This study provides a first global overview of the adoption of ML into neurosurgical practice. Machine learning has the potential to improve diagnostic work-up and neurosurgical decision-making by shedding light on radiological interpretation, surgical outcome and complication prediction and, as a consequence, patients' quality of life and surgical satisfaction. A relevant proportion of neurosurgeons appears to have already adopted ML into their clinical practice in some form. The homogeneous distribution of ML users in neurosurgery is a testimony to the accessibility of readily developed ML algorithms, even in low-resource settings. Still, many structural issues need to be addressed in order for ML to achieve its full potential in neurosurgery. These include easy-to-access resources for surgeons and patients; prospective, integrated data collection systems to allow model development; and surgeon education on ML, all of which can add to the rapid development of ML in neurosurgery while ensuring high quality of the introduced tools and their correct application. Best practice recommendations, external validation and sound methodology are necessary for any ML tool before its application in our high-stakes clinical practice. Furthermore, future trials may be conducted to assess the real clinical impact of ML algorithms in neurosurgery, including any changes in decision-making that they may cause.