Introduction

Awake craniotomy is increasingly becoming the standard for the majority of newly diagnosed gliomas and remains an essential technique in epilepsy surgery in crucial functional areas [258, 265]. Aside from increased tumor resection and optimal seizure control, awake brain surgery is associated with better preservation of neurological and cognitive function [56, 84]. Historically, language and motor function were most often mapped during awake brain surgery [52, 258]. However, the scope of neurocognitive deficits associated with gliomas and epilepsy extends far beyond the language and motor domains to, e.g., visuospatial functioning, planning, attention, and social cognition [91, 178]. According to previous work, the more domains are monitored, the larger and safer the resections and the more cognitive functions will be preserved [63]. In addition, because life expectancy after glioma resection has increased, there is a need to maintain a quality of life after surgery similar to that before surgery [64, 84]. Therefore, next to mapping language and motor functions, attention is currently directed towards monitoring and sparing the neural networks that subserve (higher-order) cognitive processes. For example, executive functions are highly related to quality of life in glioma patients after awake brain surgery. Although they sometimes go unnoticed in the hospital, serious problems with planning or multitasking can be experienced when returning to work [173, 204]. A recent review has outlined several other cognitive deficits that can have repercussions on a patient’s quality of life [64]. For instance, bimanual coordination is particularly important for individuals with musical and sporting ambitions, and conscious awareness is related to creativity and thus of high importance for artists [64]. What is more, proprioceptive deficits are related to problems in movement control and reduced independence in basic daily life activities [203].
Lastly, social cognition (e.g., mentalizing) is especially important in social interactions, and preservation of these functions is therefore necessary to prevent challenges in social behavior [176]. These examples demonstrate the importance of extensive cognitive monitoring during surgery. This increased focus on enhancing quality of life after surgery reflects the shift away from traditional patient-centeredness, which aims to preserve a functional life for the patient, towards a more person-centered approach that prioritizes preserving a meaningful life for the patient [92].

Therefore, the aim of this study is to investigate whether this preferred change towards more differentiated mapping of cognitive functions has translated into a more varied set of tests used during awake surgical procedures. This study builds upon previous work that offered an overview of the neuropsychological tests used up until 2017 in patients with brain tumors or epilepsy who underwent awake brain surgery [221]. The main conclusion of that work was that language was indeed extensively monitored, but that other cognitive domains received much less attention during awake brain surgeries and that new tests needed to be developed. Because that systematic review covered literature up to February 2017, we replicated the search with February 2017–November 2023 as the time window to investigate whether, and if so which, changes have taken place in the tests used for monitoring cognition [221]. Providing a new overview of the tests administered during awake brain surgeries and comparing it with the results of the 2018 study makes it possible to reveal recent developments in the field.

Material and methods

A systematic literature search was conducted using PubMed and Embase from February 2017 up to November 2023 according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [155]. We replicated the previously used framework to search the databases, in which diseases were combined with awake brain surgery (disease [e.g., brain tumor, glioblastoma] AND procedure [e.g., craniotomy] AND awake [e.g., monitoring, intraoperative]) [221]. For detailed search strategies per database, see the supplementary material.

Given that this systematic review builds upon the prior work, we employed the same approach with regard to inclusion and exclusion criteria (Fig. 1) [221]. We first screened the papers on title and abstract. Papers were excluded if the population was pediatric, if the language of the paper was other than English, if it was not an original article (e.g., review, letter to the editor), if cognition was not monitored, or if the procedure did not comprise awake brain surgery [221]. Moreover, during the full-text assessment, a specific inclusion criterion was a clear description of the test or test paradigm used during surgery. This is especially relevant in the context of the sensory, motor, and somatosensory areas, since the procedure oftentimes starts with mapping these areas [259]. These domains were included only when they were studied extensively by means of standardized tests, rather than only by reporting a lack of sensations, movements, or control [221].

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram of the systematic literature search with the use of ASReview

ASReview, an artificial intelligence (AI) software tool based on machine learning, was used for screening the articles [238]. The software uses state-of-the-art active learning techniques to accelerate the screening of abstracts and titles by ranking literature on its textual proximity to articles previously labeled as relevant, and it is designed according to the principles of open-source science. First, all articles derived from the database search are uploaded to the software. Before screening starts, the researcher classifies at least three articles as relevant to offer the tool a starting point. For every article presented after that, the researcher labels it as either relevant (inclusion) or irrelevant (exclusion). Based on this input, the software first presents the articles that are textually closest to the ones labeled as relevant. Since the software ranks the papers based on textual proximity, the chance that ASReview will present a relevant article diminishes with every consecutive excluded article. Therefore, our cutoff for stopping the title and abstract screening was set at 50 consecutively excluded papers, with the expectation that no relevant articles would be presented afterwards. For more details about the use of ASReview, we refer the reader elsewhere [238].
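The stopping rule described above can be illustrated with a minimal sketch. It does not use or reproduce the ASReview software: the function name, its parameters, and the example labels are hypothetical, and the sketch only counts reviewer decisions in the order in which records happen to be presented, whereas the real tool re-ranks the remaining records after each decision.

```python
from typing import Iterable, Tuple


def screen_with_stopping_rule(labels: Iterable[bool], cutoff: int = 50) -> Tuple[int, int]:
    """Return (n_screened, n_relevant) at the moment screening stops.

    labels: reviewer decisions in presentation order
            (True = relevant, False = irrelevant). Hypothetical input.
    cutoff: number of consecutive irrelevant records that ends screening.
    """
    n_screened = 0
    n_relevant = 0
    consecutive_irrelevant = 0
    for is_relevant in labels:
        n_screened += 1
        if is_relevant:
            n_relevant += 1
            consecutive_irrelevant = 0  # a relevant record resets the run
        else:
            consecutive_irrelevant += 1
            if consecutive_irrelevant >= cutoff:
                break  # stopping rule reached; remaining records are not shown
    return n_screened, n_relevant


# Invented example: two relevant records early on, then only irrelevant ones.
example_labels = [True, False, True] + [False] * 60
result = screen_with_stopping_rule(example_labels, cutoff=50)
```

In this invented example, screening ends after the 50th consecutive exclusion, so the last ten records are never presented to the reviewer.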

The database search yielded 5130 results (Fig. 1). Using the active learning tool ASReview, the author screened 1337 articles for relevance based on title and abstract. In other words, after 1337 articles, we had consecutively labeled 50 papers as irrelevant and stopped the screening process. This means that 3793 papers (5130 in total, minus the 1337 that were screened) were not presented to us by ASReview, but these are very likely to be irrelevant. Of the 1337 screened articles, 856 were excluded based on title and abstract. After removal of duplicates, the 453 potentially relevant papers were assessed in full text for eligibility, resulting in a total of 243 included papers. A reference list search was performed and papers were added based on expert consultation, resulting in 272 included articles in total. Once the papers had been selected, a description of the cognitive domains monitored during awake brain surgery and of the tests or neuropsychological paradigms used was extracted from each included paper.
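The counts in this paragraph can be checked with simple arithmetic. The short sketch below uses only the numbers reported in the text; the three differences marked as derived are not stated in the text and assume that each step accounts for the full difference between consecutive counts.

```python
# Counts reported in the text.
total_records = 5130
screened = 1337
excluded_on_title_abstract = 856
full_text_assessed = 453
included_after_full_text = 243
included_final = 272

# Stated in the text: records never presented by the screening tool.
not_presented = total_records - screened  # 3793

# Records remaining after title and abstract screening.
kept_after_screening = screened - excluded_on_title_abstract  # 481

# Derived (not stated in the text; assumes the whole difference is duplicates).
removed_as_duplicates = kept_after_screening - full_text_assessed  # 28

# Derived: papers excluded during full-text assessment.
excluded_at_full_text = full_text_assessed - included_after_full_text  # 210

# Derived: papers added via reference lists and expert consultation.
added_afterwards = included_final - included_after_full_text  # 29
```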

Results

An overview of the final 272 studies included in this quantitative synthesis is presented in Table 1, with each cognitive domain and test used outlined per article. Standardized neuropsychological tests are presented in italics. Figure 2 presents the percentage of studies that monitored each cognitive domain, to allow comparison of our results with the domains reported in the 2018 work [221]. As can be seen, the vast majority (90.4%) of included studies tested the language domain (Fig. 2). In 40% of these studies, only the language domain was tested, compared to 68% in previous work. Compared to 2018, there seems to be a trend towards more implementation of motor, visuospatial, emotion, and “other” tasks [221]. Because the “other” category has increased compared to the previous study, its cognitive domains and tests are described in more detail in Table 2. Within this category, proprioception, clock reading, left–right orientation, and processing speed are newly described cognitive domains. When interpreting the results, it should be taken into account that we did not statistically analyze them.

Table 1 Studies included in the review, with cognitive domains monitored during surgery and test/paradigm used to assess the domain
Fig. 2 Percentages of studies reporting tests or paradigms per cognitive domain during awake brain surgery. *“Other” includes executive functions, clock reading, processing speed, left–right orientation, face recognition, musical skills, and proprioception

Table 2 Overview of the tests or test paradigms that are part of the category “other”

Discussion

With the shifted focus towards more extensive monitoring of cognition and more person-centered care, we created an overview to see whether the scope of tests used during awake craniotomy has broadened, and we present the most important changes over the last years. First and foremost, language continues to be by far the most extensively and most often monitored domain during the surgical procedures. It is not surprising that language is such an integral part of almost all awake craniotomies, since this function is highly related to quality of life [82]. Another reason why language is oftentimes monitored is that it relatively easily meets the specific criteria for tests used during awake craniotomy, which differ from those for the standard neuropsychological tests used in the clinical setting. For example, a stimulus can only be presented for a very short time because of the limited duration of electrical stimulation [191]. Moreover, tests need multiple stimuli with comparable levels of difficulty to allow for repeated measures, while learning effects should remain minimal [221]. To diminish the possibility of chance-level guessing, multiple-choice answers are less desirable. These criteria are easily met by language tests, which contributes to the extensive mapping of this domain during surgery. On the other hand, these criteria can explain why other higher-order functions remain underrepresented. For instance, memory tasks in general tend to take much more time and raise the question of whether stimulation should be applied during the encoding or the retrieval phase. These are examples of factors complicating the development of new tests in such cognitive domains. A notable change within the language domain is the increased use of the Pyramids and Palm Trees Test (PPTT; up to 15.4% compared to the previous 2.5%), a test designed to measure nonverbal semantic associations.
Recent work shows a dissociation between the cortical areas associated with verbal semantic cognition and those associated with nonverbal semantics [99].

Regarding motor and praxis functions, there seems to be an overall increase in the percentage of studies testing this domain. For praxis, the hand-object manipulation task (HMt), a novel intraoperative task to prevent post-operative apraxia, is reported in 11 included studies (e.g., [75, 76, 202, 214, 216, 277]). The task is useful for testing regions important in motor execution, with the dorsal and ventral premotor areas as the main stimulation sites impacting different task features. In short, the task consists of a small cylindrical handle inserted inside a rectangular base with a worm screw [214]. Using a precision grip, the patient sequentially grasps, holds, rotates, and releases this handle in a self-generated rhythm. Since patients receive no external cues, muscle control is guided solely by tactile and proprioceptive information. The task contributes to the identification and preservation of dexterous hand movement areas, extending beyond the dorsal premotor areas towards ventral areas within the premotor central gyrus [214]. One of the advantages of this newly developed task is that the rhythmic movement overcomes the problem of the short duration of electrical stimulation, and the task minimizes learning effects. In a case report, praxis and motor sequencing were tested by implementing the Luria Motor Sequence task [249]. Problems with executing this task are associated with kinetic apraxia, which is the inability to correct for erroneous behavior in complex motor sequences [295]. Although the authors did not clearly describe how they performed the task during surgery, we assume that the underlying principles align with those of the HMt, since the task concerns the sequencing of movements. This would accommodate the short periods of electrical stimulation that are necessary in tasks used during awake brain surgery.
The importance of bimanual coordination in sports and music has been mentioned previously, and it has been noted that patients with frontal glioma can experience permanent deficits in bimanual movements [64, 118]. In the currently included studies, there is no clear evidence that this function is tested, but six studies included the finger tapping task, which is often used to study the motor system and can theoretically be used to study bimanual coordination [285].

Notably, compared to the 2% of studies that previously described measuring social cognition, attention to this domain has grown, as this percentage increased to 11% [221]. Of this 11%, more than 73% explicitly mention the Reading the Mind in the Eyes Test, which is a well-validated test for face-based mentalizing, that is, the ability to attribute mental states to others [26]. This ability subserves anticipating the actions of others, but does not involve making inferences about the content or origin of the mental state. Therefore, attributing mental states to others based on the area just around the eyes is a part of mentalizing, but not all of it [26]. The other 27% made use of other tests for social cognition, such as the Pictures of Facial Affect, which shows complete faces instead of just the eyes, a false beliefs task measuring theory of mind, or a task designed to predict the mental states of others based on a specific arrangement of pictures [140, 175, 176]. The increased use of social cognition monitoring aligns with the preferred shift towards intraoperative mapping of the higher emotional cognitive states in order to avoid long-lasting social cognitive disorders, given the strong link between preserved social cognition and social interactions [176].

Regarding visuospatial functions, an increase is seen in studies incorporating this domain during mapping, but only a handful of different tests are being used. Monitoring visuospatial functions serves to prevent post-operative neglect and hemianopia, both of which have a highly negative impact on daily functioning [272]. Visual field tests, naming of objects presented diagonally on a screen divided into four quadrants, and line bisection tasks are adequate tests to monitor visuospatial functions. An interesting new paradigm that is already incorporated in some studies is the time-to-contact (TTC) test [29]. The task was developed as a measure of time estimation in which the initial part of an object’s trajectory (e.g., a looming ball in a corridor) is presented for a short period of time [33]. Then, the stimulus is briefly occluded and the participant is required to indicate its estimated moment of arrival in their peripersonal space. A benefit of this paradigm is that velocity, occlusion time, and trajectory distance can be varied to allow repeated measures while preventing learning. The TTC task was used in that study to gain a more fundamental understanding of the anatomical structures involved in TTC estimations [29]. The authors conclude that the right parietal lobe plays a role when the object is in the peripersonal space of the observer [29]. However, there is no conclusion yet on whether this network is essential for visuospatial processing in general or only for TTC perception. Although preserving TTC perception alone is relevant for daily life activities such as crossing a street, this specific task will be a more useful addition to the tests incorporated during awake brain surgeries if the network generalizes to visuospatial processing in general [29, 30]. Therefore, more research is needed in a diverse patient population with visuospatial deficits.
As concluded in 2018, there was a specific need for tests in the executive function domain [221]. The only two studies previously included measured inhibition by means of a go/no-go task or the Stroop task [221]. Currently, the Stroop task is most often used, but as can be seen in Table 2, other tests can be implemented as well, such as the Trail Making Test part B (TMT-B) to quantify set-shifting [147]. Another example we want to highlight is a case study in which shifting between languages was monitored as a measure of cognitive control [244].

The famous-face naming task has received increased attention over the past years. This task is particularly important because deficits in naming people are frequently observed in patients with temporal lobe epilepsy [32]. As retrieving people’s proper names is a higher-order recognition process, the recent focus on assessing higher-order cognitive functions might explain the rise of the test [198]. Naming of (famous) faces could also be incorporated to monitor prosopagnosia.

Incorporating digitalized versions of classical neuropsychological tests is a promising approach for awake brain surgery protocols, as it offers the possibility to use tests that are difficult to apply in a paper-and-pencil version. For instance, the conventional TMT cannot be administered effectively because of the brief duration of electrical stimulation and the logistical challenge posed by the lying position of patients during surgery. Digitalized versions of tests overcome this problem, as they can provide not only more continuous outcome measures but also more fine-grained ones, such as response time per connected step in the trail instead of solely overall completion time [273]. Such measures can then be used to quantify sustained attention, for example through reaction times recorded every 4 to 5 s during one or several tasks [64]. Others used a tablet to measure set shifting by means of the Trail Making Test part B and a digital line bisection task to measure spatial attention [147]. The Symbol Digit Modalities Test, which measures processing speed, also has a digitalized version that could be incorporated in surgical protocols [195]. Therefore, we hope to see a shift in the upcoming years in which more classical tests are digitalized to stimulate their use during awake craniotomy. Of course, precise and quantitative registrations from digitalized tests should always go together with more qualitative outcome measures. For example, alterations in the emotional tone of the voice or in the patient’s facial expressions may indicate changes in social cognition or emotional expression and may be as relevant as the exact response time per item.
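The difference between overall completion time and response time per connected step can be made concrete with a small sketch. It is purely illustrative and does not describe any specific digital test from the included studies: the function names and the timestamps are invented, and the only assumption is that a digital version logs the moment at which each successive target is connected.

```python
from typing import List


def per_step_response_times(tap_times_s: List[float]) -> List[float]:
    """Time between consecutive connections, in seconds."""
    return [later - earlier for earlier, later in zip(tap_times_s, tap_times_s[1:])]


def completion_time(tap_times_s: List[float]) -> float:
    """Overall completion time: last connection minus first, in seconds."""
    return tap_times_s[-1] - tap_times_s[0] if tap_times_s else 0.0


# Invented timestamps (seconds) at which successive targets were connected.
example_taps = [0.0, 1.0, 2.5, 6.0, 7.0]

steps = per_step_response_times(example_taps)
total = completion_time(example_taps)

# Index of the slowest step, which a single completion time would hide.
slowest_step = max(range(len(steps)), key=steps.__getitem__)
```

With per-step times, a transient slowing (here the third step) can in principle be related to the moment of stimulation, whereas the overall completion time only summarizes the whole trial.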

The results of this study demonstrate that a large number of tests or test paradigms can be used for monitoring different cognitive functions during awake brain surgery. Some of them are frequently used, others still only sporadically. This frequency tells us something about, for example, the feasibility of a test during surgery. However, the frequency with which a test has previously been used or reported should not be the deciding factor in choosing which tasks will be used for an individual patient. Other, more important factors must guide this choice, e.g., the location of the tumor and the surrounding cortico-subcortical neural circuits, the patient’s cognitive complaints, and the patient’s wishes [143].

The results of the previous review showed that in the majority of studies only one cognitive domain was monitored during the surgical procedures [221]. In the current review, this decreased to 49%, indicating a trend towards monitoring multiple domains and using different tests. Mapping a broader cognitive range can result in more global preservation of cognitive functions. However, not everyone agrees that all (complex) cognitive functions should be monitored during awake surgery. There is an interesting debate about the expansion of cognitive mapping in the context of the onco-functional balance [98, 141]. The fact that complex cognitive functions seem to rely on large-scale networks possibly makes them more difficult to map with electrostimulation [98]. Furthermore, it can be questioned whether the neuropsychological tests used during surgery indeed measure the complex cognitive functions they are intended to map [98]. In addition, some cognitive functions are possibly more resilient to damage than others. In contrast, others do advocate developing new tasks to better explore such complex cognitive functions, both extra- and intraoperatively [141]. However, before new tasks are introduced to monitor cognition during awake surgery, their level of evidence should be analyzed in a systematic way [144]. Although the field is developing quickly, many research questions still need to be answered. Publishing about cognitive monitoring during awake surgery, specifically about which tests are used to measure which cognitive domains, in combination with clear descriptions of outcome measurements (cognitive outcomes, but also, for example, extent of resection), contributes to the best patient care, and we therefore recommend these steps for future research.

As with any study, some strengths and limitations should be discussed. An advantage of the method used in the current paper is the extensiveness of the search string and the fact that it is an exact replication of previous work, so that the results can be compared [221]. Moreover, we made use of the relatively new AI tool ASReview, which has been shown to be efficient and reliable [238]. However, using a machine learning–based screening system does have drawbacks. For example, the tool does not provide an accurate estimation of the system’s error rate, and bias in data extraction and coding remains present [238]. That being said, screening by humans also remains imperfect, and mistakes may have been made during the labeling of studies [283]. Furthermore, given the large number of studies included in this review, we do not expect that the results would deviate substantially from the current findings because of missed or wrongfully excluded articles (due to human error).

Conclusion

In conclusion, the current study indicates a positive trend towards implementation of a broader range of tests during awake brain surgery. We see a shift towards more extensive monitoring during the procedures, especially in the domains of motor functioning, social cognition, visuospatial processing, and executive functioning. To achieve more extensive cognitive monitoring, the implementation of new tests, revised tests, or digital versions of more traditional neuropsychological tests during surgery offers opportunities for the future. We hope that this development will continue in the upcoming years to increase quality of life after awake craniotomy and to strengthen the focus on the specific needs of patients.