Introduction

The timely diagnosis and treatment of patients with seizures can prevent significant morbidity and mortality [1]. Approximately 30% of patients with altered mental status in critical care settings have seizures, and over 90% of these are nonconvulsive seizures that can only be detected with electroencephalography (EEG) [2,3,4,5,6]. There is rising awareness among healthcare practitioners that the burden of seizures, including nonconvulsive seizures, is associated with brain injury and, thus, continuous or frequent seizures (i.e., a high seizure burden) merit timely detection and treatment [7,8,9,10]. For this reason, both timely interpretation of EEG data and timely and accurate quantification of seizure burden are paramount to minimizing brain injury.

Unfortunately, the conventional practice of EEG in critical care and emergency department settings suffers from both delayed access to EEG recordings and significant delays in their interpretation by skilled neurologists [11, 12]. When actionable EEG interpretations are not immediately available, treatment decisions are made on the basis of clinical suspicion alone, with the potential to miss or undertreat some patients with nonconvulsive seizures and to overtreat a significantly larger number of patients without seizure activity who may not need aggressive antiseizure medications [13, 14]. To provide timely access to EEG, novel rapid EEG systems enable physicians and allied health professionals to acquire EEG within minutes and stream the data in real time to the cloud, where a machine learning-derived classification algorithm can be applied [15,16,17].

The use of artificial intelligence (AI) in clinical medicine has been on the rise, and within the specialty of neurology, brain signals have proven particularly amenable to the machine learning approach [18]. A few software programs are commercially available to detect seizures or epileptiform discharges and mark the EEG tracing to help streamline expert review by neurologists. Additional methods for automatic detection of seizures and epileptic spikes have been described in the literature, with varying degrees of accuracy [19]. Such algorithms are designed to work solely with traditional EEG systems, which, as noted, are often delayed or unavailable, especially after hours and on weekends [11, 12], and are too cumbersome for nonneurology experts to use at the bedside. While a full review of these methods is beyond the scope of our present work, it is noteworthy that no method has yet been developed to provide an automated and quantified metric of seizure burden (i.e., frequency of seizures per unit of time) to help bedside practitioners caring for critically ill patients. Providing such feedback would allow for real-time risk stratification, evaluation of treatment response, and determination of the urgency of requesting neurological consultation.

The use of AI-assisted programs for EEG interpretation is becoming increasingly necessary: EEG utilization is expanding in critical care and emergency medicine while expert human resources remain scarce, and detailed real-time review of many simultaneous continuous EEG recordings by neurologists is simply too cost-prohibitive to deploy at scale. As a result, there exists a significant unmet need for automated algorithms that could assist nonexperts by providing reliable risk stratification from EEG data in real time [18, 20, 21]. Such a tool could alert the bedside nurse or on-call provider when it detects a near-continuous epileptiform pattern resembling status epilepticus that may require urgent management, and it could enable providers to see the real-time effect of administered antiseizure medications on the burden of seizure activity.

In the current study, we aim to validate a supervised machine learning algorithm, Claritγ (Ceribell Inc., Mountain View, California), which was recently approved by the FDA and is used in clinical settings as a decision support tool. We designed this retrospective study to measure the performance of the algorithm applied to EEG data acquired with Ceribell’s Rapid Response EEG system (Rapid-EEG) from patients in critical care and emergency department settings and to describe its potential clinical implications.

Methods

Rapid Response EEG System

The Rapid-EEG system (Fig. 1, see www.ceribell.com for additional information) consists of a headband with ten electrodes connected to a handheld recorder. The headband is placed circumferentially around the head and is fastened over the forehead with a locking clasp; EEG setup is typically performed by nurses, allied health professionals, or healthcare providers trained in either online or in-person sessions. The Rapid-EEG electrodes (1–5 on the left, 6–10 on the right; electrode number increases from anterior to posterior, i.e., electrodes 1 and 6 are near the frontal pole and electrodes 5 and 10 are near the occiput) correspond approximately to the lateral chains of the International 10–20 system (Fp1–F7, F7–T3, T3–T5, and T5–O1 on the left; Fp2–F8, F8–T4, T4–T6, and T6–O2 on the right), and a longitudinal bipolar montage is constructed to display the EEG waveforms. During EEG recording, the system also measures the impedance between adjacent electrodes once per minute, using a test frequency outside the EEG recording band to avoid creating artifact. The data are digitized at a sampling rate of 250 Hz. The handheld recorder displays, records, and wirelessly transmits the data to a remote cloud server, where cloud-based seizure detection software continuously monitors the EEG recording.
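To make the montage description concrete, the sketch below (Python) shows one way the eight displayed bipolar channels could be derived from the ten referential headband electrodes; it is a hypothetical illustration, not Ceribell’s implementation, and the electrode-to-label mapping simply follows the approximate 10–20 correspondence given above.

```python
import numpy as np

# Hypothetical sketch (not Ceribell's code): build the longitudinal bipolar
# montage described in the text from the ten referential headband electrodes.
FS_HZ = 250  # sampling rate stated for the Rapid-EEG system

# (anode, cathode) electrode numbers and the approximate 10-20 channel labels;
# electrodes 1-5 run anterior-to-posterior on the left, 6-10 on the right.
BIPOLAR_PAIRS = [
    (1, 2, "Fp1-F7"), (2, 3, "F7-T3"), (3, 4, "T3-T5"), (4, 5, "T5-O1"),
    (6, 7, "Fp2-F8"), (7, 8, "F8-T4"), (8, 9, "T4-T6"), (9, 10, "T6-O2"),
]

def bipolar_montage(referential: np.ndarray) -> dict:
    """referential: array of shape (10, n_samples), one row per electrode (1-10)."""
    return {label: referential[a - 1] - referential[b - 1]
            for a, b, label in BIPOLAR_PAIRS}

# Example with one minute of simulated data for all ten electrodes
channels = bipolar_montage(np.random.randn(10, 60 * FS_HZ))
```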

Fig. 1

Rapid Response EEG system. The Rapid Response EEG system (Rapid-EEG) consists of a portable EEG recorder and a disposable electrode headband. Recorded EEG tracings are shown on the device screen (1) and sonified when needed (2) by the bedside recorder. A HIPAA-compliant secure Wi-Fi connection enables real-time transfer of the data to the cloud, where the EEG tracings can be reviewed by expert neurologists using the remote portal for EEG review (3). Machine learning computations (by the Claritγ algorithm) are performed on the cloud portal (4), which interfaces in real time with the bedside device. As such, the system is meant to provide not only easy and fast access to EEG acquisition but also reliable and actionable diagnostic information for risk stratification using four different modes of triage

EEG Data

All EEGs were obtained with Rapid-EEG (Ceribell Inc., Mountain View, California). The recordings were from adult patients (≥ 18 years old) undergoing evaluation for altered mental status and possible seizures in intensive care units and emergency departments at six academic and community hospitals across the USA between January 2018 and April 2019. EEG data were anonymized, and no identifying demographic or clinical information was accessed for this study. The study was classified as exempt research according to the US Department of Health and Human Services regulation 45 CFR 46.104(d)(4), and individual patient consent was not required. It should be noted that the Claritγ algorithm was developed using training and testing datasets that were entirely separate and independent from the dataset used in the current study to validate the algorithm’s performance.

Validating Claritγ Seizure Burden Algorithm

We validated the performance of the Claritγ algorithm in a cohort of 353 Rapid-EEG recordings from 353 patients. An overview of the algorithm is shown in Fig. 2. The signal from each EEG channel was filtered and segmented into nonoverlapping ten-second bins. Time-domain and frequency-domain features were calculated for each ten-second bin of EEG signal. Seizure activity was defined for each ten-second bin using multiple features, including measures of power (power within each frequency band and their ratios), morphology (signal amplitude, variability, distribution, and change over time), rhythmicity and regularity (measures of entropy), and correlation (cross-channel correlation of signals). For each ten-second bin, the algorithm classified the segment of signal as either seizure or nonseizure in a deterministic (rather than probabilistic) manner. Seizure burden was calculated as the percentage of ten-second bins of EEG data in a 5-min period that were classified as seizure activity. This seizure burden value is updated every ten seconds over a rolling 5-min window, resulting in a continuous seizure burden trend that represents the evolution of the patient’s seizure prevalence over the course of monitoring. The maximum seizure burden for the duration of each recording was used as the final output of the algorithm for this study; however, in clinical practice, the seizure burden value and trend would be available continuously.
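As an illustration of the seizure burden computation just described, the following Python sketch segments a recording into nonoverlapping ten-second bins, applies a per-bin classifier, and reports the percentage of seizure-positive bins within the trailing 5-min window every ten seconds. The per-bin classifier is a placeholder: Claritγ’s actual features and decision rule are proprietary and are not reproduced here, and the handling of the first, incomplete window is our own simplification.

```python
import numpy as np

BIN_SEC = 10                              # nonoverlapping bin length
WINDOW_SEC = 300                          # 5-min rolling window
BINS_PER_WINDOW = WINDOW_SEC // BIN_SEC   # 30 bins per window
FS_HZ = 250                               # samples per second

def classify_bin(bin_signal: np.ndarray) -> bool:
    """Placeholder for the proprietary per-bin classifier: returns True for
    'seizure' and False for 'nonseizure'. Clarity's actual features (band
    power, morphology, entropy, cross-channel correlation) are not shown."""
    raise NotImplementedError

def seizure_burden_trend(eeg: np.ndarray) -> np.ndarray:
    """eeg: (n_channels, n_samples). Returns the seizure burden (%) at each
    10-s step, i.e., the percentage of seizure-positive bins in the trailing
    5 minutes (shorter at the very start of the recording)."""
    samples_per_bin = BIN_SEC * FS_HZ
    n_bins = eeg.shape[1] // samples_per_bin
    labels = np.array([
        classify_bin(eeg[:, i * samples_per_bin:(i + 1) * samples_per_bin])
        for i in range(n_bins)
    ], dtype=float)
    burden = np.empty(n_bins)
    for i in range(n_bins):
        window = labels[max(0, i - BINS_PER_WINDOW + 1): i + 1]
        burden[i] = 100.0 * window.mean()
    return burden

# The study's summary metric was the maximum of this trend over the recording:
# max_burden = seizure_burden_trend(eeg).max()
```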

Fig. 2

Computation of seizure burden. The output of the Claritγ algorithm was a continuous quantitative trend of seizure burden values, each representing the percentage of 10-s bins of EEG data in a 5-min period that contained seizure activity. Seizure burden values updated every 10 s; therefore, consecutive seizure burden values (e.g., values 1 and 2, as shown, offset by 10 s) could represent the evolution of the patient’s seizure prevalence over the course of the recording. Seizure burden thresholds were adapted from American Clinical Neurophysiology Society guidelines [22], such that “frequent” seizure activity was defined as 10% seizure burden (i.e., 30 s of seizure activity within a 5-min period), “abundant” seizure activity was defined as 50% seizure burden (i.e., 2.5 min of seizure activity within a 5-min period), and “continuous” seizure activity was defined as 90% seizure burden (i.e., 4.5 min of seizure activity within a 5-min period). An alert was presented to the user when seizure burden reached a threshold of 90%, which indicated a high risk of status epilepticus and the impending need for urgent clinical intervention

Seizure burden, the prevalence of seizure activity within any 5-min period, was described using thresholds adapted from American Clinical Neurophysiology Society guidelines [22] as follows: 10% was defined as “frequent” (indicating 30 s of seizure activity), 50% was defined as “abundant” (indicating 2.5 min of seizure activity), and 90% was defined as “continuous” (indicating 4.5 min of seizure activity). A seizure burden of 90% (4.5 min) indicates activity approaching the definition of status epilepticus (5 min), so the algorithm would present an alert to the user at any point when the seizure burden reaches a threshold of 90%.
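A minimal sketch of this threshold logic is shown below; the descriptor names follow the ACNS-adapted definitions in the text, while the label for burdens below 10% is our own assumption.

```python
def describe_burden(burden_pct: float) -> tuple:
    """Map a 5-min seizure burden (%) to the descriptor used in the text and
    flag the 90% status epilepticus alert. Thresholds are cumulative."""
    if burden_pct >= 90.0:      # >= 4.5 min of seizure activity in 5 min
        label = "continuous"
    elif burden_pct >= 50.0:    # >= 2.5 min in 5 min
        label = "abundant"
    elif burden_pct >= 10.0:    # >= 30 s in 5 min
        label = "frequent"
    else:                       # below the lowest defined threshold (our assumption)
        label = "below threshold"
    return label, burden_pct >= 90.0   # (descriptor, status epilepticus alert)

print(describe_burden(93.3))   # -> ("continuous", True): alert the user
```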

Reference Standard Defined by Expert Neurologists’ Review of EEG

Each EEG file was reviewed independently by at least two neurologists with fellowship training in clinical neurophysiology or epilepsy. Reviewers were blinded to patients’ clinical information, including medical history, indication for EEG monitoring, prior treatment with antiseizure medication, and the Claritγ seizure burden trend. Expert consensus (reference standard) was defined by agreement between a minimum of two neurologists; if the first two neurologists did not agree, additional reviewers were consulted until a majority consensus was reached. Reviewers were instructed to indicate whether the EEG contained normal, diffusely slow, highly epileptiform, or seizure activity. Highly epileptiform patterns (HEP) included activity that did not fully meet the Salzburg criteria [23] for electrographic seizure activity but did represent abnormal electrographic epileptiform activity such as periodic discharges or lateralized rhythmic delta activity [24, 25]. These patterns merited separate consideration from other nonseizure patterns for two major reasons. First, these patterns exist on a spectrum of pathologic activity along with seizures, the so-called ictal–interictal continuum (IIC), and at times they can be difficult to distinguish from seizures, especially in the absence of ancillary clinical information [26,27,28]. Second, these patterns might benefit from treatment with antiseizure medications, so measuring the performance of the algorithm in these cases would have clinical significance [26, 29,30,31]. See Fig. 3 for a representative sample of each of these categories recorded using Rapid-EEG. Each EEG recording and each individual labeled episode were assigned to one of three categories based on the most severe pattern present, as defined by expert majority consensus: seizures (high severity), HEP (intermediate severity), and normal or slow activity (low severity). Therefore, a reference standard was generated for individual episodes as well as for the overall EEG recording. EEG readers also labeled the start and end of the EEG patterns, which allowed for calculation of seizure durations that could be compared with the algorithm output.
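The consensus categorization just described can be summarized with the following sketch. It is an approximation of the adjudication process under our assumptions (patterns require agreement of at least two reviewers, and a recording is categorized by its most severe agreed-upon pattern); it is not the study’s actual adjudication code, and the label set is simplified.

```python
from collections import Counter

# Severity ranking used to categorize each recording by its most severe consensus pattern
SEVERITY = {"normal": 0, "slow": 0, "HEP": 1, "seizure": 2}

def consensus_category(reviews: list) -> str:
    """Hypothetical sketch: `reviews` holds, for each reviewer, the set of
    patterns they marked in one recording. A pattern enters the reference
    standard once at least two reviewers agree on it, and the recording is
    categorized by the most severe agreed-upon pattern."""
    counts = Counter(pattern for review in reviews for pattern in review)
    agreed = [pattern for pattern, n in counts.items() if n >= 2]
    if not agreed:
        raise ValueError("no majority consensus; an additional reviewer is required")
    return max(agreed, key=lambda pattern: SEVERITY[pattern])

# Example: two reviewers mark HEP and one marks seizure -> the recording is labeled "HEP"
print(consensus_category([{"HEP"}, {"HEP", "slow"}, {"seizure"}]))
```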

Fig. 3

Samples of EEG recorded with Ceribell Rapid Response EEG System. Each EEG is displayed in a ten-second epoch with filter settings of 1–30 Hz. The line plot under each EEG shows the Claritγ algorithm output. The top image shows seizure activity approaching the 90% threshold that triggers a status epilepticus alert, and the bottom image shows lateralized periodic discharges that go undetected by the algorithm

Statistical Analysis

We defined the reference (“gold”) standard as the consensus agreement of at least two neurologists reading the same EEG, both for each overall EEG record and for individual episodes of expert-identified seizures. We then tabulated Claritγ algorithm output of seizure burden against this reference standard. Using these tabulations, we calculated the sensitivity, specificity, and false detection rate (number of false positive events divided by the total duration of EEG recordings in hours) of various seizure burden thresholds (10%, 50%, 90%) against the expert consensus; 95% CIs for sensitivity and specificity measures were calculated using established formulas [32]. It should be noted that the three thresholds are cumulative—i.e., all 90% alerts by definition generate both a 10% and a 50% notification, and all 50% alerts by definition first generate a 10% notification. Our validation study was not designed to optimize among the three different thresholds.
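As a worked illustration of these definitions, the sketch below computes sensitivity, specificity, and the false detection rate; it is written under our assumptions, with a normal-approximation interval standing in for the formulas cited in [32], and uses the record-level status epilepticus figures reported later in Results as inputs.

```python
import math

def sens_spec_with_ci(tp: int, fn: int, tn: int, fp: int, z: float = 1.96) -> dict:
    """Sensitivity and specificity with normal-approximation 95% CIs (an
    illustrative stand-in for the standard formulas cited in the text)."""
    def prop_ci(k, n):
        p = k / n
        half = z * math.sqrt(p * (1 - p) / n)
        return p, max(0.0, p - half), min(1.0, p + half)
    return {"sensitivity": prop_ci(tp, tp + fn), "specificity": prop_ci(tn, tn + fp)}

def false_detection_rate(false_positive_events: int, total_eeg_hours: float) -> float:
    """False detections per hour of recording, as defined in the text."""
    return false_positive_events / total_eeg_hours

# Record-level example for the 90% threshold, using the figures in Results:
# 9 of 9 status epilepticus records detected; 24 false alerts among the 344
# records without status epilepticus (tn = 344 - 24 = 320).
print(sens_spec_with_ci(tp=9, fn=0, tn=320, fp=24))
print(false_detection_rate(383, 1052))  # ~0.36 per hour, matching the reported rate
```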

Given the wealth of evidence [33,34,35,36] describing the variability between expert EEG reviewers, whether due to human error or to differences in interpretation, we sought to contextualize the observed diagnostic accuracy of the seizure detection algorithm by quantifying inter-rater variability. We calculated the sensitivity and specificity of identifying EEGs with status epilepticus for each of the experts who reviewed at least 250 h of EEG. No single expert reviewed all of the cases, and because each expert reviewed a different subset of the recordings, we could not quantify inter-rater variability using Cohen’s κ.

Results

Reference Standard

A total of 353 Rapid-EEG recordings were evaluated in this study, with a cumulative EEG recording duration of 1052 h (mean EEG duration 233 ± 227 min). No cases were excluded. Expert consensus determined the most severe feature of the overall EEG record to be status epilepticus in nine cases, seizures (duration < 4.5 min) in eight cases, HEP in 87 cases, and normal or slow background in 249 cases (Table 1). Within the 353 recordings, a total of 47 discrete seizure events were identified by expert consensus.

Table 1 Summary of Claritγ Performance (individual patient level)

Algorithm Performance

Claritγ algorithm output of seizure burden compared against expert consensus diagnosis for both overall EEG records and individual events is summarized in Table 1, and performance (in terms of sensitivity, specificity, and false detection rate) is summarized in Table 2. Claritγ detected ≥ 90% seizure burden (seizure activity ≥ 4.5 min, thereby triggering an alert for impending status epilepticus) in nine out of nine Rapid-EEGs with status epilepticus, and within these cases, Claritγ correctly detected ≥ 90% seizure burden in 12 out of 13 discrete events of seizure lasting ≥ 4.5 min. Therefore, the sensitivity for identifying status epilepticus was 100% for the overall record, and the sensitivity for identifying individual seizure episodes that were ≥ 4.5 min was 92.3%.

Table 2 Sensitivity and specificity of Claritγ algorithm for seizure detection

The specificity of the ≥ 90% seizure burden notification was 93.0%, with false detections in only 24 of the 344 EEGs without status epilepticus. In 21 (87.5%) of these false detections, the expert consensus categorization of the EEG was HEP, indicating that concerning epileptiform activity was present in the record even though the reviewers did not agree that unequivocal seizures were present.

Claritγ correctly identified 41 of the 47 total discrete seizure events of any duration (88.2% sensitivity). In the 1052 h of cumulative EEG across the 353 EEG recordings, Claritγ had 383 false detections, resulting in an overall false detection rate of 0.36 per hour. Of the 179 EEG recordings in which Claritγ detected no seizures, seizures were identified by the expert reviewers in only two cases (negative predictive value of 99%). In both cases, the seizures missed by Claritγ were less than 30 s in duration.

Four experts each reviewed at least 250 h of the overall dataset (1052 total hours). When these four reviewers’ identifications of status epilepticus were compared with the majority consensus (Table 3), individual expert raters displayed variability in both sensitivity (range 20–89%) and specificity (range 94–99%). In general, reviewers with the lowest sensitivity also had the highest specificity, a natural trade-off. It is also important to note that, in some cases, reviewers classified a status epilepticus case as HEP; this was considered a miss and reduced the reviewer’s sensitivity even though the reviewer did not consider the EEG to be normal. Each expert reviewed a different subset of the 353 total recordings, and no single expert reviewed all of the cases. Among the four experts, a total of 784 individual reviews were performed (an average of 2.22 reviews per EEG recording); the number of reviews performed by each expert is shown in Table 3. Overall, the experts displayed greater specificity than Claritγ, and each of the reviewers misclassified at least one case of status epilepticus.

Table 3 Variability in status epilepticus detection between individual experts and Claritγ status alert compared to expert consensus

Discussion

In this study, we describe the validation of an artificial intelligence machine learning algorithm for seizure burden measurement using EEG data acquired with Rapid Response EEG. The algorithm showed high sensitivity (100%) in the detection of status epilepticus (even outperforming some neurologists with fellowship training in clinical neurophysiology or epilepsy) and accurately identified 88% of seizures of any duration when compared with the reference standard of expert neurologist consensus (Fig. 4). The specificity of the algorithm for seizures at all seizure burden thresholds was principally limited by “misclassification” of highly epileptiform patterns. These patterns possess multiple “seizure-like” qualities (e.g., rhythmicity, sharply contoured morphology, high amplitude, and correlation across channels) that the algorithm was trained to recognize as indicative of seizure activity. Raising the seizure burden threshold required to trigger an alarm (i.e., alerting the user only when the burden reaches the 90% level) increased the algorithm’s specificity for prolonged seizures to as high as 93%. As such, the algorithm’s performance can be considered reliable and valid at the extremes of the spectrum of seizure burden (i.e., possible status epilepticus on one end and normal or diffusely slow activity on the other), but considerably less so when the burden of abnormality is less frequent or brief in duration. While the algorithm output provides a trendline and a means to assist in triage and prioritization, it is important for users to understand its lower specificity for milder abnormalities and shorter-duration seizures.

Fig. 4

Summary of Claritγ Performance. Performance of the Claritγ algorithm at the group level suggests that it can serve as a reliable triage tool to help detect cases of status epilepticus with the highest sensitivity (while overcalling about one fourth of highly epileptiform patterns as possible status epilepticus). It also performs as a reliable triage tool to help physicians avoid overaggressive treatment in the majority of EEG cases in which the predominant pattern is either slow or normal. HEP highly epileptiform patterns, NL normal activity, RDA rhythmic delta activity, SE status epilepticus, SL slow activity, SZ seizure

Suggested Clinical Implementation Workflow

In considering the potential clinical implications of Rapid Response EEG with Claritγ, it is important to ensure proper integration into existing workflows to secure its maximum (and safe) impact on physicians’ clinical decision making and patient management. We remind the reader that the output of any AI algorithm ought to be interpreted in the context of the user’s pretest clinical judgment. In keeping with this, we have made an initial attempt to recommend a possible workflow for using the algorithm within the current clinical practice of Rapid-EEG (Table 4). We are mindful that the workflow may need to be modified in different settings depending on the resources available and the clinical division in which the Rapid-EEG technology is being utilized.

Table 4 Suggested clinical implementation workflow of Claritγ algorithm

As suggested in this workflow, the Claritγ algorithm has potential utility as a first-of-its-kind risk stratification tool to streamline the practice of stat EEG and to guide emergent triage and more precise treatment for patients with either low or high suspicion for nonconvulsive seizures. It may also lessen the stress and unnecessary burden on neurologists with EEG expertise. We are hopeful that the simplified risk stratification offered by the algorithm, combined with earlier and easier acquisition of EEG, will lead to better management of patients in need.

Study Limitations

The lack of clinical data provided to reviewers during their retrospective EEG review was the most important limitation of this study and should be considered highly relevant to the interpretation of its findings. In clinical practice, patient history (e.g., an established diagnosis of epilepsy with a known seizure semiology, recent antiseizure medications) and bedside or video observations (e.g., twitching or behavioral changes) are used to inform EEG interpretation and make definitive determinations of seizure versus nonseizure, and treatment is tailored to the patient as a whole rather than the EEG in isolation. Claritγ, like all other automated seizure detection algorithms, does not consider ancillary clinical information, and it is critical to acknowledge that output from such an algorithm does not provide the final diagnostic conclusion for the patient. Indeed, the sensitivity and specificity of any system for EEG review (human or machine) would be significantly improved by access to additional patient data.

The implementation of Claritγ and its impact on real-world diagnostic and treatment decisions were not studied here and may be addressed in future investigations. As noted in the discussion of clinical workflow, individual providers or practice groups may utilize different alarm thresholds to expedite EEG review and treatment or establish treatment protocols that rely on the various functions of Rapid-EEG (visual waveform review, EEG sonification, Claritγ seizure burden) based on providers’ comfort with each.

Our cohort contained a relatively low number of status epilepticus cases (nine of 353 EEGs), which limits the precision of our sensitivity estimate; a larger cohort of Rapid-EEG data would be helpful to address this. However, it should be noted that this multisite cohort was obtained without excluding any cases and can be considered reflective of the population being studied.

The Rapid-EEG system lacks midline and parasagittal electrodes; consequently, a focal seizure that remains highly localized to the midline or parasagittal region would not be captured by Rapid-EEG or identified by Claritγ (although prior research [37,38,39] has argued that this lack of coverage might not significantly impact sensitivity in critical care and emergency medicine settings).

Conclusions

Implementation of artificial intelligence tools in the fields of neurology and epileptology remains limited despite the urgent need for tools to accommodate the expanding practice of EEG and to address the inefficiencies of the current EEG infrastructure. Claritγ provides highly sensitive detection of status epilepticus and may be useful as a risk stratification tool that could expedite the diagnosis and treatment of patients with nonconvulsive seizures. We believe that machine learning tools will never replace a careful history and examination filtered through the well-honed clinical acumen of the user, and we recommend that such tools be used safely by interpreting their output in the context of proper clinical judgment.