Background

Epilepsy is one of the most common neurological diseases, affecting more than 68 million people worldwide [1]. Although antiepileptic drugs (AEDs) are currently the primary treatment option for patients with epilepsy (PWEs), about 40% of PWEs develop drug-resistant epilepsy (DRE) [2]. According to the International League Against Epilepsy, DRE is defined as the failure to achieve sustained seizure freedom after adequate trials of two tolerated and appropriately chosen AED schedules (as monotherapies or in combination) [3]. The mechanism of DRE is not fully understood and may be related to the sensitivity of drug targets, the activity of drug transporters, cytochrome P450 enzymes, structural neural networks, and other potential causes of epilepsy [4]. Abrupt and repetitive seizures may lead to neurobiochemical changes in the brain, cognitive decline, and serious psychological problems in PWEs, which can severely affect patients’ quality of life and increase the burden on their families. Accurate prediction of the efficacy of AEDs before the initiation of treatment can reduce the use of ineffective drugs, alleviate patients’ suffering, and improve their prognosis.

Doctors and patients often find it hard to decide whether to reduce or stop AEDs. Although PWEs can have better cognitive performance and a higher quality of life after AEDs withdrawal, there is also an increased risk of recurrence. To avoid recurrence, many patients decide to put up with the side effects rather than withdraw the AEDs completely. Therefore, there is an urgent need for effective and practical decision-making tools to help clinicians plan the course of AEDs treatment and withdrawal and to help realize precise, personalized treatment.

Prediction models can integrate multiple clinical or non-clinical parameters collected within a certain time to calculate the probability of a diagnostic outcome or a disease prognosis. These models can stratify patients by risk to support clinical decision-making and improve the prognosis and quality of care for patients [5]. Prediction models are divided into two main categories: those based on statistics and those based on machine learning algorithms (MLAs).

Statistical prediction models are developed with statistical methods, such as univariate and multivariate logistic regression or Cox regression analysis, which are used to select prediction variables. The selected variables are then integrated to calculate the probability of a particular diagnosis or disease prognosis [5]. The development of a statistical prediction model involves collection of datasets, selection of prediction variables, development of the model, evaluation of the model’s performance, internal and external validation, and further updating of the model. This type of method can be used to create an easy-to-use prediction scoring system [6]. One example is the Framingham risk score, which is widely used in the public health field for estimating the probability that an individual will develop cardiovascular disease within the next 10 years. This model was built on the traditional prediction variables of age, sex, systolic blood pressure, hypertension treatment, total and high-density lipoprotein cholesterol levels, smoking, and diabetes [7].
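
As a rough illustration of how a regression model can be turned into a scoring system, the following minimal sketch converts logistic regression coefficients into integer points. The intercept, the coefficients, and the variable names are hypothetical values chosen for illustration and are not taken from any of the cited studies; scaling by the smallest coefficient is only one of several possible conventions.

```python
import math

# Hypothetical log-odds coefficients, for illustration only.
INTERCEPT = -2.0
COEFFICIENTS = {"abnormal_eeg": 1.2, "early_onset": 0.6, "neuro_deficit": 1.8}


def predicted_probability(patient):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * patient.get(name, 0) for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


def points_table(coefficients):
    """Divide each coefficient by the smallest one and round to integer points."""
    base = min(abs(value) for value in coefficients.values())
    return {name: round(value / base) for name, value in coefficients.items()}


def total_points(patient, table):
    """Sum the points of every predictor that is present (coded 1) in the patient."""
    return sum(table[name] * patient.get(name, 0) for name in table)


if __name__ == "__main__":
    patient = {"abnormal_eeg": 1, "neuro_deficit": 1}  # hypothetical patient
    table = points_table(COEFFICIENTS)
    print(table, total_points(patient, table), predicted_probability(patient))
```

Rounding to integer points trades some accuracy for bedside usability, which is why a score derived this way should be validated in its own right.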

Machine learning, a branch of artificial intelligence, uses statistical and computer science approaches to induce algorithms from data and improve model performance, and it has shown potential for industrialization. Machine learning has been applied to fields such as speech recognition, image classification, text translation, and medical care [8], for example in the detection of critical findings in head computed tomography scans [9] and the classification of cancer [10]. Machine learning has been reported to achieve higher accuracy in diagnosis and outcome prediction than manual assessment by clinical experts, and it can also be used in epilepsy, especially for automated seizure detection, analysis of imaging and clinical data, epilepsy localization, and prediction of medical and surgical outcomes [11]. Validation techniques such as hold-out validation, k-fold cross-validation, and leave-one-out cross-validation can be used to estimate the performance of a model. In the following sections, we give a brief overview of the performance of statistical prediction models and MLAs in predicting AEDs treatment response and patients’ outcomes after AEDs withdrawal.
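
The three validation schemes mentioned above differ only in how the samples are split into training and test sets. The following sketch, written with the Python standard library, shows one possible way to generate the splits; the function names, the default test fraction, and the fixed random seed are assumptions made for illustration.

```python
import random


def hold_out_indices(n_samples, test_fraction=0.2, seed=0):
    """Single random split into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k, seed=0):
    """Yield k (train, test) splits; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def leave_one_out_indices(n_samples):
    """Special case of k-fold with k equal to the number of samples."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 5):
        print(sorted(test), "held out;", len(train), "used for training")
```

In each scheme the model is fitted on the training indices only and scored on the held-out indices; for k-fold and leave-one-out validation the scores are then averaged over the splits.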

Prediction models for the response to AEDs treatment

Because treatment efficacy varies among individual PWEs, drug selection relies mainly on official guidelines and the clinical experience of doctors. Personalized selection of effective AEDs remains a major challenge. Some prospective studies have identified certain predictive factors of DRE, such as early onset, sex, duration of epilepsy, multiple seizure types, comorbidities, history of central nervous system infection, cognitive impairment, epilepsy syndrome, presence of structural abnormalities on magnetic resonance imaging (MRI), previous history of status epilepticus, family history of epilepsy, history of perinatal brain injury, and certain electroencephalography (EEG) features [12,13,14,15]. The identification and determination of these parameters are the basis for creating a predictive model.

Statistical prediction models

Statistical methods can be used to select variables, integrate them into a model, and create a scoring system for a specific purpose; models of this type have been tested in the field of epilepsy. Boonluksiri et al. [16] enrolled 308 children with epilepsy in a retrospective study, selected the age at onset, prior neurological deficits, and abnormal EEGs as variables, and established a scale for predicting DRE in children. The children were divided into three groups according to the risk of developing DRE: low risk (score < 6 points), moderate risk (score 6–12 points), and high risk (score > 12 points), with positive likelihood ratios of 0.5, 1.8, and 12.5, respectively, and an area under the receiver operating characteristic curve (AUC) of 0.76. However, as this retrospective study was conducted in a single center with a small sample size and lacked both internal and external validation, the performance and practicality of this model need further validation. In a previous study [15], we developed a scale for predicting DRE in adult patients with MRI-negative epilepsy [MRI(−)DRE]. The AUC was 0.89 and the risk stratification was: low risk (0–3 points), medium risk (3–5 points), and high risk (> 5 points). The probability of DRE could also be calculated with this scale. However, the scale was limited by its retrospective design and was based on data from only 132 patients, so further validation is needed. Latzer et al. [17] conducted a retrospective study of 118 patients at the Tel Aviv Medical Center in Tel Aviv, Israel, to create a model for predicting DRE in children with cerebral palsy. The model was composed of four parameters (low Apgar score at 5 min, neonatal seizures, focal-onset epilepsy, and focal slowing on EEG) and the AUC was 0.84. Although their model helped to identify which patients would achieve better seizure control, the study had a small sample size and the lack of validation makes it difficult to judge the model’s performance (Table 1).
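
To show how a risk group and its likelihood ratio can be read clinically, the sketch below applies the score cut-offs and likelihood ratios reported for the pediatric scale [16] and converts a pre-test probability of DRE into a post-test probability with the standard odds form of Bayes’ theorem. The pre-test probability of 0.3 is a hypothetical value chosen for illustration, and the function names are not from the cited study.

```python
# Cut-offs and likelihood ratios as reported above for the pediatric scale [16].
LIKELIHOOD_RATIOS = {"low": 0.5, "moderate": 1.8, "high": 12.5}


def risk_group(score):
    """Map a total score to a risk group: < 6 low, 6-12 moderate, > 12 high."""
    if score < 6:
        return "low"
    if score <= 12:
        return "moderate"
    return "high"


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test odds = pre-test odds x likelihood ratio, converted back to a probability."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    pre_test = 0.3  # hypothetical pre-test probability of DRE
    for score in (4, 9, 14):
        group = risk_group(score)
        probability = post_test_probability(pre_test, LIKELIHOOD_RATIOS[group])
        print(score, group, round(probability, 3))
```

A likelihood ratio below 1 lowers the probability of DRE relative to the pre-test value, and a ratio above 1 raises it, which is why the high-risk group is the most informative of the three.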

Table 1 Statistical prediction models for the response to AEDs treatment

In summary, a few comprehensive statistical models with multiple variables have been established for predicting the response to AEDs treatment, but they are limited by their retrospective design, small sample sizes, and the lack of internal and external validation. To address these limitations, more prospective, multi-center studies with large sample sizes are required. In addition, these models need to be verified in multiple centers.

Machine learning algorithms

MLAs can extract a larger number of EEG, imaging, and clinical features from patients to build prediction models, and their performance can be validated with a wider range of methods. MLAs can also be readily applied in artificial intelligence-based industrialization, an extremely relevant and competitive field today.

UCB Pharma has been actively involved in research on the development and validation of MLAs for predicting the effectiveness of AEDs in individual PWEs. Devinsky et al. [18] at the New York University Medical Center, based on the UCB–IBM collaboration, explored the application of MLAs to construct an algorithm for AEDs prescription. A total of 50,000 PWEs were retrospectively enrolled in the study and randomly divided into a training group of 40,000 patients and a testing group of 10,000 patients. Roughly 5,000 features were extracted to build the prediction model, which had an AUC of 0.72 and was considered to have good predictive power. The patients who received the model-predicted AEDs regimen had significantly higher survival rates than those who received another treatment. There were large discrepancies between the model-predicted AEDs regimens and the actually prescribed regimens in the frequency of use of certain AEDs or their combinations. The model performed even better than epileptologists in clinical scenarios of monotherapy with levetiracetam or lamotrigine. Regrettably, only 13% of the actually prescribed AEDs regimens matched the regimen chosen by the model. Although this model was based on a large sample and was applied in clinical practice, an obvious limitation is the lack of external validation. Thus, it still needs to be optimized to improve its accuracy. An et al. [19] recently trained and tested three algorithms, i.e., multivariate logistic regression analysis, the support vector machine, and the random forest algorithm, to identify patients at high risk of DRE. A total of 292,892 patients met the inclusion criteria for epilepsy; 175,735 of them were assigned to the training cohort and the other 117,157 were assigned to the test cohort, and 1,270 features were screened as predictive factors. The random forest algorithm had an AUC of 0.76 and performed the best of the three models. It could predict the emergence of DRE approximately 2 years before a patient failed two AEDs trials. The drawback of this study was that it was retrospective and lacked external validation. Furthermore, the DRE incidence was only 13.1%, which was lower than that in other studies, indicating that the dataset had significant limitations.

A number of pharmacogenomic studies have focused on identifying single nucleotide polymorphism (SNP) markers for predicting the outcomes of AEDs treatment, and some studies have tried to establish multi-SNP models to predict the response to AEDs. Petrovski et al. [20] prospectively collected the genetic results of patients with newly diagnosed epilepsy and developed a multi-SNP classification model, based on the k-nearest neighbor supervised learning approach, to predict seizure freedom 1 year after AEDs treatment. Their study included 115 patients: 80% of them (92 cases) were enrolled in the training cohort and 20% (23 cases) were enrolled in the validation cohort. Two hundred and seventy-nine candidate genes were involved and five SNPs [rs658624 (SCN4B), rs678262 (SCN4B), rs2808526 (GABBR2), rs4869682 (SLC1A3), and rs2283170 (KCNQ1)] were selected for the final model. The model showed a good predictive accuracy of 83.5% in the developmental cohort by cross-validation; its sensitivity and positive predictive values were all above 80% in the two independent validation cohorts. However, the sample size of this study was small, external validation was lacking, and the model was derived only from genetic variants without involving EEGs, MRIs, or other key clinical and demographic characteristics, which might affect its predictive performance. Shazadi et al. [21] assessed the validity of Petrovski’s algorithm in two UK cohorts of newly diagnosed epilepsy patients, and showed that the multi-SNP prediction model was not predictive of the initial treatment response. They also found that the five SNPs appeared to have an impact on the prescription of carbamazepine or valproate in the UK patients.
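
The k-nearest neighbor approach classifies a new patient by majority vote among the most similar patients in the training cohort. The sketch below illustrates the idea on five SNPs coded as minor-allele counts (0, 1, or 2). The genotypes, the labels, the choice of k = 3, and the use of the Manhattan distance are all assumptions made for illustration; they do not reproduce the published model [20].

```python
from collections import Counter


def knn_predict(train_genotypes, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training genotypes."""
    # Manhattan distance on allele counts (an assumed metric, for illustration).
    distances = sorted(
        (sum(abs(a - b) for a, b in zip(genotype, query)), index)
        for index, genotype in enumerate(train_genotypes)
    )
    votes = Counter(train_labels[index] for _, index in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training cohort: one tuple of five allele counts per patient.
TRAIN_GENOTYPES = [
    (0, 0, 1, 0, 2),
    (0, 1, 1, 0, 2),
    (1, 0, 1, 0, 2),
    (2, 2, 0, 1, 0),
    (2, 1, 0, 2, 0),
    (1, 2, 0, 2, 1),
]
TRAIN_LABELS = ["seizure-free"] * 3 + ["not seizure-free"] * 3

if __name__ == "__main__":
    print(knn_predict(TRAIN_GENOTYPES, TRAIN_LABELS, (0, 0, 1, 1, 2)))
```

Because the prediction depends entirely on the training cohort, a model of this kind can perform well in cross-validation and still fail in a population with different allele frequencies or prescribing habits.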

Some Chinese researchers have also investigated the use of machine learning techniques to predict AEDs effectiveness. Yao et al. [22] built models with five classical MLAs (decision tree, random forest, support vector machine, XGBoost, and logistic regression) to predict the outcomes of AEDs treatment in patients with newly diagnosed epilepsy. They prospectively collected information on 287 patients with newly diagnosed epilepsy and followed up the patients for a minimum of 3 years at the Second Affiliated Hospital of Zhejiang University. The patients were classified into a remission group and a non-remission group according to seizure recurrence, and the former group was further divided into an early remission group and a late remission group. The authors evaluated the performance of the models based on their precision, recall, F1 scores, and AUC values. The results showed that the XGBoost algorithm had the best predictive performance both for distinguishing the remission group from the non-remission group, with an F1 score of 0.947 and an AUC of 0.979, and for distinguishing the early remission group from the late remission group, with an F1 score of 0.836 and an AUC of 0.918. They claimed that the classified prediction could help doctors make clinical decisions and improve treatment strategies. In our previous study [23], we created a model based on the support vector machine (SVM) to predict the possibility of seizure freedom after levetiracetam therapy. In this retrospective study of 46 PWEs treated with levetiracetam, 80% of the patients were used to establish the SVM model and the remaining 20% were used to test it. Eleven clinical variables and four EEG parameters (sample entropies of the α, β, θ, and δ bands) recorded before the start of levetiracetam treatment were extracted. Our SVM model showed an accuracy of 72.2% in a five-fold cross-validation, an accuracy of 75.0% in a jack-knife validation, and an accuracy of 67.7% in a hold-out validation in the training cohort. The prediction accuracy of our model was 90% in the test cohort, and three different verification methods all showed good reliability. The drawbacks of our model were the lack of external validation, the retrospective single-center data, and the small sample size. Furthermore, the kernel function and dimension of the SVM could also have affected the accuracy of the model. Therefore, this model needs to be optimized and its performance must be improved by utilizing a larger dataset (Table 2).
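
The metrics used to compare these classifiers can all be derived from the four cells of a confusion matrix. The following sketch computes precision, recall, F1 score, and accuracy for binary labels; the example labels are hypothetical and serve only to demonstrate the arithmetic.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, F1 score, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]  # hypothetical observed outcomes
    y_pred = [1, 1, 0, 0, 1, 1, 0, 1]  # hypothetical model predictions
    print(classification_metrics(y_true, y_pred))
```

Accuracy alone can be misleading when one outcome is much more common than the other, which is one reason to report precision, recall, and F1 scores alongside it.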

Table 2 Machine learning algorithms for the response to AEDs treatment

While developments in machine learning technology allow more algorithms to be created and applied in the field of AEDs treatment, it is currently difficult to use this technology in clinical practice because of its complexity and the inconsistency of the variables used across models. Specific software or web calculators need to be produced to facilitate clinical use and industrialization of the models.

Prediction models for the outcome of AEDs withdrawal

About 70% of newly diagnosed PWEs can achieve seizure freedom with appropriate AEDs therapy [12], but when to stop AEDs remains a significant challenge for both patients and doctors. Because they fear seizure relapse, many PWEs choose to continue AEDs even after long-term seizure freedom and endure the side effects of the treatment. If PWEs remain seizure free after AEDs withdrawal, their psychological stress and quality of life can be significantly improved. In 2013, the Italian League Against Epilepsy issued guidelines on AED withdrawal in PWEs who had achieved a long period of seizure freedom [24], and these guidelines recommended discontinuation of AEDs treatment after a minimum seizure-free period of 2 years. The earlier the drug is discontinued, the higher the chance of seizure recurrence. Some factors, such as abnormal EEGs, mental retardation, perinatal insults, abnormal neurologic signs, partial seizures, older age at onset, and female sex, can independently increase the risk of seizure relapse. Although the guidelines systematically evaluated certain independent variables for AEDs withdrawal, they did not provide an integrated and comprehensive model for predicting the outcome of AEDs withdrawal.

Statistical prediction models for AEDs withdrawal

In 2017, Lamberink et al. [25] established two nomograms to predict seizure recurrence and the occurrence of seizures in the last year of follow-up after AEDs withdrawal in seizure-free patients. They first performed a systematic review and meta-analysis to identify relevant studies and then invited the authors of those studies to participate in the research; ten studies with 1,769 PWEs were ultimately included. The adjusted concordance statistics were 0.65 for predicting recurrence and 0.71 for predicting long-term freedom, and the calibration plots showed good performance of both models. A web-based calculator was subsequently built for practical purposes. The study had a large sample size, and the models were representative to a certain extent and had some clinical value. However, because these nomograms were established from a pooled analysis of previously published data, the clinical variables were not fully uniform and multiple imputation was used to deal with the missing data. Furthermore, there was a lack of external validation. Therefore, the generalizability of these nomograms requires further verification by different research teams. Lin et al. [26] from the First Affiliated Hospital of Wenzhou Medical University performed an external validation of the Lamberink model. The AUCs for predicting recurrence and long-term outcomes were 0.71 and 0.68, respectively. The calibration plots showed that the Lamberink two-year model had a good fit, and the decision curve analysis also showed good performance of this model. Lin’s research showed that the Lamberink two-year model may have greater value in guiding drug withdrawal in adult PWEs than other models.
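
For a binary outcome, the concordance statistic is the proportion of patient pairs (one with the event, one without) in which the patient with the event received the higher predicted risk, with ties counted as one half; in this setting it equals the AUC. The sketch below implements this simplified binary version as an illustration. It is not the time-to-event calculation that a survival model would require, and the example risks and outcomes are hypothetical.

```python
def concordance_statistic(risks, outcomes):
    """Concordance statistic for binary outcomes (1 = event); ties count as 0.5."""
    events = [r for r, o in zip(risks, outcomes) if o == 1]
    non_events = [r for r, o in zip(risks, outcomes) if o == 0]
    if not events or not non_events:
        raise ValueError("at least one event and one non-event are required")
    concordant = 0.0
    for event_risk in events:
        for non_event_risk in non_events:
            if event_risk > non_event_risk:
                concordant += 1.0
            elif event_risk == non_event_risk:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


if __name__ == "__main__":
    risks = [0.9, 0.8, 0.4, 0.3, 0.4]  # hypothetical predicted recurrence risks
    outcomes = [1, 0, 1, 0, 0]         # hypothetical observed recurrences
    print(concordance_statistic(risks, outcomes))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which helps to put reported values between 0.65 and 0.73 into perspective.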

Lamberink et al. [27] also created two nomograms for individualized prediction of recurrence and long-term outcomes of AEDs withdrawal after pediatric epilepsy surgery. They included 766 children from 15 centers, and the final models were composed of 3–5 factors. The adjusted concordance statistic was 0.68 for predicting seizure recurrence and 0.73 for predicting long-term seizure freedom, and the calibration plots showed good performance. A visualized prediction tool is also provided online. In addition to the large sample size from multiple centers, the validation of these nomograms was executed well and supported by a web-based calculator. This indicates that these models have high clinical value for recommending the cessation and withholding of AEDs after pediatric epilepsy surgery. However, as there was no external validation, the application of the nomograms in other populations remains to be tested (Table 3).

Table 3 Statistical prediction models for AEDs withdrawal

Machine learning algorithms for the prognosis after AEDs withdrawal

There has been no report of MLA use for the prognosis or prediction of recurrence after AEDs withdrawal. Future research is needed.

Conclusion

Several predictive models for AEDs regimen selection and withdrawal in epilepsy have been developed to facilitate clinical decisions, but they share some common problems caused by small sample sizes or inconsistent parameters. In future studies, a core set of parameters should be established first. Then, more prospective, multi-center studies with large sample sizes should be conducted to develop predictive models that can be widely accepted in the field of AEDs treatment, in order to improve the prognosis and quality of life of patients with epilepsy.