Predicting the post-treatment recovery of patients suffering from traumatic brain injury (TBI)

Predicting the evolution of individuals is a rather new mining task with applications in medicine. Medical researchers are interested in the progression of a disease and in how patients evolve or recover when they are subjected to a treatment. In this study, we investigate the evolution of patients after brain trauma on the basis of medical tests recorded before and after treatment: we want to understand to what extent a patient can become similar to a healthy participant. We face two challenges. First, we have less information on healthy participants than on the patients. Second, the values of the medical tests for patients, even after treatment has started, remain well-separated from those of healthy people; this is typical for neurodegenerative diseases, but also for further brain impairments. Our approach encompasses methods for modelling patient evolution and for predicting the health improvement of different patient subpopulations, i.e. predicting the label of whether they recovered or not. We test our approach on a cohort of patients treated after brain trauma and a corresponding cohort of controls.


Introduction
In recent years, methods from machine learning and data mining have increasingly been used in the epidemiological and clinical domains. These methods help clinicians to study the causes, effects and progression of diseases, as well as their treatments. In the context of brain-related degenerative diseases, e.g. traumatic brain injury and mild cognitive impairment, medical researchers want to analyse and monitor the patients suffering from such a disease as they evolve over time. In particular, they would like to answer questions like: have the patients reached a state similar to that of healthy people? Given a patient's current state, what is the most suitable treatment regime that can be recommended? How likely is it for a certain patient to recover from the disease? In order to provide answers to these questions, we propose mining methods that learn evolutionary predictive models over the evolving cohort of patients. These methods determine (1) whether the patients have achieved a state like that of healthy people, by juxtaposing them to a cohort of controls, and (2) whether a patient, given his current state, will show recovery after being prescribed a treatment regime or plan.
The study of patient evolution on the basis of timestamped clinical data has been largely influenced by the seminal work of Cox [1] on censored failure times and age-specific failure rates. As pointed out by Fitzmaurice et al. [2], the work of Cox [1] "...was followed by a rich and important body of work that established the conceptual basis for the modern survival analysis" [2]. Survival analysis is not applicable to our problem, because there is neither a well-defined target event, nor explicit timepoints to guide the learner. Although there is a control population to juxtapose the patients to, there are no target values to predict, because the assessments of the controls are very different from those of the patients. To acquire the labels for the patients, we rely on the recommendations of the clinical experts. We present here a method that learns an evolutionary model from unsupervised data and can also incorporate the labels for supervised evolutionary prediction.
Hospitals have in recent years started to maintain elaborate electronic health records. These store not only the condition or state a certain patient is experiencing (for example, blood pressure, pulse rate, sugar level, etc.) but also keep track of the medications, their impact and side effects. An important challenge with respect to the impact of a treatment emerges when the desirable target state is not well defined: if clinical data show that patients after treatment are in a different state than before treatment, but they do not exhibit the abilities of a comparable healthy population (controls), what can then be concluded about the impact of treatment? We propose a method that predicts how a treatment improves the state of brain trauma patients, although there is no well-defined target state and the control population exhibits features (values in medical tests) that patients cannot reach.
We study recordings of patient and control cohorts over a certain time horizon. Longitudinal analysis of cohorts is an established and mature field of research in statistics. The earliest studies in longitudinal analysis stemmed from studies on morbidity and mortality [2].
The contributions of our approach are as follows. We model the evolution of subpopulations of patients, for whom only two moments are available, whereby these two moments are not defined as timestamps. We use this model to compute a future/target state for each patient, and also compute recovery labels based on clinical recommendations. We show that the projected target state of patients allows a reasonable comparison to a control population, the recordings of which are very different from the patient recordings.

Related work
Data mining methods have only recently been deployed for the analysis and prognosis of brain pathologies or injury conditions. The authors of [3] use different methods (e.g. decision trees, multilayer perceptrons and general regression neural networks) to analyse data from neuropsychological tests (concerning attention, memory and executive function) from 250 subjects before and after a cognitive treatment instrumented by a cognitive tele-rehabilitation platform. Their objective is to predict the expected outcome based on the cognitive affectation profile and the performance on the rehabilitation tasks. Our objective is not the prediction of a well-defined outcome, but rather of the future similarity between treated patients and a population of healthy people.
In [4], the authors present an artificial neural network model that predicts in-hospital survival following traumatic brain injury according to 11 clinical inputs. A similar approach was taken by Shi et al. [5], who also consider neural networks and logistic regression, but rather study recovery from brain surgery. An early discussion of methods for the prediction of recovery from brain injury, including short-term evolution of patients, can be found in [6]. The effect of cognitive therapies along longer periods (6 months to 1 year) is studied in [7,8]. Brown et al. learn decision trees on variables that include physical examinations and indices measuring injury severity, as well as gender, age and years of education [7]. Rovlias and Kotsou further consider pathological markers (hyperglycemia and leukocytosis) and the output of computer tomography, and learn CART trees [8]. Our study is different from the aforementioned ones, because we do not learn a model on patient recovery (we do not have recovery data), but rather study the evolution of the patients towards a control population.
There are studies [9][10][11][12][13] that track the responses to cognitive-behavioural treatments for brain-related disorder, e.g. post-traumatic disorder, mild cognitive impairment and traumatic brain injury. These studies aim towards finding the response groups based on their developmental trajectories. Methods include group-based trajectory modelling [10,13] and growth mixture modelling (GMM) [11,12]. In [13], the method learns developmental trajectories of groups with distinct cognitive change patterns; it uses a cohort of MCI patients. In [12], the authors study the progress of the PTSD (post-traumatic stress disorder) patients on two different therapeutic protocols. Their aims were to identify distinct trajectories of treatment response and to test whether pre-treatment markers predict assignment to those trajectories.
Close to our work are the methods of Tucker et al. [14] and Li et al. [15], who predict the progression of glaucoma from cross-sectional data (rather than longitudinal data). The methods learn temporal models on trajectories. A trajectory is built by fitting so-called "partial paths" upon the cross-sectional data: path construction involves selecting one healthy individual and one patient, labelling them as start and end, and then re-ordering the remaining cross-sectional instances based on the shortest path from start to end. Our approach shares with [15,14] the need to construct a trajectory of evolution. In principle, we could construct a "partial path" by combining the recordings of the controls and the recordings of the patients during treatment. But this would imply ignoring part of the already available temporal information (pre-treatment data). Moreover, the Trauma Brain Injury dataset of [16], which we use, shows that the control individuals are too different from the patients: this might lead to overlong and unrealistic partial paths. Thus, we rather build a single, projected moment, using data before and after the beginning of treatment, and we do not involve the recordings of the controls in our learning process.
A separate thread of work models and monitors how subpopulations (clusters) evolve over time. The framework MONIC [17] encompasses a set of 'transitions' that a cluster may experience, a set of measures and a cluster comparison mechanism that assesses whether a cluster observed at some timepoint has survived, disappeared, merged or become split at the next timepoint. Later frameworks [18,19] build upon MONIC to explain evolution: they model the clusters and their transitions as nodes, resp. edges of an evolution graph. In [20], we build upon [19] to learn a Mixture of Markov chains that capture the evolution of different subpopulations. We take up the idea of subpopulations here, but our goal is to predict rather than model the evolution of the subpopulations.
There are also studies concentrating on how individual objects evolve over time. Gaffney and Smith [21] model the evolution of an object as a trajectory and cluster together objects that evolve similarly. Krempl et al. [22] extend [21] into the online algorithm TRACER that discovers and adapts the clusters as new observations of existing objects arrive and new objects appear.

Material
The traumatic brain injury dataset (TBI) contains assessments on cognitive tests for 15 patients with brain injury and for 14 controls [16]. These tests are recorded once for the controls and twice for the patients, at moments t_pre and t_post. The cognitive tests are listed in Table 1 with their acronyms; a detailed presentation can be found in [16].

Learning a ground truth for the TBI dataset
The data in the TBI dataset are not labelled. For the two timepoints (i.e. t_pre and t_post), we are only provided with the scores describing how each patient fared on the different cognitive tests. In order to compute the labels of the patients after they had undergone treatment, i.e. for timepoint t_post, we use the method presented in the following subsection.
Ground truth: The opinion of the medical experts suggests that if the computed difference between the pre-treatment and post-treatment values of an individual is high, it is more likely that the individual has recovered from the traumatic injury. For our experiments, these extracted labels also serve as the ground truth. Our ground truth estimation method uses a similar approach but incorporates additional information. The method is outlined in the following.
1. Compute, for each patient separately, the difference between the values recorded for the variable ICP or WNC during the pre-treatment phase and the post-treatment phase.
2. Plot the pre-treatment values of the chosen variable (i.e. the one used in step 1) against the computed difference of this variable from the pre-treatment phase to the post-treatment phase. We depict an example plot in Fig. 1.
3. Separate the patients into a number of classes based on the regions they fall into within the plot. In Fig. 1, we depict the regions.
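The labeling steps above can be sketched as follows. This is a minimal illustration, assuming a single threshold on the score difference stands in for the regions of the plot; the actual region boundaries used in the paper are not reproduced here.

```python
def label_recovery(pre_scores, post_scores, diff_threshold=0.0):
    """Assign a recovery label per patient from the change in one test score
    (e.g. ICP or WNC) between t_pre and t_post.  The threshold is an
    illustrative assumption, not the boundary used in the paper."""
    labels = []
    for pre, post in zip(pre_scores, post_scores):
        diff = post - pre  # step 1: post-treatment minus pre-treatment value
        # steps 2-3: the (pre, diff) plane is cut into regions; here a single
        # threshold on the difference stands in for those regions
        labels.append("recovered" if diff > diff_threshold else "not_recovered")
    return labels

labels = label_recovery([10, 25, 40], [30, 24, 41])
# -> ['recovered', 'not_recovered', 'recovered']
```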

Predicting the recovery of patients
In this section, we present our evolutionary label prediction method EvoLabelPred. This method is based on the unsupervised instance prediction method EvolutionPred of Siddiqui et al. [23]. EvoLabelPred takes as input a labelled longitudinal dataset of individuals. It learns a clustering model over the individual timepoints (i.e. t_pre and t_post), and then learns a cluster-based transition model, the "cluster evolution graph", by discovering transitions or relationships between the clusters across timepoints. EvoLabelPred uses this cluster transition model to predict the labels of the individuals. In the following, we first describe the learning of the transition model and then explain how it is used for predicting the labels of the individuals. A list of the used symbols is given in Table 2.

Bootstrap sampling
EvoLabelPred learns the prediction model from the set of patients X. Since the cardinality of X is small (as is the case for many cohort datasets), we learn an ensemble of models by performing bootstrap sampling over X. The sampling is done without replacement, and both instantiations of each out-of-sample patient (i.e. x_pre and x_post) are removed from t_pre and t_post.
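A minimal sketch of this sampling scheme, using only the standard library; the function and parameter names are illustrative, not taken from the paper:

```python
import random

def draw_samples(patient_ids, frac=0.85, n_models=10, seed=0):
    """Draw samples of patients without replacement.  A patient left out of a
    sample is removed together with both of its instantiations
    (x_pre and x_post), so each sample is a subset of whole patients."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    k = max(1, round(frac * len(ids)))  # sample size as a fraction of the cohort
    return [rng.sample(ids, k) for _ in range(n_models)]

samples = draw_samples(range(15), frac=0.85, n_models=3)
```

One ensemble member is then learned per sample and validated on the held-out patients.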

(Excerpt from Table 1 — FAS: phonetic fluency test that uses the letters F, A and S as cues, i.e. as the initial letters with which the patients start the production of words; ICP: measures a subject's ability to perform daily activities, and awareness of the disease.)

Building a cluster evolution graph
The cluster evolution graph G is learned over each bootstrap sample. Before G can be learned, EvoLabelPred first learns clustering models f_pre and f_post over the instances of the patients from t_pre and t_post, respectively. We apply K-Means over the instantiations at each moment t and build a set of clusters f_t.
For learning G, we use concepts similar to MONIC [17] and FingerPrint [24] to identify cluster transitions from t_pre to t_post. For each pair of clusters c ∈ f_pre and c' ∈ f_post, we compute the extent to which they contain instances of the same patients. We define their intersection as

c ∩ c' = {x ∈ X | x_pre ∈ c ∧ x_post ∈ c'}

and their union as

c ∪ c' = {x ∈ X | x_pre ∈ c ∨ x_post ∈ c'}

If c ∩ c' ≠ ∅, we draw an edge (c, c') and assign to it the weight w_(c,c') = |c ∩ c'| / |c|. The learned transition graph G is a directed graph, and the weights of all edges originating from a cluster c sum up to 1, i.e. Σ_{c' ∈ f_post} w_(c,c') = 1. We further define

first_match(c) = argmax_{c' : (c,c') ∈ G} w_(c,c')   (1)

i.e. the first_match of a pre-treatment cluster c is the post-treatment cluster with the highest weight among the clusters linked to c.
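The transition weights and the first_match of a cluster can be sketched as follows; the weight is taken as the fraction of c's patients that move to c', which makes the weights out of each cluster sum to 1, as required above:

```python
from collections import Counter

def transition_weights(pre_assign, post_assign):
    """pre_assign / post_assign: dicts mapping each patient to its cluster at
    t_pre / t_post.  Returns w[(c, c')] = |c intersect c'| / |c|, so the
    weights of all edges leaving a pre-treatment cluster c sum up to 1."""
    pre_sizes = Counter(pre_assign.values())
    inter = Counter((pre_assign[x], post_assign[x]) for x in pre_assign)
    return {(c, c2): n / pre_sizes[c] for (c, c2), n in inter.items()}

def first_match(c, weights):
    """Post-treatment cluster with the highest transition weight from c (Eq. 1)."""
    linked = {c2: w for (c0, c2), w in weights.items() if c0 == c}
    return max(linked, key=linked.get)

pre = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}
post = {"p1": 0, "p2": 1, "p3": 1, "p4": 1}
w = transition_weights(pre, post)  # w[(0, 1)] = 2/3, w[(0, 0)] = 1/3
```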
In Fig. 2(a), we show the instantiations of example individuals at timepoints t_pre (yellow) and t_post (aubergine); the corresponding clusters are in Fig. 2(b); the transition arrows along with the transition weights are shown in Fig. 2(c). The yellow star indicates the "projection" of the individual marked as a red star; projections are explained hereafter.

Projecting patients into the future
Let x ∈ X be a patient, let c ∈ f_pre be the cluster containing x_pre, and let c_fm = first_match(c) as of Eq. 1.
Hard projection: We define the hard projection of x from t_pre to t_proj as the instantiation of x in which the value of each attribute a ∈ A is determined by the value in x_pre and by the centroids of c and c_fm:

x_proj(a) = x_pre(a) + (ĉ_fm(a) − ĉ(a))

The projection is done for each attribute a ∈ A. We denote the centroid of an arbitrary cluster clu as ĉ_clu.
Soft projection: We define the soft projection of x from t_pre to t_proj as an instantiation whose values are influenced by all clusters in f_post that are linked to c:

x_proj(a) = x_pre(a) + Σ_{c' : (c,c') ∈ G} w_(c,c') · (ĉ'(a) − ĉ(a))

The projection is again done for each attribute a ∈ A; w_(c,c') is the weight of a transition edge. Hence, we learn the models f_pre and f_post on some individuals and then assess the projection location of other (or the same) individuals. In Fig. 2(c), we show the soft projection of an individual (red star): the projected position is outside both post-treatment clusters, since the individual is located at the rim of the pre-treatment cluster.
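A minimal sketch of the two projections, assuming each one shifts x_pre by a (weighted) displacement between cluster centroids; the exact formulas of EvolutionPred may differ from this reading:

```python
def hard_projection(x_pre, centroid_c, centroid_fm):
    """Shift x_pre, attribute by attribute, by the displacement between the
    centroid of its own cluster c and the centroid of first_match(c)."""
    return [x + (m - c) for x, c, m in zip(x_pre, centroid_c, centroid_fm)]

def soft_projection(x_pre, centroid_c, linked):
    """linked: list of (weight, centroid) pairs for every post-treatment
    cluster reachable from c.  The displacement is the transition-weighted
    average of the centroid shifts."""
    proj = list(x_pre)
    for w, cent in linked:
        for a in range(len(proj)):
            proj[a] += w * (cent[a] - centroid_c[a])
    return proj

hard = hard_projection([1, 1], [0, 0], [2, 2])               # [3, 3]
soft = soft_projection([1, 1], [0, 0],
                       [(0.5, [2, 2]), (0.5, [0, 0])])       # [2.0, 2.0]
```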

Predicting patient recovery
To predict the next label of a patient, we use the prediction method EvoLabelPred, which uses the cluster transition graph G and learns conditional probabilities over each cluster. The method is depicted in Algorithm 2 and explained in the following.
Learning conditional probabilities: For each cluster c ∈ f_pre, we iterate over all the patients that are members of c. For each label l in t_pre and each label l' in t_post, we count the occurrences of patients who undergo the label transition l → l'. We compute the conditional probability as

P(l' | l) = |{x ∈ c : l_pre(x) = l ∧ l_post(x) = l'}| / |{x ∈ c : l_pre(x) = l}|

Label prediction: We define the label prediction l̂_post of x from t_pre to t_post as the label that is computed using the conditional probability model inside each cluster c ∈ f_pre. Let c be the cluster in f_pre that is closest to x; the label can then be computed as

predCL(x, l_pre) = argmax_{l'} P(l' | l_pre)   (5)
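A sketch of this per-cluster label-transition model; the data structures (dicts mapping patients to clusters and labels) are illustrative choices, not from the paper:

```python
from collections import Counter

def learn_label_transitions(members, l_pre, l_post):
    """members: cluster id -> list of patient ids; l_pre / l_post: patient ->
    label at t_pre / t_post.  Estimates P(l' | l) inside each cluster from
    the observed label transitions l -> l'."""
    probs = {}
    for c, patients in members.items():
        pairs = Counter((l_pre[x], l_post[x]) for x in patients)
        base = Counter(l_pre[x] for x in patients)
        probs[c] = {(l, l2): n / base[l] for (l, l2), n in pairs.items()}
    return probs

def pred_cl(c, label_pre, probs):
    """predCL: label l' with the highest P(l' | label_pre) in cluster c."""
    cand = {l2: p for (l, l2), p in probs[c].items() if l == label_pre}
    return max(cand, key=cand.get)

members = {0: ["p1", "p2", "p3"]}
l_pre = {"p1": "sick", "p2": "sick", "p3": "sick"}
l_post = {"p1": "recovered", "p2": "recovered", "p3": "sick"}
probs = learn_label_transitions(members, l_pre, l_post)
# pred_cl(0, "sick", probs) -> "recovered" (probability 2/3)
```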

Evaluation
In this section, we evaluate our methods on predicting the recovery of patients with traumatic brain injury. Details about the dataset have already been presented in Sect. 3.1. Here, we describe our evaluation framework.

Evaluation settings and framework
We have presented two methods: one for projecting a patient into the future given his current state, EvolutionPred, and the other, EvoLabelPred, for predicting the recovery of a patient given his current state and current label, e.g. at t_pre.

Framework for EvolutionPred
To evaluate the performance of the projections from EvolutionPred, we are inspired by the mean absolute scaled error (MASE) [25], which was originally designed to alleviate the scaling effects of the mean absolute error (MAE). To define our variation of MASE, we assume an arbitrary set of moments T = {t_1, t_2, …, t_n}. For an individual x, we define the MASE of the last instantiation x_n as

MASE(x_n) = d(x_n, x̂_n) / ( (1/(n−1)) Σ_{i=2}^{n} d(x_i, x_{i−1}) )

where d(·) is the function computing the distance between two consecutive instantiations of the same individual x, and x̂_n is the projection of x_n. This function normalizes the error of EvolutionPred at the last moment t_n (numerator) by the error of a naive method (denominator), which predicts that the next instantiation of x will be the same as the previous (truly observed) one. If the average distance between consecutive instantiations is smaller than the distance between the last instantiation and its projection, then MASE is larger than 1. Obviously, smaller values are better.
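This variation of MASE can be computed directly from its definition; the Manhattan distance used as the default d(·) here is an illustrative assumption:

```python
def mase(trajectory, projection, dist=None):
    """trajectory: observed instantiations x_1..x_n of one individual;
    projection: the predicted x_n.  Returns the ratio of the projection
    error at t_n to the mean error of the naive 'no change' forecast
    over consecutive observed instantiations."""
    if dist is None:
        # assumed distance: Manhattan distance between attribute vectors
        dist = lambda a, b: sum(abs(u - v) for u, v in zip(a, b))
    naive = sum(dist(trajectory[i], trajectory[i - 1])
                for i in range(1, len(trajectory))) / (len(trajectory) - 1)
    return dist(trajectory[-1], projection) / naive

# consecutive steps move by 2 on average; the projection misses x_n by 1,
# so MASE = 1 / 2 = 0.5 (below 1: better than the naive forecast)
score = mase([[0], [2], [4]], [5])
```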
We further compute Hits(·), i.e. the number of times the correct cluster is predicted for a patient x. Assume that the instantiation x_pre belongs to cluster c_pre, and let c_proj denote first_match(c_pre) (cf. Eq. 1) at the projection moment t_proj. We set Hits(x) = 1 if c_proj is the same as c_post (i.e. the cluster closest to x_post), and Hits(x) = 0 otherwise. Higher values are better.
For model purity, we compute the entropy of a cluster c towards a set of classes ξ, where the entropy is minimal if all members of c belong to the same class, and maximal if the members are equally distributed among the classes. We aggregate this to an entropy value for the whole set of clusters f, entropy(f, ξ).
In general, lower entropy values are better. However, the labels used by the EvolutionPred are Control and Patient: if a clustering cannot separate well between patient instantiations and controls, this means that the patient instantiations (which are the result of the projection done by EvolutionPred) have become very similar to the controls. Hence, high entropy values are better.
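This purity measure can be sketched as follows, aggregating per-cluster label entropy into a size-weighted value for the whole clustering (the weighting scheme is an illustrative assumption):

```python
import math
from collections import Counter

def cluster_entropy(labels_in_cluster):
    """Entropy of the class-label distribution inside one cluster:
    0 if the cluster is pure, maximal if the labels are balanced."""
    counts = Counter(labels_in_cluster)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def clustering_entropy(clusters):
    """Size-weighted average entropy over all clusters of a clustering;
    each cluster is given as the list of labels of its members."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)

pure = cluster_entropy(["Control", "Control"])              # 0.0
mixed = clustering_entropy([["Control", "Patient"]])        # 1.0 (one 50/50 cluster)
```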
For learning the evolutionary prediction model, we use bootstrap sampling [26] with a sample size of 85% and 10,000 replications. Model validation is done with the help of out-of-sample data. For clustering the union of the projected instances and the controls, we use K-Means clustering; there, we use bootstrap sampling with a sample size of 75% and 1,000 replications, and vary K = 2, …, 8.

Framework for EvoLabelPred
In order to evaluate EvoLabelPred, we use accuracy to assess the quality of the computed labels against the ground truth that we established in Sect. 3.2. Additionally, we vary the number of subgroups, i.e. K = 3, 4.
To learn an evolutionary label prediction model, we use bootstrap sampling [26] with a sample size of 85% and 5,000 replications. Sampling is done without replacement, i.e. duplicates are not allowed. The model is validated on the objects outside of the sample.

Evaluating evolutionary projection
Validation of the projection from t_pre to t_post: In the first experiment, we project the patient instantiations from t_pre to t_post. Since the true instantiations at t_post are known, we use these projections to validate EvolutionPred, whereupon the evaluation is done with the MASE and Hits measures (cf. Sect. 4.1). Figure 3 depicts the hard and soft projections of the pre-treatment patient instantiations, while Table 3 depicts the MASE and Hits values for each patient separately. We perform 10,000 runs and average the values per run.
In Fig. 3, we can see that the hard projection (yellow) and soft projection (green) behave very similarly. Both predict the patient instantiations at t post very well: the mean values for the projected patient instantiations are almost identical to the true instantiations, and the shaded regions (capturing the variance around the mean) overlap with the variance of the true values almost completely.
The first row of Table 3 enumerates the 15 patients in the TBI dataset, and the subsequent rows show the MASE values for the hard, respectively, the soft projection. The last row shows the Hits value per patient. The last column averages the MASE and Hits values over all but one patient: patient #14 is excluded from the computation, because prior inspection revealed that this patient is an outlier, for whom few assessments are available. All other patients exhibit low MASE values (lower is better), indicating that our projection mechanisms predict well the patient assessments at t post .
Projection from t_post to the future t_proj: In the second experiment, EvolutionPred projects the patients after treatment start towards a future moment t_proj, which corresponds to an ideal final set of assessments that the patient might ultimately reach through continuation of the treatment. We do not have a ground truth to evaluate the quality of these projections. Rather, we use a juxtaposition of patients and controls, as depicted in Fig. 4. Figure 5 shows the same lines and areas for assessments before and after treatment start (Pre: cyan, Post: blue) as the reference Fig. 4, but also the projected assessment values (Proj: green/yellow). These projected assessments are closer to the controls, indicating that at least for some of the assessments (FAS1, ICP, CIM, CV, MT, VP), treatment continuation may lead asymptotically to values similar to those of the controls.
Clustering patients with controls: We investigate whether the patients can be separated from the control population through clustering. We skip the assessments TMT-B, BTA, WCST-NC and WCST-RP, which have been recorded only for some patients. We cluster the controls with the patient instantiations before treatment (Pre: red line), after treatment start (Post: yellow line), with the hard-projected instantiations (green line) and with the soft projection (blue dashed line). We use bootstrapping with a sample size of 75% and 1,000 replications. In Fig. 6, we show the entropy while we vary the number of clusters K. Higher values are better, because they mean that the clustering cannot separate controls from patients. High values are achieved only for the projected instantiations.
In Fig. 6, the entropy values are very high for the clusters containing controls together with projected patients, whereby the soft projection and the hard projection behave identically. The high values mean that the clustering algorithm cannot separate between projected patients and controls on similarity; the instances are too similar. This should be contrasted with the clusters containing controls and patients before treatment (red line): entropy is low and drops fast as the number of clusters increases, indicating that patients before treatment are similar to each other and dissimilar to the controls. After the treatment starts, the separation between patients and controls on similarity (yellow line) is less easy, but an increase in the number of clusters leads to a fair separation. In contrast, projected patients remain similar to the controls even when the number of clusters increases: the small clusters still contain both controls and patients.

Evaluating evolutionary label prediction
We present the results of the label prediction experiments on the TBI dataset in Table 4. In the experiment, we first learned the evolutionary model using EvoLabelPred with K = 3, 4 and then utilized the conditional-probability-based label prediction (cf. Sect. 3.3.4) within each individual cluster to predict the labels for the out-of-sample patients. The accuracy of predicting the label learned from the ICP variable is very low: the method achieves a very high accuracy for some of the patients, but it fails completely for others.
To reflect on the low accuracies of the label prediction, we show the clusters from t_pre and t_post in Fig. 7, after removing the outliers. The membership information is given in Table 5. We can observe how the patients move closer to the controls (depicted as a dashed blue line) from t_pre to t_post. The clusters take the changes in the similarity among patients into account, but this does not lead to meaningful predictions. Upon inspecting the dataset, we discovered that the ICP variable is not correlated with the other attributes in the TBI dataset. This is to be expected, because the selected cognitive tests are not correlated with each other. These experiments clearly show that it is not really possible to predict the ICP values from the values of the other cognitive tests.
We conducted further experiments to test this non-correlation among the variables. We applied PCA on the TBI dataset prior to model learning. We present the results in Table 6 for the EvoLabelPred model based on K = 3 clusters and the conditional-probability-based label prediction. Although we see a slight improvement compared to our results without PCA (cf. Table 4), the overall performance is low. After removing the outliers from the label prediction model, the performance of our label prediction even dropped considerably. This means that the ICP variable does not predict well whether a patient has recovered or not (contrary to expectations).

Conclusion
In this paper, we have investigated the problem of predicting the evolution of patients treated after brain injury, i.e. predicting their recovery and their projection into the future, and we have proposed a mining workflow for it. Our workflow, which consists of two individual methods, EvolutionPred and EvoLabelPred, clusters patients on similarity (of their assessments) before and after the treatment began, and then tracks how each cluster evolves. It builds a cluster evolution graph that captures the transitions of patient clusters from before (PRE) to after treatment start (POST). Once the cluster evolution graph has been constructed, our methods EvolutionPred and EvoLabelPred use the clusters and their transitions to project each patient to a future moment, and to predict their recovery label, respectively. The projections and predictions are done on the basis of what is known about the patients thus far.
We have experimentally validated our methods on the Trauma Brain Injury dataset [16]. We first applied EvolutionPred on known data and showed that the projected values are almost identical to the true ones. Then, we compared the projected assessments to those of a control population and showed that some patient assessments are projected close to the controls. We studied treatment after brain trauma, but our EvolutionPred is applicable to any impairment where progression or the process of recovery is of interest. The clusters we find may be of use in personalized medicine. The application of EvoLabelPred did not go as smoothly. The models that we learned were predictive for only a part of the data. A major reason for this low performance was that the selected target variable was not sufficiently predictive on these data. We have to investigate this issue in the future, together with the medical experts. Shortcomings and future work: The projected assessments have not yet been evaluated against the assertions of a human expert about the patients' health state after treatment. We are currently in the process of acquiring such data for an additional evaluation. A further shortcoming is that we ignore the duration of treatment; this is planned as a future step.
The evolution of brain trauma or impairment conditions is difficult to measure at the functional level. However, scholars anticipate that the use of neuroimaging, e.g. MEG, could lead to the detection of progressive changes in connectivity patterns even before they translate into changes of the memory, movement or orientation functions. Regularly recording MEG images before and during the treatment of patients would allow a more effective evaluation of treatment, by providing hints and indicators about the effectiveness of a particular therapy. A next step for our work will be the integration of MEG data into our mining workflow, to check whether the evolution of patients towards the subcohort of controls can be modelled more effectively with MEG images.