Predicting functional impairment trajectories in amyotrophic lateral sclerosis: a probabilistic, multifactorial model of disease progression

Objective To employ Artificial Intelligence to model, predict and simulate the amyotrophic lateral sclerosis (ALS) progression over time in terms of variable interactions, functional impairments, and survival. Methods We employed demographic and clinical variables, including functional scores and the utilisation of support interventions, of 3940 ALS patients from four Italian and two Israeli registers to develop a new approach based on Dynamic Bayesian Networks (DBNs) that models the ALS evolution over time, in two distinct scenarios of variable availability. The method allows to simulate patients’ disease trajectories and predict the probability of functional impairment and survival at different time points. Results DBNs explicitly represent the relationships between the variables and the pathways along which they influence the disease progression. Several notable inter-dependencies were identified and validated by comparison with literature. Moreover, the implemented tool allows the assessment of the effect of different markers on the disease course, reproducing the probabilistically expected clinical progressions. The tool shows high concordance in terms of predicted and real prognosis, assessed as time to functional impairments and survival (integral of the AU-ROC in the first 36 months between 0.80–0.93 and 0.84–0.89 for the two scenarios, respectively). Conclusions Provided only with measurements commonly collected during the first visit, our models can predict time to the loss of independence in walking, breathing, swallowing, communicating, and survival and it can be used to generate in silico patient cohorts with specific characteristics. Our tool provides a comprehensive framework to support physicians in treatment planning and clinical decision-making. Supplementary Information The online version contains supplementary material available at 10.1007/s00415-022-11022-0.


ALS staging systems
The available ALS progression models generally rely on the ALSFRS-R staging system [1], which is a 12-item questionnaire rated on a 0-4 point scale evaluating the progression of disability in ALS patients; ALSFRS-R has been extensively used to assess treatment efficacy in clinical trials and measure disease progression [2]- [5]. Despite being a de facto standard, this staging system suffers from several limitations. The interpretation of an ALSFRS-R total raw score is hindered by the ambiguities of its different metric meanings for the different ALS forms [6], and by its non-linear relationship with the linear Rasch transformed measures of global function [7], [8]. The scale exhibits multi-dimensionality, thus it should not be used as a global total score [7]. The ALSFRS-R scale also suffers from the "floor-effect" and is thus unable to capture late-stage clinical changes: patients approaching the bottom of the scale appear to be "slowing down" in their worsening because it becomes increasingly difficult for them to lose further raw score points [9], [10]. Finally, there is no agreed-upon threshold at which a change in ALSFRS-R score is viewed as an important transition point in functional status [11].
To overcome these limitations, other ALS staging systems have been introduced. The King's staging system is based on disease burden as measured by the involvement of clinical regions and the presence of respiratory or nutritional failure [11], [12]. This system uses five stages, from 1 to 5, with stages 1 to 4 indicating the number of involved clinical regions (stage 1 also indicates the disease onset) and stage 5 being death. Although the King's system is not based on ALSFRS-R scores, it can be estimated from them with 92% concordance [13]. More recently, the Milano-Torino staging (MiToS) system was introduced as a novel tool to measure ALS progression [11]. This system uses six stages, from 0 to 5, with stage 0 representing symptom onset and stage 5 being death, and is based on the assessment of four functional domains (movement, communication, swallowing and breathing) assayed by the ALSFRS-R. Whether a domain is not impaired, its MiToS value is equal to 0, whereas the MiToS value is equal to 1 for the domains in which patient's independence is compromised: the MiToS score corresponds to the total number of functional domains in which the patient has lost independence. This scale was shown to be able to reliably identify relevant stages of disease in patients according to the number lost functions, to be consistent with sequential disease progression, to overcome the non-linearity and multidimensionality limitations of ALSFRS-R, and to correlate well with patients' quality of life and health service costs [14].
We employed the MiToS staging system since it tackles all the limitations of the ALSFRS-R scale while at the same time being completely derivable from the latter. The King's system, on the other hand, cannot always be fully derived from the ALSFRS-R scale, which represents a limitation. Moreover, unlike the King's staging system which summarises the clinical/anatomical spread of the disease, the MiToS system is aimed towards the distinction of functional capabilities during the spread of the disease and is able to differentiate late ALS stages in a higher resolution [15].

Datasets
The ITIS and IT datasets analysed in this work include static variables, which are either data collected at first visit only or time-independent covariates, and dynamic variables, that are measurements collected over subsequent visits. In detail, static variables are: medical centre, sex, onset site, familiality, genetics, age at onset, diagnostic delay, FTD, BMI premorbid, BMI at diagnosis and FVC at diagnosis. On the other hand, time between visits (TBV), time since onset (TSO), the MiToS items, and the NIV and PEG use are dynamic variables.

Variable discretisation
In this work we employed discrete-space/discrete-time DBNs, which encode probabilistic relationships among discrete variables over a discrete number of time steps; thus, we discretised the continuous variables according to their distribution percentiles in the training sets. For all the variables of both the ITIS and the IT datasets, we set the thresholds based on the tertiles. However, for some discrete variables (such as Time between visits, TBV, in the ITIS dataset) this led to unbalanced groups since their original values were heavily distributed around specific values, while the use of quartiles corrected this bias. For the Time since onset (TSO) variables, specifically, we implemented the discretization based on another requirement of the DBNs. The algorithm for DBN structure-learning relies on the assumption that the conditional probabilities distributions are time invariant. For example, breathing impairment at time (t-1) influences the breathing capability at time (t) in the same way if it is 12 months from onset or 36 months from onset. Obviously, this mathematical assumption does not hold in reality since the disease can proceed by following different patterns in its different stages. Therefore, the variable TSO was discretized in such a way that, within each time interval, the time-invariance assumption was verified.
Supplementary Tables 1 and 2 show the quantisation levels and the categories adopted for each variable in the ITIS and IT datasets, respectively.

Dynamic Bayesian Networks
Dynamic Bayesian Networks are a generalisation of Markov decision processes and can be used to represent medical knowledge explicitly in terms of causes and effects as obtained from clinical data and domain knowledge. A DBN is defined by its structure (set of parent-children dependencies) annotated with a set of conditional probability distributions (CPDs), since each node is a probabilistic function of its parents. Nodes in a DBN are connected through a Directed Acyclic Graph. However, DBNs allow encoding cycles and feedbacks between variables when considering their relationships over different time slices. Consequently, our DBN model was developed through a two-step iterative procedure: 1) by inferring the graph topology and 2) learning the parameters of each CPD (i.e., the probability that a variable assumes a specific value conditional to each possible joint assignment of values to its parents). The algorithm for DBN structure-learning relies on the following assumptions: two nodes cannot be a deterministic function of a single variable, the CPDs are time invariant, and variables are related to each other over a discrete number of time steps, called slices. For example, breathing impairment at time t-1 influences the breathing capability at time t. Finally, the CPDs are usually estimated through techniques like Bayesian estimation or (regularised) maximum likelihood. In this work, the DBN structure was inferred using the Max-Min Hill-Climbing (MMHC) algorithm [16], a greedy search-and-score method that starts with an initial graph (empty graph in our case) and searches the complete space of possible graph structures, by adding, reversing or deleting edges. The MMHC runs until a specific score (here the Bayesian Information Criterion) is maximised, or a specific number of iterations has been reached. Thus, the structure-learning phase provides the DBN topology with the highest probability of generating the training data. Subsequently, parameters of CPDs were computed through a maximum a posteriori estimation for each node.
The entire learning procedure was implemented in R, by using bnstruct [17], a package performing DBN inference on discrete and categorical data even in the presence of missing values, which is the case of our data and a common situation in the clinical context. Furthermore, bnstruct makes use of state-of-the-art algorithms for network learning and also provides methods for bootstrap resampling of the data and inference. bnstruct also allows the encoding of the domain knowledge, by applying constraints to the network topology; that can be set to forbid clinically or biologically non-sense relations among variables (see next section). These constraints also permit the exploration of only part of the solution space, reducing the computational complexity of learning the DBN, a task that is in general NP-hard. It has to be noticed that, although reducing the learning complexity, these choices could bring to a local minimum. On the other hand, a search of the global optimum on the entire space of possible solutions would have been computationally infeasible.

Rules for learning the networks
When learning the structure of the network from the training set, the following information can be provided: 1. the mandatory edges, i.e. the edges between variables that must be present in the network 2. the possible edges, i.e., the edges that can be found during the learning phase. They can be defined by grouping variables in separate (disjoint) layers: then, by default, variables in a given layer j can depend only on variables from layers i ≤ j. Users, however, can also allow or deny specific dependencies between layers.

Rules on the ITIS network
The mandatory edges set for the network on the ITIS dataset are: • the dependency of Onset site from Sex • the dependencies of the variables MiToS at time t from the variable Time Since Onset (TSO) • the dependencies of the variable Survival from the variable Time Since Onset (TSO).
The layering structure was defined as follows:

Rules on the IT network
The mandatory edges set for the network on the IT dataset are: • The layering structure was defined as follows: • • Layer 7 can only depend on layers 3, 6 or 10.
• Layer 8 can depend on any other layer, except for itself and layers 7 and 9.
• Layer 9 can depend on any other layer, except for itself and layers 7 and 8.
• Layer 10 cannot depend on itself or any other layer.

Bootstrap-based DBN learning
In order to assess the confidence of the identified edges, a bootstrap procedure can be performed. The bootstrap technique generates different samples of a dataset and, for each sample, learns a DBN. The result is not a directed acyclic graph (DAG) and therefore it cannot be used to learn conditional probabilities, but a weighted partially DAG (WPDAG). In this latter graph, edges (i, j) weigh the number of times an edge going from node i to node j appears in a Bayesian network learned from a bootstrap sample [17]. These numbers represent a measure of the confidence on the presence of each edge.
We performed such analysis by employing 100 bootstrap samples. Figures 2 and 3 reported below show the resulting WPDAGs, for the ITIS and the IT datasets, respectively. For each network, the grey intensity of each edge corresponds to the number of trained bootstrapped DBNs where that edge was identified (from light grey = 1, to black = 100). With respect to the networks reported in the manuscript (manuscript's Figure 1), here the relationships between the same variable at consecutive times (e.g. the impairments in the MiToS domains) are not reported as auto-loops, but as edges from the variable at time (t) to the same variable at time (t+1). Moreover, in this analysis, we did not impose any mandatory edge in order to assess what indication emerged from the data.
In both the networks, we can observe how a number of edges identified with high rate through the bootstrap procedure (the edges in black colour) correspond to those constituting the DBNs learned on the whole training sets and reported in the manuscript's Figure 1.