
1 Introduction

Most neurodegenerative diseases are caused by the accumulation of pathogenic proteins whose behaviour is governed by neurobiological mechanisms. The interplay among these mechanisms causes pathogenic proteins to accumulate and spread through the brain network, leading to loss of function and brain atrophy. However, these mechanisms are still poorly understood, and there are many competing hypotheses about which mechanisms govern pathogenic protein behaviour [14, 16]. Brain imaging of patients suggests that atrophy loosely follows a number of spatiotemporal patterns, with a different pattern associated with each neurodegenerative disease variant [14]. It is hypothesised that different pathogenic protein variants are governed by different mechanisms, which would explain this variety of patterns [14]. A better understanding of these mechanisms is key to developing therapies that directly influence them, and thereby disease progression.

Conventional machine learning can perform differential diagnosis and prognosis [10, 15] in neurodegenerative diseases. An fMRI study identified disease-specific epicentre regions whose functional connectivity, measured in healthy subjects, correlated with atrophy progression [7, 16], suggesting that functional connectivity influences disease progression. In addition, hub regions and regions with shorter functional paths to the syndrome-specific epicentre showed greater vulnerability. Despite their usefulness, such approaches and correlation studies provide no information about the mechanisms that govern protein behaviour.

A computational modelling study of amyloid-beta and tau aggregation in AD evaluated candidate therapies [9], but at a more abstract level. Simulations and computational models of the mechanisms governing pathogenic protein behaviour have been applied previously, but only to a small artificial neural network [5]. A model of brain-network-mediated trans-synaptic diffusion [10] achieved a strong correlation with atrophy in follow-up scans; however, these follow-ups were limited to four years. In contrast to these works, we model many additional mechanisms governing protein behaviour, run simulations on a brain network, and predict disease progression in its entirety.

We propose that computational models of the mechanisms governing pathogenic protein behaviour be built from the neurobiological literature and then used in simulations to predict atrophy. We evaluate models by comparing a simulation's atrophy prediction with the prediction of an event-based model fitted to data from APOE4-positive AD patients [15]. The models that best fit the empirical data provide evidence in favour of the hypotheses behind them. Our framework can therefore suggest which hypotheses warrant further investigation in neurobiological research. We also demonstrate how, by modelling their influence on the simulations, the framework can suggest how effective candidate therapies that directly target these mechanisms might be.

2 Methodology

Structural, functional and diffusion-weighted MR images from 10 healthy subjects were used to generate a graph representation of a healthy brain network, consisting of 27 regions. We modelled multiple pathogenic protein mechanisms. After initialising the network and seeding pathogenic protein into it, we ran simulations, using our modelled mechanisms to update the network state.

2.1 Image Dataset and Processing

T1-weighted (T1w) MR images were parcellated into 208 regions [1]. We kept 27 symmetric grey matter regions (Fig. 1), denoted as \(r\in \{1,...,27\}\). Each region has associated coordinates \(\mathbf {c}_r\) and volume \(V_r(t)\) at timestep t. Subject volumes were normalised by each subject's total intracranial volume, and \(V_r(0)\) was then set to the population-averaged normalised volume of region r.
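As a minimal sketch of this volume preprocessing, assuming each subject's regional volumes and total intracranial volume (TIV) are available as NumPy arrays (the array layout and the function name `baseline_volumes` are illustrative, not taken from the original pipeline), \(V_r(0)\) could be computed as follows:

```python
import numpy as np


def baseline_volumes(raw_volumes: np.ndarray, tiv: np.ndarray) -> np.ndarray:
    """Return V_r(0): population-averaged, TIV-normalised regional volumes.

    raw_volumes: array of shape (n_subjects, n_regions)
    tiv: array of shape (n_subjects,), total intracranial volume per subject
    """
    normalised = raw_volumes / tiv[:, None]  # divide each subject's row by its own TIV
    return normalised.mean(axis=0)  # average across subjects: one value per region


# Two hypothetical subjects and two regions (illustrative numbers only).
v0 = baseline_volumes(np.array([[4.0, 2.0], [6.0, 3.0]]), np.array([1000.0, 1500.0]))
```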

Resting-state functional scans were corrected for motion and EPI distortion and high-pass filtered (0.01 Hz). Time courses were extracted, then centred and variance-normalised. The parcellations were affinely registered from the T1w to the fMRI image space. We computed per-region, per-subject synaptic signals (activity-inducing signals) following the methodology of Karahanoğlu et al. [6], and averaged them over subjects to obtain the population regional synaptic signals \(\mathbf {Sig}_r\). The per-region, per-subject synaptic signals were also used to compute power spectra of the synaptic activity, which were averaged over subjects. From these per-region power spectra, we computed the mean frequency of each region, \(f_r(0)\), which is the baseline of the time-varying frequency \(f_r(t)\).
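The text does not specify how the mean frequency is derived from a power spectrum. The sketch below assumes the common power-weighted definition \(f_r(0)=\sum_k \nu_k S_r(\nu_k)/\sum_k S_r(\nu_k)\), with a plain FFT periodogram as the spectral estimator; the function name, the array layout and the repetition time are hypothetical.

```python
import numpy as np


def mean_frequency(signals: np.ndarray, tr: float) -> np.ndarray:
    """Power-weighted mean frequency per region (an assumed definition of f_r(0)).

    signals: (n_subjects, n_regions, n_timepoints) synaptic signals
    tr: sampling interval in seconds (the fMRI repetition time)
    """
    n_t = signals.shape[-1]
    freqs = np.fft.rfftfreq(n_t, d=tr)  # one-sided frequency bins in Hz
    power = np.abs(np.fft.rfft(signals, axis=-1)) ** 2  # periodogram per subject and region
    power = power.mean(axis=0)  # average spectra over subjects: (n_regions, n_freqs)
    power[:, 0] = 0.0  # ignore the DC bin (time courses are centred anyway)
    return (power * freqs).sum(axis=-1) / power.sum(axis=-1)


# Synthetic check: pure sinusoids at 0.1 Hz and 0.25 Hz, sampled every 0.5 s.
t = np.arange(200) * 0.5
sig = np.stack([np.sin(2 * np.pi * 0.1 * t), np.sin(2 * np.pi * 0.25 * t)])
f0 = mean_frequency(np.broadcast_to(sig, (3, 2, 200)), tr=0.5)
```

Averaging spectra (rather than signals) before taking the weighted mean mirrors the order described above, where the power spectra themselves are averaged over subjects.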

Diffusion data were corrected for motion, eddy currents and EPI distortion (using field maps) before tensors were fitted. Tractography was then performed [13] and filtered using the approach proposed by Smith et al. [11]. We denote by \(\mathbf {DTI}_{r_1,r_2}\) the connectivity matrix extracted from the tractography using the T1w brain parcellation.

Resting-state fMRI scans were used to compute each subject's effective connectivity [2], with the structural connectivity \(\mathbf {DTI}_{r_1,r_2}\) incorporated as prior information [12] using hyperparameters \(\alpha = 4, \beta = 12\). The population effective connectivity \(\mathbf {EC}_{r_1,r_2}\) was calculated by performing Bayesian model reduction [3].

2.2 Modelling of Protein Behaviour Governing Mechanisms

We modelled the mechanisms of protein production, clearance, misfolding, extracellular diffusion, network-mediated diffusion, frequency-related spread and atrophy. Each brain region has an associated atrophy \(A_r(t)\), radius \(\text {Rad}_r(t)\) (we assumed spherical regions), pathogenic protein concentration \(P_r(t)\) and non-pathogenic protein concentration \(N_r(t)\). Pathogenic protein is hypothesised to interact with non-pathogenic protein through misfolding, so the non-pathogenic concentration must also be tracked. We denote by \(\mathbf {D}_{r_1,r_2}\) the Euclidean distance between the barycentres of brain regions \(r_1\) and \(r_2\). We calculated the correlation matrix \(\mathbf {Corr}_{r_1,r_2}\) between all pairs of synaptic signals \(\mathbf {Sig}_{r_1},\mathbf {Sig}_{r_2}\). Regional atrophy was initialised to \(A_r(0)=0\) and increases as a function of the total regional protein concentration \(N_r(t)+P_r(t)\) (Eq. 1), with \(A_t\) defining the concentration threshold below which no additional atrophy occurs and \(A_m\) controlling the atrophy magnitude. A region with \(A_r(t)=1\) has fully atrophied. As atrophy increases, it linearly decreases the volume \(V_r(t)\) (Eq. 2, which also requires updating the radius) and the regional synaptic frequency \(f_r(t)\) (Eq. 3), since we assumed that less synaptic activity occurs as brain regions atrophy.

$$\begin{aligned} A_r(t)= & {} A_r(t-1) + \max \left[ 0, A_m \left( e^{10\left( N_r(t)+P_r(t)-A_t\right) }-1\right) \right] \end{aligned}$$
(1)
$$\begin{aligned} V_r(t)= & {} \left( 1-A_r(t)\right) V_r(0) \end{aligned}$$
(2)
$$\begin{aligned} f_r(t)= & {} \left( 1-A_r(t)\right) f_r(0) \end{aligned}$$
(3)
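A minimal NumPy sketch of one atrophy update (Eqs. 1-3), together with the radius update implied by the spherical-region assumption, \(\text {Rad}_r(t)=\left(3V_r(t)/4\pi \right)^{1/3}\). Capping \(A_r(t)\) at 1 is our assumption, since the text only states that a region with \(A_r(t)=1\) is fully atrophied; the function name is hypothetical.

```python
import numpy as np


def update_atrophy(A_prev, N, P, V0, f0, A_m, A_t, cap=True):
    """One step of Eqs. 1-3 plus the spherical-radius update; inputs are per-region arrays."""
    # Eq. 1: no additional atrophy while N + P stays at or below the threshold A_t
    increment = np.maximum(0.0, A_m * (np.exp(10.0 * (N + P - A_t)) - 1.0))
    A = A_prev + increment
    if cap:
        A = np.minimum(A, 1.0)  # assumption: atrophy saturates at 1 (fully atrophied)
    V = (1.0 - A) * V0  # Eq. 2: volume shrinks linearly with atrophy
    f = (1.0 - A) * f0  # Eq. 3: synaptic frequency shrinks linearly with atrophy
    rad = np.cbrt(3.0 * V / (4.0 * np.pi))  # radius of a sphere with volume V
    return A, V, f, rad
```

The factor 10 in the exponent makes atrophy grow sharply once the total concentration exceeds \(A_t\), while the outer \(\max\) keeps atrophy from ever decreasing.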

All modelled mechanisms are applied in the same way to non-pathogenic and pathogenic protein, except for misfolding. We therefore only display equations for pathogenic protein, with the exception of Eq. 13. The production rates of non-pathogenic (\(R_{\text {ProdN}}\)) and pathogenic (\(R_{\text {ProdP}}\)) protein model transcription and translation, the biological processes through which new protein molecules are created in cells. Our model of clearance [8] summarises all the processes through which the brain removes protein molecules to be replaced by new ones. We modelled clearance \(\text {ClearN}_r(t),\text {ClearP}_r(t)\) (Eq. 4) so that the clearance rates adjust to compensate when concentrations diverge from the normal equilibrium levels \(N_{\text {Equilibrium}},P_{\text {Equilibrium}}\); at equilibrium, the clearance rates \(R_{\text {ClearN}}, R_{\text {ClearP}}\) are typically equal to the production rates. We assumed pathogenic protein behaves in a “prion-like” manner [4, 14], meaning that when pathogenic and non-pathogenic protein come into close proximity, the non-pathogenic protein misfolds and converts to the pathogenic state. Given a misfolding rate \(R_{\text {Mis}}\), a concentration \(\text {Misfold}_r(t)\) of non-pathogenic protein misfolds to the pathogenic state (Eq. 5).

$$\begin{aligned} \text {ClearP}_r(t)= & {} R_{\text {ClearP}} \log \left( 1+(e-1) \frac{P_r(t)}{P_{\text {Equilibrium}}} \right) \end{aligned}$$
(4)
$$\begin{aligned} \text {Misfold}_r(t)= & {} R_{\text {Mis}} N_r(t) P_r(t) \end{aligned}$$
(5)
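Eqs. 4-5 can be sketched as two small vectorised functions (the names are hypothetical). The sketch also makes the design of Eq. 4 visible: since \(\log (1+(e-1))=1\), clearance equals its rate exactly at equilibrium, so production and clearance cancel there when the two rates are equal, while departures from equilibrium are compensated only logarithmically.

```python
import numpy as np


def clearance(conc, rate, equilibrium):
    """Eq. 4: logarithmic clearance; equals `rate` when conc == equilibrium."""
    return rate * np.log(1.0 + (np.e - 1.0) * conc / equilibrium)


def misfolding(N, P, R_mis):
    """Eq. 5: mass-action conversion of non-pathogenic into pathogenic protein."""
    return R_mis * N * P
```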

Many candidate mechanisms are hypothesised to spread protein [4] (e.g. diffusion and exocytosis). We modelled extracellular diffusion \(\mathbf {ED}_{r_1,r_2}(t)\) (Eqs. 6-7), network-mediated diffusion \(\mathbf {ND}_{r_1,r_2}(t)\) (Eqs. 8-9) and frequency-related spread \(\mathbf {FS}_{r_1,r_2}(t)\) (Eq. 10); each element of these matrices gives the probability of protein spreading from region \(r_2\) to region \(r_1\). Extracellular diffusion models Brownian motion in the extracellular space. The probability of protein spreading by extracellular diffusion into a region is given by the integral of a one-dimensional normal distribution (we assumed isotropic diffusion) whose integration bounds depend on the target region's radius and its Euclidean distance from the origin region, and whose standard deviation \({\sigma }_{\text {ED}}\) controls the extracellular diffusion speed. Network-mediated diffusion models Brownian motion weighted by the strengths of network connections. The probability of protein spreading out of region \(r_2\) is given by \({\text {CL}}_{r_2}\) (Eq. 8), which combines the integral of a one-dimensional normal distribution, whose standard deviation \({\sigma }_{\text {ND}}\) controls the speed, with a term for the relative connection strength per unit volume. The probability of spreading into a region \(r_1\) also depends on the normalised connection strength \(w_{r_1,r_2}\) between the two regions. Frequency-related spread assumes that the more frequent the synaptic activity, the more protein spreads out of a region. The probability of protein spreading from region \(r_2\) to region \(r_1\) depends on the overall strength of frequency-related spread \(R_{\text {FS}}\), the frequency \(f_{r_2}(t)\) and the normalised connection strength \(w_{r_1,r_2}\) (Eq. 10).

$$\begin{aligned} \text {NED}_{r_2}(t)= & {} \sum _{r_1} \frac{1}{\sqrt{2\pi }} \int _{\frac{\mathbf {D}_{r_1,r_2} - \text {Rad}_{r_1}(t)}{{\sigma }_{\text {ED}}} }^{\frac{\mathbf {D}_{r_1,r_2} + \text {Rad}_{r_1}(t)}{{\sigma }_{\text {ED}}} }e^{-\frac{x^2}{2}}dx \end{aligned}$$
(6)
$$\begin{aligned} \text {ED}_{r_1,r_2}(t)= & {} \frac{1}{\text {NED}_{r_2}(t)\sqrt{2\pi }} \int _{\frac{\mathbf {D}_{r_1,r_2} - \text {Rad}_{r_1}(t)}{{\sigma }_{\text {ED}}} }^{\frac{\mathbf {D}_{r_1,r_2} + \text {Rad}_{r_1}(t)}{{\sigma }_{\text {ED}}} }e^{-\frac{x^2}{2}}dx \end{aligned}$$
(7)
$$\begin{aligned} \text {CL}_{r_2}= & {} 2\left( \frac{1}{\sqrt{2\pi }} \int _{-\infty }^{\frac{-\text {Rad}_{r_2}(t)}{{\sigma }_{\text {ND}}}}e^{-\frac{x^2}{2}}dx\right) \sum _{r_1} w_{r_1,r_2}\frac{\min _{r_3}V_{r_3}(0)}{V_{r_2}(0)} \end{aligned}$$
(8)
$$\begin{aligned} \mathbf {ND}_{r_1,r_2}(t)= & {} {\left\{ \begin{array}{ll} \text {CL}_{r_2} w_{r_1,r_2}, \mathrm{\ if \ }r_1 \ne r_2 \\ \text {CL}_{r_2} w_{r_1,r_2} + \left( 1-\text {CL}_{r_2}\right) , \mathrm{\ if \ }r_1=r_2 \end{array}\right. } \end{aligned}$$
(9)
$$\begin{aligned} \mathbf {FS}_{r_1,r_2}(t)= & {} {\left\{ \begin{array}{ll} R_{\text {FS}} \frac{f_{r_2}(t)}{{\max }_{r_3}f_{r_3}(0)} w_{r_1,r_2} , \mathrm{\ if \ }r_1 \ne r_2 \\ R_{\text {FS}} \frac{f_{r_2}(t)}{{\max }_{r_3}f_{r_3}(0)} w_{r_1,r_2} + 1 - R_{\text {FS}} \frac{f_{r_2}(t)}{{\max }_{r_3}f_{r_3}(0)}, \mathrm{\ if \ }r_1=r_2 \end{array}\right. } \end{aligned}$$
(10)
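The three spreading matrices can be sketched in NumPy as follows, reading each column \(r_2\) as the distribution of protein leaving region \(r_2\). The integrals in Eqs. 6-8 are differences of the standard normal CDF, \(\Phi (z)=\tfrac{1}{2}\left( 1+\operatorname {erf}(z/\sqrt{2})\right) \), evaluated here with Python's `math.erf`. Treating \(\sigma =0\) as the no-spread limit (identity matrix) is our assumption, as are the function names. The columns of ND and FS sum to 1 only if each column of \(w\) sums to 1; the text does not specify how \(w\) is normalised.

```python
import math

import numpy as np

# Standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), applied elementwise.
_phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def ed_matrix(D, rad, sigma_ed):
    """Eqs. 6-7: column r2 holds the probability of moving from r2 into each r1."""
    if sigma_ed == 0:
        return np.eye(D.shape[0])  # assumed sigma -> 0 limit: no extracellular spread
    lo = (D - rad[:, None]) / sigma_ed  # bounds use the target radius Rad_{r1}
    hi = (D + rad[:, None]) / sigma_ed
    mass = _phi(hi) - _phi(lo)  # integral of the standard normal density
    return mass / mass.sum(axis=0, keepdims=True)  # divide each column by NED_{r2}


def nd_matrix(w, rad, V0, sigma_nd):
    """Eqs. 8-9 with normalised connection strengths w[r1, r2]."""
    if sigma_nd == 0:
        cl = np.zeros(w.shape[1])  # Phi(-inf) = 0: nothing leaves any region
    else:
        cl = 2.0 * _phi(-rad / sigma_nd) * w.sum(axis=0) * V0.min() / V0  # Eq. 8
    return w * cl[None, :] + np.diag(1.0 - cl)  # Eq. 9: the rest stays in r2


def fs_matrix(w, f, f0, R_fs):
    """Eq. 10: outflow from r2 scales with its current synaptic frequency f[r2]."""
    g = R_fs * f / f0.max()
    return w * g[None, :] + np.diag(1.0 - g)
```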

The connection strengths for network-mediated diffusion and frequency-related spread can be based on any related metric. We explored three options: connection strengths based on the absolute correlation coefficients between synaptic signals, \(w_{r_1,r_2}=|\mathbf {Corr}_{r_1,r_2}|\); on the fibre tract connectivities, \(w_{r_1,r_2}=|\mathbf {DTI}_{r_1,r_2}|\); or on the effective connectivity strengths, \(w_{r_1,r_2}=|\mathbf {EC}_{r_1,r_2}|\). These respectively yield the network-mediated diffusion matrices \(\mathbf {ND}^C_{r_1,r_2}(t)\) with speed \({\sigma }_{\text {ND-C}}\), \(\mathbf {ND}^D_{r_1,r_2}(t)\) with speed \({\sigma }_{\text {ND-D}}\) and \(\mathbf {ND}^E_{r_1,r_2}(t)\) with speed \({\sigma }_{\text {ND-E}}\), and the frequency-related spread matrices \(\mathbf {FS}^C_{r_1,r_2}(t)\) with strength \(R_{\text {FS-C}}\), \(\mathbf {FS}^D_{r_1,r_2}(t)\) with strength \(R_{\text {FS-D}}\) and \(\mathbf {FS}^E_{r_1,r_2}(t)\) with strength \(R_{\text {FS-E}}\).

At each timestep, we update atrophy \(A_r(t)\), volumes \(V_r(t)\), radii \(\text {Rad}_r(t)\) and frequencies \(f_r(t)\). Production, misfolding and clearance then update the concentrations, which are converted to quantities (Eqs. 13-14). After spreading through the network, using the combined matrices of Eqs. 11-12 together with \(\mathbf {ED}(t)\) (Eq. 15), the quantities are converted back to concentrations (Eq. 16).

$$\begin{aligned} \mathbf {FS}(t)= & {} \mathbf {FS}^C(t) \times \mathbf {FS}^D(t) \times \mathbf {FS}^E(t) \end{aligned}$$
(11)
$$\begin{aligned} \mathbf {ND}(t)= & {} \mathbf {ND}^C(t) \times \mathbf {ND}^D(t) \times \mathbf {ND}^E(t) \end{aligned}$$
(12)
$$\begin{aligned} \mathbf {QN}_r(t)= & {} (N_r(t) + R_{\text {ProdN}} - \text {ClearN}_r(t) - \text {Misfold}_r(t)) V_r(t) \end{aligned}$$
(13)
$$\begin{aligned} \mathbf {QP}_r(t)= & {} (P_r(t) + R_{\text {ProdP}} - \text {ClearP}_r(t) + \text {Misfold}_r(t)) V_r(t) \end{aligned}$$
(14)
$$\begin{aligned} \mathbf {QNP}(t)= & {} \mathbf {FS}(t) \times \mathbf {ND}(t) \times \mathbf {ED}(t) \times \mathbf {QP}(t) \end{aligned}$$
(15)
$$\begin{aligned} P_r(t+1)= & {} \mathbf {QNP}_r(t)/V_r(t) \end{aligned}$$
(16)
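A sketch of one protein update (Eqs. 11-16), assuming the per-region clearance and misfolding terms have already been evaluated with Eqs. 4-5 and that the component matrices are NumPy arrays (function names hypothetical). Applying the same transport matrix to the non-pathogenic quantities follows our reading of the statement that all mechanisms except misfolding act on both species; Eq. 15 only shows the pathogenic case. Because matrix multiplication is associative, forming the product once per timestep is equivalent to applying ED, then ND, then FS in turn.

```python
from functools import reduce

import numpy as np


def compose(mats):
    """Eqs. 11-12: chain matrix products in the order written, e.g. FS^C @ FS^D @ FS^E."""
    return reduce(np.matmul, mats)


def protein_step(N, P, V, prod_n, prod_p, clear_n, clear_p, misfold, ed, nd_parts, fs_parts):
    """Eqs. 13-16 for one timestep.

    clear_n, clear_p and misfold are per-region arrays already evaluated at
    time t (Eqs. 4-5); V is the updated volume V_r(t) from Eq. 2.
    """
    transport = compose(fs_parts) @ compose(nd_parts) @ ed  # ED acts first, then ND, then FS
    qn = (N + prod_n - clear_n - misfold) * V  # Eq. 13: concentration -> quantity
    qp = (P + prod_p - clear_p + misfold) * V  # Eq. 14
    # Eq. 15 (applied to both species) followed by Eq. 16 (quantity -> concentration)
    return (transport @ qn) / V, (transport @ qp) / V
```

With column-stochastic spreading matrices, the total quantity \(\sum_r P_r V_r\) is conserved by the spreading step, so any change in total pathogenic protein comes only from production, clearance and misfolding.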

3 Results

We set \(N_r(0)=0.01\), \(R_{\text {ProdN}}=R_{\text {ClearN}}=2\times 10^{-4}\) and \(A_t=0.04\). We varied whether pathogenic protein was soluble (\(R_{\text {ClearP}}=2\times 10^{-5}\)) or insoluble (\(R_{\text {ClearP}}=0\)), whether there was pathogenic protein production (\(P_r(0)=0.01,R_{\text {ProdP}}=2\times 10^{-5}\)) or not (\(P_r(0)=0,R_{\text {ProdP}} = 0\)), and whether pathogenic protein of concentration \(P_{seed}\) was seeded at the hippocampus, parahippocampal gyrus or entorhinal area (e.g. for hippocampal seeding, \(P_{r=hippocampus}(t=0)=P_{seed}\)).
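The \(2\times 2\times 3\) design above could be enumerated programmatically, for example as a list of parameter dictionaries. The dictionary keys and the function name are illustrative; the numerical values are those stated in the text, and \(P_{seed}\) is left to the optimisation over \(\theta\).

```python
from itertools import product

# Fixed settings stated in the text.
FIXED = {"N0": 0.01, "R_ProdN": 2e-4, "R_ClearN": 2e-4, "A_t": 0.04}
SEED_REGIONS = ("hippocampus", "parahippocampal gyrus", "entorhinal area")


def scenarios():
    """Enumerate the 2 x 2 x 3 model variants; P_seed itself is optimised separately."""
    variants = []
    for soluble, produced, seed in product((True, False), (True, False), SEED_REGIONS):
        variants.append({
            **FIXED,
            "R_ClearP": 2e-5 if soluble else 0.0,  # soluble vs insoluble pathogenic protein
            "P0": 0.01 if produced else 0.0,  # initial P_r(0) outside the seed region
            "R_ProdP": 2e-5 if produced else 0.0,
            "seed_region": seed,  # P_r(0) = P_seed in this region at t = 0
        })
    return variants
```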

We evaluated our models by comparing the simulated atrophy against the prediction of an event-based model (EBM) [15], which computed the uncertainty matrix \(\mathbf {UM}_{r,i}\) (Fig. 1), similar to Fig. 1 of Young et al. [15] but using empirical data from APOE4-positive AD patients, MCI patients and healthy controls. The EBM assumes that each regional volume is probabilistically either healthy or abnormal: it fits a normal distribution and a uniform distribution to each brain region's volume data, which give the probability of health and of abnormality, respectively. The volume abnormality threshold \(\text {Vthres}_r\) of a brain region is defined as the largest volume value at which the probability of health equals the probability of abnormality. We recorded the exact timestep at which each region's volume became abnormal during a simulation. We denote by \(\text {OS}_r\) the event position at which brain region r became abnormal during a simulation (e.g. if \(\text {OS}_{r=hippocampus}=5\), the hippocampal volume became abnormal fifth, after four other regional volumes had become abnormal). The element \(\mathbf {UM}_{r,i}\) is the EBM probability of region r being i-th in the order in which regional volumes become abnormal. We used the negative log-likelihood in Eq. 17 to measure each model's goodness of fit and to optimise the parameter set \(\theta \) (Eq. 18):

$$\begin{aligned} \theta ^{\star }= & {} \mathop {\arg \min }_{\theta } \left( -\sum _{r }\log (\mathbf {UM}_{r,\text {OS}_r} )\right) \end{aligned}$$
(17)
$$\begin{aligned} \theta= & {} \{R_{\text {Mis}}, A_m, P_{seed}, {\sigma }_{\text {ED}}, {\sigma }_{\text {ND-C}}, {\sigma }_{\text {ND-D}}, {\sigma }_{\text {ND-E}}, R_{\text {FS-C}}, R_{\text {FS-D}}, R_{\text {FS-E}} \} \end{aligned}$$
(18)
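A sketch of how \(\text {OS}_r\) and the objective of Eq. 17 might be computed from a simulated volume trajectory, assuming volumes are stored as a (timesteps × regions) array and \(\text {Vthres}_r\) has been precomputed from the EBM. The tie-breaking rule, the handling of regions that never become abnormal and the small `eps` guard are our assumptions, as are the function names; the optimiser used to search over \(\theta \) is not stated and is not shown.

```python
import numpy as np


def first_abnormal_step(V_traj, v_thres):
    """Timestep at which each region's volume first drops below Vthres_r (inf if never)."""
    below = V_traj < v_thres[None, :]  # (n_steps, n_regions) boolean
    return np.where(below.any(axis=0), below.argmax(axis=0), np.inf)  # argmax = first True


def event_positions(abnormal_step):
    """OS_r as 1-based ranks. Assumptions: ties are broken by region index and
    regions that never become abnormal (inf) are ranked last."""
    order = np.argsort(abnormal_step, kind="stable")  # earliest event first
    os_pos = np.empty(len(abnormal_step), dtype=int)
    os_pos[order] = np.arange(1, len(abnormal_step) + 1)
    return os_pos


def neg_log_likelihood(UM, os_pos, eps=1e-12):
    """Eq. 17 objective: -sum_r log UM[r, OS_r]; eps (an assumption) avoids log(0)."""
    rows = np.arange(UM.shape[0])
    return -np.log(UM[rows, os_pos - 1] + eps).sum()  # column i-1 holds position i
```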

A model of pathogenic protein without production or clearance and with hippocampus seeding best fitted the empirical data (Fig. 1) with parameters \(\{ R_{Mis}^{\star }=0.297, A_m^{\star }=0.00152, P_{seed}^{\star }=0.0969, {\sigma }_{\text {ED}}^{\star }=0.000243, {\sigma }_{\text {ND-C}}^{\star }=0,\) \( {\sigma }_{\text {ND-D}}^{\star }=0.0829, {\sigma }_{\text {ND-E}}^{\star }=0.00276, R_{\text {FS-C}}^{\star }=0, R_{\text {FS-D}}^{\star }=0, R_{\text {FS-E}}^{\star }=0 \}\).

Fig. 1. The matrix \(\mathbf {UM}\), showing in grey the event position uncertainty for each region, with the sequence \(\text {OS}_r\) obtained with the optimal parameters \(\theta ^{\star }\) overlaid as red crosses.

4 Discussion and Future Work

The simulation with the optimal parameters predicted the early stages of atrophy progression well, whereas the later stages deviated more from the diagonal sequence given by the EBM (Fig. 1). All other parameters being equal, simulations of pathogenic protein without production or clearance fitted the data better, which is evidence in favour of the prion-like spread hypothesis [4, 14]. Spread was primarily driven by fibre tract connectivity: functional correlations did not contribute (\({\sigma }_{\text {ND-C}}^{\star }=0\)), while extracellular diffusion (\({\sigma }_{\text {ED}}^{\star }=0.000243\)) and effective connectivity (\({\sigma }_{\text {ND-E}}^{\star }=0.00276\)) made only small contributions. Frequency-related spread (\(R_{\text {FS-C}}^{\star }=0, R_{\text {FS-D}}^{\star }=0, R_{\text {FS-E}}^{\star }=0\)) did not contribute either. This suggests that protein spread is driven by structural fibre tract connectivity rather than by synaptic activity, in agreement with the modelling of Raj et al. [10].

If we assume that our model with the optimal parameters \(\theta ^{\star }\) is an accurate and sufficiently complex model of AD progression, then our framework could, hypothetically, evaluate candidate therapies by simulating their effect on one or more of the modelled mechanisms. For example, altering the speed of extracellular diffusion had little effect on atrophy progression, whereas decreasing the speed of network-mediated diffusion substantially slowed atrophy progression. Although this example is oversimplified, it illustrates how our framework can suggest potential therapy targets.

We presented a proof of concept of our methodology, in which we aimed to infer plausible physiological properties from empirical data using a more complex model than that proposed by Raj et al. [10]. Most modelling approaches rely on mathematical properties and are effective at capturing atrophy patterns (e.g. in classification tasks), but are unable to elucidate the underlying mechanisms. The proposed approach instead aims to gain an understanding of these mechanisms by simulating pathogenic protein spread within the brain network. To achieve this goal, we had to make multiple assumptions that are difficult to validate, since many neurobiological properties are still unknown.

In future work, additional mechanisms (e.g. protein homeostasis and amyloid-tau interaction) will be modelled, and their contribution to disease progression will be assessed for a variety of neurodegenerative diseases, under the hypothesis that different parameter values are linked to different diseases. It would also be desirable to add appropriate regularisation terms to the cost function and to estimate the structural and functional connectivity from a larger population of healthy controls. In this work, connectivity metrics were assumed to remain constant under atrophy, an assumption that should be relaxed in future work.