Neural network-based anomalous diffusion parameter estimation approaches for Gaussian processes

Anomalous diffusion behavior can be observed in many single-particle tracking experiments, in which particles are contained in crowded environments. Numerous models can be used to describe such data. In this paper, we focus on two common processes: fractional Brownian motion (fBm) and scaled Brownian motion (sBm). We propose novel methods for sBm anomalous diffusion parameter estimation based on the autocovariance function (ACVF); for centered Gaussian processes, this function uniquely identifies the process. The first estimation method is based solely on theoretical calculations, while the other additionally utilizes neural networks (NN) to achieve a more robust and better-performing estimator. For both fBm and sBm, the theoretical estimators were compared with the ones utilizing artificial NN. For the NN-based approaches, we used architectures such as the multilayer perceptron (MLP) and long short-term memory (LSTM). Furthermore, we analyzed the influence of additive noise on the estimators' quality for NN models with and without regularization.

In real data analysis, the main difficulty is fitting the model and estimating its parameters. While both fBm and sBm are simple models, reliable estimation with classical methods is problematic, especially when the measurement of the trajectories is inaccurate [31]. We observe that classical approaches degrade severely when the scale of the noise is substantial, up to the point where they struggle to distinguish between sub- and superdiffusion.
Here, we focus on methods utilizing the sample autocovariance function (ACVF) [32]; however, other approaches are viable [33]. This choice is motivated by a theoretical property of this function: a centered Gaussian process is uniquely identified by its ACVF. We compare the classical approaches with the proposed NN-based ones and analyze the influence of added noise on the estimation quality.
The rest of the paper has the following structure. First, we define normal and anomalous diffusion using partial differential equations and the stochastic processes emerging from their solutions, with a focus on fBm and sBm. In the third section, the estimation methods are presented, starting from a summary of the fBm identification approach (described in detail in [34]) and ending with the sBm identification methodology and the description of the NN architecture. In the following section, the methods are evaluated and compared. The data generation methodology is presented, together with the error analysis for data both with and without noise of different scales. We also inspect the influence of adding noise to the training samples. The last section summarizes the paper and proposes further improvements of the presented methodology.

Anomalous diffusion models
In this section, we describe normal and anomalous diffusion. We also define and characterize two examples of such processes, namely the fBm and sBm processes.
The roots of diffusion and anomalous diffusion are in physics; for this reason, we define diffusion using a partial differential equation (PDE) [35]:

$$\frac{\partial p(x, t)}{\partial t} = D \Delta p(x, t), \qquad (2)$$

where $p(x, t)$ is the density of a material at time $t$ and point in space $x$, $D$ is a constant determining the rate of dispersion in the volume, and $\Delta$ is the Laplace operator defined as $\Delta = \sum_i \frac{\partial^2}{\partial x_i^2}$. The solution of equation (2) (under certain assumptions) leads to the probability density function (PDF) of a Gaussian distribution [35]:

$$p(x, t) = \frac{1}{\sqrt{4 \pi D t}} \exp\left(-\frac{x^2}{4 D t}\right). \qquad (3)$$

Thus, normal diffusion can be modeled using Brownian motion [36]. However, when using this model, we assume that there are no external forces in the system and that it is in equilibrium. Furthermore, the diffusivity coefficient has to be constant. For those reasons, a generalization was proposed, namely anomalous diffusion. As mentioned in the introduction, it is defined using the ensemble MSD (1). For Brownian motion $\{B(t)\}$ (normal diffusion), $\mathbb{E}[B^2(t)] \propto t$; thus, any deviation of the second moment from linearity is called anomalous. For $\alpha = 1$, as already established, the process is called normal diffusion. When $\alpha < 1$ the process is subdiffusive, and for $\alpha > 1$ superdiffusive.
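As a quick sanity check on this definition (an illustration of ours, not taken from the paper), one can simulate Brownian motion and recover $\alpha = 1$ from the ensemble MSD; the grid, diffusivity, and sample counts below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate N Brownian trajectories on t = 1..T with diffusivity D,
# i.e. increments ~ N(0, 2*D*dt), and check that the ensemble MSD
# E[B^2(t)] grows linearly in t (alpha = 1, normal diffusion).
N, T, D, dt = 20000, 100, 0.5, 1.0
increments = rng.normal(0.0, np.sqrt(2 * D * dt), size=(N, T))
B = np.cumsum(increments, axis=1)          # B(1), ..., B(T)

msd = (B ** 2).mean(axis=0)                # ensemble MSD at each t
t = np.arange(1, T + 1) * dt

# Fit log(MSD) = alpha*log(t) + c; alpha should be close to 1
alpha, _ = np.polyfit(np.log(t), np.log(msd), 1)
print(round(alpha, 2))
```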

Fractional Brownian motion
One possible generalization of the diffusion equation (2) is to allow a nonconstant diffusivity. The fBm comes from such an extension, where $D$ is replaced by the function $D(t) := \alpha C t^{\alpha - 1}$ to get the following PDE [37]:

$$\frac{\partial p(x, t)}{\partial t} = \alpha C t^{\alpha - 1} \Delta p(x, t).$$

The solution to this equation can also be associated with a Gaussian process:

$$p(x, t) = \frac{1}{\sqrt{4 \pi C t^{\alpha}}} \exp\left(-\frac{x^2}{4 C t^{\alpha}}\right). \qquad (4)$$

Comparing Eq. (3) to Eq. (4), one can observe that for $\alpha = 1$ the solutions are alike. The fBm $\{B_\alpha(t)\}$, $\alpha \in (0, 2)$, is defined as a Gaussian process $B_\alpha(t) \sim \mathcal{N}(0, 2 C t^{\alpha})$ with the following covariance [38]:

$$\mathbb{E}[B_\alpha(t) B_\alpha(s)] = C\left(t^{\alpha} + s^{\alpha} - |t - s|^{\alpha}\right). \qquad (5)$$

The process is self-similar, $B_\alpha(a t) \sim a^{\alpha/2} B_\alpha(t)$, with stationary increments, $B_\alpha(t + s) - B_\alpha(t) \sim B_\alpha(s)$, $s > 0$. For $\alpha > 1$ the process exhibits the so-called long-range dependence: $\sum_{n=1}^{\infty} \mathbb{E}[B_\alpha(1)(B_\alpha(n + 1) - B_\alpha(n))] = \infty$.
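The Gaussian definition above suggests a direct way to sample fBm on a finite grid: build the covariance matrix $C(t^{\alpha} + s^{\alpha} - |t - s|^{\alpha})$ and multiply white noise by its Cholesky factor. The sketch below is our illustration (the paper does not specify its generator), with arbitrary parameter choices:

```python
import numpy as np

def fbm_trajectories(alpha, C, T, N, rng):
    """Sample N fBm paths on t = 1..T from the Gaussian definition,
    using the covariance C*(t^a + s^a - |t-s|^a) and a Cholesky factor."""
    t = np.arange(1, T + 1, dtype=float)
    cov = C * (t[:, None] ** alpha + t[None, :] ** alpha
               - np.abs(t[:, None] - t[None, :]) ** alpha)
    L = np.linalg.cholesky(cov)
    return rng.standard_normal((N, T)) @ L.T

rng = np.random.default_rng(1)
alpha, C = 0.7, 1.0
paths = fbm_trajectories(alpha, C, T=64, N=50000, rng=rng)

# Var[B_a(t)] should equal 2*C*t^alpha, here checked at t = 64
print(round(paths[:, -1].var() / (2 * C * 64 ** alpha), 2))
```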

Fractional Brownian noise
Fractional Brownian noise (fBn) is defined as the increment process of fBm, namely:

$$\Delta B_\alpha(t) = B_\alpha(t) - B_\alpha(t - 1), \quad t \in \mathbb{N}_+.$$

As established in Sect. 2.1, fBm has stationary increments; thus fBn is (weakly) stationary, which implies that the distribution of the process does not change in time, $\Delta B_\alpha(t) \sim B_\alpha(1) \sim \mathcal{N}(0, 2C)$. Furthermore, the ACVF depends only on the lag $\tau$:

$$\gamma(\tau) = \mathbb{E}[\Delta B_\alpha(t) \Delta B_\alpha(t + \tau)] = C\left(|\tau + 1|^{\alpha} - 2 |\tau|^{\alpha} + |\tau - 1|^{\alpha}\right). \qquad (6)$$

This crucial property of fBn allows for a feasible estimation of the $\alpha$ parameter, as the estimation procedure can rely solely on the ACVF evaluated at certain lags $\tau$.
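Since the fBn ACVF has a closed form, its qualitative behavior is easy to inspect numerically; the helper below (ours, not from the paper) evaluates it and confirms the classic dichotomy of negative correlations for subdiffusion and positive ones for superdiffusion:

```python
import numpy as np

def fbn_acvf(alpha, C, taus):
    """Theoretical ACVF of fractional Brownian noise:
    gamma(tau) = C*(|tau+1|^a - 2*|tau|^a + |tau-1|^a)."""
    taus = np.abs(np.asarray(taus, dtype=float))
    return C * ((taus + 1) ** alpha - 2 * taus ** alpha
                + np.abs(taus - 1) ** alpha)

taus = np.arange(0, 5)
sub = fbn_acvf(0.5, 1.0, taus)    # subdiffusion: negatively correlated lags
sup = fbn_acvf(1.5, 1.0, taus)    # superdiffusion: positively correlated lags

print(sub[0], sup[0])             # gamma(0) = 2*C in both cases
print(bool((sub[1:] < 0).all()), bool((sup[1:] > 0).all()))
```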

Scaled Brownian motion
The scaled Brownian motion process is a simple modification of Brownian motion $\{B(t)\}$. Namely, it is constructed from ordinary Brownian motion by ''scaling'' time:

$$\tilde{B}_\alpha(t) = B(t^{\alpha}).$$

Many properties of sBm are shared with standard Brownian motion. What is more, for $\alpha = 1$, $\{\tilde{B}_\alpha(t)\}$ reduces to regular Brownian motion $\{B(t)\}$. The process is Gaussian, $\tilde{B}_\alpha(t) \sim \mathcal{N}(0, 2 D t^{\alpha})$, and self-similar, $\tilde{B}_\alpha(a t) = B((a t)^{\alpha}) \sim a^{\alpha/2} B(t^{\alpha})$.
Furthermore, it has independent yet nonstationary increments: $B((t + s)^{\alpha}) - B(t^{\alpha}) \sim B((t + s)^{\alpha} - t^{\alpha}) \nsim B(s^{\alpha})$, $s > 0$, $\alpha \neq 1$. As the process is Gaussian, it can be fully characterized by the mean value and covariance matrix:

$$\mathbb{E}[\tilde{B}_\alpha(t)] = 0, \qquad \mathbb{E}[\tilde{B}_\alpha(t) \tilde{B}_\alpha(s)] = 2 D \min(t, s)^{\alpha}.$$
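The time-change construction translates directly into a simulator: on a grid, sBm increments are independent Gaussians whose variances follow the increments of $2 D t^{\alpha}$. A minimal sketch (our illustration; the function name and parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# sBm is Brownian motion run on the deformed clock t -> t^alpha:
# on a grid t_1 < ... < t_T, the increments of B(t^alpha) are independent
# N(0, 2*D*(t_k^alpha - t_{k-1}^alpha)) variables.
def sbm_trajectories(alpha, D, T, N, rng):
    t = np.arange(0, T + 1, dtype=float)
    dvar = 2 * D * np.diff(t ** alpha)               # increment variances
    incs = rng.standard_normal((N, T)) * np.sqrt(dvar)
    return np.cumsum(incs, axis=1)                   # B~(1), ..., B~(T)

alpha, D, T = 1.4, 1.0, 128
paths = sbm_trajectories(alpha, D, T, N=50000, rng=rng)

# Var[B~(t)] = 2*D*t^alpha; the ratio at t = T should be close to 1
print(round(paths[:, -1].var() / (2 * D * T ** alpha), 2))
```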

Scaled Brownian noise
Similarly to fBn (Sect. 2.2), sBn is defined by differencing sBm:

$$\Delta \tilde{B}_\alpha(t) = \tilde{B}_\alpha(t) - \tilde{B}_\alpha(t - 1), \quad t \in \mathbb{N}_+.$$

This process, like fBn, has expected value equal to zero, $\mathbb{E}[\Delta \tilde{B}_\alpha(t)] = 0$. However, the ACVF still depends on both $t$ and $\tau$ [17]:

$$\gamma(t, \tau) = \mathbb{E}[\Delta \tilde{B}_\alpha(t) \Delta \tilde{B}_\alpha(t + \tau)] = 2 D \left(t^{\alpha} - (t - 1)^{\alpha}\right) \mathbb{1}_{\tau = 0},$$

where $\mathbb{1}_{\tau = 0}$ takes the value 1 when $\tau = 0$ and 0 otherwise. Furthermore, the distribution of the process is as follows: $\Delta \tilde{B}_\alpha(t) \sim \mathcal{N}(0, 2 D (t^{\alpha} - (t - 1)^{\alpha})) \approx \mathcal{N}(0, 2 D \alpha t^{\alpha - 1})$ for large $t$.

Methodology
In this section, the methodology to estimate the $\alpha$ parameter (the coefficient responsible for the type of diffusion) for fBm and sBm will be presented. The proposed approach relies on a key property of Gaussian processes, namely that they can be uniquely identified by the expected value $\mu$ and covariance matrix $\Sigma$ [39]:

$$\mu = \mathbb{E}[\{X(t)\}], \qquad \Sigma = \mathbb{E}\left[(\{X(t)\} - \mu)(\{X(t)\} - \mu)^{\top}\right],$$

where $\{X(t)\}^{\top}$ denotes the transposition of the column vector $\{X(t)\}$.
Let us consider N trajectories of a Gaussian process fXðtÞg where t 2 f0; . . .; Tg with Xð0Þ ¼ 0 almost surely.
If the mean $\mu$ is equal to zero, as it is for fBm and sBm, we can consider just the covariance matrix $\Sigma$. However, it has many entries, among which many might not provide new information, as the trajectory has length $T + 1$ while the covariance matrix has $(T + 1)^2$ entries. Furthermore, one trajectory ($N = 1$) is insufficient to calculate a reliable estimate, as the estimator of $\Sigma$ for a centered process $\{X(t)\}$ is as follows:

$$\hat{\Sigma}(t, s) = \frac{1}{N} \sum_{n=1}^{N} X_n(t) X_n(s).$$
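In practice this estimator is a one-line matrix product; the following sketch (ours, not from the paper) applies it to Brownian motion, for which the covariance $2 D \min(t, s)$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(3)

# Empirical covariance estimator for a centered process observed as
# N trajectories X_n(t): Sigma_hat(t, s) = (1/N) * sum_n X_n(t) * X_n(s).
def cov_estimator(X):
    N = X.shape[0]
    return X.T @ X / N

# Sanity check on Brownian motion, where Cov(B(t), B(s)) = 2*D*min(t, s)
D, T, N = 1.0, 8, 200000
B = np.cumsum(rng.normal(0, np.sqrt(2 * D), size=(N, T)), axis=1)
t = np.arange(1, T + 1)
theo = 2 * D * np.minimum(t[:, None], t[None, :])

print(round(np.abs(cov_estimator(B) - theo).max(), 2))
```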

Fractional Brownian motion
As shown in Sect. 2.1, the fBm covariance depends on both parameters $t$ and $s$ (Eq. (5)). This problem can be simply resolved by differencing the process to get fBn (see Sect. 2.2). Then, the covariance depends only on the difference $|t - s| = \tau$ (see the ACVF in Eq. (6)). Such an ACVF can be efficiently estimated from a single trajectory using the following formula:

$$\hat{\gamma}(\tau) = \frac{1}{T} \sum_{t=1}^{T - \tau} \Delta B_\alpha(t) \Delta B_\alpha(t + \tau).$$

Furthermore, for the ACVF to be independent of the scale parameter $C$, we can simply normalize the trajectory by its empirical standard deviation, $\Delta B_\alpha(t) / \sqrt{\hat{\gamma}(0)}$; the ACVF of the normalized increments can then be used to estimate $\alpha$. The details are described in [34].
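A minimal implementation of this sample ACVF with the scale-normalization step might look as follows (our sketch; tested here on the $\alpha = 1$ case, where the normalized ACVF should vanish at nonzero lags regardless of $C$):

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Biased sample ACVF of a (zero-mean) stationary sequence:
    gamma_hat(tau) = (1/T) * sum_t x[t] * x[t + tau]."""
    T = len(x)
    return np.array([x[: T - tau] @ x[tau:] / T for tau in range(max_lag + 1)])

rng = np.random.default_rng(4)

# For plain Brownian increments (alpha = 1, i.i.d. N(0, 2C)) the
# normalized ACVF gamma_hat(tau)/gamma_hat(0) should be ~1 at lag 0
# and ~0 elsewhere, independently of the scale C.
x = rng.normal(0.0, np.sqrt(2 * 5.0), size=100000)
rho = sample_acvf(x, 3)
rho = rho / rho[0]
print(np.round(rho, 2))
```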

Scaled Brownian motion
For sBm, however, the sBn ACVF does not depend only on the lag $\tau$; thus, the problem needs to be approached differently. The idea is to make the vector stationary and as numerically stable as possible. Using Eq. (7), one can obtain the following results: While the goal is to estimate $\alpha$ using one trajectory, Eq. (9) is used as a suggestion on how to ''stationarize'' the process. Hence, the following process will be used for the estimation of the $\alpha$ parameter: Similar calculations could be conducted for sBm directly; however, the resulting estimators are characterized by a large variance, causing numerical instability.
Omitting the expectations certainly changes the behavior of the vector. However, as we show, this estimator is unbiased for each value of $t$. Proof: To show that the estimator is unbiased, one needs to verify that its expectation matches the estimated parameter, namely: Consequently, this estimator is unbiased. $\square$ One could think that, with such a property, it should be reasonable to define the following unbiased estimator for a single trajectory: However, it is characterized by a large variance. In Fig. 1, sample trajectories of sBm with the corresponding evaluations of the $\{\xi(t)\}$ statistic (10) are presented. The distribution of $\{\xi(t)\}$ starts to stabilize from a certain $t$ onwards. It is easily observed that the distribution is not symmetric; it is highly skewed. In addition, the moving average does not necessarily match the true value $\xi(t) = \alpha - 1$.
In Fig. 2, we see that although the estimator's mean matches the true value, the spread is severe. Depending on the trajectory size, the observed mean absolute error (MAE) for $T = 1024$ is $\mathrm{MAE} = 1.030$, and for the smaller sample $T = 32$ it reaches $\mathrm{MAE} = 1.255$. Thus, while using this estimator, it is highly probable to confuse the type of anomalous diffusion. Consequently, a superior aggregation function is needed.

Neural network algorithm
The architecture of the proposed neural network is simple and similar to the one in [34]. However, one significant change allows for a variable input vector size: the first layer is exchanged for a recurrent layer, to be precise, a long short-term memory (LSTM) layer [42, 43]. The input vector $\{\xi(t)\}$ is ingested in the natural order, with increasing time. This allows the algorithm to disregard the more variable part (the beginning) of the $\{\xi(t)\}$ vector (see Fig. 1) and learn more stably. In recurrent neural networks (RNNs), the so-called exploding and vanishing gradient problem occurs [44], and this approach minimizes the risk of such an event.
The RNN architecture is especially needed because with the multilayer perceptron (MLP) [45] (used in [34]) the model's error does not plateau (Fig. 3): with increasing $\tilde{T}$, the error decreases at a log-linear rate (from a certain $\tilde{T}$). This is in line with the law of large numbers [46]. Furthermore, we observe that the addition of the mean does improve the performance at first. However, from a certain $\tilde{T}$, the difference is negligible, despite the fact that there is a difference of $\min_{\tilde{T}}(T - \tilde{T}) = 512$ values used for estimation (as $T = 1024$).
Thus, the proposed model is built from an LSTM layer of size 128 followed by 5 densely connected layers (with 128, 128, 128, 64, … neurons, respectively) using the SiLU activation function. However, other well-performing functions could be used, like GeLU [49] (very similar to SiLU), ELU [50], or the classical Leaky ReLU [51]. The choice of SiLU was dictated by its superior performance in the majority of NN applications. The output layer consists of 1 neuron corresponding to the estimated value of $\alpha$. To constrain the NN to estimate values from the interval $(0, 2)$, the scaled sigmoid activation function $\tilde{\sigma}(x)$ was used:

$$\tilde{\sigma}(x) = 2 \sigma(x) = \frac{2}{1 + e^{-x}},$$

where $\sigma(x)$ is the standard sigmoid function (12). To learn the NN's parameters, the Adam optimization algorithm [52] was applied, as it quickly converges to a minimum of the loss function [53-56]. The optimization problem is to minimize the mean square error (MSE) loss function [45]:

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left(\alpha_n - \hat{\alpha}_n\right)^2.$$

The optimization was set to a maximum of 50 epochs (with early stopping when the validation error does not improve for 5 epochs). Furthermore, the learning rate was set to decrease when the validation error plateaus (by a factor of 10, starting from 0.01). While in [34] the MLP NN architecture was used for intelligent fBm identification, we also include this architecture in the comparison plots.
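A minimal PyTorch sketch of such an architecture is given below; it is our reading of the description, not the authors' code, and the width of the fifth dense layer (32 here) is an assumption, as is the exact placement of the activations:

```python
import torch
import torch.nn as nn

class AlphaEstimator(nn.Module):
    """Sketch of the described estimator: an LSTM over the xi(t) sequence,
    dense SiLU layers, and a sigmoid scaled to the admissible range (0, 2).
    Hidden sizes follow the text; the last dense width is an assumption."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=128, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, 64), nn.SiLU(),
            nn.Linear(64, 32), nn.SiLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):                  # x: (batch, seq_len, 1)
        _, (h, _) = self.lstm(x)           # last hidden state: (1, batch, 128)
        return 2.0 * torch.sigmoid(self.head(h[-1]))  # scaled into (0, 2)

model = AlphaEstimator()
xi = torch.randn(4, 57, 1)                 # variable sequence length is fine
alpha_hat = model(xi)
print(alpha_hat.shape)
```

Training would then pair this model with `nn.MSELoss()` and `torch.optim.Adam`, as described above.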

Monte Carlo simulation study
The classical estimators, proposed for sBm in Sect. 3.2 and for fBm in [34], assume a certain model, and with that information a comparison of the theoretical ACVF to the empirical one is made. On the other hand, the NN-based methods propose a certain structure of a mapping that is to be derived from a dataset. The universal approximation theorem [57] implies that any continuous function can be approximated arbitrarily well by a neural network with a single hidden layer and the sigmoid $\sigma(x)$ activation function (12). Thus, the proposed model (Sect. 3.3), to be viable, needs to learn from data generated by the theoretical process. For this reason, we first present the method for data generation. To make the comparison easier, the models are of the same size (layer and parameter count). Then, we compare each method using a prediction error measure, the MAE. Furthermore, we check each method's resilience to noise (which simulates the error of the measurement equipment [31]).

Data generation
Standard machine learning approaches divide the data into 3 distinct subsets: a training, a validation, and a test set. This step is important, as NNs are prone to overfitting (memorizing the data). To measure whether the trained model suffers from this effect, the validation set is used; it also serves to fine-tune the hyperparameters of the model. The final performance of the model is measured using the test set. We generate the datasets with 6 lengths of trajectories, $T \in \{2^5, 2^6, \ldots, 2^{10}\}$, so that the models learn both the ''noise'' caused by the small sample size and the more stable true values of the ACVF estimates (law of large numbers). For a single trajectory length, we generated $N_{\text{train}} = 262144$ ($2^{18}$), $N_{\text{val}} = 32768$ ($2^{15}$), and $N_{\text{test}} = 8192$ ($2^{13}$) trajectories for the training, validation, and test set, respectively. Each trajectory is generated with a random $\alpha$ parameter drawn from the uniform distribution $\mathcal{U}(0, 2)$. The motivation behind such sampling is for the models to learn all possible cases of anomalous diffusion evenly.
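The sampling scheme can be sketched as follows (our illustration, with a hypothetical helper name; it draws $\alpha \sim \mathcal{U}(0, 2)$ per trajectory and generates sBm by cumulating independent Gaussian increments):

```python
import numpy as np

def make_sbm_dataset(n_traj, T, D, rng):
    """Generate sBm trajectories with a random alpha ~ U(0, 2) per
    trajectory, returning the labels alongside (for supervised training)."""
    alphas = rng.uniform(0.0, 2.0, size=n_traj)
    t = np.arange(0, T + 1, dtype=float)
    # increment variances 2*D*(t_k^a - t_{k-1}^a), one row per trajectory
    dvar = 2 * D * np.diff(t[None, :] ** alphas[:, None], axis=1)
    X = np.cumsum(rng.standard_normal((n_traj, T)) * np.sqrt(dvar), axis=1)
    return X, alphas

rng = np.random.default_rng(5)
X_train, y_train = make_sbm_dataset(n_traj=1024, T=32, D=1.0, rng=rng)
print(X_train.shape, float(y_train.min()), float(y_train.max()))
```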
For the LSTM NN model (defined in Sect. 3.3), we create an input vector of random size, namely $\tilde{T} \sim \mathcal{DU}(32, T)$, where $\mathcal{DU}$ denotes the discrete uniform distribution. This forces the NN to learn a more robust and efficient estimator. The choice of 32 as the lower limit of the $\tilde{T}$ parameter is motivated by the fact that, for fBm, it is the minimal number of ACVF lags needed for the estimation error to plateau (as shown in [34]). Thus, for the comparison, the same strategy is applied for sBm.
For the noise influence study, we simulate the measurement error by adding independent, identically distributed (i.i.d.) centered Gaussian noise $\mathcal{N}(0, \sigma^2)$ to the raw trajectories.

Methods' comparison
Here, we compare the ''Naive'', MLP, and LSTM methods for the two models (fBm and sBm). The ''Naive'' method for fBm (as described in [34]) is given by the formula

$$\hat{\alpha}_{\text{Naive-fBm}} = \log_2\left(2 \hat{R}(1) + 2\right),$$

where $\hat{R}(1)$ is the sample lag-1 autocorrelation of fBn, and the estimator for sBm is given in Eq. (11). In Fig. 4, quantile lines are presented for each process and method. We observe that the ''Naive'' method for fBm behaves vastly better than the one for sBm. The latter has low accuracy across the whole $\alpha$ parameter space; its median absolute error (AE) is always higher than 1. Consequently, even for edge cases of anomalous diffusion, half of the trajectories' diffusion types will be misclassified with this estimator. The performance is greatly improved with the NN approach: the upper quantile does not even reach an error of 1, indicating that the method is much more robust. However, comparing the same models for different processes, there is a significant difference in the accuracy with which those methods estimate $\alpha$. The NN's MAE for fBm is on the level of $\mathrm{MAE}_{\text{fBm}} = 0.06$, while for sBm it is almost an order of magnitude greater, $\mathrm{MAE}_{\text{sBm}} = 0.21$. This is undoubtedly caused by the fact that fBm's ACVF provides more aggregated and essential information about the process, while for sBm the information is spread much more across the whole vector. This effect can also be observed when analyzing the influence of the input vector size on the performance (Fig. 3). Furthermore, for both fBm and sBm, the MLP and LSTM models perform similarly; thus, in real-life applications, when the vectors are of constant size (or differ by a small margin), it is recommended to opt for the simpler and faster approach, namely the MLP.
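The ''Naive'' fBm estimator above is straightforward to try out; the sketch below is our reconstruction (the fBn generator and parameter choices are ours), inverting the fBn lag-1 autocorrelation $(2^{\alpha} - 2)/2$ implied by the fBn ACVF:

```python
import numpy as np

def fbn_paths(alpha, T, N, rng):
    """Draw N fBn trajectories from the stationary Gaussian law (C = 1),
    gamma(tau) = |tau+1|^a - 2*|tau|^a + |tau-1|^a, via a Cholesky factor."""
    tau = np.abs(np.subtract.outer(np.arange(T), np.arange(T))).astype(float)
    cov = (tau + 1) ** alpha - 2 * tau ** alpha + np.abs(tau - 1) ** alpha
    return rng.standard_normal((N, T)) @ np.linalg.cholesky(cov).T

def alpha_naive_fbm(x):
    """'Naive' fBm estimator: alpha_hat = log2(2*R(1) + 2), with R(1) the
    sample lag-1 autocorrelation of the increment (fBn) trajectory."""
    r1 = (x[:-1] @ x[1:]) / (x @ x)
    return np.log2(2.0 * r1 + 2.0)

rng = np.random.default_rng(6)
paths = fbn_paths(alpha=1.3, T=1024, N=50, rng=rng)
est = np.mean([alpha_naive_fbm(x) for x in paths])
print(round(est, 2))
```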
When analyzing the influence of the added noise on the estimators (Fig. 5), one can observe that all models suffer. The smallest change is apparent in sBm's ''Naive'' method, as it already struggled to estimate correctly in the noiseless environment. After the addition of noise, the variance of this estimator increased and the median stabilized at $\mathrm{MAE} = 1$. Furthermore, the NN estimators for sBm decreased their accuracy significantly even for small $\sigma$; however, the level of MAE stays fairly constant even for larger amounts of noise (large $\sigma$). The NN-based estimators for fBm are much more resilient to noise, as their MAE increases only steadily with increasing $\sigma$; eventually, however, they reach higher levels than for sBm, at which the type of diffusion is hard to distinguish: for fBm, the 75% quantile almost reaches $\mathrm{AE}_{\text{fBm}} = 1$, while for sBm the same quantile only reaches $\mathrm{AE}_{\text{sBm}} = 0.5$.
Nevertheless, for an NN, it is fairly simple to incorporate the assumption of noise into the model. Since the NN trains on trajectories, all that is needed is to augment the dataset by adding Gaussian noise to the trajectories; the NN should then become more resistant to the noise. This technique is also known as dataset augmentation, a form of model regularization [58].
In Fig. 6, accuracy metrics for the retrained models are presented. The fBm model has learned a significantly more resilient estimator, as the error only starts to increase around $\sigma = 1.5$. One cannot say the same about the retrained sBm $\alpha$ estimator, as Fig. 6b greatly resembles the results for the previously presented model (Fig. 5b). This is more vivid in Fig. 7, where Figs. 5 and 6 are overlaid. There, one can see that for both processes the models, at the expense of the noiseless case, improved the distribution of the error in environments with larger noise. For fBm the change is evident; for sBm it is more subtle. The median error did not change significantly, yet the distribution is more concentrated: the upper quantiles are considerably lower. Consequently, the issue of discerning the diffusion type is reduced. This vastly different behavior of the models can be explained by the fact that sBm, because of its independent increments, can utilize only the $\tau = 0$ lag. As the added noise is i.i.d., it influences solely the $\tau = 0$ lag of the ACVF. For this reason, it is much harder to separate the effect of noise in the approach proposed for the sBm process. For fBm, the problem is much simpler to handle, because other lags $\tau \neq 0$ are available whose values are not obscured by the noise.

Summary and conclusions
The anomalous diffusion phenomenon may emerge in various applications. For this reason, we explored two popular anomalous diffusion Gaussian processes used to model such behavior. For both processes, although similar, we presented different methodologies to identify fBm and sBm. Even though we used those specific processes, the approach applies to any Gaussian process that is either weakly stationary or has independent increments, which shows the universality of these methods.
The methodology is based on the crucial property of Gaussian processes, namely that such a process can be uniquely identified by its mean vector and covariance matrix. Consequently, when designing features describing the process, we can rely solely on those quantities. For centered, weakly stationary processes (such as fBn), we can depend exclusively on the ACVF, which can be efficiently estimated from a single trajectory. The approach in [34] utilizes such ACVF vectors to identify fBm.
In our novel approach to the estimation of the anomalous diffusion parameter $\alpha$ for sBm, we also utilize the ACVF; however, as sBn is nonstationary with independent increments, the same method could not be used. Thus, we proposed a way to ''stationarize'' the process (with the help of the ACVF), which has proven to give reasonably stable trajectories. This allowed us to feasibly estimate the $\alpha$ parameter. Although the standard approach to creating an estimator failed because of the large variance, the NN-based one produced an estimator that allows estimating the diffusion parameter with an accuracy that is more than sufficient to discern the diffusion type. As noise is omnipresent in real-life applications [31], it is critical for the estimators to withstand the influence of distortions and not degrade severely. For this reason, we validated and compared the theoretical estimators with the NN-based ones, with and without dataset augmentation (which should be seen as a regularization technique [58]). The results show that the NN approaches are more resilient to noise even without regularization. Additionally, with the added dataset augmentation, the robustness improved further and significantly surpassed the theoretical estimators. We observed that this technique made the fBm NN-based estimator significantly more robust; however, the effect was not as vivid for sBm, by reason of its independent increments.
The proposed NN models are of similar size (for easier comparison); however, it should be noted that better accuracy could be achieved with larger models. Here, we increased the size (number of parameters) of the MLP model proposed in [34] and reached a lower MAE. Thus, achieving even better results is within reach (however, the larger the model, the more computation and time are needed for convergence).
The LSTM layer was, in its time, revolutionary for natural language processing (NLP), where the problem of varying input lengths is customary. However, in the famous paper (in the NLP community) ''Attention is all you need'' [59], the authors state that the mechanism of attention and the transformer are superior to the LSTM and other recurrent NN topologies, such as [60-62]. Thanks to the usage of layer normalization [63] and skip connections (proposed in [64]), such an NN can be extended to larger sizes without gradient-related problems [44]. Furthermore, this architecture preserves the sense of order of the input vector, as the LSTM architecture does. In many domains, approaches based on the one presented in [65] have proven successful [66-69]. Thus, inspiration from this architecture (and other extensions [70-73]) could also lead to better results.
Another extension of the approach presented in this paper is to use statistics other than the ACVF. Furthermore, the model can take more than one statistic as input and intelligently combine them to reach a more reliable estimator. What is more, for this application, other NN architectures may perform better; in the literature, there exist several other versions of recurrent layers [60-62]. Moreover, while we observed that dataset augmentation helps the models' generalization ability, other regularization methods could help produce an estimator that is also reliable in other, more intricate, environments.
Funding This research received no external funding.

Declarations
Conflict of interest The author declares no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.