MultiPDF particle filtering in state estimation of nonlinear objects

This paper presents a new particle filter algorithm (MultiPDF) for state estimation of nonlinear systems. The proposed method is a modification of the standard particle filter approach. Due to the strong need for the acceleration of calculations and an improvement in the estimation quality of state estimation, the authors propose a method which enables one to divide the main particle filter into smaller sub-filters with an accordingly smaller number of particles for each one of them. The algorithm has been implemented for various numbers of particles and subordinate parallel filters. Estimation quality has been checked for nine nonlinear objects (both one- and multidimensional) and evaluated through the quality index, average root-mean-squared error. The computation time of the particle filter algorithm for several hardware configurations has been compared. Based on the obtained results, it can be concluded that, besides the computation acceleration, the parallelization of the particle filter’s operation also improves the estimation quality.


Introduction
Scientific and technical problems with noisy measurements require state estimation to filter out the noise. State estimation is widely used in power systems [1,2], electronics and electric drives [3,4], unmanned aerial vehicles (UAVs) [5][6][7] and in image processing [8,9]. An efficient tool for measurement noise filtering is the particle filter (PF), firstly proposed by Gordon, Salmond and Smith in 1993 [10]. The PF algorithm builds on the Bayesian Filter, which is the optimal solution [11]. It enables estimation in the presence of noise with any probability density function (PDF). Furthermore, it provides proper estimates even for strongly nonlinear objects, in contrast to Kalman filter algorithms [12].
Over the years, many modifications of the particle filter have been made, for example, the mixed Kalman particle filter (MKPF) [13], i.e., a combination of two Kalman filters, extended and unscented, with the PF method. This approach provides the advantage of combining the features of all three methods. Another example is the Remaining Useful Life Particle Filter (RULPF) [14], where the genetic algorithm has been applied to improve estimation quality. This modification maximizes the particles variety after the resampling step.
Some of the approaches invented for the particle filters use a new choice of the importance function, from which particles are drawn in a prediction step. One example of such PFs is the intelligent particle filter (IPF) [15] (it was inspired by the genetic algorithm again), in which the modification of small-weight particles into large-weight particles takes place through a change in their values. The generic particle filter can be treated as a special case of IPF with appropriate parameters. The next case is PF based on the Pearson correlation coefficient (PCCPF) [16], where the correlation coefficient is used to reduce the particles degeneracy, in place of the resampling step, or a new data-driven prognostic method based on the interacting multiple model particle filter (IMMPF) [17]. A feedback particle filter (FPF) with additional feedback information [18] has also been proposed. The latter method is used for noise removal from images through the update of a special gain function, using feedback information at every time step.
Another modification of the PF is the marginalized particle filter (MPF) proposed in [19]. State variables are divided into those which are either linear or nonlinear (based on the transition function). PF is used only for nonlinear ones, while the estimates of linear state variables are calculated by the extended Kalman filter (EKF) [20]. The measurement function for these variables is linearized. This approach is widely used, for example, in medicine or localization problems [21,22].
Another PF group constitutes of algorithms enabling the parallelization of computing during the same filter action in order to accelerate calculations. In reference [23], the authors proposed a method to parallelize the main filter for implementation on multiple processors. First, the computation speed in relation to the traditional method for theoretical models was compared. A significant time reduction benefit was shown.
Research was also carried out on the application of PF parallelization to real applications, such as determining the position in urban space [24]. The resampling bottleneck of the particle filter is very often parallelized. Parallel resamplings and its application among others in FPGA or GPU are presented in [25][26][27][28]. Parallel forms of different types of resampling known from the literature are shown, as well as the time bene-fits resulting from them. The need and applicability of the parallelization of PF calculations were also highlighted in the surveys [29,30], in which several proposals were also presented, like non-interacting particle filters (not using communication between sub-filters) or particle islands (using the sequential Monte Carlo SMC method). Additionally, the total particles' population is divided here into sub-populations (named "islands").
As presented above, sometimes, the PFs usage can be problematic because of their computational complexity, especially with a large number of particles and with multidimensional systems [31]. A modification of the particle filter proposed here by the authors can solve this problem. Additionally, the main aim of this research is to improve the quality of the estimation and find the group of objects for which this method works best. It is obvious that it is impossible to obtain a better quality of the estimation than for the Bayesian filter; however, the particle filter is only an approximation, so one can search for methods and modifications thanks to which, despite the use of a relatively small number of particles, the quality of the estimation will approach the optimal solution (for the case of using continuous PDF). The proposed method, named MultiPDF Particle Filter (MultiPDF PF), divides the total number of particles into smaller parts, which allows one to parallelize the computation and reduces the overall time cost. This approach also makes it possible to improve the quality of results compared to the traditional PF. As will be shown, several smaller sub-filters can mutually compensate for incorrect estimates.
In the vein of the MultiPDF approach, a total probability density function, which should be estimated by PF, is divided into parts. Each partial PDF is calculated using its own sub-filter with, respectively, smaller number of particles. (The total particle number is divided by the number of sub-filters, which can work parallelly.) As is shown in this paper, the total PDF can be treated as the set of particles from all sub-filters; however, it is not estimated in itself, and only the expected value of the PDF is calculated as the mean of expected values from all sub-filters (in each time step). Factors improving the quality of estimation using this method (and special types of plants) were indicated. The paper also shows the possibilities of accelerating the calculations thanks to the use of the MultiPDF PF.
The paper is organized as follows. In Sect. 2, the algorithm of the particle filter is presented. Section 3 contains a description of MultiPDF Particle Filter-the Real value of j-th state variable at k-th time step x i,(k) i-th particle vector at k-th time step i-th particle vector from l-th independent filter at k-th time step Weight of i-th particle at k-th time step

Particle filter
This method can be used for discrete-time nonlinear objects (also for plants with non-Gaussian transition or measurement noises). The models can be written in state-space representation given by where x (k) denotes the state vector at the k-th time step, u (k) is the input vector, y (k) is the output vector, v (k) and n (k) are the vectors of process and measurement noises vectors, respectively, and f(·) and h(·) denote vectors of nonlinear functions. The main task of PF is to estimate the state vector based on the available measured outputs, known PDFs, and inflicted inputs to the system. The PF operation principle is based on the recursive Bayesian filter [11] and is given by where Y (k) is the set of output measurements from y (1) to is the prior PDF, and p y (k) |Y (k−1) is the evidence given by equation It is worth noting that the Bayesian filter, apart from the problem of online filtration presented in the paper, may also concern two other issues. These are smoothing, where the past state vector p x (k) |Y (k+t) is estimated, and prediction, where the goal is to estimate future state values p x (k+t) |Y (k) , which are given by the formulas and Contrary to the filtering process, the above-mentioned issues are not used in real-time applications, but in postor pre-processing noise-smoothing applications [32][33][34].
In the PF algorithm, the posterior PDF is represented by a set of particles. Each i-th particle consists of the value x i and the weight q i for i = 1, 2, ..., N p , where N p is the number of particles. Hence, the i-th particle is represented by the pair {x i , q i }. When the particles number N p is large enough, posterior PDF can be approximated by where δ(·) is a Dirac delta function. The simplest and first proposed particle filter method was the bootstrap filter (BF) [11]. The algorithm of bootstrap filter is presented below.

Update. Calculate particles' weights based on the measurement model
4. Normalization. Scale values of the weights in such a way that their sum equals 1. 5. Resampling. Draw N p particles, but only from existing ones. PDF for drawing is defined by the set of normalized weights (in the current paper the systematic resampling is used [35]). 6. End of the iteration. Calculate the estimate of the state vector at k-th time step, update the time step k := k + 1, go to Step 2.
The presented algorithm, named bootstrap filter, is a special case of sequential importance resampling (SIR) [36], in which the resampling step is executed conditionally, when the effective sample size value, given by is not high enough. In the SIR update step, particles are drawn from is the importance function. In the PF algorithm proposed here, the resampling is executed at every time step, and the importance function (from which particles are drawn) is given by ), so calculations in prediction and update steps are simplified in relation to the SIR method. More information about particle filters can be found in [36,37].

MultiPDF particle filter
In this paper, we propose a modification of the particle filter-MultiPDF PF, which divides the main filter into N f parts-independent filters with N p N f particles each. These parallel filters are independent, and they can communicate with each other (in general). At each time step, the posterior PDF is calculated in all N f sub-filters independently, as well as estimated value. The estimated value of the whole MultiPDF PF is calculated as the mean of sub-filter results. After calculations, the N f independent state variable results are averaged ( Fig. 1).
It is assumed that PDFs of each parallel sub-filter are completely independent. This way, a set of estimated state vectors is obtained. Based on equation (6), one can writê wherep [l] (·) is the estimated posterior PDF from the l-th sub-filter. The estimated state vector x (k) [l] is calculated in every time step based onp [l] (·).
For example, for a total particle number N p = 500 and dividing PF into N f = 3 parallel parts, every lth sub-filter will contain N pl = round 500 [l] from the initial PDF p(x (0) ), set time step k = 1.
2. Calculate N f state vector values. For each parallel filter l = 1, ..., N f perform the following steps: ) (for every i = 1, ..., N p l ). The set of particle values is given for each parallel filter separately, from the previous time step. b. Update. Calculate the weights of particles based on the measurement model c. Normalization. Normalize the weights so that their sum equals 1. d. Resampling. Draw N p l particles from PDF obtained by the set of normalized weights (systematic resampling [35] has been used).
e. Calculate the estimated state vector x (k) [l] for the l-th independent sub-filter.
3. End of iteration. Calculate the estimatex (k) at k-th time step as the average of the N f estimated vectors obtained in Step 2, set k = k + 1, go to Step 2.
The operation of the method can be easily compared to tourists climbing to the peak, but without precise information on where the peak is located. (The position of the peak is the estimated value.) The group looks for the top, not on the basis of the ground slope (this is how the gradient methods would work), but randomly placing the participants of the expedition and finding which of them are higher than the others. On the other hand, if this group is divided into smaller parts, each sub-group will be worse at estimating the vertex individually, but if each group comes from a different slope, the average of their positions may give a better estimate than if they all worked together, within one group. In this comparison, it should be taken into account that this "climb" is one step, and this "peak" changes its position in each subsequent iteration (the state vector and thus the function p(y (k) |x i,(k) ) whose vertex we are looking for).
Sub-filters realize the same algorithm, but within different particle's sets. The only communication is averaging the estimate results from sub-filters at the end of each step, which does not require a lot of computation. It is worth mentioning that the communication used is one way-no sub-filter receives any information, and they only send information about the estimate to the master (averaging unit). This approach has a positive impact on the computational time over the PF, allowing for a linear acceleration of computations, which has been expanded on in Introduction. However, the authors do not reject the possibility of broader communication between sub-filters in the MultiPDF algorithm, but this has not been currently studied-it is a topic for future research.
It results from the research described in detail in Discussion section that for a two-dimensional system (with two state variables) in 85.46% of cases, the distance between the real and the estimated point x (on the trajectory for MultiPDF PF) is lower than the minimum obtained by the independently operating components (see Tables 6 and 7 for more details).
It can be also mentioned that in this work, in order to improve the estimation quality using parallelization of the particle filter, the object has to be multidimensional (the number of state variables must be greater than one). Also, the stronger the nonlinearity (especially in the measurement function), the better the estimation quality. The proposed algorithm allows one to minimize the degeneracy risk of all particles [36] (thanks to the division of the main filter), which often occurs in the mentioned plant types. The elements and threats mentioned above have been analyzed in Discussion section. Besides the results presented in the simulation part of this paper, additional studies were made on the influence of object factors on the estimation quality. See Appendix for the results.
The proposed solution is universal and allows one to speed up calculations in situations where there is limited computing power and time to perform the studies. This parallelization allows for an acceleration of calculations, for example, on GPU or FPGA [38][39][40][41][42], which is a current and much needed task. The special case of MultiPDF PF for N f = 1 is the simple bootstrap particle filter.

Examined objects
Nine objects were considered, eight of which were multidimensional and all nonlinear. They were selected in such a way as to show the influence of various plant features on the operation of the proposed algorithm. For example, some of them have linear transition functions and some have linear measurement functions.
Most of the examined systems are plants without an input signal (autonomous systems), so the change of their state is only due to the presence of process noise. Ob1 is used very often in particle filters studies [10,36]. According to [43], this object was firstly proposed in 1978 by Netto, Gimeno and Mendes.

Ob1
: N (0; 10), n (k) ∼ N (0; 1), Ob2 is a multidimensional model proposed by the authors, with 5 state variables and 15 outputs. This plant shows the benefits of the parallelization of the particle filter.
Ob2 : To simplify as much as possible the notation, it was assumed that Ob3 has been proposed (in each of the following 5 versions) in [44]. This object has been created in a network structure; the precursor for it is a power system grid. However, in power systems, every additional node (bus) is associated with two additional state variables. In the proposed network-type theoretical object, every node is related to only one additional state variable, which enables research on the system structure in a relatively short time (the number of particles N p should grow exponentially with the number of state variables N x ).

Ob3
: Subsequent versions of the system Ob3 (with different measurements) are as follows: All the transition and measurements functions for this plant can be found in [45] (where this object is referred to as "Ob403").
Ob4 is a two-dimensional system which was also proposed by the authors. This plant also enables one to show the improvement of the algorithm quality for a higher number of parallel sub-filters.

Ob4
: Figure 2 Ob5 is a magnetic levitation system (MLS) [46] produced by Inteco. The electromagnetic forces (caused by the voltages applied to the system inputs) change the position of the ball (sphere), which is one of the outputs of the system. The second measurement is the electromagnetic force from the upper magnet, which is given by a nonlinear function of the state variables.
The system is given by the continuous and strongly nonlinear state equations below.

Ob5
: Object parameters are described in Table 3. For the simulation test, the model was discretized by the Euler extrapolation method with the sampling period T s = 0.1 ms. Constraints on the system, also included in the simulation model, are as follows. 0m ≤ x 1 (t) ≤ x d m

Simulation results
The aim of this part of the paper is to compare the performance quality of the proposed MultiPDF PF with several known particle filter algorithms (bootstrap and marginalized). Simulations and calculations were performed for nine objects and repeated many times to obtain accurate results. Each time the ground truth was generated separately, and the adjustment of the estimation results was checked. For Ob1, simulations were repeated 1400 times, for Ob2-at least 100 times, and for the remaining systems, the tests were repeated several thousand times. The different number of simulations is a consequence of the specific construction of each object and was chosen to obtain the appropriate accuracy of the simulation results. Simulations were performed using various numbers of particles N p and parallel filters N f -the results were compared for those parameters.
In this paper, the average root-mean-square error (aRMSE) quality index has been used, which is given by where M is the number of time steps, N x is the number of state variables, andx (k) j and x +(k) j are the estimated and the real value of the j-th state variable, respectively, obtained at the k-th time step.
Additionally, for objects containing linear state variables in the transition function (Ob2, Ob3 and Ob5), the operation of the above-mentioned algorithms was compared with marginalized PF [19,21]. The analysis of the computational complexity of the algorithm is presented in [47]. MPF enables one to use fewer particles than PF for the same, multidimensional system.
The simulation results (quality indices) are presented in Figs. 3,4,5,6,7,8,9,10,11. Standard deviations with 95% probability in both directions from the mean, according to the 68-95-99.7 rule, are marked on the graphs. Standard deviations were estimated based on the well-known square root law which states that for a Gaussian PDF, the standard deviation of the mean value is √ m times smaller than the standard deviation from m observations. The results of an MPF operation for Ob2 and Ob3 are presented in Table 4 (the results have not been presented in the figures, because the values exceeded the chart's ranges). Since all state variables in the transition function are linear, the performance of MPF is independent of the number of particles [21], and hence, only one value is presented for these plants. In the case of Ob5, the performance quality was also examined under the influence of particle number, and the results are shown in Fig. 11. The algorithms 1 and 2 were thoroughly analyzed for Ob1. Firstly, the theoretical complexity of both was calculated. Then, after implementation of the algorithms in the C# programming language, the performance on four independent hardware platforms was tested. The goal was to compare computational speed of algorithms and determine how the number of threads affects the speed.
The complexity was calculated using the most common method called big O notation [48]. The method allows one to describe the asymptotic behavior of a function or a set of functions. The conducted calculations have shown that both algorithms have the same theoretical complexity, i.e., O(n 2 ). In this case, dissimilarities are not noticeable, due to the feature of The implementation, as mentioned before, was done in the C# programming language using the .NET Framework 4.7.2. Computational time was measured for the cases where the number of particles was 1200 and then 1500. As MultiPDF Particle Filter is ded-icated for multithreaded processors, tests were performed using 1, 2, 3, 5, 7 and 10 threads. The platform details are shown in Table 5. For each experiment, Microsoft Windows operating systems based on x64 architecture were used.
The results of the performed tests are shown in Figs. 12 and 13. Each value was calculated as an average of 10 tests. 12 tests per single scenario were conducted, and the best and the worst time was excluded due to Fig. 11 Values of aRMSE for Ob5 with a variable number of particles N p and for a different number of parallel sub-filters N f ; additionally, the results of the MPF's operation are shown possible anomalies. As Platform D was much slower than others, it was decided to use a secondary axis (on the right) where values are correlated with the gray bars. It makes charts much easier to read. Figure 12 presents results for N p = 1200 particles, and Fig. 13-for N p = 1500 particles. The simulations were performed for the same number of threads and the number of N f , marked in the graphs as "T". As one can see, the computational time for PF and MultiPDF (1T) PF for the same platforms is similar, and both of them are using only one thread. With the increasing number of threads, one can observe a reduction in computational time. While working with more than 7 threads, computational time is not reduced, and even extended in a few cases. The situation is directly related to processors, especially with the numbers of cores and threads. Platforms B, C and D offer only 4 threads (or cores). Platform A offers up to 16 threads. If one is trying to use more threads than physically or logically available, the system wastes a lot of time changing active tasks. In the literature, it is called context switching [49].
Based on the obtained results, the authors can confirm that their algorithm is well optimized, it is scalable and it could reduce computational time up to 7 times, when used with multithreaded systems. In that sense, a calculation which was done within 7 days can now be done in just 1 day. What's more, the benefits of the parallelization are noticeable even using older hardware.
Results are not so impressive as in the case of modern hardware, but the algorithm is still up to 2.5 times faster than the PF version. and trigonometric functions) the PF algorithm dividing into a higher number of parallel filters works better. For (almost) every case, the increase of the number of particles for an unchanged N f value causes an improvement in the estimation quality. The results obtained for the last systems, named Ob4 and Ob5, show benefits from the proposed PF modification. As one can see in Figs. 10 and 11, the higher the N p parameter, the better the estimation quality (i.e., the lower value of the aRMSE index). This may be caused by the strong nonlinear character of these plants and their multidimensionality. For a higher (than one)dimensional system, there are a lot of time steps when the estimation quality deteriorates, because of a long distance between the estimated and real values. Using more than one sub-filter, there is always a chance for error compensation by sub-filters resulting in the estimated values being closer to the real values.
Additional research has been carried out for Ob4, for simulations with N p = 300, N f = 3 (each parallel filter with 100 particles) and M = 100000. In every time step (k = 1, ..., M), the authors examined mean values and variances of the distances between the estimated and real values given by Distances were calculated for three parallel filters (PF [1] , PF [2] and PF [3] ), for whole MultiPDF PF and for the traditional bootstrap filter (specifically Multi-PDF PF with N f = 1). As one can see in Table 6, the results from parallel filters provided higher mean values and standard deviations than from single ones, but statistics for estimate from MultiPDF PF are noticeably smaller. This is caused by the compensation of the larger distances by the results from other sub-filters with a better fit. The bootstrap filter provided somewhat worse results, because there is no possibility of the wrong estimated values being compensated by the other components, as in MultiPDF PF with a parameter N f higher than 1. Table 7 presents the percent of cases, in which the distance of the MultiPDF PF estimated value given by (14) is smaller than for the special cases (named "events") during one simulation with M = 100000. A comparison of distances for MultiPDF and BF slightly favors BF, but as one can see in Table 6, the mean distance for MultiPDF PF is generally smaller than for BF. These results show the potential benefits with the possibility of the compensation of wrong values. As one can see, almost always the result from one of the parallel filters is farther from the true value than in the case of the MultiPDF PF result.
Most of Figs. 3,4,5,6,7,8,9,10,11 show that for a high number of particles, standard deviations for specific number of particles overlap. This is due to the very small distance between these results on the graph. For versions (v4) and (v5) of Ob3, the distance between adjacent charts is relatively large for many particles. This is probably caused by the strong nonlinearity of these objects. To make results similar to each other for a different N f parameter, one must make calculations for a much larger number of particles. In Fig. 8 and for the highest N p , an interesting phenomenon can be observed: as the number of particles increases, the quality of estimation worsens. This may be caused by the distance between the subsequent noised measurements. The more particles, the greater is the chance that noisy measurements correspond and will be recognized by the algorithm as correct, and hence, they will provide a higher quality index.
Regarding the marginalized PF simulation results (Table 4), it can be shown that the estimation quality was improved only for Ob3 in version (v1), because of linear measurement function of this plant. Only for this plant, the quality index in Table 4 is smaller than most    values than all values visible in the graphs). Therefore, in some cases, speeding up the calculations using MPF may result in a significant deterioration of the estimation quality. Ob5 is the system with one linear and three nonlinear state variables. The MPF results in this case show a slight improvement over BPF (the Kalman filter works better for linear systems), but with paralleliza-tion the MultiPDF PF still works best. The average simulation time measured for 1500 particles was 6.29 s for PF and 5.72 s for MPF, so the time gain was small and could be compensated for by MultiPDF PF with multithreaded implementation.

Conclusions
The main conclusion of this article is that the proposed particle filter can improve the quality of estimation and its multithreaded implementation can significantly speed up calculations. The main condition for improving estimation quality is the presence of strong nonlinearities in the measurement function of the object (detailed conditions are presented in section MultiPDF Particle Filter and Appendix, additionally). For these objects types, there is a bigger chance for a wrong estimation, so large errors from one sub-filter can be compensated by the other units. For special object types, even a lot of parts make the quality index smaller. In general, a trade-off must be found between the speed of computation and the quality of the estimation, because all real-world applied systems are specific and different.
In most cases, an improvement in the estimation can be observed for smaller sums of particles, while for large sums there is no difference (or it is negligible) in the quality of the estimation. It is possible to further improve the quality of the estimation (as can be seen on the right-hand side of the graphs), but the additional computational effort has a relatively small impact on the result. At the same time, it can be clearly observed that improving the quality of estimation by using MultiPDF PF is possible especially for a relatively small number of particles, while according to the theory, when N p → ∞, the specific PF method does not matter.
In the vein of this method, it is also possible to further parallelize the PF algorithm (in every sub-filter), because the bottlenecks are normalization and resampling. Normally, they must be carried out using information about all particles. (In the first case, the sum of all weights is needed; in the second case, even using the fastest resampling, it is necessary to act in sequence, not parallel, while parallel resampling algorithms deteriorate the estimation quality [35].) Data Availability Data with saved simulation results are available on the website https://drive.google.com/drive/folders/1KdVh3v gxElqmAPH6Pl4AUvY0PyTfz0T-?usp-

Declarations
Conflicts of interest The authors declare that they have no conflict of interest Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/ by/4.0/.