Abstract
In Chaps. 3 and 4, we introduced two concepts relevant for automating the task of system identification—GP and TAG. GP can be used to explore a generalized model set, spanning across multiple model structures and complexities, specified by the chosen TAG. These ideas help us to address Research Questions 1 and 3 laid out in Chap. 1. This leads us to the next question, i.e. Research Question 2: How can we incorporate multiple user-specified performance measures in the desired identification framework, and what is the appropriate notion of optimality in a multi-objective setting?
I didn’t have time to write a short letter, so I wrote a long one instead.
Mark Twain
Notes
1. Limited only by the maximum tree-depth \(m_\mathrm{d}\), see Sect. 3.2.1.
References
Aguirre LA, Barbosa BH, Braga AP (2010) Prediction and simulation errors in parameter estimation for nonlinear systems. Mech Syst Signal Process 24(8):2855–2867
Åström K (1979) Maximum likelihood and prediction error methods. IFAC Proc 12(8):551–574
Bershad NJ, Celka P, McLaughlin S (2001) Analysis of stochastic gradient identification of Wiener-Hammerstein systems for nonlinearities with Hermite polynomial expansions. IEEE Trans Signal Process 49(5):1060–1072
Billings SA (2013) Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. Wiley, Chichester
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Ehrgott M (2005) Multicriteria optimization, vol 491. Springer, Berlin
Emmerich MT, Deutz AH (2018) A tutorial on multiobjective optimization: fundamentals and evolutionary methods. Nat Comput 17(3):585–609
Fonseca CM, Fleming PJ (1993) Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. In: Proceedings of the 5th international conference on genetic algorithms. Morgan Kaufmann Publishers Inc., pp 416–423
Giordano G, Sjöberg J (2016) Consistency aspects of Wiener-Hammerstein model identification in presence of process noise. In: Proceedings of IEEE conference on decision and control, pp 3042–3047
Hagenblad A, Ljung L, Wills A (2008) Maximum likelihood identification of Wiener models. Automatica 44(11):2697–2705
Khandelwal D, Schoukens M, Tóth R (2018) On the simulation of polynomial NARMAX models. In: Proceedings of the IEEE conference on decision and control. IEEE, pp 1445–1450
Laumanns M, Thiele L, Zitzler E, Deb K (2002) Archiving with guaranteed convergence and diversity in multi-objective optimization. In: Proceedings of the 4th annual conference on genetic and evolutionary computation. Morgan Kaufmann Publishers Inc., pp 439–447
Ljung L (ed) (1999) System identification. Theory for the user, 2nd edn. Prentice Hall PTR. ISBN 0-13-656695-2
Ljung L (2001) Estimating linear time-invariant models of nonlinear time-varying systems. Eur J Control 7(2–3):203–219
Miettinen K (2012) Nonlinear multiobjective optimization, vol 12. Springer, Boston
Olver FW, Lozier DW, Boisvert RF, Clark CW (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge
Piroddi L, Spinelli W (2003) An identification algorithm for polynomial NARX models based on simulation error minimization. Int J Control 76(17):1767–1781
Zhang Q, Li H (2007) MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans Evol Comput 11(6):712–731
Zitzler E, Deb K, Thiele L (2000) Comparison of multiobjective evolutionary algorithms: empirical results. Evol Comput 8(2):173–195
Zitzler E, Laumanns M, Thiele L (2001) SPEA2: improving the strength Pareto evolutionary algorithm. TIK-Report 103
Author information
Authors and Affiliations
Appendices
Appendix 1: Mathematical Preliminaries
Hermite Polynomials
Let \((\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\) be a probability space on the real line \(\mathbb {R}\), where \(\mathcal {B}_\mathbb {R}\) is the Borel \(\sigma \)-algebra on \(\mathbb {R}\), and \(p_x\) is the probability measure of the Gaussian distribution. Consider the Hilbert space \(L_2(\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\) of random functions f that satisfy
$$ \mathbf {E}\{f^2(x)\} < \infty , $$
where the expectation \(\mathbf {E}\) is associated with the inner product
$$ \langle f, g \rangle := \mathbf {E}\{f(x)\,g(x)\} = \int _\mathbb {R} f(x)\,g(x)\, \mathrm {d}p_x(x) $$
for any \(f,g \in L_2(\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\).
Definition 5.11
Hermite polynomials \(He_{n}(x)\) of degree n are defined as
$$ He_n(x) := (-1)^n\, e^{\frac{x^2}{2}}\, \frac{\mathrm {d}^n}{\mathrm {d}x^n}\, e^{-\frac{x^2}{2}}, \qquad n \in \mathbb {Z}_{\ge 0}. $$
Hermite polynomials form a closed and complete orthogonal basis in the \(L_2\) space with the inner product defined in (5.39) (see [16]), i.e.,
$$ \mathbf {E}\{He_n(x)\, He_m(x)\} = n!\, \delta _{n,m}, $$
where \(\delta _{n,m} = 1\) when \(n=m\) and 0 otherwise. Hermite polynomials possess a number of properties that will be used in order to derive simulation models automatically from a given stochastic model of form (5.2).
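This orthogonality relation is easy to verify numerically. The sketch below uses NumPy's probabilists' Hermite module (`numpy.polynomial.hermite_e`, which implements exactly the \(He_n\) polynomials used here); the Monte Carlo sample size and seed are illustrative choices, not taken from this chapter.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval  # probabilists' Hermite He_n

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # samples of X ~ N(0, 1)

def He(n, x):
    """Evaluate He_n(x) by passing a unit coefficient vector to hermeval."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermeval(x, c)

# E{He_n(X) He_m(X)} should approximate n! * delta_{n,m}
for n, m in [(2, 2), (3, 3), (2, 3)]:
    print(f"E[He_{n}(X) He_{m}(X)] ~ {np.mean(He(n, x) * He(m, x)):.3f}")
```

The sample averages come out close to \(2! = 2\), \(3! = 6\) and 0, respectively.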
Since Hermite polynomials form a complete orthogonal basis in \(L_2(\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\), any function \(f \in L_2(\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\) can be expressed as a convergent linear combination of basis functions
$$ f(x) = \sum _{n=0}^{\infty } \alpha _n\, He_n(x), \qquad \alpha _n = \frac{\mathbf {E}\{f(x)\, He_n(x)\}}{n!}. $$
Given a monomial in \(x\in \mathbb {R}\), one can compute an equivalent representation in terms of Hermite polynomials \(He_n(x)\) using the following inverse relationship
$$ x^n = n! \sum _{m=0}^{\lfloor n/2 \rfloor } \frac{1}{2^m\, m!\, (n-2m)!}\, He_{n-2m}(x), $$
where \(\lfloor {\cdot }\rfloor \) is defined as the floor operator, i.e., \(\lfloor n \rfloor = a\), where \(a \in \mathbb {Z}\) is the largest integer such that \(a \le n\). Hermite polynomials of normally distributed random variables have useful properties. Let X be a random variable distributed as \(\mathcal {N}(0,1)\). It can be shown that
$$ \mathbf {E}\{He_n(X)\} = \delta _{n,0}. $$
The following result can be obtained by using (5.41):
$$ \mathbf {E}\{X^n\} = {\left\{ \begin{array}{ll} \dfrac{n!}{2^{n/2}\,(n/2)!}, &{} n \text { even}, \\ 0, &{} n \text { odd}. \end{array}\right. } $$
Based on these properties, it has been shown in [3] that for jointly Gaussian \(X_1,X_2 \sim \mathcal {N}(0,1)\),
$$ \mathbf {E}\{He_n(X_1)\, He_m(X_2)\} = n!\, \delta _{n,m} \left( \mathbf {E}\{X_1 X_2\}\right) ^{n}. $$
In particular, if \(X_1,X_2\) are independent, we get
$$ \mathbf {E}\{He_n(X_1)\, He_m(X_2)\} = 0 \qquad \text {for all } n+m > 0. $$
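The monomial-to-Hermite inverse relationship can be cross-checked against NumPy's power-basis-to-HermiteE conversion. The function below is a direct transcription of the formula; `poly2herme` is part of NumPy's public polynomial API.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import poly2herme  # power basis -> He_n basis

def monomial_to_hermite(n):
    """Coefficients a_k such that x^n = sum_k a_k He_k(x),
    per the inverse relationship x^n = n! * sum_m He_{n-2m}(x) / (2^m m! (n-2m)!)."""
    a = np.zeros(n + 1)
    for m in range(n // 2 + 1):
        a[n - 2 * m] = math.factorial(n) / (2**m * math.factorial(m) * math.factorial(n - 2 * m))
    return a

# Cross-check against NumPy's conversion for n = 1..6
for n in range(1, 7):
    mono = np.zeros(n + 1)
    mono[n] = 1.0  # coefficients of x^n in the power basis
    assert np.allclose(monomial_to_hermite(n), poly2herme(mono))

print(monomial_to_hermite(3))  # x^3 = 3*He_1(x) + He_3(x) -> [0. 3. 0. 1.]
```

For instance, \(x^3 = He_3(x) + 3\,He_1(x)\), consistent with \(He_3(x) = x^3 - 3x\) and \(He_1(x) = x\).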
Appendix 2: Mathematical Proofs
Proof of Lemma 5.8 [11] The proof relies on the use of the proposed Hermite-polynomial-based representation in (5.19). Let \(\gamma _{i} := c_i \prod _{j=1}^{n_{u}} u(k-j)^{b_{i,j}}\). Using (5.43), Eq. (5.22) can be written as
Note that (5.48) is of the form (5.19). Define partitions \(P_\mathrm {e}:= \{ i \in [1,{p}] \mid d_{i,q} \text { is even } \forall q \in [1,n_{\xi }] \}\) and \(P_\mathrm {o}:= \{ i \in [1,{p}] \mid i \notin P_\mathrm {e}\}\). This yields
Taking the expectation with respect to the noise process \(\xi \), and recognizing that the second term on the right hand side drops out (due to (5.44)), we get that
Appendix 3: Experimental Verification—Simulation Examples
Motivating Example for Simulation Error
Example 5.7
Consider the example of a linear interconnection of 15 mass-spring-damper systems with masses \(m_i = 60 + 10 \cos (\frac{2 \pi i}{15} + \frac{\pi }{4})\) for \(i \in [1,15]\), spring coefficients \(k_i = 500 - 60 \cos (\frac{2 \pi i}{16})\) for \(i \in [1,15]\) and \(k_{16} = 100\), and damping coefficients \(d_i = 150 + 25 \sin (\frac{2 \pi i}{16})\) for \(i \in [1,16]\). An input force is applied to mass \(m_{15}\), and the position of mass \(m_1\) is measured as the output of the system. A dataset consisting of 2000 data points was generated using a white noise excitation signal u with \(\sigma _u = 5\). The output signal y is corrupted by additive white noise with \(\sigma _e = 1\), resulting in an approximate signal-to-noise ratio of 16.5 dB.
The dataset was used to estimate, using a PPEM method, the parameters of two parameterized model structures: (a) the system is in the model set, i.e., the model structure corresponds to the structure of the true data-generating system (DGS), and (b) the system does not belong to the chosen model set. More specifically, in both scenarios, a BJ structure is chosen, which is represented as
where \(\xi \sim \mathcal {N}(0,\sigma _{\xi }^2)\), and B, F, C and D are polynomials of orders \(n_b, n_f, n_c\) and \(n_d\), respectively. Define \(P:=\{n_b, n_c, n_d, n_f\}\) as the set of orders that specify the structure of the model to be identified. For scenario (a), the polynomial orders are given by \(P_a = \{31, 0, 0, 30\}\), which corresponds to the structure of the DGS. For scenario (b), we choose \(P_b = \{2, 2, 2, 3\}\). PPEM was used to estimate the parameters based on the one-step-ahead prediction error in the two scenarios.
To quantify the performance of the estimated model, we use the 1-step ahead prediction error and simulation error. The simulation error of a BJ model is defined as
where \(y_s(k)\) is the simulated output for a measured excitation signal \(\{ u_{m}(k) \}_{k=1}^N\), which can be computed as
$$ y_s(k) = \frac{B(q)}{F(q)}\, u_{m}(k). $$
Note that, unlike the one-step-ahead predicted output, the simulated output \(y_s(k)\) does not depend on the past measured output \(y(k-1)\) but on its own past values \(y_s(k-1)\).
Figure 5.3 depicts the 1-step-ahead prediction and the simulation output achieved by the model estimated using PPEM in scenario (a). This sets a reference for the best prediction that can be achieved by an unbiased estimator. Figure 5.4 depicts the 1-step-ahead prediction and the simulation output achieved by PPEM in scenario (b), i.e., the case where the system is not in the model class. It can be observed that the 1-step-ahead prediction is worse than in scenario (a), but not by a great margin. More specifically, the best fit ratio (BFR) drops from 79.2 to 69.4\(\%\). However, the simulation of the same model with the given input captures none of the dynamics observed in the output measurements and achieves a BFR of only 4.2\(\%\). This illustrates that when the system does not belong to the model class, the auto-regressive component of the model may contribute to re-aligning the predictions with the measurements, while the estimated model does not capture any of the dynamics of the underlying DGS. In this case, the low-order identified model cannot be trusted to perform well for objectives other than prediction, such as optimal control design or system analysis.
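The qualitative gap between prediction and simulation performance can be reproduced on a much smaller scale. The sketch below uses a hypothetical first-order system and a deliberately mismatched model (none of these numbers come from the example above), together with the common best-fit-ratio definition \(\mathrm {BFR} = 100 \cdot \max (0,\, 1 - \Vert y - \hat{y}\Vert _2 / \Vert y - \bar{y}\Vert _2)\):

```python
import numpy as np

def bfr(y, y_hat):
    """Best fit ratio (%): 100 * max(0, 1 - ||y - y_hat|| / ||y - mean(y)||)."""
    return 100.0 * max(0.0, 1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y - np.mean(y)))

rng = np.random.default_rng(1)
N = 500
u = rng.standard_normal(N)

# Hypothetical first-order DGS: y(k) = 0.9 y(k-1) + u(k-1) + e(k)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.9 * y[k - 1] + u[k - 1] + 0.1 * rng.standard_normal()

# Deliberately mismatched model: y_hat(k) = a y(k-1) + b u(k-1), with a wrong pole
a, b = 0.7, 1.0
y_pred = np.zeros(N)  # one-step-ahead prediction: driven by the MEASURED y(k-1)
y_sim = np.zeros(N)   # free-run simulation: driven by its OWN past output
for k in range(1, N):
    y_pred[k] = a * y[k - 1] + b * u[k - 1]
    y_sim[k] = a * y_sim[k - 1] + b * u[k - 1]

print(f"BFR prediction: {bfr(y, y_pred):.1f}%  BFR simulation: {bfr(y, y_sim):.1f}%")
```

Because the one-step-ahead predictor is handed the measured \(y(k-1)\) at every step, its BFR stays high even though the model's pole is wrong; the free-run simulation exposes the mismatch.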
Remark 5.8
It should be highlighted that the results depicted here are subject to high variance, and that the outcome is not always this bleak. The purpose of this example is not to provide a complete empirical analysis, but merely to show that, when the system is not in the model set, a model identified solely on the basis of the prediction error may achieve deceptively good performance.
Remark 5.9
Note that a cross-correlation test would reveal that the residual error \(\varepsilon _{p}\) is correlated with the input excitation \(u_{m}\), implying that the model structure in scenario (b) was not rich enough to capture the dynamics present in the data. However, performing such tests requires human intervention and goes against the research goals of this dissertation. Moreover, such correlation tests do not generalize easily to non-linear dynamics.
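Such a cross-correlation test is straightforward to implement. A minimal sketch, assuming the standard \(\pm 1.96/\sqrt{N}\) significance bounds (the residual construction below is synthetic, for illustration only):

```python
import numpy as np

def xcorr_test(eps, u, max_lag=20):
    """Sample cross-correlation r(tau) between residuals eps(k) and input u(k - tau),
    with the standard +/- 1.96/sqrt(N) significance bounds."""
    N = len(eps)
    e = (eps - eps.mean()) / eps.std()
    v = (u - u.mean()) / u.std()
    r = np.array([np.mean(e[tau:] * v[:N - tau]) for tau in range(max_lag + 1)])
    bound = 1.96 / np.sqrt(N)
    return r, bound, bool(np.any(np.abs(r) > bound))

rng = np.random.default_rng(2)
u = rng.standard_normal(2000)
noise = rng.standard_normal(2000)

# Residuals that still contain input dynamics (eps(k) depends on u(k-1)),
# mimicking a model structure that is not rich enough
eps = noise.copy()
eps[1:] += 0.8 * u[:-1]

r, bound, flagged = xcorr_test(eps, u)
print("model structure flagged as deficient:", flagged)
```

Residuals that are genuinely independent of the input stay within the bounds, while the synthetic residuals above are flagged at lag 1, which is exactly the kind of diagnostic step that would otherwise require a human in the loop.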
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Khandelwal, D. (2022). Performance Measures. In: Automating Data-Driven Modelling of Dynamical Systems. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-030-90343-5_5
Print ISBN: 978-3-030-90342-8. Online ISBN: 978-3-030-90343-5.