Abstract
In Chaps. 3 and 4, we introduced two concepts relevant for automating the task of system identification—GP and TAG. GP can be used to explore a generalized model set, spanning across multiple model structures and complexities, specified by the chosen TAG. These ideas help us to address Research Questions 1 and 3 laid out in Chap. 1. This leads us to the next question, i.e. Research Question 2: How can we incorporate multiple user-specified performance measures in the desired identification framework, and what is the appropriate notion of optimality in a multi-objective setting?
I didn’t have time to write a short letter, so I wrote a long one instead.
Mark Twain
Notes
1. Limited only by the maximum tree-depth \(m_\mathrm{d}\), see Sect. 3.2.1.
References
Aguirre LA, Barbosa BH, Braga AP (2010) Prediction and simulation errors in parameter estimation for nonlinear systems. Mech Syst Signal Process 24(8):2855–2867
Åström K (1979) Maximum likelihood and prediction error methods. IFAC Proc 12(8):551–574
Bershad NJ, Celka P, McLaughlin S (2001) Analysis of stochastic gradient identification of Wiener-Hammerstein systems for nonlinearities with Hermite polynomial expansions. IEEE Trans Signal Process 49(5):1060–1072
Billings SA (2013) Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. Wiley, Chichester
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Ehrgott M (2005) Multicriteria optimization, vol 491. Springer, Berlin
Emmerich MT, Deutz AH (2018) A tutorial on multiobjective optimization: fundamentals and evolutionary methods. Nat Comput 17(3):585–609
Fonseca CM, Fleming PJ (1993) Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. In: Proceedings of the 5th international conference on genetic algorithms. Morgan Kaufmann Publishers Inc., pp 416–423
Giordano G, Sjöberg J (2016) Consistency aspects of Wiener-Hammerstein model identification in presence of process noise. In: Proceedings of IEEE conference on decision and control, pp 3042–3047
Hagenblad A, Ljung L, Wills A (2008) Maximum likelihood identification of Wiener models. Automatica 44(11):2697–2705
Khandelwal D, Schoukens M, Tóth R (2018) On the simulation of polynomial NARMAX models. In: Proceedings of the IEEE conference on decision and control. IEEE, pp 1445–1450
Laumanns M, Thiele L, Zitzler E, Deb K (2002) Archiving with guaranteed convergence and diversity in multi-objective optimization. In: Proceedings of the 4th annual conference on genetic and evolutionary computation. Morgan Kaufmann Publishers Inc., pp 439–447
Ljung L (ed) (1999) System identification. Theory for the user, 2nd edn. Prentice Hall PTR. ISBN 0-13-656695-2
Ljung L (2001) Estimating linear time-invariant models of nonlinear time-varying systems. Eur J Control 7(2–3):203–219
Miettinen K (2012) Nonlinear multiobjective optimization, vol 12. Springer, Boston
Olver FW, Lozier DW, Boisvert RF, Clark CW (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge
Piroddi L, Spinelli W (2003) An identification algorithm for polynomial NARX models based on simulation error minimization. Int J Control 76(17):1767–1781
Zhang Q, Li H (2007) MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans Evol Comput 11(6):712–731
Zitzler E, Deb K, Thiele L (2000) Comparison of multiobjective evolutionary algorithms: empirical results. Evol Comput 8(2):173–195
Zitzler E, Laumanns M, Thiele L (2001) SPEA2: improving the strength Pareto evolutionary algorithm. TIK-Report 103
Author information
Authors and Affiliations
Appendices
Appendix 1: Mathematical Preliminaries
Hermite Polynomials
Let \((\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\) be a probability space on the real line \(\mathbb {R}\), where \(\mathcal {B}_\mathbb {R}\) is the Borel \(\sigma \)-algebra on \(\mathbb {R}\), and \(p_x\) is the probability measure of the Gaussian distribution. Consider the Hilbert space \(L_2(\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\) of random functions f that satisfy
$$ \mathbf {E}\{f^2(x)\} < \infty , $$
where the expectation \(\mathbf {E}\) is associated with the inner product
$$ \langle f, g \rangle := \mathbf {E}\{f(x)\,g(x)\} = \int _\mathbb {R} f(x)\,g(x)\, \mathrm {d}p_x(x) $$
for any \(f,g \in L_2(\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\).
Definition 5.11
Hermite polynomials \(He_{n}(x)\) of degree n are defined as
$$ He_n(x) := (-1)^n\, e^{\frac{x^2}{2}}\, \frac{\mathrm {d}^n}{\mathrm {d}x^n}\, e^{-\frac{x^2}{2}}, \qquad n \in \mathbb {Z}_{\ge 0}. $$
Hermite polynomials form a closed and complete orthogonal basis in the \(L_2\) space with the inner product defined in (5.39) (see [16]), i.e.,
$$ \mathbf {E}\{He_n(x)\, He_m(x)\} = n!\, \delta _{n,m}, $$
where \(\delta _{n,m} = 1\) when \(n=m\) and 0 otherwise. Hermite polynomials possess a number of properties that will be used in order to derive simulation models automatically from a given stochastic model of form (5.2).
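This orthogonality relation is easy to verify numerically. The sketch below uses NumPy's probabilists' Hermite module (`numpy.polynomial.hermite_e`, which implements exactly the \(He_n\) polynomials used here); the Monte Carlo sample size and seed are illustrative choices, not taken from this chapter.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval  # probabilists' Hermite He_n

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # samples of X ~ N(0, 1)

def He(n, x):
    """Evaluate He_n(x) by passing a unit coefficient vector to hermeval."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermeval(x, c)

# E{He_n(X) He_m(X)} should approximate n! * delta_{n,m}
for n, m in [(2, 2), (3, 3), (2, 3)]:
    print(f"E[He_{n}(X) He_{m}(X)] ~ {np.mean(He(n, x) * He(m, x)):.3f}")
```

The sample averages come out close to \(2! = 2\), \(3! = 6\) and 0, respectively.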
Since Hermite polynomials form a complete orthogonal basis in \(L_2(\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\), any function \(f \in L_2(\mathbb {R}, \mathcal {B}_\mathbb {R}, p_x)\) can be expressed as a convergent linear combination of basis functions
$$ f(x) = \sum _{n=0}^{\infty } \alpha _n\, He_n(x), \qquad \alpha _n = \frac{\mathbf {E}\{f(x)\, He_n(x)\}}{n!}. $$
Given a monomial in \(x\in \mathbb {R}\), one can compute an equivalent representation in terms of Hermite polynomials \(He_n(x)\) using the following inverse relationship
$$ x^n = n! \sum _{m=0}^{\lfloor n/2 \rfloor } \frac{1}{2^m\, m!\, (n-2m)!}\, He_{n-2m}(x), $$
where \(\lfloor {\cdot }\rfloor \) is defined as the floor operator, i.e., \(\lfloor n \rfloor = a\), where \(a \in \mathbb {Z}\) is the largest integer such that \(a \le n\). Hermite polynomials of normally distributed random variables have useful properties. Let X be a random variable distributed as \(\mathcal {N}(0,1)\). It can be shown that
$$ \mathbf {E}\{He_n(X)\} = \delta _{n,0}. $$
The following result can be obtained by using (5.41):
$$ \mathbf {E}\{X^n\} = {\left\{ \begin{array}{ll} \dfrac{n!}{2^{n/2}\,(n/2)!}, &{} n \text { even}, \\ 0, &{} n \text { odd}. \end{array}\right. } $$
Based on these properties, it has been shown in [3] that for jointly Gaussian \(X_1,X_2 \sim \mathcal {N}(0,1)\),
$$ \mathbf {E}\{He_n(X_1)\, He_m(X_2)\} = n!\, \delta _{n,m} \left( \mathbf {E}\{X_1 X_2\}\right) ^{n}. $$
In particular, if \(X_1,X_2\) are independent, we get
$$ \mathbf {E}\{He_n(X_1)\, He_m(X_2)\} = 0 \qquad \text {for all } n+m > 0. $$
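The monomial-to-Hermite inverse relationship can be cross-checked against NumPy's power-basis-to-HermiteE conversion. The function below is a direct transcription of the formula; `poly2herme` is part of NumPy's public polynomial API.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import poly2herme  # power basis -> He_n basis

def monomial_to_hermite(n):
    """Coefficients a_k such that x^n = sum_k a_k He_k(x),
    per the inverse relationship x^n = n! * sum_m He_{n-2m}(x) / (2^m m! (n-2m)!)."""
    a = np.zeros(n + 1)
    for m in range(n // 2 + 1):
        a[n - 2 * m] = math.factorial(n) / (2**m * math.factorial(m) * math.factorial(n - 2 * m))
    return a

# Cross-check against NumPy's conversion for n = 1..6
for n in range(1, 7):
    mono = np.zeros(n + 1)
    mono[n] = 1.0  # coefficients of x^n in the power basis
    assert np.allclose(monomial_to_hermite(n), poly2herme(mono))

print(monomial_to_hermite(3))  # x^3 = 3*He_1(x) + He_3(x) -> [0. 3. 0. 1.]
```

For instance, \(x^3 = He_3(x) + 3\,He_1(x)\), consistent with \(He_3(x) = x^3 - 3x\) and \(He_1(x) = x\).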
Appendix 2: Mathematical Proofs
Proof of Lemma 5.8 [11] The proof relies on the use of the proposed Hermite-polynomial-based representation in (5.19). Let \(\gamma _{i} := c_i \prod _{j=1}^{n_{u}} u(k-j)^{b_{i,j}}\). Using (5.43), Eq. (5.22) can be written as
Note that (5.48) is of the form (5.19). Define partitions \(P_\mathrm {e}:= \{ i \in [1,{p}] \mid d_{i,q} \text { is even } \forall q \in [1,n_{\xi }] \}\) and \(P_\mathrm {o}:= \{ i \in [1,{p}] \mid i \notin P_\mathrm {e}\}\). This yields
Taking the expectation with respect to the noise process \(\xi \), and recognizing that the second term on the right hand side drops out (due to (5.44)), we get that
Appendix 3: Experimental Verification—Simulation Examples
Motivating Example for Simulation Error
Example 5.7
Consider the example of a linear interconnection of 15 mass-spring-damper systems with masses \(m_i = 60 + 10 \cos (\frac{2 \pi i}{15} + \frac{\pi }{4})\) for \(i \in [1,15]\), spring coefficients \(k_i = 500 - 60 \cos (\frac{2 \pi i}{16})\) for \(i \in [1,15]\) and \(k_{16} = 100\), and damping coefficients \(d_i = 150 + 25 \sin (\frac{2 \pi i}{16})\) for \(i \in [1,16]\). An input force is applied to mass \(m_{15}\), and the position of mass \(m_1\) is measured as the output of the system. A dataset consisting of 2000 data points was generated using a white noise excitation signal u with \(\sigma _u = 5\). The output signal y is corrupted by additive white noise with \(\sigma _e = 1\), resulting in an approximate signal-to-noise ratio of 16.5 dB.
The dataset was used to estimate, using a PPEM method, the parameters of two parameterized model structures: (a) the system is in the model set, i.e., the model structure corresponds to the structure of the true data-generating system (DGS), and (b) the system does not belong to the chosen model set. More specifically, in both scenarios, a BJ structure is chosen, which is represented as
where \(\xi \sim \mathcal {N}(0,\sigma _{\xi }^2)\), and B, F, C and D are polynomials of orders \(n_b, n_f, n_c\) and \(n_d\), respectively. Define \(P:=\{n_b, n_c, n_d, n_f\}\) as the set of orders that specify the structure of the model to be identified. For scenario (a), the polynomial orders are given by \(P_a = \{31, 0, 0, 30\}\), which corresponds to the structure of the DGS. For scenario (b), we choose \(P_b = \{2, 2, 2, 3\}\). PPEM was used to estimate the parameters based on the one-step-ahead prediction error in the two scenarios.
To quantify the performance of the estimated model, we use the 1-step ahead prediction error and simulation error. The simulation error of a BJ model is defined as
where \(y_s(k)\) is the simulated output for a measured excitation signal \(\{ u_{m}(k) \}_{k=1}^N\), which can be computed as
$$ y_s(k) = \frac{B(q)}{F(q)}\, u_{m}(k). $$
Note that, unlike the one-step-ahead predicted output, the simulated output \(y_s(k)\) does not depend on the past measured output \(y(k-1)\) but on its own past values \(y_s(k-1)\).
Figure 5.3 depicts the 1-step-ahead prediction and the simulation output achieved by the model estimated using PPEM in scenario (a). This sets a reference for the best prediction that can be achieved by an unbiased estimator. Figure 5.4 depicts the 1-step-ahead prediction and the simulation output achieved by PPEM in scenario (b), i.e., the case where the system is not in the model class. It can be observed that the 1-step-ahead prediction is worse than in scenario (a), but not by a great margin. More specifically, the best fit ratio (BFR) drops from 79.2 to 69.4\(\%\). However, the simulation of the same model with the given input captures none of the dynamics observed in the output measurements and achieves a BFR of only 4.2\(\%\). This illustrates that when the system does not belong to the model class, the auto-regressive component of the model may contribute to re-aligning the predictions with the measurements, while the estimated model does not capture any of the dynamics of the underlying DGS. In this case, the low-order identified model cannot be trusted to perform well for objectives other than prediction, such as optimal control design or system analysis.
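The qualitative gap between prediction and simulation performance can be reproduced on a much smaller scale. The sketch below uses a hypothetical first-order system and a deliberately mismatched model (none of these numbers come from the example above), together with the common best-fit-ratio definition \(\mathrm {BFR} = 100 \cdot \max (0,\, 1 - \Vert y - \hat{y}\Vert _2 / \Vert y - \bar{y}\Vert _2)\):

```python
import numpy as np

def bfr(y, y_hat):
    """Best fit ratio (%): 100 * max(0, 1 - ||y - y_hat|| / ||y - mean(y)||)."""
    return 100.0 * max(0.0, 1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y - np.mean(y)))

rng = np.random.default_rng(1)
N = 500
u = rng.standard_normal(N)

# Hypothetical first-order DGS: y(k) = 0.9 y(k-1) + u(k-1) + e(k)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.9 * y[k - 1] + u[k - 1] + 0.1 * rng.standard_normal()

# Deliberately mismatched model: y_hat(k) = a y(k-1) + b u(k-1), with a wrong pole
a, b = 0.7, 1.0
y_pred = np.zeros(N)  # one-step-ahead prediction: driven by the MEASURED y(k-1)
y_sim = np.zeros(N)   # free-run simulation: driven by its OWN past output
for k in range(1, N):
    y_pred[k] = a * y[k - 1] + b * u[k - 1]
    y_sim[k] = a * y_sim[k - 1] + b * u[k - 1]

print(f"BFR prediction: {bfr(y, y_pred):.1f}%  BFR simulation: {bfr(y, y_sim):.1f}%")
```

Because the one-step-ahead predictor is handed the measured \(y(k-1)\) at every step, its BFR stays high even though the model's pole is wrong; the free-run simulation exposes the mismatch.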
Remark 5.8
It should be highlighted that the results depicted here are subject to high variance, and that the outcome is not always this bleak. The purpose of this example is not to provide a complete empirical analysis, but merely to show that, when the system is not in the model set, a model identified solely on the basis of the prediction error may achieve deceptively good performance.
Remark 5.9
Note that a cross-correlation test would reveal that the residual error \(\varepsilon _{p}\) is correlated with the input excitation \(u_{m}\), implying that the model structure in scenario (b) was not rich enough to capture the dynamics present in the data. However, performing such tests requires human intervention and goes against the research goals of this dissertation. Moreover, such correlation tests do not generalize easily to non-linear dynamics.
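Such a cross-correlation test is straightforward to implement. A minimal sketch, assuming the standard \(\pm 1.96/\sqrt{N}\) significance bounds (the residual construction below is synthetic, for illustration only):

```python
import numpy as np

def xcorr_test(eps, u, max_lag=20):
    """Sample cross-correlation r(tau) between residuals eps(k) and input u(k - tau),
    with the standard +/- 1.96/sqrt(N) significance bounds."""
    N = len(eps)
    e = (eps - eps.mean()) / eps.std()
    v = (u - u.mean()) / u.std()
    r = np.array([np.mean(e[tau:] * v[:N - tau]) for tau in range(max_lag + 1)])
    bound = 1.96 / np.sqrt(N)
    return r, bound, bool(np.any(np.abs(r) > bound))

rng = np.random.default_rng(2)
u = rng.standard_normal(2000)
noise = rng.standard_normal(2000)

# Residuals that still contain input dynamics (eps(k) depends on u(k-1)),
# mimicking a model structure that is not rich enough
eps = noise.copy()
eps[1:] += 0.8 * u[:-1]

r, bound, flagged = xcorr_test(eps, u)
print("model structure flagged as deficient:", flagged)
```

Residuals that are genuinely independent of the input stay within the bounds, while the synthetic residuals above are flagged at lag 1, which is exactly the kind of diagnostic step that would otherwise require a human in the loop.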
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Khandelwal, D. (2022). Performance Measures. In: Automating Data-Driven Modelling of Dynamical Systems. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-030-90343-5_5
Print ISBN: 978-3-030-90342-8. Online ISBN: 978-3-030-90343-5.