Keywords

figure a

1 Introduction

Despite recent advances, systems described by nonlinear ordinary differential equations are still hard to analyze, control, and verify. On the other hand, a powerful body of methods and theories exists for linear systems making analysis, control, and verification much easier, even for high-dimensional systems. The efficiency of techniques related to reachability analysis for linear systems [4, 6, 15] motivates the use of Koopman operator linearization, where a higher-dimensional linear system approximates the dynamic behavior of a nonlinear system. Koopman operator techniques are also well-suited for data-driven approaches since the Koopman linearized system can be directly created from measurements, bypassing a potentially complex modeling step. The Koopman framework has been successfully applied to many applications, including control [26, 28], state estimation [31] and recently, formal verification [5].

The main contribution of this paper is to advance the state-of-the-art in formal verification using reachability analysis on Koopman operator linearized systems. First, we improve the accuracy of the finite Koopman linearization by employing random Fourier features [29]. In contrast with an ad hoc, finite-dimensional feature space, random Fourier features leverage the powerful kernel trick from machine learning [36, 38] to generate a computationally tractable mapping over an infinite-dimensional feature space. Second, we improve speed. Instead of using an SMT solver to reason over non-convex initial sets, we propose combining Taylor models with polynomial zonotope refinement. A comparison on the same nonlinear system benchmarks used in the earlier Koopman verification work [5] demonstrates both the improved accuracy and the improved verification speed.

1.1 Related Work

The concept of Koopman operator linearization was originally introduced in 1931 [22]. Instead of investigating the dynamic evolution of the original system state, the Koopman approach considers the evolution of so-called observable functions or observables defined by nonlinear transformations of the original system state. Since the set of all possible observables defines a vector space, it then holds that the dynamic behavior of every nonlinear system can be equivalently represented by an infinite dimensional linear system. Because it is obviously infeasible to handle infinite dimensions, a finite set of observables is used in practice. Given such a set, the system matrix resulting in the most accurate linear approximation of the original system behavior can be determined using extended dynamic mode decomposition [41].

Many different methods for determining good observables have been proposed: Carleman linearization [7] equivalently represents the dynamic behavior of polynomial systems with an infinite dimensional linear system. The corresponding observables are multi-variate monomials, which are determined by repeatedly computing the time-derivative of the current observables. Terminating this iteration after a certain number of steps yields a finite set of observables. Carleman linearization can be extended to general nonlinear systems by using a Taylor series expansion. A finite set of observables defines an exact linear representation of the original system if the vector space spanned by the observables is closed under the operation of Lie-derivatives [34]. Consequently, a natural approach is to refine an initial set of observables by removing observables that violate the condition [34]. This concept can be extended to obtain polynomial instead of linear representations for the original nonlinear system [35]. Another class of approaches uses neural networks as observables [16, 43], where the weights of the network are trained on traces of the real system. Since these approaches usually train the system matrix together with neural networks, they circumvent the subsequent application of dynamic mode decomposition. If one aims to reason about the original system based on the Koopman linearization, some quantification of the approximation error is required. Several approaches derive error bounds for truncated Carleman linearization [3, 12, 24] considering quadratic systems [24], polynomial systems [12], as well as general nonlinear systems [3].

The main motivation for using the Koopman framework for reachability analysis is that reachable sets for linear systems can be computed efficiently [11, 15, 23] even for high-dimensional systems [2, 4, 6], while reachability analysis for nonlinear systems [1, 8, 27] is often computationally demanding and potentially results in large over-approximations. Another advantage is that the Koopman approach can also be applied to data-driven systems where no model is available. Due to the nonlinear transformation of the initial state defined by the observables, reachability analysis for Koopman operator linearized system represents a special type of reachability problem. To the best of our knowledge only two approaches exist for far: The first approach [13] utilizes the error bounds for quadratic systems [24] to compute an enclosure of the reachable set for weakly nonlinear systems based on a finite Carleman linearization, where interval arithmetic [17] is applied to enclose the image of the initial set through the observables. The second approach [5], which represents the work closest to our method, presents two different verification strategies: 1) Direct encoding of the nonlinear transformation defined by the observables using a SMT solver, and 2) zonotope domain splitting, where the initial set is recursively split into smaller sets until the specification can be verified or falsified.

1.2 Overview

In this work we address the two main bottlenecks of formal verification for Koopman operator linearized systems, which are the selection of observables and the computation of the image of the initial set through the nonlinear transformation defined by the observables. In particular, while currently observables often have to be selected manually by the user, we generate observables in a systematic fashion using random Fourier features. As we demonstrate with numerical experiments, these observables yield high-accuracy approximations of the real system behavior. Moreover, while previous approaches either compute very conservative convex enclosures of the image through the observables [13] or have to split the initial set in order to achieve a desired precision [5], we calculate tight non-convex enclosures of the image by combining Taylor model arithmetic with polynomial zonotopes. To conduct collision checks between the resulting non-convex reachable set enclosures and unsafe regions we then use a novel polynomial zonotope refinement strategy, which is significantly faster than the previous SMT solver and zonotope domain splitting approaches [5].

The remainder of the paper is structured as follows: We first recapitulate some preliminary results that are required throughout the paper in Sect. 2. In the main part we then describe the systematic generation of observables using random Fourier features in Sect. 3, before we present our proposed verification algorithm in Sect. 4. Finally, we demonstrate the superior performance of random Fourier feature observables and our verification algorithm in comparison with existing techniques on various benchmark systems in Sect. 5.

1.3 Notation

In the remainder of this paper, we will use the following notations: Sets are denoted by calligraphic letters, matrices by uppercase letters, vectors by lowercase letters, and lists by bold uppercase letters. Given a vector \(b \in \mathbb {R}^n\), \(b_{(i)}\) refers to the i-th entry. Given a matrix \(A \in \mathbb {R}^{n \times m}\), \(A_{(i,\cdot )}\) represents the i-th matrix row, \(A_{(\cdot ,j)}\) the j-th column, and \(A_{(i,j)}\) the j-th entry of matrix row i. Given a discrete set of positive integer indices \(\mathcal {H} = \{h_1,\dots ,h_{w} \}\) with \(1 \le h_i \le m ~ \forall i \in \{ 1, \dots , w \}\), \(A_{(\cdot ,\mathcal {H})}\) is used for \([ A_{( \cdot ,h_1 )} ~ \dots ~ A_{( \cdot , h_{w} )} ]\), where [C D] denotes the concatenation of two matrices C and D. The symbols \(\textbf{0}\) and \(\textbf{1}\) represent matrices of zeros and ones of proper dimension, the empty matrix is denoted by [ ], and \(I_n \in \mathbb {R}^{n \times n}\) is the identity matrix. Given an ordered list \(\textbf{L} = (l_1,\dots ,l_n)\), \(\textbf{L}_{(i)} = l_i\) refers to the i-th entry and \(|\textbf{L}| = n\) denotes the number of elements in the list. Moreover, the concatenation of two lists \(\textbf{L}_1\) and \(\textbf{L}_2\) is denoted by \((\textbf{L}_1, \textbf{L}_2)\). The left multiplication of a matrix \(M \in \mathbb {R}^{m \times n}\) with a set \(\mathcal {S} \subset \mathbb {R}^n\) is defined as \(M \mathcal {S} = \{ M s \, | \, s \in \mathcal {S} \}\), and the Cartesian product of two sets is denoted by the \(\times \) operator. We further introduce an n-dimensional interval as \(\mathcal {I} = [l,u],~ \forall i ~ l_{(i)} \le u_{(i)},~ l,u \in \mathbb {R}^n\).

2 Preliminaries

Our approach utilizes several existing techniques and concepts, which we shortly recapitulate here. We use the nonlinear system

$$\begin{aligned} \begin{aligned}&\dot{x}_1 = x_1 \\&\dot{x}_2 = x_2 - x_1^4 \end{aligned} \end{aligned}$$
(1)

in combination with the initial set \(\mathcal {X}_0 = [-2,2] \times [0,4]\) as a running example throughout this section.

2.1 Koopman Operator Linearization

First, we describe the general concept of Koopman operator linearization [22]. Given a nonlinear system

$$\begin{aligned} \frac{\partial x}{\partial t} = f\big (x \big ) ~~ \text {with} ~~ x \in \mathbb {R}^n, ~ f:~\mathbb {R}^n \rightarrow \mathbb {R}^n, \end{aligned}$$
(2)

our goal is to find observables \(g_i: \mathbb {R}^n \rightarrow \mathbb {R}\) such that the dynamics of the resulting new variables \(g_i(x)\) is linear:

$$\begin{aligned} \frac{\partial g\big (x\big )}{\partial t} = A \,g\big (x\big ) ~~ \text {with}~~ A \in \mathbb {R}^{m \times m}, \end{aligned}$$
(3)

where \(g(x) = [g_1(x)~\dots ~g_m(x)]^T\) is the observable function. Since the new variables \(g_i(x)\) are functions of the original system state x, the linear system (3) defines an equivalent representation of the dynamic behavior of the original system (2). Usually, the number of observables m is significantly larger than the dimension n of the original system.

Let us demonstrate Koopman linearization for our exemplary system in (1). By choosing the observables \(g_1(x) = x_1\), \(g_2(x) = x_2\), and \(g_3(x) = x_1^4\) we obtain the linear system

$$\begin{aligned} \frac{\partial }{\partial t} \begin{bmatrix} g_1(x) \\ g_2(x) \\ g_3(x) \end{bmatrix} = \begin{bmatrix} 1~ &{} ~0~ &{} 0 \\ 0~ &{} ~1~ &{} -1 \\ 0~ &{} ~0~ &{} 4 \end{bmatrix} \begin{bmatrix} g_1(x) \\ g_2(x) \\ g_3(x) \end{bmatrix} \end{aligned}$$

since \(\partial g_1(x)/\partial t = \dot{x}_1 = x_1\), \(\partial g_2(x)/\partial t = \dot{x}_2 = x_2 - x_1^4 = g_2(x) - g_3(x)\), and \(\partial g_3(x)/\partial t = 4\, x_1^3 \, \dot{x}_1 = 4 \, x_1^4 = 4 \, g_3(x)\).

The exact linearization using a finite number of observables demonstrated by the example above is unfortunately only possible for a small number of special systems. In practice one therefore usually aims to instead determine a linear system (3) that approximates the dynamic behavior of the nonlinear system (2) well enough. Given observables \(g_i(x)\), the system matrix A resulting in the best approximation can be determined by applying extended dynamic mode decomposition [41] to traces of the original system. Since those traces can also be generated by simulating black-box systems or by measuring the real system behavior, we do not necessarily require a model (2) of the original system. This is one of the biggest advantages of the Koopman framework making it well suited for data-driven approaches. The approach we present in this work verifies Koopman linearized systems using reachability analysis:

Definition 1

(Reachable set) Given an initial set \(\mathcal {X}_0 \subset \mathbb {R}^n\), the reachable set for a Koopman linearized system is

$$\begin{aligned} \mathcal {R}(t) := \big \{ \xi (t,g(x_0)) ~\big |~x_0 \in \mathcal {X}_0 \big \}, \end{aligned}$$

where \(\xi (t,g(x_0))\) is the solution to (3) at time \(t \in \mathbb {R}_{\ge 0}\) for the initial state \(g(x_0)\).

Consequently, to compute the reachable set for a Koopman linearized system one first needs to propagate the initial set through the nonlinear transformation defined by the observables, followed by the calculation of the reachable set for the linear system in (3) using a reachability algorithm. This procedure is visualized in Fig. 1. Definition 1 defines the reachable set for the observables \(g_i(x)\). However, since safety specifications are typically defined on the original system state x rather than on g(x), we usually require the reachable set for the original state \(\mathcal {R}_x(t)\) for verification. This issue can easily be resolved by using the original system state x for the first n observables \(g_i(x) = x_{(i)}\), \(i = 1,\dots ,n\), in which case \(\mathcal {R}_x(t)\) can be obtained via projection: \(\mathcal {R}_x(t) = [I_n~\textbf{0}]\, \mathcal {R}(t)\).

Fig. 1.
figure 1

Schematic visualization of reachability analysis for Koopman linearized systems: We first transform the initial set to the higher-dimensional observable space using g(x), then compute the reachable set of the linear system using the matrix exponential \(e^{A \varDelta t}\) with time-step size \(\varDelta t\), and finally obtain the reachable set in the original state space via projection.

2.2 Taylor Model Arithmetic

Taylor model arithmetic [25] can be utilized to compute tight non-convex enclosures for the image through a nonlinear function. It is based on a set representation called Taylor models:

Definition 2

(Taylor model) Given a polynomial function \(p:~\mathbb {R}^s\rightarrow \mathbb {R}^n\), an interval domain \(\mathcal {D} \subset \mathbb {R}^s\), and an interval remainder \(\mathcal {Y} \subset \mathbb {R}^n\), a Taylor model \(\mathcal {T}(x)\) is defined as

$$\begin{aligned} \forall x \in \mathcal {D}:~~ \mathcal {T}(x) := \big \{ p(x) + y ~\big |~ y \in \mathcal {Y} \big \}. \end{aligned}$$

The Taylor order \(\kappa \in \mathbb {N}\) defines an upper bound for the polynomial degree of the polynomial p(x). The set defined by a Taylor model is

$$\begin{aligned} \big \{\mathcal {T}(x)~\big |~ x \in \mathcal {D} \big \} = \big \{ p(x) + y~\big |~ x \in \mathcal {D},~y \in \mathcal {Y} \big \}. \end{aligned}$$

For a concise notation we use the shorthand \(\mathcal {T}(x) = \langle p(x),\mathcal {Y},\mathcal {D}\rangle _T\).

The general concept of Taylor model arithmetic is to define rules on how to perform the arithmetic operations \(+\), −, \(\cdot \), and / as well as elementary functions such as \(\sin (x)\) or \(\sqrt{x}\) on Taylor models [25, Sec. 2]. Since every nonlinear function represents a composition of arithmetic operations and elementary functions, the image through the function can then be computed by successively evaluating those rules. Given two one-dimensional Taylor models \(\mathcal {T}_1(x) = \langle p_1(x),\mathcal {Y}_1,\mathcal {D} \rangle _T\) and \(\mathcal {T}_2(x) = \langle p_2(x),\mathcal {Y}_2,\mathcal {D}\rangle _T\) the rules for addition and multiplication are for example given as

$$\begin{aligned} \begin{aligned}&\mathcal {T}_1(x) + \mathcal {T}_2(x) := \big \langle p_1(x) + p_2(x), \mathcal {Y}_1 + \mathcal {Y}_2,\mathcal {D} \big \rangle _T \\&\mathcal {T}_1(x) \cdot \mathcal {T}_2(x) := \big \langle p_1(x) \cdot p_2(x), \mathcal {Y}_1 \cdot \mathcal {Y}_2 + \mathcal {I}_1 \cdot \mathcal {Y}_2 + \mathcal {Y}_1 \cdot \mathcal {I}_2,\mathcal {D} \big \rangle _T, \\ \end{aligned} \end{aligned}$$

where \(\mathcal {I}_1 = \{p_1(x)~|~ x \in \mathcal {D} \}\) and \(\mathcal {I}_2 = \{ p_2(x)~|~ x \in \mathcal {D} \}\). The rules for elementary functions are obtained using a finite Taylor series expansion, where the order of the Taylor series is equal to the Taylor order \(\kappa \). For \(\sin (x)\) we for example obtain with \(\kappa = 2\) the rule

$$\begin{aligned} \sin \big (\mathcal {T}_1(x)\big ) := \big \langle \sin (c) + \cos (c) \,(p_1(x)-c) - 0.5 \,\sin (c) \,(p_1(x)-c)^2,\mathcal {Y},\mathcal {D} \big \rangle _T, \end{aligned}$$

where the expansion point c is chosen as \(c = p_1(c_d)\) with \(c_d\) being the center of the domain \(\mathcal {D}\), and the interval \(\mathcal {Y}\) computed according to [25, Sec. 2] encloses the remainder of the Taylor series. Due to the finite Taylor series approximation, Taylor model arithmetic yields a tight enclosure rather than the exact image. The accuracy of the enclosure can be improved by choosing a larger Taylor order.

For our verification approach we apply Taylor model arithmetic to compute the image of the initial set through the observable function. The initial set \(\mathcal {X}_0 = [-2,2] \times [0,4]\) for the exemplary system in (1) can be represented by the Taylor model \(\mathcal {T}(x) = \langle x,\emptyset ,\mathcal {X}_0 \rangle _T\). Applying Taylor model arithmetic to the observable function g(x) defined by the observables \(g_1(x) = x_1\), \(g_2(x) = x_2\), and \(g_3(x) = x_1^4\) then yields the Taylor model

$$\begin{aligned} \big \{g(x)~\big |~ x \in \mathcal {X}_0 \big \} \subseteq \left\langle \begin{bmatrix} x_{1} \\ x_{2} \\ x_{1}^4 \end{bmatrix},\emptyset , [-2,2] \times [0,4] \right\rangle _T, \end{aligned}$$
(4)

which represents the exact image in this case since the observables contain polynomial functions only.

2.3 Set Representations

In this work we use polynomial zonotopes to represent reachable sets, polytopes to represent unsafe sets, and zonotopes for efficient collision checking. Let us first introduce polytopes, for which we consider the halfspace representation:

Definition 3

(Polytope) Given a matrix \(H\in \mathbb {R}^{s\times n}\) and vector \(d\in \mathbb {R}^{s}\), the halfspace representation of a polytope \(\mathcal {P} \subset \mathbb {R}^n\) is defined as

$$\begin{aligned} \mathcal {P} := \{x \in \mathbb {R}^n ~|~ H\,x \le d\}. \end{aligned}$$

We use the shorthand \(\mathcal {P} = \langle H,d\rangle _{P}\).

A halfspace \(\mathcal {H} \subset \mathbb {R}^n\) is a special case of a polytope consisting of a single inequality constraint \(h^T \, x \le d\) with \(h\in \mathbb {R}^n\), \(d\in \mathbb {R}\). We use the shorthand \(\mathcal {H} = \langle h,d\rangle _H\). Another special type of polytopes are zonotopes, which can be stored efficiently using so-called generators:

Definition 4

(Zonotope) Given a center vector \(c \in \mathbb {R}^n\) and a generator matrix \(G \in \mathbb {R}^{n \times p}\), a zonotope \(\mathcal {Z} \subset \mathbb {R}^n\) is defined as

$$\begin{aligned} \mathcal {Z} := \bigg \{ c + \sum _{i=1}^p \alpha _i \, G_{(\cdot ,i)} ~ \bigg | ~ \alpha _i \in [-1,1] \bigg \}, \end{aligned}$$

where the scalars \(\alpha _i\) are called factors. We use the shorthand \(\mathcal {Z} = \langle c,G \rangle _Z\).

Polynomial zonotopes are a novel non-convex set representation that has been originally introduced for reachability analysis of nonlinear systems [1]. We use the sparse representation of polynomial zonotopes [20]Footnote 1:

Definition 5

(Polynomial zonotope) Given a constant offset \(c \in \mathbb {R}^n\), a generator matrix of dependent generators \(G \in \mathbb {R}^{n \times h}\), a generator matrix of independent generators \(G_I \in \mathbb {R}^{n \times q}\), and an exponent matrix \(E \in \mathbb {N}_{0}^{p \times h}\), a polynomial zonotope \(\mathcal {P}\mathcal {Z} \subset \mathbb {R}^n\) is defined as

$$\begin{aligned} \mathcal {P}\mathcal {Z} := \bigg \{ c + \sum _{i=1}^h \bigg ( \prod _{k=1}^p \alpha _k ^{E_{(k,i)}} \bigg ) G_{(\cdot ,i)} + \sum _{j=1}^{q} \beta _j G_{I(\cdot ,j)} ~ \bigg | ~ \alpha _k, \beta _j \in [-1,1] \bigg \}. \end{aligned}$$

The scalars \(\alpha _k\) are called dependent factors since a change in their value affects multiplication with multiple generators. Consequently, the scalars \(\beta _j\) are called independent factors because they only affect multiplication with one generator. We use the shorthand \(\mathcal {P}\mathcal {Z} = \langle c,G, G_I, E \rangle _{PZ}\).

Using polynomial zonotopes for verification has two main advantages:

  1. 1.

    Due to the similarity with Taylor models the set defined by a Taylor model can be equivalently represented as a polynomial zonotope [20, Prop. 4].

  2. 2.

    Due to the similarity with zonotopes tight enclosing zonotopes can be computed efficiently for polynomial zonotopes [20, Prop. 5].

For verification we therefore convert the Taylor model representing the image of the initial set through the observable function to a polynomial zonotope, for which collision checks with the unsafe sets can be efficiently realized using zonotope enclosures that are iteratively refined by splitting the polynomial zonotope.

The conversion of the Taylor model in (4) corresponding to our running example in (1) yields the following polynomial zonotope

$$\begin{aligned} \begin{aligned}&\left\langle \begin{bmatrix} x_{1} \\ x_{2} \\ x_{1}^4 \end{bmatrix},\emptyset , [-2,2] \times [0,4] \right\rangle _T = \left\langle \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix},\begin{bmatrix} 2~ &{} ~0~ &{} 0 \\ 0~ &{} ~2~ &{} 0 \\ 0~ &{} ~0~ &{} 16 \end{bmatrix},[~],\begin{bmatrix} 1~ &{} ~0~ &{} ~4 \\ 0~ &{} ~1~ &{} ~0 \end{bmatrix} \right\rangle _{PZ} \\&~~~~~~~~~~~~~~~~~~~~ = \left\{ \left. \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \\ 0\end{bmatrix} \alpha _1 + \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}\alpha _2 + \begin{bmatrix} 0 \\ 0 \\ 16 \end{bmatrix} \alpha _1^4 ~\right| ~ \alpha _1,\alpha _2 \in [-1,1] \right\} , \end{aligned} \end{aligned}$$

where the high-level idea of the conversion is to represent the interval domain \(\mathcal {D}\) with dependent zonotope factors \(\alpha _i \in [-1,1]\).

3 Linearization via Fourier Features

We now present the automated generation of observables using random Fourier features [10]. Let us first motivate why Fourier features are a good choice for observables. For Koopman linearization, the observables g(x) define a transformation to a high-dimensional space. One commonly used approach to handle such high-dimensional spaces efficiently is the kernel trick: In many algorithms the data points \(x,y \in \mathbb {R}^n\) only appear in the form of inner products \(g(x)^T g(y)\). In this case it suffices to define a kernel function k(xy) that represents the similarity measure \(g(x)^T g(y)\) between data points in the high-dimensional feature space, rather than explicitly defining a transformation g(x) to this space. Kernel functions can also represent more general features that are not vectors and even infinite dimensional features, which motivates their application in the Koopman framework. The kernel trick is mainly applied for machine learning techniques [36], such as regression [38], clustering [18], and classification [39]. However, also the extended dynamic mode decomposition algorithm [41] can be formulated in terms of inner-products [42], so that the kernel trick can be applied for Koopman linearization. Rather than explicitly choosing observables g(x) we can therefore select a kernel function instead, which implicitly defines the observable function g(x) through the kernel’s relation to an inner product space. Commonly used kernels are radial basis function kernels, polynomial kernels, and spline kernels.

The kernel trick cannot be applied directly to our reachability technique since we require an explicit formulation of the observables g(x). We therefore first select a kernel function k(xy), and then determine observables g(x) that yield a good approximation of the kernel function \(k(x,y) \approx g(x)^T g(y)\). Random Fourier features are a common technique to approximate kernel functions [10, 29]. They are based on Bochner’s theorem [33, Sec. 1.4.3], which links a weakly stationary kernel function to a Fourier transform:

$$\begin{aligned} k(x,y) = \int _{\mathbb R^n} e^{\, j\, \omega ^T( x - y)} \, d\mu (\mathbf \omega ) = \mathbb E_{\mathbf \omega } \left( e^{\, j\,\omega ^T x} ~ \overline{e^{\, j\,\omega ^T y }} \right) , \end{aligned}$$
(5)

where the function \(\mu :~\mathbb {R}^n \rightarrow [0,1]\) defines a probability distribution, \(\mathbb {E}_\omega (\cdot )\) denotes the expected value with respect to \(\omega \), \(j\) is the imaginary unit, and \(\overline{a}\) denotes the complex conjugate for a complex number \(a \in \mathbb {C}\). The distribution \(\mu (\mathbf \omega )\) associated with a specific kernel can be obtained by taking the inverse Fourier transform of k(xy) [29]. We can collect m samples from the distribution \(\mu (\omega )\) to approximate the expected value in (5), which finally yields

$$\begin{aligned} k(x, y) = \mathbb E_{\mathbf \omega } \left( e^{\, j\,\omega ^T x} ~ \overline{e^{\, j\,\omega ^T y }} \right) \approx \frac{1}{m} \sum _{i=1}^{m} \, \underbrace{ e^{\, j\, \omega _i^T x}}_{g_i(x)} ~ \underbrace{e^{\, j\, \omega _i^T y}}_{g_i(y)}. \end{aligned}$$

The random Fourier features are the resulting observables \(g_i(x)\) that approximate the kernel function. Note that we can omit the constant factor \(\frac{1}{m}\) since extended dynamic mode decomposition will automatically scale the observables accordingly. We consider real-valued kernels only, so we use Euler’s formula \(e^{j\,x} = \cos (x) + j\, \sin (x)\) to simplify the random Fourier features to

$$\begin{aligned} g_i(x) = \sqrt{2} \, \cos (\omega _i^T x + b_i),~~i = 1,\dots ,m, \end{aligned}$$
(6)

where the shift \(b_i\) is selected uniformly from the interval \([0,2 \pi ]\) and \(\omega _i\) is drawn randomly from the probability distribution \(\mu (\omega )\) corresponding to the kernel that is used. While this random selection might appear to be a disadvantage at first sight, it is guaranteed that the random Fourier feature approximation converges to the exact kernel function when increasing the number of observables [29]. Moreover, we observed from our numerical experiments that changes in the values for \(b_i\) and \(\omega _i\) do not significantly influence the accuracy of the resulting linear approximation.

In summary, the random Fourier features presented above represent a systematic method for selecting a finite set of accurate observables, which requires only few hyperparameters. These hyperparameters include the type of kernel that is used, the kernel parameters, and the number of observables. For the numerical experiments in this paper we use a radial basis function kernel

$$\begin{aligned} k(x,y) = e^{-\frac{\Vert x - y \Vert _2^2}{2 \, \ell ^2}}, \end{aligned}$$

which contains the lengthscale \(\ell \) as the only parameter. The probability distribution \(\mu (\omega )\) for this kernel is the multivariate normal distribution with covariance matrix \(\ell ^2 \cdot I_n\) centered at the origin [29, Fig. 1].

4 Verification Using Reachability Analysis

We now present our novel verification algorithm for Koopman linearized systems, which is summarized in Algorithm 1. For simplicity we assume that the specification we aim to verify is described by a single unsafe set \(\mathcal {U}\), but the extension to multiple unsafe sets is straightforward. We first apply Taylor model arithmetic (see Sect. 2.2) to compute a tight non-convex enclosure for the image of the initial set \(\mathcal {X}_0\) through the observable function g(x) in Line 3. Since it simplifies the computation of the zonotope enclosures required later on, we then convert the resulting Taylor model to a polynomial zonotope in Line 4. This polynomial zonotope is used as the initial set for the computation of the reachable set for the Koopman linearized system as performed in Line 5, for which we can use any reachability algorithm for linear systems. For simplicity we assume here that the obtained reachable sets are exact. In the general case where the exact reachable set cannot be computed one can for example incorporate the error measures from [14] and [40] into the verification algorithm.

figure b

The problem we are facing now is that the reachable sets \(\mathcal {R}_0,\dots ,\mathcal {R}_{t_F/\varDelta t}\) are represented by polynomial zonotopes, a set representation for which exact collision checks with the unsafe set \(\mathcal {U}\) are computationally demanding. We resolve this issue by applying a novel polynomial zonotope refinement procedure in lines 619, where we recursively split the polynomial zonotopes until we can either verify or falsify the specification using zonotope enclosures of the split sets. In particular, we first enclose each polynomial zonotope in the queue \(\textbf{L}\) with a zonotope in Line 9. For a zonotope \(\mathcal {Z} = \langle c,G\rangle _Z\) collision checks with an unsafe set as performed in Line 10 are very efficient: If the unsafe set is a halfspace \(\mathcal {U} = \langle h, d\rangle _H\), we have according to [15, Sec. 5.1]

$$\begin{aligned} (\mathcal {Z} \cap \mathcal {U} \ne \emptyset ) \Leftrightarrow \bigg ( h^T c - \sum _{i=1}^p |h^T G_{(\cdot ,i)})| \le d\bigg ) \end{aligned}$$
(7)

For general polytopes \(\mathcal {U} = \langle H,d\rangle _P\) collision checks can be realized using linear programming:

$$\begin{aligned} (\mathcal {Z} \cap \mathcal {U} \ne \emptyset ) \Leftrightarrow (\delta = 0), \end{aligned}$$
(8)

where

$$\begin{aligned} \delta = \min _{\alpha ,x} \Vert c + G \alpha - x \Vert _1 ~~ \text {s.t.} ~~ \alpha \in [-\textbf{1},\textbf{1}], ~ Hx \le d. \end{aligned}$$
(9)

If the specification cannot be verified, we next try to falsfy it in lines 1113 by extracting the initial point \(x_0\) that is expected to violate the specification the most from \(\mathcal {Z}\). For a halfspace \(\mathcal {U} = \langle h,d\rangle _H\) the vector of zonotope factors \(\alpha = [\alpha _1 ~\dots ~\alpha _p]^T\) resulting in the largest violation is given as \(\alpha = -\text {sign}(h^T G)\), where the signum function is interpreted elementwise. Since the factors \(\alpha \) of the zonotope enclosure are related to the dependent factors of the original polynomial zonotope and since polynomial zonotopes preserve dependencies during reachability analysis [21], we can then directly extract the initial point \(x_0\) corresponding to \(\alpha \) from the polynomial zonotope. For general polytopes we can use the optimal \(\alpha \) from the linear program in (9) to estimate the most critical initial point. If we can neither verify nor falsify the specification we have a so called spurious counterexample that arises due to the over-approximation introduced by the zonotope enclosure. We therefore split the polynomial zonotope in this case in Line 15 since splitting reduces the over-approximation in the zonotope enclosure (see Fig. 2). The split sets are then added to the queue in Line 16, where we use a first-in, first-out scheme for the queue to detect easy falsifications fast before excessively splitting the sets.

Fig. 2.
figure 2

Reachable set for the Roessler system (see Sect. 5.1) at time \(t = 2.95\), where polynomial zonotopes are depicted by solid lines, the corresponding zonotope enclosures are depicted by dashed lines, and the unsafe set is shown in orange. While the zonotope enclosure of the original polynomial zonotope is too conservative to verify the specification (left), splitting the polynomial zonotope once reduces the over-approximation enough for verification to succeed (right).

One remaining issue we are facing is that Taylor model arithmetic is not exact. Due to the over-approximation in the initial set it can therefore happen that we can neither verify nor falsify the specification by splitting the polynomial zonotope. To solve this issue we embed our whole algorithm into a repeat-until-loop that iteratively increases the order \(\kappa \) used for Taylor model arithmetic (see Line 20). Since Taylor model arithmetic converges to the exact result if the order goes to infinity, we obtain a complete algorithm that is guaranteed to terminate. In practice we can often prevent computational expensive iterations of the outer loop by choosing the initial order \(\kappa _0\) large enough. It remains to decide when to stop splitting the polynomial zonotopes and increase the Taylor order instead (see Line 19). The simplest method is to just use an upper bound for the number of recursive splits that are performed. A more sophisticated approach is to abort splitting if the distance between the most critical point \([I_n~\textbf{0}] \, e^{At}g(x_0)\) and the unsafe set \(\mathcal {U}\) is smaller than the over-approximation in the polynomial zonotope \(\mathcal {P}\mathcal {Z}\), which is given by the independent generators.

Finally, we provide a closed-form expression for splitting a polynomial zonotope since this operation is not specified in the original work [20]:

Proposition 1

(Split) Given a polynomial zonotope \(\mathcal {P}\mathcal {Z} = \langle c,G,G_I,E \rangle _{PZ} \subset \mathbb {R}^n\) and the index \(r\in \{1,\dots ,p\}\) of one dependent factor, the operation \({ \texttt {split}}(\mathcal {P}\mathcal {Z},r)\) returns two polynomial zonotopes \(\mathcal {P}\mathcal {Z}_1\), \(\mathcal {P}\mathcal {Z}_2\) satisfying \(\mathcal {P}\mathcal {Z}_1 \cup \mathcal {P}\mathcal {Z}_2 = \mathcal {P}\mathcal {Z}\):

$$\begin{aligned}&\mathcal {P}\mathcal {Z}_1 = \Big \langle c,\big [ \widehat{G}^{(1)}_{1} ~ \dots ~ \widehat{G}^{(1)}_{h} \big ] , G_I,\big [ \widehat{E}_{1} ~ \dots ~ \widehat{E}_{h} \big ] \Big \rangle _{PZ}\\&\mathcal {P}\mathcal {Z}_2 = \Big \langle c,\big [ \widehat{G}^{(2)}_{1} ~ \dots ~ \widehat{G}^{(2)}_{h} \big ] , G_I,\big [ \widehat{E}_{1} ~ \dots ~ \widehat{E}_{h} \big ] \Big \rangle _{PZ} \end{aligned}$$

with

$$\begin{aligned} \begin{gathered} \widehat{E}_{i} = \begin{bmatrix} E_{(\{1,\dots ,r-1 \},i)} &{} E_{(\{1,\dots ,r-1 \},i)} &{} \dots &{} E_{(\{1,\dots ,r-1 \},i)} &{} E_{(\{1, \dots ,r-1\},i)} \\ 0 &{} 1 &{} \dots &{} E_{(r,i)}-1 &{} E_{(r,i)} \\ E_{(\{r+1,\dots ,p \},i)} &{} E_{(\{r+1,\dots ,p \},i)} &{} \dots &{} E_{(\{r+1,\dots ,p \},i)} &{} E_{(\{r+1, \dots ,p\},i)} \end{bmatrix}, \\ \widehat{G}^{(k)}_{i} = \begin{bmatrix} b_{i,0}^{(k)} \cdot G_{(\cdot ,i)}&\dots&b_{i,E_{(r,i)}}^{(k)} \cdot G_{(\cdot ,i)} \end{bmatrix}, \\ b_{i,j}^{(1)} = 0.5^{E_{(r,i)}} {{E_{(r,i)}}\atopwithdelims (){j}} , ~~ b_{i,j}^{(2)} = -0.5^{E_{(r,i)}} \big (2 (E_{(r,i)} \, \textrm{mod} \,2)-1\big ) {{E_{(r,i)}}\atopwithdelims (){j}}, \end{gathered} \end{aligned}$$

where \(x\, \textrm{mod} \, y\), \(x,y \in \mathbb {N}_0\) is the modulo operation and \({w} \atopwithdelims (){z}\), \(w,z \in \mathbb {N}_0\) denotes the binomial coefficient. To remove redundancies we subsequently apply the compact operation as defined in [20, Prop. 2] to \(\mathcal {P}\mathcal {Z}_1\) and \(\mathcal {P}\mathcal {Z}_2\).

Proof

The split operation is based on the substitution of the selected dependent factor \(\alpha _r\) with two new dependent factors \(\alpha _{r,1}\) and \(\alpha _{r,2}\):

$$\begin{aligned} \begin{aligned}&\big \{ \alpha _r~|~ \alpha _r\in [-1,1] \big \} = \big \{ 0.5 (1 + \alpha _{r,1}) - 0.5(1 + \alpha _{r,2}) ~|~ \alpha _{r,1},\alpha _{r,2} \in [-1,1] \big \} \\&~~~~~~~~\big \{ 0.5(1 + \alpha _{r,1}) ~|~ \alpha _{r,1} \in [-1,1] \big \} \cup \big \{ -0.5(1 + \alpha _{r,2}) ~|~ \alpha _{r,2} \in [-1,1] \big \}. \end{aligned} \end{aligned}$$
(10)

Inserting this substitution into the definition of polynomial zonotopes in Definition 5 yields

Finally, with

$$\begin{aligned}&\Big ( \frac{1 + \alpha _{r,1}}{2} \Big )^{E_{(r,i)}} = b_{i,0}^{(1)} + b_{i,1}^{(1)} \alpha _{r,1} + b_{i,2}^{(1)} \alpha _{r,1}^2 + \dots + b_{i,E_{(r,i)}}^{(1)} \alpha _{r,1}^{E_{(r,i)}} \\&\Big ( \frac{1 + \alpha _{r,2}}{-2} \Big )^{E_{(r,i)}} = b_{i,0}^{(2)} + b_{i,1}^{(2)} \alpha _{r,2} + b_{i,2}^{(2)} \alpha _{r,2}^2 + \dots + b_{i,E_{(r,i)}}^{(2)} \alpha _{r,2}^{E_{(r,i)}} \end{aligned}$$

we obtain the equations above.

The split operation for polynomial zonotopes is not exact, meaning that the resulting sets usually overlap (see Fig. 2). To minimize the size of the overlapping region we split the dependent factor with index \(r\) that maximizes the following heuristic:

$$\begin{aligned} \max _{r\in \{1,\dots ,p\}} \sum _{\begin{array}{c} i = 1 \\ E_{(r,i)} > 1 \end{array}}^h \left( 1 - 0.5^{E_{(r,i)}} \right) \, \Vert G_{(\cdot ,i)} \Vert _2, \end{aligned}$$
(11)

where \(G \in \mathbb {R}^{n \times h}\) and \(E \in \mathbb {N}_{0}^{p \times h}\) are the generator and exponent matrix of the polynomial zonotope. Moreover, since the goal of splitting in Algorithm 1 is to verify a certain specification, it is advisable to first project the polynomial zonotope onto the halfspace normal directions of the unsafe set \(\mathcal {U}\) before evaluating the heuristic (11) in order to direct the splitting process towards directions that are beneficial for verification.

Note that the polynomial zonotope refinement technique presented in this section is not restricted to verification of Koopman linearized systems, but can equally be applied for collision checks of polynomial zonotopes or Taylor models with halfspaces and polytopes in general. Moreover, by inverting the inequality constraints polynomial zonotope refinement can also be applied to check if a Taylor model or polynomial zonotope is contained in a halfspace or polytope.

5 Experimental Results

We now evaluate the performance of random Fourier feature observables and our novel reachability algorithm on various benchmark systems. For this, we compare our approach with the closest method from the literature [5]. Since the algorithms presented there are implemented in Julia, we also implemented our approach in Julia to obtain a fair comparison of the computation time. In our implementation we use the package TaylorModels.jlFootnote 2 for Taylor model arithmetic and the package DataDrivenDiffEq.jlFootnote 3 for extended dynamic mode decomposition. All computations are carried out on a 3.2 GHz 8-core AMD Ryzen 7 5800H processor with 16 GB memory. We published our implementation together with a repeatability package that reproduces the results shown in this paper as a CodeOcean compute capsuleFootnote 4.

5.1 Benchmarks

Let us first define all benchmarks that we use for the evaluation. Again, we consider the same systems and specifications as in [5] for a fair comparison:

Roessler Attractor: The dynamic equations for the Roessler attractor [32] are

$$\begin{aligned} \dot{x}_1&= -x_2 -x_3\\ \dot{x}_2&= x_1 + 0.2\, x_2\\ \dot{x}_3&= 0.2+x_3 \, (x_1-5.7), \end{aligned}$$

and we consider the initial set \(\mathcal {X}_0 = [-0.05,0.05] \times [-8.45,-8.35] \times [-0.05,0.05]\), the final time \(t_F = 6\), and the unsafe region \(x_2 \ge 6.375 - 0.025 \cdot i\) parameterized by \(i \in [0, 20]\).

Steam Governor: The dynamic equations for the steam governor [37] are

$$\begin{aligned} \dot{x}_1&= x_2\\ \dot{x}_2&= x_3^2 \, \sin (x_1) \, \cos (x_1)- \sin (x_1) -3 \, x_2\\ \dot{x}_3&= \cos (x_1) - 1, \end{aligned}$$

and we consider the initial set \(\mathcal {X}_0 = [0.95,1.05] \times [-0.05,0.05] \times [0.95,1.05]\), the final time \(t_F = 3\), and the unsafe set \(x_2 \le -0.25 + 0.01 \cdot i\) parameterized by \(i \in [0, 10]\).

Coupled Van-der-Pol Oscillator: The dynamic equations for the coupled Van-der-Pol oscillator [30] are

$$\begin{aligned} \begin{array}{ll} \dot{x_1} = x_2 &{} \dot{x_3}=x_4\\ \dot{x_2} =(1-x^2_1)\, x_2 - x_1 + (x_3-x_1)~~~~~~ &{} \dot{x_4}=(1-x^2_3)\, x_4-x_3+(x_1-x_3), \end{array} \end{aligned}$$

and we consider the initial set \(\mathcal {X}_0 = [-0.025,0.025] \times [0.475,0.525] \times [-0.025,0.025] \times [0.475,0.525]\), the final time \(t_F = 2\), and the unsafe set \(x_1\ge 1.25 - 0.05 \cdot i\) parameterized by \(i \in [1,16]\).

Biological System: The dynamic equations for the biological system [19] are

$$\begin{aligned} \begin{array}{ll} \dot{x_1} = -0.4 \, x_1 + 5 \, x_3 \, x_4 ~~~~~~~ &{} \dot{x_5}=-5 \, x_5 \, x_6 + 5 \, x_3 \, x_4\\ \dot{x_2} =0.4\, x_1 -x_2 &{}\dot{x_6}=0.5\, x_7 - 5\, x_5 \, x_6\\ \dot{x_3}=x_2-5\, x_3\, x_4 &{}\dot{x_7}=-0.5\, x_7 + 5\, x_5\, x_6, \\ \dot{x_4} = 5 \, x_5 \, x_6 - 5 \, x_3 \, x_4 &{} \end{array} \end{aligned}$$

and we consider the initial set \(\mathcal {X}_0 = [0.99,1.01] \times \dots \times [0.99,1.01]\), the final time \(t_F = 2\), and the unsafe set \(x_4\le 0.883 + 0.002 \cdot i\) parameterized by \(i \in [0, 20]\).

Fig. 3.
figure 3

Relative simulation error between Koopman linearized systems and the original nonlinear system in percent.

5.2 Approximation Error

We first investigate the accuracy of the Koopman linearized system with respect to the original nonlinear dynamics, where we compare our random Fourier feature observables with the ad hoc observables from [5]. These ad hoc observables consist of multi-variate polynomials of the system state x up to a fixed order, trigonometric functions of the time t, and combinations of these (e.g., \(x_1\,x_2 \sin ^2(t) \cos (t)\)). To obtain the data traces required for extended dynamic mode decomposition we simulate the original nonlinear systems for 500 points sampled from the corresponding initial set, where a Sobol sequence is used for sampling. For the generation of the random Fourier feature observables according to (6) we use the parameter \(\ell = 0.3\) and \(m=71\) for the Roessler attractor, \(\ell = 1.62\) and \(m = 72\) for the steam governor, \(\ell = 1.24\) and \(m = 132\) for the coupled Van-der-Pol oscillator, and \(\ell = 1.81\) and \(m = 105\) for the biological system, where \(\ell \) is the lengthscale parameter of the kernel and the number of observables m is chosen identical to the one used for the ad hoc observables [5]. As a measure for the accuracy we use the Euclidean distance between simulated trajectories for the original nonlinear system and the Koopman linearized system. The initial points for these trajectories are the center and the vertices of the initial set. According to Fig. 3 random Fourier feature observables are for the steam governor and the Roessler attractor more accurate than than the ad hoc observables used in earlier work [5]. Moreover, while for the short time horizons considered in Fig. 3 it seems that the ad hoc observables are more precise for the coupled Van-der-Pol oscillator and the biological system, over longer time horizons the error of the ad hoc observables is exploding. This is visualized in Fig. 4, where the trajectory corresponding to the ad hoc observables progresses into a completely different direction than the original system, while random Fourier features stay accurate. In this way, random Fourier features are not only a more systematic approach for choosing observables, but also improve the precision of the resulting Koopman linearized system.

Fig. 4.
figure 4

Comparison of simulations for Koopman linearized systems with the ground truth from the original nonlinear system for a time horizon of \(t_F = 10\), where the biological system is shown on the left and the coupled Van-der-Pol oscillator is shown on the right.

5.3 Verification Using Reachability Analysis

We now compare our novel verification algorithm for Koopman linearized systems with the verification strategies presented in [5]. In particular, we compare to verification of the original nonlinear system using Flow* [9], direct encoding of nonlinear constraints using a SMT solver [5, Sec. 4.1], and zonotope domain splitting [5, Sec. 4.4]. Both approaches from [5] consider discrete-time safety, where the system is considered to be safe if the specification is satisfied at time points \(0,\varDelta t,2\varDelta t, \dots , t_F\) with \(\varDelta t = 0.05\). While our verification algorithm also supports continuous-time safety, we consider discrete-time safety here to obtain a fair comparison. Note that for discrete-time safety the reachable set computation in Line 5 of Algorithm 1 simplifies to \(\mathcal {R}_i = [I_n~\textbf{0}]\, e^{A i \varDelta t} \, \mathcal {X}_0\), \(i = 0,\dots ,t_F/\varDelta t\). For the comparison we consider both, the ad hoc observables used in [5] as well as the random Fourier feature observables presented here.

Table 1. Computation time in seconds for verification or falsification of the benchmark systems from Sect. 5.1 using different approaches, where the symbol − indicates that the computation timed-out after 2 h. The parameter i specified in the second column changes the specification, and the third column shows weather the specification can be verified or falsified.

The resulting computation times for verification are summarized in Table 1. For all benchmark instances our novel verification algorithm has the lowest computation time, and is often even magnitudes faster than the other verification approaches. The main reason for this is that with our polynomial refinement strategy we can completely avoid the computational expensive calls to SMT solvers used by the other methods. Moreover, while the computation time for the other approaches often depends on how difficult it is to verify or falsify the specification, our algorithm exhibits roughly equal runtimes for all specifications. The explanation for this is that the polynomial zonotope refinement approach that we use for the collision checks with unsafe sets is very efficient, so that the majority of the runtime is spent on the computation of the image through the observable function using Taylor model arithmetic, a task which is independent from the specification. Interestingly, using random Fourier features instead of ad hoc observables can either prolong or accelerate the verification process, depending on the benchmark instance and verification approach used. However, even if they prolong the time required for verification in some cases, the usage of random Fourier feature observables can be justified by their superior accuracy demonstrated in Sect. 5.2. Yet another observation is that direct encoding and zonotope domain splitting are not able to verify or falsify the high-dimensional biological model at all if random Fourier feature observables are used. The reason for this is that both of these approaches apply an SMT solver for verification, which do not scale to high-dimensions and are not well-suited for handling the trigonometric functions as well as the high coupling between variables used for random Fourier feature observables. So in summary our proposed verification algorithm outperforms all exiting verification techniques for Koopman linearized systems in terms of runtime. In addition, it handles different types of observables well and scales to high-dimensional systems.

6 Conclusion

We presented two major improvements for reachability analysis of Koopman operator linearized systems: First, we use random Fourier features as observable functions, which yields a systematic approach requiring much less user insight than previous methods. Second, we handle the nonlinear transformation of the initial state by combining Taylor model arithmetic with polynomial zonotope refinement. As demonstrated on several nonlinear system benchmarks, the combination of these two techniques is both extremely accurate and extremely fast.

The main trade-off with Koopman linearized systems is that the guarantees are on the system approximation, not the original system. Despite this, we believe the method could still be useful for verification in systems engineering, where the goal is to produce evidence that the system meets its requirements. It could also be effective for finding unsafe counterexamples—falsification—or to analyze systems where only simulation code is provided, or even real-world systems where sensor measurements could be used to create a Koopman linearized model for analysis. As such systems do not have models given with symbolic differential equations, most traditional reachability methods cannot be applied.