
1 Introduction

End-to-end deep neural network controllers have been extensively used in executing complex and safety-critical autonomous systems in recent years [13, 40, 51, 52]. In particular, high-dimensional controllers (HDCs) based on images and other high-dimensional inputs have been applied in areas such as autonomous car navigation [49, 61] and aircraft landing guidance [47]. For example, recent work has shown the high performance of controlling aircraft to land on the runway with a vision-based controller [65]. For such critical applications, it is important to develop techniques with strong safety guarantees for HDC-controlled systems.

However, due to the high-dimensional nature of the input space, modern verification cannot be applied directly to systems controlled by HDCs [2, 43]. Current closed-loop verification tools, such as NNV [54], Verisig [30], Sherlock [18], and ReachNN* [28], are capable of combining a dynamical system and a low-dimensional controller (LDC) to verify a safety property starting from an initial region of the low-dimensional input space, such as position-velocity states of a car. DeepReach [5] has pushed the boundary of applying Hamilton-Jacobi (HJ) reachability to systems with tens of state dimensions. However, such verification tools fail to scale for an input with thousands of dimensions (e.g., an image). One issue is that the dynamics of these dimensions are impractical to describe. Furthermore, the structure of an HDC is usually more complicated than that of an LDC, with convolution and pooling layers. For example, an image-based HDC may have hundreds of layers with thousands of neurons, whereas an LDC usually contains several layers with dozens of neurons, making HDC verification difficult.

Fig. 1. Our verification approach for systems with high-dimensional controllers.

To deal with these challenges, researchers have built perception abstractions into the verification process. One work [31] verified a generative adversarial network (GAN) that creates images from states. Such methods cannot guarantee the GAN’s accuracy or relation to reality, which becomes a major falsifiable assumption of their verification outcomes. Another work [47] built a precise mathematical model capturing the exact relationship between states and image pixels to verify the image-based controller, which is effortful and needs to be redone for each system. Inspired by previous work on dimensionality reduction, we instead create verifiable low-dimensional controllers from high-dimensional ones.

This paper proposes an end-to-end methodology to verify systems with HDCs by employing the steps displayed in Fig. 1. Instead of verifying an HDC’s safety directly over a complicated input space, our key idea is to approximate it with several LDCs so that we can reduce the HDC reachability problem to several LDC reachability problems. A crucial step is to upper-bound the difference between LDC and HDC, which we do statistically. Finally, we extend the reachable sets with the statistical bounds to obtain a safety guarantee for the HDC.

Since the input space and structure of the HDC are too complex to verify, we leverage knowledge distillation [25], a model compression method, to train simplified “student models” (LDC) based on the information from the sophisticated “teacher model” (HDC). This training produces an LDC that is lightweight and amenable to closed-loop verification because it operates on dynamical states, not images. Moreover, because a small Lipschitz constant is important for minimizing the overapproximation error [28, 29], our methodology adopts two-objective gradient descent [21], decreasing both the approximation error and the Lipschitz constant.

After training the LDCs, we calculate the statistical upper bound of the discrepancy between the two controllers, since obtaining the true discrepancy is impractical. To this end, we rely on conformal prediction [22, 45, 48], one of the cutting-edge statistical methods to provide a lower bound of the confidence interval for prediction residuals without distributional assumptions or explicit dependency on the sample count. We propose two conformal techniques to quantify the difference between HDC- and LDC-controlled systems, by bounding: (i) the distance between their trajectories, and (ii) the difference between the actions produced by the HDC and LDC. We inflate reachable sets of the LDC system based on both bounds to obtain safety guarantees on the HDC system.

We evaluate our approach on three popular case studies in OpenAI Gym [7]: inverted pendulum, mountain car, and cartpole. Our contributions are three-fold:

  1. Two verification approaches for high-dimensional controllers that combine reachability analysis and statistical inference to provide a safety guarantee for systems controlled by neural networks with thousands of inputs.

  2. A novel neural-network approximation technique for training multiple LDCs that collectively mimic an HDC and reduce overapproximation error.

  3. An implementation and evaluation of our verification approaches on three case studies: inverted pendulum, mountain car, and cartpole.

Section 2 provides the background and our problem. Section 3 describes our verification approach, which is evaluated in Sect. 4. Finally, we review the related work in Sect. 5 and conclude in Sect. 6. More details are in the extended online version [67].

2 Background and Problem Setting

High- and Low-Dimensional Systems. The original high-dimensional closed-loop system is a tuple \(M_{hd} = (S, Z, U, s_0, f, c_{hd}, g)\). Here, S is the state space, Z is the high-dimensional sensor space of so-called “images” (e.g., camera images or LIDAR scans), U is the control action space, \(s_0\) is the initial state, \(f:S \times U \rightarrow S\) is the dynamics, and \(c_{hd}: Z \times S \rightarrow U\) is the HDC. Note that \(c_{hd}\) only uses a subset of state dimensions as input (e.g., a convolutional neural network with image and velocity inputs, but not position), getting the rest of the information from the image.

For mathematical convenience, we also define an (unknown) deterministic state-to-image generator \(g: S \rightarrow Z\); the role and assumptions of generator g are stated below. As a verifiable approximation of \(M_{hd}\), our low-dimensional closed-loop system is defined as \(M_{ld} = (S, U, s_0, f, c_{ld})\). Both \(M_{hd}\) and \(M_{ld}\) have the same state space and action space. The only difference is that \(M_{ld}\) has a low-dimensional controller \(c_{ld}: S \rightarrow U\), which operates on the exact states.

System Execution. The execution of \(M_{hd}\) starts from the initial state \(s_0\). Next, an image z is generated from that state by the image generator g. The image is then fed into \(c_{hd}\) to obtain a corresponding control action \(u = c_{hd} (z)\), which is used to update the state via dynamics f. For \(M_{ld}\), the execution proceeds similarly, except that the current state s directly results in a control action \(u = c_{ld} (s)\). We denote the state at time t starting from \(s_0\) executed by \(M_{hd} \) or \(M_{ld} \) as \(\varphi _{hd}(s_0, t)\) and \(\varphi _{ld}(s_0, t)\), respectively. The trajectory of \(M_{hd}\) is defined as a state sequence: \(\tau _{hd}(s_0, T) = [s_0, \varphi _{hd}(s_0, 1), \dots , \varphi _{hd}(s_0, T)],\) and similarly for \(\tau _{ld}\).
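The two execution loops can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than part of the paper's implementation: `toy_dynamics`, `toy_generator`, `toy_hdc`, and `toy_ldc` are invented toy functions, and the "image" is just a short list of numbers.

```python
def rollout(s0, controller, dynamics, horizon):
    """Return the trajectory [s_0, phi(s_0, 1), ..., phi(s_0, T)]."""
    s = tuple(s0)
    trajectory = [s]
    for _ in range(horizon):
        u = controller(s)            # control action for the current state
        s = tuple(dynamics(s, u))    # state update via dynamics f
        trajectory.append(s)
    return trajectory


# Hypothetical toy system (assumption): a discretized double integrator.
def toy_dynamics(s, u):
    x, v = s
    return (x + 0.1 * v, v + 0.1 * u)


# Hypothetical stand-in for the generator g: a fake 8-"pixel" image.
def toy_generator(s):
    return [s[0]] * 4 + [s[1]] * 4


# Hypothetical HDC that reads the image, and an LDC that reads the state.
def toy_hdc(z):
    return -z[0] - z[4]


def toy_ldc(s):
    return -s[0] - s[1]


# M_hd: state -> image (via g) -> action; M_ld: state -> action.
tau_hd = rollout((1.0, 0.0), lambda s: toy_hdc(toy_generator(s)), toy_dynamics, 5)
tau_ld = rollout((1.0, 0.0), toy_ldc, toy_dynamics, 5)
```

In this toy setup the LDC reproduces the HDC exactly, so the two trajectories coincide; in practice they differ, which is what the discrepancy bounds in Step 3 are meant to capture.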

Based on this background, we define reachable sets and tubes:

Definition 1

(Reachable set). Given an initial set \(S_0\) and an integer time t, a reachable set \(\textsf{rs}_M(S_0, t)\) for (either) system M contains all the states that can be reached from \(S_0\) in t steps: \(\textsf{rs}_M(S_0, t) = \{\varphi _M(s_0, t) \mid s_0 \in S_0\}\).

Definition 2

(Reachable tube). Given an initial set \(S_0\) and time horizon T, a reachable tube \(\textsf{rt}_M (S_0, T)\) for (either) system M is a sequence of all the reachable sets from \(S_0\) until time T: \(\textsf{rt}_M (S_0, T) = [S_0, \textsf{rs}_M(S_0,1),..., \textsf{rs}_M(S_0, T)]\).

Assumptions on Image-State Mapping g. Our key challenge is establishing a mapping between the high-dimensional image space Z and the low-dimensional state space S. Our verification methodology is based on the existence of a deterministic image generator g that is part of \(M_{hd}\). This generator is the true and unknown mechanism that creates images from states (e.g., a camera system). We do not assume or use an analyzable closed-form description of g. We also do not assume or verify any perception model (which obtains states from images).

We only use g in the training process for a limited state-image dataset, analogously to a “lab study” of an instrumented system \(M_{hd}\) (e.g., with positioning sensors or human annotators) to label each image z with a corresponding low-dimensional state s. Further, to check our robustness to this assumption, we will perform a sensitivity analysis by adding zero-mean Gaussian noise to the state-image mapping. The results of this evaluation will be discussed in Sect. 4.

Verification Problem. Our problem is to guarantee that the high-dimensional system \(M_{hd}\) reaches the goal set G from an initial set \(S_0\) within time T. To this end, we aim to compute reachable sets of the high-dimensional system \(M_{hd}\) and intersect them with the goal set to obtain the verification verdict. Set G is specified in low dimensions (i.e., using physical variables); however, the \(M_{hd}\) behavior is determined by the images from generator g and the HDC’s response to them.

Thus, given an initial set \(S_0\), goal set G, system \(M_{hd}\), and time horizon T, our goal is to verify this assertion:

$$\begin{aligned} {\begin{matrix} \forall s_0 \in S_0 ~\cdot ~ &\textsf{rs}_{M_{hd}}(S_0, T) \subseteq G \end{matrix}} \end{aligned}$$
(1)

This problem can be divided into two parts: (a) approximating \(M_{hd}\) with low-dimensional systems \(M_{ld} ^1, \dots , M_{ld} ^n\) and verifying them; (b) combining these reachability results based on the approximation error bounds into a reachability verdict to solve the above \(M_{hd}\) problem with statistical confidence.

3 Verification of High-Dimensional Systems

Considering the challenges of complex structure and dynamics of high-dimensional systems, and the difficulties of defining safety in high dimensions, our end-to-end approach is structured in five steps: (1) train low-dimensional controller(s), (2) perform reachability analysis on them, (3) compute statistical discrepancy bounds between high- and low-dimensional controllers, (4) inflate the reachable tubes from low-dimensional verification with these bounds, and (5) combine the verification results and repeat the process as needed on different states/LDCs.

Step 1: Training Low-Dimensional Controllers

Given the aforementioned challenges of directly verifying \(M_{hd}\), we plan to first verify the behavior in the low dimensions according to \(M_{ld}\). Hence, we train a \(c_{ld}\) to imitate the performance of \(c_{hd}\) starting from a given state region, which serves as an input to Step 1 (our first iteration uses the full initial state region \(S_0\) to train one \(c_{ld}\)). As a start, we collect the training data for \(c_{ld}\): given the \(c_{hd}\), access to image generator g, and the initial state space region \(S_0\), we construct a supervised training dataset \(\mathcal {D}_{tr} = \big \{\big (\tau _{hd}(s_i,T), (u_1, ..., u_T)_i\big )\big \}_{i=1}^m\) by sampling the initial states \(s_i \sim D_0\) from some given distribution \(D_0\) (in practice, \(D_0 = {\text {Uniform}}(S_0)\)).

Training a verifiable LDC has two conflicting objectives. On the one hand, we want to approximate the given \(c_{hd}\) with minimal Mean Squared Error (MSE) on \(\mathcal {D}_{tr}\). On the other hand, neural networks with smaller Lipschitz constants are more predictable and verifiable [15, 23, 50].

We balance the ability of the \(c_{ld}\) to mimic the \(c_{hd}\) and the verifiability of \(c_{ld}\) by using a recent verification-aware knowledge distillation technique [21]. Originally, this method was developed to compress low-dimensional neural networks for better verifiability—and we extend it to approximate an HDC with LDCs using the supervised dataset \(\mathcal {D}_{tr}\). Specifically, we implement knowledge distillation with two-objective gradient descent, which aims to optimize the MSE loss function \(L_{mse}\) and Lipschitz constant loss function \(L_{lip}\). First, it computes the directions of two gradients with respect to the \(c_{ld}\) parameters \(\theta \):

$$\begin{aligned} d_{L_{mse}} = \frac{\partial L_{mse}}{\partial \theta }, \quad d_{L_{lip}} = \frac{\partial L_{lip}}{\partial \theta } \end{aligned}$$
(2)

The two-objective descent operates case-by-case to optimize at least one objective as long as possible. If \(d_{L_{mse}} \cdot d_{L_{lip}} > 0 \), the objectives can be optimized simultaneously by following the direction of the angular bisector of the two gradients. If \(d_{L_{mse}} \cdot d_{L_{lip}} < 0\), then it is impossible to improve both objectives. Then, weights are updated along the vector of \(d_{L_{mse}}\) (the higher priority) projected onto the hyperplane perpendicular to \(d_{L_{lip}}\). The thresholds for MSE and Lipschitz constants in our system \(M_{ld}\) are denoted as \(\epsilon \) and \(\lambda \) respectively. The stopping condition is met when both loss functions are below their thresholds or the training time exceeds the limit. Later on, Step 1 will be referred to with function TrainLDC, and our way of tuning \(\epsilon \) and \(\lambda \) will be described later in Step 5.
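The case analysis above can be sketched as follows. This is a minimal illustration and not the implementation of [21]: the function name `two_objective_direction` is hypothetical, the bisector is computed as the sum of the two unit gradients, and the handling of a zero dot product or a vanishing gradient is an assumption, since the text does not specify these cases.

```python
import numpy as np


def two_objective_direction(d_mse, d_lip, eps=1e-12):
    """Pick an update direction from the two gradients (sketch)."""
    d_mse = np.asarray(d_mse, dtype=float)
    d_lip = np.asarray(d_lip, dtype=float)
    n_mse = np.linalg.norm(d_mse)
    n_lip = np.linalg.norm(d_lip)
    # Assumption: if one gradient vanishes, follow the other one.
    if n_lip < eps:
        return d_mse
    if n_mse < eps:
        return d_lip
    dot = float(np.dot(d_mse, d_lip))
    if dot > 0:
        # Compatible objectives: angular bisector = sum of unit vectors.
        return d_mse / n_mse + d_lip / n_lip
    # Conflicting objectives (dot < 0; dot == 0 handled here by assumption):
    # project d_mse onto the hyperplane perpendicular to d_lip.
    return d_mse - (dot / n_lip**2) * d_lip


# One descent step: theta <- theta - lr * direction.
theta = np.array([0.3, -0.2])
direction = two_objective_direction([1.0, 0.0], [-1.0, 1.0])
theta_next = theta - 0.01 * direction
```

In the conflicting case, the returned direction is orthogonal to \(d_{L_{lip}}\), so to first order a step along it changes only the MSE loss.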

Step 2: Reachability Analysis In Low Dimensions

After training LDCs \(\{c_{ld} ^1,...,c_{ld} ^m \}\), we construct overapproximate reachable tubes for each. We perform reachability analysis for systems \(M_{ld} ^1, \dots , M_{ld} ^m\) with the respective controllers and the initial set \(S_0\) specified in the original verification problem. This will result in a set of reachable tubes \(\textsf{rt}_{M_{ld} ^1}(S_0, T), \dots , \textsf{rt}_{M_{ld} ^m}(S_0, T)\).

To implement reachability analysis, we use the POLAR toolbox (https://github.com/ChaoHuang2018/POLAR_Tool), version of December 2022 [27, 62], which computes univariate Bernstein polynomials to overapproximate activation functions in \(c_{ld}\), and then tightly and selectively overapproximates \(c_{ld}\) with Taylor/Bernstein polynomials. For dynamics reachability, alternating with neural-network overapproximation, POLAR relies on the mature Flow* tool with Taylor model approximations [9]. The latest experimental results [62] show that POLAR outperforms other neural-network verification tools in both computational efficiency and tightness. The verification details are formalized in Algorithm 2 in Step 5.

Step 3a: Defining Discrepancy Bounds

The LDC reachable tubes from Step 2 cannot be used directly to obtain HDC guarantees because of the discrepancy between LDC and HDC behaviors, which inevitably arises when compressing a higher-parameter neural network [24]. Therefore, we will quantify the difference between LDCs and HDCs using discrepancy functions, inspired by the prior work on testing hybrid systems [19, 20, 44]. We introduce and investigate two types of discrepancy functions in our setting:

1. Trajectory-based discrepancy \(\beta \) considers the difference between the HDC and LDC trajectories starting from a matched state-image pair (s, z), i.e., \(z = g(s)\). It is defined as the least upper bound on the maximum L1 distance between two trajectories, i.e., \(\Vert \tau _{hd} (s_0,T) - \tau _{ld} (s_0,T) \Vert _1\), over time T for all initial states \(s_0\) within the initial set \(S_0\). Therefore, each initial set \(S_0\) gives rise to its trajectory-based discrepancy \(\beta (S_0)\).

2. Action-based discrepancy \(\gamma \) considers the difference between LDC and HDC actions on a matched state-image pair (s, z), i.e., \(z = g(s)\). Similarly to the above, it is defined as the least upper bound on the difference between control actions over time horizon T starting from any initial state \(s_0\) within the initial set \(S_0\). Note that the control difference, \(\Vert c_{hd} (g(s_{hd}^t)) - c_{ld} (s_{ld}^t)\Vert _1\), is considered at each time step, where \(s_{hd}^t\) and \(s_{ld}^t\) are the states at time t in the HDC and LDC trajectories, respectively.

Step 3b: Computing Statistical Discrepancy Bounds

Unfortunately, obtaining the true discrepancies is impractical: it would require solving optimization/feasibility problems in high-dimensional image spaces. Instead, we calculate the statistical upper bounds for these discrepancies via conformal prediction, which is a distribution-free statistical technique to provide probabilistically valid uncertainty regions for complex prediction models—without strong assumptions about these models or their error distributions [55].

Below we briefly summarize basic conformal prediction. Consider \(k+1\) independent and identically distributed random variables \(\varDelta , \varDelta ^1,..., \varDelta ^k\), also known as non-conformity scores. Conformal prediction computes an uncertainty region for \(\varDelta \) via a function \(\bar{\varDelta }: \mathbb {R}^k \rightarrow \mathbb {R}\) from the other k values. Given a failure probability \(\alpha \in (0, 1)\), conformal prediction provides an uncertainty bound \(\bar{\varDelta }\) such that \({\text {Pr}}(\varDelta \le \bar{\varDelta }) \ge 1 - \alpha \). This is performed with a surprisingly simple quantile argument, where the uncertainty bound \(\bar{\varDelta }\) is calculated as the \((1-\alpha )\)-th quantile of the empirical distribution over the values of \(\varDelta ^1, \varDelta ^2,..., \varDelta ^k,\) and \(\infty \). The guarantee is formalized in the lemma below, and for details see a popular tutorial [48].

Lemma 1

(Lemma 1 in [22]) Let \(\varDelta , \varDelta ^1, \varDelta ^2,..., \varDelta ^k\) be k+1 independent identically distributed real-valued random variables. Without loss of generality, let \(\varDelta ^1, \varDelta ^2,..., \varDelta ^k\) be sorted in non-decreasing order and define \(\varDelta ^{k+1} {:}{=}\infty \). For \(\alpha \in (0, 1)\), it holds that \({\text {Pr}}(\varDelta \le \bar{\varDelta }) \ge 1 - \alpha \) where \(\bar{\varDelta } {:}{=}\varDelta ^{(r)}\), which is the r-th ranked variable with \(r = \lceil (k+1)(1-\alpha ) \rceil \), and \(\lceil \cdot \rceil \) is the ceiling function.
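As a small illustration of the rank rule in Lemma 1, the bound can be computed as below. The function name `conformal_bound` is hypothetical; the sketch only assumes that the k scores are real numbers and returns infinity when the rank r falls on the appended \(\varDelta ^{k+1} = \infty \).

```python
import math


def conformal_bound(scores, alpha):
    """Return the r-th smallest score with r = ceil((k + 1) * (1 - alpha)).

    If r exceeds k, the bound is the appended value Delta^{k+1} = infinity.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    k = len(scores)
    r = math.ceil((k + 1) * (1 - alpha))
    if r > k:
        return math.inf
    return sorted(scores)[r - 1]   # ranks are 1-indexed
```

For example, with three scores and \(\alpha = 0.5\), the rank is \(r = \lceil 4 \cdot 0.5 \rceil = 2\), so the bound is the second-smallest score. With too few samples for the requested confidence (e.g., k = 3 and \(\alpha = 0.1\)), the bound is infinite and therefore uninformative.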

Leveraging conformal prediction, we define the statistical versions of our discrepancy functions. For the trajectory-based one, we define the non-conformity as the maximum L1 distance between states at the same time in two matched trajectories \(\tau _{ld} (s_0, T)\) and \(\tau _{hd} (s_0, T)\) starting from a random state \(s_0 \sim D_0\) sampled independently and identically distributed (i.i.d.) from a given distribution \(D_0\) over the initial region \(S_0\), similar to recent works [11, 44]. This leads to a trajectory dataset \(\mathcal {D}_{tb}\), from which k non-conformity scores are calculated.

Definition 3

(Statistical trajectory-based discrepancy). Given distribution \(D_0\) over \(S_0\), confidence \(\alpha \in (0, 1)\), and state functions \(\varphi _{hd}(s, t)\) and \(\varphi _{ld}(s, t)\) for systems \(M_{hd} \) and \(M_{ld} \), a statistical trajectory-based discrepancy \(\bar{\beta }(D_0)\) is an \(\alpha \)-confident upper bound on the max trajectory distance starting from \(s_0\sim D_0\):

$$\begin{aligned}&{\text {Pr}}_{s_0 \sim D_0}\Big [ \max _{t=0..T} \Vert \varphi _{hd}(s_0, t) -\varphi _{ld}(s_0, t) \Vert _1 \le \bar{\beta }(D_0)\Big ] \ge 1 - \alpha \end{aligned}$$

To obtain this bound \(\bar{\beta }(D_0)\), we leverage conformal prediction as follows. Dataset \(\mathcal {D}_{tb}\) contains i.i.d. samples \(s_1, s_2, ..., s_k\) from our chosen distribution \(D_0\). In practice, we choose the uniform distribution, namely \(s \sim {\text {Uniform}}(S_0)\), because we value the safety of each state equally. We compute the corresponding non-conformity scores \(\delta ^1, \delta ^2,..., \delta ^k, \delta ^{k+1}\) as the maximum L1 distances between the same-time states in the two trajectories over all times \(t \in [0..T]\):

$$\begin{aligned} \delta ^i = \max _{t=0..T}\Vert \varphi _{hd}(s_i, t) -\varphi _{ld}(s_i, t)\Vert _1 \text { for } i = 1\dots k; \text { and } \delta ^{k+1} = \infty \end{aligned}$$

We sort the scores in non-decreasing order and set \(\bar{\beta }(D_0)\) to the r-th ranked score:

$$\begin{aligned} \bar{\beta }(D_0) {:}{=}\delta ^{(r)} \text { with } r = \lceil (k+1)(1-\alpha ) \rceil \end{aligned}$$
(3)
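The computation of the scores \(\delta ^i\) and of Eq. (3) can be sketched as follows, assuming each trajectory is stored as an array of shape (T+1, n) with one row per time step. The function names are hypothetical, and the paired trajectories are taken as given (in practice they would come from executing \(M_{hd}\) and \(M_{ld}\) from the same sampled initial states).

```python
import math

import numpy as np


def trajectory_score(traj_hd, traj_ld):
    """Non-conformity score: max over t of the L1 distance between states."""
    diff = np.abs(np.asarray(traj_hd, dtype=float) - np.asarray(traj_ld, dtype=float))
    return float(diff.sum(axis=1).max())   # L1 norm per time step, then max


def statistical_trajectory_discrepancy(trajectory_pairs, alpha):
    """Rank-based bound of Eq. (3) over k paired trajectories."""
    scores = sorted(trajectory_score(hd, ld) for hd, ld in trajectory_pairs)
    k = len(scores)
    r = math.ceil((k + 1) * (1 - alpha))
    return scores[r - 1] if r <= k else math.inf


# Three hypothetical trajectory pairs with scores 1.0, 0.5, and 2.0.
pairs = [
    ([[0, 0], [1, 1]], [[0, 0], [1.5, 0.5]]),
    ([[0, 0], [1, 1]], [[0.25, 0.25], [1, 1]]),
    ([[0, 0], [1, 1]], [[0, 0], [2, 2]]),
]
beta_bar = statistical_trajectory_discrepancy(pairs, alpha=0.5)
```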

We follow a similar procedure for the statistical action-based discrepancy, except that now the non-conformity scores are defined as the maximum differences between actions at the same time in two paired trajectories.

Definition 4

(Statistical action-based discrepancy). Given confidence \(\alpha \in (0, 1)\), distribution \(D_0\) over \(S_0\), and systems \(M_{ld}\) and \(M_{hd}\), a statistical action-based discrepancy \(\bar{\gamma }(D_0)\) is an \(\alpha \)-confident upper bound on maximum action discrepancy in two trajectories starting from \(s_0\sim D_0\):

$$\begin{aligned} {\text {Pr}}_{s_0 \sim D_0}\Big [\max _{t=0..T} \Vert c_{hd} \big (g(\varphi _{hd}(s_0, t))\big ) - c_{ld} \big (\varphi _{ld}(s_0, t)\big )\Vert _1 \le \bar{\gamma }(D_0)\Big ] \ge 1 - \alpha \end{aligned}$$

To implement this statistical action-based discrepancy function, we sample initial states \(s_1, s_2, ..., s_k\) from a given set \(S_0\) following the distribution \(D_0\) (in practice, uniform) and obtain the corresponding low-dimensional trajectories. Then we generate with g the corresponding images matched to each state in each trajectory—and these pairs form our action-based dataset \(\mathcal {D}_{ab}\). The corresponding nonconformity scores \(\delta ^1, \delta ^2,..., \delta ^k, \delta ^{k+1}\) are maximum action differences:

$$\begin{aligned} \delta ^i = \max _{t=0..T} \Vert c_{hd} (g(\varphi _{hd}(s_i, t))) - c_{ld} (\varphi _{ld}(s_i, t))\Vert _1 \text { for } i=1\dots k; \text { and } \delta ^{k+1} = \infty . \end{aligned}$$

Then we sort these non-conformity scores in the non-decreasing order and determine the statistical bound for the action-based discrepancy as:

$$\begin{aligned} \bar{\gamma }(D_0) {:}{=}\delta ^{(r)} \text { with }r = \lceil (k+1)(1-\alpha ) \rceil \end{aligned}$$
(4)

Step 4: Inflating Reachability With Discrepancies

This step combines low-dimensional reachable tubes (Step 2) with statistical discrepancies (Step 3b) to provide a safety guarantee on the high-dimensional system. Thus, we inflate the original LDC reach tubes with either trajectory or action discrepancy to contain the (unknown) true HDC tube with chance \(1-\alpha \).

Trajectory-Based Inflation. The trajectory-based approach inflates the LDC reachable set starting in region \(S_0\) with the statistical trajectory-based discrepancy \(\bar{\beta }(D_0)\). Since the final reachable tube for a given initial set of \(c_{ld}\) is represented as a sequence of discrete state polytopes calculated by concretizing the Taylor model with interval arithmetic on the initial set [27], we inflate these polytopes by adding \(\bar{\beta }(D_0)\) to their boundaries.

Definition 5

(Trajectory-inflated reachable set). Given a distribution \(D_0\) over initial set \(S_0\) that is controlled by LDC \(c_{ld}\), reachable set \(\textsf{rs}(S_0, t)\), and its trajectory discrepancy \(\bar{\beta }( D_0 )\), a trajectory-inflated reachable set is defined as:

$$\begin{aligned} \textsf{irs}(S_0, t, \bar{\beta }(D_0)) = \big \{s \in S ~|~ \exists s' \in \textsf{rs}(S_0, t) \cdot \Vert s-s'\Vert _1 \le \bar{\beta }(D_0)\big \} \end{aligned}$$

Definition 6

(Trajectory-inflated reachable tube). Given a distribution \(D_0\) over initial set \(S_0\) that is controlled by LDC \(c_{ld}\), a reachable tube \(\textsf{rt}(S_0, T) = \big [S_0, \textsf{rs}(S_0, 1), \dots , \textsf{rs}(S_0, T)\big ]\) over time horizon T, and its trajectory discrepancy \(\bar{\beta }(D_0)\) over the initial set \(S_0\), a trajectory-inflated reachable tube \(\textsf{irt}(S_0, \bar{\beta }(D_0)) \) is defined as:

$$ \textsf{irt}(S_0, \bar{\beta }(D_0)) = \left[ \textsf{irs}(S_0, 0, \bar{\beta }(D_0)), \textsf{irs}(S_0, 1, \bar{\beta }(D_0)) , \dots , \textsf{irs}(S_0, T, \bar{\beta }(D_0))\right] . $$
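Definitions 5 and 6 can be illustrated for the special case where each reachable set is an axis-aligned box, i.e., one interval per state dimension (an assumption made for this sketch; the names `inflate_box` and `inflate_tube` are hypothetical). Widening every interval by \(\bar{\beta }\) gives a box that contains the L1-inflated set of Definition 5, because each coordinate difference is at most the L1 distance. The result is therefore an overapproximation rather than the exact inflated set.

```python
from typing import List, Tuple

Box = List[Tuple[float, float]]   # one (lower, upper) interval per dimension


def inflate_box(box: Box, beta: float) -> Box:
    """Widen every interval by beta on both sides."""
    if beta < 0:
        raise ValueError("the discrepancy bound must be non-negative")
    return [(lo - beta, hi + beta) for lo, hi in box]


def inflate_tube(tube: List[Box], beta: float) -> List[Box]:
    """Inflate every reachable set of a tube with the same bound."""
    return [inflate_box(box, beta) for box in tube]


# A hypothetical two-step tube of 2D boxes, inflated with beta = 0.5.
tube = [[(0.0, 1.0), (-1.0, 2.0)], [(0.5, 1.5), (0.0, 1.0)]]
inflated = inflate_tube(tube, 0.5)
```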

Based on Definitions 5 and 6, we establish Theorem 1 that the trajectory-inflated LDC reachable tube contains the HDC reachable tube with at least \(1-\alpha \) probability.

Theorem 1

(Confident trajectory-based overapproximation). Consider distribution \(D_0\) over initial set \(S_0\), confidence \(\alpha \), a high-dimensional system \(M_{hd}\), approximated with a low-dimensional system controlled by \(c_{ld} \) with an \(\alpha \)-confident statistical trajectory-based discrepancy function \(\bar{\beta }(D_0)\). Then the trajectory-inflated low-dimensional tube \(\textsf{irt}_{M_{ld}}(S_0, \bar{\beta }(D_0))\) contains the high-dimensional reachable tube \(\textsf{rt}_{M_{hd}}(S_0)\) with probability at least \(1-\alpha \):

$$ {\text {Pr}}_{D_0}\Big [\textsf{rt}_{M_{hd}}(S_0) \subseteq \textsf{irt}_{M_{ld}}(S_0, \bar{\beta }(D_0)) \Big ] \ge 1-\alpha $$

Proof

All the proofs are found in the extended online version [67].

Definitions 5 and 6 and Theorem 1 describe inflation and guarantees with a single LDC. However, one LDC usually cannot mimic the behavior of the HDC accurately. Therefore, we train several LDCs \(\{c_{ld} ^1,c_{ld} ^2, \dots , c_{ld} ^{m} \}\), one for each subregion of initial set \(\{S_1, S_2,\dots ,S_{m}\}\) with respective distributions \(D_0 = \{D_1, D_2,\dots ,D_{m}\}\). Subsequently, the trajectory-inflated tube with multiple LDCs can be represented as the union of all the single trajectory-inflated tubes: \(\textsf{irt}(S_0, \bar{\beta }(D_0)) := \bigcup _{i = 1}^{m}\textsf{irt}(S_i, \bar{\beta }(D_i))\).

Action-Based Inflation. Action-based inflation is less direct than trajectory-based inflation: we inflate the neural network’s output set that is represented by a Taylor model TM\((p(S_0), I)\) [27], where \(p(S_0)\) is a polynomial representing order-k Taylor series expansion of the \(c_{ld}\) activation functions in region \(S_0\), and the remainder interval I ensures that the Taylor model overapproximates the neural network’s output. In this context, we widen the bounds of the remainder interval I in the last layer of the \(c_{ld}\) by our statistical action-based discrepancy \(\bar{\gamma }(D_0)\), ensuring that the potential outputs of \(c_{hd}\) are contained in the resulting Taylor model.

Definition 7

(Action-inflated reachable set). Given distribution \(D_0\) over set \(S_0\) that is controlled by LDC \(c_{ld} \), statistical action-based discrepancy \( \bar{\gamma }(D_0)\), and low-dimensional control bounds \([u_{min}(t), u_{max}(t)] \supseteq c_{ld} \big (S_0\big )\), the action-inflated reachable set contains states reachable by inflating the action bounds:

$$\begin{aligned} \textsf{irs}(S_0, \bar{\gamma }(D_0)) &= \big \{ f(s, u) \mid s \in S_0, u \in \big [u_{min}(t) - \bar{\gamma }(D_0), u_{max}(t) + \bar{\gamma }(D_0)\big ] \big \} \end{aligned}$$

Definition 8

(Action-inflated reachable tube). Given a distribution \(D_0\) over initial set \(S_0\) that is controlled by LDC \(c_{ld}\), dynamics f, time horizon T, and action-based discrepancy \( \bar{\gamma }(D_0)\), the action-inflated reachable tube is a recursive sequence of inflated action-based reachable sets:

$$ \textsf{irt}(S_0, \bar{\gamma }(D_0)) = \big [ S_0, \textsf{irs}_1(S_0, \bar{\gamma }(D_0)), \textsf{irs}_2(\textsf{irs}_1, \bar{\gamma }(D_0)) , \dots , \textsf{irs}_T(\textsf{irs}_{T-1}, \bar{\gamma }(D_0)) \big ]. $$
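The recursion in Definition 8 can be sketched with plain interval arithmetic on a toy system. All specifics here are assumptions for illustration and differ from the Taylor-model machinery used in the paper: the dynamics are a hypothetical discretized double integrator, the LDC is a hypothetical linear controller whose output bounds are easy to compute over a box, and sets are axis-aligned boxes.

```python
def linear_ldc_bounds(box, gains):
    """Interval bounds of u = sum_j gains[j] * s_j over a box of states."""
    u_lo = sum(g * (lo if g >= 0 else hi) for g, (lo, hi) in zip(gains, box))
    u_hi = sum(g * (hi if g >= 0 else lo) for g, (lo, hi) in zip(gains, box))
    return u_lo, u_hi


def action_inflated_step(box, gains, gamma, dt):
    """One inflated reachable set: widen the action bounds, then apply f."""
    (x_lo, x_hi), (v_lo, v_hi) = box
    u_lo, u_hi = linear_ldc_bounds(box, gains)
    u_lo, u_hi = u_lo - gamma, u_hi + gamma      # action-based inflation
    # Toy dynamics (assumption): x' = x + dt * v, v' = v + dt * u, dt > 0.
    return [(x_lo + dt * v_lo, x_hi + dt * v_hi),
            (v_lo + dt * u_lo, v_hi + dt * u_hi)]


def action_inflated_tube(box0, gains, gamma, horizon, dt=0.1):
    """Recursive sequence [S_0, irs_1(S_0), irs_2(irs_1), ...]."""
    tube = [box0]
    for _ in range(horizon):
        tube.append(action_inflated_step(tube[-1], gains, gamma, dt))
    return tube


tube = action_inflated_tube([(0.0, 1.0), (0.0, 0.0)], gains=(-1.0, 0.0),
                            gamma=0.5, horizon=3, dt=0.5)
```

Because each step starts from the previous inflated set, the widening compounds over time, unlike trajectory-based inflation, which adds the same margin to every reachable set.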

Based on Definitions 7 and 8, we put forward Theorem 2 below for the lower probability bound of the action-inflated LDC tube containing the true HDC tube.

Theorem 2

(Confident action-based overapproximation). Consider distribution \(D_0\) over initial set \(S_0\), high-dimensional system \(M_{hd}\) with controller \(c_{hd}\), approximated by low-dimensional system \(M_{ld} \) controlled by \(c_{ld} \) with \(\alpha \)-confident statistical action-based discrepancies \(\bar{\gamma }(S_0)\). Then the action-inflated low-dimensional tube \(\textsf{irt}_{M_{ld}}(S_0, \bar{\gamma }(S_0))\) contains the high-dimensional tube \(\textsf{rt}_{M_{hd}}(S_0)\) with probability at least \(1-\alpha \):

$$ {\text {Pr}}_{D_0} \Big [\textsf{rt}_{M_{hd}}(S_0) \subseteq \textsf{irt}_{M_{ld}}(S_0, \bar{\gamma }(S_0)) \Big ] \ge 1-\alpha $$

Definitions 7 and 8 describe inflation with a single LDC, which we extend to multiple LDCs by taking the union of all the LDCs’ inflated tubes. Given a partitioned initial set \(S_0 = \{S_1, ..., S_m \}\) with respective controllers \(\{c_{ld} ^1, \dots , c_{ld} ^{m} \}\) and distributions \(D_0 = \{D_1, ...,D_m \}\), the multiple LDCs action-inflated reachable tube is \(\textsf{irt}(S_0, \bar{\gamma }(D_0)) := \bigcup _{i = 1}^{m}\textsf{irt}(S_i, \bar{\gamma }(D_i))\). As it turns out, this reachable tube also contains the HDC tube with at least \(1-\alpha \) chance.

Step 5: Iterative Retraining and Re-gridding

Algorithm 1. Iterative LDC training for the action-based approach

Once the inflated reachable tubes are obtained in Step 4, we focus on the regions of the initial set where HDC simulations succeed—yet safety verification fails. This can happen for two reasons: (i) overly high overapproximation error in the LDC reachability, or (ii) overly high conformal discrepancy bounds from \(\bar{\beta }\) or \(\bar{\gamma }\).

Reducing Reachability Overapproximation Error. We lower the threshold for the Lipschitz constant \(\lambda \) to retrain the respective LDCs in Step 1. In our experience, this almost always reduces the overapproximation in the LDC analysis and makes low-dimensional reachable tubes tighter—but may result in higher statistical discrepancy bounds, which we address below.

Reducing Conformal Discrepancy Bounds. When these bounds are loose, our LDC imitates the HDC poorly in some state-space region. Here, we take inspiration from refinement techniques in testing [45, 66]. When a desired discrepancy bound \(\xi \) is exceeded in a state-space region, we split it into subregions by taking its midpoints in each dimension, leading to an updated state-space grid \(\textbf{S}'\). Then in each sub-region, we retrain an LDC as per Step 1 with a reduced MSE threshold \(\epsilon \) and re-compute its bounds as per Step 3b, leading to tighter statistical overapproximations of HDC reachable tubes.
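The midpoint splitting can be sketched as below, assuming regions are axis-aligned boxes. The names `split_region` and `refine_grid` are hypothetical, and the retraining and bound re-computation that follow each split are omitted.

```python
from itertools import product


def split_region(box):
    """Split a box at the midpoint of every dimension (2^n sub-boxes)."""
    halves = []
    for lo, hi in box:
        mid = (lo + hi) / 2
        halves.append(((lo, mid), (mid, hi)))
    return [list(choice) for choice in product(*halves)]


def refine_grid(grid, bounds, xi):
    """Split only the regions whose discrepancy bound exceeds xi."""
    new_grid = []
    for box, bound in zip(grid, bounds):
        new_grid.extend(split_region(box) if bound > xi else [box])
    return new_grid


# Two hypothetical 2D regions; only the second exceeds the threshold.
grid = [[(0.0, 1.0), (0.0, 2.0)], [(1.0, 2.0), (0.0, 2.0)]]
refined = refine_grid(grid, bounds=[0.1, 0.9], xi=0.5)
```

The number of subregions doubles with every state dimension, so a split of a four-dimensional region such as one for the cart pole produces 16 subregions, each needing its own LDC.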

To summarize, Algorithm 1 shows our iterative training procedure for the action-based approach (its trajectory-based counterpart proceeds analogously, except for computing the discrepancies over trajectories).

Algorithm 2. End-to-end reachability verification of an HDC

Combining all the five steps together, we present Algorithm 2 that displays our end-to-end verification of a given HDC with either trajectory-based or action-based discrepancies. The LDCs and their discrepancies are input into the reachability analysis, implemented with the function Reach, to calculate the inflated reachable tubes (using the POLAR toolbox in practice). Note that the verification regions of \(\mathbf {S_{ver}}\) in Algorithm 2 are much smaller partitions of larger gridding regions \(\textbf{S}\) defined in Algorithm 1 for training. Each gridding region, which for instance is a 0.5 \(\times \) 0.5 square, corresponds to one LDC. Inside each gridding region, the verification region \(\mathbf {S_{ver}}\) is a 0.01\(\times \)0.01 square. Our end-to-end algorithm guarantees that an affirmative answer to our verification problem is correct with at least \(1-\alpha \) probability, as per Theorem 3.

Theorem 3

(Confident guarantee of HDC safety). Consider a partitioned initial set grid \(S_0 = \{S_1, \dots , S_{m}\}\), a set of corresponding distributions \(\{D_1, \dots , D_m \}\), a high-dimensional system \(M_{hd}\) with controller \(c_{hd}\), and a set of low-dimensional systems \(M_{ld} ^1, \dots , M_{ld} ^m\) with respective controllers \(c_{ld} ^1, \dots , c_{ld} ^m\) that approximate \(c_{hd}\) with either an \(\alpha \)-confident trajectory discrepancy or action discrepancy. Then the probability that the HDC safe set \(S_{safe}\) calculated by Algorithm 2 with either discrepancy is contained in the ground-truth safe set \( S^*_{safe}\) is at least \((1 - \alpha )\):

$${\text {Pr}}_{D_1...D_m}\Big [ S_{safe} \subseteq S^*_{safe} \Big ] \ge (1-\alpha ) $$

4 Experimental Evaluation

Benchmark Systems and Controllers. We evaluate our approach on three benchmarks from OpenAI Gym [7]: two two-dimensional case studies, namely an inverted pendulum (IP) with angle \(\theta \) and angular velocity \(\dot{\theta }\) and a mountain car (MC) with position x and velocity v, and one four-dimensional case study, a cart pole (CP) with cart position x, cart velocity v, angle \(\theta \), and angular velocity \(\dot{\theta }\). Our selection of case studies is limited because of the engineering challenge of setting up both vision-based control and low-dimensional verification for the same system. Our continuous-action, convolutional HDCs \(c_{hd}\) for these systems were trained with deep deterministic policy gradient (DDPG) [36]. To imitate the performance of \(c_{hd}\), we train simpler feedforward neural networks \(c_{ld}\) with only low-dimensional state inputs. See the Appendix for their architectures and dynamics; our code is available on GitHub (Footnote 1).

Experimental Procedure. Our verification’s goal is to check whether the system will stay inside the specified goal set G after T time steps (e.g., the mountain car’s position must stay within the target set \([0.45, \infty ]\) after 60 steps). The verification returns “safe” if the inflated reachable set for \(t=T\) lies entirely in G—and “unsafe” otherwise. The details are found in the Appendix.
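The final verdict can be illustrated with a minimal sketch. It assumes, purely for illustration, that the reachable set at \(t=T\) and the goal set G are axis-aligned boxes and that a single discrepancy bound widens every state dimension equally, which is closest in spirit to the trajectory-based inflation. The function names are hypothetical; the reachable sets in our experiments come from POLAR rather than from plain intervals.

```python
import math


def inflate(box, bound):
    """Widen every interval of an axis-aligned box by `bound` on both sides."""
    return [(lo - bound, hi + bound) for lo, hi in box]


def verdict(reach_box_T, goal_box, bound):
    """Return "safe" iff the inflated reachable box at t = T lies inside the goal box."""
    inflated = inflate(reach_box_T, bound)
    inside = all(
        g_lo <= lo and hi <= g_hi
        for (lo, hi), (g_lo, g_hi) in zip(inflated, goal_box)
    )
    return "safe" if inside else "unsafe"


# Illustrative mountain-car goal: position at least 0.45, velocity unconstrained.
goal = [(0.45, math.inf), (-math.inf, math.inf)]
reach_T = [(0.50, 0.55), (0.0, 0.02)]  # made-up reachable box, not a real result
print(verdict(reach_T, goal, 0.03))  # inflated position interval stays above 0.45
print(verdict(reach_T, goal, 0.10))  # inflated position interval dips below 0.45
```

The same reachable box flips from "safe" to "unsafe" as the discrepancy bound grows, which is why loose bounds translate directly into lost verification coverage.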

For both approaches, we calculate the discrepancies in 0.25-sized state squares within the initial set in IP, hence creating \(8 \times 8 = 64\) regions (MC has \(8 \times 9 = 72\) regions; CP has \(5 \times 5 \times 5 \times 5 = 625\)). In each, we sample 60 trajectories to compute both trajectory-based discrepancies \(\bar{\beta }\) and action-based discrepancies \(\bar{\gamma }\) because it is a relatively small sample count that avoids the highest non-conformity score or the infinity as the conformal bound. We also implement a pure conformal prediction baseline and, for a fair comparison, give it the same data/regions. This results in 3840 sampled trajectories in IP, 4320 in MC, and 76800 for CP.
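To see why the sample count matters, consider the standard split-conformal rule, which takes the \(\lceil (n+1)(1-\alpha ) \rceil \)-th smallest of n non-conformity scores as the bound. The sketch below assumes this standard rule and \(\alpha = 0.05\); the function name `conformal_bound` is hypothetical.

```python
import math
from fractions import Fraction


def conformal_bound(scores, alpha):
    """Split-conformal upper bound: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    # Exact rational arithmetic avoids floating-point surprises in the rank.
    rank = math.ceil((n + 1) * (1 - Fraction(str(alpha))))
    if rank > n:
        return math.inf  # too few samples for a finite bound at this alpha
    return sorted(scores)[rank - 1]
```

Under this rule with \(\alpha = 0.05\), fewer than 19 samples give an infinite bound and 19 samples give the largest score, whereas 60 samples give rank 58, that is, the third-largest score.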

We use closed-loop simulation to obtain the (approximate) ground truth (GT) of safety. For IP and CP, we grid the initial set into squares with an interval of 0.01. For MC, we grid the initial set with the position step 0.01 and velocity step 0.001. Within each grid cell, we uniformly sample 10 initial states and simulate a trajectory from each. If all 10 trajectories end in the goal set G, we mark this cell as “truly safe”, otherwise “truly unsafe”. The truly safe-to-unsafe cell ratio is 0.56 in IP, 0.78 in MC, and 0.58 in CP. The verification process uses the same grid cells as its initial state regions, leading to 40k low-dimensional verification runs for IP, 14k for MC, and 50k for CP. The trajectory-based verification times for IP, MC, and CP are 6.2, 5.8, and 6.4 h respectively; the action-based verification times are 6.3, 6.1, and 6.6 h respectively.
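The ground-truth labeling of a single grid cell can be sketched as below. The callables `simulate` (returning the final state of a closed-loop trajectory from a given initial state) and `in_goal` (testing membership in G) are hypothetical placeholders for the benchmark simulator and goal check, and the fixed seed is only an assumption made for reproducibility.

```python
import random


def label_cell(lows, highs, simulate, in_goal, n_samples=10, rng=None):
    """Label a grid cell "truly safe" iff every sampled trajectory ends in the goal set."""
    rng = rng or random.Random(0)
    for _ in range(n_samples):
        # Uniformly sample an initial state inside the cell.
        s0 = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        if not in_goal(simulate(s0)):
            return "truly unsafe"
    return "truly safe"
```

Because only 10 states are sampled per cell, a "truly safe" label is an approximation: a cell may contain unsampled initial states that violate the goal.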

Success Metrics. We evaluate verification as a binary classifier of the GT safety, with “safe” being the positive class and “unsafe” being the negative. Our evaluation metrics are the (i) true positive rate (TPR, a.k.a. sensitivity and recall), indicating the fraction of truly safe regions that were successfully verified; (ii) true negative rate (TNR, a.k.a. specificity), indicating the fraction of truly unsafe regions that failed verification; (iii) precision, indicating the fraction of safe verification verdicts that are truly safe (which is essential for safety-critical systems and controlled by rate \(\alpha \) as per Theorem 3); and (iv) F1 score, which is a harmonic mean of precision and recall to provide a class-balanced assessment of predictions.
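For concreteness, the four metrics can be computed from per-cell verdicts and GT labels as in the following sketch. The function name and the string labels are illustrative assumptions rather than part of our released code.

```python
def verification_metrics(verdicts, ground_truth):
    """Compute TPR, TNR, precision, and F1 with "safe" as the positive class."""
    pairs = list(zip(verdicts, ground_truth))
    tp = sum(v == "safe" and g == "safe" for v, g in pairs)
    fp = sum(v == "safe" and g == "unsafe" for v, g in pairs)
    tn = sum(v == "unsafe" and g == "unsafe" for v, g in pairs)
    fn = sum(v == "unsafe" and g == "safe" for v, g in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    tpr = ratio(tp, tp + fn)        # fraction of truly safe cells that were verified
    tnr = ratio(tn, tn + fp)        # fraction of truly unsafe cells that failed verification
    precision = ratio(tp, tp + fp)  # fraction of "safe" verdicts that are truly safe
    f1 = ratio(2 * precision * tpr, precision + tpr)
    return {"TPR": tpr, "TNR": tnr, "precision": precision, "F1": f1}
```

A false positive here is a cell declared safe that is truly unsafe, which is the error type that the rate \(\alpha \) is meant to control.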

Table 1. Verification performance (\(M=4\) for IP and CP, \(M=10\) for MC).
Table 2. Verification performance for multiple LDCs with zero-mean Gaussian noise added to true state before image generator g.

Verification Results. The quantitative results of the three case studies are summarized in Table 1. The rate \(\alpha \) is set to 0.05 (a 95% confidence level) for all methods, which sets the minimum precision to 0.95; all the approaches satisfy it. The pure conformal prediction baseline shows high precision and TNR but a lower TPR than our approaches, so it correctly verifies a significantly smaller region of the state space. For well-balanced safety prediction in practice, the F1 score shows that our trajectory-based approach outperforms the other two.

Across all case studies, the baseline is significantly more conservative than the requested 95% precision. While this can be an advantage in safety-critical settings, excessive conservatism can also hamper adoption, so the approach should be sensitive to the desired confidence—which our trajectory-based approach demonstrates in the mountain car case study (see Precision in Table 1).

Across all case studies, the multi-LDC approaches always match or outperform the one-LDC approaches. This result demonstrates the utility of modularizing the HDC approximation problem. Also, our single-LDC action-based approach successfully verifies relatively few regions, leading to its low TPR. The reason is that, unlike in the case of trajectory discrepancies, a single LDC cannot provide tight statistical upper bounds for control actions; the resulting large overapproximation in the inflated reachable sets produces false negatives.

Sensitivity to Noisy Images. When zero-mean Gaussian noise with low variance is added to the true state before generator g, our approaches perform similarly to the noise-free case (Table 2), showing some robustness. However, we saw a significant decline in the verification coverage (TPR, but not the TNR or the \(\alpha \)-guaranteed precision) under substantial noise variance (up to 0.5, not shown in Table 2).

Limitations. Our approach relies on statistical inference based on i.i.d. sampling from a fixed distribution, which downgrades the exhaustive guarantees of formal verification. However, it may be possible to exhaustively bridge this gap with neural-network conformance analysis based on satisfiability solving [41]. We also envision relaxing the i.i.d. assumption with time-series conformal prediction [3, 58], as well as uncertainty-guided gridding [37] to reduce our discrepancy bounds.

5 Related Work

Low-Dimensional Verification of Closed-Loop Systems. Neural-network controlled systems (NNCS) have been used widely [42, 46, 52], which has highlighted the challenges of verifying their correctness within closed-loop systems. Since computing the exact reachable states is generally infeasible, especially in non-linear systems, current approaches primarily focus on constructing tight overapproximations of reachable sets [2, 9, 10]. For sigmoid-based NNCS, the Verisig [30] toolbox can transform the neural-network controlled system into a hybrid system, which can be verified by other tools like Flow*. NNV [54] performs overapproximation analysis by combining star sets [38, 53] for feed-forward neural networks with zonotopes for non-linear plant dynamics in CORA [2]. POLAR [27] overcame the challenges of non-differentiable activation functions by combining the Bernstein-Bézier form [28] and the symbolic remainder. This method achieves state-of-the-art performance in both the tightness of reachable tubes and computation time. Another type of verification, Hamilton-Jacobi (HJ) reachability [4], is inspired by optimal control. The DeepReach [5] technique can solve the verification problem with tens of dimensions by leveraging a deep neural network to represent the value function in the HJ reachability analysis. Nonetheless, such methods remain ill-suited for handling inputs with hundreds or thousands of dimensions.

These verification tools cannot deal with complicated neural network controllers. Therefore, an alternative approach is to simplify complex controllers into smaller, verifiable controllers by model reduction techniques [16, 33], such as parameter pruning, compact convolution filters, and knowledge distillation [25].

Abstractions of Perception Models. Given the challenge of verifying image-based closed-loop systems directly, many methods construct abstractions of the perception model that capture the relationship between images and states for verification [43]. One abstraction approach [31] employs a generative model, specifically a generative adversarial network (GAN), to map states to images. The generated images are then fed into the controller in the verification phase. Hence, the accuracy of the verification results depends on the quality of the images produced by the generative model. Other researchers [26] construct an exact mathematical formula mapping the real state to a simplified image [47], which can be verified with another neural network checker [32]. One limitation of exact modeling is the effort required to generalize to other systems or scenarios. For instance, their implementation may be specific to a proportional controller in the aircraft landing or lane-keeping scenarios, which may not be suitable for more complicated image-based systems in other cases.

Statistical Verification. Statistical verification draws samples to determine the property satisfaction from a finite number of trajectories [1, 11, 34, 35]. One advantage of such algorithms is that they provide assurance for arbitrarily complex black-box systems, merely requiring the ability to simulate them [59, 60]. Conformal prediction [55], which has been a popular choice for distribution-free uncertainty quantification, has recently been used to provide probabilistic guarantees on the satisfaction of a given STL property [37, 45]. Purely statistical methods come at the price of drawing sufficient samples—and only obtaining the guarantees at some level of statistical confidence, which can be difficult to interpret in the context of a dynamical system. Our work restricts the use of sampling only to the most challenging aspects and leverages exhaustive verification for the rest of the system, thus reducing our reliance on statistical assurance.

6 Conclusion

This paper takes a significant step towards addressing the major challenge of verifying end-to-end controllers implemented with high-dimensional neural networks. Our insight is that the behavior of such neural networks can be effectively approximated by several low-dimensional neural networks operating over a physically meaningful space. To balance approximation error and verifiability in our low-dimensional controllers, we harness state-of-the-art knowledge distillation. To close the gap between low- and high-dimensional controllers, we apply conformal prediction and provide a statistical upper bound on their difference in either trajectories or actions. Finally, by inflating the reachable tubes with the two discrepancy types, we establish a high-confidence reachability guarantee for high-dimensional controllers. Future work may further reduce the role of sampling.