Bridging Dimensions: Confident Reachability for High-Dimensional Controllers

Geng, Yuang; Baldauf, Jake Brandon; Dutta, Souradeep; Huang, Chao; Ruchkin, Ivan

doi:10.1007/978-3-031-71162-6_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14933)

Included in the following conference series: International Symposium on Formal Methods


Abstract

Autonomous systems are increasingly implemented using end-to-end learning-based controllers. Such controllers make decisions that are executed on the real system, with images as one of the primary sensing modalities. Deep neural networks form a fundamental building block of such controllers. Unfortunately, the existing neural-network verification tools do not scale to inputs with thousands of dimensions—especially when the individual inputs (such as pixels) are devoid of clear physical meaning. This paper takes a step towards connecting exhaustive closed-loop verification with high-dimensional controllers. Our key insight is that the behavior of a high-dimensional vision-based controller can be approximated with several low-dimensional controllers. To balance the approximation accuracy and verifiability of our low-dimensional controllers, we leverage the latest verification-aware knowledge distillation. Then, we inflate low-dimensional reachability results with statistical approximation errors, yielding a high-confidence reachability guarantee for the high-dimensional controller. We investigate two inflation techniques—based on trajectories and control actions—both of which show convincing performance in three OpenAI gym benchmarks.


1 Introduction

End-to-end deep neural network controllers have been extensively used in executing complex and safety-critical autonomous systems in recent years [13, 40, 51, 52]. In particular, high-dimensional controllers (HDCs) based on images and other high-dimensional inputs have been applied in areas such as autonomous car navigation [49, 61] and aircraft landing guidance [47]. For example, recent work has shown the high performance of controlling aircraft to land on the runway with a vision-based controller [65]. For such critical applications, it is important to develop techniques with strong safety guarantees for HDC-controlled systems.

However, due to the high-dimensional nature of the input space, modern verification cannot be applied directly to systems controlled by HDCs [2, 43]. Current closed-loop verification tools, such as NNV [54], Verisig [30], Sherlock [18], and ReachNN* [28], are capable of combining a dynamical system and a low-dimensional controller (LDC) to verify a safety property starting from an initial region of the low-dimensional input space, such as position-velocity states of a car. DeepReach [5] has pushed the boundary of applying Hamilton-Jacobi (HJ) reachability to systems with tens of state dimensions. However, such verification tools fail to scale for an input with thousands of dimensions (e.g., an image). One issue is that the dynamics of these dimensions are impractical to describe. Furthermore, the structure of an HDC is usually more complicated than that of an LDC, with convolution and pooling layers. For example, an image-based HDC may have hundreds of layers with thousands of neurons, whereas an LDC usually contains several layers with dozens of neurons, making HDC verification difficult.

To deal with these challenges, researchers have built perception abstractions into the verification process. One work [31] verified a generative adversarial network (GAN) that creates images from states. Such methods cannot guarantee the GAN’s accuracy or relation to reality, which becomes a major falsifiable assumption of their verification outcomes. Another work [47] built a precise mathematical model capturing the exact relationship between states and image pixels to verify the image-based controller, which is labor-intensive and must be redone for each system. Inspired by previous work on dimensionality reduction, we create verifiable low-dimensional controllers from high-dimensional ones.

This paper proposes an end-to-end methodology to verify systems with HDCs by employing the steps displayed in Fig. 1. Instead of verifying an HDC’s safety directly over a complicated input space, our key idea is to approximate it with several LDCs so that we can reduce the HDC reachability problem to several LDC reachability problems. A crucial step is to upper-bound the difference between LDC and HDC, which we do statistically. Finally, we extend the reachable sets with the statistical bounds to obtain a safety guarantee for the HDC.

Since the input space and structure of the HDC are too complex to verify, we leverage knowledge distillation [25], a model compression method, to train simplified “student models” (LDC) based on the information from the sophisticated “teacher model” (HDC). This training produces an LDC that is lightweight and amenable to closed-loop verification because it operates on dynamical states, not images. Moreover, because the Lipschitz constant is important for minimizing the overapproximation error [28, 29], our methodology adopts two-objective gradient descent [21], decreasing both the approximation error and the Lipschitz constant.

After training the LDCs, we calculate the statistical upper bound of the discrepancy between the two controllers, since obtaining the true discrepancy is impractical. To this end, we rely on conformal prediction [22, 45, 48], one of the cutting-edge statistical methods to provide a lower bound of the confidence interval for prediction residuals without distributional assumptions or explicit dependency on the sample count. We propose two conformal techniques to quantify the difference between HDC- and LDC-controlled systems, by bounding: (i) the distance between their trajectories, and (ii) the difference between the actions produced by the HDC and LDC. We inflate reachable sets of the LDC system based on both bounds to obtain safety guarantees on the HDC system.

We evaluate our approach on three popular case studies in OpenAI Gym [7]: inverted pendulum, mountain car, and cartpole. Our contributions are three-fold:

1. Two verification approaches for high-dimensional controllers that combine reachability analysis and statistical inference to provide a safety guarantee for systems controlled by neural networks with thousands of inputs.

2. A novel neural-network approximation technique for training multiple LDCs that collectively mimic an HDC and reduce overapproximation error.

3. An implementation and evaluation of our verification approaches on three case studies: inverted pendulum, mountain car, and cartpole.

Section 2 provides the background and our problem. Section 3 describes our verification approach, which is evaluated in Sect. 4. Finally, we review the related work in Sect. 5 and conclude in Sect. 6. More details are in the extended online version [67].

2 Background and Problem Setting

High- and Low-Dimensional Systems. The original high-dimensional closed-loop system is a tuple $M_{hd} = (S, Z, U, s_0, f, c_{hd}, g)$. Here, S is the state space, Z is the high-dimensional sensor space of so-called “images” (e.g., camera images or LIDAR scans), U is the control action space, $s_0$ is the initial state, $f:S \times U \rightarrow S$ is the dynamics, and $c_{hd}: Z \times S \rightarrow U$ is the HDC. Note that $c_{hd}$ only uses a subset of state dimensions as input (e.g., a convolutional neural network with image and velocity inputs, but not position), getting the rest of the information from the image.

For mathematical convenience, we also define an (unknown) deterministic state-to-image generator $g: S \rightarrow Z$; its role and our assumptions about it are stated below. As a verifiable approximation of $M_{hd}$, our low-dimensional closed-loop system is defined as $M_{ld} = (S, U, s_0, f, c_{ld})$. Both $M_{hd}$ and $M_{ld}$ have the same state space and action space. The only difference is that $M_{ld}$ has a low-dimensional controller $c_{ld}: S \rightarrow U$, which operates on the exact states.

System Execution. The execution of $M_{hd}$ starts from the initial state $s_0$. Next, the image generator g produces an image z from that state. The image is then fed into $c_{hd}$ to obtain the corresponding control action $u = c_{hd} (z)$, which is used to update the state via dynamics f. For $M_{ld}$, the execution proceeds similarly, except that the current state s directly results in a control action $u = c_{ld} (s)$. We denote the state at time t starting from $s_0$ under $M_{hd} $ or $M_{ld} $ as $\varphi _{hd}(s_0, t)$ and $\varphi _{ld}(s_0, t)$, respectively. The trajectory of $M_{hd}$ is defined as a state sequence: $\tau _{hd}(s_0, T) = [s_0, \varphi _{hd}(s_0, 1), \dots , \varphi _{hd}(s_0, T)],$ and similarly for $\tau _{ld}$.
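The two execution loops can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `rollout_hd` and `rollout_ld`, the double-integrator dynamics, the toy "image" generator, and both linear controllers are hypothetical stand-ins chosen only to make the sketch runnable.

```python
import numpy as np

def rollout_hd(s0, f, g, c_hd, T):
    """Trajectory tau_hd(s0, T): the controller only sees the image z = g(s)."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(T):
        s = traj[-1]
        z = g(s)            # state -> image
        u = c_hd(z)         # HDC acts on the image
        traj.append(f(s, u))
    return np.stack(traj)   # shape (T + 1, state_dim)

def rollout_ld(s0, f, c_ld, T):
    """Trajectory tau_ld(s0, T): the controller sees the exact state."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(T):
        s = traj[-1]
        traj.append(f(s, c_ld(s)))
    return np.stack(traj)

# --- Hypothetical toy system (assumptions, not from the paper) ---
def f(s, u):
    """Double integrator with time step 0.1."""
    pos, vel = s
    return np.array([pos + 0.1 * vel, vel + 0.1 * u])

def g(s):
    """Toy 2x4 'image': each row repeats one state variable."""
    return np.outer(s, np.ones(4))

def c_hd(z):
    """Toy HDC that recovers the state from row means of the image."""
    return -1.0 * z[0].mean() - 0.5 * z[1].mean()

def c_ld(s):
    """Toy LDC that imperfectly imitates c_hd."""
    return -0.9 * s[0] - 0.5 * s[1]

tau_hd = rollout_hd([1.0, 0.0], f, g, c_hd, T=20)
tau_ld = rollout_ld([1.0, 0.0], f, c_ld, T=20)
```

Both rollouts share the dynamics f and the initial state, so any gap between `tau_hd` and `tau_ld` comes from the controllers alone.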

Based on the definitions above, we define reachable sets and tubes:

Definition 1

(Reachable set). Given an initial set $S_0$ and an integer time t, a reachable set $\textsf{rs}_M(S_0, t)$ for (either) system M contains all the states that can be reached from $S_0$ in t steps: $\textsf{rs}_M(S_0, t) = \{\varphi _M(s_0, t) \mid s_0 \in S_0\}$.

Definition 2

(Reachable tube). Given an initial set $S_0$ and time horizon T, a reachable tube $\textsf{rt}_M (S_0, T)$ for (either) system M is a sequence of all the reachable sets from $S_0$ until time T: $\textsf{rt}_M (S_0, T) = [S_0, \textsf{rs}_M(S_0,1),..., \textsf{rs}_M(S_0, T)]$.

Assumptions on Image-State Mapping g. Our key challenge is establishing a mapping between the high-dimensional image space Z and the low-dimensional state space S. Our verification methodology is based on the existence of a deterministic image generator g that is part of $M_{hd}$. This generator is the true and unknown mechanism that creates images from states (e.g., a camera system). We do not assume or use an analyzable closed-form description of g. We also do not assume or verify any perception model (which obtains states from images).

We only use g in the training process for a limited state-image dataset, analogously to a “lab study” of an instrumented system $M_{hd}$ (e.g., with positioning sensors or human annotators) to label each image z with a corresponding low-dimensional state s. Further, to check our robustness to this assumption, we will perform a sensitivity analysis by adding zero-mean Gaussian noise to the state-image mapping. The results of this evaluation will be discussed in Sect. 4.

Verification Problem. Our problem is to guarantee that the high-dimensional system $M_{hd}$ reaches the goal set G from an initial set $S_0$ within time T. To this end, we aim to compute reachable sets of the high-dimensional system $M_{hd}$ and intersect them with the goal set to obtain the verification verdict. Set G is specified in low dimensions (i.e., using physical variables); however, the $M_{hd}$ behavior is determined by the images from generator g and the HDC’s response to them.

Thus, given an initial set $S_0$, goal set G, system $M_{hd}$, and time horizon T, our goal is to verify this assertion:

$$\begin{aligned} \forall s_0 \in S_0 ~\cdot ~ \textsf{rs}_{M_{hd}}(S_0, T) \subseteq G \end{aligned}$$

(1)

This problem can be divided into two parts: (a) approximating $M_{hd}$ with low-dimensional systems $M_{ld} ^1, \dots , M_{ld} ^n$ and verifying them; (b) combining these reachability results based on the approximation error bounds into a reachability verdict to solve the above $M_{hd}$ problem with statistical confidence.

3 Verification of High-Dimensional Systems

Considering the challenges of complex structure and dynamics of high-dimensional systems, and the difficulties of defining safety in high dimensions, our end-to-end approach is structured in five steps: (1) train low-dimensional controller(s), (2) perform reachability analysis on them, (3) compute statistical discrepancy bounds between high- and low-dimensional controllers, (4) inflate the reachable tubes from low-dimensional verification with these bounds, and (5) combine the verification results and repeat the process as needed on different states/LDCs.

Step 1: Training Low-Dimensional Controllers

Given the aforementioned challenges of directly verifying $M_{hd}$, we plan to first verify the behavior in the low dimensions according to $M_{ld}$. Hence, we train a $c_{ld}$ to imitate the performance of $c_{hd}$ starting from a given state region, which serves as an input to Step 1 (our first iteration uses the full initial state region $S_0$ to train one $c_{ld}$). As a start, we collect the training data for $c_{ld}$: given the $c_{hd}$, access to image generator g, and the initial state space region $S_0$, we construct a supervised training dataset $\mathcal {D}_{tr} = \big \{\big (\tau _{hd}(s_i,T), (u_1, ..., u_T)_i\big )\big \}_{i=1}^m$ by sampling the initial states $s_i \sim D_0$ from some given distribution $D_0$ (in practice, $D_0 = {\text {Uniform}}(S_0)$).

Training a verifiable LDC has two conflicting objectives. On the one hand, we want to approximate the given $c_{hd}$ with minimal Mean Squared Error (MSE) on $\mathcal {D}_{tr}$. On the other hand, neural networks with smaller Lipschitz constants are more predictable and verifiable [15, 23, 50].

We balance the ability of the $c_{ld}$ to mimic the $c_{hd}$ and the verifiability of $c_{ld}$ by using a recent verification-aware knowledge distillation technique [21]. Originally, this method was developed to compress low-dimensional neural networks for better verifiability—and we extend it to approximate an HDC with LDCs using the supervised dataset $\mathcal {D}_{tr}$. Specifically, we implement knowledge distillation with two-objective gradient descent, which aims to optimize the MSE loss function $L_{mse}$ and Lipschitz constant loss function $L_{lip}$. First, it computes the directions of two gradients with respect to the $c_{ld}$ parameters $\theta $:

$$\begin{aligned} d_{L_{mse}} = \frac{\partial L_{mse}}{\partial \theta }, \quad d_{L_{lip}} = \frac{\partial L_{lip}}{\partial \theta } \end{aligned}$$

(2)

The two-objective descent operates case-by-case to optimize at least one objective as long as possible. If $d_{L_{mse}} \cdot d_{L_{lip}} > 0 $, the objectives can be optimized simultaneously by following the direction of the angular bisector of the two gradients. If $d_{L_{mse}} \cdot d_{L_{lip}} < 0$, then it is impossible to improve both objectives. Then, weights are updated along the vector of $d_{L_{mse}}$ (the higher priority) projected onto the hyperplane perpendicular to $d_{L_{lip}}$. The thresholds for MSE and Lipschitz constants in our system $M_{ld}$ are denoted as $\epsilon $ and $\lambda $ respectively. The stopping condition is met when both loss functions are below their thresholds or the training time exceeds the limit. Later on, Step 1 will be referred to with function TrainLDC, and our way of tuning $\epsilon $ and $\lambda $ will be described later in Step 5.
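The case analysis above can be sketched as a single function that returns the update direction. This is an illustrative reading of the description, not the implementation from [21]: the function name is hypothetical, the bisector is taken as the sum of the two unit gradients, and the tie case (dot product exactly zero), which the text does not specify, is treated like the conflicting case as an assumption.

```python
import numpy as np

def two_objective_direction(d_mse, d_lip):
    """Pick a descent direction from the gradients of L_mse and L_lip.

    d_mse, d_lip: flattened gradients w.r.t. the LDC parameters theta.
    """
    d_mse = np.asarray(d_mse, dtype=float)
    d_lip = np.asarray(d_lip, dtype=float)
    dot = float(np.dot(d_mse, d_lip))

    if dot > 0:
        # Gradients agree: follow their angular bisector
        # (sum of the two unit vectors).
        return d_mse / np.linalg.norm(d_mse) + d_lip / np.linalg.norm(d_lip)

    # Gradients conflict (assumption: also used when dot == 0):
    # project d_mse onto the hyperplane perpendicular to d_lip,
    # so the MSE improves without a first-order change in L_lip.
    lip_sq = float(np.dot(d_lip, d_lip))
    if lip_sq == 0.0:
        return d_mse
    return d_mse - (dot / lip_sq) * d_lip

# One gradient step with a hypothetical learning rate:
theta = np.zeros(2)
direction = two_objective_direction([1.0, 0.0], [-1.0, 1.0])
theta = theta - 0.01 * direction
```

In the conflicting case the returned direction is orthogonal to `d_lip`, which is what gives the MSE objective the higher priority.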

Step 2: Reachability Analysis In Low Dimensions

After training LDCs $\{c_{ld} ^1,...,c_{ld} ^m \}$, we construct overapproximate reachable tubes for each. We perform reachability analysis for systems $M_{ld} ^1, \dots , M_{ld} ^m$ with the respective controllers and the initial set $S_0$ specified in the original verification problem. This will result in a set of reachable tubes $\textsf{rt}_{M_{ld} ^1}(S_0, T), \dots , \textsf{rt}_{M_{ld} ^m}(S_0, T)$.

To implement reachability analysis, we use the POLAR toolbox (https://github.com/ChaoHuang2018/POLAR_Tool), version of December 2022 [27, 62], which computes univariate Bernstein polynomials to overapproximate activation functions in $c_{ld}$, and then tightly and selectively overapproximates $c_{ld}$ with Taylor/Bernstein polynomials. For dynamics reachability, alternating with neural-network overapproximation, POLAR relies on the mature Flow* tool with Taylor model approximations [9]. The latest experimental results [62] show that POLAR outperforms other neural-network verification tools in both computational efficiency and tightness. The verification details are formalized in Algorithm 2 in Step 5.

Step 3a: Defining Discrepancy Bounds

The LDC reachable tubes from Step 2 cannot be used directly to obtain HDC guarantees because of the discrepancy between LDC and HDC behaviors, which inevitably arises when compressing a higher-parameter neural network [24]. Therefore, we will quantify the difference between LDCs and HDCs using discrepancy functions, inspired by the prior work on testing hybrid systems [19, 20, 44]. We introduce and investigate two types of discrepancy functions in our setting:

1. Trajectory-based discrepancy $\beta $ considers the difference between the HDC and LDC trajectories starting from a matched state-image pair (s, z), i.e., $z = g(s)$. It is defined as the least upper bound on the maximum L1 distance between two trajectories, i.e., $\Vert \tau _{hd} (s_0,T) - \tau _{ld} (s_0,T) \Vert _1$, over time T for all initial states $s_0$ within the initial set $S_0$. Therefore, each initial set $S_0$ gives rise to its trajectory-based discrepancy $\beta (S_0)$.

2. Action-based discrepancy $\gamma $ considers the difference between LDC and HDC actions on a matched state-image pair (s, z), i.e., $z = g(s)$. Similarly to the above, it is defined as the least upper bound on the difference between control actions over time horizon T starting from any initial state $s_0$ within the initial set $S_0$. Note that the control difference, $\Vert c_{hd} (g(s_{hd}^t)) - c_{ld} (s_{ld}^t)\Vert _1$, is considered at each time step, where $s_{hd}^t$ and $s_{ld}^t$ are the states at time t in the two trajectories.

Step 3b: Computing Statistical Discrepancy Bounds

Unfortunately, obtaining the true discrepancies is impractical: it would require solving optimization/feasibility problems in high-dimensional image spaces. Instead, we calculate the statistical upper bounds for these discrepancies via conformal prediction, which is a distribution-free statistical technique to provide probabilistically valid uncertainty regions for complex prediction models—without strong assumptions about these models or their error distributions [55].

Below we briefly summarize basic conformal prediction. Consider $k+1$ independent and identically distributed random variables $\varDelta , \varDelta ^1,..., \varDelta ^k$, also known as non-conformity scores. Conformal prediction computes an uncertainty region for $\varDelta $ via a function $\bar{\varDelta }: \mathbb {R}^k \rightarrow \mathbb {R}$ of the other k values. Given a failure probability $\alpha \in (0, 1)$, conformal prediction provides an uncertainty bound $\bar{\varDelta }$ such that ${\text {Pr}}(\varDelta \le \bar{\varDelta }) \ge 1 - \alpha $. This is performed with a surprisingly simple quantile argument, where the uncertainty bound $\bar{\varDelta }$ is calculated as the $(1-\alpha )$-th quantile of the empirical distribution over the values of $\varDelta ^1, \varDelta ^2,..., \varDelta ^k,$ and $\infty $. The guarantee is formalized in the lemma below; for details, see a popular tutorial [48].

Lemma 1

(Lemma 1 in [22]) Let $\varDelta , \varDelta ^1, \varDelta ^2,..., \varDelta ^k$ be $k+1$ independent identically distributed real-valued random variables. Without loss of generality, let $\varDelta ^1, \varDelta ^2,..., \varDelta ^k$ be sorted in non-decreasing order and define $\varDelta ^{k+1} {:}{=}\infty $. For $\alpha \in (0, 1)$, it holds that ${\text {Pr}}(\varDelta \le \bar{\varDelta }) \ge 1 - \alpha $ where $\bar{\varDelta } {:}{=}\varDelta ^{(r)}$, which is the r-ranked variable with $r = \lceil (k+1)(1-\alpha ) \rceil $, and $\lceil \cdot \rceil $ is the ceiling function.
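The rank rule in Lemma 1 can be sketched in a few lines. The function name `conformal_bound` is hypothetical; the logic follows the lemma directly, with the appended $\infty $ handled by returning infinity whenever the rank r exceeds the number of samples k. For instance, with $k = 100$ scores and $\alpha = 0.1$, the rank is $r = \lceil 101 \cdot 0.9 \rceil = \lceil 90.9 \rceil = 91$, so the bound is the 91st smallest score.

```python
import math
import numpy as np

def conformal_bound(scores, alpha):
    """Return the r-th smallest value of scores + [inf],
    with r = ceil((k + 1) * (1 - alpha)) as in Lemma 1."""
    sorted_scores = np.sort(np.asarray(scores, dtype=float))
    k = len(sorted_scores)
    r = math.ceil((k + 1) * (1 - alpha))
    if r > k:
        # Too few samples for this alpha: the r-th ranked value is
        # the appended infinity, so the bound is vacuous.
        return math.inf
    return float(sorted_scores[r - 1])  # ranks are 1-indexed

# Worked example: scores 1, 2, ..., 100 and alpha = 0.1 give rank 91.
bound = conformal_bound(np.arange(1, 101), alpha=0.1)
```

With only 10 samples and $\alpha = 0.05$, the rank is $\lceil 10.45 \rceil = 11 > 10$, and the sketch returns infinity, which reflects that the guarantee needs enough calibration samples to be informative.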

Leveraging conformal prediction, we define the statistical versions of our discrepancy functions. For the trajectory-based one, we define the non-conformity as the maximum L1 distance between states at the same time in two matched trajectories $\tau _{ld} (s_0, T)$ and $\tau _{hd} (s_0, T)$ starting from a random state $s_0 \sim D_0$ sampled independently and identically distributed (i.i.d.) from a given distribution $D_0$ over the initial region $S_0$, similar to recent works [11, 44]. This leads to a trajectory dataset $\mathcal {D}_{tb}$, from which k non-conformity scores are calculated.

Definition 3

(Statistical trajectory-based discrepancy). Given distribution $D_0$ over $S_0$, confidence $\alpha \in (0, 1)$, and state functions $\varphi _{hd}(s, t)$ and $\varphi _{ld}(s, t)$ for systems $M_{hd} $ and $M_{ld} $, a statistical trajectory-based discrepancy $\bar{\beta }(D_0)$ is an $\alpha $-confident upper bound on the max trajectory distance starting from $s_0\sim D_0$:

$$\begin{aligned}&{\text {Pr}}_{s_0 \sim D_0}\Big [ \max _{t=0..T} \Vert \varphi _{hd}(s_0, t) -\varphi _{ld}(s_0, t) \Vert _1 \le \bar{\beta }(D_0)\Big ] \ge 1 - \alpha \end{aligned}$$

To obtain this bound $\bar{\beta }(D_0)$, we leverage conformal prediction as follows. Dataset $\mathcal {D}_{tb}$ contains i.i.d. samples $s_1, s_2, ..., s_k$ from our chosen distribution $D_0$. In practice, we choose the uniform distribution, namely $s \sim {\text {Uniform}}(S_0)$, because we value the safety of each state equally. We compute the corresponding non-conformity scores $\delta ^1, \delta ^2,..., \delta ^k, \delta ^{k+1}$ as the maximum L1 distances between the same-time states in the two trajectories over all times $t \in [0..T]$:

$$\begin{aligned} \delta ^i = \max _{t=0..T}\Vert \varphi _{hd}(s_i, t) -\varphi _{ld}(s_i, t)\Vert _1 \text { for } i = 1\dots k; \text { and } \delta ^{k+1} = \infty \end{aligned}$$

We sort the scores in non-decreasing order and set $\bar{\beta }(D_0)$ to the r-th smallest score:

$$\begin{aligned} \bar{\beta }(D_0) {:}{=}\delta ^{(r)} \text { with } r = \lceil (k+1)(1-\alpha ) \rceil \end{aligned}$$

(3)
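A minimal sketch of Eq. 3, assuming the k paired trajectories have already been simulated and stored as arrays of shape (k, T + 1, state_dim). The function names and the array layout are assumptions made for illustration.

```python
import math
import numpy as np

def trajectory_scores(traj_hd, traj_ld):
    """Non-conformity scores delta^i: for each sampled initial state,
    the maximum over time of the L1 distance between same-time states."""
    diff = np.abs(np.asarray(traj_hd, float) - np.asarray(traj_ld, float))
    l1_per_step = diff.sum(axis=2)   # shape (k, T + 1)
    return l1_per_step.max(axis=1)   # shape (k,)

def trajectory_discrepancy(traj_hd, traj_ld, alpha):
    """Statistical trajectory-based discrepancy from Eq. 3."""
    scores = np.sort(trajectory_scores(traj_hd, traj_ld))
    k = len(scores)
    r = math.ceil((k + 1) * (1 - alpha))
    return math.inf if r > k else float(scores[r - 1])

# Synthetic example: 100 trajectory pairs, 3 time steps, 2 state dims.
# Pair i deviates by (i + 1) in L1 norm at t = 1 and nowhere else.
k = 100
traj_ld = np.zeros((k, 3, 2))
traj_hd = np.zeros((k, 3, 2))
traj_hd[:, 1, :] = (np.arange(1, k + 1) / 2.0)[:, None]
beta_bar = trajectory_discrepancy(traj_hd, traj_ld, alpha=0.1)
```

The scores in this synthetic example are 1 through 100, so the bound is the 91st smallest score.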

We follow a similar procedure for the statistical action-based discrepancy, except that now the non-conformity scores are defined as the maximum differences between actions at the same time in two paired trajectories.

Definition 4

(Statistical action-based discrepancy). Given confidence $\alpha \in (0, 1)$, distribution $D_0$ over $S_0$, and systems $M_{ld}$ and $M_{hd}$, a statistical action-based discrepancy $\bar{\gamma }(D_0)$ is an $\alpha $-confident upper bound on maximum action discrepancy in two trajectories starting from $s_0\sim D_0$:

$$\begin{aligned} {\text {Pr}}_{s_0 \sim D_0}\Big [\max _{t=0..T} \Vert c_{hd} \big (g(\varphi _{hd}(s_0, t))\big ) - c_{ld} \big (\varphi _{ld}(s_0, t)\big )\Vert _1 \le \bar{\gamma }(D_0)\Big ] \ge 1 - \alpha \end{aligned}$$

To implement this statistical action-based discrepancy function, we sample initial states $s_1, s_2, ..., s_k$ from a given set $S_0$ following the distribution $D_0$ (in practice, uniform) and obtain the corresponding low-dimensional trajectories. Then we generate with g the corresponding images matched to each state in each trajectory—and these pairs form our action-based dataset $\mathcal {D}_{ab}$. The corresponding nonconformity scores $\delta ^1, \delta ^2,..., \delta ^k, \delta ^{k+1}$ are maximum action differences:

$$\begin{aligned} \delta ^i = \max _{t=0..T} \Vert c_{hd} (g(\varphi _{hd}(s_i, t))) - c_{ld} (\varphi _{ld}(s_i, t))\Vert _1 \text { for } i=1\dots k; \text { and } \delta ^{k+1} = \infty . \end{aligned}$$

Then we sort these non-conformity scores in the non-decreasing order and determine the statistical bound for the action-based discrepancy as:

$$\begin{aligned} \bar{\gamma }(D_0) {:}{=}\delta ^{(r)} \text { with }r = \lceil (k+1)(1-\alpha ) \rceil \end{aligned}$$

(4)
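Eq. 4 can be sketched in the same style. The sketch below follows Definition 4 and compares actions along paired trajectories; the function names, the callables `g`, `c_hd`, and `c_ld`, and the synthetic data are all hypothetical placeholders.

```python
import math
import numpy as np

def action_scores(traj_hd, traj_ld, g, c_hd, c_ld):
    """Non-conformity scores: for each trajectory pair, the maximum over
    time of the L1 difference between HDC and LDC actions."""
    scores = []
    for tau_hd, tau_ld in zip(traj_hd, traj_ld):
        per_step = [
            np.abs(np.atleast_1d(c_hd(g(s_hd))) - np.atleast_1d(c_ld(s_ld))).sum()
            for s_hd, s_ld in zip(tau_hd, tau_ld)
        ]
        scores.append(max(per_step))
    return np.array(scores, dtype=float)

def action_discrepancy(traj_hd, traj_ld, g, c_hd, c_ld, alpha):
    """Statistical action-based discrepancy from Eq. 4."""
    scores = np.sort(action_scores(traj_hd, traj_ld, g, c_hd, c_ld))
    k = len(scores)
    r = math.ceil((k + 1) * (1 - alpha))
    return math.inf if r > k else float(scores[r - 1])

# Synthetic example with placeholder generator and controllers.
g = lambda s: s                # identity "image" generator (assumption)
c_hd = lambda z: z[0]          # toy HDC: reads the first component
c_ld = lambda s: 0.0           # toy LDC: always outputs zero
k = 100
traj_ld = np.zeros((k, 3, 2))
traj_hd = np.zeros((k, 3, 2))
traj_hd[:, 2, 0] = np.arange(1, k + 1)   # action gap i + 1 at t = 2
gamma_bar = action_discrepancy(traj_hd, traj_ld, g, c_hd, c_ld, alpha=0.1)
```

Only the score definition changes relative to the trajectory-based case; the rank rule is identical.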

Step 4: Inflating Reachability With Discrepancies

This step combines low-dimensional reachable tubes (Step 2) with statistical discrepancies (Step 3b) to provide a safety guarantee on the high-dimensional system. Thus, we inflate the original LDC reach tubes with either trajectory or action discrepancy to contain the (unknown) true HDC tube with chance $1-\alpha $.

Trajectory-Based Inflation. The trajectory-based approach inflates the LDC reachable set starting in region $S_0$ with the statistical trajectory-based discrepancy $\bar{\beta }(D_0)$. Since the final reachable tube for a given initial set of $c_{ld}$ is represented as a sequence of discrete state polytopes calculated by concretizing the Taylor model with interval arithmetic on the initial set [27], we inflate these polytopes by adding $\bar{\beta }(D_0)$ to their boundaries.

Definition 5

(Trajectory-inflated reachable set). Given a distribution $D_0$ over initial set $S_0$ that is controlled by LDC $c_{ld}$, reachable set $\textsf{rs}(S_0, t)$, and its trajectory discrepancy $\bar{\beta }( D_0 )$, a trajectory-inflated reachable set is defined as:

$$\begin{aligned} \textsf{irs}(S_0, t, \bar{\beta }(D_0)) = \big \{s \in S ~|~ \exists s' \in \textsf{rs}(S_0, t) \cdot \Vert s-s'\Vert _1 \le \bar{\beta }(D_0)\big \} \end{aligned}$$

Definition 6

(Trajectory-inflated reachable tube). Given a distribution $D_0$ over initial set $S_0$ that is controlled by LDC $c_{ld}$, a reachable tube $\textsf{rt}(S_0, T) = \big [S_0, \textsf{rs}(S_0, 1), \dots , \textsf{rs}(S_0, T)\big ]$ over time horizon T, and its trajectory discrepancy $\bar{\beta }(D_0)$ over the initial set $S_0$, a trajectory-inflated reachable tube $\textsf{irt}(S_0, \bar{\beta }(D_0)) $ is defined as:

$$ \textsf{irt}(S_0, \bar{\beta }(D_0)) = \left[ \textsf{irs}(S_0, 0, \bar{\beta }(D_0)), \textsf{irs}(S_0, 1, \bar{\beta }(D_0)) , \dots , \textsf{irs}(S_0, T, \bar{\beta }(D_0))\right] . $$
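Definitions 5 and 6 can be illustrated under the simplifying assumption that each reachable set is an axis-aligned box given by lower and upper corner vectors. Widening a box by $\bar{\beta }$ in every dimension yields a box that contains the L1-neighborhood from Definition 5, so the sketch is a sound but slightly conservative version of the inflation. The function names are hypothetical.

```python
import numpy as np

def inflate_box(lo, hi, beta):
    """Widen an axis-aligned box by beta in every dimension.

    Any state within L1 distance beta of a point in [lo, hi] differs from
    that point by at most beta per coordinate, so it lies in the result."""
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    return lo - beta, hi + beta

def inflate_tube(tube, beta):
    """Apply the same inflation to every reachable set in the tube."""
    return [inflate_box(lo, hi, beta) for lo, hi in tube]

# A two-step tube of boxes in a 2-D state space, inflated by beta = 0.5.
tube = [([0.0, 0.0], [1.0, 1.0]), ([0.5, 0.0], [1.5, 0.5])]
inflated = inflate_tube(tube, beta=0.5)
```

Note that the same bound is applied at every time step, including t = 0, which matches the definition of the trajectory-inflated tube.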

Based on Definitions 5 and 6, we establish Theorem 1 that the trajectory-inflated LDC reachable tube contains the HDC reachable tube with at least $1-\alpha $ probability.

Theorem 1

(Confident trajectory-based overapproximation). Consider distribution $D_0$ over initial set $S_0$, confidence $\alpha $, and a high-dimensional system $M_{hd}$ approximated with a low-dimensional system $M_{ld}$ controlled by $c_{ld} $ with an $\alpha $-confident statistical trajectory-based discrepancy $\bar{\beta }(D_0)$. Then the trajectory-inflated low-dimensional tube $\textsf{irt}_{M_{ld}}(S_0, \bar{\beta }(D_0))$ contains the high-dimensional reachable tube $\textsf{rt}_{M_{hd}}(S_0)$ with probability at least $1-\alpha $:

$$ {\text {Pr}}_{D_0}\Big [\textsf{rt}_{M_{hd}}(S_0) \subseteq \textsf{irt}_{M_{ld}}(S_0, \bar{\beta }(D_0)) \Big ] \ge 1-\alpha $$

Proof

All the proofs are found in the extended online version [67].

Definitions 5 and 6 and Theorem 1 describe inflation and guarantees with a single LDC. However, one LDC usually cannot mimic the behavior of the HDC accurately. Therefore, we train several LDCs $\{c_{ld} ^1,c_{ld} ^2, \dots , c_{ld} ^{m} \}$, one for each subregion of the initial set $\{S_1, S_2,\dots ,S_{m}\}$ with respective distributions $D_0 = \{D_1, D_2,\dots ,D_{m}\}$. Subsequently, the trajectory-inflated tube with multiple LDCs can be represented as the union of all the single-LDC trajectory-inflated tubes: $\textsf{irt}(S_0, \bar{\beta }(D_0)) := \bigcup _{i = 1}^{m}\textsf{irt}(S_i, \bar{\beta }(D_i))$.

Action-Based Inflation. Action-based inflation is less direct than with trajectories: we inflate the neural network’s output set that is represented by a Taylor model TM$(p(S_0), I)$ [27], where $p(S_0)$ is a polynomial representing the order-k Taylor series expansion of the $c_{ld}$ activation functions in region $S_0$, and the remainder interval I ensures that the Taylor model overapproximates the neural network’s output. In this context, we widen the bounds of the remainder interval I in the last layer of the $c_{ld}$ by our statistical action-based discrepancy $\bar{\gamma }(D_0)$, ensuring that the potential outputs of $c_{hd}$ are contained in the resulting Taylor model.

Definition 7

(Action-inflated reachable set). Given distribution $D_0$ over set $S_0$ that is controlled by LDC $c_{ld} $, statistical action-based discrepancy $ \bar{\gamma }(D_0)$, and low-dimensional control bounds $[u_{min}(t), u_{max}(t)] \supseteq c_{ld} \big (S_0\big )$, the action-inflated reachable set contains states reachable by inflating the action bounds:

$$\begin{aligned} \textsf{irs}(S_0, \bar{\gamma }(D_0)) &= \big \{ f(s, u) \mid s \in S_0, u \in \big [u_{min}(t) - \bar{\gamma }(D_0), u_{max}(t) + \bar{\gamma }(D_0)\big ] \big \} \end{aligned}$$

Definition 8

(Action-inflated reachable tube). Given a distribution $D_0$ over initial set $S_0$ that is controlled by LDC $c_{ld}$, dynamics f, time horizon T, and action-based discrepancy $ \bar{\gamma }(D_0)$, the action-inflated reachable tube is a recursive sequence of inflated action-based reachable sets:

$$ \textsf{irt}(S_0, \bar{\gamma }(D_0)) = \big [ S_0, \textsf{irs}_1(S_0, \bar{\gamma }(D_0)), \textsf{irs}_2(\textsf{irs}_1, \bar{\gamma }(D_0)) , \dots , \textsf{irs}_T(\textsf{irs}_{T-1}, \bar{\gamma }(D_0)) \big ]. $$
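The recursion in Definition 8 can be sketched with plain interval arithmetic in place of Taylor models. Everything concrete here is an assumption for illustration: the dynamics are linear ($s' = As + Bu$), the LDC is a linear map $u = Ks$, and sets are axis-aligned boxes. Treating s and u as independent intervals loses precision but keeps the result an overapproximation.

```python
import numpy as np

def interval_matvec(M, lo, hi):
    """Tight bounds on M @ x over the box lo <= x <= hi."""
    M_pos, M_neg = np.maximum(M, 0.0), np.minimum(M, 0.0)
    return M_pos @ lo + M_neg @ hi, M_pos @ hi + M_neg @ lo

def action_inflated_tube(lo, hi, A, B, K, gamma, T):
    """Boxes [S_0, irs_1, ..., irs_T] with actions widened by gamma."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    tube = [(lo, hi)]
    for _ in range(T):
        # Bounds [u_min, u_max] on the LDC output over the current box.
        u_lo, u_hi = interval_matvec(K, lo, hi)
        # Inflate the action bounds by the action-based discrepancy.
        u_lo, u_hi = u_lo - gamma, u_hi + gamma
        # Propagate states and inflated actions through the dynamics.
        a_lo, a_hi = interval_matvec(A, lo, hi)
        b_lo, b_hi = interval_matvec(B, u_lo, u_hi)
        lo, hi = a_lo + b_lo, a_hi + b_hi
        tube.append((lo, hi))
    return tube

# Hypothetical double integrator with a linear feedback controller.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -0.5]])
lo0, hi0 = np.array([-0.1, -0.1]), np.array([0.1, 0.1])
tube = action_inflated_tube(lo0, hi0, A, B, K, gamma=0.2, T=5)
```

Unlike trajectory-based inflation, the widening enters at every step and is carried forward through the dynamics, so its effect on the tube depends on how f amplifies action perturbations.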

Based on Definitions 7 and 8, we put forward Theorem 2 below for the lower probability bound of the action-inflated LDC tube containing the true HDC tube.

Theorem 2

(Confident action-based overapproximation). Consider distribution $D_0$ over initial set $S_0$ and a high-dimensional system $M_{hd}$ with controller $c_{hd}$, approximated by low-dimensional system $M_{ld} $ controlled by $c_{ld} $ with an $\alpha $-confident statistical action-based discrepancy $\bar{\gamma }(D_0)$. Then the action-inflated low-dimensional tube $\textsf{irt}_{M_{ld}}(S_0, \bar{\gamma }(D_0))$ contains the high-dimensional tube $\textsf{rt}_{M_{hd}}(S_0)$ with probability at least $1-\alpha $:

$$ {\text {Pr}}_{D_0} \Big [\textsf{rt}_{M_{hd}}(S_0) \subseteq \textsf{irt}_{M_{ld}}(S_0, \bar{\gamma }(D_0)) \Big ] \ge 1-\alpha $$

Definitions 7 and 8 describe inflation with a single LDC, which we extend to multiple LDCs by taking the union of all the LDCs’ inflated tubes. Given a partitioned initial set $S_0 = \{S_1, ..., S_m \}$ with respective controllers $\{c_{ld} ^1, \dots , c_{ld} ^{m} \}$ and distributions $D_0 = \{D_1, ...,D_m \}$, the multiple LDCs action-inflated reachable tube is $\textsf{irt}(S_0, \bar{\gamma }(D_0)) := \bigcup _{i = 1}^{m}\textsf{irt}(S_i, \bar{\gamma }(D_i))$. As it turns out, this reachable tube also contains the HDC tube with at least $1-\alpha $ chance.

Step 5: Iterative Retraining and Re-gridding

Once the inflated reachable tubes are obtained in Step 4, we focus on the regions of the initial set where HDC simulations succeed—yet safety verification fails. This can happen for two reasons: (i) overly high overapproximation error in the LDC reachability, or (ii) overly high conformal discrepancy bounds from $\bar{\beta }$ or $\bar{\gamma }$.

Reducing Reachability Overapproximation Error. We lower the threshold for the Lipschitz constant $\lambda $ to retrain the respective LDCs in Step 1. In our experience, this almost always reduces the overapproximation in the LDC analysis and makes low-dimensional reachable tubes tighter—but may result in higher statistical discrepancy bounds, which we address below.

Reducing Conformal Discrepancy Bounds. When these bounds are loose, our LDC imitates the HDC poorly in some state-space region. Here, we take inspiration from refinement techniques in testing [45, 66]. When a desired discrepancy bound $\xi $ is exceeded in a state-space region, we split it into subregions by taking its midpoints in each dimension, leading to an updated state-space grid $\textbf{S}'$. Then, in each subregion, we retrain an LDC as per Step 1 with a reduced MSE threshold $\epsilon $ and re-compute its bounds as per Step 3b, leading to tighter statistical overapproximations of HDC reachable tubes.
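The midpoint split can be sketched as follows, assuming each region is an axis-aligned box. The names `split_region` and `refine_grid` and the callable `bound_of` (standing in for retraining plus Step 3b) are hypothetical; the sketch shows only the re-gridding logic, not the retraining itself.

```python
import itertools
import numpy as np

def split_region(lo, hi):
    """Split a box at its midpoint in every dimension (2^n subregions)."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    mid = (lo + hi) / 2.0
    subregions = []
    for choice in itertools.product((0, 1), repeat=len(lo)):
        upper = np.array(choice, dtype=bool)
        # Per dimension: lower half [lo, mid] or upper half [mid, hi].
        subregions.append((np.where(upper, mid, lo), np.where(upper, hi, mid)))
    return subregions

def refine_grid(regions, bound_of, xi):
    """Split every region whose discrepancy bound exceeds xi."""
    refined = []
    for lo, hi in regions:
        if bound_of(lo, hi) > xi:
            refined.extend(split_region(lo, hi))
        else:
            refined.append((lo, hi))
    return refined

# A unit square splits into four 0.5 x 0.5 squares.
quarters = split_region([0.0, 0.0], [1.0, 1.0])
```

The number of subregions doubles with each state dimension, so each split of a four-dimensional region such as cartpole produces 16 subregions, each of which needs its own LDC.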

To summarize, Algorithm 1 shows our iterative training procedure for the action-based approach (its trajectory-based counterpart proceeds analogously, except for computing the discrepancies over trajectories).

Combining all five steps, Algorithm 2 presents our end-to-end verification of a given HDC with either trajectory-based or action-based discrepancies. The LDCs and their discrepancies are input into the reachability analysis, implemented with the function Reach (using the POLAR toolbox in practice), to calculate the inflated reachable tubes. Note that the verification regions $\mathbf {S_{ver}}$ in Algorithm 2 are much smaller partitions of the larger gridding regions $\textbf{S}$ defined in Algorithm 1 for training. Each gridding region, which for instance is a 0.5 $\times $ 0.5 square, corresponds to one LDC. Inside each gridding region, the verification region $\mathbf {S_{ver}}$ is a 0.01$\times $0.01 square. Our end-to-end algorithm guarantees that an affirmative answer to our verification problem is correct with at least $1-\alpha $ probability, as per Theorem 3.
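To make the relation between the two grids concrete, the following sketch counts the verification cells that tile one gridding region and looks up the gridding region (and hence the LDC) that contains a given state. It assumes uniform square grids; the function names and the grid origin `(-1.0, -1.0)` are hypothetical choices for illustration.

```python
import math
from typing import Sequence, Tuple


def cells_per_region(region_size: float, cell_size: float, dim: int) -> int:
    """Number of verification cells that tile one gridding region."""
    per_axis = round(region_size / cell_size)
    return per_axis ** dim


def region_index(x: Sequence[float], origin: Sequence[float],
                 region_size: float) -> Tuple[int, ...]:
    """Index of the gridding region (and hence the LDC) that contains state x."""
    return tuple(math.floor((xi - oi) / region_size) for xi, oi in zip(x, origin))


# A 0.5 x 0.5 gridding region tiled by 0.01 x 0.01 verification cells.
n_cells = cells_per_region(0.5, 0.01, dim=2)
print(n_cells)

# Hypothetical grid origin; the state falls into region (3, 1).
idx = region_index((0.74, -0.26), origin=(-1.0, -1.0), region_size=0.5)
print(idx)
```

With these example sizes, one LDC serves 50 cells per axis, that is, 2500 verification runs in two dimensions.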

Theorem 3

(Confident guarantee of HDC safety). Consider a partitioned initial set grid $S_0 = \{S_1, \dots , S_{m}\}$, a set of corresponding distributions $\{D_1, \dots , D_m \}$, a high-dimensional system $M_{hd}$ with controller $c_{hd}$, and a set of low-dimensional systems $M_{ld} ^1, \dots , M_{ld} ^m$ with respective controllers $c_{ld} ^1, \dots , c_{ld} ^m$ that approximate $c_{hd}$ with either an $\alpha $-confident trajectory discrepancy or action discrepancy. Then the probability that the HDC safe set $S_{safe}$ calculated by Algorithm 2 with either discrepancy is contained in the ground-truth safe set $ S^*_{safe}$ is at least $1 - \alpha $:

$${\text {Pr}}_{D_1...D_m}\Big [ S_{safe} \subseteq S^*_{safe} \Big ] \ge (1-\alpha ) $$

4 Experimental Evaluation

Benchmark Systems and Controllers. We evaluate our approach on three benchmarks from OpenAI Gym [7]. Two are two-dimensional case studies: an inverted pendulum (IP) with angle $\theta $ and angular velocity $\dot{\theta }$, and a mountain car (MC) with position x and velocity v. The third is a four-dimensional case study: a cart pole (CP) with cart position x, cart velocity v, angle $\theta $, and angular velocity $\dot{\theta }$. Our selection of case studies is limited because of the engineering challenge of setting up both vision-based control and low-dimensional verification for the same system. Our continuous-action, convolutional HDCs $c_{hd}$ for these systems were trained with deep deterministic policy gradient (DDPG) [36]. To imitate the performance of $c_{hd}$, we train simpler feedforward neural networks $c_{ld}$ with only low-dimensional state inputs. See the Appendix for their architecture and dynamics; our code is available on GitHub (see Note 1).

Experimental Procedure. Our verification’s goal is to check whether the system will stay inside the specified goal set G after T time steps (e.g., the mountain car’s position must stay within the target set $[0.45, \infty )$ after 60 steps). The verification returns “safe” if the inflated reachable set for $t=T$ lies entirely in G, and “unsafe” otherwise. The details are found in the Appendix.
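The verdict itself reduces to a set-containment check. The sketch below assumes that both the inflated reachable set at $t=T$ and the goal set are represented as axis-aligned boxes (lists of per-dimension intervals). The function name `verdict`, the reachable boxes, and the unconstrained velocity in the goal are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def verdict(reach_box: Sequence[Interval], goal_box: Sequence[Interval]) -> str:
    """Return "safe" iff the final-step reachable box lies entirely in the goal box."""
    contained = all(g_lo <= r_lo and r_hi <= g_hi
                    for (r_lo, r_hi), (g_lo, g_hi) in zip(reach_box, goal_box))
    return "safe" if contained else "unsafe"


# Mountain-car style goal: position at least 0.45; velocity assumed unconstrained.
goal = [(0.45, math.inf), (-math.inf, math.inf)]

# Made-up inflated reachable boxes (position interval, velocity interval).
print(verdict([(0.47, 0.52), (0.01, 0.03)], goal))
print(verdict([(0.44, 0.52), (0.01, 0.03)], goal))
```

Because the inflated reachable set is an overapproximation, an “unsafe” verdict in such a check only means that containment could not be established, not that a violating trajectory exists.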

For both approaches, we calculate the discrepancies in 0.25-sized state squares within the initial set in IP, hence creating $8 \times 8 = 64$ regions (MC has $8 \times 9 = 72$ regions; CP has $5 \times 5 \times 5 \times 5 = 625$). In each region, we sample 60 trajectories to compute both trajectory-based discrepancies $\bar{\beta }$ and action-based discrepancies $\bar{\gamma }$. This relatively small sample count is enough to avoid using the highest non-conformity score, or infinity, as the conformal bound. We also implement a pure conformal prediction baseline and, for a fair comparison, give it the same data/regions. This results in 3840 sampled trajectories in IP, 4320 in MC, and 76800 for CP.
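The role of the sample count can be checked with a short calculation. The sketch assumes the standard split-conformal rule, in which the bound is the $k$-th smallest of $n$ non-conformity scores with $k = \lceil (n+1)(1-\alpha ) \rceil $, and the bound is infinite when $k > n$. The function names and the toy scores are hypothetical.

```python
import math
from typing import Sequence


def conformal_rank(n: int, alpha: float) -> int:
    """Rank k = ceil((n + 1) * (1 - alpha)) of the score used as the bound."""
    return math.ceil((n + 1) * (1 - alpha))


def conformal_bound(scores: Sequence[float], alpha: float) -> float:
    """k-th smallest non-conformity score, or infinity if k exceeds n."""
    n = len(scores)
    k = conformal_rank(n, alpha)
    return math.inf if k > n else sorted(scores)[k - 1]


rank_60 = conformal_rank(60, 0.05)
print(rank_60)  # rank among 60 samples

scores = [0.01 * i for i in range(1, 61)]  # toy discrepancies 0.01, ..., 0.60
print(conformal_bound(scores, 0.05))       # the 58th smallest toy score
print(conformal_bound(scores[:10], 0.05))  # too few samples: infinite bound
```

Under this rule, 60 samples at $\alpha = 0.05$ select the 58th smallest score, which is neither the maximum nor infinity, whereas 10 samples would force an infinite bound.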

We use closed-loop simulation to obtain the (approximate) ground truth (GT) of safety. For IP and CP, we grid the initial set into squares with an interval of 0.01. For MC, we grid the initial set with the position step 0.01 and velocity step 0.001. Within each grid cell, we uniformly sample 10 initial states and simulate a trajectory from each. If all 10 trajectories end in the goal set G, we mark this cell as “truly safe”, otherwise “truly unsafe”. The truly safe-to-unsafe cell ratio is 0.56 in IP, 0.78 in MC, and 0.58 in CP. The verification process uses the same grid cells as its initial state regions, leading to 40k low-dimensional verification runs for IP, 14k for MC, and 50k for CP. The trajectory-based verification times for IP, MC, and CP are 6.2, 5.8, and 6.4 h respectively; the action-based verification takes 6.3, 6.1, and 6.6 h respectively.
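The ground-truth labeling of a single grid cell can be sketched as below. The closed-loop simulator and the goal test are passed in as callables; the toy stand-ins used here (`toy_simulate`, `toy_in_goal`) are hypothetical and do not model any of the benchmarks.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[float, ...]


def label_cell(lo: Sequence[float], hi: Sequence[float],
               simulate: Callable[[State], State],
               in_goal: Callable[[State], bool],
               n_samples: int = 10,
               rng: Optional[random.Random] = None) -> str:
    """Label a grid cell by simulating n_samples uniformly sampled initial states."""
    rng = rng or random.Random(0)
    for _ in range(n_samples):
        x0 = tuple(rng.uniform(l, h) for l, h in zip(lo, hi))
        if not in_goal(simulate(x0)):
            return "truly unsafe"  # one failing trajectory is enough
    return "truly safe"


def toy_simulate(x: State) -> State:
    """Toy final-state map (hypothetical): shifts the position by 0.5."""
    return (x[0] + 0.5, x[1])


def toy_in_goal(x: State) -> bool:
    """Toy goal test: position of at least 0.45."""
    return x[0] >= 0.45


print(label_cell((0.00, 0.0), (0.01, 0.001), toy_simulate, toy_in_goal))
print(label_cell((-0.10, 0.0), (-0.09, 0.001), toy_simulate, toy_in_goal))
```

Since only finitely many states are sampled per cell, such a label is an approximation of the true safety of the cell, which is why the text calls the ground truth approximate.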

Success Metrics. We evaluate verification as a binary classifier of the GT safety, with “safe” being the positive class and “unsafe” being the negative. Our evaluation metrics are the (i) true positive rate (TPR, a.k.a. sensitivity and recall), indicating the fraction of truly safe regions that were successfully verified; (ii) true negative rate (TNR, a.k.a. specificity), indicating the fraction of truly unsafe regions that failed verification; (iii) precision, indicating the fraction of safe verification verdicts that are truly safe (which is essential for safety-critical systems and controlled by rate $\alpha $ as per Theorem 3); and (iv) F1 score, which is a harmonic mean of precision and recall to provide a class-balanced assessment of predictions.
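For reference, the four metrics can be computed from per-cell verdicts and ground-truth labels as in the following sketch. The function name `verification_metrics` and the toy verdicts are illustrative assumptions, not data from our experiments.

```python
import math
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    """num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def verification_metrics(verified: Sequence[bool],
                         truly_safe: Sequence[bool]) -> Dict[str, float]:
    """TPR, TNR, precision, and F1 with "safe" as the positive class."""
    pairs = list(zip(verified, truly_safe))
    tp = sum(1 for v, g in pairs if v and g)          # verified and truly safe
    tn = sum(1 for v, g in pairs if not v and not g)  # rejected and truly unsafe
    fp = sum(1 for v, g in pairs if v and not g)      # verified but truly unsafe
    fn = sum(1 for v, g in pairs if not v and g)      # rejected but truly safe
    return {
        "TPR": _ratio(tp, tp + fn),
        "TNR": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and TPR
    }


# Toy verdicts for eight cells (made-up values).
verified = [True, True, False, False, False, False, True, True]
truth = [True, True, True, True, False, False, True, False]
metrics = verification_metrics(verified, truth)
print(metrics)
```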

Table 1. Verification performance ($M=4$ for IP and CP, $M=10$ for MC).

Table 2. Verification performance for multiple LDCs with zero-mean Gaussian noise added to true state before image generator g.

Verification Results. The quantitative results of the three case studies are summarized in Table 1. The confidence parameter $\alpha $ is set to 0.05 (i.e., 95% confidence) for all methods, which sets the minimum precision to 0.95; all the approaches satisfy it. The pure conformal prediction baseline shows high precision and TNR but loses in TPR to our approaches, and is thus able to correctly verify only a significantly smaller region of the state space. For well-balanced safety prediction in practice, the F1 score shows that our trajectory-based approach outperforms the other two.

Across all case studies, the baseline is significantly more conservative than the requested 95% precision. While this can be an advantage in safety-critical settings, excessive conservatism can also hamper adoption, so the approach should be sensitive to the desired confidence—which our trajectory-based approach demonstrates in the mountain car case study (see Precision in Table 1).

Across all case studies, the multi-LDC approaches always match or outperform the one-LDC approaches. This result demonstrates the utility of modularizing the HDC approximation problem. Also, our single-LDC action-based approach successfully verifies relatively few regions, leading to its low TPR. That is because, unlike in the case of trajectory discrepancies, a single LDC cannot provide tight statistical upper bounds for control actions; this causes large overapproximation in the inflated reachable sets and results in false negatives.

Sensitivity to Noisy Images. Even with Gaussian noise added to the true state before generator g, our approaches perform similarly to the noise-free case under low noise variance, as per Table 2, thus showing some robustness. However, we saw a significant decline in the verification coverage (TPR, but not the TNR and $\alpha $-guaranteed precision) under substantial noise variance (up to 0.5, not shown in Table 2).

Limitations. Our approach relies on statistical inference based on i.i.d. sampling from a fixed distribution, which downgrades the exhaustive guarantees of formal verification. However, it may be possible to exhaustively bridge this gap with neural-network conformance analysis based on satisfiability solving [41]. We also envision relaxing the i.i.d. assumption with time-series conformal prediction [3, 58], as well as uncertainty-guided gridding [37] to reduce our discrepancy bounds.

5 Related Work

Low-Dimensional Verification of Closed-Loop Systems. Neural-network controlled systems (NNCS) have been used widely [42, 46, 52], which has highlighted the challenges of verifying their correctness within closed-loop systems. Since computing the exact set of reachable states is generally infeasible, especially in non-linear systems, current approaches primarily focus on constructing tight overapproximate reachable sets [2, 9, 10]. For sigmoid-based NNCS, the Verisig toolbox [30] can transform the neural-network controlled system into a hybrid system, which can be verified by other tools like Flow*. NNV [54] performs overapproximation analysis by combining star sets [38, 53] for feed-forward neural networks with zonotopes for non-linear plant dynamics in CORA [2]. POLAR [27] overcame the challenges of non-differentiable activation functions by combining the Bernstein-Bézier Form [28] and the symbolic remainder. This method achieves state-of-the-art performance in both the tightness of reachable tubes and computation times. Another type of verification, called Hamilton-Jacobi (HJ) reachability [4], is inspired by optimal control. The DeepReach [5] technique can solve the verification problem with tens of dimensions by leveraging a deep neural network to represent the value function in the HJ reachability analysis. Nonetheless, such methods remain ill-suited for handling inputs with hundreds or thousands of dimensions.

These verification tools cannot deal with complicated neural network controllers. Therefore, an alternative approach is to simplify complex controllers into smaller, verifiable controllers by model reduction techniques [16, 33], such as parameter pruning, compact convolution filters, and knowledge distillation [25].

Abstractions of Perception Models. Given the challenge of verifying image-based closed-loop systems directly, many methods construct abstractions of the perception model to map the relationship between the image and the states for verification [43]. One abstraction approach [31] employs a generative model, in particular a Generative Adversarial Network (GAN), to map states to images. The generated images are then fed into the controller in the verification phase. Hence, the accuracy of the verification results depends on the quality of the images produced by the generative model. Other researchers [26] construct an exact mathematical formula that maps the real state into a simplified image [47], which can be verified with another neural network checker [32]. One limitation of exact modeling is the effort required to generalize it to other systems or scenarios. For instance, such an implementation may be specific to a proportional controller in the aircraft landing or lane-keeping scenarios, and may not suit more complicated image-based systems.

Statistical Verification. Statistical verification draws samples to determine the property satisfaction from a finite number of trajectories [1, 11, 34, 35]. One advantage of such algorithms is that they provide assurance for arbitrarily complex black-box systems, merely requiring the ability to simulate them [59, 60]. Conformal prediction [55], which has been a popular choice for distribution-free uncertainty quantification, has recently been used to provide probabilistic guarantees on the satisfaction of a given STL property [37, 45]. Purely statistical methods come at the price of drawing sufficient samples—and only obtaining the guarantees at some level of statistical confidence, which can be difficult to interpret in the context of a dynamical system. Our work restricts the use of sampling only to the most challenging aspects and leverages exhaustive verification for the rest of the system, thus reducing our reliance on statistical assurance.

6 Conclusion

This paper takes a significant step towards addressing the major challenge of verifying end-to-end controllers implemented with high-dimensional neural networks. Our insight is that the behavior of such neural networks can be effectively approximated by several low-dimensional neural networks operating over a physically meaningful space. To balance approximation error and verifiability in our low-dimensional controllers, we harness state-of-the-art knowledge distillation. To close the gap between low- and high-dimensional controllers, we apply conformal prediction and provide a statistical upper bound on their difference in either trajectories or actions. Finally, by inflating the reachable tubes with the two discrepancy types, we establish a high-confidence reachability guarantee for high-dimensional controllers. Future work may further reduce the role of sampling.

Notes

1. https://github.com/Trustworthy-Engineered-Autonomy-Lab/Bridging-dimensions

References

1. Agha, G., Palmskog, K.: A survey of statistical model checking. ACM Trans. Modeling Comput. Simul. 28 (2018)
2. Althoff, M.: An introduction to CORA 2015. In: Proc. of the Workshop on Applied Verification for Continuous and Hybrid Systems, pp. 120–151 (2015)
3. Auer, A., Gauch, M., Klotz, D., Hochreiter, S.: Conformal prediction for time series with modern hopfield networks. In: Proceedings of the 37th International Conference on Neural Information Processing Systems (2024)
4. Bansal, S., Chen, M., Herbert, S.L., Tomlin, C.J.: Hamilton-jacobi reachability: a brief overview and recent advances. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2242–2253 (2017). https://api.semanticscholar.org/CorpusID:35768454
5. Bansal, S., Tomlin, C.J.: Deepreach: a deep learning approach to high-dimensional reachability. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1817–1824. IEEE (2021)
6. Bassan, S., Katz, G.: Towards formal xai: formally approximate minimal explanations of neural networks. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 187–207. Springer (2023)
7. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym (Jun 2016). http://arxiv.org/abs/1606.01540, arXiv:1606.01540 [cs]
8. Chakraborty, K., Bansal, S.: Discovering closed-loop failures of vision-based controllers via reachability analysis. IEEE Robot. Automation Lett. 8(5), 2692–2699 (2023)
9. Chen, X., Ábrahám, E., Sankaranarayanan, S.: Flow*: An analyzer for non-linear hybrid systems. In: International Conference on Computer Aided Verification (2013)
10. Chen, X., Sankaranarayanan, S.: Reachability analysis for cyber-physical systems: Are we there yet? In: NASA Formal Methods Symposium, pp. 109–130 (2022)
11. Cleaveland, M., Lee, I., Pappas, G., Lindemann, L.: Conformal prediction regions for time series using linear complementarity programming. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 20984–20992 (2024)
12. Cleaveland, M., Sokolsky, O., Lee, I., Ruchkin, I.: Conservative safety monitors of stochastic dynamical systems. In: Proc. of the NASA Formal Methods Conference, May 2023
13. Codevilla, F., Müller, M., López, A., Koltun, V., Dosovitskiy, A.: End-to-end driving via conditional imitation learning. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 4693–4700 (2018)
14. Cofer, D., et al.: Run-time assurance for learning-based aircraft taxiing. In: 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC), pp. 1–9 (2020)
15. Combettes, P.L., Pesquet, J.C.: Lipschitz Certificates for Layered Network Structures Driven by Averaged Activation Operators. SIAM Journal on Mathematics of Data Science 2(2), 529–557 (Jan 2020). https://doi.org/10.1137/19M1272780
16. Deng, L., Li, G., Han, S., Shi, L., Xie, Y.: Model compression and hardware acceleration for neural networks: a comprehensive survey. Proc. IEEE 108(4), 485–532 (2020)
17. Dutta, S., et al.: Distributionally robust statistical verification with imprecise neural networks (Aug 2023). https://doi.org/10.48550/arXiv.2308.14815, arXiv:2308.14815 [cs]
18. Dutta, S., Chen, X., Jha, S., Sankaranarayanan, S., Tiwari, A.: Sherlock-a tool for verification of neural network feedback systems: demo abstract. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pp. 262–263 (2019)
19. Fan, C., Mitra, S.: Bounded verification with on-the-fly discrepancy computation. In: Finkbeiner, B., Pu, G., Zhang, L. (eds.) ATVA 2015. LNCS, vol. 9364, pp. 446–463. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24953-7_32
20. Fan, C., Qi, B., Mitra, S., Viswanathan, M.: DryVR: data-driven verification and compositional reasoning for automotive systems. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 441–461. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_22
21. Fan, J., Huang, C., Li, W., Chen, X., Zhu, Q.: Towards verification-aware knowledge distillation for neural-network controlled systems: Invited paper. In: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8 (2019). https://api.semanticscholar.org/CorpusID:209497572
22. Fannjiang, C., Bates, S., Angelopoulos, A., Listgarten, J., Jordan, M.: Conformal prediction under feedback covariate shift for biomolecular design. Proc. Natl. Acad. Sci. 119, e2204569119 (2022)
23. Fazlyab, M., Robey, A., Hassani, H., Morari, M., Pappas, G.: Efficient and accurate estimation of lipschitz constants for deep neural networks. In: Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper_files/paper/2019/hash/95e1533eb1b20a97777749fb94fdb944-Abstract.html
24. Gou, J., Yu, B., Maybank, S.J., Tao, D.: Knowledge distillation: a survey. Int. J. Comput. Vision 129, 1789–1819 (2021)
25. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
26. Hsieh, C., Li, Y., Sun, D., Joshi, K., Misailovic, S., Mitra, S.: Verifying controllers with vision-based perception using safe approximate abstractions. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(11), 4205–4216 (2022). https://doi.org/10.1109/TCAD.2022.3197508
27. Huang, C., Fan, J., Chen, X., Li, W., Zhu, Q.: Polar: A polynomial arithmetic framework for verifying neural-network controlled systems. In: International Symposium on Automated Technology for Verification and Analysis, pp. 414–430. Springer (2022)
28. Huang, C., Fan, J., Li, W., Chen, X., Zhu, Q.: Reachnn: reachability analysis of neural-network controlled systems. ACM Trans. Embedded Comput. Syst. (TECS) 18(5s), 1–22 (2019)
29. Fazlyab, M., Robey, A., Hassani, H., Morari, M., Pappas, G.: Efficient and accurate estimation of lipschitz constants for deep neural networks. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
30. Ivanov, R., Weimer, J., Alur, R., Pappas, G., Lee, I.: Verisig: verifying safety properties of hybrid systems with neural network controllers. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pp. 169–178 (2019)
31. Katz, S.M., Corso, A.L., Strong, C.A., Kochenderfer, M.J.: Verification of image-based neural network controllers using generative models. J. Aerospace Inf. Syst. 19(9), 574–584 (2022)
32. Khedr, H., Ferlez, J., Shoukry, Y.: Peregrinn: Penalized-relaxation greedy neural network verifier. In: Computer Aided Verification: 33rd International Conference, CAV 2021, Virtual Event, July 20–23, 2021, Proceedings, Part I, pp. 287–300 (2021). https://doi.org/10.1007/978-3-030-81685-8_13
33. Ladner, T., Althoff, M.: Specification-driven neural network reduction for scalable formal verification. arXiv preprint arXiv:2305.01932 (2023)
34. Larsen, K.G., Legay, A.: Statistical model checking: past, present, and future. In: Margaria, T., Steffen, B. (eds.) ISoLA 2016. LNCS, vol. 9952, pp. 3–15. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47166-2_1
35. Lew, T., Janson, L., Bonalli, R., Pavone, M.: A Simple and Efficient Sampling-based Algorithm for General Reachability Analysis. In: Proceedings of the 4th Annual Learning for Dynamics and Control Conference, vol. 168, pp. 1086–1099 (2022). https://proceedings.mlr.press/v168/lew22a.html
36. Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. CoRR abs/1509.02971 (2015). https://api.semanticscholar.org/CorpusID:16326763
37. Lindemann, L., Qin, X., Deshmukh, J.V., Pappas, G.J.: Conformal prediction for stl runtime verification. In: Proceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2023), pp. 142–153. ICCPS ’23, Association for Computing Machinery, New York (2023). https://doi.org/10.1145/3576841.3585927
38. Lopez, D.M., Musau, P., Tran, H.D., Johnson, T.T.: Verification of closed-loop systems with neural network controllers. EPiC Series in Computing 61, 201–210 (2019)
39. Luo, R., Zhao, S., Kuck, J., Ivanovic, B., Savarese, S., Schmerling, E., Pavone, M.: Sample-efficient safety assurances using conformal prediction. In: International Workshop on the Algorithmic Foundations of Robotics, pp. 149–169. Springer (2022)
40. Matsumoto, E., Saito, M., Kume, A., Tan, J.: End-to-end learning of object grasp poses in the amazon robotics challenge. In: Causo, A., Durham, J., Hauser, K., Okada, K., Rodriguez, A. (eds.) Advances on Robotic Item Picking, pp. 63–72. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-35679-8_6
41. Mohammadinejad, S., Paulsen, B., Deshmukh, J.V., Wang, C.: DiffRNN: differential verification of recurrent neural networks. In: Dima, C., Shirmohammadi, M. (eds.) FORMATS 2021. LNCS, vol. 12860, pp. 117–134. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85037-1_8
42. Pan, Y., Cheng, C., Saigol, K., Lee, K., Yan, X., Theodorou, E., Boots, B.: Agile Autonomous Driving using End-to-End Deep Imitation Learning. Robotics: Science and Systems XIV (2017). https://api.semanticscholar.org/CorpusID:53873353
43. Păsăreanu, C.S., Mangal, R., Gopinath, D., Getir Yaman, S., Imrie, C., Calinescu, R., Yu, H.: Closed-loop analysis of vision-based autonomous systems: A case study. In: International Conference on Computer Aided Verification, pp. 289–303. Springer (2023)
44. Qin, X., Hashemi, N., Lindemann, L., Deshmukh, J.V.: Conformance testing for stochastic cyber-physical systems. In: Conference on Formal Methods in Computer-Aided Design–FMCAD 2023, p. 294 (2023)
45. Qin, X., Xia, Y., Zutshi, A., Fan, C., Deshmukh, J.V.: Statistical verification of cyber-physical systems using surrogate models and conformal inference. In: 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS), pp. 116–126 (2022). https://doi.org/10.1109/ICCPS54341.2022.00017
46. Ruchkin, I., Cleaveland, M., Ivanov, R., Lu, P., Carpenter, T., Sokolsky, O., Lee, I.: Confidence composition for monitors of verification assumptions. In: ACM/IEEE 13th Intl. Conf. on Cyber-Physical Systems (ICCPS), pp. 1–12, May 2022. https://doi.org/10.1109/ICCPS54341.2022.00007
47. Santa Cruz, U., Shoukry, Y.: Nnlander-verif: a neural network formal verification framework for vision-based autonomous aircraft landing. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-031-06773-0_11
48. Shafer, G., Vovk, V.: A Tutorial on Conformal Prediction. J. Mach. Learn. Res. 9, 371–421 (2008). http://dl.acm.org/citation.cfm?id=1390681.1390693
49. Stocco, A., Nunes, P.J., D’Amorim, M., Tonella, P.: Thirdeye: Attention maps for safe autonomous driving systems. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022. Association for Computing Machinery, New York (2023). https://doi.org/10.1145/3551349.3556968
50. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014)
51. Teeti, I., Khan, S., Shahbaz, A., Bradley, A., Cuzzolin, F.: Vision-based Intention and Trajectory Prediction in Autonomous Vehicles: A Survey, vol. 6, pp. 5630–5637 (Jul 2022). https://www.ijcai.org/proceedings/2022/785
52. Topcu, U., Bliss, N., Cooke, N., Cummings, M., Llorens, A., Shrobe, H., Zuck, L.: Assured Autonomy: Path Toward Living With Autonomous Systems We Can Trust, October 2020. http://arxiv.org/abs/2010.14443, arXiv:2010.14443 [cs]
53. Tran, H.D., Manzanas Lopez, D., Musau, P., Yang, X., Nguyen, L.V., Xiang, W., Johnson, T.T.: Star-based reachability analysis of deep neural networks. In: Formal Methods–The Next 30 Years: Third World Congress, FM 2019, Porto, Portugal, October 7–11, 2019, Proceedings 3, pp. 670–686. Springer (2019)
54. Tran, H., et al.: NNV: the neural network verification tool for deep neural networks and learning-enabled cyber-physical systems. In: International Conference on Computer Aided Verification, pp. 3–17 (2020)
55. Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer, New York (2005)
56. Xiang, W., Shao, Z.: Approximate bisimulation relations for neural networks and application to assured neural network compression. In: 2022 American Control Conference (ACC), pp. 3248–3253. IEEE (2022)
57. Xiang, W., Shao, Z.: Safety verification of neural network control systems using guaranteed neural network model reduction. In: 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 1521–1526. IEEE (2022)
58. Xu, C., Xie, Y.: Conformal prediction interval for dynamic time-series. In: Proceedings of the 38th International Conference on Machine Learning, pp. 11559–11569. PMLR, July 2021. https://proceedings.mlr.press/v139/xu21h.html
59. Xue, B., Zhang, M., Easwaran, A., Li, Q.: Pac model checking of black-box continuous-time dynamical systems. IEEE Trans. Comput.-Aided Des. Integrated Circuits Syst. 39 (2020). https://doi.org/10.1109/TCAD.2020.3012251
60. Zarei, M., Wang, Y., Pajic, M.: Statistical verification of learning-based cyber-physical systems. In: Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control, HSCC 2020. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3365365.3382209
61. Zhang, M., Zhang, Y., Zhang, L., Liu, C., Khurshid, S.: Deeproad: Gan-based metamorphic testing and input validation framework for autonomous driving systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 132–142. ASE 2018. Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3238147.3238187
62. Wang, Y., Zhou, W., Fan, J., Wang, Z., Li, J., Chen, X., Huang, C., Li, W., Zhu, Q.: Polar-express: Efficient and precise formal reachability analysis of neural-network controlled systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2023)
63. Xin, L., Tang, Z., Gai, W., Liu, H.: Vision-based autonomous landing for the UAV: A review. Aerospace 9, 634 (2022)
64. Tang, C., Lai, Y.: Deep reinforcement learning automatic landing control of fixed-wing aircraft using deep deterministic policy gradient. In: 2020 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 1–9 (2020)
65. Oszust, M., et al.: A vision-based method for supporting autonomous aircraft landing. Aircraft Eng. Aerospace Technol. 90, 973–982 (2018)
66. Menghi, C., Nejati, S., Briand, L., Parache, Y.: Approximation-refinement testing of compute-intensive cyber-physical models: an approach based on system identification. In: 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE), pp. 372–384 (2020)
67. Geng, Y., Baldauf, J.B., Dutta, S., Huang, C., Ruchkin, I.: Bridging Dimensions: Confident Reachability for High-Dimensional Controllers. arXiv preprint arXiv:2311.04843 (2024). https://arxiv.org/abs/2311.04843

Acknowledgments

The authors thank Kang Gao, Zhenjiang Mao, Priyanshu Mathur, and Sukanth Sundaran for helping implement the verification and case studies as well as providing valuable feedback on this manuscript.

This work was supported in part by the NSF Grant CCF-2403616, ARO MURI W911NF-20-1-0080, and grant EP/Y002644/1 under the EPSRC ECR International Collaboration Grants program, funded by the International Science Partnerships Fund (ISPF) and the UK Research and Innovation. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF), Army Research Office (ARO), the Department of Defense, or the United States Government.

Author information

Authors and Affiliations

University of Florida, Gainesville, FL, USA
Yuang Geng, Jake Brandon Baldauf & Ivan Ruchkin
University of Pennsylvania, Philadelphia, PA, USA
Souradeep Dutta
University of Southampton, Southampton, UK
Chao Huang

Corresponding author

Correspondence to Yuang Geng.

Editor information

Editors and Affiliations

Karlsruhe Institute of Technology, Karlsruhe, Germany
André Platzer
Iowa State University, Ames, IA, USA
Kristin Yvonne Rozier
Politecnico di Milano, Milan, Italy
Matteo Pradella
Politecnico di Milano, Milan, Italy
Matteo Rossi

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Geng, Y., Baldauf, J.B., Dutta, S., Huang, C., Ruchkin, I. (2025). Bridging Dimensions: Confident Reachability for High-Dimensional Controllers. In: Platzer, A., Rozier, K.Y., Pradella, M., Rossi, M. (eds) Formal Methods. FM 2024. Lecture Notes in Computer Science, vol 14933. Springer, Cham. https://doi.org/10.1007/978-3-031-71162-6_20

DOI: https://doi.org/10.1007/978-3-031-71162-6_20
Published: 11 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71161-9
Online ISBN: 978-3-031-71162-6
eBook Packages: Computer Science, Computer Science (R0)
