NeuReach: Learning Reachability Functions from Simulations ?

. We present NeuReach , a tool that uses neural networks for predicting reachable sets from executions of a dynamical system. Unlike existing reachability tools, NeuReach computes a reachability function that outputs an accurate over-approximation of the reachable set for any initial set in a parameterized family. Such reachability functions are useful for online monitoring, veri(cid:12)cation, and safe planning. NeuReach implements empirical risk minimization for learning reachability functions. We discuss the design rationale behind the optimization problem and establish that the computed output is probably approximately correct. Our experimental evaluations over a variety of systems show promise. NeuReach can learn accurate reachability functions for complex nonlinear systems, including some that are beyond existing methods. From a learned reachability function, arbitrary reachtubes can be computed in milliseconds. NeuReach is available at https://github.com/sundw2014/NeuReach.


Introduction
Reachability has traditionally been a fundamental building block for verification, monitoring, and prediction, and it is finding ever-expanding set of applications in control of cyber-physical and autonomous systems [19,23]. Reachtubes cannot be computed exactly for general hybrid models, but remarkable progress over the past two decades have led to approximation algorithms for nonlinear and very high-dimensional linear models (See, for example, [11,18,5,3,25,12,1,26,34]). All of these algorithms and tools compute the reachtube from scratch, every time the algorithm is invoked for a new initial set X 0 , even if the system model does not change. This is a missed opportunity in amortizing the cost of reachability over multiple invocations. All the applications mentioned above, like verification, monitoring, and prediction, indeed use multiple reachtubes of the same system, but from different initial sets.
In this paper, we present NeuReach, a tool that learns a reachability function from executions of dynamical systems. With the learned reachability function, for every new initial set a corresponding reachtube can be computed quickly. To use NeuReach, the user has to implement a simulator function of the underlying dynamical (or hybrid) system for generating trajectories, and several other functions for sampling initial sets. As output, the tool will generate a function which can be serialized and stored for repeated use. This function takes as input a query which is an initial set X 0 and a time instant t, and outputs an ellipsoid, which is guaranteed to be an accurate over-approximation of the actual reachable set.
Formally, NeuReach solves a probabilistic variant of the well-studied reachability problem: the problem is to compute a reachability function R(·, ·) for a given model (or simulator), such that for any initial set X 0 and time t, the output of the function R(X 0 , t) is an over-approximation of the actual reachset from X 0 at time t. That is, R is computed once and for all-possibly with an expensive algorithm-and thereafter, for every new initial set X 0 and time t, the reachset over-approximation R(X 0 , t) is computed simply by calling R. Thus, it enables online and even real-time applications of reachset approximations.
NeuReach computes reachability functions using machine learning. We view this as a statistical learning problem where samples of the system's trajectories have to be used to learn a parameterized reachability function R θ (·, ·). Because the trajectory samples are the only requirements from the underlying dynamical system to run NeuReach, it can be applied to systems with or without analytical models. In this paper, we discuss how the above problem can be cast as an optimization problem. This involves carefully designing a loss function that penalizes error and conservatism of the reachability function. With this loss function, it becomes possible to solve the problem using empirical risk minimization and stochastic gradient descent. For the sake of justifying our design, we derive a theoretical guarantee on the sample complexity using standard statistical learning theory tools.
We evaluate NeuReach on several benchmark systems and compare it with DryVR [21] which also uses machine learning for single-shot reachset computations. Results show that, with the same training data, NeuReach generates more accurate and tighter reachsets. Using NeuReach we are able to check the key safety properties of the challenging F-16 benchmark presented in [28]. To our knowledge, this is the first successful verification of at least some scenarios in this benchmark. Furthermore, as expected, once R(·, ·) is computed, it can be invoked to rapidly compute reachsets for arbitrary X 0 and t. For example, estimating a reachset for an 8-dimensional dynamical system with an NN-controller only takes ∼ 0.3 milliseconds. This makes NeuReach attractive for online and real-time applications.

Contributions.
(1) We present a simple but effective and useful machinelearning algorithm for learning reachability functions from simulations. With the learned reachability function, accurate over-approximation of the reachable set for any initial set in a parameterized family can be quickly computed, which enables real-time safety check and online planning; (2) We derive a probably approximately correct (PAC) bound on the error of the learned reachability function (Theorem 1) using techniques in statistical learning theory; (3) We evaluate the proposed tool on several benchmark dynamical systems and compare it with another data-driven reachability tool. Experiments show that NeuReach can learn more accurate and tighter reachability functions for complex nonlinear and hybrid systems, including some that are beyond existing methods.

Related work
Reachability analysis for models with known dynamics. This category of approaches consider the reachability analysis of models with known dynamics (i.e., white-box models). This is an active research area, and there is an extensive body of theory and tools on this topic [11,2,15,5,25,33,27,16,38,10,39]. Reachability analysis is hard in general. Exact reachability is undecidable even for deterministic linear and rectangular models [29,24]. For dynamical models described with ordinary differential equations (ODE), Hamilton-Jacobi-Bellman (HJB) equations can be used to derive the exact reachable sets [30,6,7]. An HJB equation is a partial differential equation (PDE). Solutions of this PDE defines the reachabiltiy of the underlying dynamical system. However, solving HJB equations is difficult, and such approaches do not scale to high-dimensional systems. In practice, the exact reachable set might be unnecessary. For example, over-approximations of the reachable sets could suffice for safety check purpose. To this end, many approaches and tools have been developed. For example, Flow * [11] uses the technique of Taylor model integration to compute overapproximations of the solution of an ODE.
Another series of work [22,20] leverage the sensitivity analysis of ODE to bound the discrepancy of solutions starting from a small initial set, and thus can compute an over-approximation of the exact reachable set. In [12], a Lagrangianbased algorithm is proposed, which makes use of the Cauchy-Green stretching factor derived from an over-approximation of the gradient of the solution-flows of an ODE. All of the above approaches consider set-based reachability analysis.
Data-driven reachability analysis. In the cases where the exact dynamics of the systems is unknown or partially known, the above approaches cannot be applied. One straight-forward direction is to learn the reachability from behaviors [42] of the dynamical system. Several approaches have been proposed for reachability only using simulations of the underlying system. These approaches include scenario optimization [14,44], sensitivity analysis [21], Gaussian processes [13], adversarial sampling [32,9], etc.
NeuReach falls in the category of approaches that use randomized algorithms for reachability analysis of deterministic (and not stochastic) systems. Another member in this category is the scenario optimization approach presented in [14]. Different from NeuReach, this method learns a single reachset for a fixed initial set and time interval instead of a mapping from arbitrary initial sets and time to the reachsets. Another approach based on scenario optimization is presented in [44]. This method computes a fixed-width reachtube by learning a function of time to represent the central axis of the reachtube. Moreover, it uses polynomials with handcrafted feature vectors for learning, which requires case-by-case design and fine-tuning. In contrast, our method learns a more flexible reachability function using neural networks and avoids the use of handcrafted feature vectors. DryVR [21] computes the reachtubes based on sensitivity analysis. It first learns a sensitivity function with theoretical guarantees, and then uses it to compute the reachset. Among all these tools or methods, we found that DryVR is the only one that has a publicly available implementation. Thus, we compared NeuReach with DryVR.
Neural networks for reachability analysis. Applications of machine learning with neural networks for reachability and monitoring has become an active research area. The approach in [23] aims to learn the reachtube from data using neural networks, with a focus in motion planning. Unlike NeuReach, this approach learns the dynamics of the reachtube, and the reachtube can be obtained by integrating that dynamics. In [30,7], neural networks are used as a PDE solver to approximate the solution of HJB equations. The approach in [36] makes use of neural networks to approximate the reachability of dynamical systems with control input. In [38,10], the authors develop a framework for runtime predictive monitoring of hybrid automata using neural networks and conformal prediction.

Problem setup and an overview of the tool
NeuReach works with deterministic dynamical systems. The state of the system is denoted by x ∈ X ⊆ R n . We assume that we have access to a simulator function ξ : X × R ≥0 → X that generates trajectories of the system up to a time bound T . That is, given an initial state x 0 ∈ X and a time instant t ∈ [0, T ], ξ(x 0 , t) is the state at time t. 1 Consider the evolution of the system from a set of initial states (initial set) X 0 ⊂ X . Lifting the notation of ξ to sets, we write the reachset from X 0 as ξ(X 0 , t) := ∪ x0∈X0 ξ(x 0 , t). In general, ξ(X 0 , t) cannot be computed precisely, and thus, we resort to over-approximations of ξ(X 0 , t) which are usually sufficient for verification and monitoring of safety and general temporal logic requirements, and also for planning. Beyond computing over-approximations of ξ(X 0 , t) for a single X 0 and t, we are interested in finding a reachability function R : 2 X × [0, T ] → 2 X such that, ideally, ξ(X 0 , t) ⊆ R(X 0 , t) for all valid X 0 and t. NeuReach implements a solution to this problem which provides a probabilistic version of the above guarantee with some restrictions on the shape of the initial set X 0 .
In order to discuss the error of a reachability function R, we have to assume that its arguments X 0 and t are independently chosen according to some dis-tributions P 1 and P 2 , i.e. X 0 ∼ P 1 and t ∼ P 2 . Also, we need a distribution function D(·) such that D(X 0 ) is distribution over X 0 . For example, D(X 0 ) could be the uniform distribution over X 0 . Given these distributions, the error of a reachability function is defined as: (1) Here, we assume that the joint distribution of (X 0 , t, x 0 ) is defined on the Borel σ−algebra such that any Borel set is measurable. Given the fact that R is continuous 2 and ξ as the trajectory of a dynamical system is at least piece-wise continuous, the set of all tuples (X 0 , t, x 0 ) that satisfy ξ(x 0 , t) / ∈ R(X 0 , t) must be a Borel set, and thus is measurable. Therefore, the above probability is well defined.
User interface and data representation. P 1 , P 2 and D are specified by the user as functions generating samples (explained below). The input and output of the reachability function R(X 0 , t) involve infinite objects, and in order to learn R, first, we need some finite representations of these objects. In NeuReach, X 0 is picked from a user-specified family of sets where each set can be represented by a finite number of parameters. For example, X 0 could be a ball and represented by two parameters -center and radius. From here on, we will not distinguish between X 0 and its parameterized representation. Similarly, the reachset R(X 0 , t) also needs a representation. NeuReach represents the reachsets with ellipsoids. Given a vector x 0 ∈ R n and a matrix C ∈ R n×n , the set E(x 0 , C) := {x ∈ R n : C · (x − x 0 ) 2 ≤ 1} is an ellipsoid . Thus, given the center, an ellipsoid can be represented by an n × n matrix.
In order to use NeuReach, the user has to implement the following functions.
(i) sample X0(): Produces a random initial set X 0 from a distribution P 1 . Specifically, the parameterized representation of X 0 is returned. (ii) sample t(): Produces a random sample of t from a distribution P 2 . (iii) sample x0(X0): Takes an initial set X 0 , and produces a random sample of x 0 ∈ X 0 according to a distribution D(X 0 ). (iv) simulate(x0): Takes an initial state x 0 and generates a finite trajectory ξ(x 0 , ·) which is a sequence of states at some time instants. The user should make sure that for every time instant returned by sample t(), a state corresponding to it can be found in the simulated trajectory. (v) get init center(X0): Takes an initial set X 0 and returns E [D(X 0 )] := , which is the mean value of x over the initial states. Given these functions, NeuReach computes a reachability function R with an error guarantee (Theorem 1). The reachset R(X 0 , t) is an ellipsoid centered at ξ(E [D(X 0 )] , t). As the output, NeuReach will generate a Python function R(X0, t). This function can be serialized and stored on disk for future use. When calling this function, the user provides the initial set X 0 and t, and then an n × n matrix representing the shape of the ellipsoid will be returned.

Design of NeuReach: Learning reachability functions
We present the design rationale behind NeuReach and discuss the learning algorithm it implements. We show that standard results in statistical learning theory give a probabilistic guarantee on the error of the learned reachability function.

Reachability with Empirical Risk Minimization
The basic idea is to model the reachset R(X 0 , t) as an ellipsoid around ξ(E [D(X 0 )] , t). As stated earlier, given the center, an n-dimensional ellipsoid can be represented by an n × n matrix. Thus, learning the set-valued reachability function R(X 0 , t) becomes the problem of learning a matrix-valued function C(X 0 , t) that describes the shape of the set. We represent function C using parametric models, such as neural networks. Let us denote this parametric, matrix-valued function by C θ , where θ ∈ W ⊆ R p is the vector of parameters. The parameter θ could be, for example, a scalar representing a coefficient of a polynomial, a vector representing weights of a neural network, etc. Thus, the parametric reachability function is: To simplify the notations, for X = (X 0 , t, x 0 ) and parameter θ, we define a function g θ (X) : For a particular sample X and a parameter θ, if g θ (X) ≤ 1, then ξ(x 0 , t) ∈ R θ (X 0 , t), otherwise it is outside and contributes to the error. The goal of our learning algorithm is to find a θ to minimize the error of the resulting reachability function R θ , which gives the following optimization problem: where I (·) is the indicator function. In order to solve the above optimization problem using empirical risk minimization, we consider the following setup. First, a training set is constructed. We denote a training set with N samples by ) are independently drawn from the data distribution defined by X 0 ∼ P 1 , t ∼ P 2 , x 0 ∼ D(X 0 ). The empirical loss on S for a parameter θ is where (x) := max{0, x α + 1} is the hinge loss function with the hyper-parameter α > 0, which is a soft proxy for the indicator function. Therefore, the empirical loss L ERM is a soft, empirical proxy of the actual error as defined in Equation (1).  In addition to minimizing the empirical loss, we would also like the overapproximation of the reachset to be as tight as possible. Thus, the volume of the ellipsoid should be penalized. Inspired by [14], we use − log(det(C C)) as a proxy of the volume of an ellipsoid E(x 0 , C), and the following regularization term is added to penalize large ellipsoids.
Combining the two terms, we define the overall optimization problem: where λ is a hyper-parameter balancing two loss terms.
Machine learning setup. The training set is constructed as follows. First, we sample N X0 initial sets by calling sample X0(). Then, for each initial set, we sample N x0 initial states from it using sample x0(X0) and then get N x0 trajectories by calling simulate(x0). Finally, for each trajectory, we sample N t time instants by calling sample t(). Thus, the resulting training set contains N := N X0 ×N x0 ×N t samples, but generating such a training set only needs N X0 × N x0 trajectory simulations. NeuReach implements the optimization problem of Equation (4) in Pytorch [37] and solves it with stochastic gradient descent. By default, a three-layer neural network is used to represent C θ . For n-dimensional reachsets, the number of neurons in each layer are L 1 , L 2 , and n 2 , where L 1 and L 2 can be specified by the user. The output vector of the neural network is then reshaped to be an n × n matrix. By default, we set α = 0.001 and λ = 0.03. The neural network is trained for 30 epochs with a learning rate of 0.01. Hyperparameters including learning rate, α, λ, and size of the training set can be easily changed via the user interface as shown in Table 1.

Probabilistic Correctness of NeuReach
The following theorem shows that the error of the learned reachability function Rθ can be bounded. Specifically, the difference between the error and the empirical loss is O( 1 N ), where N is the size of the training set.
Theorem 1. For any > 0, and a random training set S with N i.i.d. samples, with probability at least 1 − 2 exp(−2N 2 ), the following inequality holds, where p is the number of parameters, i.e. θ ∈ R p , and˜ (·) = min{1, (·)} is the truncated hinge loss, and L g is the Lipschitz constant of g θ w.r.t. θ.
Theorem 1 shows that by controlling and N , the actual error EX I gθ(X) − 1 > 0 can be made arbitrarily close to the empirical loss 1 N N i−1˜ (gθ(X i )−1), with arbitrarily high probability. The empirical loss on the training set S can be made very small in practice due to the high capacity of the neural network. Of course, there is no free lunch, in general. In order to drive the empirical loss to 0, we might have to increase the number of parameters, which in turn increases the term 12 α L g p N . Furthermore, the hyper-parameter λ also affects the empirical loss. A smaller λ results in lower empirical loss but more conservative reachsets. Actually, conservatism and accuracy are conflicting requirements. As shown in [21], when using reachability to verify safety, accuracy determines the soundness of the verification, while conservatism influences the sample efficiency. We wanted to focus more on soundness than on efficiency. Thus, a theoretical guarantee is derived for accuracy but not for conservatism.
Proof. Starting from the left hand side and using the definition of hinge loss, we get E X I gθ(X) − 1 > 0 ≤ E X ˜ (gθ(X) − 1) . By adding and subtracting the empirical loss term, we get: where the inequality follows from the definition of supremum.
i.e. the worstcase difference between the empirical average and the expectation of the loss. Note that V is a random quantity since S = {X i } N i=1 is random. Next, we derive an upper bound on V that holds with high probability.
Notice that V is the worst-case (among all f θ ∈ F ) gap between the expectation and the empirical average of f θ (X). A fundamental result in PAC learning (Theorem 3.3 in [35]) shows that this gap can be bounded as where Rad(F (S)) is the Rademacher complexity [35] of F (S). Furthermore, notice that F (S) can be generated from G(S) by shifting it and composing it with˜ . It follows from Talagrand's contraction lemma [31] that α is the Lipschitz constant. Finally, following from a conclusion on Rademacher complexity of Lipschitz parameterized function classes (See page 13 in [8]), we get E S [Rad(G(S))] ≤ 3L g p N . Therefore, we get Then, applying McDiarmid's inequality [35] gives a high-probability bound on V. That is, Together with Eq. (6), we have V ≤ E [V] + ≤ 12 α L g p N + with probability at least 1 − 2 exp(−2N 2 ). This implies with probability at least 1 − 2 exp(−2N 2 ), which completes the proof.

Experimental evaluation
We evaluated NeuReach on several benchmark systems including the Van der Pol oscillator, the Moore-Greitzer model of a jet engine, an 8-dimensional quadrotor controlled by a neural network [40], and an F-16 Ground Collision Avoidance system [28]. We also compare our method with DryVR [21]. Since NeuReach is fully data-driven and does not rely on the analytical model of the system, it would not make sense to compare against model-based methods like Hamilton-Jacobi reachability analysis [6], Flow * [11], C2E2 [18], or SReach [41]. Some of our benchmarks cannot be handled by these tools. Also, once the reachability function is learned, many reachsets can be computed very quickly by our method. Given that other tools need to compute the reachset from scratch for each new query, comparisons based on running times, would not make sense either.

Benchmark systems
The simulators available for the benchmark systems allow us to specify fixed time-steps ∆t and a time bound T . As for the distribution P 2 , we adopt the uniform distribution, i.e. P 2 = Unif({∆t, 2∆t, · · · , T ∆t ∆t}) (Recall, the definition of this distribution in Section 3). For a given initial set X 0 , D(X 0 ) is defined as the uniform distribution on the boundary of X 0 . As shown in Corollary 1 of [43], the boundary of the reachable set of an initial set is equal to the reachable set of the initial set's boundary for ODEs. That is, if the estimated reachable set contains the reachable set of the initial set's boundary, it automatically contains that of the interior. Thus, we only sample points on the boundary of X 0 to improve sample efficiency. As for the distribution P 1 , we will give details for each benchmark below.
Van der Pol oscillator is a widely used 2-dimensional nonlinear model. An initial set X 0 is a ball centered at c with radius r. The distribution P 1 for choosing  [28] is a challenging benchmark for formal analysis tools. This system consists of 16 state variables (See Table 1 in [28]) among which Vt and alt are air speed and altitude. The key safety property of interest is ground collision avoidance, and therefore, in our experiments we focus on estimating the reachset only for Vt and alt. We consider initial uncertainty in up to 6 state variables, [Vt, α, φ, ψ, Q, alt]. The function simulate(x0) is designed to return projections of trajectories to Vt and alt, while sample X0() returns 6-dimensional initial sets. We restrict the initial set to be hyper-rectangles as in [28]. An initial set X 0 is determined by a center c ∈ R 6 and a radius r ∈ R 6 with X 0 = {x ∈ R 6 : c − r ≤ x ≤ c + r}. However, DryVR does not support hyper-rectangles as initial sets. Thus, we also use another setting for comparison where the initial sets are balls. To do this, we sample balls from a cube with c ∼ Unif([−1, 1] × · · · × [−1, 1]) and r ∼ Unif([0, 0.5]). Then, we transform this ball to the original coordinate system by scaling each dimension. This setting is shown in Fig. 2 (Left) as F-16 (Spherical).

F-16 Ground Collision Avoidance System
Quadrotor controlled by a neural controller is based on [40]. The state of the quadrotor system is x = [p x , p y , p z , v x , v y , v z , θ x , θ y ], and the control input is u := [a z , ω x , ω y ]. We are only interested in estimating the reachability of the position variables, i.e., the first 3 dimensions of the state vector. We use balls for the initial sets with c ∼ Unif([−1, 1] × · · · × [−1, 1]) and r ∼ Unif([0, √ 8]). The time bound is set to T = 10, and time step is ∆t = 0.05.

Experimental results
Evaluation metrics. In order to evaluate the learned reachability function, we randomly sample 10 initial sets for testing. For each initial set X 0 , we then sample 100 trajectories starting from it. For every sampled time instant on the sampled trajectories, we check whether the state is contained in the estimated reachset and compute the empirical error (i.e., the frequency that a sample is not in the estimated reachset). In order to evaluate the conservatism of the over-approximations, we also compare the size of the over-approximations. For each initial set X 0 , we compute the total volume of the over-approximations R(X 0 , t i ) where t i = ∆t, 2∆t, · · · , T ∆t ∆t. Then, the total volume averaged over 10 sampled initial sets are reported. Results are summarized in Figure 2 (Left). Please note that we use the default settings in Table 1 for all benchmarks.
All experiments were conducted on a Linux workstation with two Xeon Silver 4110 CPUs and 32 GB RAM. As shown in Figure 2 (Left), NeuReach learns an accurate reachability function for each benchmark. Please note that due to the complicated dynamics and the neural controller, the F-16 model and the quadrotor are beyond the reach of current model-based tools. As shown in Figure 1 (Right), NeuReach successfully verified the safety of the F-16 model. Comparison with DryVR. DryVR [21] computes reachsets for spherical initial sets by learning a piece-wise exponential discrepancy (PED) function that bounds the sensitivity of the trajectories to the initial state. This function is of the form: where r is the radius of the initial set, [t i−1 , t i ] is the i-th time interval, and K, γ are learned parameters. For an spherical initial set X 0 = B(c, r), the computed reachset is R(B(c, r), t) := B(ξ(E [D(B(c, r))] , t), β(r, t)), where B(c, r) is a ball centered at c with radius r. It is important to recall that, similar to other reachability tools, for every new initial set X 0 , DryVR computes the PED function and the reachset from scratch. For a fair comparison, we compute the parameters K and γ on the exact same training set as the one used in NeuReach and reuse the resulting PED for further queries.
Accuracy and conservatism. As shown in Figure 2 (Left), the reachsets estimated by NeuReach are tighter and more accurate than those computed by DryVR. There are two reasons for this. First, DryVR uses piece-wise exponential functions to capture the relationship between the initial radius and the radius at time t, while NeuReach uses more expressive neural networks. Second, the use of ellipsoids allows coordinate-specific accuracy. As seen in Figure 1, the reachset of JetEngine is not a perfect circle even if the initial set is a circle. Ellipsoids can approximate the actual reachsets better.
Running time. As expected, the training phase of NeuReach takes several minutes, but once a reachability function has been learned, computation of the reachset from a new initial set is very fast. For the quadrotor system, for example, this takes ∼ 0.3 ms on the aforementioned workstation. We believe that this makes NeuReach suitable for online safety checking and motion planning.
Impact of the hyper-parameter λ. λ influences the error and volume of the reachsets computed by NeuReach. Figure 2 (Right) shows the result of running NeuReach on JetEngine with different settings of λ. As expected, larger λ results in smaller reachsets but hurts the accuracy. On the other hand, we do not need to tune λ case by case. Note that we use λ = 0.03 for all the results in Figure 2 (Left), and it works reasonably well for all our benchmarks.

Conclusion
In this paper, we presented a tool for computing reachability of systems using machine learning. NeuReach can learn accurate reachability functions for complex nonlinear systems, including some that are beyond existing methods. From a learned reachability function, arbitrary reachtubes can be computed in milliseconds. There are several limitations in the current implementation of NeuReach. First, the simulator is assumed to be deterministic-this can be too restrictive for autonomous systems with complex perception and vehicle models. We plan to extend the theory and implementation to support more general simulators. Secondly, the over-approximations are restricted to be represented as ellipsoids.
Other representations will be supported in the future.