Scenic: a language for scenario specification and data generation

We propose a new probabilistic programming language for the design and analysis of cyber-physical systems, especially those based on machine learning. We consider several problems arising in the design process, including training a system to be robust to rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs, then sampling these to generate specialized training and test data. More generally, such languages can be used to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems such as autonomous cars and robots, whose environment at any point in time is a scene, a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing scenarios that are distributions over scenes and the behaviors of their agents over time. Scenic combines concise, readable syntax for spatiotemporal relationships with the ability to declaratively impose hard and soft constraints over the scenario. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic’s domain-specific syntax. Finally, we apply Scenic in multiple case studies for training, testing, and debugging neural networks for perception both as standalone components and within the context of a full cyber-physical system.

Fig. 1: Three scenes generated from a single ∼20-line Scenic program representing bumper-to-bumper traffic.

Introduction
Machine learning (ML) is increasingly used in safety-critical applications, thereby creating an acute need for techniques to gain higher assurance in ML-based systems [51,53,1]. ML has proved particularly effective at the difficult perceptual tasks (e.g., vision) arising in cyber-physical systems like autonomous vehicles which operate in heterogeneous, complex physical environments. Thus, there is a pressing need to tackle several important problems in the design of such ML-based cyber-physical systems, including:
• training the system to be robust, correctly responding to events that happen only rarely;
• testing the system under a variety of conditions, especially unusual ones; and
• debugging the system to understand the root cause of a failure and eliminate it.
The traditional ML approach to these problems is to gather more data from the environment, retraining the system until its performance is adequate. The major difficulty here is that collecting real-world data can be slow and expensive, since it must be preprocessed and correctly labeled before use. Furthermore, it may be difficult or impossible to collect data for corner cases that are rare and even dangerous but nonetheless necessary to train and test against: for example, a car accident. As a result, recent work has investigated training and testing systems with synthetically generated data, which can be produced in bulk with correct labels and gives the designer full control over the distribution of the data [28,27,57,30].
A challenge to the use of synthetic data is that it can be highly non-trivial to generate meaningful data, since this usually requires modeling complex environments [53]. Suppose we wanted to train a neural network on images of cars on a road. If we simply sampled uniformly at random from all possible configurations of, say, 12 cars, we would get data that was at best unrealistic, with cars facing sideways or backward, and at worst physically impossible, with cars intersecting each other. Instead, we want scenes like those in Fig. 1, where the cars are laid out in a consistent and realistic way. Furthermore, we may want scenes that are not only realistic but represent particular scenarios of interest for training or testing, e.g., parked cars, cars passing across the field of view, or bumper-to-bumper traffic as in Fig. 1. In general, we need a way to guide data generation toward scenarios that make sense for our application.
We argue that probabilistic programming languages (PPLs) [26] provide a natural solution to this problem. Using a PPL, the designer of a system can construct distributions representing different input regimes of interest, and sample from these distributions to obtain concrete inputs for training and testing. More generally, the designer can model the system's environment, with the program becoming a specification of the distribution of environments under which the system is expected to operate correctly with high probability. Such environment models are essential for any formal analysis: in particular, composing the system with the model, we obtain a closed program which we could potentially prove properties about to establish the correctness of the system.
In this paper, we focus on designing and analyzing ML-based cyber-physical systems. We refer to the environment of such a system at any point in time as a scene, a configuration of objects in space (including dynamic agents, such as vehicles) along with their features. We develop a domain-specific scenario description language, Scenic, to specify such environments. Scenic is a probabilistic programming language, and a Scenic scenario defines a distribution over both scenes and the behaviors of the dynamic agents in them over time. As we will see, the syntax of the language is designed to simplify the task of writing complex scenarios, and to enable the use of specialized sampling techniques. In particular, Scenic allows the user to both construct objects in a straightforward imperative style and impose hard and soft constraints declaratively. It also provides readable, concise syntax for spatial and temporal relationships: constructs for common geometric relationships that would otherwise require complex non-linear expressions and constraints, as well as temporal constructs like interrupts for building complex dynamic behaviors in a modular way. In addition, Scenic provides a notion of classes allowing properties of objects to be given default values depending on other properties: for example, we can define a Car so that by default it faces in the direction of the road at its position. More broadly, Scenic uses a novel approach to object construction which factors the process into syntactically-independent specifiers which can be combined in arbitrary ways, mirroring the flexibility of natural language. Finally, Scenic provides constructs to generalize simple scenarios by adding noise or by composing multiple scenarios together. The variety of constructs in Scenic makes it possible to model scenarios anywhere on a spectrum from concrete scenes (i.e. individual test cases) to extremely broad classes of abstract scenarios (see Fig. 2). 
A scenario can be reached by moving along the spectrum from either end: the top-down approach is to progressively constrain a very general scenario, while the bottom-up approach is to generalize from a concrete example (such as a known failure case), for example by adding random noise. Probably most usefully, one can write a scenario in the middle which is far more general than simply adding noise to a single scene but has much more structure than a completely random scene: for example, the traffic scenario depicted in Fig. 1. We will illustrate all three ways of developing a scenario, which as we will see are useful for different training, testing, and debugging tasks.
Generating scenarios from a Scenic program requires sampling from the probability distribution it implicitly defines. This task is closely related to the inference problem for imperative PPLs with observations [26]. While Scenic could be implemented as a library on top of such a language, we found that clarity and concision could be significantly improved with new syntax (specifiers and interrupts in particular) difficult to implement as a library. Furthermore, while Scenic could be translated into existing PPLs, using a new language allows us to impose restrictions enabling domain-specific sampling techniques not possible with general-purpose PPLs. In particular, we develop algorithms which take advantage of the particular structure of distributions arising from Scenic programs to dramatically prune the sample space.
We also integrate Scenic as the environment modeling language for VerifAI, a tool for the formal design and analysis of AI-based systems [8]. VerifAI allows writing system-level specifications in Metric Temporal Logic [33] and performing falsification, running simulations and monitoring for violations of the specifications. VerifAI provides several search techniques, including active samplers that use feedback from earlier simulations to try to drive the system towards violations. We make these techniques available from Scenic using syntax to define external parameters which are sampled by VerifAI or another external tool. Such parameters need not have a fixed distribution of values: in particular, we can define a prior distribution, but then use cross-entropy optimization [50] to drive the distribution towards one that is concentrated on values that tend to lead to system failures [15].
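To make the idea of driving a prior towards failure-inducing values concrete, the following is a minimal sketch of the generic cross-entropy method over a single real-valued parameter. It is not VerifAI's sampler: the function names, the Gaussian parameterization, and the toy score function (which pretends failures cluster around a distance of 7) are all assumptions made for illustration.

```python
import random
import statistics

def cross_entropy_search(score, mean, std, rounds=20, samples=100,
                         elite_frac=0.1, seed=0):
    """Generic cross-entropy method over one real parameter.

    `score` should return higher values for inputs closer to a failure.
    Starting from the prior N(mean, std), each round refits the Gaussian
    to the best-scoring fraction of the samples.
    """
    rng = random.Random(seed)
    n_elite = max(2, int(samples * elite_frac))
    for _ in range(rounds):
        draws = [rng.gauss(mean, std) for _ in range(samples)]
        elite = sorted(draws, key=score, reverse=True)[:n_elite]
        mean = statistics.fmean(elite)
        std = max(statistics.stdev(elite), 1e-3)  # floor keeps sampling alive
    return mean, std

# Hypothetical stand-in for a simulation result: failures cluster near 7.
def closeness_to_failure(distance):
    return -abs(distance - 7.0)

mean, std = cross_entropy_search(closeness_to_failure, mean=20.0, std=10.0)
```

In a real setting the score would come from running a simulation and monitoring the specification, which is far more expensive than the toy function used here.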
We demonstrate the utility of Scenic in training, testing, and debugging ML-based cyber-physical systems. Our first case study is on SqueezeDet [61], a convolutional neural network for object detection in autonomous cars. For this task, it has been shown [30] that good performance on real images can be achieved with networks trained purely on synthetic images from the video game Grand Theft Auto V (GTAV [47]). We implemented a sampler for Scenic scenarios, using it to generate scenes which were rendered into images by GTAV. Our experiments demonstrate using Scenic to:
• evaluate the accuracy of the ML model under particular conditions, e.g. in good or bad weather;
• improve performance in corner cases by emphasizing them during training: we use Scenic to both identify a deficiency in a state-of-the-art car detection data set [30] and generate a new training set of equal size but yielding significantly better performance; and
• debug a known failure case by generalizing it in many directions, exploring sensitivity to different features and developing a more general scenario for retraining: we use Scenic to find an image the network misclassifies, discover the root cause, and fix the bug, in the process improving the network's performance on its original test set (again, without increasing training set size).
These experiments show that Scenic can be a very useful tool for understanding and improving perception systems. While this case study is performed in the domain of visual perception for autonomous driving, and uses one particular simulator (GTAV), we stress that Scenic is not specific to either. In Sec. 3 we give an example of a different domain, namely robotic motion planning (using the Webots simulator [40]), and in Sec. 6.2.2 we use Scenic and VerifAI to falsify an autonomous agent in the CARLA driving simulator [7]. The latter experiment demonstrates Scenic's usefulness applied not only to perception components in isolation but to entire closed-loop cyber-physical systems. In fact, since the conference version of this paper we have successfully applied Scenic in two industrial case studies on large ML-based systems [15,19]: an aircraft navigation system from Boeing (tested in the X-Plane flight simulator [35]) and the Apollo autonomous driving platform [3] (tested in the LGSVL driving simulator [48] and on an actual test track). Generally, Scenic can produce data of any desired type (e.g. RGB images, LIDAR point clouds, or trajectories from dynamical simulations) by interfacing it to an appropriate simulator. This requires only two steps: (1) writing a small Scenic library defining the types of objects supported by the simulator, as well as the geometry of the workspace; (2) writing an interface layer converting the configurations output by Scenic into the simulator's input format (and, for dynamic scenarios, transferring simulator state back into Scenic). While the current version of Scenic is primarily concerned with geometry, leaving the details of rendering up to the simulator, the language allows putting distributions on any parameters the simulator exposes: for example, in GTAV the meshes of the various car models are fixed but we can control their overall color.
We have also used Scenic to specify distributions over parameters on system dynamics, such as mass.
In summary, the main contributions of this work are:
- Scenic, a domain-specific probabilistic programming language for describing scenarios: distributions over spatio-temporal configurations of physical objects and agents;
- a methodology for using PPLs to design and analyze cyber-physical systems, especially those based on ML;
- domain-specific algorithms for sampling from the distribution defined by a Scenic program;
- a case study using Scenic to analyze and improve the accuracy of a practical deep neural network used for perception in an autonomous driving context beyond what is achieved by state-of-the-art synthetic data generation methods.
The paper is structured as follows: we begin with an overview of our approach in Sec. 2. Section 3 gives examples highlighting the major features of Scenic and motivating various choices in its design. In Sec. 4 we describe the Scenic language in detail, and in Sec. 5 we discuss its formal semantics and our sampling algorithms. Section 6 describes the setup and results of our car detection case study and other experiments. Finally, we discuss related work in Sec. 7 and conclude in Sec. 8 with a summary and directions for future work.
An early version of this paper appeared as [14], extended and published as [17]. This paper further extends [17] by generalizing Scenic to dynamic scenarios (including new spatiotemporal pruning techniques), adding constructs for composing scenarios, and integrating Scenic within the broader VerifAI toolkit.

Using PPLs to Design and Analyze ML-Based Cyber-Physical Systems
We propose a methodology for training, testing, and debugging ML-based cyber-physical systems using probabilistic programming languages. The core idea is to use PPLs to formalize general operation scenarios, then sample from these distributions to generate concrete environment configurations. Putting these configurations into a simulator, we obtain images or other sensor data which can be used to test and train the system. The general procedure is outlined in Fig. 3. For a demonstration of this paradigm on an industrial system, proceeding from falsification through failure analysis, retraining, and validation, see [15]. Note that the training/testing datasets need not be purely synthetic: we can generate data to supplement existing real-world data (possibly mitigating a deficiency in the latter, while avoiding overfitting). Furthermore, even for models trained purely on real data, synthetic data can still be useful for testing and debugging, as we will see below. Now we discuss the three design problems from the Introduction in more detail.
Testing under Different Conditions. The most straightforward problem is that of assessing system performance under different conditions. We can simply write scenarios capturing each condition, generate a test set from each one, and evaluate the performance of the system on these. Note that conditions which occur rarely in the real world present no additional problems: as long as the PPL we use can encode the condition, we can generate as many instances as desired. If we do not have particular conditions in mind, we can write a very general scenario describing the expected operation regime of the system (e.g., the "Operational Design Domain" (ODD) of an autonomous vehicle [56]) and perform falsification, looking for violations of the system's specification.
Training on Rare Events. Extending the previous application, we can use this procedure to help ensure the system performs adequately even in unusual circumstances or particularly difficult cases. Writing a scenario capturing these rare events, we can generate instances of them to augment or replace part of the original training set. Emphasizing these instances in the training set can improve the system's performance in the hard case without impacting performance in the typical case. In Sec. 6.3 we will demonstrate this for car detection, where a hard case is when one car partially overlaps another in the image. We wrote a Scenic program to generate a set of these overlapping images. Training the car-detection network on a state-of-the-art synthetic dataset obtained by randomly driving around inside the simulated world of GTAV and capturing images periodically [30], we find its performance is significantly worse on the overlapping images. However, if we keep the training set size fixed but increase the proportion of overlapping images, performance on such images dramatically improves without harming performance on the original generic dataset.
Debugging Failures. Finally, we can use the same procedure to help understand and fix bugs in the system. If we find an environment configuration where the system fails, we can write a scenario reproducing that particular configuration. Having the configuration encoded as a program then makes it possible to explore the neighborhood around it in a variety of different directions, leaving some aspects of the scene fixed while varying others. This can give insight into which features of the scene are relevant to the failure, and eventually identify the root cause. The root cause can then itself be encoded into a scenario which generalizes the original failure, allowing retraining without overfitting to the particular counterexample. We will demonstrate this approach in Sec. 6.4, starting from a single misclassification, identifying a general deficiency in the training set, replacing part of the training data to fix the gap, and ultimately achieving higher performance on the original test set.
For all of these applications we need a PPL which can encode a wide range of general and specific environment scenarios. In the next section, we describe the design of a language suited to this purpose.

The Scenic Language
We use Scenic scenarios from our autonomous car case study to motivate and illustrate the main features of the language, focusing on features that make Scenic particularly well-suited for the domain of specifying scenarios for cyber-physical systems. We begin by describing how Scenic can define spatial relationships between objects to model scenarios like "a badly-parked car", moving on to temporal relationships for dynamic scenarios like "a badly-parked car, which pulls into the road as you approach". Finally, we outline Scenic's support for composing multiple scenarios together to produce more complex ones.

Basic Scenarios
Classes, Objects, Geometry, and Distributions. To start, suppose we want scenes of one car viewed from another on the road. We can simply write:

    from scenic.simulators.gta.model import *
    ego = Car
    Car

First, we import Scenic's world model for the GTAV simulator: a Scenic library containing everything specific to our case study, including the class Car and information about the locations of roads (from now on we suppress this line). Only general geometric concepts are built into Scenic.
The second line creates a Car and assigns it to the special variable ego specifying the ego object which is the reference point for the scenario. In particular, rendered images from the scenario are from the perspective of the ego object (it is a syntax error to leave ego undefined). Finally, the third line creates an additional Car. Note that we have not specified the position or any other properties of the two cars: this means they are inherited from the default values defined in the class Car. Object-orientation is valuable in Scenic since it provides a natural organizational principle for scenarios involving different types of physical objects. It also improves compositionality, since we can define a generic Car model in a library like the GTAV world model and use it in different scenarios. Our definition of Car begins as follows (slightly simplified):

    class Car:
        position: Point on road
        heading: roadDirection at self.position

Here road is a region (one of Scenic's primitive types) defined in the GTAV world model to specify which points in the workspace are on a road. Similarly, roadDirection is a vector field specifying the prevailing traffic direction at such points. The operator F at X simply gets the direction of the field F at point X, so the default value for a car's heading is the road direction at its position. The default position, in turn, is a Point on road (we will explain this syntax shortly), which means a uniformly random point on the road.
The ability to make random choices like this is a key aspect of Scenic. Scenic's probabilistic nature allows it to model real-world stochasticity, for example encoding a distribution for the distance between two cars learned from data. This in turn is essential for our application of PPLs to training perception systems: using randomness, a PPL can generate training data matching the distribution the system will be used under. Scenic provides several basic distributions (and allows more to be defined). For example, we can write

    Car offset by (Range(-10, 10), Range(20, 40))

to create a car that is 20-40 m ahead of the camera. The notation Range(X, Y) creates a uniform distribution over the given continuous range, and (X, Y) creates a pair, interpreted here as a vector given by its xy coordinates.
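The geometry behind such an offset can be sketched in plain Python. The code below is only an illustration: the function names are hypothetical, and it assumes that a heading of 0 faces North (+y) and increases counterclockwise, with the local x axis pointing to the right of the ego and the local y axis pointing ahead.

```python
import math
import random

def offset_by(ego_position, ego_heading, offset):
    """Convert an offset in the ego's local frame to global coordinates.

    Assumes heading 0 faces +y (North) and increases counterclockwise;
    local x points to the right of the ego and local y points ahead.
    """
    dx, dy = offset
    c, s = math.cos(ego_heading), math.sin(ego_heading)
    return (ego_position[0] + c * dx - s * dy,
            ego_position[1] + s * dx + c * dy)

def sample_range(rng, low, high):
    """Uniform draw over [low, high], standing in for Range(low, high)."""
    return rng.uniform(low, high)

rng = random.Random(0)
offset = (sample_range(rng, -10, 10), sample_range(rng, 20, 40))
# Ego at (100, 50), rotated 90 degrees counterclockwise (facing -x).
car_position = offset_by((100.0, 50.0), math.radians(90), offset)
```

With the ego facing -x, "20-40 m ahead" translates into a global x coordinate 20-40 m smaller than the ego's, which is why the offset must be rotated rather than simply added.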
Local Coordinate Systems. Using offset by as above overrides the default position of the Car, leaving the default orientation (along the road) unchanged. Suppose for greater realism we don't want to require the car to be exactly aligned with the road, but to be within say 5°. We could try:

    Car offset by (Range(-10, 10), Range(20, 40)),
        facing Range(-5, 5) deg

but this is not quite what we want, since this sets the orientation of the Car in global coordinates (i.e. within 5° of North). Instead we can use Scenic's general operator X relative to Y, which can interpret vectors and headings as being in a variety of local coordinate systems:

    Car offset by (Range(-10, 10), Range(20, 40)),
        facing Range(-5, 5) deg relative to roadDirection

If we want the heading to be relative to the ego car's orientation, we simply write Range(-5, 5) deg relative to ego.
Notice that since roadDirection is a vector field, it defines a coordinate system at each point, and an expression like 15 deg relative to field does not define a unique heading. The example above works because Scenic knows that Range(-5, 5) deg relative to roadDirection depends on a reference position, and automatically uses the position of the Car being defined. This is a feature of Scenic's system of specifiers, which we explain next.
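The role of the reference position can be illustrated with a small Python sketch. It is not Scenic's implementation: the vector field below is a made-up function of position, and the helper name is hypothetical. The point is only that a heading given relative to a field cannot be computed until a position is supplied.

```python
import math
import random

def road_direction(position):
    """Hypothetical vector field: the road curves, so heading varies with x."""
    x, _ = position
    return math.radians(0.5 * x)

def heading_relative_to_field(field, position, offset):
    """Resolve `offset relative to field` at the object's own position."""
    return field(position) + offset

rng = random.Random(1)
position = (40.0, 10.0)                      # position of the car being defined
offset = math.radians(rng.uniform(-5, 5))    # like Range(-5, 5) deg
heading = heading_relative_to_field(road_direction, position, offset)
```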
Readable, Flexible Specifiers. The syntax offset by X and facing Y for specifying positions and orientations may seem unusual compared to typical constructors in object-oriented languages. There are two reasons why Scenic uses this kind of syntax: first, readability. The second is more subtle and based on the fact that in natural language there are many ways to specify positions and other properties, some of which interact with each other. Consider the following ways one might describe the location of an object:
1. "is at position X" (absolute position);
2. "is just left of position X" (position based on orientation);
3. "is 3 m west of the taxi" (relative position);
4. "is 3 m left of the taxi" (a local coordinate system);
5. "is one lane left of the taxi" (another local coordinate system);
6. "appears to be 10 m behind the taxi" (relative to the line of sight);
7. "is 10 m along the road from the taxi" (following a vector field; consider a curving road).
These are all fundamentally different from each other: e.g., (4) and (5) differ if the taxi is not parallel to the lane. Furthermore, these specifications combine other properties of the object in different ways: to place the object "just left of" a position, we must first know the object's heading; whereas if we wanted to face the object "towards" a location, we must instead know its position. There can be chains of such dependencies: "the car is 0.5 m left of the curb" means that the right edge of the car is 0.5 m away from the curb, not the car's position, which is its center. So the car's position depends on its width, which in turn depends on its model. In a typical object-oriented language, this might be handled by computing values for position and other properties and passing them to a constructor. For "a car is 0.5 m left of the curb" we might write:

Notice how m must be used twice, because m determines both the model of the car and (indirectly) its position. This is inelegant and breaks encapsulation because the default model distribution is used outside of the Car constructor. The latter problem could be fixed by having a specialized constructor or factory function, but these would proliferate since we would need to handle all possible combinations of ways to specify different properties (e.g. do we want to require a specific model? Are we overriding the width provided by the model for this specific car?). Instead of having a multitude of such monolithic constructors, Scenic factors the definition of objects into potentially-interacting but syntactically-independent parts:

    Car left of spot by 0.5, with model BUS

Here left of X by D and with model M are specifiers which do not have an order, but which together specify the properties of the car. Scenic works out the dependencies between properties (here, position is provided by left of, which depends on width, whose default value depends on model) and evaluates them in the correct order.
To use the default model distribution we would simply leave off with model BUS; keeping it affects the position appropriately without having to specify BUS more than once.
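One way to picture this dependency resolution is as a topological sort over properties. The sketch below is an illustration in plain Python (3.9 or later, for graphlib), not Scenic's actual algorithm: the data layout, the helper names, and the table of model widths are all assumptions, and the geometry is reduced to one dimension with "left" meaning smaller x.

```python
from graphlib import TopologicalSorter

# Hypothetical model table; widths are illustrative, not from any simulator.
MODEL_WIDTHS = {"BUS": 3.0, "COMPACT": 1.8}

def resolve(specifiers):
    """Evaluate property specifiers in dependency order.

    `specifiers` maps a property name to (dependencies, function); each
    function receives the already-computed properties as a dict.
    """
    graph = {prop: set(deps) for prop, (deps, _) in specifiers.items()}
    props = {}
    for prop in TopologicalSorter(graph).static_order():
        _, compute = specifiers[prop]
        props[prop] = compute(props)
    return props

def left_of(spot_x, gap):
    """Center position whose right edge is `gap` to the left of `spot_x`."""
    return (("width",), lambda p: spot_x - gap - p["width"] / 2)

# The order in which the specifiers are listed does not matter.
car = resolve({
    "position": left_of(spot_x=0.0, gap=0.5),                   # left of spot by 0.5
    "model": ((), lambda p: "BUS"),                              # with model BUS
    "width": (("model",), lambda p: MODEL_WIDTHS[p["model"]]),   # class default
})
```

Because the width is looked up from the model inside the resolver, the model is named only once, and swapping in a different model (or a random one) changes the position automatically.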
Specifying Multiple Properties Together. Recall that we defined the default position for a Car to be a Point on road: this is an example of another specifier, on region, which specifies position to be a uniformly random point in the given region. This specifier illustrates another feature of Scenic, namely that specifiers can specify multiple properties simultaneously. Consider the following scenario, which creates a parked car given a region curb defined in the GTAV world model:

    ego = Car
    spot = OrientedPoint on visible curb
    Car left of spot by 0.25

The function visible region returns the part of the region that is visible from the ego object. The specifier on visible curb will then set position to be a uniformly random visible point on the curb. We create spot as an OrientedPoint, which is a built-in class that defines a local coordinate system by having both a position and a heading. The on region specifier can also specify heading if the region has a preferred orientation (a vector field) associated with it: in our example, curb is oriented by roadDirection. So spot is, in fact, a uniformly random visible point on the curb, oriented along the road. That orientation then causes the car to be placed 0.25 m left of spot in spot's local coordinate system, i.e. away from the curb, as desired.
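As a rough illustration of a specifier that sets both position and heading, the Python sketch below samples a uniformly random point along a polyline and orients it along the chosen segment. The polyline representation of the curb and both function names are assumptions; visibility is ignored, and so is the car's width, which the real left of specifier takes into account. Heading 0 is again taken to face +y and to increase counterclockwise.

```python
import math
import random

def oriented_point_on(polyline, rng):
    """Uniformly random point along a polyline, with heading along its segment."""
    segments = list(zip(polyline, polyline[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    # Pick a segment with probability proportional to its length.
    (ax, ay), (bx, by) = rng.choices(segments, weights=lengths)[0]
    t = rng.random()
    heading = math.atan2(-(bx - ax), by - ay)
    return (ax + t * (bx - ax), ay + t * (by - ay)), heading

def left_of(point, heading, distance):
    """Point `distance` to the left of `point` in its local frame."""
    x, y = point
    return (x - distance * math.cos(heading), y - distance * math.sin(heading))

rng = random.Random(0)
curb = [(0.0, 0.0), (0.0, 10.0)]             # hypothetical curb running north
spot, spot_heading = oriented_point_on(curb, rng)
car_position = left_of(spot, spot_heading, 0.25)
```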
In fact, Scenic makes it easy to elaborate the scenario without needing to alter the code above. Most simply, we could specify a particular model or non-default distribution over models by just adding with model M to the definition of the Car. More interestingly, we could produce a scenario for badly-parked cars by adding two lines: This will yield cars parked 10-20° off from the direction of the curb, as seen in Fig. 4. This illustrates how specifiers greatly enhance Scenic's flexibility and modularity.

Declarative Specifications of Hard and Soft Constraints. Notice that in the scenarios above we never explicitly ensured that the two cars will not intersect each other. Despite this, Scenic will never generate such scenes. This is because Scenic enforces several default requirements: all objects must be contained in the workspace, must not intersect each other, and must be visible from the ego object. Scenic also allows the user to define custom requirements checking arbitrary conditions built from various geometric predicates. For example, the following scenario produces a car headed roughly towards us, while still facing the nominal road direction: Here we have used the X can see Y predicate, which in this case is checking that the ego car is inside the 30° view cone of the second car. If we only need this constraint to hold part of the time, we can use a soft requirement specifying the minimum probability with which it must hold.

Scenic can also generalize a scenario by adding noise with the mutate statement, without otherwise changing its code. This is useful, for example, if we have a scenario encoding a single concrete scene obtained from real-world data and want to quickly generate variations. For example, mutate taxi will add Gaussian noise to the position and heading of taxi, while still enforcing all built-in and custom requirements. The standard deviation of the noise can be scaled by writing, for example, mutate taxi by 2 (which adds twice as much noise), and we will see later that it can be controlled separately for position and heading.
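The effect of hard and soft requirements can be illustrated with a simple rejection sampler. This is a sketch under stated assumptions rather than Scenic's sampler: the one-dimensional scene, the predicate names, and the choice to enforce each soft requirement independently with its stated probability on every candidate are all illustrative.

```python
import random

def sample_scene(rng):
    """Hypothetical 1-D scene: two cars placed uniformly on a 100 m road."""
    return {"ego": rng.uniform(0, 100), "car": rng.uniform(0, 100)}

def no_overlap(scene, length=4.0):
    """Hard requirement: the two cars (each `length` long) do not intersect."""
    return abs(scene["ego"] - scene["car"]) >= length

def car_ahead(scene):
    """Custom requirement: the other car is in front of the ego."""
    return scene["car"] > scene["ego"]

def rejection_sample(rng, hard, soft=(), max_tries=10_000):
    """Draw scenes until every active requirement holds.

    Each soft requirement is a (probability, predicate) pair; for each
    candidate it is enforced with that probability and ignored otherwise.
    """
    for _ in range(max_tries):
        scene = sample_scene(rng)
        active = list(hard) + [pred for p, pred in soft if rng.random() < p]
        if all(pred(scene) for pred in active):
            return scene
    raise RuntimeError("no valid scene found")

rng = random.Random(0)
scenes = [rejection_sample(rng, hard=[no_overlap], soft=[(0.5, car_ahead)])
          for _ in range(200)]
```

Under this scheme the hard requirement holds in every returned scene, while the soft one holds in at least the stated fraction of them on average, since scenes that satisfy it are never rejected because of it.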
Multiple Domains and Simulators. We conclude this section by illustrating a second application domain, namely generating workspaces to test motion planning algorithms, and Scenic's ability to work with different simulators. A robot like a Mars rover able to climb over rocks can have very complex dynamics, with the feasibility of a motion plan depending on exact details of the robot's hardware and the geometry of the terrain. We can use Scenic to write a scenario generating challenging cases for a planner to solve. Figure 5 shows a scene, visualized using an interface we wrote between Scenic and the Webots robotics simulator [40], with a bottleneck between the robot and its goal that forces the planner to consider climbing over a rock. The Scenic code for this scenario is given in Appendix A.
Even within a single application domain, such as autonomous driving, Scenic enables writing cross-platform scenarios that will work without change in multiple simulators. This is made possible by what we call abstract application domains: Scenic world models which define object classes and other world information like our GTAV world model, but which are abstract, simulator-agnostic protocols that can be implemented by models for particular simulators. For example, Scenic includes an abstract domain for autonomous driving, scenic.domains.driving, which loads road networks from standard formats, providing a uniform API for referring to lanes, maneuvers, and other aspects of road geometry. The driving domain also provides generic Car and Pedestrian classes, complete with implementations of common dynamic behaviors (covered in the next section) like lane following. These make it straightforward to implement complex driving scenarios, which are then guaranteed to work in any simulator supporting the driving domain. Figure 6 illustrates this, showing the exact same Scenic code being used to generate scenarios in both the CARLA [7] and LGSVL [48] simulators.

Dynamic Scenarios
Having seen the basic constructs Scenic provides for defining objects and their spatial relationships, we now outline Scenic's support for dynamic scenarios which also define the temporal properties of objects.
Agents, Actions, and Behaviors. In Scenic, we call objects which take actions over time dynamic agents, or simply agents. These are ordinary Scenic objects, so we can still use all of the syntax described in the previous section to define their initial positions, orientations, etc. In addition, we specify their dynamic behavior using a built-in property called behavior. Using one of the behaviors defined in Scenic's driving library, we can write for example:

    model scenic.domains.driving.model
    Car with behavior FollowLaneBehavior

A behavior defines a sequence of actions for the agent to take, which need not be fixed but can be probabilistic and depend on the state of the agent or other objects. In Scenic, an action is an instantaneous operation executed by an agent, like setting the steering angle of a car or turning on its headlights. Most actions are specific to particular application domains, and so different sets of actions are provided by different simulator interfaces. For example, the Scenic driving domain defines a SetThrottleAction for cars. To define a behavior, we write a function which runs over the course of the scenario, periodically issuing actions. Scenic uses a discrete notion of time, so at each time step the function specifies zero or more actions for the agent to take. For example, here is a very simplified version of the FollowLaneBehavior above:

We intend this behavior to run for the entire scenario, so we use an infinite loop. In each step of the loop, we compute appropriate throttle and steering controls, then use the take statement to take the corresponding actions. When that statement is executed, Scenic pauses the behavior until the next time step of the simulation, whereupon the function resumes and the loop repeats.
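The pause-and-resume execution model described here resembles a Python generator, which the following sketch uses as an analogy. It is not Scenic code: the function name, the dictionary holding the agent's state, and the toy proportional controller are assumptions, with each yield playing the role of a take statement.

```python
def follow_lane_behavior(agent, target_speed=10.0):
    """Generator standing in for a behavior: one `yield` per time step.

    Each yielded tuple is the set of actions for that step. The controller
    is a toy proportional rule, not the one in Scenic's driving library.
    """
    while True:
        throttle = max(0.0, min(1.0, 0.1 * (target_speed - agent["speed"])))
        steering = -0.5 * agent["lane_offset"]
        yield (("throttle", throttle), ("steer", steering))

agent = {"speed": 4.0, "lane_offset": 0.2}
behavior = follow_lane_behavior(agent)
first_actions = next(behavior)    # runs until the first yield, then pauses
agent["speed"] = 10.0             # the simulator updates the agent's state
second_actions = next(behavior)   # resumes the loop and sees the new state
```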

[Fig. 7: diagram with panels labeled Scenic and Simulator.]
Execution of Behaviors. When there are multiple agents, all of their behaviors run in parallel, as illustrated in Fig. 7; each time step, Scenic sends their selected actions to the simulator to be executed and advances the simulation by one step. It then reads back the state of the simulation, updating the position, speed, etc. of each object.
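The execution model described above can be mimicked in plain Python by treating each behavior as a generator, with yield standing in for take or wait. The following is only a minimal sketch under that assumption: the Car class, the follow_lane and idle behaviors, the action tuples, and the one-dimensional "simulator" are all hypothetical stand-ins, not part of Scenic or of any simulator interface.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """Toy agent; the fields are hypothetical stand-ins for simulator state."""
    name: str
    position: float = 0.0
    speed: float = 0.0


def follow_lane(car, target_speed=10.0):
    """A behavior as a generator: each yield plays the role of `take`."""
    while True:  # intended to run for the entire scenario
        throttle = 1.0 if car.speed < target_speed else 0.0
        yield [("throttle", throttle)]  # suspended here until the next time step


def idle(car):
    """A behavior that never acts, like executing `wait` at every step."""
    while True:
        yield []


def run(agents, behaviors, steps, dt=1.0):
    """Run all behaviors in lockstep against a toy 1-D simulator."""
    running = {a.name: b(a) for a, b in zip(agents, behaviors)}
    for _ in range(steps):
        # 1. Every behavior selects its actions for this time step.
        chosen = {a.name: next(running[a.name]) for a in agents}
        # 2. The toy simulator executes the actions and advances one step;
        #    the updated state is visible to the behaviors at the next step.
        for a in agents:
            for kind, value in chosen[a.name]:
                if kind == "throttle":
                    a.speed += value * dt
            a.position += a.speed * dt
    return agents
```

For instance, run([Car("a"), Car("b")], [follow_lane, idle], steps=3) accelerates the first car for three steps while the second stays at rest.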
Since behaviors run dynamically during simulations, they can access the current state of the world to decide what actions to take. Consider the following behavior:

    behavior WaitUntilClose(threshold=15):
        while (distance from self to ego) > threshold:
            wait
        do FollowLaneBehavior()

Here, we repeatedly query the distance from the agent running the behavior (self) to the ego car; as long as it is above a threshold, we use the wait statement to take no action. Once the threshold is met, we start driving by using the do statement to invoke the FollowLaneBehavior we saw above. Since FollowLaneBehavior runs forever, we will never return to the WaitUntilClose behavior.
Behavior Arguments and Random Parameters. The example above also shows how behaviors may take arguments, like any Scenic function. Here, threshold is an argument to the behavior which has default value 15 but can be customized, so we could write for example:

    ego = Car
    carB = Car visible, with behavior WaitUntilClose
    carC = Car visible, with behavior WaitUntilClose(20)

Both carB and carC will use the WaitUntilClose behavior, but independent copies of it with thresholds of 15 and 20 respectively.
Unlike ordinary Scenic code, control flow constructs such as if and while are allowed to depend on random variables inside a behavior. Any distributions defined inside a behavior are sampled at simulation time, not during scene sampling. Consider the following behavior: Here, the value of threshold is sampled only once, at the beginning of the scenario when the behavior starts running. The value strength, on the other hand, is sampled every time control reaches line 5, so that every time step when the car is braking we use a slightly different braking strength (0.8 on average, but with Gaussian noise added with standard deviation 0.02, truncating the possible values to between 0.5 and 1).
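The distinction between a value sampled once when the behavior starts and a value resampled at every time step can be illustrated in plain Python. This is a hedged sketch, not the behavior from the text: the function names, the threshold range of 4 to 7, and the list of distances are assumptions chosen for illustration, and the truncated normal is drawn by simple rejection.

```python
import random


def truncated_normal(rng, mean, std, low, high):
    """Draw from a normal distribution restricted to [low, high] by rejection."""
    while True:
        x = rng.gauss(mean, std)
        if low <= x <= high:
            return x


def brake_when_close(rng, distances):
    """Yield one decision per time step; `distances` is a hypothetical sensor trace."""
    threshold = rng.uniform(4, 7)  # sampled once, when the behavior starts running
    for d in distances:            # one loop iteration per simulated time step
        if d < threshold:
            # Sampled again every time control reaches this line.
            strength = truncated_normal(rng, 0.8, 0.02, 0.5, 1.0)
            yield ("brake", strength)
        else:
            yield ("wait", None)
```

Running list(brake_when_close(random.Random(0), [10, 3, 3, 3])) waits in the first step and then brakes three times, each time with a freshly drawn strength.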
Interrupts. It is frequently useful to take an existing behavior and add a complication to it; for example, suppose we want a car that follows a lane, stopping whenever it encounters an obstacle. Scenic provides a concept of interrupts which allows us to reuse the basic FollowLaneBehavior without having to modify it. This try-interrupt statement has similar syntax to the Python try statement (and in fact allows except clauses to catch exceptions just as in Python, as we'll see later), and begins in the same way: at first, the code block after the try: (the body) is executed. At the start of every time step during its execution, the condition from each interrupt clause is checked; if any are true, execution of the body is suspended and we instead begin to execute the corresponding interrupt handler. In the example above, there is only one interrupt, which fires when we come within 5 meters of any object. When that happens, FollowLaneBehavior is paused and we instead apply full braking for one time step. In the next step, we will resume FollowLaneBehavior wherever it left off, unless we are still within 5 meters of an object, in which case the interrupt will fire again.
If there are multiple interrupt clauses, successive clauses take precedence over those which precede them. Furthermore, such higher-priority interrupts can fire even during the execution of an earlier interrupt handler. This makes it easy to model a hierarchy of behaviors with different priorities; for example, we could implement a car which drives along a lane, passing slow cars and avoiding collisions, along the following lines: Here, the car begins by lane following, switching to passing if there is a car or other obstacle too close ahead. During either of those two sub-behaviors, if the time to collision gets too low, we switch to collision avoidance. Once the CollisionAvoidance behavior completes, we will resume whichever behavior was interrupted earlier. If we were in the middle of PassingBehavior, it will run to completion (possibly being interrupted again) before we finally resume FollowLaneBehavior.
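The precedence rules above can be sketched with Python generators: the body and each handler are generators, later clauses have higher priority, and a handler may only be preempted by a clause of strictly higher priority. This is an illustrative model rather than Scenic's implementation; try_interrupt, demo, and the string "actions" are hypothetical names, and the sketch assumes every handler takes at least one action before completing.

```python
def try_interrupt(body, handlers):
    """Model of a try-interrupt statement.

    body:     a running generator (the code under `try:`)
    handlers: list of (condition, make_handler) pairs in source order,
              so later entries have higher priority
    """
    stack = [(-1, body)]  # (priority, generator); the body has the lowest priority
    while stack:
        active = stack[-1][0]
        # Start of a time step: check clauses of strictly higher priority
        # than the block currently running, highest priority first.
        for priority in range(len(handlers) - 1, active, -1):
            condition, make_handler = handlers[priority]
            if condition():
                stack.append((priority, make_handler()))
                break
        try:
            action = next(stack[-1][1])
        except StopIteration:
            stack.pop()  # block finished: resume whatever it interrupted
            continue
        yield action     # one time step passes here


def demo(timeline):
    """Drive the model with a list of (close, ttc_low) observations, one per step."""
    world = {"close": False, "ttc_low": False}

    def follow_lane():
        while True:
            yield "follow_lane"

    def passing():
        yield "pass"
        yield "pass"

    def avoid_collision():
        yield "avoid"

    controller = try_interrupt(follow_lane(), [
        (lambda: world["close"], passing),            # lower priority
        (lambda: world["ttc_low"], avoid_collision),  # higher priority
    ])
    trace = []
    for close, ttc_low in timeline:
        world.update(close=close, ttc_low=ttc_low)
        trace.append(next(controller))
    return trace
```

In this model, a passing maneuver that is interrupted by collision avoidance is resumed and runs to completion before lane following continues.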
As this example illustrates, when an interrupt handler completes, by default we resume execution of the interrupted code. If this is undesired, the abort statement can be used to cause the entire try-interrupt statement to exit. For example, to run a behavior until a condition is met without resuming it afterward, we can write:

    behavior ApproachAndTurnLeft():

This is a common enough use case of interrupts that Scenic provides a shorthand notation:
The alternative form do behavior for n steps uses time steps instead of real simulation time.
Finally, note that when try-interrupt statements are nested, interrupts of the outer statement take precedence. This makes it easy to build up complex behaviors in a modular way. For example, the behavior Drive we wrote above is relatively complicated, using interrupts to switch between several different sub-behaviors. We would like to be able to put it in a library and reuse it in many different scenarios without modification. Interrupts make this straightforward; for example, if for a particular scenario we want a car that drives normally but suddenly brakes for 5 seconds when it reaches a certain area, we can write: With this behavior, Drive operates as it did before, interrupts firing as appropriate to switch between lane following, passing, and collision avoidance. But during any of these sub-behaviors, if the car enters the targetRegion it will immediately brake for 5 seconds, then pick up where it left off.
Stateful Behaviors. As the last example shows, behaviors can use local variables to maintain state, which is useful when implementing behaviors which depend on actions taken in the past. To elaborate on that example, suppose we want a car which usually follows the Drive behavior, but every 15-30 seconds stops for 5 seconds. We can implement this behavior as follows: Here delay is the randomly-chosen amount of time to run Drive for, and last_stop keeps track of the time when we last started to run it. When the time elapsed since last_stop exceeds delay, we interrupt Drive and stop for 5 seconds. Afterwards, we pick a new delay before the next stop, and save the current time in last_stop, effectively resetting our timer to zero.
Requirements and Monitors. Just as you can declare spatial constraints on scenes using the require statement, you can also impose constraints on dynamic scenarios. For example, if we don't want to generate any simulations where carA and carB are simultaneously visible from the ego car, we could write:
The require always condition statement enforces that the given condition must hold at every time step of the scenario; if it is ever violated during a simulation, we reject that simulation and sample a new one. Similarly, we can require that a condition hold at some time during the scenario using the require eventually statement:

    require eventually ego in intersection

You can also use the ordinary require statement inside a behavior to require that a given condition hold at a certain point during the execution of the behavior. For example, here is a simple elaboration of the WaitUntilClose behavior we saw above:
The requirement ensures that no pedestrian comes close to self until the ego does; after that, we place no further restrictions.
To enforce more complex temporal properties like this one without modifying behaviors, you can define a monitor. Like behaviors, monitors are functions which run in parallel with the scenario, but they are not associated with any agent and any actions they take are ignored. Here is a monitor for the property "carA and carB enter the intersection before carC": We use the variables seenA and seenB to remember whether we have seen carA and carB respectively enter the intersection. The loop will iterate as long as at least one of the cars has not yet entered the intersection, so if carC enters before either carA or carB, the requirement on line 4 will fail and we will reject the simulation. Note the necessity of the wait statement on line 9: if we omitted it, the loop could run forever without any time actually passing in the simulation.
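A monitor of this kind can be approximated in Python as a generator that is advanced once per time step. The sketch below is an assumption-laden illustration of the property "carA and carB enter the intersection before carC": RejectSimulation, c_enters_last, check, and the dictionary of boolean "in intersection" flags are hypothetical, and the line numbers mentioned in the text refer to the Scenic monitor, not to this code.

```python
class RejectSimulation(Exception):
    """Raised when a requirement fails (hypothetical name for this sketch)."""


def c_enters_last(world):
    """Monitor: cars A and B must enter the intersection before car C does."""
    seen_a = seen_b = False
    while not (seen_a and seen_b):
        if world["C"]:  # the requirement: C must not be in the intersection yet
            raise RejectSimulation("carC entered the intersection too early")
        seen_a = seen_a or world["A"]
        seen_b = seen_b or world["B"]
        yield  # plays the role of `wait`: without it, no simulated time would pass


def check(timeline):
    """Return True if the simulation is kept, False if it is rejected.

    timeline: list of (A_in, B_in, C_in) flags, one triple per time step.
    """
    world = {}
    monitor = c_enters_last(world)
    for in_a, in_b, in_c in timeline:
        world.update(A=in_a, B=in_b, C=in_c)
        try:
            next(monitor)
        except StopIteration:
            return True   # both A and B were seen first; nothing left to check
        except RejectSimulation:
            return False
    return True
```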
Preconditions and Invariants. Even general behaviors designed to be used in multiple scenarios may not operate correctly from all possible starting states: for example, FollowLaneBehavior assumes that the agent is actually in a lane rather than, say, on a sidewalk. To model such assumptions, Scenic provides a notion of guards for behaviors. Most simply, we can specify one or more preconditions: Here, the precondition requires that whenever the MergeInto behavior is executed by an agent, the agent must not already be in the destination lane but should be on the same road. We can add any number of such preconditions; like ordinary requirements, violating any precondition causes the simulation to be rejected.
Since behaviors can be interrupted, it is possible for a behavior to resume execution in a state it doesn't expect: imagine a car which is lane following, but then swerves onto the shoulder to avoid an accident; naïvely resuming lane following, we find we are no longer in a lane. To catch such situations, Scenic allows us to define invariants which are checked at every time step during the execution of a behavior, not just when it begins running. These are written similarly to preconditions: While the default behavior for guard violations is to reject the simulation, in some cases it may be possible to recover from a violation by taking some additional actions. To enable this kind of design, Scenic signals guard violations by raising a GuardViolation exception which can be caught like any other exception; the simulation is only rejected if the exception propagates out to the top level. So to model the lane-following-with-collision-avoidance behavior suggested above, we could write code like this: When any object comes within 5 meters, we suspend lane following and switch to collision avoidance. When the latter completes, FollowLaneBehavior will be resumed; if its invariant fails because we are no longer on the road, we catch the resulting InvariantViolation exception and run a GetBackOntoRoad behavior to restore the invariant. The whole try statement then completes, so the outermost loop iterates and we begin lane following once again.
Terminating the Scenario. By default, scenarios run forever, unless a time limit is specified when running the Scenic tool. However, scenarios can also define termination criteria using the terminate when statement; for example, we could decide to end a scenario as soon as the ego car travels at least a certain distance:

    start = Point on road
    ego = Car at start
    terminate when (distance to start) >= 50

Additionally, the terminate statement can be used inside behaviors and monitors: if it is ever executed, the scenario ends. For example, we can use a monitor to terminate the scenario once the ego spends 30 time steps in an intersection:

Compositional Scenarios
The previous two sections showed how Scenic allows us to model both the spatial and temporal aspects of a scenario. Scenic also provides facilities for defining scenarios as reusable modules and composing them in various ways. These features make it possible to write a library of simple scenarios which can then be used as building blocks to construct many more complex scenarios.
Modular Scenarios. To define a named, reusable scenario, optionally with tunable parameters, Scenic provides the scenario statement. For example, here is a scenario which creates a parked car on the shoulder of the ego's current lane (assuming there is one), using some APIs from the driving library:
The setup block contains Scenic code which executes when the scenario is instantiated, and which can define classes, create objects, declare requirements, etc. as in any of the example scenarios we saw above. Additionally, we can define preconditions and invariants, which operate in the same way as for dynamic behaviors. Having now defined the ParkedCar scenario, we can use it in a more complex scenario, potentially multiple times:
The compose block of a scenario is executed in essentially the same way as a monitor, and allows all the same control-flow constructs. For example, we could write a compose block as follows:
Here, a new parked car is created every 30 seconds, with the distance to the curb alternating between 0.25 and 0.5 m. Note that without the for 30 seconds qualifier, we would never get past line 2, since the ParkedCar scenario does not define any termination conditions using terminate when (or terminate) and so runs forever by default. If instead we want to create a new car only when the ego has passed the current one, we can use a do-until statement:
Note how we can refer to the parkedCar variable created in the ParkedCar scenario as a property of the scenario. Combined with the ability to pass objects as parameters of scenarios, this is convenient for reusing objects across scenarios.
Interrupts, Overriding, and Initial Scenarios. The try-interrupt statement used in behaviors can also be used in compose blocks to switch between scenarios. For example, suppose we already have a scenario where the ego is following a leadCar, and want to elaborate it by adding a parked car which suddenly pulls in front of the lead car. We could write a compose block as follows: If the ParkedCarPullingAheadOf scenario is defined to end shortly after the parked car finishes entering the lane, the interrupt handler will complete and Scenic will resume executing FollowingScenario on line 3 (unless the ego is still within 10 m of the lead car).
Suppose that we want the lead car to behave differently while ParkedCarPullingAheadOf is running; for example, perhaps the behavior for the lead car defined in FollowingScenario does not handle a parked car suddenly pulling in. To enable changing the behavior or other properties of an object in a sub-scenario, Scenic provides the override statement, which we can use as follows:
Here we override the behavior property of target for the duration of the scenario, reverting it back to its original value (and thereby continuing to execute the old behavior) when the scenario terminates. The override object specifier, . . . statement has the same syntax as an object definition, and can specify any properties of the object except for dynamic properties like position or speed which are updated every time step by the simulator (and can only be indirectly controlled by taking actions).
In order to allow writing scenarios which can both stand on their own and be invoked during another scenario, Scenic provides a special conditional statement testing whether we are inside the initial scenario, i.e., the very first scenario to run.
Random Selection of Scenarios. For very general scenarios, like "driving through a city, encountering typical human traffic", we may want a variety of different events and interactions to be possible. We saw above how we can write behaviors for individual agents which choose randomly between possible actions; Scenic allows us to do the same with entire scenarios. Most simply, since scenarios are first-class objects, we can write functions which operate on them, perhaps choosing a scenario from a list of options based on some complex criterion:
However, some scenarios may only make sense in certain contexts; for example, a scenario involving a car running a red light can take place only at an intersection.
To facilitate modeling such situations, Scenic provides variants of the do statement which choose scenarios to run randomly amongst only those whose preconditions are satisfied:
Here, line 1 checks the preconditions of the three given scenarios, then executes one (and only one) of the enabled scenarios. If for example the current road has no shoulder, then ParkedCar will be disabled and we will have a 50/50 chance of executing either RedLightRunner or Jaywalker (assuming their preconditions are satisfied). If none of the three scenarios are enabled, Scenic will reject the simulation. Line 2 shows a non-uniform variant, where RedLightRunner is twice as likely to be chosen as each of the other scenarios (so if only ParkedCar is disabled, we will pick RedLightRunner with probability 2/3; if none are disabled, 2/4). Finally, line 3 is a shuffled variant, where all three scenarios will be executed, but in random order.
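The selection probabilities quoted above follow from renormalizing the weights over the enabled scenarios only. The short Python sketch below reproduces that arithmetic; choice_probabilities and choose are hypothetical helper names, and the dictionary of enabled flags stands in for the result of checking each scenario's preconditions.

```python
import random
from fractions import Fraction


def choice_probabilities(weights, enabled):
    """Exact selection probabilities after dropping disabled scenarios."""
    live = {name: w for name, w in weights.items() if enabled[name]}
    if not live:
        raise RuntimeError("no scenario is enabled: the simulation is rejected")
    total = sum(live.values())
    return {name: Fraction(w, total) for name, w in live.items()}


def choose(weights, enabled, rng=random):
    """Pick one enabled scenario at random according to the renormalized weights."""
    probs = choice_probabilities(weights, enabled)
    names = list(probs)
    return rng.choices(names, weights=[float(probs[n]) for n in names])[0]


# Integer weights matching the non-uniform variant described in the text.
WEIGHTS = {"RedLightRunner": 2, "Jaywalker": 1, "ParkedCar": 1}
```

With ParkedCar disabled, choice_probabilities gives RedLightRunner probability 2/3; with all three enabled, it gives 2/4 = 1/2.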
All of the examples we have seen above illustrate the versatility of Scenic in modeling a wide range of interesting scenarios. Complete Scenic code for the bumper-to-bumper scenario of Fig. 1, the Mars rover scenario of Fig. 5, as well as other scenarios used as examples in this section or in our experiments, along with images of generated scenes, can be found in Appendix A.

Syntax of Scenic
Scenic is an object-oriented PPL, with programs consisting of sequences of statements built with standard imperative constructs including conditionals, loops, functions, and methods (which we do not describe further, focusing on the new elements). Compared to other imperative PPLs, the major restriction of Scenic, made in order to allow more efficient sampling, is that conditional branching may not depend on random variables (except in behaviors). The novel syntax, outlined above, is largely devoted to expressing spatiotemporal relationships in a concise and flexible manner. Figure 8 gives a formal grammar for Scenic, which we now describe in detail.

Data Types
Scenic provides several primitive data types:
• Booleans: expressing truth values.
• Scalars: floating-point numbers, which can be sampled from various distributions (see Table 1).
• Vectors: representing positions and offsets in space, constructed from coordinates in meters with the syntax (X, Y).
• Headings: representing orientations in space. Conveniently, in 2D these are a single angle (in radians, anticlockwise from North). By convention the heading of a local coordinate system is the heading of its y-axis, so, for example, (-2, 3) means 2 meters left and 3 ahead.
• Vector Fields: associating an orientation to each point in space. For example, the shortest paths to a destination or (in our case study) the nominal traffic direction.
• Regions: representing sets of points in space. These can have an associated vector field giving points in the region preferred orientations (e.g. the surface of an object could have normal vectors, so that objects placed randomly on the surface face outward by default).
In addition, Scenic provides objects, organized into single-inheritance classes specifying a set of properties their instances must have, together with corresponding default values (see Fig. 8). Default value expressions are evaluated each time an object is created. Thus if we write weight: Range(1, 5) when defining a class then each instance will have a weight drawn independently from Range(1, 5). Default values may use the special syntax self.property to refer to one of the other properties of the object, which is then a dependency of this default value. In our case study, for example, the width and length of a Car are by default derived from its model.
Physical objects in a scene are instances of Object, which is the default superclass when none is specified. Object descends from the two other built-in classes: its superclass is OrientedPoint, which in turn subclasses Point. These represent locations in space, with and without an orientation respectively, and so provide the fundamental properties heading and position. Object extends them by defining a bounding box with the properties width and length, as well as temporal information like speed and behavior. Table 2 lists the properties of these classes and their default values.
To allow cleaner notation, Point and OrientedPoint are automatically interpreted as vectors or headings in contexts expecting these (as shown in Fig. 8). For example, we can write taxi offset by (1,2) and 30 deg relative to taxi instead of taxi.position offset by (1,2) and 30 deg relative to taxi.heading. Ambiguous cases, e.g. taxi relative to limo, are illegal (caught by a simple type system); the more verbose syntax must be used instead.

Expressions
Scenic's expressions are mostly straightforward, largely consisting of the arithmetic, boolean, and geometric operators shown in Fig. 10. The meanings of these operators are largely clear from their syntax, so we defer complete definitions of their semantics to the Appendix [18]. Figure 9 illustrates several of the geometric operators (as well as some specifiers, which we will discuss in the next section). Various points to note:
• X can see Y uses a simple model where a Point can see a certain distance, and an OrientedPoint restricts this to the sector along its heading with a certain angle (see Table 2). An Object is visible iff its bounding box is.
• X relative to Y interprets X as an offset in a local coordinate system defined by Y. Thus (-3, 0) relative to Y yields 3 m West of Y if Y is a vector, and 3 m left of Y if Y is an OrientedPoint. If defining a heading inside a specifier, either X or Y can be a vector field, interpreted as a heading by evaluating it at the position of the object being specified. So we can write for example Car at (120, 70), facing 30 deg relative to roadDirection.
• visible region yields the part of the region visible from the ego, so we can write for example Car on visible road. The form region visible from X uses X instead of ego.
A distribution expression evaluates to a single sample, so reusing a random value does not resample it: for example, after x = Range(0, 1), the vector y = (x, x) is not uniform over the unit box, but rather over its diagonal. For convenience in sampling multiple times from a primitive distribution, Scenic provides a resample(D) function returning an independent sample from D, one of the distributions in Table 1. The second type of complex Scenic expressions are object definitions. These are the only expressions with a side effect, namely creating an object in the generated scene. More interestingly, properties of objects are specified using the system of specifiers discussed above, which we now detail.
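The geometry behind X relative to Y can be made concrete with a few lines of Python. This is a sketch under stated assumptions rather than Scenic's implementation: relative_to is a hypothetical function, global coordinates are taken to have +x pointing East and +y pointing North, and headings are in radians anticlockwise from North as described above, so a plain vector corresponds to heading 0.

```python
import math


def relative_to(offset, position, heading=0.0):
    """Interpret `offset` in the local frame at `position` whose y-axis points along `heading`.

    With heading = 0 the local frame coincides with the global one,
    which models the case where Y is a plain vector.
    """
    ox, oy = offset
    px, py = position
    c, s = math.cos(heading), math.sin(heading)
    # Rotate the offset anticlockwise by `heading`, then translate.
    return (px + ox * c - oy * s, py + ox * s + oy * c)
```

For a point at (10, 5), the offset (-3, 0) lands 3 m West at (7, 5) when no heading is given, and 3 m to the left (here, South) at approximately (10, 2) when the heading is 90 degrees, i.e. facing West.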

Specifiers
As shown in the grammar in Fig. 8, an object is created by writing the class name followed by a (possibly empty) comma-separated list of specifiers. The specifiers are combined, possibly adding default specifiers from the class definition, to form a complete specification of all properties of the object. Arbitrary properties (including user-defined properties with no meaning in Scenic) can be specified with the generic specifier with property value, while Scenic provides many more specifiers for the built-in properties position and heading, shown in Tables 3 and 4 respectively.
In general, a specifier is a function taking in values for zero or more properties, its dependencies, and returning values for one or more other properties, some of which can be specified optionally, meaning that other specifiers will override them. For example, on region specifies position and optionally specifies heading if the given region has a preferred orientation. If road is such a region, as in our case study, then Object on road will create an object at a position uniformly random in road and with the preferred orientation there. But since heading is only specified optionally, we can override it by writing Object on road, facing 20 deg.
Specifiers are combined to determine the properties of an object by evaluating them in an order ensuring that their dependencies are always already assigned. If there is no such order or a single property is specified twice, the scenario is ill-formed. The procedure by which the order is found, taking into account properties that are optionally specified and default values, will be described in the next section. As the semantics of the specifiers in Tables 3 and 4 are largely evident from their syntax, we defer exact definitions to the Appendix [18]. We briefly discuss some of the more complex specifiers, referring to the examples in Fig. 9:
• behind vector means the object is placed with the midpoint of its front edge at the given vector, and similarly for ahead/left/right of vector.
• beyond A by O from B means the position obtained by treating O as an offset in the local coordinate system at A oriented along the line of sight from B. In this and other specifiers, if the from B is omitted, the ego object is used by default. So for example beyond taxi by (0, 3) means 3 m directly behind the taxi as viewed by the camera (see Fig. 9 for another example).
• The heading optionally specified by left of OrientedPoint, etc. is that of the OrientedPoint (thus in Fig. 9, P offset by (0, -2) yields an OrientedPoint facing the same way as P). Similarly, the heading optionally specified by the following vectorField specifier is that of the vector field at the specified position.
• apparently facing H means the object has heading H with respect to the line of sight from ego. For example, apparently facing 90 deg would orient the object so that the camera views its left side head-on.

Statements
Finally, we discuss Scenic's statements, listed in Table 5. Class and object definitions have been discussed above, and variable assignment behaves in the standard way.
Selecting a World Model. The model name statement specifies that the Scenic program is written for the given Scenic world model. It is equivalent to the statement from name import * (as in Python), importing everything from the given Scenic module, but can be overridden from the command-line when running the Scenic tool. This enables writing cross-platform scenarios using abstract domains like scenic.domains.driving, then executing them in particular simulators by overriding the model with a more specific module (e.g. scenic.simulators.carla.model).
Global Parameters. The statement param name = value, . . . assigns values to global parameters of the scenario. These have no semantics in Scenic but provide a general-purpose way to encode arbitrary global information. For example, in our case study we used parameters time and weather to put distributions on the time of day and the weather conditions during the scene.
Behaviors and Monitors. The behavior statement (see Fig. 8) defines a dynamic behavior. A behavior definition has the same structure as a function definition, except: 1) it may begin with any number of precondition: boolean and invariant: boolean lines defining preconditions and invariants; 2) it may use the statements in the second section of Table 5, which are not allowed in ordinary functions. The monitor statement has the same structure as a behavior statement but defines a monitor.
Modular Scenarios. The scenario statement (see Fig. 8) defines a modular scenario which can be invoked from another scenario. Scenario definitions begin like behavior definitions, with a name, parameters, preconditions, and invariants. However, the body of a scenario consists of two parts, either of which can be omitted: a setup block and a compose block. The setup block contains code that runs once when the scenario begins to execute, and is a list of statements like a top-level Scenic program. The compose block orchestrates the execution of sub-scenarios during a dynamic scenario, and may use do and any of the other statements allowed inside behaviors (except take, which only makes sense for an individual agent).
Requirements. The require boolean statement requires that the given condition hold in all generated scenarios (equivalently to observe statements in other probabilistic programming languages; see e.g. [41,6]). The variant require[p] boolean adds a soft requirement that need only hold with some probability p (which must be a constant). We will discuss the semantics of these in the next section. The require always and require eventually variants define requirements that must hold in every and some time step of the scenario respectively.
Mutation. The mutate instance, . . . by number statement adds Gaussian noise with the given standard deviation (default 1) to the position and heading properties of the listed objects (or every Object, if no list is given). For example, mutate taxi by 2 would add twice as much noise as mutate taxi. The noise can be controlled separately for position and heading, as we discuss in the next section.
Termination Conditions. The terminate when boolean statement defines a condition which is monitored as in require eventually, but which when true causes the scenario to end. The terminate statement can be called inside a behavior, monitor, or compose block to end the scenario immediately.
Actions. The take action, . . . statement can be used inside behaviors to select one or more actions for the agent to take in the current time step. The wait statement means no actions are taken in this time step (which makes sense inside monitors and compose blocks). When either of these statements is executed, the behavior is suspended until one time step has elapsed; then its invariants are checked (raising an InvariantViolation exception if any are violated) and it is resumed.
Invoking Other Behaviors and Scenarios. The do name, . . . statement has the same structure as the take statement, but invokes one or more behaviors (if in a behavior) or scenarios (if in a compose block). It does not return until the sub-behavior/sub-scenario terminates, so multiple time steps may pass (unlike take). Early termination can be enabled by adding a for scalar seconds/steps clause, which enforces a maximum time limit, or an until boolean clause, which adds an arbitrary termination criterion. When the do statement returns, the invariants of the calling behavior/scenario are checked as above.
Interrupts. The try statement (see Fig. 8) consists of a try: block and one or more interrupt when boolean: and except exception: blocks, each containing arbitrary lists of statements. As described in Sec. 3.2, when a try statement executes, the conditions for each interrupt when block are checked at each time step. While none of them are true, the try block executes. When an interrupt condition becomes true, the body of the corresponding block is executed (with lower blocks preempting those above), suspending any behaviors/scenarios that were executing in the try block until the interrupt handler completes (at which point the invariants of the suspended behavior/scenario are checked as usual). Any exceptions raised in the try block or any interrupt handler can be caught by except blocks as in the Python try statement. Additionally, any block may execute the abort statement to immediately terminate the entire try statement.
Overrides. The override name specifier, . . . statement may be used inside a scenario definition to override properties of an object during a dynamic scenario. It has the same structure as an object definition, with override and the name of the object replacing the class, so for example given an object taxi we could write override taxi with aggression 3 to set the aggression property of taxi to 3. Dynamic properties read back from the simulator at every time step, like position, cannot be overridden since they are controlled using actions and not direct assignments. Properties overridden by a scenario revert to their original values when the scenario terminates. When the behavior property is overridden, the original behavior is suspended, then resumed at the end of the scenario.

Semantics of Scenic
The output of a Scenic program has two parts: first, a scene consisting of an assignment to all the properties of each Object defined in the scenario, plus any global parameters defined with param. For dynamic scenarios, this scene forms the initial state of the scenario, which then changes after each time step according to the actions taken by the agents. Since actions and their effects are domain-specific (consider for example the different physics involved for aerial, ground, and underwater vehicles), dynamic Scenic scenarios do not directly define trajectories for objects. Instead, the second part of the output of a Scenic program is a policy, a function mapping the history of past scenes to the choice of actions for the agents in the current time step. This pair of a scene and a policy is what we mean formally by the scenario generated by a Scenic program.
Since Scenic is a probabilistic programming language, the semantics of a program is actually a distribution over possible outputs, here scenarios. As for other imperative PPLs, the semantics can be defined operationally as a typical interpreter for an imperative language but with two differences. First, the interpreter makes random choices when evaluating distributions [52]. For example, the Scenic statement x = Range(0, 1) updates the state of the interpreter by assigning a value to x drawn from the uniform distribution on the interval (0, 1). In this way every possible run of the interpreter has a probability associated with it. Second, every run where a require statement (the equivalent of an "observation" in other PPLs) is violated gets discarded, and the run probabilities appropriately normalized (see, e.g., [26]). For example, adding the statement require x > 0.5 above would yield a uniform distribution for x over the interval (0.5, 1).
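The operational reading of require can be illustrated by plain rejection sampling in Python. The sketch below mirrors the two-statement example from the text (x = Range(0, 1) followed by require x > 0.5); sample_x and the iteration cap are hypothetical, and this is not meant to reflect the specialized sampling techniques Scenic actually uses.

```python
import random


def sample_x(rng, max_iterations=10_000):
    """Run the toy program repeatedly, discarding runs that violate the requirement."""
    for _ in range(max_iterations):
        x = rng.uniform(0, 1)  # x = Range(0, 1)
        if x > 0.5:            # require x > 0.5
            return x           # this run is kept
        # otherwise the run is discarded and we start over
    raise RuntimeError("no valid run found within the iteration limit")
```

Every value returned lies above 0.5, and over many runs the samples average close to 0.75, as expected for a uniform distribution on (0.5, 1).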
Scenic uses the standard semantics for assignments, arithmetic, loops, functions, and so forth. Below, we define the semantics of the main constructs unique to Scenic. See the Appendix [18] for a more formal treatment.
Soft Requirements. The statement require[p] B is interpreted as require B with probability p and as a no-op otherwise: that is, it is interpreted as a hard requirement that is only checked with probability p. This ensures that the condition B will hold with probability at least p in the induced distribution of the Scenic program, as desired.
Specifiers and Object Definitions. As we saw above, each specifier defines a function mapping values for its dependencies to values for the properties it specifies. When an object of class C is constructed using a set of specifiers S, the object is defined as follows (see the Appendix [18] for details):
1. If a property is specified (non-optionally) by multiple specifiers in S, an ambiguity error is raised.
2. The set of properties P for the new object is found by combining the properties specified by all specifiers in S with the properties inherited from the class C.
3. Default value specifiers from C are added to S as needed so that each property in P is paired with a unique specifier in S specifying it, with precedence order: non-optional specifier, optional specifier, then default value.
4. The dependency graph of the specifiers S is constructed. If it is cyclic, an error is raised.
5. The graph is topologically sorted and the specifiers are evaluated in this order to determine the values of all properties P of the new object.
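The resolution procedure above can be sketched in Python as follows. This is an illustrative simplification, not the Scenic implementation: the Specifier class and the construct function are hypothetical names, each specifier is assumed to set exactly one property, and optional specifiers are omitted, so the precedence order reduces to "explicit specifier, then default value".

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter
from typing import Callable


@dataclass(frozen=True)
class Specifier:
    prop: str                        # the property this specifier sets
    deps: tuple                      # properties it depends on
    fn: Callable[[dict], object]     # maps known property values to a value


def construct(defaults, specifiers):
    """Resolve specifiers into property values (steps 1 to 5)."""
    chosen = {}
    for spec in specifiers:
        if spec.prop in chosen:                      # step 1: ambiguity
            raise ValueError(f"property {spec.prop!r} specified twice")
        chosen[spec.prop] = spec
    for prop, spec in defaults.items():              # steps 2-3: add defaults
        chosen.setdefault(prop, spec)
    for spec in chosen.values():
        for dep in spec.deps:
            if dep not in chosen:
                raise ValueError(f"no specifier for dependency {dep!r}")
    graph = {prop: set(spec.deps) for prop, spec in chosen.items()}
    try:                                             # step 4: reject cycles
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError("cyclic specifier dependencies") from err
    values = {}
    for prop in order:                               # step 5: evaluate in order
        values[prop] = chosen[prop].fn(values)
    return values


# Hypothetical class defaults and explicit specifiers for a car.
defaults = {
    "position": Specifier("position", (), lambda v: (0.0, 0.0)),
    "heading": Specifier("heading", ("position",), lambda v: 0.0),
    "width": Specifier("width", (), lambda v: 2.0),
}
specs = [
    Specifier("position", (), lambda v: (10.0, 5.0)),
    Specifier("heading", ("position",),
              lambda v: 90.0 if v["position"][0] > 0 else 0.0),
]
car = construct(defaults, specs)
```

Here heading depends on position, so the topological order guarantees that position is evaluated first, while width falls back to the class default.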
Mutation. The mutate X by N statement sets the special mutationScale property to N (the mutate X form sets it to 1). At the end of evaluation of the Scenic program, but before requirements are checked, Gaussian noise is added to the position and heading properties of objects with nonzero mutationScale. The standard deviation of the noise is the value of the positionStdDev and headingStdDev property respectively (see Table 2), multiplied by mutationScale.
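As a hedged illustration of this rule, the Python sketch below adds Gaussian noise whose standard deviation is the per-property value multiplied by mutationScale. The dictionary representation, the function name mutate, the numeric values, and the choice of independent noise on each position coordinate are all assumptions made for the example; the actual defaults are those listed in Table 2.

```python
import random


def mutate(obj, rng):
    """Return a copy of obj with noise scaled by its mutationScale."""
    scale = obj.get("mutationScale", 0)
    if scale == 0:
        return dict(obj)                       # unmutated objects are unchanged
    pos_sd = obj["positionStdDev"] * scale     # std. dev. for position noise
    head_sd = obj["headingStdDev"] * scale     # std. dev. for heading noise
    x, y = obj["position"]
    out = dict(obj)
    # Assumption: noise is drawn independently for each coordinate.
    out["position"] = (x + rng.gauss(0.0, pos_sd), y + rng.gauss(0.0, pos_sd))
    out["heading"] = obj["heading"] + rng.gauss(0.0, head_sd)
    return out


rng = random.Random(1)
# Placeholder values, not the defaults from Table 2.
car = {"position": (10.0, 5.0), "heading": 0.0,
       "positionStdDev": 1.0, "headingStdDev": 0.1, "mutationScale": 2}
noisy = mutate(car, rng)
fixed = mutate({**car, "mutationScale": 0}, rng)
```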
Dynamic Constructs. As suggested in Sec. 4.4, behaviors and monitors are coroutines: they usually execute like ordinary functions, but are suspended when they take an action (or wait) until one time step has passed. Scenarios behave similarly: in their compose blocks, using wait causes them to wait for one step, and any subscenarios they invoke using do run recursively; scenarios without compose blocks do nothing in a time step other than check whether any of their terminate when conditions have been met or their require always conditions violated. The output of the policy of a dynamic Scenic program is defined according to the following procedure:
1. Run the compose blocks of all currently-running scenarios for one time step. If any require conditions fail, discard the simulation. If instead the top-level scenario finishes its compose block (if any), one of its terminate when conditions is true, or it executes terminate, set a flag to remember this (we use a flag rather than terminating immediately since we need to ensure that all requirements are satisfied before terminating).
2. Check all require always conditions of currently-running scenarios; if any fail, discard the simulation.
3. Run all monitors of currently-running scenarios for one time step. As above, discard the simulation if any require conditions fail, and set the terminate flag if the terminate statement is executed.
4. If the flag is set, check that all require eventually conditions were satisfied at some time step: if so, terminate the simulation; otherwise, discard it.
5. Run all the behaviors of dynamic agents for one time step, gathering their actions and discarding the simulation or setting the terminate flag as in (3).
6. Repeat (4) to check the terminate flag.
7. Return the choice of actions selected by the dynamic agents.
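A heavily simplified Python sketch of one such time step is given below, with behaviors and monitors modeled as generators that are resumed once per step. It is an illustration under stated assumptions rather than the actual interpreter: there is a single scenario with no compose block, behaviors and monitors cannot execute terminate (so step 6 is omitted), and the names policy_step, Discard, drive, and bounds_monitor are hypothetical.

```python
class Discard(Exception):
    """Signals that a requirement failed and the simulation is rejected."""


def policy_step(scenario, state):
    """Run one time step; return (actions, terminated)."""
    # Bookkeeping: remember which 'require eventually' conditions have held.
    for entry in scenario["eventually"]:
        entry["met"] = entry["met"] or entry["cond"](state)
    # Step 1 (simplified, no compose block): check 'terminate when'.
    terminate = any(cond(state) for cond in scenario["terminate_when"])
    # Step 2: 'require always' conditions must hold at every step.
    if not all(cond(state) for cond in scenario["always"]):
        raise Discard("require always violated")
    # Step 3: advance each monitor by one step (it may raise Discard).
    for monitor in scenario["monitors"]:
        next(monitor)
    # Step 4: on termination, every 'require eventually' must have held.
    if terminate:
        if all(entry["met"] for entry in scenario["eventually"]):
            return {}, True
        raise Discard("require eventually never satisfied")
    # Step 5: resume each behavior until it takes its next action.
    actions = {name: next(beh) for name, beh in scenario["behaviors"].items()}
    # Step 7: return the chosen actions.
    return actions, False


def drive(state):
    """Behavior: move quickly until x reaches 2, then slow down."""
    while True:
        yield 1.0 if state["x"] < 2 else 0.5


def bounds_monitor(state):
    """Monitor: reject simulations in which x becomes negative."""
    while True:
        if state["x"] < 0:
            raise Discard("agent left the allowed range")
        yield


state = {"x": 0.0}
scenario = {
    "terminate_when": [lambda s: s["x"] >= 3],
    "always": [lambda s: s["x"] <= 10],
    "eventually": [{"cond": lambda s: s["x"] >= 2, "met": False}],
    "monitors": [bounds_monitor(state)],
    "behaviors": {"ego": drive(state)},
}

trace = []
while True:
    actions, terminated = policy_step(scenario, state)
    if terminated:
        break
    state["x"] += actions["ego"]    # stand-in for the simulator's physics
    trace.append(state["x"])
```

The loop at the end plays the role of the simulator: it applies the returned action, then asks the policy for the next one until the terminate flag is honored.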
The problem of sampling scenes from the distribution defined by a Scenic program is essentially a special case of the sampling problem for imperative PPLs with observations (since soft requirements can also be encoded as observations). While we could apply general techniques for such problems, the domain-specific design of Scenic enables specialized sampling methods, which we discuss below. We also note that the scenario generation problem is closely related to control improvisation, an abstract framework capturing various problems requiring synthesis under hard, soft, and randomness constraints [16]. Scenario improvisation from a Scenic program can be viewed as an extension with a more detailed randomness constraint given by the imperative part of the program.

Domain-Specific Sampling Techniques
The geometric nature of the constraints in Scenic programs, together with Scenic's lack of conditional control flow outside behaviors, enables domain-specific sampling techniques inspired by robotic path planning methods. Specifically, we can use ideas for constructing configuration spaces to prune parts of the sample space where the objects being positioned do not fit into the workspace. Furthermore, by combining spatial and temporal constraints, we can prune some initial scenes by proving that they force a requirement to be violated at some future point during a dynamic scenario. We describe several pruning techniques below, deferring formal statements of the algorithms to the Appendix [18].
Pruning Based on Containment. The simplest technique applies to any object X whose position is uniform in a region R and which must be contained in a region C (e.g. the road in our case study). If minRadius is a lower bound on the distance from the center of X to its bounding box, then we can restrict R to R ∩ erode(C, minRadius). This is sound, since if X is centered anywhere not in the restriction, then some point of its bounding box must lie outside of C.
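For intuition, the restriction R ∩ erode(C, minRadius) is easy to compute when both regions are axis-aligned rectangles, since eroding a rectangle by a radius r simply moves each side inward by r. The Python sketch below makes that simplifying assumption (Scenic regions are more general), and the names Rect, erode, intersect, and prune_by_containment are hypothetical.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _valid(r: Rect) -> Optional[Rect]:
    """Return r, or None if it is empty."""
    return r if r.xmin <= r.xmax and r.ymin <= r.ymax else None


def erode(c: Rect, radius: float) -> Optional[Rect]:
    """Points of c at distance at least radius from the boundary of c."""
    return _valid(Rect(c.xmin + radius, c.ymin + radius,
                       c.xmax - radius, c.ymax - radius))


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    return _valid(Rect(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
                       min(a.xmax, b.xmax), min(a.ymax, b.ymax)))


def prune_by_containment(region: Rect, container: Rect,
                         min_radius: float) -> Optional[Rect]:
    """Restrict region to R ∩ erode(C, minRadius); None means empty."""
    eroded = erode(container, min_radius)
    return None if eroded is None else intersect(region, eroded)


road = Rect(0.0, 0.0, 100.0, 8.0)          # container C: an 8 m wide road
region = Rect(-10.0, -10.0, 50.0, 50.0)    # region R where X is placed
pruned = prune_by_containment(region, road, 1.0)
too_big = prune_by_containment(region, road, 5.0)
```

With minRadius of 1 m the sample space shrinks to a 49 m by 6 m strip inside the road, and an object with minRadius of 5 m cannot fit in the 8 m road at all, so the pruned region is empty.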
Pruning Based on Orientation. The next technique applies to scenarios placing constraints on the relative heading and the maximum distance M between objects X and Y, which are oriented with respect to a vector field that is constant within polygonal regions (such as our roads). For each polygon P, we find all polygons Q_i satisfying the relative heading constraints with respect to P (up to a perturbation if X and Y need not be exactly aligned to the field), and restrict P to P ∩ dilate(∪Q_i, M). This is also sound: suppose X can be positioned at x in polygon P. Then Y must lie at some y in a polygon Q satisfying the constraints, and since the distance from x to y is at most M, we have x ∈ dilate(Q, M).
Pruning Based on Size. In the setting above of objects X and Y aligned to a polygonal vector field (with maximum distance M), we can also prune the space using a lower bound on the width of the configuration. For example, in our bumper-to-bumper scenario we can infer such a bound from the offset by specifiers in the program. We first find all polygons that are not wide enough to fit the configuration according to the bound: call these "narrow". Then we restrict each narrow polygon P to P ∩ dilate(∪Q_i, M) where Q_i runs over all polygons except P. To see that this is sound, suppose object X can lie at x in polygon P. If P is not narrow, we do not restrict it; otherwise, object Y must lie at y in some other polygon Q. Since the distance from x to y is at most M, as above we have x ∈ dilate(Q, M).
Pruning Based on Reachability. Finally, we can prune initial positions for objects which make it impossible to reach a goal location within the duration of the scenario; for example, a car which travels down a road and then runs a red light must start sufficiently close to an intersection. Suppose an object is required to enter a region R within T time (either by an explicit require eventually statement or a precondition of a behavior or scenario guaranteed to eventually execute) and we have an upper bound S on the object's speed. Then we can prune away all initial positions of the object which do not lie within a distance D = ST of R, i.e., we can restrict its initial positions to dilate(R, D). If the object is also required to stay within some containing region C (e.g., a road) for the entire duration of the scenario, we can compute a tighter value of D by considering only paths that lie within C.
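The basic form of this test (without the tighter bound from a containing region C) amounts to checking whether a candidate initial position lies within distance D = ST of the goal region. The Python sketch below assumes, purely for illustration, that R is an axis-aligned rectangle given as (xmin, ymin, xmax, ymax); the function names and numeric values are hypothetical.

```python
import math


def distance_to_rect(point, rect):
    """Euclidean distance from a point to an axis-aligned rectangle."""
    x, y = point
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)   # horizontal gap, 0 if x is in range
    dy = max(ymin - y, 0.0, y - ymax)   # vertical gap, 0 if y is in range
    return math.hypot(dx, dy)


def may_reach(point, goal, max_speed, duration):
    """True if point lies in dilate(goal, D) with D = max_speed * duration."""
    return distance_to_rect(point, goal) <= max_speed * duration


intersection = (100.0, 0.0, 120.0, 20.0)   # goal region R
S, T = 10.0, 5.0                           # speed bound (m/s), time bound (s)
near = may_reach((60.0, 10.0), intersection, S, T)   # 40 m away, D = 50 m
far = may_reach((40.0, 10.0), intersection, S, T)    # 60 m away, D = 50 m
```

Initial positions like the second one can be pruned, since even at the maximum speed the object could not enter R in time.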
After pruning the space as described above, our implementation uses rejection sampling, generating scenes from the imperative part of the scenario until all requirements are satisfied. While this samples from exactly the desired distribution, it has the drawback that a huge number of samples may be required to yield a single valid scene (in the worst case, when the requirements have probability zero of being satisfied, the algorithm will not even terminate). However, we found in our experiments that all reasonable scenarios we tried required only several hundred iterations at most, yielding a sample within a few seconds. Furthermore, the pruning methods above could reduce the number of samples needed by a factor of

Experimental Setup
For our main case study, we generated scenes in the virtual world of the video game Grand Theft Auto V (GTAV) [47]. We wrote a Scenic world model defining Regions representing the roads and curbs in (part of) this world, as well as a type of object Car providing two additional properties: model, representing the type of car, with a uniform distribution over 13 diverse models provided by GTAV, and color, representing the car color, with a default distribution based on real-world car color statistics [9]. In addition, we implemented two global scene parameters: time, representing the time of day, and weather, representing the weather as one of 14 discrete types supported by GTAV (e.g. "clear" or "snow").
GTAV is closed-source and does not expose any kind of scene description language. Therefore, to import scenes generated by Scenic into GTAV, we wrote a plugin based on DeepGTAV. The plugin calls internal functions of GTAV to create cars with the desired positions, colors, etc., as well as to set the camera position, time of day, and weather.
Our experiments used SqueezeDet [61], a convolutional neural network real-time object detector for autonomous driving. We used a batch size of 20 and trained all models for 10,000 iterations unless otherwise noted. Images captured from GTAV with resolution 1920 × 1200 were resized to 1248 × 384, the resolution used by SqueezeDet and the standard KITTI benchmark [20]. All models were trained and evaluated on NVIDIA TITAN XP GPUs.
We used the standard metrics of precision and recall to measure the accuracy of detection on a particular image set. The accuracy is computed based on how well the network predicts the correct bounding box, score, and category of objects in the image set. Details are in the Appendix [18], but in brief, precision is defined as tp/(tp + fp) and recall as tp/(tp + fn), where true positives tp is the number of correct detections, false positives fp is the number of predicted boxes that do not match any ground truth box, and false negatives fn is the number of ground truth boxes that are not detected.
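As a small worked example of these definitions, the following Python sketch computes both metrics from the three counts. The function name and the convention of returning 0 when a denominator is zero are assumptions of the sketch, not something specified in the text. The example counts correspond to an image containing one car for which the detector outputs three boxes, one of them correct.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts.

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# One ground-truth car, three predicted boxes, one of them correct.
precision, recall = precision_recall(tp=1, fp=2, fn=0)
```

This gives a precision of 1/3 (about 33.3%) and a recall of 100%.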

Testing and Falsification
We begin with the most straightforward application of Scenic, namely generating specialized data to test a system under particular conditions. We demonstrate both using a static scenario to test a perception component, and using a dynamic scenario to falsify a closed-loop system.

Testing a Perception Module
When testing a model, one may be interested in a particular operation regime. For instance, an autonomous car manufacturer may be more interested in certain road conditions (e.g. desert vs. forest roads) depending on where its cars will be mainly used. Scenic provides a systematic way to describe scenarios of interest and construct corresponding test sets.
To demonstrate this, we first wrote very general scenarios describing static scenes of 1-4 cars (not counting the camera), specifying only that the cars face within 10° of the road direction. We generated 1,000 images from each scenario, yielding a training set X_generic of 4,000 images, and used these to train a model M_generic as described in Sec. 6.1. We also generated an additional 50 images from each scenario to obtain a generic test set T_generic of 200 images.
Next, we specialized the general scenarios in opposite directions: scenarios for good/bad road conditions fixing the time to noon/midnight and the weather to sunny/rainy respectively, generating specialized test sets T_good and T_bad.
Evaluating M_generic on T_generic, T_good, and T_bad, we obtained precisions of 83.1%, 85.7%, and 72.8%, respectively, and recalls of 92.6%, 94.3%, and 92.8%. This shows that, as might be expected, the model performs better on bright days than on rainy nights. This suggests there might not be enough examples of rainy nights in the training set, and indeed under our default weather distribution rain is less likely than shine. This illustrates how specialized test sets can highlight the weaknesses and strengths of a particular model. In Sec. 6.3, we go one step further and use Scenic to redesign the training set and improve model performance.

Falsifying a Dynamic Closed-Loop System
Next, we demonstrate how we can use a dynamic Scenic scenario to test a closed-loop system, using VerifAI's falsification facilities to monitor and analyze counterexamples to a system-level specification. We tested an autonomous agent in the CARLA [7] driving simulator, for which we wrote a Scenic world model similar to the one we wrote for GTAV. This agent consists of a planner and controller (but no perception components) which implement basic driving behaviors including abiding by traffic lights, lane following, and collision avoidance.
We wrote a Scenic program describing a scenario where the ego vehicle (i.e. the autonomous agent) is performing a right turn at an intersection, yielding to the crossing traffic. As the ego approaches the intersection, the traffic light turns green, but a crossing car runs the red light. The ego vehicle has to decide either to yield or make a right turn. The crossing car executes a reactive behavior where it slows down to maintain a minimum distance with any car in front.
We allowed three environment parameters to vary in this scenario:
- The traffic light's transition from red to green is triggered when the distance between the ego and the crossing car reaches a threshold, which was uniformly random between 10-20 m.
- The crossing car's speed was uniformly random between 5-12 m/s.
- The scenario takes place at a random 4-way intersection in the CARLA map.
To demonstrate how Scenic programs can be written in a generic, map-agnostic style, we used the same Scenic code on two different CARLA maps (Town05 and Town03).
We formulated a safety specification for the autonomous agent in Metric Temporal Logic, stating that the distance between the agent and the crossing car must be greater than 5 meters at all times. Giving this specification and the Scenic program to VerifAI, we generated 2,000 scenarios for each map. VerifAI monitored each simulation and computed the robustness value ρ of the MTL specification, which measures how strongly the specification was satisfied [33] (negative values meaning it was violated).
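For a specification of this particular shape ("the distance is always greater than 5 m"), the usual quantitative semantics of temporal logic gives ρ as the smallest margin d(t) − 5 observed over the simulation. The Python sketch below illustrates that definition on made-up distance traces; it is not VerifAI's monitoring code, and the function name is hypothetical.

```python
def robustness_always_greater(distances, threshold=5.0):
    """Robustness of 'always (distance > threshold)' over a finite trace.

    The value is the worst-case margin: positive if the specification
    holds with room to spare, negative if it is violated at some step.
    """
    return min(d - threshold for d in distances)


# Made-up traces of the ego-to-crossing-car distance in meters.
safe_trace = [12.0, 8.5, 6.0, 9.0]
unsafe_trace = [12.0, 4.0, 7.0]
rho_safe = robustness_always_greater(safe_trace)       # closest approach 6 m
rho_unsafe = robustness_always_greater(unsafe_trace)   # closest approach 4 m
```

The first trace yields ρ = 1 (satisfied with a 1 m margin) and the second yields ρ = −1 (violated by 1 m).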
Our results are shown in Fig. 11. On the left, we plot ρ as a function of the traffic light trigger threshold and the speed of the crossing car. Each dot represents one simulation, with redder colors indicating smaller ρ, i.e., being closer to violating the safety specification. We found a significant number of violations, approximately 21% and 17% of tests on Town05 and Town03 respectively. From the plots we observe broadly similar behavior across the two maps, with the distance when the traffic light switch occurs being the dominant factor controlling failures of the autonomous agent (most failures occurring for values of 15-20 m).
On the right side of Fig. 11, we plot the average value of ρ at each intersection, with color again indicating the average value of ρ and the size of each dot being proportional to its variance. We can see that some intersections are much easier or harder for the autonomous agent to handle. Investigating some of the most extreme intersections, we observed that those with 4-lane legs and a turning radius of about 6.5 m caused the agent to fail most frequently. Re-testing the agent at such intersections, we found that this geometry often created a situation where the agent and the crossing car were merging into the same lane simultaneously, instead of one car completing its maneuver before the other.
These results show how we can use Scenic to find scenarios where a closed-loop system violates its specification. In Sec. 6.4, we will further show how Scenic can help us diagnose the root causes of failures and eliminate them through retraining.
with roadDeviation resample(wiggle)
Fig. 12: A scenario where one car partially occludes another. The property roadDeviation is defined in Car to mean its heading relative to the roadDirection.
Fig. 13: Two scenes generated from the partial-occlusion scenario.

Training on Rare Events
In the synthetic data setting, we are limited not by data availability but by the cost of training. The natural question is then how to generate a synthetic data set that is as effective as possible given a fixed size. In this section we show that overrepresenting a type of input that may occur rarely but is difficult for the model can improve performance on the hard case without compromising performance in the typical case. Scenic makes this possible by allowing the user to write a scenario capturing the hard case specifically. For our car detection task, an obvious hard case is when one car partially occludes another. We wrote a simple scenario, shown in Fig. 12, which generates such scenes by placing one car behind the other as viewed from the camera, offset left or right so that it is at least partially visible; Fig. 13 shows some of the resulting images. Generating images from this scenario we obtained a training set X_overlap of 250 images and a test set T_overlap of 200 images.
For a baseline training set we used the "Driving in the Matrix" synthetic data set [30], which has been shown to yield good car detection performance even on real-world images. (We use the "Matrix" data set since it is known to be effective for car detection and was not designed by us, making the fact that Scenic is able to improve it more striking. The results of this experiment also hold under the Average Precision (AP) metric used in [30], as well as in a similar experiment using the Scenic generic two-car scenario from the last section as the baseline. See Appendix [18] for details.) Like our images, the "Matrix" images were rendered in GTAV; however, rather than using a PPL to guide generation, they were produced by allowing the game's AI to drive around randomly while periodically taking screenshots. We randomly selected 5,000 of these images to form a training set X_matrix, and 200 for a test set T_matrix. We trained SqueezeDet for 5,000 iterations on X_matrix, evaluating it on T_matrix and T_overlap. To reduce the effect of jitter during training we used a standard technique [2], saving the last 10 models in steps of 10 iterations and picking the one achieving the best total precision and recall. This yielded the results in the first row of Tab. 6. Although X_matrix contains many images of overlapping cars, the precision on T_overlap is significantly lower than for T_matrix, indicating that the network is predicting lower-quality bounding boxes for such cars.
Table 6: Performance of models trained on 5,000 images from X_matrix or a mixture with X_overlap, averaged over 8 training runs with random selections of images from X_matrix.
Next we attempted to improve the effectiveness of the training set by mixing in the difficult images produced with Scenic. Specifically, we replaced a random 5% of X_matrix (250 images) with images from X_overlap, keeping the overall training set size constant. We then retrained the network on the new training set and evaluated it as above. To reduce the dependence on which images were replaced, we averaged over 8 training runs with different random selections of the 250 images to replace. The results are shown in the second row of Tab. 6. Even altering only 5% of the training set, performance on T_overlap significantly improves. Critically, the improvement on T_overlap is not paid for by a corresponding decrease on T_matrix: performance on the original data set remains the same. Thus, by allowing us to specify and generate instances of a difficult case, Scenic enables the generation of more effective training sets than can be obtained through simpler approaches not based on PPLs.
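The replacement step itself is simple to express in code. The Python sketch below is illustrative rather than the script used in our experiments: images are stood in for by file-name strings, and the function name mix_training_set is hypothetical. It swaps a random fraction of a base set for hard-case images while keeping the total size fixed.

```python
import random


def mix_training_set(base, hard, fraction, rng):
    """Replace a random fraction of base with images drawn from hard."""
    n_replace = round(len(base) * fraction)
    if n_replace > len(hard):
        raise ValueError("not enough hard-case images")
    # Choose which base images to drop, then fill the gap from hard.
    dropped = set(rng.sample(range(len(base)), n_replace))
    kept = [img for i, img in enumerate(base) if i not in dropped]
    return kept + rng.sample(hard, n_replace)


rng = random.Random(0)
x_matrix = [f"matrix_{i:04d}.png" for i in range(5000)]     # placeholder names
x_overlap = [f"overlap_{i:03d}.png" for i in range(250)]    # placeholder names
mixed = mix_training_set(x_matrix, x_overlap, 0.05, rng)
n_overlap = sum(name.startswith("overlap_") for name in mixed)
```

Repeating the call with different seeds corresponds to the different random selections averaged over in the experiment.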

Debugging Failures
In our final experiment, we show how Scenic can be used to generalize a single input on which a model fails, exploring its neighborhood in a variety of different directions and giving insight into which features of the scene are responsible for the failure. The original failure can then be generalized to a broader scenario describing a class of inputs on which the model misbehaves, which can in turn be used for retraining. We selected one scene from our first experiment, shown in Fig. 14, consisting of a single car viewed from behind at a slight angle, which M_generic wrongly classified as three cars (thus having 33.3% precision and 100% recall). We wrote several scenarios which left most of the features of the scene fixed but allowed others to vary. Specifically, scenario (1) varied the model and
To investigate these possibilities further, we wrote a second round of variant scenarios, also shown in Tab. 7. The results confirmed the importance of model and color (compare (2) to (7)), as well as angle (compare (5) to (6)), but also suggested that being close to the camera could be the relevant aspect of the car's local position. We confirmed this with a final round of scenarios (compare (5) and (8)), which also showed that the effect of car model is small among scenes where the car is close to the camera (compare (4) and (9)).
Having established that car model, closeness to the camera, and view angle all contribute to poor performance of the network, we wrote broader scenarios capturing these features. To avoid overfitting, and since our experiments indicated car model was not very relevant when the car is close to the camera, we decided not to fix the car model. Instead, we specialized the generic one-car scenario from our first experiment to produce only cars close to the camera. We also created a second scenario specializing this further by requiring that the car be viewed at a shallow angle.
Finally, we used these scenarios to retrain M_generic, hoping to improve performance on its original test set T_generic (to better distinguish small differences in performance, we increased the test set size to 400 images). To keep the size of the training set fixed as in the previous experiment, we replaced 400 one-car images in X_generic (10% of the whole training set) with images generated from our scenarios. As a baseline, we used images produced with classical image augmentation techniques implemented in imgaug [31]. Specifically, we modified the original misclassified image by randomly cropping 10%-20% on each side, flipping horizontally with probability 50%, and applying Gaussian blur with σ ∈ [0.0, 3.0].
The results of retraining M_generic on the resulting data sets are shown in Tab. 8. Interestingly, classical augmentation actually hurt performance, presumably due to overfitting to relatively slight variants of a single image. On the other hand, replacing part of the data set with specialized images of cars close to the camera significantly reduced the number of false positives like the original misclassification (while the improvement for the "shallow angle" scenario was less, perhaps due to overfitting to the restricted angle range). This demonstrates how Scenic can be used to improve performance by generalizing individual failures into scenarios that capture the essence of the problem but are broad enough to prevent overfitting during retraining.

Related Work
Data Generation and Testing for ML. There has been a large amount of work on generating synthetic data for specific applications, including text recognition [28], text localization [27], robotic object grasping [57], and autonomous driving [30,11]. Closely related is work on domain adaptation, which attempts to correct differences between synthetic and real-world input distributions. Domain adaptation has enabled synthetic data to successfully train models for several other applications including 3D object detection [37,54], pedestrian detection [58], and semantic image segmentation [49]. Such work provides important context for our paper, showing that models trained exclusively on synthetic data (possibly domain-adapted) can achieve acceptable performance on real-world data. The major difference in our work is that we provide, through Scenic, language-based systematic data generation for any cyber-physical system. Some works have also explored the idea of using adversarial examples (i.e. misclassified examples) to retrain and improve ML models (e.g., [62,59,23]). In particular, Generative Adversarial Networks (GANs) [22], a particular kind of neural network able to generate synthetic data, have been used to augment training sets [36,39]. The difference with Scenic is that GANs require an initial training set/pretrained model and do not easily incorporate declarative constraints, while Scenic produces synthetic data in an explainable, programmatic fashion requiring only a simulator.
Model-Based Test Generation. Techniques using a model to guide test generation have long existed [4]. A popular approach is to provide example tests, as in mutational fuzz testing [55] and example-based scene synthesis [12]. While these methods are easy to use, they do not provide fine-grained control over the generated data. Another approach is to give rules or a grammar specifying how the data can be generated, as in generative fuzz testing [55], procedural generation from shape grammars [42], and grammar-based scene synthesis [29]. While grammars allow much greater control, they do not easily allow enforcing global properties. This is also true when writing a program in a domain-specific language with nondeterminism [10]. Conversely, constraints as in constrained-random verification [43] allow global properties but can be difficult to write. Scenic improves on these methods by simultaneously providing fine-grained control, enforcement of global properties, specification of probability distributions, and simple imperative syntax.
Probabilistic Programming Languages. The semantics (and to some extent, the syntax) of Scenic are similar to those of other probabilistic programming languages such as Prob [26], Church [24], and BLOG [41]. In probabilistic programming the focus is usually on inference rather than generation (the main application in our case), and in particular, to our knowledge, probabilistic programming languages have not previously been used for test generation. However, the most popular inference techniques are based on sampling and so could be directly applied to generate scenes from Scenic programs, as we discussed in Sec. 5.
Several probabilistic programming languages have been used to define generative models of objects and scenes: both general-purpose languages such as WebPPL [25] (see, e.g., [46]) and languages specifically motivated by such applications, namely Quicksand [45] and Picture [34]. The latter are in some sense the most closely-related to Scenic, although neither provides specialized syntax or semantics for dealing with geometry or dynamic behaviors (Picture also was used only for inverse rendering, not data generation). The main advantage of Scenic over these languages is that its domain-specific design permits concise representation of complex scenarios and enables specialized sampling techniques.
Scenario Description Languages for Autonomous Driving. Recently, formal dynamic scenario description languages have been proposed for the domain of autonomous driving. The Paracosm language [38] is used to model dynamic scenarios with a reactive and synchronous model of computation. However, it is not a PPL, so it lacks probability distributions and declarative constraints; it also does not provide constructs like Scenic's interrupts which allow easy customization of generic behavior models. The Measurable Scenario Description Language (M-SDL) [13], introduced after the first version of Scenic, does provide declarative constraints, as well as compositional features similar to those we introduced in this paper. However, compared to both of these languages, Scenic has several distinguishing features: (i) it provides a much higher-level, declarative way of specifying geometric constraints; (ii) it is fundamentally a probabilistic programming language (as opposed to M-SDL where distributions are optional), and (iii) it is not specific to the autonomous driving domain (as demonstrated in [17,15]).

Conclusion
In this paper, we introduced Scenic, a probabilistic programming language for specifying distributions over configurations of physical objects and the behaviors of dynamic agents. We showed how Scenic can be used to generate synthetic data sets useful for a variety of tasks in the design of robust ML-based cyber-physical systems. Specifically, we used Scenic to generate specialized test sets and falsify a system, improve the robustness of a system by emphasizing difficult cases in its training set, and generalize from individual failure cases to broader scenarios suitable for retraining. In particular, by training on hard cases generated by Scenic, we were able to boost the performance of a car detector neural network (given a fixed training set size) significantly beyond what could be achieved by prior synthetic data generation methods [30] not based on PPLs.
In future work we plan to conduct experiments applying Scenic to a variety of additional domains, applications, and simulators. As we mentioned in the Introduction, we have already successfully applied Scenic to aircraft [15], and we are currently investigating applications in further domains including underwater vehicles and indoor robots. We also plan to extend the Scenic language itself in several directions, including allowing user-defined specifiers and describing 3D scenes. Finally, we are exploring ways to combine Scenic with automated analyses: in particular, reducing the human burden of writing Scenic programs through algorithms for synthesizing or adapting such programs (e.g. [32]), and improving the efficiency of falsification by performing white-box analyses of the system.

A Gallery of Scenarios
This section presents Scenic code for a variety of scenarios from our autonomous car case study (and the robot motion planning example used in Sec. 3), along with images rendered from them. The scenarios range from simple examples used above to illustrate different aspects of the language, to those representing interesting road configurations like platoons and lanes of traffic.

A.2 The Simplest Possible Scenario
This scenario, creating a single car with no specified properties, was used as an example in Sec. 3.
    ego = Car
    Car

Fig. 15: Scenes generated from a Scenic scenario representing a single car (with reasonable default properties).

A.4 A Badly-Parked Car
This scenario, creating a single car parked near the curb but not quite parallel to it, was used as an example in Sec. 3.

A.5 An Oncoming Car
This scenario, creating a car 20-40 m ahead and roughly facing towards the camera, was used as an example in Sec. 3. Note that since we do not specify the orientation of the car when creating it, the default heading is used and so it will face the road direction. The require statement then requires that this orientation is also within 15° of facing the camera (as the view cone is 30° wide).

A.6 Adding Noise to a Scene
This scenario, using Scenic's mutation feature to automatically add noise to an otherwise completely-specified scenario, was used in the experiment in Sec. 6.4 (it is Scenario (3) in Table 7). The original scene, which is exactly reproduced by this scenario if the mutate statement is removed, is shown in Fig. 20.

A.10 A Platoon, in Daytime
This scenario illustrates how Scenic can construct structured object configurations, in this case a platoon of cars. It uses a helper function provided by gtaLib for creating platoons starting from a given car, shown in Fig. 24. If no argument model is provided, as in this case, all cars in the platoon have the same model as the starting car; otherwise, the given model distribution is sampled independently for each car. The syntax for functions and loops supported by our Scenic implementation is inherited from Python.

A.11 Bumper-to-Bumper Traffic
This scenario creates an even more complex type of object structure, namely three lanes of traffic. It uses the helper function createPlatoonAt discussed above, plus another for placing a car ahead of a given car with a specified gap in between, shown in Fig

B Semantics of Scenic
In this section we give a precise semantics for Scenic expressions and statements, building up to a semantics for a complete program as a distribution over scenes.

B.1 Notation for State and Semantics
We will precisely define the meaning of Scenic language constructs by giving a small-step operational semantics. We will focus on the aspects of Scenic that set it apart from ordinary imperative languages, skipping standard inference rules for sequential composition, arithmetic operations, etc. that we essentially use without change. In rules for statements, we will denote a state of a Scenic program by ⟨s, σ, π, O⟩, where s is the statement to be executed, σ is the current variable assignment (a map from variables to values), π is the current global parameter assignment (for param statements), and O is the set of all objects defined so far. In rules for expressions, we use the same notation, although we sometimes suppress the state on the right-hand side of rules for expressions without side effects: ⟨e, σ, π, O⟩ → v means that in the state (σ, π, O), the expression e evaluates to the value v without side effects.
Since Scenic is a probabilistic programming language, a single expression can be evaluated different ways with different probabilities. Following the notation of [52,6], we write $\rightarrow_p$ for a rewrite rule that fires with probability p (probability density p, in the case of continuous distributions). We will discuss the meaning of such rules in more detail below.

B.2 Semantics of Expressions
As explained in the previous section, Scenic's expressions are straightforward except for distributions and object definitions. As in a typical imperative probabilistic programming language, a distribution evaluates to a sample from the distribution, following the first rule in Fig. 30. For example, if baseDist is a uniform interval distribution and the parameters evaluate to low = 0 and high = 1, then the distribution can evaluate to any value in [0, 1] with probability density 1.
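As a minimal illustration of the example above, the following Python sketch evaluates a uniform interval distribution and reports the density of the chosen value alongside it. The function name `eval_uniform` is hypothetical and is not part of Scenic.

```python
import random

def eval_uniform(low, high, rng):
    """Evaluate a uniform interval distribution.

    Returns the sampled value together with the probability density
    attached to that evaluation step.
    """
    if not low < high:
        raise ValueError("need low < high")
    value = rng.uniform(low, high)
    density = 1.0 / (high - low)
    return value, density

rng = random.Random(0)
value, density = eval_uniform(0.0, 1.0, rng)   # density is 1 on [0, 1]
```

Multiplying the densities returned by successive evaluations gives the density of the whole execution trace.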
The semantics of object definitions are given by the second rule in Fig. 30. First note the side effect, namely adding the newly-defined object to the set O. The premises of the rule describe the procedure for combining the specifiers to obtain the overall set of properties for the object. The main step is working out the evaluation order for the specifiers so that all their dependencies are satisfied, as well as deciding for each specifier which properties it should specify (if it specifies a property optionally, another specifier could take precedence). This is done by the procedure resolveSpecifiers, shown formally as Alg. 1, which essentially does the following. Let P be the set of properties defined in the object's class and superclasses, together with any properties specified by any of the specifiers. The object will have exactly these properties, and the value of each p ∈ P is determined as follows. If p is specified non-optionally by multiple specifiers, the scenario is ill-formed. If p is only specified optionally, and by multiple specifiers, this is ambiguous and we also declare the scenario ill-formed. Otherwise, the value of p will be determined by its unique non-optional specifier, unique optional specifier, or the most-derived default value, in that order: call this specifier sp. Construct a directed graph with vertices P and edges to p from each of the dependencies of sp (if a dependency is not in P, then a specifier references a nonexistent property and the scenario is ill-formed). If this graph has a cycle, there are cyclic dependencies and the scenario is ill-formed (e.g. Car left of 0 @ 0, facing roadDirection: the heading must be known to evaluate left of vector, but facing vectorField needs position to determine heading). Otherwise, topologically sorting the graph yields an evaluation order for the specifiers so that all dependencies are available when needed.
The rest of the rule in Fig. 30 simply evaluates the specifiers in this order, accumulating the results as properties of self so they are available to the next specifier, finally creating the new object once all properties have been assigned. Note that we also accumulate the probabilities of each specifier's evaluation, since specifiers are allowed to introduce randomness themselves (e.g. the on region specifier returns a random point in the region).
As noted above, the semantics of the individual specifiers are mostly straightforward, and exact definitions are given in Appendix C. To illustrate the pattern we precisely define two specifiers in Fig. 30: the with property value specifier, which has no dependencies but can specify any property, and the facing vectorField specifier, which depends on position and specifies heading. Both specifiers evaluate to maps assigning a value to each property they specify.
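The specifier-resolution procedure described in this section can be sketched in Python as follows. This is an illustrative reading of the prose, not the Scenic implementation: the `Specifier` record, the `IllFormedScenario` exception, and the representation of class defaults as a map from property name to a default specifier are all assumptions made for the example. The topological sort uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter

class IllFormedScenario(Exception):
    """Raised when specifiers conflict, are ambiguous, or depend cyclically."""

@dataclass(eq=False)  # hashed by identity, so equal-looking specifiers stay distinct
class Specifier:
    name: str
    specifies: frozenset = frozenset()   # properties specified non-optionally
    optionally: frozenset = frozenset()  # properties specified optionally
    requires: frozenset = frozenset()    # properties this specifier depends on

def resolve_specifiers(defaults, specifiers):
    """Return [(specifier, properties it assigns)] in a valid evaluation order."""
    spec_for, optional_for = {}, {}
    for s in specifiers:
        for p in s.specifies:
            if p in spec_for:
                raise IllFormedScenario(f"property {p!r} specified twice")
            spec_for[p] = s
        for p in s.optionally:
            optional_for.setdefault(p, []).append(s)
    # Optional specifications apply only if nothing specifies p non-optionally.
    for p, candidates in optional_for.items():
        if p in spec_for:
            continue
        if len(candidates) > 1:
            raise IllFormedScenario(f"property {p!r} optionally specified twice")
        spec_for[p] = candidates[0]
    # Defaults have the lowest priority.
    for p, default in defaults.items():
        spec_for.setdefault(p, default)
    # Map each chosen specifier to the specifiers it must be evaluated after.
    graph = {}
    for s in spec_for.values():
        predecessors = graph.setdefault(s, set())
        for dep in s.requires:
            if dep not in spec_for:
                raise IllFormedScenario(f"{s.name} requires missing property {dep!r}")
            predecessors.add(spec_for[dep])
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        raise IllFormedScenario("specifiers have cyclic dependencies") from None
    return [(s, {p for p, t in spec_for.items() if t is s}) for s in order]
```

With a position specifier that needs heading and a heading specifier that needs position, as in the `left of` / `facing` example, the sketch raises `IllFormedScenario`, mirroring the cyclic-dependency case.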

B.3 Semantics of Statements
The semantics of class and object definitions have been discussed above, while rules for the other statements are given in Fig. 31. As can be seen from the first rule, variable assignment behaves in the standard way. Parameter assignment is nearly identical, simply updating the global parameter assignment π instead of the variable assignment σ.
As noted above, the require boolean statement is equivalent to an observe in other languages, and following [6] we model it by allowing the "Hard Requirement" rule in Fig. 31 to only fire when the condition is satisfied (then turning the requirement into a no-op). If the condition is not satisfied, no rules apply and the program fails to terminate normally. When defining the semantics of entire Scenic scenarios below we will discard such non-terminating executions, yielding a distribution only over executions where all hard requirements are satisfied.
Fig. 30: Semantics of expressions (excluding operators, defined in Appendix C), and two example specifiers. The premise of the Object Definitions rule is resolveSpecifiers(class, specifiers) = ((s_1, p_1), . . . , (s_n, p_n)). Here baseDist is viewed as a function mapping parameters θ to a distribution with density function P_θ, and newInstance(class, props) creates a new instance of a class with the given property values.

Algorithm 1 resolveSpecifiers (class, specifiers)
▷ gather all specified properties
specForProperty ← ∅
optionalSpecsForProperty ← ∅
for all specifiers S in specifiers do
    for all properties P specified non-optionally by S do
        if P ∈ dom specForProperty then
            syntax error: property P specified twice
        specForProperty(P) ← S
    for all properties P specified optionally by S do
        optionalSpecsForProperty(P).append(S)
▷ filter optional specifications
for all properties P ∈ dom optionalSpecsForProperty do
    if P ∈ dom specForProperty then
        continue
    if |optionalSpecsForProperty(P)| > 1 then
        syntax error: property P specified twice
    specForProperty(P) ← optionalSpecsForProperty(P)[0]
▷ add default specifiers as needed
defaults ← defaultValueExpressions(class)
for all properties P ∈ dom defaults do
    if P ∉ dom specForProperty then
        specForProperty(P) ← defaults(P)
▷ build dependency graph
G ← empty graph on dom specForProperty
for all specifiers S ∈ dom specForProperty do
    for all dependencies D of S do
        if D ∉ dom specForProperty then
            syntax error: missing property D required by S
        add an edge in G from specForProperty(D) to S
if G is cyclic then
    syntax error: specifiers have cyclic dependencies
▷ construct specifier and property evaluation order
specsAndProps ← empty list
for all specifiers S in G in topological order do
    specsAndProps.append((S, {P | specForProperty(P) = S}))
return specsAndProps
The statement require[p] boolean requires only that its condition hold with at least probability p. There are a number of ways the semantics of such a soft requirement could be defined: we choose the natural definition that require[p] B is equivalent to a hard requirement require B that is only enforced with probability p. This is reflected in the two corresponding rules in Fig. 31, and clearly ensures that the requirement B will hold with probability at least p, as desired.
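One way to picture the two kinds of requirement is as a rejection loop around the scenario, sketched below in Python. This is an illustration only, not the sampling strategy of the Scenic implementation: the names `require`, `Rejected`, `sample_scene`, and the toy `scenario` are hypothetical. A violated, enforced requirement aborts the run, and aborted runs are discarded.

```python
import random

class Rejected(Exception):
    """Signals that an enforced requirement was violated in this run."""

def require(condition, prob=1.0, rng=random):
    """Model of require[prob] condition.

    The condition is enforced with probability prob. With prob = 1.0 it is
    always enforced (a hard requirement), since rng.random() < 1.0 always holds.
    """
    enforced = rng.random() < prob
    if enforced and not condition:
        raise Rejected

def sample_scene(scenario, rng, max_tries=10_000):
    """Rerun the scenario until no enforced requirement is violated."""
    for _ in range(max_tries):
        try:
            return scenario(rng)
        except Rejected:
            continue
    raise RuntimeError("no run satisfied the requirements")

def scenario(rng):
    gap = rng.uniform(0.0, 40.0)      # hypothetical scene parameter
    require(gap >= 20.0, rng=rng)     # hard requirement
    return gap
```

Calling `sample_scene(scenario, random.Random(0))` only ever returns gaps of at least 20, because every run violating the hard requirement is thrown away.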
Since the mutation statement mutate instance, . . . by number only causes noise to be added at the end of execution, as discussed above, its rule in Fig. 31 simply sets a property on the object(s) indicating that mutation is enabled (and giving the scale of noise to be added). The noise is actually added by the first of two special rules that apply only once the program has been reduced to pass and so computation has finished. This rule first looks up the values of the properties mutationScale, positionStdDev, and headingStdDev for each object. Respectively, these specify the overall scale of the noise to add (by default zero, i.e. mutation is disabled) and factors allowing the standard deviation for position and heading to be adjusted individually.
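A rough Python sketch of this final noise step is given below. The property names come from the text, but several details are assumptions made for illustration: that the noise is Gaussian, that its standard deviation is the product of mutationScale and the per-property factor, and that both factors default to 1. The `SceneObject` class and `apply_mutation` function are hypothetical.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SceneObject:
    position: tuple
    heading: float
    mutationScale: float = 0.0    # zero means mutation is disabled
    positionStdDev: float = 1.0   # assumed default factor
    headingStdDev: float = 1.0    # assumed default factor

def apply_mutation(obj, rng):
    """Return a copy of obj with noise added to position and heading."""
    if obj.mutationScale == 0:
        return obj
    pos_sd = obj.mutationScale * obj.positionStdDev
    head_sd = obj.mutationScale * obj.headingStdDev
    x, y = obj.position
    return replace(
        obj,
        position=(x + rng.gauss(0.0, pos_sd), y + rng.gauss(0.0, pos_sd)),
        heading=obj.heading + rng.gauss(0.0, head_sd),
    )
```

An object whose mutationScale is left at zero is returned unchanged, which matches the statement that mutation is disabled by default.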

B.4 Semantics of a Scenic Program
As we have just defined it, every time one runs a Scenic program its output is a scene consisting of an assignment to all the properties of each Object defined in the scenario, plus any global parameters defined with param. Since Scenic allows sampling from distributions, the imperative part of a scenario actually induces a distribution over scenes, resulting from the probabilistic rules of the semantics described above. Specifically, for any execution trace the product of the probabilities of all rewrite rules yields a probability (density) for the trace (see e.g. [6]). The declarative part of a scenario, consisting of its require statements, modifies this distribution. As mentioned above, hard requirements are equivalent to "observations" in other probabilistic programming languages, conditioning the distribution on the requirement being satisfied. In particular, if we discard all traces which do not terminate (due to violating a requirement), then normalizing the probabilities of the remaining traces yields a distribution

B.5 Sampling Algorithms
This section gives pseudocode for the domain-specific sampling techniques described in Sec. 5.2. Algorithm 2 implements pruning by orientation, pruning a set of polygons map given an allowed range of relative headings A, a distance bound M , and a bound δ on the heading deviation between an object and the vector field at its position.
Algorithm 3 similarly implements pruning by size, given map and M as above, plus a bound minWidth on the minimum width of the configuration. Here the subroutine narrow finds all polygons which are thinner than this bound.

C Detailed Semantics of Specifiers and Operators
This section provides precise semantics for Scenic's specifiers and operators, which were informally defined above.

C.1 Notation
Since none of the specifiers and operators have side effects, to simplify notation we write ⟦X⟧ for the value of the expression X in the current state (rather than giving inference rules). Throughout this section, S indicates a scalar, V a vector, H a heading, F a vectorField, R a region, P a Point, and OP an OrientedPoint. Figure 32 defines notation used in the rest of the semantics. In forwardEuler, N is an implementation-defined parameter specifying how many steps should be used for the forward Euler approximation when following a vector field (we used N = 4).
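The idea of following a vector field with a fixed number of forward Euler steps can be sketched in Python as below. The signature is hypothetical and need not match the forwardEuler notation of Figure 32. The sketch also assumes that a vector field maps a position to a heading in radians, with 0 along the +y axis and counterclockwise positive.

```python
import math

def forward_euler(start, distance, field, steps=4):
    """Follow a heading field from start for the given distance.

    The path is approximated by `steps` straight segments of equal length,
    each taken in the direction of the field at the segment's starting point.
    """
    x, y = start
    step = distance / steps
    for _ in range(steps):
        heading = field((x, y))
        x += step * -math.sin(heading)   # assumed convention: heading 0 points along +y
        y += step * math.cos(heading)
    return (x, y)

# A constant field pointing along +y moves the point straight up.
end = forward_euler((0.0, 0.0), 4.0, lambda pos: 0.0)
```

Increasing `steps` trades running time for a closer approximation of the true path along a curving field.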
x, y = point with the given XY coordinates

C.3 Specifiers for position and optionally heading
Figure 34 gives the semantics of the position specifiers that also optionally specify heading.
The figure writes the semantics as an OrientedPoint value; if this is OP , the semantics of the specifier is to assign the position property of the object being constructed to OP.position, and the heading property of the object to OP.heading if heading is not otherwise specified (see Sec. 4 for a discussion of optional specifiers).
in R = on R = OrientedPoint
Figure 35 gives the semantics of the heading specifiers. As for the position specifiers above, the figure indicates the heading value assigned by each specifier.

C.5 Operators
Finally, Figures 36-41 give the semantics for Scenic's operators, broken down by the type of value they return. We omit the semantics for ordinary numerical and Boolean operators (max, +, or, >=, etc.), which are standard.

D Additional Experiments
This section gives additional details on the experiments and describes an experiment analogous to that of Sec. 6.3 but using the generic two-car Scenic scenario as a baseline.
Table 9: Average precision (AP) results for the experiments in Table 6.

Overlapping Scenario Experiments
In Sec. 6.3 we showed how we could improve the performance of squeezeDet trained on the Driving in the Matrix dataset [30] by replacing part of the training set with images of overlapping cars. We used the standard precision and recall metrics defined above; however, [30] uses a different metric, AP (which stands for Average Precision, but is not simply the average of the precision over the test images). For completeness, Table 9 shows the results of our experiment measured in AP (as computed using [5]). The outcome is the same as before: by using the mixture, performance on overlapping images significantly improves, while performance on the original dataset is unchanged. For a cleaner comparison of overlapping vs. non-overlapping cars, we also ran a version of the experiment in Sec. 6.3 using the generic two-car Scenic scenario as a baseline. Specifically, we generated 1,000 images from that scenario, obtaining a training set $X_{\mathrm{twocar}}$. We also generated 1,000 images from the overlapping scenario to get a training set $X_{\mathrm{overlap}}$.
Note that $X_{\mathrm{twocar}}$ did contain images of overlapping cars, since the generic two-car scenario does not constrain the cars' locations. However, the average overlap was much lower than that of $X_{\mathrm{overlap}}$, as seen in Fig. 42 (note the log scale): thus the overlapping car images are highly "untypical" of generic two-car images. We would like to ensure the network performs well on these difficult images by emphasizing them in the training set. So, as before, we constructed various mixtures of the two training sets, fixing the total number of images but using different ratios of images from $X_{\mathrm{overlap}}$. We trained the network on each of these mixtures and evaluated their performance on 400-image test sets $T_{\mathrm{twocar}}$ and $T_{\mathrm{overlap}}$ from the two-car and overlapping scenarios respectively.
To reduce the effect of randomness in training, we used the maximum precision and recall obtained when training for 4,000 through 5,000 steps in increments of 250 steps. Additionally, we repeated each training 8 times, using a random mixture each time: for example, for the 90/10 mixture of $X_{\mathrm{twocar}}$ and $X_{\mathrm{overlap}}$, each training used an independent random choice of which 90% of $X_{\mathrm{twocar}}$ to use and which 10% of $X_{\mathrm{overlap}}$.
As Table 10 shows, we obtained the same results as in Sec. 6.3: the model trained purely on generic two-car images has high precision and recall on $T_{\mathrm{twocar}}$ but has drastically worse recall on $T_{\mathrm{overlap}}$. However, devoting more of the training set to overlapping cars gives a large improvement to recall on $T_{\mathrm{overlap}}$ while leaving performance on $T_{\mathrm{twocar}}$ essentially the same. This again demonstrates that we can improve the performance of a network on difficult corner cases by using Scenic to increase the representation of such cases in the training set.
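The mixture construction used in this experiment can be sketched in Python as follows. The function `make_mixture` and the string stand-ins for images are hypothetical; the sketch only illustrates drawing an independent random subset of each training set while keeping the total size fixed.

```python
import random

def make_mixture(base_set, special_set, special_fraction, total, rng):
    """Draw a fixed-size training set mixing two image collections.

    special_fraction of the result is sampled without replacement from
    special_set, and the remainder from base_set.
    """
    n_special = round(total * special_fraction)
    n_base = total - n_special
    return rng.sample(base_set, n_base) + rng.sample(special_set, n_special)

# Stand-ins for the two 1,000-image training sets.
x_twocar = [f"twocar_{i}" for i in range(1000)]
x_overlap = [f"overlap_{i}" for i in range(1000)]

# One random 90/10 mixture; a fresh call gives an independent mixture.
mixture = make_mixture(x_twocar, x_overlap, 0.10, 1000, random.Random(0))
```

Repeating the call with different random generators yields the independent mixtures used for the repeated training runs.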