Criticality Metrics for Automated Driving: A Review and Suitability Analysis of the State of the Art

The large-scale deployment of automated vehicles on public roads has the potential to vastly change the transportation modalities of today’s society. Although this pursuit was initiated decades ago, there still exist open challenges in reliably ensuring that such vehicles operate safely in open contexts. While functional safety is a well-established concept, the question of measuring the behavioral safety of a vehicle remains subject to research. One way to both objectively and computationally analyze traffic conflicts is the development and utilization of so-called criticality metrics. Contemporary approaches have leveraged the potential of criticality metrics in various applications related to automated driving, e.g. for computationally assessing the dynamic risk or filtering large data sets to build scenario catalogs. As a prerequisite to systematically choose adequate criticality metrics for such applications, we extensively review the state of the art of criticality metrics, their properties, and their applications in the context of automated driving. Based on this review, we propose a suitability analysis as a methodical tool to be used by practitioners. Both the proposed method and the state of the art review can then be harnessed to select well-suited measurement tools that cover an application’s requirements, as demonstrated by an exemplary execution of the analysis. Ultimately, efficient, valid, and reliable measurements of an automated vehicle’s safety performance are a key requirement for demonstrating its trustworthiness.


Introduction
A launch of automated vehicles (AVs) to public roads promises various societal benefits, ranging from improving the economic efficiency of the transportation system up to increasing the mobility of parts of the population [23]. A key factor and potential bottleneck in this feat is assuring and demonstrating safe behavior of the driving functions deployed in such vehicles beyond functional safety [42]. Currently, publicly funded research undertakings, such as the PEGASUS family projects VVM and SET Level, are investigating and developing methods and tools for the verification and validation of functions implementing SAE Levels 4 and 5 [82].
The research leading to these results is funded by the German Federal Ministry for Economic Affairs and Climate Action within the projects 'VVM - Verification & Validation Methods for Automated Vehicles Level 4 and 5' and 'SET Level - Simulation-based Development and Testing of Automated Driving', based on a decision by the Parliament of the Federal Republic of Germany.
Specifically, research interest lies within analyzing the emergence of criticality in traffic [69]. AVs are confronted with an unstructured, open context, particularly in mixed traffic environments. It is therefore imperative to identify potentially safety relevant factors of the context, such as the intricacies of human behavior, early on in the development process [85]. For automated driving functions, these factors can subsequently be drawn upon to implement safety mechanisms mitigating their effects, as the human driver is no longer available as a potential fall-back mechanism. Therefore, it is essential to possess the ability of computing the effects of such factors by means of appropriate measurement tools for criticality. We note that ambiguity exists as to how criticality is defined in both the scientific literature as well as in industrial practice [37]. For the scope of this work, criticality is understood as 'the combined risk of the involved actors when the traffic situation is continued' [69].
As one piece of solving the previously sketched challenge, this paper examines criticality metrics for quantifying certain aspects of criticality. Similar to Schütt et al. [88], we represent a (scene level) criticality metric as a function $\kappa : \mathcal{S} \times \mathbb{R}_{+} \to O$ that measures, for a given traffic scene $S \in \mathcal{S}$ at a time $t \in \mathbb{R}_{+}$, aspects of criticality on a predetermined scale of measurement $O \subseteq \mathbb{R} \cup \{-\infty, +\infty\}$. Scenario level criticality metrics extend this definition from scenes to scenarios [97], i.e. adding (retrospective) temporal aspects to the measurement. Most criticality metrics only quantify over a subset of the influencing factors that are associated with criticality, such as spatial, temporal, dynamical, perceptual, or environmental circumstances [29].
The capability of such metrics to compute an (aspectual) surrogate for criticality has led to their potential being leveraged in various areas around automated driving, e.g. as objective functions for planning modules or for assessing the outcome of test cases. Due to the large amount of existing metrics as well as their variance in properties, selecting well-suited computational methods for criticality assessment within a given application poses an important challenge.
To this end, this work provides a blueprint to guide the reader in answering the question: How to identify a set of criticality metrics suited for computationally assessing criticality within the application at hand? Accordingly, we (a) provide a novel method guiding users in answering this question, and (b) apply this method based on a schematic, unifying review of the current state of the art of criticality metrics, their usage, and their features.
We emphasize the necessity of answering the aforementioned research question by the following illustration.

Motivating Example
Consider the well-known Time To Collision (TTC) metric, which intuitively queries a dynamic motion model (DMM) for a predicted collision between two actors and computes the time until said event. For example, the TTC can be used as a part of an automated emergency braking (AEB) system for car following scenarios, where the decision logic of the system can incorporate the metric's computation, e.g. by providing a warning to the driver [83]. Hence, within the application of an AEB, the TTC's information can play a vital role in situation assessment. If it can be shown that its computed value provides a high sensitivity in car following scenarios w.r.t. the criticality induced by the lead vehicle, the TTC can provide the AEB system with relevant information on the temporal proximity to a potentially necessary emergency braking. A highway AEB system that employs a TTC thus uses a reliable indicator in its decision making process.
As another example, the TTC can also suitably filter large highway data sets due to both its efficient algorithmic implementation as well as methods for aggregation over time that enable retrospective assessments. Such filtered data sets can then be used to build scenario catalogs for the verification and validation of highway driving functions. It has to be noted that it is unlikely that such a catalog contains every critical scenario, as the TTC only detects a certain aspect of criticality (cf. upper scenario in Fig. 1). For adequate coverage in identifying safety-critical highway scenarios in a data set, computing multiple metrics is therefore imperative.
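To make the filtering idea concrete, the following minimal Python sketch computes a car-following TTC under a constant velocity assumption, aggregates it over a scenario by taking the minimum over time, and filters a toy data set. The function names, the sample layout (gap, follower speed, lead speed), and the 2 s threshold are illustrative assumptions and not taken from the reviewed literature.

```python
import math

def ttc(gap_m, v_follow, v_lead):
    """Car-following TTC in seconds under a constant velocity assumption."""
    closing_speed = v_follow - v_lead
    # No predicted collision if the follower is not closing in on the leader.
    return gap_m / closing_speed if closing_speed > 0 else math.inf

def min_ttc(scenario):
    """Scenario-level aggregation: the smallest TTC over all samples."""
    return min((ttc(*sample) for sample in scenario), default=math.inf)

def filter_critical(dataset, threshold_s):
    """Names of scenarios whose aggregated TTC falls below the threshold."""
    return [name for name, scenario in dataset.items()
            if min_ttc(scenario) < threshold_s]

# Hypothetical data set: samples are (gap [m], follower [m/s], lead [m/s]).
dataset = {
    "cut_in": [(20.0, 30.0, 20.0), (12.0, 30.0, 22.0), (8.0, 28.0, 24.0)],
    "free_flow": [(80.0, 25.0, 25.0), (82.0, 25.0, 26.0)],
    "slow_approach": [(50.0, 26.0, 25.0), (49.0, 26.0, 25.0)],
}
critical = filter_critical(dataset, threshold_s=2.0)
```

Taking the minimum is only one possible aggregation; other choices (e.g. time spent below a threshold) would flag different scenarios, which is in line with the remark that a single metric covers only a certain aspect of criticality.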
Contrasting the previous applications, a TTC combined with a single point kinematics prediction model can assess criticality of intersection scenarios only partially, as it evaluates to infinity in many scenes. This problem arises if the DMM predicts no contact, which is likely for non-following scenarios: although it is probable that the paths of two vehicles intersect in such scenarios, it becomes unlikely for the actual trajectories to overlap, as proposed by Allen et al. in 1978 [4]. For this, the DMM would have to predict equal positions at the same time, which is, depending on the spatio-temporal representation of traffic participants, highly improbable.
This problem can be partially mitigated by employing suitable computational methods such as using point sets instead of single points within the DMM and a discretization of the geometrical space, effectively increasing the chance of trajectory intersection. As another viable approach, having models calculate multiple trajectories and searching for a collision [101] can increase the chance of predicting a contact. But, in its common form, the TTC with a single point kinematics suffers from reduced validity outside its designated scenario class, as depicted in Fig. 1. Thus, it is not advisable to solely rely on a TTC when computing a safety assessment of vehicles in urban intersections. For such applications, several metrics ought to be used in combination so as to constitute an appropriate measurement.
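The effect described above can be reproduced with a small Python sketch. It assumes a planar constant velocity model and represents each actor either as a single point (radius sum of zero) or as a disc, which is one simple instance of using point sets instead of single points. The function name and all numeric values are hypothetical.

```python
import math

def ttc_constant_velocity(p1, v1, p2, v2, radius_sum=0.0):
    """Time until two constant-velocity actors first come within radius_sum
    of each other; math.inf if the model predicts no contact.
    radius_sum = 0.0 corresponds to the single point model."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]   # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]   # relative velocity
    # Solve |r + v*t|^2 = radius_sum^2 for the smallest t >= 0.
    a = vx * vx + vy * vy
    b = 2.0 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry - radius_sum ** 2
    if c <= 0.0:
        return 0.0                           # already in contact
    if a == 0.0:
        return math.inf                      # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return math.inf                      # closest approach stays outside
    t = (-b - math.sqrt(disc)) / (2.0 * a)   # time of first contact
    return t if t >= 0.0 else math.inf

# Car following on one lane: the point model already predicts a contact.
following = ttc_constant_velocity((0, 0), (25, 0), (30, 0), (15, 0))

# Crossing paths: the actors reach the conflict point 0.1 s apart.
crossing_points = ttc_constant_velocity((-20, 0), (10, 0), (0, -21), (0, 10))
crossing_discs = ttc_constant_velocity((-20, 0), (10, 0), (0, -21), (0, 10),
                                       radius_sum=2.0)
```

In this toy setting the point model returns infinity for the crossing scene although the actors nearly meet, whereas the disc model yields a finite value. The disc radius is a modeling choice that trades sensitivity against false alarms.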

Problem Statement
From this introductory example it becomes evident that, especially for safety-critical systems operating in open contexts, it is inevitable to perform an in-depth analysis of the match between the requirements imposed by the planned computation of criticality and the capabilities of the employed tools. This is specifically true when considering the use of traditional safety metrics originating from early traffic conflict research, such as the TTC, where the primary design goal was the analysis of human behavior. Great care needs to be taken in selection and adaptation when utilizing such computational methods within the field of automated driving.
To generalize the previously introduced challenge, one can identify the relevant sets of scenarios over the universe of all scenarios, $Sc$, given an application and a computation of a binary criticality classification by a metric $\kappa$:
1. the set of scenarios relevant for the application, $Sc_{app}$,
2. the set of actually critical scenarios for the given application, $Sc^{crit}_{app}$,
3. the set of actually uncritical scenarios for the given application, $Sc^{\neg crit}_{app}$,
4. the set of critical scenarios as indicated by $\kappa$, $Sc^{crit}_{\kappa}$,
5. the set of uncritical scenarios as indicated by $\kappa$, $Sc^{\neg crit}_{\kappa}$,
6. the set of scenarios to which $\kappa$ is applicable, $Sc_{\kappa}$.
The sets $Sc^{crit}_{app}$ and $Sc^{\neg crit}_{app}$ can be considered as the ground truth that is relevant for the application, which are in turn targeted to be measured by $\kappa$. An exemplary relation between those sets is depicted in Fig. 2.
Here, we are presented with an application that is concerned with a set of scenarios $Sc_{app}$, for example, the set of all highway scenarios. Let us assume that the application requires a high sensitivity of the metric, e.g. to computationally identify a large share of critical scenarios in a highway scenario database. Formally, this means that the set $Sc^{crit}_{app} \setminus Sc^{crit}_{\kappa}$ shall be small. In general, besides the coverage of the scenario space, applications impose a variety of requirements on the employed set of metrics K, for example concerning the metrics' output scales or runtime capabilities.
The methodical approach and the state of the art review of this work provide a schematic solution to this generalized challenge of identifying a potent set of criticality metrics K for $Sc_{app}$. This deliberate selection hence enables statements on the scenario coverage of criticality metrics.
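The set relations sketched in this section can be illustrated with plain Python sets. The scenario identifiers and the function name below are hypothetical. The sketch reports the critical scenarios that a metric misses, the uncritical ones it flags, and the resulting share of detected critical scenarios within the application's scenario set.

```python
def coverage_report(sc_app, sc_crit_app, sc_crit_metric):
    """Compare the ground truth of an application with a metric's verdict.

    sc_app:         scenarios relevant for the application
    sc_crit_app:    actually critical scenarios of the application
    sc_crit_metric: scenarios the metric indicates as critical
    """
    indicated = sc_crit_metric & sc_app       # ignore scenarios outside the application
    missed = sc_crit_app - indicated          # critical, but not indicated
    false_alarms = indicated - sc_crit_app    # indicated, but not critical
    detected_share = (1.0 - len(missed) / len(sc_crit_app)
                      if sc_crit_app else float("nan"))
    return missed, false_alarms, detected_share

sc_app = {"s1", "s2", "s3", "s4", "s5", "s6"}
sc_crit_app = {"s1", "s2", "s3"}
sc_crit_metric = {"s1", "s2", "s5", "s9"}     # "s9" lies outside the application

missed, false_alarms, detected_share = coverage_report(
    sc_app, sc_crit_app, sc_crit_metric)
```

A small set of missed scenarios corresponds to the high sensitivity requirement discussed above, while the false alarms determine how much manual or downstream effort the selection causes.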
Fig. 1 The TTC with a constant velocity model is a suitable metric for car-following scenarios (upper roadway in upper scene), but has greatly reduced sensitivity outside this scenario class, both on highways (lower roadway in upper scene) and intersections (lower scene)

Overview of the Work
Systematic reviews of safety surrogates for traffic accident research have been conducted in the past and are referenced in Sect. 2. In contrast, this work focuses on their application to the domain of automated driving and both extensively reviews and unifies the state of the art. Specifically, the subject of the presented review and analysis is single scenario level metrics, assessing properties of interest considering the influence of the actors on the criticality of a single scene or scenario [88]. Here, traffic participants are considered a black box. There exist both metrics for more refined models as well as more abstract views on the traffic happenings. For the first, white and gray box approaches are proposed, e.g. to assess the performance of perception components, consolidated as unit-level metrics. In the second direction, there exist accumulated scenarios level metrics [88] to assess macroscopic properties [31]. Both approaches are not in the scope of this work.
Figure 3 sketches the methodical approach presented and utilized to review and analyze the state of the art within this publication.
We first present a review of contemporary applications of criticality metrics in Sect. 3. Note that our exemplary execution of the suitability analysis puts only rudimentary assumptions on the concrete shape of the applications, which affects the concretization of the results downstream. In practice, when confronted with a detailed application, the derived suitability claims will become more rigorous. In Sect. 4 we derive properties of interest that arise when considering the outlined applications. Subsequently, for each application, we identify requirements on the derived properties of the metrics. Taking this systematic requirement analysis into account, we review the state of the art of criticality metrics in Sect. 5. Here, we depict a large set of metrics and describe their basic concepts in a unified manner. Additionally, we examine to which degree the currently available metrics satisfy the previously derived properties. The results are available on a supplementary web page which is open for contributions from the community. The suitability analysis that enables an evaluation of the review is subsequently depicted in Sect. 6. Here, we present an expert-based process to methodically identify potential metrics for a given application. Finally, the results of Sects. 4 and 5 are utilized to validate the proposed method, mapping an exemplary application's requirements on the metrics' actual capabilities.
To summarize, the main contributions of this work are to
1. give an extensive and unifying review of the state of the art of criticality metrics, their applications, and their properties,
2. provide a blueprint for a suitability analysis that uses the state of the art review as a backbone,
3. and evaluate the review and proposed method by means of examining an exemplary application.
Due to the rapid developments both in industry and research, there will prospectively exist new properties, applications, and metrics. The reader is therefore invited to conduct a custom suitability analysis based on the results presented and contribute gained knowledge to the open repository. Consequently, during the continuous pursuit of deploying safe automated vehicles on public roads, we imagine a constant extension and refinement of the catalog initiated in this work to readily identify a well-suited measurement tool for computing aspects of criticality within the task at hand.

Related Work
Safety surrogate indicators have been developed and employed to research and analyze traffic safety, e.g. for civil engineering and vehicle safety purposes. Firstly, the concept of analyzing traffic conflicts instead of solely relying on traffic accidents was popularized by General Motors in 1968 [76]. Canonically, the foundations to objectively identify and measure such conflicts have been laid in the 1970s, among others through the works of Hayward [33], Hydén [39], and Allen [4]. With a focus on traffic conflict analysis, several studies have reviewed the development of surrogate indicators since then [6,65,91,99]. Besides reviewing safety surrogates, there also exist discussions on their relevant properties w.r.t. traffic accident research, such as reliability and validity [17,59,91], as well as requirements thereon [45,91]. None of these studies is concerned with the application of such indicators to the domain of vehicle automation, which necessarily imposes a different perspective, e.g. regarding formal rigor or run-time capability.
In the AV domain, safety surrogate indicators are typically referred to as criticality metrics. An initial overview on contemporary criticality metrics has been published by the authors and is extended and unified in this work [69, p. 14]. Similarly, there exists work within the AV community that refers to a certain subset of criticality metrics, but does not provide an extensive review on the possible choices [31,36]. Additionally, motion models play a crucial role when assessing future evaluations of a scene. In this regard, Lefèvre et al. have analyzed the suitability of different abstraction levels of motion models to be used for risk assessment [63], but focus on the modeling aspect and consider metrics only to a minor extent. Other studies present broader overviews on threat assessment methods for automated driving, but omit formal details and are concerned only with a restricted application setting [19,90]. Additionally, the various advantages and disadvantages of the single approaches are not compared against each other, and no systematic approach for deriving a set of suitable metrics is given. The employment of such a method is essential when studying effects of combining various metrics given the goal of a reliable and valid measurement of criticality.
For the area of automated driving, this issue was partially recognized by Junietz [47], who systematically derived requirements on criticality metrics, albeit being primarily concerned with two applications, namely to
1. extrapolate a macroscopic risk statement that can be used in a safety argumentation of a given system, and
2. identify critical scenarios within data.
The derived requirements are later used to develop a custom criticality metric satisfying the demands of both applications. Scientific and industrial approaches have recently started to use criticality metrics in various other applications besides the aforementioned ones, motivating the need for a re-consideration of the set of applications to derive requirements from. The work at hand therefore extends the idea to current approaches, specifically subsuming the scenario classification as well as the safety argumentation application as presented by Junietz.

Applications
In general, criticality metrics are employed to enable quantified statements over criticality, e.g. for mathematical modeling, computation, and analysis. Those statements can be used in various automated driving applications, specifically those relying on computational methods, such as simulation. Within this work, we understand an application as a specific process that is conducted during the design, implementation, analysis, or deployment of an AV.
This section presents a selection of common applications that rely on computation of criticality metrics. Major parts were derived based on earlier work of the authors [68], where an extensive literature review on processes within scenario-based testing approaches for automated vehicles was performed.
For most of the identified approaches, criticality metrics play a central role. Such roles were subsequently categorized and are depicted in this section. An overview of the identified applications, structured along the V-model [42], is given in Fig. 4. The depicted set of common applications eventually facilitates the derivation of substantiated properties of criticality metrics from the application perspective, depicted in Sect. 4.
Before beginning the literature review, we note that criticality metrics have been used under different terminologies in several fields of study and application areas over the last decades, ranging from traffic accident research over psychology up to vehicle automation, leading to a vast heterogeneity in the literature. Due to this, for assessing the state of the art, i.e. contribution 1), we deviated from the standard method of literature identification. Based on their diverse experience and collective knowledge in working in the aforementioned fields, the authors were able to assemble a list of applications, properties, and metrics. Due to this diversity in expertise, the identified state of the art is likely to be representative, i.e. to provide a sufficient coverage of the field.

Implementation
When implementing an automated driving function (ADF) to be used in an AV, a major aspect of the implementation's requirements is concerned with safety. One possibility of implementing safe driving functions is drawing on the computations of criticality metrics, as sketched in the following.

Objective Function (A.1)
An automated driving function can be formulated as a function optimization problem involving various encoded constraints regarding efficacy, comfort, and safety [28]. For the latter, directly or indirectly minimizing a criticality metric can be seen as one possibility to optimize safety. As an example, metrics have been used to purposefully increase the relevant training data set for planning components [1].

Run-Time Monitoring (A.2)
Monitoring the automated vehicle's state during run-time enables a dynamical assessment of the situational risk the vehicle is currently in [50], also known under the term dynamic risk assessment [80]. Based on these monitoring results, the system is able to then execute appropriate reactions, e.g. an evasive maneuver. For example, specifications can be formulated in a suitable logic, e.g. a signal temporal logic, which then accesses a criticality metric as a signal [104].
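As a minimal sketch of such a monitor, the following Python generator checks the temporal property "the metric always stays above a threshold" on a stream of samples. It reports the usual quantitative (robustness) value of an "always" operator, i.e. the smallest margin observed so far. The property, the threshold, and the sample values are assumptions chosen for illustration and are not taken from [104].

```python
import math
from typing import Iterable, Iterator, Tuple

def monitor_always_above(
    signal: Iterable[Tuple[float, float]], threshold: float
) -> Iterator[Tuple[float, float, bool]]:
    """Online monitor for 'always (metric > threshold)'.

    Consumes (time, metric value) samples and yields, per sample,
    (time, robustness so far, property still satisfied)."""
    robustness = math.inf
    for t, value in signal:
        # The robustness of 'always' is the minimum margin over all samples seen.
        robustness = min(robustness, value - threshold)
        yield t, robustness, robustness > 0.0

# Hypothetical TTC signal in seconds, sampled every 0.1 s.
ttc_signal = [(0.0, 4.0), (0.1, 3.0), (0.2, 1.2), (0.3, 2.5)]
verdicts = list(monitor_always_above(ttc_signal, threshold=1.5))
```

Because the monitor only keeps the running minimum, it needs constant memory and a constant amount of work per sample, which matters for the run-time capabilities discussed in Sect. 4.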

Identification of Risk-Reducing States (A.3)
In case a safety-critical state is detected, a fallback system can guide the vehicle to a state with lower risk, often called a minimal risk maneuver (MRM) [82]. In such a situation, other constraints, such as comfort, can be secondary. Here, the primary goal of the automated vehicle is the identification of an optimally safe trajectory, e.g. by enforcing safety constraints on the planning optimization problem [93]. Similar to A.1, those constraints can be defined through appropriate criticality metrics.

Verification and Validation
In the following, we present applications that are used for verification and validation of AV safety. If only accidents are defined to be unsafe events, the set of examined scenarios is extremely small. In this case, a safety argumentation would be based on a small sample size, leaving room for large statistical variances. A criticality metric allows increasing the set of scenarios by also considering 'near-miss' events.

Requirement Elicitation (B.1)
Defining Pass-/Fail-Criteria (B.1.a)
To evaluate test results, particular pass-/fail-criteria need to be derived from previously identified high-level safety goals [51], e.g. using a Systems Theoretic Process Analysis [26]. Together with target values [52], criticality metrics enable a precise argumentation on the performance of a vehicle in a test case.

Scenario Elicitation (B.2)
Scenario Classification (B.2.a)
To reduce downstream test efforts, the scenario space can be classified, e.g. in the form of a finite set of abstract or logical scenarios. This classification is performed relative to given criteria, e.g. the expected behavior of the ego vehicle [14] or the presence of phenomena [69]. Criticality metrics can be used complementary in the definition of scenario classes, e.g. to constrain a class to a certain measured aspect of criticality [54,71].
Scenario Instantiation (B.2.b)
Besides generating a set of meaningful abstract or logical scenarios, downstream testing requires representative instances of those classes. To identify particularly safety-relevant test cases, the choice of representation can be guided by criticality metrics [49]. One possibility of identification is the use of optimization methods to derive relevant parameter combinations [31]. The solution to this problem can be indirectly approximated by optimizing the scenario class through learned regression models of a criticality metric [69] or directly, e.g. through importance sampling or Bayesian optimization [26] [79].
As manual investigation is cumbersome, criticality metrics present themselves as a tool in an automated analysis of large data sets [55]. Additionally, filtered data may also be clustered and compared to each other [84]. Those samples can later be used to build a scenario catalog [53].

Testing (B.3)
Search-Based Testing (B.3.a)
When executing an abstract test case, it is of interest to guide the system under test into states that challenge it w.r.t. the tested property [32]. If we are given a safety property, criticality metrics can be employed as a continuous or even differentiable fitness function to effectively test the most safety-relevant aspects [13]. This can be achieved through (bounded) search-based optimization approaches [27].
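The idea of using a criticality metric as a fitness function can be sketched as follows. The Python example uses a deliberately simple, hypothetical car-following simulation as a stand-in for the system under test and a seeded random search as a stand-in for the optimization approaches cited above. All parameter names, bounds, and vehicle dynamics are illustrative assumptions.

```python
import math
import random

def min_ttc(initial_gap, lead_decel, v0=25.0, reaction_s=1.0, brake=6.0,
            dt=0.05, horizon=10.0):
    """Fitness: minimum TTC [s] over a toy braking scenario (0.0 on collision).

    The lead vehicle brakes with lead_decel from t = 0; the follower brakes
    with a fixed deceleration after a fixed reaction time."""
    gap, v_lead, v_ego, t, worst = initial_gap, v0, v0, 0.0, math.inf
    while t < horizon:
        v_lead = max(0.0, v_lead - lead_decel * dt)
        if t >= reaction_s:
            v_ego = max(0.0, v_ego - brake * dt)
        gap += (v_lead - v_ego) * dt
        if gap <= 0.0:
            return 0.0
        closing_speed = v_ego - v_lead
        if closing_speed > 0.0:
            worst = min(worst, gap / closing_speed)
        t += dt
    return worst

def random_search(fitness, bounds, n_samples=200, seed=0):
    """Sample the bounded parameter space and keep the lowest fitness."""
    rng = random.Random(seed)
    best_params, best_value = None, math.inf
    for _ in range(n_samples):
        params = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        value = fitness(*params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value

# Search over the initial gap [m] and the lead deceleration [m/s^2].
bounds = [(10.0, 60.0), (1.0, 9.0)]
best_params, best_value = random_search(min_ttc, bounds)
```

Random search ignores the structure of the fitness landscape. A metric with an ordered, ideally continuous output scale would additionally allow gradient-free or gradient-based optimizers to converge on challenging parameter combinations with fewer simulation runs.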
Test Evaluation (B.3.b)
During or after the execution of a test case, the performance of the system under test needs to be evaluated. This can be done based on the previously defined pass-/fail-criteria, which involve (possibly multiple) criticality metrics indicating the AV's safety performance [36]. In contrast to defining pass-/fail-criteria, a test evaluation actually computes the value of metrics and compares their results against the given criteria. Additionally, tests may not be evaluated dichotomously, but also rated according to more complex scales.

Safety Argumentation (B.4)
Quantification for Hazardous Situations (B.4.a)
Metrics can be used in a safety case to support a quantified argumentation on the automation's safety. For example, they can be utilized to demonstrate the effectiveness of a risk mitigation strategy for a given hazard. Specifically, this can be achieved by showing to which degree the strategy reduces the criticality of such hazardous situations [52].

Properties of Criticality Metrics
This section firstly presents properties of criticality metrics that are derived from the identified applications of Sect. 3. Subsequently, we identify requirements that the applications impose on those properties. In the end, the requirements on the properties become relevant when choosing a criticality metric for a given application. Thus, they serve as the basis for the suitability analysis in Sect. 6. Note that there exists a large amount of inter-dependencies between the properties, which we explicitly omit for the sake of readability.

Derivation of Properties of the Metrics
For each property, we state a short explanation and an illustration. Additionally, if possible, we give an exemplary scale on which the metrics can be assessed, which is then used in the subsequent requirement mapping. Note that the scale is employed for the purpose of this paper and may be tuned in granularity as seen fit.

Run-Time Capabilities
Some of the previously presented applications are real-time based. Hence, they require metrics to be computed during the application's run-time, i.e. online. This implies that the metric's algorithm is receiving an input data stream. This property subsumes a variety of run-time related sub-properties, for example those that are concerned with sampling rates or deadline responsiveness. In our case study, the property is measured on a binary scale. As an example, live data filtering in a measurement vehicle requires run-time capabilities. Here, if no buffer is implemented, the metric computation needs to be finished before the next data sample comes in.

Target Values
Some applications require the metric results to be compared to target values. Such target values are highly dependent on the subject type, where for example a TTC limit can vary based on whether a human-driven or automated vehicle is assessed. This property states that there exists research, suggestions, or standards on potential target values for the given application and subject type. We assess the property on a binary scale for the scope of this publication. To give an illustration, the definition of pass-/fail-criteria requires researching target values to confidently assess the test performance.

Subject Type
Safety indicators have been developed with various subjects in mind, e.g. specifically for pedestrians. Therefore, we are concerned with the types of subjects criticality metrics will be used on by the application. A subject is defined as the entity for which the criticality metric is computed. We differentiate between metrics being applied to human subjects and automations on an abstract scale [101]. This distinction becomes especially important when considering the environment of the AV: for mixed traffic applications, one needs metrics that assess the criticality of human behavior as well as that of automated vehicles. In fully automated traffic, this is no longer necessary. Then, metrics that are applicable solely to AVs can be used to estimate safety w.r.t. the other actors. To give an example, analyzing a driving study of human drivers to mine relevant scenarios for a safety assurance in mixed traffic requires metrics specifically suitable to assess human behavior.

Scenario Type
The types of scenarios for which criticality metrics are computed differ widely from application to application. For example, an AEB system employs criticality metrics on highway following-vehicle scenario types. On the other hand, the analysis of an urban intersection data set widens the scenario type. This property defines for which types of scenarios the application wants to use the metric. Scenario types can vary depending on the application; we therefore omit an assessment scale at this point.

Inputs
In the open contexts that AVs operate in, there exists a large set of criticality-relevant entity types, such as various types of road infrastructure, traffic rules, and dynamic objects. Criticality metrics reflect certain aspects of those entities. Naturally, the question arises which entities are necessary to give as inputs for the computation of a criticality metric. This may concern rudimentary information such as positions and speed for simple trajectory data sets, but can increase in complexity up to road geometry, types and amount of dynamic objects (cars, pedestrians, etc.), and weather. Depending on the application at hand, there will only be certain inputs for the metrics available, e.g. a runtime monitor has only access to information provided by the AV's perception subsystem. We define a possible input of a criticality metric to be a subset of all traffic entities and their properties, e.g. all dynamic objects o with their velocity o.v. Furthermore, we note that metrics can impose restrictions on their own inputs, e.g. o.v ≠ 0.

Output Scale
Applications will obviously use the outputs of the metric's computation. Depending on the subsequent utilization of the result, certain restrictions can be imposed on the output scale. This especially concerns the algebraic operators applied on the result. The output scale can be defined through the use of algebraic operators into nominal, ordinal, interval, or ratio scales, alongside potential physical quantities. To illustrate the concept, an optimization algorithm requires at least a partial order relation on the output. It can hence use no metrics whose results are given on a nominal scale.
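One way to make such a requirement check explicit is to encode the four scale levels and the operations each level permits. The Python sketch below follows the common convention that each level inherits the operations of the levels below it. The operation names and the helper function are hypothetical, and the sketch simplifies the partial order requirement mentioned above to an ordinal scale.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Levels of measurement, ordered by the operations they permit."""
    NOMINAL = 0    # equality only
    ORDINAL = 1    # plus ordering
    INTERVAL = 2   # plus meaningful differences
    RATIO = 3      # plus meaningful ratios (true zero)

# Lowest scale level on which an operation is meaningful (assumed names).
REQUIRED_SCALE = {
    "equality": Scale.NOMINAL,
    "ordering": Scale.ORDINAL,
    "difference": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}

def supports(metric_scale: Scale, operation: str) -> bool:
    """True if a metric with the given output scale permits the operation."""
    return metric_scale >= REQUIRED_SCALE[operation]

# An optimizer needs to compare candidates, i.e. it needs an ordering.
usable_for_optimization = {
    scale.name: supports(scale, "ordering") for scale in Scale
}
```

Under these assumptions, a time-valued metric such as the TTC would be placed on the ratio scale, whereas a metric that merely labels a scene with a conflict type would be nominal and hence unusable as an objective.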

Reliability
The reliability of a measurement is defined as the degree of closeness of repeated measurements to one another and relates to the stability of the conclusions that can be drawn from the measurements [34,57]. More specifically, highly reliable measures have a low random error and produce measurements that are consistent with each other. As criticality metrics are often defined through deterministic algorithms that are used in deterministic simulation environments, repeating their computations on the same inputs always yields the same outputs. Hence, our case is concerned with a certain facet of reliability, namely the difference between $\kappa(S)$ and $\kappa(S')$ for a scene $S'$ that has only a slight change in criticality compared to $S$. For an application, it can be crucial to be aware of the reliability of the employed metrics, e.g. when computing criticality over time in a simulation run. As another example, a safety case requires a metric with high reliability (in the general sense) due to the need for external reproducibility, e.g. by regulatory authorities.

Validity
Besides the repetition of measurements, various applications are also directly concerned with the validity of the measurement itself [34]. Here, validity is defined as the closeness of the metric's measurement to representing the actual accident probability and severity [57]. The validity of a metric can often only be stated w.r.t. the metric's scenario type; typically, using a metric outside its designated scenario type is expected to result in decreased validity. Note that a highly valid metric is inherently reliable, as a high validity directly implies that the impact of random error is low. A highly reliable metric, on the other hand, is not necessarily a valid metric. To give an example, a planning function necessitates highly valid metrics to be used in safety-relevant decision making. The validity of binary classifiers can for example be assessed on a [0, 1] scale using an accuracy score. Non-binary metrics can be validated by means of statistical distance metrics to compare the distributions. At an extreme, if a metric is not defined for an input x, then we define it as invalid for x.

Sensitivity
In combination with given target values, a metric can induce a confusion matrix when evaluated against a ground truth [31]. The following two properties refer to such a matrix. Note that a valid ground truth is necessary for an assessment. Sensitivity, also called the true positive rate (TPR), is defined as the rate of correctly identified critical situations, i.e. the share of true positives (TP) among the sum of TP and false negatives (FN):
$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}.$$
An over-approximating criticality metric will induce a sensitivity value of one, as it correctly captures all of the critical situations (but typically also identifies many uncritical ones as critical). In an application such as data filtering, one may require such over-approximating metrics in a first step [101]. Sensitivity is measured as a fraction, i.e. a value in [0, 1].

Specificity
Analogously to sensitivity, specificity, or the true negative rate (TNR), is defined as the rate of correctly identified uncritical situations, i.e. the share of true negatives (TN) among the sum of TN and false positives (FP):
$$\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}.$$
When using criticality metrics, one is often not only concerned with identifying critical situations but also with having guarantees on a given scenario being uncritical. As an illustration, a planning component of an AV can require certain guarantees on the metrics' specificity. On the other hand, applications such as data filtering do not impose severe restrictions on specificity, although the efficacy of the filtering process improves with increased specificity. Analogously to sensitivity, specificity is measured as a fraction in [0, 1].
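As a minimal sketch of how both properties could be evaluated in practice, the following Python snippet derives a confusion matrix from metric values, a target value, and a ground truth labeling. The function names, the toy data, and the convention that a value at or below the target counts as critical (as for time-based metrics such as the TTC) are assumptions made for illustration only.

```python
import math

def classify(values, target):
    """Label a situation as critical if the metric value is at or below the target.
    (Assumed convention, suitable e.g. for time-based metrics.)"""
    return [v <= target for v in values]

def confusion_counts(predicted, actual):
    """Return (TP, FN, TN, FP) for boolean 'critical' labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp, fn, tn, fp

def sensitivity(tp, fn):
    """True positive rate TP / (TP + FN); NaN if there are no critical situations."""
    return tp / (tp + fn) if (tp + fn) > 0 else math.nan

def specificity(tn, fp):
    """True negative rate TN / (TN + FP); NaN if there are no uncritical situations."""
    return tn / (tn + fp) if (tn + fp) > 0 else math.nan

# Hypothetical metric values (in seconds) and ground truth labels.
values = [0.8, 1.2, 3.0, 5.0, 0.5, 4.0]
truth = [True, True, False, False, True, True]

tp, fn, tn, fp = confusion_counts(classify(values, 1.5), truth)
# An over-approximating choice of target labels every situation as critical.
tp_o, fn_o, tn_o, fp_o = confusion_counts(classify(values, math.inf), truth)
```

With the over-approximating target, no critical situation is missed, so the sensitivity is one while the specificity drops to zero, which mirrors the trade-off described above.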

Prediction Model
Most scene level metrics rely heavily on models that allow assessing the criticality based on predictions of the future states of individuals within the current scene. Note that it is also possible for scenario level metrics to apply a prediction model for each time point retrospectively, e.g. when aggregating the TTC over time. The prediction model property can be characterized in two dimensions: 1. the size of the time window allowing useful predictions (unbounded or x seconds), and 2. whether the metric only considers a single, possibly the worst case, evolution (linear time) or multiple parallel lines of development (branching time).
For example, an identification of risk-reducing states requires that the valid prediction horizon covers at least the predicted duration of the planned risk-reducing maneuver. Additionally, the safety of the maneuver is increased when multiple future developments are considered, e.g. by incorporating possible reactive behaviors of other traffic participants to the maneuver. A prediction model is typically bound to a specific entity type, e.g. being only applicable to predict the behavior of a human passenger car driver. When selecting a prediction model, the type of the model has to fit the entity under consideration. Illustrating this, the behavior of an automated vehicle may not be validly predicted based on human factors such as human reaction times.

Mapping Requirements of Applications on Properties of Metrics
Based on the previously identified applications and relevant properties of criticality metrics, we derive requirements from the former on the latter. For conciseness, this mapping is depicted in Table 4 in Appendix B. Here, expert-based hypotheses formulated by the authors are presented. Due to their hypothetical nature, they are subject to scientific debate, especially when presented with a refined application. Nevertheless, we show that even those abstract applications impose certain qualitative requirements.

Models and Metrics
There exists a wide arsenal of functions measuring various aspects of traffic safety.
Collecting their state of the art is the final preparatory step for executing the suitability analysis method presented in this work and instantiated in Sect. 6.
This section therefore presents a large set of metrics and identifies the features of their properties. Due to the exemplary nature of this process and the few assumptions on the implementation of the metrics, we are confronted with a high level of abstraction. Thus, it is impossible to instantiate all metric properties on a concrete level. This holds specifically for reliability, validity, sensitivity, and specificity, as they depend on specific implementation details of the metric models and algorithms. In fact, this leads to properties being seemingly intangible at this high level of abstraction. We reiterate that these are nonetheless paramount both for the presentation of the methodical approach and for a concrete, practical instantiation of the presented suitability analysis. In the latter case, the properties will be instantiated with more tangible features.
For improved readability, common abbreviations and variables used in the definitions are aggregated in Table 1. Additionally, the index of acronyms of criticality metrics can be found in Appendix A.

Employed Models
Before the examined criticality metrics are described in detail, we present some models that are shared among the presented metrics. Such predictive models are applied to generate possible future developments of a scene. Note that the validity of the employed model can heavily influence a metric's measurements by invalidly representing properties of the entity under consideration, e.g. using human reaction times to model automated vehicle (AV) behavior or allowing an entity to directly achieve negative speeds after braking maneuvers.
When applying such predictions to the real world, one will have to deal with different uncertainties, e.g. process or measurement noise. These precision issues are sometimes modeled by adding white noise to the measurements or applying more sophisticated Kalman filter methods to the motion model. An additional factor that can reduce the accuracy of motion predictions stems from the required discretization of the predicted motion. For more in-depth considerations, one can also distinguish between different abstraction levels in motion modeling, specifically interaction-aware, maneuver-based, and physics-based models [63]. However, for the scope of this paper, we will not follow this refined contemplation and only consider models without noise, uncertainties, or discretization errors. Additionally, when calculating metrics from the local perspective of an actor, one can only measure relative motion. For the sake of brevity, the work at hand focuses on motions in a fixed global coordinate system. We refer to Schubert et al. for an example of using local coordinate systems [87].
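Single point kinematics In a single point kinematics model, the development of motion variables over time is approximated by a Taylor polynomial [43]. Since this model is defined per actor, indices related to the actor are omitted. For position, velocity, and acceleration of an actor, it is defined as
$$p(t+T) = \sum_{k=0}^{n} \frac{T^k}{k!}\, p^{(k)}(t), \quad v(t+T) = \sum_{k=0}^{n-1} \frac{T^k}{k!}\, p^{(k+1)}(t), \quad a(t+T) = \sum_{k=0}^{n-2} \frac{T^k}{k!}\, p^{(k+2)}(t),$$
where the power with brackets $(k)$ denotes the k-th derivative, $T$ the prediction duration, and $n$ the order of the polynomial.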
Simple car model The simple car model takes the speed and steering angle functions as an input vector $u = (u_s, u_\phi) : \mathbb{R} \to \mathbb{R}^2$, $t \mapsto (u_s(t), u_\phi(t))$, and outputs the position and direction of the vehicle [61]. It takes the form
$$\dot{x} = u_s \cos\theta, \quad \dot{y} = u_s \sin\theta, \quad \dot{\theta} = \frac{u_s}{L} \tan u_\phi,$$
where $(x, y)$ is the position and $\theta$ the direction of the vehicle, and L denotes the distance between front and rear axle. It is possible to add further derivatives by e.g. replacing $u_s$ with another unknown function whose derivative is given by a new control input $u_a$ specifying the acceleration [61]. This approach leads to an increased required regularity of the solution, while also increasing the ease of specifying more differentiable solutions if one keeps the differentiability of the control input functions the same.
Continuous steering car As an example of the previously mentioned idea, LaValle introduces the continuous steering car [61]. While solutions to the simple car state equation might have discontinuities in the steering behavior, by introducing the steering angle $\phi$ as a state variable and the steering rate $u_\omega$ as an input variable, we obtain
$$\dot{x} = u_s \cos\theta, \quad \dot{y} = u_s \sin\theta, \quad \dot{\theta} = \frac{u_s}{L} \tan\phi, \quad \dot{\phi} = u_\omega,$$
where the steering rate input $u_\omega$ might include discontinuities, but the steering angle $\phi$ does not. Notice that in this case, the input can be understood as $u = (u_s, u_\omega)$. Naturally, it is also possible to apply this approach multiple times to further increase the required regularity of solutions.
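To illustrate how such a model generates a predicted trajectory, the following Python sketch integrates the simple car with an explicit Euler scheme. It assumes the standard kinematic form given by LaValle, i.e. ẋ = u_s cos θ, ẏ = u_s sin θ, θ̇ = (u_s / L) tan u_φ, with constant inputs; the function names, the step size, and the choice of integrator are illustrative assumptions and introduce exactly the kind of discretization error that is otherwise left out of scope here.

```python
import math

def simple_car_step(state, u_s, u_phi, wheelbase, dt):
    """One explicit Euler step of the simple car.
    state = (x, y, theta); u_s = speed; u_phi = steering angle in radians."""
    x, y, theta = state
    x_new = x + u_s * math.cos(theta) * dt
    y_new = y + u_s * math.sin(theta) * dt
    theta_new = theta + (u_s / wheelbase) * math.tan(u_phi) * dt
    return (x_new, y_new, theta_new)

def rollout(state, u_s, u_phi, wheelbase, dt, steps):
    """Predict a trajectory under constant inputs (an assumed, very simple input signal)."""
    trajectory = [state]
    for _ in range(steps):
        state = simple_car_step(state, u_s, u_phi, wheelbase, dt)
        trajectory.append(state)
    return trajectory

# Hypothetical example: 10 m/s for one second, first straight, then steering left.
straight = rollout((0.0, 0.0, 0.0), u_s=10.0, u_phi=0.0, wheelbase=2.8, dt=0.1, steps=10)
left_turn = rollout((0.0, 0.0, 0.0), u_s=10.0, u_phi=0.1, wheelbase=2.8, dt=0.1, steps=10)
```

A metric implementation would consume such a trajectory through the position interface only, so that the integrator or the model itself can be exchanged without touching the metric.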

Coordinated turn model
The transition of the coordinated turn model without uncertainties [43] from a time step t to a time step t + T is given by

Since the nearly coordinated turn model only differs by an uncertainty on the turn rate, it is omitted at this point.
Augmented coordinated turn model with polar velocity In comparison to the last model, the augmented coordinated turn model also takes the turn rate as a state variable into account [81]. Furthermore, it demonstrates the possibility to use polar coordinates for velocities in prediction models. The underlying differential equation reads as where denotes the rotational acceleration. For constant velocity and zero rotational acceleration, we obtain the augmented coordinated turn model as a solution of the differential equation as

For non-zero linear or rotational acceleration, solutions to the differential equation become more complex.
One track model The one track model assumes the speed of the center of gravity of the vehicle along the planned trajectory to be constant and the tires to be jointly modeled as a single tire at the center of both axles [86]. Since this model is defined per actor, indices related to the actor are omitted. The state update equation for this model can be formulated as

Two track model The two track model extends the one track model by dynamics for each individual tire and lifts the constant speed assumption [86]. Again, indices related to the actor are omitted. The state update equation reads as

where x is a 36-dimensional state vector including, for example, the position, cardan angles, and vertical movements of the tires in a local coordinate frame, and u is a 4-dimensional input vector with the steering angle, the gas and braking pedal depression percentages, and a gear.
Potential-based models A potential-based model requires a potential function for each considered object type [105]. After aggregating the potential functions, one can obtain the maneuver model (MM) for a given starting point, e.g. by applying gradient descent to the combined potential function.
Stochastic reachable sets Stochastic reachable sets are based on multi-trace branching prediction [5].There are multiple approaches to using stochastic reachable sets in motion modeling, where the reachable sets arise from variation of the control inputs of the underlying single-trace dynamics model.When applying distributions to the variation of the control parameters and modeling the possible trajectories using Markov chains on a finite partitioning of the reachable sets, one obtains the most commonly used MM derived from stochastic reachable sets.Notice that this MM still depends on the underlying dynamics model used to obtain the reachable sets, thus it is always required to use another (nonlinear) MM together with this approach.

Criticality Metrics
Subsequently, we present a wide variety of criticality metrics, thereby delivering an extensive state of the art review. This analysis follows the approach described in Sect. 1.3. For each metric, we generally present a description alongside its corresponding formula.
In Sect. 1, criticality metrics have been introduced as a function $\kappa : S \to O$, with S the set of all traffic scenes and $O \subseteq \mathbb{R} \cup \{-\infty, +\infty\}$ a measurement scale. Generally, criticality metrics can be differentiated into such scene level metrics (calculated given only the information at a specific point in time) and scenario level metrics (calculated for a given time series). Naturally, every scene level metric can be used to derive scenario level metrics by aggregation over time, but not vice versa. In the following descriptions, scene level metrics have a point in time t as an input (i.e. κ(⋅, t)), while scenario level metrics do not (i.e. κ(⋅)).
In order to avoid ambiguities, we note that the term 'metric' is used with various interpretations in the literature. Mathematically, the term is used for a function that measures the distance between two elements of a set. In the context of criticality metrics for automated vehicles, the term 'metric' is typically understood as a mathematical measure rather than a mathematical metric. The term 'metric' in this context originates from the engineering sciences, especially software engineering, where it has been defined as 'a measure of the extent [...] to which a product [...] possesses and exhibits certain [...] characteristics' [11].
Aggregation over Time While aggregation over time will be a non-issue for the remainder of this section, let us briefly elaborate. As already mentioned, any scene level criticality metric κ(⋅, t) can be used to define a myriad of scenario level metrics using suitable aggregate functions, i.e. $\kappa(\cdot, [t_0, t_e]) = \mathrm{agg}_{t \in [t_0, t_e]}\, \kappa(\cdot, t)$. As evaluation of κ(⋅, t) on measurements along $[t_0, t_e]$ implies discretization, popular choices for aggregation functions over discrete time include agg ∈ {min, max, mean, p-quantile, median, ∑, …}. The choice of an appropriate aggregate is highly dependent on the criticality metric and the context of its application. The authors suggest that no universal statement about the 'optimal' aggregation over time can be made at this point and that comparative experimental data is to be collected.
Aggregation over Actors Similarly, any criticality metric that is defined for one or two actors, e.g. $\kappa(A_1, A_2, \cdot)$, can naturally be extended to arbitrary actors in a scene or scenario by aggregation, e.g. $\kappa(\mathcal{A}, \cdot) = \mathrm{agg}_{A_i, A_j \in \mathcal{A}, A_i \neq A_j}\, \kappa(A_i, A_j, \cdot)$, where $\mathcal{A}$ denotes the set of actors. Depending on the actors of interest, a different aggregate might be appropriate: while for assessing the risk of a designated actor $A_1$, such as a vehicle of interest, aggregating $\kappa(A_1, \mathcal{A}, \cdot) = \max_{A_j \in \mathcal{A}, A_j \neq A_1} \kappa(A_1, A_j, \cdot)$ can be advantageous, a criticality assessment of a scene which is impartial to individual perspectives can benefit from aggregating $\kappa(\mathcal{A}, \cdot) = \mathrm{mean}_{A_i, A_j \in \mathcal{A}, A_i \neq A_j}\, \kappa(A_i, A_j, \cdot)$. Analogously to aggregation over time, we exclude such considerations in this work, but acknowledge their necessity.
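Both kinds of aggregation can be sketched as small higher-order functions. The Python snippet below is an illustrative assumption of how this might look: the function names, the toy one-dimensional positions, and the distance-based stand-in for a two-actor metric are hypothetical and not taken from the reviewed literature.

```python
from itertools import permutations
from statistics import mean

def aggregate_over_time(scene_metric, times, agg=min):
    """Lift a scene level metric (a function of t) to the scenario level
    by aggregating its values over discrete time points."""
    return agg([scene_metric(t) for t in times])

def aggregate_over_actors(pair_metric, actors, agg=max):
    """Aggregate a two-actor metric over all ordered pairs of distinct actors."""
    return agg([pair_metric(a_i, a_j) for a_i, a_j in permutations(actors, 2)])

# Toy data (hypothetical): 1D positions and a distance as stand-in for a metric.
positions = {"A1": 0.0, "A2": 10.0, "A3": 25.0}

def distance(a_i, a_j):
    return abs(positions[a_i] - positions[a_j])

closest_pair = aggregate_over_actors(distance, positions, agg=min)
impartial_mean = aggregate_over_actors(distance, positions, agg=mean)
minimum_over_time = aggregate_over_time(lambda t: 5.0 - t, [0.0, 1.0, 2.0], agg=min)
```

Passing the aggregate as a parameter keeps the choice between min, max, mean, or a quantile explicit, which matters because that choice depends on the metric and on the application.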

Implementation and Computation
The focus of the subsequent review lies on presenting idealized formulae for each metric, stating constraints or assessing properties of interest for variables as estimated by a given prediction model. As an example, such an idealized formula can retrieve the future position of actor $A_i$ after a duration $\tilde{t}$ by using $p_i(t + \tilde{t})$. Note that for an implementation of a scene level metric, one might want to limit the time horizon $t_H > 0$ of predictions used by the metric, i.e. constraining $\tilde{t} \in [0, t_H]$.
By using clear interfaces to the prediction models, dependencies thereon are both minimized and distinctly defined, conceptually enabling a substitution of various models implementing the employed interfaces. We hence refer to the prediction models for implementation details such as reaction times, actuator dynamics, discretization of time, and time horizons. Altogether, this enables the suitability analysis to abstract from such details, lifting its scope to the measurement principle instead of binding it to specific realizations. This abstraction ultimately allows deriving novel computational methods that implement the idealized representation as well as comparing such implementations to the measurement principle.

Time to Collision (TTC)
For two actors $A_1$, $A_2$ at time t, the TTC metric returns the minimal time until $A_1$ and $A_2$ collide according to a given DMM, or infinity if the predicted trajectories do not intersect [33,94]. It is defined by
$$\mathrm{TTC}(A_1, A_2, t) = \min\left(\{\tilde{t} \geq 0 \mid d(p_1(t + \tilde{t}), p_2(t + \tilde{t})) = 0\} \cup \{\infty\}\right),$$
where $p_i(t + \tilde{t})$ denotes the position of $A_i$ predicted by the DMM after a duration $\tilde{t}$ and d the distance between the actors. A variant of the TTC, called Modified Time To Collision (MTTC), is extended under the name of Crash Index (CrI), where it is multiplied with a velocity-based severity estimate [74].
For car following scenarios and from the point of view of a distinguished actor, the TTC delivers a quality estimate of the temporal proximity to a collision that is induced by a maneuver of an actor, e.g. by a braking maneuver of a lead vehicle. Its validity is however greatly reduced for most DMMs within intersection scenarios, cf. Fig. 1, as well as, if not meaningfully aggregated over actors, in multi-actor scenes. Furthermore, the resulting time still needs to be interpreted w.r.t. the abilities and environment of $A_1$, either by using appropriate target values or composed metrics such as TTM.
One possible aggregate of the TTC to the scenario level is the Time To Accident (TTA) metric, which is defined as $\mathrm{TTA}(A_1, A_2) = \mathrm{TTC}(A_1, A_2, t_{evasive})$ with $t_{evasive}$ being the first time at which an evasive maneuver is performed [45]. Such aggregations over time can increase the TTC's validity when used for a retrospective assessment. Further information is given when discussing the other two time aggregates of TTC in this work, TET and TIT.
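For the car following case, the TTC has a simple closed form once a concrete DMM is fixed. The following Python sketch assumes a constant velocity model, point-shaped actors on a common lane, and $A_1$ following $A_2$; all of these, as well as the function name, are simplifying assumptions for illustration and not part of the general definition.

```python
import math

def ttc_car_following(p1, v1, p2, v2):
    """TTC for a follower (p1, v1) and a lead vehicle (p2, v2) on one lane.
    Assumes constant velocities and point-shaped actors (illustrative DMM)."""
    gap = p2 - p1
    closing_speed = v1 - v2
    if gap <= 0.0:
        return 0.0          # already in contact under the point-actor assumption
    if closing_speed <= 0.0:
        return math.inf     # the follower is not approaching the lead vehicle
    return gap / closing_speed

# Hypothetical scene: follower at 20 m/s, lead vehicle 30 m ahead at 10 m/s.
ttc_closing = ttc_car_following(p1=0.0, v1=20.0, p2=30.0, v2=10.0)
ttc_opening = ttc_car_following(p1=0.0, v1=10.0, p2=30.0, v2=20.0)
```

Evaluating this function at the first time an evasive maneuver is observed would yield a TTA-style aggregate, under the same modeling assumptions.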

Potential Time to Collision (PTTC)
The PTTC metric, as proposed by Wakabayashi et al. [103], constrains the TTC metric by assuming constant velocity of $A_1$ and constant deceleration of $A_2$ in a car following scenario, where $A_1$ is following $A_2$. Given these constraints, the formula simplifies to

with $\dot{d} = \dot{d}(p_1(t), p_2(t))$ and $d = d(p_1(t), p_2(t))$, respectively. While imposing such constraints on the scenario type and the DMMs of the actors reduces the computational cost of evaluating the metric, its validity is significantly reduced compared to the general TTC.

Time to Zebra (TTZ)
Defined by Várhelyi et al. [100], the TTZ measures the time until an actor $A_1$ reaches a zebra crossing CA, hence

Note that this concept can be further generalized to a Time To Object (TTO) metric for arbitrary moving or non-moving objects and conflict areas. For moving objects, this generalization coincides with the TTC.

Time to Closest Encounter (TTCE)
The TTCE is a distance-dependent risk indicator, which generalizes the concept of the TTC to the non-collision case [20]. At time t, the TTCE returns the time $\tilde{t} \geq 0$ which minimizes the distance to another actor in the future. The corresponding minimal distance is called the Distance of Closest Encounter (DCE). The formulae are given as
$$\mathrm{TTCE}(A_1, A_2, t) = \operatorname*{arg\,min}_{\tilde{t} \geq 0}\, d(p_1(t + \tilde{t}), p_2(t + \tilde{t})), \quad \mathrm{DCE}(A_1, A_2, t) = \min_{\tilde{t} \geq 0}\, d(p_1(t + \tilde{t}), p_2(t + \tilde{t})).$$
In particular, as DCE → 0, TTCE → TTC, which implies that DCE = 0 if and only if TTCE = TTC. Building on the TTCE and DCE, Eggert uses an exponential transform together with a survival function in order to estimate the future event probability of a collision for the distance-dependent risk [20].
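Under a constant velocity model, the closest encounter can be computed in closed form by minimizing the squared distance between the two predicted positions. The Python sketch below assumes planar point-shaped actors and constant velocities; these assumptions and the function name are chosen for illustration only.

```python
import math

def ttce_dce_constant_velocity(p1, v1, p2, v2):
    """Return (TTCE, DCE) for two point actors in the plane.
    p*, v* are (x, y) tuples; constant velocities are assumed."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]   # relative position
    wx, wy = v2[0] - v1[0], v2[1] - v1[1]   # relative velocity
    rel_speed_sq = wx * wx + wy * wy
    if rel_speed_sq == 0.0:
        # No relative motion: the distance never changes.
        return 0.0, math.hypot(rx, ry)
    # Unconstrained minimizer of |r + w * s|^2, clipped to future times s >= 0.
    t_star = max(0.0, -(rx * wx + ry * wy) / rel_speed_sq)
    return t_star, math.hypot(rx + wx * t_star, ry + wy * t_star)

# Hypothetical scene: A1 drives along the x-axis and passes a standing actor A2.
ttce, dce = ttce_dce_constant_velocity((0.0, 0.0), (10.0, 0.0), (30.0, 10.0), (0.0, 0.0))
```

If the actors are already moving apart, the minimizer is clipped to zero, so the current distance is reported as the DCE.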

Worst Time to Collision (WTTC)
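Originally defined by Wachenfeld et al. [101], the WTTC metric extends the usual TTC by considering multiple traces of actors as predicted by an over-approximating DMM, i.e.
$$\mathrm{WTTC}(A_1, A_2, t) = \min_{p_1 \in \mathrm{Tr}_1(t),\, p_2 \in \mathrm{Tr}_2(t)} \min\left(\{\tilde{t} \geq 0 \mid d(p_1(t + \tilde{t}), p_2(t + \tilde{t})) = 0\} \cup \{\infty\}\right),$$
where $\mathrm{Tr}_1(t)$ resp. $\mathrm{Tr}_2(t)$ denotes the set of all possible trajectories available to actor $A_1$ resp. $A_2$ at time t, as constrained by the employed DMM. By design, the WTTC excels in selective data recording and data filtering applications.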

Time Exposed TTC (TET)
The TET is a scenario level metric that builds on the TTC together with a target value [45,66]. While originally defined only for discrete time, the formula can be generalized to continuous time as
$$\mathrm{TET}(A_1, A_2, \tau) = \int_{t_0}^{t_e} \mathbb{1}_{\mathrm{TTC}(A_1, A_2, t) \leq \tau}\, dt,$$
where $\mathbb{1}$ denotes the indicator function. Therefore, TET measures the amount of time for which the TTC is below a given target value $\tau$. Its dependency on the scenario duration could easily be eliminated through division by $t_e - t_0$. Moreover, let us mention that the idea of 'time exposed below target value' can readily be adapted for any metric together with a target value and is essentially independent of the TTC.

Time Integrated TTC (TIT)
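Similar to the TET, the Time Integrated TTC (TIT) [66] is a scenario level metric based on the TTC and is given as
$$\mathrm{TIT}(A_1, A_2, \tau) = \int_{t_0}^{t_e} \mathbb{1}_{\mathrm{TTC}(A_1, A_2, t) \leq \tau} \left(\tau - \mathrm{TTC}(A_1, A_2, t)\right) dt.$$
It aggregates the difference between the TTC and a target value $\tau$ in the time interval $[t_0, t_e]$. Since it thereby also accounts for how far the TTC falls below the target value, the metric reflects criticality more accurately than the TET. As for the TET, the construction of the TIT can be adapted for other metrics.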
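On sampled data, both time aggregates reduce to sums over the time steps in which the TTC does not exceed the target value. The Python sketch below assumes equidistant TTC samples with step size dt and treats a sample as exposed if its TTC is at most the target value; the function names and the sample data are hypothetical.

```python
def time_exposed_ttc(ttc_samples, target, dt):
    """Discrete TET: total duration with TTC at or below the target value."""
    return sum(dt for ttc in ttc_samples if ttc <= target)

def time_integrated_ttc(ttc_samples, target, dt):
    """Discrete TIT: accumulated shortfall (target - TTC) while TTC is at or below target."""
    return sum((target - ttc) * dt for ttc in ttc_samples if ttc <= target)

# Hypothetical TTC profile in seconds, sampled every 0.1 s, with target value 1.5 s.
ttc_profile = [4.0, 2.0, 1.0, 0.5, 3.0]
tet = time_exposed_ttc(ttc_profile, target=1.5, dt=0.1)
tit = time_integrated_ttc(ttc_profile, target=1.5, dt=0.1)

# Normalizing by the scenario duration removes the dependency on its length.
tet_share = tet / (len(ttc_profile) * 0.1)
```

In this toy profile, two samples are exposed; the TIT additionally distinguishes that one of them lies much further below the target value than the other.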

Time to Maneuver (TTM)
The TTM metric [35,94] returns the latest possible time in the interval [0, TTC] such that a considered maneuver performed by a distinguished actor $A_1$ leads to collision avoidance, or −∞ if a collision cannot be avoided. Therefore,

For analytic purposes, an extension of the TTM's output scale to negative values is possible. Various special cases of the TTM metric have been considered in the literature [48,94,102], including Time To Brake (TTB) [64] (i.e. m = 'brake'), Time To Steer (TTS) [35] (i.e. m = 'steer'), and Time To Kickdown (TTK) [35] (i.e. m = 'kickdown').

Time to React (TTR)
The TTR metric [35,94] approximates the latest time until a reaction is required by aggregating the maximum TTM metric over a predefined set of maneuvers M, i.e.
$$\mathrm{TTR}(A_1, A_2, t) = \max_{m \in M} \mathrm{TTM}(A_1, A_2, t, m).$$

Time Headway (THW)
The THW metric calculates the time until actor $A_1$ reaches the position of a lead vehicle $A_2$ [43, 48], i.e.
$$\mathrm{THW}(A_1, A_2, t) = \min\{\tilde{t} \geq 0 \mid p_1(t + \tilde{t}) = p_2(t)\}.$$
Analogously to the THW, one can define the Headway (HW) metric [43] simply as the distance to a lead vehicle, i.e.
$$\mathrm{HW}(A_1, A_2, t) = d(p_1(t), p_2(t)).$$
The THW is used by regulatory bodies in several countries to express recommendations and as a threshold for fines [48].

Encroachment Time (ET)
The ET metric [4] measures the time that an actor $A_1$ takes to encroach a designated conflict area CA, i.e.
$$\mathrm{ET}(A_1, CA) = t_{exit}(A_1, CA) - t_{entry}(A_1, CA).$$
While the value of ET is loosely correlated with criticality, it completely ignores the dynamics and behavior of other actors.

Post Encroachment Time (PET)
The PET [4] has been widely used as a metric for the a-posteriori analysis of traffic data [45,58,75]. The PET calculates the time gap between one actor leaving and another actor entering a designated conflict area CA on scenario level. Assuming $A_1$ passes CA before $A_2$, the formula is
$$\mathrm{PET}(A_1, A_2, CA) = t_{entry}(A_2, CA) - t_{exit}(A_1, CA).$$
Allen et al. also introduce two semi-predictive versions of the PET, called Gap Time (GT) and Initially Attempted Post Encroachment Time (IAPE), which inherit the properties of the PET and are not considered any further here [4]. Both metrics, GT and IAPE, measure $t_{exit}(A_1, CA)$ and predict $t_{entry}(A_2, CA)$ at different points in time using a constant velocity model. Therefore, they can be seen as an evaluation of the Predictive Encroachment Time (PrET) at a specific time point.
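For recorded data, the entry and exit times can be read off a sampled occupancy signal of the conflict area. The following Python sketch assumes that, for each actor, a boolean flag per time stamp indicates whether the actor is inside CA, and that each actor passes CA exactly once; the helper names and the example recording are hypothetical.

```python
def entry_exit_times(timestamps, inside):
    """Return (t_entry, t_exit) as the first and last time stamp with the actor
    inside the conflict area, or None if it never enters (single pass assumed)."""
    inside_times = [t for t, flag in zip(timestamps, inside) if flag]
    if not inside_times:
        return None
    return inside_times[0], inside_times[-1]

def encroachment_time(t_entry, t_exit):
    """ET: time a single actor spends in the conflict area."""
    return t_exit - t_entry

def post_encroachment_time(t_exit_first, t_entry_second):
    """PET: gap between the first actor leaving and the second actor entering."""
    return t_entry_second - t_exit_first

# Hypothetical recording sampled every 0.5 s over 5 s.
timestamps = [0.5 * k for k in range(11)]
a1_inside = [1.0 <= t <= 2.0 for t in timestamps]
a2_inside = [3.5 <= t <= 4.5 for t in timestamps]

a1_entry, a1_exit = entry_exit_times(timestamps, a1_inside)
a2_entry, a2_exit = entry_exit_times(timestamps, a2_inside)
et_a1 = encroachment_time(a1_entry, a1_exit)
pet = post_encroachment_time(a1_exit, a2_entry)
```

Note that the temporal resolution of the recording bounds the precision of both values, since entry and exit are only observed at sample times.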

Predictive Encroachment Time (PrET)
Here, we summarize the predictive versions of the Post Encroachment Time. The Predictive Encroachment Time (PrET) [69] is the anticipated PET w.r.t. a predicted intersection point:

The Time Advantage (TA) metric [58] can be interpreted as a special case of PrET for a constant velocity model, i.e. $p_i(t + s) = p_i(t) + s\, v_i(t)$. A scaled variant of the PrET, labeled Scaled Predictive Encroachment Time (SPrET), modifies the value of PrET by multiplication with a scaling factor in order to decrease the weight of situations long before the predicted intersection [69]. Therefore, the SPrET incorporates prediction uncertainty.

Proportion of Stopping Distance (PSD)
The PSD metric, proposed by Allen et al., is defined as the distance to a conflict area CA divided by the Minimum Stopping Distance (MSD) [4,7,30,65]. Therefore,
$$\mathrm{PSD}(A_1, CA, t) = \frac{d(p_1(t), CA)}{\mathrm{MSD}(A_1, t)}.$$
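As a small worked example, the ratio can be evaluated once a stopping model is chosen. The Python sketch below assumes that the MSD follows from braking with a constant maximal deceleration, i.e. MSD = v² / (2 · deceleration); this kinematic model, the parameter names, and the example values are assumptions for illustration.

```python
import math

def minimum_stopping_distance(speed, max_deceleration):
    """Stopping distance under constant maximal deceleration (assumed model).
    speed in m/s, max_deceleration as a positive value in m/s^2."""
    return speed ** 2 / (2.0 * max_deceleration)

def proportion_of_stopping_distance(distance_to_ca, speed, max_deceleration):
    """PSD: remaining distance to the conflict area divided by the MSD."""
    msd = minimum_stopping_distance(speed, max_deceleration)
    if msd == 0.0:
        return math.inf  # a standing actor needs no distance to stop
    return distance_to_ca / msd

# Hypothetical scene: 20 m/s, 8 m/s^2 available deceleration, 50 m to the conflict area.
psd = proportion_of_stopping_distance(distance_to_ca=50.0, speed=20.0, max_deceleration=8.0)
```

A value below one indicates that, under the assumed stopping model, the actor can no longer come to a halt before the conflict area.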

Accepted Gap Size (AGS)
The AGS is a quantity which can be used to measure the complexity of a traffic situation. In general, it quantifies the gap or the actual space between actors desired or required for others to make a positive action decision. Therefore, for an actor $A_1$ at time t, the AGS [62, 77] is the spatial distance that is predicted for $A_1$ to act, i.e.

where a model $\mathrm{action}(A_1, t, s)$ predicts, based on the circumstances at t, whether $A_1$ decides to act given the gap size s. This model can for example refer to the size of the gap in a stream of pedestrians passing a crosswalk, which is required for a waiting driver to decide to cut in and continue. For a time dependent distance measure, the metric is also called the accepted lag size. In general, the more critical a traffic situation is, the larger the desired distance to other actors will be. For example, at an intersection, drivers tend to wait if the situation is unclear and the intersection itself is already crowded.

Required Longitudinal Acceleration ($a_{long,req}$)
For two actors $A_1$, $A_2$ at time t, $a_{long,req}$ measures the maximum longitudinal backward acceleration required, on average, by actor $A_1$ to avoid a collision in the future. It can be formalized as

The $a_{long,req}$ can be adapted for the situation where the acceleration of $A_1$ needs to be positive in order to avoid a collision by taking the minimum $a_{1,long} \geq 0$ instead. An interesting special case arises when constant acceleration of the actors is assumed in a car following scenario, cf. [43, 5.3.5].
Assuming $A_1$ is following $A_2$, we have that

Moreover, Jansson discusses the interesting cases of piecewise constant motion [43, 5.3.6] and the inclusion of actuator dynamics [43, 5.3.7]. For constant acceleration, the concept of $a_{long,req}$ is also known as Deceleration Rate to Avoid Crash (DRAC) [6].

Required Lateral Acceleration ($a_{lat,req}$)
Similar to the $a_{long,req}$, the $a_{lat,req}$ [43, 5.3.8] is defined as the minimal absolute lateral acceleration in either direction that is required for a steering maneuver to evade collision. For two actors $A_1$, $A_2$ at time t, $a_{lat,req}$ measures the minimum absolute lateral acceleration required, on average, by actor $A_1$ to avoid a collision in the future:
For actors A 1 and A 2 with constant acceleration where A 1 is following A 2 , the formula concretizes to where with w i denoting the width of A i and k ∈ {left, right} depends on the sign of

Required Acceleration ($a_{req}$)
Based on $a_{long,req}$ and $a_{lat,req}$, the aggregate metric $a_{req}$ can be defined in various ways [43, 5.3.10], e.g. by taking the norm of the required acceleration of both directions, i.e.
$$a_{req}(A_1, A_2, t) = \sqrt{a_{long,req}(A_1, A_2, t)^2 + a_{lat,req}(A_1, A_2, t)^2}.$$
More complex aggregates might also take into account the maximally available acceleration in each direction by incorporating the coefficient of friction. Also, let us mention the Conditional Required Acceleration ($a_{req,cond}$) [69], which combines $a_{req}$ and SPrET for the analysis of urban intersection scenarios:

The $a_{req,cond}$ demonstrates by example how new criticality metrics can be created by combination of existing metrics and target values. In particular, the conditionality of the $a_{req,cond}$ encodes that the dynamical aspects of criticality only become relevant when a certain temporal criticality is present. This construction can be generalized, as it is not specific to the $a_{req}$ and SPrET. Generally, addressing the different aspects of criticality through combination of metrics can lead to significant improvements in validity.

Deceleration to Safety Time (DST)
For an actor $A_1$ following another actor $A_2$, the DST metric calculates the deceleration (i.e. negative acceleration) required by $A_1$ in order to maintain a safety time of $t_s \geq 0$ seconds under the assumption of constant velocity $v_2$ of actor $A_2$ [38, 87]. The corresponding formula can be written as

and extends the concept of the $a_{long,req}$ by requiring deceleration to a safety distance $v_{2,long}(t) \cdot t_s$, under the assumption of constant velocity of $A_2$. In particular, for $t_s = 0$, the DST agrees with the constant acceleration version of $a_{long,req}$.

Brake Threat Number (BTN)
For actor $A_1$, the BTN [43] is defined as the required longitudinal acceleration imposed on actor $A_1$ by actor $A_2$ at time t, divided by the longitudinal acceleration that is at most available to $A_1$ in that scene, i.e.
$$\mathrm{BTN}(A_1, A_2, t) = \frac{a_{long,req}(A_1, A_2, t)}{a_{1,long,min}(t)}.$$
By definition, a BTN ≥ 1 indicates that a braking maneuver performed by $A_1$ cannot avoid an impending accident under the assumed DMM. An extension of BTN to multiple actors is proposed by Eidehall [21]. For car following scenarios, a special case of the BTN, known as the Deceleration-based Surrogate Safety Measure (DSSM), incorporates human factors into the model by combining the worst-case assumption of $A_2$ braking maximally with $A_1$'s required reaction time [92].
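A worked example makes the ratio more tangible. The Python sketch below assumes a car following scene with point-shaped actors, a lead vehicle driving at constant velocity, and a follower braking with constant deceleration, for which the required acceleration is −(v₁ − v₂)² / (2d) when the follower is closing in. This simplified DMM, the sign convention (decelerations are negative), and all names are assumptions for illustration.

```python
def required_longitudinal_acceleration(gap, v1, v2):
    """Constant acceleration the follower needs so that the gap just does not close.
    Assumes a constant-velocity lead vehicle and gap > 0; returns a value <= 0."""
    closing_speed = v1 - v2
    if closing_speed <= 0.0:
        return 0.0  # not approaching: no braking required
    return -(closing_speed ** 2) / (2.0 * gap)

def brake_threat_number(gap, v1, v2, a_long_min):
    """BTN: required acceleration divided by the strongest available deceleration
    a_long_min < 0 (both negative, so the ratio is non-negative)."""
    return required_longitudinal_acceleration(gap, v1, v2) / a_long_min

# Hypothetical scenes: follower at 20 m/s, lead at 10 m/s, up to -8 m/s^2 available.
btn_far = brake_threat_number(gap=25.0, v1=20.0, v2=10.0, a_long_min=-8.0)
btn_near = brake_threat_number(gap=5.0, v1=20.0, v2=10.0, a_long_min=-8.0)
```

In the second scene, the required deceleration exceeds the available one, so the BTN is above one and braking alone would not avoid the collision under these assumptions.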

Steer Threat Number (STN)
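Similar to the BTN, for two actors at time t, the STN [21,43] is defined as the required lateral acceleration divided by the lateral acceleration at most available to $A_1$ in that direction:
$$\mathrm{STN}(A_1, A_2, t) = \frac{a_{lat,req}(A_1, A_2, t)}{a_{1,lat,max}(t)},$$
where $a_{1,lat,max}(t)$ denotes the lateral acceleration at most available to $A_1$ in the respective direction. By definition, an STN ≥ 1 indicates that a lateral maneuver performed by $A_1$ cannot avoid an impending accident.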

Lateral Jerk (LatJ) and Longitudinal Jerk (LongJ)
Jerk, being the rate of change in acceleration and thus quantifying the abruptness of a maneuver, is formulated as
$$\mathrm{LatJ}(A_1, t) = \dot{a}_{1,lat}(t), \quad \mathrm{LongJ}(A_1, t) = \dot{a}_{1,long}(t).$$
The jerk has many immediate applications, such as discerning different classes of driving styles, e.g. comfortable, angry, anxious, and risky modes [10,24]. Another important application area is trains and buses, where, for standing passengers, the jerk enables an analysis of their capabilities to react to a maneuver, e.g. during a change of tracks of a train [78].
The usage of LongJ and LatJ varies, e.g. LongJ can be utilized in the design of an adaptive cruise control (ACC) [41] function, whereas LatJ is used for functions dealing with steering maneuvers, e.g. a Lane Keeping Assistance System (LKAS) [40].
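On recorded data, the jerk is usually not measured directly but estimated from sampled accelerations. The Python sketch below uses a forward finite difference on equidistant samples; the estimator, the function names, and the sample values are illustrative assumptions, and in practice measurement noise would typically call for smoothing before differentiation.

```python
def finite_difference(values, dt):
    """Forward difference quotient of equidistantly sampled values."""
    return [(later - earlier) / dt for earlier, later in zip(values, values[1:])]

def peak_absolute_jerk(accelerations, dt):
    """Scenario level aggregate: largest jerk magnitude within the recording."""
    return max(abs(j) for j in finite_difference(accelerations, dt))

# Hypothetical longitudinal accelerations in m/s^2, sampled every 0.5 s.
a_long = [0.0, -1.0, -3.0, -3.0]
long_jerk = finite_difference(a_long, dt=0.5)
peak = peak_absolute_jerk(a_long, dt=0.5)
```

The same functions apply to lateral accelerations, so LongJ and LatJ only differ in the signal that is passed in.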

Space Occupancy Index (SOI)
The SOI defines a personal space for a given actor and counts violations by other participants while setting them in relation to the analyzed period of time [45,72,95]. For each actor $A_i$ at time t, a personal space $\mathrm{Sp}(A_i, t)$ is defined. At time t, if there exists some $A_j \neq A_i$ s.t. $\mathrm{Sp}(A_i, t) \cap \mathrm{Sp}(A_j, t) \neq \emptyset$, a violation of the personal space of $A_i$ is given. The number of conflicts is then given as

where [⋅] denotes the Iverson bracket. Thus, for a given scenario in the time interval $[t_0, t_e]$, the conflict index is defined as

SOI was introduced for bicycles and pedestrians; however, it is possible to formulate a similar concept for road vehicles.

Pedestrian Risk Index (PRI)
The PRI estimates the conflict probability and severity for pedestrian crossing scenarios by combining the TTZ with the impact speed [15]. It is defined for a scenario with a vehicle $A_1$ and a vulnerable road user (VRU) P both approaching a conflict area CA. The scenario shall include a unique and coherent conflict period $[t^c_{start}, t^c_{stop}]$ where

Here, $t_s(A_1, t)$ is the time $A_1$ needs to come to a full stop at time t, including its reaction time, leading to

where $s_{imp}$ is the predicted speed at the time of contact with the pedestrian crossing. The PRI thus quantifies two aspects of a whole scenario: the temporal difference is claimed to be a surrogate for the accident probability, whereas the impact speed is an approximation of its severity. One possibility of estimating $s_{imp}$ is defined by the authors as

where $t_{r_i}$ is the reaction time of actor $A_i$. Note that depending on the DMM, other formulae for $s_{imp}$ may be employed.

Crash Potential Index (CPI)
The CPI is a scenario level metric and calculates the average probability that a vehicle cannot avoid a collision by deceleration. It sums over the probabilities that a given vehicle's $a_{long,req}$ exceeds its $a_{long,min}$ for each time point and normalizes the value over the length of the scenario [18]. The target value $a_{long,min}$ is assumed to be normally distributed and dependent on factors such as road surface material and vehicle brakes. While originally defined in discrete time, the CPI can be defined in continuous time as:

Note that this concept of aggregation over time can be generalized to be applicable to other metrics, assuming that a valid probability distribution of the target value can be given. This potentially enables a more precise identification of criticality within a scenario.
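The discrete-time variant can be sketched with the normal distribution from the Python standard library. The snippet below assumes equidistant samples of the required acceleration, a sign convention in which decelerations are negative (so that the required deceleration exceeds the available one if a_req < a_min), and hypothetical distribution parameters; none of these values are taken from the cited work.

```python
from statistics import NormalDist

def crash_potential_index(a_req_samples, mean_a_min, std_a_min):
    """Discrete CPI: average probability that the required deceleration is
    stronger than the normally distributed available deceleration a_min.
    With negative decelerations this is P(a_min > a_req) = 1 - CDF(a_req)."""
    available = NormalDist(mu=mean_a_min, sigma=std_a_min)
    probabilities = [1.0 - available.cdf(a_req) for a_req in a_req_samples]
    # Equidistant samples: normalizing by scenario length equals the sample mean.
    return sum(probabilities) / len(probabilities)

# Hypothetical parameters: available deceleration ~ N(-8 m/s^2, 1 m/s^2).
cpi_borderline = crash_potential_index([-8.0, -8.0], mean_a_min=-8.0, std_a_min=1.0)
cpi_relaxed = crash_potential_index([0.0, -1.0], mean_a_min=-8.0, std_a_min=1.0)
```

A required deceleration equal to the mean available deceleration yields a probability of one half per time step, whereas mild braking demands contribute almost nothing.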

Aggregated Crash Index (ACI)
The ACI measures the collision risk for car-following scenarios by extending the concept of CPI from $a_{req}$ to multiple conditions. First, a probabilistic causal model of the scenario type under consideration is needed to derive a collision tree with all possible outcomes and their probabilities [56].
The concrete outcomes are represented by the tree's leaf nodes $L_j$. Every leaf node has a value $C_{L_j}$ which is 0 in case of no collision and 1 in case of a collision. Similar to CPI, the conditions are defined based on other metrics, e.g. the current stopping time of the lead vehicle being smaller than a lognormally distributed reaction time. The collision risk $CR_{L_j}(S, t)$ of a leaf node $L_j$ given a scene $S$ at time $t$ is hence represented by $CR_{L_j}(S, t) = P(L_j, t) \cdot C_{L_j}$, where $P(L_j, t)$ is the probability of satisfying all conditions necessary to reach $L_j$, given the conditions at time $t$. Finally, the ACI is an aggregation of the collision risks in a scene $S$ at time $t$:
$$\mathrm{ACI}(S, t) = \sum_{j=1}^{n} CR_{L_j}(S, t),$$
with $n$ being the number of leaf nodes in the collision tree.
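As a small illustration, the leaf-wise collision risks can be aggregated by summation. The sketch assumes that the aggregation is a plain sum over the leaves and uses a made-up collision tree; the `Leaf` class and the probabilities are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    probability: float  # P(L_j, t): probability of reaching this leaf
    collision: int      # C_{L_j}: 1 if the outcome is a collision, else 0

def aci(leaves):
    """Sum of the collision risks CR_{L_j} = P(L_j, t) * C_{L_j}."""
    return sum(leaf.probability * leaf.collision for leaf in leaves)

# Made-up tree with one collision-free and two collision outcomes.
leaves = [Leaf(0.7, 0), Leaf(0.2, 1), Leaf(0.1, 1)]
print(aci(leaves))
```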

Trajectory Criticality Index (TCI)
The TCI metric models criticality using an optimization problem [49]. The task is to find a minimum difficulty value, i.e. how demanding even the easiest option for the vehicle will be under a set of physical and regulatory constraints. For example, if the constraint is to avoid obstacles, then driving straight towards an obstacle and being only a few seconds away requires a large change in steering angle and acceleration to satisfy the constraint of collision avoidance.
Here, the set of possible vehicle actions is not only constrained by physically possible behavior; it shall additionally adhere to a mathematically modeled set of requirements. Said constraints are based on the necessary longitudinal ($a_{long}$) and lateral acceleration ($a_{lat}$) to avoid collisions as well as the margin for corrections in speed ($R_{long}$) and course angle ($R_{lat}$). Since both $R_{long}$ and $R_{lat}$ are dependent on $a_{long}$ and $a_{lat}$, it suffices to minimize the combined function w.r.t. $a_{long}$ and $a_{lat}$. The requirements include concepts such as holding a safe following distance and maximizing the distance to obstacles.
Assuming the vehicle behaves according to Kamm's circle, the TCI for a scene $S$ with an ego vehicle $A_1$ is formulated over the following quantities: $t_H$ is the prediction horizon, $a_x$ and $a_y$ the longitudinal and lateral accelerations, $\mu_{max}$ the maximum coefficient of friction, $g$ the gravitational constant, $w$ weights, and $R_{long}$ and $R_{lat}$ the longitudinal and lateral margin for angle corrections. Within these margins, $x(t), y(t)$ is the position, $t_s$ the discrete time step size, $v_{max}$ the maximum velocity, $r_{long}(t)$ the reference for a following distance (set to $2\,\mathrm{s} \cdot v_{long}(t)$), $r_{lat}$ the position with the maximum lateral distance to all obstacles in $S$, and $d_{long}(t)$, $d_{lat}(t)$ the maximum longitudinal and lateral deviations from $r_{long}$, $r_{lat}$.

Conflict Index (CI)
The conflict index enhances the PET metric with a collision probability estimation as well as a severity factor [3]. For this, the kinetic energy that would have been released in a hypothetical collision between $A_1$ and $A_2$ at their states when entering ($A_2$) resp. exiting ($A_1$) the conflict area is estimated:
$$\mathrm{CI}(A_1, A_2, CA, \alpha, \beta) = \frac{\alpha\, \Delta K_e}{e^{\beta\, \mathrm{PET}(A_1, A_2, CA)}},$$
where the denominator is a collision probability estimation. Therefore, it is proposed that the actual collision probability is proportional to $e^{-\beta\, \mathrm{PET}(A_1, A_2, CA)}$ with $\beta$ being a calibration factor dependent on e.g. country, road geometry, or visibility, and $[\beta] = \mathrm{s}^{-1}$. The numerator represents the collision severity, where $\alpha \in [0, 1]$ is again a calibration factor for the proportion of energy that is transferred from the vehicle's body to its passengers and $\Delta K_e$ is the predicted absolute change in kinetic energy before and after the predicted collision.
$\Delta K_e$ is estimated based on the masses as well as velocities and angles at the time of entering ($A_2$) resp. exiting ($A_1$) $CA$.
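The structure of the conflict index, a severity term divided by an exponential PET term, can be sketched as follows. The sketch assumes a perfectly inelastic collision for estimating the change in kinetic energy and uses hypothetical calibration values `alpha` and `beta`; neither is prescribed by [3].

```python
import math

def delta_kinetic_energy(m1, v1, m2, v2):
    """Kinetic energy lost in a perfectly inelastic collision.

    m1, m2 are masses in kg; v1, v2 are planar velocity vectors in m/s.
    """
    k_before = 0.5 * m1 * (v1[0] ** 2 + v1[1] ** 2) \
             + 0.5 * m2 * (v2[0] ** 2 + v2[1] ** 2)
    # Common velocity after the collision from conservation of momentum.
    vx = (m1 * v1[0] + m2 * v2[0]) / (m1 + m2)
    vy = (m1 * v1[1] + m2 * v2[1]) / (m1 + m2)
    k_after = 0.5 * (m1 + m2) * (vx ** 2 + vy ** 2)
    return k_before - k_after

def conflict_index(delta_k, pet, alpha=0.5, beta=1.0):
    """Severity (alpha * delta_k) divided by exp(beta * PET)."""
    return alpha * delta_k / math.exp(beta * pet)

dk = delta_kinetic_energy(1000.0, (10.0, 0.0), 1000.0, (-10.0, 0.0))
print(conflict_index(dk, pet=1.5))
```

A larger PET lowers the index exponentially, so near misses with a small PET dominate the result.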

Collision Probability via Monte Carlo (P-MC)
P-MC produces a collision probability estimation based on future evolutions from a Monte Carlo path planning prediction [12]. At first, a binary representation of the road geometry with the distinction of drivable and non-drivable regions is generated. If the ego enters a non-drivable region, a collision is detected. Every object in the scene has a state, denoted by $s_i(t) = (p_i(t), v_i(t))$, and control inputs $u_i(t)$. The motion of each object is then described by an ODE of the form $\dot{s}_i(t) = f(s_i(t), u_i(t))$.
If the bounding boxes of two objects intersect at some point between $t$ and $t + t_H$, a collision is detected. A goal function $g(u_i(t))$ is defined for each object in the scene to specify the desirability of paths that the object might follow based on the possible control inputs. With $k$ objects in a scene, the goal functions are combined into a joint goal of all objects. For an actor $A_1$ in a scene $S$, the collision probability is then derived from $P(C \mid U)$, the collision probability of $A_1$ in $S$ under the given inputs $U$.

Collision Probability via Stochastic Reachable Sets (P-SRS)
Althoff et al. propose to estimate a collision probability using stochastic reachable sets [5]. First, the reachable set $R([t, t + t_H])$ (the set of possible positions within $t_H$) is over-approximated for each actor, where the movement of the actor is approximated by Markov chains with time steps $\{t + t_1, t + t_2, \dots, t + t_H\}$ and a constant $T = t_{k+1} - t_k$. The ego's motion is not modeled as it is assumed to be known. Afterwards, the state and input space are discretized, so that a reachable set $R_i(T)$ can be written for a state in the $i$-th partition of the state space and an input in a given partition of the input space for time $T$. The transition probabilities to partitions $X_j$ of the state space are determined from the volumes of these sets, where $V$ returns the volume. The aforementioned concepts are then generalized to $\Phi_{ji}([0, T])$ by taking the union of $R_i(t)$ for $t \in [0, T]$, not accounting for the discrete time aspect at this point [5].
Using the properties of Markov chains, one can thus derive the probability distribution of the position for each time interval. Behaviors of other actors are modeled as Markov chains on the control input space of the DMMs. Due to the discretization of the state space, the lateral deviation can be approximated by a piecewise constant function, which allows defining intervals $D_f$ on which said function is constant. This leads to a lateral position probability $p^{dev}_f$. By splitting the state space partitions $X_i$ into position and velocity, i.e. $X_i = S_e \times V_m$, one can define a path probability $p^{path}_e$. Afterwards, all possible paths in which two actors could have intersecting vehicle bodies are identified and stored in a list $\Omega$. Under the assumption of stochastic independence and using the previous concepts, we then have $p^{pos}_{ef} = p^{path}_e \cdot p^{dev}_f$, from which the collision probability is derived.

Collision Probability via Scoring Multiple Hypotheses (P-SMH)
Similar to other probability-based approaches, Sánchez Morales et al. propose to assign probabilities to predicted trajectories and accumulate them into a collision probability [67]. The ego's motion is modeled by a two-track model, cf. Sect. 5.1. Since less information is known about the other actors, a one-track model is used for those. Pedestrians have the ability to change direction, velocity, and acceleration in a finite set of steps under given constraints. Once the number $N$ of trajectories for the ego and the total number $M$ of trajectories of all other actors are determined, one can compute the collision probability of $A_1$ at time $t$ by accumulating the trajectory probabilities over all pairs $(i, j)$ that are marked by an indicator. This indicator equals one if and only if the $i$-th trajectory of $A_1$ and the $j$-th trajectory of the actors in $A \setminus A_1$ lead to a collision, and $p_{A_1,i}$ resp. $p_{(A \setminus A_1),j}$ are the probabilities of the trajectories being realized.
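One way to read this accumulation is as a double sum over trajectory pairs. The following sketch assumes that the ego's trajectories and those of the other actors are realized independently, so that the probability of a pair is the product of both trajectory probabilities; the function name and all numbers are hypothetical.

```python
def collision_probability(p_ego, p_others, collides):
    """Accumulate the probabilities of all colliding trajectory pairs.

    p_ego[i]       probability of the i-th ego trajectory
    p_others[j]    probability of the j-th trajectory of the other actors
    collides(i, j) indicator: True iff the pair (i, j) leads to a collision
    """
    return sum(
        p_i * p_j
        for i, p_i in enumerate(p_ego)
        for j, p_j in enumerate(p_others)
        if collides(i, j)
    )

# Made-up hypotheses: only ego trajectory 0 and other trajectory 1 collide.
p_ego = [0.5, 0.5]
p_others = [0.25, 0.75]
colliding_pairs = {(0, 1)}
print(collision_probability(p_ego, p_others, lambda i, j: (i, j) in colliding_pairs))
```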

Potential Functions as Superposition of Scoring Functions (PF)
The general concept of the PF metric is to define a potential function for each static or dynamic object considered by the metric [105]. This includes potentials for lane markings, the road geometry, other vehicles, or, in more urban areas, pedestrians and bicyclists. Once a potential function for each object in the scene, denoted by $U_i(A, S)$, is chosen, one can apply e.g. gradient descent for a given scene $S$ to the combined potential function $U(A, S) = \sum_{i=1}^{k} U_i(A, S)$, where $A$ is an actor and $k$ denotes the number of objects. A simple example of how to evaluate this metric for an actor $A_1$ and a given scene $S'$ is to insert the values into $U$, i.e. to compute $U(A_1, S')$.
However, methods involving the mentioned gradient descent to assess the criticality can improve precision and also provide a suggestion for criticality-reducing vehicle movement.Due to the way this metric is defined, almost all properties depend on the specified potential functions.Furthermore, while ethical questions play a role when defining any safety surrogate, it becomes more evident for potential functions, as an active decision making in the definition of the potentials is required.
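A minimal sketch of both evaluation styles follows: inserting a position into the superposed potential, and taking gradient descent steps that move the actor towards lower potential. The Gaussian shape of the object potentials, the step size, and the numerical gradient are assumptions made for illustration and are not taken from [105].

```python
import math

def obstacle_potential(center, height=1.0, width=2.0):
    """Hypothetical Gaussian potential around an object at `center`."""
    def u(pos):
        d2 = (pos[0] - center[0]) ** 2 + (pos[1] - center[1]) ** 2
        return height * math.exp(-d2 / (2 * width ** 2))
    return u

def combined_potential(potentials, pos):
    """Superposition: sum of all object potentials at `pos`."""
    return sum(u(pos) for u in potentials)

def descend(potentials, pos, step=0.5, iters=20, h=1e-4):
    """Gradient descent using central finite differences."""
    x, y = pos
    for _ in range(iters):
        gx = (combined_potential(potentials, (x + h, y))
              - combined_potential(potentials, (x - h, y))) / (2 * h)
        gy = (combined_potential(potentials, (x, y + h))
              - combined_potential(potentials, (x, y - h))) / (2 * h)
        x, y = x - step * gx, y - step * gy
    return x, y

potentials = [obstacle_potential((0.0, 0.0))]
start = (1.0, 0.0)
end = descend(potentials, start)
print(combined_potential(potentials, start), combined_potential(potentials, end))
```

The direction of the descent steps corresponds to the criticality-reducing movement mentioned above.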

Safety Potential (SP)
Conceptually, the Safety Force Field (SFF) framework identifies, under the assumption of all actors conducting some safe control policy (e.g. an emergency brake), whether there can exist a conflict [70]. To measure safety w.r.t. collision avoidance, SFF uses SP as a numeric valuation.
Formally, each safe control policy $s \in S_1$ brings an actor $A_1$ to a full stop in finite time. SFF defines the occupied set $O_1$ of an actor $A_1$ to include its safety margin as well as $A_1$ itself. For each point on each trajectory that can arise from conducting a safe control policy $s \in S_1$, $O_1$ is examined. The resulting union of trajectories is the claimed set $C_1$.
The unsafe set $U_{1,2}$ between two actors $A_1, A_2 \in A$ can then be identified. Intuitively, it is the set of all actor state combinations for which there exist safe control policies leading to a collision.
Identifying the combined state space of $A_1$ and $A_2$ as $\Omega_1 \times \Omega_2$, SFF subsequently employs a potential function $\rho_{1,2}: \Omega_1 \times \Omega_2 \to \mathbb{R}$ to rate the combined states of actors, where
- $\rho_{1,2}(x) > 0$ for all $x \in U_{1,2}$,
- $\rho_{1,2}(x) \geq 0$ for all $x \notin U_{1,2}$, and
- $\rho_{1,2}(x) \geq \rho_{1,2}(x')$ for all states $x'$ that arise from $x$ when $A_1$ and $A_2$ apply their safety procedures.
The safety potential can hence rate a two-actor scene from one of their perspectives. The authors state an exemplary safety potential for some $k \in \mathbb{Z}_{>0} \cup \{\infty\}$ that is based on $t_{int}$, the earliest intersection time when continuing the current situation under some model, and $t_{stop}(A_i)$, the time of full stop of $A_i$ after applying a safety procedure.

Accident Metric (AM)
AM evaluates whether an accident happened in a scenario, i.e. it is a binary indicator of accident occurrence. This simplistic metric is implicitly used in accident databases, e.g. GIDAS 4. It fails to identify critical non-accident scenarios.

Responsibility Sensitive Safety Dangerous Situation (RSS-DS)
The RSS framework is designed to formally guarantee safety during an automated vehicle's drive [89].
For this, the safe lateral and longitudinal distances $d^{lat}_{min}$ and $d^{long}_{min}$ are formalized, depending on the current road geometry. The metric RSS-DS for the identification of a dangerous situation is therefore defined in terms of these minimal distances. In order to determine $d^{lat}_{min}$ and $d^{long}_{min}$, different prediction models are used, e.g. for intersections, highways, and unstructured roads.
Note that RSS has been shown to not consider certain edge cases, e.g. during braking maneuvers and on varying road surfaces and slopes, as well as the issue of perception uncertainty [52].
An extension of RSS-DS measures the temporal extent to which the ego was not able to mitigate the dangerous situation [44]. In accident research, a similar concept of classifying situations as safe and unsafe depending on longitudinal stopping distances was introduced as the Stopping Distance Index (SDI) [73]. In turn, the SDI is partially based on the idea of the Potential Index for Collision with Urgent Deceleration (PICUD) [98], both comparing the stopping distances of the lead and following vehicle under emergency braking.

Delta-v (Δv)
Δv is the change in speed over the collision duration and is widely used in collision databases, where it is typically calculated from post-collision measurements [25]. Introduced in the 1970s [16], it uses the difference in speed to estimate the probability of a severe injury or fatality. A more complex formula for two actors additionally takes the masses into account. An extended Δv measure, which additionally considers the mass as well as the driving angles of the collision participants, has been discussed by Laureshyn et al. [60].
Joksch [46] presents a model connecting Δv to the probability $P$ of a two-vehicle collision leading to a fatality. This connection provides an easily interpretable measure.

Conflict Severity (CS)
CS is solely concerned with estimating the severity of a potential collision in a scenario [8]. It thus presents itself as a suitable factor that can enhance various collision probability metrics to ensure a more accurate representation of criticality. It is defined from the perspective of an actor $A_1$ performing a braking maneuver at time $t_{evasive}$. It compares the (extended) Δv at the time of the evasive maneuver against the Δv at the potential collision point as predicted by TTA if $A_1$ conducts an emergency braking, assuming $v_2(t_{evasive} + \mathrm{TTA}(A_1, A_2)) = 0$. CS factors in the relative mass difference due to the correlation between severe injuries and fatality outcome, measured on the Abbreviated Injury Scale, and the mass ratio of the involved actors [22].

Property assessment
In Sect. 4, we have presented a list of relevant properties of criticality metrics. Based on the previous depiction of the metrics, we can now perform an evaluation of their properties.
To this end, the authors have created a supplementary web page 5 on which a detailed assessment of each of the aforementioned metrics can be found. The analysis is based on the authors' hypotheses, which are, wherever possible, supported by evidence. We remark that for many metrics no suitable sources providing evidence for the properties are available in the literature, which strongly suggests the need for further empirical research. The underlying repository is therefore open to enhancements and suggestions by the scientific and industrial community 6.
For brevity, we refrain from including the complete set of properties within this publication, although they will be referenced and used in Sect. 6.2. However, to foster comprehension of the performed property assessment, we subsequently discuss the properties of the TTB metric, as taken from the referenced web page 7, in more depth as an example.

Run-time capability
The TTB metric is calculated given a scene $S$ at a specific time point $t$. Semantically, it predicts the latest time at which a brake maneuver can successfully mitigate a collision, therefore forecasting a future evolution of $S$. Factually, these two properties are the foundation of any criticality metric to be used in an online setting, e.g. within an ADF implementation.

Target values
Target values give meaning to the metric's result within a specific application, e.g. by enabling the classification of scenes or scenarios. The literature suggests:
1. for pruning of an MRM for an ADF within an exemplary situation, a TTB ≥ 0.4 s was used to classify braking as a viable MRM [94] (application A.3),
2. for scenario classification of car following and lane merging maneuvers, 1 s was used as an estimate based on the human reaction time, while 0.6 s was empirically observed as critical by human subjects [48] (application B.2),
3. for criticality-based scenario classification within simulation runs, 1 s was used (in combination with other criticality metrics) [36] (application B.2).
Note that target values are to be considered within their context. For example, a target value of 0.4 s may be well suited for pruning a set of MRMs, but is potentially unfit for filtering large databases due to a low sensitivity.
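The context dependence of target values can be made concrete with a small sketch. The function name, the labels, and the handling of the infinite values are hypothetical; only the thresholds of 0.4 s and 1 s are taken from the list above.

```python
import math

def classify_ttb(ttb, target=1.0):
    """Label a TTB value in seconds relative to an application-specific target."""
    if ttb == -math.inf:
        return "unavoidable by braking"
    if ttb == math.inf:
        return "no collision predicted"
    return "critical" if ttb <= target else "uncritical"

# The same measurement is rated differently under different target values.
print(classify_ttb(0.5, target=1.0))
print(classify_ttb(0.5, target=0.4))
```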

Subject type
The TTB requires that an MM for braking can be formalized for the subject of the metric ($A_1$). This is achievable for road vehicles, whether human- or automation-driven. For VRUs, such as bicyclists and pedestrians, a braking maneuver is complex to model and often not the sole reaction to a critical situation, e.g. when combined with steering. Therefore, the TTB metric is primarily concerned with road vehicles. Note that the type of $A_2$ is not constrained.

Scenario type
If evaluated on a scene, the TTB requires a collision in the prediction model, i.e. $\mathrm{TTC} \neq \infty$, to return a meaningful value. Otherwise, when the DMM does not predict a collision, the TTB yields $\infty$ even if the criticality is heightened. Consequently, if evaluated on a scenario, the TTB needs a significant time span in which a collision is reliably predicted within the employed DMM for the metric to return a time series of meaningful (i.e. non-infinite) valuations.

Inputs
The TTB metric uses the current information on both actors $A_1$ and $A_2$ to predict future behaviors. It therefore requires their states (e.g. pose and shape) at the time of evaluation $t$. Moreover, as motivated earlier, an MM for a braking maneuver of $A_1$ is needed.

Output scale
If a collision cannot be avoided at any future time point, the TTB yields $-\infty$. Otherwise, a value in $[0, \mathrm{TTC}(A_1, A_2, t)]$ is returned, which leads to the output scale of $\{-\infty\} \cup [0, \infty]$ having time as its quantity. As there exists an absolute point of zero, it is given on a ratio scale, therefore supporting proportional comparisons between values.

Reliability
The subsequent properties rely heavily on the employed prediction model. Assume that the DMM predicts a collision at time $t$ in the near future. In that case, the TTB adequately reflects criticality w.r.t. a braking maneuver. Assume now that at the next time point $t' > t$, the DMM is not able to predict a collision anymore, e.g. due to a small change in yaw angle by $A_2$. This effectively leads to a TTB value of $\infty$. The actual criticality may have changed only slightly from $t$ to $t'$, but the metric jumps from indicating a certain criticality level at $t$ to assessing the scene as uncritical at $t'$. Therefore, the TTB's reliability is only high under the assumption that collisions can be reliably predicted.

Validity
The TTB's validity primarily depends on the validity of the collision predicted within the DMM. If the DMM assumes e.g. unrealistic motions, the TTB's validity will suffer. Furthermore, in case the TTB is evaluated solely from a fixed actor's perspective within scenes involving multiple actors, validity can be reduced. This can be mitigated by aggregating over all actors. For at least one, a braking maneuver will be indicated.

Sensitivity
Both sensitivity and specificity are derived properties: they are based on the validity of the metric, the target values, and the DMM. For example, the TTB's specificity may be reduced if no collisions are predicted for critical scenarios.

Specificity
As braking is a key choice in human reaction to hazards [2], the TTB indicates the remaining time until no (realistic) mitigation maneuver is probable with a high specificity for humans. AVs may exhibit different abilities for mitigation maneuvers, e.g. by avoiding collisions through last-second steering, an option that, even if possible, is often not chosen by human drivers [2]. Therefore, the TTB's specificity may be reduced when used for AVs, as non-braking maneuvers can still avoid critical situations in which $\mathrm{TTB} \leq 0$.

Prediction model
The TTB uses the function $p_i: \mathbb{R}_+ \to \mathbb{R}^3$ as an interface to the DMM in order to retrieve a prediction of the positions of the actor $A_i$ at future time points. Therefore, the time window is theoretically unbounded, but larger time horizons may decrease the prediction validity. Additionally, it considers only linear time as, for each time point, only a single future position is retrieved from $p_i$.

Interrelations of Criticality Metrics
This section detailed a vast set of metrics and their interrelations. For example, we showed that the BTN is dependent on the $a_{long,req}$ metric. Such interrelations are helpful to understand the complex network spanned by the metrics during the execution of a suitability analysis. Figure 5 gives an overview of those relations, where we differentiate between scenario and scene level metrics, as introduced in Sect. 5.2. Furthermore, metrics that do not rely on a prediction model are highlighted.

Suitability Analysis
Up to this point, we have extensively reviewed the state of the art of criticality metrics within their contexts. For an in-depth interpretation of these results, we propose a methodical suitability analysis, which can be understood as a synthesis of the preceding sections. Given an application (Sect. 3) and its associated requirements on properties (Sect. 4), the goal of the suitability analysis is to find a set of adequate metrics and models (Sect. 5). The suitability analysis can additionally be used to discard metrics that are, regardless of the models, not suitable for the considered application. We first sketch the generic approach to a suitability analysis and subsequently give an evaluation using a comprehensible example.

Suitability Analysis Process
The suggested suitability analysis is presented as a five-step, expert-based process. It is given as inputs:
- a description of the application at hand, $A$,
- a set of available metrics, $K$,
- and a set of available models, $M$.
Its output is defined to be a set of suitable metrics that are each assigned a set of suitable models, i.e. a subset of $K \times 2^M$. The work flow of the process takes the following shape:
(1) Identify a finite set of properties $P$ of criticality metrics relevant for $A$. Derive a set of requirements $R$ on the properties, potentially based on Table 4.
(2) Order the requirements $R$ w.r.t. their importance for the application $A$. This results in a relation $(R, \geq)$.
(3) For each available metric with its potential models, $(\kappa, \mu) \in K \times 2^M$, determine its properties, leading to an assessment similar to Sect. 5.3, tailored to the details of $A$, the relevant properties $P$, and the available models $M$.
(4) Choose some $r \in \max(R)$, discard all metrics that do not fit the requirements from (3), i.e. keep only those $(\kappa, \mu)$ with $(\kappa, \mu) \models r$, and remove $r$ from $R$.
(5) If
1. $|R| = 0$, return the remaining set of metrics and models,
2. $|R| \neq 0$ and $|K| = 0$, then either conclude that there is no suitable metric for $A$ or restart the process with
- a relaxed set of requirements $R$, or
- an enlarged set of metrics and models $K$,
3. otherwise, continue with step (4).
The semantics of the relation $(\kappa, \mu) \models r$ can be understood as the satisfaction of the requirement $r$ on the corresponding property $p \in P$ of $\kappa$ for all models in $\mu$. It can only be detailed on a per-application basis and is typically not formally evaluated but rather inspected by an application expert.
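The iterative core of the process can be sketched in a few lines. The sketch assumes that the ordering of the requirements has been linearized into a list from most to least important and that the expert judgment is available as a function `satisfies`; the metric names, properties, and requirements are purely hypothetical.

```python
def suitability_analysis(candidates, ordered_requirements, satisfies):
    """Discard candidates requirement by requirement.

    Returns the remaining candidates and, if the set ran empty, the
    requirement at which this happened (otherwise None).
    """
    remaining = set(candidates)
    for requirement in ordered_requirements:
        remaining = {c for c in remaining if satisfies(c, requirement)}
        if not remaining:
            # No suitable candidate: relax requirements or enlarge the set.
            return remaining, requirement
    return remaining, None

# Hypothetical property assessment standing in for expert judgment.
properties = {
    "metric_a": {"subject": "vehicle", "scale": "ratio"},
    "metric_b": {"subject": "vehicle", "scale": "nominal"},
    "metric_c": {"subject": "pedestrian", "scale": "ratio"},
}
requirements = [
    ("subject", {"vehicle"}),
    ("scale", {"ordinal", "interval", "ratio"}),
]

def satisfies(metric, requirement):
    prop, allowed = requirement
    return properties[metric][prop] in allowed

result, failed_at = suitability_analysis(properties, requirements, satisfies)
print(result, failed_at)
```

Returning the requirement at which the set ran empty indicates which requirement to relax when restarting the process.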

Evaluation of the Suitability Analysis
To present a concise evaluation of the suggested suitability analysis, we apply the process delineated in Sect. 6.1 to a concretely specified application, demonstrating the relevance and value of the state of the art review of Sects. 3, 4, and 5.

Exemplary Application
In this application, a prior analysis identified the phenomenon 'unprotected left turn' as highly relevant as well as a scenario class featuring the phenomenon. This class is to be analyzed w.r.t. criticality and the results are to be used within a scenario-based testing approach for an AV [68]. In our example, only limited resources are available. Hence, for metrics with unconstrained DMMs, it is preferred to use a physics-based model as presented in Sect. 5.1.
The application generates critical instances from a logical scenario with two parameters, with the initial scene given in Fig. 6. It contains two actors on a four-arm intersection, with $A_1$ on the southern arm intending a left turn and $A_2$ on the northern arm intending a straight passing. We assume a mixed traffic environment, i.e. $A_2$ can be a human driver. The application's procedure for scenario instantiation is:
1. Using a state of the art simulation environment, sample uniformly from the logical scenario space $[0, 40] \times [5, 50]$, carry out the simulation, and compute a criticality measurement for each sample.
2. Fit a suitable regression model on the generated data using the measured criticality as the dependent variable.
3. Use an optimization algorithm on the learned model to identify the critical parameter combinations.
The identified critical parameter combinations are then used in a downstream testing process as representatives of the phenomenon 'unprotected left turn' in the associated scenario.
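The three steps of this procedure can be sketched end to end. The sketch replaces the simulation by a hypothetical stand-in function, uses a k-nearest-neighbor average as the regression model, and performs the optimization by a grid search; all three choices are assumptions for illustration and not part of the application described above.

```python
import random

def simulate_criticality(p1, p2):
    """Hypothetical stand-in for a simulation run plus metric evaluation."""
    return 1.0 / (1.0 + abs(p1 - 20.0) + abs(p2 - 30.0))

def knn_predict(samples, query, k=5):
    """Regression model: mean criticality of the k nearest samples."""
    nearest = sorted(
        samples,
        key=lambda s: (s[0] - query[0]) ** 2 + (s[1] - query[1]) ** 2,
    )[:k]
    return sum(s[2] for s in nearest) / len(nearest)

# Step 1: sample uniformly from the logical scenario space [0, 40] x [5, 50].
rng = random.Random(0)
samples = []
for _ in range(500):
    p1, p2 = rng.uniform(0, 40), rng.uniform(5, 50)
    samples.append((p1, p2, simulate_criticality(p1, p2)))

# Steps 2 and 3: query the fitted model on a grid and pick the maximum.
grid = [(x, y) for x in range(0, 41, 2) for y in range(5, 51, 5)]
best = max(grid, key=lambda q: knn_predict(samples, q))
print(best, knn_predict(samples, best))
```

Because the optimization runs on the learned model, the quality of the result depends on the reliability of the chosen metric, which motivates the requirements stated below.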
Fig. 6 The initial scene which is investigated for our exemplary scenario instantiation, and for which suitable criticality metrics are sought.

$r_2$ (Subject Type): The metric shall be tailored towards automated vehicles, as the identified representative instances shall be critical specifically for an automation and not for human drivers.
$r_3$ (Scenario Type): The metric shall be applicable to urban intersections and shall be suitable for unprotected left turns.
$r_4$ (Inputs): The metric shall have a subset of necessary input parameters available in the simulation environment.
$r_5$ (Output Scale): The metric's output shall be on an ordinal scale to enable the usage of optimization algorithms.
$r_6$ (Reliability): The metric shall have a medium to high reliability, due to fitting a regression model on the samples.
$r_7$ (Validity): The metric shall have a high validity, as the critical parameter instances are used for testing and safety assurance later on.

Fig. 7 Visual representation of the ordering $\geq$ on the set of identified requirements $R$ within the exemplary suitability analysis

Exemplary Suitability Analysis
In the following, we walk through the analysis as presented in Sect. 6.1 using the previously introduced example application as well as the set of metrics and models $K$ identified in Sect. 5 as inputs.
Step (1)-Identification of $P$ and $R$. We select the following set of properties, based on Sect. 4.1: $P$ = {Subject Type, Scenario Type, Inputs, Output Scale, Reliability, Validity}. For each $p \in P$, we identify justified requirements $r_1, \dots, r_7 \in R$ as stated in Table 2.
Step (2)-Ordering of $R$. Next, we order the requirements $r_1$ to $r_7$ according to their importance for the application. The resulting relation is depicted in Fig. 7.
Requirements $r_1$ and $r_3$ are deemed to be most important, as without suitable scenario and subject domains, the instantiation process will not yield representative results. Furthermore, $r_5$ and $r_7$ are chosen to be in the subsequent level of concern, as both the validity of the metric and its output scale are a necessity for the optimization delivering (valid) results. Finally, we order $r_6$, $r_4$, $r_2$ linearly.
Step (3)-Assessment of properties. For the available metrics, we rely on the property assessment of Sect. 5.3. According to the results of step (1), we reduce the examined properties to $P$. In favor of the conciseness of the example, we do not consider properties induced by their models, although, as shown by the exemplary examination of the TTB metric, they often imply restrictions as well. For example, certain models are unfit to assess the behavior of an AV, e.g. when including a human reaction time for $A_1$, which the application defines to be automation-driven. In such a case, either the model or its parameters have to be carefully selected to validly reflect an automation's behavior, e.g. by choosing a reaction time based on the automation under test.
Steps (4) and (5)-Iterative refinement of metrics. First, let us examine $r_1 \in \max(R)$. We now discard all metrics that are concerned with subject types other than road vehicles, namely SOI, PRI, and TTZ, as they measure criticality w.r.t. pedestrians and, at least without adaptations, are not directly suitable to assess automations.
A subsequent update of $R$ yields $\max(R) = \{r_3\}$. We therefore need to examine whether the remaining metrics are applicable to the intersection scenario depicted in Fig. 6. Based on the results of Sect. 5.3 and the assumption that the application prefers to use simple physics-based models, we remove any metric that
- is primarily designed for car following or highway scenarios: ACI, DST, HW, PTTC, TCI, and THW,
- uses a predicted collision as an indicator, due to assuming a physics-based model where a collision may not be predicted over a sufficient time span in the scenario: TET, TIT, TTC, TTM, TTB, TTS, TTK, and TTR,
- is concerned with collision scenarios: Δv, or
- requires the identification of $t_{evasive}$, which may not be present in some of the scenario samples: TTA.
Note that although certain metrics are removed at this point, they may later be used as a suitable enhancement factor for the resulting measurements, e.g. by a multiplication with Δv.
After updating $R$, we choose $r_5 \in \max(R)$. Since we are interested in ranking scenarios w.r.t. their criticality, we cannot make use of nominal-scale metrics, and remove AM and RSS-DS. As a special case, we disregard SOI, which reduces to a nominal-scale metric for $|A| = 2$.
Next, we discard $r_5$ from $R$ and select $r_7 \in \max(R)$. Considering the validity of the remainder of $K$, we find that, for criticality assessment in the exemplary scenario,
- DCE does not sufficiently consider the state of the left turn vehicle, e.g. its velocity,
- ET does not model the behavior of $A_2$,
- PSD and WTTC are not able to sufficiently distinguish uncritical passing scenarios from close encounters,
- TTCE does not incorporate any semantics of the close encounter, which is effectively not always a critical event.
We therefore remove these metrics from $K$. We also remove all metrics for which no sufficient certainty can be reached regarding their validity, namely PF and SP. Requirement $r_6$ has no impact as no residual metric has a low reliability. The final requirement, $r_2$, demands the metrics to be applicable to automated vehicles. It removes AGS from the resulting set, as no suitable gap acceptance models for automated vehicles exist. Furthermore, as discussed in Sect. 5.3.2, metrics such as TTB may be better suited to identify critical scenarios for humans than for AVs. The remaining metrics do not suffer from such limitations, and therefore remain in $K$. Note that this requirement can have stronger implications for selected models, which were fixed within this example. We hence complete the analysis.
Resulting set of metrics The iterative refinement returns the set of metrics depicted in Table 3, additionally annotated with their output scales.Depending on the planned efforts, a (potentially time-aggregated) combination of those metrics can now be chosen to fit a regression model for downstream testing purposes.

Conclusion and Future Work
We assembled an extensive, unified knowledge base on criticality metrics for automated vehicles.The totality of information has been sourced from several decades of research from the areas of traffic conflict research, traffic psychology, and, more recently, development and testing of AVs.With respect to applications of criticality metrics for automated vehicles, our method answers the question 'How to identify a set of criticality metrics suited for computing criticality within the application at hand?'.
In order to assess the state of the art of criticality metrics, we elicited applications of criticality metrics for automated driving in Sect. 3, derived properties of such metrics in Sect. 4, and analyzed the applications' requirements on them. Furthermore, we conducted an extensive review of criticality metrics in Sect. 5 and, whenever possible, separated these from the underlying models while also unifying notation. We visualized the interrelations of the considered metrics in Fig. 5. The synthesis of this review resulted in an expert-based, qualitative evaluation in Sect. 5.3 regarding the previously identified properties of metrics. On top, we sketched a methodical approach, labeled suitability analysis, to answer the initial research question and concluded the work at hand with an exemplary evaluation for a comprehensible yet relevant application.
Overall, this paper provides researchers and engineers working in the area of automated vehicles with a vast amount of structured and visualized information presented in a unified framework as well as a blueprint for methodically choosing suitable criticality metrics for computations within their applications.
The information gathered and the method presented in this paper offer multiple directions for future work. We finish with an outline of selected open ends.
Quantitative evaluation of properties of criticality metrics. For the evaluation of the properties of criticality metrics in this work, presented in Sect. 5.3, we included references whenever available. However, for many entries, we relied on expert judgment, as no evidence was traceable. In particular, quantitative studies based on a combination of synthetic and real-world data are required to confirm or refute the initial expert-based qualitative hypotheses of this work. Moreover, the influence of different models on criticality metrics and their properties has to be incorporated.
Uncertainty quantification. We treated the inputs of criticality metrics as binary: either available or not available. In real-world applications, however, measured variables are subject to measurement errors. These systematically or randomly erroneous measurements are then used to numerically approximate other variables or directly as inputs to various metrics for further computations. Quantifying these uncertainties is of particular interest for safety-critical systems such as AVs. One option is the use of interval arithmetic to consistently track the numerical error stemming from imprecise measurements through a metric's computations. This enables a quantification of the influence of measurement errors on a metric's output.
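To make the idea of interval arithmetic concrete, the following minimal sketch propagates bounded measurement errors through a simple constant-velocity time-to-collision computation for a car-following situation. The `Interval` class, the helper names, the error bounds, and the use of this particular metric formulation are assumptions made for illustration only. The sketch also ignores floating-point rounding; a rigorous implementation would round interval bounds outward.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] enclosing an uncertain quantity."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    def __sub__(self, other: "Interval") -> "Interval":
        # Smallest difference: own lower minus other's upper, and vice versa.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __truediv__(self, other: "Interval") -> "Interval":
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        q = [self.lo / other.lo, self.lo / other.hi,
             self.hi / other.lo, self.hi / other.hi]
        return Interval(min(q), max(q))


def measured(value: float, abs_error: float) -> Interval:
    """Wrap a measurement with an assumed symmetric absolute error bound."""
    return Interval(value - abs_error, value + abs_error)


def ttc_interval(gap: Interval, v_follow: Interval, v_lead: Interval) -> Interval:
    """Enclosure of gap / (v_follow - v_lead), assuming a positive gap."""
    closing = v_follow - v_lead
    if closing.hi <= 0.0:
        # The follower is certainly not approaching: no collision predicted.
        return Interval(math.inf, math.inf)
    if closing.lo <= 0.0:
        # Approach cannot be confirmed: only a lower bound is available.
        return Interval(gap.lo / closing.hi, math.inf)
    return gap / closing


# Hypothetical measurements: 20 m gap (+/- 0.5 m), speeds in m/s (+/- 0.25 m/s).
gap = measured(20.0, 0.5)
ttc = ttc_interval(gap, measured(15.0, 0.25), measured(10.0, 0.25))
print(ttc)  # an interval enclosing the nominal value of 4.0 s
```

The width of the resulting interval directly expresses how strongly the assumed measurement errors affect the metric's output. When the closing speed interval contains zero, the upper bound becomes infinite, which indicates that the measurements are too imprecise to confirm an approach at all.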
Development of new metrics via formalized phenomena. Most metrics are concerned with the physics-based symptoms of criticality: they measure spatial and temporal aspects highly associated with accidents, such as (predicted) small distances, high relative speeds, or intense decelerations. Those symptoms are typically located near the end of an accident's causal network. A more preventative approach is the identification of other causal factors, including environmental conditions, road network complexity, or traffic rule violations. Incorporating them allows critical situations to be detected both earlier and more reliably. However, adequately measuring such factors necessitates their rigorous formalization. This formalization enables the consideration of more aspects of criticality, which, in conjunction, can lead to metrics that exhibit vastly improved validity and reliability.

Fig. 2 An exemplary, schematic scenario space Sc_app ⊆ Sc for an application. The set of metrics K = { 1 , 2 , 3 } identifies certain scenarios inside Sc as critical and uncritical

Fig. 4 Arrangement of the identified applications A.1 to A.3 and B.1 to B.4 of criticality metrics (solid) along the V-model (hatched) [42]

Fig. 5 Visualization of the interrelations of criticality metrics as presented in Sect. 5.2

Table 1
Abbreviations and variables used for the presented metrics
…(p_1(t), p_2(t))    Derivative of the Euclidean distance d
v_i(t)               Velocity of actor i at time t
a_i(t)               Acceleration of actor i at time t
a_{i,min}(t)         Minimal available acceleration of actor i at time t
a_{i,max}(t)         Maximal available acceleration of actor i at time t
j_i(t)               Jerk of actor i at time t
u_i(t)               Control inputs of actor i at time t
_i(t)                Sideslip angle of actor i at time t
_i(t)                Steering angle of actor i at time t
_i(t)                Steering rate of actor i at time t
_i(t)                Yaw angle of actor i at time t
_i(t)                Yaw rate of actor i at time t
F_{i,d,x,y}          Tire forces of actor i with direction d for tire (x, y)
long                 Longitudinal component of a vector
lat                  Lateral component of a vector

Table 2
Derived requirements on the properties of the metrics for the exemplary application as presented in Sect. 6.2.1

Table 3
The resulting metrics for the exemplary suitability analysis

Table 4
Minimal requirements of applications on the properties of criticality metrics