1 Introduction: what is so special about automotive radar modeling?

Simulation is playing an increasingly important role in proving the safety of automated vehicles. New procedures are envisaged in institutions such as the UNECE [1, p. 5, 6], where simulation will be an integral part of the certification process. Automated vehicles (SAE level 3+ [2]) rely on robust environment perception with multiple sensor technologies. Radar (“radio detection and ranging”) is one of these technologies: a signal is actively transmitted and its echo received, instead of passively collecting radiation from other sources, as cameras do. Automotive radar sensors commonly used in series production at different OEMs are based on the frequency modulated chirp-sequence principle, which determines the range of objects and, via antenna patches, their angular position with respect to the sensor. What makes radars special among current perception sensor technologies is their ability to measure the radial velocity via the Doppler effect and their wide use in series applications. This adds a dimension of information, so a radar “point cloud” includes range r, azimuth angle \(\phi\), elevation angle \(\theta\), radar cross section (RCS) \(\sigma\), and relative radial velocity v.

Fig. 1 Comparison of radar (left) and lidar (right) “point clouds”, projected into the captured scene with parked cars [3, p. 3]. The color represents the intensity-equivalent value per detection

Fig. 2 An abstracted radar processing chain with elements visualized as blocks with rounded corners. Groups within the processing chain are visualized as dashed rounded blocks and the sensor interfaces are marked as edged blocks

As shown in Fig. 1, a radar “point cloud”, or more precisely a radar “detection list” [4], appears sparser and less structured than a lidar point cloud. Modern digital beam forming radar sensors do not structure their scans of the environment in a per-layer order, as current time-of-flight lidar (“light detection and ranging”) sensors do with their laser beams. The unstructured and noisier appearance of radar detections stems from the radar sensor’s wave propagation characteristics and signal processing. Its front end, as shown in Fig. 2, entails the signal reception via the antennas and analog-to-digital conversion (ADC). After applying a discrete fast Fourier transform (DFFT) algorithm, the time-based signal is structured into the so-called radar cuboid, a multidimensional volume often called the radar cube for simplicity. It consists of multiple cells, so-called “bins”, that can be divided into the dimensions range \(\iota _r\), relative radial velocity \(\iota _v\), azimuth \(\iota _{\phi }\), and elevation \(\iota _{\theta }\).

To enhance explainability, the remainder of this work is limited to radars with range, azimuth, and radial velocity dimensions. However, the methodology can be applied to elevation without restriction, as each coordinate is validated separately anyway. The three dimensions of the radar cuboid are visualized in Fig. 3 with one single bin colored in orange and the teal color depicting the bins of the radar cuboid at a relative radial velocity of 0 ms\(^{-1}\). Each bin contains a power ratio P in dB, which is calculated from the transmitted and the received signal. The number of bins \(I_{r/v/\phi /\theta }\) of each dimension results from sensor design parameters like bandwidth, sampling rate and measurement time, as well as the configuration of the antennas. Except for the digitization itself, only information due to windowing and noise, which preexist the application of the DFFT, is lost at radar cuboid level. Because of this condensation of all digitally available information, it is reasonable to simulate synthetic data for chirp-sequence frequency modulated continuous wave radars with uniform antenna arrays at this very early interface. For the required level of detail in simulation-based safety validation of automated driving, low-level interfaces must be taken into account to simulate specific perception tasks like fusion algorithms [5] under challenging environmental conditions or object constellations. Furthermore, simulated low-level interfaces enable enhancements of early signal processing.
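For illustration, the following minimal numpy sketch condenses this processing chain from ADC samples to the radar cuboid; the array dimensions, window choice and variable names are illustrative assumptions rather than the configuration of any specific sensor.

```python
import numpy as np

# Illustrative dimensions (assumptions, not a real sensor configuration):
# fast-time samples per chirp, chirps per frame, receive antennas.
N_SAMPLES, N_CHIRPS, N_ANTENNAS = 256, 128, 8

# Complex baseband ADC samples, shape (antenna, chirp, sample);
# zeros stand in for real data here.
adc = np.zeros((N_ANTENNAS, N_CHIRPS, N_SAMPLES), dtype=complex)

# Window functions reduce spectral leakage at the cost of smearing
# power into neighboring bins.
win_r = np.hanning(N_SAMPLES)
win_v = np.hanning(N_CHIRPS)[None, :, None]

# DFT over fast time (range), slow time (Doppler) and the
# uniform antenna array (azimuth).
cube = np.fft.fft(adc * win_r, axis=2)                    # range bins
cube = np.fft.fft(cube * win_v, axis=1)                   # Doppler bins
cube = np.fft.fftshift(np.fft.fft(cube, axis=0), axes=0)  # azimuth bins

# Power per bin in dB yields the radar cuboid (azimuth, Doppler, range).
P = 20 * np.log10(np.abs(cube) + 1e-12)
```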

Fig. 3 Visualization of the radar cuboid with bins in range denoted as r, azimuth as \(\phi\) and Doppler as v. The overall number of bins is I and one bin in the corresponding dimension is \(\iota\). In orange, one bin with the power value P at the position \(I_r, I_{\phi }, I_v\) is highlighted. The teal colored front marks the radar cuboid at the Doppler bin 0, which is of interest for static validation studies

Nevertheless, it is not sufficient to merely have a simulation model of a sensor available; the validity of the sensor model must be proven along with its delivery. Only then can simulation models be used in a trustworthy manner for safety assessments, as already envisaged by the UNECE [1, p. 5, 6]. This is challenging because of the complexity of radar measurements, which is due to noise from multi-path propagation and RCS sensitivities. The lack of public data sets and limitations in measurement repeatability are challenging as well [6]. Additionally, the number of detections depends on multiple causes and the noise of the sensor is complex due to, e.g., the high frequency hardware components.

2 Related work on metrics for validation of active perception sensor simulation

In this chapter, validation metrics from the literature for radar model validation are listed and evaluated. Finally, an overview of the DVM and its application to lidar data is given.

2.1 Comprehensive review of already used validation metrics

According to Oberkampf and Trucano, validation is the “process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model” [7, p. 719]. Viehof [8, p. 91] introduced the so-called sample-wise validity in the context of vehicle simulation validation. Accordingly, radar sensor model validation is understood as the comparison of synthetic and real sensor data with suitable metrics, sample-wise for a specific region of the desired parameter space of the application area. However, this requires not only access to the real sensor to simulate at the specific interface of interest, but also measurement campaigns designed according to the cause-effect chains that are modeled and investigated for a specific sample of the possible parameter space of the sensor model.

As described in the previous section, radar sensor modeling is a special case, e.g. because the relative velocity is directly measured and because of the complex interactions during radar wave propagation, which make validation of radar sensor models a complex task. Besides experiment design to optimize repeatability and reproducibility of measurements, while minimizing epistemic and aleatory uncertainties in reference data, the metrics for comparing real and synthetic data play a crucial role in validation. Aleatory uncertainty is a statistical deviation based on probability distributions in data, whereas epistemic uncertainty means a lack of information regarding model structure, world knowledge and measurement errors [9]. While epistemic uncertainties can and should be reduced by enhanced reference data collection, aleatory uncertainties describe the inherent randomness of measurements.

A first decision guidance for metric selection is provided by the seven criteria for validation metrics as refined by Rosenberger [10, p. 99] and e.g. used by Magosi [11, p. 11], which were condensed by Schaermann [12, pp. 20–21], combining the original lists of six criteria by Oberkampf and Barone [13, pp. 11–12] and the seven features from Liu et al. [14, p. 2]:

1. Metrics meet the mathematical properties of a metric as defined by Fréchet [15]. (Unbounded results)

2. Metrics are intuitive. (Plausible & output in unit of measurand)

3. Metrics are applicable to both deterministic and non-deterministic data.

4. Metrics are quantitative and objective. (No manually tuned parameters)

5. Metrics do not include acceptance criteria. (No Boolean output)

6. Metrics consider uncertainties. (Epistemic and aleatory)

7. Metrics define a confidence interval with respect to the number of measurement data.

The state of the art in validation metrics for active perception sensor simulation is extensively discussed by Rosenberger [10, pp. 60ff.]. Multiple metrics in this collection of 34 options are only indirect metrics, where detections are sorted into occupancy grids first or object detection and tracking is applied. These cannot be applied to radar data at earlier interfaces like the radar cuboid or detection level. Other metrics measure distances between points in space and do not take into account intensity or power values. Therefore, they are not applicable to the radar cuboid, which is not a list of detections, but an equidistantly binned volume filled with power values.

Table 1 shows an excerpt of the remaining metric candidates for radar data. If a metric is capable of a category given by the column title, it is marked in a specific shade of green; otherwise, the cell stays blank. The table considers the data interfaces that the metrics are or could be applied to (D: Detections, O: Objects). Then, the ability to be applied to a (quasi) static (\(\bullet {}\)) or dynamic (\(\rightsquigarrow {}\)) scenario is provided. Additionally, in Table 1 the scale of measurement a metric is able to process is considered (M: Metric (interval or ratio), O: Ordinal). The uncertainties a metric is able to process (aleatory/epistemic) are given per metric as well. For the first four columns, it is marked if the metrics are applied without modification in literature (x), or if the metrics are applied in literature with moderate adaptations (\(\star\)). Additionally, the coverage of the seven criteria for validation metrics from the beginning of this section is marked in dark green.

Table 1 Excerpt of the evaluation of metrics applied for active perception sensor simulation from Rosenberger [10, p. 99]

Typical metrics that can be applied to object poses in space or detection coordinates, like the Manhattan distance (\(d_{\textrm{Ma}}\)) and the overall error (OE), could be applied to the power values for a bin-wise comparison of a synthetic and a real radar cuboid. This also holds, e.g., for the mean error \({\overline{d}}\), the root mean squared error (RMSE) and all other familiar error metrics from the collection that are not explicitly mentioned in this work, but listed in the original source collection [10, pp. 68–72]. Still, none of these strictly mathematical metrics accounts for aleatory or epistemic uncertainties, which disqualifies them from being applied for radar sensor model validation at the detection or radar cuboid interface, given the stochastic and sensitive characteristics of these data. The machine learning-based Deep Evaluation Metric (DEM) as introduced by Ngo [16] is used to measure an overall simulation-to-reality gap, but it does not cover most of the seven criteria. Additionally, due to the black-box nature of the results, there is a lack of interpretability, making it difficult to formulate safety arguments based on this metric. It also does not provide insights on how to calibrate and enhance a sensor model and is therefore not considered in the following. In Rosenberger’s enumeration of metrics, the Mahalanobis distance \(d_{\textrm{M}}\) and its weighted variant are absent [17]. To achieve comprehensive coverage, Table 1 is supplemented with this metric. In the literature, this metric has already been applied to real and synthetic detection data from radar sensors [18, p. 33]. However, only the detection distribution over a binned bounding box is considered, without the inclusion of RCS or power values of the radar cuboid. Since the Mahalanobis distance accounts for neither aleatory nor epistemic uncertainties, it and its weighted form are excluded as metric candidates for radar validation.

Consequently, only the Kullback–Leibler divergence \(D_{\textrm{KL}}\) as applied by Schaermann [12], the Jensen–Shannon distance \(d_{\textrm{JS}}\) as used by Magosi et al. [11], the area validation metric (AVM) \(d_{\mathrm {{AVM}}}\) introduced by Ferson et al. [19], and the frequency of positive Kolmogorov–Smirnov tests \(f_{\textrm{KS}}\), as applied by Eder [20], remain as metric candidates. This means that besides the AVM as the best candidate based on Table 1, two families of metrics should be evaluated further, namely divergences and hypothesis testing. Clearly, both are not intuitive concepts for most people and involve more abstract thinking compared to just computing the area between two curves, as done for cumulative distribution functions (CDFs) or empirical cumulative distribution functions (EDFs) in the case of the AVM. Rosenberger presents a detailed analysis of both metric candidate families [10, pp. 105ff.], where the technique of manufactured universes [21] is used. Multiple EDFs are generated, where one is denoted as the real data and the others mimic simulated data to provoke edge cases for the metrics.

After showing hypothesis testing results for the different compared EDFs and a short summary of the ongoing discussions on these kinds of tests in general, this metric family is dismissed for sensor model validation due to the sometimes misleading and above all counter-intuitive results. They are not available in the unit of the measurand, which makes them less user-friendly, e.g., in model specification, especially for negotiations with people with a non-technical background. The same difficulties in interpreting the results from comparing the different EDFs are present when applying the Kullback–Leibler divergence or the Jensen–Shannon distance, so they are likewise not considered further for sensor model validation [10, pp. 104–108].

The remaining metric candidate is the AVM that is simply the integral of the absolute difference between two CDFs \(F,{\widetilde{F}}\) over all real and simulated sensor measurements

$$\begin{aligned} d_{\textrm{AVM}}(F,{\widetilde{F}})=\int _{-\infty }^{\infty }|F(\zeta )-{\widetilde{F}}(\zeta )|\;\textrm{d}\zeta \,\mathrm {.} \end{aligned}$$
(1)

Since the cumulative probability \(F(\zeta )\) for each measurand \(\zeta\) is limited to [0, 1] and unitless, the integral can equivalently be taken over the ordinate with m (e.g. 100) quantiles, resulting in the mean error of all m quantiles of the CDF:

$$\begin{aligned} \begin{aligned} d_{\textrm{AVM}}(F,{\widetilde{F}})&=\int _{0}^{1}|\zeta (F)-{\widetilde{\zeta }}(F)|\;\textrm{d}F\;\\&= \frac{1}{m}\sum _{i=1}^{m}|\zeta (F_{i})-{\widetilde{\zeta }}(F_{i})|\,\mathrm {.} \end{aligned} \end{aligned}$$
(2)

Therefore, the AVM is very similar to the mean error of all n measurements

$$\begin{aligned} {\overline{d}}=\frac{1}{n}\sum _{i=1}^{n}|\zeta _{i}-{\widetilde{\zeta }}_{i}|\,\mathrm {.} \end{aligned}$$
(3)
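For illustration, (2) can be implemented in a few lines, assuming the real and simulated measurements are available as one-dimensional sample arrays; the function and variable names below are our own.

```python
import numpy as np

def avm(real, sim, m=100):
    """Area validation metric per eq. (2): mean absolute difference of
    m quantiles of the two EDFs, in the unit of the measurand."""
    q = (np.arange(m) + 0.5) / m  # quantile levels in (0, 1)
    return np.mean(np.abs(np.quantile(real, q) - np.quantile(sim, q)))

# Example: two samples whose means differ by roughly 0.2 m.
rng = np.random.default_rng(0)
print(avm(rng.normal(10.0, 0.5, 850), rng.normal(10.2, 0.5, 850)))
```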

As visible in Table 1, the AVM is the only metric that handles aleatory and epistemic uncertainties. This is because it is applicable not only to EDFs, whose shapes reflect aleatory uncertainties, but also to so-called probability boxes (p-boxes). A p-box is expressed by the left and right boundaries of multiple EDFs. The width of the p-box at each quantile describes the epistemic uncertainties, as in Fig. 4 for the two simulation EDFs. First introduced by Williamson and Downs [22], a p-box gives the possible interval of cumulative probabilities for a specific measurand x and, for a given cumulative probability, a possible interval of values, as discussed in detail e.g. by Ferson et al. [23].

As epistemic and aleatory uncertainties should always be minimized during measurements, but can never be eliminated, they must be propagated through the simulation to reflect these uncertainties when the model is validated. Practically, this means that e.g. every position of a sensed object must be captured with reference sensors during the measurements to collect the real sensor data to validate the model. The uncertainty of this reference position measurement device, e.g. ±1.0 cm, is then input for multiple simulations per measurement, e.g. one with the exact reference position and two more for the edge cases of ±1.0 cm. These multiple simulations result in several EDFs and a combination of all EDFs from simulation forms the p-box. Its boundaries are composed of the maximum and minimum x-values of the set of EDFs for each y-value.

The AVM for p-boxes is simply calculated by adding the two portions where the simulated p-box \(\widetilde{\varvec{\mathcal {F}}}\) is higher (\(d^+\)) or lower (\(d^-\)) than the real p-box \(\varvec{\mathcal {F}}\) as

$$\begin{aligned} d_{\textrm{AVM}}(\varvec{\mathcal {F}},\widetilde{\varvec{\mathcal {F}}})=d^- + d^+\,\mathrm {.} \end{aligned}$$
(4)

For simplification, in Fig. 4 the EDF F is an infinitely thin p-box. Consequently, the AVM only considers the left and the right borders of the p-box, the original course of each EDF inside is irrelevant, and the borders could actually originate from different EDFs.
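Under the same quantile-based assumptions as the earlier sketch, (4) can be sketched as follows: the p-box borders are taken as the per-quantile minimum and maximum over all simulation EDFs, and \(d^+\) (\(d^-\)) accrues only where the whole box lies above (below) the real EDF in the measurand.

```python
import numpy as np

def pbox_avm(real, sims, m=100):
    """AVM between a real EDF and a p-box formed from several simulation
    EDFs, eq. (4); returns (d_AVM, d_plus, d_minus)."""
    q = (np.arange(m) + 0.5) / m
    z = np.quantile(real, q)                          # real EDF quantiles
    zs = np.stack([np.quantile(s, q) for s in sims])  # one row per sim EDF
    left, right = zs.min(axis=0), zs.max(axis=0)      # p-box borders
    d_plus = np.mean(np.clip(left - z, 0.0, None))    # whole box above real EDF
    d_minus = np.mean(np.clip(z - right, 0.0, None))  # whole box below real EDF
    return d_plus + d_minus, d_plus, d_minus
```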

Fig. 4 Portions of the AVM, where the simulated p-box \(\widetilde{\varvec{\mathcal {F}}}\) is higher (\(d^+\)) or lower (\(d^-\)) than the real EDF F, based on Voyles and Roy [24]

2.2 The DVM and its application in validation of lidar sensor simulation

An additional requirement on validation metrics is the ability to distinguish model bias and model scattering error to enable the structured elimination of these two different modeling errors [10, p. 72]. Model bias is an approximation of the mean deviation and model scattering error the deviation in the distribution function’s shape. Figure 5 illustrates the difference between model and measurement bias and also shows the difference between the measurement standard deviation and the model scattering error. Indeed, measurement bias and scattering error are conceptually similar to the differences in mean and variance between a set of normal distributions. However, the distribution functions of the measurand can deviate from normal distributions.

Fig. 5 Bias and scattering of measurement and model [10, p. 11]. In the case of normal distributions the two factors are the mean value and the standard deviation

Rosenberger therefore introduced the DVM, which distinguishes the two components [10, p. 118ff.]. Its first part is essentially the difference of \(d^-\) and \(d^+\), in contrast to the original AVM as the sum of these two portions. Voyles and Roy [24] introduced this difference, which is proven to be a good estimate for the model bias:

$$\begin{aligned} d_{\textrm{bias}}(\varvec{\mathcal {F}},\widetilde{\varvec{\mathcal {F}}})=d^- - d^+\,\mathrm {.} \end{aligned}$$
(5)

It eliminates symmetrically distributed area portions of the AVM, which reflect the model scattering error and therefore only keeps the model bias. Consequently, (5) can be used to estimate a “corrected” [24] p-box as

$$\begin{aligned} \begin{aligned} \widetilde{\varvec{\mathcal {F}}}_{\textrm{c}}(\zeta )=\widetilde{\varvec{\mathcal {F}}}(\zeta - d_{\textrm{bias}})=\widetilde{\varvec{\mathcal {F}}}\Big (\zeta - (d^- - d^+)\Big )\mathrm {.} \end{aligned} \end{aligned}$$
(6)

Taking this idea of a corrected, bias-free simulated p-box further, a second-order AVM can be computed with \(\widetilde{\varvec{\mathcal {F}}}_{\textrm{c}}\) that now only entails the remaining model scattering error. This novel metric introduced by Rosenberger [10, p. 118] is called corrected AVM (CAVM) and formulated as

$$\begin{aligned} d_{\textrm{CAVM}}(\varvec{\mathcal {F}},\widetilde{\varvec{\mathcal {F}}})= d_{\textrm{AVM}}(\varvec{\mathcal {F}},\widetilde{\varvec{\mathcal {F}}}_{\textrm{c}})=d^-_{\textrm{c}}+d^+_{\textrm{c}}. \end{aligned}$$
(7)

As illustrated in Fig. 6, the CAVM is a multistep process that inherently includes the calculation of the model bias on its way. It starts with the calculation of \(d^+\) and \(d^-\) for \(d_{\textrm{bias}}\) (5). Then the simulated p-box is corrected by \(d_{\textrm{bias}}\) to get \(\widetilde{\varvec{\mathcal {F}}}_{\textrm{c}}\) (6). Finally, \(d^+_{\textrm{c}}\) and \(d^-_{\textrm{c}}\) are calculated, resulting in \(d_{\textrm{CAVM}}\) (7).
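Combining the pieces, the bias estimate (5), the correction (6) and the CAVM (7) can be sketched as follows, reusing the quantile representation from the previous sketches; the function returns the pair \((d_{\textrm{bias}}, d_{\textrm{CAVM}})\).

```python
import numpy as np

def dvm(real, sims, m=100):
    """Double validation metric (d_bias, d_CAVM) between a real EDF and a
    simulated p-box, following the multistep process illustrated in Fig. 6."""
    q = (np.arange(m) + 0.5) / m
    z = np.quantile(real, q)
    zs = np.stack([np.quantile(s, q) for s in sims])

    def portions(box):  # d_plus, d_minus between real EDF and p-box borders
        left, right = box.min(axis=0), box.max(axis=0)
        return (np.mean(np.clip(left - z, 0.0, None)),
                np.mean(np.clip(z - right, 0.0, None)))

    d_plus, d_minus = portions(zs)
    d_bias = d_minus - d_plus                    # eq. (5)
    d_plus_c, d_minus_c = portions(zs + d_bias)  # corrected p-box, eq. (6)
    return d_bias, d_plus_c + d_minus_c          # d_CAVM per eq. (7)
```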

Fig. 6 Illustration of the CAVM [10, p. 118] in a lidar sensor measurement campaign. A plate is placed in front of the sensor at 10 m in a static scenario. \(r_{\textrm{nom}}\) and \(r_{\textrm{ref}}\) are the nominal and measured reference range. n and \({\widetilde{n}}_1\) are the numbers of detections from real data and simulation sim1, which is a control factor for the comparison of the different EDFs [10, p. 103]. F is the EDF from real data. \(\widetilde{\varvec{\mathcal {F}}}\) is the p-box from simulation. \(\widetilde{\varvec{\mathcal {F}}}_{{\textbf {c}}}\) is the simulated p-box corrected with the estimated model bias \(d_{\textrm{bias}}\). \(d^+\) and \(d^-\) mark areas where the simulated p-box \(\widetilde{\varvec{\mathcal {F}}}\) is higher (+) or lower (−) than the real EDF F. \(d^+_{\textrm{c}}\) and \(d^-_{\textrm{c}}\) mark areas where the corrected simulated p-box \(\widetilde{\varvec{\mathcal {F}}}_{{\textbf {c}}}\) is higher (+) or lower (−) than the real EDF F

Consequently, the novel DVM for the comparison of two p-boxes \(\varvec{\mathcal {F}}\) (or EDFs F as infinitely thin p-boxes) is obtained, distinguishing model bias and model scattering error with respect to the actual sensor bias and its real scattering behavior, as

$$\begin{aligned} d_{\textrm{DVM}}(\varvec{\mathcal {F}},\widetilde{\varvec{\mathcal {F}}})=\Big (\;d_{\textrm{bias}} (\varvec{\mathcal {F}},\widetilde{\varvec{\mathcal {F}}}),\;\, d_{\textrm{CAVM}}(\varvec{\mathcal {F}},\widetilde{\varvec{\mathcal {F}}})\;\Big )\,\mathrm {.} \end{aligned}$$
(8)

Aligned with distinguishing model bias and model scattering error, model validation should start simple and proceed to more complex scenarios later. Accordingly, Rosenberger starts by demonstrating the DVM for beam-wise model evaluation in static scenarios, like targets at different distances with no other effects taking place, and proceeds to more complex, object-wise validation of synthetic lidar detections. In addition to the interpretability of the results in the unit of the measurand, the reference measurement accuracies are considered by using p-boxes. Consequently, radar model validation should follow this incremental approach. The experimenter should take special care to consider isolated cause-effect chains, which influence the radar signal propagation. A possible ontology to derive them is PerCollECT [25], as available on GitHub [26].

Besides the DVM, there are other applications of the AVM described in the literature. In their application, Brune et al. compare the left and right edges of the measurement and simulation p-boxes with each other [27]. Compared to the DVM, this approach offers the advantage that the p-box size is included in the metric result. This aspect is missing in the DVM according to Rosenberger. However, Brune’s AVM does not explicitly consider the scattering error of the distribution function, which means that the information about the shape similarity of the p-box is lost. Figure 7 shows the AVM according to Brune et al. and illustrates how the size of the p-boxes is incorporated into the result of the AVM.

Fig. 7 Visualization of the AVM calculation \(d_L\) and \(d_R\) based on the left and right border comparison between measurement \(\varvec{\mathcal {F}}_L\), \(\varvec{\mathcal {F}}_R\) and simulation \(\widetilde{\varvec{\mathcal {F}}}_L\), \(\widetilde{\varvec{\mathcal {F}}}_R\), based on Brune et al. [27]. In the context of radar sensor data, the measurand could pertain to parameters such as the range of the detection distribution or the distribution of power values within the radar cuboid

2.3 Double validation metric limitations

As mentioned in Sect. 1, the measurement of the radial relative component of the velocity is possible with radar sensors. This allows extending the discussion of the DVM to model validation in dynamic scenarios. As already shown by Holder, even simple measurement scenarios on a proving ground are subject to repeatability difficulties for complex geometries, which must be taken into account [28]. Figure 8 shows the variation of the RCS, denoted as \(Q_{(\sigma )}\), over the distance in the radar sensor coordinate system, denoted as \(_{\textrm{S}}r\), of a retroreflector, which is a corner cube reflector (CCR), and a vehicle.

Therefore, to apply the metric, it is advisable to introduce p-boxes for the measurement data to consider the limited measurement repeatability and take it into account in the metric result. Based on Fig. 8 and the limited reproducibility, the size of the measurement p-box and the distribution of its EDFs must be part of the metric.

Fig. 8 RCS experiment trials of a CCR in blue and a Golf Mk5 in black from Holder [28]

Fig. 9 Edge case (EC) 1 of the DVM definition by Rosenberger and the AVM calculation by Brune. On the left side the EDFs are visualized. On the right side the p-boxes of an imagined measurement and simulation are shown with the corresponding colored areas

Additionally, the size of the simulation p-box is a factor influencing the quality of the DVM. These properties are missing in the DVM according to Rosenberger because these characteristics are lost during the transformation of EDFs to p-boxes. Theoretically constructed ECs based on the method of manufactured universes substantiate the previous remarks.

These ECs can appear in the application of the validation methodology to radar due to the sensitivity of the model to small changes in reference sensor measurement uncertainties, but also due to the problem of reproducibility of measurements.

Figure 9 shows the first EC with the results of Rosenberger’s DVM and the evaluation according to Brune et al. [27]. It addresses the size of the p-box and the overlap of the simulation and measurement p-boxes. To prove the independence of the methodology from the number of simulation and measurement EDFs, the combination of EDFs is varied.

Figure 9 illustrates EC 1, consisting of a large simulation p-box in comparison to the measurement. Additionally, the right side of the simulation p-box equals the left side of the measurement p-box. The AVM and the CAVM are both 0, and therefore the model is valid based on the DVM. Brune’s extension of the AVM covers this EC, resulting in a \(d_\textrm{L}\) equal to the size of the simulation p-box.

Figure 10 shows EC 2 with a large measurement p-box in comparison to the simulation. A concentration of measurement EDFs on the right side is also present. For Rosenberger’s DVM, the result reflects only the deviation between the right simulation EDF and the left measurement EDF.

Fig. 10 EC 2 of the DVM definition by Rosenberger and AVM calculation by Brune

Fig. 11 EC 3 of the DVM definition by Rosenberger and AVM calculation by Brune

Fig. 12 EC 4 of the DVM definition by Rosenberger and AVM calculation by Brune

The AVM is small and the CAVM is almost 0, because the shapes of the right simulation EDF and the left measurement EDF are nearly identical. The AVM definition by Brune also fails in this case: the value for \(d_L\) covers just the offset between the two p-boxes.

Figure 11 visualizes EC 3 consisting of two simulation EDFs at the left and the right side, which are very similar in shape. Between the outer distributions there are several simulation EDFs that show a completely different distribution shape. By forming the simulation p-box, the information of the inner distributions is lost.

As a result, deviations in the shape of the distribution functions are insufficiently considered in the DVM validity assessment. Brune’s AVM covers the large size of the simulation p-box but fails to identify the inner distribution functions.

Figure 12 illustrates EC 4, which is similar to EC 2, where the distribution of the simulation distribution functions has no effect on the validation result itself. In the case of Brune’s AVM, only the total deviation of the p-boxes is quantified.

However, the important information of the accumulation of distribution functions on the right side of the simulation p-box is lost, which could be helpful for the modeler and the experimenter. In the context of simulation, the aggregation of distribution functions suggests that the model’s underlying parameterization exhibits heightened sensitivity at a specific point, resulting in a substantial deviation from other simulations. This observation is masked by the use of the p-box. Conversely, for the experimenter, such an aggregation signifies an outlier within the measured data, indicating potential issues such as erroneous execution, deviation from the intended scenario, or significant alterations in environmental conditions.

The presented ECs show the shortcomings of the two validation metrics. Especially the distribution of the EDFs within the p-box borders is disregarded. Another disadvantage is that EDF outliers, which result from parameters varied in the simulation, cannot be assigned to the validation result. Generally, the introduction of p-boxes obscures the validation result and leads to potential misinterpretation. Therefore, the methodology to apply the DVM has to be extended and modified.

To ensure comparability of simulated and real data based on the DVM, it is essential to include the number of data points in the evaluation process. A deviation of 10% in the number of data points between simulation and measurement is considered an acceptable limit by the authors in the remainder of this paper. The deviation in data points occurs due to time effects in the radar sensor, which are not covered in the simulation model with a predefined sample frequency. At the detection level, this is particularly problematic for the radar sensor, as the number of detections can vary greatly between individual measurement cycles, for example due to clutter from vegetation or rain.

3 Radar validation methodology

In this chapter the methodology for determining the DVM is adapted. Furthermore, it is shown how the new methodology can be applied to radar data.

3.1 The DVM map

To overcome the DVM limitations regarding simulation and measurement p-box size as well as distribution and shape of the EDFs, an adaptation of the DVM is necessary. Figure 13 shows the adapted validation methodology based on the AVM and CAVM metrics. Compared to Rosenberger, the measurement and simulation p-boxes are resolved, and each simulation is compared to each measurement, deriving the new “DVM Map” as a validity tool visualized as a heat map.

Fig. 13 The adapted DVM methodology to address the different ECs, with the new DVM Map as intermediate step

In a first step, the AVM is formed for each simulation EDF in combination with each measurement and corrected by the determined \(d_{\textrm{bias}}\) according to (5). The absolute value \(|d_{\textrm{bias}}|\) is used for visualization in the DVM Map so that the color value of the scale is unambiguous for negative and positive model bias. Following the aforementioned procedure, the corresponding CAVM is formed and \(d_{\textrm{CAVM}}\) determined according to (7). This results in a value for the model bias and the scattering error for each simulation in comparison to each measurement. The scattering error finds an intuitive explanation in the shape deviation of the corrected simulation EDF in comparison to the measurement EDF. To calculate the overall comparison score \(d_{\textrm{Sum}}\), the absolute value of the model bias is added to the CAVM result as

$$\begin{aligned} d_{\textrm{Sum}}=|d_{\textrm{bias}}|+d_{\textrm{CAVM}}\mathrm {.} \end{aligned}$$
(9)
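As a sketch of this procedure, the following function compares every simulation EDF with every measurement EDF using the dvm function sketched in Sect. 2.2 and renders the resulting \(|d_{\textrm{bias}}|\), \(d_{\textrm{CAVM}}\) and \(d_{\textrm{Sum}}\) grids as heat maps; the plotting details are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def dvm_map(measurements, simulations, m=100):
    """Pairwise DVM of every simulation EDF against every measurement EDF;
    relies on the dvm(real, sims, m) sketch from Sect. 2.2."""
    bias = np.zeros((len(measurements), len(simulations)))
    cavm = np.zeros_like(bias)
    for i, meas in enumerate(measurements):
        for j, sim in enumerate(simulations):
            d_bias, d_cavm = dvm(meas, [sim], m)  # EDF-to-EDF comparison
            bias[i, j], cavm[i, j] = abs(d_bias), d_cavm
    d_sum = bias + cavm                           # eq. (9)
    for name, grid in (("|d_bias|", bias), ("d_CAVM", cavm), ("d_Sum", d_sum)):
        plt.figure()
        plt.imshow(grid, cmap="coolwarm")         # blue = low, red = high
        plt.title(name); plt.xlabel("simulation"); plt.ylabel("measurement")
        plt.colorbar()
    # most critical measurement/simulation combination for sample validity
    return d_sum, np.unravel_index(np.argmax(d_sum), d_sum.shape)
```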

In this way, the deviations of each simulation can be quantified in comparison to all measurements, and the DVM Map reveals the most critical measurement and simulation for the corresponding reference data uncertainty. When the results of the DVM Map need to be further processed, the maximum of \(d_{\textrm{Sum}}\) is formed, thus identifying the most critical combination of measured and simulation data for sample validation. To demonstrate the utility of the newly developed DVM Map, the ECs from Figs. 9, 10, 11 and 12 are re-examined.

Fig. 14 DVM Map of the ECs defined in Figs. 9, 10, 11 and 12, processed with the newly introduced methodology. From left to right, the various ECs are shown. From top to bottom, the DVM Maps of \(|d_{\textrm{bias}}|\), \(d_{\textrm{CAVM}}\) and \(d_{\textrm{Sum}}\) are illustrated. The coloring corresponds to the quantitative deviation of the corresponding value, where blue means low and red means high. The upper and lower limits of the scale are determined by the minimum and maximum values in the corresponding DVM Map

Figure 14 is sorted by the ECs from left to right and shows the results for \(|d_{\textrm{bias}}|\), \(d_{\textrm{CAVM}}\) and \(d_{\textrm{Sum}}\) from top to bottom. The DVM Map of EC 1 shows that the large model bias of simulation 2 can be compensated. The CAVM shows minor deviations since the shape of all EDFs is similar to each other. It can be seen that the combination of measurement 2 and simulation 1 performs worst in terms of sample validity.

For EC 2, the focus is also on model bias examination. The accumulations of EDFs on the right-hand side are clearly evident, with simulation 1 deviating more than simulation 2. The result also shows up in \(d_{\textrm{Sum}}\), resolving the EC.

EC 3 shows that the distribution of the different simulation EDFs can be mapped using the \(d_{\textrm{CAVM}}\). However, the influence of \(d_{\textrm{bias}}\) predominates for simulation 1.

EC 4 shows that the DVM Map can also reproduce high deviations of the simulation and thus reflects valuable information regarding sample validity and the influence of various aleatory uncertainties.

In Sect. 2.1, the seven criteria for metric selection are enumerated. As the DVM is based on the AVM, which is listed in Table 1, and the DVM Map is derived from the DVM, there exists an association among the criteria of the three metrics. Nevertheless, the DVM Map is evaluated based on the seven criteria outlined subsequently. Given that the DVM Map exhibits no boundary in its outcome, the first condition is met. As demonstrated by examples from the ECs, the results are plausible and consistent with the unit of the corresponding measure, facilitated by the use of the DVM. Furthermore, the metric is applicable to both deterministic and non-deterministic data. Additionally, aleatory and epistemic uncertainties are accounted for through the measurements and simulations of reference data. These two aspects are illustrated in the evaluation presented in Sect. 5. The metric itself possesses no tuning parameters and lacks any acceptance criteria. Hence, these two points from the list of criteria are also fulfilled. Confidence intervals can be defined based on the measurement data. However, this is not explicitly demonstrated in the subsequent evaluation.

3.2 DVM map application on radar sensor interfaces

Figure 15 shows the application of the methodology to the radar cuboid and detections in a study of the sample validity of a radar model. The first step is to run defined scenarios in the real world (bright red). In addition to the measurement data at detection and radar cuboid level, the measurement campaign yields the operational reference data, which are subject to epistemic and aleatory uncertainties. In this context, operational reference data means reference measurements taken with additional sensors independent of the radar sensor. The uncertainties are determined by means of reference sensors or reference sources.

The measured reference data is transferred to the simulation (light blue) in a further step. Here, in addition to the measured reference value, the epistemic uncertainties are propagated through the simulation. As a result, simulation data on detection and radar cuboid level are available, where the number of simulations depends on the number of simulated uncertainties.

Different variants of the new validation methodology can then be applied to the measurement and simulation data. First, a very rough consideration of all detections and all cells of the radar cuboid is advisable. From this, basic deviations of the sensor model from the measured data can be derived, as well as which uncertainty combination together with which measurement shows the largest measurand deviation. This allows conclusions to be drawn about gross modeling errors, as well as measurement outliers, provided the number of measurements is large enough to identify outliers. While more measurement data from longer time periods and/or repetitions (at least three repetitions should be required) are always better for the analysis, the number of measurements in practice is mostly restricted by practical factors like available time at test tracks and personnel.

Detections and the corresponding bins in the radar cuboid are of great interest in a validation study. This can be justified firstly by the fact that detections represent the input for all subsequent steps of radar processing, and secondly by the fact that here, either due to the environment or due to objects, power differences are present that a simulation model should represent.

Therefore, the time-aggregated detection data from all measurements are clustered using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, and thus a region of interest is defined, which in turn can be transformed into cells of the radar cuboid. By applying the new methodology, individual areas of particular interest in the radar measurement are testable in a dedicated manner.
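A sketch of this clustering step is given below, assuming the time-aggregated detections are available as Cartesian (x, y) positions; the synthetic input data, the DBSCAN parameters and the bin resolutions are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

R_RES, PHI_RES, PHI_MIN = 1.8, 4.0, -45.0  # assumed resolutions / FoV edge

# Synthetic stand-in for time-aggregated detections of all measurements,
# concentrated around the CCR position used later in Sect. 5.
rng = np.random.default_rng(0)
xy = rng.normal([29.4, -4.2], 0.3, size=(200, 2))

labels = DBSCAN(eps=1.0, min_samples=10).fit_predict(xy)  # -1 marks noise

# Each cluster defines a region of interest, transformed into
# range-azimuth cells of the radar cuboid.
for lbl in sorted(set(labels) - {-1}):
    pts = xy[labels == lbl]
    r = np.hypot(pts[:, 0], pts[:, 1])
    phi = np.degrees(np.arctan2(pts[:, 1], pts[:, 0]))
    r_bins = np.unique((r / R_RES).astype(int))
    phi_bins = np.unique(((phi - PHI_MIN) / PHI_RES).astype(int))
    print(lbl, r_bins, phi_bins)
```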

Fig. 15 Validation methodology to apply the DVM Map to different levels in the radar processing chain. From left to right, the application of the DVM Map to all detections, to the entire radar cuboid, to the regions of interest resulting from time-aggregated, clustered detections associated with the corresponding radar cuboid cells, and to each radar cuboid cell separately is visualized

Finally, an application of the radar validation methodology to each individual cell of the radar cuboid is performed. Local effects and influences are the focus of this investigation and provide valuable information about the model sensitivity with respect to the reference sensor uncertainties. By matching the results to a satellite image, detailed investigations of the influence of the environment are possible.

4 Application of the validation methodology

In this chapter, the aspects considered theoretically so far are verified by means of a validation study. For this purpose, a measurement campaign is carried out on the August–Euler airfield proving ground in Griesheim. The study is intended to serve as a proof of concept of the DVM Map; therefore, a static scenario is measured. Moreover, within the context of sensor model validation, it is deemed essential to commence the process by scrutinizing simple scenarios to iteratively rectify any errors inherent in the model. Consequently, the selected example is designated as the initial, static experiment within a broader validation campaign. Therefore, the subsequent analyses refrain from a detailed investigation of the Doppler component within the dataset, with deference given to forthcoming publications for a more comprehensive examination of this aspect.

4.1 Experimental setup

The object in the validation study is a CCR lying flat on the asphalt, which is placed at different positions in front of the radar sensor. The positions differ in the range r to the radar sensor and the azimuth angle \(\phi\). For r, 29.56 m and 48.33 m are chosen based on the radial resolution, so that the CCR is located once close to the edge and once in the center of a range bin. For \(\phi\), positive as well as negative angles are defined so that the CCR is within the sensor’s unambiguous azimuth measurement range and the width of the test site is sufficient. Thus, angles of −8 \(^{\circ }\), −4 \(^{\circ }\), 0 \(^{\circ }\), 4 \(^{\circ }\), 8 \(^{\circ }\) are obtained. Figure 16 shows a sketch of the measurement setup with the different CCR positions, which are measured one after another, on the left side and the real-world measurement setup for position 1 on the right side. The positions of the CCR and the sensor are obtained using a real time kinematic (RTK)-based global navigation satellite system (GNSS) antenna with a measurement uncertainty of 0.02 m. For each measuring position, 5 measurements of 60 s each are recorded with the radar sensor, resulting in approx. 850 samples.

Fig. 16 Experimental setup of the validation study. CCR position 1 (\(r_{\textrm{CCR, Pos 1}}\) = 29.56 m, \(\phi _{\textrm{CCR, Pos 1}}\) = 0) is used for simulation calibration purposes and position 3 (\(r_{\textrm{CCR, Pos 3}}\) = 29.56 m, \(\phi _{\textrm{CCR, Pos 3}}\) = −8 \(^{\circ }\)) is analyzed based on the presented methodology

The measurement setup and the available measurement technology result in the uncertain reference data listed in Table 2. Some parameters are determined directly, while others result from the propagation of several measurement uncertainties. In the case of the sensor height and the edge length, a measuring tape with a given measurement uncertainty of 0.005 m is used. The sensor orientation is measured via a reference target and the RTK-based GNSS antenna, resulting in an uncertainty of 0.07 \(^{\circ }\).

Table 2 Measured reference data uncertainties for CCR position 3 defined in Fig. 16. The local Cartesian coordinate system G is defined in East-North-Up direction with the origin located on the August–Euler airfield in Griesheim

4.2 Simulation model

The reference data uncertainties listed in Table 2 are transferred to the simulation in the next step. Reference data uncertainty propagation through the simulation is realized by defining separate scenarios with the upper (\(^{+}\)) and lower (\(^{-}\)) limit of each uncertainty. Additionally, one simulation with the uncertainty-free reference measurement data, denoted as N, is included. For the radar simulation, the output of a black box radar ray tracing algorithm of IPG CarMaker version 9.1.1 and an adapted open source radar signal processing model [29] are used.
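The following sketch enumerates this set of simulation scenarios, one per uncertainty limit plus the nominal run N; the parameter names and values are illustrative and not those of Table 2.

```python
# Nominal reference values and their measured uncertainties (illustrative).
nominal = {"S_y": -4.15, "C_x": 29.40, "S_h": 0.55, "phi_S": 0.0}
uncertainty = {"S_y": 0.02, "C_x": 0.02, "S_h": 0.005, "phi_S": 0.07}

scenarios = {"N": dict(nominal)}  # uncertainty-free reference run
for name, u in uncertainty.items():
    for sign, tag in ((+1.0, "+"), (-1.0, "-")):
        variant = dict(nominal)
        variant[name] = nominal[name] + sign * u
        scenarios[name + tag] = variant
# 1 + 2 * len(uncertainty) simulation runs, each yielding one EDF
# that contributes to the simulation side of the DVM Map.
```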

The input from the ray tracing algorithm is interpreted in the radar model as a delta peak in frequency space, an approach called “Fourier tracing” [28]. The range and angle information of the ray is used to calculate the radar cuboid bin in the different dimensions in which the delta peak is located. A windowing function is used to smear the power of the delta peak into the neighboring bins. This allows effects such as ambiguities, separation capabilities and interferences to be present in the radar model. In addition, a non-deterministic noise simulation is implemented to demonstrate the metric capabilities even for such modeling approaches. Based on measurements, the mean and standard deviation are determined for each range-azimuth cell combination. A Gaussian distribution with the determined parameters is then imposed on cells whose minimum power is below the noise floor. Figure 17 shows the determined mean and standard deviation as a range-azimuth map. This result is obtained by placing the sensor on the asphalt with its front side pointing into the sky.
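The noise simulation can be sketched as follows, with the per-cell statistics and the noise floor as illustrative assumptions; in the actual model, the mean and standard deviation maps stem from the sky-facing measurement shown in Fig. 17.

```python
import numpy as np

rng = np.random.default_rng(0)

N_R, N_PHI = 256, 32                       # assumed cuboid cell counts
noise_mean = np.full((N_R, N_PHI), -95.0)  # dB, per-cell mean (cf. Fig. 17)
noise_std = np.full((N_R, N_PHI), 1.5)     # dB, per-cell standard deviation
NOISE_FLOOR = -90.0                        # dB, assumed threshold

def add_noise(cube_db):
    """Impose Gaussian noise on range-azimuth cells whose power stays below
    the noise floor; cells carrying signal power are left untouched."""
    noisy = cube_db.copy()
    below = cube_db < NOISE_FLOOR
    noisy[below] = rng.normal(noise_mean, noise_std)[below]
    return noisy
```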

Fig. 17 Mean value and standard deviation illustrated as a range-azimuth map for the Gaussian distribution noise model

The radar model is parameterized using the data sheet and calibrated with position 1 of the CCR on the detection interface. Additionally, a simulation model of the August–Euler airfield in Griesheim is used for the environment simulation.

5 Results and discussion

In this chapter, the different results of the DVM Map are shown and discussed. The methodology defined in Sect. 3 is applied to the static validation scenario, and the DVM Map with the new validation methodology is applied to the different interfaces as described in Figs. 13 and 15. In Sect. 5.1 all detections are analyzed. Afterwards, in Sect. 5.2, the results of the whole radar cuboid are discussed. The region of interest in the radar cuboid by means of clustered detections is described in Sect. 5.3. Finally, the results of each cell of the radar cuboid combined with a visualization on top of a satellite image are shown. As an example, the previously mentioned scenario of CCR position 3 at \(r_{\textrm{CCR, Pos 3}}\) = 29.56 m and \(\phi _{\textrm{CCR, Pos 3}}\) = −8 \(^{\circ }\) is evaluated.

5.1 Whole detections

This section contains the evaluation of all detections of the measurements and simulations. Figure 18 shows all detections of all measurements combined in one diagram in Cartesian coordinates, where the origin is the sensor position. From these, the EDFs of the distributions of the quantities r, \(\phi\) and \(\sigma\) are calculated.

Fig. 18 Plot of all detections in the Cartesian sensor coordinate system. Especially in the near range of the sensor as well as at the transition between asphalt and vegetation, clutter is present. The CCR is located at \(_{S}x\) = 29.4 m and \(_{S}y\) = \(-\)4.2 m

The representation of the detections as distribution functions is visible in the top row of Fig. 19. The EDFs of the different measurements are close to each other and the deviation in the number of detections is less than 1%. This indicates a high reproducibility of the measurement results. However, there is a significant difference in the number of detections between simulation and measurement, which prevents a valid evaluation based on the DVM.

Nevertheless, further analysis of the data is conducted to describe and analyze the general methodology. The distributions of the detections from the simulation are almost at the same position in every cycle for the range as well as the azimuth angle. However, deviations between the simulations show up in the RCS. The analysis of the RCS shows that the model reacts very sensitively to small changes of the scenario, which again emphasizes the necessity to consider the reference sensor uncertainties.

The fundamentally different distribution functions of measured and simulated data result from the environment model of the August–Euler airfield as well as the ray tracing simulation. The ray tracer of the used CarMaker version provides no reflections from the road surface in the given setup and no reflections from the vegetation next to the asphalt due to the lack of simulated vegetation. Therefore, only rays from the CCR are processed as radar signal processing model input. Thus, only detections at the object result; besides the small number of detections, the distributions in range and azimuth are limited to the position of the CCR. However, the detections of the CCR are at the same distance in measurements and simulations. This can be seen particularly well in the second row at r = 29.56 m as well as \(\phi\) = −8 \(^{\circ }\) by the step in the measurement EDFs highlighted by the red ellipse.

The second row of Fig. 19 shows the simulation EDFs \({\widetilde{F}}_{\textrm{c}}\) corrected by the model bias \(d_{\textrm{bias}}\) based on the AVM calculation with respect to measurement 1. This represents the second step of the methodology from Sect. 3. Even without a quantitative determination of the deviation by means of \(d_{\textrm{CAVM}}\), the fundamental difference of the distribution functions is already evident.

Fig. 19 From left to right, the range in m, azimuth in \(^\circ\), and RCS in dBm\(^2\) results of all detections are shown. The first row visualizes the EDFs of all measurements F and of all simulations \({\widetilde{F}}\). In the second row the EDF of measurement 1 \(F_{\textrm{Meas1}}\) and the corrected simulation EDFs \({\widetilde{F}}_{\textrm{c}}\) are shown. The red circles illustrate the data points in the measurement EDFs where the CCR is located. The last two rows show the DVM Map of the above-mentioned quantities. The numbers of detections of the measurement n and simulation \({\widetilde{n}}\) are given in the legend

The DVM Map in Fig. 19 verifies the findings already made from the first visual impression. In the range domain, the constancy of \(d_{\textrm{bias}}\) and \(d_{\textrm{CAVM}}\) across the simulations stands out: the simulation model is insensitive to the simulated uncertainties in range, because the measurement uncertainties are very small compared to the radar’s range resolution of 1.8 m. Measurements 1 and 5 show the smallest deviations in the model bias and measurement 2 the largest deviation. The scattering error in the range dimension is largest for measurement 2 and lowest for measurement 5. Thus, the differences in the results of the DVM Map are due only to the differences in the measurements.

In the case of the azimuth dimension, the model is slightly more sensitive, as shown by the minor color changes along the simulation axis of the DVM Map. These variations are small compared to the influence of the individual measurements visible along the measurement axis for the various uncertainties. Here, the negative variations of the y-position and the rotation \(\phi\) of the sensor are the uncertainties with the most notable influence on the result.

For the RCS, in turn, the negative variation of the CCR’s x-position and the negative sensor height variation have the largest influence. All parameters have a clear but different influence on the RCS distribution of the detections. In addition, it is shown that the DVM Map is able to represent the different positions of the EDFs with respect to each other in a very intuitive and simple way.

The influence of the environment in combination with the ray tracing algorithm as stated above is simply too large to make a validity statement by means of the analysis of all detections. However, the analysis of all radar detections can be used to compare stochastic effects from the environment. This includes not only influences like vegetation but also weather influences like rain and snowfall.

5.2 Whole radar cuboid

In this section, the DVM Map is applied to the whole radar cuboid. The only dimension considered in this case is the power distribution P of all cells of the radar cuboid, with the velocity dimension reduced to one cell \(\iota _v=0\). This reduces the radar cuboid to a range-azimuth map where each cell holds a spectral power value. Figure 20 shows the EDFs of simulation and measurement as well as the corresponding DVM Map.

The course of the simulation EDFs is characterized by the noise simulation at the beginning. Only above −82 dB does the effect of the radar cuboid cells filled with higher power by the CCR become visible. In relation to the total number, however, these cells occur much less frequently, so that the gray curves just above −80 dB are already close to 100% cumulative probability. All simulation runs are very close to each other and show only minor differences. This observation is again explained by the sensor noise simulation introduced into the model. Nevertheless, the measurement EDFs deviate from the simulation course by a few dB up to −80 dB. Above −80 dB, the model of the environment and the ray tracing algorithm become noticeable again. Due to reflections from the environment, cells of the radar cuboid in the measurement are filled with power up to −60 dB. Subsequently, the effect of the CCR is visible in the form of steps at −43 dB up to −20 dB in the EDFs.

The corrected EDFs are also close to each other, which means that only small deviations in \(d_{\textrm{CAVM}}\) are to be expected. The number of cells still indicates that the cycle time of the simulation model does not yet match the real sensor. Nevertheless, the deviation in the number of data points allows a comparison, because it is less than 10%. The problem of comparability, as evident in Sect. 5.1 when analyzing all detections, is less present in the radar cuboid.

The values of \(d_{\textrm{bias}}\) are close to the real measurement due to the noise simulation, and a deviation of 3.5 dB is tolerable given the dynamic range of a radar sensor, which spans over 80 dB. Measurements 2 and 3 show the largest deviations from all simulation parameters, with an increased sensitivity of the model to the uncertainties \(_{\textrm{C}}x^{-}\), \(_{\textrm{S}}y^{-}\) as well as \(_{\textrm{S}}h^{-}\).

\(d_{\textrm{CAVM}}\) is around 3.5 dB, which can be explained by the aleatory influence of the environment on the measurement result. Different areas on the test track produce higher powers in the measurement, which the environment simulation does not cover. In conclusion, the noise simulation distorts the influence of the environment model and the ray tracing algorithm. Thus, before integrating stochastic effects, it is recommended to analyze and optimize the whole simulation chain with ideal test objects and small regions of interest.

Fig. 20 The DVM Map of the whole radar cuboid in dB. The first diagram visualizes the EDFs of all measurements F and of all simulations \({\widetilde{F}}\). In the second one, the EDF of measurement 1 \(F_{\textrm{Meas1}}\) and the corrected simulation EDFs \({\widetilde{F}}_{\textrm{c}}\) are shown. The second row shows the DVM Map of the above-mentioned quantities. The number of analyzed radar cuboid cells of the measurement n and simulation \({\widetilde{n}}\) is given in the legend

5.3 Clustered detections on radar cuboid level

In this section, the results of the clustered detections at radar cuboid level are presented and analyzed. Figure 21 shows the positions of the clustered detections in the Cartesian sensor coordinate system. The cluster of the CCR is number 5 and is highlighted in the figure with a red circle at \(_{\textrm{S}}x\) = 29.4 m and \(_{\textrm{S}}y\) = \(-\)4.2 m. At close range of the sensor, some detections are visible due to reflections from the road surface. Especially on the x-axis and in the edge region of the sensor’s field of view, more detections due to this effect show up. All detections from \(_{\textrm{S}}x\) = 40 m onwards are located at the road border where the asphalt ends and the vegetation starts. This clutter is present in all measurements and differs only slightly between measurements.

Fig. 21 Clustered radar detections of all measurements in the sensor coordinate system, where the color represents the cluster membership. The red circle at \(_{S}x\) = 29.4 m and \(_{S}y\) = \(-\)4.2 m shows cluster 5, where the CCR is located

In the following, the CCR and the corresponding bins of the radar cuboid are considered in detail. The upper part of Fig. 22 again shows the uncorrected EDFs of all simulations and measurements as well as the simulation EDFs corrected with respect to measurement 1.

Fig. 22 The uncorrected and corrected EDFs as well as the DVM Map with \(d_{\textrm{Sum}}\) of the CCR cluster 5 at the radar cuboid interface in dB

The step shape visible in the measurements results from the 4 different range-azimuth cells analyzed in the evaluation based on the CCR’s position. The variations of a cell are within a few dB over time, which can be seen in the slope of the EDFs. Furthermore, the reproducibility of the measurements is exceptionally high, which is reflected in the overlap of the courses of the measurements.

In general, the simulations have a clear model bias. Here, the first modeling errors of the signal processing become evident. The model is calibrated to the RCS of the CCR at position 1, and therefore a difference exists in the calculation of the RCS from the radar cuboid power to the detections.

Furthermore, there is a clear influence of the uncertainties propagated through the simulation. In three simulations, the step shape is similar to the measurements, but the slope itself is substantially more smeared and not as steep. All other simulations have a much lower slope after the initial step. The beginning of the simulation slopes can be explained by the noise simulation: two radar cuboid cells are considered here that are not yet affected by the power increase due to the CCR. From this, a further modeling error can be identified. The window functions of the real sensor differ from those of the simulation model, because the power increase of the CCR does not smear as far into neighboring bins as in the measurement. The discrepancy in the number of considered cell values indicates a sampling difference between the model and the real sensor. This is because the real sensor does not exhibit fixed cycle times, while the co-simulation restricts this parameter through a specific sampling frequency.

The previously described findings from the EDFs are also reflected in the DVM Map. The simulations of \(_{\textrm{S}}y^{-}\), \(_{\textrm{C}}x^{-}\), and \(_{\textrm{S}}h^{-}\) show the lowest deviations in both model bias and scattering error. This agrees with the findings from Sect. 5.2, where these uncertainties also represent the smallest deviations (see also Fig. 20).

In comparison to the previous figures, the heat map of \(d_{\textrm{Sum}}\) is additionally shown, since the further consideration of the clusters from Fig. 21 is based on these results. To compare the clusters with each other, the maximum value of \(d_{\textrm{Sum}}\) and its corresponding cell in the heat map are used. The values of \(|d_{\textrm{bias}}|\) and \(d_{\textrm{CAVM}}\) of this measurement-uncertainty combination are transferred to a separate bar diagram in Fig. 23.
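One plausible reading of the per-cell quantities, consistent with the uncorrected/corrected EDF logic above, is sketched below: the model bias as the mean simulation-measurement offset, the corrected area validation metric as the area between the bias-corrected EDFs, and their sum. The exact estimators of the paper may differ from this sketch.

```python
# Hedged sketch of the per-cell metrics behind the DVM Map; the exact
# estimators used in the paper may differ.
import numpy as np

def avm(sim: np.ndarray, meas: np.ndarray) -> float:
    """Area between two EDFs, integrated over the value axis (dB)."""
    grid = np.union1d(sim, meas)
    f_sim = np.searchsorted(np.sort(sim), grid, side="right") / len(sim)
    f_meas = np.searchsorted(np.sort(meas), grid, side="right") / len(meas)
    # EDFs are right-continuous step functions: integrate |difference|
    # exactly as a sum over the intervals between grid points
    return float(np.sum(np.abs(f_sim - f_meas)[:-1] * np.diff(grid)))

def dvm(sim: np.ndarray, meas: np.ndarray):
    d_bias = float(np.mean(sim) - np.mean(meas))  # model bias in dB
    d_cavm = avm(sim - d_bias, meas)              # scatter after bias removal
    return abs(d_bias), d_cavm, abs(d_bias) + d_cavm

# usage with synthetic power samples in dB
rng = np.random.default_rng(1)
sim = rng.normal(32.0, 3.0, 400)
meas = rng.normal(30.0, 1.5, 400)
print(dvm(sim, meas))  # (|d_bias|, d_CAVM, d_Sum)
```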

Clusters 1 to 4, 10, 13, 17, 21 and 24 in Fig. 23 show the clear difference between simulation and measurement in the close range of the sensor. As already described, ground reflections are not considered in the simulation model, which results in the visible difference between simulation and measurement. Clusters 6 and 7 are the largest clusters and contain the main clutter due to vegetation. The influence of the vegetation is not as large as the deviations in the near range of the sensor, since the distance is larger and thus the power in the radar cuboid approaches the noise level. Nevertheless, a clear difference between simulation and measurement can be identified.

Fig. 23

Bar plot of \(|d_{\textrm{bias}}|\), \(d_{\textrm{CAVM}}\) and \(d_{\textrm{Sum}}\) for all clusters, with the corresponding cluster number and the measurement-uncertainty parameter combination with the largest influence on the x-axis. The evaluated interface is the radar cuboid of the different clusters in dB

Among the clusters farthest from the sensor in range, number 18 stands out. At this location, there is an intersection of runway and taxiway on the August–Euler airfield. The effects of the change in ground properties are thus detectable with the methodology using the DVM Map. Across all clusters, no trend in the measurements and uncertainty parameters can be detected, which speaks on the one hand for the good reproducibility of the measurements and on the other hand for the high sensitivity of the radar model. If such a trend were observed in the presented analysis, either for an uncertainty parameter or for a measurement, it would suggest either a parameterization with high sensitivity in the simulation or an outlier in the measurements.

5.4 Each range and azimuth radar cuboid cell

To increase the interpretability of the results, the outcomes of the DVM Map are plotted on a satellite image. For this purpose, the measured position of the sensor and the determined orientation are taken as the origin, and the range as well as the azimuth resolution of the radar cuboid are used to distribute its cells over the satellite image. From top to bottom, \(|d_{\textrm{bias}}|\), \(d_{\textrm{CAVM}}\) and \(d_{\textrm{Sum}}\) are visualized in Fig. 24. The coloring of the cells corresponds to the results of the DVM Map per radar cuboid cell. As an example, the DVM Map of cell \(\iota _{r}=18, \iota _{\phi }=16\) is shown. As in Fig. 23, the maximum of \(d_{\textrm{Sum}}\) over the measurement and uncertainty combinations is used for further analysis. Therefore, this combination determines the coloring of the cell with the values of \(|d_{\textrm{bias}}|\), \(d_{\textrm{CAVM}}\) and \(d_{\textrm{Sum}}\).
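The geometric part of this projection reduces to mapping each range-azimuth bin to map coordinates from the sensor pose. The sketch below illustrates this under assumed values; the pose, bin resolutions, and bin count are placeholders, since the real values come from the measured sensor position and the radar design parameters.

```python
# Minimal sketch: placing each range-azimuth radar cuboid cell on a
# georeferenced map. Sensor pose and bin resolutions are illustrative
# assumptions, not the values of the experimental setup.
import numpy as np

sensor_east, sensor_north = 475000.0, 5520000.0  # hypothetical UTM pose [m]
sensor_yaw = np.deg2rad(10.0)                    # boresight w.r.t. east
d_range = 0.6                                    # assumed range bin [m]
d_azimuth = np.deg2rad(1.8)                      # assumed azimuth bin [rad]

def cell_center(i_r: int, i_phi: int, n_phi: int = 32):
    """Map a (range bin, azimuth bin) index pair to east/north [m]."""
    r = (i_r + 0.5) * d_range
    phi = (i_phi + 0.5 - n_phi / 2) * d_azimuth  # 0 rad = boresight
    east = sensor_east + r * np.cos(sensor_yaw + phi)
    north = sensor_north + r * np.sin(sensor_yaw + phi)
    return east, north

print(cell_center(18, 16))  # the cell highlighted in Fig. 24
```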

Fig. 24

Satellite image in which the results of the DVM Map of the radar cuboid interface in dB are shown. From top to bottom, the validation results for \(|d_{\textrm{bias}}|\), \(d_{\textrm{CAVM}}\) and \(d_{\textrm{Sum}}\) are illustrated. In the lower right corner of each plot, the DVM Map of cell \(\iota _{r}=18, \iota _{\phi }=16\) is shown as an example on which the coloring in the satellite plot is based

In the top plot, a deviation of about 30 dB can be seen in the area where the CCR is located. The smearing of the power into neighboring bins due to the windowing function is included in the simulation model, but an assignment of the deviation to a causal effect is difficult. On the one hand, the power of the CCR is too low, which can be corrected by calibrating the model at radar cuboid level instead of at the detection interface with its RCS value. Nevertheless, there is a model error in the signal processing when calculating the RCS from the radar cuboid data, as the simulation is calibrated to a CCR positioned centrally on the runway. On the other hand, the window function in the model is determined iteratively, which means that measurement and modeling errors may also be present here.
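How strongly a point target smears into neighboring bins is governed by the spectral leakage of the chosen window. The sketch below contrasts two textbook windows on a synthetic off-grid tone standing in for the CCR return; it illustrates the effect only and is not the sensor's or the model's actual window.

```python
# Sketch: spectral leakage of a strong point target into neighboring
# bins for two window functions; signal and lengths are illustrative.
import numpy as np

n = 256
t = np.arange(n)
tone = np.exp(2j * np.pi * 20.3 * t / n)  # off-grid frequency -> leakage

for name, window in [("rect", np.ones(n)), ("hann", np.hanning(n))]:
    spectrum_db = 20 * np.log10(np.abs(np.fft.fft(tone * window)) + 1e-12)
    spectrum_db -= spectrum_db.max()          # normalize to the peak
    peak = int(np.argmax(spectrum_db))
    # residual power three bins away from the peak
    print(name, round(spectrum_db[peak + 3], 1), "dB at +3 bins")
```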

Directly next to the highlighted cell \(\iota _{r}=18, \iota _{\phi }=16\), there is an area with the maximum deviation between simulation model and measurement of 55 dB. In the simulation model, there is no input data from the ray tracing algorithm for these cells, and only the noise simulation fills them with power values. During the measurements, no objects or asphalt peculiarities were noticed that would justify this increase in power. For these reasons, there has to be an effect in the signal processing of the real radar sensor that is not considered in the radar model and is triggered by the CCR. The method is therefore able to identify a systematic model error at this point. Along the runway, there are further increases in the model bias, ranging from 15 to 25 dB, especially in the transition area between asphalt and vegetation.

In the second satellite plot, the scattering error represented by \(d_{\textrm{CAVM}}\) is visualized. The highest deviation is present at the CCR. The distribution shape in the measurement resembles a step function, while the simulation EDFs are not as steep, and the propagated uncertainties have a strong impact on the shape, especially at lower powers. This demonstrates the high sensitivity of the radar model chain with respect to the measurement data uncertainty.

Along the runway and the transition to the vegetation, notable deviations are evident, as already observed for \(|d_{\textrm{bias}}|\). Of particular interest is the intersection of the taxiway with the runway, highlighted by the red circle. Due to the transition between vegetation and asphalt, \(d_{\textrm{CAVM}}\) shows higher values in comparison to the surrounding cells.

In the \(d_{\textrm{Sum}}\) satellite plot, the differences become even more apparent. In addition to the features of the runway, vegetation, and intersection already mentioned, the sensor's close range shows significant discrepancies. This underlines the findings from the cluster analysis in the previous section.

6 Conclusion

This paper introduces the concept of the DVM Map and its application to radar data using a static scenario. Based on four ECs, the need for an extension of the existing DVM definition is presented. A methodology that allows the DVM Map to be applied at radar cuboid and detection level is described. A validation study is exemplified using the described experimental setup and an adapted radar simulation model. It becomes evident that looking at all detections only makes sense if the environment simulation is matched with the ray tracing algorithm. The differing number of detections turns out to be a fundamental problem of the simulation model with all its components. The evaluation of the entire radar cuboid has the advantage that the amount of data depends only on the correct model parameterization. The comparison reveals whether aleatory uncertainties such as noise are modeled correctly. The clustered detections are used to analyze the areas that are particularly affected by power differences. In the validation study, the analysis of the CCR shows high deviations between measurement and simulation; objects can therefore be identified and evaluated particularly well using this approach. Finally, all cells of the radar cuboid are analyzed. Local aleatory and epistemic uncertainties of the environment model, e.g. vegetation and asphalt, become visible. Additionally, effects of the radar signal processing model can be separated from the environment model and the ray tracing algorithm. In general, it can be seen that the validation of a sensor model and its signal processing is only possible if the environment simulation is qualified with regard to physical effects and aleatory uncertainties.

Overall, the application of the DVM Map to the different levels significantly increases the interpretability of scenarios in the following manner:

  • The DVM Map gives the output score in the units of the analyzed quantity.

  • The DVM Map gives information about the sensitivity of the simulation to each modeled reference data inaccuracy for each measurement.

  • Clustering gives dedicated information about object model and modeling errors.

  • Using the DVM Map in combination with the satellite plot, errors can be spatially localized, and thus the environment model can be examined.

The DVM Map can also be used to compare measurements with each other and thus to investigate stochastic effects such as rain, for example by comparing the similarity of rain conditions between measurements. Furthermore, measurement setups that have to be dismantled and reassembled can be examined and compared against a reference measurement. Particular attention should be paid to the periphery of the radar sensor's field of view, where azimuth ambiguities are present and radar detection accuracy is lower compared to the boresight. Future experiments should therefore focus specifically on this area and evaluate the performance of the DVM Map there. Additionally, an analysis of the signal-to-noise ratio offers further potential to improve the understanding of the underlying effects. So far, the consideration of uncertainties is limited to their upper and lower bounds, which does not take mutual influences of the uncertainties into account. It is therefore recommended as a next step to combine the uncertainties with each other and to limit the parameter space in the process. As soon as further uncertainties are added and not only the upper and lower bounds are varied, the parameter space inevitably explodes. Assuming the parameters in the present scope are examined full factorially with five instead of three variations each, the number of necessary simulations for just one scenario with a CCR is

$$\begin{aligned} n_{\textrm{sim}} = n_{\textrm{var}}^{n_{\textrm{param}}} = 5^7 = 78{,}125. \end{aligned}$$
(10)
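This count can be reproduced directly by enumerating the full-factorial design; the sketch below uses placeholder parameter indices in place of the study's measurement uncertainties.

```python
# Sketch reproducing the full-factorial count of Eq. (10): seven
# uncertain parameters with five variations each. The parameter names
# are placeholders, not the study's actual uncertainty parameters.
from itertools import product

n_var, n_param = 5, 7
runs = list(product(range(n_var), repeat=n_param))
assert len(runs) == n_var ** n_param == 78_125
```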

This estimate does not include material properties or complex geometries, and each simulation has to be repeated for every change in the model itself. Therefore, the parameter sensitivity of the model must be determined in advance in order to reduce the parameter space. In the future, we will extend the methodology developed here for static scenarios to dynamic scenarios. However, this poses challenges, specifically with respect to the temporal aggregation of data. It is imperative that these challenges be resolved to qualify the methodology for model validation.