
1 Introduction and Framework

In this work, the focus is set on depth estimation of binocular and monocular camera systems for autonomous driving in urban areas. For monocular systems with a single input image, depth estimation is inherently ambiguous. To overcome this issue, predictive models can be used to learn the relationship between images and depth [5]. In stereo vision, two rectified input images provide a remedy, as epipolar geometry can be used to generate disparity maps. The disparity map can then be converted into a depth map via the known geometric camera relation. Predictive models are capable of learning stereo matching in an end-to-end fashion while increasing accuracy [6]. Two predictive models are central to this work: Monodepth2 [5] represents the monocular model, whereas 2D-MobileStereoNet (2D-MSNet) [6] is the chosen model for stereo matching. Both models are characterized by low hardware requirements and are therefore suitable for vehicle-related applications. The baseline of the analysis is a geometric approach which uses camera extrinsics and inverse projection to determine metric object depth [3]. For all depth estimation approaches we use a Python framework, the original algorithm repositories, and the KITTI multi-object tracking dataset [4] as image source. The 2D object detection information is taken from the dataset and used to locate and classify a filtered object class subset consisting of vehicles, pedestrians, and cyclists.
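The conversion from a stereo disparity map to metric depth mentioned above follows the standard rectified-stereo relation \(z = f \cdot B / d\). A minimal sketch, assuming a calibrated setup; the focal length and baseline values below are illustrative KITTI-like parameters, not values taken from this work:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (in pixels) to metric depth (in meters).

    Standard rectified-stereo relation: depth = f * B / d, where f is the
    focal length in pixels and B the stereo baseline in meters, both known
    from the camera setup.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0  # zero disparity corresponds to points at infinity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Illustrative KITTI-like parameters (f ≈ 721 px, baseline ≈ 0.54 m):
depth = disparity_to_depth(np.array([[36.05]]), 721.0, 0.54)  # ≈ 10.8 m
```

The same scaling applies pixel-wise to a full disparity map, which is how the deep-learning outputs become metric depth maps.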

2 Methodology

To infer metric object depth, any lens distortion must be removed from the input images. Afterwards, each image is processed by the depth estimation algorithms stated in Sect. 1. As an evaluation basis, we use pre-trained models: for Monodepth2 the model “mono+stereo \(640\times 192\)” is selected, and for 2D-MSNet the model “SF + DS + KITTI2015”. The resulting disparity maps of the deep-learning models are converted to metric depth maps using the scaling known from the camera setup. In parallel, the given ground truth object detection results are used to generate a list of reference points and bounding box properties per image. Using these object reference points as indices into the depth maps yields a metric depth per object. Figure 1 visualizes exemplary inference output for both deep-learning-based approaches and details the different reference points chosen. The lower reference point is only used for the rule-based inverse perspective mapping (IPM), since the used algorithm projects the pixel coordinates into the camera reference frame. The assumption that each reference point is located on the road plane resolves the ambiguity of the inverse projection. Since the rule-based distance estimation is affected by the vehicle's dynamics, we use the pitch and roll angles given by the dataset to compensate vehicle movement [3]. After depth estimation for each of the 21 data splits, we filter truncated and occluded objects and calculate the absolute error per chosen distance interval based on ground truth LiDAR depth maps.
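The ground-plane assumption of the IPM step can be sketched as follows. This is a simplified flat-road illustration, not the exact algorithm of [3]; the intrinsics, camera height, and axis convention (x right, y down, z forward) are assumptions chosen for the example:

```python
import numpy as np

def ipm_ground_distance(u, v, fx, fy, cx, cy, cam_height, pitch_rad=0.0):
    """Metric forward distance to a pixel assumed to lie on a flat road.

    The ray through pixel (u, v) is intersected with the ground plane
    located cam_height below the camera, which resolves the ambiguity of
    the inverse projection. A pitch rotation (e.g. from vehicle dynamics)
    is applied to the ray before intersection.
    """
    # Ray direction in the camera frame
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    # Compensate vehicle pitch (rotation about the x-axis)
    c, s = np.cos(pitch_rad), np.sin(pitch_rad)
    ray = np.array([[1, 0, 0], [0, c, -s], [0, s, c]]) @ ray
    if ray[1] <= 0:
        return np.inf  # ray points above the horizon, no ground intersection
    scale = cam_height / ray[1]  # intersect with the plane y = cam_height
    return scale * ray[2]        # forward (metric) distance

# Illustrative call: a point 100 px below the principal point,
# KITTI-like intrinsics, camera 1.65 m above the road:
dist = ipm_ground_distance(609.6, 272.9, 721.0, 721.0, 609.6, 172.9, 1.65)
```

With zero pitch this reduces to \(z = f \cdot h / (v - c_y)\), which makes the sensitivity to pitch errors and to the flat-road assumption at larger distances apparent.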

The simulated braking distances represent an automatic emergency brake (AEB) in the Euro NCAP Car-to-Pedestrian Nearside Adult (CPNA) [2] and Car-to-Car Rear stationary (CCRs) [1] test scenarios with a maximum velocity of 50 [km/h]. The simulation is based on a two-track model of the Institute of Automotive Engineering which is statistically validated with real driving data. A Magic Formula tire model of version 5.2 with parameters of current summer and winter tires is used, and road friction coefficients of \( {\mu _{R}} = 0.3, 0.6, 1\) for snow, wet, and dry road conditions are considered. The trigger distance is set to the braking distance to fulfill the collision unavoidable criterion [7]. Both the performance measures of the depth estimation and the results of the two-track model form the basis for the evaluation. Whereas current publications focus strongly on optimization, the outlined methodology closes the gap between algorithmic evaluation on depth metrics and real applications such as autonomous driving in urban scenarios. The results enable the definition of accuracy requirements and the impact assessment of estimation errors.
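For intuition, the relation between braking distance, depth estimation error, and collision velocity can be approximated with a constant-deceleration point-mass model. Note that the study itself uses a statistically validated two-track model with a Magic Formula tire model, so this simplified sketch will not reproduce its exact numbers:

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def braking_distance(v0_kmh, mu):
    """Point-mass braking distance [m] at full braking (a = mu * g)."""
    v0 = v0_kmh / 3.6
    return v0**2 / (2 * mu * G)

def collision_velocity(v0_kmh, mu, depth_error_m):
    """Collision speed [km/h] when braking triggers late by depth_error_m.

    With the trigger distance set to the braking distance, a depth
    overestimation of depth_error_m leaves only (s_B - error) for braking.
    """
    v0 = v0_kmh / 3.6
    s_remaining = braking_distance(v0_kmh, mu) - depth_error_m
    v_col_sq = v0**2 - 2 * mu * G * max(s_remaining, 0.0)
    return math.sqrt(max(v_col_sq, 0.0)) * 3.6

# Point-mass braking distance from 50 km/h on snow (mu = 0.3): ~32.8 m
s_b = braking_distance(50.0, 0.3)
```

The point-mass distance of roughly 32.8 [m] is shorter than the 35.9 [m] obtained from the validated two-track model, which illustrates why the more detailed simulation is used for the actual evaluation.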

Fig. 1.

Depth map samples for image number 150 of split 20 in the KITTI Tracking dataset [4]. The upper image shows the depth map of the Monodepth2 inference; the lower image shows the results of the 2D-MSNet. A ground truth bounding box is highlighted with both reference points (center: Monodepth2 and 2D-MSNet, center bottom: IPM). In both color variations, darker colors represent distance, while lighter colors emphasize proximity.

3 Evaluation

For the full training dataset and the object classes car, van, cyclist, and pedestrian, mean absolute error values of the metric depth are given per ground truth distance interval of 10 m in Fig. 2. For example, objects evaluated with the IPM algorithm have a mean absolute error of 14.96 [m] for the interval spanning from 40 [m] to 50 [m]. The corresponding number of estimations (images \(\times \) objects) is detailed on the right. As the distance to the objects increases, the number of objects decreases, which reflects the extra-urban and urban environments of the dataset. In summary, the 2D-MSNet shows the lowest error values across all data, as the image data from the left and right cameras are used to solve the correspondence problem. Both mono-camera-based approaches have comparable error values up to 30 [m]. Whereas the error value of the IPM approach stagnates at approx. 15 [m] from the 50 [m] distance interval onward under the conditions used, the error values of Monodepth2 continue to increase.
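The interval-wise evaluation described above can be sketched as a simple binning routine. The 10 m bin width follows the text; the maximum depth and the input arrays are illustrative assumptions:

```python
import numpy as np

def mae_per_interval(gt_depth, est_depth, bin_width=10.0, max_depth=80.0):
    """Mean absolute error of estimated depth per ground-truth distance bin.

    Objects are grouped into bin_width-meter ground-truth intervals; for
    each non-empty bin the mean absolute error and the number of
    estimations are reported, mirroring the evaluation scheme above.
    """
    gt = np.asarray(gt_depth, dtype=float)
    est = np.asarray(est_depth, dtype=float)
    edges = np.arange(0.0, max_depth + bin_width, bin_width)
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (gt >= lo) & (gt < hi)
        if mask.any():
            mae = float(np.mean(np.abs(est[mask] - gt[mask])))
            results.append((lo, hi, mae, int(mask.sum())))
    return results

# Toy example: three objects at 5, 15, and 45 m ground-truth depth
intervals = mae_per_interval([5.0, 15.0, 45.0], [6.0, 14.0, 60.0])
```

Reporting the count per bin alongside the error is what makes the decreasing object numbers at larger distances visible in Fig. 2.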

With regard to the application of autonomous driving, depth underestimation leads to an earlier braking event, whereas overestimation leads to a late one. The former can cause self-inflicted rear-end collisions; the latter is safety-critical for front collisions. Considering the collision unavoidable criterion for autonomous emergency braking, full braking triggers when the metric object depth corresponds to the required braking distance [8]. Hence, absolute error statistics are added to the braking distances, which leads to a collision with a simulated collision velocity \({v_{Collision}}\). Figure 3 shows the simulated velocity profile over the braking distance \({s_{B}}\) for a road friction coefficient of \( {\mu _{R}} = 0.3\) and winter tires. The braking distance shifted by the 75% error quantile, \(s_{B}+\epsilon _{s_{B}, q_{0.75}}\), is detailed for the Monodepth2 algorithm. The quantile is calculated based on a narrower distance interval which spans the simulated braking distance of 35.9 [m] with a tolerance of \(\pm 1\) [m]. The added error leads to a simulated collision velocity \({v_{Collision}}\) of 23.55 [km/h]. Based on the ISO/DIS 26262-3 standard and the severity levels of an automotive safety integrity level classification, this qualifies as severity level S2 (\(<40\) [km/h], severe injuries). Furthermore, the speed limits given by the standard can be used to calculate the required error reduction \(\varDelta _{s_{B},S}\) for the next lower severity level depending on the tire type and the road friction coefficient. For the parameters of Fig. 3, the severity level can be reduced to S1 (\(v_{Severity}=20\) [km/h]) if \(\epsilon _{s_{B},q_{0.75}}\) is reduced by \(\varDelta _{s_{B},S} = 2.22\) [m]. The simulated braking distance with applied error reduction, \(s_{B} + \epsilon _{s_{B},q_{0.75}} - \varDelta _{s_{B},S}\), is visualized with a dashed line. Since the standard does not detail a collision velocity for severity level S0, this level is only achievable if a crash is prevented entirely. Thus, the required error reduction for S0 equals the quantile \(\epsilon _{s_{B},q_{0.75}}\).
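The mapping from simulated collision velocity to severity level, as read from the thresholds above (S0 only if the crash is prevented, S1 up to 20 [km/h], S2 below 40 [km/h]), can be sketched as follows; the S3 fallback for higher speeds is an assumption not stated in the text:

```python
def severity_level(v_collision_kmh):
    """Map a simulated collision velocity [km/h] to a severity level.

    Thresholds as read from the discussion of ISO/DIS 26262-3 above:
    S0 requires the crash to be prevented entirely, S1 covers speeds up
    to 20 km/h, S2 speeds below 40 km/h. Treating anything faster as S3
    is an assumption of this sketch.
    """
    if v_collision_kmh <= 0.0:
        return "S0"
    if v_collision_kmh <= 20.0:
        return "S1"
    if v_collision_kmh < 40.0:
        return "S2"
    return "S3"
```

For the Monodepth2 example above, a collision velocity of 23.55 [km/h] falls into S2, and reducing the error quantile by 2.22 [m] brings the velocity to the S1 border.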

Fig. 2.

The mean absolute error per distance interval is shown on the left. On the right, the number of estimations per distance interval is shown.

Fig. 3.

Simulated velocity profile over the braking distance \(s_{B}\) and the braking distance shifted by the error quantile \(s_{B} \ + \ \epsilon _{s_{B},q_{0.75}}\) for winter tires and a road friction coefficient of \( \mu _{R} = 0.3\) (condition class: snow). The \(75\%\) quantile is based on the Monodepth2 algorithm. Collision velocity borders per severity level are indicated by horizontal lines.

Table 1 summarizes the simulation results for all parameter combinations and algorithms as well as the requirements for error reduction. The lower error values of the 2D-MSNet over the entire dataset are reflected in the resulting severity classes.

Table 1. Accuracy requirements formulated as \(\varDelta _{s_{B},S}\) for the resulting severity class for selected depth estimation algorithms. The lowest absolute error quantiles and delta values over all algorithms per tire type are given in bold.

4 Conclusion and Future Work

The proposed evaluation method of depth estimation algorithms as a function of real applications shows, based on the low severity levels, that the depth estimation of the 2D-MSNet algorithm can already be used in real driving scenarios. The IPM results motivate further usage for the road condition classes “dry” and “wet”. Moreover, accuracy requirements for achieving a lower severity level can be formulated based on the collision velocity limits given by standards, as shown in Table 1. These provide an application-based foundation for further algorithm optimization and self-trained neural networks. A decisive factor of the proposed methodology is the assumption that a braking event is triggered as soon as a collision can no longer be prevented. Future analysis will target the impact on comfort-oriented braking systems. Since the image data of the used dataset does not reflect the weather conditions underlying the simulated braking distances and the chosen road friction coefficients, future work will take into account the impact of weather conditions on depth estimation and object detection.