
1 Introduction

Road safety is a pressing global concern as the automobile industry continues to develop. According to traffic reports, traffic accidents can be broadly divided into two categories: accidents involving visible road agents and accidents involving hidden road agents. Visible road agents are entities or objects on the road that drivers or sensors can easily detect and identify. Hidden road agents, on the other hand, are entities or objects that drivers or sensors cannot detect directly by conventional means. Hidden road agents emerging from occluded areas pose a significant threat to road safety because they are not readily visible to drivers. Detecting occluded areas is therefore crucial: once detected, the results can be integrated into future Advanced Driver Assistance Systems (ADAS) to reduce collisions with hidden road agents.

Existing methods rely on external information, such as information from other vehicles [1] and pedestrians [2], to detect hidden road agents within occluded areas. These methods do not analyze the surrounding environment directly to find occluded areas that may contain hidden road agents. In contrast, we believe that experienced drivers perceive potential danger by observing the surrounding environment; detecting occluded areas with visual perception methods therefore essentially simulates human perception.

Several existing methods detect occluded areas using visual perception [3,4,5]. Fukuda et al. [3] trained a predictor on a large dataset to estimate the occluded area or recover the scene. Jeong et al. [4] used 2D LiDAR data in a bird's-eye view to estimate occluded areas in an occupancy grid map. Ding and Song [5] used the density of depth information from an RGB-D camera to estimate occlusion in human-related scenarios. However, these methods struggle to provide the geometric information about the areas that ADAS requires.

To overcome this limitation, we proposed a detection method that fuses camera and LiDAR data and uses a key point extractor to identify the object boundaries forming the occlusion [6]. The method succeeded in extracting key points, detecting occluded areas, and obtaining their geometric information. However, identifying individual occluded areas was difficult because the key points were not assigned to distinct objects. In other words, if information on individual objects were available, individual areas could be detected accurately.

To obtain such object-level information, we adopted panoptic segmentation [7], which distinguishes individual objects and assigns a class label to every pixel simultaneously. We applied it to our previous method [6] to propose an improved detection method.
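As an illustration only, the snippet below shows one way to obtain such per-pixel class and instance information with an off-the-shelf panoptic segmentation model; the use of Detectron2 and its panoptic FPN model here is our assumption for the sketch, not necessarily the implementation used in this work.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Assumed off-the-shelf panoptic model (Detectron2 panoptic FPN); the model
# actually used in this work may differ.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml")
predictor = DefaultPredictor(cfg)

image = cv2.imread("frame.png")  # hypothetical camera frame
panoptic_seg, segments_info = predictor(image)["panoptic_seg"]
# panoptic_seg: per-pixel segment ids (a tensor the size of the image);
# segments_info: one dict per segment with "id", "isthing", "category_id"
for seg in segments_info:
    print(seg["id"], seg["isthing"], seg["category_id"])
```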

2 Methodology

Figure 1 shows an overview of the proposed methodology, which consists of three main processes: 1. data pre-processing, 2. key point extraction, and 3. occluded area reconstruction. The first two processes were elaborated in our previous work [6]; this study therefore focuses on the third process, occluded area reconstruction.

Fig. 1. Overview of the proposed methodology.

The inputs to Process 3 are the key points extracted in Process 2, i.e., boundary points of occluded areas containing depth information, and the image obtained by the RGB camera. Figure 2 shows a sample of the extracted key points.

Fig. 2. Sample result of extracted key points of occluded areas. The top image shows the original RGB image and the bottom image shows the extracted key points of the occluded areas formed by the vehicles and the building seen in the RGB image.

This process outputs individual occluded areas and the estimated types of hidden road agents within them. It consists of four steps. First, panoptic segmentation is conducted to obtain segmented objects with class labels, from which the semantic edges of each object can be extracted. Second, an edge warping technique is applied to obtain the depth edge of each object by integrating the semantic edges with the key points obtained in the previous process. Third, Graham's scan is used to construct a convex hull of each object's depth edges, reconstructing the occluded area. Lastly, a decision tree model is employed to estimate the types of potential hidden road agents. The model takes as input the spatial information of each occluded area, such as depth, height, and width, and its semantic information, i.e., the category label obtained from the panoptic segmentation.
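As a minimal sketch of the convex-hull step, the following implements Graham's scan for the depth-edge points of a single object. It assumes the points have already been projected onto a 2-D plane; the function name, variable names, and example coordinates are hypothetical.

```python
import numpy as np

def graham_scan(points):
    """Convex hull of one object's depth-edge points via Graham's scan.

    points: iterable of (x, y) coordinates already projected onto a 2-D plane.
    Returns the hull vertices in counter-clockwise order.
    """
    pts = [tuple(p) for p in points]
    pivot = min(pts, key=lambda p: (p[1], p[0]))   # lowest point, ties by x

    def cross(o, a, b):
        # > 0 when the turn o -> a -> b is counter-clockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    # Sort the remaining points by polar angle around the pivot,
    # breaking ties by squared distance.
    rest = sorted(
        (p for p in pts if p != pivot),
        key=lambda p: (np.arctan2(p[1] - pivot[1], p[0] - pivot[0]),
                       (p[0] - pivot[0]) ** 2 + (p[1] - pivot[1]) ** 2))

    hull = [pivot]
    for p in rest:
        # Pop points that would create a clockwise (or collinear) turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return np.array(hull)

# Example usage with hypothetical depth-edge points of one object.
edge_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(graham_scan(edge_points))  # interior point (1, 1) is excluded
```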

3 Evaluation

3.1 Method

To evaluate the proposed methodology, we assessed the detection accuracy of occluded areas and the estimation accuracy of hidden road agent types. For detection accuracy, the average Intersection over Union \(\overline{IoU}\) was adopted as the evaluation metric, expressed as Eq. 1, where \(IoU_{i}\) denotes the IoU of area i and n denotes the number of areas. The IoU between each detected area and its manually labeled ground truth was evaluated. For estimation accuracy, precision was adopted as the evaluation metric. For hidden road agent estimation, the model was trained on the ground truth dataset and tested on the same dataset.

$$\begin{aligned} \overline{IoU} = \frac{1}{n}\sum _{i=1}^{n}IoU_{i} \end{aligned}$$
(1)
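For reference, a minimal sketch of this metric over polygonal areas is shown below; the use of Shapely and the index-wise pairing of detected and ground-truth areas are illustrative assumptions.

```python
from shapely.geometry import Polygon

def mean_iou(detected, ground_truth):
    """Average IoU over pairs of detected / ground-truth occluded areas.

    detected, ground_truth: lists of vertex lists (one polygon per area),
    assumed here to be paired by index.
    """
    ious = []
    for det_pts, gt_pts in zip(detected, ground_truth):
        det, gt = Polygon(det_pts), Polygon(gt_pts)
        inter = det.intersection(gt).area
        union = det.union(gt).area
        ious.append(inter / union if union > 0 else 0.0)
    return sum(ious) / len(ious)

# Example with hypothetical polygons (IoU of the single pair is 0.5)
print(mean_iou([[(0, 0), (2, 0), (2, 2), (0, 2)]],
               [[(0, 0), (2, 0), (2, 1), (0, 1)]]))
```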

The method was evaluated on four scenarios (S1–S4) from the KITTI dataset [8]. Ground truth labels were annotated manually: occluded areas were annotated by analyzing the point cloud data, and the types of hidden road agents were annotated by analyzing each area in the RGB images.

3.2 Results

Table 1 shows the \(\overline{IoU}\) results for the different scenarios. The results are acceptable, with \(\overline{IoU}\) exceeding 0.5 in certain scenarios. Figure 3 compares the detected areas and ground truths for a sample scene. The method detected the same individual occluded areas as the ground truth; however, the detected areas were larger than the ground truths, which enlarged the union relative to the intersection and consequently lowered the IoU.

Table 1. Evaluation results of \(\overline{IoU}\).
Fig. 3. Comparison of detected occluded areas and ground truths for the scene shown in Fig. 2. Red polygons are the detected occluded areas and blue polygons are the ground truths.

Fig. 4. Representative result of panoptic segmentation failing to distinguish foreground and background buildings as different objects. The top image is the original RGB image, the middle image is the panoptic segmentation result, and the bottom image is the segmentation result of the building on the left side.

For the estimation of hidden road agent types, we achieved a precision of approximately 0.97. This high value can be attributed to the semantic information obtained from panoptic segmentation: adding semantic labels alongside the original geometric information produces a higher-dimensional feature space with better separability of the classes.
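For illustration, the sketch below builds such a combined feature vector and fits a decision tree with scikit-learn; the feature values, class labels, and use of one-hot encoding are illustrative assumptions, not the data or exact setup of this study.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical spatial features per occluded area: depth, height, width (m)
spatial = np.array([[12.4, 1.6, 4.2],
                    [25.1, 2.8, 10.5],
                    [ 8.0, 1.5, 1.8]])
# Semantic label of the occluding object from panoptic segmentation
semantic = np.array([["car"], ["building"], ["car"]])
# Hidden road agent type to estimate (hypothetical labels)
agent_type = np.array(["pedestrian", "vehicle", "pedestrian"])

# One-hot encode the categorical label and stack it with the spatial
# features, giving the higher-dimensional feature vector described above.
encoder = OneHotEncoder()
features = np.hstack([spatial, encoder.fit_transform(semantic).toarray()])

clf = DecisionTreeClassifier(random_state=0).fit(features, agent_type)
print(clf.predict(features))
```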

4 Discussion

Although the evaluation metric values were not high, the detection results of the proposed methodology were satisfactory and close to the ground truths, as shown in Fig. 3. This was possible because the key points of the occluded areas were aligned with the object edges, thanks to the semantic edges obtained from panoptic segmentation. It was thus confirmed that panoptic segmentation is effective for the occluded area reconstruction process.

At the same time, however, the method showed sub-optimal performance primarily due to limitations of the panoptic segmentation, which struggled to separate background objects. Figure 4 shows a sample result in which the foreground and background buildings on the left side were not distinguished. Because the foreground building was not segmented as a single object, the method failed to obtain the key points of the occluded area it formed on the left side. The same type of failure was observed for vegetation such as trees. To overcome this issue, the segmentation representation must be updated so that background objects are treated as separate objects.

5 Conclusions

This study proposed an occluded area detection method that employs panoptic segmentation to improve our previous methodology based on camera and LiDAR sensor fusion. The results showed that the proposed method achieves satisfactory detection and high precision in estimating hidden road agent types. They also indicate that combining the geometric and semantic features obtained through sensor fusion and panoptic segmentation offers a more holistic perspective for detecting occluded areas. We plan to expand the ground truth dataset to obtain more reliable evaluation results and to analyze the strengths of the proposed method in depth.