
1 Introduction

Road safety is a pressing global concern as the automobile industry continues to develop. According to traffic reports, traffic accidents can be broadly divided into two categories: accidents involving visible road agents and accidents involving hidden road agents. Visible road agents are entities or objects on the road that drivers or sensors can easily detect and identify. Hidden road agents, on the other hand, are entities or objects that drivers or sensors cannot detect directly by conventional means. Hidden road agents emerging from occluded areas pose a significant threat to road safety because they are not readily visible to drivers. Detecting occluded areas is therefore crucial: once detected, the results can be integrated into future Advanced Driver Assistance Systems (ADAS) to reduce collisions with hidden road agents.

Existing methods rely on external information, such as information from other vehicles [1] and pedestrians [2], to detect hidden road agents within occluded areas. These methods do not analyze the surrounding environment directly to find occluded areas that may contain hidden road agents. In contrast, we believe that experienced drivers perceive potential danger by observing the surrounding environment; detecting occluded areas with visual perception methods therefore essentially simulates human perception.

Several existing methods detect occluded areas using visual perception [3,4,5]. Fukuda et al. [3] trained a predictor on a large dataset to estimate the occluded area or recover the scene. Jeong et al. [4] used 2D LiDAR data in a bird's-eye view to estimate occluded areas in an occupancy grid map. Ding and Song [5] used the density of depth information from an RGB-D camera to estimate occlusion in human-related scenarios. However, these methods struggle to provide the geometric information about the areas that ADAS requires.

To overcome this limitation, we proposed a detection method that fuses camera and LiDAR data and uses a key point extractor to identify the object boundaries forming the occlusion [6]. The method succeeded in extracting key points, detecting occluded areas, and obtaining their geometric information. However, identifying individual occluded areas was difficult because the key points were not assigned to distinct objects. In other words, if information on individual objects were available, individual areas could be detected accurately.

To obtain such object-level information, we adopted panoptic segmentation [7], which distinguishes individual objects and assigns a class label to every pixel simultaneously. We applied it to our previous method [6] to propose an improved detection method.
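As an illustration only, the snippet below shows one way to obtain such per-pixel class and instance information with an off-the-shelf panoptic segmentation model; the use of Detectron2 and its panoptic FPN model here is our assumption for the sketch, not necessarily the implementation used in this work.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Assumed off-the-shelf panoptic model (Detectron2 panoptic FPN); the model
# actually used in this work may differ.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml")
predictor = DefaultPredictor(cfg)

image = cv2.imread("frame.png")  # hypothetical camera frame
panoptic_seg, segments_info = predictor(image)["panoptic_seg"]
# panoptic_seg: per-pixel segment ids (a tensor the size of the image);
# segments_info: one dict per segment with "id", "isthing", "category_id"
for seg in segments_info:
    print(seg["id"], seg["isthing"], seg["category_id"])
```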

2 Methodology

Figure 1 shows an overview of the proposed methodology, which consists of three main processes: 1. data pre-processing, 2. key point extraction, and 3. occluded area reconstruction. The first two processes were elaborated in our previous work [6]; this study therefore focuses on the third process, occluded area reconstruction.

Fig. 1. Overview of the proposed methodology.

The inputs to Process 3 are the key points extracted in Process 2, i.e., boundary points of occluded areas containing depth information, and the image obtained by the RGB camera. Figure 2 shows a sample of the extracted key points.

Fig. 2. Sample result of extracted key points of occluded areas. The top image shows the original RGB image and the bottom image shows the extracted key points of the occluded areas formed by the vehicles and the building seen in the RGB image.

This process outputs individual occluded areas and the estimated types of hidden road agents within them. It consists of four steps. First, panoptic segmentation is conducted to obtain segmented objects with class labels, from which the semantic edges of each object can be extracted. Second, an edge warping technique is applied to obtain the depth edge of each object by integrating the semantic edges with the key points obtained in the previous process. Third, Graham's scan is used to construct a convex hull of each object's depth edges, reconstructing the occluded area. Lastly, a decision tree model is employed to estimate the types of potential hidden road agents. The model takes as input the spatial information of each occluded area, such as depth, height, and width, and its semantic information, i.e., the category label obtained from the panoptic segmentation.
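As a minimal sketch of the convex-hull step, the following implements Graham's scan for the depth-edge points of a single object. It assumes the points have already been projected onto a 2-D plane; the function name, variable names, and example coordinates are hypothetical.

```python
import numpy as np

def graham_scan(points):
    """Convex hull of one object's depth-edge points via Graham's scan.

    points: iterable of (x, y) coordinates already projected onto a 2-D plane.
    Returns the hull vertices in counter-clockwise order.
    """
    pts = [tuple(p) for p in points]
    pivot = min(pts, key=lambda p: (p[1], p[0]))   # lowest point, ties by x

    def cross(o, a, b):
        # > 0 when the turn o -> a -> b is counter-clockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    # Sort the remaining points by polar angle around the pivot,
    # breaking ties by squared distance.
    rest = sorted(
        (p for p in pts if p != pivot),
        key=lambda p: (np.arctan2(p[1] - pivot[1], p[0] - pivot[0]),
                       (p[0] - pivot[0]) ** 2 + (p[1] - pivot[1]) ** 2))

    hull = [pivot]
    for p in rest:
        # Pop points that would create a clockwise (or collinear) turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return np.array(hull)

# Example usage with hypothetical depth-edge points of one object.
edge_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(graham_scan(edge_points))  # interior point (1, 1) is excluded
```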

3 Evaluation

3.1 Method

To evaluate the proposed methodology, we assessed the detection accuracy of occluded areas and the estimation accuracy of hidden road agent types. For detection accuracy, the average Intersection over Union \(\overline{IoU}\) was adopted as the evaluation metric, expressed as Eq. 1, where \(IoU_{i}\) denotes the IoU of area i and n denotes the number of areas. The IoU between each detected area and its manually labeled ground truth was evaluated. For estimation accuracy, precision was adopted as the evaluation metric. For hidden road agent estimation, the model was trained on the ground truth dataset and tested on the same dataset.

$$\begin{aligned} \overline{IoU} = \frac{1}{n}\sum _{i=1}^{n}IoU_{i} \end{aligned}$$
(1)
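For reference, a minimal sketch of this metric over polygonal areas is shown below; the use of Shapely and the index-wise pairing of detected and ground-truth areas are illustrative assumptions.

```python
from shapely.geometry import Polygon

def mean_iou(detected, ground_truth):
    """Average IoU over pairs of detected / ground-truth occluded areas.

    detected, ground_truth: lists of vertex lists (one polygon per area),
    assumed here to be paired by index.
    """
    ious = []
    for det_pts, gt_pts in zip(detected, ground_truth):
        det, gt = Polygon(det_pts), Polygon(gt_pts)
        inter = det.intersection(gt).area
        union = det.union(gt).area
        ious.append(inter / union if union > 0 else 0.0)
    return sum(ious) / len(ious)

# Example with hypothetical polygons (IoU of the single pair is 0.5)
print(mean_iou([[(0, 0), (2, 0), (2, 2), (0, 2)]],
               [[(0, 0), (2, 0), (2, 1), (0, 1)]]))
```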

The method was evaluated on four scenarios (S1–S4) from the KITTI dataset [8]. Ground truth labels were annotated manually: occluded areas were annotated by analyzing the point cloud data, and the types of hidden road agents were annotated by analyzing each area in the RGB images.

3.2 Results

Table 1 shows the \(\overline{IoU}\) results for the different scenarios. The results are acceptable, with \(\overline{IoU}\) exceeding 0.5 in certain scenarios. Figure 3 compares the detected areas and ground truths for a sample scene. The method detected the same individual occluded areas as the ground truth; however, the detected areas were larger than the ground truths, which enlarged the union relative to the intersection and consequently lowered the IoU.

Table 1. Evaluation results of \(\overline{IoU}\).
Fig. 3. Comparison of detected occluded areas and ground truths for the scene shown in Fig. 2. Red polygons are the detected occluded areas and blue polygons are the ground truths.

Fig. 4. Representative result of panoptic segmentation failing to distinguish foreground and background buildings as different objects. The top image is the original RGB image, the middle image is the panoptic segmentation result, and the bottom image is the segmentation result of the building on the left side.

For the estimation of hidden road agent types, we achieved a precision of approximately 0.97. This high value can be attributed to the semantic information obtained from panoptic segmentation: adding semantic labels alongside the original geometric information produces a higher-dimensional feature space with better separability of the classes.
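For illustration, the sketch below builds such a combined feature vector and fits a decision tree with scikit-learn; the feature values, class labels, and use of one-hot encoding are illustrative assumptions, not the data or exact setup of this study.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical spatial features per occluded area: depth, height, width (m)
spatial = np.array([[12.4, 1.6, 4.2],
                    [25.1, 2.8, 10.5],
                    [ 8.0, 1.5, 1.8]])
# Semantic label of the occluding object from panoptic segmentation
semantic = np.array([["car"], ["building"], ["car"]])
# Hidden road agent type to estimate (hypothetical labels)
agent_type = np.array(["pedestrian", "vehicle", "pedestrian"])

# One-hot encode the categorical label and stack it with the spatial
# features, giving the higher-dimensional feature vector described above.
encoder = OneHotEncoder()
features = np.hstack([spatial, encoder.fit_transform(semantic).toarray()])

clf = DecisionTreeClassifier(random_state=0).fit(features, agent_type)
print(clf.predict(features))
```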

4 Discussion

Although the evaluation metric values were not high, the detection results of the proposed methodology were satisfactory and close to the ground truths, as shown in Fig. 3. This was possible because the key points of the occluded areas were aligned with the object edges, thanks to the semantic edges obtained from panoptic segmentation. It was thus confirmed that panoptic segmentation is effective for the occluded area reconstruction process.

At the same time, however, the method showed sub-optimal performance primarily due to limitations of the panoptic segmentation, which struggled to separate background objects. Figure 4 shows a sample result in which the foreground and background buildings on the left side were not distinguished. Because the foreground building was not segmented as a single object, the method failed to obtain the key points of the occluded area it formed on the left side. The same type of failure was observed for vegetation such as trees. To overcome this issue, the segmentation representation must be updated so that background objects are treated as separate objects.

5 Conclusions

This study proposed an occluded area detection method that employs panoptic segmentation to improve our previous methodology based on camera and LiDAR sensor fusion. The results showed that the proposed method achieves satisfactory detection and high precision in estimating hidden road agent types. They also indicate that combining the geometric and semantic features obtained through sensor fusion and panoptic segmentation offers a more holistic perspective for detecting occluded areas. We plan to expand the ground truth dataset to obtain more reliable evaluation results and to analyze the strengths of the proposed method in depth.