
1 Introduction

Traffic sign detection and recognition is a research hotspot in environment perception, one of the three major modules of unmanned driving, and plays an important role in it [1]. In foggy weather, however, traffic sign detection faces problems such as small and unclear targets. The algorithm therefore needs to combine high precision with real-time performance. At the same time, the training image data must be rich enough for the neural network model to learn the characteristics of traffic signs in different complex environments [2].

Yu fused the dark channel prior algorithm with MSR for defogging and used the two-stage Faster R-CNN detector to detect traffic signs in foggy environments; compared with one-stage detectors, this approach is slower and computationally heavier [3]. Xu used image enhancement for defogging and proposed an improved convolutional neural network design to recognize traffic signs. Image enhancement, however, does not remove the fog but merely sharpens the image, so this method works only under light and medium fog and performs poorly in dense fog. Chen et al. first removed the haze with the deep learning algorithm IRCNN and then proposed a multi-channel convolutional neural network (Multi-channel CNN) model to recognize the dehazed images [4]. However, deep-learning-based defogging requires a large number of images in the data set and is relatively slow. Moreover, none of the above works constructed a traffic sign data set for foggy environments.

2 Image Defogging Preprocessing

2.1 Data Set Construction

In research on traffic sign detection and recognition, performance is mostly tested on data sets such as the American traffic sign data set (LISA). However, most samples in these data sets were collected under good lighting conditions, and no domestic researcher has constructed and published a rich, comprehensive data set of Chinese traffic signs in foggy environments. To detect traffic signs with YOLOv3 in foggy conditions, this article therefore requires such a Chinese traffic sign data set [5].

Based on this, images for traffic sign detection in foggy environments were gathered in two ways: some clear traffic sign pictures were downloaded from the Internet, and others were photographed on the spot in heavy fog. The images are divided into a training set and a test set at a ratio of 8:2, for a total of 3415 images, of which 2390 are in the training set and 1025 in the test set. Each image is labeled with the LabelImg software. The label information includes the category attribute of the traffic sign, the illumination of the image, and the upper-left and lower-right coordinates of the sign bounding box (in pixels), saved in XML format. The data is divided into 3 categories: indication signs, prohibition signs, and warning signs.
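To make the annotation format concrete, the following is a minimal sketch of parsing one LabelImg-style Pascal VOC XML file and performing the 8:2 split. The class names and directory layout are illustrative assumptions, not artifacts published with the paper.

```python
# Minimal sketch: parse a LabelImg-style Pascal VOC XML annotation and
# split the annotated images 8:2 into training and test sets.
import random
import xml.etree.ElementTree as ET
from pathlib import Path

# The three categories described above; exact label strings are assumed.
CLASSES = ["indication", "prohibition", "warning"]

def parse_annotation(xml_path):
    """Return (filename, [(class, xmin, ymin, xmax, ymax), ...])."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return filename, boxes

def split_dataset(xml_dir, train_ratio=0.8, seed=0):
    """Shuffle annotation files reproducibly and split 8:2."""
    files = sorted(Path(xml_dir).glob("*.xml"))
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * train_ratio)
    return files[:n_train], files[n_train:]
```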

2.2 Dehazing Algorithm

Existing defogging algorithms fall into three main categories: the first is defogging based on image enhancement, the second is defogging based on image restoration, and the third is defogging based on deep learning [6].

This paper compares several algorithms; the dehazing results are shown in Fig. 2. The best visual effect is obtained by the DehazeNet algorithm, but its drawback is a long running time, 1.14 s on average. Therefore, considering the traffic sign detection scenario, this paper adopts the dark channel prior algorithm with guided filtering for image restoration [7]. The dark channel prior holds that in most non-sky local regions, at least one of the three RGB color channels of each image has a very low gray value, almost tending to zero. Following this principle, the dark channel map is obtained first; the atmospheric light value and the transmittance are then estimated from the dark channel map, and the transmission map is refined and optimized with a guided filter. Finally, substituting the result into the atmospheric scattering model yields the restored image. The steps of the algorithm are shown in Fig. 1, and an implementation sketch follows Fig. 2.

Fig. 1. Flow chart of dark channel restoration algorithm

Fig. 2. Comparison of dehazing effects
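The following is a minimal sketch of the pipeline in Fig. 1, assuming OpenCV with the contrib modules (for cv2.ximgproc.guidedFilter) is available. The patch size, the top-0.1% atmospheric light estimate, and the omega and t0 constants are common choices from the dark channel literature, not parameters reported in this paper.

```python
# Sketch of dark channel prior dehazing with guided-filter refinement.
import cv2
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel min over the color channels, then a local min filter."""
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)

def estimate_atmospheric_light(img, dark):
    """Average the input pixels at the brightest 0.1% of the dark channel."""
    n = max(1, int(dark.size * 0.001))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def dehaze(img_bgr, omega=0.95, t0=0.1, patch=15, radius=60, eps=1e-3):
    img = img_bgr.astype(np.float64) / 255.0
    A = estimate_atmospheric_light(img, dark_channel(img, patch))
    # Coarse transmission from the dark channel of the normalized image.
    t = 1.0 - omega * dark_channel(img / A, patch)
    # Refine the transmission map with a guided filter (guide = gray input).
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    t = cv2.ximgproc.guidedFilter(gray, t.astype(np.float32), radius, eps)
    t = np.clip(t, t0, 1.0)[..., None]
    # Invert the atmospheric scattering model I = J*t + A*(1 - t).
    J = (img - A) / t + A
    return np.clip(J * 255, 0, 255).astype(np.uint8)
```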

3 YOLOv3 Algorithm and Improvement

This article chooses the YOLOv3 model for this research because YOLOv3 improves on category prediction, bounding box prediction, multi-scale fusion prediction, and feature extraction [8]. Its mAP is comparable to RetinaNet's while being roughly 4 times faster, and it detects small objects significantly better than its predecessors. It is therefore well suited to detecting and recognizing traffic signs in complex environments [9].

3.1 YOLOv3 Detection Network

As shown by the dotted lines in Fig. 3, to improve accuracy on small targets, YOLOv3 downsamples the input image 5 times and predicts targets from the last 3 downsampling stages, producing feature maps at 3 different scales (outputs 1, 2 and 3). The side lengths of these output maps are 13, 26 and 52 respectively, each with a depth of 255 (in the standard COCO configuration, 3 anchors × (80 classes + 4 box offsets + 1 objectness score) per cell). The upsample-and-fuse approach of FPN (feature pyramid networks) is adopted; the advantage of upsampling within the network is that representational quality improves as the network deepens, so deeper object features can be used directly for prediction [10]. A minimal fusion sketch follows Fig. 3.

Fig. 3. Improved multi-scale prediction structure
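As a concrete illustration of the FPN-style fusion, the following PyTorch sketch upsamples a deep feature map 2× and concatenates it with a shallower one, in the manner of YOLOv3's route layers. The channel counts are illustrative assumptions, not values taken from the paper.

```python
# Minimal PyTorch sketch of YOLOv3-style multi-scale fusion.
import torch
import torch.nn as nn

class FuseBranch(nn.Module):
    """Upsample the deep branch 2x and concatenate with a shallow feature."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(deep_ch, deep_ch // 2, 1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.merge = nn.Conv2d(deep_ch // 2 + shallow_ch, out_ch, 3, padding=1)

    def forward(self, deep, shallow):
        x = self.up(self.reduce(deep))      # e.g. 13x13 -> 26x26
        x = torch.cat([x, shallow], dim=1)  # route-layer style concatenation
        return self.merge(x)

# e.g. fusing a 13x13x1024 deep map with a 26x26x512 shallow map:
fuse = FuseBranch(1024, 512, 256)
out = fuse(torch.randn(1, 1024, 13, 13), torch.randn(1, 512, 26, 26))
print(out.shape)  # torch.Size([1, 256, 26, 26])
```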

3.2 YOLOv3 Network Optimization

Improved Multi-scale Prediction YOLOv3 Model.

YOLOv3 uses only three feature scales, so the shallow information it exploits is not sufficient [11]. Aiming at the problems that traffic sign detection and classification in complex environments is affected by varying conditions and that the targets are small, an improved YOLOv3 deep neural network is designed, adding a fourth feature scale of 104 × 104. The thick lines in Fig. 3 show the improved multi-scale network structure.

The specific method is as follows: in the YOLOv3 network, after the feature layer with a detection scale of 13 × 13 has been upsampled twice, the original finest scale of 52 × 52 can be extended to 104 × 104. To make full use of both deep and shallow features, the 109th layer and the 11th layer of the feature extraction network are fused through a route layer. The remaining fusions are: the 85th and 97th layers, each output after 2× upsampling, are merged through route layers with the feature maps of the 61st and 36th layers respectively. The details of each feature layer are shown in Table 1, and a sketch of the added scale follows the table.

Table 1. YOLOv3 feature map
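Continuing the FuseBranch sketch above, the added fourth scale can be illustrated by upsampling the 52 × 52 branch once more and fusing it with a shallow 104 × 104 backbone feature (the 11th-layer output mentioned in the text). The channel sizes are assumptions for illustration.

```python
# Continues the FuseBranch sketch: one extra upsample-and-route step
# yields the new 104x104 detection feature of the improved network.
p52 = torch.randn(1, 256, 52, 52)        # existing finest-scale branch
shallow = torch.randn(1, 128, 104, 104)  # e.g. the 11th-layer backbone output
fuse4 = FuseBranch(256, 128, 128)
p104 = fuse4(p52, shallow)               # new fourth-scale feature map
print(p104.shape)  # torch.Size([1, 128, 104, 104])
```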

Mosaic Image Enhancement.

Traditional data augmentation methods enrich the data set only by changing the characteristics of individual images [12]. Mosaic image enhancement combines 4 random images into one new image for training the network, which increases the diversity of the data and the number of targets and provides a more complex and effective training background, while the original annotation information is preserved, as shown in Fig. 4. This further improves precision and recall. At the same time, because several images are fed to the network at once, the effective batch size of the input is increased: one image stitched from four images is equivalent to four original images input in parallel (batch size = 4), which lowers the hardware requirements for training and improves the efficiency of the mean and variance statistics in the BN (Batch Normalization) layer. A minimal sketch follows Fig. 4.

Fig. 4. Effect diagram of mosaic image enhancement algorithm
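The following is a minimal sketch of the Mosaic operation: four images are resized into the quadrants of one canvas and their boxes are shifted with them. A fixed 2 × 2 grid and a 416 output size are simplifying assumptions; many implementations jitter the center point instead.

```python
# Minimal Mosaic augmentation sketch: 4 images -> 1 stitched image.
import cv2
import numpy as np

def mosaic(images, boxes_per_image, out_size=416):
    """images: list of 4 BGR arrays; boxes: lists of [x1, y1, x2, y2, cls]."""
    half = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    merged = []
    for img, boxes, (ox, oy) in zip(images, boxes_per_image, offsets):
        h, w = img.shape[:2]
        canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
        sx, sy = half / w, half / h  # scale factors into the quadrant
        for x1, y1, x2, y2, cls in boxes:
            # Shift each annotation along with its quadrant.
            merged.append([x1 * sx + ox, y1 * sy + oy,
                           x2 * sx + ox, y2 * sy + oy, cls])
    return canvas, merged
```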

Loss Function.

The YOLOv3 loss is divided into three parts: the localization loss Lloc(l, g), the confidence loss Lconf(o, c), and the classification loss Lcla(O, C), as shown in formula (1):

$$ \begin{aligned} & L(o,c,O,C,l,g) \\ & = \,\lambda_{1} L_{conf} (o,c) + \lambda_{2} L_{cla} (O,C) + \lambda_{3} L_{loc} (l,g) \\ \end{aligned} $$
(1)

Among them, λ1, λ2, and λ3 are balance coefficients.

When performing bounding box regression with the intersection-over-union ratio (IoU), if the predicted box and the target box do not intersect, the IoU is zero by definition and no loss can be propagated. To remedy this defect, this paper introduces the CIoU loss function for bounding box regression. A good regression localization loss should consider three geometric factors: overlap area, center point distance, and aspect ratio. The calculation is shown in formula (2):

$$ {\text{CIoU}} = {\text{IoU}} - \left( {\frac{{\rho^{2} ({\text{b}},{\text{b}}^{gt} )}}{{c^{2} }} + \alpha v} \right) $$
(2)
$$ L_{CIoU} = 1 - CIoU $$
(3)

Among them, ρ is the Euclidean distance between the center points of the two boxes, c is the diagonal length of the smallest box enclosing both, α is a weight function, and ν measures the similarity of the aspect ratios; ν and α are defined in formulas (4) and (5).

$$ v = \frac{4}{{\pi^{2} }}(\arctan \frac{{w^{gt} }}{{h^{gt} }} - \arctan \frac{w}{h})^{2} $$
(4)
$$ \alpha = \frac{v}{(1 - IoU) + v} $$
(5)

Even when the predicted box does not overlap with the target box, CIoU still provides a moving direction for the bounding box, and directly minimizing the distance between the two boxes makes convergence much faster. Adding the aspect ratio term speeds up convergence further and improves performance.
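The following sketch transcribes formulas (2)-(5) directly for a single pair of axis-aligned boxes in (x1, y1, x2, y2) form; a production implementation would vectorize this over whole batches.

```python
# CIoU loss for one predicted box and one ground-truth box.
import math

def ciou_loss(box, gt):
    # Overlap area and IoU.
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_b = (box[2] - box[0]) * (box[3] - box[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_b + area_g - inter + 1e-9)
    # rho^2: squared distance between the two box centers.
    rho2 = ((box[0] + box[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((box[1] + box[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # c^2: squared diagonal of the smallest enclosing box.
    cx1, cy1 = min(box[0], gt[0]), min(box[1], gt[1])
    cx2, cy2 = max(box[2], gt[2]), max(box[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 + 1e-9
    # Aspect-ratio term v (formula 4) and weight alpha (formula 5).
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    ciou = iou - (rho2 / c2 + alpha * v)  # formula (2)
    return 1 - ciou                       # formula (3)
```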

Retraining Based on Transfer Learning.

The experiment adopts the idea of model transfer in transfer learning. Training the network model requires a large number of traffic sign images, but the database in this experiment contains only 3,415 images; insufficient image data would leave the network model under-fitted and ultimately reduce detection accuracy. This article therefore first initializes the network with a pre-trained model (trained on the COCO data set, from the YOLO official website) and then retrains it on the data set in this article. This greatly reduces training time and also reduces the probability of model divergence and over-fitting. A pre-trained model contains a large amount of weight information and feature data that can usually be shared across different tasks [13]; transfer learning transfers this specific and common feature information so it does not have to be relearned, achieving rapid learning.
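A hedged sketch of this weight transfer is shown below: COCO-pretrained tensors are copied wherever names and shapes match, so the backbone is initialized while the resized detection heads (3 traffic sign classes instead of 80) start fresh. The model object and checkpoint path are placeholders, not artifacts from the paper.

```python
# Transfer-learning initialization: copy matching pretrained tensors.
import torch

def load_pretrained(model, ckpt_path="yolov3_coco.pt"):
    # Assumes the checkpoint is a plain state_dict of named tensors.
    pretrained = torch.load(ckpt_path, map_location="cpu")
    own = model.state_dict()
    kept = {k: v for k, v in pretrained.items()
            if k in own and v.shape == own[k].shape}
    own.update(kept)
    model.load_state_dict(own)
    print(f"transferred {len(kept)}/{len(own)} tensors")
    return model
```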

4 Evaluation of Training Results

4.1 Experimental Environment and Data

See Tables 2 and 3.

Table 2. Experimental environment configuration
Table 3. Configuration file parameters

4.2 Evaluation Indicators

The evaluation indicators are the mean Average Precision (mAP) over all traffic sign classes in a complex environment and the time required for each picture, t = 1/N, in ms. First, the confusion matrix must be understood, as shown in Table 4 [14]:

Table 4. Confusion matrix

Calculate precision and recall:

$$ precision = \frac{TP}{{TP + FP}} $$
(6)
$$ recall = \frac{TP}{{TP + FN}} $$
(7)

In the formulas, TP, FN, FP and TN denote, respectively, positive samples that are correctly detected, positive samples that are missed (incorrectly classified as negative), negative samples that are incorrectly detected as positive, and negative samples that are correctly rejected.

mAP: The calculation of mAP has two steps. First, the average precision AP (Average Precision) of each category is calculated; second, the per-category AP values are averaged. The definitions are as follows:

$$ AP_{i} = \sum\limits_{k = 1}^{N} {P(k)\Delta r(k)} $$
(8)
$$ mAP = \frac{1}{m}\sum\limits_{i = 1}^{m} {AP_{i} } $$
(9)

where m is the number of categories. The evaluation indices are mAP and the time required to detect one picture: a higher mAP indicates a better detection effect, and a shorter detection time indicates a faster detector.
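A worked sketch of formulas (6)-(9): precision and recall from the confusion counts, AP as the sum of P(k)·Δr(k) over a confidence-ranked detection list, and mAP as the mean over classes.

```python
# Evaluation metrics transcribed from formulas (6)-(9).
def precision_recall(tp, fp, fn):
    """Formulas (6) and (7)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(ranked_hits, n_gt):
    """ranked_hits: detections sorted by confidence, True if matched (formula 8)."""
    ap, tp, prev_recall = 0.0, 0, 0.0
    for k, hit in enumerate(ranked_hits, start=1):
        tp += int(hit)
        recall = tp / n_gt
        ap += (tp / k) * (recall - prev_recall)  # P(k) * delta r(k)
        prev_recall = recall
    return ap

def mean_ap(aps):
    """Formula (9): average the per-category APs."""
    return sum(aps) / len(aps)
```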

4.3 Improved YOLOv3 Algorithm Test

To compare the detection effect of the improved network, the collected Chinese traffic sign detection data set was used to train and test the improved YOLOv3 network model and an SSD model. The precision/recall curves of the three categories are shown in Fig. 5. The precision and recall of the improved network are better than those of the original YOLOv3 model, and the SSD model has the lowest precision. The average precisions of the improved network on the three categories are 85.82%, 80.56% and 80.12%, all higher than the YOLOv3 results. In terms of real-time performance, on 416 × 416 images the standard YOLOv3 and the improved YOLOv3 of this article require 31.4 ms and 34.2 ms per image respectively, which meets the real-time requirement (Table 5).

Fig. 5. Precision-recall curves

Table 5. Comparison of AP value, mAP and running time of the three categories

4.4 Experiment to Improve the Detection Ability Under Foggy Conditions

The experiment is divided into 3 groups, as shown in Table 6. In the first group, the training set and test set both consist of the original pictures, serving as the baseline for the other models. In the second group, the training set consists of foggy images restored by the guided-filtering dark channel algorithm, while the test set remains unchanged. In the third group, both the training set and the test set use the restored images.

Table 6. Data set classification
Table 7. Comparison of AP value and mAP value

Table 7 shows that the AP and mAP values of the first group are slightly better than those of the second group, but the overall difference is small. Compared with the first two groups, the mAP of the third group is about 2.5% higher, so applying restoration-based dehazing to both the training set and the test set gives the best detection effect.

5 Conclusion

This paper constructs a training data set for traffic sign detection in foggy environments. The dark channel prior algorithm with guided filtering adds an image restoration step that strengthens detection in heavy fog. On top of the YOLOv3 network, a Mosaic image enhancement training method is introduced to address the small and insufficient data set, improving training efficiency and model accuracy. To address YOLOv3's poor detection in complex environments, an improved YOLOv3 algorithm with an additional feature scale is proposed. To handle the small, blurred and poorly localized targets in foggy conditions, the loss function of the detector is redesigned with the CIoU loss, further improving its detection accuracy for traffic signs in fog. Finally, given the limited number of samples and the resulting low accuracy, transfer learning is adopted for training. The detection effect is thereby greatly improved.