1 Introduction

Advances in the automobile industry have put many smart cars on the road in diverse traffic environments, with features such as automated parking assistance. Ubiquitous devices help drivers locate themselves on the road and understand the current state of road traffic, the distance to be travelled, the best route, and so on. Cruise control supports the driver, but it still demands considerable attention. Today’s vehicles are equipped with advanced driving assistance systems (ADAS) that make driving safer and more comfortable. A camera mounted on the car lets the system detect and recognise traffic signs and alert the driver to road conditions, such as construction work ahead or a change in speed limit. Identifying traffic signs in real time is complex and subject to many limitations. A small mistake in identifying a traffic sign can have disastrous, even fatal, consequences. Traffic Sign Recognition (TSR) is therefore critical in ADAS because it helps the driver drive safely and reduces the accident rate [2]. It requires substantial computing power and must respond to traffic in real time. ADAS uses a vehicle-mounted camera to capture traffic signs on the actual road, recognise and interpret them, and control the vehicle accordingly [7]. The task faces many challenges, such as complex backgrounds, changing natural conditions, ageing signs, and changes in light [23].

Many researchers address these challenges with deep learning, exploiting the information inherent in the images. TSR assists in detecting traffic signs on the roadside: it identifies the location, size, and orientation of the signboard and classifies the sign from its discriminative information. Deep learning algorithms are critical in resolving traffic sign recognition challenges. A deep learning model improves the driving experience and supports safe driving in ADAS. ADAS relies on multiple cameras and sensors to acquire information about the driving environment and conditions. It interprets the traffic signs and informs users of the applicable driving rules [3].

Several object detection algorithms, such as SSD, Faster RCNN, RFCN, and YOLO, are used for ADAS on mobile devices. YOLO has proven to be a viable competitor to other CNN-based detectors in real-time object detection. Related work shows that detecting smaller objects moving at higher speeds has always been a challenge. YOLO V3 and YOLO V4 tiny show significant developments in this capability [22]. An enhanced Local Binary Pattern has been used to detect faces in images that are split into sub-images [20]. An improved random sampling mechanism called Supervised Relative Random Sampling has been used to create a balanced dataset from a highly imbalanced one [17]. Even though numerous applications have been developed, supervised learning and pattern recognition remain crucial areas of research in information retrieval, knowledge engineering, and image processing [16]. The MobileNet V2 model was shown to deliver enhanced accuracy on lightweight computational devices [21]. In this work, a well-organised system is proposed to detect traffic signs and then categorise them. Image augmentation helps to generate more training data. The customized CNN model used to classify road signs has more neurons in each layer than the pre-existing CNN model.

The latest vehicles have built-in technology and the computing power to make driving more comfortable. Smart driving has many advantages, but also some disadvantages. The goal is to identify the traffic sign and process the image in minimal time, which calls for state-of-the-art methods and technologies. To provide a solution for detecting traffic signs, the German Traffic Sign Recognition Benchmark (GTSRB) collection is used; it comprises 60,000 images in 43 classes of traffic signs commonly used in most European countries. A custom Convolutional Neural Network model is used to classify the traffic signs with higher accuracy than the existing models. Image augmentation techniques expand the dataset artificially, which allows the model to learn how a sign looks from different perspectives, such as when it is viewed from different angles or appears blurry due to poor weather conditions. This also makes the dataset more balanced. A Lucy-Richardson nonlinear filter is used to remove noise from images without blurring the edges, and histogram equalisation is used to brighten the images. Recent deep learning detectors, YOLO v3 and YOLO v4-tiny, are used to provide faster detection rates without sacrificing classification accuracy.

The remaining sections are organised as follows: Section 2 summarises the literature review; Section 3 explains the technique and how the models function; Section 4 presents the experimental results; and Section 5 presents the conclusion and future scope.

2 Literature review

In a fast-moving life, people sometimes fail to recognize road signs and consequently violate the rules, and in the majority of cases the consequences are severe. Several studies have been undertaken in this area in order to limit the number of fatalities. Researchers have employed several techniques, including HOG, SIFT, LBP, and SVM, to classify traffic signs and notify drivers, but the recognition rate of these algorithms is not satisfactory. Recently, various CNN models have evolved for traffic sign recognition [24]. Cireşan et al. applied augmentation techniques to improve the performance of CNNs and multi-layer perceptrons [5]. To distinguish small items from signboards, Yuan et al. [25] suggested a sign detection architecture based on a multi-resolution hybrid system, in which a vertical spatial sequence attention (VSSA) module gathers additional relevant details. Mobile applications incorporate Augmented Reality and GPS-based tracking, so users can use their smartphone’s coordinates to find nearby resources in an accessible way based on the direction. The AlexNet structure of CNN produced a 92.63% accuracy rate [9]. The GoogLeNet architecture makes it easier to work with massive data sets and a high number of parameters. On the other hand, large data causes network overfitting, lowering the performance to 80.5% [4].

VGG CNN outperformed all the other existing models by a wide margin. Their model improved the performance by lowering the hyperparameters. Additionally, the network’s batch normalization and global average pooling layers help improve performance without adding too many parameters. Combining the Faster-RCNN architecture with Online Hard Examples Mining makes the system efficient in detecting tiny symptoms of impairment by removing the pool4 layers and employing compression for ResNet [8]. A TSR method based on the LeNet-5 network improved the overall performance; the classifier in the LeNet-5 network performs better than the CNN, SVM, and classic Gabor classifiers [26]. In another CNN-based TSR implementation, feature extraction was carried out using a CNN and classification was performed using MPPs, which greatly improved the recognition accuracy [18]. The YOLO algorithm has also been applied and compared with existing computer vision algorithms, with vehicles, pedestrians, traffic signs, and lights among the objects detected. A model pre-trained on the COCO dataset was used to establish the model’s weights. The model showed significantly better results than the pre-existing algorithms when applied in real-world scenarios [6, 13]. The YOLO algorithm was also used to detect triple riding and speed violations in traffic systems [14].

CNNs showed excellent results earlier in traffic sign classification [15] and have recently been used in traffic sign detection. Yihui Wu et al. applied deep learning to reject non-traffic signs [10]. A fully convolutional network combined with a deep CNN for classification has been extended to recognize traffic signs [1]. A combined CNN-LSTM network model was studied to improve the detection rate of the novel coronavirus from X-ray images; the authors proposed an automated detection system that can improve the diagnostic rate, and its overall performance, measured by the area under the ROC curve, is 99.8% [11]. An automated face mask detection method based on deep learning was introduced for a smart city environment, and its classification network provides 98.7% accuracy [19]. Accidental falls are increasing day by day; several deep learning techniques, such as convolutional neural networks, auto-encoders, and Long Short-Term Memory, have been used to detect them and to improve on the performance of existing methods [12]. Table 1 presents a summary of the various findings with their drawbacks in the existing models.

Table 1 Summary of the literature survey

Researchers face challenges in detecting and classifying signs in images that are blurry or captured in poor weather conditions; this article addresses those challenges. The dataset is expanded using image augmentation to get the expected results even for poor-quality images. The Lucy-Richardson nonlinear filter is used to remove noise from images without blurring the edges, and histogram equalization is used to brighten the images, thereby enabling the proposed model to detect signs even in poor weather conditions. The second issue addressed is the slower response time. The proposed model employs YOLO v3 and YOLO v4-tiny to provide faster detection rates without sacrificing classification accuracy.

3 Methodology

The proposed system, as shown in Fig. 1, has five stages: capturing the video, preprocessing the video, removing the noise from the extracted images, detecting the traffic signs, and identifying traffic signs in the video frames. HD cameras are mounted on the cars to capture video, which is then sampled according to a particular frame rate. The images are enhanced to compensate for the environmental conditions. The detection stage must be quick and efficient for a real-time detection system to function correctly. The output of the detection stage is fed into the noise removal stage, where a median filter is applied to the images for smoothing. The images are then deblurred using the Lucy-Richardson filter, and histogram equalization is applied to increase their brightness. In the recognition stage, the proposed custom CNN model is used for classification. The YOLO algorithm is used to detect signs, and the trained custom CNN model is used to classify the captured signs into their sub-classes.

Fig. 1 Proposed model

Algorithm 1 Proposed approach

Once the images are collected using the object detection model, they are divided into three sub-classes, as illustrated in Fig. 2.

Fig. 2 Sign classification

3.1 Dataset

The GTSRB dataset is used to train the custom CNN model in the proposed system. Figure 3 shows the forty-three traffic sign classes used for training the model. Each image in this dataset is 32 × 32 pixels. Figure 3 also shows that the signs in this dataset are divided into four main classes (prohibitory, danger, mandatory, and others).

Fig. 3 43 classes of images divided into 4 categories

3.2 Image augmentation

Because the dataset is imbalanced, a model trained on it directly would produce unreliable predictions. To balance the dataset, images above a threshold of 250 images per bin are randomly removed from each class. Image augmentation is then applied: each image is randomly zoomed, panned, and brightened, as shown in Fig. 4, and the resulting images are added to expand the balanced dataset.
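The balancing and augmentation steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the random ranges for zoom, pan, and brightness, the nearest-neighbour zoom, and the wrap-around pan are all assumptions chosen to keep the example short.

```python
import random

import numpy as np


def cap_per_class(labels, threshold=250, seed=0):
    """Return sorted sample indices, keeping at most `threshold` per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    kept = []
    for indices in by_class.values():
        if len(indices) > threshold:
            # Randomly drop the images above the threshold.
            indices = rng.sample(indices, threshold)
        kept.extend(indices)
    return sorted(kept)


def augment(image, rng, max_shift=3, max_zoom=0.2, max_brightness=0.3):
    """Randomly zoom, pan, and brighten one HxWxC uint8 image (assumed ranges)."""
    h, w = image.shape[:2]
    # Zoom: crop a centred window, then resize back by nearest-neighbour indexing.
    zoom = 1.0 + rng.uniform(0.0, max_zoom)
    ch = max(1, int(round(h / zoom)))
    cw = max(1, int(round(w / zoom)))
    top, left = (h - ch) // 2, (w - cw) // 2
    rows = top + (np.arange(h) * ch // h)
    cols = left + (np.arange(w) * cw // w)
    out = image[rows][:, cols]
    # Pan: shift by a few pixels (wraps around the border, a simplification).
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    out = np.roll(out, (dy, dx), axis=(0, 1))
    # Brighten: scale intensities and clip back to the valid range.
    gain = 1.0 + rng.uniform(0.0, max_brightness)
    return np.clip(out.astype(np.float32) * gain, 0, 255).astype(np.uint8)
```

In use, `cap_per_class` would be applied to the label list first, and `augment` would then be called on each retained image with a shared `np.random.default_rng(...)` generator.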

Fig. 4 Zoomed, panned, and brightened images

3.3 Image acquisition and pre-processing

Traffic sign video is captured with an HD camera installed on the vehicle. With its infrared low-light capabilities, the device can also capture high-quality images at night. The image reduction process samples the video at a certain sampling rate in order to obtain the image frames. After each image frame is pre-processed, it is sent for detection and classified into one of the four defined groups. The detected sign is then compared with the existing sign. The frequency of processed frames can be raised or lowered depending on the speed of the moving vehicle.
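Sampling the video at a chosen rate can be sketched as below. The text does not state how the rate is derived from the vehicle speed, so the sketch only shows the stride computation; `video_fps` and `target_fps` are hypothetical parameters, and raising `target_fps` at higher speeds would be one way to follow the idea in the paragraph above.

```python
def frame_stride(video_fps, target_fps):
    """Captured frames to advance per processed frame (at least 1)."""
    if video_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, round(video_fps / target_fps))


def sampled_indices(total_frames, video_fps, target_fps):
    """Indices of the frames that would be passed on to pre-processing."""
    return list(range(0, total_frames, frame_stride(video_fps, target_fps)))
```

For example, a 30 fps video processed at 10 fps keeps every third frame.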

Images may appear hazy since videos are recorded with cameras mounted in speeding vehicles. Additionally, weather conditions, time of day, location, and brightness affect the clarity, contrast, and brightness of the images. This makes detection challenging, so some pre-processing is necessary to reduce the impact of these factors under various settings.

When captured from a moving vehicle, images have a blurred effect, and edges appear vague and washed out. Motion blur can be removed efficiently using frequency-domain techniques. Due to atmospheric turbulence, images can also be corrupted by white Gaussian noise. Motion blur and environmental disruption are shown in Fig. 5a. Figure 5b displays an image that has been median filtered, processed using the Lucy-Richardson filter, and then histogram equalized to compensate for various lighting conditions.
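The three pre-processing steps named above (median filtering, Lucy-Richardson deblurring, histogram equalization) can be sketched in NumPy as follows. These are generic textbook versions for a single-channel 8-bit image, not the authors' implementation; the 3 × 3 median window, the point spread function `psf`, the iteration count, and the circular boundary handling of the FFT-based convolution are all assumptions.

```python
import numpy as np


def median_filter3(gray):
    """3x3 median filter with edge replication."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    return np.median(windows, axis=0).astype(gray.dtype)


def richardson_lucy(observed, psf, iterations=10, eps=1e-7):
    """Lucy-Richardson deconvolution using circular (FFT) convolution."""
    observed = observed.astype(np.float64)
    # Embed the normalised PSF in an image-sized array centred at index (0, 0).
    ph, pw = psf.shape
    padded = np.zeros_like(observed)
    padded[:ph, :pw] = psf / psf.sum()
    padded = np.roll(padded, (-(ph // 2), -(pw // 2)), axis=(0, 1))
    otf = np.fft.rfft2(padded)
    estimate = np.full_like(observed, observed.mean())
    for _ in range(iterations):
        blurred = np.fft.irfft2(np.fft.rfft2(estimate) * otf, s=observed.shape)
        ratio = observed / (blurred + eps)
        # Correlate the ratio with the PSF (conjugate in the frequency domain).
        estimate *= np.fft.irfft2(
            np.fft.rfft2(ratio) * np.conj(otf), s=observed.shape
        )
    return estimate


def equalize_histogram(gray):
    """Histogram equalization of a uint8 image via its cumulative histogram."""
    cdf = np.bincount(gray.ravel(), minlength=256).cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = max(int(cdf[-1] - cdf_min), 1)
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255)
    return lut.astype(np.uint8)[gray]


def preprocess(gray, psf, iterations=10):
    """Median filter, then deblur, then equalize, in the order described."""
    deblurred = richardson_lucy(median_filter3(gray), psf, iterations)
    return equalize_histogram(np.clip(np.round(deblurred), 0, 255).astype(np.uint8))
```

For horizontal motion blur, a plausible choice of `psf` would be a short row of ones such as `np.ones((1, 5))`; the appropriate length depends on the vehicle speed and exposure time and would have to be estimated.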

Fig. 5 (a) Blurred image, (b) pre-processed image

3.4 Building the custom CNN model

As shown in Fig. 6, a CNN model for classifying 43 sub-classes is trained using Keras and TensorFlow. The model is trained on mini-batches with the Adam optimizer. Figure 7 shows the division of the dataset into training, validation, and test sets. As this is a categorical classification, the labels are converted into a one-hot encoded array, as shown in Fig. 8. The first column represents the class ID, where 0 represents prohibitory, 1 represents danger, 2 represents mandatory, and 3 represents others. The other column displays the actual bounding box for each class (x, y, w, h) within the image. The model is trained with a learning rate of 0.0001, the categorical cross-entropy loss function, a batch size of 32, and 10 epochs.
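The one-hot encoding and the categorical cross-entropy loss mentioned above can be illustrated without a deep learning framework. The sketch below is a NumPy illustration of the two definitions only; it does not reproduce the Keras model of Fig. 6, and the function names are hypothetical.

```python
import numpy as np


def one_hot(labels, num_classes=43):
    """Convert integer class IDs into a one-hot encoded array."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy between one-hot labels and probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))
```

A model that assigns equal probability to all 43 classes has a loss of ln 43, roughly 3.76, which gives a reference point when reading training curves.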

Fig. 6 Customized convolutional neural network model

Fig. 7 Distribution of training, validation, and testing for each class

Fig. 8 Categorical classification

3.5 YOLO model for object detection

YOLO is one of the best algorithms for real-time object detection. The pre-trained YOLO can detect and classify the 80 object classes of COCO (Common Objects in Context). In the proposed model, however, the YOLO architecture is modified to fit a dataset with a different number of classes. Two versions of YOLO are used, YOLOv3 and YOLOv4-tiny. Both versions incorporate the same neural network concept used in YOLOv1, which unifies object detection by dividing the input image into a grid. The centre of every grid cell determines its bounding box, and each box is described by four values, where ‘tx’ and ‘ty’ are the centre x and y coordinates, and ‘tw’ and ‘th’ are the width and height of the box, as shown in Fig. 9.

Fig. 9 Bounding boxes in YOLO v3
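How the raw outputs ‘tx’, ‘ty’, ‘tw’, and ‘th’ become a box can be sketched with the decoding used in YOLO v3: the centre is a sigmoid offset inside the grid cell, and the width and height scale a prior (anchor) box exponentially. The function name, the argument names, and the convention of expressing anchors in grid-cell units are assumptions for illustration.

```python
import math


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, grid_size):
    """Decode raw YOLO v3 outputs into a box normalised to the range [0, 1].

    cell_x, cell_y: column and row of the grid cell that predicts the box.
    anchor_w, anchor_h: prior box size in grid-cell units (assumed convention).
    """
    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    # The centre is constrained to lie inside the predicting cell.
    bx = (sigmoid(tx) + cell_x) / grid_size
    by = (sigmoid(ty) + cell_y) / grid_size
    # Width and height rescale the anchor box.
    bw = anchor_w * math.exp(tw) / grid_size
    bh = anchor_h * math.exp(th) / grid_size
    return bx, by, bw, bh
```

With all four raw outputs at zero, the box sits at the centre of its cell and has exactly the anchor's size.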

Furthermore, each bounding box predicts an object label and a confidence score based on its location. The output tensor has a fixed dimension determined by S, the size of the grid, B, the number of bounding boxes per cell, and C, the number of classes. The output from each bounding box is the label with the highest score. A bounding box is discarded if its output label confidence falls below a user-specified threshold, which is 0.68 here. YOLO v3 uses binary cross-entropy loss to perform multi-label classification, which generates a probability for the observed object to belong to each label. The output tensor size is calculated using the following relation, where 3 is the number of bounding boxes per cell, 4 is the number of box coordinates, 1 is the confidence score, and n is the number of classes:

$$S \times S \times [3 \times (4 + 1 + n)]$$

The number of classes is 4 (prohibitory, danger, mandatory, and others), so each cell of the S × S grid has 3 × (4 + 1 + 4) = 27 filters. In the configuration, the batch size and the number of subdivisions were increased, the number of classes was lowered to 4, and the number of filters was lowered from 255 to 27. After these modifications, both YOLO v3 and YOLO v4-tiny are trained on the input images gathered by the vehicle-mounted camera.
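The filter count and the confidence threshold described above can be checked with a few lines of Python. This is a small sketch of the arithmetic only; the function names and the dictionary layout of a detection are hypothetical.

```python
def yolo_filters(num_classes, boxes_per_cell=3):
    """Filters per grid cell: boxes x (4 coordinates + 1 confidence + classes)."""
    return boxes_per_cell * (4 + 1 + num_classes)


def output_shape(grid_size, num_classes, boxes_per_cell=3):
    """Shape of the S x S x filters output tensor for one detection scale."""
    return (grid_size, grid_size, yolo_filters(num_classes, boxes_per_cell))


def keep_confident(detections, threshold=0.68):
    """Discard detections whose confidence falls below the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]
```

With the 80 COCO classes the count is 255, and with the 4 sign categories it is 27, matching the change from 255 to 27 noted in the text.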

3.6 Implementation

Once the model has been trained, it is saved and used for predictions. In the first stage, the YOLO model is applied to a video shot by the vehicle’s camera. Frames are extracted from that video and passed to the CNN model, which classifies each detected sign as one of the 43 classes. The user gets a “No Sign Detected” prompt if a particular image does not contain a traffic sign. The “model predict” function returns a list of values, one confidence score between 0 and 1 per class, from which the class of the image is determined. Multiple iterations revealed that, even when an image does not belong to any of the 43 classes, the model still assigns it to one of them, but with a relatively low confidence score. Therefore, the confidence threshold for separating actual traffic signs is set to 0.68. As soon as an image has been classified, the sign text and its image are displayed to the driver on the vehicle’s on-board display, as shown in Fig. 10a and b.

Fig. 10 (a) Original image, (b) output image
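The decision between a class label and the “No Sign Detected” prompt can be sketched as follows. The function takes the list of per-class scores that a prediction call returns; the function name and the `class_names` list are hypothetical, and only the 0.68 threshold comes from the text.

```python
NO_SIGN = "No Sign Detected"


def label_prediction(probabilities, class_names, threshold=0.68):
    """Return the best class name, or the no-sign prompt below the threshold."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return NO_SIGN
    return class_names[best]
```

An image whose highest score stays below 0.68 is therefore reported as containing no sign, even though the classifier internally still ranks one of the 43 classes first.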

4 Experimental result analysis

The experiments were carried out on a computer with an Intel Core i5 processor, 8 GB of RAM, an AMD Radeon RX 5300 graphics card, and a 2 TB hard drive. The traffic sign identification scripts were developed in Python 3.8.9 using OpenCV, NumPy, Pandas, PIL, and Darknet, among other libraries. The YOLO v3 model obtained a detection rate of 1.53 frames per second, whereas the YOLO v4-tiny model obtained a detection rate of 7.72 frames per second, as shown in Fig. 11a and b.

Fig. 11 Comparison between YOLO v3 and YOLO v4-tiny

The findings are based on 29330 training samples collected at random using a video camera mounted on a moving vehicle and on which the model conducted its detection procedure.

A total of 51900 images were present in the original dataset, but to balance the dataset, the image count was reduced to 30000. After image augmentation, the image count increased to 41900. 12570 images were used for testing and validation.
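As a quick consistency check of these counts, assuming the 29330 training samples mentioned earlier are drawn from the 41900 augmented images, the figures correspond to a 70/30 split between training and the combined testing and validation sets:

$$41900 - 29330 = 12570, \qquad \frac{29330}{41900} = 0.70, \qquad \frac{12570}{41900} = 0.30$$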

A comparative study was made between YOLO v3 and YOLO v4-tiny based on the quantitative measures shown in Fig. 12. Training and validation accuracy and loss for 10 epochs are shown in Fig. 13.

Fig. 12 Comparison between YOLOv3 and YOLOv4-tiny

Fig. 13 Training and validation accuracy and loss for 10 epochs

The proposed model’s accuracy is 95.85%, its F1 score is 0.9528, and its precision is 0.91. A comparative analysis is performed to compare the efficiency of the existing systems with the suggested method. Existing models fail to achieve high accuracy when the images are captured in poor weather conditions or are blurry due to the fast motion of the vehicle.
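For reference, the measures quoted above follow from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the function name and the example counts in the usage note are illustrative and are not taken from the experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For instance, 90 true positives, 10 false positives, 5 false negatives, and 95 true negatives give a precision of 0.90 and an F1 score of about 0.923.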

Compared to the MobileNets and Inceptionv3 algorithms, the proposed model implementing YOLOv3 and YOLOv4-tiny has produced the highest accuracy, as shown in Fig. 14.

Fig. 14 Analysis of the proposed method in comparison to the existing algorithms

ROC curves are appropriate when the observations are balanced between the classes. As the data is equally balanced between the classes, an ROC curve is drawn for the proposed methodology to compare the overall performance. The German Traffic Sign Recognition Benchmark dataset was classified with good sensitivity, specificity, and F1-score (99.2%, 99.1%, and 98.85%, respectively). The sensitivity value (99.2%) means that the number of false negatives is low, while the specificity value (99.1%) means that the number of true negatives is high relative to the number of false positives.

Furthermore, the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) to compare the overall performance, as shown in Fig. 15. The area under the ROC curve (AUC) was calculated to be 99.3% for the proposed customised CNN model. The proposed network was observed to perform best among the compared models, with good efficiency.

Fig. 15 ROC curve

For ROC curve:

$$\text{True Positive Rate} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
$$\text{False Positive Rate} = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}$$
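The two rates above can be turned into an ROC curve by sweeping a decision threshold over the classifier scores, and the AUC then follows from the trapezoidal rule. The sketch below handles the binary case (one class against the rest) in plain Python; the function names are hypothetical, and how the multi-class curve of Fig. 15 was aggregated is not stated in the text.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold over the scores.

    labels: 1 for the positive class, 0 otherwise; both classes must occur.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ordered = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Samples with equal scores cross the threshold together.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

A classifier that ranks every positive above every negative reaches an AUC of 1.0, while random scores give a value near 0.5.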

5 Conclusion and future scope

The proposed approach automatically detects and classifies various traffic signs. The system uses the YOLO v3 algorithm and a customized CNN model. Based on experimental results and quantitative measures assessing multiple scenarios, the proposed method is shown to be reliable for implementing the system. Image augmentation techniques and filters such as Lucy-Richardson remove the noise from the images without blurring the edges, thus enabling the system to detect traffic signs in poor weather conditions. This is a significant contribution, since it makes driving easier and safer.

Furthermore, this system can be integrated without the need for a lot of hardware, which expands its reach. The system works better when the vehicle is moving along a straight path, but it can be enhanced to give the same results on turns. The framework can also be enhanced to provide an embedded warning system with a camera at the vehicle’s center. As an extension, the system could recognize traffic signals and inform the user of their status and the time it will take to reach them.