1 Introduction

Intelligent Traffic Monitoring and Management Systems (TMMS) are gaining significance as traffic infrastructure advances. With advances in technology, intelligent TMMS have overtaken conventional TMMS and become an emerging research area. The field includes detection, recognition and fine-grained classification of vehicles, automatic license plate recognition and various kinds of traffic analytics. Its applications range from commercial domains to national security, e.g., surveillance, security systems, traffic congestion avoidance, accident prevention, advanced driver assistance systems, access control systems, intelligent parking systems and electronic toll collection. These applications necessitate the development of more robust techniques for intelligent TMMS.

The fine-grained classification of vehicles is the process of categorizing a vehicle by its make and model, generally known as Vehicle Make and Model Recognition (VMMR). Many techniques are used for this classification; some consider the whole vehicle, while others use a certain area of the vehicle (e.g., frontal view, grille, or logo). A vehicle logo is a distinct appearance-based feature and an important part of this process. Vehicle logo detection/recognition is required for many applications of TMMS, such as estimation of brand reputation and traffic monitoring. Logo recognition also plays an important role in scenarios where the authenticity of a vehicle’s make is doubtful, for example when a stolen vehicle’s license plate has been replaced with another one, or when a license plate is obscured or illegible, which hampers any license plate recognition application. Logo detection can serve as an authenticity validation step in such cases. However, identification through the vehicle logo is a challenging task due to the logo’s small size. Detection and recognition of small objects is a fundamental challenge in computer vision and a growing research area. Many researchers have achieved high performance in small object detection using Convolutional Neural Networks (CNNs), which learn image-based features from large-scale data without the need for human intervention. Compared to other kinds of small objects, vehicle logos often carry more complex content because of their unusual designs. These small logos are also affected by background clutter and noise. In addition, logos are often blurred, poorly illuminated, or not visible at all in rainy or snowy weather, which makes feature identification difficult. An overview of some challenges associated with the logo detection and recognition task is provided below:

  • Logo Size and Resolution. Logos account for only ~1% of the frame obtained from the camera, and due to camera distance their resolution is low at times. These issues affect the feature extraction process.

  • Illumination Variance. Colors may vary under different illumination conditions, especially in low light, where the extracted features may be less effective. Similarly, vehicle logos usually have highly reflective surfaces, which causes issues in feature extraction.

  • Logo Location. The location of the logo varies among vehicle manufacturers. Some place their logos on the radiator grille, while others place them on the front hood. This makes the identification of logo boundaries difficult.

  • Background Clutter and Noise. The logo is a comparatively small part of the vehicle, so there are many sources of background interference, including the radiator grille, bumper and other small objects. These objects interfere with feature extraction and increase the number of false detections.

This research modifies a state-of-the-art license plate detection model, IWPOD-NET, to extract the desired region of interest (ROI) and then unwarp it to correct the distortions caused by varying viewing angles [32]. The surveillance recordings used are low in resolution, which decreases accuracy. Therefore, a combination of different CNN models is used for detection and recognition. The results show a significant improvement in overall detection and recognition of vehicle logos. The main contributions of this paper are as follows.

  • An offline vehicle logo detection and recognition approach that uses video feeds acquired through low-resolution surveillance cameras for authenticity validation of vehicles.

  • Improved logo detection and classification based on a modified state-of-the-art license plate detection network.

  • Pose variance analysis to identify the varying angles on which the proposed approach can effectively work.

  • A custom vehicle logo dataset, VL-10, containing logos of 10 commonly used vehicle makes and models.

The rest of the paper is organized as follows: Sect. 2 provides a concise overview of related research. Section 3 describes the working methodology of the proposed approach and gives an overview of the proposed dataset. Section 4 discusses the test cases used to conduct this research and presents a comparative analysis of the experimental results. Finally, Sect. 5 concludes the paper and outlines future directions for this research.

2 Related Works

Many researchers have contributed to different domains of intelligent traffic monitoring and management systems. Setchell discussed monitoring and management of road traffic using computer vision in detail [1]. That work presented two vision-based traffic monitoring systems: a license plate recognition system and a road traffic monitoring system that tracks vehicles. Different ideas for vision-based traffic monitoring and management systems have been proposed [2,3,4,5,6]. VMMR is also an important part of traffic monitoring and management systems.

There have been many related approaches using conventional algorithms such as Scale-Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG) and Sequential Pattern Mining (SPM) [7,8,9]. Usually, the logo is located with reference to the license plate [10,11,12]. After the license plate is detected, different techniques are applied to segment the logo [13,14,15]. Psyllos et al. introduced an enhanced SIFT-based feature extraction mechanism for vehicle logo recognition. The experimental results showed a significant improvement in recognition accuracy compared to the conventional SIFT algorithm [16]. Mao et al. proposed a vehicle logo detection method that highlighted logo regions using direction filters and a saliency map. Information entropy was then used to choose the binary image and to precisely locate the logo region [17]. Yu et al. proposed a system for vehicle logo recognition based on Bag-of-Words (BoW) [18]. The vehicle logo images were represented as histograms of visual words and classified by an SVM in three steps: SIFT feature extraction, quantization of features into visual words, and construction of histograms of visual words with spatial information. Wang et al. presented a vehicle logo detection/recognition system that uses edge features to find the logo within a rough logo region located through prior knowledge [19]. A combination of template matching and HOG was then used to identify the category of the logo.

These conventional techniques usually depend on hand-crafted features, which lack robustness in certain conditions. They are relatively easy to implement but computationally expensive. As an alternative to these conventional approaches, researchers have also explored various deep learning-based methods. Deep learning-based algorithms perform significantly better at detecting objects in the complex environments discussed in Sect. 1. Tong et al. presented an in-depth analysis of different deep learning-based small object detection approaches [20]. The authors showed experimentally that techniques such as multi-scale feature learning, data augmentation and different training strategies can significantly improve small object detection. These state-of-the-art techniques have also been applied to the detection of small objects such as vehicle logos.

Yang et al. proposed a modified YOLOv3 model for vehicle logo detection in complex scenes [21]. The approach showed good accuracy on the proposed vehicle logo detection dataset. Zhang et al. proposed a real-time lightweight vehicle logo detection system based on deep convolutional networks [22]. Their experiments showed a significant improvement in the accuracy of vehicle logo detection. Jiang et al. proposed an improved YOLOv4 model that enhanced the backbone feature extraction network for efficient extraction of small vehicle logos [23]. It further used a convolutional transformer to reduce the influence of complex backgrounds. Yang et al. proposed an improved YOLOv2 network that provided fast and accurate vehicle logo detection [24]. The experimental results showed that this approach significantly improved the accuracy and speed of logo detection.

Surveillance recordings usually have low resolution, which makes logo detection challenging, particularly at varying angles. Various techniques are used to improve the quality of video prior to object detection. Some researchers used super-resolution algorithms to convert low-resolution images to high-resolution ones [25,26,27,28,29]. Others used near-field and far-field video enhancement techniques [30], or suggested an ROI rate control scheme for surveillance videos [31]. In contrast, the proposed work adjusts the orientation of logos captured at varying angles, which improves overall vehicle logo detection and recognition.

3 Working Methodology

3.1 Selected Models

A pipeline combining different deep learning models is designed in this study for improved vehicle logo detection and recognition. The specific details of the selected models are as follows:

Unwarping Model (IWPOD-NET).

The Improved Warped Planar Object Detection Network (IWPOD-NET) [32] detects the four corners of a vehicle license plate in a variety of unconstrained scenarios. It then unwarps the license plate to a fronto-parallel view, which eliminates perspective-related issues. This network was introduced to tackle license plate detection from varying viewpoints. In this research, the model was modified to serve as the basis for ROI localization and orientation adjustment of logos. The localized plate area was extended upwards by a variable factor (assuming that the logo is always located above the license plate) so that the complete ROI includes the logo, and this ROI was then unwarped. For example, if the height of the license plate is 3 units and the variable is set to 2, then the ROI is extended by 2 × 3 = 6 units vertically upwards. Workflows for both the original and modified networks are shown in Fig. 1. The characters on the license plates of the vehicles used in this study have been concealed to maintain the privacy of the vehicle owners.

Fig. 1. Workflows for both the Original IWPOD-NET (a) and Modified IWPOD-NET (b) models
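
The ROI extension described for the modified IWPOD-NET can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `extend_plate_roi`, the corner ordering (top-left, top-right, bottom-right, bottom-left), and the choice to move the top corners along the plate's own side edges are all hypothetical.

```python
def extend_plate_roi(corners, factor):
    """Extend a license plate quadrilateral upwards to cover the logo.

    corners: four (x, y) points ordered top-left, top-right,
             bottom-right, bottom-left (assumed ordering), in image
             coordinates where y grows downwards.
    factor:  the extension variable; each top corner is moved by
             factor * plate height along its side edge.
    """
    tl, tr, br, bl = corners

    def shift(top, bottom):
        # The vector from the bottom corner to the top corner spans one
        # plate height; move the top corner `factor` more heights along it.
        dx, dy = top[0] - bottom[0], top[1] - bottom[1]
        return (top[0] + factor * dx, top[1] + factor * dy)

    return [shift(tl, bl), shift(tr, br), br, bl]


# Worked example from the text: plate height 3 units, variable 2,
# so the ROI grows by 2 x 3 = 6 units vertically upwards.
plate = [(10, 20), (22, 20), (22, 23), (10, 23)]
roi = extend_plate_roi(plate, factor=2)
print(roi)
```

Shifting along the side edges keeps the extended region aligned with a tilted plate, so the same four-point unwarping step can be applied to the larger ROI.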

Logo Detection Model (YOLOv5m).

The advent of YOLO models improved real-time object detection, and many versions have been introduced so far. This research utilizes YOLOv5, which has different architecture series, including P5 and P6. These series are further divided into nano (n), small (s), medium (m), large (l) and extra-large (x) variants that differ in parameter count and are optimized for particular image sizes. For example, P5 is optimized for training images of size 640 × 640. The medium-sized YOLOv5 P5 model (YOLOv5m) is used for logo detection in this system. Because its classification accuracy was low, the model was used only to detect the presence of a logo. For comparison, Table 2 also presents results for the approach in which YOLO was used for both detection and classification. The unwarped image from the previous step is fed to the YOLOv5m model for logo detection. The workflow is shown in Fig. 2. As shown in the figure, the logo is successfully extracted from the complete ROI.

Fig. 2. Logo Detection using YOLOv5m

Logo Classification Model (EfficientNet).

EfficientNet [29], introduced by Google, is a commonly used CNN architecture for image classification. The model achieves state-of-the-art performance while being fast and lightweight. In this work, EfficientNet was trained on the logos in the proposed dataset. First, all images were cropped to the logo region, using the bounding boxes from the annotations prepared for the object detection model. These cropped logo images were then used to train the classification model. In the testing phase, once a logo is detected in the ROI, it is fed to the classification model, which assigns it to its respective class. The addition of EfficientNet significantly improved the results, as discussed in Sect. 4.
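
The cropping step used to prepare classifier training data might look like the sketch below. It assumes the annotations are stored in the normalized YOLO label format (class, x-center, y-center, width, height), which the paper does not state explicitly; the helper names `yolo_to_pixel_box` and `crop` are hypothetical, and the image is modeled as a plain list of pixel rows to keep the example self-contained.

```python
def yolo_to_pixel_box(label, img_w, img_h):
    """Convert one YOLO-format label (class, xc, yc, w, h; all normalized
    to [0, 1]) into an integer pixel box (left, top, right, bottom)."""
    _, xc, yc, w, h = label
    left = max(0, round((xc - w / 2) * img_w))
    top = max(0, round((yc - h / 2) * img_h))
    right = min(img_w, round((xc + w / 2) * img_w))
    bottom = min(img_h, round((yc + h / 2) * img_h))
    return left, top, right, bottom


def crop(image, box):
    """Crop a row-major image (list of rows) to the given pixel box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 8 x 8 "image" whose pixel values encode their own (x, y) position.
image = [[(x, y) for x in range(8)] for y in range(8)]
label = (0, 0.5, 0.5, 0.5, 0.5)  # class 0, centred, half width and height
box = yolo_to_pixel_box(label, img_w=8, img_h=8)
logo = crop(image, box)
print(box, len(logo[0]), len(logo))
```

Reusing the detector's annotations this way means the classifier sees crops with the same framing as the boxes it would receive from the detector at test time.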

3.2 Dataset

A high-quality dataset is essential for computer vision tasks such as object detection and recognition. The lack of a local dataset can become a major challenge when performing these tasks for a particular region with a specific, complex traffic environment. To the best of our knowledge, no standard local dataset is available. Therefore, a local dataset of vehicles, VL-10, was developed in which logos are either prominently visible or only partly visible due to varying angles, as in a surveillance environment (Fig. 3).

All the logo images were collected by the authors from various local and online sources (e.g., car selling websites with publicly available images). The logos are categorized into ten vehicle classes: Honda, Suzuki, Toyota, Kia, Hyundai, Mitsubishi, Daihatsu, Faw, Hino and Nissan. For each class, 500 images were selected for the training set and 50 for the validation set. Data augmentation was then carried out by adding blur and noise to each image, which increased the totals to 1500 training and 150 validation images per class.
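
The augmentation arithmetic can be illustrated with a small sketch. The paper does not specify the blur kernel or noise model, so the 3 × 3 mean filter, the Gaussian noise with `sigma=10`, and the reading that each image yields exactly one blurred and one noisy copy (which matches the threefold increase) are assumptions; the function names are hypothetical.

```python
import random


def box_blur(img):
    """3 x 3 mean filter on a grayscale image (list of rows, values 0-255).
    Border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(vals) / len(vals)))
        out.append(row)
    return out


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clamp to the 0-255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def augment(images, sigma=10.0, seed=0):
    """Return each original plus one blurred and one noisy copy."""
    rng = random.Random(seed)
    out = []
    for img in images:
        out += [img, box_blur(img), add_noise(img, sigma, rng)]
    return out


# 500 training images per class become 1500 after augmentation.
train = [[[128] * 4 for _ in range(4)] for _ in range(500)]
augmented = augment(train)
print(len(augmented))
```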

Fig. 3. Some Images from VL-10 Dataset

4 Results and Discussions

4.1 Test Cases

Different surveillance scenarios were chosen to evaluate the proposed pipeline. These include video streams from toll control (i.e., CCTV video from height), law enforcement (road-level cameras), parking lot access validation (cameras installed at various entry/exit points), and dashboard cameras. The highlights of each test case are provided in Table 1, and example frames from each test case are shown in Fig. 4.

4.2 Comparative Analysis

Three different approaches were tested, and their comparative analysis is accordingly presented in this section.

Approach 1 - Without IWPOD-NET Using YOLOv5 for Both Detection and Classification.

The YOLOv5 model was trained on VL-10 dataset for both object detection and classification tasks. All test scenarios were first passed to this model to get baseline accuracies. The mAP for detection and classification is presented in Table 2 for each test case. IWPOD-NET was not used in this approach so that results could be compared, and the impact of IWPOD-NET could be observed.

Approach 2 - With IWPOD-NET Using YOLOv5 for Both Detection and Classification.

A modified IWPOD-NET was introduced in this approach. The detected vehicles were passed to IWPOD-NET for ROI extraction and unwarping. The extracted ROI was then passed to the YOLOv5 model trained on the VL-10 dataset, which detected the logos and classified them into their respective classes. The resulting mAP values are shown in Table 2. The results improved to some extent compared with Approach 1. The largest improvement was in the law enforcement test scenario, because this test set has side camera viewpoints, which IWPOD-NET unwarped well. The smallest improvement was in the parking lot access validation test set, apparently because the camera viewpoint was already frontal and the license plates were only slightly rotated.

Approach 3 - With IWPOD-NET Using YOLOv5 for Detection and EfficientNet for Classification.

In the third approach, EfficientNet was introduced alongside IWPOD-NET and YOLOv5. YOLOv5 was trained to detect only the presence of a logo, while EfficientNet was trained on all classes of the VL-10 dataset and used for logo classification. First, each test dataset was passed to the modified IWPOD-NET for extraction and unwarping of the ROI. The ROI was then passed to the YOLOv5 model for logo detection. Each detected logo was then passed to EfficientNet for classification. The mAP of detection by YOLOv5 and the accuracy of classification by EfficientNet are shown in Table 2. Considerable improvements in the results were observed. The largest improvement in detection was observed in the dashcam test case; the remaining error was due to the motion blur present in this case, since both the objects (vehicle/logo) and the camera were in motion. The highest classification accuracy was obtained for parking lot access validation; the remaining error in this case was due to the inclusion of night images in the dataset, where the logos were not clearly visible and streetlights caused reflections. Approach 3 outperformed the other two approaches. Therefore, this pipeline could be used as an offline tool for logo detection and classification in various surveillance environments.
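
The three-stage flow of Approach 3 can be summarized in a short sketch. The stage functions are passed in as plain callables and stubbed out in the demo, so nothing here reflects the real model interfaces; `run_pipeline`, `LogoResult`, and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom in ROI pixels


@dataclass
class LogoResult:
    box: Box
    make: str


def run_pipeline(
    vehicle_image,
    unwarp_roi: Callable,     # stands in for the modified IWPOD-NET
    detect_logos: Callable,   # stands in for the logo-only YOLOv5m
    classify_logo: Callable,  # stands in for EfficientNet
    crop: Callable,
) -> List[LogoResult]:
    """Unwarp the plate-plus-logo ROI, detect logos in it, classify each."""
    roi = unwarp_roi(vehicle_image)
    if roi is None:  # no license plate found, so no ROI to search
        return []
    results = []
    for box in detect_logos(roi):
        results.append(LogoResult(box, classify_logo(crop(roi, box))))
    return results


# Stub stages so the sketch runs without any trained model.
demo = run_pipeline(
    vehicle_image="frame-with-vehicle",
    unwarp_roi=lambda img: "unwarped-roi",
    detect_logos=lambda roi: [(40, 10, 72, 34)],
    classify_logo=lambda patch: "Toyota",
    crop=lambda roi, box: (roi, box),
)
print(demo)
```

Keeping detection and classification as separate stages mirrors the design choice in the text: the detector only has to answer whether a logo is present, and the make is decided by a dedicated classifier.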

4.3 Pose Variation Calculation

Pose variation analysis was conducted to determine how accurately the proposed model detects logos at varying angles. For this purpose, images were captured in daylight in settings similar to the parking lot access validation dataset. Each vehicle was captured from different angles, as shown in Fig. 5(a). Poses were categorized by comparing the oriented bounding box and the rectangular bounding box of the license plate, as shown in Fig. 5(b). IWPOD-NET detected the four corners of the license plate (green dots) and joined them to form an oriented bounding box (blue bounding box). Another algorithm was used to create the rectangular bounding box (red bounding box). The upper edges of the two bounding boxes were used to identify the pose by calculating the angle θ between them.
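
One way to compute θ from the four detected corners is sketched below. It assumes the rectangular bounding box is the axis-aligned rectangle enclosing the corners, so its upper edge is horizontal; the paper does not name the algorithm used for that box. The corner ordering and the function names `pose_angle` and `bounding_rectangle` are hypothetical.

```python
import math


def bounding_rectangle(corners):
    """Axis-aligned rectangle (left, top, right, bottom) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def pose_angle(oriented_corners):
    """Angle in degrees between the top edge of the oriented plate box
    and the horizontal top edge of its axis-aligned bounding rectangle.

    oriented_corners: (x, y) points ordered top-left, top-right,
    bottom-right, bottom-left (assumed ordering).
    """
    (x1, y1), (x2, y2) = oriented_corners[0], oriented_corners[1]
    # The rectangle's top edge has direction (1, 0), so theta is simply
    # the inclination of the oriented box's top edge.
    theta = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return abs(theta)


# A plate whose top edge is tilted by 20 degrees.
rise = 100.0 * math.tan(math.radians(20))
plate = [(100.0, 50.0), (200.0, 50.0 + rise),
         (200.0, 80.0 + rise), (100.0, 80.0)]
print(bounding_rectangle(plate))
print(round(pose_angle(plate), 1))
```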

According to the observations, the proposed module was able to detect logos in the range of 0–30°, while logos were mostly undetectable beyond this range. In the range 0–10°, the logo detection mean Average Precision (mAP) was 0.91 and the classification accuracy was 0.97. In the range 10–20°, the detection mAP was 0.85 and the classification accuracy was 0.90. In the range 20–30°, the detection mAP was 0.80 while the classification accuracy dropped to 0.75. Angles from 0–40° are depicted in Fig. 6 along with the visual appearance of the logos.
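
The reported figures can be arranged as a small lookup keyed by pose angle, for example to decide whether a frame is worth processing. The numbers are those quoted in the text; treating the ranges as half-open intervals (with 30° included in the last one) is an assumption, since the stated ranges share their endpoints, and `reported_metrics` is a hypothetical helper.

```python
REPORTED = [
    # (low, high, detection mAP, classification accuracy) from the text
    (0, 10, 0.91, 0.97),
    (10, 20, 0.85, 0.90),
    (20, 30, 0.80, 0.75),
]


def reported_metrics(theta):
    """Return (mAP, accuracy) reported for an absolute pose angle in
    degrees, or None beyond 30 degrees where logos were mostly undetectable."""
    for low, high, m_ap, acc in REPORTED:
        # Half-open bins; the last bin also includes its upper bound.
        if low <= theta < high or (high == 30 and theta == 30):
            return m_ap, acc
    return None


for angle in (5, 10, 25, 35):
    print(angle, reported_metrics(angle))
```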

Table 1. Highlights of video data for different test cases
Table 2. Results for each test case using all three approaches
Fig. 4. (a) Single frame from each test case; (b)–(d) Individual vehicles in respective datasets of each test case along with their original logo-containing ROIs and their respective unwarped ROIs

Fig. 5. (a) Photos captured from different angles; (b) angle calculation for pose variance analysis

Fig. 6. Pose variance analysis on different angles along with respective logo appearance

5 Conclusion and Future Work

Intelligent TMMS are attracting researchers’ attention due to the evolving infrastructure of cities, which is becoming more complex day by day and demands more innovative solutions. Vehicle classification is shifting towards fine-grained classification, for which distinct and specific vehicle features are considered. One such feature is the vehicle’s logo. Although it is a unique feature that conveys the vehicle’s make/model, its small size makes detection and recognition challenging. Detecting small objects such as logos in a surveillance environment, where the vehicles themselves appear very small, is extremely difficult, and video enhancement techniques increase computational cost and time. This paper specifically targeted the surveillance environment and presented a novel method for detection and recognition of vehicle logos. A modified IWPOD-NET was introduced to overcome orientation issues. YOLOv5 was then used to detect logos and EfficientNet to classify them. This pipeline was tested on four surveillance environments, namely toll control, law enforcement, dashcam, and parking lot access validation. Comparative analysis showed an accuracy improvement with the proposed approach in each test case. Pose variance analysis was also performed to determine the orientation limits within which this approach works. The proposed approach detected logos in the 0–30° range, while logos were mostly undetectable beyond it. In addition, a new dataset, VL-10, was presented, which focuses on local environment settings. The proposed technique improved the results for logo detection and can therefore be used in offline surveillance environments.

As future work, the deep learning architecture can be optimized to detect and recognize vehicle logos in real time. A completely new deep learning architecture could also be designed specifically for the vehicle logo task.