1 Introduction

During the past decade, car make and model recognition (CMMR) has been an active research topic in intelligent transport systems (ITS). CMMR can significantly improve the accuracy and reliability of car identification. A CMMR system provides additional information, such as a car’s manufacturer, model, shape, and color, that helps to specify a car, which yields more accurate identification than using the license plate alone. In addition, CMMR can help the police detect suspected or blacklisted cars, or cars with unknown license plates, through CCTV cameras in surveillance systems.

Many techniques have been presented in the past decade, most of them feature based. Low-level features such as edge contours, geographical parameters, and corner features have been used, as well as high-level features such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), the pyramid histogram of oriented gradients (PHOG), and Gabor features. Some techniques combine low-level and high-level features. Besides the variety of features used to classify CMM, several classifiers have also been explored to increase the recognition rate.

Most previous studies address the problem in daytime, when most vehicle features are visible. Only a few methods have been designed to identify car makes and models at night, and their accuracies are rather low.

This paper presents a new CMMR method for scenes at night or under limited lighting conditions. The proposed method uses new feature selection methods and a one-class classifier ensemble to recognize a vehicle of interest. At night or under limited lighting, cars’ headlights tend to be on, and their brightness and glare blur or hide most of the important features visible from the front, causing serious recognition errors. Therefore, the authors propose that the features used in the CMMR process be captured from the rear view, where they are less affected by brightness and glare. Moreover, the distinctive shapes of a vehicle’s taillights and the position of its license plate carry information that can be used for recognition. A genetic algorithm is then applied to select the optimal subset of features, and, to increase identification accuracy, the majority vote of three classifiers is used to classify the selected salient features.

The following section describes related work including background knowledge, past research techniques, and contributions. The proposed system architecture and methodology are presented in Sect. 3. The experiments and results are shown in Sect. 4, followed by the conclusions in the last section.

2 Related work and contributions

2.1 Related work

Over the past decade, many methods have been proposed for CMMR. Feature-based approaches have been the most common, using geographical features [1], edge-based features [2, 3], histogram of oriented gradients (HOG) features [4, 5], contour point features [6], curvelet transform features [7], and contourlet transform features [8, 9]. Some works combined two features to obtain better results, for example, wavelet and contourlet features [10], or PHOG and Gabor features [11].

Other feature-based methods (called sub-feature based), such as SIFT [12] and SURF [13, 14], are invariant to changes in object scale. They have demonstrated low computational cost and high accuracy in object recognition. A variety of classifiers have also been proposed to classify these features, such as k-nearest neighbor (kNN) [15], support vector machines (SVM) [16], neural networks [17], Bayesian methods [18], and ensemble methods [11].

Baran et al. [13] presented recognition techniques for both real-time and off-line use. For real-time identification of car models, they combined SURF features with an SVM and reported an accuracy of 91.7 %. To increase accuracy in off-line conditions, they combined edge histogram, SIFT, and SURF features and reported an accuracy of 97.2 %. Zhang [11] proposed a cascade classifier ensemble with two main stages. The first stage is an ensemble of four classifiers (SVM, kNN, random forest, and multilayer perceptrons (MLP)) that takes Gabor and PHOG features as input and either accepts or rejects each sample. Rejected samples are re-verified in the second stage, a rotation forest (RF) with MLPs as its components, which predicts the remaining unclassified samples. This method achieved a 98 % accuracy rate over 21 classes. Finally, Kafai and Bhanu [19] proposed a make and model recognition (MMR) method using dynamic Bayesian networks, based on geographical parameters of the car’s rear view such as taillight shape, the angle between the taillight and the license plate, and the region of interest. Their recognition results were better than those of kNN, LDA, and SVM.

As mentioned earlier, CMMR adds valuable information to car identification systems. Even though existing methods achieve impressive recognition rates of more than 90 %, they have serious drawbacks. Most algorithms work well only under good lighting conditions, where most vehicle features are visible, and without occlusions. As stated in [20], most CMMR systems have difficulty under limited lighting, e.g., at night, and when vehicle features are only partly visible in the scene. Under those conditions, cameras cannot clearly capture the full appearance of vehicles: shapes, colors, headlight shapes, grilles, and logos may be distorted or missing in the captured images or video, resulting in recognition errors.

To date, car model classification in limited lighting remains challenging and requires more research to determine a better solution. Therefore, this work aims to present a novel method to address the problem.

A number of works have addressed vehicle recognition under nighttime lighting, for example, driver assistance systems (DAS) [21–24] and the classification of vehicle types, such as car, bus, and truck, at night [25, 26]. However, none of these works performs CMMR effectively. Under limited lighting, accurate image recognition is challenging because fewer features are visible. Kim et al. [22] used a multi-level threshold to detect head and tail lights for DAS and applied an SVM classifier to distinguish light characteristics such as the area of each blob and the distance between blobs. Time-to-collision (TTC) estimation is implemented in [23] to estimate the time remaining before a collision with a vehicle once its lights have been detected. Rear lamp detection and tracking with a standard low-cost camera were proposed in [24]; that work optimized the camera configuration for taillight detection and used Kalman filtering to track the rear lamp pair. Several other works addressed vehicle recognition at night by discriminating vehicle types. Gritsch et al. [25] proposed a nighttime vehicle classification system that uses a smart eye traffic data sensor (TDS) to detect headlights and uses the distance between the headlight pair to classify vehicles as car-like or truck-like. The classification error rate at night was less than 6 % on both dry and wet roads. In [26], the headlight diameter, the distance between headlights, and the windscreen area were used to distinguish vehicle classes.

Most previous works only localize vehicles or recognize the vehicle type; they do not classify car make and model. This paper therefore presents a new method that applies feature-based pattern recognition to recognize car models at night or under limited lighting conditions.

2.2 Contributions

The main contributions of this paper are as follows:

  • This is the first proposed method to concentrate fully on recognizing car make and model under limited lighting conditions at night.

  • This study identifies salient features that remain robust at night for recognizing car models.

  • This study proposes a new use of a one-class classifier ensemble of three traditional classifiers to distinguish the target car model from all other models, which improves classification accuracy.

3 Methodology

The aim of this research is to recognize a particular CMM, such as a suspected or blacklisted CMM, in monitoring systems such as surveillance or traffic law enforcement systems. The proposed method is designed to recognize a specific CMM of interest, which makes it a feasible strategy for real law enforcement applications. For example, if a suspected car model (the target car model) is reported, the method can automatically identify cars of the same CMM in CCTV footage, instead of having people check all CCTV cameras. A one-class classification (OCC) method is therefore more appropriate for this problem than multi-class classification. OCC is a binary classification method in which only data from one class are of interest or available [27]. This class is called the target class. The other class, called the outlier class, may be sampled very sparsely or be totally absent. It has been shown that classification accuracy can be improved by using an ensemble of classifiers [28]. In this research, an ensemble of three classifiers, one-class SVM (OCSVM), decision tree (DT), and kNN, is employed in the classification stage. These three classifiers were chosen because they differ in how they make decisions and can therefore complement each other to increase accuracy. Since an OCC strategy is employed and SVM commonly separates data into two classes, SVM is an appropriate classifier for this problem. kNN and DT are not designed for OCC problems, but they can be adapted. kNN is a traditional classifier that has been reported to achieve high classification rates [29]. Although DT is not reported to achieve high recognition accuracy, it offers fast prediction. DT and kNN are unstable classifiers, which makes them appropriate for ensemble methods [30]. The majority vote of the three classifiers is used to identify the vehicle of interest.

The proposed system architecture is shown in Fig. 1. The system consists of two processes: training and classification. In the training process, images of the target car model and of other car models are passed to the feature extraction process, which includes license plate (LP) detection, taillight (TL) detection, and feature extraction. A feature subset selection step then determines the optimal feature set for this particular target model. Finally, the classifier is trained on the selected features of the training data to identify this particular car model; the result is a trained target car model that is used in the classification process. The classification process, which runs in real time, considers a stream of CCTV images containing different car models. Each car image goes through the same steps as in training. First, the predefined features are extracted from the image. Then, the optimized feature subset of the target car model is selected and classified with the trained target car model. The result is either the target car model or another car model.

Fig. 1

Overview of the proposed CMM recognition system

3.1 Feature extraction

Vehicle images can be captured from many viewing angles. However, at night or under limited lighting, front headlights are very bright and blur other potentially useful features of a vehicle. Therefore, this study uses the rear view of a vehicle as the main viewing angle. In a rear view, the TLs and the LP are the most salient features at night. In our observations, each car model tends to have unique taillight shapes and sizes, and unique distances and angles between the TLs and the LP. Such features have been used to classify vehicle types such as car, truck, and SUV with high accuracy [19].

3.1.1 License plate detection

The aim of this stage is to obtain a set of reference points from which the vehicle’s features are determined. This is achieved by detecting the license plate in the scene; its size and shape are then used to normalize the taillight features. Published LP detection methods can be categorized into six main groups [31]: those based on boundary or edge features, global LP information, texture features, color features, character features, and combinations of two or more features.

During daytime, texture- and color-based methods achieve high detection rates because most features are clearly visible in the scene. At night, the number and quality of features are greatly reduced by low illumination. Moreover, color features are distorted by the local lighting conditions and by reflections of other vehicles’ lights.

The proposed method employs the LP detection technique developed in [32], which detects an LP using edge features, a simple, fast, and straightforward approach [31]. In experiments on a database of 722 images, it achieved a detection rate of 95.43 % and proved robust to various illumination conditions. Like most state-of-the-art LP detectors, the algorithm has two stages: LP candidate extraction and LP verification. Candidate extraction uses an edge-based technique applied to the grayscale image. Vertical edges are extracted with Sobel’s vertical edge detector, since an LP region usually contains more symmetric vertical edges than other areas. Mathematical morphological operations, using LP size and structure constraints, then merge the vertical edges into LP candidate regions. Finally, only candidates whose aspect ratio lies between two and six are kept.

To verify the LP position, the standard deviation of the gray-level distribution is computed in each candidate region, and the region with the largest standard deviation is identified as the LP [32]. Figure 2 shows an example of the steps of the license plate detection algorithm.
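The vertical edge extraction, aspect ratio filtering, and standard deviation verification above can be sketched as follows. This is a minimal sketch, not the implementation of [32]: it assumes a grayscale image given as a NumPy array and candidate boxes given as hypothetical (x, y, w, h) tuples produced by the morphological merging step, which is omitted here.

```python
import numpy as np


def sobel_vertical_edges(gray: np.ndarray) -> np.ndarray:
    """Absolute response of the 3x3 Sobel x-kernel, which responds to vertical edges."""
    g = gray.astype(float)
    # Kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]] applied with array slicing;
    # border pixels are left at zero.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])
    edges = np.zeros_like(g)
    edges[1:-1, 1:-1] = np.abs(gx)
    return edges


def filter_by_aspect_ratio(boxes, lo=2.0, hi=6.0):
    """Keep candidate boxes (x, y, w, h) whose width/height ratio lies in [lo, hi]."""
    return [b for b in boxes if b[3] > 0 and lo <= b[2] / b[3] <= hi]


def verify_plate(gray: np.ndarray, boxes):
    """Return the candidate with the largest gray-level standard deviation, or None."""
    if not boxes:
        return None

    def region_std(box):
        x, y, w, h = box
        return float(np.std(gray[y:y + h, x:x + w]))

    return max(boxes, key=region_std)
```

Picking the region with the largest gray-level spread favors the high-contrast character area of a plate over uniformly lit car body panels.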

Fig. 2

License plate detection steps

3.1.2 Taillight detection

TL localization methods have been presented in many studies, for example, for DAS and vehicle classification [25]. In this paper, the authors’ algorithm [33], with an accuracy of 95.35 %, is used to localize the TLs. It is based on the method proposed in [34] and consists of two processes: TL candidate extraction and TL verification. In candidate extraction, a color-based method extracts TL-colored pixels. A lit TL is typically white in the center and surrounded by red [34]. Because the HSV color space has been reported to represent TL colors better than RGB [34], HSV is used in this research. TL color thresholds are applied to extract TL-colored pixels, and small regions (noise) are then removed to keep only potential candidates.
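A minimal sketch of the HSV color thresholding step, assuming an RGB image given as a NumPy uint8 array. The threshold values (hue tolerance around red, minimum saturation and value, and the low-saturation, high-value test for the white core) are illustrative assumptions, not the thresholds of [33, 34], and the noise-removal step is omitted.

```python
import colorsys

import numpy as np


def taillight_mask(rgb: np.ndarray, red_hue_tol: float = 0.05, s_min: float = 0.4,
                   v_min: float = 0.5, white_s_max: float = 0.2,
                   white_v_min: float = 0.85) -> np.ndarray:
    """Mark pixels that look like a lit taillight: a saturated red rim or a bright white core."""
    height, width, _ = rgb.shape
    mask = np.zeros((height, width), dtype=bool)
    for i in range(height):
        for j in range(width):
            r, g, b = (rgb[i, j] / 255.0).tolist()
            hue, sat, val = colorsys.rgb_to_hsv(r, g, b)  # all three in [0, 1]
            # Red hue wraps around 0, so accept hues near either end of the range.
            is_red = (hue <= red_hue_tol or hue >= 1.0 - red_hue_tol) and sat >= s_min and val >= v_min
            is_white = sat <= white_s_max and val >= white_v_min
            mask[i, j] = is_red or is_white
    return mask
```

The per-pixel loop keeps the sketch dependency-free; in practice a vectorized conversion such as OpenCV’s `cv2.cvtColor` with `cv2.COLOR_RGB2HSV` would be used, keeping in mind that OpenCV stores 8-bit hue in the range 0 to 179.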

In TL verification, the symmetry of each candidate pair is used to identify the real TLs. All candidate regions are paired with one another to form hypothetical TL pairs of a vehicle. The symmetry of each pair is then measured by a symmetry score based on the alignment of the pair, the aspect ratios of the shapes, and the sizes of the candidates. The pair with the highest symmetry score is taken as the taillights [33]. Figure 3 shows an example of the taillight detection process, where the symbol C marks TL candidates.
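The pairing and scoring steps can be sketched as below. The exact score of [33] is not reproduced here, so this sketch assumes a hypothetical equal-weight average of three similarity terms in [0, 1]: vertical alignment of the box centers, similarity of aspect ratios, and similarity of areas. Boxes are (x, y, w, h) tuples.

```python
from itertools import combinations


def symmetry_score(a, b):
    """Equal-weight symmetry score of two candidate boxes (x, y, w, h); 1.0 is a perfect pair."""
    (xa, ya, wa, ha), (xb, yb, wb, hb) = a, b
    # Alignment: centers at the same height score 1, falling to 0 at one box height apart.
    dy = abs((ya + ha / 2) - (yb + hb / 2))
    alignment = max(0.0, 1.0 - dy / max(ha, hb))
    ar_a, ar_b = wa / ha, wb / hb
    aspect = min(ar_a, ar_b) / max(ar_a, ar_b)
    size = min(wa * ha, wb * hb) / max(wa * ha, wb * hb)
    return (alignment + aspect + size) / 3.0


def best_taillight_pair(candidates):
    """Score every candidate pair and return the best one ordered left to right, or None."""
    pairs = list(combinations(candidates, 2))
    if not pairs:
        return None
    best = max(pairs, key=lambda pair: symmetry_score(*pair))
    return tuple(sorted(best, key=lambda box: box[0]))
```

Exhaustive pairing is quadratic in the number of candidates, which is acceptable because color thresholding and noise removal usually leave only a handful of regions.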

Fig. 3

Images of taillight detection algorithm

3.1.3 Feature extraction

Once a vehicle’s LP and TLs are identified, a number of important features, such as dimensions, distances, and angles between the LP and TLs, can be derived, as shown in Fig. 4. In Fig. 4, H, W, and C denote the height, width, and center point of the TLs and the LP, and the indices 1, 2, and 3 denote the left TL, right TL, and LP, respectively.

Fig. 4

Detecting license plate and taillights. H, W, and C denote the height, width, and center point of the TLs and LP, and indices 1, 2, and 3 denote the left TL, right TL, and LP, respectively

Those features are divided into two types: geographical features and TL shape features. Geographical features are measured parameters of taillights and a license plate such as width, height, and distance. There are 12 geographical features as follows.

  • (a) Aspect ratio of left TL

  • (b) Aspect ratio of right TL

  • (c) Ratio of left TL width to LP width

  • (d) Ratio of left TL height to LP height

  • (e) Ratio of right TL width to LP width

  • (f) Ratio of right TL height to LP height

  • (g) Angle between left TL and LP

  • (h) Angle between right TL and LP

  • (i) Ratio of distance between TLs to LP width

  • (j) Ratio of distance between left TL and LP to LP height

  • (k) Ratio of distance between right TL and LP to LP height

  • (l) Ratio of distance between TLs to average TL width

Ratio features are used because they are normalized and therefore do not depend on the vehicle’s size in the image.
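The 12 geographical features can be computed from the three bounding boxes as in the following sketch. It assumes boxes given as (x, y, w, h) in image coordinates with y pointing down, centers at the box midpoints, distances measured between centers, and angles measured in degrees between the horizontal and the line joining a TL center to the LP center; these are our reading of Fig. 4, not definitions stated in the paper. The keys a to l follow the order of the feature list above.

```python
import math


def center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def geographical_features(left_tl, right_tl, lp):
    """Return the 12 ratio/angle features (keys a-l) from left TL, right TL, and LP boxes."""
    (_, _, w1, h1), (_, _, w2, h2), (_, _, w3, h3) = left_tl, right_tl, lp
    c1, c2, c3 = center(left_tl), center(right_tl), center(lp)

    def angle(p, q):
        # Unsigned angle to the horizontal, so mirror-symmetric cars give g == h.
        return math.degrees(math.atan2(abs(q[1] - p[1]), abs(q[0] - p[0])))

    d12, d13, d23 = math.dist(c1, c2), math.dist(c1, c3), math.dist(c2, c3)
    return {
        "a": w1 / h1, "b": w2 / h2,
        "c": w1 / w3, "d": h1 / h3,
        "e": w2 / w3, "f": h2 / h3,
        "g": angle(c1, c3), "h": angle(c2, c3),
        "i": d12 / w3,
        "j": d13 / h3, "k": d23 / h3,
        "l": d12 / ((w1 + w2) / 2),
    }
```

Every value is either a ratio of two lengths or an angle, so scaling the whole image leaves the feature vector unchanged.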

To describe TL shape, the grid method [35] is used. Several grid sizes were tested, namely 5 × 5 (25 features), 6 × 6 (36 features), and 8 × 8 (64 features). Empirically, the 8 × 8 grid gives the best classification accuracy, although larger grids require more computation time. Figure 5 shows the steps of the grid description of a taillight shape.
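A minimal sketch of a grid shape descriptor in the spirit of [35], assuming the TL is available as a binary mask: the mask is cropped to the TL’s bounding box, divided into an n × n grid, and each cell is described by the fraction of its pixels that belong to the TL. Using the fill fraction, rather than thresholding each cell to 0 or 1, is an assumption of this sketch.

```python
import numpy as np


def grid_features(mask: np.ndarray, n: int = 8) -> np.ndarray:
    """Return n*n grid features (cell fill fractions, row-major) for a binary TL mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return np.zeros(n * n)
    # Crop to the bounding box so the descriptor does not depend on position in the image.
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(float)
    row_groups = np.array_split(np.arange(crop.shape[0]), n)
    col_groups = np.array_split(np.arange(crop.shape[1]), n)
    feats = [crop[np.ix_(r, c)].mean() if r.size and c.size else 0.0
             for r in row_groups for c in col_groups]
    return np.asarray(feats)
```

With n = 8 this yields 64 shape features per TL; n = 5 or n = 6 gives the 25- and 36-feature variants that were also tested.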

Fig. 5

Grid feature description of the left taillight shape

3.1.4 Feature set

In real video images, TL and LP detection can be affected by many factors, such as reflections on the LP or TLs, or occlusion by other objects in the scene, so not all features can always be detected. To make the method robust to missing features, separate training sets are therefore studied, each containing the features that remain available in a particular detection outcome.

To do this, we divide detected features into four different cases or sets:

  • First set (full features detected)

  • Second set (one TL and LP detected)

  • Third set (TLs detected)

  • Fourth set (only one TL detected)

The first set represents the case in which all features are detected (Fig. 6a). Both taillights and the license plate are found, so all 140 features (12 geographical features and 128 taillight grid features) can be determined. This set is expected to provide the best accuracy.

Fig. 6

Example of rear view feature detection sets. a Full features detected. b One TL and LP detected. c TLs detected. d One TL detected

In the second set (Fig. 6b), only one TL is detected in addition to the LP. The available features are therefore five geographical features, namely a, c, d, g, and j (if the left TL is detected), and 64 taillight shape features, 69 features in total.

The third set, in which only the TLs are detected, is shown in Fig. 6c. In this case, there are three geographical features (a, b, and l) and 128 TL shape features, 131 features in total.

In the last set, only one TL is detected (Fig. 6d), leaving one TL aspect ratio and 64 shape features, 65 features in total.
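The feature counts of the four sets follow directly from which boxes are detected, as the short sketch below checks. The set names are hypothetical labels; the letter codes refer to the geographical features in the order listed in Sect. 3.1.3, and the second set is shown for the left-TL case (the right-TL case is its mirror image).

```python
# Geographical features (letters a-l, in list order) available in each detection case.
GEO_AVAILABLE = {
    "full": set("abcdefghijkl"),   # both TLs and the LP
    "one_tl_and_lp": set("acdgj"),  # left TL and LP
    "tls_only": set("abl"),        # both TLs, no LP
    "one_tl_only": set("a"),       # a single TL
}
TLS_DETECTED = {"full": 2, "one_tl_and_lp": 1, "tls_only": 2, "one_tl_only": 1}
SHAPE_FEATURES_PER_TL = 8 * 8  # 8 x 8 grid descriptor per taillight


def feature_count(case: str) -> int:
    """Total number of features (geographical + TL shape) available for a detection case."""
    return len(GEO_AVAILABLE[case]) + TLS_DETECTED[case] * SHAPE_FEATURES_PER_TL


for case in GEO_AVAILABLE:
    print(case, feature_count(case))
# full 140, one_tl_and_lp 69, tls_only 131, one_tl_only 65
```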

3.1.5 Feature selection

A vehicle may have a particular group of prominent features, derived from its TLs and LP, that clearly distinguish it from others. Therefore, to improve classification accuracy, a feature selection method is applied to find the best (optimized) feature subset. Feature selection can not only improve predictor performance but also reduce computation time [36]. Many feature selection and reduction techniques are available, such as principal component analysis (PCA), particle swarm optimization (PSO), and genetic algorithms (GA). GA is chosen in this work because, in our experiments, it tended to find optimal or near-optimal subsets. In addition, GA has been reported to work well for problems with a large number of features [37], which suits the 140 features used in the proposed CMMR technique.

GA is designed to mimic natural evolution, following Charles Darwin’s principle of ‘survival of the fittest.’

GA simulates this mechanism to search for optimal solutions in a complex search space. The population of each new generation is created iteratively through genetic operations: selection, crossover, and mutation. In selection, two parents with high relative fitness in the current generation are chosen. Crossover exchanges randomly chosen parts of the selected chromosomes, and mutation makes rare random changes to a chromosome. Chromosomes are usually encoded as binary, integer, or real numbers, and the length of a chromosome equals the number of features. In the binary encoding used in this work, each bit of the chromosome corresponds to the feature with the same index in the feature set: a feature is selected if its bit is ‘1’ and not selected if it is ‘0.’ For example, the 8-bit chromosome {1 0 1 0 1 0 1 1} selects the feature subset {f1, f3, f5, f7, f8}.
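The binary encoding and the three genetic operations can be sketched as follows. The paper does not specify its operator variants, so tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter values here are assumptions of this sketch; `fitness` is any function that returns an error to be minimized, in line with the fitness definition in Eq. (1).

```python
import random


def decode(chromosome):
    """Names of the features selected by a binary chromosome (1-based, as in the text)."""
    return [f"f{i + 1}" for i, bit in enumerate(chromosome) if bit == 1]


def tournament(population, fitness, rng, k=2):
    """Pick the fitter (lower-error) of k randomly chosen chromosomes."""
    return min(rng.sample(population, k), key=fitness)


def one_point_crossover(p1, p2, rng):
    """Swap the tails of two parents after a random cut point."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chromosome, rate, rng):
    """Flip each bit independently with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def run_ga(n_features, fitness, pop_size=20, generations=50, mutation_rate=0.02, seed=0):
    """Minimize fitness over binary chromosomes; returns (best chromosome, best error per generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best]  # elitism: the best chromosome always survives
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            for child in one_point_crossover(p1, p2, rng):
                children.append(mutate(child, mutation_rate, rng))
        population = children[:pop_size]
        best = min(population, key=fitness)
        history.append(fitness(best))
    return best, history


print(decode([1, 0, 1, 0, 1, 0, 1, 1]))  # ['f1', 'f3', 'f5', 'f7', 'f8']
```

Because of elitism, the best error recorded in `history` can never increase from one generation to the next.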

The fitness function of GA is the objective function of the optimization problem. In our case, the fitness function aims to increase classification accuracy by finding the feature subset that yields the lowest classification error. Figure 7 shows the GA-based feature subset selection algorithm used in the training process. The fitness function is defined as follows:

$$\text{Fitness}\left(c^{*}\right) = \min_{c \in P} \text{Err}\left(\text{Predict}\left(\text{model}_{\text{train},c}, \text{data}_{\text{test},c}\right)\right)$$
(1)

where c is a chromosome and c* is the optimum chromosome found by the GA operations.

Fig. 7

System architecture of optimized feature selection based on GA

Here P is the whole population; $\text{data}_{\text{test},c}$ and $\text{model}_{\text{train},c}$ are the test data and the trained model restricted to the feature subset indexed by chromosome c; Err is the error rate obtained when testing the selected subset; and Predict is the classification function of the proposed method, which classifies the new car model data $\text{data}_{\text{test},c}$ with the trained car model $\text{model}_{\text{train},c}$.
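One evaluation of the error term in Eq. (1) for a single chromosome can be sketched as below: the data are restricted to the selected columns, a classifier is trained and tested with k-fold cross-validation (fivefold, as used for the GA in Sect. 4), and the error rate is returned for the GA to minimize. The `fit` and `predict` callables stand in for the proposed ensemble; the nearest-centroid classifier used here is only a hypothetical stand-in that keeps the sketch self-contained.

```python
import numpy as np


def subset_error(chromosome, X, y, fit, predict, n_folds=5, seed=0):
    """Cross-validated error rate of a classifier that sees only the features selected by chromosome."""
    cols = np.flatnonzero(np.asarray(chromosome) == 1)
    if cols.size == 0:
        return 1.0  # an empty subset is treated as the worst possible subset
    Xs = X[:, cols]
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, n_folds)
    wrong = 0
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = fit(Xs[train_idx], y[train_idx])
        wrong += int(np.sum(predict(model, Xs[test_idx]) != y[test_idx]))
    return wrong / len(y)


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean feature vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}


def predict_centroids(model, X):
    """Assign each row of X to the class with the nearest centroid (Euclidean distance)."""
    classes = list(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return np.asarray(classes)[dists.argmin(axis=1)]
```

Passing `lambda c: subset_error(c, X, y, fit_centroids, predict_centroids)` as the fitness function lets a GA search over feature subsets with this error as its objective.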

3.2 Classification

In this stage, we propose using a one-class classifier ensemble to recognize the target CMM (the CMM of interest) in the images. The ensemble consists of three classifiers, OCSVM, DT, and kNN, and the final decision is made by majority vote. Figure 8 shows the proposed one-class classifier ensemble.

Fig. 8

Proposed one-class classifier ensemble

OCSVM is a one-class classification method based on SVM. It constructs a minimum-radius hyper-sphere around the positive class data that encloses almost all of the points in the dataset. The hyper-sphere can be made more flexible by using kernel functions [27]. To classify a test sample, the SVM score function is computed between the test feature vector $X_{\text{test}}$ and all m training feature points $X^{i}$ in the feature space, and the class with the highest score is the predicted class, as defined in (2), where c ranges over the two predefined classes.

$$\text{SVM}_{\text{pred}} = \operatorname*{arg\,max}_{c \in \{-1,+1\}} \left( \sum_{i=1}^{m} \alpha_{i}\, y^{c} K\left(X^{i}, X_{\text{test}}\right) + b \right)$$
(2)

Decision trees classify instances by sorting them based on feature values. Each node in a DT represents a feature of the instance to be classified, and each branch represents a value that the node can take. Instances are classified starting at the root node and sorted according to their feature values [30]. A test sample is assigned to the class with the maximum probability at the node t that it reaches:

$$\text{DT}_{\text{pred}} = \operatorname*{arg\,max}_{c \in \{-1,+1\}} p\left(X_{c} \mid t\right)$$
(3)

where $p(X_{c} \mid t)$ is the probability of predefined class c at node t.

kNN is an instance-based learning method that classifies a test sample by comparing it with the training samples using a distance function [30]. In this work, the Euclidean distance is used. The predicted class of the test data is defined in (4), where c ranges over the predefined classes and k is the number of neighbors.

$$\text{dist}\left(X_{i}, X_{\text{test}}\right) = \sqrt{\sum_{j} \left(X_{i,j} - X_{\text{test},j}\right)^{2}}$$
(4)
$$\text{kNN}_{\text{pred}} = \operatorname*{arg\,max}_{c \in \{-1,+1\}} \frac{1}{k} \sum_{X_{i} \in N_{k}(X_{\text{test}})} \mathbf{1}\left[y_{i} = c\right]$$

where $X_{i}$ is a training feature vector with class label $y_{i}$, $X_{\text{test}}$ is the test feature vector, and $N_{k}(X_{\text{test}})$ is the set of the k training samples with the smallest distance to $X_{\text{test}}$. The prediction is therefore the majority class among the k nearest neighbors; for k = 1 it is the class of the single nearest training sample.
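A minimal kNN sketch of the Euclidean-distance majority vote described in this section, assuming class labels −1 and +1 and an odd k (such as 3, 5, or 7) so that a two-class vote cannot tie; the function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x_test, k=3):
    """Predict the label of x_test by majority vote among its k nearest training samples."""
    # Euclidean distance to every training sample, sorted from nearest to farthest.
    ranked = sorted((math.dist(x, x_test), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return max(votes, key=votes.get)


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [-1, -1, -1, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.2, 0.2), k=3))  # -1
print(knn_predict(train_X, train_y, (5.5, 5.5), k=5))  # 1
```

Sorting all distances is the simplest correct form; for larger training sets a spatial index (e.g., a k-d tree) would avoid scanning every sample.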

Majority vote is employed to make the final decision from the three classifiers:

$$\Phi = \frac{1}{3}\left(\text{SVM}_{\text{pred}} + \text{kNN}_{\text{pred}} + \text{DT}_{\text{pred}}\right)$$
(5)
$$\text{Final prediction} = \begin{cases} -1, & \Phi < 0 \\ +1, & \Phi > 0 \end{cases}$$

where Φ is the mean of the three classifier predictions. Because each prediction is −1 or +1, Φ is never zero, and its sign is the class chosen by at least two of the three classifiers.
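The voting step can be sketched as below, assuming each classifier outputs −1 or +1 as in Eqs. (2)–(4); with three such votes, the class chosen by at least two classifiers is simply the sign of their sum. Only the vote is shown, not the three classifiers.

```python
def majority_vote(svm_pred: int, knn_pred: int, dt_pred: int) -> int:
    """Combine three -1/+1 predictions; +1 wins when at least two classifiers vote for it."""
    votes = (svm_pred, knn_pred, dt_pred)
    if any(v not in (-1, 1) for v in votes):
        raise ValueError("each classifier must predict -1 or +1")
    # With three votes of -1/+1 the sum is odd, so it is never zero and its sign is the majority.
    return 1 if sum(votes) > 0 else -1


print(majority_vote(1, 1, -1))   # 1
print(majority_vote(1, -1, -1))  # -1
```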

According to [29], the classification accuracy of SVM can be improved by tuning its parameters. This research therefore uses a radial basis function (RBF) kernel with optimized parameters for OCSVM. DTs are pruned to reduce the risk of over-fitting and poor generalization to new samples, since an over-fitted tree may have lower classification accuracy. For kNN, majority votes over three, five, and seven nearest neighbors are used to reduce the influence of noisy samples and make the final prediction more robust.

4 Experimental results

We collected video of the rear view of passing cars, taken in a city area at night. All image frames were extracted and resized from 1080 × 1920 to 480 × 640 pixels. As described in the LP detection step, the edge-based method used to localize the LP relies on several threshold values (e.g., LP dimensions) to create LP candidate regions, so the images are resized to match these predefined thresholds. At this proof-of-concept stage, all cars in the scenes were manually labeled with their make and model. The dataset contains 421 car models (Table 1) with a total of 766 images; example car images are shown in Fig. 9. The dataset consists of two types of car models: target car models and other models. There are 100 target car models, each of which is trained and then classified against the other models. Each target car model has at least four images (samples), and each of the other car models has one image.

Table 1 Car models dataset
Fig. 9

Example car images in rear view

The classifiers were implemented in MATLAB R2013a. Tenfold cross-validation was used to evaluate each model. For the GA, 500 generations were used to search for the optimal feature subset, and fivefold cross-validation was used to evaluate each subset. The classification accuracies of the proposed method for the 100 target models are given in Table 2. Each target model was tested against the 420 other models using tenfold cross-validation, and the four feature sets were evaluated separately. As shown in Table 2, the average accuracy was 93.8 %. The Mini Countryman had the highest classification accuracy, 97.2 %, followed by the Renault Megane mk2 with 97.1 %; both car models have very distinctive appearances in terms of the presented features. The classification accuracies of the four feature sets were 93.7, 94.0, 93.6, and 93.8 %, respectively. As described in Sect. 3.1, the first set has 140 features, the largest number of all feature sets, yet it did not give the highest classification accuracy. This suggests that classification accuracy depends on how discriminative the features are rather than on their number. Moreover, a large feature set may contain redundant features that decrease accuracy and increase computation time.

Table 2 Classification accuracy results

Table 3 summarizes previous works and the proposed method in terms of vehicle view, classification type, number of models, environment, and classification performance. Previous CMMR techniques were designed for daytime conditions, in which many car appearance features can be used, and reported very high classification accuracies of more than 90 %, whereas only a few appearance features are available for car recognition at night. The classification accuracy of the proposed method, 93.8 %, is slightly lower than that of the daytime techniques in Table 3. However, given how little of a car’s appearance is visible at night, the accuracy achieved with the features used in this study is satisfactory, and GA-based feature selection further improved performance in the experiments.

Table 3 Comparison with other works

5 Conclusion

CMMR is an important topic for developing intelligent transport systems such as surveillance or traffic law enforcement systems. However, it is a difficult task for computer vision techniques under limited lighting conditions, because many features are missing. We propose a method to recognize CMM at night using salient features of the car’s rear view. The new combination of geographical and taillight shape features can effectively help to recognize a CMM with high accuracy, and the method is robust to missing features. In the experiments, the average correct recognition rate over all feature sets was about 93.8 %. However, the experiments covered only 100 CMMs, and the prediction accuracy may change as the number of CMMs in the dataset increases.

Future work will involve finding more robust and discriminative features to improve classification accuracy. Another problem to be addressed is the case in which only the TLs are detected and their shapes are not unique (e.g., circular).