The digital camera market experienced a major boom in the late 1990s and early 2000s, driven by technological advances in chip manufacturing, progress in embedded system design and the coming-of-age of CMOS (complementary metal oxide semiconductor) image sensors. In particular, the development of CMOS image sensors – cheaper to manufacture than CCDs – boosted this growth. Alongside stand-alone digital cameras and camera phones, the accessibility of and demand for smart cameras also increased. According to , while the primary function of a conventional camera is to provide video for monitoring and recording, smart cameras are usually designed to perform specific, repetitive, high-speed and high-accuracy tasks. Machine vision and intelligent video surveillance systems (IVSS) are the most common applications.
In general, surveillance camera systems aim to observe a given area in order to increase safety and security. In , a surveillance system is presented for detecting individuals within a dense crowd in a scene captured by a time-of-flight camera. It makes it possible to detect and track every person's movement, and to compare that movement with the behavior of the crowd as a whole. Dedicated software enhances these capabilities, for example by providing an analysis of the situation. Smart cameras are also widely used in numerous road transportation systems, including traffic management, surveillance, security and law enforcement, automated parking garages , driver assistance and access control systems, etc. A state-of-the-art application related to self-guided and driverless transport vehicles is presented in . The most common and best-known application in the category of traffic surveillance and law enforcement is license plate recognition (LPR) . Due to growing demand, other categories of vehicle classification have also been added recently: make and model recognition (MMR)  and color recognition (CR) of cars are relatively new functionalities.
The smart camera system presented in this paper also belongs to the category of traffic surveillance and law enforcement applications. In line with the goals of the INSIGMA R&D project , under which the presented system has been developed, it incorporates the three functionalities mentioned above: LPR, MMR and CR.
For clarity of presentation, the rest of the paper is organized as follows. The current section presents an extensive literature review within the framework of the subject matter. In Section 2, the overall architecture of the presented smart camera system is introduced. The MMR, LPR and CR components of the system are presented in detail in Sections 3, 4 and 5, respectively. In Section 6, the system's efficiency is reported and discussed. Conclusions, with insights into potential future improvements, are drawn in Section 7.
As mentioned in Section 1, numerous computer vision approaches and their applications are used in current video-based road transportation systems. Due to their extensive capabilities, such systems are categorized as intelligent transportation systems (ITS) by researchers  and legislators around the world .
Various approaches to ITS and different aspects of their architectures are presented in detail in . Methods related to traffic surveillance, including tracking and recognition of vehicles, traffic flow monitoring  and driver assistance applications, are also discussed in the paper. Typical driver assistance applications address lane departure and pedestrian detection problems. Traffic flow monitoring applications may prove useful in traffic optimization and road incident management systems; for example, they are able to evaluate the length of traffic queues  or estimate critical flow time periods .
The use of traffic cameras for security and law enforcement purposes has many practical benefits. First of all, video sequences recorded by such cameras can be used as evidence by police forces or insurance companies. They can be browsed to review events of interest at different points in time. Moreover, when a given event has been registered by a number of cameras, it can be analyzed from different views. In addition to such post-hoc analysis, views registered by traffic cameras are usually monitored in real time by human operators in control centers.
Computer vision techniques can significantly expand the capabilities mentioned above. Segmentation, extraction of salient regions, feature-based detection and classification, video indexing and retrieval, etc., can radically increase the number of factors taken into account during the analysis and, in this way, appreciably improve its accuracy. This helps operators avoid making wrong decisions, as they are supported by automatically generated alarms and advised by powerful content-oriented analysis engines.
As mentioned in Section 1, the smart camera system presented in this paper can also be included in the category of security and law enforcement applications. The development of the presented architecture and the related research form part of the INSIGMA R&D project , in which the authors of this paper are currently involved. One of INSIGMA's objectives is to develop software able to process video sequences registered by surveillance cameras in order to detect and recognize selected features of cars, including vehicle manufacturer and model, number plates and color.
Recognition of vehicle number plates, known as LPR (or as automatic number plate recognition (ANPR), especially in the UK), is one of the most popular and earliest available applications in this category. Most existing LPR systems use similar schemes, usually comprising the following successive processing steps: preprocessing; plate detection, localization and horizontal alignment; and character segmentation and recognition. The preprocessing step is generally required to improve the quality of the processed images. It may address objectives such as shadow removal, character enhancement, background suppression, strengthening of edges, etc. These goals are usually achieved by various binarization methods, including Otsu binarization , adaptive binarization techniques such as variable thresholding  or the Sauvola method , and other non-adaptive methods, as in . Strengthening of edges is achieved by combining selected binarization methods with techniques including greying, normalization, histogram equalization, etc., as reported in . Other preprocessing objectives, such as noise removal and general image enhancement, are achieved by applying wavelet-based filters  and the top-hat transform , respectively.
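To illustrate the binarization stage, the following is a minimal NumPy sketch of Otsu's method; function names are ours and the code is a simplified illustration, not the implementation of any of the cited systems.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance (Otsu's method).

    `gray` is a 2-D uint8 array; pixels above the threshold are foreground.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of class 0 up to t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_t = mu[-1]                            # global mean
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Binarize an image with the automatically selected Otsu threshold."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```

For a strongly bimodal plate image (dark characters on a bright background), the selected threshold falls between the two intensity modes, which is what makes the method popular as an LPR preprocessing step.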
There are also many different approaches to license plate detection and localization. One of the simplest (albeit least efficient) methods is based on histograms obtained from horizontal and vertical projections through the image . In , a density-based region growing method is also shown to be capable of detecting license plates. In , connected component analysis followed by a labeling technique is reported as an efficient method. Various edge detection algorithms, including the Canny edge detector  and the Roberts cross operator , have also been found to be effective. Other approaches to license plate detection and localization are based on different types of salient features, including SIFT , the discrete wavelet transform , neural networks [28,33], etc.
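The projection-based localization idea can be sketched as follows. This is a simplified illustration assuming a grayscale crop in which the plate is the region with the densest vertical edges; the function name and the threshold fraction are our own choices, not taken from the cited works.

```python
import numpy as np

def locate_plate_band(gray, frac=0.5):
    """Rough plate localization by horizontal and vertical projections.

    Plate regions contain dense vertical edges, so the horizontal gradient
    magnitude is projected onto the rows; rows whose projection exceeds
    `frac` of the maximum bound the plate vertically, and the same idea
    applied to the columns of that band bounds it horizontally.
    Returns (top, bottom, left, right) approximate bounds.
    """
    grad = np.abs(np.diff(gray.astype(np.int32), axis=1))  # horizontal gradient
    row_proj = grad.sum(axis=1)
    rows = np.where(row_proj >= frac * row_proj.max())[0]
    top, bottom = rows.min(), rows.max()
    col_proj = grad[top:bottom + 1].sum(axis=0)
    cols = np.where(col_proj >= frac * col_proj.max())[0]
    return int(top), int(bottom), int(cols.min()), int(cols.max())
```

Real systems follow such a coarse pass with verification (aspect ratio, character count), since projections alone respond to any high-contrast texture, not only plates.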
Since objects of interest in video footage are usually distorted, detection and localization are followed by an additional step of horizontal alignment of license plates. A number of techniques can be applied to correct the skew of localized and extracted plates. The most effective are the Hough transform  and a method based on appropriate geometric constraints, as reported in .
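As a lightweight illustration of skew estimation, one can orient the plate by the principal axis of its foreground pixels, computed from second-order central moments. Note that this is a simple stand-in we use for illustration, not the Hough-transform or geometric-constraint methods cited above.

```python
import numpy as np

def estimate_skew_deg(mask):
    """Estimate skew as the orientation of the foreground's principal axis.

    `mask` is a binary image of the localized plate.  Returns the angle in
    degrees between the dominant axis and the horizontal.  In image
    coordinates the y-axis points down, so a positive angle corresponds to
    a line sloping downwards to the right.
    """
    ys, xs = np.nonzero(mask)
    x = xs - xs.mean()
    y = ys - ys.mean()
    # Second-order central moments of the pixel distribution.
    mu20, mu02, mu11 = (x * x).mean(), (y * y).mean(), (x * y).mean()
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return float(np.degrees(angle))
```

The recovered angle can then be used to rotate or shear the plate crop upright before character segmentation; moment-based estimates work well for elongated regions such as plates, where one axis clearly dominates.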
The step following successful skew correction is character segmentation. There are many different approaches to this task, some of which rely on horizontal and vertical projections through the extracted license plate image. Such projections, used alone or combined with selected geometrical constraints (related to assumptions about the height and width of characters), are reported as effective in  and , respectively. Grey-level quantization combined with appropriate morphological analysis has also been used to locate and separate individual characters . Another technique examined within this framework  is connected component analysis. In , it was shown that an in-depth analysis based on a combination of selected binarization methods results in good character segmentation. In , characters, even when adhesive or cracked, are accurately extracted thanks to the spatial scalability of their contours; characters extracted in this way are then segmented using a matching algorithm with adaptive templates.
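A projection-based segmenter with a simple width constraint can be sketched as follows; the `min_width` constraint plays the role of the geometrical assumptions mentioned above, and the function name is our own.

```python
import numpy as np

def segment_characters(binary, min_width=2):
    """Split a binarized plate into characters by its vertical projection.

    Columns with a zero projection are treated as inter-character gaps;
    runs of non-empty columns at least `min_width` wide are returned as
    (start, end) column spans, which filters out narrow specks.
    """
    nonempty = binary.sum(axis=0) > 0
    spans, start = [], None
    for i, filled in enumerate(nonempty):
        if filled and start is None:
            start = i                      # a character run begins
        elif not filled and start is not None:
            if i - start >= min_width:     # keep only plausibly wide runs
                spans.append((start, i))
            start = None
    if start is not None and len(nonempty) - start >= min_width:
        spans.append((start, len(nonempty)))
    return spans
```

The returned spans index the columns of each candidate character, ready to be cropped and passed to the recognition stage.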
The final step is character recognition. The most popular approaches to this task are based on different models of neural networks, including artificial neural networks (ANNs) , probabilistic neural networks (PNNs)  and back-propagation neural networks (BPNNs) . Within the category of machine learning methods, support vector machine (SVM)-based approaches  are also popular, as reported for example in . Among other methods, template matching  and optical character recognition (OCR)  are also frequently used. Comprehensive surveys of LPR techniques can be found in  and .
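The template-matching variant of character recognition reduces to scoring a segmented character patch against a bank of reference glyphs. A minimal sketch using normalized cross-correlation (our own function names; real systems normalize patch size and use larger template banks) might look like this:

```python
import numpy as np

def match_score(patch, template):
    """Normalized cross-correlation between two equal-sized grayscale arrays."""
    a = patch.astype(np.float64) - patch.mean()
    b = template.astype(np.float64) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def recognize(patch, templates):
    """Return the label whose template correlates best with the patch.

    `templates` maps character labels to reference arrays of the same
    shape as `patch`.
    """
    return max(templates, key=lambda label: match_score(patch, templates[label]))
```

Mean-centering makes the score tolerant of uniform brightness changes, which is why normalized correlation is a common choice for plate fonts with fixed glyph shapes.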
Despite the fact that MMR frameworks are already applied in selected security systems , the volume of related scientific literature is relatively low, most likely because many implementations are commercial and their details are proprietary.
One of the first approaches to the MMR problem was presented in , where a combination of different types of features, extracted from frontal views of cars, was used to distinguish between car models. Selected feature extraction algorithms (e.g., the Canny edge detector, square mapped gradients, etc.) and various classification methods (e.g., naive Bayes) were investigated in . Another contour-oriented approach  is reported in . In this approach, contours extracted using the Sobel filter are transformed into complex feature arrays in which only the contour points common to all images from the training set (of a given class) are represented. Such feature arrays, known as oriented-contour point matrices, are input to a classification procedure which uses four different measures, including distance errors between the oriented-contour points of the class model and the sample being examined. Another contour-based solution is presented in .
The methods described so far are based on features extracted from the spatial domain. There are also methods which operate in transform domains [9,25]. An example of such an approach was presented in , where the discrete curvelet transform (DCT)  was shown to provide the best recognition rate of the three feature extractors studied. In , the DCT was combined with a standard k-nearest neighbor (kNN) algorithm . According to the results reported in , SVM gives better results when combined with the DCT, especially when the SVM one-against-one strategy is used. Similar research based on the contourlet transform  is presented in .
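The kNN stage of such pipelines is straightforward; the sketch below operates on generic fixed-length feature vectors (in the cited work these would be curvelet-domain features, which we do not reproduce here), with a plain majority vote over Euclidean distances.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `train_X` holds one feature vector per reference image, `train_y` the
    corresponding class labels; distances are Euclidean.
    """
    d = np.linalg.norm(train_X - query, axis=1)       # distance to each sample
    nearest = np.argsort(d)[:k]                       # indices of k closest
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

With well-separated transform-domain features, even this unweighted vote performs respectably, which is why kNN is a common baseline against which SVM variants are compared.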
Other valuable approaches to MMR are related to the scale invariant feature transform (SIFT) . The effectiveness of SIFT-based MMR schemes was investigated and reported by the research team of Prof. Serge J. Belongie . A simple matching algorithm, in which SIFT descriptors computed for a given query image are matched directly, one by one, against descriptors determined for each of the reference images, is presented in . This and other reports confirm that approaches based on SIFT  or the speeded-up robust features (SURF) method  are also promising for solving the MMR problem.
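The one-by-one descriptor matching scheme can be sketched as follows, assuming the SIFT/SURF descriptors have already been extracted (descriptor extraction itself is outside this sketch). We include Lowe's ratio test, a standard way to suppress ambiguous matches; the function name and the 0.8 default are our choices.

```python
import numpy as np

def match_descriptors(query, reference, ratio=0.8):
    """Nearest-neighbor descriptor matching with a ratio test.

    `query` and `reference` are (n, d) arrays of local descriptors.  A query
    descriptor is matched to its nearest reference descriptor only when the
    nearest distance is below `ratio` times the second-nearest distance;
    otherwise the match is considered ambiguous and discarded.
    Returns a list of (query_index, reference_index) pairs.
    """
    matches = []
    for i, q in enumerate(query):
        d = np.linalg.norm(reference - q, axis=1)
        order = np.argsort(d)
        if len(d) > 1 and d[order[0]] < ratio * d[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```

For MMR, the number of surviving matches against each reference model's descriptor set can serve directly as the classification score: the model with the most matches wins.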
Vehicle color recognition (VCR) in outdoor conditions remains an unsolved problem, mainly because of varying lighting conditions, shadows and reflections of sunlight on shiny vehicle surfaces, all of which make finding a robust solution challenging.
In , a tri-state architecture including a Separating and Re-Merging (SARM) algorithm is proposed to effectively extract the car body and classify the vehicle color in challenging cases with unknown car type, unknown viewpoint and inhomogeneous light reflection. In , in turn, different features, selected to represent various color spaces, and different classification methods (kNN, ANNs and SVM) were analyzed with regard to the VCR task. The features were computed from two selected views of the car: a smooth hood patch and a semi-frontal view. Suppressing sunlight reflections and filtering out vehicle parts irrelevant to color recognition were the subjects of the research reported in . An effective approach based on color histograms and template matching was reported in ; its main objective was to determine the required number of histogram bins. Color histograms combined with principal component analysis (PCA) were examined in . A different, SVM-based approach was proposed in ; the video color classification (VCC) algorithm presented there refines the foreground mask to remove undesired regions.
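The histogram-based family of VCR methods can be illustrated with the following sketch, which classifies a set of body-region pixels by histogram intersection against reference color histograms. It assumes the hard part, masking out windows, lights and reflections, has already been done, and all names and the 4-bins-per-channel choice are ours.

```python
import numpy as np

def color_histogram(pixels, bins=4):
    """Joint RGB histogram with `bins` levels per channel, L1-normalized.

    `pixels` is an (n, 3) uint8 array of pixels sampled from the car body.
    """
    q = (pixels.astype(np.int32) * bins) // 256              # quantize channels
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]   # joint bin index
    hist = np.bincount(idx, minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def classify_color(pixels, references):
    """Pick the reference color with the largest histogram intersection.

    `references` maps color names to histograms from color_histogram().
    """
    h = color_histogram(pixels)
    return max(references, key=lambda name: np.minimum(h, references[name]).sum())
```

Coarse bins deliberately merge nearby shades, which gives some robustness to illumination shifts; choosing the bin count is exactly the tuning question studied in the histogram-based work cited above.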