Abstract
This paper proposes a system to analyze urban traffic using artificial vision, in order to obtain reliable information about the traffic flow in cities with severe traffic congestion, such as Bogotá, Colombia. A method efficient enough to be implemented in an embedded system is proposed, so that the images captured by a local camera can be processed on site and only the synthesized information sent to the cloud. This approach reduces data transmission, because it is not necessary to stream the video of every camera in the city; instead, each camera sends only the relevant traffic information. The system is able to calculate traffic flow, classified into motorbikes, buses, microbuses, minivans, sedans, SUVs and trucks.
The detection stage was implemented using a cascade classifier that evaluates HAAR features, providing a detection rate of 74.9% and a false positive rate of 1.4%. A Kalman filter was used to track and count the detected vehicles. Finally, a convolutional neural network performs the classification, with accuracies around 88%. The complete system presented errors around 2.5% with respect to manual counts in the traffic of Bogotá, Colombia.
Keywords
- Intelligent transportation systems
- Convolutional Neural Networks
- Artificial vision
- Road car counting
Supported by Universidad Santo Tomás.
1 Introduction
An intelligent transportation system (ITS) is the operational combination of a set of technologies that, when managed in the right way, improves the performance and capabilities of the entire system. An ITS improves the transport system of a city or an intercity highway, making it safer and more efficient [16].
In intelligent transportation systems, as in traditional systems, the decisions taken to improve mobility on public roads are based on traffic variables such as traffic flow and average speed, which are at the same time the objective of the optimization. These decisions include changing the green times of traffic lights or granting the right-of-way to lanes in specific directions; alternatively, the measured variables can be used to assess the performance of transit policies by analyzing their values before and after implementation.
In some countries, the vehicle counting used to calculate urban flows is still performed manually. This process is carried out by staff located at intersections on the main roads of the city, who draw tally marks on vehicular-capacity forms, where each mark represents a vehicle passing that location [2]. This method presents an unknown level of error, increasing the uncertainty of the results. In order to improve the reliability of the data, redundant measurements are used: several people perform the same count at the same location and their results are averaged. This still does not establish a known level of error; on the contrary, it increases the cost of the measurement.
This work presents a method for the detection, tracking, counting and classification of urban traffic through digital image processing and convolutional neural networks, which aims to overcome the disadvantages of the manual process. Section 2 reviews the state of the art of the three main stages that make up the proposed method: detection, tracking and classification. Section 3 explains the techniques used in each stage of the algorithm. Section 4 presents the results obtained from tests carried out at different locations in the city of Bogotá. Conclusions are presented in Sect. 5.
2 Related Work
Typically, the process of detecting and counting vehicles with artificial vision is divided into three main stages: detection, which establishes the location of the vehicles present in each frame of a video; tracking, which calculates the route followed by each vehicle in the scene; and classification, which identifies different types of vehicles. A large number of methods have been used to perform these tasks; some of them, along with several previous works, are reviewed below:
2.1 Detection
A widely used method for object detection is background subtraction, which is based on modelling the background of a sequence of images. The segments corresponding to moving objects are obtained from the difference between each frame and the previously computed model, which is then updated with the same frame. In [3], vehicle detection is done with this method, modelling the background with a Gaussian Mixture Model (GMM). This technique models each background pixel as a mixture of Gaussian distributions; pixels that do not fit those distributions are considered moving objects [15].
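As an illustration of the idea behind this kind of background modelling, the sketch below keeps a single running Gaussian per pixel instead of a full mixture; the learning rate and threshold are illustrative assumptions, not the parameters of [3] or [15].

```python
import numpy as np

class RunningGaussianBackground:
    """Simplified background model: one Gaussian per pixel instead of a
    full mixture. Pixels far from the mean (in standard deviations) are
    flagged as foreground; background pixels update the model."""

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full_like(self.mean, 50.0)  # initial per-pixel variance
        self.alpha = alpha                        # learning rate
        self.k = k                                # threshold in std. deviations

    def apply(self, frame):
        frame = frame.astype(np.float64)
        dist = np.abs(frame - self.mean)
        fg = dist > self.k * np.sqrt(self.var)    # foreground mask
        bg = ~fg
        # Update the model only where the pixel matched the background
        self.mean[bg] += self.alpha * (frame[bg] - self.mean[bg])
        self.var[bg] += self.alpha * (dist[bg] ** 2 - self.var[bg])
        return fg
```

A full GMM, as in [15], would keep several weighted Gaussians per pixel and replace the least probable one when no distribution matches, which is what makes the model robust to multimodal backgrounds such as swaying trees.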
In [19], the potential locations of vehicles in the image are obtained by first locating the shadows of the objects in the scene, and the detections are then corroborated through symmetry and edge analysis. A similar approach that adds texture analysis is presented in [9]. Another approach is to search for characteristic components of the object to be detected; for instance, in [11] these components are sought as basic shapes, such as circles for the tires of a bicycle. In the case of vehicles, a method based on detecting the back of the car, specifically the rear lights, was presented in [13]. The detection method used in this work is based on a cascade classifier trained with the algorithm presented by Viola and Jones in [17]. Some works using this method are presented in [14, 18, 20].
2.2 Tracking
Tracking consists of tracing the path of a moving object as it changes its location in a scene. Tracking methods are mainly divided into three types: region-based, in which the displacement of the segment corresponding to the moving object is calculated; active-contour-based, which computes and updates a box enclosing the detected object; and feature-based, which extracts and follows specific characteristics of the object [12].
In addition, an estimator of the future positions of the object must be established, so that tracking can continue even when the detection stage does not deliver a position in some frame. The Kalman filter is an estimator that uses past measurements to predict the future values of a variable; [8] presents satisfactory results using it to track multiple moving objects. The particle filter can be used for the same purpose, relaxing the assumptions of the traditional Kalman filter [1].
2.3 Classification
The classification of vehicles is performed with machine learning algorithms, which are trained on large datasets of different types of cars after extracting different characteristics, such as dimensions (length, width, area), edge orientation histograms, HAAR features and color, among others. These characteristics are fed to a classifier such as a neural network, a support vector machine or a boosting classifier. Ojha and Sakhare present in [10] a summary of works done with different types of classifiers and characteristics.
Choosing the right type of feature is sometimes challenging, takes a long time, and does not always give good results. Convolutional Neural Networks (CNNs) address that problem with convolutional layers, which extract features through spatial filters whose weights are learned in the same way as the other network parameters. [4, 5] present some works where CNNs are used for the classification of vehicles.
3 Proposed Method
The proposed procedure for the detection, tracking, counting and classification of vehicles is summarized in Fig. 1. This procedure is performed independently for cars and motorcycles.
Viola-Jones Algorithm. The first stage, detection, is based on the Viola-Jones algorithm [17], which uses a cascade classifier over HAAR features such as those shown in Fig. 2, where the feature value is the difference between the sums of the pixels under the white and black regions.
The set of features is preselected by an AdaBoost learning algorithm. Detection is carried out by sweeping a detection window along the image and processing it through the stages of the classifier, as shown in Fig. 3; the result of the classifier establishes whether the object is present at a specific position. This algorithm offers high detection rates with very short processing times.
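What makes the window sweep fast is the integral image, which lets any rectangular sum, and hence any HAAR feature value, be evaluated in constant time. A minimal sketch of a two-rectangle feature follows; the coordinates and the left-minus-right sign convention are illustrative, and no cascade logic is included.

```python
import numpy as np

def padded_integral_image(img):
    """Summed-area table with a leading row/column of zeros, so that
    ii[y, x] equals the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle (x, y, w, h) using only 4 lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(img, x, y, w, h):
    """Two-rectangle HAAR feature: left half minus right half of the window."""
    ii = padded_integral_image(img)
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

In the real detector the integral image is computed once per frame and reused for every window position and every feature, which is what keeps the processing times short.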
Kalman Filter. Each detected position is fed to a Kalman filter, in order to track the vehicle and estimate its location in the frames where the detector does not deliver one. A vehicle is counted when its position leaves a pre-established region of interest.
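A per-track Kalman filter of this kind can be sketched as follows. The constant-velocity state model and the noise values here are common defaults, not the parameters used in this work; the `predict` step is what supplies a position on frames where the detector misses the vehicle.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter for one vehicle track.
    State: [x, y, vx, vy]; measurement: detected centre [x, y]."""

    def __init__(self, x, y, q=1.0, r=10.0):
        self.s = np.array([x, y, 0.0, 0.0])   # state estimate
        self.P = np.eye(4) * 100.0            # state covariance
        self.F = np.array([[1., 0., 1., 0.],  # constant-velocity transition
                           [0., 1., 0., 1.],
                           [0., 0., 1., 0.],
                           [0., 0., 0., 1.]])
        self.H = np.array([[1., 0., 0., 0.],  # we only measure position
                           [0., 1., 0., 0.]])
        self.Q = np.eye(4) * q                # process noise
        self.R = np.eye(2) * r                # measurement noise

    def predict(self):
        # Propagate the state; called every frame, even without a detection
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, zx, zy):
        # Correct the state with a detected position
        y = np.array([zx, zy]) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Counting then reduces to checking, after each `predict`, whether the estimated position has left the region of interest.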
Classification. Classification is only performed for cars. A color classifier first determines whether a detection corresponds to a taxi (taxis are yellow in Colombia); if not, the detection (a snapshot of the detected car) is passed to the convolutional neural network, which classifies it as bus, microbus, minivan, sedan, SUV or truck.
For the purposes of this work, a variation of the AlexNet convolutional neural network [6] was used, pretrained on the ImageNet dataset [7], which contains millions of images from 1000 categories. The network architecture is shown in Fig. 4.
Variation of the AlexNet Convolutional Neural Network [6].
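The last-layer modification described in Sect. 4.2 can be sketched with PyTorch/torchvision; the framework is an assumption, since the paper does not state which one was used.

```python
import torch
import torch.nn as nn
from torchvision import models

# AlexNet as shipped with torchvision; in practice, load pretrained
# ImageNet weights (e.g. weights=models.AlexNet_Weights.IMAGENET1K_V1).
model = models.alexnet()

# Replace the final fully connected layer (originally sized for the
# ImageNet classes) with a 6-output layer:
# bus, microbus, minivan, sedan, SUV, truck.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 6)

# For fine-tuning, the convolutional feature extractor can be frozen so
# that only the new classifier head is updated on the vehicle dataset.
for p in model.features.parameters():
    p.requires_grad = False

x = torch.randn(1, 3, 224, 224)   # one 224 x 224 RGB crop of a detection
logits = model(x)                 # shape: (1, 6)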
4 Experiments and Results
The following subsections present the training procedures and results obtained for each stage of the algorithm, as well as the overall performance.
4.1 Cascade Classifier Training
The training of the cascade classifier was performed with an image bank of 12500 positive samples (vehicles) and 14000 negative samples (houses, buildings, people, animals, empty streets, etc.). Figure 5 presents some examples of positive samples.
In order to find appropriate training parameters, sweeps were performed over the following parameters. The chosen parameters gave a detection rate of 74.9% and a false positive rate of 1.4%.
- Number of stages: from 15 to 30. Chosen: 20
- Type of features: HAAR, HOG and LBP. Chosen: HAAR
- Detection window size: 12 \(\times \) 24, 18 \(\times \) 18, 18 \(\times \) 24, 18 \(\times \) 36, 24 \(\times \) 24. Chosen: 18 \(\times \) 24
- Type of boosting: DAB, RAB, LB and GAB. Chosen: GAB.
4.2 Classification
The training of the color classifier was done with an image bank of 295 positive samples (taxis) and 713 negative samples (non-taxis), obtaining an accuracy of \(99.3\%\) on the test dataset.
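The paper does not detail the colour features used by this classifier; a minimal illustrative version of such a taxi/non-taxi test is a yellow-pixel ratio check, where both the RGB rule and the threshold are assumptions for the sketch.

```python
import numpy as np

def yellow_fraction(rgb):
    """Fraction of pixels that look 'taxi yellow': high red and green,
    low blue. The exact rule is an illustrative assumption."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    mask = (r > 150) & (g > 120) & (b < 100)
    return mask.mean()

def is_taxi(rgb, threshold=0.3):
    """Label a detection crop as a taxi if enough of it is yellow."""
    return yellow_fraction(rgb) > threshold
```

A trained classifier over colour histograms, as implied by the reported 99.3% accuracy, would learn this boundary from the 295 positive and 713 negative samples instead of hard-coding it.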
According to Fig. 4, the input image of the CNN classifier must be \(224\times 224\), and the original output size is 1000. However, the last fully connected layer was changed to have 6 outputs, and fine-tuning was performed with the BIT dataset [4] (558 buses, 883 microbuses, 476 minivans, 5922 sedans, 1392 SUVs and 822 trucks). 80% of the dataset was used for training and validation, and the remaining 20% for testing, yielding the learning curve presented in Fig. 6 and the confusion matrix of Table 1. It can be observed that after around 300 iterations the training converged, with a loss around 0.1 and a training precision around 97%. The testing accuracy was 88% on average over all the classes.
4.3 Performance of the Complete Method
In order to evaluate the performance of the complete method, traffic videos were captured at different locations in the city of Bogotá. Then, the total number of vehicles, the number of taxis among them, and the number of motorcycles crossing the captured road area were counted manually. Subsequently, the videos were processed by the proposed method. Table 2 shows the results obtained for each of the videos, at 640 \(\times \) 480 video resolution, with respect to the manual count. Time and site conditions were varied to show the behaviour of the method in different situations:
- Video 1: Main 4-lane road. Hour: 15:20. Weather: cloudy. Average speed: 27 km/h.
- Video 2: Main 4-lane road. Hour: 10:30. Weather: cloudy. Average speed: 12 km/h.
- Video 3: Main 4-lane road. Hour: 14:12. Weather: sunny. Average speed: 17 km/h.
- Video 4: Main 2-lane road. Hour: 11:00. Weather: cloudy. Average speed: 22 km/h.
- Video 5: Main 2-lane road. Hour: 15:00. Weather: cloudy. Average speed: 24 km/h.
The system was implemented on a Cubieboard 4 with a surveillance camera of 640 \(\times \) 480 pixels of resolution. Detection and tracking processing times of around 11.11 ms were obtained, appropriate for a real-time application. In addition, classification times of around 76 ms were obtained, which allowed implementing the classification with a FIFO approach in an independent thread.
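The FIFO approach above can be sketched with Python's standard queue and threading modules; `classify` is a placeholder for the CNN, and the names are illustrative rather than taken from the implementation.

```python
import queue
import threading

def classification_worker(crops, results, classify):
    """Consume detection crops from a FIFO queue and classify them.
    A None item is a sentinel that stops the worker."""
    while True:
        crop = crops.get()
        if crop is None:
            break
        results.append(classify(crop))

crops = queue.Queue()   # FIFO of car snapshots awaiting classification
results = []
worker = threading.Thread(
    target=classification_worker,
    args=(crops, results, lambda crop: "sedan"),  # dummy stand-in for the CNN
)
worker.start()

# The detection/tracking loop (about 11 ms per frame) only enqueues crops,
# so it is never blocked by the slower (about 76 ms) classification.
for crop in ["crop0", "crop1", "crop2"]:          # stand-ins for image crops
    crops.put(crop)
crops.put(None)                                   # signal the worker to stop
worker.join()
```

Decoupling the two rates this way keeps detection and tracking real-time while classification catches up in the background, at the cost of a bounded latency in the per-class counts.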
5 Conclusions
A method for measuring urban traffic through digital image processing, divided into three main stages, was proposed, implemented and verified. The detection stage is based on a cascade classifier with HAAR features, obtaining a detection rate of 74.9% and a false positive rate of 1.4%. The tracking and counting stage, based on a Kalman filter, increases the effective detection rate, obtaining an average error of 2.0% in the total vehicle count and 5.5% in the motorcycle count. The convolutional neural network presented an average precision of 88.0% in the tests performed. Additionally, the execution times allow the implementation of the system on a commercial embedded platform.
In general, it can be concluded that the proposed method presents appropriate results without the need for high image resolution, allowing its execution on platforms without high computational capacity and opening the possibility of implementing low-cost traffic measurement systems in which the analysis is performed locally.
References
Arulampalam, M.S., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002). https://doi.org/10.1109/78.978374
Bañón Blázquez, L., Beviá García, J.F.: Manual de carreteras. Elementos y proyecto. Ortiz e Hijos, Contratista de Obras, S.A. (2000). http://rua.ua.es/dspace/handle/10045/1788
Chen, Z., Ellis, T., Velastin, S.A.: Vehicle detection, tracking and classification in urban traffic. In: 2012 15th International IEEE Conference on Intelligent Transportation Systems, pp. 951–956 (2012). https://doi.org/10.1109/ITSC.2012.6338852
Dong, Z., Wu, Y., Pei, M., Jia, Y.: Vehicle type classification using a semisupervised convolutional neural network. IEEE Trans. Intell. Transp. Syst. 16(4), 2247–2256 (2015). https://doi.org/10.1109/TITS.2015.2402438
Kim, P.K., Lim, K.T.: Vehicle type classification using bagging and convolutional neural network on multi view surveillance image. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 914–919 July 2017. https://doi.org/10.1109/CVPRW.2017.126
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012). http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Stanford Vision Lab: ImageNet. http://www.image-net.org/
Li, X., Wang, K., Wang, W., Li, Y.: A multiple object tracking method using Kalman filter. In: The 2010 IEEE International Conference on Information and Automation, pp. 1862–1866 (2010). https://doi.org/10.1109/ICINFA.2010.5512258
Li-sheng, J., Bai-yuan, G., Rong-ben, W., Lie, G., Yi-bing, Z., Lin-hui, L.: Preceding Vehicle Detection Based on Multi-characteristics Fusion. In: 2006 IEEE International Conference on Vehicular Electronics and Safety, pp. 356–360 (2006). https://doi.org/10.1109/ICVES.2006.371615
Mendoza-Schrock, O., Bourbakis, N., Rizki, M., Velten, V.: Vehicle classification for civilian and non-civilian applications: survey. In: NAECON 2014-IEEE National Aerospace and Electronics Conference, pp. 163–168 (2014). https://doi.org/10.1109/NAECON.2014.7045796
Mikolajczyk, K., Zisserman, A., Schmid, C.: Shape recognition with edge-based features. In: Harvey, R., Bangham, A. (eds.) British Machine Vision Conference (2003), vol. 2, pp. 779–788. The British Machine Vision Association, Norwich (2003). https://hal.inria.fr/inria-00548226
Ojha, S., Sakhare, S.: Image processing techniques for object tracking in video surveillance- survey. In: 2015 International Conference on Pervasive Computing (ICPC), pp. 1–6 (2015). https://doi.org/10.1109/PERVASIVE.2015.7087180
dos Santos, D.J.a.A.a.: Automatic vehicle recognition system: an approach using car rear views and backlights shape. Ph.D. thesis, s.n., Lisboa (2008)
Shujuan, S., Zhize, X., Xingang, W., Guan, H., Wenqi, W., De, X.: Real-time vehicle detection using mixed features and gentle AdaBoost classifier. In: The 27th Chinese Control and Decision Conference (2015 CCDC), pp. 1888–1894, May 2015. https://doi.org/10.1109/CCDC.2015.7162227
Stauffer, C., Grimson, W.E.L.: Adaptive background mixture models for real-time tracking. In: Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), vol. 2, p. 252 (1999). https://doi.org/10.1109/CVPR.1999.784637
U.S. Department of Transportation: History of Intelligent Transportation Systems. http://www.its.dot.gov/history/
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, vol. 1, pp. I-511–I-518 (2001). https://doi.org/10.1109/CVPR.2001.990517
Wen, X., Yuan, H., Yang, C., Song, C., Duan, B., Zhao, H.: Improved Haar wavelet feature extraction approaches for vehicle detection. In: 2007 IEEE Intelligent Transportation Systems Conference, pp. 1050–1053 (2007). https://doi.org/10.1109/ITSC.2007.4357743
Wen, X., Zhao, H., Wang, N., Yuan, H.: A rear-vehicle detection system for static images based on monocular vision. In: 2006 9th International Conference on Control, Automation, Robotics and Vision. 2006, pp. 1–4 (2006). https://doi.org/10.1109/ICARCV.2006.345157
Xiang, X., Bao, W., Tang, H., Li, J., Wei, Y.: Vehicle detection and tracking for gas station surveillance based on and optical flow. In: 2016 12th World Congress on Intelligent Control and Automation (WCICA), pp. 818–821 (2016). https://doi.org/10.1109/WCICA.2016.7578324
© 2019 Springer Nature Switzerland AG
Camacho, C., Pedraza, C., Higuera, C. (2019). An Artificial Vision Based Method for Vehicle Detection and Classification in Urban Traffic. In: Morales, A., Fierrez, J., Sánchez, J., Ribeiro, B. (eds) Pattern Recognition and Image Analysis. IbPRIA 2019. Lecture Notes in Computer Science(), vol 11868. Springer, Cham. https://doi.org/10.1007/978-3-030-31321-0_34