Development of Active Safety Software of Road Freight Transport, Aimed at Improving Inter-City Road Safety, Based on Stereo Vision Technologies and Road Scene Analysis

The article considers an active safety system for road freight transport. Stereoscopic computer vision is the core of the system. The article also describes the major active safety algorithms and the accuracy achieved when they are applied.


Introduction
It is widely known that the major causes of road accidents (about 85%) are driver mistakes and non-compliance with traffic laws. Speeding, crossing into oncoming traffic, inattentive driving, and falling asleep at the wheel are among the most frequent driving mistakes. One way to improve driving safety is to deploy active safety systems aimed at timely notification of the driver of dangerous traffic situations. The potential of such systems is being intensively developed, primarily owing to the technological equipment of vehicles and the development of intelligent algorithms for the analysis of road scenes. Many automobile manufacturers have already introduced additional car features that facilitate driving and make it safer and more comfortable. Legislation also drives the deployment of such systems: active safety systems will be mandatory for new truck models from 1 Nov 2018 (EU Directive 347/2012). Hence, these systems and their functional capabilities will continue to evolve, which underlines the rationale for the research.

Related Works
Many automobile manufacturers and members of the research community are currently developing 'advanced driver assistance systems'. The existing projects differ both in the sensors used, e.g., a high-performance camera [2] or a combination of radar, camera, and ultrasonic sensor [3], and in the problems addressed, e.g., speed bump detection [4], pedestrian detection [5], and vehicle detection [6].
In Russia, 'advanced driver assistance systems' and robotic transport systems are most actively developed by 'KB Avrora' [7] and the Central Research and Development Automobile and Engine Institute NAMI [8].
The results presented in this paper are new in terms of both the techniques and the algorithms used, so they should be of interest and practical use to both national and international developers.

General Description
The objective of the project is to create advanced driver assistance systems that give the truck driver timely information on dangerous traffic situations on the basis of stereoscopic vision.
The introduction of active safety systems is expected to raise the level of road safety, reduce the number of road accidents, and lower the cost of vehicle ownership.
The following hardware configuration was used to implement the project. A laptop with an Intel Core i7 processor, 8 GB of RAM, and a GTX 960M video card (2 GB VRAM) served as the computing unit for the stereo system. Two USB cameras (IDS) are mounted on a stereo rig with a 1.57 m base. The cameras are synchronized via a synchronization cable in master-slave mode; the synchronization error is less than 5 ms. A geolocation sensor and an Xsens strapdown inertial navigation system with an integrated GPS antenna are used. Integration with additional sensors is also possible: a lidar mounted on the front bumper that scans in a plane parallel to the road surface, a front radar with narrow and medium beam angles, and an infrared range measuring system. C++ with the OpenCV 3.1 library [9] was the main development language. The software complex is cross-platform. It makes wide use of parallel computing on multicore processors and on the GPU with CUDA technology to reach a performance of about 10 frames per second.
Because of space limitations, the paper deals only with the most in-demand driver assistance systems: the lane departure warning system, the pedestrian protection system, the forward collision warning system, and the traffic sign recognition system.

Lane Departure Warning System
The lane departure warning system (LDWS) works on roads with clearly visible lane markers at speeds exceeding 50 km/h. The lane detection algorithm consists of the following sequential steps: identifying the road lanes, determining the parameters of the lane markers, and selecting the road lanes.
First, the image $I_t$ is filtered to identify the lines that are road markers. This is done using the difference of Gaussians. The procedure converts the original color image (Fig. 1a) to a grayscale one.
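The effect of a difference-of-Gaussians filter can be illustrated on a single grayscale scanline. The following is a minimal sketch, not the authors' implementation: the two standard deviations, the kernel radius, and the synthetic scanline are assumptions chosen for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_clamped(signal, kernel):
    """Convolve a 1-D signal with a symmetric kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def difference_of_gaussians(signal, sigma_narrow=1.0, sigma_wide=3.0, radius=9):
    """Narrow blur minus wide blur: responds to thin bright stripes, not to flat areas."""
    narrow = convolve_clamped(signal, gaussian_kernel(sigma_narrow, radius))
    wide = convolve_clamped(signal, gaussian_kernel(sigma_wide, radius))
    return [a - b for a, b in zip(narrow, wide)]

# Hypothetical scanline: dark asphalt (40) with a 4-pixel-wide bright marker (220).
scanline = [40.0] * 30 + [220.0] * 4 + [40.0] * 30
response = difference_of_gaussians(scanline)
peak_index = max(range(len(response)), key=lambda i: response[i])
```

The response is close to zero on uniform asphalt and peaks inside the marker, which is why the filter is suited to isolating thin bright lines.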
Second, the fast Hough transform is used to identify the road lanes. The Hough transform is a numerical technique for detecting lines in an image. The fast Hough transform also makes it possible to determine the common vanishing point of the lines in the image.
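The voting principle behind the Hough transform can be sketched as follows. Note that this is the classical accumulator-based variant, shown only to illustrate the idea; it is not the fast Hough transform used in the system, and the point set and bin sizes are assumptions.

```python
import math
from collections import Counter

def hough_lines(points, theta_steps=180, rho_step=1.0):
    """Vote in (theta index, rho bin) space for lines x*cos(theta) + y*sin(theta) = rho."""
    votes = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes

# Hypothetical edge pixels on the vertical line x = 5, plus two outliers.
points = [(5, y) for y in range(0, 100, 5)] + [(0, 3), (12, 7)]
votes = hough_lines(points)
(best_t, best_rho_bin), best_votes = votes.most_common(1)[0]
```

Each edge pixel votes for every line that could pass through it; the accumulator cell collecting the most votes (here theta = 0, rho = 5) corresponds to the dominant line.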
Third, the detected lines are checked for parallelism and then selected. Information from the inertial sensors is used at this stage, which makes it possible to identify the horizon in the image (Fig. 1), as well as the number of road lanes.
Fig. 1 The results of lane edge detection on the highway
Then, the brightness of pixels is determined as a function of position along the horizon line. This dependence has several extrema, which correspond to the abscissas of the points where the detected lines intersect the horizon. Using the known number of road lanes, we select the brightest intersection points; the corresponding lines are further referred to as road marking lines.
The software was tested on a set of 5-s-long video clips. An actual lane departure was registered visually when a wheel crossed the road marking line by 10 cm or more (in compliance with the European standard [10]). The total number of lane-changing video clips was 122. The tests of the proposed algorithm showed a true positive rate (TP) of 83% and a false negative rate (FN) of 1.2%.
The LDWS on a truck is triggered only when the lane change is not accompanied by a turn signal.
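The selection of the brightest extrema along the horizon line can be sketched as below. This is an illustrative assumption about how the selection might be done; the brightness profile and the parameter `count` (the number of marking lines expected for the known number of lanes) are hypothetical.

```python
def brightest_peaks(profile, count):
    """Return x positions of the `count` brightest local maxima, sorted left to right."""
    peaks = [
        (profile[x], x)
        for x in range(1, len(profile) - 1)
        if profile[x] > profile[x - 1] and profile[x] >= profile[x + 1]
    ]
    strongest = sorted(peaks, reverse=True)[:count]
    return sorted(x for _, x in strongest)

# Hypothetical brightness accumulated along the horizon line.
profile = [0, 1, 0, 2, 9, 3, 0, 1, 7, 2, 0, 4, 1, 0, 8, 1, 0]
lane_line_positions = brightest_peaks(profile, count=3)
```

With this profile the three strongest maxima lie at abscissas 4, 8, and 14; weaker maxima are discarded as not belonging to road marking lines.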

Pedestrian Protection System
The pedestrian protection system is active from the moment the system is switched on. The pedestrian detection algorithm, which is the core of the system, is built on decision trees over a broad feature space [11]. The LUV transformation, the gradient magnitude, and the histogram of oriented gradients are used to compute the features.
By the LUV transformation, we mean the CIE L*u*v* color space, which is calculated from an RGB image according to the known empirical formulas given in [12]. This transformation makes it possible to normalize the image, to extract the most informative signal, and to smooth out outliers. The gradient is calculated at each point of the image as a finite difference along two directions by applying a mask.
The histogram of oriented gradients (HOG) is formed as a sum of gradient magnitudes over orientation bins. The histogram consists of a number of cells. The size of each cell is 6 × 6 pixels.
Therefore, the feature description X of a pedestrian is a vector comprising three LUV components, one gradient magnitude M, and six HOG components.
Decision trees with a depth of two were used as the classification method. The nodes of the trees hold a feature and its threshold value (obtained at the training stage), which are used to classify the measurements into two categories, with or without a pedestrian (Y = {−1, 1}) in the image, respectively.
The AdaBoost algorithm is employed to train the classifier. To detect pedestrians in the image we use the previously trained classifier, and the detector receives a 90 × 36 pixel fragment of the original image. The search for pedestrians throughout the image, as well as the removal of constraints on the size of the pedestrians, is carried out using sliding window and image scaling techniques. Type I and type II errors are reduced by using information on the scene geometry. The results of the pedestrian detection are exemplified in Fig. 2.
The application of the detector to the image is accelerated by the techniques described in [11]. Tests on field data showed that the TP rate of the system is 78.7% and the FN rate is 9.4%.
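The gradient magnitude and the six-bin orientation histogram of one 6 × 6 cell can be sketched as follows. This is a simplified illustration under stated assumptions: a central-difference mask [-1, 0, 1], unsigned orientations in [0, π), hard bin assignment, and a synthetic patch; the actual feature computation of [11] may differ in these details.

```python
import math

def gradient(image, x, y):
    """Central-difference gradient (mask [-1, 0, 1]) with clamped borders."""
    h, w = len(image), len(image[0])
    gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
    gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
    return gx, gy

def cell_histogram(image, x0, y0, cell=6, bins=6):
    """Sum the gradient magnitudes of one cell into unsigned-orientation bins."""
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            gx, gy = gradient(image, x, y)
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += magnitude
    return hist

# Hypothetical 6 x 6 patch with a vertical edge: left half dark, right half bright.
patch = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
hist = cell_histogram(patch, 0, 0)
```

For a vertical edge all the gradient energy falls into the first orientation bin, so the histogram directly encodes the dominant edge direction of the cell.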
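Evaluating a depth-two decision tree amounts to two threshold comparisons. The sketch below uses a hypothetical tree; the feature indices, thresholds, and leaf labels are invented for illustration and are not taken from the trained classifier.

```python
def classify_depth_two(features, tree):
    """Evaluate a depth-two tree: a root split followed by one split on each side."""
    root_index, root_threshold, left, right = tree
    child = left if features[root_index] < root_threshold else right
    index, threshold, label_below, label_above = child
    return label_below if features[index] < threshold else label_above

# Hypothetical tree over a 10-element vector (3 LUV + 1 M + 6 HOG components).
tree = (
    3, 0.5,            # root: compare the gradient magnitude M with 0.5
    (0, 0.2, -1, 1),   # weak-gradient branch: split on the first LUV component
    (4, 0.3, -1, 1),   # strong-gradient branch: split on the first HOG component
)

strong_edge = [0.1, 0.0, 0.0, 0.8, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0]
flat_region = [0.1, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
labels = [classify_depth_two(strong_edge, tree), classify_depth_two(flat_region, tree)]
```

In a boosted classifier many such shallow trees are evaluated and their weighted votes are summed, so each tree only needs to be slightly better than chance.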
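The combination of a sliding window with image scaling can be sketched as a generator of candidate boxes. The window orientation (36 pixels wide, 90 pixels high), the stride, and the scale step are assumptions made for this example only.

```python
def sliding_windows(image_w, image_h, win_w=36, win_h=90, stride=8, scale_step=1.25):
    """Yield (x, y, w, h) boxes in original-image coordinates for every pyramid level."""
    scale = 1.0
    while image_w / scale >= win_w and image_h / scale >= win_h:
        level_w, level_h = int(image_w / scale), int(image_h / scale)
        for y in range(0, level_h - win_h + 1, stride):
            for x in range(0, level_w - win_w + 1, stride):
                yield (round(x * scale), round(y * scale),
                       round(win_w * scale), round(win_h * scale))
        scale *= scale_step

# Hypothetical small image: 72 x 180 pixels, coarse stride, two pyramid levels.
boxes = list(sliding_windows(72, 180, stride=18, scale_step=2.0))
```

The detector itself always sees a window of fixed size; shrinking the image at each pyramid level is what lets the same classifier find larger (closer) pedestrians.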
The driver alert about a potentially hazardous proximity to a pedestrian is based on calculating the time to collision $T_m$ and comparing it with a certain threshold according to Eq. (1), where $Z$ is the relative distance between the objects, $V$ is the relative speed, $S = w_{t-1}/w_t$, $w_t$ and $w_{t-1}$ are the widths of the rectangular area in the image describing the pedestrian at time instants $t$ and $t-1$, respectively, $\Delta t$ is the time interval, and $t_{add}$ is the tolerance; when it is exceeded, an 'alert' message is activated. Equation (1) holds at constant speed. Under acceleration, which is frequent because of heavy braking, Eq. (1) has to be modified to account for it.
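Independently of the exact form of the equations referred to above, the constant-speed case can be illustrated with a short sketch. It rests on assumptions that are not taken from the paper: a pinhole camera (apparent width inversely proportional to distance), a constant closing speed V, and Z taken as the distance at the current instant t. Under these assumptions S = Z_t / Z_{t-1} and the time to collision is Z_t / V = S Δt / (1 − S). The alert rule (alert when the time to collision falls below the threshold) is likewise an assumption.

```python
def time_to_collision(width_prev, width_curr, dt):
    """Time to collision from the change in apparent width between two frames."""
    s = width_prev / width_curr  # S = w_{t-1} / w_t, below 1 when approaching
    if s >= 1.0:
        return float("inf")      # the object is not getting closer
    return dt * s / (1.0 - s)

def should_alert(ttc, t_add):
    """Assumed rule: alert when the time to collision drops below the threshold."""
    return ttc < t_add

# Hypothetical case: distance shrinks from 20.5 m to 20 m in 0.1 s (V = 5 m/s).
k = 1000.0  # arbitrary constant linking apparent width to inverse distance
ttc = time_to_collision(k / 20.5, k / 20.0, dt=0.1)
alert = should_alert(ttc, t_add=2.0)
```

The estimate needs no explicit distance or speed measurement: 20 m at 5 m/s gives 4 s, and the same value is recovered from the two widths alone.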

Forward Collision Warning System
The system is implemented using a hybrid detector and operates at speeds exceeding 5 km/h. The first detector is the algorithm proposed above for pedestrian detection, trained on samples containing vehicles; the second is AdaBoost with Haar-like features.
Haar-like features consist of rectangular areas within which the intensities of the pixels of the original (grayscale) image are summed with weights {−1, 1} for the black- and white-colored rectangles, respectively.
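A Haar-like feature is usually evaluated with an integral image, so that each rectangle sum costs four lookups. The sketch below shows one two-rectangle feature; its layout (white upper half, black lower half) and the test image are chosen for illustration and are not the specific features of the system.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column."""
    h, w = len(image), len(image[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def rect_sum(table, x, y, w, h):
    """Sum of pixel intensities in the rectangle with top-left corner (x, y)."""
    return table[y + h][x + w] - table[y][x + w] - table[y + h][x] + table[y][x]

def two_rect_feature(table, x, y, w, h):
    """White upper half (weight +1) minus black lower half (weight -1)."""
    half = h // 2
    return rect_sum(table, x, y, w, half) - rect_sum(table, x, y + half, w, half)

# Hypothetical 4 x 4 patch: bright upper rows, dark lower rows.
image = [[10] * 4, [10] * 4, [2] * 4, [2] * 4]
table = integral_image(image)
feature_value = two_rect_feature(table, 0, 0, 4, 4)
```

A large positive value indicates a bright-over-dark transition, the kind of pattern produced, for instance, by a bumper above its shadow.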
Examples of Haar-like features for detecting vehicles moving behind are presented below (Fig. 3). The first figure focuses on detection of the vehicle bumper, and the second and third figures show the side arches of the vehicle.
The AdaBoost algorithm employs decision trees with a depth of one as the weak classifiers. The detectors run simultaneously, and each of them returns detections with a probability from 0 to 100%. The probability is calculated statistically by counting the number of matches of the objects found in the original image at different scales and sliding window positions. The objects found by the first and the second detectors are combined and then filtered in the final stage using information on the scene geometry. During filtering, we evaluate whether an object of height h can be located at a given point of the scene by using information from the stereo system; the object is excluded when this is not possible.
Detectors based on Haar-like features proved to be good at detecting far-away objects. Such a detector works on a wide scale pyramid and detects objects ranging in size from 30 to 300 pixels. Its average processing time on a Full HD image is 50 ms. Real-time performance of the object detectors is obtained by using a geometric filtration technique, often referred to as the foveal vision module [13,15]. The accuracy of detection is increased using image spatial normalization (forward rectification) [14]. The detector achieves a TP rate of 86.9% and an FN rate of 7% (Fig. 4).
Fig. 3 Examples of Haar-like features used in the system
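Boosting with depth-one trees (decision stumps) can be sketched on a one-dimensional toy problem. This is the textbook discrete AdaBoost procedure, given as an illustration under the assumption that a single scalar feature is used; the data are invented, and the production detector operates on many Haar-like features.

```python
import math

def stump(x, threshold, polarity):
    """Depth-one tree: returns `polarity` at or above the threshold, else its negation."""
    return polarity if x >= threshold else -polarity

def train_adaboost(samples, labels, rounds):
    """Discrete AdaBoost over 1-D stumps; returns a list of (alpha, threshold, polarity)."""
    n = len(samples)
    weights = [1.0 / n] * n
    thresholds = sorted(set(samples))
    model = []
    for _ in range(rounds):
        best = None
        for threshold in thresholds:
            for polarity in (1, -1):
                error = sum(w for x, y, w in zip(samples, labels, weights)
                            if stump(x, threshold, polarity) != y)
                if best is None or error < best[0]:
                    best = (error, threshold, polarity)
        error, threshold, polarity = best
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, threshold, polarity))
        # Increase the weights of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump(x, threshold, polarity))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    score = sum(alpha * stump(x, t, p) for alpha, t, p in model)
    return 1 if score >= 0 else -1

# Hypothetical data that no single stump can separate.
samples = [1, 2, 3, 4, 5, 6]
labels = [1, 1, -1, -1, 1, 1]
model = train_adaboost(samples, labels, rounds=3)
predictions = [predict(model, x) for x in samples]
```

No single threshold separates these labels, yet the weighted vote of three stumps classifies all six samples correctly, which is the effect boosting relies on.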

Traffic Sign Recognition System
This system is designed to alert the driver to violations of traffic sign requirements.
Several detectors, each designed to localize signs of a particular category, are employed to detect road signs. The detectors used in the system are divided into two categories: first, 'bitmap' detectors, which calculate features on the fragment bitmap, and second, 'vector' detectors, which identify connected components and analyze them by their shape and color distribution. This division into categories is rather relative because both types of detectors respond to both color and shape, although each has its specific character.
Bitmap detectors use the Viola-Jones method, and the training is performed on a special color channel calculated for a specific family of signs. For example, the channel $I = k_R R - k_G G - k_B B$ is used for limiting and warning signs, while the yellow-frame signs are calculated with $I = k_G G - k_B B$, where $k_R$, $k_G$, $k_B$ are real coefficients. The Viola-Jones method uses features that are very similar to Haar-like features. Within this work, we considered features aimed at recognizing rectangular, diamond-shaped, triangular, and circular forms. The system uses four bitmap detectors, each aimed at detecting one class of sign shapes and colors, such as the 'bitmap detector of rectangular blue signs'. These detectors are useful at distances from 5 to 50 m with a wide-angle camera.
Vector detectors are applied to an image downscaled by a factor of four to speed up the search. This detector is designed primarily for finding signs at short distances (up to 30 m). The vector detector builds color masks by extracting the required color in the HSV color space; then it carries out morphological operations, finds connected components, and selects objects in the image. It then filters the components by selecting smooth-edged signs on the basis of matrix filters and accepts the samples. Finally, the vector detector keeps the objects that match the predetermined geometrical shapes (circle, rectangle, triangle, and diamond), which are found with dynamic programming methods.
Information on regions of interest, which restricts the search for signs to the places where they are likely to be located, can speed up the detectors.
The vector detector is complemented by an optical character recognition (OCR) module for some groups of signs (such as 'speed limit'). This module also analyzes connected components; it rejects false candidates by calculating the width-to-height ratios of the objects found, and it assembles and recognizes text fragments in the alphabet characteristic of the analyzed subset of signs. The algorithms used come from the OCR systems Cognitive Cuneiform and Cognitive Forms, developed by JSC 'Cognitive'.
The output of the detection unit is a hypothesis, which includes the detected region of the image, the presumed group of signs to which it belongs, and a priority. The highest priority is given to a hypothesis supported by both detectors. Coincidence is determined by calculating the degree of overlap of the regions and comparing it with a threshold.
A neural network approach is then used to classify the road signs. The classifier is based on a fully connected neural network with two hidden layers, trained to operate with large amounts of data. A three-channel color image of a road sign in the BGR space (16 × 16 × 3 = 768 neurons for the first level of the network) is fed to the input of the network. The output layer contains 57 neurons, corresponding to the desired classes of signs.
We used 50 thousand signs marked on video frames to train the network. Both real and synthesized images of signs should be used when training the second level of networks. The synthesized images were generated taking into account possible projective distortions, rotations, displacements, and other noise typical of real images. The training set also included 'failure' classes, which do not contain fragments of road signs. Assessed independently of the detectors, the accuracy of the network was 99%. An example of the operation of the traffic sign recognition system is shown in Fig. 5.
Complex testing of the whole sign processing subsystem (which includes a set of detectors and classifiers) for the advanced driver assistance system showed a TP rate of 80-85.4% and an FN rate of approximately 6.2-10% (excluding the stereo matching procedure).
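The two color channels can be computed per pixel as in the following sketch. The coefficient values and the clamping of the result to the range [0, 255] are assumptions; the paper states only that the coefficients are real numbers.

```python
def clamp(value, low=0.0, high=255.0):
    return max(low, min(high, value))

def red_sign_channel(pixel, k_r=1.0, k_g=0.5, k_b=0.5):
    """I = k_R * R - k_G * G - k_B * B, used for limiting and warning signs."""
    r, g, b = pixel
    return clamp(k_r * r - k_g * g - k_b * b)

def yellow_sign_channel(pixel, k_g=1.0, k_b=1.0):
    """I = k_G * G - k_B * B, used for yellow-frame signs."""
    _, g, b = pixel
    return clamp(k_g * g - k_b * b)

# Hypothetical RGB pixels: saturated red, neutral gray, saturated yellow.
red_response = red_sign_channel((200, 30, 20))
gray_response = red_sign_channel((120, 120, 120))
yellow_response = yellow_sign_channel((230, 200, 20))
```

With these coefficients a saturated red pixel gives a strong response while a gray pixel gives none, so the single-channel image fed to the Viola-Jones detector already suppresses most of the background.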
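The first stages of the vector detector (color mask in HSV space followed by connected component extraction) can be sketched as follows. The HSV thresholds and the test image are assumptions, the mask targets red only, and the morphological operations and shape filtering described above are omitted for brevity.

```python
import colorsys
from collections import deque

def red_mask(image, min_saturation=0.5, min_value=0.3, hue_tolerance=0.05):
    """Boolean mask of pixels whose HSV hue is close to red and that are saturated enough."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            is_red = min(h, 1.0 - h) <= hue_tolerance  # hue wraps around at 1.0
            mask_row.append(is_red and s >= min_saturation and v >= min_value)
        mask.append(mask_row)
    return mask

def connected_components(mask):
    """Return bounding boxes (x_min, y_min, x_max, y_max) of 4-connected regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes

# Hypothetical 6 x 5 image: gray background, a 2 x 2 red block and one red pixel.
GRAY, RED = (128, 128, 128), (220, 20, 20)
image = [[GRAY] * 6 for _ in range(5)]
for yy, xx in ((1, 1), (1, 2), (2, 1), (2, 2), (3, 4)):
    image[yy][xx] = RED
boxes = connected_components(red_mask(image))
```

The single-pixel component is the kind of noise that the morphological operations and the later shape filtering are meant to remove.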
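The coincidence test between hypotheses of the two detectors can be sketched with an overlap measure. The paper does not specify the measure; intersection over union and the threshold value of 0.3 are assumptions made for this example.

```python
def overlap_ratio(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - intersection)
    return intersection / union if union > 0 else 0.0

def prioritize(bitmap_boxes, vector_boxes, threshold=0.3):
    """Pair each bitmap hypothesis with a flag telling whether a vector hypothesis supports it."""
    return [(box, any(overlap_ratio(box, other) >= threshold for other in vector_boxes))
            for box in bitmap_boxes]

# Hypothetical hypotheses from the two detectors.
bitmap_boxes = [(0, 0, 10, 10), (40, 40, 50, 50)]
vector_boxes = [(5, 0, 15, 10)]
ranked = prioritize(bitmap_boxes, vector_boxes)
```

Only the first bitmap hypothesis is confirmed by the vector detector, so it would receive the highest priority in the subsequent classification stage.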
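The forward pass of a fully connected network with 768 inputs, two hidden layers, and 57 outputs can be sketched as below. Only the input and output sizes come from the text; the hidden layer sizes (128 and 64), the ReLU and softmax activations, and the random untrained weights are assumptions used to keep the example self-contained.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(rng, n_in, n_out):
    """Untrained placeholder weights; a real network would load trained values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(pixels, layers):
    activations = pixels
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))

rng = random.Random(0)
sizes = [16 * 16 * 3, 128, 64, 57]  # hidden sizes 128 and 64 are hypothetical
layers = [random_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]
pixels = [rng.random() for _ in range(sizes[0])]  # stand-in for a normalized BGR patch
probabilities = forward(pixels, layers)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
```

The output is a probability distribution over the 57 sign classes; the index of the largest entry is taken as the recognized class.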

Conclusions
The paper considers an active safety system for road freight transport. The main element of the software complex is a stereo camera. The paper describes the operation of the core systems of the software complex. Training and testing systems with an image database were developed as a result of the project. All the systems involved run at a rate of at least 10 Hz.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.