1 Originality and Contribution

The novelty of this work includes (1) a robust curve fitting approach based on multiple vanishing point detection and (2) the use of multiple Condensation filters to track roads of arbitrary shape, with automatic switching between trackers according to road conditions. A significant contribution of this work lies in estimating the road model using high level, high confidence road features, and in exploiting knowledge about the road geometry under perspective projection. Despite some potential drawbacks when the ground plane or parallel road boundary assumption is violated, our method provides a powerful tool for real-time road tracking.

2 Introduction

Video-based systems have been used for road surveillance (e.g., recognition of vehicles) and for autonomous vehicles and driver assistance (e.g., recognition of roads and obstacles) [1–3]. In this paper, we describe a new method for road tracking: a multiple tracker framework based on the Condensation method [4]. This work is part of an effort to develop a new generation of navigation systems using video inputs.

Many vision-based road detection systems instantiate road models from features extracted from the image. These feature extraction methods are often based on corners, edges, ridges, colours and textures of the image [5–14]. For example, Rasmussen et al. [12] use information about texture orientations at pixels, obtained using Gabor filters, to detect the road. High level features such as road junctions and statistical models have also been used to assist road detection [15–17].

Once road features are detected, a model of the road can be built for vision-based road tracking. Some systems model the road with straight lines [18], while others employ more complex models such as cubic B-splines [19, 20], clothoids [21–23], parabolas [24] and hyperbolas [25–27]. For example, Southall and Taylor [21] describe a collision warning system in which the road is modelled as a clothoid, approximated by a polynomial, in order to deal with road curvature and changes of curvature. However, Cramer et al. pointed out that a clothoid model is unreliable when the road curvature varies significantly [28, 29]. Instead, they model the road as connected arc segments with the aid of digital map data.

Our methodology for road detection and tracking exploits the fact that a complex road shape can be decomposed into a sequence of simple shapes at road transition points (where there is a significant change of road curvature). Rather than forcing a single tracker to do all the work, we switch trackers at road transitions. The method is better tuned to road conditions and is more robust. It also supports representing the road as a sequence of connected arcs/segments, so that information from a digital map can be integrated into tracking.

The remainder of this paper is organised as follows: Sect. 3 describes a method for estimating the road model, including methods for eliminating spurious line segments in images and for detecting and fitting multiple vanishing points, in order to produce an accurate estimate of a hyperbolic road model. Section 4 first integrates the estimated hyperbolic road model with a single Condensation tracker, then introduces a road transition model to handle changes of road curvature effectively, and finally describes road tracking using multiple Condensation trackers. Section 5 presents experimental results on real road videos. Section 6 concludes the paper.

3 Estimating the road model

The method for estimating the road model works as follows. First, we divide an image into horizontal strips and detect road boundary features. Second, we cluster the road boundary features into left and right boundary groups. Third, we detect multiple vanishing points and the horizon, which are needed to estimate the parameters of the hyperbolic road model. The method was first introduced in [30] and is described here for completeness.

3.1 Clustering line segments into road boundary groups

We divide an image into a number of horizontal strips, each containing only approximately straight road boundaries, and detect line segments in each strip using the Canny operator and the Hough transform. Some of the lines found may not belong to road boundaries, so a procedure to eliminate these spurious line segments is necessary. We use the circular road model proposed in [17] to generate road samples in order to estimate whether a line segment is part of a road boundary. The model is represented as \( (h, C_0, l_0, \varphi, \theta, W) \), where \( h \) is the height of the camera; \( C_0 \) the road curvature; \( l_0 \) the lateral offset of the camera from the lane axis; \( \varphi \) the pitch angle of the camera relative to the ground plane; \( \theta \) the yaw angle of the camera relative to the lane axis; and \( W \) the lane width. Suppose \( (u, v) \) represents image coordinates. Given the intrinsic and extrinsic parameters of the camera, the position and pose of the camera relative to the road, and the road parameters (width, curvature, etc.), the projections of the road boundary points in the image can be calculated. This involves estimating three attribute values for each road boundary point \( P_i \): the image coordinates \( u_i \) and \( v_i \), and the gradient \( g_i \) of the tangent to the road boundary at \( (u_i, v_i) \). As the image is divided into horizontal strips, the \( v_i \) coordinate of each strip is known, so only \( u_i \) and \( g_i \) need to be estimated. \( u_i \) can be determined by

$$ u_{i} = e_{u} \left[ \frac{e_{v} h C_{0}}{2\left(v_{i} - v_{c} + e_{v} \varphi\right)} + \frac{\gamma W/2 - l_{0}}{e_{v} h}\left(v_{i} - v_{c} + e_{v} \varphi\right) + \theta \right] + u_{c} $$
(1)

where \( \gamma = -1 \) for a left boundary point and \( \gamma = 1 \) for a right boundary point; \( f \) is the camera focal length; \( d_u \) and \( d_v \) are the width and height of a pixel; \( e_u = f/d_u \) and \( e_v = f/d_v \) are the horizontal and vertical focal lengths in pixels; and \( (u_c, v_c) \) is the centre of the image.

The proof of (1) is as follows: road boundaries can be approximated by a pair of parabolas on the ground plane expressed in camera coordinates, as depicted in Fig. 1, with the same C 0 and φ:

Fig. 1
figure 1

Road boundaries are approximated by a pair of parabolas on the ground plane. The camera’s attitude is also shown. O is the camera’s optical centre. OC is the optical axis. x 0 is the offset of the camera from the centre of the road. z 0 is the height of the camera over the ground plane. α and φ are the inclination and yaw angle, respectively

The pair of parabolas can be formulated as:

$$ x = - \frac{1}{2}C_{0} y^{2} - y\tan \varphi + x_{0} + {\frac{\gamma W}{2}}. $$
(2)

Assuming a 3D point P is projected onto an image pixel (u i , v i ) under perspective projection and the image centre (u 0, v 0) = (0, 0), the following equations hold:

$$ \frac{v_{i}}{e_{v}} = \tan\left(\arctan\frac{z_{0}}{y} - \alpha\right), \qquad \frac{u_{i}}{e_{u}} = \frac{x}{\sqrt{y^{2} + z_{0}^{2}}\,\cos\left(\arctan\frac{v_{i}}{e_{v}}\right)} $$
(3)

Substituting x and y in (2) with those derived from (3), and approximating tan α and tan φ by α and φ, respectively, we obtain (1) (note that the camera attitude angles α and φ of Fig. 1 correspond to the pitch and yaw angles φ and θ of the road model).

Differentiating \( u_i \) in (1) with respect to \( v_i \) gives the gradient:

$$ g_{i} = e_{u} \left[ { - {\frac{{e_{v} hC_{0} }}{{2\left( {v_{i} - v_{c} + e_{v} \varphi } \right)^{2} }}} + {\frac{1}{{e_{v} h}}}\left( { - l_{0} + {\frac{\gamma W}{2}}} \right)} \right]. $$
(4)

Each parameter of the circular road model \( (h, C_0, l_0, \varphi, \theta, W) \) is uniformly sampled from a plausible range (e.g., 0.8 m ≤ h ≤ 1.2 m, −0.01 ≤ C 0 ≤ 0.01, −5° ≤ φ ≤ 5°, 0° ≤ θ ≤ 5°, 3.5 m ≤ W ≤ 4.5 m and −W/2 ≤ l 0 ≤ W/2).

The circular road model is then used to generate road samples. The mean \( \bar{u}_i \) and variance \( \sigma^{2}(u_i) \) of \( u_i \) for each \( v_i \) are calculated over the road samples, as are the mean \( \bar{g}_i \) and variance \( \sigma^{2}(g_i) \) of \( g_i \). Confidence intervals for \( u_i \) and \( g_i \) can then be defined as \( [\bar{u}_i - \sigma(u_i), \bar{u}_i + \sigma(u_i)] \) and \( [\bar{g}_i - \sigma(g_i), \bar{g}_i + \sigma(g_i)] \), respectively, for the boundary points corresponding to \( v_i \). Line segments in the image can be clustered into left and right road boundary groups according to which confidence intervals the end points of a line segment fall into.
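To make the clustering step concrete, the following sketch (ours, not the authors' code) generates road samples over the parameter ranges above and derives the one-sigma interval for \( u_i \) in a given strip; the intrinsics e_u, e_v and image centre (u_c, v_c) are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
e_u, e_v, u_c, v_c = 800.0, 800.0, 256.0, 192.0  # assumed camera intrinsics

def project_u(v_i, h, C0, l0, phi, theta, W, gamma):
    """Equation (1): u coordinate of a boundary point in the strip at v_i."""
    return e_u * (e_v * h * C0 / (2.0 * (v_i - v_c + e_v * phi))
                  + (gamma * W / 2.0 - l0) / (e_v * h) * (v_i - v_c + e_v * phi)
                  + theta) + u_c

def sample_models(n):
    """Uniformly sample (h, C0, l0, phi, theta, W) from the ranges in the text."""
    h = rng.uniform(0.8, 1.2, n)
    C0 = rng.uniform(-0.01, 0.01, n)
    W = rng.uniform(3.5, 4.5, n)
    l0 = rng.uniform(-W / 2.0, W / 2.0)
    phi = np.radians(rng.uniform(-5.0, 5.0, n))
    theta = np.radians(rng.uniform(0.0, 5.0, n))
    return h, C0, l0, phi, theta, W

def u_interval(v_i, gamma, n=2000):
    """One-sigma confidence interval of u_i over the sampled road models."""
    u = project_u(v_i, *sample_models(n), gamma)
    return u.mean() - u.std(), u.mean() + u.std()

# An endpoint is assigned to the left boundary group if its u coordinate falls
# inside the left interval for its strip (gamma = -1); right is analogous.
print(u_interval(v_i=300.0, gamma=-1))
```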

3.2 Detecting multiple VPs

Using a single vanishing point (VP) [31–33] to estimate the road boundary is error prone, because the boundaries of a complex road do not intersect at a single point; instead, they vanish along a line, called the vanishing line or horizon. In order to detect the horizon, we need to detect multiple VPs.

Since there are still spurious line segments in each road boundary group, not all combinations of two line segments, one from each road boundary group, are valid pairs for VP calculation. We therefore detect a VP in each image strip using the least median of squares (LMedS) method [34]. LMedS randomly samples a number of points from a data set and uses them to estimate the model parameters. The sampling process is repeated at least m times, where m is chosen according to the expected percentage of outliers in the data set. By selecting the estimate having the least median of squared errors, LMedS can tolerate up to 50% outliers. We also use the covariance matrix of valid pairs of line segments from training road samples to eliminate unlikely combinations of left and right road boundary segments.

Suppose (u, v) represents image coordinates. For the ith image strip, let x i  = (u l,i , u r,i , g l,i , g r,i )T, where u l,i and u r,i are the u coordinates of the two intersection points of the horizontal centre line of the ith image strip with a left and a right road boundary, respectively, and g l,i and g r,i are the gradients at the two intersection points, respectively. The distribution of these pairs of intersection points can be characterized by mean \( \bar{x}_{i} \)and covariance matrix Cx i which are calculated from training road samples. The multiple VP detection method can be stated as follows:
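The following is a minimal sketch of the per-strip LMedS step under our own conventions, not the authors' code: lines are homogeneous triples (a, b, c) with au + bv + c = 0, and each candidate VP is the intersection of the supporting lines of one left and one right segment.

```python
import numpy as np

def point_line_dist(pt, line):
    """Perpendicular distance from an image point to a homogeneous line."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) / np.hypot(a, b)

def lmeds_vp(left_lines, right_lines, m=50, rng=np.random.default_rng(0)):
    """Least-median-of-squares VP estimate for one image strip."""
    lines = left_lines + right_lines
    best_vp, best_score = None, np.inf
    for _ in range(m):  # m trials, chosen from the expected outlier ratio
        l = left_lines[rng.integers(len(left_lines))]
        r = right_lines[rng.integers(len(right_lines))]
        p = np.cross(l, r)              # intersection in homogeneous coords
        if abs(p[2]) < 1e-9:            # near-parallel pair: no finite VP
            continue
        vp = p[:2] / p[2]
        score = np.median([point_line_dist(vp, ln) ** 2 for ln in lines])
        if score < best_score:          # keep the least median of squares
            best_vp, best_score = vp, score
    return best_vp
```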

3.3 Fitting multiple VPs to estimate the horizon

To locate the horizon (or vanishing line), we use the M-Estimator [35] to fit a line to the detected VPs. Suppose we have a set of VP candidates \( \left\{ vp_i := \left( u_{VP_i}, v_{VP_i} \right) \mid i = 0, \ldots, N \right\} \), one for each image strip. The M-Estimator for estimating the v coordinate \( v_H \) of the horizon reduces the effect of spurious VPs by minimising a so-called ρ function of the residuals:

$$ \rho (r_{i} ) = {\frac{{r_{i}^{2} }}{{\sigma^{2} + r_{i}^{2} }}} = {\frac{{\left( {v_{H} - v_{{{\text{VP}}_{i} }} } \right)^{2} }}{{\sigma^{2} + \left( {v_{H} - v_{{{\text{VP}}_{i} }} } \right)^{2} }}} $$
(5)

where \( r_i \) is the ith residual and \( \sigma = 1.4826 \cdot \mathrm{median}(|r_i|) \). The M-Estimator seeks a solution to (6):

$$ \sum\limits_{i} \frac{\mathrm{d}\rho(r_{i})}{\mathrm{d}v} = \sum\limits_{i} \frac{\mathrm{d}\rho}{\mathrm{d}r_{i}}\frac{\mathrm{d}r_{i}}{\mathrm{d}v} = \sum\limits_{i} \left(\frac{1}{r_{i}}\frac{\mathrm{d}\rho}{\mathrm{d}r_{i}}\right) r_{i} \frac{\mathrm{d}r_{i}}{\mathrm{d}v} = \sum\limits_{i} w(r_{i})\, r_{i}\, \frac{\mathrm{d}r_{i}}{\mathrm{d}v} = 0 $$
(6)

where w(r i ) is a weight function with the form:

$$ w\left( {r_{i} } \right) = {\frac{{2\sigma^{2} }}{{\left[ {\sigma^{2} + \left( {v_{H} - v_{{{\text{VP}}_{i} }} } \right)^{2} } \right]^{2} }}} $$
(7)

At each iteration the residuals are re-weighted and the solution is re-computed according to (8):

$$ v_{H,t+1} = \frac{\sum\limits_{i} 2\sigma_{t}^{2}\, v_{VP_{i}} \Big/ \left(\sigma_{t}^{2} + \left(v_{H,t} - v_{VP_{i}}\right)^{2}\right)^{2}}{\sum\limits_{i} 2\sigma_{t}^{2} \Big/ \left(\sigma_{t}^{2} + \left(v_{H,t} - v_{VP_{i}}\right)^{2}\right)^{2}} $$
(8)

Once multiple VPs have been detected, a line fitting these points will be the horizon.
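As a concrete illustration, here is a minimal sketch of the re-weighting iteration (8); the code and the example values are ours.

```python
import numpy as np

def fit_horizon(v_vps, iters=50, eps=1e-6):
    """Iteratively re-weighted estimate of the horizon's v coordinate, per (8)."""
    v_h = np.median(v_vps)                         # robust initialisation
    for _ in range(iters):
        r = v_h - v_vps                            # residuals
        sigma = 1.4826 * np.median(np.abs(r)) + 1e-12
        w = 2.0 * sigma**2 / (sigma**2 + r**2)**2  # weight function (7)
        v_next = np.sum(w * v_vps) / np.sum(w)     # update (8)
        if abs(v_next - v_h) < eps:
            break
        v_h = v_next
    return v_h

# The spurious VP at v = 240 barely moves the estimate away from ~190.
print(fit_horizon(np.array([190.0, 192.0, 191.0, 188.0, 240.0])))
```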

3.4 Estimating road parameters

We model the road boundaries in the image plane as a pair of hyperbolas:

$$ u - u_{H} = a(v - v_{H} ) + {\frac{b}{{v - v_{H} }}} $$
(9)

where \( (u, v) \) is the coordinate of a point on the left or right road boundary; \( a \) is the gradient \( \mathrm{d}u/\mathrm{d}v \) of one asymptote of the hyperbola; \( (u_H, v_H) \) is the point on the horizon where the two asymptotes intersect; \( v_H \) is the v coordinate of the horizon, determined in Sect. 3.3; and \( b \) is linearly related to the road curvature \( C_0 \) in the 3D world. A more general form of the road shape can be represented by the polynomial [36, 37] \( u = \sum\nolimits_{i = 0}^{n} a_{i} v^{1 - i} \), where the coordinate origin is a point on the horizon. The hyperbola is the special case n = 2. From (1) and (9) we have

$$ v_{H} = v_{c} - e_{v} \varphi $$
(10)
$$ a = {\frac{{e_{u} (\gamma W/2 - l_{0} )}}{{e_{v} h}}} $$
(11)
$$ b = (e_{u} e_{v} hC_{0} )/2 $$
(12)
$$ u_{\text{H}} = u_{c} + e_{u} \theta $$
(13)
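Equations (10)–(13) give a direct mapping from the physical road and camera parameters to the hyperbola parameters; the small sketch below (ours, with assumed intrinsics) makes the correspondence explicit.

```python
def hyperbola_params(W, h, l0, theta, phi, C0, gamma,
                     e_u=800.0, e_v=800.0, u_c=256.0, v_c=192.0):
    """Map the 3D road/camera parameters to (a, b, u_H, v_H) via (10)-(13).
    gamma = -1 selects the left boundary, +1 the right."""
    v_H = v_c - e_v * phi                         # (10)
    a = e_u * (gamma * W / 2.0 - l0) / (e_v * h)  # (11)
    b = e_u * e_v * h * C0 / 2.0                  # (12)
    u_H = u_c + e_u * theta                       # (13)
    return a, b, u_H, v_H
```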

The diagram of the hyperbolic road model is shown in Fig. 2.

Fig. 2
figure 2

The diagram of the hyperbola road model. The left road boundary has two asymptotes: l 1 and the horizon. The right road boundary has two asymptotes: l 2 and the horizon. l 1 and l 2 vanish at VP (u H , v H )

We now estimate u H , a and b to determine the road shape using the VPs. Differentiating (9) with respect to v gives:

$$ {\text{d}}u/{\text{d}}v = a - b/(v - v_{\text{H}} )^{2} $$
(14)

Suppose that a road boundary point \( P_i \left( u_{P_i}, v_{P_i} \right) \) has the same v image coordinate as the centre line of the ith image strip, i.e., \( v_{P_i} = v_i \). Linking the ith VP \( VP_i \) to \( P_i \) gives the gradient at \( P_i \):

$$ (\mathrm{d}u/\mathrm{d}v)_{i} = \frac{u_{VP_{i}} - u_{P_{i}}}{v_{H} - v_{P_{i}}} = a - \frac{b}{\left(v_{P_{i}} - v_{H}\right)^{2}} $$
(15)
$$ u_{Pi} = u_{H} + a\left( {v_{{P_{i} }} - v_{H} } \right) + b\left( {v_{{P_{i} }} - v_{H} } \right)^{ - 1} $$
(16)

Replacing \( u_{{{\text{P}}_{i} }} \) in (15) by (16), we have the following linear equations:

$$ {\mathbf{AX}} = {\mathbf{B}} $$
(17)

where $$ {\mathbf{A}} = \begin{bmatrix} a_{1}^{T} \\ \vdots \\ a_{n}^{T} \end{bmatrix} = \begin{bmatrix} v_{P_{1}} - v_{H} & 2 \\ \vdots & \vdots \\ v_{P_{n}} - v_{H} & 2 \end{bmatrix}, \quad {\mathbf{X}} = \begin{bmatrix} u_{H} \\ b \end{bmatrix}, \quad {\mathbf{B}} = \begin{bmatrix} b_{1} \\ \vdots \\ b_{n} \end{bmatrix} = \begin{bmatrix} u_{VP_{1}}\left(v_{P_{1}} - v_{H}\right) \\ \vdots \\ u_{VP_{n}}\left(v_{P_{n}} - v_{H}\right) \end{bmatrix}. $$

Note that \( a \) has been cancelled out, so (17) involves only the two unknowns \( b \) and \( u_H \), which can easily be solved for using the least squares method. However, the detected VPs may still contain outliers or non-Gaussian noise, so a robust fitting method is needed. A weight matrix is therefore multiplied on both sides of the linear system:

$$ {\mathbf{WAX}} = {\mathbf{WB}} $$

where \( {\mathbf{W}} = {\text{diag[}}w_{1} \cdots w_{n} ]. \)

An iterative robust fitting algorithm is performed until the solution converges or the maximal number of iterations is reached:

1. Initialize \( {\mathbf{X}}_0 \) with W = I, and set j = 1;

2. For all i (1 ≤ i ≤ n), compute \( w_{i,j} = \Phi'\left( \left( \frac{a_{i}^{T} {\mathbf{X}}_{j-1} - b_{i}}{\sigma} \right)^{2} \right) \);

3. Solve the linear system \( {\mathbf{W}}_j {\mathbf{AX}}_j = {\mathbf{W}}_j {\mathbf{B}} \);

4. If \( \|{\mathbf{X}}_j - {\mathbf{X}}_{j-1}\| > \varepsilon \) and j < j max, set j = j + 1 and go to step 2; otherwise set X = X j;

where Φ is the error probability density function (pdf) of the detected VPs.
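A compact sketch of this loop follows; since the text leaves the exact form of Φ′ open, we assume the same redescending weight as (7), so this is an illustration rather than the authors' implementation.

```python
import numpy as np

def robust_fit(v_p, u_vp, v_h, eps=1e-6, j_max=50):
    """Iteratively re-weighted least squares for (17), returning X = (u_H, b)."""
    A = np.column_stack([v_p - v_h, 2.0 * np.ones_like(v_p)])
    B = u_vp * (v_p - v_h)
    X = np.linalg.lstsq(A, B, rcond=None)[0]        # step 1: W = I
    for _ in range(j_max):
        r = A @ X - B                               # residuals
        sigma = 1.4826 * np.median(np.abs(r)) + 1e-12
        w = 2.0 * sigma**2 / (sigma**2 + r**2)**2   # step 2: re-weight
        X_next = np.linalg.lstsq(A * w[:, None], B * w, rcond=None)[0]  # step 3
        if np.linalg.norm(X_next - X) < eps:        # step 4: convergence test
            return X_next
        X = X_next
    return X
```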

The final step is to estimate \( a_l \) and \( a_r \). A search is carried out over the space of possible values: \( a_l \) varies from 0 to 3 and \( a_r \) from −3 to 0, in steps of 0.1. The procedure produces a set of hypothesised hyperbolas evenly distributed in the image. We then find the best hyperbola by measuring the spatial proximity of the estimated model to edge features in the edge map (the measurement model is discussed in Sect. 4.3). The search could also be replaced by a Hough transform, with \( a_l \) and \( a_r \) voted for by all edge points below the horizon, but this may be less efficient when the image is noisy.

3.5 Road detection results

We test our road detection method using real road videos. We demonstrate the results on three representative videos taken from a motorway, a suburban road and a rural road, respectively. We randomly pick hundreds of frames from each video and compare the detected road boundaries with the real road boundaries visible in those frames. The detection rate is calculated based on how close the detected boundaries are to the real road boundaries. The test results are shown in Table 1.

Table 1 Detection rates for three road video sequences

Figure 3 demonstrates the robustness of our approach under adverse conditions such as occlusion by other vehicles, road markings and shadows. Since at least some vanishing points can be correctly detected, and they aggregate around the horizon, a robust estimator is able to eliminate outliers.

Fig. 3
figure 3

a, b Partial occlusion by cars. c, d Signs on the road. e Shadows on the road

We now discuss the effect of the image strip height on vanishing point detection. The taller the strip, the more line features can be detected. However, too tall a strip degrades the accuracy of vanishing point detection, especially when the road boundary has significant curvature; this is precisely why the image is divided into horizontal strips. On the other hand, if the strip is too short, the line features of the road segment may be too short to be detected. Two conditions therefore have to be satisfied: (1) the height of an image strip must be small enough that the road boundaries it contains can be approximated by straight lines; (2) the detected line segments must be long enough to be distinguished from image noise. A compromise thus has to be made. Figure 4 shows the results of VP detection as the height of an image strip is varied; the positions of the detected VPs clearly vary with the height.

Fig. 4
figure 4

Deviation of u (DeviationX) and v (DeviationY) of detected vanishing point candidates. The horizontal axis is the size (height) of the image segment in pixels. The vertical axis is deviation in pixels

4 Tracking the road

First, we show how to integrate the hyperbolic road model with a Condensation tracker. Second, we develop a road transition model to handle large road curvature. Third, we integrate the road model and road transition model with multiple Condensation trackers to track the road in real-time for vision-based navigation applications.

4.1 Road tracking with a single Condensation tracker

The main feature of Condensation is Bayesian filtering, which recursively estimates the posterior density of the object's state using Bayesian inference:

$$ p\left( {{\mathbf{x}}_{t} |{\mathbf{z}}_{t} } \right) \propto p\left( {{\mathbf{z}}_{t} |{\mathbf{x}}_{t} } \right)p\left( {{\mathbf{x}}_{t} |{\mathbf{z}}_{t - 1} } \right) $$
(18)

where p(z t |x t ) is the likelihood and p(x t |z t−1) is the prior density derived from the previous posterior density p(x t−1|z t−1) and a dynamical model or a motion prior p(x t |x t−1)

$$ p\left( {{\mathbf{x}}_{t} |{\mathbf{z}}_{t - 1} } \right) = \int {p\left( {{\mathbf{x}}_{t} |{\mathbf{x}}_{t - 1} } \right)p\left( {{\mathbf{x}}_{t - 1} |{\mathbf{z}}_{t - 1} } \right){\text{d}}{\user2{x}}_{t - 1} } $$
(19)

Bayesian filtering is also the root of Kalman filtering, but the Condensation tracker differs from the Kalman filter in that it can estimate state distributions that are not Gaussian. It does this by generating a set of particles at time t, \( \{ {\mathbf{x}}_{n,t} ,w_{n,t} \} ,n = 1, \ldots ,N, \) where \( {\mathbf{x}}_{n,t} \) represents the nth hypothesised object state at time t, \( w_{n,t} \) is the probability that \( {\mathbf{x}}_{n,t} \) is the true state and N is the number of particles. Together these particles approximate the probability density of the object's state. Condensation uses a sampling scheme called Sampling Importance Resampling (SIR), which recursively draws samples from an importance proposal density function q(x) and weights them using Equation (20) to arrive at \( \tilde{p}\left( {{\mathbf{x}}_{t} |{\mathbf{z}}_{t} } \right), \) the approximation of the true posterior density \( p\left( {{\mathbf{x}}_{t} |{\mathbf{z}}_{t} } \right). \)

$$ w_{n,t}^{{}} \propto {\frac{{p\left( {{\mathbf{x}}_{n,t}^{{}} |{\mathbf{x}}_{n,t - 1}^{{}} ,{\mathbf{z}}_{t}^{{}} } \right)}}{{q\left( {{\mathbf{x}}_{n,t}^{{}} |{\mathbf{x}}_{n,t - 1}^{{}} ,{\mathbf{z}}_{t} } \right)}}} = {\frac{{p\left( {{\mathbf{z}}_{t}^{{}} |{\mathbf{x}}_{n,t}^{{}} } \right)p\left( {{\mathbf{x}}_{n,t}^{{}} |{\mathbf{x}}_{n,t - 1}^{{}} } \right)}}{{q\left( {{\mathbf{x}}_{n,t}^{{}} |{\mathbf{x}}_{n,t - 1}^{{}} ,{\mathbf{z}}_{t} } \right)}}} $$
(20)

Initially, the Condensation tracker assigns each possible state the same probability. At each time step, a new set of particles is sampled from those with higher probabilities in the previous step and propagated according to the dynamics model. This means it effectively takes the dynamics model, or motion prior, as the proposal density: \( q\left( {{\mathbf{x}}_{n,t} |{\mathbf{x}}_{n,t - 1} ,{\mathbf{z}}_{t} } \right) = p\left( {{\mathbf{x}}_{n,t} |{\mathbf{x}}_{n,t - 1} } \right) \). Consequently, according to (20), \( w_{n,t} \propto p\left( {{\mathbf{z}}_{t} |{\mathbf{x}}_{n,t} } \right); \) the weights are updated based on the likelihood, which can be represented by a measurement related to the presence of image features in the vicinity of the predicted particles. Finally, the object state can be estimated as the weighted moments of the particles at each time step t. In summary, to use a Condensation tracker for a specific application, we need to define application-specific dynamical and likelihood models.
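For reference, here is a generic sketch of one Condensation (SIR) step under the choices just described; `dynamics` and `likelihood` stand for the application-specific models and are left as callables.

```python
import numpy as np

def condensation_step(particles, weights, dynamics, likelihood,
                      rng=np.random.default_rng(0)):
    """One SIR step: resample, propagate through the motion prior, re-weight."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)      # resample by previous weights
    particles = np.array([dynamics(p, rng) for p in particles[idx]])
    weights = np.array([likelihood(p) for p in particles])
    weights = weights / weights.sum()           # normalise
    estimate = (weights[:, None] * particles).sum(axis=0)  # weighted mean state
    return particles, weights, estimate
```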

4.2 The dynamical model for road tracking

The road model estimated in the previous section is integrated with the Condensation tracker described in [4] for road tracking. The state of the object to be tracked can be represented by a vector x = (a l , a r , b, u H , v H )T, whose elements are the parameters of the road shape model in Equation (9). The dynamics of the states have the following form:

$$ {\mathbf{x}}_{t} = {\mathbf{Ax}}_{t - 1} + {\mathbf{B}}_{t} $$
(21)

where \( {\mathbf{x}}_t \) is the state vector at time t, A is the state transition matrix and \( {\mathbf{B}}_t \) is the stochastic part. We model the dynamics simply as a Gaussian random walk, so that A = I and \( {\mathbf{B}}_t \) carries the stochastic variation \( \Updelta {\mathbf{x}}_{t-1,t} \) of x from time t − 1 to time t. \( {\mathbf{B}}_t \) is derived as follows. Notice that each state variable in x can be expressed using the parameters W, h, \( l_0 \), θ, φ and \( C_0 \) via Equations (10)–(13). It follows that some state variables (such as \( a_l \) and \( a_r \)) are implicitly correlated with each other via (10)–(13), and so are their variations from t − 1 to t. It is thus desirable to incorporate such correlation information into \( {\mathbf{B}}_t \). Let \( {\mathbf{y}} := (W\ l_0\ \theta\ \varphi\ C_0)^T \) and assume h is constant. The parameters in y can be regarded as independent of each other. Equations (10)–(13) then specify a non-linear transformation from y to x: \( {\mathbf{x}} = \Phi({\mathbf{y}}) \). We derive \( \Updelta {\mathbf{x}}_{t-1,t} \) from the variation \( \Updelta {\mathbf{y}}_{t-1,t} \) of y from time t − 1 to t by linearizing Φ:

$$ \Updelta {\mathbf{x}}_{t - 1,t} = {\mathbf{x}}_{t} - {\mathbf{x}}_{t - 1} \approx {\mathbf{H}}_{t} ({\mathbf{y}}_{t} - {\mathbf{y}}_{t - 1} ) = {\mathbf{H}}_{t} \Updelta {\mathbf{y}}_{t - 1,t} $$
(22)

where

$$ {\mathbf{H}}_{t} : = \left. {{\frac{{\partial (a_{l} {\kern 1pt} a_{r} {\kern 1pt} b{\kern 1pt} u_{\text{H}} {\kern 1pt} v_{\text{H}} )}}{{\partial (W{\kern 1pt} l_{0} {\kern 1pt} \theta {\kern 1pt} \varphi {\kern 1pt} C_{0} )}}}} \right|_{t} = \left[ {\begin{array}{*{20}c} { - {\frac{{e_{u} }}{{2e_{v} h}}}} & { - {\frac{{e_{u} }}{{e_{v} h}}}} & 0 & 0 & 0 \\ {{\frac{{e_{u} }}{{2e_{v} h}}}} & { - {\frac{{e_{u} }}{{e_{v} h}}}} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & {{\frac{{e_{u} e_{v} h}}{2}}} \\ 0 & 0 & {e_{u} } & 0 & 0 \\ 0 & 0 & 0 & { - e_{v} } & 0 \\ \end{array} } \right] $$
(23)

The variation \( \Updelta {\mathbf{y}}_{t-1,t} \) is modelled as a set of independently distributed normal variables:

1. \( \Updelta W \sim N\left( 0, \sigma_{W}^{2} \right) \), \( \sigma_{W} = 0.1 \) m;

2. \( \Updelta C_0 \sim N\left( 0, \sigma_{C_0}^{2} \right) \), \( \sigma_{C_0} = 10^{-3}\,\mathrm{m}^{-1} \);

3. \( \Updelta \varphi \sim N\left( 0, \sigma_{\varphi}^{2} \right) \), \( \sigma_{\varphi} = 10^{-3} \) rad;

4. \( \Updelta \theta \sim N\left( 0, \sigma_{\theta}^{2} \right) \), \( \sigma_{\theta} = 10^{-3} \) rad;

5. \( \Updelta l_0 \sim N\left( 0, \sigma_{l_0}^{2} \right) \), \( \sigma_{l_0} = 0.1 \) m.

Finally B t may be written as:

$$ {\mathbf{B}}_{t} = \Updelta {\mathbf{x}}_{t-1,t} = {\mathbf{H}}_{t} {\varvec{\Upxi}}\omega $$
(24)

where \( {\varvec{\Upxi}}: = {\text{diag}}\left[ {\sigma_{W} \;\sigma_{{l_{0} }} \;\sigma_{\theta } \;\sigma_{\varphi } \;\sigma_{{C_{0} }} } \right] \) and ω is a 5 × 1 vector with the ith element ω (i) ~ N(0,1). Note that H t has incorporated the correlations between variations of state variables. It is easy to show that \( x_{t} \sim N({\mathbf{x}}_{t - 1} ,{\varvec{\Upsigma}}_{x} ) \) where

$$ {\varvec{\Upsigma}}_{x} = {\mathbf{H}}_{t} {\varvec{\Upxi}} {\varvec{\Upxi}}^{T} {\mathbf{H}}_{t}^{T} $$
(25)
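Below is a sketch of one draw from this random walk, with \( {\mathbf{H}}_t \) instantiated from (23); the intrinsics e_u, e_v and camera height h are assumed values.

```python
import numpy as np

e_u, e_v, h = 800.0, 800.0, 1.0                      # assumed camera constants
H = np.array([
    [-e_u / (2 * e_v * h), -e_u / (e_v * h), 0.0,  0.0,  0.0],
    [ e_u / (2 * e_v * h), -e_u / (e_v * h), 0.0,  0.0,  0.0],
    [ 0.0,                  0.0,             0.0,  0.0,  e_u * e_v * h / 2],
    [ 0.0,                  0.0,             e_u,  0.0,  0.0],
    [ 0.0,                  0.0,             0.0, -e_v,  0.0]])   # Jacobian (23)
Xi = np.diag([0.1, 0.1, 1e-3, 1e-3, 1e-3])  # [sigma_W, sigma_l0, sigma_theta,
                                            #  sigma_phi, sigma_C0]

def propagate(x, rng=np.random.default_rng(0)):
    """x = (a_l, a_r, b, u_H, v_H); one step of x_t = x_{t-1} + H Xi omega."""
    return x + H @ Xi @ rng.standard_normal(5)
```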

4.3 The likelihood model for road tracking

In [4], a general form of the likelihood measurement model is proposed

$$ p({\mathbf{z}}|{\mathbf{x}}) \propto \exp \left( { - {\frac{1}{2r}}\int\limits_{0}^{L} {f\left( {{\mathbf{z}}(s) - {\mathbf{x}}(s);\mu } \right){\text{d}}s} } \right) $$
(26)

where \( f(\nu; \mu) = \min(\nu^2, \mu^2) \); r is a variance constant; z(s) is the closest image feature to the predicted state x(s); and μ is a spatial scale constant defining the maximal extent of the search interval for image features. The likelihood thus measures how close the predicted feature is to the actual feature found in the image. As shown in Fig. 5, a set of points is sampled from the estimated road boundary. For each sampled point \( P_i \) we search for the nearest edge point \( Q_i \) along a horizontal line centred at \( P_i \) and measure the distance between them. We also take into account the similarity between the normal \( {\mathbf{n}}_{P_i} \) at the sampled point \( P_i \) and the normal \( {\mathbf{n}}_{Q_i} \) at the edge point \( Q_i \). The likelihood model is thus

$$ p\left( {{\mathbf{z}}|{\mathbf{x}}} \right) \propto \exp \sum\limits_{i} {\left( { - \left| {{\mathbf{n}}_{{P_{i} }} \times {\mathbf{n}}_{{Q_{i} }} } \right| - {\frac{{\left\| {{\mathbf{P}}_{i} - {\mathbf{Q}}_{i} } \right\|^{2} }}{{2\sigma^{2} }}}} \right)} $$
(27)
Fig. 5
figure 5

The likelihood model uses edge cues near sampled points from the predicted road curve

If no edge point is found, \( \left\| {\mathbf{P}}_i - {\mathbf{Q}}_i \right\| \) is set to σ (5 pixels in our case), the upper bound of the search interval μ in (26), and \( \left| {{\mathbf{n}}_{P_i} \times {\mathbf{n}}_{Q_i}} \right| \) is set to 1.
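A sketch of this measurement follows; it assumes a binary edge map (e.g., from the Canny operator) and a precomputed per-pixel unit normal field, and the names `edge_map`, `edge_normals` and integer pixel coordinates are our conventions.

```python
import numpy as np

def road_likelihood(samples, normals_pred, edge_map, edge_normals, sigma=5.0):
    """Likelihood (27) for integer points sampled on a hypothesised boundary."""
    log_l = 0.0
    for (u, v), n_p in zip(samples, normals_pred):
        cols = np.flatnonzero(edge_map[v])       # edge pixels on this image row
        cols = cols[np.abs(cols - u) <= sigma]   # horizontal search window
        if cols.size == 0:                       # no edge found: worst case
            dist, cross = sigma, 1.0
        else:
            q = cols[np.argmin(np.abs(cols - u))]            # nearest edge Q_i
            dist = abs(q - u)
            n_q = edge_normals[v, q]
            cross = abs(n_p[0] * n_q[1] - n_p[1] * n_q[0])   # |n_P x n_Q|
        log_l += -cross - dist**2 / (2.0 * sigma**2)
    return np.exp(log_l)
```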

4.4 The road transition model

The hyperbolic road model cannot handle transitions between road segments of different curvatures well, which results in poor tracking performance. For example, in the diagram on the left of Fig. 6, the real road boundaries are curve AB connected to curve BC and curve A′B′ connected to curve B′C′, but the single hyperbolic road model can only produce the pair of dashed curves shown in the diagram. We therefore define a geometric model of the road transition in order to handle the transition between road segments of different curvatures, and integrate it with the hyperbola model.

Fig. 6
figure 6

Left a road transition model on the ground plane. Right image projection of the transition model

The projection of the road transition onto the image is shown in the diagram on the right of Fig. 6. The road transition is modelled by two road segments connected at B and B′ with G1 continuity (i.e., sharing the same tangents \( t_B \) and \( t_{B'} \) at B and B′, respectively). The projections of AB and A′B′ are approximated by a pair of hyperbolas in the lower part of the image (the lower hyperbolas), defined in (9). The projections of BC and B′C′ are approximated by another pair of hyperbolas in the upper part of the image (the upper hyperbolas). \( P_{vc_1}\left( u_{H_1}, v_{H_1} \right) \) is the vanishing point of the asymptotes of the lower hyperbolas. \( P_B \) and \( P_{B'} \) are the projections of B and B′, respectively. \( P_{vc_2}\left( u_{H_2}, v_{H_2} \right) \) is the vanishing point of the asymptotes of the upper hyperbolas. Thus the parameters of the lower and upper hyperbolas are \( (a_{l1}, a_{r1}, b_1, u_{H_1}, v_{H_1}) \) and \( (a_{l2}, a_{r2}, b_2, u_{H_2}, v_{H_2}) \), respectively. According to (9) and its derivative (14), the parameters of the lower and upper hyperbolas must satisfy the following relations due to G1 continuity at the transition points:

$$ a_{{{\text{l}}2}} = a_{{{\text{l}}1}} + {\frac{{b_{2} - b_{1} }}{{\left( {v_{\text{B}} - v_{{{\text{H}}_{1} }} } \right)^{2} }}} $$
(28)
$$ a_{\text{r2}} = a_{{{\text{r}}1}} + {\frac{{b_{2} - b_{1} }}{{\left( {v_{\text{B}} - v_{{{\text{H}}_{1} }} } \right)^{2} }}} $$
(29)
$$ v_{{{\text{H}}_{1} }} = v_{{{\text{H}}_{2} }} $$
(30)
$$ u_{{{\text{H}}_{2} }} = u_{{{\text{H}}_{1} }} + 2{\frac{{b_{1} - b_{2} }}{{v_{\text{B}} - v_{{{\text{H}}_{1} }} }}} $$
(31)

where \( v_B \) is the v coordinate of \( P_B \) and \( P_{B'} \). Thus, in contrast to the road shape parameters of Sect. 4.2, which are \( (a_l, a_r, b, u_H, v_H) \) for a single pair of hyperbolas, only two more free parameters, \( b_2 \) and \( v_B \), are needed to model the road transition, giving the transition model \( (a_{l1}, a_{r1}, b_1, u_{H_1}, v_{H_1}, b_2, v_B) \). Indeed, \( v_B \) defines a transition boundary which splits the road image into a lower and an upper strip with different heights (e.g., the strips with heights \( H_1 \) and \( H_2 \) in Fig. 6). We look for the first road segment in the lower strip and the second in the upper strip. Alternatively, \( v_B \) can be replaced by a ratio \( r = H_1/(H_1 + H_2) \), where \( H_1 + H_2 = \mathrm{ImageHeight} - v_{H_1} \) and 0 ≤ r ≤ 1, indicating the proportion of the upper image strip in the image. The transition model is finally defined as \( (a_{l1}, a_{r1}, b_1, u_{H_1}, v_{H_1}, b_2, r) \).
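The G1-continuity relations (28)–(31) then determine the upper pair from the lower pair plus \( (b_2, v_B) \); a small sketch of ours makes this explicit.

```python
def upper_from_lower(a_l1, a_r1, b1, u_H1, v_H1, b2, v_B):
    """Recover the upper-hyperbola parameters via (28)-(31)."""
    d = v_B - v_H1
    a_l2 = a_l1 + (b2 - b1) / d**2     # (28)
    a_r2 = a_r1 + (b2 - b1) / d**2     # (29)
    v_H2 = v_H1                        # (30): shared horizon
    u_H2 = u_H1 + 2.0 * (b1 - b2) / d  # (31)
    return a_l2, a_r2, b2, u_H2, v_H2
```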

4.5 Road tracking with multiple trackers

In this section, we incorporate the road transition model of Sect. 4.4 into the hyperbolic model for tracking with multiple Condensation trackers. The state vector has now become \( (a_{l1}, a_{r1}, b_1, u_{H_1}, v_{H_1}, b_2, r) \) in order to handle road transitions. A higher dimensional state space usually requires more particles to approximate the probability density function. This can be avoided if the state vector is partitioned into two parts, each handled by a different Condensation tracker: \( {\mathbf{x}}_c := (a_{l1}, a_{r1}, b_1, u_{H_1}, v_{H_1}) \), modelling the lower hyperbolas, and \( {\mathbf{x}}_s := (b_2, r) \), modelling the upper hyperbolas. We assign a tracker \( C_1 \) to \( {\mathbf{x}}_c \) and a tracker \( C_2 \) to \( {\mathbf{x}}_s \). Each tracker evolves its own state vector, but each has knowledge of the full state vector so that it can measure likelihoods for its own particles. We model the dynamics of \( {\mathbf{x}}_s \) as a Gaussian random walk: \( ({\mathbf{x}}_s)_t \sim N\left( ({\mathbf{x}}_s)_{t-1}, {\varvec{\Upsigma}}_{x_s} \right) \), where \( {\varvec{\Upsigma}}_{x_s} \) is the covariance matrix of \( {\mathbf{x}}_s \). In our experiments we set \( {\varvec{\Upsigma}}_{x_s} = \mathrm{diag}(500^2, 0.1^2) \).

Road detection in Sect. 3 provides an automatic initialization of \( C_1 \); \( C_2 \) is initialized after \( C_1 \). At the beginning we assume the upper road segment has the same curvature as the lower one, so \( b_2 \) is set to \( b_1 \), and r is sampled uniformly from the interval [0, 1]. At each time step, for \( C_1 \) we draw \( N_1 \) particles from the posterior density approximation of \( C_1 \) at the previous time step:

$$ \tilde{p}_{C_{1}} ({\mathbf{x}}_{t - 1} |{\mathbf{z}}_{t - 1} ) = \{ {\mathbf{x}}_{n,t - 1} ,w_{n,t - 1} \} ,\quad n = 1, \ldots ,N_{1} $$

For each of these particles a new state is predicted using the dynamics of the state x c . Since C 1 maintains the full state vector, we can measure the likelihoods by sampling points on both estimated road segments. When tracking using C 2, N 2 samples \( \{ {\mathbf{x}}_{n,t} \} ,\;n = 1, \ldots ,N_{2} \) are drawn from the current posterior density approximation of the state of C 1: \( \tilde{p}_{{C_{1} }} ({\mathbf{x}}_{t} |{\mathbf{z}}_{t} ) = \{ {\mathbf{x}}_{n,t} ,w_{n,t} \} ,n = 1, \ldots ,N_{1} . \) The current posterior density estimation of C 1 is used as the importance proposal density function q C2:

$$ q_{{C_{2} }} ({\mathbf{x}}_{n,t} |{\mathbf{x}}_{n,t - 1} ,{\mathbf{z}}_{t} ) = p_{{C_{1} }} (x_{n,t} |{\mathbf{z}}_{t} ) \approx \tilde{p}_{{C_{1} }} ({\mathbf{x}}_{n,t} |{\mathbf{z}}_{t} ) $$
(32)

From (20), the weighting scheme for C 2 has thus become

$$ \begin{aligned} (w_{n,t})_{C_{2}} & \propto \frac{p_{C_{2}}({\mathbf{z}}_{t}|{\mathbf{x}}_{n,t})\, p_{C_{2}}({\mathbf{x}}_{n,t}|{\mathbf{x}}_{n,t-1})}{p_{C_{1}}({\mathbf{x}}_{n,t}|{\mathbf{z}}_{t})} \\ & = \frac{p_{C_{2}}({\mathbf{z}}_{t}|{\mathbf{x}}_{n,t})\, p_{C_{2}}({\mathbf{x}}_{n,t}|{\mathbf{x}}_{n,t-1})}{p_{C_{1}}({\mathbf{z}}_{t}|{\mathbf{x}}_{n,t})\, p_{C_{1}}({\mathbf{x}}_{n,t}|{\mathbf{x}}_{n,t-1})} = \frac{p_{C_{2}}({\mathbf{z}}_{t}|{\mathbf{x}}_{n,t})\, f_{N}\left( ({\mathbf{x}}_{s})_{n,t}; ({\mathbf{x}}_{s})_{n,t-1}, {\varvec{\Upsigma}}_{x_{s}} \right)}{(w_{n,t})_{C_{1}}\, f_{N}\left( ({\mathbf{x}}_{c})_{n,t}; ({\mathbf{x}}_{c})_{n,t-1}, {\varvec{\Upsigma}}_{x_{c}} \right)} \end{aligned} $$
(33)

where \( {\varvec{\Upsigma}}_{{x_{c} }} \) is (25), and \( f_{N} (\alpha ;\beta ,{\varvec{\Upsigma}}), \) \( \alpha ,\beta \in R^{M} , \) is the evaluation function of a Gaussian density having the following form:

$$ f_{N}(\alpha; \beta, {\varvec{\Upsigma}}) := \frac{1}{(2\pi)^{M/2}\left| {\varvec{\Upsigma}} \right|^{1/2}} \exp\left\{ -\frac{1}{2}(\alpha - \beta)^{T} {\varvec{\Upsigma}}^{-1} (\alpha - \beta) \right\} $$
(34)

Note that in (33) the weight update of \( C_2 \) takes into account the weight update of \( C_1 \), which means \( C_2 \) uses the knowledge about the state density estimated by \( C_1 \). Note also that \( N_1 \) and \( N_2 \) can be different; we assign more particles to the tracker with the higher dimensional state space.

Finally, the upper hyperbolas become the lower hyperbolas once the upper segment occupies a sufficiently large proportion of the image, as indicated by r, and a new pair of upper hyperbolas is then initialised in order to search for forthcoming transitions. In detail, for each particle in \( C_2 \), if r is larger than a threshold \( r_{\mathrm{takeover}} \): (1) we work out the upper hyperbola parameters \( a_{l2}, a_{r2}, u_{H_2}, v_{H_2} \) using Equations (28)–(31) and update the lower hyperbolas as \( {\mathbf{x}}_c = (a_{l2}, a_{r2}, b_2, u_{H_2}, v_{H_2})^T \); (2) r is uniformly sampled from the range \( [0, r_{\mathrm{init}}] \) to initialize tracking for the next possible road transition. In our experiments, we set \( r_{\mathrm{init}} = 0.3 \) and \( r_{\mathrm{takeover}} = 0.6 \).
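A sketch of the takeover rule for a single particle follows; the mapping from r to \( v_B \) is our reading of the ratio defined in Sect. 4.4, and the thresholds follow the text.

```python
import numpy as np

R_INIT, R_TAKEOVER, IMAGE_HEIGHT = 0.3, 0.6, 384.0

def maybe_switch(x_c, x_s, rng=np.random.default_rng(0)):
    """Promote the upper hyperbolas to lower ones when r exceeds R_TAKEOVER."""
    a_l1, a_r1, b1, u_H1, v_H1 = x_c
    b2, r = x_s
    if r <= R_TAKEOVER:
        return x_c, x_s
    v_B = v_H1 + r * (IMAGE_HEIGHT - v_H1)   # transition row implied by r
    d = v_B - v_H1
    x_c = (a_l1 + (b2 - b1) / d**2,          # (28)
           a_r1 + (b2 - b1) / d**2,          # (29)
           b2,
           u_H1 + 2.0 * (b1 - b2) / d,       # (31)
           v_H1)                             # (30): shared horizon
    x_s = (b2, rng.uniform(0.0, R_INIT))     # re-seed for the next transition
    return x_c, x_s
```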

5 Experiments

We have tested our road detection and tracking methods using real road videos with resolution 512 × 384 and frame rate 25 fps. We demonstrate the results here using three videos. The first video contains road occlusion. The second contains spurious road markings. The third has shadows and significant road curvature. We used 400 particles for tracker C 1 and 200 particles for C 2. The tests were run on a PC with a 2.4 GHz processor. Table 2 shows a breakdown of the average processing time per frame.

Table 2 Average processing time

Some tracking results are shown in Fig. 7. Images in the first row show a two-lane motorway. Highlighted hyperbolas represent the estimated road shape. Lower and upper hyperbolas are in different colours. At frame 80, the car is moving to the left lane, which triggers re-initialization of the tracking. The lane changing action is near completion at frame 86, and the boundaries of the current lane are successfully re-locked. Images in the second row demonstrate the robustness of our method to background clutter such as painted white arrows, zebra crossing lines and partial occlusion by people crossing the road. Images in the last two rows show some rural roads with significant curvature and shadows of trees on them. Frames 180, 295 and 305 show transitions between road segments with different curvatures.

Fig. 7
figure 7

First row motorway. Second row suburban road. Third and fourth rows rural road

The estimated road curvatures (by \( C_1 \)) for the three videos are plotted in Fig. 8. Positive values indicate right bends and negative values left bends. Figure 9 illustrates the lateral position of the vehicle relative to the centre axis of the current lane in the three videos. The two jumps in the graph for video 1 correspond to the vehicle's lane changing actions.

Fig. 8
figure 8

Estimated road curvature in three videos (unit: m−1). The x axis indicates the frame number. Positive curvature indicates a right bend; negative curvature indicates a left bend

Fig. 9
figure 9

Estimated lateral position of the vehicle relative to the current lane centre in three videos (unit: m). Two jumps in video 1 signal lane changing actions of the vehicle

Figure 10 demonstrates the tracking performance for the three videos. We measure tracking performance by two criteria: (1) tracking quality, defined as the ratio of the number of detected road feature points to the number of sampled positions along the estimated road boundaries, using the image measurement model described in Sect. 4.3; and (2) the number of times the tracker has to be re-initialised due to low tracking quality. A summary is shown in Table 3. We observed that low tracking quality is mainly due to low contrast of road boundaries against the background, significant occlusions by other vehicles, or the absence of parallel road boundaries.

Fig. 10
figure 10

Tracking quality a countryside; b motorway; c suburban

Table 3 Tracking performance for three video sequences

We also compared tracking using a single pair of hyperbolas with tracking using the road transition model, as shown in Fig. 11. Although a single pair of hyperbolas can generally follow the road boundaries close to the camera, it cannot follow boundaries further away that have a different curvature. Figure 12 compares tracking with and without tracker switching. Although both algorithms use the same number of particles (600 in our experiments, a major factor in computation time), the multiple trackers gave results that are visually more stable and accurate than the single tracker.

Fig. 11
figure 11

Comparison of the tracking performance with and without road transition handling. First row tracking using a single pair of hyperbolas. Second row tracking using the transition model (two pairs of connected hyperbolas)

Fig. 12
figure 12

Comparison of tracking performance with (first row) and without tracker switching (second row)

6 Conclusion

Road detection and tracking in an outdoor environment is a non-trivial problem, due to varying road curvature and to variation and noise in the image. We have developed a multiple Condensation tracker framework and integrated it with a hyperbolic road model for road boundary detection and tracking. Experimental results show that our method is robust enough for real world applications.