1 Introduction

Urban scene classification and object detection are important topics in the field of remote sensing. Recently, point cloud data generated by LiDAR sensors and multispectral aerial imagery have become two important data sources for urban scene analysis. While multispectral aerial imagery with fine resolution provides detailed spectral texture information about the surface, point cloud data is more capable of presenting the geometrical characteristics of objects.

LiDAR has become a common active surveying technique that directly derives a digital 3D representation of targets by combining laser ranging with a positioning and orientation system (POS). Depending on the platform, LiDAR technology covers terrestrial, mobile, airborne, and spaceborne applications. This chapter focuses on airborne applications. Airborne laser scanning (ALS) has attracted substantial research attention for more than two decades. The ALS technique has been widely applied in diverse fields such as forest mapping (Næsset and Gobakken 2008; Reitberger et al. 2008; Zhao et al. 2018), coast monitoring (Earlie et al. 2015; Bazzichetto et al. 2016), and smart urban applications (Garnett and Adams 2018). Because it directly delivers accurate and highly detailed 3D surface information, and because more than half of the population resides in urban areas, ALS has found important applications in cities, including urban modeling (Zhou and Neumann 2008; Lafarge and Mallet 2012; Chen et al. 2019), land cover and land use classification (Azadbakht et al. 2018; Balado et al. 2018; Wang et al. 2019), environment monitoring and tree mapping (Liu et al. 2017; Degerickx et al. 2018; Lafortezza and Giannico 2019), urban population estimation (Tomás et al. 2016), and energy conservation (Jochem et al. 2009; Dawood et al. 2017). Urban modeling with ALS data includes the 3D reconstruction of buildings (Bonczak and Kontokosta 2019; Li et al. 2019), roads (Chen and Lo 2009), bridges (Cheng et al. 2014), and powerlines (Wang et al. 2017). More recently, ALS data have also been used to improve the accuracy of urban mapping and land cover classification. Degerickx et al. (2019) applied ALS data as an additional data source to enhance the performance of multiple endmember spectral mixture analysis for urban land-cover classification using hyperspectral and multispectral images, and found that implementing height distribution information from ALS data as a basis for additional fraction constraints at the pixel level could significantly reduce spectral confusion between spectrally similar, but structurally different, land-cover classes. Accurate and highly detailed height information from ALS data has also been used to enhance urban mapping accuracy based on the 3D rational polynomial coefficient model (Rizeei and Pradhan 2019).

Besides the above-mentioned applications, ALS can also be used to detect and monitor dynamic objects. Compared to traditional optical imagery, airborne LiDAR data contain not only rich spatial but also temporal information. It is theoretically possible to extract vehicles from single-pass airborne LiDAR data, to identify vehicle motion, and to derive a vehicle's velocity and direction from the motion artifacts effect. Thus, besides its common applications, airborne LiDAR should also be regarded as a candidate technique for traffic monitoring from the air.

Urban scene analysis can be categorized by object type, data source, and algorithm. During the past decades, most work on urban scene analysis has concentrated on the classification or detection of specific objects. Extensive research (Clode et al. 2007; Fauvel 2007; Sohn and Dowman 2007; Yao and Stilla 2010; Guo et al. 2011; Xiao et al. 2012) has been devoted to extracting objects like buildings and roads, while trees and vehicles are also objects of interest for the intelligent monitoring of natural resources and traffic in urban areas (Höfle and Hollaus 2010; Yao et al. 2011). However, the detection and modeling of diverse urban objects may involve more complicated situations due to the varied characteristics and appearances of the objects. As ALS data became widely available for the task of creating 3D city models, an increasing amount of research was devoted to automatic approaches to object detection from images and LiDAR data, which showed the great potential of 3D target modeling and surface characterization in urban areas (Schenk and Csatho 2007; Mastin et al. 2009). In this chapter, we focus on analyzing airborne LiDAR data with the adaptive boosting (AdaBoost) classification technique for urban object detection based on selected spatial and radiometric features, and we develop and validate a robust classification strategy for urban object detection by fusing LiDAR point clouds and imagery.

As mentioned above, ALS data have become an important source for object extraction and reconstruction for various applications such as urban and vegetation analysis. However, traffic monitoring remains one of the few fields which are still not intensively analyzed in the LiDAR community. There are several motivations driving us to perform traffic analysis using airborne LiDAR in urban areas:

  • The penetration ability of laser rays towards volume-scattering objects (e.g., trees) can improve vehicle detection;

  • The motion artifacts generated by the linear scanning mechanism of airborne LiDAR can determine object motion;

  • The explicit extraction of vehicles can refine the results of operations such as DTM filtering and road detection, where vehicles are regarded as persistent disturbances.

The task of detecting moving vehicles with ALS has been addressed in several scientific publications. The research most relevant to our work is that of Toth and Grejner-Brzezinska (2006), in which an airborne laser scanner coupled with a digital frame camera was used to analyze transportation corridors and acquire traffic flow information. However, testing of that system was limited to a motorway; the problem still needs to be investigated in more challenging regions using a system equipped solely with LiDAR. In the contribution from Yao et al. (2010a), a context-guided approach based on gridded ALS data was used to delineate single instances of vehicle objects, and the results demonstrated the feasibility of extracting vehicles for motion analysis. Yao et al. (2011) presented a vehicle extraction method running directly on LiDAR point clouds that integrates height, edge, and point shape information in a segmentation step to improve vehicle extraction through object-based classification. Based on the extracted vehicles, Yao et al. (2010b) proposed a complete procedure to distinguish vehicle motion states and to estimate the velocity of moving vehicles by parameterizing, classifying, and inverting shape deformation features. In contrast to applications monitoring military traffic, civilian applications include more constraints regarding the objects to be detected: we can assume that vehicles are bound to a known road network, which might not be true in military applications. Such knowledge provides a priori information for motion estimation.

This chapter concerns the detection of selected urban objects and the characterization of traffic dynamics with ALS data. In Sect. 22.2, a robust and efficient supervised learning method for detecting urban objects is proposed, and the analysis of urban traffic dynamics is performed in Sect. 22.3. Section 22.4 presents the experiment and results of detecting urban objects and their dynamics. Finally, conclusions are drawn in Sect. 22.5.

2 Detection of Urban Objects with ALS and Co-registered Imagery

2.1 General Strategy

The workflow of the entire strategy for detecting three urban object classes (buildings, trees, and natural ground) with ALS data and co-registered images is depicted in Fig. 22.1.

Fig. 22.1
figure 1

Overview of the entire strategy

2.2 Feature Derivation

In this chapter, we combine point cloud and image data; multispectral and LiDAR intensity information is also available. In total, 13 features are defined (Wei et al. 2012).

2.2.1 Basic Features

The so-called basic features contain the features that can be directly retrieved from the point cloud and image data, respectively:

  • R, G, B: The three color channels of the digital image. Two data sets are used in the experiments, and one of them (the Vaihingen data set) provides color-infrared images, so for that data set the features R, G, B stand for the infrared, red, and green bands; for the other data set (Toronto), R, G, and B are the usual red, green, and blue bands. To avoid confusion, we always use the symbols R, G, B to denote the three color channels of the image in this order.

  • NDVI: Normalized Difference Vegetation Index, defined as:

    $${\it{NDVI}} = \frac{{({\it{NIR}} - {\it{VIS}})}}{{({\it{NIR}} + {\it{VIS}})}}$$
    (22.1)

    NDVI indicates whether the observed target contains green vegetation. This feature is only used for the Vaihingen data set because it provides color-infrared imagery.

  • Z: The vertical coordinate of each point in the LiDAR data; the topography of the datasets used here is assumed to be flat.

  • I: Pulse intensity, which is provided by the LiDAR system for each point.

2.2.2 Spatial Context Features

Based on the basic features, we derive additional features. To this end, a 3D cuboid neighborhood is defined with the help of a 2D square with a radius (half-width) of 1.25 m in the horizontal dimension, as shown in Fig. 22.2. All points located within this vertical cell volume are counted as neighbors; the value of 1.25 m was chosen empirically.

Fig. 22.2
figure 2

The 3D cuboid neighborhood used to acquire spatial context features

  • ∆Z: Height difference between the highest and lowest points within the cuboid neighborhood.

  • σZ: Standard deviation of the height of points within the cuboid neighborhood.

  • ∆I: Intensity difference between points having the highest and lowest intensities within the cuboid neighborhood.

  • σI: Standard deviation of intensity of points within the cuboid neighborhood.

  • E: Entropy. Unlike the usual entropy of images, here we compute the entropy from the LiDAR intensities Ik of the points within the cuboid neighborhood by Eq. 22.2, with K being the number of neighbors:

    $$E = \sum\limits_{k = 1}^{K} {\left[ {\left( { - I_{k} } \right) \cdot \log_{2} \left( {I_{k} } \right)} \right]}$$
    (22.2)

The following two features, O and P, are based on the three eigenvalues of the covariance matrix of the xyz coordinates of the points within the cuboid neighborhood. The three eigenvalues \(\lambda_{1}\), \(\lambda_{2}\), and \(\lambda_{3}\) are arranged in descending order and characterize the local three-dimensional structure. This allows us to distinguish between a linear, a planar, or a volumetric distribution of the points.

  • O: Omnivariance, which indicates the distribution of points in the cuboid neighborhood. It is defined as:

    $$O = \sqrt[3]{{\prod\limits_{i = 1}^{3} {\lambda_{i} } }}$$
    (22.3)
  • P: Planarity, defined as:

    $$P = \left( {\lambda_{2} - \lambda_{3} } \right)/\lambda_{1}$$
    (22.4)

\(P\) has high values for roofs and ground, but low values for vegetation.
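To make the feature derivation concrete, the following minimal sketch (our own illustration, not the chapter's implementation) computes the spatial context features of this subsection for one query location using NumPy; normalizing the intensities before the entropy computation is an assumption we add so that Eq. 22.2 is well defined:

```python
import numpy as np

def cuboid_features(points, query_xy, radius=1.25):
    """Spatial context features (Sect. 22.2.2) for one query location.

    points   : (N, 4) array with columns x, y, z, intensity
    query_xy : (2,) horizontal coordinates of the query point
    radius   : half-width of the 2D square neighborhood in metres
    """
    # Select all points inside the vertical column defined by the 2D square
    mask = (np.abs(points[:, 0] - query_xy[0]) <= radius) & \
           (np.abs(points[:, 1] - query_xy[1]) <= radius)
    nb = points[mask]
    z, inten = nb[:, 2], nb[:, 3]

    dz, sz = z.max() - z.min(), z.std()                  # ΔZ, σZ
    di, si = inten.max() - inten.min(), inten.std()      # ΔI, σI

    # Entropy (Eq. 22.2); intensities are normalized to sum to one here,
    # an assumption made so that the logarithm is well defined
    p = inten / inten.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))

    # Eigenvalue features from the covariance of the xyz coordinates
    lam = np.linalg.eigvalsh(np.cov(nb[:, :3].T))[::-1]  # λ1 ≥ λ2 ≥ λ3
    omnivariance = np.cbrt(np.prod(lam))                 # Eq. 22.3
    planarity = (lam[1] - lam[2]) / lam[0]               # Eq. 22.4

    return dz, sz, di, si, entropy, omnivariance, planarity
```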

2.3 AdaBoost Classification

AdaBoost is short for adaptive boosting (Freund and Schapire 1999), an improved version of boosting. AdaBoost is an attractive and powerful supervised machine learning algorithm and has been successfully applied to both classification and regression problems. For classification, AdaBoost takes full advantage of weak learners by combining a set of weak classifiers into a strong classifier that is arbitrarily well correlated with the true classification. It consists of iteratively learning weak classifiers with respect to a sample distribution and adding them to a final strong classifier. Once a weak learner is added, the data are reweighted according to the weak classifier's accuracy: misclassified samples gain weight and correctly classified samples lose weight. The only requirement on the weak learners used in AdaBoost is that their classification accuracy is better than random guessing, i.e., for the binary case, better than 50%. In this chapter, we use an open-source AdaBoost toolbox with a single-tree weak learner, CART (classification and regression tree); more details can be found in Freund and Schapire (1999).

Like other supervised learning algorithms, AdaBoost consists of two phases: training and prediction. In the training phase, it trains T weak classifiers over T rounds. In this chapter we implement the multiclass classification task by iterating the corresponding binary classifiers, as shown in the following pseudocode for binary classification:

Input: training data with m samples (x_i, y_i), y_i ∈ Y = {−1, +1}, i ∈ [1, m]
Initialize: W_1^i = 1/m
for t = 1 : T
    train the t-th weak classifier h_t with the sample-distribution weight vector W_t
    ε_t = Σ_{i=1}^{m} W_t^i · I(h_t(x_i) ≠ y_i)
    α_t = ln((1 − ε_t)/ε_t) / 2
    Z_t = Σ_{i=1}^{m} W_t^i · exp(−α_t · h_t(x_i) · y_i)
    W_{t+1}^i = W_t^i · exp(−α_t · h_t(x_i) · y_i) / Z_t,   for i = 1 : m
end

The T weak classifiers are combined and output-weighted as follows:

$$H\left( x \right) = {\text{sgn}} \left( {\sum\limits_{t = 1}^{T} {\alpha_{t} h_{t} \left( x \right)} } \right)$$
(22.5)

where the sgn function is defined as:

$$\it {\text{sgn}} \left( x \right) = \left\{ {\begin{array}{*{20}c} { - 1,x < 0} \\ {0,x = 0} \\ {1,x > 0} \\ \end{array} } \right.$$
(22.6)

In the above pseudocode, \(\left( {x_{i} ,y_{i} } \right)\) represents the \(i{\text{th}}\) training sample, with \(x_{i}\) standing for its feature vector and \(y_{i}\) for its class label; \(m\) is the number of training samples; \(W_{t}^{i}\) is the weight with which the \(i{\text{th}}\) training sample is selected to train the \(t{\text{th}}\) classifier \(h_{t}\), and \(W_{t}\) is the vector of the \(W_{t}^{i}\); \(\varepsilon_{t}\) is the weighted prediction error of \(h_{t}\); \(\alpha_{t}\) is the weight coefficient for updating the sample distribution; the indicator \(I\left( {h_{t} (x_{i} ) \ne y_{i} } \right)\) equals 1 if \(h_{t} (x_{i} ) \ne y_{i}\) and 0 otherwise; \(Z_{t}\) is a normalization factor. At the beginning, each sample is assigned an equal weight \(W_{1}^{i} = 1/m\), which means that each training sample is selected with the same probability to train \(h_{1}\). In the \(t{\text{th}}\) training round, the AdaBoost algorithm updates \(W_{t + 1}^{i}\) as follows: training samples correctly identified by classifier \(h_{t}\) are weighted less, while those incorrectly identified are weighted more. When training \(h_{t + 1}\), the algorithm therefore tends to select samples wrongly classified by previous classifiers with higher probability. After T rounds of training, the T weak classifiers are combined into a weighted classifier \(H\left( x \right)\) as the output of the training phase, which has better prediction performance.

The prediction phase uses the combined classifier for classification. Compared to plain boosting, AdaBoost has two advantages for learning a more accurate classifier. First, for each weak classifier's training, boosting chooses training samples randomly, whereas AdaBoost chooses samples misclassified in the previous training rounds with greater probability; thus AdaBoost can train the classifier more effectively. Second, AdaBoost determines each sample's classification label by weighting each classifier's output, so that a more accurate classifier contributes more to the final classification result.
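The training loop above can be illustrated with a minimal from-scratch sketch; it uses a depth-one CART tree from scikit-learn as the weak learner and mirrors the pseudocode (ε_t, α_t, the weight update, and the sign of the weighted sum in Eq. 22.5). Function names are ours; the chapter itself uses an adapted open-source AdaBoost toolbox rather than this sketch.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, T=50):
    """Train T weak classifiers on labels y in {-1, +1} (see pseudocode above)."""
    m = len(y)
    W = np.full(m, 1.0 / m)                       # W_1^i = 1/m
    learners, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1)   # CART stump as weak learner
        h.fit(X, y, sample_weight=W)
        pred = h.predict(X)
        eps = np.sum(W * (pred != y))             # weighted error ε_t
        eps = np.clip(eps, 1e-10, 1 - 1e-10)      # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)     # α_t
        W *= np.exp(-alpha * y * pred)            # re-weight the samples
        W /= W.sum()                              # normalization (Z_t)
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    """H(x) = sgn(Σ α_t h_t(x)), Eq. 22.5."""
    score = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(score)
```

Samples misclassified in round t gain weight, so the next stump concentrates on them, while a stump with small weighted error ε_t receives a large α_t and therefore contributes more to H(x).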

3 Detection of Urban Traffic Dynamics with ALS Data

In this section, we briefly review the theory for detecting object dynamics in ALS data. We refer to the dimension perpendicular to the sensor heading as across-track; the dimension along the sensor path is denoted as along-track.

3.1 Artifacts Effect of Vehicle Motion in ALS Data

In order to assess the feasibility of extracting information on traffic dynamics from LiDAR sensors installed on an airborne platform, the main characteristics of the sensor, including the data formation method, must be considered first. Most airborne LiDAR systems (with the exception of flash LiDAR) are predominantly based on mechanical scanning: a rotating deflection element rapidly sweeps the laser beam over the Earth's surface with continuously varying scan angles during flight. While the sensor is moving, it transmits laser pulses at constant intervals given by the pulse repetition frequency (PRF) and receives the echoes. With respect to moving objects, the fundamental difference between scanning and the frame camera model is the presence of motion artifacts in the scanner data. Due to the short sampling time (camera exposure), imagery preserves the shape of moving objects; only if the relative speed between sensor and object is significant does motion blurring occur. In contrast, scanning will always produce motion artifacts, since the distance between sensor and target is calculated under a stationary-world assumption; fast-moving objects violate this assumption and are therefore imaged incorrectly, depending on the relative motion between the sensor and the object. This dependency can be seen by adding the temporal component to the range equation of the LiDAR sensor. Here, it is assumed that the sampling rate is consistent for all vehicles independent of the scan angle, that is, all vehicles are scanned with enough points to represent their shape artifacts.

In Fig. 22.3a the geometry of data acquisition is shown. The sensor is flying at a certain altitude along the dotted arrow. An example of shape artifacts generated by moving objects is also depicted in Fig. 22.3b, where the black dotted box indicates the vehicle shape obtained in the scanning process of airborne LiDAR while the original vehicle is depicted as a rectangle nearby. It can be perceived that the moving vehicle is imaged as a stretched parallelogram. Let \(\theta_{v}\) be the intersection angle between the moving directions of sensor and vehicle where \(\theta_{v} \in \left[ {0^{ \circ } ,360^{ \circ } } \right]\), vL and v the velocity of aircraft and vehicle respectively, ls and lv the sensed and original lengths of the vehicle, respectively; and \(\theta_{SA}\) the shearing angle that accounts for the deformation of the vehicle as a parallelogram. The analytic relations between shape artifacts and object-movement parameters can be derived as:

$$l_{s} = \frac{{l_{v} \cdot v_{L} }}{{v_{L} - v \cdot \cos \left( {\theta_{v} } \right)}} = \frac{{l_{v} }}{{1 - \frac{v}{{ \, v_{L} }} \cdot \cos \left( {\theta_{v} } \right)}}$$
(22.7)
$$\theta_{SA} = \arctan \left( {\frac{{v \cdot \sin \left( {\theta_{v} } \right)}}{{v_{L} - v \cdot \cos \left( {\theta_{v} } \right)}}} \right) + 90^{ \circ }$$
(22.8)
Fig. 22.3
figure 3

Moving objects undergo the scanning of airborne LiDAR. Copyright © 2010 IEEE, reproduced by permission

where \(\theta_{SA} \in \left( {0^{ \circ } ,180^{ \circ } } \right)\) and corresponds to the bottom-left interior angle of the observed vehicle shape.
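As an illustration of Eqs. 22.7 and 22.8, the following sketch (with values assumed by us, not taken from the chapter's data) computes the sensed length and shearing angle of a hypothetical vehicle:

```python
import numpy as np

def motion_artifacts(l_v, v, v_L, theta_v_deg):
    """Sensed length l_s (Eq. 22.7) and shearing angle θ_SA (Eq. 22.8)."""
    th = np.radians(theta_v_deg)
    l_s = l_v / (1.0 - (v / v_L) * np.cos(th))
    theta_sa = np.degrees(np.arctan(v * np.sin(th) / (v_L - v * np.cos(th)))) + 90.0
    return l_s, theta_sa

# Hypothetical example: 4.5 m car at 60 km/h, sensor at 120 km/h, θ_v = 35°
print(motion_artifacts(4.5, 60.0, 120.0, 35.0))   # ≈ (7.6 m, 115.9°)
```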

To fully understand the appearance of moving objects in airborne LiDAR data, the object motion is divided into the following components, which are investigated for their respective influences on the generated data artifacts.

First, the target is assumed to move with constant velocity \(v_{a}\) following the along-track direction, which leads to the stretching effect of the object shape depending on the relative velocity between target and sensor as illustrated in Fig. 22.4.

Fig. 22.4
figure 4

Along-track object motion. Copyright © 2010 IEEE, reproduced by permission

The analytic relation between the object velocity in along-track direction \(v_{a}\) and the observed stretched length \(l_{s}\) thus can be summarized in Eq. 22.9. The relation in Eq. 22.9 is further modified to Eq. 22.10 which explicitly connects \(v_{a}\) with the variation in the aspect ratio of vehicle shape in a mathematical way, thereby making motion detection and velocity estimation more feasible and reliable:

$$l_{s} = \frac{{l_{v} }}{{1 - \frac{{v_{a} }}{{v_{L} }}}}$$
(22.9)
$$Ar_{s} = \frac{{l_{s} }}{{w_{v} }} = \frac{Ar}{{1 - \frac{{v_{a} }}{{v_{L} }}}}$$
(22.10)

where \(Ar_{s}\) is the sensed aspect ratio of the vehicle in ALS data while \(Ar\) is the original aspect ratio of the vehicle and wv is the width of the vehicle.

Secondly, the target is assumed to move in the across-track direction with a constant velocity \(v_{c}\). This results in a scanline-wise linear shift of laser footprints that hit upon the target in the direction of movement when the sensor is sweeping over so that the observed vehicle shape in ALS data is deformed (sheared) to a certain extent as illustrated in Fig. 22.5.

Fig. 22.5
figure 5

Across-track object motion. Copyright © 2010 IEEE, reproduced by permission

Let \(v_{c}\) be the across-track motion component of the object velocity. Since \(v_{c} = v \cdot \sin \left( {\theta_{v} } \right)\), Eq. 22.8 can be rewritten as Eq. 22.11 for describing the analytic relation between the object velocity \(v_{c}\) and the observed shearing angle \(\theta_{SA}\) through the sensor velocity \(v_{L}\) and the intersection angle \(\theta_{v}\):

$$\begin{array}{*{20}l} {\theta_{SA} = \arctan \left( {\frac{1}{{v_{L} /v_{c} - \cot \left( {\theta_{v} } \right)}}} \right) + 90^{ \circ } } & {{\text{where}}\;\theta_{v} \ne 0^{ \circ } {/}180^{ \circ } \, \wedge \,v_{c} \ne 0} \\ {\theta_{SA} = 90^{ \circ } } & {{\text{where}}\;\theta_{v} = 0^{ \circ } {/}180^{ \circ } \, \vee \,v_{c} = 0} \\ \end{array}$$
(22.11)

3.2 Detection of Moving Vehicles

All of the effects of moving objects described above can be exploited to not only detect vehicles’ movement but also measure their velocity. Our scheme for vehicle motion detection relies on a strategy consisting of two basic modules successively executed: (1) vehicle extraction; and (2) determination of the motion state.

For vehicle extraction, we used a hybrid strategy (Fig. 22.6) that integrates a 3D segmentation-based classification method with a context-guided approach. For a detailed analysis of vehicle detection, we refer the readers to Yao et al. (2010a, 2011).

Fig. 22.6
figure 6

Workflow for vehicle extraction

To determine the motion state, a support vector machine (SVM) classification-based method is adopted. A set of vehicle points can be geometrically described as a spoke model with control parameters, whose configuration can be formulated as

$${\mathbf{X}} = \left( {\begin{array}{*{20}c} {{\mathbf{U}}_{1} } \hfill \\ \cdot \hfill \\ \cdot \hfill \\ {{\mathbf{U}}_{k} } \hfill \\ \end{array} } \right),{\mathbf{U}}_{i} = \left( {\begin{array}{*{20}c} {\theta_{SA}^{i} } \\ {Ar_{i} } \\ \end{array} } \right)$$
(22.12)

where k denotes the number of spokes in the model. The vehicle shape variability can thus be represented in a two-dimensional feature space (if the number of spokes is k = 1). The similarity between vehicle instances of different motion states therefore needs to be measured by a nonlinear metric. The SVM is well suited to nonlinear recognition problems: it finds an optimal separating hyperplane in a higher-dimensional feature space, which corresponds to a nonlinear decision boundary in the original input space. The kernel trick avoids direct evaluation in the higher-dimensional feature space by computing inner products through the kernel function applied to feature vectors in the input space. The SVM classifier is used here to perform binary classification on the vehicles that remain after excluding those of uncertain state identified in the shape parameterization step. In addition, the classification framework for distinguishing 3D shape categories (Fletcher et al. 2003) can be adapted to the motion classification scheme by exploiting the vehicle shape features.
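A hedged sketch of this motion-state classification step, assuming a single spoke (k = 1) so that each vehicle is described by the pair (θ_SA, Ar_s); the toy training values, labels, and RBF kernel settings are our own illustration rather than the configuration of Yao et al. (2010b):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical training vehicles: columns are (shearing angle θ_SA [deg], aspect ratio Ar_s)
X_train = np.array([[90.0, 2.3], [91.5, 2.4], [89.0, 2.2],     # roughly rectangular shapes
                    [112.0, 3.6], [70.0, 4.1], [105.0, 3.0]])  # sheared / stretched shapes
y_train = np.array([0, 0, 0, 1, 1, 1])                         # 0 = static, 1 = moving

# RBF-kernel SVM: a nonlinear decision boundary in the (θ_SA, Ar_s) input space
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)

print(clf.predict([[95.0, 3.8]]))   # classify an extracted vehicle of unknown state
```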

3.3 Concept for Vehicle Velocity Estimation with ALS Data

The estimation of the velocity of detected moving vehicles can be done based on all motion artifacts effects in a single pass of ALS data by inverting the motion artifacts model to relate the velocity with other observed and known parameters. Thus, different measurements and derivations might be used to estimate the velocity. The estimation scheme can be initially divided into two main categories, depending on whether the moving direction of vehicles is known or not:

First, if the intersection angle is given, the estimation can be separated into the following three situations, each using its respective observations to estimate the velocity:

  (a) The measure of the shearing angle of detected moving vehicles relative to their original orthogonal rectangular shape;

  (b) The measure of the stretching effect of detected moving vehicles relative to their original size; and

  (c) The combination of the along-track and across-track velocity components, which are estimated based on the above-mentioned effects, respectively.

Second, if the intersection angle is not given:

  (a) The solution to a system of bivariate equations constructed by uniting the two formulas.

The three methods in the first category assume that the moving directions of vehicles are given beforehand, whereas the method in the second category does not. To estimate the velocity, the first three methods either utilize the shape stretching or the shearing effect, or combine them when applicable. In the last case, the moving direction of vehicles is estimated along with the velocity by treating both the velocity and the intersection angle as unknowns of a system of bivariate equations and solving it, which gives the motion estimation great flexibility to deal with the difficult cases encountered in real-life scenarios. This means that not only the magnitude but also the direction of vehicle motion can be derived. All approaches have their advantages and disadvantages and differ in the accuracy of their results, which are analyzed and evaluated in the following subsections.

3.3.1 Velocity Estimation Based on the Across-Track Deformation Effect

The shearing angle of moving vehicles caused by the across-track deformation allows for direct access to the velocity only if the moving direction is known a priori and input as an observation. Still, information about the orientation of the road axis relative to the vehicle motion is needed to derive the real velocity of vehicles. The velocity estimate v of the vehicle based on the shearing effect of its shape is derived by inverting Eq. 22.8 as

$$v = \frac{{v_{L} \cdot \tan \left( {\theta_{SA} - 90^{ \circ } } \right)}}{{\cos \theta_{v} \cdot \tan \left( {\theta_{SA} - 90^{ \circ } } \right) + \sin \left( {\theta_{v} } \right)}}$$
(22.13)

The value of the intersection angle \(\theta_{{v}}\) can be determined from principal-axis measurements of the vehicle points, as the flight direction of the airborne LiDAR sensor can always be assumed to be known thanks to the onboard navigation system. Equation 22.13 shows that the accuracy of the velocity estimate based on the across-track deformation effect, \(\sigma^{c}_{v}\), is a function of the quality of the moving vehicle's heading angle relative to the sensor flight path, \(\theta_{v}\), and of the accuracy of the shearing angle measurement \(\theta_{SA}\). The standard deviation of the velocity estimate is therefore calculated using the error propagation law (Wolf and Ghilani 1997) as

$$\begin{aligned} \sigma _{v}^{c} & = \sqrt {\left( {\frac{{\partial v}}{{\partial \theta _{v} }}} \right)^{2} \sigma _{{\theta _{v} }}^{2} + \left( {\frac{{\partial v}}{{\partial \theta _{{SA}} }}} \right)^{2} \sigma _{{\theta _{{SA}} }}^{2} } \\ & = \sqrt {\begin{array}{*{20}c} {\left( {\frac{{v_{L} \cdot \tan \left( {\theta _{{SA}} - 90^{ \circ } } \right) \cdot \left( {\cos \left( {\theta _{v} } \right) - \tan \left( {\theta _{{SA}} - 90^{ \circ } } \right) \cdot \sin \left( {\theta _{v} } \right)} \right)}}{{\left( {\sin \left( {\theta _{v} } \right) + \tan \left( {\theta _{{SA}} - 90^{ \circ } } \right) \cdot \cos \left( {\theta _{v} } \right)} \right)^{2} }}} \right)^{2} \sigma _{{\theta _{v} }}^{2} } \\ +\, { \left( {\frac{{2v_{L} \cdot \sin \left( {\theta _{v} } \right)\left( {\tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} + 1} \right)}}{{\cos \left( {2\theta _{v} } \right) \cdot \tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} - 2\,\sin \left( {2\theta _{v} } \right) \cdot \tan \left( {90^{ \circ } - \theta _{{SA}} } \right) - \cos \left( {2\theta _{v} } \right) + \tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} + 1}}} \right)^{2} \sigma _{{\theta _{{SA}} }}^{2} } \\ \end{array} } \\ \end{aligned}$$
(22.14)

with \(v_{L}\) being the instantaneous flying velocity of the sensor system.
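For illustration, a minimal sketch of the across-track estimator of Eq. 22.13; the input values are assumptions consistent with the forward-model example given earlier:

```python
import numpy as np

def velocity_from_shearing(theta_sa_deg, theta_v_deg, v_L):
    """Velocity estimate from the across-track shearing effect (Eq. 22.13)."""
    t = np.tan(np.radians(theta_sa_deg - 90.0))
    th = np.radians(theta_v_deg)
    return v_L * t / (np.cos(th) * t + np.sin(th))

# Hypothetical values: θ_SA = 115.9°, θ_v = 35°, sensor speed 120 km/h  ->  ≈ 60 km/h
print(velocity_from_shearing(115.9, 35.0, 120.0))
```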

3.3.2 Velocity Estimation Based on Along-Track Stretching Effect

Besides the above-mentioned approach, the velocity of a moving vehicle can also be derived by measuring its along-track stretching relative to the original vehicle size. The functional relation is given by:

$$v = \frac{{\left( {1 - Ar/Ar_{s} } \right) \cdot v_{L} }}{{\cos \left( {\theta_{v} } \right)}}$$
(22.15)

where \(Ar_{s}\) =  \(l_{s} /w_{v}\) is the sensed aspect ratio of the moving vehicle, while Ar is the original aspect ratio and assumed to be constant. The accuracy of the velocity estimate based on the along-track stretching effect \(\sigma_{v}^{a}\) is a function of the quality of the aspect ratio measurement for detected moving vehicles and the accuracy of the vehicle’s heading relative to the sensor flight path. \(\sigma_{v}^{a}\) can be calculated by the error propagation law as follows:

$$\begin{aligned} \sigma _{v}^{a} & = \sqrt {\left( {\frac{{\partial v}}{{\partial \theta _{v} }}} \right)^{2} \sigma _{{\theta _{v} }}^{2} + \left( {\frac{{\partial v}}{{\partial Ar_{s} }}} \right)^{2} \sigma _{{Ar_{s} }}^{2} } \\ & = \sqrt {\left( { - \frac{{v_{L} \cdot \sin \left( {\theta _{v} } \right) \cdot \left( {Ar/Ar_{s} - 1} \right)}}{{\cos \left( {\theta _{v} } \right)^{2} }}} \right)^{2} \sigma _{{\theta _{v} }}^{2} + \left( {\frac{{Ar \cdot v_{L} }}{{Ar_{s}^{2} \cdot \cos \left( {\theta _{v} } \right)}}} \right)^{2} \sigma _{{Ar_{s} }}^{2} } \\ \end{aligned}$$
(22.16)

3.3.3 Velocity Estimation Based on Combining Two Velocity Components

Both estimation methods presented above might fail to give a reliable velocity estimate if a vehicle moves in a direction for which the resulting shape deformation is not clearly dominated by either of the two motion components (e.g., a moving vehicle with intersection angle \(\theta_{v}\) = 35° and velocity v = 40 km/h). To fill this gap and enable velocity estimation in an arbitrary traffic environment, we propose to use both shape deformation effects for estimating velocities. The velocity estimate is given by the square root of the sum of squares of the two motion components, which are derived from the two shape deformation parameters Ars and \(\theta_{SA}\), respectively:

$$v = \sqrt {\left( {v_{a} } \right)^{2} + \left( {v_{c} } \right)^{2} }$$
(22.17)
$${\text{where}}\left\{ {\begin{array}{*{20}c} {v_{a} = v_{L} \cdot \left( {1 - \frac{{Ar}}{{Ar_{s} }}} \right)} \\ {v_{c} = \frac{{v_{L} }}{{\cot \left( {\theta _{{SA}} - 90^{^\circ } } \right) + \cot \left( {\theta _{v} } \right)}}} \\ \end{array} } \right.$$
(22.18)

and where va and vc are the along-track and across-track motion components, respectively. The accuracy of the velocity estimate based on combining the two components, \(\sigma_{v}^{a + c}\), is a function of the quality of the along-track and across-track motion measurements for the detected moving vehicle; \(\sigma_{v}^{a + c}\) is first expressed in terms of these two motion components by the error propagation law as:

$$\begin{aligned} \sigma _{v}^{{a + c}} & = \sqrt {\left( {\frac{{\partial v}}{{\partial v_{a} }}} \right)^{2} \sigma _{{v_{a} }}^{2} + \left( {\frac{{\partial v}}{{\partial v_{c} }}} \right)^{2} \sigma _{{v_{c} }}^{2} } \\ & = \sqrt {\frac{{v_{a}^{2} }}{{v_{a}^{2} + v_{c}^{2} }}\sigma _{{v_{a} }}^{2} + \frac{{v_{c}^{2} }}{{v_{a}^{2} + v_{c}^{2} }}\sigma _{{v_{c} }}^{2} } \\ \end{aligned}$$
(22.19)

where \(\sigma_{{v_{a} }}\) and \(\sigma_{{v_{c} }}\) are the standard deviations of along- and across-track motion derivations, respectively. They can be further decomposed into the accuracy with respect to the three observations concerning the vehicle shape and motion parameters based on Eq. 22.18. Using the error propagation law, \(\sigma_{{v_{a} }}\) and \(\sigma_{{v_{c} }}\) are inferred as:

$$\sigma_{{v_{a} }} = \frac{{\partial v_{a} }}{{\partial Ar_{s} }}\sigma_{{Ar_{s} }} = \frac{{Ar \cdot v_{L} }}{{Ar_{s}^{2} }}\sigma_{{Ar_{s} }}$$
(22.20)
$$\begin{aligned} \sigma _{{v_{c} }} & = \sqrt {\left( {\frac{{\partial v_{c} }}{{\partial \theta _{v} }}} \right)^{2} \sigma _{{\theta _{v} }}^{2} + \left( {\frac{{\partial v_{c} }}{{\partial \theta _{{SA}} }}} \right)^{2} \sigma _{{\theta _{{SA}} }}^{2} } \\ & = \sqrt {\left( {\frac{{v_{L} \cdot \left( {\cot \left( {\theta _{v} } \right)^{2} + 1} \right)}}{{\left( {\cot \left( {90^{^\circ } - \theta _{{SA}} } \right) - \cot \left( {\theta _{v} } \right)} \right)^{2} }}} \right)^{2} \sigma _{{\theta _{v} }}^{2} + \left( {\frac{{v_{L} \cdot \left( {\cot \left( {90^{^\circ } - \theta _{{SA}} } \right)^{2} + 1} \right)}}{{\left( {\cot \left( {90^{^\circ } - \theta _{{SA}} } \right) - \cot \left( {\theta _{v} } \right)} \right)^{2} }}} \right)^{2} \sigma _{{\theta _{{SA}} }}^{2} } \\ \end{aligned}$$
(22.21)

Finally, after substituting Eqs. 22.20 and 22.21 into Eq. 22.19, the error propagation relation for the velocity estimate based on combining the two velocity components is derived with respect to the three variables Ars, \(\theta_{SA}\), and \(\theta_{v}\).
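The combined estimator of Eqs. 22.17 and 22.18 can be sketched as follows (the numerical values are our own assumptions, chosen to be consistent with the earlier examples):

```python
import numpy as np

def velocity_combined(Ar, Ar_s, theta_sa_deg, theta_v_deg, v_L):
    """Combine the along-track and across-track components (Eq. 22.18) via Eq. 22.17."""
    v_a = v_L * (1.0 - Ar / Ar_s)                                # along-track component
    v_c = v_L / (1.0 / np.tan(np.radians(theta_sa_deg - 90.0))
                 + 1.0 / np.tan(np.radians(theta_v_deg)))        # across-track component
    return np.hypot(v_a, v_c)                                    # Eq. 22.17

# Hypothetical vehicle: original aspect ratio 2.3, sensed 3.9, θ_SA = 115.9°, θ_v = 35°
print(velocity_combined(2.3, 3.9, 115.9, 35.0, 120.0))           # ≈ 60 km/h
```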

3.3.4 Joint Estimation of Vehicle Velocity and Direction by Solving Simultaneous Equations

So far, none of the estimation methods can give a velocity estimate if vehicles move in an unknown direction, or if their moving direction cannot be accurately determined in advance. To solve this problem, we propose to treat the velocity and the intersection angle \({\theta_{v} }\) as unknown parameters simultaneously, with the variables describing the deformation effects caused by the motion components as observations. The two analytic formulas of the motion artifacts model can be viewed directly as an equation system, of which the velocity and the intersection angle form the solution. This system of bivariate equations relating the unknown parameters to the observations is given by:

$$\left\{ {\begin{array}{*{20}l} {\theta _{{SA}} - 90^{^\circ } = \arctan \left( {\frac{{v \cdot \sin \left( {\theta _{v} } \right)}}{{v_{L} - v \cdot \cos \left( {\theta _{v} } \right)}}} \right)} \\ {1 - \frac{v}{{v_{L} }} \cdot \cos \left( {\theta _{v} } \right) = \frac{{Ar}}{{Ar_{s} }}} \\ \end{array} } \right.$$
(22.22)

The system is to be solved using the substitution method. First, transform the second sub-equation of Eq. 22.22 into

$$v = \frac{{v_{L} }}{{\cos \left( {\theta_{v} } \right)}} \cdot \left( {1 - \frac{Ar}{{Ar_{s} }}} \right)$$
(22.23)

and substitute it into the first sub-equation of Eq. 22.22, which has been converted into a more solution-friendly expression in advance:

$$\tan \left( {\theta_{SA} - 90^{ \circ } } \right) \cdot v_{L} = v \cdot \left( {\tan \left( {\theta_{SA} - 90^{ \circ } } \right) \cdot \cos \left( {\theta_{v} } \right) + \sin \left( {\theta_{v} } \right)} \right)$$
(22.24)

After substitution, the expression of Eq. 22.24 can be rewritten as:

$$\begin{aligned} \tan \left( {\theta_{SA} - 90^{ \circ } } \right) \cdot v_{L} & = v_{L} \left( {1 - \frac{Ar}{{Ar_{s} }}} \right) \cdot \tan \left( {\theta_{SA} - 90^{ \circ } } \right) \\ & \quad + \tan \left( {\theta_{v} } \right) \cdot v_{L} \cdot \left( {1 - \frac{Ar}{{Ar_{s} }}} \right) \\ \end{aligned}$$
(22.25)

We further rearrange the expression to facilitate the solution and obtain:

$$\begin{aligned} \tan \left( {\theta _{v} } \right) & = \frac{{\tan \left( {\theta _{{SA}} - 90^{^\circ } } \right) \cdot \left[ {\left( {1 - \left( {1 - \frac{{Ar}}{{Ar_{s} }}} \right)} \right)} \right]}}{{1 - \frac{{Ar}}{{Ar_{s} }}}} = \tan \left( {\theta _{{SA}} - 90^{^\circ } } \right)\left( {\frac{{Ar_{s} }}{{Ar_{s} - Ar}} - 1} \right) \\ & \Rightarrow \theta _{v} = \arctan \left[ {\tan \left( {\theta _{{SA}} - 90^{^\circ } } \right) \cdot \left( {\frac{{Ar_{s} }}{{Ar_{s} - Ar}} - 1} \right)} \right] \\ \end{aligned}$$
(22.26)

Finally, substitute the second sub-equation in Eq. 22.26 into Eq. 22.23 again and the velocity estimate of the moving vehicle v can be derived as follows:

$$v = v_{L} \cdot \left( {1 - \frac{Ar}{{Ar_{s} }}} \right) \cdot \sec \left\{ {\arctan \left[ {\tan \left( {\theta_{SA} - 90^{ \circ } } \right) \cdot \left( {\frac{{Ar_{s} }}{{Ar_{s} - Ar}} - 1} \right)} \right]} \right\}$$
(22.27)
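A minimal sketch of the joint solution of Eqs. 22.26 and 22.27, with observation values assumed by us for illustration:

```python
import numpy as np

def joint_velocity_and_heading(Ar, Ar_s, theta_sa_deg, v_L):
    """Jointly estimate intersection angle θ_v (Eq. 22.26) and velocity v (Eq. 22.27)."""
    t = np.tan(np.radians(theta_sa_deg - 90.0))
    theta_v = np.arctan(t * (Ar_s / (Ar_s - Ar) - 1.0))     # Eq. 22.26
    v = v_L * (1.0 - Ar / Ar_s) / np.cos(theta_v)           # Eq. 22.27 (sec = 1/cos)
    return np.degrees(theta_v), v

# Hypothetical observations: Ar = 2.3, Ar_s = 3.9, θ_SA = 115.9°, sensor speed 120 km/h
print(joint_velocity_and_heading(2.3, 3.9, 115.9, 120.0))   # ≈ (35°, 60 km/h)
```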

It can be seen that the velocity of a moving vehicle can be directly estimated based on the shape deformation parameters without the need to know the intersection angle \(\theta_{v}\) a priori. \(\theta_{v}\) can be estimated as an intermediate variable solely based on the two shape deformation parameters Ars and \(\theta_{SA}\), and is independent of the sensor flight velocity vL. For the accuracy analysis, two accuracy measures are of interest, namely those of the moving direction and of the velocity. The accuracies of the intersection angle \(\sigma_{{\theta_{v} }}\) and the velocity estimate \(\sigma_{v}\) can be derived as functions of the quality of the along-track stretching and across-track shearing measures. Equivalently, \(\sigma_{{\theta_{v} }}\) and \(\sigma_{v}\) can be calculated with respect to the two deformation parameters by the error propagation law as:

$$\begin{aligned} \sigma _{{\theta _{v} }} & = \sqrt {\left( {\frac{{\delta \theta _{v} }}{{\delta Ar_{s} }}} \right)^{2} \sigma _{{Ar_{s} }}^{2} + \left( {\frac{{\delta \theta _{v} }}{{\delta \theta _{{SA}} }}} \right)^{2} \sigma _{{\theta _{{SA}} }}^{2} } \\ & = \sqrt {\begin{array}{*{20}c} {\left( {\frac{{Ar \cdot \tan \left( {90^{ \circ } - \theta _{{SA}} } \right)}}{{Ar^{2} \cdot \tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} \, \cdot\, \left( {Ar - Ar_{s} } \right)^{2} }}} \right)^{2} \sigma _{{Ar_{s} }}^{2} } \\ { + \frac{{Ar \cdot \left( {\tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} + 1} \right) + \left( {Ar - Ar_{s} } \right)}}{{Ar^{2} \cdot \tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} + \left( {Ar - Ar_{s} } \right)^{2} }}\sigma _{{\theta _{{SA}} }}^{2} } \\ \end{array} } \\ \end{aligned}$$
(22.28)
$$\begin{aligned} \sigma _{v} & = \sqrt {\left( {\frac{{\delta v}}{{\delta Ar_{s} }}} \right)^{2} \sigma _{{Ar_{s} }}^{2} + \left( {\frac{{\delta v}}{{\delta \theta _{{SA}} }}} \right)^{2} \sigma _{{\theta _{{SA}} }}^{2} } \\ & = \sqrt {\begin{array}{*{20}c} {\left( {\frac{{Ar \cdot v_{L} \cdot \left( {Ar \cdot \tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} + Ar - Ar_{s} } \right)}}{{Ar_{s}^{2} \left( {Ar - Ar_{s} } \right) \cdot \sqrt {\frac{{Ar^{2} \,\tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} + \left( {Ar - Ar_{s} } \right)^{2} }}{{\left( {Ar - Ar_{s} } \right)^{2} }}} }}} \right)^{2} \sigma _{{Ar_{s} }}^{2} } \\ +\,{\left( {\frac{{Ar \cdot v_{L} \cdot \tan \left( {90^{ \circ } - \theta _{{SA}} } \right) \cdot \left( {\tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} + 1} \right)}}{{Ar^{2} \left( {Ar - Ar_{s} } \right) \cdot \sqrt {\frac{{Ar^{2} \,\tan \left( {90^{ \circ } - \theta _{{SA}} } \right)^{2} + \left( {Ar - Ar_{s} } \right)^{2} }}{{\left( {Ar - Ar_{s} } \right)^{2} }}} }}} \right)^{2} \sigma _{{\theta _{{SA}} }}^{2} } \\ \end{array} } \\ \end{aligned}$$
(22.29)

The empirical error values for the two observations \(\sigma_{Ars}\) and \(\sigma _{{\theta_{{SA}} }}\) were set to the same values as used for the preceding methods. The accuracies of the intersection angle \(\sigma_{{\theta_{v} }}\) and of the velocity estimate \(\sigma_{v}\) based on the joint estimation of moving velocity and direction are derived by inserting the empirical errors for the observations into Eqs. 22.28 and 22.29. The error of the intersection angle \(\sigma_{{\theta_{v} }}\) is shown in Fig. 22.7a as a function of vehicle velocity and of the relative angle between the vehicle heading and the sensor flight path; the relative error is indicated in Fig. 22.7b. The (relative) velocity errors \(\sigma_{v}\) and \(\sigma_{v} /v\) are shown in Fig. 22.8 as a function of vehicle velocity v and intersection angle \(\theta_{v}\). It can be seen from the plots that for most vehicles on urban road sections a high accuracy of the moving direction estimate (\(\sigma_{{\theta_{v} }} /\theta_{v}\) < 25%) cannot be achieved unless they move somewhat faster (>70 km/h). High accuracy of the velocity estimates can only be guaranteed for vehicles that clearly do not travel in the across-track direction (\(\theta_{v}\) < 75°). The overall accuracy of velocity estimation derived in this way is slightly degraded compared to the other solutions, where the moving direction is given beforehand.

Fig. 22.7
figure 7

a Relative error \(\sigma_{\theta v} /\theta_{v}\) of the intersection angles obtained from the joint estimation of velocity and heading, as a function of target velocity v and intersection angle \(\theta_{v}\); \(\sigma_{\theta v} /\theta_{v}\) is given in %; b Vehicle velocity v (given in km/h) as a function of \(\sigma_{\theta v} /\theta_{v}\) and \(\theta_{v}\)

Fig. 22.8
figure 8

a Relative velocity error \(\sigma_{v} /v\) of vehicle velocities obtained based on the joint estimation of velocity and heading as a function of target velocity v and the intersection angle \(\theta_{v}\), \(\sigma_{v} /v\) is given in %; b Vehicle velocity v (given in km/h) as a function of \(\sigma_{v} /v\) and \(\theta_{v}\).

4 Experiments and Results

4.1 Detection of Urban Objects with ALS Data Associated with Aerial Imagery

4.1.1 Experimental Data for Urban Objects Detection

Two datasets were used in this chapter for an urban scene object detection test, which both include aerial images and airborne LiDAR data. The first dataset (yellow areas in Fig. 22.9) was captured over Vaihingen in Germany and is a subset of the data used for the test of digital aerial cameras carried out by the German Association of Photogrammetry and Remote Sensing (DGPF; Cramer 2010). The other dataset covers an area of about 1.45 km2 in the central area of the City of Toronto in Canada (red areas in Fig. 22.10).

Fig. 22.9
figure 9

Three test sites in Vaihingen: a Area 1; b Area 2; c Area 3

Fig. 22.10
figure 10

Two test sites in Toronto: a Area 4; b Area 5

4.1.2 Experimental Design for Urban Objects Detection

The following steps are considered in this experiment:

Data preprocessing. For both datasets, the aerial images and airborne LiDAR data were acquired at different times. They are therefore co-registered by geometrically back-projecting the point cloud into the image domain using the available orientation parameters. After that, all data points are grid-fitted into a raster format to facilitate acquiring spatial context information per pixel or point. We apply grid-fitting with an interval of 0.5 m on the ground, which ensures that each resampled pixel is assigned at least one LiDAR point.
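A minimal sketch of this grid-fitting step, assuming for illustration that the highest return per cell is kept (the chapter only specifies the 0.5 m grid interval; function and variable names are ours):

```python
import numpy as np

def grid_fit(points, cell=0.5):
    """Assign each LiDAR point to a 0.5 m raster cell and keep one z value per cell.

    points : (N, 3) array of x, y, z coordinates
    """
    xy_min = points[:, :2].min(axis=0)
    cols, rows = ((points[:, :2] - xy_min) // cell).astype(int).T
    raster = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    for r, c, z in zip(rows, cols, points[:, 2]):
        if np.isnan(raster[r, c]) or z > raster[r, c]:
            raster[r, c] = z           # keep the highest return per cell (assumption)
    return raster
```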

Feature selection. For Dataset 1, color-infrared images and point cloud data including intensity information are available. All 13 features (R, G, B, NDVI, Z, I, ∆Z, σZ, ∆I, σI, E, O, and P) introduced in Sect. 2.2 are extracted and used for the object detection test. For Dataset 2, no infrared band is available, so only 12 features (all except NDVI) are used in the experiment.

Training samples' selection. Since training samples are essential for supervised learning classification, a suitable approach is needed to derive valid samples considering the characteristics of the classifier used. In this chapter, AdaBoost with a single-tree weak learner (CART) is adopted as the final strong classifier (Freund and Schapire 1999), which chooses training samples randomly to some extent. Therefore, for each test site, we first label the whole test area manually and then randomly choose 10% of the labeled samples as input training samples for the AdaBoost classifier.

Classifier control and classification procedure. This chapter uses the binary AdaBoost classifier to detect buildings, natural ground, and trees in the urban scene. To do so, the binary AdaBoost classifier is iteratively generated and applied: (1) the classifier for detecting buildings is generated by training on randomly chosen building and non-building samples corresponding to 10% of the whole data amount, and is applied to separate buildings from the urban scene; (2) 10% natural-ground and non-natural-ground samples are randomly selected to train the classifier for natural ground detection, which is then used to separate natural ground from the complex urban scene; (3) tree detection proceeds with a binary AdaBoost classifier trained on randomly selected 10% tree and non-tree samples. To test and validate the methods, several areas are chosen for the object detection test according to the actual urban scene. For building detection, all five test areas (three in Vaihingen and two in downtown Toronto) are used; Areas 1–4 are used to test the detection of natural ground; and Areas 1–3 in Dataset 1 are used for the detection of trees. The implementation of the AdaBoost classifier used in this chapter was adapted from that published by Vezhnevets (2005).

Evaluation methods. The evaluation of the object detection results is performed by the ISPRS Test Project on Urban Classification and 3D Building Reconstruction, which conducts the evaluation based on the methods described by Rutzinger et al. (2009) and Rottensteiner et al. (2005). The evaluation software reads in the reference and the object detection results, converts them into a label image, and then carries out the evaluation as described by Rottensteiner et al. (2013). Since the output of the binary AdaBoost classifiers consists of samples labeled by class rather than segmented objects, the topological clarification of detected objects described by Rutzinger et al. (2009) is applied to perform the object-based evaluation; this is implemented automatically by the evaluation software. The evaluation output consists of a text file containing the evaluation results and several images that visualize them, including accuracy indices such as geometric accuracy, pixel-based completeness and correctness, object-based completeness and correctness, and balanced completeness and correctness; intermediate evaluation outputs include attributes such as a per-object evaluation as a function of object area.
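For reference, the pixel-based completeness, correctness, and quality reported below follow the standard definitions used in this benchmark (cf. Rutzinger et al. 2009); a minimal sketch with made-up counts chosen to roughly reproduce the Area 2 building scores in Table 22.1:

```python
def completeness(tp, fn):
    return tp / (tp + fn)          # share of the reference that was detected

def correctness(tp, fp):
    return tp / (tp + fp)          # share of detections that are correct

def quality(tp, fp, fn):
    return tp / (tp + fp + fn)     # combined measure ("detection quality")

# e.g. 925 true-positive, 60 false-positive, 75 false-negative pixels (illustrative counts)
print(completeness(925, 75), correctness(925, 60), quality(925, 60, 75))
```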


Table 22.1 Detection accuracy of buildings
Table 22.2 Detection accuracy of natural ground
Table 22.3 Detection accuracy of trees

4.1.3 Results of Urban Objects Detection

As stated in Sect. 22.2, this chapter applies the binary AdaBoost classifier to the fused image and LiDAR features to detect buildings, natural ground, and trees in several complex urban scenes. The detection accuracies for buildings, natural ground, and trees are presented in Tables 22.1, 22.2, and 22.3, respectively. In these tables, pixel-based evaluation accuracy (Compl area [%], Corr area [%], Pix-Quality [%]), object-based evaluation accuracy (Compl obj [%], Corr obj [%], obj-Quality [%]), balanced evaluation accuracy (Compl obj 50 [%], Corr obj 50 [%], obj-Quality 50 [%]), and the geometric accuracy of the detected objects (RMS [m]) are listed for the detection results of buildings in Areas 1–5, natural ground in Areas 1–4, and trees in Areas 1–3, respectively.

Building detection result. It can be seen from Table 22.1 that all five test sites reach a pixel-based completeness of 85% or higher, while the object-based completeness is lower due to the area overlap of objects, especially for Test Sites 2 and 3 with object-based completeness below 80%. With regard to correctness, the three test sites in Dataset 1 perform better than the two test sites in Dataset 2 in all evaluation aspects: pixel-based, object-based, and pixel-object balanced. It can thus be concluded that the building detection in Dataset 1 is more robust than that in Dataset 2. Concerning the geometric aspect, Test Area 2 obtains the best geometric accuracy with an RMS of 0.9 m, followed by Area 3 with an RMS of 1.0 m and Area 1 with an RMS of 1.2 m, while both test sites in Dataset 2 obtain the worst geometric accuracy with an RMS of 1.6 m. Among the five test sites, Area 2 achieves the best overall building detection accuracy: completeness of 92.5%, correctness of 93.9%, and detection quality of 87.2% using pixel-based evaluation; completeness, correctness, and detection quality of 100% based on the evaluation balanced between pixels and objects; correctness of 100% based on object-based evaluation; and geometric accuracy of RMS 0.9 m. Because of the small number of buildings in Test Site 2, its three object-level false negatives result in a lower object-based completeness than for Test Sites 1, 4, and 5, even though those sites have more false negatives.

Natural ground detection result. The results for Dataset 1 are better than those for Dataset 2 on all indices. Concerning the pixel-based evaluation, the detection completeness is lower than the correctness for all test sites; the same holds for the object-based evaluation except for Test Site 4. For this test site, the object-based correctness is very low compared to the pixel-based correctness, which shows that the natural ground of Test Site 4 is fragmented and cannot be detected well at the object level. Regarding the geometric aspect, Areas 2 and 3 obtain the best geometric accuracy with an RMS of 1.1 m, followed by Area 1 with an RMS of 1.3 m, while Test Site 4 in Dataset 2 obtains the worst geometric accuracy with an RMS of 1.7 m. Among the four test sites, Site 2 achieves the best overall natural ground detection accuracy, with a completeness of 80.5%, correctness of 85.7%, and detection quality of 71.0% based on pixel-based evaluation; completeness of 83.3%, correctness of 100%, and detection quality of 83.3% based on the balanced evaluation of pixels and objects; and a geometric accuracy of RMS 1.1 m. Due to the larger number of small natural ground objects and the smaller number of large ones, Test Site 2 obtains a lower detection accuracy under the object-based evaluation.

Tree detection result. Only Dataset 1 was tested. From Table 22.3, it can be noticed that the tree detection accuracy is below 80%, lower than that of building detection at the same test sites. Although the accuracy indices obtained from both pixel-based and object-based evaluation are not high, this is partly related to the definition of trees in the reference data, since the balanced accuracy is good. Regarding the geometric aspect, Area 3 obtains the best geometric accuracy with an RMS of 1.3 m, followed by Areas 1 and 2 with an RMS of 1.4 m. The geometric accuracy for tree detection is worse than that for buildings and natural ground, due to the more complex 2D and 3D shapes of trees. Among the three test sites, Area 2 achieves the best overall tree detection accuracy, with a completeness of 72.0% and correctness of 78.5% based on pixel-based evaluation; completeness of 63.0% and correctness of 82.4% based on object-based evaluation; completeness of 89.3% and correctness of 98.6% using the balanced evaluation of pixels and objects; and a geometric accuracy of RMS 1.4 m.

The detection results presented above show that the proposed AdaBoost-based strategy can detect objects very well in complex urban areas based on relevant spatial and spectral features obtained by combining point clouds and image data. First, most detected objects only suffer from errors in boundary regions, especially buildings in Test Sites 1–3, which means that the proposed method can successfully separate the desired objects from the background using the combined spatial-spectral features. Second, trees and natural ground can be discriminated efficiently in Dataset 1 despite their similar spectral features, which demonstrates that the method makes full use of the advantages of fused features and an ensemble classifier. Third, the detection achieves the best geometric accuracy for buildings, with an RMS of 0.9 m, partly biased by the data co-registration error, which demonstrates the high accuracy of the proposed method. Fourth, larger objects achieve better detection completeness and correctness; for example, all buildings with an area larger than 87.5 m² are detected correctly in Test Sites 1–3, while some smaller buildings are missed, which justifies the reliability of the AdaBoost-based strategy for urban object detection.

4.2 Accuracy Prediction for Vehicle Velocity Estimation Using ALS Data

To demonstrate the quality of the velocity estimation for real-life scenarios and to provide quantitative guidance for planning LiDAR flight campaigns for traffic analysis, real road networks in urban areas are used in an experiment that simulates the velocity prediction and estimates its accuracy. This is useful for exploring the boundary conditions of applying the proposed strategy in real airborne LiDAR campaigns for traffic analysis. Generally, the simulation has been designed with the following points in mind:

  • Validate the feasibility and repeatability of velocity estimation results;

  • Verify the velocity estimation scheme, which provides rational results with sufficient accuracy in a wide range of datasets acquired over urban areas; and

  • Demonstrate the potential of velocity-accuracy analysis to provide valuable guidance on optimizing flight planning for traffic monitoring.

The accuracy of the estimated velocity \(\sigma_{v}\) is simulated for two road network sections north of Munich which represent the most typical scenarios in urban areas. In this area, several main roads and large express roads are situated and are highly frequented during rush hours. For each test site, two general schemes are assumed to exist, where the four different velocity estimators presented above are applied: First, the moving direction of a vehicle relative to the sensor flight path is known (here the moving direction is derived based on the road orientation); and second, the moving direction of the vehicle relative to the sensor flight path is unknown.

As the three methods within the first scheme complement each other in terms of performance, we finally combined the estimators depending on the relative orientation between the vehicle heading and the sensor flight path to obtain optimal results. For every relative orientation, the estimator that provides the best result is chosen; that is, the estimator with the highest predicted accuracy (i.e., the smallest \(\sigma_{v}\)) determines the accuracy value assigned to a velocity estimate at that road location. Parameters of a real flight using the Riegl LMS-Q560 sensor have been used in this simulation, and an average flight speed of 120 km/h was assumed (the concrete configuration can be found in Table 22.4). The average velocity of moving vehicles on the roads is set to 60 km/h. The error measures for the shearing angle and intersection angle of moving vehicles can be assessed empirically from the shape parameterization; for our case, \(\sigma_{Ars} = 0.4\), \(\sigma_{\theta_{SA}} = 2^{\circ}\), and \(\sigma_{\theta_{v}} = 2^{\circ}\).

The orientation of the roads relative to the planned flight path and the resulting \(\sigma_{v}\) values obtained by combining the estimators in the first scheme are shown in Fig. 22.11a, c, while the resulting \(\sigma_{v}\) values using the second scheme for the same sites are shown in Fig. 22.11b, d. \(\sigma_{v}\) is given as a percentage of the absolute velocity. With the algorithm described earlier, velocities can be estimated with an accuracy better than 10% for about 80% of the investigated road networks. Figure 22.12 indicates which estimator is chosen in which parts of the road network. It shows that the across-track shearing-based estimator (Method 1) provides the best results for large parts of the road network. The along-track stretching-based (Method 2) and combined (Method 3) estimators outperform the across-track shearing-based approach only in areas where the road extends roughly in the along-track direction (i.e., \(\theta_{v} \le 25^{\circ}\)). For example, in the second test site (Fig. 22.12b), Dachauer Street (in the bottom-left part) requires Method 3 for velocity estimation, whereas one part of Ackermann Street (curved, in the top-left part) requires Method 2. Moreover, in most parts of the road network, the accuracy of velocity estimation using the first scheme is generally higher than that obtained using the second scheme, especially when vehicles move in a direction close to across-track. This is because the joint estimation of velocity and moving-direction angle incorporates additional error sources caused by the unknown moving direction of vehicles relative to the sensor flight path, leading to an accumulated error in the final velocity estimates.
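
The per-segment combination of the estimators in the first scheme can be illustrated by the following sketch: for each road segment, \(\sigma_{v}\) is evaluated for every available estimator at that segment's orientation relative to the flight path, and the estimator with the smallest predicted \(\sigma_{v}\) is kept. The \(\sigma_{v}\) curves are passed in as placeholders here; they would come from the chapter's error models.

```python
import numpy as np

def combine_estimators(theta_v_deg, sigma_v_by_method):
    """Pick, for every road segment, the estimator with the smallest predicted sigma_v.

    theta_v_deg       : 1D array of segment orientations relative to the flight path (deg)
    sigma_v_by_method : dict mapping method name -> callable(theta_deg array) -> sigma_v (%)
    Returns (best_sigma, best_method), both aligned with theta_v_deg.
    """
    theta = np.asarray(theta_v_deg, dtype=float)
    names = list(sigma_v_by_method)
    # Predicted accuracy of every estimator for every segment orientation.
    sigma = np.stack([np.asarray(sigma_v_by_method[n](theta), dtype=float) for n in names])
    best = np.argmin(sigma, axis=0)                       # index of the best method per segment
    best_sigma = sigma[best, np.arange(theta.size)]       # its predicted sigma_v
    best_method = np.array(names)[best]                   # its name, e.g. 'Method 1'
    return best_sigma, best_method
```

Segments for which even the best estimator exceeds a desired threshold (e.g., \(\sigma_{v}\) above 10% of the absolute velocity) could then be flagged as unsuitable for velocity estimation under the planned flight geometry.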

Table 22.4 Parameters of typical airborne topographic LiDAR
Fig. 22.11 Simulation of \(\sigma_{v}\) for two road networks north of Munich using the velocity estimation schemes: a the estimation accuracy for the first road network, in % of the absolute velocity, using the first scheme; b the estimation accuracy for the first road network, in % of the absolute velocity, using the second scheme; c the estimation accuracy for the second road network, in % of the absolute velocity, using the first scheme; d the estimation accuracy for the second road network, in % of the absolute velocity, using the second scheme

Fig. 22.12 Velocity estimation methods used for the two road networks under the first scheme for velocity estimation (moving direction relative to the sensor flight path is known): a the estimation method chosen in each part of the first road network; b the estimation method chosen in each part of the second road network

5 Summary

This chapter is concerned with detecting urban objects and traffic dynamics from ALS data. Urban object detection in complex scenes is still a challenging problem for the communities of both photogrammetry and computer vision. Since LiDAR data and image data are complementary for information extraction, relevant spatial-spectral features extracted from ALS point clouds and image data can be jointly applied to detect urban objects such as buildings, natural ground, and trees in complex urban environments. To obtain good object detection results, an AdaBoost-based strategy was presented in this chapter. The strategy comprises four steps: first, co-registering the LiDAR point clouds with the images by back-projection using the available orientation parameters; second, gridding the data points into a raster format to facilitate the acquisition of spatial context information; third, extracting various spatial-statistical and radiometric features using a cuboid neighborhood; and fourth, detecting objects, including buildings, trees, and natural ground, with the trained AdaBoost classifier, whose output consists of labeled grid cells.
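
As a rough illustration of the classification step, the sketch below uses scikit-learn's AdaBoostClassifier as a stand-in for the classifier trained in this chapter; the feature matrices, class codes, and the number of weak learners are assumptions of the sketch, not values taken from the chapter.

```python
from sklearn.ensemble import AdaBoostClassifier

def train_and_label_grid(features_train, labels_train, features_all, grid_shape):
    """Train an AdaBoost classifier on labelled grid cells and label the full raster.

    features_train : (n_train, n_features) spatial-statistical + radiometric features
    labels_train   : (n_train,) class codes, e.g. 0 = natural ground, 1 = building, 2 = tree
    features_all   : (n_rows * n_cols, n_features) features of every grid cell
    grid_shape     : (n_rows, n_cols) of the raster created in the gridding step
    """
    clf = AdaBoostClassifier(n_estimators=200)  # default weak learner: depth-1 decision stumps
    clf.fit(features_train, labels_train)
    labels_all = clf.predict(features_all)      # one class label per grid cell
    return labels_all.reshape(grid_shape)       # labelled grid, ready for per-class masks
```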

The performance of the developed strategy for detecting buildings, natural ground, and trees in urban areas was comprehensively evaluated using the benchmark datasets provided by ISPRS WG III/4. Both semantic and geometric criteria were used to assess the experimental results. From the detection results, it can be concluded that the AdaBoost-based classification strategy detects urban objects reliably and accurately, achieving the best detection accuracy for buildings, with a completeness of 92.5% and a correctness of 93.9%; for natural ground, a completeness of 80.5% and a correctness of 85.7%; and for trees, a completeness of 72.5% and a correctness of 78.5%, based on the per-pixel evaluation. The quality indexes for the detection of trees and natural ground at the per-object level are not as high as those for buildings. Nevertheless, the overall accuracy is high for such complex urban scenes, as can be concluded from the balanced evaluation of pixels and objects. With further research, the detection results might be refined with graph-based optimization, which is expected to improve the detection accuracy by enforcing label smoothness both locally and globally. Moreover, to further ensure the reliability of the object detection, the co-registration accuracy of the multimodal data still needs to be refined via hierarchical feature matching, and the adjustable parameters need to be optimized through sensitivity analysis.

For characterizing urban traffic dynamics, a method to identify vehicle movement from airborne LiDAR data and to estimate the respective velocities has been developed. Besides a description of the developed methods, theoretical and simulation studies for performance analysis were presented in detail. The detection and velocity estimation of fast-moving vehicles appears promising and accurate, whereas slow-moving vehicles are harder to distinguish from stationary ones, and it is harder to obtain velocity estimates with acceptable accuracy for them. Moreover, the performance of motion detection improves with the point density of the LiDAR dataset. The velocity of detected vehicles can be estimated with high accuracy for nearly all observation geometries, except for vehicles moving in the (quasi-)along-track direction while the sensor sweeps over them.

Although the results shown in this chapter cannot be compared directly with those of induction loops or bridge sensors, they nonetheless show great potential to support traffic-monitoring applications. The big advantages of ALS data are their large coverage and a certain ability to penetrate tree canopies, and thus the possibility to derive traffic data throughout an extended road network that may be occluded by roadside trees. Evidently, this complements the accurate but sparsely sampled measurements of fixed-mounted sensors. A natural extension of the presented approach would be to integrate the accurate, sparsely sampled traffic information with the less accurate but area-wide data collected from spaceborne or airborne sensors. Existing traffic-flow models would provide a framework for doing so.