1 Learning Objectives

In this chapter, you will learn about:

  • The robot localisation problem

  • The robot mapping problem

  • The Simultaneous Localisation and Mapping (SLAM) problem

  • Common probabilistic state estimation techniques

  • The Kalman filter and the role of the extended Kalman filter as a recursive state estimator in nonlinear systems.

2 Introduction

Imagine you are visiting a new city or country. Perhaps, if you are like me, one of the first things you might do is download or print a copy of the local area map. Or, perhaps make sure that the navigation app or the GPS on your phone is up to date with the latest map. But, while you are travelling across the new city, do you remember the time when you got lost? Even with the latest maps?

Similarly, have you ever wondered how a self-driving car knows where it is going?

A typical mobile robotic system architecture is shown in Fig. 9.1. It consists of several sensors, planning and control modules and actuators. While specific instantiations of these components will be application and platform-dependent, a typical mobile robotic system requires these building blocks to function. First of all, internal and external sensors provide information about the robot and the physical world it inhabits. Next, this information is interpreted by various algorithms to estimate the vehicle’s state and its environment. The state estimate is then used to plan the robot’s actions and generate commands for the actuators. We looked at sensors, control, path planning and obstacle avoidance in the previous chapters.

Fig. 9.1
figure 1

A high-level overview of a mobile robot system from the sensors to algorithms to actuators

This chapter explores localisation and mapping, once considered the holy grail of robotics. These are two fundamental capabilities that any autonomous mobile robot, from self-driving cars to Mars rovers, requires to navigate in the wild.

An Industry Perspective

Guillaume Charland-Arcand

ARA Robotics

I was first exposed to robotics during my CEGEP years in a small club where we were building sumo robots for robotics competitions. The competition was simple: two robots faced each other on a circular black ring with a small white bar that delimited the edge, and the goal was for one robot to push the other out of the ring. My robot was very simple: big motors, big wheels, a few sensors, an 8-bit microcontroller, my own circuit board, and a few lines of code. I was not very successful in the competitions, but building a thing on your own, mixing mechanics, electronics and software, and seeing something move on its own, was pretty cool. But I felt I did not know enough, so at university I decided to join a scientific club focusing on multirotor UAVs, which were pretty new at the time. I fell in love with this branch of robotics instantly. It was mobile robots, like my good old sumo robot, but on steroids. Everything was harder: more vibration, complicated nonlinear dynamics, limited payload capacity, and it's flying!

At the time of doing my master's, working with UAVs was hard because of the lack of resources. I had convinced my supervisor to buy equipment, but based on the budget, we could only afford one UAV. This made things a lot more complicated for me because, with one mistake, one line of code in error, the UAV crashes. Working on control law design made this even more problematic. Everybody who has worked a bit in control theory has experienced this: it's always fine on paper and in simulations, but there is always the small caveat of finding the controller gains, which is typically done through experimentation. The challenge was to tune my controller and validate my controller software without breaking the only UAV I had. This is where I got introduced to safety-critical engineering and its practices, i.e., how to design software and hardware in a systematic fashion to guarantee that it won't fail. I did not go as far as following DO-178 standards, but it provided new insight into how to develop robotic products and applications.

When I started, SLAM was starting to be applied to UAVs. A few ROS packages existed, but it was mostly in 2D, using a Hokuyo scanning laser rangefinder. There were also a few successful demonstrations of autonomous UAVs operating in GNSS-denied environments, but these were mainly prototypes in experimental settings. Now, companies such as Exyn Technologies and Skydio provide products for industrial applications that have a very high level of autonomy. These systems can generate extremely precise 3D maps, detect static and dynamic obstacles and plan paths through unknown environments at high velocities. I would say the last 5 years' technological improvements are major and are a great bedrock for the innovation to come, because the task of autonomy is very complex and not completely solved yet.

3 Robot Localisation Problem

For any autonomous mobile robot to be successfully deployed, it needs to answer the question 'where am I?'. In Fig. 9.2, a robot travelling in a local coordinate frame is shown at time \(t = k\) at an unknown location. The robot localisation problem is to find the current coordinates and the heading (the direction the robot is facing) of the robot with respect to a given local coordinate frame. The local coordinate frame is usually fixed at the location where the robot was at time \(t = 0\), though it could be arbitrarily selected. The robot also carries its own coordinate frame. The heading is usually defined as the angular difference between the x-axis of the local coordinate frame and the x-axis of the robot coordinate frame. The localisation problem is said to be solved for a given time \( t = k\) when the current \(\left( {x,y,z} \right)\) coordinates and the heading \(\left( \Psi \right)\) of the robot with respect to the given local coordinate frame are fully known.

Fig. 9.2
figure 2

A robot travelling in a local coordinate frame

In the example shown, the robot knows its starting position. The localisation problem could then be thought of as a tracking problem, where the requirement is to track the robot's movement from the beginning with respect to its initial position. On the other hand, if the initial position is not known, then the localisation problem is considered to be a global positioning problem. An example of this could be a robot being turned on at an arbitrary location without knowing its initial position.

Another related problem is the ‘kidnapped’ robot problem, where a properly localised robot suddenly gets moved to a different location without being aware of the move. An example of this could be a vacuum cleaning robot starting from its charging station (known local coordinate frame) and suddenly being picked up by a user and placed in another room (who would do that?).

3.1 Odometry-Based Localisation

One of the most common techniques used in robotics to extract the current location of a robot is to use wheel encoders. Odometry, a word derived from the Greek roots odos (street or route) and metron (measure), simply refers to any technique that uses motion information to derive relative position estimates of the robot. Similar to the odometer in a vehicle that indicates the distance the vehicle has travelled (either the absolute distance since its manufacture or the relative distance from an arbitrary starting point), wheel encoders can be used to calculate the relative travel distance and heading of the robot based on the known wheel geometry and dynamics. This process is sometimes referred to as dead reckoning. However, due to sensor errors, slippage and other noise elements inherent in such systems, odometry accumulates errors over time. As shown in Fig. 9.3, a robot could lose track of its location relatively quickly if it relied only on odometry. Nevertheless, odometry is extensively used in robotics to acquire short-term localisation information.
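
As a concrete sketch of wheel-encoder dead reckoning, the snippet below propagates a planar pose from per-wheel travel distances. It assumes a differential-drive robot; the function and variable names are illustrative rather than taken from any particular library.

```python
import math

def diff_drive_odometry(x, y, theta, d_left, d_right, wheel_base):
    """Dead-reckoning update for a differential-drive robot.

    x, y, theta      -- previous pose estimate in the local frame
    d_left, d_right  -- distances travelled by each wheel since the last
                        update (encoder ticks times metres per tick)
    wheel_base       -- distance between the two wheels
    """
    d_centre = (d_left + d_right) / 2.0          # forward motion of the robot centre
    d_theta = (d_right - d_left) / wheel_base    # change in heading
    # Integrate assuming the motion over one short time step is a straight
    # segment taken at the mid-point heading (a common small-step approximation).
    x += d_centre * math.cos(theta + d_theta / 2.0)
    y += d_centre * math.sin(theta + d_theta / 2.0)
    theta = (theta + d_theta + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    return x, y, theta
```

Because each call adds a small error from slippage and encoder quantisation, repeated application of such an update exhibits exactly the error accumulation shown in Fig. 9.3.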

Fig. 9.3
figure 3

True path of a robot (blue) compared to the odometry-based estimate of the path

3.2 IMU-Based Odometry

Inertial measurement units (IMUs) are devices that integrate several sensors in a single package, including accelerometers and gyroscopes. An IMU can be used to measure the robot's linear acceleration in three dimensions (using a tri-axial accelerometer) and its rotational rates (using gyroscopes). By appropriately integrating this information, it is possible to derive the robot's speed and travel distance, from which the robot's relative location and heading can be worked out. However, IMUs suffer from 'drift', accumulating errors over time.
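
The sketch below shows the kind of double integration involved for a planar robot, assuming (optimistically) that gravity and sensor biases have already been removed from the accelerometer readings; the simple Euler integration and the names used are illustrative choices.

```python
import math

def imu_dead_reckoning(x, y, theta, vx, vy, ax_body, ay_body, gyro_z, dt):
    """One planar dead-reckoning step from IMU data.

    ax_body, ay_body -- gravity- and bias-compensated accelerations in the
                        robot (body) frame
    gyro_z           -- yaw rate from the gyroscope
    Returns the propagated pose (x, y, theta) and velocity (vx, vy).
    """
    # Rotate body-frame acceleration into the local (world) frame.
    ax = ax_body * math.cos(theta) - ay_body * math.sin(theta)
    ay = ax_body * math.sin(theta) + ay_body * math.cos(theta)
    # Integrate twice: acceleration -> velocity -> position.
    vx += ax * dt
    vy += ay * dt
    x += vx * dt
    y += vy * dt
    theta += gyro_z * dt  # integrate yaw rate to obtain heading
    return x, y, theta, vx, vy
```

Because the accelerometer error is integrated twice, even a small constant bias produces a position error that grows quadratically with time, which is the drift referred to above.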

3.3 Visual Odometry

Visual odometry is an alternate technique that uses cameras and computer vision to derive odometric information. Various computer vision techniques have been used to estimate the motion of robots. Generally, these techniques attempt to understand the relative changes in images between subsequent frames due to movement. A common approach is to track a set of image features across frames (see Chap. 7). These techniques are sometimes coupled with an IMU to improve the estimates. Such systems are usually called visual inertial odometry.

3.4 Map-Based Localisation

The previously described techniques for solving the localisation problem only provide relative information. In other words, these techniques require the integration of a series of measurements to derive the robot's current location. An alternative technique is to use an external map: direct observations of a series of external landmarks (also called beacons or features) are made using a sensor mounted on the robot, and the robot's current location with respect to these landmarks is inferred based on an a priori map. That is, if you know the locations of a set of previously identified landmarks in the environment with respect to a global/local coordinate frame (i.e. someone has already built a local map of the environment), and if you have a sensor capable of re-identifying these landmarks and measuring their relative location with respect to its own position, then it is possible to localise your robot within the given coordinate frame using triangulation. Triangulation uses two (or more) known locations to 'triangulate', or work out, the location of a third point using geometry. In order to triangulate, the angles (and, depending on the variant, the distances) to the known locations (landmarks) must be measured from the unknown location (see Fig. 9.4).

Fig. 9.4
figure 4

Map-based localisation

Consider a robot observing \(n\) landmarks in the environment with known locations using a noisy sensor where the current location of the robot is unknown;

The current robot location (unknown) is: \({\mathbf{x}}_{r} = \left[ {x_{r} ,y_{r} } \right]\)

Measured bearings to the known landmarks: \({{\boldsymbol{\uptheta}}} = \left[ {\theta_{1} ,\theta_{2,} \ldots ,\theta_{n} } \right]^{{\text{T}}}\)

Actual locations of the landmarks (known) \({\mathbf{x}}_{i} = \left[ {x_{i} ,y_{i} } \right]^{{\text{T}}}\)

Actual bearings to the landmarks (unknown): \({\overline{\mathbf{\theta }}}\left( {{\mathbf{x}}_{r} } \right) = \left[ {\overline{\theta }_{1} \left( {{\mathbf{x}}_{r} } \right),\overline{\theta }_{2} \left( {{\mathbf{x}}_{r} } \right), \ldots ,\overline{\theta }_{n} \left( {{\mathbf{x}}_{r} } \right)} \right]^{{\text{T}}}\)

where \({\text{tan}}\overline{\theta }_{i} \left( {{\mathbf{x}}_{r} } \right) = \frac{{y_{i} - y_{r} }}{{x_{i} - x_{r} }}\)

Assuming the measurement error to be \(\left( {\updelta \theta_{i} } \right)\), we can write a relationship between the measured bearing and the actual bearing for each observed feature: \(\theta_{i} = \overline{\theta }_{i} \left( {{\mathbf{x}}_{r} } \right) +\updelta \theta_{i}\) or

$$ {{\boldsymbol{\uptheta}}} = {\overline{\mathbf{\theta }}}\left( {{\mathbf{x}}_{r} } \right) +\updelta \theta $$

where

$$\updelta \theta = \left[ {\updelta \theta_{1} ,\updelta \theta_{2,} \ldots ,\updelta \theta_{n} } \right]^{{\text{T}}} $$

and assuming the measurement noise to be zero-mean Gaussian and independent, the covariance matrix is given by

$$ {\Sigma } = {\text{diag}}\left( {\sigma_{1}^{2} ,\sigma_{2}^{2} , \ldots , \sigma_{n}^{2} } \right) $$

We can then construct the maximum likelihood (ML) estimator of the robot location:

$$ \hat{\textbf{\textit{x}}}_{r} = {\text{argmin}}\frac{1}{2}\left[ {{\overline{\mathbf{\theta }}}\left( {\hat{\textbf{\textit{x}}}_{r} } \right) - {{\boldsymbol{\uptheta}}}} \right]^{{\text{T}}} {\Sigma }^{ - 1} \left[ {{\overline{\mathbf{\theta }}}\left( {\hat{\textbf{\textit{x}}}_{r} } \right) - {{\boldsymbol{\uptheta}}}} \right] $$

This can now be solved iteratively as a nonlinear least-squares problem using a technique such as the Gauss–Newton algorithm.
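
A minimal NumPy sketch of this estimator is given below. It assumes equal bearing variances, a reasonable initial guess and well-spread landmarks; the function names and numerical values are illustrative.

```python
import numpy as np

def bearing_residuals(xr, landmarks, bearings):
    """Predicted minus measured bearings, wrapped to [-pi, pi)."""
    dx = landmarks[:, 0] - xr[0]
    dy = landmarks[:, 1] - xr[1]
    predicted = np.arctan2(dy, dx)
    return (predicted - bearings + np.pi) % (2 * np.pi) - np.pi

def bearing_jacobian(xr, landmarks):
    """Jacobian of the predicted bearings with respect to (x_r, y_r)."""
    dx = landmarks[:, 0] - xr[0]
    dy = landmarks[:, 1] - xr[1]
    d2 = dx ** 2 + dy ** 2
    # d/dx_r atan2(dy, dx) = dy / d^2,   d/dy_r atan2(dy, dx) = -dx / d^2
    return np.column_stack((dy / d2, -dx / d2))

def localise_gauss_newton(landmarks, bearings, x0, sigma, iterations=10):
    """Maximum-likelihood robot position from noisy bearing measurements."""
    xr = np.array(x0, dtype=float)
    W = np.eye(len(bearings)) / sigma ** 2          # Sigma^{-1} (equal variances)
    for _ in range(iterations):
        r = bearing_residuals(xr, landmarks, bearings)
        J = bearing_jacobian(xr, landmarks)
        delta = np.linalg.solve(J.T @ W @ J, -J.T @ W @ r)   # Gauss-Newton step
        xr += delta
        if np.linalg.norm(delta) < 1e-9:
            break
    return xr

# Illustrative use: three known landmarks observed from the (unknown) point (2, 1).
landmarks = np.array([[0.0, 5.0], [6.0, 4.0], [5.0, -3.0]])
true_position = np.array([2.0, 1.0])
bearings = np.arctan2(landmarks[:, 1] - true_position[1],
                      landmarks[:, 0] - true_position[0]) + 0.01 * np.random.randn(3)
print(localise_gauss_newton(landmarks, bearings, x0=[0.0, 0.0], sigma=0.01))
```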

Most common localisation algorithms assume landmarks to be stationary during the entire robot operation. This is called the static environment assumption. An environment is considered dynamic if it contains map elements (other than the robot) that are moving, such as humans, vehicles and other robots. Obviously, such dynamic elements are not suitable as landmarks for localisation and are treated as noise. Suitable landmarks could be visually salient features naturally occurring in the given environment (e.g. corners and edges) or artificially placed markers (e.g. laser retroreflective beacons). An appropriate sensor (e.g. a camera for the former and a laser range finder for the latter), along with relevant signal processing techniques, should be used to detect and measure the distance and the angle (bearing) to these features with respect to the sensor coordinate frame.

The Global Positioning System (GPS) uses the distance information between the receiver and several satellites (via time-of-flight and satellite-specific data) to localise, using a slight variation of the triangulation technique called trilateration, which requires knowing only the distances to the landmarks (or the satellites). Ideally, having more than two landmarks will help to improve the accuracy of the location estimate. The same applies to the previously described triangulation scenario.

A related problem called the data association problem deals with the disambiguation of detected features and the correct association with the known map features. This problem does not arise with GPS, as each satellite on the constellation sends uniquely identifiable information to the receiver.

4 The Robot Mapping Problem

In the previous section, we discussed how objects within the robot's environment could be used to localise a robot. Of particular interest was the availability of maps. How are these maps generated? Figure 9.5 shows an ancient map of Taprobana, modern-day Sri Lanka, drawn by the ancient mathematician and cartographer Claudius Ptolemy. The map was drawn using geographical coordinates derived from the tools and techniques available at the time. Today, such maps are drawn using modern surveying tools and GPS location data. The general idea, however, remains the same. Measurements are made to features of interest (e.g. contours, trees, structures) and are plotted against a fixed coordinate frame. As you will notice, Ptolemy's map bears only a passing resemblance to today's maps of the country. In ancient times, in the absence of GPS, sailors and cartographers relied on observing the sun and the stars using instruments such as the sextant, resulting in significant errors in measurement and localisation. To create such maps accurately, one needs to make accurate relative measurements to these features and to accurately localise, within the coordinate frame, the instrument making the measurements.

Fig. 9.5
figure 5

Ptolemy’s Taprobana as published in Cosmographia Claudii Ptolomaei Alexandrini, 1535 (modern-day Sri Lanka) (Image by Laurent Fries, Public Domain, https://commons.wikimedia.org/w/index.php?curid=16165526)

When deploying robots, it is often possible to access a priori maps. For example, for indoor environments, it may be possible to refer to the architect-drawn blueprints of the building or to place artificial beacons (e.g. retroreflective markers that respond to laser light) at known, fixed locations. However, the former assumes that the structure has not been changed since the original blueprints were drawn, which may not be the case, and for large environments the placement and surveying of artificial beacons becomes a cumbersome proposition. In addition, semi-permanent elements within the building, such as furniture, will change their locations over time, adding a transient element to such maps. It might also be that the robot needs to be deployed in an environment for which no pre-built map exists. In such situations, the robot is required to build the map itself. While the localisation problem is relatively easy to tackle due to its low-dimensional nature, the map building problem is much harder, especially if the environment is large.

4.1 Occupancy Grid Maps

One of the simpler techniques used in robotics to create a map is the occupancy grid map. Occupancy grid maps are commonly used in 2D mapping scenarios to describe the floor plan of a robot’s environment using a grid layout. Figure 9.6 shows a small occupancy grid map of an indoor environment. The grey shadings indicate where the sensor (a laser range finder in this case) has detected obstacles. Less dense areas of the ‘map’ indicate lower certainty of an obstacle at those locations. The blue line indicates the robot’s path during the mapping exercise. If you are wondering how the robot’s path was generated, we used a separate localisation algorithm with known laser beacons placed in the environment. If we used the odometric data to generate the occupancy grid map, the results would have been as bad as Ptolemy’s map.
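
Occupancy grids are commonly maintained with a per-cell log-odds value that is nudged up or down as laser returns arrive. The short sketch below illustrates this idea with invented sensor-model values; it is not the exact method used to produce Fig. 9.6.

```python
import numpy as np

# Log-odds occupancy grid: 0 means "unknown" (occupancy probability 0.5).
GRID_SIZE = (100, 100)          # number of cells in each direction
log_odds = np.zeros(GRID_SIZE)

# Inverse sensor model (illustrative values): how much a single observation
# shifts our belief about a cell.
L_OCCUPIED = np.log(0.7 / 0.3)  # cell in which the laser beam ended (obstacle)
L_FREE = np.log(0.3 / 0.7)      # cell the beam passed through (free space)

def update_cell(ix, iy, occupied):
    """Fuse one observation of cell (ix, iy) into the grid."""
    log_odds[ix, iy] += L_OCCUPIED if occupied else L_FREE

def occupancy_probability(ix, iy):
    """Convert the stored log-odds back to a probability in [0, 1]."""
    return 1.0 - 1.0 / (1.0 + np.exp(log_odds[ix, iy]))

# Example: a beam ends in cell (40, 25), so that cell is observed as occupied
# and (say) cell (39, 25) along the beam is observed as free.
update_cell(40, 25, occupied=True)
update_cell(39, 25, occupied=False)
print(occupancy_probability(40, 25), occupancy_probability(39, 25))
```

Repeated observations of the same cell push its probability towards 0 or 1, which is what produces the darker (more certain) and lighter (less certain) regions visible in Fig. 9.6.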

Fig. 9.6
figure 6

An occupancy grid map (left) and the actual plan of the mapped area (right) (image credit: Rafael Gomes Braga)

4.2 Other Types of Maps

Maps such as the occupancy grid map (Fig. 9.6), which represent the environment using a geometric representation, are called metric maps. Other types of maps include feature maps, which represent the environment using a set of salient features such as edges and corners, and semantic maps, which combine geometric information with high-level semantic information such as human-identifiable objects in the environment (e.g. books, tables) and their relationships to each other. Metric and feature maps are relatively less intuitive to humans. Semantic maps provide a more data-rich representation that humans can interpret. Such maps are helpful when humans and robots need to interact. For example, a semantic map is better suited to a self-driving car, whereas a metric map might be more efficient for an underground mining application.

5 The Simultaneous Localisation and Mapping (SLAM) Problem

When the robot is provided with an a priori map, it is possible to localise the robot in the environment, like our example at the beginning of the chapter of visiting a new city with a map. Conversely, if the robot's pose is known, it is possible to construct a map of the environment, as when surveyors create new maps using GPS location information. But what happens if the robot has neither a map nor knowledge of its location? This is the dreaded chicken-and-egg problem in robotics: how do you construct a map while simultaneously using that same map to localise? The problem is commonly known as the Simultaneous Localisation and Mapping (SLAM) problem. In the late '90s, it was shown that it is indeed possible for a robot starting from an unknown location in an unknown environment to incrementally build a map of the environment while simultaneously computing its pose using the map being built.

5.1 An Estimation Theoretic Approach to the Localisation, Mapping and SLAM Problems

However, as we have seen in previous chapters, the sensors used on a robot can be noisy, and the environment can be unpredictable, resulting in inherent uncertainties in the measurements made about the environment. Any canonical formulation of the problem therefore needs to take these uncertainties into account.

In order to accommodate the underlying uncertainties of the system, it is possible to explore a class of algorithms that explicitly model system uncertainty using theories of probability. Thus, the localisation, mapping or the SLAM problem could be formulated as a multivariate state estimation problem with noisy measurements:

$$ {\hat{\mathbf{x}}}_{{{\text{MAP}}}} \triangleq \mathop {{\text{argmax}}}\limits_{{\mathbf{x}}} \,p\left( {{\mathbf{x}}{|}{\mathbf{z}}} \right) $$

where \({\mathbf{x}}\) represents the state variable and \({\mathbf{z}}\) is the measurement vector. We assume the multivariate state to be normally distributed.

If we can further assume a prior distribution \(p \) over \({\mathbf{x}}\) exists, then the above problem could be restated using Bayes’ theorem (see Chap. 6):

$$ {\hat{\mathbf{x}}}_{{{\text{MAP}}}} \triangleq \mathop {{\text{argmax}}}\limits_{{\mathbf{x}}} \,p\left( {{\mathbf{z}}{|}{\mathbf{x}}} \right)p\left( {\mathbf{x}} \right) $$

Here, \(p\left( {{\mathbf{z}}{|}{\mathbf{x}}} \right)\) is the measurement likelihood and \(p\left( {\mathbf{x}} \right)\) is the prior. This sets up the problem in a way that allows it to be solved recursively, as was the case with the simple triangulation problem discussed earlier. We can now estimate the states of the robot and the map repeatedly from a given starting point. To do so, we should first define the relevant state vectors of the robot and the environment.

The robot’s state for a mobile robot operating in 2D could be expressed as

$$ {\mathbf{x}}_{r} = \left[ {\begin{array}{*{20}c} {x_{r} } \\ {y_{r} } \\ {\phi_{r} } \\ \end{array} } \right] $$

where \(x_{r}\) and \(y_{r}\) denote the current location of the origin of the robot's coordinate frame with respect to the local coordinate frame and \(\phi_{r}\) is the robot's heading, measured as the angle between the x-axis of the local coordinate frame and the x-axis of the robot's coordinate frame.

Assuming that our map is to be constructed using a metric feature map, the location of the \(i{\text{th}}\) map feature could be defined as the vector,

$$ {\mathbf{x}}_{fi} = \left[ {\begin{array}{*{20}c} {x_{i} } \\ {y_{i} } \\ {z_{i} } \\ \end{array} } \right] $$

where \(\left( {x_{i} ,y_{i} ,z_{i} } \right)\) are the 3D coordinates of the \(i{\text{th}}\) feature with respect to the local coordinate frame. If we assume that the entire map of the environment will constitute an \(n\) number of such features, then the map vector could be defined as,

$$ {\mathbf{x}}_{m} = \left[ {\begin{array}{*{20}l} {{\boldsymbol{x}}_{f1} } \hfill \\ \vdots \hfill \\ {{\boldsymbol{x}}_{fn} } \hfill \\ \end{array} } \right] $$

Then, depending on the specific problem, we can assemble the state vector \({\mathbf{x}}\) to be estimated for a given time \(t = k,\) as follows,

  • For the localisation problem,

    $$ {\mathbf{x}}\left( k \right) = \left[ {{\mathbf{x}}_{r} \left( k \right)} \right] $$
  • For the mapping problem,

    $$ {\mathbf{x}}\left( k \right) = \left[ {{\mathbf{x}}_{m} \left( k \right)} \right] $$
  • For the Simultaneous Localisation and Mapping problem,

    $$ {\mathbf{x}}\left( k \right) = \left[ {\begin{array}{*{20}c} {{\mathbf{x}}_{r} \left( k \right)} \\ {{\mathbf{x}}_{m} \left( k \right)} \\ \end{array} } \right] $$

Once the state vector is defined, the estimation process occurs in three steps at any given time step \(t = k\) (see Fig. 9.8):

  1.

    The Prediction Stage

In the prediction stage, we will use a model of the robot’s motion (control input) to predict how the states of the robot would evolve over the considered time step.

  2.

    The Observation Stage

In this stage, the robot uses its on-board sensors to make measurements to the map features in the environment. For example, a depth camera and computer vision algorithm could be used to detect salient features in the environment and measure their location with respect to the robot’s coordinate frame. It is important to note that the algorithms used should be able to identify the features repeatedly and match them correctly over time (i.e. newly observed features should be correctly matched to features already on the map). This ability to match observations to corresponding map features is called data association (Fig. 9.7). Of course, if an observed feature is not already on the map, such a feature will need to be initialised first (i.e. added to the map vector).

Fig. 9.7
figure 7

A vision algorithm is used to detect salient features in the environment. The top and bottom images show two views of the same environment captured at two different time intervals of the robot’s journey. The lines indicate matched features between the two images (data association). Can you spot any instances of failed data association?

Fig. 9.8
figure 8

Unknown states could be recursively estimated using sensor information within a Bayesian framework

These measurements are inherently noisy due to their physical nature and the limitations of the algorithms. The estimates of the states based on these sensor inputs are, therefore, limited by the accuracy of the sensors used for observations. However, during the final stage of the estimation process, such noise will be filtered out.

  3.

    The Update Stage

During the update stage, the prediction information and observations are combined to produce an improved estimate of the states. The process could be represented as:

$$ \hat{\textbf{\textit{x}}} = \left( {1 - w} \right){\mathbf{x}}_{{{\text{predicted}}}} + w {\mathbf{x}}_{{{\text{observed}}}} $$

where \(w\) is a weighting factor used to determine the relative importance of the observation and the prediction. If the sensors used on the robot provide highly accurate observation measurements, this parameter should be set closer to 1. If the sensors are noisier and you have to rely on the predictions, this value needs to be set closer to 0. In order to select the best value for the weighting factor, it is important to understand the nature of your observation measurements (called the sensor model) and the robot's motion model (also called the vehicle model or the control model). If this weighting is not appropriately 'tuned' to the system, the estimates could be either conservative, in which case the state estimates will carry unnecessarily large uncertainties, or optimistic, being overly confident about the estimate where such confidence is not warranted. Either scenario leads to state estimates that are not useful or, in the worst case, outright dangerous. An example of the latter can be observed when using GPS for localisation in densely built environments, such as when passing through a narrow pathway amidst tall buildings. Here, you would notice that your GPS location estimate suddenly starts to jump around and sometimes appears to be inside the buildings. However, it might still indicate high confidence in the estimate (usually denoted by a circle or an ellipse around the estimated location: the smaller the circle, the higher the confidence). This occurs because of the multi-path problem with GPS signals: the signals bounce off the tall structures before reaching the GPS receiver, leading to large errors in the time-of-flight calculations. Since the built-in algorithms are unaware of what is happening outside, they continue to trust the observations, leading to these erroneous state estimates. For a self-driving car or similar robot, this could be catastrophic, to say the least!
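
A tiny numerical sketch (with invented numbers) of the update equation above makes the role of \(w\) concrete:

```python
x_predicted = 4.8   # where the motion model says the robot is (metres)
x_observed = 5.3    # where the sensor says the robot is (metres)

for w in (0.1, 0.5, 0.9):   # low, equal and high trust in the sensor
    x_estimate = (1 - w) * x_predicted + w * x_observed
    print(f"w = {w:.1f} -> estimate = {x_estimate:.2f} m")
```

The Kalman filter, introduced next, effectively computes this weighting from the motion and sensor noise models rather than leaving it as a hand-tuned constant.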

Various algorithms have been proposed to solve the multivariate state estimation problem. One of the most popular algorithms is the Kalman filter.

6 The Kalman Filter

The Kalman filter is a recursive linear statistical method for estimating the states of interest. The basic Kalman filter deals with linear systems; nonlinear systems are treated via a linear approximation using the extended Kalman filter (EKF). The Kalman filter has applications across many disciplines. In robotic navigation and data fusion, for example, it is one of the methods most frequently discussed in the literature, with various adaptations and modifications.

6.1 Linear Discrete-Time Kalman Filter

For a linear system subject to Gaussian, uncorrelated, zero-mean measurement and process noises, the Kalman filter is the optimal minimum mean squared error estimator. To derive the filter for such a system, the model of the system and the model of the observation must be defined. The problem can then be stated as a recursive linear estimator with unknown gains, which are determined using the minimum mean squared error (MMSE) criterion.

The Kalman filter consists of the same three recursive stages discussed in the previous section.

  1.

    Prediction stage

  2.

    Observation stage

  3.

    Update stage.

For a linear, discrete-time system, the state transition equation can be written as follows:

$$ {\mathbf{x}}\left( k \right) = {\mathbf{F}}\left( k \right){\mathbf{x}}\left( {k - 1} \right) + {\mathbf{B}}\left( k \right){\mathbf{u}}\left( k \right) + {\mathbf{G}}\left( k \right){\mathbf{v}}\left( k \right) $$

where

  • \({\mathbf{x}}\left( k \right)\) state at time \(k\)

  • \({\mathbf{u}}\left( k \right)\) control input vector at time \(k\)

  • \({\mathbf{v}}\left( k \right)\) additive motion noise

  • \({\mathbf{B}}\left( k \right)\) control input transition matrix

  • \({\mathbf{G}}\left( k \right)\) noise transition matrix

  • \({\mathbf{F}}\left( k \right)\) state transition matrix

And the linear measurement equation can be written as follows:

$$ {\mathbf{z}}\left( k \right) = {\mathbf{H}}\left( k \right){\mathbf{x}}\left( k \right) + {\mathbf{w}}\left( k \right) $$

where

  • \({\mathbf{z}}\left( k \right)\) observation made at time \(k\)

  • \({\mathbf{x}}\left( k \right)\) state at time \(k\)

  • \({\mathbf{H}}\left( k \right)\) measurement model

  • \({\mathbf{w}}\left( k \right)\) additive observation noise

As mentioned earlier, system and measurement noise is assumed to be zero-mean and independent. Thus,

$$ E\left[ {{\mathbf{v}}\left( k \right)} \right] = E\left[ {{\mathbf{w}}\left( k \right)} \right] = {\mathbf{0}}, \,\forall {\text{k}} $$

and,

$$ {\text{E}}\left[ {{\mathbf{v}}\left( i \right){\boldsymbol{w}}^{T} \left( j \right)} \right] = 0, \,\forall i,j $$

Motion noise and the measurement noise will have the following corresponding covariance:

$$ {\text{E}}\left[ {{\mathbf{v}}\left( i \right){\mathbf{v}}^{T} \left( j \right)} \right] = \updelta _{ij} {\mathbf{Q}}\left( i \right),\quad {\text{E}}\left[ {{\mathbf{w}}\left( i \right){\mathbf{w}}^{T} \left( j \right)} \right] = \updelta _{ij} {\mathbf{R}}\left( i \right) $$

The estimate of the state at a time \(k\) given all information up to time \(k\) is written as \(\hat{\textbf{\textit{x}}}(k|k)\) and the estimate of the state at a time \(k\) given information up to time \(k - 1\) is written as \(\hat{\textbf{\textit{x}}}(k|k - 1)\) and is called the prediction. Thus, given the estimate at \(\left( {k - 1} \right){\text{th}}\) time step, the prediction equation for the state at \(k{\text{th}}\) time step can be written,

$$ \hat{\textbf{\textit{x}}}(k|k - 1) = {\mathbf{F}}\left( k \right)\hat{\textbf{\textit{x}}}(k - 1|k - 1) + {\mathbf{B}}\left( k \right){\mathbf{u}}\left( k \right) $$

And the corresponding covariance prediction:

$$ {\mathbf{P}}(k| k - 1) = {\mathbf{F}}\left( k \right) {\mathbf{P}}(k - 1|k - 1){\mathbf{F}}^{T} \left( k \right) + {\mathbf{G}}\left( k \right){\mathbf{Q}}\left( k \right){\mathbf{G}}^{T} \left( k \right) $$

Then the unbiased linear estimate is:

$$ {\hat{\mathbf{x}}}(k|k) = {\hat{\mathbf{x}}}(k|k - 1) + {\mathbf{W}}\left( k \right)[{\mathbf{z}}\left( k \right){-} {\mathbf{H}}\left( k \right) {\hat{\mathbf{x}}}(k|k - 1)] $$

Note that the conditional expected error between the estimate and the true state is zero.

\({\mathbf{W}}\left( k \right)\) is called the Kalman gain at time step k. This is calculated as:

$$ {\mathbf{W}}\left( k \right) = {\mathbf{P}}(k|k - 1){\mathbf{H}}^{T} \left( k \right){\mathbf{S}}^{ - 1} \left( k \right) $$

where \( {\mathbf{S}}\left( k \right) \) is called the innovation variance at time step k and given by:

$$ {\mathbf{S}}\left( k \right) = {\mathbf{H}}\left( k \right){\mathbf{P}}(k|k - 1) {\mathbf{H}}^{T} \left( k \right) + {\mathbf{R}}\left( k \right) $$

and the covariance estimate is:

$$ {\mathbf{P}}(k|k) = \left( {{\mathbf{I}} {-}{\mathbf{W}}\left( k \right){\mathbf{H}}\left( k \right)} \right){\mathbf{P}}(k|k - 1)\left( {{\mathbf{I}} - {\mathbf{W}}\left( k \right){\mathbf{H}}\left( k \right)} \right)^{T} + {\mathbf{W}}\left( k \right){\mathbf{R}}\left( k \right){\mathbf{W}}^{T} \left( k \right) $$

Essentially, the Kalman filter takes a weighted average of the prediction \({\hat{\mathbf{x}}}(k|k - 1)\), based on the previous estimate \({\hat{\mathbf{x}}}(k - 1| k - 1)\), and a new observation \({\mathbf{z}}\left( k \right)\) to estimate the state of interest \({\hat{\mathbf{x}}}(k|k)\).
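
The equations above map almost line by line onto code. The following NumPy sketch assumes the system matrices F, B, G, H, Q and R have already been chosen for the application at hand; the function names are illustrative.

```python
import numpy as np

def kf_predict(x_est, P_est, F, B, u, G, Q):
    """Prediction stage: propagate the state estimate and its covariance."""
    x_pred = F @ x_est + B @ u
    P_pred = F @ P_est @ F.T + G @ Q @ G.T
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update stage: fuse the prediction with the new observation z."""
    S = H @ P_pred @ H.T + R                      # innovation covariance
    W = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_est = x_pred + W @ (z - H @ x_pred)         # state update
    I = np.eye(P_pred.shape[0])
    # Joseph form of the covariance update, as in the equation above.
    P_est = (I - W @ H) @ P_pred @ (I - W @ H).T + W @ R @ W.T
    return x_est, P_est
```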

Case study: We can illustrate this process with a simple 1-D toy example (Fig. 9.9). Let us assume that the robot can only move in one direction \(\left( x \right)\). At time \(t = k - 1\) the robot is at the location \(x\left( {k - 1} \right)\). Though the robot is not privy to this exact value, it has an estimate of its location, given by \({\hat{\mathbf{x}}}(k - 1| k - 1)\). As this is an estimate of the true value, it has uncertainty about its location, represented by the curve above it—the spread indicating the extent of the uncertainty. Now, the robot moves forward to the location \({\mathbf{x}}\left( k \right)\) at time \(t = k\) using a control input \({\mathbf{u}}\left( k \right)\). Using these pieces of information, the robot can now predict its location at the new time step as \({\hat{\mathbf{x}}}(k|k - 1)\). However, this leads to increased error in the estimated location as represented by the new curve. The robot observes a landmark (represented by the star) at this stage using its onboard sensor. Let us assume that the exact location of the star, \(x_{{{\text{star}}}}\) is known to the robot (i.e. a map is available). The robot measures the distance \({\mathbf{z}}\left( k \right)\) between the star and its current location using an internal sensor. As we now have an additional piece of information about the robot’s location at time \(t = k\), it is possible to integrate the new information to come up with a better estimate of the current location. If we trust the sensor that made the distance measure, we could weigh that information more. The previous derivation of the Kalman filter helps us make this decision and integrate the new sensor information weighted by the ‘trust’ we place on the sensor. Finally, an improved location estimate could be worked out as \({\hat{\mathbf{x}}}(k|k)\) resulting in a reduction in the uncertainty of the robot location. This new estimate then could be used to predict the robot’s location at \(t = k + 1\), and the process could be repeated.
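
The 1-D scenario of Fig. 9.9 can also be traced numerically with the scalar form of the same equations; all the numbers below are invented for illustration.

```python
# 1-D example: the state is the robot's position x (metres).
x_est, P_est = 2.0, 0.25      # estimate at t = k-1 and its variance
u, Q = 1.0, 0.10              # commanded forward motion and motion noise variance
x_star = 5.0                  # known landmark (the "star") position
z, R = 1.9, 0.04              # measured distance to the landmark and its variance

# Prediction: move forward by u; the uncertainty grows.
x_pred = x_est + u
P_pred = P_est + Q

# Observation model: z = x_star - x, so H = -1.
H = -1.0
S = H * P_pred * H + R        # innovation variance
W = P_pred * H / S            # Kalman gain
x_est = x_pred + W * (z - (x_star - x_pred))
P_est = (1 - W * H) * P_pred  # simplified covariance update (optimal-gain case)

print(f"predicted: {x_pred:.2f} m (var {P_pred:.2f})")
print(f"updated:   {x_est:.2f} m (var {P_est:.3f})")
```

Running this shows the variance growing from 0.25 to 0.35 during prediction and dropping to about 0.036 after fusing the landmark observation, mirroring the widening and narrowing of the curves in Fig. 9.9.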

Fig. 9.9
figure 9

A 1-D robot localisation example

6.2 The Extended Kalman Filter (EKF)

Although the Kalman filter is the optimal minimum mean squared error estimator for a linear system, any real robot is nonlinear. A solution is found in the extended Kalman filter (EKF), which uses a linearised approximation to the nonlinear models. Linear approximations of nonlinear functions must be treated with care, but when applied properly, the EKF generates very good results in many applications. The extended Kalman filter algorithm is very similar to the linear Kalman filter algorithm, with the substitutions:

$$ {\mathbf{F}}\left( k \right) \to \nabla {\mathbf{f}}_{{\mathbf{X}}} \left( k \right)\, {\text{and}}\, {\boldsymbol{H}}\left( k \right) \to \nabla {\mathbf{h}}_{{\mathbf{X}}} \left( k \right) $$

where \(\nabla {\mathbf{f}}_{{\mathbf{X}}} \left( k \right) \) and \(\nabla {\mathbf{h}}_{{\mathbf{X}}} \left( k \right)\) are the Jacobians, i.e. the matrices of partial derivatives of the (nonlinear) state transition and measurement functions, respectively, evaluated at the current state estimate and time step (see Chap. 6). This requires these functions to be differentiable.

Then, the main equations can be summarised as follows:

  1.

    Prediction equations:

    $$ {\hat{\mathbf{x}}}(k|k - 1) = {\mathbf{f}}\left( {{\hat{\mathbf{x}}}(k - 1|k - 1),{\mathbf{u}}\left( k \right)} \right) $$
    $$ {\mathbf{P}} (k| k - 1) = \nabla {\mathbf{f}}_{{\mathbf{X}}} \left( k \right) {\mathbf{P}}(k - 1|k - 1) \nabla^{T} {\mathbf{f}}_{{\mathbf{X}}} \left( k \right) + {\mathbf{Q}}\left( k \right) $$
  2.

    Update equations:

    $$ {\hat{\mathbf{x}}}(k|k) = {\hat{\mathbf{x}}}(k|k - 1) + {\mathbf{W}}\left( k \right)[{\mathbf{z}}\left( k \right) {-} {\mathbf{h}}({\hat{\mathbf{x}}}(k|k - 1))] $$
    $$ {\mathbf{P}}(k|k) = {\mathbf{P}}(k|k - 1) - {\mathbf{W}}\left( k \right){\mathbf{S}}\left( k \right){\mathbf{W}}^{T} \left( k \right) $$

    where

    $$ {\mathbf{S}}\left( k \right) = \nabla {\mathbf{h}}_{{\mathbf{X}}} \left( k \right){\mathbf{P}}(k|k - 1) \nabla^{T} {\mathbf{h}}_{{\mathbf{X}}} \left( k \right) + {\mathbf{R}}\left( k \right) $$

6.3 Data Association

One of the issues that arise during data fusion in a robotic navigation scenario is associating the sensor observations with the objects or features that generated them. As mentioned earlier, this problem is commonly referred to as the data association problem. There are several methods available for disambiguating the observations. The most obvious way of doing this is to make the observations self-identifying. An example of this was presented earlier using computer vision, where a feature matching algorithm performs the data association.

Statistical methods also exist to determine how likely it is that a given observation corresponds to the object thought to be observed. One such method, based on the Mahalanobis distance, is derived below.

The difference between the observed and the predicted observation for a set of sensor data is called the innovation \(\left( {{\boldsymbol{\upnu}}} \right)\) and can be represented, using the notation introduced earlier in the Kalman filter section, as follows:

$$ {{\boldsymbol{\upnu}}}\left( k \right) = {\mathbf{z}}\left( k \right) - {\hat{\mathbf{z}}}(k|k - 1) $$

where \({\hat{\mathbf{z}}}(k|k - 1)\) is the predicted observation for time step \(k\) given the observation information up to time step \(k - 1\). It can be proven that the innovation is white with a mean of zero and variance \({\mathbf{S}}\left( k \right)\) given below:

$$ {\mathbf{S}}\left( k \right) = {\mathbf{R}}\left( k \right) + {\mathbf{H}}\left( k \right){\mathbf{P}}(k|k - 1){\mathbf{H}}^{T} \left( k \right) $$

The above information can then be used in the problem of data association. The normalised innovation squared \({\mathbf{q}}\left( k \right)\) is defined as:

$$ {\mathbf{q}}\left( k \right) = {\boldsymbol{\nu}}^{{\boldsymbol{T}}} \left( k \right){\mathbf{S}}^{ - 1} { }{\boldsymbol{\nu}}\left( k \right) $$

If the associated filter is assumed to be consistent, \({\mathbf{q}}\left( k \right)\) can be shown to be a \(\chi^{2}\)-distributed random variable with \(m\) degrees of freedom, where \(m = {\text{dim}}\left( {{\mathbf{z}}\left( k \right)} \right)\) is the dimension of the observation vector.

A confidence threshold can be chosen from the \(\chi^{2}\) tables and compared against the value of \({\mathbf{q}}\) for each observation in the observation sequence. If the value of \({\mathbf{q}}\) for a given observation is less than the threshold, then that observation is likely to be associated with the correct object of observation. If multiple observations satisfy the above condition for the same feature, it is safer to ignore such observations, as improper data association could lead to unstable filter performance.
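
Below is a small sketch of this gating test for a 2-D range-bearing observation. SciPy is used only to look up the \(\chi^{2}\) threshold, and the numerical values are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_gate(z, z_pred, S, confidence=0.95):
    """Accept observation z as matching the predicted feature if its
    normalised innovation squared falls below the chi-squared threshold."""
    nu = z - z_pred                               # innovation (wrap bearings in practice)
    q = float(nu.T @ np.linalg.inv(S) @ nu)       # normalised innovation squared
    threshold = chi2.ppf(confidence, df=len(z))   # e.g. about 5.99 for 95% and 2 d.o.f.
    return q < threshold, q

# Illustrative 2-D (range, bearing) example.
z = np.array([4.1, 0.52])        # measured range (m) and bearing (rad)
z_pred = np.array([4.0, 0.50])   # predicted observation of the candidate feature
S = np.diag([0.05, 0.01])        # innovation covariance
accepted, q = mahalanobis_gate(z, z_pred, S)
print(accepted, q)
```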

7 A Case Study: Robot Localisation Using the Extended Kalman Filter

Let us now consider a real-world application of the extended Kalman filter in solving the localisation problem.

7.1 Assumptions

The motion model is a very important factor in deciding the success of the filter. Therefore, it needs proper consideration, along with the choice of sensors. The algorithm used in this case study uses the rigid body and rolling motion constraints to simplify the analysis. The rigid body constraint assumes that the robot's frame is rigid, and the rolling motion constraint assumes that all points on the vehicle rotate about the instantaneous centre of rotation with the same angular velocity. This is a reasonable model for a simple structure like the TurtleBot. To further simplify the analysis, it is assumed that there is no slip between the tyres and the ground and that the vehicle motion may be adequately represented as a two-dimensional model whose motion is restricted to a plane.

7.2 Derivation of the EKF-Based Localisation Algorithm

The derivation follows the equations in Sect. 6. A state prediction for the \(\left( {k + 1} \right)\)th time step can be derived from the information available up to time step \(k\):

$$ {\mathbf{X}}\left( {k + {1}} \right) = f\left( {{\mathbf{X}}\left( k \right),{\mathbf{u}}\left( k \right)} \right) $$

Note the abbreviated representation used, where \(X\left( {k + 1} \right)\) represents the prediction of the state at the \(\left( {k + 1} \right)\)th time step given the information up to time step \(k\), \(X\left( k \right)\) represents the best estimate available at time step \(k\), and \(u\left( k \right)\) represents the control input to the robot's drive at time step \(k\):

$$ u{(}k{)} = \left[ \begin{gathered} V(k) \hfill \\ \omega (k) \hfill \\ \end{gathered} \right] $$

where \(V\left( k \right)\) is the robot’s forward velocity at time step \(\left( k \right)\) and \(\omega \left( k \right)\) is the turning rate (angular velocity) at time step \(\left( k \right)\). Thus, the complete process model can be described as below:

$$ \left[ \begin{gathered} x(k + 1) \hfill \\ y(k + 1) \hfill \\ \varphi (k + 1) \hfill \\ \end{gathered} \right] = \left[ {\begin{array}{*{20}l} {x(k) + \Delta TV(k)\cos (\varphi (k))} \hfill \\ {y(k) + \Delta TV(k)\sin (\varphi (k))} \hfill \\ {\varphi (k) + \Delta T\omega (k)} \hfill \\ \end{array} } \right] $$

Assume that the robot makes an observation at time step \(\left( {k + 1} \right)\) with its onboard sensor of a particular feature in the environment. The sensor is mounted on the robot with an offset of \(a\) units along the centreline, and the observation yields range \(\left( {r_{{\text{i}}} } \right)\) and bearing \((\theta_{{\text{i}}} )\) information for the observed feature, as depicted in Fig. 9.10.

Fig. 9.10
figure 10

Sensor mounted on the robot observes the ith feature

There are \(n\) features scattered in the environment for which absolute position coordinates are known a priori (i.e. the map is given). A general observation of the ith feature \(B_{{\text{i}}} \left( {X_{{\text{i}}} ,Y_{{\text{i}}} } \right)\) can be represented as follows:

$$ r_{{\text{i}}} \left( {k + {1}} \right) = \sqrt {(X_{i} - x_{r} \left( {k + 1} \right))^{2} + (Y_{i} - y_{r} \left( {k + 1} \right))^{2} } $$
$$ \theta_{{\text{i}}} \left( {k + 1} \right) = \tan^{ - 1} \left( {\frac{{Y_{i} - y_{r} \left( {k + 1} \right)}}{{X_{i} - x_{r} \left( {k + 1} \right)}}} \right) - \varphi (k + 1) $$

where

$$ x_{{\text{r}}} \left( {k + 1} \right) = x\left( {k + 1} \right) + a\cos (\varphi \left( {k + 1} \right)) $$
$$ y_{{\text{r}}} \left( {k + 1} \right) = y\left( {k + 1} \right) + a\sin (\varphi \left( {k + 1} \right)) $$

The best estimate for this observation, \({\mathbf{Z}}\left( {k + 1|k} \right)\), derived from the information available up to time step \(k\), can be represented in the following form:

$$ {\mathbf{Z}}\left( {k + 1|k} \right) = \left[ \begin{gathered} \hat{Z}_{r} (k + 1|k) \hfill \\ \hat{Z}_{\theta } (k + 1|k) \hfill \\ \end{gathered} \right] = \left[ {\begin{array}{*{20}l} {\sqrt {(X_{i} - \hat{x}(k + 1|k))^{2} + (Y_{i} - \hat{y}(k + 1|k))^{2} } } \hfill \\ {\tan^{ - 1} \left( {\frac{{Y_{i} - \hat{y}(k + 1|k)}}{{X_{i} - \hat{x}(k + 1|k)}}} \right) - \hat{\varphi }(k + 1|k)} \hfill \\ \end{array} } \right] $$

The prediction of covariance can be obtained as

$$ P\left( {k + {1}|k} \right) = \nabla f_{x} (k)P(k|k)\nabla f_{x}^{T} (k) + \nabla f_{w} (k)\Sigma (k)\nabla f_{w}^{T} (k) $$

where \(\nabla f_{x}\) is the Jacobian of \(f\left( . \right)\) with respect to the states, evaluated at time \(k\), \(\nabla f_{w}\) is the Jacobian of \(f\left( . \right)\) with respect to the noise sources, and \(\Sigma (k)\) is the noise strength given by

$$ \Sigma (k) = \left[ {\begin{array}{*{20}c} {\sigma_{V}^{2} } & 0 \\ 0 & {\sigma_{\omega }^{2} } \\ \end{array} } \right] $$
$$ \nabla f_{x} (k) = \left[ {\begin{array}{*{20}c} 1 & 0 & { - \Delta TV(k)\sin (\varphi (k|k))} \\ 0 & 1 & {\Delta TV(k)\cos (\varphi (k|k))} \\ 0 & 0 & 1 \\ \end{array} } \right] $$
$$ \nabla f_{w} (k) = \left[ {\begin{array}{*{20}c} {\Delta T\cos (\varphi (k|k))} & 0 \\ {\Delta T\sin (\varphi (k|k))} & 0 \\ 0 & {\Delta T} \\ \end{array} } \right] $$

The innovation (observation prediction error) covariance \(S\left( {k + 1} \right)\), which is used in the calculation of the Kalman gain, is obtained by taking the expectation of the squared observation prediction error given the measurements up to the \(k{\text{th}}\) time step, and can be written as follows

$$ S\left( {k + {1}} \right) = \nabla h_{x} (k + 1)P(k + 1|k)\nabla h_{x}^{T} (k + 1) + R(k + 1) $$

where \(R\left( {k + {1}} \right)\) is the observation variance (which is diagonal in most robotics applications due to the independent nature of the measurements)

$$ R(k) = \left[ {\begin{array}{*{20}c} {\sigma_{r}^{2} } & 0 \\ 0 & {\sigma_{\theta }^{2} } \\ \end{array} } \right] $$
$$ \nabla h_{x} (k + 1) = \left[ {\begin{array}{*{20}c} {\frac{{\hat{x}(k + 1|k) - X_{i} }}{d}} & {\frac{{\hat{y}(k + 1|k) - Y_{i} }}{d}} & 0 \\ {\frac{{ - (\hat{y}(k + 1|k) - Y_{i} )}}{{d^{2} }}} & {\frac{{\hat{x}(k + 1|k) - X_{i} }}{{d^{2} }}} & { - 1} \\ \end{array} } \right] $$

where

$$ d = \sqrt {(X_{i} - \hat{x}_{r} (k + 1))^{2} + (Y_{i} - \hat{y}_{r} (k + 1))^{2} } $$

Finally, the state update equations for the EKF are given by (adapting general equations in the previous section)

$$ \hat{x}\left( {k + 1|k + 1} \right) = \hat{x}\left( {k + 1|k} \right) + W\left( {k + 1} \right)[z\left( {k + 1} \right){-}h(\hat{x}\left( {k + 1|k} \right))] $$

where

$$ W\left( {k + {1}} \right) = P\left( {k + {1}|k} \right)\nabla h_{x}^{T} \left( {k + {1}} \right)S\left( {k + {1}} \right)^{{ - {1}}} $$

is the Kalman gain.

The algorithm is now complete and, as the robot proceeds from time \(t = 0\) observing the environment, it can be applied recursively to determine the current location. A set of map features must first be initialised. In the example in Fig. 9.11, we used a set of retroreflective beacons spread out in the environment. The locations of these beacons were surveyed and recorded for initialisation. A SICK laser range finder was used to detect and measure the range and bearing to these beacons.
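
Putting the pieces of this section together, the sketch below runs one predict-observe-update cycle for a single range-bearing observation of a known beacon. The parameter values, sensor offset \(a\) and function names are illustrative, and the observation Jacobian is evaluated as in the equations above, neglecting the small dependence of the sensor offset on the heading.

```python
import numpy as np

def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi

def ekf_localisation_step(x_est, P_est, u, z, beacon, dT, a, Sigma_u, R):
    """One predict-observe-update cycle.

    x_est = [x, y, phi], u = (V, omega), z = [range, bearing] to a beacon
    whose absolute position is known a priori (the map is given).
    """
    V, omega = u
    x, y, phi = x_est

    # Prediction stage (process model of Sect. 7.2).
    x_pred = np.array([x + dT * V * np.cos(phi),
                       y + dT * V * np.sin(phi),
                       wrap(phi + dT * omega)])
    Fx = np.array([[1.0, 0.0, -dT * V * np.sin(phi)],
                   [0.0, 1.0,  dT * V * np.cos(phi)],
                   [0.0, 0.0,  1.0]])
    Fw = np.array([[dT * np.cos(phi), 0.0],
                   [dT * np.sin(phi), 0.0],
                   [0.0,              dT]])
    P_pred = Fx @ P_est @ Fx.T + Fw @ Sigma_u @ Fw.T

    # Predicted range-bearing observation from the offset sensor position.
    xs = x_pred[0] + a * np.cos(x_pred[2])
    ys = x_pred[1] + a * np.sin(x_pred[2])
    dx, dy = beacon[0] - xs, beacon[1] - ys
    d = np.hypot(dx, dy)
    z_pred = np.array([d, wrap(np.arctan2(dy, dx) - x_pred[2])])

    # Observation Jacobian (grad h_x); the dependence of the sensor offset
    # on phi is neglected here, as in the equations above.
    Hx = np.array([[-dx / d,    -dy / d,     0.0],
                   [ dy / d**2, -dx / d**2, -1.0]])

    # Update stage.
    S = Hx @ P_pred @ Hx.T + R                # innovation covariance
    W = P_pred @ Hx.T @ np.linalg.inv(S)      # Kalman gain
    nu = z - z_pred                           # innovation
    nu[1] = wrap(nu[1])                       # keep the bearing innovation wrapped
    x_new = x_pred + W @ nu
    x_new[2] = wrap(x_new[2])
    P_new = P_pred - W @ S @ W.T
    return x_new, P_new

# Illustrative use with made-up values.
x_est = np.array([0.0, 0.0, 0.0])
P_est = np.diag([0.1, 0.1, 0.05])
Sigma_u = np.diag([0.05**2, 0.02**2])         # velocity / turn-rate noise strengths
R = np.diag([0.1**2, np.radians(1.0)**2])     # range / bearing measurement noise
beacon = np.array([5.0, 2.0])                 # surveyed beacon position
z = np.array([4.6, 0.42])                     # one range-bearing measurement
x_est, P_est = ekf_localisation_step(x_est, P_est, u=(1.0, 0.1), z=z,
                                     beacon=beacon, dT=0.1, a=0.2,
                                     Sigma_u=Sigma_u, R=R)
print(x_est)
```

In a full implementation, this step would be repeated at every time step, looping over all beacons detected in the current laser scan and gating each candidate association as described in Sect. 6.3.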

Fig. 9.11
figure 11

An implementation of the EKF-based localisation algorithm. The solid blue line indicates the EKF estimate of the robot path. The red dotted line is odometry. The * denotes the locations of the surveyed beacons

8 Summary

This chapter briefly looked at a set of fundamental problems in robot navigation. The localisation problem answers the question 'where am I?', and the mapping problem asks how a map of the robot's environment can be generated when the robot's location is known. The Simultaneous Localisation and Mapping (SLAM) problem involves solving both the localisation and the mapping problems concurrently. We discussed an estimation theoretic approach to solving these problems using probabilistic techniques. The extended Kalman filter was presented as an implementation of this approach using a linear approximation to the nonlinear system models. In combination, these algorithms and techniques should ensure your robot never gets lost in space (or on Earth!).

9 Review Questions

  • Assume a robot equipped with a sensor that can detect the state of a door. For simplicity, let us assume that the door can be in only one of two possible states, open or closed. Let us now assume the robot's sensor is noisy, with the noise characterised by the following conditional probabilities (Z: observation, X: door state):

    $$ p\left( {{\mathbf{Z}} = {\mathbf{sense}}\_{\mathbf{open}}|{\mathbf{X}} = {\mathbf{is}}\_{\mathbf{open}}} \right) = 0.{63} $$
    $$ p\left( {{\mathbf{Z}} = {\mathbf{sense}}\_{\mathbf{closed}}|{\mathbf{X}} = {\mathbf{is}}\_{\mathbf{closed}}} \right) = 0.{95} $$

What is the value of the conditional probability \(p\left( {{\mathbf{Z}} = {\mathbf{sense}}\_{\mathbf{closed}}|{\mathbf{X}} = {\mathbf{is}}\_{\mathbf{open}}} \right)\)?

  • What is meant by the robot localisation problem?

  • Why is data association important for successful localisation?

10 Further Reading

This chapter has only scratched the surface of the localisation and mapping problem. Considerable research has been carried out since the '90s, with many successful implementations now running on various production platforms in large-scale environments. The book by Thrun et al. (2005) provides an excellent deep dive into the subject. The essential tutorial on SLAM by Bailey and Durrant-Whyte (Bailey & Durrant-Whyte, 2006; Durrant-Whyte & Bailey, 2006) is a great quick reference for the SLAM problem. The seminal work by Dissanayake et al. (2001) provides proof of the existence of a solution to the SLAM problem. If you are interested in understanding the underlying probabilistic estimation techniques and theories, Bar-Shalom et al. (2001) is highly recommended.