Sensor Fusion and State Estimation of the Robot

  • Francesco NoriEmail author
  • Silvio Traversaro
  • Maurice Fallon
Living reference work entry


In this chapter, we review previous estimation and sensor fusion approaches in the field of humanoid robotics. The focus is primarily but not exclusively on state estimation. Humanoids are modelled as free-floating mechanical systems subject to external forces and constrained by whole-body distributed rigid contacts. Besides state estimation, a number of related problems are considered including the estimation of joint torques, contact forces, contact positions, and the center of mass. Previous approaches are discussed with reference to a number of features relevant to humanoid applications: computational complexity, observability analysis, modeling accuracy, and floating-base parameterization. This chapter concludes with a section on future directions and open problems, revisited in the light of previously discussed approaches.

1 Introduction

Within a control system, the role of the state estimator is to combine, sometimes disparate, sensor measurements of the input and output of the system to create an overall estimate of the state (often including a measure of uncertainty) which is often then used as input to the controller. In humanoid robots, the system state estimation is a challenging problem for the reasons hereafter discussed.

Different from traditional industrial manipulators, humanoids are free-floating systems – as opposed to being rigidly fixed to the ground. The global pose of a humanoid therefore contains the overall position and orientation of the system, which are not directly sensed. Two complications therefore arise. Firstly, the parameterization of orientations cannot be simultaneously minimal and non-singular. To address this concern, previous approaches either considered a minimal representation (e.g., Euler angles) or a singularity-free parameterization (e.g., unit quaternions). Secondly, no direct sensor can estimate the free-floating component of the system pose (external motion capture systems aside). State estimators therefore need to rely on indirect measurements such as the kinematic constraints arising from footsteps making ground contact.

Humanoid motion controllers are often applied in real time, and it is often desirable to have computationally efficient implementations of their computations. For example, efficient solutions exist for solving the forward dynamics [13], i.e., for estimating the joint accelerations given the system state and the applied joint torques. Similarly, computationally efficient solutions exist for solving the inverse dynamics, i.e., for computing the joint torques given the system state and joint accelerations. Whole-body state estimation instead is a problem for which different dynamic models with different levels of complexity have been considered in previous literature, trading-off between accuracy and computational complexity. Planar models (e.g., linear inverted pendulum) have been progressively replaced by three-dimensional articulated models. Dedicated observability analyses have been conducted to cope with the complexity and nonlinearity of these models. Even though joint positions are directly measured, velocities and accelerations require filtering which causes lags and therefore instability in the controller. Gyroscopes and accelerometers can mitigate these drawbacks.

When in contact with the environment, humanoids are often represented as free-floating systems subject to physical constraints that represent the environmental contacts. Their state therefore is constrained to and evolves in a manifold whose topology is nontrivial, especially when the system is constrained at multiple contact locations. The state estimation problem should take these constraints into account to exclude those hypotheses which are not compatible with the physical bounds of the considered system, including joint limits and collisions. Contacts and constraints generate forces normal to the constraint surface [24]. In this sense, state estimation is crucial, but a full characterization of the humanoid whole-body dynamics requires also the estimation of internal and external forces. Within this context, internal forces lead to joint torques and motor friction identification; external force allows estimation of exchanged contact forces with humans (for safety) and with the environment (for postural control).

Finally, estimation often relies on several parameters which influence the estimation accuracy itself. Examples of these parameters, often referred to as hyper-parameters, are sensor bias and sensor positioning. Tuning of these parameters can be either handled online with some adaptive strategies or off-line with suitably defined batch procedures. Sensor bias is often assumed to be a slowly drifting corrupting signal, and it has been estimated online by assuming a random walk dynamic model. Other calibrations, such as the sensor positions, are typically estimated off-line with a batch procedure.

2 Structure

The present chapter surveys previous approaches to several estimation problems related to humanoid robotics. The chapter is divided into sections. Each section focuses on a specific estimation problem as described in Sect. 4, and previous approaches are categorized according to the estimated variables and estimation features. Table 1 summarizes the proposed classification. A first group of columns clusters previous approaches according to the type of estimated variables. State estimation is subdivided into joint-related (i.e., joint positions, velocities, and accelerations) and floating-base-related quantities (i.e., floating-base position, orientation, and accelerations). Force estimation is subdivided into internal (i.e., joint torques) and external (i.e., contact) force, the latter being subdivided into magnitude, direction, and position estimation. Estimated variables also include the center of mass position and sensor biases, which are often included among the state variables. A second group of columns clusters previous approaches according to the features of the estimation algorithm. As discussed in the introduction, several features are considered. The column labeled “real-time performances” refers to the computational cost of the estimation procedure. Two columns concern hyper-parameter calibration and in particular sensor geometric calibration and joint friction identification. The column “observability analysis” identifies the approaches which propose a mathematical analysis to guarantee that estimated variables can be inferred from available sensors. Other important features are related to the trade-off between accuracy and complexity of the underlying dynamical model that represent the humanoid body structure. Within this context, we distinguish between three-dimensional versus planar and articulated versus rigid-body models. The column “global parameterization” identifies those approaches which adopt a non-singular versus minimal parameterization for the free-floating state orientation. Finally, a dedicated column identifies previous works explicitly modeling physical constraints (e.g., joint limits and positivity constraints) in the estimation.
Table 1

The table presents a classification of previous works on state estimation for humanoid robots. The adopted classification criteria are subdivided into estimated variables and estimation features. Estimated variables are separated into joint and floating-base variables. Additional relevant variables have been included in the classification: joint torques, external forces, and center of mass position

The chapter is organized as hereafter described. Section 3 introduces the notation used in the chapter. Section 4 describes the estimation problem, and Sect. 5 discusses the current state of the art. Section 6 speculates on future directions and defines open problems. Finally Sect. 7 draws the conclusions.

3 Notation

The notation used in this chapter is a subpart of a wider document [42] to which the reader might refer for additional details.


Coordinate frames


An arbitrary point

o B

Origin of B


Orientation frame associated to A


Frame with origin o B and orientation [A]

A p

Coordinates of p w.r.t. to A

A o B

Coordinates of o B w.r.t. to A


Homogeneous transformation from B to A


Velocity transformation from B to A

C v A,B

Twist expressing the velocity of B wrt to A written in C

C vA, B

4 × 4 matrix representation of C v A,B

B f

Coordinates of the wrench f w.r.t. B


Wrench transformation from B to A


Jacobian relating the velocity of B with respect to A expressed in C


Jacobian relating the velocity of B with respect to A expressed in C,


where the floating-base velocity is expressed in F

3.1 Points and Coordinate Frames

A frame is defined as the combination of a point (called origin) and an orientation frame in 3D space. We typically employ a capital letter to indicate a frame. Given a frame A, we will indicate with o A its origin and with [A] its orientation frame. Formally, we write this as A = (o A , [A]).

Frames can be time varying. They can be used, for example, to describe the position and orientation in space of a rigid body as time evolves. They are also used to express a coordinate system for a wrench exchanged by two bodies or used to define a coordinate system so as to describe a robot task (like a frame attached to the center of mass and oriented as the inertial frame).

Newtonian mechanics requires the definition of an inertial frame. In this document, we usually indicate this frame simply with A (the Absolute frame). As is the common practice, for robots operating near the Earth’s surface, we will assume the frame A to be fixed to the world’s surface and disregard non-inertial effects due to the Earth’s motion.

3.2 Coordinate Vector of a Point

Given a point p, its coordinates with respect to a frame A = (o A , [A]) are collected in the coordinate vector A p. The coordinate vector A p represents the coordinates of the 3D geometric vector \({\overset {\rightarrow }{r}}_{o_A,p}\) connecting the origin of frame A with the point p, pointing toward p, expressed in the orientation frame [A], that is
$$\displaystyle \begin{aligned} ~^A p := \begin{bmatrix} {\overset{\rightarrow}{r}}_{o_A,p} \cdot {\overset{\rightarrow}{x}}_A \\ {\overset{\rightarrow}{r}}_{o_A,p} \cdot {\overset{\rightarrow}{y}}_A \\ {\overset{\rightarrow}{r}}_{o_A,p} \cdot {\overset{\rightarrow}{z}}_A \end{bmatrix} \in \mathbb{R}^3, \end{aligned} $$
with ⋅ denoting the scalar product between two vectors and \({\overset {\rightarrow }{x}}_A\), \({\overset {\rightarrow }{y}}_A\), and \({\overset {\rightarrow }{z}}_A\) the unit vectors defining the orientation frame [B].

3.3 Change of Orientation Frame

Given two frames A and B, we will employ the notation:
$$\displaystyle \begin{aligned} {}^{A} R_{B} \in \mathrm{SO}(3) \end{aligned} $$
to denote the coordinate transformation from frame B to frame A. The coordinate transformation A R B only depends on the relative orientation between the orientation frames [A] and [B], irrespective of the position of the origins o A and o B .

3.4 Homogeneous Transformation

To describe the position and orientation of frame B with respect to another frame A, we employ the 4 × 4 homogeneous matrix:
$$\displaystyle \begin{aligned}{}^A H_B := \begin{bmatrix} {}^A R_B & {}^A o_B \\ 0_{1\times3} & 1 \end{bmatrix} . \end{aligned} $$
Given a point p, the homogeneous transformation matrix A H B can be also used to map the coordinate vector A p to B p as follows: Let \({ }^A \bar {p}\) and \({ }^B \bar {p}\) denote the homogenous representation of A p and B p, respectively. That is, let \({ }^A \bar {p} := ({ }^A p; 1) \in \mathbb {R}^4\) and likewise for \({ }^B \bar {p}\) (note that ; indicates row concatenation). Then
$$\displaystyle \begin{aligned}{}^A \bar{p} & = {}^A H_B {}^B \bar{p} , \end{aligned} $$
which is the matrix form of A p = A R B B p + A o B . We refer to [24, Chapter 2] for further details on homogeneous representation of rigid transformations.

3.5 Velocity Vectors (Twists)

In the following, given a point p and a frame A, we define
$$\displaystyle \begin{aligned}{}^A \dot p := \frac{d}{dt} \left( {}^A p \right). \end{aligned} $$
In particular, when p is the origin of a frame, e.g., p = o B , we have
$$\displaystyle \begin{aligned}{}^A \dot{o}_B = \frac{d}{dt} \left({}^A o_B \right). \end{aligned}$$
It is important to note that, by itself, expressions like \(\dot o_B\) or \(\dot p\) have no meaning. Similar to (5), we also define
$$\displaystyle \begin{aligned} {}^{A} \dot{R}_{B} := \frac{d}{dt} \left( ~^{A} R_{B} \right) \end{aligned} $$
$$\displaystyle \begin{aligned} {}^A \dot H_B := \frac{d}{dt} \left( {}^A H_B \right) = \begin{bmatrix} {}^A {\dot R}_B & {}^A {\dot o}_B \\ 0_{1\times3} & 0 \end{bmatrix} . \end{aligned} $$
The relative velocity between a frame B with respect to a frame A can be represented by the time derivative of the homogenous matrix A H B  ∈ SE(3). A more compact representation of \({ }^A {\dot H}_B\) can be obtained multiplying it by the inverse of A H B on the left or on the right. In both cases, the result is an element of \(\mathfrak {se}(3)\) that will be called a twist. Premultiplying on the left, one obtains
$$\displaystyle \begin{aligned}{}^{A} H_{B}^{-1} {}^{A} \dot{H}_{B} & = \begin{bmatrix} {}^A {R}_B^T & -{}^A {R}_B^T {}^Ao_B \\ 0_{1\times3} & 1 \end{bmatrix} \begin{bmatrix} {}^A \dot{R}_B & {}^A{\dot o}_B \\ 0_{1\times3} & 0 \end{bmatrix} \notag\\ & = \begin{bmatrix} {}^AR_B^T {}^A {\dot R}_B & {}^AR_B^T {}^A {\dot o}_B \\ 0_{1\times3} & 0 \end{bmatrix} . \end{aligned} $$
Note that \({ }^A {R}_B^T { }^A \dot {R}_B\) appearing on the right-hand side of (8) is skew symmetric. Define B v A,B and \({ }^B \omega _{A,B} \in \mathbb {R}^3\) so that
$$\displaystyle \begin{aligned}{}^B v_{A,B} & := {}^A R_B^T {}^A {\dot o}_B , \end{aligned} $$
$$\displaystyle \begin{aligned}{}^B \omega^{\wedge}_{A,B} & := {}^A R_B^T {}^A {\dot R}_B . \end{aligned} $$
The left-trivialized velocity of frame B with respect to frame A is
$$\displaystyle \begin{aligned} {}^B {\mathrm{v}}_{A,B} := \begin{bmatrix} {}^{B} v_{A,B} \\ {}^{B} \omega_{A,B} \end{bmatrix} \in \mathbb{R}^6 . \end{aligned} $$
By construction,
$$\displaystyle \begin{aligned} {}^B {\mathrm{v}}^{\wedge}_{A,B} & = {}^{A} H_{B}^{-1} {}^{A} \dot{H}_{B} . \end{aligned} $$
Note the slight abuse of notation in using the hat operator ∧ in (10) and (12) that maps a vector into its corresponding matrix representation (respectively, from \(\mathbb {R}^3\) to \(\mathbb {R}^{3\times 3}\) and from \(\mathbb {R}^6\) to \(\mathbb {R}^{4\times 4}\)).
Specularly to what was done in (8), right multiplying \({ }^A {\dot H}_B\) by the inverse of A H B leads to
$$\displaystyle \begin{aligned} {}^A {\dot H}_B {}^A H_B^{-1} & = \begin{bmatrix} {}^{A} \dot{R}_{B} & {}^A {\dot o}_B \\ 0_{1\times3} & 0 \end{bmatrix} \begin{bmatrix} {}^A R_B^T & -{}^A R_B^T {}^A o_B \\ 0_{1\times3} & 1 \end{bmatrix} \notag\\ & = \begin{bmatrix} {}^{A} \dot{R}_{B} {}^A R_B^T & {}^A {\dot o}_B - {}^{A} \dot{R}_{B} {}^A R_B^T {}^A o_B \\ 0_{1\times3} & 0 \end{bmatrix} . \end{aligned} $$
Define A v A,B and \({ }^A \omega _{A,B} \in \mathbb {R}^3\) as
$$\displaystyle \begin{aligned}{}^A v_{A,B} & := {}^A {\dot o}_B - {}^{A} \dot{R}_{B} {}^A R_B^T {}^A o_B \end{aligned} $$
$$\displaystyle \begin{aligned}{}^A \omega^{\wedge}_{A,B} & := {}^{A} \dot{R}_{B} {}^A R_B^T. \end{aligned} $$
The right trivialized velocity of B with respect to A is then defined as
$$\displaystyle \begin{aligned} {}^A {\mathrm{v}}_{A,B} := \begin{bmatrix} {}^{A} v_{A,B} \\ {}^{A} \omega_{A,B} \end{bmatrix} \in \mathbb{R}^6 . \end{aligned} $$
By construction,
$$\displaystyle \begin{aligned}{}^A {\mathrm{v}}^{\wedge}_{A,B} & = {}^{A} \dot{H}_{B} {}^{A} H_{B}^{-1} . \end{aligned} $$

3.5.1 Expressing a Twist with Respect to an Arbitrary Frame

Straightforward algebraic calculations allow to show that the right and left trivialized velocities A v A,B and B v A,B are related via a linear transformation. Following the notation introduced in Featherstone [13], we denote this transformation A X B and define it as We have therefore
$$\displaystyle \begin{aligned} {}^A {\mathrm{v}}_{A,B} = {}^A X_B {}^B{\mathrm{v}}_{A,B}. \end{aligned} $$
The inverse transformation is simply given by \({ }^B X_A = { }^A X_B^{-1}\), an exercise we leave to the reader (recall that A o B  = − A R B B o A ). There are situations in which one would like to describe the linear and angular velocity of a frame as \({ }^A \dot {o}_B\) and A ω A,B , respectively. This is possible if we express the velocity of frame B with respect to frame A in the frame B[A] := (o B , [A]), that is, the frame with the same origin of B and same orientation of A. In this case, we have
$$\displaystyle \begin{aligned} {}^{B[A]} {\mathrm{v}}_{A,B} = {}^{B[A]}X_B {}^B {\mathrm{v}}_{A,B} = \begin{bmatrix} ~^{A} {R}_{B} & 0 \\ 0 & ~^{A} {R}_{B} \end{bmatrix} \begin{bmatrix} ~^{B} {R}_{A} {}^{A} \dot{o}_B \\ {}^B \omega_{A,B} \end{bmatrix} = \begin{bmatrix} {}^A \dot{o}_B \\ {}^A \omega_{A,B} \end{bmatrix}. \end{aligned} $$

3.6 Jacobians

The goal of this section is to arrive at a precise, unambiguous notation to define “geometric Jacobians” for fixed-based and, in particular, free-floating multibody systems.

In this section, A will denote the (absolute) inertial frame and B a frame rigidly attached to one of the bodies composing the multibody system, selected to be the floating base. The configuration of a free-floating multibody system will be parameterized using \(q = (H, q_J) \in \mathrm{SE}(3) \times \mathbb {R}^{n_J}\), with H = A H B  ∈ SE(3) representing the pose (position and orientation) of the floating-base frame B and \(q_J \in \mathbb {R}^{n_J}\) the internal joint displacements. The configuration space (more correctly, the configuration manifold) has correspondingly dimension n = 6 + n J .

Let E be a frame (rigidly) attached to an arbitrary body to be used, e.g., for the specification of a task to be executed by the robot. The frame E could represent, e.g., the pose of a specific frame rigidly attached to an end effector of a robot manipulator, like a hand or foot of a humanoid robot. Let
$$\displaystyle \begin{aligned} {}^AH_E &= {}^AH_E (q) = {}^AH_E (H, q_J) \end{aligned} $$
denote the homogeneous transformation expressing E with respect to A as a function of the configuration q = (H, q J ).
Let δH denote an infinitesimal perturbation of the pose of the floating base – in the language of differential geometry – and δH ∈ T H  SE(3) – and δq J an infinitesimal perturbation of the joint displacements. Then, the corresponding infinitesimal perturbation of frame E can be computed as
$$\displaystyle \begin{aligned} {}^A \delta H_E & = {}^A D_1H_E (H,q_J) \cdot \delta H + {}^A D_2H_E (H,q_J) \cdot \delta q_J , \end{aligned} $$
where we recall that H is a short form for A H B and δH a short form for A δH B . Define the trivialized infinitesimal perturbations E Δ A,E and \({ }^B \Delta _{A,B} \in \mathbb {R}^6\) such that
$$\displaystyle \begin{aligned} {}^E \Delta_{A,E}^\wedge & = {}^A H_E^{-1} \, {}^A\delta H_E \end{aligned} $$
$$\displaystyle \begin{aligned} {}^B \Delta_{A,B}^\wedge & = {}^A H_B^{-1} \, {}^A\delta H_B . \end{aligned} $$
Combining (23) and (24) together with (22), it is straightforward to show that E Δ A,E depends linearly on B Δ A,B and δq J . Such a linear map defines the “geometric Jacobian” for a floating-base system. It will be indicated with the symbol E J A,E/B to indicate that the Jacobian allows to compute the infinitesimal perturbation of frame E relative to frame A, expressed with respect to frame E (a first left trivialization), based on the infinitesimal perturbation of the internal joint configuration and that of the floating-base B, the latest expressed with respect B (a second left trivialization). For this reason, we will refer to E J A,E/B as a double left-trivialized geometric Jacobian. In formulas, we get
$$\displaystyle \begin{aligned} {}^E\Delta_{A,E} & = {}^EJ_{A,E/B}\left({}^A H_B, q_J\right) \begin{bmatrix} {}^B\Delta_{A,B} \\ \delta q_J \end{bmatrix} . \end{aligned} $$
The infinitesimal perturbation of frames E and B can clearly be expressed with respect to arbitrary frames, let us say, C and D. The geometric Jacobian D J A,E/C is related to E J A,E/B by the transformation
$$\displaystyle \begin{aligned} {}^DJ_{A,E/C} & = {}^{D} X_{E} {}^E J_{A,E/B} {}^BY_C , \end{aligned} $$
$$\displaystyle \begin{aligned}{}^BY_C := \begin{bmatrix} {}^BX_C & 0_{6\times n_J} \\ 0_{n_J \times 6} & I_{n_J \times n_J} \end{bmatrix}. \end{aligned} $$
The special case E[A]J A,E/B[A], obtained from the above formula with D = E[A] and C = B[A], will be denoted the mixed geometric Jacobian .

3.7 Free-Floating Rigid-Body Dynamics

Here we introduce the notation that we use to write, in a compact form, the equations of motion of a free-floating rigid-body system. As mentioned in Sect. 3.6, the configuration of a free-floating rigid-body system will be parameterized as q = (H, q J ) := ( A H B , q J ), where A is the inertial frame, B the (selected) floating-base frame, and \(q_J \in \mathbb {R}^{n_J}\) the joint displacements.

The kinematics of the (floating-base) rigid-body system are written treating the configuration manifold \(Q = \mathrm{SE}(3) \times \mathbb {R}^{n_J}\) as the Lie group defined by the group direct product \(\mathrm{SE}(3) \times \mathbb {R}^{n_J}\). Seeing Q as a Lie group, employing the mixed velocity of the floating-based B, allows one to write the derivative of the configuration \(\dot q = (\dot H, \dot q_J)\) as
$$\displaystyle \begin{aligned} \dot q = q \, Y \nu , \end{aligned} $$
where \(\nu = ({\mathrm {v}}, \dot q_J)\) denotes the mixed generalized velocity of the floating-base rigid-body system, i.e., \(\nu := { }^{B[A]}\nu = \left ({ }^{B[A]}{\mathrm {v}}_{A,B}, q_J\right )\), and Y := B Y B[A] is the natural extension of the velocity transformation X, introduced in Sect. 3.5.1, to the Lie algebra of Q. The structure of the generalized velocity transformation Y has been introduced previously in (27), and it is formally defined in such as a way that
$$\displaystyle \begin{aligned}Y \nu & = {}^B Y_{B[A]} {}^{B[A]}\nu \notag \\ & = {}^B Y_{B[A]} \, \left({}^{B[A]}{\mathrm{v}}_{A,B}, \dot q_J\right) \notag\\ & = \left({}^BX_{B[A]} {}^{B[A]}{\mathrm{v}}_{A,B}, \dot q_J\right) \notag \\ & = \left({}^{B}{\mathrm{v}}_{A,B}, \dot q_J\right) =: {}^B\nu. \end{aligned} $$
Note how Y acts on the floating-base velocity v, leaving the joint velocities \(\dot q_J\) unaltered. The configuration q and the mixed generalized velocity ν can then be used to write the dynamics of the floating-based rigid-body system as
$$\displaystyle \begin{aligned}M(q) \dot\nu + C(q, \nu) \nu + G(q) & = S^\top \tau + \sum_{i \in \mathcal{I}_N} J^T_i(q) {\mathrm{f}}_i \end{aligned} $$
with M the (mixed) mass matrix, C the (mixed) Coriolis matrix, G the (mixed) potential force vector, \(S := [0_{6\times n_J} ; I_{n_J\times n_J}]\) the joint selection matrix, τ the joint torques, \(\mathcal {I}_N\) the set of closed contacts, \({\mathrm {f}}_i := { }_{C_i[A]} {\mathrm {f}}_i\) the ith contact wrench, and
$$\displaystyle \begin{aligned} J_i(q) := {}^{C_i[A]}J_{A, L_i/B[A]}(q) \end{aligned} $$
the mixed geometric Jacobian associated with the contact frame C i , where we used the notation introduced in Sect. 3.6, where L i is the frame rigidly attached to the link that is experiencing the i-th contact .

3.8 Additional Definitions

Kinematic trees ordering.

In the following sections, we restrict the analysis to robots described by kinematic trees (i.e., robots with no kinematic loops) composed of N B rigid links numbered from 0 to N B , 0 being the selected fixed base (world reference frame) and 1 being the selected floating base (reference rigid body in the articulated chain). Link numbering is obtained by defining a suitable spanning tree where each rigid link is associated with a unique node in the tree. Numbers can be always selected in a topological order so that each node i has a higher number than its unique parent λ(i) and a smaller number than all the nodes in the set of its children μ(i). Links i and λ(i) are coupled with the joint i whose joint motion constraints are modeled with \(S_i \in \mathbb {R}^{6 \times n_i}\), n i being the number of degrees of freedom of joint i and \(n = n_1 + \dots + n_{N_B}\) the total number of degrees of freedom excluding the floating base.

Proper acceleration.

Adopting a naming convention common in relativity theory but without entering the details of its definition, we will use the term proper acceleration. In the scope of this chapter, it is sufficient to identify the proper acceleration with the quantity measured by an accelerometer. Let o S be the origin of the accelerometer frame S, and let A be an inertial frame. In a gravity field g, the proper acceleration is \({ }^S R_A \left ( { }^A \ddot o_S - { }^A g \right )\). The proper acceleration of an object which is not accelerating in an inertial frame is − g; this is the case of a stationary accelerometer placed on a table. Moreover, the proper acceleration of a free-falling object is zero.


We will often refer to the term wrench which corresponds to the six-dimensional vector obtained by the vertical concatenation of a three-dimensional force and a three-dimensional torque.

4 Problem Formulation

In this chapter, we consider the estimation problems arising in the field of humanoid robotics. In controlling a system, it is often assumed that the system pose is known. In the context of control theory, this requirement can be formalized by saying that a controller often requires an estimation of the system state. In this chapter, we consider the state estimation problem for a particular class of systems, nominally free-floating articulated rigid bodies. These systems belong to the broader class of lumped parameter mechanisms, for which the state has two components: the configuration and the associated velocity. We specifically focus on the problem arising from the fact that humanoids are free-floating systems. The robot pose is parameterized by a global set of coordinates. Since the pose includes the orientation of the base frame, choosing a global parametrization implies that a non-minimal set of coordinates (i.e., homogenous transformations) has been preferred to minimal ones (e.g., Euler angles). The robot velocity ν instead includes the twist of the base from motion, and therefore the robot velocity ν does not coincide with the time derivative of the robot pose q. The algebraic constraints act on the system which can be used to represent whole-body distributed rigid contacts. The system dynamics are therefore described by the following differential algebraic equation:
$$\displaystyle \begin{aligned} \begin{array}{rcl} q = (H, q_J), \qquad \nu & = & ({\mathrm{v}}, \dot q_J) , \end{array} \end{aligned} $$
$$\displaystyle \begin{aligned} \begin{array}{rcl} M(q) \dot\nu + C(q, \nu) \nu + G(q) - \sum_{i \in \mathcal{I}_N} J^T_i(q) {\mathrm{f}}_i - \sum_{i \in \mathcal{I}^x_N} J^T_{i,x}(q) {\mathrm{f}}^x_i & = & S^\top \tau,\end{array} \end{aligned} $$
$$\displaystyle \begin{aligned} \begin{array}{rcl} \forall i \in \mathcal{I}_N \qquad J_i(q) \dot\nu + \dot J_i(q, \nu) \nu & = & 0 . \end{array} \end{aligned} $$
The algebraic constraints (32c) represent the rigid contacts acting on the free-floating system (e.g., foot contacts). These constraints generate the forces \({\mathrm {f}}_i (i \in \mathcal {I}_N)\) hereafter named contact forces. Other forces might act on the system without being directly associated with algebraic constraints; these forces are denoted \({\mathrm {f}}^x_i (i \in \mathcal {I}^x_N)\) and named external forces. In practice, the differential equation (32b) corresponds to the free-floating system (30) subject to the external forces \({\mathrm {f}}^x_i\) and the constraints (32c), which generate the contact forces f i .

The location of the latter is identified by the associated Jacobian J i,x (q). A peculiarity of free-floating systems is that the system configuration (q) has two components: the joint positions (\(q_J \in \mathbb {R}^{n_J}\) in Sect. 3.7) and the system rotation-translation (H ∈ SE(3) in Sect. 3.7). In this chapter, we consider different estimation problems associated with humanoid robots described as in (32).

Problem 1 (Joint velocity and acceleration estimation)

Consider the differential algebraic equation (32). Given a set of sensors, estimate the joint velocities \(\dot q_J \in \mathbb {R}^{n_J}\) and acceleration \(\ddot q_J \in \mathbb {R}^{n_J}\).

Problem 2 (Joint torque estimation)

Consider the differential algebraic equation (32). Given a set of sensors, estimate the joint torques \(\tau \in \mathbb {R}^{n_J}\).

Problem 3 (External force estimation)

Consider the differential algebraic equation (32). Given a set of sensors, estimate the external forces \({\mathrm {f}}_i (i \in \mathcal {I}_N)\), \({\mathrm {f}}^x_i (i \in \mathcal {I}^x_N)\) and their locations J i (q), J i,x (q).

Problem 4 (Floating-base estimation)

Consider the differential algebraic equation (32). Given a set of sensors, estimate the floating-base component H ∈ SE(3) of the state vector q .

5 State of the Art

In this section, we discuss a number of estimation problems which have been addressed in the field of humanoid robotics. Section 5.1 presents previous literature on the problem of joint velocity and acceleration estimation (Problem 1). Section 5.2 focuses instead on the problem of estimating joint torques (Problem 2). The problem of estimating external and contact forces (Problem 3) is considered in Sect. 5.3, which considers separately the problems of force localization (Sect. 5.3.1) and magnitude and direction estimation (Sect. 5.3.2). Two dedicated sections present solutions of the floating-base estimation problem (Problem 4) subdivided into two subcategories. In particular, Sects. 5.4 and 5.5 discuss estimations which rely or not on range and vision sensors. Finally, Sect. 5.6 considers combined approaches which handle simultaneously different estimation problems (e.g., calibration, physical consistency, torque, and floating-base estimations).

5.1 Joint Velocity and Acceleration Estimation

Classical control algorithms for fixed-base robots [36] and their extensions to free-floating robots [28] require access to joint velocity measurements. Traditionally, joint positions q J can be directly measured with joint encoders from which joint velocities \(\dot q_J\) and accelerations \(\ddot q_J\) are computed via numerical differentiation. These estimations are often too noisy to be used in a control algorithm, and filtering them introduces a potentially destabilizing time delay, preventing the use of feedback gains high enough to achieve satisfactory stiffness and damping. Recently the use of miniaturized gyroscopes and accelerometers has been proposed as a possible source of information to achieve reliable velocity and acceleration estimation. In the following, we discuss three approaches recently proposed by Vihonen et al. [44], Xinjilefu et al. [50], and Rotella et al. [35] to exploit whole-body distributed accelerometers and gyroscopes.

Vihonen et al. [44] addressed the problem of joint velocity and acceleration estimation to circumvent issues related to limited joint encoder bandwidth and lag induced by classical filtering. The authors propose an estimation strategy based on rate gyros and accelerometers only, pointing out that magnetometers are unusable in many practical situations. Estimated quantities include joint velocities and joint accelerations. Joint velocities are recursively estimated from the chain root to the leaves (see the link ordering defined in Sect. 3.8) as the difference between the link angular velocity (as measured by the attached gyroscope) and the sum of previous joint velocities. Similarly, joint accelerations are estimated as the difference between the link angular acceleration and the sum of previous joint accelerations. Since no direct link acceleration measurement is available, the authors propose two estimation strategies: numerical differentiation on one hand and direct estimation on the other hand. The latter is based on six linear accelerometers rigidly attached at different locations of a rigid link, and authors show that the link angular acceleration can be estimated by combining the measurements from accelerometers and gyroscopes. The authors present some accuracy results: angular error 0.7 [deg] peak absolute error (PAE) and 0.16 [deg] root mean square error (RMSE), joint velocity error 0.07 [deg/s] RMSE, and joint acceleration error 8.3 [deg/s2] RMSE.

Rotella et al. [35] proposed a very similar approach with applications to humanoid robotics. Joint accelerations are estimated with a procedure similar to the one presented in [44] which exploits a second inertial measurement unit attached to each link. Authors propose a calibration procedure which allows estimation of the sensors’ positions and the orientations. The procedure relies on the gyroscope measurements whose time derivatives are numerically computed with zero-delay filters, exploiting the fact that off-line calibration procedures can rely on acausal estimation strategies. Accuracy of the estimation is improved with a Kalman filter to fuse joint velocities and accelerations computed from inertial sensors with joint position sensor measurements. The joint velocity filter adopts two different definitions of its state dynamics. First, authors propose to use measured joint torques and end-effector wrenches with the robot’s whole-body dynamic model. Second, authors proposed a simple first-order approximation of joint velocities by assuming that joint accelerations are either estimated or replaced by the desired or predicted joint accelerations from the applied controller. Experimental results have been proposed to compare link 3D velocities, with joint velocities and calibrated joint velocities; the validation is conducted by proving that estimated velocities allow to increase velocity gains with improved tracking performance in performing a sinusoid tracking experiment.

Xinjilefu et al. [50] proposed a similar approach based on a Kalman filter with a state containing the vector of joint velocities. The measurement set included the derivative of the sensed joint positions and the available gyroscopes. As in [35], the state dynamics include desired acceleration computed by the applied controller (in this case, a QP controller). A sequential Kalman filter is adopted to reduce computational complexity with a block-diagonal noise covariance matrix. Gyroscope orientations are calibrated in SO(3) with an analytic solution of the associated least squares problem identical to the one proposed in [35]. Gyroscope biases are calibrated at the beginning of the experiment, and no results of online estimation were provided. The comparison of the proposed Kalman filter with classical (causal and acausal) filtering strategies shows that the Kalman filter traces are about 15–20 [ms] faster than the low-pass filtered ones .

5.2 Joint Torque Estimation

Joint torques are yet another important component of the differential equation (32) governing a humanoid dynamic behavior. Even though joint torques are not part of the system state, their estimation is crucial to improve the system state estimation accuracy. In this section, we discuss the problem of joint torque estimation which often boils down into the problem of distinguishing the two components which contribute to the total joint torque: the actuating motor torques, τ, and the external joint torques due to the i-th contact, \(J^T_i(q) {\mathrm {f}}_i\), in (32b). In the following, we discuss previous approaches to the problem of joint torque estimation. De Luca et al. [7], Tien and Albu-Schäffer [41], Park and Chung [30], and Sotoudehnejad et al. [37] have proposed approaches which estimate external joint torques given measurements of the joint motor torques. Zhang et al. [52] and Fumagalli et al. [14] have focused on the problem of estimating motor torques given total joint torques.

De Luca et al. [7] proposed an approach to estimate the effect of external forces on joint torques from motor torque measurements. Their approach is based on the definition of a dynamic quantity, the residual, which acts as a collision identification signal. It has two main features: first, it can be computed from the motor torques and the joint positions and velocities; second, its dynamics are governed by a linear differential equation, and therefore the residual is zero if no external contact force is applied to the manipulator. The definition of the residual for both rigid and elastic joint manipulators is proposed by maintaining the two aforementioned features. Tien and Albu-Schäffer [41] adopted the approach in [7] at the actuator level and exploited the residual computations to estimate the joint friction, modeled as the sum of two linear terms: Coulomb and viscous friction.

Park and Chung [30] extended the residual approach proposed by De Luca et al. [7] to estimate both actuating motor torque and external link torque. Authors point out that the residual definition in [7] can be framed within a general theoretical framework, named linear disturbance observer (DOB). The proposed approach is tested on an elastic joint prototype with custom torque sensors.

Sotoudehnejad et al. [37] proposed yet another extension of the residual concept, focusing in particular on the problem of using time-varying thresholds on the residuals to detect external contacts. Authors proposed an accurate model of different modeling errors to define tight thresholds. Considered errors include friction models and uncertainties on the manipulator dynamic parameters (mass, inertias, centers of mass).

Zhang et al. [52] proposed a technique to estimate joint torques by accurate measurements of the harmonic drive flex spline deflections. The proposed experiments rely on two encoders whose difference is proportional to the spline deflection. A high-resolution encoder is placed on the joint, and a standard optical incremental encoder measures the motor position. The obtained estimation accuracy is satisfactory, considering that the root mean square error averages 0.25 [Nm] and never exceeds 0.5 [Nm]. This level of accuracy requires extensive modeling of phenomena which are often neglected. First, the authors characterized the harmonic drive kinematic errors (the deviation from the ideal gearbox behavior) with periodic functions. Second, authors modeled the flex spline stiffness with a quadratic curve subject to a hysteresis effect.

Fumagalli et al. [14] considered the problem of joint torque estimation with whole-body distributed force/torque and tactile sensors. The proposed estimation strategy consists in a modified recursive Newton-Euler algorithm (RNEA) for solving the associated inverse dynamic problem. The classical RNEA boundary conditions (end-effector forces and base accelerations) are replaced with available measurements (embedded force/torque sensors and inertial measurement units). The RNEA propagation ordering is redefined according to the position of external contacts, sensed at real time by the tactile sensors. Successive publications from the authors have shown that the proposed estimation strategy can be used to estimate the joint viscous and Coulomb frictions and to implement inverse dynamic control strategy with torque control [28]. Moreover similar estimation strategies have been proven to be applicable on the HRP-2 humanoid, which is not equipped with tactile sensors and is not equipped with joint torque sensing technologies [10].

5.3 Force Estimation

External \({\mathrm {f}}_i^x\) and contact f i forces act as external inputs on humanoids. Their estimation is therefore fundamental for a complete characterization of the robot dynamic behavior. In the absence of reliable tactile technologies, previous approaches mainly focused on the problem of estimating the position of externally applied forces (Sect. 5.3.1). Within this context, authors proposed some strategies to estimate the applied force magnitude, but all proposed solutions suffer from identifiability issues [21, 22]. Section 5.3.2 discusses exact solutions to the problem of estimating contact wrench magnitude and direction, relying on the assumption that contact locations are sensed with a tactile sensor.

5.3.1 Force Position Estimation

In this section, we discuss previous approaches on the estimation of contact location which reduces to the problem of retrieving the Jacobian of the associated contact point.

Del Prete et al. [8] considered the problem of estimating the position of an external contact assuming that the contact generates a pure force (i.e., null torque), and its effects are sensed with a force/torque sensor embedded in the robot structure (e.g., at the robot base). The estimation relies on the classical force transformation principle [36, Section 3.8.3] according to which the spatial transformation of a pure force generates a torque equal to the cross product of the application vector and the applied force. Estimating the application vector (i.e., the contact location) boils down to a linear least squares which identifies a unique solution if at least two noncollinear forces. The proposed procedure allows the calibration of a tactile array. The achieved spatial resolution is 7 mm but can be improved to 6 mm projecting the estimated locations on the mesh representing the robot geometry.

Magrini et al. [21] proposed a contact estimation strategy which is based on the residual concept proposed by De Luca et al. [7]. The estimation of the contact location relies on a Kinect to detect the human posture and on a robot three-dimensional rendering to predict possible contact location on the robot surfaces. Given the contact location, a least squares estimation is proposed to obtain the contact force magnitude and direction via a Moore-Penrose pseudo-inverse computation. Only the force components which do not lie on null space of the contact Jacobian can be estimated, and this is a limitation of the proposed approach.

Manuelli and Tedrake [22] formulated another contact estimation strategy based on [7] and extended it to the case of multiple contacts. Authors propose to solve two problems simultaneously: contact location on the one hand and external force estimation on the other. To reduce the associated computational complexity, the two problems are separated to exploit the fact that the force estimation problem is convex if contact locations are known. The proposed estimation problem is solved with a particle filter. Authors interestingly highlight several limitations of the proposed approach, focusing in particular on the effect of model uncertainties and on force magnitude identifiability issues .

5.3.2 Force Magnitude and Direction Estimation

In this section, we focus on the problem of contact wrench estimation. Previous approaches assume that contacts happen at known locations [31, 32] which might be sensed with a whole-body distributed tactile sensor [6, 9].

Park and Khatib [31] considered the problem of estimating contact forces assuming a spring contact model for the external forces. A classical task-space inverse dynamic controller is adopted to control contact forces. The proposed estimation technique, named Active Observer (AOB), consists of a Kalman filter for estimating the difference between the input command force and measured output force.

Petrovskaya et al. [32] extended the AOB estimator in [31] to include the geometric model of the interaction. In particular, the proposed approach defines a probabilistic model of the robot and the environment. The estimation strategy aims at estimating the position of the contact point on the robot while maintaining constant the model of the environment. Estimation is limited to the single-contact case even if experiments include some multi-contact scenarios handled with compliant exploration strategy described by Park and Khatib [31].

Del Prete et al. [9] proposed a strategy to estimate contact wrenches given whole-body distributed force/torque and tactile sensors. The estimation strategy relies on the joint torque estimations described by Fumagalli et al. [14] and detailed in Sect. 5.2. As a by-product of the rearranged Newton-Euler recursion, authors present an algorithm to compute the total (external) wrench acting on a subpart. Subparts are defined by the kinematic subchains obtained by dividing the robot at the level of the available force/torque sensors. Exact estimation is possible only if there is exactly one contact wrench per subchain; otherwise an approximated solution based on linear least squares is proposed.

Ciliberto et al. [6] extended previous approaches by Del Prete et al. [8, 9] which rely on distributed tactile sensors. At first, the approach in [8] is used to perform a spatial calibration of the tactile elements. Then, another calibration procedure is used to estimate the tactile sensor stiffness by correlating the force/torque measurements with the perceived tactile pressure. The resulting stiffness calibration allows to retrieve the normal component of all forces applied on the tactile sensors thus reducing the approximation errors described in [9].

5.4 Free-Floating State Estimation Without Range and Vision Sensors

In this section, we consider the problem of estimating the state of a free-floating mechanical system. The section is divided into two categories depending on the parameterization adopted for the free-floating system. Section 5.4.1 presents approaches that consider the center of mass position as part of the system state. In particular Xinjilefu and Atkeson [46], Stephens [39], and Pongsak et al. [33] proposed approaches focusing on the estimation of the center of mass position and velocity. To reduce computational complexity, these approaches have dealt either with a simplified model of the system (e.g., linear inverted pendulum model, LIPM) or with a reduced (if not minimal) set of coordinates (e.g., joint coordinates). Section 5.4.2 presents previous approaches which parameterized the free-floating system state with the floating-base pose, represented by the quantities H ∈ SE(3) and \({\mathrm {v}} \in \mathbb {R}^3\) in (32). These approaches typically define a full-body articulated model. Benallegue and Lamiraux [3] focused on estimating the humanoid pose assuming that uncertainty is concentrated at the ankle. Rotella et al. [34] included the humanoid odometry in the pose estimation, considering the kinematic constraints induced by footholds. Xinjilefu et al. [47, 48] proposed two approaches to reduce the computational complexity of whole-body estimation and to include linear inequalities to the estimation.

5.4.1 Center of Mass Estimation

Pongsak et al. [33] proposed one of the very first approaches to the problem of estimating the center of mass position and velocity of a humanoid robot. Considered measurements are provided by two linear accelerometers and a gyroscope. The estimation strategy considers the humanoid as a single rigid body subject to an external wrench, represented as a measured input in the rigid-body dynamics. The resulting dynamical system is linearized about the horizontal and stable position. All measurements (accelerometers and gyroscope) are corrupted by noises. Modeling these noises with the information provided by the sensor manufacturers, authors propose an optimal H2-filter for the associated estimation problem.

Stephens [39] has proposed a linear inverted pendulum model (LIPM) to estimate some kinematic (center of mass position and velocity) and some dynamic (center of pressure and external forces) variables. The proposed estimation strategy deals with modeling errors such as measurement offsets and external disturbances. The author considers the associated state equations (e.g., center of mass offset, external force offset) and conducts an observability analysis on each of the proposed alternatives. A comparison of the different approaches is conducted with a validation experiment on the Sarcos humanoid. The conclusion drawn is that the presented models will not allow the estimation of both center of mass offset and external force disturbances and that there is a trade-off between precisely estimating the disturbance and the state.

Xinjilefu and Atkeson [46] proposed yet another approach to the problem of estimating the center of mass position and velocity of a walking humanoid robot. The goal of the chapter is to quantitatively compare the results obtained with the approach proposed in [39] with those obtained with an alternative estimation strategy based on a five-link humanoid planar model. In the linear inverted pendulum model, considered sensors are the feet position with respect to the center of mass and the distribution of vertical forces among feet; in the five-link model sensors are joint angles, joint torques, force/torque sensors at contacts, and an IMU mounted on the torso. The estimation is conducted over several footsteps, and virtual measurements are included in the estimation to represent non-slipping feet contacts and the associated kinematic constraints. Experiments show that the Kalman filter based on the planar five-link model outperforms the LIPM filter, mainly due to the torso rotating around the hip joint during walking .

5.4.2 Floating-Base State Estimation

In the previous section, we discussed approaches which have been proposed in literature to estimate position, velocity, and acceleration of free-floating articulated rigid bodies. These approaches considered either simplified models or reduced (if not minimal) set of coordinates (e.g., joint coordinates) so as to reduce computational complexity. In this section, we discuss recent approaches that have considered full-body models which guarantee the physical consistency of the resulting estimation.

Aoustin et al. [2] considered a very specific estimation problem: the orientation of the planar humanoid Rabbit. The particularity of the proposed approach is the conducted observability analysis which relies on the existence of a state-space transformation of the underlying nonlinear dynamic model. The transformation leads to an almost linear representation (Brunovsky’s form) which allows the application of different observers: a high-gain observer, a step-by-step sliding mode observer, and an observer based on high-order sliding mode differentiation. No significant differences were observed among the proposed estimators which lead to similar errors: less than 2 [deg] error was observed on orientations and less than 0.65 [rad/s] on angular velocities.

Benallegue and Lamiraux [3] considered a simplified floating-base estimation problem for the HRP-2 humanoid, assuming prefect whole-body kinematics but uncertain kinematics at the ankle, modeled as series elasticity concentrated at the foot. The authors represent this uncertainty as a pure rotation, assuming that linear translations of the compliance are negligible. Constraints on the system (such as contact points at the feet) are represented as virtual measurements as proposed by Xinjilefu and Atkeson [46]. The estimated state is composed of angular and linear position, velocity, and acceleration of the floating base. The authors adopted a state transition model based on a simple integrator and a random walk for the floating-base acceleration. The estimation strategy is based on an extended Kalman filter with measurements from the robot gyroscope and accelerometer. Validation experiments consisted of stabilizing the humanoid hand in an inertial reference frame, using the estimated floating-base position to compensate for the ankle elasticity; results show that the resulting positioning error reduces from 20 to 2 [cm] thanks to the proposed estimation strategy.

Rotella et al. [34] proposed a humanoid floating-base estimation strategy which extends the approach by Bloesch et al. [4] which was initially designed and tested on quadrupeds. The underlying idea is to include kinematic constraints (such as footholds) to infer the floating-base motions due to the (measured) joint motions. The proposed filter state contains the base position, the base velocity, the base orientation, and the foot positions in the world reference frame. A quaternion-based representation of the robot attitude allows one to obtain a global singularity-free representation. The system state is augmented to represent the accelerometer and gyroscope biases. An extended Kalman filter approach is proposed to solve the estimation problem. An interesting observability analysis is conducted by means of a taxonomy of rank deficiencies based on the different gait phases (single-point foot, double foot point contact, and flat foot). Reported localization errors in simulation are minimal: 0.05 [m/s] on linear velocities and 0.05 [rad] on orientations for a 120 -s walking task.

Xinjilefu et al. [47, 48] proposed yet another floating-base estimation framework. In [47], authors simultaneously consider Problems 1 and 4, decomposing the global estimation problem in three subproblems: base estimation, joint position estimation, and joint velocity estimation. This approach is proven to simplify the computation burden of the full-body model estimation. The floating-base estimation relies on a quaternion parametrization to avoid singularities. The joint velocity estimation includes both the contact constraints (32c) and the system dynamics (32b), projected into the orthogonal complement of the constraints to cancel out contact forces as proposed in [23]. In [48], authors framed the same estimation problem as a quadratic programming, thus including inequality constraints in the estimation problem. Considered measurements include joint velocities, joint torques, contact forces, pelvis angular velocity, and linear acceleration; as in previous approaches, kinematic constraints (such as footholds) are included as virtual measurements. Validation experiments conducted on the Atlas humanoid prove that the QP estimates demonstrate good tracking performance and less delay which allow to reduce the controller chattering .

5.5 Free-Floating State Estimation with Range and Vision Sensors

The DARPA Robotics Challenge [18] encouraged the competing teams to refine their systems (including state estimation) and to take them outside of the laboratory. A major challenge teams faced was the drift of the estimate of the robot’s floating base (pelvis link) which in turn causes the position of objects of interest (e.g., doorways or stairway steps) to become inconsistent. Three teams competing with the Boston Dynamics Atlas robot [17, 19, 49] reported their open-loop (i.e., using proprioceptive sensing only) drift rate to be approximately 1 cm per step (over distances of more 10 m) at the end of the competition.

As described above, approaches commonly use kinematic foothold constraints to estimate pelvis velocity. As a result, the performance of the floating-base estimate is dependent on the accuracy of contact classification and the careful calibration and filtering of joint sensing. Drift is often characteristic – for example, drifting forward slightly when walking forward and vice versa.

To entirely avoid estimation drift, it is common to explore approaches from the field of SLAM (Simultaneous Localization and Mapping) using range and vision sensors. The literature of the field is vast, and we will not present it fully here; however, an additional challenge for a robot such as a humanoid is that performance of many SLAM approaches is in fact correlated with the smoothness of the motion of the robot. Irregular and jerky motion usually contravenes assumptions underlying many approaches as well as causing motion blur in camera images and saturation in accelerometers.

Here we split approaches into two subcategories: those using LIDAR sensors and those using cameras.

5.5.1 LIDAR-Based Approaches

Laser range finders or LIDAR – because of its long ranging performance and precise measurement capability – is perhaps a more favorable approach when achieving drift-free localization over short ranges (less than 30 m). Hornung et al. [16] was a first approach achieving localization in a two-floor setup with a 2D laser on the head of a NAO robot. The approach combined LIDAR Monte Carlo integration with information about the robot’s height and orientation to produce an estimate in 6D.

While not carrying out localization, per say, the work of Nishiwaki et al. [25] is impressive. The authors use LIDAR to build a terrain map and then localize in it while continuously walking. This demonstration makes it clear that the underlying proprioceptive odometry system has low drift.

For a full-size humanoid walking continuously, the approaches of Fallon et al. [11] and Koolen et al. [17] are similar. Both demonstrated continuous localization over 3D walking courses which require precise accuracy. Fallon et al. [11] used a Gaussian particle filter to estimate at 40 Hz the position of the robot using each individual LIDAR scan for low latency. The Gaussian particle filter approach is interesting in that the filter is sampled only in the states which are observable by the filter (e.g., the yaw of the robot, but not its pitch and roll). Meanwhile Koolen et al. [17] built upon the common iterative closest point (ICP) algorithm: first collating a point cloud using a tuned proprioceptive odometry algorithm and then aligning that point cloud to a prior map.

5.5.2 Camera-Based Approaches


Because of the computational challenges of running real-time visual SLAM onboard a humanoid robot, initially researchers studied localization against prior maps. Early approaches such as [40] used stereo point clouds to localize against a map – in that case a height map in 2D.

The work of Gonzalez-Aguirre et al. [15] explores active localization where the robot carried out gazing motions to actively improve the performance of a particle filter localizer. The approach used an inverse graphics method to render edges of a CAD model of a kitchen environment to match to edges detected by a head camera pointing the real kitchen.

Much later Alcantarilla et al. [1] demonstrated monocular visual localization using a single camera in a more general manner. The approach localized against a precise prior map of visual point features with visibility of the 3D points in different poses calculated so as to speed up online feature matching. In common with many mapping systems for humanoids, the demonstrated motion was unfortunately limited to a small 2 m space but was interesting in that different lighting conditions and datasets were examined as was substantial comparison against PTAM, a commonly used visual SLAM system.

Finally, the approach of [5] is interesting in that it achieves localization without using either a camera or a LIDAR. The approach (demonstrated on a quadruped) used information from contact sensors to localize a quadruped on a detailed height map using particle filtering.

Simultaneous Localization and Mapping

With regard to the full problem of SLAM, the work of Stasse et al. [38] has a clear breakthrough – achieving the first SLAM system capable of carrying out loop closing in real-time running onboard a humanoid. The approach uses the seminal monoSLAM system to carry out map construction.

Yoon et al. [51] later developed an EKF-based SLAM with a stereo camera which could produce minimal drift for motion in a 4 m box. This work clearly describes the development of robust visual descriptors as well as motion model switching to compensate for the uncertain motion of the humanoid. Oriolo et al. [29] is interesting as it demonstrates that even the monocular cameras on a small NAO robot are now able to use visual navigation. The approach used the camera input for visual odometry and in doing so is able to demonstrate a reduced rate of drift (rather than attempting to build a map).

Finally the work of Fallon et al. [12] is the first to use the output of a dense visual mapping system to produce terrain information which is used directly as input to the robot’s control system for a continuously walking humanoid over rough terrain .

5.6 Combined Estimation

Some recent approaches to the estimation problem proposed a combined strategy, based on the idea of mixing different estimations simultaneously. Wu et al. [45] considered estimation and calibration simultaneously. Lowrey et al. [20] proposed an estimation strategy which incorporates a physics engine into the estimator to adjust the state estimates so that the inferred contact interactions are consistent with the observed accelerations. Nori et al. [27] proposed two alternative solutions to the simultaneous estimation of forces and joint-related quantities (position, velocity, and acceleration estimation). Hereafter we discuss this approach in details.

Wu et al. [45] developed a novel alternating optimization approach for simultaneously tracking space-time joint angles and calibrating associated parameters. Considered sensors are potentiometers at joints, motion capture markers attached to links, and extension sensors on tendons. Simultaneously estimated variables are joint angles and sensor parameters. The latter include (1) potentiometer gain and bias, (2) motion capture markers positions, and (3) rotation/translation of the motion capture reference frame with respect to the robot reference frame. The mixed estimation strategy is similar to an EM algorithm: a p-phase that estimates sensor parameters while keeping joint angle fixed and a q-phase that estimates joint angles using the updated sensor parameters. Comparing to the manual calibration done by manufacturer, the proposed estimation strategy reduced the error by half in all sensor types.

Lowrey et al. [20] considered accurate approach to state estimation which reestimates the trajectory over a time window into the past and uses a recursive prior obtained from the previous time step via simulation with accurate physics engine (MuJoCo). The estimation is framed in a Bayesian-like framework (i.e., EKF) enhanced by updating the system’s history over a fixed number of steps, thus obtaining an increased accuracy and maintaining real-time performances. Exploited sensors include encoders, contact forces, 3D markers, and inertial measurement units. The estimation is conducted with a maximum a posteriori. The maximized function is the sum of the data likelihood and a corrective term which penalizes physical inconsistent estimations.

Nori et al. [27] proposed an alternative estimation framework which performs a multimodal estimation and takes into account the physical consistency of the computed quantities. Estimated variables include the system state (Problem 1: joint positions and velocities) and several dynamic quantities (Problem 2: joint torques, internal force/torques; Problem 3: external contact force/torques). Exploiting a Bayesian formulation, physical consistency is obtained by defining a suitable prior on estimated quantities. The estimation relies on the computation of the posterior probability by conditioning the prior on measurements from heterogeneous sensors: joint encoders, link gyroscopes and accelerometers, and whole-body distributed six-axis force/torque sensors. Experimental validation in simulation and on a real robot (the iCub humanoid) shows an accuracy of 0.51 [deg/s] on joint velocities and 0.02 [rad] on joint positions which drops down to \({\approxeq }0.1\) [rad] on the real robot. Computational complexity has been shown to be reducible operating on the algebraic structure of the problem [26].

6 Future Directions and Open Problems

Improving estimation reliability and accuracy, and in turn providing the controller with improved input signals, will be a key requirement to increase the performance of the next generation of humanoids. Several improvements to the aforementioned approaches will be necessary. In this section, we focus on future directions and open problems and their connections with previous literature.

Computational efficiency will be a core requirement for all future estimation strategies. Simple estimation problems, like inverse and forward dynamics, have been extensively optimized from the computational point of view [13]. Significant simplifications have been achieved by exploiting the sparsity induced by the tree-like structure of the humanoid’s kinematic model. Similar advantages can be ported to a wider class of estimation problems as recently proposed in [27, Section V]. Other simplifications can be obtained by decomposing the original non-convex estimation problem into simpler convex subproblems as proposed in [22] and [47].

Humanoids are free-floating constrained system, and therefore constrained estimation will be another core requirement. The state of a constrained mechanical system evolves on a manifold whose topology is nontrivial, especially when in the presence of contacts at multiple locations. The state estimation problem should take these constraints into account similarly to the approach proposed in Xinjilefu and Atkeson [46]. Extensions of this approach should be capable of accurate estimation even when constraints are broken or established. In general, there is a need for solutions to the problem of state estimation for hybrid (or switching) systems with applications to constrained mechanical systems as the one in equation (32).

Multimodality and sensor fusion will play an important role in forthcoming applications. To provide a robust estimation infrastructure, existing techniques should be consolidated in a unique framework for sensor fusion of the different sensor modalities including inertial sensing, joint encoders, force, and tactile sensing. Preliminary investigations of this have already been performed in [20] and in [27]. However, these methods have been tested mainly in simulation or in simple experimental settings, while in future applications, the effort will be focused to bring these techniques in a real-world scenario and eventually improve them to cope with sensor-specific issues.

Adaptive calibration and hyper-parameter tuning will be fundamental contributions to improve the estimation accuracy. It is often difficult to have an exact model of the controlled system. Adaptive calibration and tuning might reduce the different model uncertainties affecting the system (e.g., errors in the manufacturing process and unmodeled elements such as wires and cables). Current available approaches deal with sensor bias [39], sensor positions [35], and dynamic parameters [43]. Forthcoming applications based on multimodal sensor fusion will require dedicated techniques for estimating the relative accuracy of the different sensor modalities to weight their contribution in the estimation.

Improvements in the estimation can be obtained also by exploiting context-dependent information, provided by tactile sensing and RGB-D sensing, and, in an indirect way, by the available world model. For example, if there is an external contact on a part of the robot not covered by the skin, the approximate location of the contact can be detected by the RGB-D sensor. It is worth noting that, with the notable exception of [21], RGB-D sensing has been traditionally limited to SLAM applications [11]. Similarly, combining tactile and proprioceptive sensing (i.e., joint encoders) can lead to an “embodied” representation of the surrounding environment as in the paper by Petrovskaya et al. [32].

Last but not least, the proposed estimation approaches need to be validated both from the theoretical and practical points of view. At the theoretical level, observability analysis should be conducted. In general approaching a configuration where a given state is non-observable (i.e., the observability matrix loses rank) causes significant degradation in the estimation accuracy (i.e., the observability matrix has a relatively high condition number). Thus non-observable conditions should be avoided and estimation accuracy should be carefully analyzed. From a practical point of view, novel estimation strategies should be compared with previous approaches to validate their improved accuracy. In practice this validation might be sometimes nontrivial. Bloesch et al. [4] proposed the use of an additional, more accurate sensor as a validation ground-truth. Rotella et al. [35] preferred an indirect validation based on showing that the estimation could improve the control performances (i.e., tracking error).

7 Conclusions

In this chapter, we discussed previous sensor fusion approaches relevant to humanoid robotics. Focus is primarily on state estimation but includes other relevant problems: joint torque estimation, contact force estimation, contact position estimation, and center of mass estimation. Previous approaches are discussed in relation to features which are relevant to humanoid applications: computational complexity, observability analysis, modeling accuracy, and floating-base parameterization. This chapter concludes by observing that despite possible some open problems remain, the path paved by previous approaches indicates promising future direction which might overcome the limitations of current approaches.



This chapter was supported by the FP7 EU projects CoDyCo (No. 600716 ICT 2011.2.1 Cognitive Systems and Robotics) and Koroibot (No. 611909 ICT-2013.2.1 Cognitive Systems and Robotics).


  1. 1.
    P.F. Alcantarilla, O. Stasse, S. Druon, L.M. Bergasa, F. Dellaert, How to localize humanoids with a single camera? Auton. Robot. 34(1), 47–71 (2013). ISSN:1573–7527CrossRefGoogle Scholar
  2. 2.
    Y. Aoustin, F. Plestan, V. Lebastard, Experimental comparison of several posture estimation solutions for biped robot Rabbit, in IEEE International Conference on Robotics and Automation. ICRA 2008, May 2008, pp. 1270–1275.
  3. 3.
    M. Benallegue, F. Lamiraux, Humanoid flexibility deformation can be efficiently estimated using only inertial measurement units and contact information, in 2014 IEEE-RAS International Conference on Humanoid Robots, Nov 2014, pp. 246–251.
  4. 4.
    M. Bloesch, M. Hutter, M.H. Hoepflinger, C.D. Remy, C. Gehring, R. Siegwart, State estimation for legged robots – consistent fusion of leg kinematics and IMU, in Robotics: Science and Systems VIII, 2012. Google Scholar
  5. 5.
    S. Chitta, P. Vemaza, R. Geykhman, D.D. Lee, Proprioceptive localization for a quadrupedal robot on known terrain, in Proceedings 2007 IEEE International Conference on Robotics and Automation, Apr 2007, pp. 4582–4587.
  6. 6.
    C. Ciliberto, L. Fiorio, M. Maggiali, L. Natale, L. Rosasco, G. Metta, G. Sandini, F. Nori, Exploiting global force torque measurements for local compliance estimation in tactile arrays, in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2014, pp. 3994–3999.
  7. 7.
    A. De Luca, A. Albu-Schaffer, S. Haddadin, G. Hirzinger, Collision detection and safe reaction with the DLR-III lightweight manipulator arm, in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct 2006, pp. 1623–1630.
  8. 8.
    A. Del Prete, S. Denei, L. Natale, F. Mastrogiovanni, F. Nori, G. Cannata, G. Metta, Skin spatial calibration using force/torque measurements, in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2011), pp. 3694–3700Google Scholar
  9. 9.
    A. Del Prete, L. Natale, F. Nori, G. Metta, Contact force estimations using tactile sensors and force/torque sensors. Human Robot Interaction 2012, 0–2 (2012)Google Scholar
  10. 10.
    A. Del Prete, N. Mansard, O.E. Ramos, O. Stasse, F. Nori, Implementing torque control with high-ratio gear boxes and without joint-torque sensors. Int. J. Humanoid Rob. 13(01), 1550044 (2016)Google Scholar
  11. 11.
    M. Fallon, M. Antone, N. Roy, S. Teller, Drift-free humanoid state estimation fusing kinematic, inertial and LIDAR sensing, in 2014 IEEE-RAS International Conference on Humanoid Robots, Nov 2014, pp. 112–119.
  12. 12.
    M.F. Fallon, P. Marion, R. Deits, T. Whelan, M. Antone, J. McDonald, R. Tedrake, Continuous humanoid locomotion over uneven terrain using stereo fusion, in 2015 15th IEEE-RAS International Conference on Humanoid Robots (Humanoids), Seoul, Nov 2015Google Scholar
  13. 13.
    R. Featherstone, Rigid Body Dynamics Algorithms (Springer, New York, 2008)CrossRefzbMATHGoogle Scholar
  14. 14.
    M. Fumagalli, S. Ivaldi, M. Randazzo, L. Natale, G. Metta, G. Sandini, F. Nori, Force feedback exploiting tactile and proximal force/torque sensing. Theory and implementation on the humanoid robot iCub. Auton. Robot. 33(4), 381–398 (2012)Google Scholar
  15. 15.
    D. Gonzalez-Aguirre, M. Vollert, T. Asfour, R. Dillmann, Robust real-time 6d active visual localization for humanoid robots, in 2014 IEEE International Conference on Robotics and Automation (ICRA), May 2014, pp. 2785–2791.
  16. 16.
    A. Hornung, K.M. Wurm, M. Bennewitz, Humanoid robot localization in complex indoor environments, in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Oct 2010Google Scholar
  17. 17.
    T. Koolen, S. Bertrand, G. Thomas, T. de Boer, T. Wu, J. Smith, J. Englsberger, J. Pratt, Design of a momentum-based control framework and application to the humanoid robot Atlas. Int. J. Humanoid Rob. 13, 1650007 (2016)CrossRefGoogle Scholar
  18. 18.
    E. Krotkov, D. Hackett, L. Jackel, M. Perschbacher, J. Pippine, J. Strauss, G. Pratt, C. Orlowski, The DARPA robotics challenge finals: results and perspectives. J. Field Rob. 34(2), 229–240 (2017). ISSN:1556-4967. CrossRefGoogle Scholar
  19. 19.
    S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion, R. Tedrake, Optimization-based locomotion planning, estimation, and control design for Atlas. Auton. Robot. 40, 429–455 (2016)CrossRefGoogle Scholar
  20. 20.
    K. Lowrey, S. Kolev, Y. Tassa, T. Erez, E. Todorov, Physically-consistent sensor fusion in contact-rich behaviors, in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2014, pp. 1656–1662.
  21. 21.
    E. Magrini, F. Flacco, A. De Luca, Estimation of contact forces using a virtual force sensor, in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2014, pp. 2126–2133.
  22. 22.
    L. Manuelli, R. Tedrake, Localizing external contact using proprioceptive sensors: the contact particle filter, in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 9–14 Oct 2016.
  23. 23.
    M. Mistry, J. Buchli, S. Schaal, Inverse dynamics control of floating base systems using orthogonal decomposition, in 2010 IEEE International Conference on Robotics and Automation (ICRA), May 2010, pp. 3406–3412.
  24. 24.
    R.M. Murray, Z. Li, S.S. Sastry, S.S. Sastry, A Mathematical Introduction to Robotic Manipulation (CRC Press, Boca Raton, 1994)zbMATHGoogle Scholar
  25. 25.
    K. Nishiwaki, J. Chestnutt, S. Kagami, Autonomous navigation of a humanoid robot over unknown rough terrain using a laser range sensor. Int. J. Robot. Res. 31, 1251–1262 (2012)CrossRefGoogle Scholar
  26. 26.
    F. Nori, Inverse, forward and other dynamic computations computationally optimized with sparse matrix factorizations. CoRR, abs/1705.04658, 2017.
  27. 27.
    F. Nori, N. Kuppuswamy, S. Traversaro, Simultaneous state and dynamics estimation in articulated structures, in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sept 2015, pp. 3380–3386.
  28. 28.
    F. Nori, S. Traversaro, J. Eljaik, F. Romano, A.D. Prete, D. Pucci, iCub whole-body control through force regulation on rigid noncoplanar contacts. Front. Robot. AI 2(6) (2015) ISSN:2296-9144.
  29. 29.
    G. Oriolo, A. Paolillo, L. Rosa, M. Vendittelli, Humanoid odometric localization integrating kinematic, inertial and visual information. Auton. Robot. 40(5), 867–879 (2016). ISSN:1573-7527. CrossRefGoogle Scholar
  30. 30.
    Y.J. Park, W.K. Chung, External torque sensing algorithm for flexible-joint robot based on disturbance observer structure, in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2014, pp. 4735–4741.
  31. 31.
    J. Park, O. Khatib, Multi-link multi-contact force control for manipulators, in Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Apr 2005, pp. 3613–3618.
  32. 32.
    A. Petrovskaya, J. Park, O. Khatib, Probabilistic estimation of whole body contacts for multi-contact robot control, in Proceedings 2007 IEEE International Conference on Robotics and Automation, Apr 2007, pp. 568–573.
  33. 33.
    L. Pongsak, M. Okada, Y. Nakamura, Optimal filtering for humanoid robot state estimators, in Proceedings of SICE System Integration Division Annual Conference (SI2002), 2002Google Scholar
  34. 34.
    N. Rotella, M. Bloesch, L. Righetti, S. Schaal, State estimation for a humanoid robot, in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2014), pp. 952–958Google Scholar
  35. 35.
    N. Rotella, S. Mason, S. Schaal, L. Righetti, Inertial sensor-based humanoid joint state estimation, in 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016, pp. 1825–1831.
  36. 36.
    B. Siciliano, L. Sciavicco, Robotics: Modelling, Planning and Control (Springer, London, 2009)CrossRefzbMATHGoogle Scholar
  37. 37.
    V. Sotoudehnejad, A. Takhmar, M.R. Kermani, I.G. Polushin, Counteracting modeling errors for sensitive observer-based manipulator collision detection, in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct 2012, pp. 4315–4320.
  38. 38.
    O. Stasse, A.J. Davison, R. Sellaouti, K. Yokoi, Real-time 3D SLAM for a humanoid robot considering pattern generator information, in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006Google Scholar
  39. 39.
    B.J. Stephens, State estimation for force-controlled humanoid balance using simple models in the presence of modeling error, in 2011 IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 3994–3999Google Scholar
  40. 40.
    S. Thompson, S. Kagami, Humanoid robot localisation using stereo vision, in 5th IEEE-RAS International Conference on Humanoid Robots, Dec 2005, pp. 19–25.
  41. 41.
    L.L. Tien, A. Albu-Schäffer, Friction observer and compensation for control of robots with joint torque measurement, in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008, pp. 22–26Google Scholar
  42. 42.
    S. Traversaro, A. Saccon, Multibody dynamics notation. Technical report, Department of Mechanical Engineering, Eindhoven University of Technology, 2016. Report locator DC 2016.064
  43. 43.
    S. Traversaro, A.D. Prete, S. Ivaldi, F. Nori, Inertial parameters identification and joint torques estimation with proximal force/torque sensing, in 2015 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2015), pp. 2105–2110Google Scholar
  44. 44.
    J. Vihonen, J. Honkakorpi, J. Mattila, A. Visa, Geometry-aided angular acceleration sensing of rigid multi-body manipulator using MEMS rate gyros and linear accelerometers, in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013Google Scholar
  45. 45.
    T. Wu, Y. Tassa, V. Kumar, J. Movellan, E. Todorov, Stac: simultaneous tracking and calibration, in 2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids), Oct 2013, pp. 469–476.
  46. 46.
    B. Xinjilefu, C.G. Atkeson, State estimation of a walking humanoid robot, in IROS (IEEE, 2012), pp. 3693–3699Google Scholar
  47. 47.
    B. Xinjilefu, S. Feng, W. Huang, C. Atkeson, Decoupled state estimation for humanoids using full-body dynamics, in IEEE/RAS 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014Google Scholar
  48. 48.
    B. Xinjilefu, F. Siyuan, C.G. Atkeson, Dynamic state estimation using quadratic programming, in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2014), pp. 989–994Google Scholar
  49. 49.
    B. Xinjilefu, S. Feng, C. Atkeson, Center of mass estimator for humanoids and its application in modelling error compensation, fall detection and prevention, in International Conference on Humanoid Robotics, Seoul, Nov 2015Google Scholar
  50. 50.
    B. Xinjilefu, S. Feng, C.G. Atkeson, A distributed mems gyro network for joint velocity estimation, in 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016, pp. 1879–1884.
  51. 51.
    S. Yoon, S. Hyung, M. Lee, K.S. Roh, S.H. Ahn, A. Gee, P. Bunnun, A. Calway, W.W. Mayol-Cuevas, Real-time 3D simultaneous localization and map-building for a dynamic walking humanoid robot. Auton. Robot. 27(10), 759–772 (2013)Google Scholar
  52. 52.
    H. Zhang, S. Ahmad, G. Liu, Torque estimation for robotic joint with harmonic drive transmission based on position measurements. IEEE Trans. Robot. 31(2), 322–330 (2015). ISSN:1552-3098. CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  • Francesco Nori
    • 1
    Email author
  • Silvio Traversaro
    • 1
  • Maurice Fallon
    • 2
    • 3
  1. 1.iCub Facility, Robotics, Brain and Cognitive Sciences DepartmentIstituto Italiano di TecnologiaGenoaItaly
  2. 2.Oxford Robotics InstituteUniversity of OxfordOxfordUK
  3. 3.Robot Perception GroupUniversity of EdinburghEdinburghUK

Personalised recommendations