Defining the Pose of Any 3D Rigid Object and an Associated Distance

  • Romain Brégier
  • Frédéric Devernay
  • Laetitia Leyrit
  • James L. Crowley


The pose of a rigid object is usually regarded as a rigid transformation, described by a translation and a rotation. However, equating the pose space with the space of rigid transformations is in general abusive, as it does not account for objects with proper symmetries—which are common among man-made objects. In this article, we define pose as a distinguishable static state of an object, and equate a pose to a set of rigid transformations. Based solely on geometric considerations, we propose a frame-invariant metric on the space of possible poses, valid for any physical rigid object, and requiring no arbitrary tuning. This distance can be evaluated efficiently using a representation of poses within a Euclidean space of at most 12 dimensions depending on the object’s symmetries. This makes it possible to efficiently perform neighborhood queries such as radius searches or k-nearest neighbor searches within a large set of poses using off-the-shelf methods. Pose averaging considering this metric can similarly be performed easily, using a projection function from the Euclidean space onto the pose space. The practical value of those theoretical developments is illustrated with an application of pose estimation of instances of a 3D rigid object given an input depth map, via a Mean Shift procedure.


Pose 3D rigid object Symmetry Distance Metric Average Rotation \(\textit{SE}(3)\) \(\textit{SO}(3)\) Object recognition 

1 Introduction

Rigid body models play an important role in many technical and scientific fields, including physics, mechanical engineering, computer vision and 3D animation. Under the rigid body assumption, the static state of an object is referred to as a pose, and is often described in terms of a position and an orientation.

Poses of a 3D rigid object are in general regarded as rigid transformations, and the set of poses is identified with the set of rigid transformations \(\textit{SE}(3)\), the special Euclidean group. The Lie group structure of \(\textit{SE}(3)\) makes the relative displacement of the object between two poses explicit, and thus enables the definition of a distance between poses as the length of a shortest motion performing this displacement. This identification is particularly meaningful for applications where the motion of a rigid body is considered—such as motion planning (Sucan et al. 2012) or object tracking (Tjaden et al. 2016). However, while there already exist numerous works on \(\textit{SE}(3)\) metrics, choosing how to deal with the poses of an object remains challenging, as practitioners face questions such as how to tune the relative importance of position and orientation, even in the current deep learning era (Kendall et al. 2015).

There are applications in which motion considerations are irrelevant, and for which only a notion of similarity between poses is required. Pose estimation of instances of a rigid object based on a noisy set of votes is a good example of such a problem. While motion-based applications rely on local properties of the pose space which have been the subject of a large amount of research work, applications based on similarity have to deal with numerous poses at once, performing operations such as neighborhood queries—i.e. finding poses in a set of poses similar to a given one—or pose averaging, and have not gathered as much theoretical interest. Consequently, similarity measures suffering from major flaws are still used in practical applications. Following the work of Fanelli et al. (2011), Tejani et al. (2014) use a Mean Shift procedure based on the Euclidean distance between Euler angles as a representation of poses in their state-of-the-art object pose estimation method. Such a measure is fast to compute and enables the use of efficient tools developed for Euclidean spaces to perform the neighborhood queries and pose averaging required for Mean Shift, but it is not a distance. The parametrization of a rotation by Euler angles notoriously suffers from border effects and singularities, and depends on the choice of frame. These issues may have had only a limited effect on the results reported by the authors, thanks to an appropriate choice of frame orientation and to the low variability of object orientations within their datasets. Nonetheless, they cannot be avoided when dealing with the general case of poses having arbitrary orientations. Such an example illustrates the lack of tools for dealing efficiently with large sets of poses.

Lastly, there are cases where the pose of a rigid object cannot be identified with a single rigid transformation, and for which existing results therefore cannot be applied. Such cases occur when dealing with objects showing symmetry properties, such as revolution objects or cuboids, and are in fact common among manufactured objects. The existing literature on object pose estimation does not usually discuss how such objects are handled, and the most widespread validation method used for symmetrical objects (Hinterstoisser et al. 2012) consists of a relaxed similarity measure that cannot distinguish between poses such as a cylindrical can being flipped up or down.

Our goal in this paper is to address those issues by providing a consistent and general framework for dealing with any kind of physically admissible rigid object in practical applications. To this end, we propose a pose definition valid for any bounded rigid object, equivalent to a set of rigid transformations (Sect. 2). We then propose a physically meaningful distance over the pose space (Sect. 4), and show how poses can be represented in a Euclidean space to enable fast distance computations and neighborhood queries (Sect. 5). We show how the pose averaging problem can be solved quite efficiently (Sect. 8) for this metric using a projection technique (Sect. 7), and lastly we propose an example application to the problem of pose estimation of instances of a rigid object given a set of votes.

2 A Definition for Pose

While the notion of pose of a rigid object is widely used, e.g. in robotics or computer vision, we have not found a general definition of it in the literature. We therefore propose the following one:

We will refer to the set of possible poses as a pose space which we will denote \(\mathscr {C}\) for consistency with the notion of configuration space in robotics literature.

2.1 Link Between the Pose Space and \(\textit{SE}(3)\)

A pose space is highly related to the group of rigid transformations \(\textit{SE}(3)\). Let us consider a rigid object, and \(\mathscr {P}_0 \in \mathscr {C}\) an arbitrary reference pose for this object.

A rigid transformation applied to the object at its reference pose defines a static state of the object, i.e. a pose. In a similar way, a pose \(\mathscr {P} \in \mathscr {C}\) of the object can be reached through a rigid displacement from the reference pose \(\mathscr {P}_0\), and therefore \(\mathscr {P}\) can be described completely by the rigid transformation corresponding to this displacement.

We will denote by \(\mathscr {P} \in \mathscr {C}\) and \(\mathbf {T} = \left( \mathbf {R}, \mathbf {t} \right) \in SE(3)\) a couple of a pose and a corresponding rigid transformation—with \(\mathbf {R} \in SO(3)\) a rotation matrix and \( \mathbf {t} \in {\mathbb {R}}^3\) a translation vector. The transformation considered here is such that each point \(\mathbf {x} \in \mathbb {R}^3\) linked to an object instance at reference pose \(\mathscr {P}_0\) is transformed by \(\mathbf {T}\) into the corresponding point \(\mathbf {T}(\mathbf {x})\) of an instance at pose \(\mathscr {P}\), as depicted in Fig. 1:
$$\begin{aligned} \mathbf {T} (\mathbf {x}) = \mathbf {R} \mathbf {x} + \mathbf {t} \end{aligned}$$
Fig. 1

Representation of corresponding points between instances at different poses of a rigid object without proper symmetries
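As a minimal sketch of this action (using NumPy; all function and variable names here are illustrative, not from the paper), a rigid transformation \(\mathbf {T} = (\mathbf {R}, \mathbf {t})\) can be applied to a point as follows:

```python
import numpy as np

def apply_rigid_transform(R, t, x):
    """Apply the rigid transformation T = (R, t) to a 3D point: T(x) = R x + t."""
    return R @ x + t

# Illustrative example: a quarter turn about the z-axis, then a unit translation along x.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 0.0, 0.0])
x = np.array([1.0, 0.0, 0.0])
y = apply_rigid_transform(R, t, x)  # rotates x to (0, 1, 0), then translates to (1, 1, 0)
```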

However, the rigid transformation corresponding to a given pose is not necessarily unique and therefore the identification of \(\textit{SE}(3)\) with the pose space is in the general case incorrect. Objects—and especially manufactured ones—may indeed show some proper symmetry properties that make them invariant to some rigid displacements.

2.2 Pose as Equivalence Class of \(\textit{SE}(3)\)

Let \(M \subset SE(3)\) be the set of rigid transformations representing the same pose as a rigid transformation \(\mathbf {T}\). For the bunny object of Fig. 1d, M typically consists of the singleton \(\lbrace \mathbf {T} \rbrace \). But M can also contain a continuum of transformations in the case of a revolution object such as the candlestick of Fig. 1a, or even be a discrete set, such as for the rocket object depicted in Fig. 1e, where the same pose can be represented by 3 different transformations.

By definition of \(M\), \(G \triangleq \{ \mathbf {T}^{-1} \circ \mathbf {M}, \mathbf {M} \in M \}\) is the set of rigid transformations that have no effect on the static state of the object. This set therefore does not depend on the arbitrary transformation \(\mathbf {T}\) considered. It is moreover a subgroup of \(\textit{SE}(3)\). Indeed, combinations and inversions of such transformations can be applied to the object while leaving it unchanged, and the identity transformation obviously has no effect on the pose of the object. We will refer to the elements of this group as the proper symmetries of the object, and to \(G \subset SE(3)\) as the group of proper symmetries of the object.

Given a rigid transformation \(\mathbf {T}\) defining a pose \(\mathscr {P}\), we can therefore identify \(\mathscr {P}\) to the following equivalence class \([\mathbf {T}] \subset SE(3)\), consisting in the combination of \(\mathbf {T}\) with any rigid transformation that has no effect on the pose of the object:

2.3 The Proper Symmetry Group

In the following, we propose a classification of the potential groups of proper symmetries for a physically meaningful bounded object. While models of infinite objects are commonly used e.g. for plane detection in 3D scene analysis, we do not consider those in this article as they do not correspond to actual physical objects and the definition of a suitable metric on the pose space of such objects is typically very dependent on the application. This classification will be helpful to derive the practical results associated with our proposed distance.

All proper symmetries of a bounded object necessarily have a common fixed point, thus we can consider the group of proper symmetries as a subgroup of the rotation group \(\textit{SO}(3)\) by choosing such a point as the origin of the object frame. Subgroups of \(\textit{SO}(3)\) are sometimes referred to as chiral point groups, and have been widely studied, notably in the context of crystallography. The interested reader is referred to Vainsthein (1994) for more insight on the theory of symmetry.

Ignoring the pathological case of infinite subgroups of \(\textit{SO}(3)\) that are not closed under the usual topology as they do not make sense physically, the potential groups of proper symmetries for a bounded object can actually be classified in a few categories.

In the 2D case, a bounded object will either show a circular symmetry—i.e. an invariance by any 2D rotation—or a cyclic symmetry of order \(n \in \mathbb {N}^{*}\)—i.e. an invariance by rotation of \(1/n\) turn. The special case \(n=1\) actually corresponds to a 2D object without any proper symmetry. Table 1 provides examples of such objects.

Similarly, we distinguish in the 3D case between five classes of proper symmetry groups, synthesized in Table 2. A 3D bounded object can show a spherical symmetry—i.e. an invariance by any rotation—or a revolution symmetry—i.e. an invariance by rotation of any angle about a given axis. This latter class can actually be split into two, depending on whether or not the object is also invariant under reflection across a plane orthogonal to the revolution axis. We respectively refer to these classes as revolution symmetry with or without rotoreflection invariance. In addition, finite groups of proper symmetries should also be considered; since there is an infinite number of them, we treat them in a general manner. We nevertheless distinguish the case of an object without proper symmetry (i.e. for which G contains only the identity transformation) from the other ones, because it is essential in our theoretical developments.
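For a finite cyclic symmetry group, the proper symmetries can be enumerated explicitly. The following sketch (illustrative helper names; the symmetry axis is arbitrarily taken as the z-axis) generates the n rotations of a cyclic group of order n:

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def cyclic_group(n):
    """The n rotations of a cyclic proper symmetry group of order n about the z-axis."""
    return [rot_z(2.0 * np.pi * k / n) for k in range(n)]

# e.g. an object with a 3-fold symmetry, as the rocket of Fig. 1e:
G = cyclic_group(3)
```

Composing any two elements yields another element of the group (modulo a full turn), consistent with G being a subgroup of \(\textit{SO}(3)\).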
Table 1

Classification of the potential groups of proper symmetries for a 2D bounded physical object

Note that potential indirect symmetries of the object such as reflection symmetries are not accounted for. This is due to the fact that we consider an oriented 3D space—e.g. through the right-hand rule—in which reflections are not physically feasible through rigid displacements. Revolution symmetry with rotoreflection invariance is nonetheless considered since it is a proper symmetry group: the reflection symmetry can indeed be generated by the introduction of a rotational invariance of \(180^{\circ }\) along an arbitrary axis orthogonal to the revolution axis.

3 Prior Work on Metrics Over the Pose Space

We propose in this section a brief review of the recent work on metrics over the pose space of a rigid object. We consider only mathematical distances in our discussion—i.e. symmetric, positive-definite maps from \(\mathscr {C} \times \mathscr {C}\) to \(\mathbb {R}^{+}\) satisfying the triangle inequality. The existing literature does not take into account potential proper symmetries of the object, and therefore, in this review, the pose space can be identified with the group of rigid transformations \(\textit{SE}(3)\).
Table 2

Classification of the potential groups of proper symmetries for a 3D bounded physical object

3.1 Objectiveness

The identification of the pose space with \(\textit{SE}(3)\) is based on the choice of two arbitrary frames: a frame linked to the object—to which we will refer as the object frame—and a fixed inertial frame, such that the object frame coincides with the inertial frame when the object is in the reference pose \(\mathscr {P}_0\). For a distance to be well-defined, it should not depend on an arbitrary choice of those frames, a notion that Lin and Burdick (2000) formalize as objectiveness or frame invariance.

Among possible distances, geodesic distances have attracted the most interest and have been studied within the framework of Riemannian geometry on the Lie group \(\textit{SE}(3)\). Geodesic distances are well-suited for applications dealing with motions, as they represent the minimum length of a motion bringing the object from one pose to another. Park (1995) showed that there are no bi-invariant Riemannian metrics on \(\textit{SE}(3)\)—that is, invariant to any change of inertial frame (left invariance) and of object frame (right invariance). Chirikjian (2015) recently studied this question further and showed that while continuous bi-invariant metrics do not exist, there are continuous left-invariant distances that are invariant under right shifts by pure rotations.

3.2 Hyper-Rotation Approximation

Nonetheless, several authors have worked on an “approximate bi-invariant” metric (Purwar and Ge 2009) for \(\textit{SE}(3)\) through the mapping of rigid transformations to hyper-rotations of \(\textit{SO}(4)\), and the use of a bi-invariant metric on \(\textit{SO}(4)\). Techniques to perform such a mapping have been proposed based on biquaternion representation (Etzel and McCarthy 1996) and polar decomposition (Larochelle et al. 2007). Such a mapping unfortunately requires a scaling of the translation part, which has to be set empirically depending on the application (Angeles 2006).

3.3 Decomposition Into Translation and Rotation

Fortunately, while inertial frame invariance is necessary for the objectiveness of a metric, object frame invariance is not. Lin and Burdick (2000) indeed showed that a distance is objective if and only if it is independent of the choice of inertial frame, and transforms by a right shift in response to a change of object frame. Therefore, a method to define an objective metric consists in defining a left-invariant distance for a given object frame and always using this frame, in order to avoid having to transform the distance expression.

With this technique, a frequent approach consists in splitting a pose into a position part and an orientation part, and defining a distance on \(\textit{SE}(3)\) based on frame-invariant metrics on both \(\mathbb {R}^3\) and the rotation group \(\textit{SO}(3)\). Those metrics can then be fused in the form of a weighted generalized mean, here written with two strictly positive scaling factors a and b and an exponent \(p \in [1, \infty ]\):
$$\begin{aligned} {{\mathrm{d}}}(\mathbf {T}_1, \mathbf {T}_2) = \root p \of {a {{\mathrm{d_{rot}}}}(\mathbf {R}_1, \mathbf {R}_2)^p + b {{\mathrm{d_{trans}}}}(\mathbf {t}_1, \mathbf {t}_2)^p }. \end{aligned}$$
The Euclidean distance is the usual choice for measuring distances between different positions. Considering the usual Riemannian distance over \(\textit{SO}(3)\), a Riemannian distance over \(\textit{SE}(3)\) can be obtained by combining those together into (Park 1995):
$$\begin{aligned} d(\mathbf {T}_1, \mathbf {T}_2) = \sqrt{ a \Vert \log (\mathbf {R}_1^{-1} \mathbf {R}_2)\Vert ^2 + b \Vert \mathbf {t}_2 - \mathbf {t}_1 \Vert ^2 }. \end{aligned}$$
This expression is particularly interesting in that the distance between orientations \(\Vert \log (\mathbf {R}_1^{-1} \mathbf {R}_2)\Vert \) corresponds to the angle \(\alpha \) of the relative rotation between the two, which can be evaluated quite easily e.g. from the following relations, using matrix or unit quaternion representations and respectively trace or inner product operators:
$$\begin{aligned} {{\mathrm{Tr}}}(\mathbf {R}_1^{-1} \mathbf {R}_2)= & {} 2 \cos (\alpha ) + 1 \end{aligned}$$
$$\begin{aligned} 1 - \left\langle \mathbf {q_1} | \mathbf {q_2} \right\rangle ^2= & {} \frac{1}{2} (1 - \cos (\alpha )). \end{aligned}$$
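Both relations recover the same angle \(\alpha \). A small sketch (illustrative names; quaternions in scalar-first convention) inverting each of them:

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def angle_from_trace(R1, R2):
    """Invert Tr(R1^{-1} R2) = 2 cos(alpha) + 1 for alpha."""
    c = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))

def angle_from_quaternions(q1, q2):
    """Invert 1 - <q1|q2>^2 = (1 - cos(alpha)) / 2 for alpha."""
    c = 2.0 * np.dot(q1, q2) ** 2 - 1.0
    return np.arccos(np.clip(c, -1.0, 1.0))

alpha = 0.7  # radians
q_identity = np.array([1.0, 0.0, 0.0, 0.0])
q_rot = np.array([np.cos(alpha / 2.0), 0.0, 0.0, np.sin(alpha / 2.0)])  # rotation about z
```

The clipping simply guards against round-off pushing the cosine slightly outside \([-1, 1]\).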
Without the Riemannian constraint, a large number of inertial-frame-invariant distances can be considered. Gupta (1997) notably proposed to consider a Frobenius distance for the rotation part, which also depends only on the angle of the relative rotation between the two poses:
$$\begin{aligned} \Vert \mathbf {R}_2 - \mathbf {R}_1\Vert _F = 2 \sqrt{2} |\sin (\alpha /2)|. \end{aligned}$$
Similar properties are obtained considering the Euclidean distance between representations of antipodal pairs of unit quaternions \(\mathbf {q_1}\) and \(\mathbf {q_2}\):
$$\begin{aligned} \min \Vert \mathbf {q_2} \pm \mathbf {q_1} \Vert = 2 |\sin (\alpha /4)|. \end{aligned}$$
Merging position and orientation distances together requires setting the scaling factors a and b. The choice of those factors remains a heuristic issue, and the recent work of Kendall et al. (2015) on camera pose regression using a deep neural network notably showed that this setting may have a great impact on performance. A reasonable choice in the case of object poses consists in setting the position weight b to 1 and the orientation weight a to the square of the maximum radius of the object (Di Gregorio 2008) in Eq. (4), assuming an object frame at the center of the object, in order to get an upper bound of the displacement of the object’s points between two poses.
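This choice of weights can be sketched as follows (illustrative names; we assume, as above, an object frame at the object's center, b = 1 and a equal to the squared maximum radius):

```python
import numpy as np

def weighted_se3_distance(R1, t1, R2, t2, r_max):
    """Eq. (4)-style distance with a = r_max**2 (squared maximum object radius), b = 1."""
    c = (np.trace(R1.T @ R2) - 1.0) / 2.0
    alpha = np.arccos(np.clip(c, -1.0, 1.0))  # relative rotation angle
    return np.sqrt(r_max ** 2 * alpha ** 2 + np.sum((t2 - t1) ** 2))
```

For a pure translation it reduces to the Euclidean distance between positions, and for a pure rotation to \(r_{max} \alpha \), an upper bound on the displacement of the object's points.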

3.4 Geometric Approaches

To avoid the need for arbitrary scaling factors, some distances are based only on geometric properties of the object. A particularly interesting possibility is to define a metric based on the distance between corresponding 3D points of instances of the object at these poses, as depicted in Fig. 1. Let \(\mu \) be a density distribution relative to the object and \(V=\int \mu (\mathbf {x}) dv\) its integral over the whole object. Considering an \(L^p\) norm, such a distance can be formulated the following way:
$$\begin{aligned} {{\mathrm{d}}}(\mathbf {T}_1, \mathbf {T}_2) = \frac{1}{V} \left( \int \mu (\mathbf {x}) \Vert \mathbf {T}_2 (\mathbf {x}) - \mathbf {T}_1 (\mathbf {x}) \Vert ^p dv \right) ^{\frac{1}{p}} \end{aligned}$$
This expression has a strong physical meaning. It is by construction frame-invariant, since its definition does not depend on a particular frame, and it takes the shape of the object into account without the need for arbitrary tuning. Martinez and Duffy (1995) suggest the use of the maximum displacement (\(p=\infty \)), and Hinterstoisser et al. (2012) used the average displacement (\(p=1\)) for a pose estimation evaluation. For the sake of tractability, those authors suggest limiting consideration to only some vertices of an object model, since the integral has to be evaluated explicitly. Kazerounian and Rastegar (1992) on the other hand proposed the use of the integral of squared displacements (\(p=2\)) over the whole object, and showed that it could actually be evaluated efficiently given the inertia matrix of the object. Chirikjian and Zhou (1998) later improved this formulation and extended it to arbitrary affine transformations, showing that the distance could be evaluated as a weighted Frobenius norm.
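As a direct (if inefficient) sketch of Eq. (9), the integral can be approximated by a weighted sum over sampled object points; the closed forms mentioned above avoid this sampling, but the naive version makes the definition concrete (all names illustrative):

```python
import numpy as np

def lp_displacement_distance(points, mu, T1, T2, p=2):
    """Sampled approximation of Eq. (9): the weighted L^p mean of point
    displacements between two poses. T1, T2 are (R, t) pairs; points is (n, 3)."""
    (R1, t1), (R2, t2) = T1, T2
    disp = np.linalg.norm((points @ R2.T + t2) - (points @ R1.T + t1), axis=1)
    w = mu / mu.sum()  # normalized density weights, playing the role of mu / V
    return np.sum(w * disp ** p) ** (1.0 / p)
```

For a pure translation, every point is displaced by the same vector, so the result equals the translation norm for any p, as expected from the formula.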

Zefran and Kumar (1996) and Lin and Burdick (2000) independently proposed a Riemannian metric tensor linked to the notion of kinetic energy, which therefore takes the object properties into account without the need for arbitrary tuning. Their tensor can be seen as a local equivalent of the distance of Kazerounian and Rastegar (1992). However, to the best of our knowledge, there is no known closed-form expression for the resulting geodesic distance in the general case.

3.5 Local Metric

Various other local parametrization methods exist which, by locally mapping the pose space to a Euclidean space, enable distances to be defined locally. Those parametrizations are based e.g. on the representation of orientation with Euler angles, or on the local stereographic projection of the pose space, identified with Study’s quadric—a hypersurface embedded in \(\mathbb {R}^7\) (Eberharter and Ravani 2004). An in-depth discussion of this topic is out of the scope of this article, as we are interested in global distances.

4 Proposed Metric

In this section, we propose a distance over the pose space of a 3D rigid bounded object, valid even for symmetric ones. This distance can be considered as an extension of the work of Kazerounian and Rastegar (1992) and Chirikjian and Zhou (1998) to arbitrary bounded objects. We also discuss some of its properties.

4.1 Formal Definition

Let \(\mathscr {S}\) be the set of points of the object at reference pose \(\mathscr {P}_0 \in \mathscr {C}\), and \(\mu \) a positive density distribution defined on \(\mathscr {S}\). In order to be meaningful, the set of points of the object and its density distribution are assumed to exhibit the proper symmetry properties of the object. Formally, we assume they verify \(\mathbf {G} (\mathscr {S}) = \mathscr {S}\) and \(\mu \circ \mathbf {G} = \mu \) for any proper symmetry \(\mathbf {G} \in G\).

This expression is well defined. The minimum in definition (10) is reached because of the compactness of the proper symmetry group G—as a closed subgroup of \(\textit{SO}(3)\), which itself is compact—and of the continuity of \({{\mathrm{d_{no\_sym}}}}\). Moreover, this definition is by construction independent of the choice of the rigid transformations \(\mathbf {T}_1, \mathbf {T}_2\) identified with the poses considered. One easily verifies that it satisfies the conditions of a distance definition: \({{\mathrm{d}}}\) is symmetric and positive-definite, and the triangle inequality derives from the triangle inequality satisfied by \({{\mathrm{d_{no\_sym}}}}\), which is a direct consequence of the Minkowski inequality. An equivalent formulation of this distance, involving a single minimization over G, is introduced in Proposition 1.

In typical applications, one is particularly interested in the positioning of the surface of the object. Therefore, in our experiments, we consider the surface of the object as the set of points \(\mathscr {S}\). The density function \(\mu \) can be used to modulate the importance of the positioning of specific areas, but without additional information it is natural to consider a uniform weight \(\mu =1\).

4.2 Objectiveness

The proposed distance is by construction independent of the choice of arbitrary frames, as it admits a purely geometric interpretation, a point we discuss in Sect. 4.3.

Definition 2 makes no assumption on the choice of object frame, and the use of a reference pose—i.e. an inertial frame—in our formulation is only there for notational convenience. Indeed, the Euclidean distance between 3D points is by definition invariant under isometries, and in particular under any rigid transformation \(\mathbf {T}_3^{-1} \in SE(3)\):
$$\begin{aligned} \forall \mathbf {x}, \mathbf {y} \in \mathbb {R}^3, \Vert \mathbf {x} - \mathbf {y}\Vert = \Vert \mathbf {T}_3^{-1} (\mathbf {x}) - \mathbf {T}_3^{-1} (\mathbf {y})\Vert \end{aligned}$$
Therefore, an arbitrary new reference pose \(\mathscr {P}_3\) could be considered without any effect on the metric properties. Denoting by \(\mathbf {T}_3\) a rigid transformation identified with \(\mathscr {P}_3\) relative to the old reference pose \(\mathscr {P}_0\), we verify the independence of \({{\mathrm{d_{no\_sym}}}}\) from the choice of reference pose
$$\begin{aligned} {{\mathrm{d_{no\_sym}}}}(\mathbf {T}_1, \mathbf {T}_2) = {{\mathrm{d_{no\_sym}}}}(\mathbf {T}_3^{-1} \mathbf {T}_1, \mathbf {T}_3^{-1} \mathbf {T}_2), \end{aligned}$$
and hence the independence of the general distance:
$$\begin{aligned} {{\mathrm{d}}}([\mathbf {T}_1], [\mathbf {T}_2]) = {{\mathrm{d}}}([\mathbf {T}_3^{-1} \mathbf {T}_1], [\mathbf {T}_3^{-1} \mathbf {T}_2]). \end{aligned}$$
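This invariance is easy to check numerically. A sketch (illustrative names; random points stand in for object samples, with uniform density) comparing \({{\mathrm{d_{no\_sym}}}}\) before and after a change of reference pose:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(rng):
    """Random rotation matrix via QR decomposition, sign-fixed so that det = +1."""
    A = rng.standard_normal((3, 3))
    Q, Rq = np.linalg.qr(A)
    Q *= np.sign(np.diag(Rq))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

def compose(Ta, Tb):
    """(Ra, ta) o (Rb, tb) = (Ra Rb, Ra tb + ta)."""
    (Ra, ta), (Rb, tb) = Ta, Tb
    return Ra @ Rb, Ra @ tb + ta

def inverse(T):
    R, t = T
    return R.T, -R.T @ t

def d_no_sym(points, T1, T2):
    """RMS distance between corresponding points of two poses (uniform density)."""
    (R1, t1), (R2, t2) = T1, T2
    disp = (points @ R2.T + t2) - (points @ R1.T + t1)
    return np.sqrt(np.mean(np.sum(disp ** 2, axis=1)))

points = rng.standard_normal((200, 3))
T1 = (random_rotation(rng), rng.standard_normal(3))
T2 = (random_rotation(rng), rng.standard_normal(3))
T3 = (random_rotation(rng), rng.standard_normal(3))
T3inv = inverse(T3)
lhs = d_no_sym(points, T1, T2)
rhs = d_no_sym(points, compose(T3inv, T1), compose(T3inv, T2))  # equal up to round-off
```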

4.3 Geometric Interpretation

A picture being worth a thousand words, the reasoning we develop in this section is illustrated in Fig. 2 for the case of a 2D object with a rotation symmetry of \(2 \pi / 3\): a flower with three petals.
Fig. 2

Illustration of our proposed distance for a 2D object with a rotation symmetry of \(2\pi /3\). a The distance between two poses consists in the minimum distance between two poses of an equivalent object without proper symmetry—here there are 3 possible poses of the equivalent object for each pose of the original object. The distance between poses of an object without proper symmetry corresponds to the RMS distance between corresponding object points (dashed segments). b Equivalently, the proposed distance can be considered as a measure of the smallest displacement from one pose to another—here there are actually only 3 different displacements between those two poses (solid, dotted and dashed boxes)

As we discussed in Sect. 2, a pose \(\mathscr {P}_i \in \mathscr {C}\) can be identified with a set of rigid transformations \(\left\{ \mathbf {T}_i \circ \mathbf {G}, \mathbf {G} \in G \right\} \). Each of these transformations can itself be identified with the pose of an object with characteristics identical to those of the considered one but with no proper symmetries—to which we will refer as the equivalent object, and which we depict in Fig. 2 with a grey petal (in order to break the symmetry of the initial object). A pose of the object can therefore be considered as a set of poses of the equivalent object—3 in our example. Points of the equivalent object can be unambiguously put in correspondence between different poses—correspondences we represent by dashed segments in the figure. It is therefore legitimate to define a distance between poses of the equivalent object based on the distance between such corresponding points. In this paper, we consider the RMS distance \({{\mathrm{d_{no\_sym}}}}\), as it enables efficient computations (see Sect. 5):
$$\begin{aligned} {{\mathrm{d_{no\_sym}^2}}}(\mathbf {T}_1,\mathbf {T}_2) = \frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) \Vert \mathbf {T}_2 (\mathbf {x}) - \mathbf {T}_1 (\mathbf {x})\Vert ^2 ds. \end{aligned}$$
The proposed distance between two poses of the object can then be defined as the minimum distance between each potential pair of poses for the equivalent object (\(3 \times 3\) combinations in our example):
$$\begin{aligned} {{\mathrm{d}}}(\mathscr {P}_1, \mathscr {P}_2) = \min _{\mathbf {G}_1, \mathbf {G}_2 \in G } {{\mathrm{d_{no\_sym}}}}(\mathbf {T}_1 \circ \mathbf {G}_1, \mathbf {T}_2 \circ \mathbf {G}_2). \end{aligned}$$
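For a finite proper symmetry group, this minimization reduces to a minimum over the group's elements; one factor of G suffices, as shown below in Sect. 4.3's single-minimization form. A sketch (illustrative names; a 3-point symmetric set stands in for a surface with 3-fold symmetry):

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def d_no_sym(points, T1, T2):
    """RMS distance between corresponding points of two poses (uniform density)."""
    (R1, t1), (R2, t2) = T1, T2
    disp = (points @ R2.T + t2) - (points @ R1.T + t1)
    return np.sqrt(np.mean(np.sum(disp ** 2, axis=1)))

def sym_distance(points, T1, T2, G):
    """Proposed distance for a finite symmetry group G: min over G of d_no_sym(T1, T2 o G)."""
    R2, t2 = T2
    return min(d_no_sym(points, T1, (R2 @ Gk, t2)) for Gk in G)

# Symmetric point set: three points at 120 degrees on the unit circle (z = 0 plane).
points = np.array([[np.cos(2 * np.pi * k / 3), np.sin(2 * np.pi * k / 3), 0.0]
                   for k in range(3)])
G = [rot_z(2 * np.pi * k / 3) for k in range(3)]
T1 = (np.eye(3), np.zeros(3))
T2 = (rot_z(2 * np.pi / 3), np.zeros(3))  # a proper symmetry of T1: same pose
```

Here \({{\mathrm{d_{no\_sym}}}}(\mathbf {T}_1, \mathbf {T}_2)\) is strictly positive, since corresponding points move, but the proposed distance correctly vanishes, since both transformations represent the same pose.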
Another, more intuitive interpretation is to consider our distance as a measure of the smallest displacement from one pose to the other. A displacement from a pose \(\mathscr {P}_1\) to another \(\mathscr {P}_2\) is a relative transformation from a pose of the equivalent object corresponding to \(\mathscr {P}_1\) to a pose of the equivalent object corresponding to \(\mathscr {P}_2\), and the length of a displacement is measured via \({{\mathrm{d_{no\_sym}}}}\). Different pairs of poses of the equivalent object are actually linked by the same displacement—as can be observed in Fig. 2, where pairs of poses of the equivalent object linked by the same transformation are highlighted by identical boxes. All displacements from one pose \(\mathscr {P}_1\) to another \(\mathscr {P}_2\) are in fact considered when choosing an arbitrary pose of the equivalent object \(\mathbf {T}_1\) for \(\mathscr {P}_1\) and considering rigid transformations from \(\mathbf {T}_1\) to the poses of the equivalent object corresponding to \(\mathscr {P}_2\). Thanks to this, the distance between two poses can actually be computed considering the proper symmetries of only one pose:

This formulation is simpler than Definition 2; however, it breaks the symmetry of the roles of the two poses.


Formally, expression (16) can be deduced from the distance definition (10) as follows. Given two proper symmetries \(\mathbf {G}_1, \mathbf {G}_2 \in G\), one can perform the changes of variables \(\mathbf {x} \leftarrow \mathbf {G}_1 (\mathbf {x})\) and \(\mathbf {G} \leftarrow \mathbf {G}_2 \circ \mathbf {G}_1^{-1}\) to write the following equality:
$$\begin{aligned} \begin{aligned}&{{\mathrm{d_{no\_sym}^2}}}(\mathbf {T}_1 \circ \mathbf {G}_1, \mathbf {T}_2 \circ \mathbf {G}_2) \\&\quad =\frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) \Vert \mathbf {T}_2 \circ \mathbf {G}_2 (\mathbf {x}) - \mathbf {T}_1 \circ \mathbf {G}_1(\mathbf {x})\Vert ^2 ds \\&\quad =\frac{1}{S} \int _{\mathbf {G}_1(\mathscr {S})} \mu (\mathbf {G}_1^{-1} (\mathbf {x})) \Vert \mathbf {T}_2 \circ \mathbf {G} (\mathbf {x}) - \mathbf {T}_1 (\mathbf {x})\Vert ^2 ds \\ \end{aligned} \end{aligned}$$
The symmetry of the object pointset and of its density ensures that \(\mathbf {G}_1(\mathscr {S}) = \mathscr {S}\) and \(\mu \circ \mathbf {G}_1^{-1} = \mu \), leading to the following result from which the conclusion is straightforward:
$$\begin{aligned} \begin{aligned}&{{\mathrm{d_{no\_sym}^2}}}(\mathbf {T}_1 \circ \mathbf {G}_1, \mathbf {T}_2 \circ \mathbf {G}_2) \\&\quad =\frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) \Vert \mathbf {T}_2 \circ \mathbf {G} (\mathbf {x}) - \mathbf {T}_1 (\mathbf {x})\Vert ^2 ds \\&\quad = {{\mathrm{d_{no\_sym}^2}}}(\mathbf {T}_1, \mathbf {T}_2 \circ \mathbf {G}). \end{aligned} \end{aligned}$$
\(\square \)

4.4 Rotation Anisotropy

In the case of a pure rotational displacement around the center of mass of a non-symmetric object, usual metrics depend solely on the angle of the relative rotation between the two poses. \({{\mathrm{d_{no\_sym}}}}\) on the other hand—and the proposed distance as well—accounts for the object’s geometry, and as such also depends on the considered axis. More precisely, the distance between two poses linked by such a displacement depends on the angle \(\theta \) and on the inertia moment \(I_{\mathbf {k}}\) along the axis \(\mathbf {k}\) of the relative rotation between the two poses, as follows:
$$\begin{aligned} \begin{aligned}&{{\mathrm{d_{no\_sym}}}}(\mathbf {T}_1, \mathbf {T_2}) = 2 \sqrt{I_{\mathbf {k}}} \sin \left( \frac{\theta }{2} \right) \\&\text {where } I_{\mathbf {k}} = \frac{1}{S} \int \mu (\mathbf {x}) \Vert \mathbf {k} \times \mathbf {x} \Vert ^2 ds. \end{aligned} \end{aligned}$$
This result can be obtained directly by injecting Rodrigues' rotation formula into the expression of the proposed distance. We illustrate this property in Fig. 3 with an object consisting of a model of the Eiffel tower scaled to its actual dimensions, for two pairs of poses, each linked by a smallest displacement consisting of a rotation of \(15^{\circ }\) around a different axis. While the angle of the relative rotation is identical in both cases, the displacements of surface points differ considerably, and we visually tend to consider the poses in case (b) as farther from one another than those in case (a). Our framework formalizes this intuition: the distance between the poses in configuration (b) is approximately 2.1 times greater than the distance between the poses in configuration (a).
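To make this concrete, the rotation-anisotropy formula above can be checked numerically. The sketch below is our own illustration, not the authors' code: it approximates the surface integral by a weighted sum over sample points of a tall, thin object, and the helper names `axis_inertia` and `rotation_distance` are assumptions of ours.

```python
import numpy as np

def axis_inertia(points, weights, k):
    # I_k = (1/S) * sum_i mu_i * ||k x x_i||^2, a discrete version of the
    # surface integral defining the inertia moment along the unit axis k.
    k = k / np.linalg.norm(k)
    return np.sum(weights * np.sum(np.cross(k, points) ** 2, axis=1)) / np.sum(weights)

def rotation_distance(points, weights, k, theta):
    # d_no_sym between two poses differing by a rotation of angle theta
    # about axis k through the center of mass: 2 * sqrt(I_k) * sin(theta/2).
    return 2.0 * np.sqrt(axis_inertia(points, weights, k)) * abs(np.sin(theta / 2.0))

# A tall, thin point cloud (Eiffel-tower-like proportions), centered.
rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 3)) * np.array([10.0, 10.0, 150.0])
pts -= pts.mean(axis=0)
w = np.ones(len(pts))
theta = np.radians(15)

# Same rotation angle, different axes: tilting the object (about x) moves
# its surface points much more than spinning it about the vertical axis z.
d_z = rotation_distance(pts, w, np.array([0.0, 0.0, 1.0]), theta)
d_x = rotation_distance(pts, w, np.array([1.0, 0.0, 0.0]), theta)

# Cross-check against the direct definition (1/S) * sum mu ||R x - x||^2
# for the rotation about z.
c, s = np.cos(theta), np.sin(theta)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
direct = np.sqrt(np.sum(w * np.sum((pts @ Rz.T - pts) ** 2, axis=1)) / np.sum(w))
```

On this sample, `d_x` comes out roughly an order of magnitude larger than `d_z`, matching the intuition illustrated in Fig. 3.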
Fig. 3 Usual metrics would consider the distances between the two poses in cases (a) and (b) to be equal, as in both cases the two poses are linked by a rotation of \(15^{\circ }\) around the center of mass of the object. Our distance accounts for the object geometry and discriminates between these two configurations: \(d = 9.2\) m in configuration (a), \(d = 18.9\) m in configuration (b)

5 Efficient Distance Computation

The distance of Definition 2 and the simpler expression of Proposition 1 are of little direct practical use, as they contain a summation over the set of points of the object and a minimization over its proper symmetry group, both sets being potentially infinite.

In this section, we show how our proposed distance can be evaluated efficiently. To this aim, we propose a representation of a pose \(\mathscr {P}\) as a finite set of points \(\mathscr {R}(\mathscr {P})\) of a Euclidean space \(\mathbb {R}^N\) of at most 12 dimensions, depending on the object’s symmetries. We refer to an element of \(\mathscr {R}(\mathscr {P})\) as a representative of \(\mathscr {P}\), since a representative completely defines a pose (see Sect. 7).

Within this representation framework, and thanks to a reasoning similar to the one developed in the proof of Proposition 1, the distance between a pair of poses \(\mathscr {P}_1, \mathscr {P}_2\) can be expressed as the minimum Euclidean distance between their respective representatives, or equivalently, as the minimum Euclidean distance between a given representative of one pose and the representatives of the other:
$$\begin{aligned} {{\mathrm{d}}}(\mathscr {P}_1, \mathscr {P}_2) = \min _{\mathbf {p}_1 \in \mathscr {R}(\mathscr {P}_1), \mathbf {p}_2 \in \mathscr {R}(\mathscr {P}_2)} \Vert \mathbf {p}_2 - \mathbf {p}_1 \Vert = \min _{\mathbf {p}_2 \in \mathscr {R}(\mathscr {P}_2)} \Vert \mathbf {p}_2 - \mathbf {p}_1 \Vert \end{aligned}$$
for any \(\mathbf {p}_1 \in \mathscr {R}(\mathscr {P}_1)\).

The cardinality of \(\mathscr {R}(\mathscr {P})\) is independent of the pose considered, and depends solely on the class of proper symmetries of the object. We therefore denote it \(|\mathscr {R}(\bullet )|\). For most classes (objects with no proper symmetry, revolution objects without rotoreflection invariance, and spherical objects), a pose admits a single representative, which we will refer to as \(\mathscr {R}(\mathscr {P})\) by a slight abuse of notation. In such cases, \(\mathscr {R}\) can be considered an isometric embedding of \(\mathscr {C}\) into the Euclidean space \(\mathbb {R}^N\), and the distance between two poses simply corresponds to the Euclidean distance between their respective representatives.
The expression of pose representatives will be derived later in this section for the different classes of objects, and a synthesis is proposed in Table 3 for 3D objects, and in Table 4 for 2D ones.
Table 3

Proposed representatives for a pose \(\mathscr {P} = [(\mathbf {R} \in SO(3), \mathbf {t} \in \mathbb {R}^3)]\) of a 3D object depending on its proper symmetries:

   – Spherical symmetry (proper symmetry group \(G = SO(3)\)): \(\mathscr {R}(\mathscr {P}) = \mathbf {t} \in \mathbb {R}^3\).

   – Revolution symmetry without rotoreflection invariance (\(G = \left\{ \mathbf {R}_z^\alpha \vert \alpha \in \mathbb {R} \right\} \)): \(\mathscr {R}(\mathscr {P}) = (\lambda (\mathbf {R} \mathbf {e}_z)^\top , \mathbf {t}^\top )^\top \in \mathbb {R}^{6}\).

   – Revolution symmetry with rotoreflection invariance (\(G = \left\{ \mathbf {R}_x^\delta \mathbf {R}_z^\alpha \vert \delta \in \left\{ 0, \pi \right\} , \alpha \in \mathbb {R} \right\} \)): \(\mathscr {R}(\mathscr {P}) = \left\{ (\pm \lambda (\mathbf {R} \mathbf {e}_z)^\top , \mathbf {t}^\top )^\top \right\} \subset \mathbb {R}^{6}\).

   – No proper symmetry (\(G = \left\{ \mathbf {I} \right\} \)): \(\mathscr {R}(\mathscr {P}) = ({{\mathrm{vec}}}(\mathbf {R} \varvec{\varLambda })^\top , \mathbf {t}^\top )^\top \in \mathbb {R}^{12}\).

   – Finite nontrivial proper symmetry group \(G\): \(\mathscr {R}(\mathscr {P}) = \left\{ ({{\mathrm{vec}}}(\mathbf {R} \mathbf {G} \varvec{\varLambda })^\top , \mathbf {t}^\top )^\top \vert \mathbf {G} \in G \right\} \subset \mathbb {R}^{12}\).

Conventions: center of mass of the object as origin of the object frame; for revolution objects, revolution axis as \(\mathbf {e}_z\) axis of the object frame. \(\varvec{\varLambda }\triangleq \left( \frac{1}{S} \int _\mathscr {S} \mu (\mathbf {x}) \mathbf {x} \mathbf {x}^\top ds \right) ^{1/2}\), and \(\lambda \triangleq \sqrt{\lambda _r^2 + \lambda _z^2}\) for revolution objects, where \(\varvec{\varLambda }= \text {diag}(\lambda _r, \lambda _r, \lambda _z)\).

Table 4

Proposed representatives for a pose \(\mathscr {P} = [(\theta \in \mathbb {R}, \mathbf {t} \in \mathbb {R}^2)]\) of a 2D object depending on its proper symmetries:

   – Circular symmetry (\(G = SO(2)\)): \(\mathscr {R}(\mathscr {P}) = \mathbf {t} \in \mathbb {R}^2\).

   – No proper symmetry (\(G = \left\{ \mathbf {I} \right\} \)): \(\mathscr {R}(\mathscr {P}) = (\lambda e^{i \theta }, \mathbf {t}^\top )^\top \in \mathbb {R}^{4}\).

   – Cyclic symmetry of order \(n \in \mathbb {N}^*\) (\(G = \left\{ \mathbf {R}^{2 k \pi /n} \vert k \in \llbracket 0, n \llbracket \right\} \)): \(\mathscr {R}(\mathscr {P}) = \left\{ (\lambda e^{i (\theta + 2 k \pi / n)}, \mathbf {t}^\top )^\top \vert k \in \llbracket 0, n \llbracket \right\} \subset \mathbb {R}^{4}\).

Conventions: center of mass of the object as origin of the object frame; \(\forall \alpha \in \mathbb {R}, e^{i \alpha } = \left( \cos (\alpha ), \sin (\alpha ) \right) \), and \(\lambda \triangleq \left( \frac{1}{S} \int _\mathscr {S} \mu (\mathbf {x}) \Vert \mathbf {x} \Vert ^2 ds \right) ^{1/2}\).

5.1 Neighborhood Query

The distance formulation (21) is of great practical value, as it makes it possible to perform efficient radius searches and exact or approximate k-nearest-neighbor queries within a large set of poses, using any off-the-shelf neighborhood query algorithm designed for Euclidean spaces. Neighborhood queries are useful for numerous problems, and we provide an example in Sect. 10, where radius search is used heavily. Existing methods for neighborhood queries enable fast neighborhood retrieval within a set of points of a vector space, compared to a brute-force approach consisting in computing the distance to every point of the set. They make use of a specific search structure (such as a grid or a kD-tree) adapted to the properties of the considered metric space. A review of those algorithms is beyond the scope of this work; we simply refer the interested reader to the well-known FLANN library (Muja and Lowe 2009) as a starting point.

Let S be a finite set of poses. We consider the pointset R consisting of the aggregation of all representatives of the poses of S:
$$\begin{aligned} R = \bigcup \limits _{\mathscr {P} \in S} \mathscr {R}(\mathscr {P}). \end{aligned}$$
From (21), given a query pose \(\mathscr {Q}\) and one of its representatives \(\mathbf {q} \in \mathscr {R}(\mathscr {Q})\), the poses of S that are closer to \(\mathscr {Q}\) than a given distance \(\delta \) are the poses that have a representative closer to \(\mathbf {q}\) than \(\delta \) using the Euclidean distance.

Such representatives can be retrieved through a standard radius search operation around \(\mathbf {q}\) in R. The search for nearest neighbors can be performed in a similar fashion.
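As a minimal illustration of these queries, the sketch below assumes a revolution object with rotoreflection invariance, whose poses each have two 6D representatives. The helper `radius_search` is a hypothetical name of ours, and the brute-force scan stands in for an off-the-shelf search structure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_poses = 100
axes = rng.normal(size=(n_poses, 3))
axes /= np.linalg.norm(axes, axis=1, keepdims=True)   # unit revolution axes
ts = rng.uniform(-5.0, 5.0, size=(n_poses, 3))        # positions
lam = 2.0                                             # object scale factor

# Pooled pointset R: two representatives (+lam*a, t) and (-lam*a, t) per
# pose, with 'owners' mapping each representative back to its pose index.
reps = np.vstack([np.hstack([lam * axes, ts]), np.hstack([-lam * axes, ts])])
owners = np.concatenate([np.arange(n_poses), np.arange(n_poses)])

def radius_search(pool, owners, q, delta):
    # Brute-force Euclidean radius search over the pooled representatives;
    # returning a set merges duplicate hits of a same pose.
    d = np.linalg.norm(pool - q, axis=1)
    return set(owners[d <= delta].tolist())

q = reps[0]                                  # a representative of pose 0
hits = radius_search(reps, owners, q, delta=1.0)
```

In an actual application, the linear scan would be replaced by a search structure built once over `reps` (a kd-tree, for instance), which is precisely what formulation (21) enables.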

One should nevertheless be careful about potential duplicates in those operations, as a pose may have several representatives, depending on the proper symmetries of the object. Nonetheless, the absence of duplicates is locally guaranteed around the query point in an open ball of radius T / 2, where T is the minimum distance between distinct representatives of a pose:
$$\begin{aligned} T \triangleq \min _{\mathbf {p}_1, \mathbf {p}_2 \in \mathscr {R}(\mathscr {P}), \, \mathbf {p}_1 \ne \mathbf {p}_2} \Vert \mathbf {p}_2 - \mathbf {p}_1 \Vert . \end{aligned}$$

T can be computed considering an arbitrary pose \(\mathscr {P}\) because of the invariance of our underlying metric to the choice of a reference pose (see Sect. 4.2), and considering an arbitrary representative \(\mathbf {p} \in \mathscr {R}(\mathscr {P})\) because of the symmetry properties of representatives described in Sect. 6.

5.2 Decomposition Into Translation and Rotation Parts

From this point on, we consider a direct orthonormal coordinate system \((\mathbf {O}, \mathbf {e}_x, \mathbf {e}_y, \mathbf {e}_z)\). As in Sect. 2.3 on the proper symmetry group, we assume that the origin \(\mathbf {O}\) of the object frame is a point of the object that is invariant with respect to its proper symmetries. It is, for example, chosen at the center of a spherical object, and on the revolution axis of a revolution object. With this choice, the proper symmetry group of the object can be considered a group of rotations around the origin, and we therefore identify proper symmetries with rotation matrices. We exploit this property to expand the inner term of the expression (10) of the squared distance into:
$$\begin{aligned} \begin{aligned}&\Vert (\mathbf {T}_2 \circ \mathbf {G}_2) (\mathbf {x}) - (\mathbf {T}_1 \circ \mathbf {G}_1) (\mathbf {x})\Vert ^2 \\&\quad = \Vert \mathbf {R}_2 \mathbf {G}_2 \mathbf {x} + \mathbf {t}_2 - (\mathbf {R}_1 \mathbf {G}_1 \mathbf {x} + \mathbf {t}_1)\Vert ^2 \\&\quad = \Vert \mathbf {R}_2 \mathbf {G}_2 \mathbf {x} - \mathbf {R}_1 \mathbf {G}_1 \mathbf {x} \Vert ^2 + \Vert \mathbf {t}_2 - \mathbf {t}_1\Vert ^2 \\&\qquad +\, 2 (\mathbf {t}_2 - \mathbf {t}_1)^\top (\mathbf {R}_2 \mathbf {G}_2 - \mathbf {R}_1 \mathbf {G}_1) \mathbf {x}. \end{aligned} \end{aligned}$$
We add the further constraint that the origin of the object frame is the center of mass of the object’s surface, i.e. \(\int _\mathscr {S} \mu (\mathbf {x}) \mathbf {x} ds = \mathbf {0}\). This constraint is compatible with the previous one because the center of mass is unique, and therefore has to be left unchanged by the proper symmetries of the object. Thanks to this choice, the last term of (24) disappears during the integration, and the squared distance (10) can therefore be decomposed into a translation and a rotation part:
$$\begin{aligned}&{{\mathrm{d}}}^2(\mathscr {P}_1, \mathscr {P}_2) = \Vert \mathbf {t}_2 - \mathbf {t}_1\Vert ^2 \nonumber \\&\quad + \underbrace{\min _{\mathbf {G}_1, \mathbf {G}_2 \in G } \frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) \Vert \mathbf {R}_2 \mathbf {G}_2 \mathbf {x} - \mathbf {R}_1 \mathbf {G}_1 \mathbf {x} \Vert ^2 ds}_{{{\mathrm{d_{rot}^2}}}(\mathbf {R}_1, \mathbf {R}_2)}. \end{aligned}$$
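The vanishing of the cross term can be checked numerically. The following sketch is our own illustration, assuming the object is given as weighted surface samples and \(G = \{\mathbf {I}\}\); it verifies that the squared distance splits exactly into translation and rotation parts once the weighted center of mass is at the origin.

```python
import numpy as np

def rodrigues(axis, theta):
    # Rotation matrix about 'axis' by angle theta (Rodrigues' formula).
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def dist2_no_sym(points, weights, R1, t1, R2, t2):
    # (1/S) * sum_i mu_i ||(R2 x_i + t2) - (R1 x_i + t1)||^2
    diff = points @ (R2 - R1).T + (t2 - t1)
    return np.sum(weights * np.sum(diff ** 2, axis=1)) / np.sum(weights)

rng = np.random.default_rng(2)
pts = rng.normal(size=(500, 3))
w = rng.uniform(0.5, 1.5, size=500)
pts -= np.average(pts, axis=0, weights=w)   # weighted center of mass at origin

R1, t1 = rodrigues(np.array([1.0, 2.0, 0.5]), 0.3), np.array([0.1, 0.0, -0.2])
R2, t2 = rodrigues(np.array([0.0, 1.0, 1.0]), -0.7), np.array([1.0, 0.5, 0.3])

total = dist2_no_sym(pts, w, R1, t1, R2, t2)
rot_part = dist2_no_sym(pts, w, R1, np.zeros(3), R2, np.zeros(3))
trans_part = float(np.dot(t2 - t1, t2 - t1))
# total == trans_part + rot_part: the cross term integrates to zero.
```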
In the following subsections, we show how this rotation part can be simplified, and how it leads to the notion of pose representatives.

5.3 Object with No Proper Symmetries

Let us consider the case of an object showing no proper symmetries. The proper symmetry group of this object reduces to the identity rotation, \(G=\left\{ \mathbf {I} \right\} \); the rotation part of the squared distance (25) can therefore be expressed as follows:
$$\begin{aligned} {{\mathrm{d_{rot}^2}}}(\mathscr {P}_1, \mathscr {P}_2) = \frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) \Vert \mathbf {R}_2 \mathbf {x} - \mathbf {R}_1 \mathbf {x} \Vert ^2 ds. \end{aligned}$$
Let us show how, for any given pose \(\mathscr {P} \in \mathscr {C}\), one can define a representative \(\mathscr {R}(\mathscr {P}) \in \mathbb {R}^{12}\) such that the distance between two poses in \(\mathscr {C}\) is the same as the distance between their representatives in \(\mathbb {R}^{12}\). Our approach is inspired by the work of Kazerounian and Rastegar (1992) and Chirikjian and Zhou (1998).
Let \(\varvec{\varLambda }\) be the symmetric positive semi-definite square root of the covariance matrix of the object's weighted surface:
$$\begin{aligned} \varvec{\varLambda }\triangleq \left( \frac{1}{S} \int _\mathscr {S} \mu (\mathbf {x}) \mathbf {x} \mathbf {x}^\top ds \right) ^{1/2}. \end{aligned}$$
\(\varvec{\varLambda }\) does not depend on the considered pose and can therefore be computed once and for all for a given rigid object in a preprocessing step. We provide in Appendix C formulas to compute \(\varvec{\varLambda }\) when \(\mathscr {S}\) is the surface of a triangular mesh.
Rewriting the inner part of (26) with a trace operator,
$$\begin{aligned} \Vert \mathbf {R}_2 \mathbf {x} - \mathbf {R}_1 \mathbf {x} \Vert ^2 = {{\mathrm{Tr}}}\left( (\mathbf {R}_2 - \mathbf {R}_1 ) \mathbf {x} \mathbf {x}^\top (\mathbf {R}_2 - \mathbf {R}_1)^\top \right) , \end{aligned}$$
one can express the rotation part of the squared distance in a closed form as a weighted Frobenius square distance between the rotation matrices:
$$\begin{aligned} \begin{aligned} {{\mathrm{d_{rot}^2}}}(\mathscr {P}_1, \mathscr {P}_2)&= {{\mathrm{Tr}}}\left( (\mathbf {R}_2 - \mathbf {R}_1 ) \varvec{\varLambda }^2 (\mathbf {R}_2 - \mathbf {R}_1)^\top \right) \\&= \Vert \mathbf {R}_2 \varvec{\varLambda }- \mathbf {R}_1 \varvec{\varLambda }\Vert _F^2. \end{aligned} \end{aligned}$$
Therefore, denoting by \({{\mathrm{vec}}}\) the operator vectorizing a matrix columnwise into a column vector, we can define an isometry \(\mathscr {R}\) from the pose space into the 12-dimensional Euclidean space:
$$\begin{aligned} \mathscr {R}(\mathscr {P}) = \left( {{\mathrm{vec}}}(\mathbf {R} \varvec{\varLambda })^\top , \mathbf {t}^\top \right) ^\top \in \mathbb {R}^{12}. \end{aligned}$$
The conversion from a pose represented in terms of a rotation matrix \(\mathbf {R}\) and a translation vector \(\mathbf {t}\) to its representative in \(\mathbb {R}^{12}\) is direct, since it consists of a simple linear operation. If the object frame is moreover chosen aligned with the principal axes of the object, \(\varvec{\varLambda }\) is diagonal, making the computation of the pose representative even cheaper.
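The isometry can be verified numerically. The sketch below is illustrative code under the stated assumptions, not the authors' implementation: it estimates \(\varvec{\varLambda }\) from weighted surface samples, builds the 12D representatives, and checks that their Euclidean distance matches the direct evaluation of the distance integral.

```python
import numpy as np

def sqrtm_psd(M):
    # Principal square root of a symmetric positive semi-definite matrix.
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def rodrigues(axis, theta):
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

rng = np.random.default_rng(3)
pts = rng.normal(size=(500, 3)) * np.array([1.0, 2.0, 3.0])
pts -= pts.mean(axis=0)            # center of mass at the origin
w = np.ones(500)
S = w.sum()

# Lambda = ((1/S) * sum mu x x^T)^(1/2), computed once per object.
Lam = sqrtm_psd((pts * w[:, None]).T @ pts / S)

def representative(R, t):
    # 12D representative (vec(R @ Lambda), t), vec taken columnwise.
    return np.concatenate([(R @ Lam).flatten(order="F"), t])

R1, t1 = rodrigues(np.array([0.2, 1.0, 0.1]), 0.8), np.array([0.0, 0.0, 0.0])
R2, t2 = rodrigues(np.array([1.0, 0.0, 1.0]), -0.4), np.array([2.0, -1.0, 0.5])

d_rep = np.linalg.norm(representative(R2, t2) - representative(R1, t1))
diff = pts @ (R2 - R1).T + (t2 - t1)
d_direct = np.sqrt(np.sum(w * np.sum(diff ** 2, axis=1)) / S)
# d_rep == d_direct up to numerical precision.
```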

5.4 Revolution Object Without Rotoreflection Invariance

We now consider the case of a revolution object without rotoreflection invariance. As stated in Sect. 5.2, we assume that the origin of the object frame corresponds to the center of mass of the object. Without loss of generality, we moreover assume that the axis \(\mathbf {e}_z\) of the object frame is aligned with the revolution axis. A pose \(\mathscr {P}\) is thus defined up to a rotation \(\mathbf {R}_z^\phi \) about the \(\mathbf {e}_z\) axis, where \(\phi \) is the angle of the considered rotation, and the proper symmetry group of the object is \(G = \left\{ \mathbf {R}_z^\phi \vert \phi \in \mathbb {R} \right\} \).

The simplification introduced in Sect. 5.3 to remove the integral from the distance expression, via the matrix \(\varvec{\varLambda }\), is also valid here. Moreover, because \((\mathbf {O}, \mathbf {e}_z)\) is the revolution axis of the object, \(\varvec{\varLambda }\) is necessarily diagonal, of the form
$$\begin{aligned} \varvec{\varLambda }= \left( \begin{array}{ccc} \lambda _r &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad \lambda _r &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad \lambda _z \\ \end{array} \right) \end{aligned}$$
with \(\lambda _r, \lambda _z \in \mathbb {R}^+\). This enables us to express the rotation part of the distance as a simple scaled distance between the revolution axes seen as 3D vectors:
$$\begin{aligned} {{\mathrm{d_{rot}^2}}}(\mathscr {P}_1, \mathscr {P}_2) = \lambda ^2 \Vert \mathbf {R}_2 \mathbf {e}_z - \mathbf {R}_1 \mathbf {e}_z \Vert ^2 \end{aligned}$$
with \(\lambda \triangleq \sqrt{\lambda _r^2 + \lambda _z^2}\). The reader is referred to Appendix A for a proof of this result.
Therefore, similarly to what we proposed for an object without proper symmetry, we can define a simple isometry \(\mathscr {R}\) which associates to a pose of a revolution object without rotoreflection invariance a 6D vector, consisting of the concatenation of the coordinates of its scaled revolution axis and of its position, in order to evaluate distances efficiently:
$$\begin{aligned} \mathscr {R}(\mathscr {P}) = \left( \lambda (\mathbf {R} \mathbf {e}_z)^\top , \mathbf {t}^\top \right) ^\top \in \mathbb {R}^{6}. \end{aligned}$$

5.5 Spherical Object

We now consider the simpler case of an object with spherical symmetry. Choosing the center of the object as origin of the object frame, the proper symmetry group of the object is the whole rotation group \(\textit{SO}(3)\). The rotation part of the distance (25) can thus be rewritten as follows:
$$\begin{aligned} {{\mathrm{d_{rot}^2}}}(\mathscr {P}_1, \mathscr {P}_2) = \min _{\mathbf {R}_1, \mathbf {R}_2 \in SO(3)} \left( \frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) \Vert \mathbf {R}_2 \mathbf {x} - \mathbf {R}_1 \mathbf {x} \Vert ^2 ds \right) . \end{aligned}$$
This term vanishes, the minimum being reached for \(\mathbf {R}_1 = \mathbf {R}_2\). Therefore, the pose space of a spherical object can also be isometrically embedded into \(\mathbb {R}^3\), by representing a pose by the position of the object's center:
$$\begin{aligned} \mathscr {R}(\mathscr {P}) = \mathbf {t} \in \mathbb {R}^3. \end{aligned}$$

5.6 Revolution Object with Rotoreflection Invariance

Let us consider the case of a revolution object with rotoreflection invariance, i.e. one having a reflection symmetry with respect to a plane orthogonal to its revolution axis. With the same constraints on the choice of the object frame as for a revolution object without rotoreflection invariance, the proper symmetry group of such an object can be written as follows:
$$\begin{aligned} G = \left\{ \mathbf {R}_x^\delta \mathbf {R}_z^\alpha \;|\; \alpha \in \mathbb {R}, \delta \in \left\{ 0, \pi \right\} \right\} . \end{aligned}$$
Therefore, the distance between two poses \(\mathscr {P}_1, \mathscr {P}_2\) can be expressed as:
$$\begin{aligned} \min _{\delta _1, \delta _2, \phi _1, \phi _2} {{\mathrm{d_{no\_sym}}}}\left( (\mathbf {R}_1 \mathbf {R}_x^{\delta _1} \mathbf {R}_z^{\phi _1}, \mathbf {t}_1), (\mathbf {R}_2 \mathbf {R}_x^{\delta _2} \mathbf {R}_z^{\phi _2}, \mathbf {t}_2)\right) . \end{aligned}$$
We discussed in Sect. 5.4 how to compute such an expression relative to the symmetries about the revolution axis. Therefore, using result (32), our distance can be rewritten as the minimum Euclidean distance between 6D points, two being assigned to each pose:
$$\begin{aligned} {{\mathrm{d}}}(\mathscr {P}_1, \mathscr {P}_2) = \min _{\delta _1, \delta _2 \in \left\{ 0, \pi \right\} } \Vert \mathbf {p}_2^{\delta _2} - \mathbf {p}_1^{\delta _1}\Vert \end{aligned}$$
with \(\mathbf {p}_i^\delta = \left( \lambda (\mathbf {R}_i \mathbf {R}_x^{\delta } \mathbf {e}_z)^\top , \mathbf {t}_i^\top \right) ^\top \in \mathbb {R}^6\) the representatives of pose \(\mathscr {P}_i\), for \(\delta \in \left\{ 0, \pi \right\} \) and \(i=1,2\).
Simplifying the representative expression given that \(\mathbf {R}_x^{0} \mathbf {e}_z = \mathbf {e}_z\) and \(\mathbf {R}_x^{\pi } \mathbf {e}_z = -\mathbf {e}_z\), we see that a pose of a revolution object with rotoreflection invariance can be represented by two 6D vectors, each consisting of the concatenation of the coordinates of its scaled revolution axis and of its position, the two potential orientations of the axis each being taken into account by one representative:
$$\begin{aligned} \mathscr {R}(\mathscr {P}) = \left\{ \left( \pm \lambda (\mathbf {R} \mathbf {e}_z)^\top , \mathbf {t}^\top \right) ^\top \right\} \subset \mathbb {R}^{6}. \end{aligned}$$

5.7 Object with a Nontrivial Finite Proper Symmetry Group

The last type of 3D object to deal with is the case of an object with a finite proper symmetry group G different from the identity, such as the object depicted in Table 2e. The proposed distance between two poses of such an object can be written as:
$$\begin{aligned} \min _{\mathbf {G}_1, \mathbf {G}_2 \in G} {{\mathrm{d_{no\_sym}}}}((\mathbf {R}_1 \mathbf {G}_1, \mathbf {t}_1), (\mathbf {R}_2 \mathbf {G}_2, \mathbf {t}_2)). \end{aligned}$$
We showed in Sect. 5.3 that the pose of an object without proper symmetry can be represented as a 12D point, such that the distance between two poses of such an object corresponds to the Euclidean distance between their respective representatives. Therefore, it is straightforward to conclude that the pose of an object with a finite proper symmetry group can be represented by a finite set of 12D representative points, such that the distance between two poses corresponds to the minimum Euclidean distance between their respective representatives:
$$\begin{aligned} \mathscr {R}(\mathscr {P}) = \left\{ \left( {{\mathrm{vec}}}(\mathbf {R} \mathbf {G} \varvec{\varLambda })^\top , \mathbf {t}^\top \right) ^\top \vert \mathbf {G} \in G \right\} \subset \mathbb {R}^{12}. \end{aligned}$$
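As an illustration, the sketch below is our own example assuming a hypothetical object with a cyclic proper symmetry group of order 4 about the z axis, for which a diagonal \(\varvec{\varLambda }\) of the form \(\text {diag}(\lambda _r, \lambda _r, \lambda _z)\) commutes with every symmetry.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

G = [rot_z(k * np.pi / 2.0) for k in range(4)]   # cyclic group C4 about z
Lam = np.diag([2.0, 2.0, 1.0])                   # commutes with every G in G

def representatives(R, t):
    # One 12D point (vec(R G Lambda), t) per proper symmetry G.
    return [np.concatenate([(R @ Gk @ Lam).flatten(order="F"), t]) for Gk in G]

def pose_distance(R1, t1, R2, t2):
    # A single representative of pose 1 against all representatives of
    # pose 2 suffices, as in formulation (21).
    p1 = representatives(R1, t1)[0]
    return min(np.linalg.norm(p2 - p1) for p2 in representatives(R2, t2))

t = np.zeros(3)
R = rot_z(0.3)
# Two transformations differing by a proper symmetry describe the same pose:
d_same = pose_distance(R, t, R @ rot_z(np.pi / 2.0), t)   # essentially zero
d_diff = pose_distance(R, t, rot_z(0.9), t)               # strictly positive
```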

5.8 2D Object

The notion of pose representative can be applied to 2D objects as well. For the sake of conciseness, we will only discuss the case of a 2D object with no proper symmetry, as the reasoning is very similar to the one performed for 3D objects. The full list of proposed representatives is given in Table 4.

The decomposition of the squared distance between two poses into translation and rotation terms (25), and the expression of the rotation part as a Frobenius norm (28), are still valid in the 2D case, but they can be simplified even further. Indeed, a 2D rotation matrix can be parametrized by an angle \(\theta \) as follows:
$$\begin{aligned} \mathbf {R}^\theta = \left( \begin{array}{cc} \cos (\theta ) &{}\quad -\sin (\theta ) \\ \sin (\theta ) &{}\quad \cos (\theta ) \\ \end{array} \right) . \end{aligned}$$
Introducing the elements of the covariance matrix
$$\begin{aligned} \varvec{\varLambda }^2 = \left( \begin{array}{cc} \lambda _{xx}^2 &{}\quad \lambda _{xy}^2 \\ \lambda _{xy}^2 &{}\quad \lambda _{yy}^2 \\ \end{array} \right) , \end{aligned}$$
the rotation part can be simplified into
$$\begin{aligned} \begin{aligned} {{\mathrm{d_{rot}^2}}}(\mathscr {P}_1, \mathscr {P}_2)&= {{\mathrm{Tr}}}\left( (\mathbf {R}^{\theta _2} - \mathbf {R}^{\theta _1} ) \varvec{\varLambda }^2 (\mathbf {R}^{\theta _2} - \mathbf {R}^{\theta _1} )^\top \right) \\&= (\lambda _{xx}^2 + \lambda _{yy}^2) \Vert e^{i \theta _2} - e^{i \theta _1} \Vert ^2 \end{aligned} \end{aligned}$$
where \(e^{i \theta } \triangleq (\cos (\theta ), \sin (\theta ))\). Therefore, we can include in our framework a 2D object without proper symmetry, and represent a pose of such an object by a 4D vector, consisting of the concatenation of the coordinates of its scaled complex orientation and of its position:
$$\begin{aligned} \mathscr {R}(\mathscr {P}) = \left( \lambda e^{i \theta }, \mathbf {t}^\top \right) ^\top \in \mathbb {R}^{4}. \end{aligned}$$

6 Symmetry Within Representatives

Objects with finite nontrivial symmetry groups and revolution objects with rotoreflection invariance admit several representatives per pose. This multiplicity of representatives expresses the proper symmetries of the object that are not accounted for in the expression of a single representative, and leads to symmetry properties within the set of representatives itself. Formally, for a given object, we define a finite group of symmetry operations \(G_\mathscr {R}\) on the ambient space \(\mathbb {R}^N\). This group reduces to the identity for objects admitting a single representative per pose, and is defined in Table 5 for the other object classes. In this section we discuss some properties of this group; they will be used in Sect. 8.2 to propose a method for properly averaging poses.
Table 5

Proposed symmetry operations on the ambient space for objects with multiple representatives per pose:

   – 3D object with a finite nontrivial proper symmetry group: \(G_\mathscr {R} = \left\{ s_{\mathbf {G} } \vert \mathbf {G} \in G \right\} \), with \(s_{\mathbf {G}} : \mathbb {R}^{12} \rightarrow \mathbb {R}^{12}, \; ({{\mathrm{vec}}}(\mathbf {M})^\top , \mathbf {t}^\top )^\top \mapsto ({{\mathrm{vec}}}(\mathbf {M} \mathbf {G})^\top , \mathbf {t}^\top )^\top \).

   – 3D revolution object with rotoreflection invariance: \(G_\mathscr {R} = \left\{ s_{\text {rev}, \delta } \vert \delta = \pm 1 \right\} \), with \(s_{\text {rev}, \delta } : \mathbb {R}^{6} \rightarrow \mathbb {R}^{6}, \; (\mathbf {a}^\top , \mathbf {t}^\top )^\top \mapsto (\delta \mathbf {a}^\top , \mathbf {t}^\top )^\top \).

   – 2D object with cyclic symmetry of order \(n \in \mathbb {N}^*\): \(G_\mathscr {R} = \left\{ s_{\text {2D}, n, k} \vert k \in \llbracket 0, n \llbracket \right\} \), with \(s_{\text {2D}, n, k} : \mathbb {R}^{4} \rightarrow \mathbb {R}^{4}, \; (\mathbf {a}^\top , \mathbf {t}^\top )^\top \mapsto ((e^{i 2 k \pi / n} \cdot \mathbf {a})^\top , \mathbf {t}^\top )^\top \).

We decompose a point of the ambient space \(\mathbb {R}^N\) into two parts as follows, depending on the dimension N of the space:

   – \(({{\mathrm{vec}}}(\mathbf {M})^\top , \mathbf {t}^\top )^\top \) for a 12D space, with \(\mathbf {M} \in \mathscr {M}_{3,3}(\mathbb {R})\) and \(\mathbf {t} \in \mathbb {R}^3\).

   – \((\mathbf {a}^\top , \mathbf {t}^\top )^\top \) for a 6D space, with \(\mathbf {a}, \mathbf {t} \in \mathbb {R}^3\).

   – \((\mathbf {a}^\top , \mathbf {t}^\top )^\top \) for a 4D space, with \(\mathbf {a}, \mathbf {t} \in \mathbb {R}^2\).

In the 4D case, we use the complex multiplication notation, assimilating \(\mathbf {a}\) to a complex number.

First, we ensure that the proposed group is well defined:

Proposition 2

\(G_\mathscr {R}\) is a group for the composition operation.


This property derives directly from the group properties of \(G\), \(\lbrace 1, -1 \rbrace \) and \(\lbrace e^{i 2 k \pi / n} \vert k \in \llbracket 0, n \llbracket \rbrace \) under multiplication. \(\square \)

Then, we introduce the following lemma, which somehow expresses the fact that the geometry of the object is consistent with the object’s symmetries:

Lemma 1

For any proper symmetry \(\mathbf {G} \in G\), \(\mathbf {G}\) and \(\varvec{\varLambda }\) commute, i.e. \(\mathbf {G} \varvec{\varLambda }= \varvec{\varLambda }\mathbf {G}\).


Let \(\mathbf {G}\) be a proper symmetry in G. By definition of \(\varvec{\varLambda }^2\),
$$\begin{aligned} \mathbf {G} \varvec{\varLambda }^2 = \frac{1}{S} \int _\mathscr {S} \mu (\mathbf {x}) \mathbf {G} \mathbf {x} \mathbf {x}^\top ds. \end{aligned}$$
Performing the change of variable \(\mathbf {x} \leftarrow \mathbf {G} \mathbf {x}\) allows rewriting this expression as:
$$\begin{aligned} \frac{1}{S} \int _{\mathbf {G}(\mathscr {S})} \mu (\mathbf {G}^{-1} \mathbf {x}) \mathbf {x} (\mathbf {G}^{-1} \mathbf {x})^\top ds. \end{aligned}$$
Thanks to the invariance of \(\mathscr {S}\) and \(\mu \) to the proper symmetries of the object, we exhibit back \(\varvec{\varLambda }^2\) as follows:
$$\begin{aligned} \begin{aligned} \mathbf {G} \varvec{\varLambda }^2&= \frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) \mathbf {x} \mathbf {x}^\top \mathbf {G}^{-\top } ds \\&= \varvec{\varLambda }^2 \mathbf {G}^{-\top }. \end{aligned} \end{aligned}$$
\(\mathbf {G}\) being a rotation, \(\mathbf {G}^{-\top } = \mathbf {G}\), and therefore \(\mathbf {G}\) and \(\varvec{\varLambda }^2\) commute, i.e.
$$\begin{aligned} \mathbf {G} \varvec{\varLambda }^2 = \varvec{\varLambda }^2 \mathbf {G}. \end{aligned}$$
Moreover, as a positive semi-definite symmetric matrix, \(\varvec{\varLambda }^2\) admits an eigenvalue decomposition \(\varvec{\varLambda }^2 = \mathbf {U} \mathbf {D} \mathbf {U}^\top \), where \(\mathbf {U} \in SO(3)\) and \(\mathbf {D}\) is a positive semi-definite diagonal matrix. Injecting this decomposition into the right hand side of Eq. 48, we observe that \(\mathbf {G}^\top \mathbf {U}\) is also an eigenbasis of \(\varvec{\varLambda }^2\):
$$\begin{aligned} \varvec{\varLambda }^2 = (\mathbf {G}^\top \mathbf {U}) \mathbf {D} (\mathbf {G}^\top \mathbf {U})^\top . \end{aligned}$$
\(\varvec{\varLambda }\) being the principal square root of \(\varvec{\varLambda }^2\), both share the same eigenspaces, thus:
$$\begin{aligned} \left\{ \begin{array}{l} \varvec{\varLambda }= \mathbf {U} \mathbf {D}^{1/2} \mathbf {U}^\top \\ \varvec{\varLambda }= (\mathbf {G}^\top \mathbf {U}) \mathbf {D}^{1/2} (\mathbf {G}^\top \mathbf {U})^\top . \end{array} \right. \end{aligned}$$
Therefore, injecting the first equality into the second one, we proved that
$$\begin{aligned} \varvec{\varLambda }= \mathbf {G}^\top \varvec{\varLambda }\mathbf {G} \end{aligned}$$
i.e., that \(\mathbf {G}\) and \(\varvec{\varLambda }\) commute: \(\mathbf {G} \varvec{\varLambda }= \varvec{\varLambda }\mathbf {G}\). \(\square \)
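Lemma 1 can also be checked numerically on a synthetic symmetric object: replicating an arbitrary point set by every element of a cyclic group yields a G-invariant pointset, and the resulting \(\varvec{\varLambda }\) indeed commutes with each symmetry. The sketch below is our own illustration, with uniform weights assumed.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def sqrtm_psd(M):
    # Principal square root of a symmetric positive semi-definite matrix.
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

# A pointset invariant under the cyclic group C3 about z, obtained by
# replicating arbitrary seed points by every group element.
rng = np.random.default_rng(4)
seed = rng.normal(size=(200, 3))
G = [rot_z(2.0 * np.pi * k / 3.0) for k in range(3)]
pts = np.vstack([seed @ Gk.T for Gk in G])

Lam = sqrtm_psd(pts.T @ pts / len(pts))   # Lambda = (second moment)^(1/2)
commutes = all(np.allclose(Gk @ Lam, Lam @ Gk) for Gk in G)
```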

Thanks to this lemma, we can now establish the following three properties of those symmetries of the ambient space:

Proposition 3

\(G_\mathscr {R}\) contains \(|\mathscr {R}(\bullet )|\) elements, and given a pose \(\mathscr {P}\) and one of its representatives \(\mathbf {p} \in \mathscr {R}(\mathscr {P})\), the set of images of \(\mathbf {p}\) under \(G_\mathscr {R}\) (including \(\mathbf {p}\) itself) is the whole set of representatives of the pose, i.e.
$$\begin{aligned} \left\{ s(\mathbf {p}) \vert s \in G_\mathscr {R} \right\} = \mathscr {R}(\mathscr {P}). \end{aligned}$$


This proposition is easily verified from the expressions of pose representatives in Tables 3 and 4. The only subtlety is the case of a 3D object with a finite proper symmetry group. In this case, for any \(\mathbf {G} \in G\), the image under \(s_{\mathbf {G}}\) of a pose representative \(({{\mathrm{vec}}}(\mathbf {R} \varvec{\varLambda })^\top , \mathbf {t}^\top )^\top \), where \(\mathbf {R} \in \mathscr {M}_{3, 3}(\mathbb {R})\) and \(\mathbf {t} \in \mathbb {R}^3\), can be expressed as
$$\begin{aligned} s_{\mathbf {G}} \left( ({{\mathrm{vec}}}(\mathbf {R} \varvec{\varLambda })^\top , \mathbf {t}^\top )^\top \right) = ({{\mathrm{vec}}}(\mathbf {R} \varvec{\varLambda }\mathbf {G})^\top , \mathbf {t}^\top )^\top , \end{aligned}$$
which according to Lemma 1 is equal to \(({{\mathrm{vec}}}(\mathbf {R} \mathbf {G} \varvec{\varLambda })^\top , \mathbf {t}^\top )^\top \). By definition of representatives for such object, it is therefore a representative of the same pose \(\mathscr {P}\). \(\square \)

Proposition 4

Elements of \(G_\mathscr {R}\) are linear transformations of the ambient space, i.e. for \(s \in G_\mathscr {R}\), and for any \(\mathbf {x}_1, \mathbf {x}_2 \in \mathbb {R}^N\), and \(\alpha \in \mathbb {R}\),
$$\begin{aligned} s(\mathbf {x}_1 + \alpha \mathbf {x}_2) = s(\mathbf {x}_1) + \alpha s(\mathbf {x}_2). \end{aligned}$$

This proposition is a direct consequence of the definition of the symmetries described in Table 5. \(\square \)

Proposition 5

Elements of \(G_\mathscr {R}\) are automorphisms of the ambient space: for any \(s \in G_\mathscr {R}\), s is bijective, and for any \(\mathbf {x}_1, \mathbf {x}_2 \in \mathbb {R}^N\),
$$\begin{aligned} \Vert s(\mathbf {x}_2) - s(\mathbf {x}_1) \Vert = \Vert \mathbf {x}_2 - \mathbf {x}_1 \Vert . \end{aligned}$$


Bijectivity is straightforward since
  • \((s_{\mathbf {G}})^{-1} = s_{\mathbf {G}^{-1}}\) for any \(\mathbf {G} \in G\).

  • \((s_{\text {rev}, \delta })^{-1} = s_{\text {rev}, \delta }\) for any \(\delta \in \left\{ -1, 1 \right\} \).

  • \((s_{\text {2D}, n, k})^{-1} = s_{\text {2D}, n, n-k}\) for any \(k \in \llbracket 0, n \llbracket \), indices being taken modulo n.

The morphism property comes from the linearity of those symmetry operations (Proposition 4) and from the fact that they preserve the Euclidean norm, since multiplication by an element of \(G\), of \(\lbrace 1, -1 \rbrace \) or of \(\lbrace e^{i 2 k \pi / n} \vert k \in \llbracket 0, n \llbracket \rbrace \) is norm-preserving. \(\square \)

7 Projection Onto the Pose Space

In Sect. 5, we discussed how a pose \(\mathscr {P}\) can be identified with a finite pointset \(\mathscr {R}(\mathscr {P})\) of a Euclidean space \(\mathbb {R}^N\) of finite dimension, and how elements of \(\mathscr {R}(\mathscr {P})\) can be computed easily from any rigid transformation \((\mathbf {R}, \mathbf {t})\) associated with the pose. The backward mapping is also possible: from any element of \(\mathscr {R}(\mathscr {P})\), one can compute a rigid transformation fully describing the pose \(\mathscr {P}\). This is why we consider an element of \(\mathscr {R}(\mathscr {P})\) a representative of \(\mathscr {P}\). This computation is actually straightforward given the expressions of pose representatives (see Tables 3 and 4), so we choose to discuss this assertion in the more general framework of projection onto the pose space: given an arbitrary N-D vector \(\mathbf {x}\), find the pose whose representative is closest to \(\mathbf {x}\). The results of this section will be useful in Sect. 8 to propose a method for pose averaging.

In nondegenerate cases, the projection is unique, and we propose in the next subsections its expression for the different classes of bounded objects, based on the computation of the closest pose representative to the query point.

7.1 Spherical Object

Projection is trivial in the case of a spherical object, since all points of \(\mathbb {R}^3\) are valid representatives of poses. A point \(\mathbf {x} \in \mathbb {R}^3\) therefore projects onto the pose having \(\mathbf {x}\) for representative, namely the pose in which the center of the object admits \(\mathbf {x}\) for 3D coordinates.

7.2 Object of Revolution

In the case of an object of revolution without rotoreflection invariance, the position of the center of mass and the oriented revolution axis of the object are well defined at any given pose. Conversely, a pose can be defined by the position \(\mathbf {t}\) of its center of mass and its oriented revolution axis, which we represent by a normalized vector \(\mathbf {a} \in \mathbb {R}^3\). The unique representative of such a pose is \((\lambda \mathbf {a}^\top , \mathbf {t}^\top )^\top \), as defined in Sect. 5.4.

Let \(\mathbf {x} \in \mathbb {R}^6\) be a point to project onto the pose space. Without loss of generality, \(\mathbf {x}\) can be split into two parts: \(\mathbf {x} = (\mathbf {x}_r^\top , \mathbf {x}_t^\top )^\top \) with \(\mathbf {x}_r, \mathbf {x}_t \in \mathbb {R}^3\). The projection problem can therefore be reformulated into:
$$\begin{aligned} \begin{aligned} {{\mathrm{proj}}}(\mathbf {x})&= \mathop {\hbox {argmin}}\limits _{\mathscr {P}} \Vert \mathbf {x} - \mathscr {R}(\mathscr {P}) \Vert ^2 \\&= \mathop {\hbox {argmin}}\limits _{\mathbf {a}, \mathbf {t} \in \mathbb {R}^3 / \Vert \mathbf {a}\Vert = 1} \left( \Vert \mathbf {x}_r - \lambda \mathbf {a}\Vert ^2 + \Vert \mathbf {x}_t - \mathbf {t}\Vert ^2 \right) \\ \end{aligned} \end{aligned}$$
This problem admits a unique solution as long as \( \mathbf {x}_r \ne \mathbf {0}\), and in that case the projection of \(\mathbf {x}\) is the pose of center of mass \(\hat{\mathbf {t}} = \mathbf {x}_t\) and of axis \(\hat{\mathbf {a}}=\mathbf {x}_r / \Vert \mathbf {x}_r \Vert \). This result holds true in the case of an object of revolution with rotoreflection invariance as well, since \((\lambda \hat{\mathbf {a}}^\top , \hat{\mathbf {t}}^\top )^\top \) is the closest pose representative to \(\mathbf {x}\).
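As a minimal sketch (in Python with NumPy, with hypothetical function and argument names), the projection for an object of revolution reduces to normalizing the axis part of the vector and keeping the translation part unchanged:

```python
import numpy as np

def project_revolution(x, lam):
    """Project a 6D vector x = (x_r, x_t) onto the pose space of an
    object of revolution (sketch of the result above).

    lam is the scalar lambda scaling the axis part of a representative.
    Returns the unit revolution axis a_hat and the center of mass t_hat.
    """
    x_r, x_t = x[:3], x[3:]
    n = np.linalg.norm(x_r)
    if n == 0.0:
        raise ValueError("degenerate input: the projection is not unique")
    return x_r / n, x_t

# The representative of the projected pose is (lam * a_hat, t_hat).
a_hat, t_hat = project_revolution(np.array([0.0, 0.0, 2.5, 1.0, -1.0, 0.5]), lam=1.0)
```

The degenerate case \(\mathbf {x}_r = \mathbf {0}\) is reported explicitly, since no unique projection exists there.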

7.3 Object with a Finite Proper Symmetry Group

The representative of a pose of an object without proper symmetry is a 12D vector, the first 9 dimensions representing the orientation in the form of a vectorized matrix and the 3 others the position of the object. Therefore, and without loss of generality, we can split a point \(\mathbf {x} \in \mathbb {R}^{12}\) to project in a similar fashion: \(\mathbf {x} = ({{\mathrm{vec}}}(\mathbf {X}_r)^\top , \mathbf {x}_t^\top )^\top \) with \(\mathbf {x}_t \in \mathbb {R}^3\) and \(\mathbf {X}_r \in \mathscr {M}_{3,3}(\mathbb {R})\). The projection problem for such a point \(\mathbf {x}\), in the case of an object without proper symmetry, thus consists of:
$$\begin{aligned} \begin{aligned} {{\mathrm{proj}}}(\mathbf {x})&= \mathop {\hbox {argmin}}\limits _{\mathscr {P}} \Vert \mathbf {x} - \mathscr {R}(\mathscr {P}) \Vert ^2 \\&= \mathop {\hbox {argmin}}\limits _{\mathbf {R} \in \textit{SO}(3),\, \mathbf {t} \in \mathbb {R}^3} \left( \Vert \mathbf {X}_r - \mathbf {R} \varvec{\varLambda }\Vert _F^2 + \Vert \mathbf {x}_t - \mathbf {t}\Vert ^2 \right) \\ \end{aligned} \end{aligned}$$
The two terms being independent, we conclude again that the position of the center of mass of the object for a projection of \(\mathbf {x}\) is \(\hat{\mathbf {t}} = \mathbf {x}_t\). The minimization problem regarding the orientation part is in the form of the so-called constrained orthogonal Procrustes problem (Schönemann 1966; Umeyama 1991) and admits the solution \(\hat{\mathbf {R}} = \mathbf {U} \mathbf {S} \mathbf {V}^\top \), where \(\mathbf {U} \mathbf {D} \mathbf {V}^\top \) is a singular value decomposition of \(\mathbf {X}_r \varvec{\varLambda }\) such that
$$\begin{aligned} \mathbf {D} = {{\mathrm{diag}}}(\alpha _1, \alpha _2, \alpha _3), \end{aligned}$$
with \(\alpha _1 \ge \alpha _2 \ge \alpha _3 \ge 0\) and
$$\begin{aligned} \mathbf {S} = \left\{ \begin{array}{l} \mathbf {I} \text { if } \det (\mathbf {U}) \det (\mathbf {V}) > 0 \\ {{\mathrm{diag}}}(1, 1, -1) \text { otherwise.} \end{array} \right. \end{aligned}$$
The projection is unique if \({{\mathrm{rank}}}(\mathbf {X}_r \varvec{\varLambda }^\top ) \ge 2\) (Umeyama 1991), a condition that is fulfilled in most practical cases. This result also holds true in the general case of an object with a finite proper symmetry group, as \(({{\mathrm{vec}}}(\hat{\mathbf {R}} \varvec{\varLambda })^\top , \hat{\mathbf {t}}^\top )^\top \) is the closest pose representative to \(\mathbf {x}\).
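The Procrustes-style projection above can be sketched numerically as follows (Python with NumPy; the function name and the vectorization convention, row-major here, are assumptions for illustration):

```python
import numpy as np

def project_finite_symmetry(x, Lam):
    """Project a 12D vector onto the pose space of an object with a
    finite proper symmetry group (sketch of the result above).

    x   : vector of shape (12,), the first 9 entries holding vec(X_r)
          (row-major here, for illustration) and the last 3 the position.
    Lam : the 3x3 matrix Lambda used in the pose representatives.
    Returns the rotation R_hat and position t_hat of the projected pose.
    """
    X_r, t_hat = x[:9].reshape(3, 3), x[9:]
    U, D, Vt = np.linalg.svd(X_r @ Lam)
    # S = I if det(U) det(V) > 0, diag(1, 1, -1) otherwise
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U) * np.linalg.det(Vt))])
    R_hat = U @ S @ Vt
    return R_hat, t_hat
```

The sketch does not guard against the degenerate case of rank smaller than 2, where the projection is not unique.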

7.4 2D Object

The projection problem for a 2D object is similar to the 3D case.

In the case of a circular object, any point \(\mathbf {x} \in \mathbb {R}^2\) is a valid representative of a pose and therefore projects onto the pose of center \(\mathbf {x}\).

Regarding an object with cyclic symmetry, we conclude by the same reasoning as in the case of a 3D revolution object that a 4D vector \(\mathbf {x} = (\mathbf {a}^\top , \mathbf {t}^\top )^\top \), where \(\mathbf {a}, \mathbf {t} \in \mathbb {R}^2\), admits a unique projection as long as \(\Vert \mathbf {a} \Vert \ne 0\). The projection admits the representative \((\lambda / \Vert \mathbf {a} \Vert \cdot \mathbf {a}^\top , \mathbf {t}^\top )^\top \), and consists of the pose defined by a translation \(\mathbf {t}\) and a rotation of angle \(\arg (\mathbf {a})\), where \(\arg (\mathbf {a})\) is the argument of \(\mathbf {a}\) seen as a complex number.
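The 2D cyclic case can be sketched analogously (hypothetical names; the rotation angle is recovered with atan2, i.e. the argument of \(\mathbf {a}\) seen as a complex number):

```python
import numpy as np

def project_2d_cyclic(x, lam):
    """Project a 4D vector x = (a, t) onto the pose space of a 2D object
    with cyclic symmetry (sketch). Returns the rotation angle, the
    translation, and the closest pose representative."""
    a, t = x[:2], x[2:]
    n = np.linalg.norm(a)
    if n == 0.0:
        raise ValueError("degenerate input: the projection is not unique")
    theta = np.arctan2(a[1], a[0])             # arg(a)
    rep = np.concatenate([lam * a / n, t])     # closest representative
    return theta, t, rep
```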

8 Averaging Poses

Pose averaging is of great use for applications such as denoising, mode detection or interpolation. The definition of the average is not obvious in non-vector spaces such as ours, and we therefore consider a generalization of the average to arbitrary metric spaces, known as the Fréchet mean.

Let us consider a finite set of poses \(S = \left\{ \mathscr {P}_i \right\} _{i=1..n}\) and a set of strictly positive weights \(\left\{ w_i \right\} _{i=1..n}\) assigned to each of those. The weighted mean of poses S is by definition the pose which minimizes the corresponding Fréchet variance:
$$\begin{aligned} {{\mathrm{mean}}}(S) \triangleq \mathop {\hbox {argmin}}\limits _{\mathscr {P} \in \mathscr {C}} {{\mathrm{\Phi }}}(\mathscr {P}), \end{aligned}$$
the Fréchet variance at a pose \(\mathscr {P} \in \mathscr {C}\) being expressed as follows:
$$\begin{aligned} {{\mathrm{\Phi }}}(\mathscr {P}) \triangleq \sum _{i=1}^n w_i {{\mathrm{d}}}^2(\mathscr {P}_i, \mathscr {P}). \end{aligned}$$
This mean is not necessarily well defined, since the minimum of Fréchet variance is not necessarily reached at a unique pose. However, such cases typically occur in configurations where the average would actually be meaningless, e.g. when averaging two poses of opposite axes for a revolution object without rotoreflection invariance.

The problem of pose averaging has already been studied for objects without proper symmetry with various metrics. Sharf et al. (2010) notably compare different averaging techniques for the rotation part of a pose, using common metrics. While there is no known closed-form solution for the Riemannian metric (4), it can be computed iteratively, and admits closed form approximations which are “good enough” for practical applications (Gramkow 2001). A good approximation, when dealing with more than two poses, is based on computing the average of rotation matrices, and actually corresponds to the exact average when considering the distance (7) (Curtis et al. 1993).

In the case of our proposed distance, the expression of the Fréchet variance can be developed into:
$$\begin{aligned} {{\mathrm{\Phi }}}(\mathscr {P}) = \sum _{i=1}^{n} w_i \min _{\mathbf {p}_i \in \mathscr {R}(\mathscr {P}_i),\, \mathbf {p} \in \mathscr {R}(\mathscr {P})} \Vert \mathbf {p}_i - \mathbf {p} \Vert ^2. \end{aligned}$$
Considering a given tuple \(P = (\mathbf {p}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\) of representatives of the poses to average, the weighted sum of squared distances from a pose representative \(\mathbf {p}\) to the elements of P can be split into two terms, through the introduction of the arithmetic mean \(\mathbf {m}_{P}\) of the elements of P:
$$\begin{aligned} \begin{aligned} \sum _{i=1}^{n} w_i \Vert \mathbf {p}_i - \mathbf {p} \Vert ^2&= \sum _i w_i \Vert \mathbf {p}_i - \mathbf {m}_{P} \Vert ^2 \\&\quad + \left( \sum _i w_i \right) \Vert \mathbf {p} - \mathbf {m}_{P} \Vert ^2, \end{aligned} \end{aligned}$$
with the arithmetic mean
$$\begin{aligned} \mathbf {m}_{P} \triangleq \frac{\sum _i w_i \mathbf {p}_i}{\sum _i w_i}. \end{aligned}$$
The first term of (64) is independent of \(\mathbf {p}\). Therefore, the problem of minimizing (64) for a given tuple P is reduced to the problem of finding a pose \(\mathscr {P}\) which minimizes \(\min _{\mathbf {p} \in \mathscr {R}(\mathscr {P})} \Vert \mathbf {p} - \mathbf {m}_{P} \Vert ^2\). This corresponds to the projection problem we discussed and solved in Sect. 7. The average pose, if well defined, is thus the projection of the arithmetic average of a combination of representatives of the poses to average, or more formally:
$$\begin{aligned} {{\mathrm{mean}}}(S) = \mathop {\hbox {argmin}}\limits _{\mathscr {P} \in \mathscr {A}} {{\mathrm{\Phi }}}(\mathscr {P}), \end{aligned}$$
$$\begin{aligned} \mathscr {A} \triangleq \left\{ {{\mathrm{proj}}}\left( \mathbf {m}_P \right) \, \vert \, P \in \prod _i \mathscr {R}(\mathscr {P}_i) \right\} . \end{aligned}$$
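The split of Eq. (64) above can be verified numerically on random data; the following quick NumPy check (not part of the original development) confirms the identity for a weighted mean:

```python
import numpy as np

rng = np.random.default_rng(0)
p_list = rng.normal(size=(5, 12))      # representatives p_i of the poses to average
w = rng.uniform(1.0, 2.0, size=5)      # strictly positive weights w_i
p = rng.normal(size=12)                # an arbitrary pose representative p

m = (w[:, None] * p_list).sum(axis=0) / w.sum()   # weighted arithmetic mean m_P
lhs = (w * ((p_list - p) ** 2).sum(axis=1)).sum()
rhs = (w * ((p_list - m) ** 2).sum(axis=1)).sum() + w.sum() * ((p - m) ** 2).sum()
assert np.isclose(lhs, rhs)   # the two sides of the split agree
```

The cross term vanishes precisely because \(\mathbf {m}_{P}\) is the weighted mean, which is what makes the decomposition hold.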

8.1 Objects with a Single Representative Per Pose

Because the projection is unique in nondegenerate cases, the conclusion is straightforward for spherical objects, objects of revolution without rotoreflection invariance, and objects without proper symmetry. Since poses of such objects admit only one representative, a single tuple of representatives has to be considered. For those objects, the average pose exists and corresponds simply to the projection of the arithmetic average of their representatives:
$$\begin{aligned} {{\mathrm{mean}}}(S) = {{\mathrm{proj}}}\left( \frac{\sum _i w_i \mathbf {p}_i}{\sum _i w_i} \right) , \end{aligned}$$
where \(\mathbf {p}_i\) denotes the unique representative of pose \(\mathscr {P}_i\).
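For an object without proper symmetry, this average can be sketched by combining the arithmetic mean of the 12D representatives with the Procrustes-style projection of Sect. 7.3 (Python with NumPy; the names and the row-major vectorization are assumptions):

```python
import numpy as np

def mean_pose_no_symmetry(reps, weights, Lam):
    """Weighted average of poses of an object without proper symmetry:
    arithmetic mean of the (unique) 12D representatives, followed by the
    projection onto the pose space described in Sect. 7.3."""
    w = np.asarray(weights, dtype=float)
    m = (w[:, None] * np.asarray(reps)).sum(axis=0) / w.sum()
    X_r, t_hat = m[:9].reshape(3, 3), m[9:]     # row-major vec, for illustration
    U, D, Vt = np.linalg.svd(X_r @ Lam)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U) * np.linalg.det(Vt))])
    return U @ S @ Vt, t_hat
```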

8.2 Objects with Multiple Representatives Per Pose

In the other cases, a pose admits several representatives and one should consider the different combinations of representatives to find the exact average—assuming its existence and uniqueness in nonpathological cases. This problem is not specific to our method and is similar to the issue encountered when averaging orientations of an object without proper symmetry through the arithmetic mean of their quaternion representatives, each orientation admitting two antipodal quaternions as representatives. Because the number of combinations of representatives is exponential in the number of considered poses, the exact computation of the average might easily become expensive.

A common practice to circumvent this issue when averaging orientations based on a quaternion representation is to compute the arithmetic mean of a “consistent” combination of representatives and consider its projection as the average. Such a combination is usually built by choosing a representative for an arbitrary initial pose, and then picking for each pose the nearest representative to the initial one (Gramkow 2001). This approach is simple, but is in the general case ill-defined. Indeed, the chosen combination—and hence the estimated average—are in general dependent on the initial choice. Figure 4 depicts such a case, where three different choices for the initial pose lead to three different choices of “consistent” representative combinations, and therefore to three different estimations of the average.
Fig. 4

Estimating the average of poses with multiple representatives: illustration with the orientation of a 2D object with a \(180^{\circ }\) rotation symmetry, that can be represented by a point on a circle, or the antipodal point. We consider three poses to average (triangle, star and disk shapes). ac The choice of a “consistent” combination of pose representatives (blue clusters) in the sense of Gramkow (2001) is dependent on the initial choice (circled, first row), potentially resulting in different estimations of the average pose (second row). d We propose a definition of consistency—which is in particular satisfied when the considered representatives are close enough to one another—ensuring an unambiguous estimation of the average pose (Color figure online)

In this subsection, we propose a stricter definition of the consistency of a combination of representatives and prove that it enables an unambiguous estimation of the mean.

Definition 5

(Consistency) A tuple \((\mathbf {p}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\) is said to be consistent if and only if
$$\begin{aligned}&\forall (i,j) \in \llbracket 1, n \rrbracket ^2, \forall \mathbf {q}_j \in \mathscr {R}(\mathscr {P}_j) \setminus \left\{ \mathbf {p_j} \right\} , \nonumber \\&\quad \Vert \mathbf {p}_j - \mathbf {p}_i \Vert < \Vert \mathbf {q}_j - \mathbf {p}_i \Vert . \end{aligned}$$

In other words, a consistent tuple is a set of pose representatives that are closer to one another than to any other representative.
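Definition 5 can be checked directly by brute force (a sketch with hypothetical names, storing representatives as NumPy arrays):

```python
import numpy as np

def is_consistent(tuple_reps, all_reps):
    """Check Definition 5. tuple_reps[i] is the chosen representative of
    pose i; all_reps[i] is the full list of representatives of pose i.
    Returns True iff every chosen representative is strictly closer to the
    others than any alternative representative would be."""
    n = len(tuple_reps)
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(tuple_reps[j] - tuple_reps[i])
            for q in all_reps[j]:
                if np.array_equal(q, tuple_reps[j]):
                    continue                       # skip p_j itself
                if not d < np.linalg.norm(q - tuple_reps[i]):
                    return False
    return True
```

On the toy example of Fig. 4 (orientations of a 2D object with a \(180^{\circ }\) symmetry, each pose represented by a pair of antipodal points on a circle), a tuple of nearby points is consistent while flipping one of them to its antipode is not.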

Proposition 6

(Uniqueness of a consistent tuple, up to symmetry) If \((\mathbf {p}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\) is consistent, then the set of consistent tuples of \(\prod _i \mathscr {R}(\mathscr {P}_i)\) is the set composed of \((\mathbf {p}_i)_{i=1 \dots n}\) and its symmetric tuples
$$\begin{aligned} \left\{ (s(\mathbf {p}_i))_{i=1 \dots n} \vert s \in G_\mathscr {R} \right\} . \end{aligned}$$


Let \((\mathbf {p}_i)_{i=1 \dots n}, (\mathbf {q}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\) be two different consistent tuples. There exists \(j \in \llbracket 1, n \rrbracket \) such that \(\mathbf {p}_j \ne \mathbf {q}_j\), and we know from the consistency of \((\mathbf {p}_i)_{i=1 \dots n}\) and \((\mathbf {q}_i)_{i=1 \dots n}\) that
$$\begin{aligned} \forall i \in \llbracket 1, n \rrbracket , \left\{ \begin{matrix} \Vert \mathbf {p}_j - \mathbf {p}_i \Vert< \Vert \mathbf {q}_j - \mathbf {p}_i \Vert \\ \Vert \mathbf {q}_j - \mathbf {q}_i \Vert < \Vert \mathbf {p}_j - \mathbf {q}_i \Vert . \end{matrix} \right. \end{aligned}$$
If there existed \(i \in \llbracket 1, n \rrbracket \) such that \(\mathbf {p}_i = \mathbf {q}_i\), it would lead to the inequality \(\Vert \mathbf {p}_j - \mathbf {p}_i \Vert < \Vert \mathbf {p}_j - \mathbf {p}_i \Vert \), which is a contradiction. Two different consistent tuples thus differ in every component; since a given pose admits only \(|\mathscr {R}(\bullet )|\) representatives, there are therefore at most \(|\mathscr {R}(\bullet )|\) consistent tuples.
There are moreover exactly \(|\mathscr {R}(\bullet )|\) different representative combinations symmetric to \((\mathbf {p}_i)_{i=1 \dots n}\), including itself (Proposition 3):
$$\begin{aligned} \left\{ (s(\mathbf {p}_i))_{i=1 \dots n} \vert s \in G_{\mathscr {R}} \right\} . \end{aligned}$$
Those combinations are consistent, just as \((\mathbf {p}_i)_{i=1 \dots n}\) is, since symmetry operations are norm-preserving morphisms (Proposition 5). Hence the uniqueness, up to symmetry, of a consistent tuple of representatives. \(\square \)

Proposition 7

(Invariance of the projection under symmetry of representatives) Let \(\mathbf {x} \in \mathbb {R}^N\) be a point of the ambient space, and \(s \in G_\mathscr {R}\). The projection of \(\mathbf {x}\) and its symmetric \(s(\mathbf {x})\) correspond to the same pose:
$$\begin{aligned} {{\mathrm{proj}}}(s(\mathbf {x})) = {{\mathrm{proj}}}(\mathbf {x}). \end{aligned}$$


This result can be easily verified in the case of a revolution object with rotoreflection symmetry or a 2D cyclic object. Therefore, we only discuss the case of an object with a finite proper symmetry group.

Let \(\mathbf {x} \in \mathbb {R}^{12}\) be a point of the ambient space, and \(s_{\mathbf {G}} \in G_\mathscr {R}\), where \(\mathbf {G} \in G\). We split \(\mathbf {x}\) into two parts \(\mathbf {M} \in \mathscr {M}_{3, 3}(\mathbb {R})\) and \(\mathbf {t} \in \mathbb {R}^3\) such that
$$\begin{aligned} \mathbf {x} = ({{\mathrm{vec}}}(\mathbf {M})^\top , \mathbf {t}^\top )^\top . \end{aligned}$$
The symmetric of \(\mathbf {x}\) can thus by definition be written as
$$\begin{aligned} s_{\mathbf {G}}(\mathbf {x}) = ({{\mathrm{vec}}}(\mathbf {M} \mathbf {G})^\top , \mathbf {t}^\top )^\top . \end{aligned}$$
The projection of \(\mathbf {x}\) onto the pose space is the pose \([ \hat{\mathbf {R}}, \mathbf {t} ]\), with \(\hat{\mathbf {R}} = \mathbf {U} \mathbf {S} \mathbf {V}^\top \), considering a singular value decomposition \(\mathbf {M} \varvec{\varLambda }= \mathbf {U} \mathbf {D} \mathbf {V}^\top \) and using the same conventions for \(\mathbf {U}, \mathbf {V}, \mathbf {S}\) and \(\mathbf {D}\) as in Sect. 7.3, where we detailed this result.
Similarly, the projection of \(s_{\mathbf {G}}(\mathbf {x})\) can be deduced from a singular value decomposition of \(\mathbf {M} \mathbf {G} \varvec{\varLambda }\). We know from Lemma 1 that this latter term can be rearranged into
$$\begin{aligned} \mathbf {M} \mathbf {G} \varvec{\varLambda }= \mathbf {M} \varvec{\varLambda }\mathbf {G}. \end{aligned}$$
Thus, injecting the previous decomposition into this expression enables us to exhibit a singular value decomposition of \(\mathbf {M} \mathbf {G} \varvec{\varLambda }\):
$$\begin{aligned} \begin{aligned} \mathbf {M} \mathbf {G} \varvec{\varLambda }&= \mathbf {U} \mathbf {D} \mathbf {V}^\top \mathbf {G} \\&=\mathbf {U} \mathbf {D} \tilde{\mathbf {V}}^\top \end{aligned} \end{aligned}$$
where \(\tilde{\mathbf {V}} = \mathbf {G}^\top \mathbf {V}\). Because \(\mathbf {G}\) is a rotation matrix,
$$\begin{aligned} \begin{aligned} \det (\tilde{\mathbf {V}})&= \det (\mathbf {G}) \det (\mathbf {V}) \\&= \det (\mathbf {V}) \end{aligned} \end{aligned}$$
and the projection of \(s_{\mathbf {G}}(\mathbf {x})\) is therefore
$$\begin{aligned} \begin{aligned} {{\mathrm{proj}}}(s_{\mathbf {G}}(\mathbf {x}))&= [\mathbf {U} \mathbf {S} \tilde{\mathbf {V}}^\top , \mathbf {t}] \\&=[\hat{\mathbf {R}} \mathbf {G}, \mathbf {t}]. \end{aligned} \end{aligned}$$
Since \(\mathbf {G}\) is a proper symmetry of the object,
$$\begin{aligned}{}[\hat{\mathbf {R}} \mathbf {G}, \mathbf {t}] = [\hat{\mathbf {R}}, \mathbf {t}] \end{aligned}$$
which concludes the proof. \(\square \)
Based on those properties, it is possible to propose an unambiguous estimation of the mean as follows: given any consistent tuple of representatives \((\mathbf {p}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\), estimate the average pose as the projection of their weighted arithmetic mean,
$$\begin{aligned} \mathscr {M} = {{\mathrm{proj}}}\left( \frac{\sum _i w_i \mathbf {p}_i}{\sum _i w_i} \right) . \end{aligned}$$


We show here that this expression is well-defined, i.e. that it does not depend on the consistent tuple of representatives considered. Let \((\mathbf {p}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\) be a consistent tuple of representatives. The consistent tuples are the tuples symmetric to this one (Proposition 6):
$$\begin{aligned} \left\{ (s(\mathbf {p}_i))_{i=1 \dots n} \vert s \in G_\mathscr {R} \right\} . \end{aligned}$$
Let us therefore consider an arbitrary consistent tuple \((s(\mathbf {p}_i))_{i=1 \dots n}\), with \(s \in G_\mathscr {R}\), and show that it leads to the same estimation \(\mathscr {M}\) of the average pose as an estimation performed with \((\mathbf {p}_i)_{i=1 \dots n}\).
By definition,
$$\begin{aligned} \mathscr {M} = {{\mathrm{proj}}}\left( \frac{\sum _i w_i s(\mathbf {p}_i)}{\sum _i w_i} \right) . \end{aligned}$$
Because of the linearity of symmetries (Proposition 4), the arithmetic mean of \((s(\mathbf {p}_i))_{i=1 \dots n}\) corresponds to the symmetric of the arithmetic mean of \((\mathbf {p}_i)_{i=1 \dots n}\):
$$\begin{aligned} \frac{\sum _i w_i s(\mathbf {p}_i)}{\sum _i w_i} = s \left( \frac{\sum _i w_i \mathbf {p}_i}{\sum _i w_i} \right) , \end{aligned}$$
hence this expression of \(\mathscr {M}\):
$$\begin{aligned} \mathscr {M} = {{\mathrm{proj}}}\left( s \left( \frac{\sum _i w_i \mathbf {p}_i}{\sum _i w_i} \right) \right) . \end{aligned}$$
Invariance of the projection under symmetry of representatives (Proposition 7) enables us to conclude this proof, since
$$\begin{aligned} \mathscr {M} = {{\mathrm{proj}}}\left( \frac{\sum _i w_i \mathbf {p}_i}{\sum _i w_i} \right) . \end{aligned}$$
\(\square \)

Such an estimation most likely corresponds to the actual mean (61); unfortunately, we do not have a proof of this conjecture.

8.3 Sufficient Conditions of Consistency

While the average of poses can be easily estimated given a consistent combination of representatives, there are cases where no such combination exists, e.g. when trying to average poses spread out over the set of orientations, as illustrated in Fig. 4a–c. In such a case, one might have to perform an exhaustive evaluation of the Fréchet variance for the different combinations in order to pick the one corresponding to the actual mean. Fortunately, this case is of limited practical interest, as the mean then makes little sense.

Consistency is nonetheless not trivial to establish in the general case, and we therefore provide in this section simple sufficient conditions for a combination of representatives to be consistent. Consistency is in particular satisfied when the considered representatives are close enough to one another, relative to the minimum distance \(T\) between distinct representatives of a pose (Definition 3):

Proposition 8

(Sufficient condition of consistency) A tuple \((\mathbf {p}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\) is consistent if it satisfies condition (87):
$$\begin{aligned} \forall (i,j) \in \llbracket 1, n \rrbracket ^2, \quad \Vert \mathbf {p}_j - \mathbf {p}_i \Vert < T/2. \end{aligned}$$


Let us consider a tuple \((\mathbf {p}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\) that satisfies condition (87). For any \((i,j) \in \llbracket 1, n \rrbracket ^2\) and \(\mathbf {q}_j \in \mathscr {R}(\mathscr {P}_j) \setminus \left\{ \mathbf {p}_j \right\} \), the following properties hold:
$$\begin{aligned} {\left\{ \begin{array}{ll} \Vert \mathbf {q}_j - \mathbf {p}_j \Vert \le \Vert \mathbf {p}_j - \mathbf {p}_i \Vert + \Vert \mathbf {q}_j - \mathbf {p}_i \Vert &{}\quad \text {(triangle inequality)}\\ \Vert \mathbf {q}_j-\mathbf {p}_j \Vert \ge T &{}\quad \text {(Definition 3)}\\ \Vert \mathbf {p}_j - \mathbf {p}_i \Vert < T/2. &{}\quad \text {(condition (87))}\\ \end{array}\right. } \end{aligned}$$
From those inequalities, we deduce that
$$\begin{aligned} \Vert \mathbf {p}_j-\mathbf {p}_i\Vert < \Vert \mathbf {q}_j-\mathbf {p}_i\Vert , \end{aligned}$$
hence the consistency of \((\mathbf {p}_i)_{i=1 \dots n}\). \(\square \)
A special case of practical value of this criterion is obtained when the considered representatives are contained in a small enough ball. It is illustrated in Fig. 4d, and is exploited in our application example in Sect. 10.


Proposition 9

(Ball condition) A tuple \((\mathbf {p}_i)_{i=1 \dots n} \in \prod _i \mathscr {R}(\mathscr {P}_i)\) whose elements are all contained in an open ball of radius \(T/4\) and center \(\mathbf {c} \in \mathbb {R}^N\) is consistent.

A tuple satisfying this condition also satisfies the one of Proposition 8 because of the triangle inequality, since for any \((i,j) \in \llbracket 1, n \rrbracket ^2\),
$$\begin{aligned} \begin{aligned} \Vert \mathbf {p}_i - \mathbf {p}_j \Vert&\le \Vert \mathbf {p}_i - \mathbf {c} \Vert + \Vert \mathbf {p}_j - \mathbf {c} \Vert \\&< T/4 + T/4. \end{aligned} \end{aligned}$$
\(\square \)
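This ball criterion is easy to test in practice; the sketch below checks the sufficient (not necessary) variant where all chosen representatives lie within \(T/4\) of their centroid, a convenient choice of center:

```python
import numpy as np

def ball_consistency(tuple_reps, T):
    """Sufficient condition in the spirit of Proposition 9: if all chosen
    representatives lie within distance T/4 of a common point, the tuple
    is consistent. The centroid is used as that point here, which is a
    convenient (not necessary) choice of ball center."""
    P = np.asarray(tuple_reps, dtype=float)
    c = P.mean(axis=0)
    return bool(np.all(np.linalg.norm(P - c, axis=1) < T / 4))
```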

These sufficient conditions can easily be generalized by noting that only the orientation parts of the pose representatives actually have to be close enough to one another. This is a direct consequence of the fact that the pose space can be decomposed into a Cartesian product of a position space and an orientation space, and that symmetry considerations only affect the orientation for a bounded object.

9 Local Properties

While we focus in this article on global metric properties, the proposed distance can be shown to be locally equivalent to a Riemannian metric over the pose space manifold. We therefore briefly discuss those aspects in this section.

Object with finite proper symmetry group In the case of a 3D object with a finite proper symmetry group, the pose space can be seen as a manifold of dimension 6 (3 for translation, and 3 for rotation). Let us consider two poses of such an object, and two associated rigid transformations \(\mathbf {T}_1\) and \(\mathbf {T}_2 \in SE(3)\) such that
$$\begin{aligned} {{\mathrm{d}}}([\mathbf {T}_1], [\mathbf {T}_2]) = {{\mathrm{d_{no\_sym}}}}(\mathbf {T}_1, \mathbf {T}_2), \end{aligned}$$
i.e. such that \(\mathbf {T}_1^{-1} \circ \mathbf {T}_2\) is a shortest displacement from pose \([\mathbf {T}_1]\) to pose \([\mathbf {T}_2]\). If the angle \(\theta \) of the relative rotation between \(\mathbf {T}_1\) and \(\mathbf {T}_2\) is small, the corresponding displacement of a point \(\mathbf {x} \in \mathbb {R}^3\) of the object between those two poses can be approximated by introducing the displacement vector \(\mathbf {v} \in \mathbb {R}^3\) and the rotation vector \(\mathbf {\omega } \in \mathbb {R}^3\) between \(\mathbf {T}_1\) and \(\mathbf {T}_2\) as follows:
$$\begin{aligned} (\mathbf {T}_1^{-1} \circ \mathbf {T}_2) (\mathbf {x}) \underset{\theta \rightarrow 0}{\sim } \mathbf {x} + \omega \times \mathbf {x} + \mathbf {v}. \end{aligned}$$
The squared distance between the two poses can then be approximated by
$$\begin{aligned} {{\mathrm{d}}}^2([\mathbf {T}_1], [\mathbf {T}_2]) \underset{\theta \rightarrow 0}{\sim } \frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) \Vert \omega \times \mathbf {x} + \mathbf {v} \Vert ^2 ds. \end{aligned}$$
When considering an infinitesimal displacement between \(\mathbf {T}_1\) and \(\mathbf {T}_2\), and identifying \(\mathbf {v}\) and \(\mathbf {\omega }\) with translational and angular velocities, this expression corresponds to the notion of kinetic energy (up to a factor 2/S).
It is also a quadratic form, and therefore the proposed distance is locally equivalent to a Riemannian distance associated with the following metric tensor g, defined for any two tangent vectors \((\mathbf {v}_1^\top , \mathbf {\omega }_1^\top )^\top \), \((\mathbf {v}_2^\top , \mathbf {\omega }_2^\top )^\top \in \mathbb {R}^3 \times \mathbb {R}^3\) by
$$\begin{aligned}&g \left( (\mathbf {v}_1^\top , \mathbf {\omega }_1^\top )^\top , (\mathbf {v}_2^\top , \mathbf {\omega }_2^\top )^\top \right) \nonumber \\&\quad \triangleq \frac{1}{S} \int _{\mathscr {S}} \mu (\mathbf {x}) ( \mathbf {\omega }_1 \times \mathbf {x} + \mathbf {v}_1 )^\top (\mathbf {\omega }_2 \times \mathbf {x} + \mathbf {v}_2) ds. \end{aligned}$$
This Riemannian metric was already described in the literature (Zefran and Kumar 1996; Lin and Burdick 2000), and Belta and Kumar (2002) notably suggested its use for interpolation on SE(3). If the covariance matrix considered for the object is isotropic—i.e. \(\varvec{\varLambda }= \lambda \mathbf {I}\), with \(\lambda \in \mathbb {R}^{+*}\), which is the case e.g. for an object of spherical or cubic shape, the proposed distance is moreover locally equivalent to the usual Riemannian distance (4) over \(\textit{SE}(3)\).
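For intuition, the metric tensor g above can be approximated for an object given as a sampled surface point cloud (a sketch under the assumption of a uniform surface density, so that the normalized integral reduces to an average over samples; names are hypothetical):

```python
import numpy as np

def metric_tensor(points, v1, w1, v2, w2):
    """Approximate the metric tensor g of the text for an object whose
    surface is sampled as `points` (shape (m, 3)), assuming a uniform
    surface density so that (1/S) * integral reduces to a mean."""
    d1 = np.cross(w1, points) + v1   # pointwise velocity under (v1, w1)
    d2 = np.cross(w2, points) + v2   # pointwise velocity under (v2, w2)
    return float(np.mean(np.sum(d1 * d2, axis=1)))
```

For a pure translation (zero rotation vectors), the tensor reduces to the dot product of the translational velocities, as expected from the integrand.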

Other object classes Similarly, the pose space of a 3D revolution object can be seen as a 5D manifold. The proposed distance for such an object is indeed locally equivalent to a Riemannian distance over \(\mathbb {S}^2 \times \mathbb {R}^3\) induced by embedding \(\mathbb {S}^2\) as a sphere in a 3D Euclidean space, and considering the usual Euclidean distance for the translation part.

Poses of a 2D object with a finite proper symmetry group likewise lie on a 3D manifold, and the proposed distance is locally equivalent to a Riemannian distance over \(\mathbb {S}^1 \times \mathbb {R}^2\) induced by embedding \(\mathbb {S}^1\) as a circle in a 2D Euclidean space, i.e. by taking as the distance between two infinitesimally close orientations the angle of their relative rotation (up to a scaling factor).

Finally, pose spaces of spherical 3D objects and circular 2D ones are respectively equivalent to \(\mathbb {R}^3\) and \(\mathbb {R}^2\), associated with the Euclidean distance.

Remarks Despite this local equivalence, the proposed pose spaces can be topologically different from the manifolds evoked above, because of the discrete symmetries of the object. As an example, the proposed pose space for a revolution object with rotoreflection invariance is actually homeomorphic to \({\mathbb {R}}{\mathbb {P}}^2 \times \mathbb {R}^3\), where \({\mathbb {R}}{\mathbb {P}}^2\) is the real projective plane—i.e. a sphere with antipodal points identified—instead of \(\mathbb {S}^2 \times \mathbb {R}^3\).

Moreover, except for 3D spherical or 2D circular objects, the distance (10) is globally different from these Riemannian metrics. Compared to the proposed distance, the latter have several drawbacks.

They are indeed more expensive to evaluate, since they include trigonometric computations [e.g. for distance (4)]. There is even no known closed-form expression for such a distance in the case of an arbitrary 3D object with a finite proper symmetry group. Moreover, they do not benefit from the same nice computational properties as the proposed distance regarding the problem of pose averaging, for which they may require iterative approaches (Pennec 1998). More fundamentally, and as discussed in the introduction, our distance was proposed to quantify the similarity between poses at a global scale. The notion of motion between two poses, expressed by a Riemannian distance, is therefore irrelevant to the kind of applications we are interested in.

10 Application Example

In this section, we illustrate the use of our metric on the problem of detecting instances of a rigid object and estimating their poses. Given an input depth image of a scene potentially containing multiple instances of a rigid object, our goal is to recover the poses of these instances. To illustrate the versatility of our approach, we perform experiments with three objects of different symmetry classes among those shown in Table 2:
  • the Stanford bunny—an object without proper symmetry.

  • a candlestick—considered as a revolution object without rotoreflection invariance.

  • a cartoon-like space rocket—which is invariant by a rotation of \(120^{\circ }\) about its axis.

For practical reasons, our example is based on synthetic 3D data, depicted in Fig. 5b. It was produced using an off-the-shelf stereo matching algorithm, on a pair of images of a virtual scene lit by a pseudo-random pattern projector (Fig. 5a). Those were synthesized with the rendering engine Blender Cycles (Blender Online Community 2016). The reader is referred to the work of Brégier et al. (2017) for a more quantitative analysis of the interest of the proposed distance for pose estimation.
Fig. 5

Application example: object instances detection and pose estimation via a Mean Shift procedure to extract the main modes of an initial pose distribution. Illustration with three different objects of various symmetry properties. a Stereo pair used for 3D reconstruction. b Reconstructed 3D range data (RGB channel solely for visualization purposes). c From left to right: pose distribution generated from the input 3D data using the method of Drost et al. (2010). Shifted poses using Mean Shift. Shifted poses weighted by the density of the initial distribution. Recovered poses of object instances. A pose distribution is represented by accumulating the 2D silhouettes on the image plane of the object at its different poses (the more silhouettes a pixel belongs to, the darker it is). The process is performed independently for the three different objects

10.1 Mean Shift for Pose Recovery

Among the existing work suited to a depth image input, some popular approaches (Drost et al. 2010; Fanelli et al. 2011; Tejani et al. 2014) proceed bottom-up: they generate votes for pose candidates in a Hough-like manner and, identifying those votes with samples of a pose distribution, look for its main modes, which hopefully correspond to the actual poses of object instances. We place ourselves within such a mode-seeking framework.

Mode detection in a distribution over the pose space is not an easy problem. Grid-based accumulation techniques traditionally used in Hough-like methods are impractical due to the high dimension of the pose space, except through the use of a sparse structure (Rodrigues et al. 2012) or solely as a preprocessing technique applied to a few dimensions (Drost et al. 2010; Rodrigues et al. 2012; Tejani et al. 2014). A popular mode detection approach better suited to high-dimensional problems is Mean Shift, a local and nonparametric iterative method based on a kernel density estimate of the probability distribution. Unfortunately, this method is designed for vector spaces, which the pose space is not. Fanelli et al. (2011) and Tejani et al. (2014) nonetheless used Mean Shift with a global parametrization of the pose space, but such an approach suffers from the intrinsic drawbacks we evoked in the introduction. Tuzel et al. (2005) and later Subbarao and Meer (2006) proposed versions of Mean Shift for Lie groups and Riemannian manifolds that might circumvent those issues, but their approach is computationally expensive, as each iteration requires mapping the samples to the local tangent plane of the point to shift, computing the shift vector through the classical Mean Shift procedure, and mapping it back to get the updated point.

In this example, we show how the standard Mean Shift algorithm can be adapted to perform mode detection on the pose space quite efficiently through the use of our distance, even for objects with proper symmetries.

Given an input depth map, we generate a set of votes for object poses \(\left\{ \mathscr {P}_i \right\} _{i=1,\ldots ,n}\) using our own implementation of the method of Drost et al. (2010). It is based on a local aggregation scheme, and is performed by matching geometric features extracted from the input data with those extracted from a model of the object. We use a sampling rate of \(\tau _d = 0.025\) and consider every sample as a reference point—the interested reader is referred to the original description of Drost et al. (2010) regarding the meaning of those parameters. This initial pose distribution is quite spread out, as can be observed from the blurred aspect of its representation in Fig. 5c, column 1.

We then consider each of those votes as a starting pose for Mean Shift. A usual practice to drastically speed up computations when seeking modes is to consider only a subset of the votes as starting points, e.g. by random sampling, but we do not use such an approach here, to avoid introducing additional parameters. For each of the poses to shift, we proceed iteratively following the usual Mean Shift procedure. Considering a flat Mean Shift kernel of radius r, we find the poses within the set of votes that are within a radius r of the current pose, compute their mean, shift the current pose to this mean, and repeat until convergence.
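As a minimal sketch, this iteration amounts to a flat-kernel Mean Shift over Euclidean pose representatives. The function name and the plain-list data layout below are our own assumptions, and the projection back onto the pose space (Sect. 7) is deliberately left out at this stage:

```python
import math

def mean_shift_flat(points, start, radius, max_iter=100, tol=1e-9):
    """Flat-kernel Mean Shift in a Euclidean representative space (sketch).

    points : list of tuples, the pose representatives (votes)
    start  : starting representative to shift
    radius : flat kernel radius r
    """
    current = start
    for _ in range(max_iter):
        # Radius search: gather the votes within radius r of the current point.
        neighbors = [p for p in points if math.dist(p, current) <= radius]
        if not neighbors:
            break
        # Shift the current point to the arithmetic mean of the in-radius votes.
        mean = tuple(sum(c) / len(neighbors) for c in zip(*neighbors))
        if math.dist(mean, current) < tol:  # convergence test
            return mean
        current = mean
    return current
```

In practice, the radius search over a large set of votes would be delegated to an off-the-shelf spatial index (e.g. a k-d tree) rather than the linear scan used in this sketch.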

We arbitrarily choose the radius r of our kernel to correspond to 1.5 times the smallest eigenvalue of the matrix \(\varvec{\varLambda }\) for the bunny and the candlestick—that is, roughly 75% of the smallest typical dimension of the object.

For the rocket, we use a smaller radius of \(\sqrt{3}/2\) times this eigenvalue, which is the greatest value that satisfies the condition of Proposition 9 (see Appendix B). A bigger radius value may also experimentally give good results, but does not provide the same theoretical guarantees.

Indeed, such choices of radii satisfy the condition \(r < T/4\), where T is the minimum distance between representatives of the same pose (see Definition 3). We know from Sect. 5.1 that, given a representative \(\mathbf {p}\) of the pose \(\mathscr {P}\) to shift, the poses closer than r to \(\mathscr {P}\) are those having a representative within a ball of radius r around \(\mathbf {p}\). These representatives can be retrieved efficiently through an off-the-shelf radius search method. Moreover, because r is chosen strictly smaller than T/2, the representatives retrieved by such a query necessarily correspond to different poses, so there are no duplicates. Furthermore, these representatives lie in a ball of radius T/4, so we have the assurance from Proposition 9 that we can unambiguously estimate the average of the corresponding poses as the projection onto the pose space of the arithmetic mean of these representatives.

As a consequence, adapting the Mean Shift procedure to our pose space given the chosen radius only requires an additional step compared to the usual procedure in a vector space. This step consists in projecting the arithmetic mean of the retrieved representatives on the pose space, which can be performed as described in Sect. 7. Pseudo-code of the adapted Mean Shift algorithm is proposed in Algorithm 1.
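For an object without proper symmetry, the rotational part of this projection amounts to re-orthonormalizing the averaged representative: the nearest rotation matrix in Frobenius norm is obtained through an SVD, as in the orthogonal Procrustes problem (Schönemann 1966; Umeyama 1991). The sketch below only illustrates this standard rotational projection, not the paper's full Algorithm 1, and the function name is our own:

```python
import numpy as np

def project_to_rotation(m):
    """Project a 3x3 matrix onto SO(3): the nearest rotation matrix in
    Frobenius norm, computed via SVD (orthogonal Procrustes solution)."""
    u, _, vt = np.linalg.svd(m)
    # Enforce det = +1 so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt
```

For instance, the arithmetic mean of the identity and of a 90° rotation about the z axis is not itself a rotation matrix, but its projection is the expected 45° rotation about z.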

The projection of the mean at each iteration is actually not required for the poses to shift towards meaningful modes in practice, and we therefore perform it only once, after convergence. The pose distribution obtained after Mean Shift is sharper than the original one, as can be seen in Fig. 5c, column 2, where the silhouettes of object instances emerge.

We then estimate the probability density (up to a scaling factor) at a mode \(\mathscr {M}\) by kernel density estimation:
$$\begin{aligned} s(\mathscr {M}) = \sum _i H \left( \frac{{{\mathrm{d}}}(\mathscr {M}, \mathscr {P}_i)}{r} \right) \end{aligned}$$
where H is the Epanechnikov kernel associated with the flat Mean Shift kernel (Fukunaga and Hostetler 1975):
$$\begin{aligned} H(d) = \begin{cases} \frac{3}{4}(1-d^2) &amp; \quad \text {if } |d| \le 1\\ 0 &amp; \quad \text {otherwise.} \end{cases} \end{aligned}$$
The most significant modes based on this estimate can then be extracted. Those poses are assumed to be good pose hypotheses for the object instances of the scene, and typically stand out in the weighted distribution (Fig. 5c, column 3). We refine them further, e.g. through the ICP procedure (Besl and McKay 1992), and filter them by checking their consistency with the actual data in order to avoid false positives, so as to hopefully retrieve the poses of the object instances in the 3D scene (Fig. 5c, column 4).
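The scoring of a mode can be sketched as follows, with the pose distance passed in as a parameter since any distance (such as the proposed one) can be plugged in; the function names are hypothetical:

```python
def epanechnikov(d):
    """Epanechnikov kernel associated with the flat Mean Shift kernel."""
    return 0.75 * (1.0 - d * d) if abs(d) <= 1.0 else 0.0

def mode_score(mode, votes, radius, dist):
    """Kernel density estimate (up to a scaling factor) at a mode:
    sum of H(d(mode, vote) / r) over all votes, as in the text.

    dist : callable returning the pose distance between two poses."""
    return sum(epanechnikov(dist(mode, v) / radius) for v in votes)
```

Modes can then be sorted by decreasing score and the most significant ones kept as pose hypotheses.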
Fig. 6

Comparison of the proposed distance and a \(\textit{SE}(3)\) distance for pose estimation. Left: shifted poses weighted by the density of the initial pose distribution. Right: first modes retrieved from the pose distribution, sorted by descending score, a using the proposed metric or b the \(\textit{SE}(3)\) distance (98) (with the contours of the actual poses of object instances superimposed). Modes are supposed to be good hypotheses regarding the poses of actual instances, and are classified as true positives (blue), duplicates (green, strikethrough) and false positives (yellow, double strikethrough). Both distances perform similarly well for the bunny object, which has no proper symmetry, and the first modes extracted correspond to the different object instances. However, the \(\textit{SE}(3)\) distance does not account for the symmetries of the rocket and candlestick objects, and therefore leads to the generation of duplicated pose hypotheses, requiring many of them to be considered to recover the pose of every instance. The proposed distance better exploits the information contained in the initial pose distribution, leading to the generation of pose hypotheses without duplicates, and with a greater relative score gap between modes corresponding to actual instances and spurious ones (Color figure online)

Theoretical limitation The probabilistic interpretation used here is abusive and should only be considered as a way to convey the intuition behind the Mean Shift approach. Kernel density estimation over a Riemannian manifold has been mathematically studied by Pelletier (2005), but our approach does not fit into such a framework, as we do not consider a Riemannian distance. Some theoretical results might nonetheless be obtained, since our metric is equivalent to a Riemannian metric for small Mean Shift radii (see Sect. 9). Such considerations, however, are out of the scope of this work, and \(s(\mathscr {M})\) can simply be considered as a score for the pose \(\mathscr {M}\).

10.2 Comparison with a \(\textit{SE}(3)\) Metric

We compare these experimental results with those obtained using a more usual distance adapted to \(\textit{SE}(3)\):
$$\begin{aligned} {{\mathrm{d}}}(\mathbf {T}_1, \mathbf {T}_2) = \sqrt{\Vert \mathbf {t}_2 - \mathbf {t}_1 \Vert ^2 + r^2 \Vert \mathbf {R}_2 - \mathbf {R}_1 \Vert ^2}. \end{aligned}$$ (98)
We chose this particular distance because the Mean Shift approach depends on the ability to average multiple poses, and a Frobenius norm over the rotation space is well suited to this task (Curtis et al. 1993). To limit the comparison bias, we choose the scaling factor r between the rotation and translation parts to be
$$\begin{aligned} r=\sqrt{\frac{\lambda _1^2+\lambda _2^2+\lambda _3^2}{3}}, \end{aligned}$$
where \(\lambda _1 \le \lambda _2 \le \lambda _3\) are the eigenvalues of \(\varvec{\varLambda }\). This choice is indeed consistent with our proposed metric, in that the rotational part of the distance between two poses of an object without proper symmetry corresponds respectively to \(2 \sqrt{2} r \sin (\theta /2)\) for distance (98), and to \(2 \sqrt{I_{\mathbf {k}}} \sin (\theta /2)\) for the proposed one, where \(\theta \) is the angle of the relative rotation between the two poses, and \(I_{\mathbf {k}}\) is the inertia moment about the corresponding axis \(\mathbf {k}\) (see Sect. 4.4). Considering a typical value of \(2/3(\lambda _1^2+\lambda _2^2+\lambda _3^2)\) for \(I_{\mathbf {k}}\) makes those two terms identical.
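This identification can be checked numerically: with \(I_{\mathbf {k}} = 2/3(\lambda _1^2+\lambda _2^2+\lambda _3^2)\), one gets \(\sqrt{I_{\mathbf {k}}} = \sqrt{2}\,r\), so the two rotational terms coincide for every angle \(\theta \). A small sketch (the function name is our own):

```python
import math

def se3_scale(lambdas):
    """Scaling factor r between rotation and translation parts:
    the RMS of the eigenvalues of Lambda, as chosen in the text."""
    l1, l2, l3 = lambdas
    return math.sqrt((l1 ** 2 + l2 ** 2 + l3 ** 2) / 3.0)
```

With this choice, \(2 \sqrt{2} r \sin (\theta /2)\) and \(2 \sqrt{I_{\mathbf {k}}} \sin (\theta /2)\) agree term by term.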

As illustrated in Fig. 6, we do not observe much difference between the two approaches for the bunny object. This is actually not surprising, because the two distances are in this case very similar: the bunny is not symmetric and has limited anisotropy.

However, the benefit of our metric appears for the candlestick and the rocket, which are both symmetric. Initial votes for poses are indeed spread out over the space of rigid transformations \(\textit{SE}(3)\), and considering the \(\textit{SE}(3)\) distance (98) therefore leads to the detection of multiple modes corresponding to the same instance, because of the symmetries. One would have to filter out these duplicated pose hypotheses prior to any practical application, and because of them, numerous modes have to be checked to find every instance in the scene. In our example (Fig. 6b), we had to test up to the 4th and 8th mode, respectively, to recover the poses of the 3 rockets and the 3 candlesticks present in the scene.

On the other hand, the proposed distance accounts for the proper symmetries of the object, and thus better exploits the information contained in the initial set of votes than the \(\textit{SE}(3)\) distance. In our example, the first 3 modes extracted indeed correspond to the 3 actual instances of each object, without any duplicates. Moreover, these modes have more support from the initial set of votes and therefore stand out more clearly from the noise, which is important for the robustness of the method. The pose distribution obtained after Mean Shift is indeed visually sharper (Fig. 6, left), and spurious modes for the rocket and the candlestick have scores below 59 and 66%, respectively, of those of the modes corresponding to actual object instances, compared to 89 and 98% when using distance (98).

11 Summary and Discussion

In this paper, we address issues with the commonly used notion of pose for a rigid object, in both the 2D and 3D cases.

While a pose is usually assumed to be equivalent to a rigid transformation, this is not true in general, due to potential symmetries of the object. We therefore propose a broader definition of the notion of pose, as a distinguishable static state of the object. We show that with this definition, a pose can be considered as an equivalence class of the space of rigid transformations, thanks to the introduction of a proper symmetry group specific to the object. We believe this notion to be essential, as many manufactured objects actually exhibit symmetry properties and could not previously be represented properly.

Based on this definition, we propose a metric over the pose space as a measure of the smallest displacement between two poses, the length of a displacement being the RMS displacement distance of the surface points of the object. Besides being defined for any physical rigid object, such a metric is interesting in that it does not depend on an arbitrary choice of frame or scaling factor, while accounting for the geometry of the object.

With computation efficiency in mind, we propose a coherent framework to represent poses in a Euclidean space of at most 12 dimensions, so as to enable efficient distance computations, neighborhood queries, and pose averaging, while providing theoretical proofs for those results.

Those developments enable the use of our metric for high level tasks such as pose estimation based on a set of votes, where it appears to provide better results than a metric suited for \(\textit{SE}(3)\).



Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments and suggestions that greatly helped to improve this article. Some of our illustrations are based on the following mesh models: “Stanford bunny”, from the Stanford University Computer Graphics Laboratory; “Eiffel Tower”, created by Pranav Panchal; and “Şamdan 2” (candlestick), from Metin N. Those were respectively available online at, and the GrabCAD and 3D Warehouse platforms in May 2016.


References

  1. Angeles, J. (2006). Is there a characteristic length of a rigid-body displacement? Mechanism and Machine Theory, 41(8), 884–896.
  2. Belta, C., & Kumar, V. (2002). An SVD-based projection method for interpolation on SE(3). IEEE Transactions on Robotics and Automation, 18(3), 334–345.
  3. Besl, P. J., & McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2), 239–256.
  4. Blender Online Community. (2016). Blender—A 3D modelling and rendering package. Blender Foundation, Blender Institute, Amsterdam.
  5. Brégier, R., Devernay, F., Leyrit, L., et al. (2017). Symmetry aware evaluation of 3D object detection and pose estimation in scenes of many parts in bulk. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2209–2218).
  6. Chirikjian, G. S. (2015). Partial bi-invariance of SE(3) metrics. Journal of Computing and Information Science in Engineering, 15(1), 011008.
  7. Chirikjian, G. S., & Zhou, S. (1998). Metrics on motion and deformation of solid models. Journal of Mechanical Design, 120(2), 252–261.
  8. Curtis, W., Janin, A., & Zikan, K. (1993). A note on averaging rotations. In 1993 IEEE virtual reality annual international symposium (pp. 377–385).
  9. Di Gregorio, R. (2008). A novel point of view to define the distance between two rigid-body poses. In J. Lenarčič & P. Wenger (Eds.), Advances in robot kinematics: Analysis and design (pp. 361–369). Dordrecht: Springer.
  10. Drost, B., Ulrich, M., Navab, N., & Ilic, S. (2010). Model globally, match locally: Efficient and robust 3D object recognition. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 998–1005). IEEE.
  11. Eberharter, J. K., & Ravani, B. (2004). Local metrics for rigid body displacements. Journal of Mechanical Design, 126(5), 805–812.
  12. Etzel, K. R., & McCarthy, J. M. (1996). A metric for spatial displacement using biquaternions on SO(4). In Proceedings of the 1996 IEEE international conference on robotics and automation (Vol. 4, pp. 3185–3190). IEEE.
  13. Fanelli, G., Gall, J., & Van Gool, L. (2011). Real time head pose estimation with random regression forests. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 617–624).
  14. Fukunaga, K., & Hostetler, L. D. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1), 32–40.
  15. Gramkow, C. (2001). On averaging rotations. Journal of Mathematical Imaging and Vision, 15(1–2), 7–16.
  16. Gupta, K. C. (1997). Measures of positional error for a rigid body. Journal of Mechanical Design, 119(3), 346–348.
  17. Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., & Navab, N. (2012). Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In Asian conference on computer vision (pp. 548–562). Springer.
  18. Kazerounian, K., & Rastegar, J. (1992). Object norms: A class of coordinate and metric independent norms for displacements. Flexible Mechanisms, Dynamics, and Analysis, ASME DE, 47, 271–275.
  19. Kendall, A., Grimes, M., & Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In Proceedings of the IEEE international conference on computer vision (pp. 2938–2946).
  20. Larochelle, P. M., Murray, A. P., & Angeles, J. (2007). A distance metric for finite sets of rigid-body displacements via the polar decomposition. Journal of Mechanical Design, 129(8), 883–886.
  21. Lin, Q., & Burdick, J. W. (2000). Objective and frame-invariant kinematic metric functions for rigid bodies. The International Journal of Robotics Research, 19(6), 612–625.
  22. Martinez, J. M. R., & Duffy, J. (1995). On the metrics of rigid body displacements for infinite and finite bodies. Journal of Mechanical Design, 117(1), 41–47.
  23. Muja, M., & Lowe, D. G. (2009). Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP (No. 1, pp. 331–340).
  24. Park, F. C. (1995). Distance metrics on the rigid-body motions with applications to mechanism design. Journal of Mechanical Design, 117(1), 48–54.
  25. Pelletier, B. (2005). Kernel density estimation on Riemannian manifolds. Statistics & Probability Letters, 73(3), 297–304.
  26. Pennec, X. (1998). Computing the mean of geometric features: Application to the mean rotation. Report, INRIA.
  27. Purwar, A., & Ge, Q. J. (2009). Reconciling distance metric methods for rigid body displacements. In ASME 2009 international design engineering technical conferences and computers and information in engineering conference (pp. 1295–1304). American Society of Mechanical Engineers.
  28. Rodrigues, J. J., Kim, J., Furukawa, M., Xavier, J., Aguiar, P., & Kanade, T. (2012). 6D pose estimation of textureless shiny objects using random ferns for bin-picking. In 2012 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 3334–3341). IEEE.
  29. Schönemann, P. H. (1966). A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1), 1–10.
  30. Sharf, I., Wolf, A., & Rubin, M. (2010). Arithmetic and geometric solutions for average rigid-body rotation. Mechanism and Machine Theory, 45(9), 1239–1251.
  31. Subbarao, R., & Meer, P. (2006). Nonlinear mean shift for clustering over analytic manifolds. In 2006 IEEE computer society conference on computer vision and pattern recognition (Vol. 1, pp. 1168–1175). IEEE.
  32. Sucan, I., Moll, M., & Kavraki, L. (2012). The open motion planning library. IEEE Robotics & Automation Magazine, 19(4), 72–82.
  33. Tejani, A., Tang, D., Kouskouridas, R., & Kim, T. (2014). Latent-class Hough forests for 3D object detection and pose estimation. In Computer vision—ECCV 2014 (pp. 462–477). Springer.
  34. Tjaden, H., Schwanecke, U., & Schömer, E. (2016). Real-time monocular segmentation and pose tracking of multiple objects. In European conference on computer vision (pp. 423–438). Springer.
  35. Tuzel, O., Subbarao, R., & Meer, P. (2005). Simultaneous multiple 3D motion estimation via mode finding on Lie groups. In Tenth IEEE international conference on computer vision (ICCV 2005) (Vol. 1, pp. 18–25). IEEE.
  36. Umeyama, S. (1991). Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4), 376–380.
  37. Vainsthein, B. K. (1994). Fundamentals of crystals. Berlin: Springer.
  38. Zefran, M., & Kumar, V. (1996). Planning of smooth motions on SE(3). In Proceedings of the 1996 IEEE international conference on robotics and automation (Vol. 1, pp. 121–126). IEEE.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Siléane, Saint-Étienne, France
  2. Inria Grenoble Rhône-Alpes, Université Grenoble Alpes (UGA), Grenoble, France
