Defining the Pose of Any 3D Rigid Object and an Associated Distance
Abstract
The pose of a rigid object is usually regarded as a rigid transformation, described by a translation and a rotation. However, equating the pose space with the space of rigid transformations is in general an abuse, as it does not account for objects with proper symmetries, which are common among man-made objects. In this article, we define pose as a distinguishable static state of an object, and equate a pose with a set of rigid transformations. Based solely on geometric considerations, we propose a frame-invariant metric on the space of possible poses, valid for any physical rigid object and requiring no arbitrary tuning. This distance can be evaluated efficiently using a representation of poses within a Euclidean space of at most 12 dimensions, depending on the object’s symmetries. This makes it possible to efficiently perform neighborhood queries such as radius searches or k-nearest neighbor searches within a large set of poses using off-the-shelf methods. Pose averaging with respect to this metric can similarly be performed easily, using a projection function from the Euclidean space onto the pose space. The practical value of these theoretical developments is illustrated with an application to pose estimation of instances of a 3D rigid object given an input depth map, via a Mean Shift procedure.
Keywords
Pose, 3D rigid object, Symmetry, Distance, Metric, Average, Rotation, \(\textit{SE}(3)\), \(\textit{SO}(3)\), Object recognition
1 Introduction
Rigid body models play an important role in many technical and scientific fields, including physics, mechanical engineering, computer vision and 3D animation. Under the rigid body assumption, the static state of an object is referred to as a pose, and is often described in terms of a position and an orientation.
Poses of a 3D rigid object are in general regarded as rigid transformations, and the set of poses is identified with the set of rigid transformations \(\textit{SE}(3)\), the special Euclidean group. The Lie group structure of \(\textit{SE}(3)\) makes the relative displacement of the object between two poses explicit, and thus enables the definition of a distance between poses as the length of a shortest motion performing this displacement. This identification is particularly meaningful for applications where the motion of a rigid body is considered, such as motion planning (Sucan et al. 2012) or object tracking (Tjaden et al. 2016). However, while numerous works on \(\textit{SE}(3)\) metrics already exist, choosing how to deal with poses of an object remains challenging, as practitioners face questions such as how to tune the relative importance of position and orientation, even in the current deep learning era (Kendall et al. 2015).
There are applications in which motion considerations are irrelevant, and for which only a notion of similarity between poses is required. Pose estimation of instances of a rigid object based on a noisy set of votes is a good example of such a problem. While motion-based applications rely on local properties of the pose space, which have been the subject of a large amount of research, applications based on similarity have to deal with numerous poses at once, performing operations such as neighborhood queries (i.e. finding poses in a set that are similar to a given one) or pose averaging, and these operations have not attracted as much theoretical interest. Consequently, similarity measures suffering from major flaws are still used in practical applications. Following the work of Fanelli et al. (2011), Tejani et al. (2014) use a Mean Shift procedure based on the Euclidean distance between Euler angles as a representation of poses in their state-of-the-art object pose estimation method. Such a measure is fast to compute and enables the use of efficient tools developed for Euclidean spaces to perform the neighborhood queries and pose averaging required for Mean Shift, but it is not a distance. The parametrization of a rotation based on Euler angles notoriously suffers from border effects and singularities, and depends on the choice of frame. These issues may only have limited effects on the results reported by the authors, thanks to an appropriate choice of frame orientations and to the low variability of object orientations within their datasets. Nonetheless, they cannot be avoided when dealing with the general case of poses having arbitrary orientations. This example illustrates the lack of tools for dealing efficiently with large sets of poses.
Lastly, there are cases where the pose of a rigid object cannot be identified with a single rigid transformation, and for which existing results therefore cannot be applied. Such cases occur when dealing with objects showing symmetry properties, such as revolution objects or cuboids, and are in fact common among manufactured objects. The existing literature on object pose estimation does not usually discuss how such objects are handled, and the most widespread validation method used for symmetrical objects (Hinterstoisser et al. 2012) relies on a relaxed similarity measure that cannot distinguish between poses such as a cylindrical can standing upright or upside down.
Our goal in this paper is to address these issues by providing a consistent and general framework for dealing with any kind of physically admissible rigid object in practical applications. To this end, we propose a pose definition valid for any bounded rigid object, equivalent to a set of rigid transformations (Sect. 2). We then propose a physically meaningful distance over the pose space (Sect. 4), and show how poses can be represented in a Euclidean space to enable fast distance computations and neighborhood queries (Sect. 5). We show how the pose averaging problem can be solved efficiently for this metric (Sect. 8) using a projection technique (Sect. 7), and lastly we propose an example application to the problem of pose estimation of instances of a rigid object given a set of votes (Sect. 10).
2 A Definition for Pose
We will refer to the set of possible poses as a pose space, which we will denote \(\mathscr {C}\) for consistency with the notion of configuration space in the robotics literature.
2.1 Link Between the Pose Space and \(\textit{SE}(3)\)
A pose space is closely related to the group of rigid transformations \(\textit{SE}(3)\). Let us consider a rigid object, and let \(\mathscr {P}_0 \in \mathscr {C}\) be an arbitrary reference pose for this object.
A rigid transformation applied to the object at its reference pose defines a static state of the object, i.e. a pose. Conversely, any pose \(\mathscr {P} \in \mathscr {C}\) of the object can be reached through a rigid displacement from the reference pose \(\mathscr {P}_0\), and \(\mathscr {P}\) can therefore be described completely by the rigid transformation corresponding to this displacement.
However, the rigid transformation corresponding to a given pose is not necessarily unique, and identifying \(\textit{SE}(3)\) with the pose space is therefore incorrect in the general case. Objects, and especially manufactured ones, may indeed show proper symmetry properties that make them invariant to some rigid displacements.
2.2 Pose as Equivalence Class of \(\textit{SE}(3)\)
Let \(M \subset SE(3)\) be the set of rigid transformations representing the same pose as a rigid transformation \(\mathbf {T}\). For the bunny object (Fig. 1d), M typically consists of the singleton \(\lbrace \mathbf {T} \rbrace \). But M can also contain a continuum of transformations in the case of a revolution object such as the candlestick (Fig. 1a), or be a discrete set, as for the rocket object depicted in Fig. 1e, where the same pose can be represented by 3 different transformations.
By definition of \(M\), \(G \triangleq \{ \mathbf {T}^{-1} \circ \mathbf {M} \mid \mathbf {M} \in M \}\) is the set of rigid transformations that have no effect on the static state of the object. This set therefore does not depend on the arbitrary transformation \(\mathbf {T}\) considered. It is moreover a subgroup of \(\textit{SE}(3)\): compositions and inverses of such transformations can be applied to the object while leaving it unchanged, and the identity transformation obviously has no effect on the pose of the object. We will refer to the elements of this group as the proper symmetries of the object, and to \(G \subset SE(3)\) as the group of proper symmetries of the object.
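As a small numerical illustration of this definition, the sketch below uses a hypothetical 3-fold proper symmetry group about the z axis (as for the rocket of Fig. 1e), builds the set M of transformations representing one pose, and checks that recovering G from M yields the same group whichever element of M is taken as \(\mathbf {T}\), and that G is closed under composition and inversion. Transformations are stored as 4×4 homogeneous matrices; the helper names, angles and translation are arbitrary choices made for the example.

```python
import numpy as np


def rot_x(angle):
    """Rotation matrix about the x axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_z(angle):
    """Rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rigid(R, t):
    """4x4 homogeneous matrix of the rigid transformation x -> R x + t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


# Hypothetical proper symmetry group: rotations by 0, 120 and 240 degrees about z.
G = [rigid(rot_z(2.0 * np.pi * k / 3.0), np.zeros(3)) for k in range(3)]


def contains(group, X, tol=1e-9):
    """True if X matches an element of `group` up to numerical tolerance."""
    return any(np.allclose(X, g, atol=tol) for g in group)


def same_pose_set(T):
    """M: all transformations T o G (G in G), which leave the object in the same state."""
    return [T @ g for g in G]


def recover_group(T, M):
    """G = {T^-1 o M | M in M}, computed from an arbitrary T in M."""
    T_inv = np.linalg.inv(T)
    return [T_inv @ m for m in M]
```

Taking each element of M in turn as the reference transformation gives back the same three symmetries, which is the independence from \(\mathbf {T}\) stated above.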
2.3 The Proper Symmetry Group
In the following, we propose a classification of the potential groups of proper symmetries for a physically meaningful bounded object. While models of infinite objects are commonly used, e.g. for plane detection in 3D scene analysis, we do not consider them in this article, as they do not correspond to actual physical objects and the definition of a suitable metric on the pose space of such objects is typically highly application-dependent. This classification will be helpful to derive the practical results associated with our proposed distance.
All proper symmetries of a bounded object necessarily have a common fixed point; we can thus consider the group of proper symmetries as a subgroup of the rotation group \(\textit{SO}(3)\) by choosing such a point as the origin of the object frame. Subgroups of \(\textit{SO}(3)\) are sometimes referred to as chiral point groups, and have been widely studied, notably in the context of crystallography. The interested reader is referred to Vainsthein (1994) for more insight into the theory of symmetry.
Ignoring the pathological case of infinite subgroups of \(\textit{SO}(3)\) that are not closed under the usual topology, as they do not make sense physically, the potential groups of proper symmetries for a bounded object can be classified into a few categories.
In the 2D case, a bounded object shows either a circular symmetry, i.e. invariance under any 2D rotation, or a cyclic symmetry of order \(n \in \mathbb {N}^{*}\), i.e. invariance under rotation by 1 / n of a turn. The special case \(n=1\) corresponds to a 2D object without any proper symmetry. Table 1 provides examples of such objects.
Table 1 Classification of the potential groups of proper symmetries for a 2D bounded physical object
Note that potential indirect symmetries of the object, such as reflection symmetries, are not accounted for. This is because we consider an oriented 3D space (e.g. through the right-hand rule), in which reflections are not physically feasible through rigid displacements. Revolution symmetry with rotoreflection invariance is nonetheless considered, since it corresponds to a proper symmetry group: the reflection symmetry can indeed be generated by the introduction of a rotational invariance of \(180^{\circ }\) about an arbitrary axis orthogonal to the revolution axis.
Table 2 Classification of the potential groups of proper symmetries for a 3D bounded physical object
3 Prior Work on Metrics Over the Pose Space
3.1 Objectiveness
The identification of the pose space with \(\textit{SE}(3)\) is based on the choice of two arbitrary frames: a frame linked to the object, which we will refer to as the object frame, and a fixed inertial frame such that the object frame coincides with the inertial frame when the object is in the reference pose \(\mathscr {P}_0\). For a distance to be well-defined, it should not depend on an arbitrary choice of those frames, a notion that Lin and Burdick (2000) formalize as objectiveness or frame invariance.
Among possible distances, geodesic distances have attracted the most interest and have been studied within the framework of Riemannian geometry on the Lie group \(\textit{SE}(3)\). Geodesic distances are well suited to applications dealing with motions, as they represent the minimum length of a motion bringing the object from one pose to another. Park (1995) showed that there are no bi-invariant Riemannian metrics on \(\textit{SE}(3)\), that is, metrics invariant to any change of inertial frame (left invariance) and of object frame (right invariance). Chirikjian (2015) recently studied this question further and showed that while continuous bi-invariant metrics do not exist, there are continuous left-invariant distances that are invariant under right shifts by pure rotations.
3.2 Hyper-Rotation Approximation
Nonetheless, several authors have worked on an “approximate bi-invariant” metric (Purwar and Ge 2009) for \(\textit{SE}(3)\), through the mapping of rigid transformations to hyper-rotations of \(\textit{SO}(4)\) and the use of a bi-invariant metric on \(\textit{SO}(4)\). Techniques to perform such a mapping have been proposed based on the biquaternion representation (Etzel and McCarthy 1996) and on polar decomposition (Larochelle et al. 2007). Such a mapping unfortunately requires scaling the translation part by a factor that has to be set empirically depending on the application (Angeles 2006).
3.3 Decomposition Into Translation and Rotation
Fortunately, while inertial frame invariance is necessary for the objectiveness of a metric, object frame invariance is not. Lin and Burdick (2000) indeed showed that a distance is objective if and only if it is independent of the choice of inertial frame and transforms by a right shift in response to a change of object frame. Therefore, one way to define an objective metric is to define a left-invariant distance for a given object frame and to always use that frame, in order to avoid having to transform the distance expression.
3.4 Geometric Approaches
Zefran and Kumar (1996) and Lin and Burdick (2000) independently proposed a Riemannian metric tensor linked to the notion of kinetic energy, which therefore takes the object properties into account without the need for arbitrary tuning. Their tensor can be seen as a local equivalent of the distance of Kazerounian and Rastegar (1992). However, to the best of our knowledge, there is no known closed-form expression for the resulting geodesic distance in the general case.
3.5 Local Metric
Various other local parametrization methods exist which, by locally mapping the pose space to a Euclidean space, make it possible to define distances locally. Such parametrizations are based e.g. on the representation of orientation with Euler angles, or on the local stereographic projection of the pose space identified with Study’s quadric, a hypersurface embedded in \(\mathbb {R}^7\) (Eberharter and Ravani 2004). An in-depth discussion of this topic is beyond the scope of this article, as we are interested in global distances.
4 Proposed Metric
In this section, we propose a distance over the pose space of a 3D rigid bounded object, valid even for symmetric ones. This distance can be considered as an extension of the work of Kazerounian and Rastegar (1992) and Chirikjian and Zhou (1998) to arbitrary bounded objects. We also discuss some of its properties.
4.1 Formal Definition
This expression is well defined. The minimum in definition (10) is reached thanks to the compactness of the proper symmetry group G (a closed subgroup of \(\textit{SO}(3)\), which is itself compact) and to the continuity of \({{\mathrm{d_{no\_sym}}}}\). Moreover, this definition is by construction independent of the choice of the rigid transformations \(\mathbf {T}_1, \mathbf {T}_2\) identified with the poses considered. It is easily verified that \({{\mathrm{d}}}\) satisfies the conditions of a distance: it is symmetric and positive-definite, and the triangle inequality derives from the triangle inequality satisfied by \({{\mathrm{d_{no\_sym}}}}\), which is a direct consequence of the Minkowski inequality. An equivalent formulation of this distance, involving a single minimization over G, is introduced in Proposition 1.
In typical applications, one is particularly interested in the positioning of the surface of the object. Therefore, in our experiments, we consider the surface of the object as the set of points \(\mathscr {S}\). The density function \(\mu \) can be used to modulate the importance of the positioning of specific areas, but without additional information it is natural to consider a uniform weight \(\mu =1\).
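To make this definition concrete, the sketch below approximates the integral over \(\mathscr {S}\) by a weighted average over a finite sample of surface points, assuming (consistently with the Minkowski-inequality argument above) that \({{\mathrm{d_{no\_sym}}}}\) is the root-mean-square displacement of the object's points, and evaluates \({{\mathrm{d}}}\) for a finite proper symmetry group by brute-force minimization over pairs of symmetries. The names `d_no_sym` and `d_pose`, the point sampling and the 3-fold symmetric test object are illustrative assumptions, not the article's implementation.

```python
import numpy as np
from itertools import product


def rot_z(angle):
    """Rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def d_no_sym(T1, T2, X, mu=None):
    """RMS displacement between transformations T = (R, t), weighted by mu (default mu = 1)."""
    (R1, t1), (R2, t2) = T1, T2
    mu = np.ones(len(X)) if mu is None else np.asarray(mu, dtype=float)
    disp = (X @ R1.T + t1) - (X @ R2.T + t2)  # displacement of every sampled point
    return np.sqrt(np.sum(mu * np.sum(disp ** 2, axis=1)) / np.sum(mu))


def d_pose(T1, T2, X, group, mu=None):
    """Distance between the poses [T1] and [T2] for a finite group of rotations."""
    (R1, t1), (R2, t2) = T1, T2
    # T o G with G a pure rotation is (R G, t).
    return min(d_no_sym((R1 @ G1, t1), (R2 @ G2, t2), X, mu)
               for G1, G2 in product(group, group))
```

For a point set exactly invariant under the group, two transformations differing by a symmetry are at distance zero, and a single minimization (here assumed to compose the symmetry with the first transformation only) gives the same value as the double one.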
4.2 Objectiveness
The proposed distance is by construction independent of the choice of arbitrary frames, as it admits a purely geometric interpretation, a point we discuss in Sect. 4.3.
4.3 Geometric Interpretation
This formulation is simpler than Definition 2; however, it breaks the symmetry of the roles of the two poses.
Proof
4.4 Rotation Anisotropy
5 Efficient Distance Computation
The distance of Definition 2 and the simpler expression of Proposition 1 are of little practical use as such, since they contain a summation term over the set of points of the object and a minimization over its proper symmetry group, both sets being potentially infinite.
In this section, we show how our proposed distance can be evaluated efficiently. To this end, we propose a representation of a pose \(\mathscr {P}\) as a finite set of points \(\mathscr {R}(\mathscr {P})\) of a Euclidean space \(\mathbb {R}^N\) of at most 12 dimensions, depending on the object’s symmetries. We refer to an element of \(\mathscr {R}(\mathscr {P})\) as a representative of \(\mathscr {P}\), since a representative completely defines a pose (see Sect. 7).
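The sketch below illustrates the idea for an object without proper symmetry, assuming (as in Sect. 5.2) that the origin of the object frame is the center of mass. With C the weighted second-moment matrix of the object points and a symmetric matrix \(\varvec{\varLambda }\) such that \(\varvec{\varLambda }^2 = C\), the squared RMS displacement splits into \(\Vert \mathbf {t}_1 - \mathbf {t}_2 \Vert ^2 + \Vert (\mathbf {R}_1 - \mathbf {R}_2) \varvec{\varLambda }\Vert _F^2\), so the representative \(({{\mathrm{vec}}}(\mathbf {R} \varvec{\varLambda })^\top , \mathbf {t}^\top )^\top \in \mathbb {R}^{12}\) of Table 3 turns the distance into a plain Euclidean one. Taking \(\varvec{\varLambda }\) as the square root of C is our reading, consistent with this identity; the helper names are hypothetical, and the check compares both quantities on random data.

```python
import numpy as np


def random_rotation(rng):
    """Random rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def lambda_matrix(X, mu=None):
    """Lambda = C^(1/2), C being the weighted second-moment matrix of centered points X."""
    mu = np.ones(len(X)) if mu is None else np.asarray(mu, dtype=float)
    C = (X * mu[:, None]).T @ X / np.sum(mu)
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T


def representative(R, t, Lam):
    """Representative (vec(R Lambda)^T, t^T)^T in R^12 (column-major vec)."""
    return np.concatenate([(R @ Lam).flatten(order="F"), t])


def d_no_sym(R1, t1, R2, t2, X):
    """Brute-force RMS displacement over the points X (uniform weights)."""
    disp = (X @ R1.T + t1) - (X @ R2.T + t2)
    return np.sqrt(np.mean(np.sum(disp ** 2, axis=1)))
```

Once \(\varvec{\varLambda }\) is precomputed, each distance costs a single 12D Euclidean norm instead of a sum over all surface points.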
Table 3 Proposed representatives for a pose \(\mathscr {P} = [(\mathbf {R} \in SO(3), \mathbf {t} \in \mathbb {R}^3)]\) of a 3D object depending on its proper symmetries
Proper symmetry class | Proper symmetry group G | Pose representatives \(\mathscr {R}(\mathscr {P})\)
Spherical symmetry | \(\textit{SO}(3)\) | \(\mathbf {t} \in \mathbb {R}^3\)
Revolution symmetry without rotoreflection invariance | \(\left\{ \mathbf {R}_z^\alpha \vert \alpha \in \mathbb {R} \right\} \) | \( (\lambda (\mathbf {R} \mathbf {e}_z)^\top , \mathbf {t}^\top )^\top \in \mathbb {R}^{6}\)
Revolution symmetry with rotoreflection invariance | \(\left\{ \mathbf {R}_x^\delta \mathbf {R}_z^\alpha \vert \delta \in \left\{ 0, \pi \right\} , \alpha \in \mathbb {R} \right\} \) | \(\left\{ (\pm \lambda (\mathbf {R} \mathbf {e}_z)^\top , \mathbf {t}^\top )^\top \right\} \subset \mathbb {R}^{6}\)
No proper symmetry | \(\left\{ \mathbf {I} \right\} \) | \( ({{\mathrm{vec}}}(\mathbf {R} \varvec{\varLambda })^\top , \mathbf {t}^\top )^\top \in \mathbb {R}^{12}\)
Finite nontrivial | Finite | \(\left\{ ({{\mathrm{vec}}}(\mathbf {R} \mathbf {G} \varvec{\varLambda })^\top , \mathbf {t}^\top )^\top \vert \mathbf {G} \in G \right\} \subset \mathbb {R}^{12}\)
Table 4 Proposed representatives for a pose \(\mathscr {P} = [(\theta \in \mathbb {R}, \mathbf {t} \in \mathbb {R}^2)]\) of a 2D object depending on its proper symmetries
Proper symmetry class | Proper symmetry group G | Pose representatives \(\mathscr {R}(\mathscr {P})\)
Circular symmetry | \(\textit{SO}(2)\) | \(\mathbf {t} \in \mathbb {R}^2\)
No proper symmetry | \(\left\{ \mathbf {I} \right\} \) | \((\lambda e^{i \theta }, \mathbf {t}^\top )^\top \in \mathbb {R}^{4}\)
Cyclic symmetry (order \(n \in \mathbb {N}^*\)) | \(\left\{ \mathbf {R}^{2 k \pi /n} \vert k \in \llbracket 0, n \llbracket \right\} \) | \(\left\{ (\lambda e^{i (\theta + 2 k \pi / n)}, \mathbf {t}^\top )^\top \vert k \in \llbracket 0, n \llbracket \right\} \subset \mathbb {R}^{4}\)
5.1 Neighborhood Query
The distance formulation (21) is of great practical value, as it makes it possible to perform efficient radius searches and exact or approximate k-nearest neighbor queries within a large set of poses, using any off-the-shelf neighborhood query algorithm designed for Euclidean spaces. Neighborhood queries are useful for numerous problems, and we provide an example in Sect. 10 where radius search is heavily used. Existing methods for neighborhood queries enable fast neighborhood retrieval within a set of points of a vector space, compared to a brute-force approach consisting in computing the distance to every point of the set. They make use of a specific search structure, such as a grid or a k-d tree, adapted to the specific properties of the considered metric space. A review of those algorithms is beyond the scope of this work, and we refer the interested reader to the well-known FLANN library (Muja and Lowe 2009) as a starting point.
Such representatives can be retrieved through a standard radius search operation around \(\mathbf {q}\) in R. The search for nearest neighbors can be performed in a similar fashion.
T can be computed by considering an arbitrary pose \(\mathscr {P}\), thanks to the invariance of our underlying metric to the choice of a reference pose (see Sect. 4.2), and an arbitrary representative \(\mathbf {p} \in \mathscr {R}(\mathscr {P})\), thanks to the symmetry properties of representatives described in Sect. 6.
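A minimal sketch of such a query, assuming each pose is stored with its finite set of representatives: all representatives are stacked in one array together with the index of the pose they belong to, and a pose is returned if at least one of its representatives lies within the radius of the query representative. Brute force is used for clarity; a k-d tree (e.g. from FLANN, or `scipy.spatial.cKDTree.query_ball_point`) could replace the linear scan. The function names and the toy vectors in the usage example are hypothetical.

```python
import numpy as np


def build_index(representatives_per_pose):
    """Stack all representatives in one array and remember which pose each belongs to."""
    points, owners = [], []
    for pose_id, reps in enumerate(representatives_per_pose):
        for p in reps:
            points.append(np.asarray(p, dtype=float))
            owners.append(pose_id)
    return np.array(points), np.array(owners)


def radius_search(points, owners, q, radius):
    """Sorted ids of the poses having a representative within `radius` of q."""
    dist = np.linalg.norm(points - np.asarray(q, dtype=float), axis=1)
    return sorted(set(owners[dist <= radius].tolist()))


# Toy usage: 2D vectors standing in for representatives; pose 1 has two of them.
reps = [[[0.0, 0.0]], [[1.0, 0.0], [-0.2, 0.0]], [[3.0, 3.0]]]
points, owners = build_index(reps)
print(radius_search(points, owners, [0.0, 0.0], 0.5))  # [0, 1]
```

Pose 1 is found through its second representative only, which is why all representatives of every pose are indexed while the query uses a single one.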
5.2 Decomposition Into Translation and Rotation Parts
5.3 Object with No Proper Symmetries
5.4 Revolution Object Without Rotoreflection Invariance
We now consider the case of a revolution object without rotoreflection invariance. As stated in Sect. 5.2, we assume that the origin of the object frame corresponds to the center of mass of the object. Without loss of generality, we moreover assume that the axis \(\mathbf {e}_z\) of the object frame is aligned with the revolution axis. A pose \(\mathscr {P}\) is thus defined up to a rotation \(\mathbf {R}_z^\phi \) about the \(\mathbf {e}_z\) axis, where \(\phi \) is the angle of the considered rotation, and the proper symmetry group of the object is \(G = \left\{ \mathbf {R}_z^\phi \vert \phi \in \mathbb {R} \right\} \).
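For such an object, the distance can be checked numerically against the 6D representative of Table 3. Writing the second-moment matrix of the centered points as C = diag(a, a, b) in this frame, a short computation (ours, not taken from the text) gives \(\min _\phi \Vert (\mathbf {R}_1 \mathbf {R}_z^\phi - \mathbf {R}_2) C^{1/2} \Vert _F^2 = (a + b) \Vert \mathbf {R}_1 \mathbf {e}_z - \mathbf {R}_2 \mathbf {e}_z \Vert ^2\), which suggests \(\lambda = \sqrt{a + b}\). The sketch compares this closed form with a brute-force minimization over a grid of angles \(\phi \), on a hypothetical object made of three rings; all helper names are illustrative.

```python
import numpy as np


def rot_z(angle):
    """Rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def random_rotation(rng):
    """Random rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def d_no_sym(R1, t1, R2, t2, X):
    """Brute-force RMS displacement over the points X (uniform weights)."""
    disp = (X @ R1.T + t1) - (X @ R2.T + t2)
    return np.sqrt(np.mean(np.sum(disp ** 2, axis=1)))


def revolution_lambda(X):
    """lambda = sqrt(a + b) for centered points with second moments diag(a, a, b)."""
    C = X.T @ X / len(X)
    a = 0.5 * (C[0, 0] + C[1, 1])
    return np.sqrt(a + C[2, 2])


def revolution_rep(R, t, lam):
    """6D representative (lambda R e_z, t); R e_z is the third column of R."""
    return np.concatenate([lam * R[:, 2], t])


# Hypothetical revolution object: three rings (radius, height) sampled at 12 angles.
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
X = np.array([[r * np.cos(a), r * np.sin(a), z]
              for r, z in [(1.0, 0.0), (0.5, 1.0), (0.2, 2.0)] for a in angles])
X = X - X.mean(axis=0)  # origin at the center of mass
```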
5.5 Spherical Object
5.6 Revolution Object with Rotoreflection Invariance
5.7 Object with a Nontrivial Finite Proper Symmetry Group
5.8 2D Object
The notion of pose representative can be applied to 2D objects as well. For the sake of conciseness, we will only discuss the case of a 2D object with no proper symmetry, as the reasoning is very similar to the one performed for 3D objects. The full list of proposed representatives is given in Table 4.
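For 2D objects, the same construction can be sketched directly. For points centered on the center of mass, two 2D rotation matrices satisfy \((\mathbf {R}(\theta _1) - \mathbf {R}(\theta _2))^\top (\mathbf {R}(\theta _1) - \mathbf {R}(\theta _2)) = \vert e^{i\theta _1} - e^{i\theta _2} \vert ^2 \mathbf {I}\), so choosing \(\lambda ^2\) as the mean squared distance of the points to the center (our derivation) makes the Euclidean distance between the 4D representatives of Table 4 equal to the RMS point displacement. For a cyclic symmetry of order n, the sketch takes the minimum over the n representatives of the first pose against a single representative of the second one, in line with the symmetry properties of representatives discussed in Sect. 6. Function names and the test object are hypothetical.

```python
import numpy as np


def rot2(theta):
    """2D rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def lambda_2d(X):
    """lambda = sqrt(mean ||x||^2) for points X centered on the center of mass."""
    return np.sqrt(np.mean(np.sum(X ** 2, axis=1)))


def reps_2d(theta, t, lam, n=1):
    """Representatives (lam e^{i(theta + 2 k pi / n)}, t) as 4D real vectors, k = 0..n-1."""
    return [np.array([lam * np.cos(theta + 2.0 * np.pi * k / n),
                      lam * np.sin(theta + 2.0 * np.pi * k / n), t[0], t[1]])
            for k in range(n)]


def dist_2d(theta1, t1, theta2, t2, lam, n=1):
    """Pose distance: closest representative of pose 1 to one representative of pose 2."""
    q = reps_2d(theta2, t2, lam, n)[0]
    return min(np.linalg.norm(p - q) for p in reps_2d(theta1, t1, lam, n))
```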
6 Symmetry Within Representatives
Table 5 Proposed symmetry operations on the ambient space for objects with multiple representatives per pose
Object type | Proper symmetry class | Symmetry group \(G_\mathscr {R}\) | Symmetry definition
3D | Finite | \(\left\{ s_{\mathbf {G} } \vert \mathbf {G} \in G \right\} \) | \(s_{\mathbf {G}} : \mathbb {R}^{12} \rightarrow \mathbb {R}^{12}, \; ({{\mathrm{vec}}}(\mathbf {M})^\top , \mathbf {t}^\top )^\top \mapsto ({{\mathrm{vec}}}(\mathbf {M} \mathbf {G})^\top , \mathbf {t}^\top )^\top \)
3D | Revolution with rotoreflection invariance | \(\left\{ s_{\text {rev}, \delta } \vert \delta = \pm 1 \right\} \) | \(s_{\text {rev}, \delta } : \mathbb {R}^{6} \rightarrow \mathbb {R}^{6}, \; (\mathbf {a}^\top , \mathbf {t}^\top )^\top \mapsto (\delta \mathbf {a}^\top , \mathbf {t}^\top )^\top \)
2D | Cyclic (order \(n \in \mathbb {N}^*\)) | \(\left\{ s_{\text {2D}, n, k} \vert k \in \llbracket 0, n \llbracket \right\} \) | \(s_{\text {2D}, n, k} : \mathbb {R}^{4} \rightarrow \mathbb {R}^{4}, \; (\mathbf {a}^\top , \mathbf {t}^\top )^\top \mapsto (e^{i 2 k \pi / n} \cdot \mathbf {a}, \mathbf {t}^\top )^\top \)
First, we ensure that the proposed group is well defined:
Proposition 2
\(G_\mathscr {R}\) is a group for the composition operation.
Proof
This property derives directly from the group properties of \(G\), \(\lbrace -1, 1 \rbrace \) and \(\lbrace e^{i 2 k \pi / n} \vert k \in \llbracket 0, n \llbracket \rbrace \) under multiplication. \(\square \)
Then, we introduce the following lemma, which expresses the fact that the geometry of the object is consistent with its symmetries:
Lemma 1
For any proper symmetry \(\mathbf {G} \in G\), \(\mathbf {G}\) and \(\varvec{\varLambda }\) commute, i.e. \(\mathbf {G} \varvec{\varLambda }= \varvec{\varLambda }\mathbf {G}\).
Proof
Thanks to this lemma, it is now possible to exhibit the following three properties of these symmetries within the ambient space:
Proposition 3
Proof
Proposition 4
This proposition is a direct consequence of the definition of symmetries described in Table 5.
Proposition 5
Proof

\((s_{\mathbf {G}})^{-1} = s_{\mathbf {G}^{-1}}\) for any \(\mathbf {G} \in G\).

\((s_{\text {rev}, \delta })^{-1} = s_{\text {rev}, \delta }\) for any \(\delta \in \left\{ -1, 1 \right\} \).

\((s_{\text {2D}, n, k})^{-1} = s_{\text {2D}, n, -k}\) for any \(k \in \mathbb {N}\).
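Lemma 1 and the behavior of the symmetries \(s_{\mathbf {G}}\) can be checked numerically. The sketch below uses a hypothetical object whose sampled points are exactly invariant under rotations of \(120^{\circ }\) about z, computes \(\varvec{\varLambda }\) as the symmetric square root of the second-moment matrix of the centered points (an assumption consistent with Sect. 5), and verifies that each \(\mathbf {G}\) commutes with \(\varvec{\varLambda }\), that \(s_{\mathbf {G}}\) maps the set of representatives of a pose onto itself, that it preserves Euclidean distances, and that \(s_{\mathbf {G}^{-1}}\) undoes \(s_{\mathbf {G}}\). A column-major vec is assumed, and the helper names are illustrative.

```python
import numpy as np


def rot_z(angle):
    """Rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def sqrt_psd(C):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T


def s_G(p, G):
    """s_G: (vec(M), t) -> (vec(M G), t) on R^12, with column-major vec."""
    M = p[:9].reshape(3, 3, order="F")
    return np.concatenate([(M @ G).flatten(order="F"), p[9:]])


def representatives(R, t, Lam, group):
    """All representatives (vec(R G Lambda), t) of the pose [(R, t)]."""
    return [np.concatenate([(R @ G @ Lam).flatten(order="F"), t]) for G in group]


group = [rot_z(2.0 * np.pi * k / 3.0) for k in range(3)]
rng = np.random.default_rng(7)
base = rng.normal(size=(30, 3)) * np.array([2.0, 1.0, 0.5])
X = np.vstack([base @ G.T for G in group])  # point set invariant under the group
X = X - X.mean(axis=0)
Lam = sqrt_psd(X.T @ X / len(X))
```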
7 Projection Onto the Pose Space
In nondegenerate cases, the projection is unique, and we propose in the next subsections its expression for the different classes of bounded objects, based on the computation of the closest pose representative to the query point.
7.1 Spherical Object
Projection is trivial in the case of a spherical object, since all points of \(\mathbb {R}^3\) are valid representatives of poses. A point \(\mathbf {x} \in \mathbb {R}^3\) therefore projects onto the pose having \(\mathbf {x}\) for representative, namely the pose in which the center of the object admits \(\mathbf {x}\) for 3D coordinates.
7.2 Object of Revolution
In the case of a revolution object without rotoreflection invariance, the position of the center of mass and the oriented revolution axis of the object are well defined at any given pose. Reciprocally, a pose can be defined by the position of its center of mass \(\mathbf {t}\) and its oriented revolution axis, that we represent by a normalized vector \(\mathbf {a} \in \mathbb {R}^3\). The unique representative of such a pose is \((\lambda \mathbf {a}^\top , \mathbf {t}^\top )^\top \) as we defined in Sect. 5.4.
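Since the representatives of such poses are exactly the points \((\lambda \mathbf {a}^\top , \mathbf {t}^\top )^\top \) with \(\Vert \mathbf {a} \Vert = 1\), projecting a point \(\mathbf {x} = (\mathbf {a}^\top , \mathbf {t}^\top )^\top \in \mathbb {R}^6\) amounts to finding the closest point of a sphere of radius \(\lambda \), i.e. normalizing \(\mathbf {a}\). A minimal sketch, where the function name and the explicit error raised in the degenerate case \(\mathbf {a} = \mathbf {0}\) are our own choices:

```python
import numpy as np


def project_revolution(x, lam):
    """Closest pose representative (lam * a / ||a||, t) to x = (a, t) in R^6."""
    x = np.asarray(x, dtype=float)
    a, t = x[:3], x[3:]
    norm = np.linalg.norm(a)
    if norm == 0.0:
        # Every unit axis is equally close: the projection is not unique.
        raise ValueError("degenerate point: axis part is zero")
    return np.concatenate([lam * a / norm, t])
```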
7.3 Object with a Finite Proper Symmetry Group
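For objects without proper symmetry or with a finite proper symmetry group, the orientation part of a 12D point \(\mathbf {x} = ({{\mathrm{vec}}}(\mathbf {X})^\top , \mathbf {t}^\top )^\top \) can be projected by looking for the rotation \(\mathbf {R}\) minimizing \(\Vert \mathbf {R} \varvec{\varLambda }- \mathbf {X} \Vert _F\); since \(\mathbf {R} \mathbf {G}\) ranges over all of \(\textit{SO}(3)\) when \(\mathbf {R}\) does, the same problem covers every representative \(\mathbf {R} \mathbf {G} \varvec{\varLambda }\). This is an orthogonal Procrustes problem, and the sketch below solves it with the classical SVD-based solution (Kabsch/Umeyama style). The choice of this solver and the function names are ours, and \(\varvec{\varLambda }\) is assumed symmetric positive definite.

```python
import numpy as np


def closest_rotation(M):
    """R in SO(3) maximizing trace(R^T M), i.e. minimizing ||R - M||_F."""
    U, _, Vt = np.linalg.svd(M)
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0  # enforce det(R) = +1
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def project_12d(x, Lam):
    """Project (vec(X), t) onto the pose space: R = argmin ||R Lam - X||_F, t unchanged."""
    x = np.asarray(x, dtype=float)
    X = x[:9].reshape(3, 3, order="F")
    # ||R Lam - X||^2 = const - 2 trace(R^T X Lam) for symmetric Lam.
    R = closest_rotation(X @ Lam)
    return R, x[9:].copy()
```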
7.4 2D Object
The projection problem for a 2D object is similar to the 3D case.
In the case of a circular object, any point \(\mathbf {x} \in \mathbb {R}^2\) is a valid representative of a pose and therefore projects onto the pose of center \(\mathbf {x}\).
Regarding an object with cyclic symmetry, we conclude by the same reasoning as in the case of a 3D revolution object that a 4D vector \(\mathbf {x} = (\mathbf {a}^\top , \mathbf {t}^\top )^\top \), where \(\mathbf {a}, \mathbf {t} \in \mathbb {R}^2\), admits a unique projection as long as \(\Vert \mathbf {a} \Vert \ne 0\). The projection admits a representative \((\lambda / \Vert \mathbf {a} \Vert \cdot \mathbf {a}^\top , \mathbf {t}^\top )^\top \), and consists of the pose defined by a translation \(\mathbf {t}\) and a rotation of angle \(\arg (\mathbf {a})\), where \(\arg (\mathbf {a})\) is the argument of \(\mathbf {a}\) seen as a complex number.
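A minimal sketch of this projection, with the rotation angle obtained as \(\arg (\mathbf {a})\) through `atan2` and reduced modulo \(2\pi /n\) so that a canonical angle is reported for the cyclic case; this reduction and the function name are our own conventions.

```python
import numpy as np


def project_2d(x, lam, n=1):
    """Project x = (a, t) in R^4 onto the pose space of a 2D object of cyclic order n.

    Returns the rotation angle in [0, 2 pi / n) (angles differing by 2 pi / n give the
    same pose), the translation, and the closest representative (lam a / ||a||, t).
    """
    x = np.asarray(x, dtype=float)
    a, t = x[:2], x[2:]
    norm = np.hypot(a[0], a[1])
    if norm == 0.0:
        raise ValueError("degenerate point: ||a|| = 0, the projection is not unique")
    theta = np.mod(np.arctan2(a[1], a[0]), 2.0 * np.pi / n)  # arg(a), a seen as complex
    return theta, t, np.concatenate([lam * a / norm, t])
```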
8 Averaging Poses
Pose averaging is of great use for applications such as denoising, mode detection or interpolation. The definition of the average is not obvious in non-vector spaces such as ours, and we therefore consider a generalization of the average to arbitrary metric spaces, known as the Fréchet mean.
The problem of pose averaging has already been studied for objects without proper symmetry, with various metrics. Sharf et al. (2010) notably compare different averaging techniques for the rotation part of a pose, using common metrics. While there is no known closed-form solution for the Riemannian metric (4), the average can be computed iteratively, and admits closed-form approximations which are “good enough” for practical applications (Gramkow 2001). A good approximation, when dealing with more than two poses, is based on computing the average of rotation matrices, and actually corresponds to the exact average when considering the distance (7) (Curtis et al. 1993).
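The matrix-averaging approach mentioned above can be sketched in a few lines: the (weighted) arithmetic mean of the rotation matrices is projected back onto \(\textit{SO}(3)\) with an SVD, which yields the rotation minimizing the sum of squared Frobenius distances to the inputs. Function names are hypothetical, and the degenerate case where the mean matrix is rank deficient is not handled.

```python
import numpy as np


def rot_z(angle):
    """Rotation matrix about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def closest_rotation(M):
    """R in SO(3) minimizing ||R - M||_F (SVD-based)."""
    U, _, Vt = np.linalg.svd(M)
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def chordal_mean(rotations, weights=None):
    """Rotation minimizing sum_i w_i ||R - R_i||_F^2: projection of the weighted mean matrix."""
    Rs = np.asarray(rotations, dtype=float)
    w = np.ones(len(Rs)) if weights is None else np.asarray(weights, dtype=float)
    mean = np.tensordot(w, Rs, axes=1) / w.sum()
    return closest_rotation(mean)
```

For two rotations about the same axis by 0.2 and 0.4 rad, the mean matrix is a rotation by 0.3 rad scaled by cos(0.1) in the plane orthogonal to the axis, and its projection is exactly the rotation by 0.3 rad.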
8.1 Objects with a Single Representative Per Pose
8.2 Objects with Multiple Representatives Per Pose
In the other cases, a pose admits several representatives, and one should consider the different combinations of representatives to find the exact average (assuming its existence and uniqueness in non-pathological cases). This problem is not specific to our method: it is similar to the issue encountered when averaging orientations of an object without proper symmetries through the arithmetic mean of their quaternion representatives, each orientation admitting two antipodal quaternions as representatives. Because the number of combinations of representatives is exponential in the number of considered poses, the exact computation of the average can easily become expensive.
In this subsection, we propose a stricter definition of the consistency of a combination of representatives, and prove that it enables an unambiguous estimation of the mean.
Definition 5
In other words, a consistent tuple is a set of pose representatives that are closer to one another than to any other representatives.
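One possible reading of this informal statement is sketched below for finite sets of representatives stored as vectors: a tuple (one representative per pose) is deemed consistent when, for every pair of poses i and j, the representative chosen for pose j is a closest representative of pose j to the one chosen for pose i. The greedy selection `pick_tuple` and all function names are hypothetical; the test poses use the \(\pm \) axis representatives of a revolution object with rotoreflection invariance (\(\lambda = 1\), zero translation). The arithmetic mean of a consistent tuple is then meant to be projected onto the pose space as in Sect. 7.

```python
import numpy as np


def pick_tuple(reps_per_pose):
    """Heuristic choice: for each pose, the representative closest to the first
    representative of pose 0 (the result is not guaranteed to be consistent)."""
    anchor = np.asarray(reps_per_pose[0][0], dtype=float)
    return [min((np.asarray(p, dtype=float) for p in reps),
                key=lambda p: np.linalg.norm(p - anchor))
            for reps in reps_per_pose]


def is_consistent(chosen, reps_per_pose, eps=1e-12):
    """True if, for all i and j, chosen[j] is a closest representative of pose j to chosen[i]."""
    for p_i in chosen:
        for p_j, reps_j in zip(chosen, reps_per_pose):
            d_chosen = np.linalg.norm(p_i - p_j)
            if any(np.linalg.norm(p_i - np.asarray(q, dtype=float)) < d_chosen - eps
                   for q in reps_j):
                return False
    return True


def tuple_mean(chosen):
    """Arithmetic mean of a tuple of representatives (to be projected onto the pose space)."""
    return np.mean(np.asarray(chosen, dtype=float), axis=0)
```

Axes that are all close to one another yield a consistent tuple, whereas axes spread at 0°, 60° and 120° do not, in line with the discussion of poses spread out over the set of orientations.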
Proposition 6
Proof
Proposition 7
Proof
This result can be easily verified in the case of a revolution object with rotoreflection symmetry or a 2D cyclic object. Therefore, we only discuss the case of an object with a finite proper symmetry group.
Proof
This estimate most likely corresponds to the actual mean (61), but unfortunately we do not have a proof of this conjecture.
8.3 Sufficient Conditions of Consistency
While the average of poses can be easily estimated given a consistent combination of representatives, there are cases where no such combination exists, e.g. when trying to average poses spread out over the set of orientations, as illustrated in Fig. 4a–c. In such a case, one might have to perform an exhaustive evaluation of the Fréchet variance for the different combinations in order to pick the one corresponding to the actual mean. Fortunately, this case is of limited practical interest, as the mean makes little sense there.
Proof
Proof
These sufficient conditions can easily be generalized by considering that only the orientation parts of pose representatives actually need to be close enough to one another. This is a direct consequence of the fact that the pose space can be decomposed into a Cartesian product of a position space and an orientation space, and that symmetry considerations only affect orientation for a bounded object.
9 Local Properties
While we focus in this article on global metric properties, the proposed distance can be shown to be locally equivalent to a Riemannian metric over the pose space manifold. We briefly discuss these aspects in this section.
Other object classes Similarly, the pose space of a 3D revolution object can be seen as a 5D manifold. The proposed distance for such an object is indeed locally equivalent to a Riemannian distance over \(\mathbb {S}^2 \times \mathbb {R}^3\) induced by embedding \(\mathbb {S}^2\) as a sphere in a 3D Euclidean space, and considering the usual Euclidean distance for the translation part.
Poses of a 2D object with a finite proper symmetry group lie likewise on a 3D manifold, and the proposed distance is locally equivalent to a Riemannian distance over \(\mathbb {S}^1 \times \mathbb {R}^2\) induced by embedding \(\mathbb {S}^1\) as a circle in a 2D Euclidean space, i.e. by considering as distance between two infinitesimally close orientations the angle of the relative rotation between those two (up to a scaling factor).
Finally, pose spaces of spherical 3D objects and circular 2D ones are respectively equivalent to \(\mathbb {R}^3\) and \(\mathbb {R}^2\), associated with the Euclidean distance.
Remarks Despite this local equivalence, the proposed pose spaces can be topologically different from the manifolds mentioned above, because of the discrete symmetries of the object. For example, the pose space of a revolution object with rotoreflection invariance is actually homeomorphic to \({\mathbb {R}}{\mathbb {P}}^2 \times \mathbb {R}^3\), where \({\mathbb {R}}{\mathbb {P}}^2\) is the real projective plane (i.e. a sphere with antipodal points identified), instead of \(\mathbb {S}^2 \times \mathbb {R}^3\).
Moreover, except for 3D spherical or 2D circular objects, the distance (10) is globally different from these Riemannian metrics. Compared to the proposed distance, the latter have several drawbacks.
They are indeed more expensive to evaluate, since they involve trigonometric computations [e.g. for distance (4)]. There is not even a known closed-form expression for such a distance in the case of an arbitrary 3D object with a finite proper symmetry group. Moreover, they do not share the convenient computational properties of the proposed distance regarding pose averaging, for which they may require iterative approaches (Pennec 1998). More fundamentally, and as discussed in the introduction, our distance was proposed to quantify the similarity between poses at a global scale. The notion of motion between two poses, expressed in a Riemannian distance, is therefore irrelevant to the kind of applications we are interested in.
10 Application Example
The approach is illustrated on three objects: the Stanford bunny, an object without proper symmetry; a candlestick, considered as a revolution object without roto-reflection invariance; and a cartoon-like space rocket, which is invariant under rotation by \(120^{\circ }\) about its axis.
10.1 Mean Shift for Pose Recovery
Among existing methods suited to depth-image input, some popular approaches (Drost et al. 2010; Fanelli et al. 2011; Tejani et al. 2014) proceed bottom-up: they generate votes for pose candidates in a Hough-like manner, treat these votes as samples of a pose distribution, and look for its main modes, which hopefully correspond to the actual poses of object instances. We adopt such a mode-seeking framework.
Mode detection in a distribution over the pose space is not an easy problem. Grid-based accumulation techniques, traditionally used in Hough-like methods, are impractical due to the high dimension of the pose space, except when using a sparse structure (Rodrigues et al. 2012) or as a preprocessing step restricted to a few dimensions (Drost et al. 2010; Rodrigues et al. 2012; Tejani et al. 2014). A popular mode-detection approach better suited to high-dimensional problems is Mean Shift, a local, non-parametric iterative method based on a kernel density estimate of the probability distribution. Unfortunately, this method is designed for vector spaces, which the pose space is not. Fanelli et al. (2011) and Tejani et al. (2014) nonetheless used Mean Shift with a global parametrization of the pose space, but such an approach suffers from the intrinsic drawbacks discussed in the introduction. Tuzel et al. (2005) and later Subbarao and Meer (2006) proposed versions of Mean Shift for Lie groups and Riemannian manifolds that might circumvent these issues, but their approach is computationally expensive: at each iteration, it requires mapping the samples to the tangent plane at the point being shifted, computing the shift vector through the classical Mean Shift procedure, and mapping it back to obtain the updated point.
In this example, we show how the standard Mean Shift algorithm can be adapted to perform mode detection on the pose space efficiently using our distance, even for objects with proper symmetries.
Given an input depth map, we generate a set of votes for object poses \(\left\{ \mathscr {P}_i \right\} _{i=1,\ldots ,n}\) using our own implementation of the method of Drost et al. (2010). This method relies on a local aggregation scheme and matches geometric features extracted from the input data with those extracted from a model of the object. We use a sampling rate of \(\tau _d = 0.025\) and consider every sample as a reference point (the interested reader is referred to the original description of Drost et al. (2010) for the meaning of these parameters). This initial pose distribution is quite spread out, as can be seen from the blurred appearance of its representation in Fig. 5c, column 1.
We then use each of these votes as a starting pose for Mean Shift. A common practice to drastically speed up mode seeking is to use only a subset of the votes as starting points, e.g. obtained by random sampling, but we do not do so here to avoid introducing additional parameters. Each starting pose is shifted iteratively following the usual Mean Shift procedure: considering a flat kernel of radius r, we find the votes within distance r of the current pose, compute their mean, shift the current pose to this mean, and repeat until convergence.
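The flat-kernel iteration can be sketched as follows for an object without proper symmetry, assuming each vote has already been mapped to its Euclidean representative (12 values per pose). The function name, stopping tolerance and synthetic data are hypothetical, and the radius search is a brute-force scan; for large vote sets, an off-the-shelf structure such as `scipy.spatial.cKDTree.query_ball_point` could play this role. The projection onto the pose space is left out of the loop.

```python
import numpy as np

def flat_mean_shift(votes, start, r, max_iter=100, tol=1e-9):
    """Shift `start` toward a mode of `votes` with a flat kernel of radius r.

    votes: (n, d) array of Euclidean pose representatives (hypothetical layout,
           one row per vote, e.g. d = 12 for an object without proper symmetry).
    Returns the converged point and the number of votes within r of it (its support).
    """
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Radius search (brute force here; a KD-tree would be used on large sets).
        neighbors = votes[np.linalg.norm(votes - x, axis=1) < r]
        if len(neighbors) == 0:
            break
        new_x = neighbors.mean(axis=0)  # flat kernel: plain arithmetic mean
        converged = np.linalg.norm(new_x - x) < tol
        x = new_x
        if converged:
            break
    support = int(np.count_nonzero(np.linalg.norm(votes - x, axis=1) < r))
    return x, support

# Synthetic usage: two well-separated clusters of 12D points.
rng = np.random.default_rng(0)
votes = np.vstack([rng.normal(0.0, 0.05, size=(50, 12)),
                   rng.normal(5.0, 0.05, size=(50, 12))])
mode, support = flat_mean_shift(votes, votes[0], r=1.0)
```

Starting from the first vote, the point converges to the mean of its cluster; the returned support count is only one possible way of scoring the converged modes.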
We arbitrarily set the kernel radius r to 1.5 times the smallest eigenvalue of the matrix \(\varvec{\varLambda }\) for the bunny and the candlestick, that is, roughly 75% of the smallest typical dimension of the object.
For the rocket, we use a smaller radius of \(\sqrt{3}/2\) times this eigenvalue, which is the largest value satisfying the condition of Proposition 9 (see Appendix B). A larger radius may also give good results experimentally, but does not provide the same theoretical guarantees.
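The radius rule can be sketched as below. The sketch assumes that \(\varvec{\varLambda}\) is the symmetric square root of the second-moment matrix of the object's surface points about their centroid, approximated from surface samples (a real mesh would need area-weighted uniform sampling); the function names and the toy point set are hypothetical.

```python
import numpy as np

def lambda_eigenvalues(surface_points):
    """Eigenvalues of Λ, assumed here to be the symmetric square root of the
    second-moment matrix of surface samples expressed about their centroid."""
    X = surface_points - surface_points.mean(axis=0)
    M = X.T @ X / len(X)          # approximates (1/S) * integral of x x^T over the surface
    w = np.linalg.eigvalsh(M)     # eigenvalues of the symmetric matrix M (ascending)
    return np.sqrt(np.clip(w, 0.0, None))

def kernel_radius(surface_points, factor=1.5):
    """Flat-kernel radius: `factor` times the smallest eigenvalue of Λ."""
    return factor * lambda_eigenvalues(surface_points).min()

# Toy point set (not a real surface sampling): six points on the axes.
a, b, c = 3.0, 2.0, 1.0
pts = np.array([[a, 0, 0], [-a, 0, 0], [0, b, 0], [0, -b, 0], [0, 0, c], [0, 0, -c]])
r_default = kernel_radius(pts)                 # factor 1.5, as for the bunny and candlestick
r_rocket = kernel_radius(pts, np.sqrt(3) / 2)  # factor sqrt(3)/2, as for the rocket
```

For this toy set, the eigenvalues of \(\varvec{\varLambda}\) are \(\sqrt{3}\), \(2/\sqrt{3}\) and \(1/\sqrt{3}\), so the two radii are \(\sqrt{3}/2\) and \(1/2\).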
Indeed, these choices of radii satisfy the condition \(r < T/4\), where T is the minimum distance between representatives of the same pose (see Definition 3). We know from Sect. 5.1 that, given a representative \(\mathbf {p}\) of the pose \(\mathscr {P}\) to shift, the poses closer than r to \(\mathscr {P}\) are those that have a representative within the ball of radius r around \(\mathbf {p}\). These representatives can be retrieved efficiently with an off-the-shelf radius search method. Moreover, since r is strictly smaller than T/2, the representatives retrieved by such a query necessarily correspond to different poses, so there are no duplicates: two representatives of the same pose are at least T apart, whereas any two points of the ball are less than \(2r < T\) apart. Furthermore, since these representatives lie in a ball of radius T/4, Proposition 9 guarantees that the average of the corresponding poses can be estimated unambiguously as the projection onto the pose space of the arithmetic mean of these representatives.
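For an object without proper symmetry, this projection can be sketched as an orthogonal Procrustes problem (Schönemann 1966; Umeyama 1991). The sketch assumes representatives of the form \((\mathbf{R}\varvec{\varLambda}, \mathbf{t})\): projecting a mean \((\mathbf{A}, \bar{\mathbf{t}})\) then amounts to finding the rotation minimizing \(\Vert \mathbf{R}\varvec{\varLambda} - \mathbf{A}\Vert_F\), i.e. maximizing \(\operatorname{tr}(\mathbf{R}^\top \mathbf{A}\varvec{\varLambda})\), which an SVD solves, while the translation is kept as is. It does not cover symmetric objects, and the function names and \(\varvec{\varLambda}\) value are hypothetical.

```python
import numpy as np

def project_rotation(A, Lam):
    """Rotation R in SO(3) minimizing ||R @ Lam - A||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(A @ Lam)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def average_poses(rotations, translations, Lam):
    """Project the arithmetic mean of representatives (R Lam, t) onto the pose space."""
    A = np.mean([R @ Lam for R in rotations], axis=0)
    t_mean = np.mean(translations, axis=0)
    return project_rotation(A, Lam), t_mean

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

Lam = np.diag([3.0, 2.0, 1.0])  # hypothetical Λ
R_avg, t_avg = average_poses([rot_z(-0.1), rot_z(0.0), rot_z(0.1)],
                             [np.zeros(3), np.ones(3), 2 * np.ones(3)], Lam)
```

The arithmetic mean of the three representatives is not itself of the form \(\mathbf{R}\varvec{\varLambda}\); its projection here is the identity rotation, as expected by symmetry of the three input angles.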
In practice, projecting the mean at each iteration is not required for the poses to shift towards meaningful modes, so we perform the projection only once, after convergence. The pose distribution obtained after Mean Shift is sharper than the original one, as can be seen in Fig. 5c, column 2, where the silhouettes of object instances emerge.
Theoretical limitation The probabilistic interpretation used here is abusive and should only be considered as a way to convey the intuition behind the Mean Shift approach. Kernel density estimation on a Riemannian manifold has been studied mathematically by Pelletier (2005), but our approach does not fit into this framework, since our distance is not Riemannian. Some theoretical results might nonetheless be obtained, since our metric is equivalent to a Riemannian metric for small Mean Shift radii (see Sect. 9). Such considerations, however, are beyond the scope of this work, and \(s(\mathscr {M})\) can simply be considered as a score for the pose \(\mathscr {M}\).
10.2 Comparison with a \(\textit{SE}(3)\) Metric
As illustrated in Fig. 6, we observe little difference between the two approaches for the bunny. This is not surprising: since the bunny has no symmetry and limited anisotropy, the two distances are very similar in this case.
However, the benefit of our metric becomes apparent for the candlestick and the rocket, which are both symmetric. Because of these symmetries, the initial votes for a given instance are spread out over several equivalent rigid transformations in \(\textit{SE}(3)\), and using the \(\textit{SE}(3)\) distance (98) therefore leads to the detection of multiple modes corresponding to the same instance. These duplicate pose hypotheses would have to be filtered out before any practical application, and numerous modes must be checked to find every instance in the scene. In our example (Fig. 6b), we had to examine up to the 4th and the 8th mode, respectively, to recover the poses of the 3 rockets and of the 3 candlesticks present in the scene.
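The duplicated modes can be illustrated on a toy version of the rocket. The sketch assumes a symmetry axis along \(z\), a hypothetical \(\varvec{\varLambda}\) consistent with this symmetry, and the closed form \(\Vert \mathbf{t}_1 - \mathbf{t}_2\Vert^2 + \Vert (\mathbf{R}_1 - \mathbf{R}_2)\varvec{\varLambda}\Vert_F^2\) for the squared RMS displacement of surface points (origin at the centroid). Without minimization over the symmetry group, used here only as a stand-in for an \(\textit{SE}(3)\) distance (it is not necessarily the distance (98)), two rigid transformations differing by the \(120^{\circ}\) symmetry are far apart; minimizing over the group places them at distance zero, as a single pose. Only finite symmetry groups are handled, so this does not apply as is to a revolution object such as the candlestick.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

LAM = np.diag([1.0, 1.0, 2.0])  # hypothetical Λ, unchanged by rotations about z
SYMMETRIES = [rot_z(2 * np.pi * k / 3) for k in range(3)]  # rotations by 0, 120, 240 degrees

def rms_displacement(R1, t1, R2, t2, Lam):
    """RMS displacement of surface points between two rigid transformations (closed form)."""
    return np.sqrt(np.sum((t1 - t2) ** 2) + np.linalg.norm((R1 - R2) @ Lam) ** 2)

def pose_distance(R1, t1, R2, t2, Lam, symmetries):
    """Smallest RMS displacement over the object's (finite) proper symmetry group."""
    return min(rms_displacement(R1 @ G, t1, R2, t2, Lam) for G in symmetries)

t = np.zeros(3)
R_a = np.eye(3)
R_b = rot_z(2 * np.pi / 3)  # same physical state of the rocket, other rigid transformation
d_rigid = rms_displacement(R_a, t, R_b, t, LAM)          # sqrt(6), about 2.449
d_pose = pose_distance(R_a, t, R_b, t, LAM, SYMMETRIES)  # 0 up to rounding
```

Votes falling on R_a and R_b would thus feed two separate modes under the rigid-transformation distance, but a single one under the pose distance.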
In contrast, the proposed distance accounts for the proper symmetries of the object, and thus exploits the information contained in the initial set of votes better than the \(\textit{SE}(3)\) distance. In our example, the first 3 modes extracted correspond to the 3 actual instances for each object, without any duplicates. Moreover, these modes have more support from the initial set of votes and therefore stand out more clearly from the noise, which is important for the robustness of the method. The pose distribution obtained after Mean Shift is visually sharper (Fig. 6, left), and spurious modes for the rocket and the candlestick have scores below 59% and 66%, respectively, of those of the modes corresponding to actual object instances, compared with 89% and 98% when using the distance (98).
11 Summary and Discussion
In this paper, we address issues with the commonly used notion of pose for a rigid object, in both the 2D and 3D cases.
While pose is usually assumed to be equivalent to a rigid transformation, this is not true in general, due to potential symmetries of the object. We therefore propose a broader definition of pose as a distinguishable static state of the object. We show that, with this definition, a pose can be regarded as an equivalence class in the space of rigid transformations, thanks to the introduction of a proper symmetry group specific to the object. We believe this notion to be essential, as many manufactured objects exhibit symmetries and could not previously be represented properly.
Based on this definition, we propose a metric over the pose space as a measure of the smallest displacement between two poses, the length of a displacement being the RMS displacement of the object's surface points. Besides being defined for any physical rigid object, this metric does not depend on an arbitrary choice of frames or scaling factors, and it accounts for the geometry of the object.
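The RMS displacement of surface points admits a closed form once the origin is placed at the centroid: the cross term between rotation and translation vanishes, leaving \(\Vert \mathbf{t}_1 - \mathbf{t}_2\Vert^2 + \operatorname{tr}((\mathbf{R}_1 - \mathbf{R}_2)\mathbf{M}(\mathbf{R}_1 - \mathbf{R}_2)^\top)\), where \(\mathbf{M}\) is the second-moment matrix of the surface points (equal to \(\varvec{\varLambda}^2\) if \(\varvec{\varLambda}\) is taken as its symmetric square root). The sketch below checks this numerically for an object without proper symmetry, with random points standing in for surface samples (hypothetical data, not a real object).

```python
import numpy as np

def random_rotation(rng):
    """Random rotation matrix (det = +1) from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    Q = Q @ np.diag(np.sign(np.diag(R)))  # fix the sign ambiguity of QR
    if np.linalg.det(Q) < 0:              # turn a reflection into a rotation
        Q[:, 0] = -Q[:, 0]
    return Q

rng = np.random.default_rng(1)
pts = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.5])  # hypothetical surface samples
pts -= pts.mean(axis=0)                  # express the samples about their centroid
M = pts.T @ pts / len(pts)               # second-moment matrix of the samples

R1, R2 = random_rotation(rng), random_rotation(rng)
t1, t2 = rng.normal(size=3), rng.normal(size=3)

# RMS displacement computed directly from the transformed samples.
moved1 = pts @ R1.T + t1
moved2 = pts @ R2.T + t2
rms_direct = np.sqrt(np.mean(np.sum((moved1 - moved2) ** 2, axis=1)))

# Closed form: ||t1 - t2||^2 + tr((R1 - R2) M (R1 - R2)^T).
dR = R1 - R2
rms_closed = np.sqrt(np.sum((t1 - t2) ** 2) + np.trace(dR @ M @ dR.T))
```

The two values agree up to rounding, which is what makes the distance cheap to evaluate: no integration over the surface is needed once \(\mathbf{M}\) is known.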
With computational efficiency in mind, we propose a coherent framework to represent poses in a Euclidean space of at most 12 dimensions, so as to enable efficient distance computations, neighborhood queries and pose averaging, together with theoretical proofs of these results.
These developments enable the use of our metric for high-level tasks such as pose estimation from a set of votes, where it appears to provide better results than a metric designed for \(\textit{SE}(3)\).
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments and suggestions, which greatly helped to improve this article. Some of our illustrations are based on the following mesh models: “Stanford bunny”, from the Stanford University Computer Graphics Laboratory; “Eiffel Tower”, created by Pranav Panchal; and “Şamdan 2” (candlestick), from Metin N. These were available online in May 2016, respectively at http://graphics.stanford.edu/data/3Dscanrep and on the GrabCAD and 3D Warehouse platforms.
References
Angeles, J. (2006). Is there a characteristic length of a rigid-body displacement? Mechanism and Machine Theory, 41(8), 884–896.
Belta, C., & Kumar, V. (2002). An SVD-based projection method for interpolation on SE(3). IEEE Transactions on Robotics and Automation, 18(3), 334–345.
Besl, P. J., & McKay, N. D. (1992). A method for registration of 3D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2), 239–256. https://doi.org/10.1109/34.121791.
Blender Online Community. (2016). Blender: A 3D modelling and rendering package. Blender Foundation, Blender Institute, Amsterdam. http://www.blender.org
Brégier, R., Devernay, F., Leyrit, L., et al. (2017). Symmetry aware evaluation of 3D object detection and pose estimation in scenes of many parts in bulk. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2209–2218).
Chirikjian, G. S. (2015). Partial bi-invariance of SE(3) metrics. Journal of Computing and Information Science in Engineering, 15(1), 011008.
Chirikjian, G. S., & Zhou, S. (1998). Metrics on motion and deformation of solid models. Journal of Mechanical Design, 120(2), 252–261.
Curtis, W., Janin, A., & Zikan, K. (1993). A note on averaging rotations. In 1993 IEEE virtual reality annual international symposium (pp. 377–385). https://doi.org/10.1109/VRAIS.1993.380755.
Di Gregorio, R. (2008). A novel point of view to define the distance between two rigid-body poses. In J. Lenarčič & P. Wenger (Eds.), Advances in robot kinematics: Analysis and design (pp. 361–369). Dordrecht: Springer. https://doi.org/10.1007/978-1-4020-8600-7_38.
Drost, B., Ulrich, M., Navab, N., & Ilic, S. (2010). Model globally, match locally: Efficient and robust 3D object recognition. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 998–1005). IEEE.
Eberharter, J. K., & Ravani, B. (2004). Local metrics for rigid body displacements. Journal of Mechanical Design, 126(5), 805–812. https://doi.org/10.1115/1.1767816.
Etzel, K. R., & McCarthy, J. M. (1996). A metric for spatial displacement using biquaternions on SO(4). In Proceedings of the 1996 IEEE international conference on robotics and automation (Vol. 4, pp. 3185–3190). IEEE.
Fanelli, G., Gall, J., & Van Gool, L. (2011). Real time head pose estimation with random regression forests. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 617–624). https://doi.org/10.1109/CVPR.2011.5995458.
Fukunaga, K., & Hostetler, L. D. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1), 32–40.
Gramkow, C. (2001). On averaging rotations. Journal of Mathematical Imaging and Vision, 15(1–2), 7–16.
Gupta, K. C. (1997). Measures of positional error for a rigid body. Journal of Mechanical Design, 119(3), 346–348.
Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., & Navab, N. (2012). Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In Asian conference on computer vision (pp. 548–562). Springer.
Kazerounian, K., & Rastegar, J. (1992). Object norms: A class of coordinate and metric independent norms for displacements. Flexible Mechanisms, Dynamics, and Analysis ASME DE, 47, 271–275.
Kendall, A., Grimes, M., & Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In Proceedings of the IEEE international conference on computer vision (pp. 2938–2946).
Larochelle, P. M., Murray, A. P., & Angeles, J. (2007). A distance metric for finite sets of rigid-body displacements via the polar decomposition. Journal of Mechanical Design, 129(8), 883–886.
Lin, Q., & Burdick, J. W. (2000). Objective and frame-invariant kinematic metric functions for rigid bodies. The International Journal of Robotics Research, 19(6), 612–625.
Martinez, J. M. R., & Duffy, J. (1995). On the metrics of rigid body displacements for infinite and finite bodies. Journal of Mechanical Design, 117(1), 41–47.
Muja, M., & Lowe, D. G. (2009). Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP (Vol. 1, pp. 331–340).
Park, F. C. (1995). Distance metrics on the rigid-body motions with applications to mechanism design. Journal of Mechanical Design, 117(1), 48–54.
Pelletier, B. (2005). Kernel density estimation on Riemannian manifolds. Statistics & Probability Letters, 73(3), 297–304. https://doi.org/10.1016/j.spl.2005.04.004.
Pennec, X. (1998). Computing the mean of geometric features: Application to the mean rotation. Report, INRIA.
Purwar, A., & Ge, Q. J. (2009). Reconciling distance metric methods for rigid body displacements. In ASME 2009 international design engineering technical conferences and computers and information in engineering conference (pp. 1295–1304). American Society of Mechanical Engineers.
Rodrigues, J. J., Kim, J., Furukawa, M., Xavier, J., Aguiar, P., & Kanade, T. (2012). 6D pose estimation of textureless shiny objects using random ferns for bin-picking. In 2012 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 3334–3341). IEEE.
Schönemann, P. H. (1966). A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1), 1–10.
Sharf, I., Wolf, A., & Rubin, M. (2010). Arithmetic and geometric solutions for average rigid-body rotation. Mechanism and Machine Theory, 45(9), 1239–1251. https://doi.org/10.1016/j.mechmachtheory.2010.05.002.
Subbarao, R., & Meer, P. (2006). Nonlinear mean shift for clustering over analytic manifolds. In 2006 IEEE computer society conference on computer vision and pattern recognition (Vol. 1, pp. 1168–1175). IEEE.
Sucan, I., Moll, M., & Kavraki, L. (2012). The open motion planning library. IEEE Robotics & Automation Magazine, 19(4), 72–82. https://doi.org/10.1109/MRA.2012.2205651.
Tejani, A., Tang, D., Kouskouridas, R., & Kim, T. (2014). Latent-class Hough forests for 3D object detection and pose estimation. In Computer vision – ECCV 2014 (pp. 462–477). Springer.
Tjaden, H., Schwanecke, U., & Schömer, E. (2016). Real-time monocular segmentation and pose tracking of multiple objects. In European conference on computer vision (pp. 423–438). Springer.
Tuzel, O., Subbarao, R., & Meer, P. (2005). Simultaneous multiple 3D motion estimation via mode finding on Lie groups. In Tenth IEEE international conference on computer vision (ICCV 2005) (Vol. 1, pp. 18–25). IEEE.
Umeyama, S. (1991). Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4), 376–380.
Vainsthein, B. K. (1994). Fundamentals of crystals. Berlin: Springer.
Zefran, M., & Kumar, V. (1996). Planning of smooth motions on SE(3). In Proceedings of the 1996 IEEE international conference on robotics and automation (Vol. 1, pp. 121–126). IEEE.