After estimating the mesh motion as described in Sect. 3, we have a trajectory \(\mathcal {T}_i\) for each vertex. We use the trajectories together with the shape of the mesh \(\mathcal {M}\) to reconstruct the underlying skeleton. To this end, we first segment the trajectories as described in Sect. 4.1 and then infer the skeleton structure as explained in Sect. 4.2.
4.1 Motion Segmentation
In contrast to feature-based trajectories, the mesh motion provides trajectories of the same length and a trajectory for each vertex, even if the vertex has never been observed in the sequence due to occlusions. This means that clustering the trajectories also segments the mesh into rigid parts.
Similar to 2D motion segmentation approaches for RGB videos [28], we define an affinity matrix based on the 3D trajectories and use spectral clustering for motion segmentation. The affinity matrix
$$\begin{aligned} \varPhi _{ij} = \exp \left( -\lambda d(\mathcal {T}_i,\mathcal {T}_j)\right) \end{aligned}$$
(6)
is based on the pairwise distance between two trajectories \(\mathcal {T}_i\) and \(\mathcal {T}_j\). \(\varPhi _{ij} = 1\) if the trajectories are the same and close to zero if the trajectories are very dissimilar. As in [28], we use \(\lambda = 0.1\).
To measure the distance between two trajectories \(\mathcal {T}_i\) and \(\mathcal {T}_j\), we measure the distance change of two vertex positions \(\mathbf {V}_i\) and \(\mathbf {V}_j\) within a fixed time interval. We set the length of the time interval proportional to the observed maximum displacement, i.e.
$$\begin{aligned} dt = 2 \max _{i,t} \Vert \mathbf {V}_{i,t}-\mathbf {V}_{i,t-1} \Vert _2. \end{aligned}$$
(7)
Since the trajectories are smooth due to the mesh tracking as described in Sect. 3.2, we do not have to deal with outliers and we can take the maximum displacement over all vertices. The object, however, might be deformed only at a certain time interval of the entire sequence. We are therefore only interested in the maximum distance change over all time intervals, i.e.
$$\begin{aligned} d^v(\mathcal {T}_i,\mathcal {T}_j) = \max _{t} \left| \Vert \mathbf {V}_{i,t}-\mathbf {V}_{j,t} \Vert _2 - \Vert \mathbf {V}_{i,t-dt}-\mathbf {V}_{j,t-dt} \Vert _2 \right| . \end{aligned}$$
(8)
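As an illustration, Eqs. (7) and (8) can be sketched with NumPy as follows. The array layout `V[t, i]` (frame, vertex, xyz) and the function names are assumptions made for this example. Eq. (7) yields a value derived from displacements; how that value is turned into a frame offset is not specified here, so the sketch takes an integer `lag` as an explicit parameter.

```python
import numpy as np

def interval_from_displacement(V):
    """Eq. (7): twice the largest per-frame vertex displacement.

    V has shape (T, n, 3): T frames, n vertices (assumed layout).
    """
    steps = np.linalg.norm(V[1:] - V[:-1], axis=-1)  # shape (T-1, n)
    return 2.0 * steps.max()

def pairwise_dist(P):
    """Euclidean distances between all rows of P, shape (n, 3) -> (n, n)."""
    diff = P[:, None, :] - P[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def max_distance_change(V, lag):
    """Eq. (8): maximum over t of the change in pairwise vertex distance
    between frame t and frame t - lag. `lag` is an integer frame offset
    (an assumption of this sketch)."""
    n = V.shape[1]
    d_v = np.zeros((n, n))
    for t in range(lag, V.shape[0]):
        change = np.abs(pairwise_dist(V[t]) - pairwise_dist(V[t - lag]))
        d_v = np.maximum(d_v, change)
    return d_v
```

Two vertices that move rigidly together give an entry of zero, whereas a vertex that moves away from the others gives an entry equal to the largest distance change observed in any interval.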
This means that if two vertices belong to the same rigid part, the distance between them should not change much over time. In addition, we take the change of the angle between the vertex normals \(\mathbf {N}\) into account. It is measured in the same way, as the maximum over all intervals:
$$\begin{aligned} d^n(\mathcal {T}_i,\mathcal {T}_j) = \max _{t} \left| \arccos \left( \mathbf {N}_{i,t}^T\mathbf {N}_{j,t}\right) - \arccos \left( \mathbf {N}_{i,t-dt}^T\mathbf {N}_{j,t-dt} \right) \right| . \end{aligned}$$
(9)
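The normal term of Eq. (9) can be sketched in the same manner. The layout `N[t, i]` and the integer `lag` are again assumptions of this example, and the normals are assumed to have unit length. The dot products are clipped to \([-1, 1]\) before `arccos`, since rounding errors can otherwise push them slightly outside the valid domain.

```python
import numpy as np

def pairwise_angles(N_t):
    """Angles in rad between all pairs of unit normals, (n, 3) -> (n, n)."""
    cos = np.clip(N_t @ N_t.T, -1.0, 1.0)  # guard against rounding errors
    return np.arccos(cos)

def max_normal_angle_change(N, lag):
    """Eq. (9): maximum over t of the change in the angle between two
    vertex normals between frame t and frame t - lag.

    N has shape (T, n, 3) and holds unit normals (assumed layout).
    """
    n = N.shape[1]
    d_n = np.zeros((n, n))
    for t in range(lag, N.shape[0]):
        change = np.abs(pairwise_angles(N[t]) - pairwise_angles(N[t - lag]))
        d_n = np.maximum(d_n, change)
    return d_n
```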
The two distance measures are combined by
$$\begin{aligned} d(\mathcal {T}_i,\mathcal {T}_j) = \left( 1 + d^n(\mathcal {T}_i,\mathcal {T}_j) \right) d^v(\mathcal {T}_i,\mathcal {T}_j). \end{aligned}$$
(10)
The distances are measured in mm and the angles in rad. Adding 1 to \(d^n\) is necessary since \(d^n\) can be close to zero despite large displacement changes.
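A minimal sketch of Eqs. (10) and (6), assuming the two distance matrices have already been computed as \(n \times n\) NumPy arrays. The function names are chosen for this example only.

```python
import numpy as np

def combined_distance(d_v, d_n):
    """Eq. (10): d = (1 + d^n) * d^v, with d^v in mm and d^n in rad."""
    return (1.0 + d_n) * d_v

def affinity(d, lam=0.1):
    """Eq. (6): Phi = exp(-lambda * d), with lambda = 0.1 as in the text."""
    return np.exp(-lam * d)
```

The role of the added 1 is visible for a pair with \(d^v = 10\) mm and \(d^n = 0\): the combined distance stays at 10, whereas the plain product \(d^n d^v\) would be 0 and the pair would wrongly receive the maximum affinity of 1.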
Based on (6), we build the normalized graph Laplacian [29]
$$\begin{aligned} \mathcal {L}= D^{-\frac{1}{2}} (D-\varPhi ) D^{-\frac{1}{2}} \end{aligned}$$
(11)
where D is an \(n \times n\) diagonal matrix with
$$\begin{aligned} D_{ii} = \sum _j \varPhi _{ij} \end{aligned}$$
(12)
and perform an eigenvalue decomposition of \(\mathcal {L}\) to get the eigenvalues \(\lambda _1,\dots ,\lambda _{n}\) (\(\lambda _1 \le \dots \le \lambda _{n}\)) as well as the corresponding eigenvectors \(\mathbf{v}_1,\dots ,\mathbf{v}_{n}\). The number of clusters k is determined by the number of eigenvalues below a threshold \(\lambda _{thresh}\), and the final clustering of the trajectories is then obtained by k-means clustering [29] on the rows of the \(n \times k\) matrix \(\mathcal {F} = [\mathbf{v}_1~ \dots ~ \mathbf{v}_k]\).
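The clustering step of Eqs. (11) and (12) can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the experiments: the k-means routine is a small Lloyd iteration with a deterministic farthest-point initialization written for this example, and the value of \(\lambda _{thresh}\) is not given here, so `eig_thresh` must be supplied by the caller.

```python
import numpy as np

def normalized_laplacian(Phi):
    """Eq. (11): L = D^{-1/2} (D - Phi) D^{-1/2}, with D from Eq. (12)."""
    deg = Phi.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * (np.diag(deg) - Phi) * d_inv_sqrt[None, :]

def kmeans(X, k, iters=50):
    """Plain Lloyd iterations; farthest-point init (choice of this sketch)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def spectral_segmentation(Phi, eig_thresh):
    """Return (labels, k): k is the number of eigenvalues below eig_thresh,
    labels come from k-means on the rows of F = [v_1 ... v_k]."""
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(Phi))  # ascending
    k = max(1, int(np.sum(eigvals < eig_thresh)))
    F = eigvecs[:, :k]
    return kmeans(F, k), k
```

`np.linalg.eigh` is used because \(\mathcal {L}\) is symmetric; it returns the eigenvalues in ascending order, which matches the ordering \(\lambda _1 \le \dots \le \lambda _{n}\).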
In practice, we uniformly sample 1000 vertices from the mesh to compute the affinity matrix. This turned out to be sufficient while reducing the time needed to compute the matrix. Each vertex that has not been sampled is assigned to the cluster of its closest sampled vertex on the mesh. This results in a motion segmentation of the entire mesh as shown in Fig. 2b.
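The subsampling and label propagation can be sketched as below. Two simplifications are assumptions of this example: vertices are drawn uniformly by index, and "closest" is taken as Euclidean distance in a single reference frame, which may differ from a distance measured along the mesh surface.

```python
import numpy as np

def sample_vertices(n_vertices, n_samples=1000, seed=0):
    """Draw distinct vertex indices uniformly at random (by index)."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_vertices, size=n_samples, replace=False)

def propagate_labels(vertices, sample_idx, sample_labels):
    """Give every vertex the cluster label of its nearest sampled vertex.

    vertices: (n, 3) positions in one reference frame.
    sample_idx: (m,) indices of the sampled vertices.
    sample_labels: (m,) cluster labels of the sampled vertices.
    Uses Euclidean distance as a stand-in for closeness on the mesh.
    """
    samples = vertices[sample_idx]
    d2 = ((vertices[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argmin(d2, axis=1)
    return np.asarray(sample_labels)[nearest]
```

The brute-force distance matrix has \(n \times m\) entries; for large meshes a spatial index such as a k-d tree would be the more economical choice.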
4.2 Kinematic Topology
Given the segmented mesh, it remains to determine the joint positions and topology of the skeleton. To obtain a bone structure, we first skeletonize the mesh by extracting the mean curvature skeleton (MCS) based on the mean curvature flow [30], which effectively captures the topology of the mesh by iteratively contracting the triangulated surface. The red 3D curve in Fig. 2c shows the mean curvature skeleton for an object. In order to localize the joints, we compute the intersecting boundary of two connected mesh segments using a half-edge representation. For each intersecting pair of segments, we compute the centroid of the boundary vertices and find its closest 3D point on the mean curvature skeleton. In this way, the joints are guaranteed to lie inside the mesh. In order to create the skeleton structure with bones, we first create auxiliary joints without any degree of freedom at the points where the mean curvature skeleton branches or ends, as shown in Fig. 2c. After all 3D joints on the skeleton are determined, we follow the mean curvature skeleton and connect the detected joints accordingly to build a hierarchy of bones that defines the topology of the skeleton structure.
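The joint localization can be illustrated with a short sketch. It departs from the description above in two labeled ways: the boundary is collected directly from triangle edges whose endpoints carry the two different segment labels, rather than from a half-edge structure, and the mean curvature skeleton is assumed to be available as an array of sampled 3D points. All names are hypothetical.

```python
import numpy as np

def boundary_vertices(faces, labels, seg_a, seg_b):
    """Indices of vertices on edges that connect segment seg_a with seg_b.

    faces: (m, 3) triangle vertex indices; labels: (n,) segment per vertex.
    Assumes seg_a != seg_b.
    """
    idx = set()
    for tri in faces:
        for u, v in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            if {int(labels[u]), int(labels[v])} == {seg_a, seg_b}:
                idx.update((int(u), int(v)))
    return sorted(idx)

def locate_joint(vertices, boundary_idx, skeleton_points):
    """Centroid of the boundary vertices, snapped to the nearest sampled
    point of the skeleton curve (skeleton_points has shape (s, 3))."""
    centroid = vertices[boundary_idx].mean(axis=0)
    dist = np.linalg.norm(skeleton_points - centroid, axis=1)
    return skeleton_points[np.argmin(dist)]
```

Snapping to a sampled point only approximates the closest point on the curve; the error is bounded by the sampling density of the skeleton.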
Although the number of auxiliary joints usually does not matter, we reduce the number of auxiliary joints and irrelevant bones by removing bones that link an endpoint with another auxiliary joint if they belong to the same motion segment. The corresponding motion segment for each joint can be directly computed from the mean curvature flow [30]. Finally, we ensure that each bone is inside the mesh. To this end, we detect bones colliding with the mesh using a collision detection approach based on bounding volume hierarchies. We then subdivide each colliding bone into two bones by adding an auxiliary joint at the midpoint of the part of the mean curvature skeleton that connects the endpoints of the colliding bone. The process is repeated until all bones are inside the mesh. In our experiments, however, one iteration was enough. This procedure defines the refined topology of the skeleton, which is already embedded in the mesh. The skinning weights are then computed as in [3].
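The iterative subdivision can be sketched as follows, under simplifying assumptions: the skeleton is a single unbranched polyline of roughly evenly spaced points, a bone is a pair of indices into that polyline, and the collision test is passed in as a hypothetical callback `collides(p, q)`. The bounding volume hierarchy itself is not part of the sketch.

```python
import numpy as np

def subdivide_colliding_bones(bones, skeleton_points, collides, max_iters=10):
    """Split every colliding bone at the middle of the skeleton path
    between its endpoints and repeat until no bone collides.

    bones: list of (i, j) index pairs into skeleton_points with i < j.
    collides: caller-supplied predicate on two 3D endpoints (hypothetical).
    """
    for _ in range(max_iters):
        refined, changed = [], False
        for i, j in bones:
            can_split = j - i > 1  # a skeleton point must lie in between
            if can_split and collides(skeleton_points[i], skeleton_points[j]):
                mid = (i + j) // 2  # middle index as proxy for arc midpoint
                refined += [(i, mid), (mid, j)]
                changed = True
            else:
                refined.append((i, j))
        bones = refined
        if not changed:
            break
    return bones
```

Each new joint lies on the skeleton curve, so repeated subdivision makes the bones follow the curve more closely until no collision remains.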
As a result, we obtain a fully rigged model consisting of a watertight mesh, an embedded skeleton structure, and skinning weights. All steps of the approach are summarized in Algorithm 1. Results for a few objects are shown in Fig. 5.