Toward accurate real-time marker labeling for live optical motion capture

Xia, Shihong; Su, Le; Fei, Xinyu; Wang, Han

doi:10.1007/s00371-017-1400-y


Original Article
Open access
Published: 15 May 2017

Volume 33, pages 993–1003, (2017)
The Visual Computer



Abstract

Marker labeling plays an important role in the optical motion capture pipeline, especially in real-time applications; however, the accuracy of online marker labeling is still unclear. This paper presents a novel, accurate, real-time online marker labeling algorithm that deals with missing and ghost markers simultaneously. We first introduce a soft graph matching model that automatically labels the markers by using the Hungarian algorithm to find the globally optimal matching. The key idea is to formulate the problem in a combinatorial optimization framework. The objective function minimizes the matching cost, which simultaneously measures the difference between markers in the model and data graphs and the difference between their local geometrical structures, which consist of edge constraints. To keep subsequent marker labeling accurate despite limb occlusions or self-occlusions, we also propose an online high-quality full-body pose reconstruction process to estimate the positions of missing markers. We demonstrate the power of our approach by capturing a wide range of human movements, and we achieve state-of-the-art accuracy in comparisons against alternative methods and a commercial system, VICON.


1 Introduction

Motion capture technology has been widely used to create natural human animations in real-time live applications such as virtual training, virtual prototyping, computer games and computer-animated puppetry [1]. Passive optical motion capture systems, such as VICON [2], are used in most applications because of their high precision and low intrusiveness. However, such a system only records the 3D positions of markers without any physical meaning (the markers are unlabeled). In addition, markers often disappear and re-appear during a motion sequence due to limb occlusion or self-occlusion, which makes marker labeling for a live motion capture sequence a big challenge.

The goal of a practical marker labeling task is to (1) solve the correspondence problem for moving markers and (2) deal with missing and/or ghost markers, which lead to ambiguities in motion reconstruction. Unlike offline marker labeling methods [3], we mainly aim to achieve the second goal, especially when both accuracy and efficiency must be considered, as in real-time live applications [4] and interactive applications [5].

In this paper, we present a novel online marker labeling approach based on a graph matching model and a human pose reconstruction process. It produces accurate and efficient marker labeling results for real-time live applications with missing/ghost markers, as illustrated in Fig. 1. Specifically, by regarding the labeled markers at the previous frame as the model graph and the unlabeled markers at the current frame as the data graph, we formulate marker labeling as soft graph matching, which is in essence a combinatorial optimization problem that we solve with the Hungarian algorithm for high efficiency. To achieve high labeling accuracy, we also design a nonlinear optimization process to estimate the positions of missing markers.

We demonstrate the power of our approach by comparing it against alternative state-of-the-art methods and a commercial system, VICON, on a wide range of motion capture data with missing/ghost markers. First, we show the superior accuracy and the efficiency of our method on motion sequences of a single subject and of two interacting subjects (Sect. 5.1). Then, we show its superior pose reconstruction accuracy on single-subject motion sequences (Sect. 5.2). Finally, we show accurate marker labeling results that demonstrate its ability to handle ghost markers as well as facial motions without rigid constraints (Sect. 5.3). Due to the page limit, more evaluation results are given in the supplementary video. Note that, since we focus on correctly solving the marker labeling problem rather than the motion denoising [4] or missing marker estimation [6, 7] problems, we only show evaluations against alternative marker labeling methods.

In summary, our main contributions are as follows: (1) a novel, accurate and efficient marker labeling process that runs in real time on live data; (2) a soft graph matching model that automatically labels the markers in successive frames by using the Hungarian algorithm to find the globally optimal matching.

2 Related work

Our online marker labeling method is related to point correspondence and graph matching methods.

2.1 Point correspondence

Yu [8] proposed an online tracking framework for multiple interacting subjects that constructs a motion model to find the best marker correspondences. It is a greedy algorithm that leads to a local optimum, and it requires at least two visible markers on the same limb. Similar to [8], we also use a tracking framework, but we introduce a soft graph matching model instead of using example data [8] to improve the labeling accuracy.

Li [9] proposed a self-initializing identification labeling method that works on each segment to establish local segmental correspondences. Li [10] designed a similarity k-d tree to identify markers from similar poses of two objects, but it cannot deal with missing data. Li [11] integrated key-frame-based self-initializing hierarchical segmental matching [9] with inter-frame tracking to label articulated motion sequences represented by feature points, which is an offline approach.

The articulated ICP algorithm with soft-joint constraints of Mundermann [12] is used to track limbs from dense images. In our case, full-body and facial motions are represented by sparse 3D points, known as markers, attached to the subject’s skin. When markers are missing, which happens frequently during motion capture, it is almost impossible to use ICP-based methods [13, 14] to find the marker correspondences in successive frames. Other methods [15, 16] fit dense points to lines, curves or surfaces to obtain non-rigid transformations. The spatial data continuity they require is again not available in the case of sparse points [11].

Probabilistic inference with point topology is used to find point correspondences for 2D non-rigid points [17, 18] and 3D dense surface points [19, 20]. Unlike these methods, we propose a soft graph matching model with a discrete combinatorial optimization algorithm to find sparse 3D marker correspondences online.

2.2 Graph matching

Graph matching plays a central role in solving the correspondence problem. Depending on whether the graph edges are taken into account, graph matching can be divided into two categories: unary graph matching and binary graph matching.

Unary graph matching treats each node independently, discarding the relationships between nodes. This model was used by Veenman [21] to solve the point tracking problem. They proposed an adaptable framework that can be used with a variety of cost functions [22, 23] and solved by the Hungarian algorithm [24].

Binary graph matching considers both node and edge attributes. The problem is NP-hard, and a lot of effort has gone into finding good approximate solutions [25]. Probably the fastest approximate solution is the efficient spectral method presented in [26]. After relaxing the mapping constraints and the integrality constraints, the principal eigenvector of the matching cost matrix is interpreted as the confidence of the assignments. The assignment that has the maximum confidence and is consistent with the constraints is accepted as correct. However, as stated in [27, 28], the correspondence accuracy of this method is not very satisfactory. Torresani [28] applies a “dual decomposition” approach that decomposes the original problem into simpler sub-problems, which are repeatedly solved independently and combined into a global solution. The authors claim that it is the first technique capable of reaching global optimality on various real-world image matching problems and that it outperforms existing graph matching algorithms. However, their method needs several seconds to process a picture with 30 nodes.

Existing graph matching methods cannot solve our problem both accurately and efficiently. Unary graph matching is efficient but inaccurate because it neglects edge constraints. Binary graph matching is more accurate because it takes both motion smoothness and edge constraints into account, but it is too complex to reach the optimal solution in real time. In this paper, we combine the advantages of the unary and binary graph models and present a soft graph matching model that merges the matching cost of the local geometrical structure, which consists of edges, into the matching cost of the graph nodes, achieving high accuracy and efficiency simultaneously.

3 Soft graph matching

We define the labeled marker set at the previous frame as the Model Graph $G_{1}=(V_{1},E_{1})$ and the unlabeled marker set at the current frame as the Data Graph $G_{2}=(V_{2},E_{2})$. $V_{1}=\{\mathbf {m_{i}}:i=1,\ldots ,M\}$ and $V_{2}=\{\mathbf {u_{j}}:j=1,\ldots ,N\}$ are the node (marker) sets, and $E_{1}$ and $E_{2}$ are the edge sets. $\mathbf {m_{i}}$ and $\mathbf {u_{j}}$ are labeled and unlabeled 3D marker positions, respectively. We connect two markers by an edge $\mathbf {m_{i}}\mathbf {m_{j}}$ if they are neighbors on the same limb and their relative position stays fixed over time; we call this the local rigid constraint.

We assume that $V_{1}$ and $V_{2}$ contain the same number of markers, i.e., $M=N$. When $M \ne N$, which is caused by missing markers and ghost markers, we add dummy markers to $V_{1}$ and $V_{2}$ so that the condition holds. The matching cost related to the dummy markers is set to the maximum cost $w_{max}$.
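The padding step can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `pad_cost_matrix` and the list-of-lists layout are assumptions, and `w_max` stands for the maximum cost $w_{max}$.

```python
def pad_cost_matrix(cost, w_max):
    """Pad an M x N cost matrix to a square one with dummy entries of cost w_max.

    Hypothetical helper: rows are model markers, columns are data markers.
    Extra rows act as dummy model markers (they absorb ghost markers); extra
    columns act as dummy data markers (they absorb missing markers).
    """
    m = len(cost)
    n = len(cost[0]) if m else 0
    size = max(m, n)
    padded = [[w_max] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            padded[i][j] = cost[i][j]
    return padded


# Two labeled markers, three observed markers (one of them is a ghost).
cost = [[1.0, 5.0, 9.0],
        [4.0, 2.0, 8.0]]
square = pad_cost_matrix(cost, w_max=100.0)
```

Here `square` gains a third row filled with `w_max`, so whichever observed marker ends up matched to the dummy row is treated as a ghost.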

Let $\{\phi _{ij}: i=1,\ldots ,M; j=1,\ldots ,N\}$ denote all possible matches between the model and data graphs, and let $\{c_{ij}: i=1,\ldots ,M; j=1,\ldots ,N\}$ be their matching costs. Let L denote the correct labeling, i.e., a set of marker matches. The indicator variable $x_{ij}$ equals 1 if $\phi _{ij} \in L$ and 0 otherwise.

Our method considers a marker and its local geometrical structure simultaneously. The edges starting from a marker make up its local geometrical structure. Thus, the marker labeling problem can be formulated as the following soft graph matching problem.

$$\begin{aligned}&\min _{\mathbf {x}}\;cost(\mathbf {x})=\sum _{a}[\omega _{p}c_{p}(a)x_{a} + (1-\omega _{p})c_{lg}(a)x_{a}]\end{aligned}$$

(1)

where $c_{p}$ and $c_{lg}$ are the matching costs of a point and of its local geometrical structure, respectively, $\omega _{p}$ is a weight, and a is any possible match (i, j). We use the classic Hungarian algorithm to solve the above combinatorial optimization problem. In our experiments, we set $\omega _{p}=0.5$.

Figure 2 explains how to calculate the matching cost of a marker correspondence and its local geometrical structure. Let $\phi _{a}$ denote the correspondence of $\mathbf {u_{1}}$ to $\mathbf {m_{1}}$. The cost of the point match caused by $\phi _{a}$ is defined as the following squared spatial distance:

$$\begin{aligned}&c_{p}(a) = \Vert \mathbf {m_{1}}-\mathbf {u_{1}}\Vert ^{2} \end{aligned}$$

(2)

Table 1 The matching cost of edges related to $\phi _{a}$


As shown in Fig. 2, $\mathbf {m_{1}}$ and $\mathbf {m_{2}}$ are connected by an edge $\mathbf {m_{1}m_{2}}$. If $\mathbf {u_{1}}$ is matched to $\mathbf {m_{1}}$ and $\mathbf {u_{3}}$ is matched to $\mathbf {m_{2}}$, the relative position between $\mathbf {u_{1}}$ and $\mathbf {u_{3}}$ must meet the edge constraint of $\mathbf {m_{1}}$ and $\mathbf {m_{2}}$. We use $c_{e}(\mathbf {m_{i}m_{j}},\mathbf {u_{i'}u_{j'}})$ to denote the matching cost of an edge. Let $\mathbf {m_{j}}$ denote the markers connected with $\mathbf {m_{1}}$, and let $\mathbf {u_{j'}}$ be a candidate assignment of $\mathbf {m_{j}}$. The local geometrical matching cost of $\phi _{a}$ is then defined as:

$$\begin{aligned} c_{lg}(a) = \frac{1}{|\mathbf {m_{j}}|} \sum _{\mathbf {m_{j}}}\min _{\mathbf {u_{j'}}}c_{e}\left( \mathbf {m_{1}m_{j}},\mathbf {u_{1}u_{j'}}\right) \end{aligned}$$

(3)

where $|\mathbf {m_{j}}|$ is the number of markers connected with $\mathbf {m_{1}}$ and $c_{e}(\mathbf {m_{1}m_{j}},\mathbf {u_{1}u_{j'}})$ is defined as:

$$\begin{aligned} c_{e}(\mathbf {m_{1}m_{j}},\mathbf {u_{1}u_{j'}})= & {} (\Vert \mathbf {u_{1}} - \mathbf {u_{j'}} \Vert - d_{\mathbf {m_{1}m_{j}}})^{2} \nonumber \\&+ \omega _{a} \left( 1 - \frac{(\mathbf {m_{1}} - \mathbf {m_{j}})\cdot (\mathbf {u_{1}} - \mathbf {u_{j'}})}{\Vert \mathbf {m_{1}} - \mathbf {m_{j}}\Vert \Vert \mathbf {u_{1}} - \mathbf {u_{j'}}\Vert }\right) ^{2}\nonumber \\ \end{aligned}$$

(4)

where $d_{\mathbf {m_{1}m_{j}}}$ is the distance between $\mathbf {m_{1}}$ and $\mathbf {m_{j}}$, which is obtained from the previous frame and updated over time. The first term in Eq. 4 measures the difference in length between the two edges, and the second term measures the difference in their direction. The mismatch between the unit of length and the unit of the cosine of the angle is compensated by $\omega _{a}$. In our experiments, we set $\omega _{a}=1\hbox {e}4$. The matching costs of the different edges are then averaged to form the local geometrical matching cost of $\phi _{a}$.
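Equations 2 to 4 and the weighted sum in Eq. 1 can be sketched in plain Python as below. This is an illustrative reading of the formulas, not the authors' code: all function names are hypothetical, points are assumed to be 3-tuples, and the handling of zero-length edges is an added assumption.

```python
import math


def point_cost(m, u):
    """Eq. 2: squared distance between a labeled marker m and a candidate u."""
    return sum((a - b) ** 2 for a, b in zip(m, u))


def edge_cost(m1, mj, u1, uj, d_m1mj, omega_a=1e4):
    """Eq. 4: squared length difference plus weighted squared direction difference."""
    em = [a - b for a, b in zip(m1, mj)]
    eu = [a - b for a, b in zip(u1, uj)]
    len_m = math.sqrt(sum(c * c for c in em))
    len_u = math.sqrt(sum(c * c for c in eu))
    if len_m == 0.0 or len_u == 0.0:
        # Assumption: a degenerate (zero-length) edge can never match.
        return math.inf
    cos_angle = sum(a * b for a, b in zip(em, eu)) / (len_m * len_u)
    return (len_u - d_m1mj) ** 2 + omega_a * (1.0 - cos_angle) ** 2


def local_geometry_cost(m1, u1, neighbors, candidates, omega_a=1e4):
    """Eq. 3: per model neighbor take the cheapest candidate edge, then average.

    neighbors: list of (m_j, d_m1mj) pairs; candidates: list of u_j' positions.
    """
    total = sum(
        min(edge_cost(m1, mj, u1, uj, d, omega_a) for uj in candidates)
        for mj, d in neighbors
    )
    return total / len(neighbors)


def match_cost(m1, u1, neighbors, candidates, omega_p=0.5, omega_a=1e4):
    """Weighted sum of point cost and local geometrical cost, as in Eq. 1."""
    c_lg = local_geometry_cost(m1, u1, neighbors, candidates, omega_a)
    return omega_p * point_cost(m1, u1) + (1.0 - omega_p) * c_lg


# The data graph is the model graph shifted by 0.5 along x, plus one outlier.
m1, m2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
u1, u2, outlier = (0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (0.5, 2.0, 0.0)
cost = match_cost(m1, u1, [(m2, 1.0)], [u2, outlier])
```

In this toy case the edge to `u2` has the same length and direction as the model edge, so the local geometrical cost vanishes and only the point cost contributes; the outlier is perpendicular to the model edge and is therefore dominated by the $\omega _{a}$ term.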

With $\phi _{a} = \phi _{ij}$, our soft graph matching model is defined as follows:

$$\begin{aligned} \min _{\mathbf {x}}&\quad cost(\mathbf {x})=\sum _{i}\sum _{j}w_{ij}x_{ij}, \quad \text {where} \nonumber \\ w_{ij}&= \left\{ \begin{array}{ll} \omega _{p}c_{p}(a) + (1-\omega _{p})c_{lg}(a) , &{} \; j \in b(i) \\ w_\mathrm{max}, &{} \; j \notin b(i) \end{array} \right. \nonumber \\ \hbox {s.t.}&\quad x_{ij}\in \{0,1\} \nonumber \\&\quad \sum _{i}^{}x_{ij}=1, \text {for all}\, j, \quad \sum _{j}^{}x_{ij} = 1, \text {for all}\,i \end{aligned}$$

(5)

where $w_\mathrm{max}$ is an experimentally defined maximum cost. The candidate matches b(i) of marker i are the markers whose distance to the position predicted for marker i by a Kalman filter [30] is less than a specific threshold. We use the Hungarian algorithm to find the best matching. This is fast, because the matching cost is only calculated for the selected candidate assignments of each marker.
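A minimal sketch of the gating and assignment steps in Eq. 5 is given below, using only the Python standard library. It is not the authors' implementation: `gated_costs` and `hungarian` are hypothetical names, the predicted positions are passed in directly (the Kalman filter itself is not shown), and `hungarian` is a textbook $O(n^{3})$ formulation of the Hungarian algorithm that assumes a square matrix with finite entries, so `w_max` should be a large finite number.

```python
import math


def gated_costs(predicted, unlabeled, threshold, w_max, match_cost):
    """Build w_ij of Eq. 5: real costs for candidates in b(i), w_max elsewhere."""
    w = []
    for i, p in enumerate(predicted):
        row = []
        for j, q in enumerate(unlabeled):
            in_gate = math.dist(p, q) < threshold
            row.append(match_cost(i, j) if in_gate else w_max)
        w.append(row)
    return w


def hungarian(cost):
    """Return labels with labels[i] = column assigned to row i at minimum total cost."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)       # row potentials
    v = [0.0] * (n + 1)       # column potentials
    p = [0] * (n + 1)         # p[j] = row currently matched to column j (1-based)
    way = [0] * (n + 1)       # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:        # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    labels = [0] * n
    for j in range(1, n + 1):
        labels[p[j] - 1] = j - 1
    return labels


w = [[4.0, 1.0, 3.0],
     [2.0, 0.0, 5.0],
     [3.0, 2.0, 2.0]]
labels = hungarian(w)   # labels[i] = data marker matched to model marker i
total = sum(w[i][j] for i, j in enumerate(labels))
```

In practice a library routine for the linear assignment problem could replace `hungarian`; the hand-written version is used here only to keep the sketch self-contained.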

As our marker labeling method targets real-time live applications, we use a simple method to label all markers automatically at the 1st frame. First, we instruct the subjects to start their motions from a T-pose with all markers visible. Then, based on prior knowledge of the current subject’s skeleton T-pose model and the marker offsets relative to the inboard joints, we run a nonlinear optimization that fits the model to the markers captured at the 1st frame by minimizing the distances between the markers on the model and the captured markers.

4 Missing marker estimation

Raw motion capture data often contain missing markers due to limb occlusions and self-occlusions, which lowers the accuracy of the marker labeling process. Here we propose a nonlinear optimization process to solve this problem. First, we reconstruct the current pose using inverse kinematics. Then, we use the reconstructed pose and the edge constraints to estimate the positions of the occluded markers.

We define the human body pose using a set of independent joint coordinates $\mathbf {\theta \in R^{42}}$, including the absolute root position and orientation as well as the relative joint angles of the individual joints. The bones are head (1 DoF), neck (2 DoF), lower back (3 DoF), and left/right shoulders (2 DoF), arms (3 DoF), forearms (1 DoF), hands (3 DoF), upper legs (3 DoF), lower legs (1 DoF), and feet (2 DoF).
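The dimension of the pose vector can be checked by adding up the listed degrees of freedom. The short sketch below does this; counting the root as 6 DoF (3 for position, 3 for orientation) is an assumption, since the text does not state the root count explicitly, and the dictionary names are hypothetical.

```python
# Degrees of freedom per bone as listed in the text. The 6 root DoF
# (3 translation + 3 rotation) are an assumption about how the root is counted.
ROOT_DOF = 6
AXIAL_DOF = {"head": 1, "neck": 2, "lower back": 3}
PAIRED_DOF = {  # each entry exists on the left and on the right side
    "shoulder": 2, "arm": 3, "forearm": 1, "hand": 3,
    "upper leg": 3, "lower leg": 1, "foot": 2,
}

total_dof = ROOT_DOF + sum(AXIAL_DOF.values()) + 2 * sum(PAIRED_DOF.values())
print(total_dof)  # 42
```

Under that assumption the count is 6 + 6 + 2 × 15 = 42, which agrees with the stated dimension of the pose vector.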

We reconstruct current frame pose $\mathbf {\theta ^{t}}$ by minimizing an objective function consisting of four terms:

$$\begin{aligned} \min _{\mathbf {\theta ^{t}}}&\quad&\lambda _{1}E_{O} + \lambda _{2}E_{P} + \lambda _{3}E_{S} + \lambda _{4}E_{C} \end{aligned}$$

(6)

where $E_{O}$, $E_{P}$, $E_{S}$ and $E_{C}$ represent the observed term, predicted term, smoothness term and constraint term, respectively. The weights $\lambda _{1}$, $\lambda _{2}$, $\lambda _{3}$ and $\lambda _{4}$ control the importance of each term and are experimentally set to 0.05, 0.15, 0.8 and 0.1, respectively. We describe each term in detail below.

The observed term measures the distance between the labeled observed markers and corresponding markers from reconstructed pose:

$$\begin{aligned} E_{O}=\sum _{i=1}^{M}[(1-o_{i}^{t})(e_{i}(\mathbf {\theta ^{t}}) - \mathbf {m_{i}^{t}})^{2}] \end{aligned}$$

(7)

where $e_{i}(\mathbf {\theta ^{t}})$ is the forward kinematics function that computes the position of the ith marker from prior knowledge of the user’s skeleton, $\mathbf {s_{v}}$, and the markers’ offsets, $\mathbf {l_{v}}$, relative to the inboard joint. $o_{i}^{t}$ is a binary weight that equals 1 if the ith marker is occluded and 0 otherwise, so that only visible markers contribute to $E_{O}$. Reconstructing the motion sequence from this constraint alone is the same as performing per-frame inverse kinematics as in [29].

The predicted term uses the Kalman filter [30], from which we obtain a probabilistic distribution of the 3D position of the occluded markers. Suppose that $x_{i}^{t-1}$ is the hidden state vector, consisting of the 3D position and velocity of marker i, and $y_{i}^{t}$ is the measurement vector, i.e., the captured position or the estimated position (when the marker is occluded) of the same marker. The reconstructed pose should maximize the conditional distribution $\mathbf {y_{i}^{t}} | \mathbf {x_{i}^{t-1}}$, which is a normal probability distribution

$$\begin{aligned} P(\mathbf {y_{i}^{t}} | \mathbf {x_{i}^{t-1}}) = \frac{\exp \left( -\frac{1}{2}(\mathbf {y_{i}^{t}} - \mathbf {\mu _{i}^{t}})^{T} (\varGamma _{i}^{t}) ^{-1} (\mathbf {y_{i}^{t}} - \mathbf {\mu _{i}^{t}})\right) }{(2\pi )^{\frac{d}{2}}|\varGamma _{i}^{t}|^{\frac{1}{2}}} \end{aligned}$$

(8)

with mean and covariance

$$\begin{aligned} \mathbf {\mu _{i}^{t}}=\mathbf {v_{i}^{t}}, \quad \varGamma _{i}^{t}= H_{i}^{T}H_{i}\varLambda +\varSigma \end{aligned}$$

(9)

where d is the dimension of $\mathbf {y_{i}^{t}}$, $\mathbf {v_{i}^{t}}$ is the marker position predicted by the Kalman filter, $|\varGamma _{i}^{t}|$ is the determinant of the covariance matrix $\varGamma _{i}^{t}$, $H_{i}$ is the measurement matrix that relates the hidden state $x_{i}^{t}$ to the measurement $y_{i}^{t}$, and $\varLambda $ and $\varSigma $ are the process noise covariance and the measurement noise covariance, respectively.

We minimize the negative log of $P(\mathbf {y_{i}^{t}} | \mathbf {x_{i}^{t-1}})$, yielding the formulation:

$$\begin{aligned} E_{P}=\sum _{i=1}^{M}[o_{i}^{t}(e_{i}(\mathbf {\theta ^{t}}) - \mathbf {v_{i}^{t}})^{T} (\varGamma _{i}^{t}) ^{-1} (e_{i}(\mathbf {\theta ^{t}}) - \mathbf {v_{i}^{t}})] \end{aligned}$$

(10)
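The step from Eq. 8 to Eq. 10 can be spelled out as follows. This is a sketch under two assumptions that the text does not state explicitly: that $\varGamma _{i}^{t}$ does not depend on $\mathbf {\theta ^{t}}$, and that the measurement $\mathbf {y_{i}^{t}}$ is replaced by the model marker position $e_{i}(\mathbf {\theta ^{t}})$. Taking the negative logarithm of Eq. 8 gives

$$\begin{aligned} -\log P(\mathbf {y_{i}^{t}} | \mathbf {x_{i}^{t-1}}) = \frac{1}{2}(\mathbf {y_{i}^{t}} - \mathbf {\mu _{i}^{t}})^{T} (\varGamma _{i}^{t}) ^{-1} (\mathbf {y_{i}^{t}} - \mathbf {\mu _{i}^{t}}) + \frac{d}{2}\log (2\pi ) + \frac{1}{2}\log |\varGamma _{i}^{t}| \end{aligned}$$

The last two terms are constant with respect to $\mathbf {\theta ^{t}}$, so only the quadratic form matters for the minimization. Substituting $\mathbf {\mu _{i}^{t}}=\mathbf {v_{i}^{t}}$ from Eq. 9, dropping the constant factor $\frac{1}{2}$, and summing over all markers with the weights $o_{i}^{t}$ yields Eq. 10.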

The smoothness term enforces temporal smoothness by penalizing the change in velocity between the current reconstructed pose $\mathbf {\theta ^{t}}$ and the two previous ones $[\mathbf {\theta ^{t-1}}, \mathbf {\theta ^{t-2}}]$:

$$\begin{aligned} E_{S} = \Vert \mathbf {\theta ^{t}} - 2\mathbf {\theta ^{t-1}} \ + \mathbf {\theta ^{t-2}}\Vert ^{2} \end{aligned}$$

(11)
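As a small illustration of Eq. 11, the sketch below evaluates the smoothness term on plain Python lists. The function name `smoothness_term` is hypothetical, and poses are assumed to be flat lists of joint coordinates.

```python
def smoothness_term(theta_t, theta_prev, theta_prev2):
    """Eq. 11: squared norm of the second-order finite difference of the pose."""
    return sum(
        (a - 2.0 * b + c) ** 2
        for a, b, c in zip(theta_t, theta_prev, theta_prev2)
    )


# A pose that keeps its velocity incurs no penalty ...
constant_velocity = smoothness_term([2.0, 4.0], [1.0, 2.0], [0.0, 0.0])
# ... while a sudden change in velocity is penalized.
sudden_change = smoothness_term([3.0, 4.0], [1.0, 2.0], [0.0, 0.0])
```

The first call evaluates to zero because each coordinate advances by the same step in both frames; in the second call the first coordinate accelerates by one unit, which gives a penalty of 1.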

The constraint term prevents the pose from reaching an impossible posture through over-bent joints. We limit the joint angles with the following penalty:

$$\begin{aligned} E_{C}=\sum _{\mathbf {\theta _{i}^{t} \in \theta ^{t}}}[\underline{\beta }(i)(\mathbf {\theta _{i}^{t}} -\mathbf {\underline{\theta _{i}}})^2 + \overline{\beta }(i)(\mathbf {\theta _{i}^{t}} -\mathbf {\overline{\theta _{i}}})^2] \end{aligned}$$

(12)

where each body joint is associated with conservative bounds [$\mathbf {{\underline{\theta _{i}}}, \overline{\theta _{i}}}$]. For the bounds, we use values measured in the biomechanical literature [31]. $\underline{\beta }(i)$ and $\overline{\beta }(i)$ are indicator functions: $\underline{\beta }(i)$ evaluates to 1 if $\mathbf {\theta _{i}^{t}<\underline{\theta _{i}}}$ and to 0 otherwise, and $\overline{\beta }(i)$ evaluates to 1 if $\mathbf {\theta _{i}^{t}>\overline{\theta _{i}}}$ and to 0 otherwise.
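Eq. 12 amounts to a one-sided quadratic penalty per joint angle, which the following sketch makes concrete. The function name `joint_limit_term` and the toy bounds are hypothetical; they are not the biomechanical values used in the paper.

```python
def joint_limit_term(theta, lower, upper):
    """Eq. 12: quadratic penalty for every joint angle outside [lower, upper]."""
    total = 0.0
    for angle, lo, hi in zip(theta, lower, upper):
        if angle < lo:        # lower indicator is 1
            total += (angle - lo) ** 2
        elif angle > hi:      # upper indicator is 1
            total += (angle - hi) ** 2
    return total


# Toy bounds: every angle must stay within [0, 1].
penalty = joint_limit_term([0.5, -1.0, 2.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

The first angle lies inside its bounds and contributes nothing, while the other two each violate a bound by one unit, so the penalty is 2.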

We use quasi-Newton BFGS optimization [32] to solve the optimization problem in Eq. 6. For the 1st frame, we initialize the pose reconstruction process without the smoothness term. In most cases, each frame takes 3–5 iterations to converge.

The pose reconstruction process preserves the motion tendency of the missing markers by maintaining rigid body constraints, so we can estimate missing markers from the reconstructed pose. Specifically, we assume that the relative position of two markers ($\mathbf {m_{j}^{t}, m_{i}^{t}}$) on the same limb, which we call neighbor markers, is fixed at all times during the motion sequence. We can then obtain the missing markers from the reconstructed pose:

$$\begin{aligned} \mathbf {m_{i}^{t}}=\frac{1}{|j|}\sum _{j}[\mathbf {m_{j}^{t}}-(e_{j}(\mathbf {\theta ^{t}}) - e_{i}(\mathbf {\theta ^{t}}))] \end{aligned}$$

(13)

where j is the neighbor marker of i that is visible at current frame.

When a marker and most of its neighbors are missing at the same time, we use an iterative scheme to recover the missing markers. First, we recover the missing markers whose neighbors are visible, and then we use the recovered markers to estimate the other occluded markers. If all markers on the same limb are missing at the same time, we directly use the virtual markers on the reconstructed pose as the recovered ones.
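The recovery procedure, Eq. 13 followed by the iterative fallback, might be sketched as follows. This is one possible reading and not the authors' code: the function name `recover_missing`, the dictionary layout, and the choice to process markers in rounds are assumptions; `model` holds the virtual marker positions $e_{i}(\mathbf {\theta ^{t}})$ of the reconstructed pose.

```python
def recover_missing(captured, model, neighbors):
    """Estimate missing markers from neighbors (Eq. 13), round by round.

    captured:  marker id -> (x, y, z), or None if the marker is missing
    model:     marker id -> virtual marker position on the reconstructed pose
    neighbors: marker id -> ids of the markers on the same limb
    """
    known = {k: tuple(v) for k, v in captured.items() if v is not None}
    missing = [k for k, v in captured.items() if v is None]
    while missing:
        recovered = {}
        for i in missing:
            usable = [j for j in neighbors[i] if j in known]
            if not usable:
                continue
            est = [0.0, 0.0, 0.0]
            for j in usable:
                for k in range(3):
                    # m_j - (e_j - e_i): neighbor position minus model offset
                    est[k] += known[j][k] - (model[j][k] - model[i][k])
            recovered[i] = tuple(c / len(usable) for c in est)
        if not recovered:
            break  # no remaining marker has a usable neighbor
        known.update(recovered)
        missing = [i for i in missing if i not in recovered]
    for i in missing:
        # Whole limb occluded: fall back to the virtual marker itself.
        known[i] = tuple(model[i])
    return known


model = {"a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0),
         "c": (2.0, 0.0, 0.0), "d": (9.0, 9.0, 9.0)}
captured = {"a": (0.5, 0.0, 0.0), "b": None, "c": None, "d": None}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
result = recover_missing(captured, model, neighbors)
```

In this toy chain, marker `b` is recovered from the visible marker `a` in the first round, marker `c` from the recovered `b` in the second round, and the isolated marker `d` falls back to its virtual position.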

5 Experimental results

We demonstrate the power of our approach by comparing it against alternative state-of-the-art methods and a commercial system, VICON, on a wide range of motion capture data. First, we show the superior accuracy and the efficiency of our method on motion sequences of a single subject and of two interacting subjects (Sect. 5.1). Then, we show its superior pose reconstruction accuracy on single-subject motion sequences (Sect. 5.2). Finally, we show accurate marker labeling results that demonstrate its ability to handle ghost markers as well as facial motions without rigid constraints (Sect. 5.3). All tests are run on a 4-core 2.4 GHz CPU with 2 GB RAM. We use the labeled markers at the 1st frame to initialize the Kalman filters and the relative positions between markers on the same limb. To represent the marker labeling accuracy, we use the identification rate of marker trajectories $\zeta $, defined as the ratio of the number of correctly labeled marker trajectories to the total number of trajectories.
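The identification rate $\zeta $ can be computed as in the sketch below. The text does not specify when a trajectory counts as correctly labeled; the sketch assumes that every frame of the trajectory must carry the ground-truth label. The function name and the label strings are hypothetical.

```python
def identification_rate(predicted, truth):
    """Ratio of correctly labeled trajectories to all trajectories.

    predicted, truth: trajectory id -> list of per-frame labels.
    Assumption: a trajectory is correct only if all its frames match.
    """
    correct = sum(
        1 for tid, labels in truth.items() if predicted.get(tid) == labels
    )
    return correct / len(truth)


truth = {0: ["head", "head"], 1: ["neck", "neck"],
         2: ["hand", "hand"], 3: ["foot", "foot"]}
predicted = {0: ["head", "head"], 1: ["neck", "neck"],
             2: ["hand", "foot"], 3: ["foot", "foot"]}
zeta = identification_rate(predicted, truth)
```

Trajectory 2 is mislabeled in its second frame, so three of the four trajectories count as correct.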

Table 2 Efficiency (fps) of different labeling methods


5.1 Performance on CMU motion capture data

We compare our method against the following alternatives: Yu [8] (YLD), the closest-point-based approach (CP), binary graph matching [26] (LH) and the original unary graph matching (UGM). The CP approach assumes that the correct correspondence is the closest point in the next frame. For binary and unary graph matching, the matching costs of point and edge correspondences take the form of Eqs. 2 and 4, respectively. We test on 665 CMU MoCap sequences [33] (816,000 frames in total), including walking, running, jumping, kicking, punching, rolling, dancing, skateboarding, basketball, etc. The original data are captured at 120 fps. All data are classified according to the embedded noise level, which is defined as:

$$\begin{aligned} \eta = \max _{t,i,j} \frac{\Vert \mathbf {m_{i}^{t}} - \mathbf {m_{j}^{t}}\Vert - d_{ij}}{d_{ij}}. \end{aligned}$$

(14)
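Eq. 14 can be evaluated as in the following sketch. It assumes that the maximum runs over all frames and over the marker pairs that are connected by an edge, and that $d_{ij}$ is the reference length of that edge; the function name `noise_level` and the data layout are hypothetical.

```python
import math


def noise_level(frames, rest_lengths):
    """Eq. 14: largest relative stretch of any edge over all frames.

    frames:       list of dicts, marker id -> (x, y, z)
    rest_lengths: dict, (i, j) -> reference edge length d_ij
    """
    return max(
        (math.dist(frame[i], frame[j]) - d) / d
        for frame in frames
        for (i, j), d in rest_lengths.items()
    )


frames = [
    {"a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0)},   # edge at its rest length
    {"a": (0.0, 0.0, 0.0), "b": (1.25, 0.0, 0.0)},  # edge stretched by 25 %
]
eta = noise_level(frames, {("a", "b"): 1.0})
```

For these two frames the edge is stretched by at most a quarter of its reference length, so `eta` is 0.25.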

Table 3 Efficiency (fps) comparison on single subject


The results are shown in Fig. 3 and Table 2. At a high capture rate (120 fps), the 3D position of each marker changes little between frames, so the results of UGM are as good as ours. However, due to limited computing power, lower capture rates such as 60 or 30 fps are commonly used in practical applications. We therefore compared our method against UGM at capture rates of 60, 45, 30 and 25 fps, where our method is more accurate, as shown in Fig. 4. The efficiency results are shown in Table 3.

We also test our approach on motion capture data of multiple interacting characters. By adding markers to the model graph, our method extends naturally to multiple subjects. The higher accuracy of our method compared with the alternative methods is shown in Fig. 5, and the efficiency results are shown in Table 4.

5.2 Application: online human motion reconstruction

Based on our online marker labeling and pose reconstruction algorithm, we propose an online motion reconstruction system. The motion capture system we use is a Vicon T-series system with 12 cameras. Our system takes unlabeled 3D marker positions as input and produces reconstructed poses online in real time. We compare the resulting animation of our method against Vicon and the alternative labeling methods YLD, UGM, LH and CP. The comparison results are best viewed in the supplementary video, although we show several examples in Figs. 6 and 7.

5.3 Discussion

For noisy motion capture data captured at 120 fps, the displacement of a marker between successive frames is very small, and the labeling accuracy of our method is clearly better than that of the alternative methods CP, LH and YLD and almost equal to that of UGM. The reason is that, for CP, LH and YLD, the rigidity of the edges can no longer be maintained as the noise level in the motion capture data increases. The accuracy of UGM, however, degrades as the capture rate decreases, because it only considers the smoothness of each marker’s trajectory. This indicates that the integrated use of the soft graph matching model and the missing marker estimation scheme helps to resume identification after most of the tracking has been lost.

Ghost markers. The original motion capture data contain noise markers. We randomly generate more noise markers as ghost markers. We first specify a number ($\alpha $) of ghost markers and then randomly generate their positions and the times at which they appear. The ghost markers are generated in two different ways (Fig. 8). In the first way, ghost markers are generated directly from the original noise marker positions. In the second way, ghost markers are generated from extra noise markers, which are sampled from the original noise marker positions with Gaussian noise N(0, 2) (cm). We test our marker labeling method on 500 randomly generated motions for both ways. We find that even when $\alpha =|M|$, the total number of wrongly labeled markers is still less than 10. These results demonstrate that our method can reject a large number of ghost markers.
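The two ways of generating ghost markers could be simulated as sketched below. Several details are assumptions, since the text does not fix them: the 2 in N(0, 2) is read as a standard deviation in centimeters, each ghost marker is given a single uniformly drawn appearance frame, and the function name and parameters are hypothetical.

```python
import random


def generate_ghost_markers(noise_positions, alpha, n_frames,
                           jitter=True, sigma_cm=2.0, seed=0):
    """Return alpha ghost markers as (appearance frame, (x, y, z)) pairs.

    jitter=False: reuse original noise marker positions (first way).
    jitter=True:  perturb them with Gaussian noise (second way).
    Assumption: sigma_cm is a standard deviation, positions are in cm.
    """
    rng = random.Random(seed)
    ghosts = []
    for _ in range(alpha):
        base = rng.choice(noise_positions)
        if jitter:
            position = tuple(c + rng.gauss(0.0, sigma_cm) for c in base)
        else:
            position = tuple(base)
        ghosts.append((rng.randrange(n_frames), position))
    return ghosts


noise = [(10.0, 20.0, 30.0), (40.0, 50.0, 60.0)]
first_way = generate_ghost_markers(noise, alpha=5, n_frames=100, jitter=False)
second_way = generate_ghost_markers(noise, alpha=5, n_frames=100, jitter=True)
```

Fixing `seed` makes a generated test sequence reproducible, which is convenient when comparing labeling methods on the same ghost markers.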

Facial marker labeling. Unlike the human body, the human face has no limbs. As a result, local rigid constraints are invalid for facial motions, because the relative distances between markers may change considerably as the facial muscles and skin move. We therefore only use the motion smoothness constraint to estimate the matching cost of the different marker correspondences (the 2nd term in Eq. 4). To label all markers correctly at the 1st frame, we instruct the subject, as in the full-body case, to start the facial motions from a “normal” expression. Figure 9 demonstrates the power of our method for accurate facial marker labeling.

Table 4 Efficiency (fps) comparison on double subjects


6 Conclusion

In this paper, we present a new online marker labeling method for optical motion capture, which can be used to build real-time live applications. Experimental results demonstrate that our method labels markers more accurately than state-of-the-art marker labeling methods, especially in the presence of missing/ghost markers and at low capture rates. It benefits from the integrated use of the proposed soft graph matching model and the marker estimation scheme, which together consider the local geometrical structure and the full pose. Although the labeling efficiency of our method is not the best (it ranks 4th of 5 methods) because of the pose reconstruction process, it is still sufficient for real-time live applications.

6.1 Limitation and future work

The performance of our method degrades as the motion capture rate decreases. The main reason is that the current setting of the empirical weights is not optimal. In fact, when the MoCap data are down-sampled or the markers were previously occluded, the weight of the point correspondence cost should be decreased, and when a marker strongly violates the rigidity constraint (e.g., because it is attached to soft tissue), the weight of the edge correspondence cost should be decreased. In the future, we plan to devise an automatic or experimental scheme for finding weights that suit various kinds of motion. We also plan to explore a robust method that automatically detects labeling failures and reinitializes the labeling process.

Because it aims to identify markers for live real-time applications, our method is easily extended to multi-actor interaction motions. Unfortunately, the estimated markers are not always accurate, especially when all markers on an arm or a leg are occluded for a long period of time. Inaccurate estimation of missing markers may degrade the labeling algorithm, especially when a missing marker re-appears. For multiple interacting characters, the quality of the reconstructed motion cannot be guaranteed when the interaction becomes more intense (for example, when two people hold each other while rolling on the ground). We plan to study data-driven approaches further. First, when calculating Eq. 3, we implicitly assume the correspondence of the neighbor, but the labeling result may conflict with this assumption; this could be improved by using example data. Second, we would like to explore how to construct a reasonable statistical model from an example database so that better predictions can be derived for occluded markers. Finally, reconstructing the movement of multiple intensely interacting characters is another problem worth studying.

References

Xia, S., Gao, L., Lai, Y-K., Yuan, M-Z., Chai, J.: A survey on human performance capture and animation. J. Comput. Sci. Technol. 32(3), 536–554 (2017)
https://www.vicon.com/ (2017)
Akhter, I., Simon, T., Khan, S., Matthews, I., Sheikh, Y.: Bilinear spatiotemporal basis models. ACM Trans. Graph. 31(2), 17:1–17:12 (2012)
Article Google Scholar
Lou, H., Chai, J.: Example-based human motion denoising. IEEE Trans. Vis. Comput. Graph. 16(5), 870–8792 (2010)
Article Google Scholar
Nguyen, N., Wheatland, N., Brown, D., Parise, B., Liu, C.K., Zordan, V.: Performance capture with physical interaction. In: The ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA 2010, pp. 189–195 (2010)
Aristidou, A., Lasenby, J.: Real-time marker prediction and CoR estimation in optical motion capture. Vis. Comput. 29(1), 7–26 (2013). doi:10.1007/s00371-011-0671-y
Burke, M., Lasenby, J.: Estimating missing marker positions using low dimensional Kalman smoothing. J. Biomech. 49, 1854–1858 (2016)
Article Google Scholar
Yu, Q., Li, Q., Deng, Z.: Online motion capture marker labeling for multiple interacting articulated targets. Comput. Graph. Forum 26(3), 477–483 (2007)
Li, B., Meng, Q., Holstein, H.: Articulated pose identification with sparse point features. IEEE Trans. Syst. Man Cybern. Part B Cybern. 34(3), 1412–1422 (2004)
Li, B., Meng, Q., Holstein, H.: Similarity K-d tree method for sparse point pattern matching with underlying non-rigidity. Pattern Recognit. 38(12), 2391–2399 (2005)
Li, B., Meng, Q., Holstein, H.: Articulated motion reconstruction from feature points. Pattern Recognit. 41(1), 418–431 (2008)
Mundermann, L., Corazza, S., Andriacchi, T.P.: Accurately measuring human movement using articulated ICP with soft-joint constraints and a repository of articulated models. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–6 (2007)
Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
Zhang, Z.: Iterative point matching for registration of free-form curves and surfaces. Int. J. Comput. Vis. 13(2), 119–152 (1994)
Maintz, J.B., Viergever, M.A.: A survey of medical image registration. Med. Image Anal. 2(1), 1–36 (1998)
Chui, H., Rangarajan, A.: A new point matching algorithm for non-rigid registration. Comput. Vis. Image Underst. 89(2-3), 114–141 (2003)
Lee, J.-H., Won, C.-H.: Topology preserving relaxation labeling for nonrigid point matching. IEEE Trans. Pattern Anal. Mach. Intell. 33(2), 427–432 (2011)
Zheng, Y., Doermann, D.: Robust point matching for nonrigid shapes by preserving local neighborhood structures. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 643–649 (2006)
Starck, J., Hilton, A.: Correspondence labelling for wide-timeframe free-form surface matching. In: Proceedings of IEEE International Conference on Computer Vision (2007)
Sahillioglu, Y., Yemez, Y.: 3D shape correspondence by isometry-driven greedy optimization. In: Proceedings of CVPR, 2010, pp. 453–458 (2010)
Veenman, C.J., Reinders, M.J.T., Backer, E.: Resolving motion correspondence for densely moving points. IEEE Trans. Pattern Anal. Mach. Intell. 23(1), 54–72 (2001)
Sethi, I.K., Jain, R.: Finding trajectories of feature points in a monocular image sequence. IEEE Trans. Pattern Anal. Mach. Intell. 9(1), 56–73 (1987)
Rangarajan, K., Shah, M.: Establishing motion correspondence. CVGIP Image Underst. 54(1), 56–73 (1991)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Log. Q. 2(1–2), 83–97 (1955)
Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18(3), 265–298 (2004)
Leordeanu, M., Hebert, M.: A spectral technique for correspondence problems using pairwise constraints. In: Tenth IEEE International Conference on Computer Vision, 2005. ICCV 2005, vol. 2, pp. 1482–1489 (2005)
Cour, T., Srinivasan, P., Shi, J.: Balanced graph matching. In: Proceedings of NIPS, pp. 313–320 (2006)
Torresani, L., Kolmogorov, V., Rother, C.: Feature correspondence via graph matching: models and global optimization. In: 10th European Conference on Computer Vision, ECCV 2008, pp. 596–609 (2008)
Zhao, J., Badler, N.I.: Inverse kinematics positioning using nonlinear programming for highly articulated figures. ACM Trans. Graph. 13(4), 313–336 (1994)
Kalman, R.E.: A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Eng. 82(Series D), 35–45 (1960)
Boone, D.C., Azen, S.P.: Normal range of motion of joints in male subjects. J. Bone Joint Surg. 61(5), 756–759 (1979)
Gill, P.E., Murray, W., Wright, M.H.: Practical Optimization. Academic Press, New York (1981)
CMU Mocap database. http://mocap.cs.cmu.edu/

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Shihong Xia, Le Su, Xinyu Fei & Han Wang
University of Chinese Academy of Sciences, Beijing, China
Le Su, Xinyu Fei & Han Wang

Corresponding author

Correspondence to Shihong Xia.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 31416 KB)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Xia, S., Su, L., Fei, X. et al. Toward accurate real-time marker labeling for live optical motion capture. Vis Comput 33, 993–1003 (2017). https://doi.org/10.1007/s00371-017-1400-y

Published: 15 May 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s00371-017-1400-y
