VolumeDeform: Real-Time Volumetric Non-rigid Reconstruction
Abstract
We present a novel approach for the reconstruction of dynamic geometric shapes using a single hand-held consumer-grade RGB-D sensor at real-time rates. Our method builds up the scene model from scratch during the scanning process; thus, it does not require a predefined shape template to start with. Geometry and motion are parameterized in a unified manner by a volumetric representation that encodes a distance field of the surface geometry as well as the non-rigid space deformation. Motion tracking is based on a set of extracted sparse color features in combination with a dense depth constraint. This enables accurate tracking and drastically reduces drift inherent to standard model-to-depth alignment. We cast finding the optimal deformation of space as a non-linear, regularized variational optimization problem by enforcing local smoothness and proximity to the input constraints. The problem is tackled in real time at the camera’s capture rate using a data-parallel flip-flop optimization strategy. Our results demonstrate robust tracking even for fast motion and scenes that lack geometric features.
Keywords
Deformation Field · Iterative Closest Point · Preconditioned Conjugate Gradient · Space Deformation
1 Introduction
While these approaches achieve impressive results on static environments, they do not reconstruct dynamic scene elements such as non-rigidly moving objects. However, the reconstruction of deformable objects is central to a wide range of applications, and it is also the focus of this work. In the past, a variety of methods for dense deformable geometry tracking from multi-view camera systems [9] or a single RGB-D camera, even in real time [10], were proposed. Unfortunately, all these methods require a complete static shape template of the tracked scene to start with; they then deform the template over time.
Object-type-specific templates limit applicability in general scenes, and are often hard to construct in practice. Therefore, template-free methods that jointly build up the shape model while tracking its non-rigid deformations – from partial scans only – have been investigated [11, 12, 13, 14, 15, 16], but none of them achieves real-time performance.
Recently, a first method has been proposed that tackles the hard joint model reconstruction and tracking problem at real-time rates: DynamicFusion [17] reconstructs an implicit surface representation – similar to KinectFusion – of the tracked object, while jointly optimizing for the scene’s rigid and non-rigid motion based on a coarse warping field. Although the obtained results are impressive given the tight real-time constraint, we believe that this is not the end of the line. For instance, their depth-only model-to-frame tracking strategy cannot track tangential motion, since all color information is omitted. Without utilizing global features as anchor points, model-to-frame tracking is also prone to drift and error accumulation. In our work, we thus propose the use of sparse RGB feature matching to improve tracking robustness and to handle scenes with little geometric variation. In addition, we propose an alternative representation for the deformation warp field.
In our new algorithm, we perform non-rigid surface tracking to capture shape and deformations on a fine level of discretization instead of a coarse deformation graph. This is realized by combining as-rigid-as-possible (ARAP) volume regularization of the space embedding the surface [18] with automatically generated volumetric control lattices to abstract geometric complexity. The regular structure of the lattice allows us to define an efficient multi-resolution approach for solving the underlying non-linear optimization problem. Finally, we incorporate globally-consistent sparse SIFT feature correspondences over the complete history of observed input frames to aid the alignment process. This minimizes the risk of drift, and enables stable tracking for fast motions.

In summary, the main contributions of this work are:

a dense unified volumetric representation that encodes both the scene’s geometry and its motion at the same resolution,

the incorporation of global sparse SIFT correspondences into the alignment process (e.g., allowing for robust loop closures),

and a data-parallel optimization strategy that tackles the non-rigid alignment problem at real-time rates.
2 Related Work
Online Static Reconstruction: Methods for offline static 3D shape reconstruction from partial RGB-D scans differ in the employed scene representation, such as point-based representations [19, 20, 21] or meshes [22]. In the context of commodity range sensors, implicit surface representations became popular [23, 24, 25, 26] since they are able to efficiently regularize out noise from low-quality input data. Along with an appropriate surface representation, methods were developed that are able to reconstruct small scenes in real time [27, 28]. One prominent example for online static 3D scene reconstruction with a hand-held commodity sensor is KinectFusion [1, 2]. A dense reconstruction is obtained based on a truncated signed distance field (TSDF) [23] that is updated at frame rate, and model-to-frame tracking is performed using fast variants of the Iterative Closest Point (ICP) algorithm [29]. Recently, the scene representation has been extended to scale to larger reconstruction volumes [3, 4, 5, 6, 8, 30].
Non-rigid Deformation Tracking: One way to handle dynamics is by tracking non-rigid surface deformations over time. For instance, objects of certain types can be non-rigidly tracked using controlled multi-RGB [31] or multi-depth [32, 33] camera input. Template-based methods for offline deformable shape tracking or performance capture of detailed deforming meshes [34, 35, 36, 37, 38, 39, 40, 41] were also proposed. Non-rigid structure-from-motion methods can capture dense deforming geometry from monocular RGB video [42]; however, results are very coarse and reconstruction is far from real time. The necessity to compensate for non-rigid distortions in shape reconstruction from partial RGB-D scans may also arise when static reconstruction is the goal. For instance, it is hard for humans to attain the exact same pose in multiple partial body scans. Human scanning methods address this by a non-rigid compensation of posture differences [11, 43, 44], or use template-based pose alignment to fuse information from scans in various poses [15, 45]. Real-time deformable tracking of simple motions of a wide range of objects has been demonstrated [10], but it requires a KinectFusion reconstruction of a static template before acquisition. Hence, template-free methods that simultaneously track the non-rigidly deforming geometry of a moving scene and build up a shape template over time were investigated. This hard joint reconstruction and tracking problem has mostly been looked at in an offline context [11, 12, 13, 14, 15, 16, 46, 47]. In addition to runtime, drift and over-smoothing of the shape model are a significant problem that arises with longer input sequences. The recently proposed DynamicFusion approach [17] is the first to jointly reconstruct and track a non-rigidly deforming shape from RGB-D input in real time (although the color channel is not used).
It reconstructs an implicit surface representation – similar to the KinectFusion approach – while jointly optimizing for the scene’s rigid and non-rigid motion based on a coarse warping field parameterized by a sparse deformation graph [48]. Our approach tackles the same setting, but uses a dense volumetric representation to embed both the reconstructed model and the deformation warp field. While DynamicFusion only uses geometric correspondences, we additionally employ sparse photometric feature correspondences over the complete history of frames. These features serve as global anchor points and mitigate drift, which typically appears in model-to-frame tracking methods.
3 Method Overview
4 Scene Representation
We reconstruct non-rigid scenes incrementally by joint motion tracking and surface reconstruction. The two fundamental building blocks are a truncated signed distance function (TSDF) [23] for reconstruction of the shape in its initial, undeformed pose and a space deformation field to track the deformations. We discretize both in a unified manner on a shared regular volumetric grid \(\mathcal {G}\). The grid is composed of a set of grid points enumerated by a three-dimensional index i. Each grid point stores six attributes. The first three attributes represent the scene in its undeformed pose by a truncated signed distance \(D_i \in \mathbb {R}\), a color \(\mathbf {C}_i \in [0, 255]^3\), and a confidence weight \(W_i \in \mathbb {R}\). The zero level set of D is the undeformed shape \(\hat{\mathbf {P}} = D^{-1}(0)\), which we call the canonical pose in the following. New depth data is continuously integrated into this canonical frame, where the confidence weights are used to update D based on a weighted floating average (see Sect. 8). The grid points also maintain information about the current space deformation. For the \(i^{th}\) grid point, we store its position after deformation \(\mathbf {t}_i\), as well as its current local rotation \(\mathbf {R}_i\), stored as three Euler angles. On top of the deformation field, we model the global motion of the scene by a global rotation \(\mathbf {R}\) and translation \(\mathbf {t}\). Initially, all per-grid-point data is set to zero, except for the positions \(\mathbf {t}_i\), which are initialized to represent a regular grid. In contrast to the DynamicFusion approach [17], this grid-based deformation representation operates on a finer scale. Attribute values in between grid points are obtained via trilinear interpolation. A point \(\mathbf {x}\) is deformed via the space deformation \(\mathcal {S}(\mathbf {x}) = \mathbf {R} \cdot \big [ \sum _{i=1}^{|\mathcal {G}|}{\alpha _{i}(\mathbf {x}) \cdot \mathbf {t}_i} \big ] + \mathbf {t}.\) Here, \(|\mathcal {G}|\) is the total number of grid points and the \(\alpha _{i}(\mathbf {x})\) are the trilinear interpolation weights of \(\mathbf {x}\), which are non-zero only for the eight grid points of the cell containing \(\mathbf {x}\). We denote by \(\mathbf {P}\) the current deformed surface; i.e., \(\mathbf {P} = \mathcal {S}(\hat{\mathbf {P}})\).
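As an illustration, the evaluation of \(\mathcal{S}\) amounts to a trilinear lookup into the lattice of deformed positions followed by the global pose. The following is a minimal CPU sketch under our own naming (the grid origin and spacing parameters are assumptions, not the paper's notation):

```python
import numpy as np

def deform_point(x, grid_t, origin, spacing, R, t):
    """Apply the space deformation S(x): trilinearly interpolate the
    deformed grid-point positions t_i, then apply the global pose (R, t).
    grid_t is an (X, Y, Z, 3) array of deformed grid-point positions."""
    # Locate x in the lattice and compute fractional cell coordinates.
    u = (np.asarray(x, dtype=float) - origin) / spacing
    i0 = np.floor(u).astype(int)
    f = u - i0                      # fractional part -> trilinear weights
    acc = np.zeros(3)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # alpha_i(x): product of per-axis linear weights.
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                acc += w * grid_t[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return R @ acc + t              # global rotation and translation
```

With the positions initialized to a regular grid and an identity global pose, the deformation is the identity map, which provides a quick sanity check.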
Since the deformation field stores deformation only in forward direction, an isosurface extraction via raycasting [1, 2] is not easily applicable. Thus, we use a data-parallel implementation of marching cubes [49] to obtain a polygonal representation of \(\hat{\mathbf {P}}\), and then apply the deformation to the vertices. We first find all grid cells that contain a zero crossing based on a data-parallel prefix sum. One thread per valid grid cell is used to extract the final list of triangles. The resulting vertices are immediately deformed according to the current deformation field, resulting in a polygonal approximation of \(\mathbf {P}\). This deformed mesh is the basis for the following correspondence association and visualization steps.
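The stream compaction of zero-crossing cells via an exclusive prefix sum can be sketched as follows; this serial NumPy version mirrors the data-parallel GPU kernel only in structure, and the function name is illustrative:

```python
import numpy as np

def compact_zero_crossing_cells(D):
    """Stream-compact the indices of grid cells whose eight corner TSDF
    values straddle zero (CPU sketch of the prefix-sum GPU step)."""
    # Gather the 8 corner values of every cell via shifted slices.
    corners = np.stack([D[dx:D.shape[0] - 1 + dx,
                          dy:D.shape[1] - 1 + dy,
                          dz:D.shape[2] - 1 + dz]
                        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)])
    flags = ((corners.min(0) < 0) & (corners.max(0) > 0)).ravel()
    # Exclusive prefix sum turns per-cell flags into output offsets,
    # so each "thread" knows where to scatter its cell index.
    offsets = np.cumsum(flags) - flags
    out = np.empty(int(flags.sum()), dtype=int)
    cell_ids = np.arange(flags.size)
    out[offsets[flags]] = cell_ids[flags]
    return out                      # compacted list of valid cell indices
```

On the GPU, the prefix sum runs in parallel over all cells, and one thread per entry of the compacted list then emits the cell's triangles.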
5 Correspondence Association
To update the deformation field, two distinct and complementary types of correspondences between the current deformed shape \(\mathbf {P}\) and the new color and depth input are searched: for depth-image alignment, we perform a fast data-parallel projective lookup to obtain dense depth correspondences (see Sect. 5.1). Since in many situations depth features are not sufficient for robust tracking, we also use color information, and extract a sparse set of robust color feature correspondences (see Sect. 5.2). These also serve as global anchor points, since their descriptors are not modified over time.
5.1 Projective Depth Correspondences
Like most state-of-the-art online reconstruction approaches [1, 2, 17], we establish depth correspondences via a fast projective association step. Unlike them, we first extract a mesh-based representation of the isosurface \(\mathbf {P}\) as described above, and then rasterize this mesh. The resulting depth buffer contains sample points \(\mathbf {p}_c\) of the current isosurface. To determine a candidate for a correspondence, we project each \(\mathbf {p}_c\) into the current depth map \(\mathcal {D}_t\) and read the sample \(\mathbf {p}_c^a\) at the target position. We generate a correspondence between \(\mathbf {p}_c\) and \(\mathbf {p}_c^a\) if the two points are considered sufficiently similar and appropriate for optimization. To measure similarity, we compute their world space distance \(\Vert \mathbf {p}_c - \mathbf {p}_c^a\Vert _2\), and measure their normals’ similarity using their dot product \(\mathbf {n}_c \circ \mathbf {n}_c^a\). To make optimization more stable, we prune points close to silhouettes by looking at \(\mathbf {n}_c \circ \mathbf {v}\), where \(\mathbf {v}\) is the camera’s view direction.
More precisely, we use three thresholds \(\epsilon _{d}\) (distance), \(\epsilon _{n}\) (normal deviation), and \(\epsilon _{v}\) (view direction), and define a family of kernels \({\varPhi }_{r}(x) = 1 - \frac{x}{\epsilon _{r}}.\) If \({\varPhi }_{d}(\Vert \mathbf {p}_c - \mathbf {p}_c^a\Vert _2) < 0\), \({\varPhi }_{n}(1 - \mathbf {n}_c \circ \mathbf {n}_c^a) < 0\), or \({\varPhi }_{v}(1 - \mathbf {n}_c \circ \mathbf {v}) < 0\), the correspondence is pruned by setting its associated confidence weight to zero: \(w_c = 0\). For valid correspondences, the confidence is \(w_c = \big (\frac{{\varPhi }_{d}(\Vert \mathbf {p}_c - \mathbf {p}_c^a\Vert _2) + {\varPhi }_{n}(1 - \mathbf {n}_c \circ \mathbf {n}_c^a) + {\varPhi }_{v}(1 - \mathbf {n}_c \circ \mathbf {v}) }{3}\big )^2\).
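A minimal sketch of this weighting follows; the default threshold values are hypothetical placeholders, since the paper does not state \(\epsilon_d\), \(\epsilon_n\), \(\epsilon_v\) here:

```python
import numpy as np

def correspondence_weight(p, p_a, n, n_a, v,
                          eps_d=0.05, eps_n=0.5, eps_v=0.5):
    """Confidence w_c for a projective depth correspondence, using the
    pruning kernels Phi_r(x) = 1 - x / eps_r. Thresholds are illustrative."""
    phi_d = 1.0 - np.linalg.norm(p - p_a) / eps_d        # distance kernel
    phi_n = 1.0 - (1.0 - np.dot(n, n_a)) / eps_n         # normal deviation
    phi_v = 1.0 - (1.0 - np.dot(n, v)) / eps_v           # silhouette check
    if phi_d < 0 or phi_n < 0 or phi_v < 0:
        return 0.0                                       # prune: w_c = 0
    return ((phi_d + phi_n + phi_v) / 3.0) ** 2
```

A perfect correspondence (coincident points, identical normals facing the camera) yields all three kernels equal to one and hence the maximal weight \(w_c = 1\).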
5.2 Robust Sparse Color Correspondences
We use a combination of dense and sparse correspondences to improve stability and reduce drift. To this end, we compute SIFT [50, 51] matches to all previous input frames on the GPU. Feature points are lifted to 3D and stored in the canonical pose by applying \(\mathcal {S}^{-1}\) after detection. When a new frame is captured, we use the deformation field to map all feature points to the previous frame. We assume a rigid transform for the matching between the previous and the current frame. The rest of the pipeline is split into four main components: keypoint detection, feature extraction, correspondence association, and correspondence pruning.
Keypoint Detection: We detect keypoint locations as scale space maxima in a DoG pyramid of the grayscale image using a data-parallel feature detection approach. We use 4 octaves, each with 3 levels. Only extrema with a valid associated depth are used, since we later lift the keypoints to 3D. All keypoints on the same scale are stored in an array. Memory is managed via atomic counters. We use at most 150 keypoints per image. For rotational invariance, we associate each keypoint with up to 2 dominant gradient orientations.
Feature Extraction: We compute a 128-dimensional SIFT descriptor for each valid keypoint. Each keypoint is thus composed of its 3D position, scale, orientation, and SIFT descriptor. Our GPU implementation extracts keypoints and descriptors in about 6 ms at an image resolution of \(640\times 480\).
Correspondence Association: Extracted features are matched with features from all previous frames using a data-parallel approach (all extracted features are stored for matching in subsequent frames). We exhaustively compute all pairwise feature distances from the current to all previous frames and vice versa. The best matching features in both directions are determined by minimum reductions in shared memory. We use at most 128 correspondences between two frames.
Correspondence Pruning: Correspondences are sorted based on feature distance using a shared-memory bubble sort. We keep the 64 best correspondences per image pair. Correspondences with keypoints not close enough in feature space, screen space, or 3D space are pruned.
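The bidirectional matching and pruning steps can be sketched as follows. Here `max_keep` and `max_dist` stand in for the paper's limits (64 correspondences per pair; a feature-space threshold), and the brute-force distance computation replaces the shared-memory minimum reductions:

```python
import numpy as np

def mutual_best_matches(desc_a, desc_b, max_keep=64, max_dist=0.7):
    """Cross-checked nearest-neighbour matching of two frames' SIFT
    descriptors, keeping only mutual best pairs and pruning weak ones."""
    # Pairwise L2 distances between all descriptors of the two frames.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    best_ab = d.argmin(axis=1)            # best match a -> b
    best_ba = d.argmin(axis=0)            # best match b -> a
    ia = np.arange(len(desc_a))
    mutual = best_ba[best_ab] == ia       # keep only mutual best pairs
    pairs = np.stack([ia[mutual], best_ab[mutual]], axis=1)
    dists = d[pairs[:, 0], pairs[:, 1]]
    keep = dists < max_dist               # prune weak matches
    pairs, dists = pairs[keep], dists[keep]
    order = np.argsort(dists)[:max_keep]  # keep the best max_keep
    return pairs[order]
```

The cross-check (mutual best match in both directions) is what makes a single outlier descriptor unlikely to produce a correspondence.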
6 Deformation Energy
To reconstruct non-rigid surfaces in real time, we have to update the space deformation \(\mathcal {S}\) at sensor rate. We estimate the corresponding global pose parameters using dense projective ICP [29].
7 Parallel Energy Optimization
Finding the optimum \(\mathbf {X}^*\) of the tracking energy \(E_{total}\) is a high-dimensional non-linear least squares problem in the unknown parameters. In fact, we only optimize the values in a one-ring neighborhood \(\mathcal {M}\) around the isosurface. The objective thus has a total of 6N unknowns (3 for position and 3 for rotation), with \(N=|\mathcal {M}|\). For the minimization of this high-dimensional non-linear objective at real-time rates, we propose a novel hierarchical data-parallel optimization strategy. First, we describe our approach for a single hierarchy level.
7.1 Per-Level Optimization Strategy
Fortunately, the non-linear optimization objective \(E_{total}\) can be split into two independent subproblems [18] by employing an iterative flip-flop optimization strategy: first, the rotations \(\mathbf {R}_i\) are fixed and we optimize for the best positions \(\mathbf {t}_i\). Second, the positions \(\mathbf {t}_i\) are considered constant and the rotations \(\mathbf {R}_i\) are updated. These two steps are iterated until convergence. The two resulting subproblems can both be solved in a highly efficient data-parallel manner, as discussed in the following.
Data-Parallel Rotation Update: Solving for the best rotations is still a non-linear optimization problem. Fortunately, this subproblem is equivalent to the shape matching problem [52] and has a closed-form solution. We obtain the best-fitting rotation based on Procrustes analysis [53, 54] with respect to the canonical pose. Since the per-grid-point rotations are independent, we solve for all optimal rotations in parallel. To this end, we run one thread per grid point, compute the corresponding cross-covariance matrix, and compute the best rotation based on SVD. With our data-parallel implementation, we can compute the best rotations for 400K voxels in 1.9 ms.
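The per-grid-point rotation update can be sketched with the standard SVD-based closed form for the Procrustes problem; function and variable names are ours, and each call corresponds to one GPU thread's work:

```python
import numpy as np

def best_rotation(p_canonical, p_deformed):
    """Closed-form best-fitting rotation between a grid point's canonical
    neighbourhood and its deformed counterpart (Procrustes / shape
    matching). Both inputs are (k, 3) arrays of neighbour positions."""
    # Centre both point sets and form the cross-covariance matrix.
    a = p_canonical - p_canonical.mean(axis=0)
    b = p_deformed - p_deformed.mean(axis=0)
    H = a.T @ b
    U, _, Vt = np.linalg.svd(H)
    # Flip the last singular vector if needed to avoid a reflection.
    s = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, s]) @ U.T
    return R                       # maps canonical offsets to deformed ones
```

Because each grid point's problem is independent and tiny (a 3×3 SVD), the update maps naturally onto one thread per grid point.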
Data-Parallel Position Update: The tracking objective \(E_{total}\) is a quadratic optimization problem in the optimal positions \(\mathbf {t}_i\). We find the optimal positions by setting the corresponding partial derivatives to zero, \(\frac{\partial E_{total}(\mathbf {X})}{\partial \mathbf {t}_i}=\mathbf {0}\), which yields \( (\mathbf {L} + \mathbf {B}^T \mathbf {B})\cdot \mathbf {t} = \mathbf {b}. \) Here, \(\mathbf {L}\) is the Laplacian matrix, and \(\mathbf {B}\) encodes the point-point and point-plane constraints (including the trilinear interpolation of positions). The right-hand side \(\mathbf {b}\) encodes the fixed rotations and the target points of the constraints. We solve the linear system of equations using a data-parallel preconditioned conjugate gradient (PCG) solver, similar to [10, 55, 56, 57, 58], which we run on the GPU. Since the matrix \(\mathbf {L}\) is sparse, we compute it on-the-fly in each iteration step. In contrast, \(\mathbf {B}^T \mathbf {B}\) has many non-zero entries, due to the involved trilinear interpolation. In addition, each entry is expensive to compute, since we have to sum per voxel over all contained constraints. This is a problem, especially on the coarser levels of the hierarchy, since each voxel may contain several thousand correspondences. To alleviate this problem, we precompute and cache \(\mathbf {B}^T \mathbf {B}\) before the PCG iteration commences. In every PCG step, we read the cached values, which remain constant across iterations.
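The preconditioned CG iteration can be sketched in matrix-free form as follows; `apply_A` stands in for the on-the-fly application of \(\mathbf{L} + \mathbf{B}^T\mathbf{B}\), a diagonal (Jacobi) preconditioner is assumed for illustration, and all names are ours:

```python
import numpy as np

def pcg(apply_A, b, M_inv_diag, x0=None, iters=50, tol=1e-8):
    """Jacobi-preconditioned conjugate gradient for A x = b, with the
    system matrix applied matrix-free via apply_A (CPU sketch of the
    data-parallel GPU solver; every step is a vector operation)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - apply_A(x)                  # initial residual
    z = M_inv_diag * r                  # apply diagonal preconditioner
    p = z.copy()
    rz = r @ z
    for _ in range(iters):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # update search direction
        rz = rz_new
    return x
```

Every operation in the loop (matrix-vector product, dot products, AXPYs) is itself data-parallel, which is what makes PCG a natural fit for the GPU.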
7.2 Hierarchical Optimization Strategy
This efficient flip-flop solver has nice convergence properties on coarse resolution grids, since updates are propagated globally within only a few steps. On finer resolutions, which are important for accurate tracking, spatial propagation of updates would require too many iterations. This is a well-known drawback of iterative approaches, which deal well with high-frequency errors, while low-frequency components are only slowly resolved. To alleviate this problem, we opt for a nested coarse-to-fine optimization strategy. This provides a good trade-off between global convergence and runtime efficiency. We solve in a coarse-to-fine fashion and prolongate the solutions to the next finer level to jump-start the optimization. When downsampling constraints, we gather all constraints of a parent voxel from its 8 children on the next finer level. We keep all constraints on coarser levels and express them as a trilinear combination of the coarse grid points.
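The nested coarse-to-fine loop can be sketched as follows, with a hypothetical per-level solver; for brevity the prolongation here is nearest-neighbour replication of each coarse cell into its 2×2×2 children, whereas the actual method interpolates trilinearly:

```python
import numpy as np

def coarse_to_fine_solve(solve_level, n_levels, coarsest_res):
    """Nested coarse-to-fine optimization: solve on the coarsest grid,
    then prolongate each solution to jump-start the next finer level.
    solve_level(res, init) is a placeholder per-level solver."""
    res, sol = coarsest_res, None
    for level in range(n_levels):
        sol = solve_level(res, sol)     # flip-flop iterations at this level
        if level < n_levels - 1:
            # Prolongate: each coarse cell spawns 2x2x2 children
            # (nearest-neighbour here; trilinear in the real method).
            sol = np.repeat(np.repeat(np.repeat(sol, 2, 0), 2, 1), 2, 2)
            res *= 2
    return sol
```

The coarse levels remove the low-frequency error cheaply, so the expensive fine levels only need a few iterations to resolve high-frequency detail.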
8 Fusion
9 Results
In all examples, we capture an RGB-D stream using an Asus Xtion PRO, a Kinect-V1-style range sensor. We would like to point out that all reconstructions are obtained in real time using a commodity desktop PC (timings are provided in the supplemental material). In addition, our method does not require any precomputation, and we do not rely on a pre-scanned template model – all reconstructions are built from scratch.
Importance of Sparse Color Correspondences: A core aspect of our method is the use of sparse RGB features as global anchor points for robust tracking. Figure 4 illustrates the improvement achieved by including the SIFT feature alignment objective. If the input lacks geometric features, dense depth-based alignment is ill-posed and results in drift, especially for tangential motion. By including color features, we are able to successfully track and reconstruct these cases.
We further compare our approach to the real-time template-tracking method by Zollhöfer et al. [10]; see Fig. 6. Even though our 3D model is obtained on-the-fly, the reconstruction quality is similar or even higher.
Comparison to Template-Free Approaches: Currently, DynamicFusion [17] is the only non-rigid reconstruction method that runs online and does not require a pre-scanned template. In Fig. 7, we compare our approach against DynamicFusion on two scenes used in their publication. Overall, we obtain at least comparable or even higher quality reconstructions. In particular, our canonical pose is of higher quality – we attribute this to the key differences in our method: first, our sparse RGB feature term mitigates drift and makes tracking much more robust (for the comparison with and without SIFT feature matching, see Fig. 4). Second, our deformation field is at a higher resolution level than the coarse deformation proxy employed in DynamicFusion. This enables the alignment of fine-scale deformations and preserves detail in the reconstruction (otherwise newly integrated frames would smooth out detail). Unfortunately, a quantitative evaluation against DynamicFusion is challenging, since their method is hard to reproduce (their code is not publicly available and not all implementation details are given in the paper).
10 Limitations
While we are able to demonstrate compelling results and our method works well on a variety of examples, there are still limitations. First of all, robust tracking is fundamentally hard in the case of non-rigidly deforming surfaces. Although global SIFT matching helps to improve robustness and minimizes alignment errors, drift is not completely eliminated. Ideally, we would like to solve a non-rigid global bundle adjustment problem, which unfortunately exceeds the real-time computational budget.
High levels of deformation, such as fully bending a human arm, may cause problems, as our regularizer distributes deformations smoothly over the grid. We believe that adaptive strategies will be key to addressing this issue; e.g., locally adjusting the rigidity.
Another limitation is the relatively small spatial extent that can be modeled with a uniform grid. We believe a next step in this direction would be the combination of our method with a sparse surface reconstruction approach; e.g., [5, 6]. Nonetheless, we believe that our method helps to further improve the field of non-rigid 3D surface reconstruction, which is both a fundamentally hard and important problem.
11 Conclusion
We present a novel approach to jointly reconstruct the geometric shape as well as the motion of an arbitrary non-rigidly deforming scene at real-time rates. The foundation is a novel unified volumetric representation that encodes both geometry and motion. Motion tracking uses sparse color as well as dense depth constraints, and is based on a fast GPU-based variational optimization strategy. Our results demonstrate robust non-rigid reconstructions, even for scenes that lack geometric features. We hope that our method is another stepping stone for future work, and we believe that it paves the way for new applications in VR and AR, where the interaction with arbitrary non-rigidly deforming objects is of paramount importance.
Acknowledgments
We thank Angela Dai for the video voice-over and Richard Newcombe for the DynamicFusion comparison sequences. This research is funded by the German Research Foundation (DFG) – grant GRK 1773 (Heterogeneous Image Systems) –, the ERC Starting Grant 335545 CapReal, the Max Planck Center for Visual Computing and Communications (MPC-VCC), and the Bayerische Forschungsstiftung (For3D).
References
1. Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohli, P., Shotton, J., Hodges, S., Fitzgibbon, A.: KinectFusion: real-time dense surface mapping and tracking. In: Proceedings of ISMAR, pp. 127–136 (2011)
2. Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A., Fitzgibbon, A.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: Proceedings of UIST, pp. 559–568 (2011)
3. Roth, H., Vona, M.: Moving volume KinectFusion. In: Proceedings of BMVC (2012)
4. Zeng, M., Zhao, F., Zheng, J., Liu, X.: Octree-based fusion for real-time 3D reconstruction. Graph. Models 75, 126–136 (2012)
5. Chen, J., Bautembach, D., Izadi, S.: Scalable real-time volumetric surface reconstruction. ACM Trans. Graph. (TOG) 32(4), 113 (2013)
6. Nießner, M., Zollhöfer, M., Izadi, S., Stamminger, M.: Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. (TOG) 32, 169 (2013)
7. Whelan, T., Johannsson, H., Kaess, M., Leonard, J., McDonald, J.: Robust tracking for real-time dense RGB-D mapping with Kintinuous. Technical report (2012)
8. Steinbruecker, F., Sturm, J., Cremers, D.: Volumetric 3D mapping in real-time on a CPU. In: Proceedings of ICRA, Hong Kong, China (2014)
9. Theobalt, C., de Aguiar, E., Stoll, C., Seidel, H.P., Thrun, S.: Performance capture from multi-view video. In: Ronfard, R., Taubin, G. (eds.) Image and Geometry Processing for 3D Cinematography, pp. 127–149. Springer, Heidelberg (2010)
10. Zollhöfer, M., Nießner, M., Izadi, S., Rehmann, C., Zach, C., Fisher, M., Wu, C., Fitzgibbon, A., Loop, C., Theobalt, C., Stamminger, M.: Real-time non-rigid reconstruction using an RGB-D camera. ACM Trans. Graph. (TOG) 33(4), 1–12 (2014)
11. Zeng, M., Zheng, J., Cheng, X., Liu, X.: Templateless quasi-rigid shape modeling with implicit loop-closure. In: Proceedings of CVPR, pp. 145–152. IEEE (2013)
12. Mitra, N.J., Flöry, S., Ovsjanikov, M., Gelfand, N., Guibas, L.J., Pottmann, H.: Dynamic geometry registration. In: Proceedings of SGP, pp. 173–182 (2007)
13. Tevs, A., Berner, A., Wand, M., Ihrke, I., Bokeloh, M., Kerber, J., Seidel, H.P.: Animation cartography – intrinsic reconstruction of shape and motion. ACM TOG 31(2), 12 (2012)
14. Bojsen-Hansen, M., Li, H., Wojtan, C.: Tracking surfaces with evolving topology. ACM TOG 31(4), 53 (2012)
15. Dou, M., Fuchs, H., Frahm, J.M.: Scanning and tracking dynamic objects with commodity depth cameras. In: Proceedings of ISMAR, pp. 99–106. IEEE (2013)
16. Dou, M., Taylor, J., Fuchs, H., Fitzgibbon, A., Izadi, S.: 3D scanning deformable objects with a single RGB-D sensor. In: Proceedings of CVPR, June 2015
17. Newcombe, R.A., Fox, D., Seitz, S.M.: DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time. In: Proceedings of CVPR, June 2015
18. Sorkine, O., Alexa, M.: As-rigid-as-possible surface modeling. In: Proceedings of SGP, pp. 109–116 (2007)
19. Henry, P., Krainin, M., Herbst, E., Ren, X., Fox, D.: RGB-D mapping: using Kinect-style depth cameras for dense 3D modeling of indoor environments. Int. J. Robot. Res. 31, 647–663 (2012)
20. Stückler, J., Behnke, S.: Integrating depth and color cues for dense multi-resolution scene mapping using RGB-D cameras. In: Proceedings of IEEE MFI (2012)
21. Keller, M., Lefloch, D., Lambers, M., Izadi, S., Weyrich, T., Kolb, A.: Real-time 3D reconstruction in dynamic scenes using point-based fusion. In: Proceedings of 3DV, pp. 1–8. IEEE (2013)
22. Turk, G., Levoy, M.: Zippered polygon meshes from range images. In: Proceedings of SIGGRAPH, pp. 311–318 (1994)
23. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of SIGGRAPH, pp. 303–312. ACM (1996)
24. Kazhdan, M., Bolitho, M., Hoppe, H.: Poisson surface reconstruction. In: Proceedings of SGP (2006)
25. Zhou, Q.Y., Koltun, V.: Dense scene reconstruction with points of interest. ACM TOG 32(4), 112 (2013)
26. Fuhrmann, S., Goesele, M.: Floating scale surface reconstruction. In: Proceedings of ACM SIGGRAPH (2014)
27. Rusinkiewicz, S., Hall-Holt, O., Levoy, M.: Real-time 3D model acquisition. ACM TOG 21(3), 438–446 (2002)
28. Weise, T., Wismer, T., Leibe, B., Gool, L.V.: In-hand scanning with online loop closure. In: Proceedings of 3DIM, October 2009
29. Rusinkiewicz, S., Levoy, M.: Efficient variants of the ICP algorithm. In: Proceedings of 3DIM, pp. 145–152 (2001)
30. Steinbruecker, F., Kerl, C., Sturm, J., Cremers, D.: Large-scale multi-resolution surface reconstruction from RGB-D sequences. In: Proceedings of ICCV, Sydney, Australia (2013)
31. Starck, J., Hilton, A.: Surface capture for performance-based animation. CGAA 27(3), 21–31 (2007)
32. Ye, G., Liu, Y., Hasler, N., Ji, X., Dai, Q., Theobalt, C.: Performance capture of interacting characters with handheld Kinects. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 828–841. Springer, Heidelberg (2012)
33. Collet, A., Chuang, M., Sweeney, P., Gillett, D., Evseev, D., Calabrese, D., Hoppe, H., Sullivan, S.: High-quality streamable free-viewpoint video. ACM Trans. Graph. (SIGGRAPH) 34, 4 (2015)
34. de Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H.P., Thrun, S.: Performance capture from sparse multi-view video. ACM TOG (Proc. SIGGRAPH) 27, 1–10 (2008)
35. Allain, B., Franco, J.S., Boyer, E.: An efficient volumetric framework for shape tracking. In: Proceedings of CVPR (2015)
36. Guo, K., Xu, F., Wang, Y., Liu, Y., Dai, Q.: Robust non-rigid motion tracking and surface reconstruction using l0 regularization. In: Proceedings of ICCV (2015)
37. Hernández, C., Vogiatzis, G., Brostow, G.J., Stenger, B., Cipolla, R.: Non-rigid photometric stereo with colored lights. In: Proceedings of ICCV, pp. 1–8. IEEE (2007)
38. Li, H., Sumner, R.W., Pauly, M.: Global correspondence optimization for non-rigid registration of depth scans. In: Computer Graphics Forum, vol. 27, pp. 1421–1430. Wiley Online Library (2008)
39. Li, H., Adams, B., Guibas, L.J., Pauly, M.: Robust single-view geometry and motion reconstruction. ACM TOG 28(5), 175 (2009)
40. Li, H., Luo, L., Vlasic, D., Peers, P., Popović, J., Pauly, M., Rusinkiewicz, S.: Temporally coherent completion of dynamic shapes. ACM Trans. Graph. (TOG) 31(1), 2 (2012)
41. Gall, J., Rosenhahn, B., Seidel, H.P.: Drift-free tracking of rigid and articulated objects. In: Proceedings of CVPR, pp. 1–8, June 2008
42. Garg, R., Roussos, A., Agapito, L.: Dense variational reconstruction of non-rigid surfaces from monocular video. In: Proceedings of CVPR, pp. 1272–1279 (2013)
43. Li, H., Vouga, E., Gudym, A., Luo, L., Barron, J.T., Gusev, G.: 3D self-portraits. ACM TOG 32(6), 187 (2013)
44. Tong, J., Zhou, J., Liu, L., Pan, Z., Yan, H.: Scanning 3D full human bodies using Kinects. TVCG 18(4), 643–650 (2012)
45. Malleson, C., Klaudiny, M., Hilton, A., Guillemaut, J.Y.: Single-view RGB-D-based reconstruction of dynamic human geometry. In: Proceedings of ICCV Workshops, pp. 307–314, December 2013
46. Malleson, C., Klaudiny, M., Guillemaut, J.Y., Hilton, A.: Structured representation of non-rigid surfaces from single-view 3D point tracks. In: Proceedings of 3DV, vol. 1, pp. 625–632, December 2014
47. Wang, R., Wei, L., Vouga, E., Huang, Q., Ceylan, D., Medioni, G., Li, H.: Capturing dynamic textured surfaces of moving targets. In: Proceedings of ECCV (2016)
48. Sumner, R.W., Schmid, J., Pauly, M.: Embedded deformation for shape manipulation. ACM TOG 26(3), 80 (2007)
49. Lorensen, W., Cline, H.: Marching cubes: a high resolution 3D surface construction algorithm. Proc. SIGGRAPH 21(4), 163–169 (1987)
50. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of ICCV (1999)
51. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60, 91–110 (2004)
52. Horn, B.K.P.: Closed-form solution of absolute orientation using unit quaternions. J. Opt. Soc. Am. A 4(4), 629–642 (1987)
53. Gower, J.C.: Generalized Procrustes analysis. Psychometrika 40(1), 31–51 (1975)
54. Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 376–380 (1991)
55. Weber, D., Bender, J., Schnoes, M., Stork, A., Fellner, D.: Efficient GPU data structures and methods to solve sparse linear systems in dynamics applications. CGF 32(1), 16–26 (2013)
56. Wu, C., Zollhöfer, M., Nießner, M., Stamminger, M., Izadi, S., Theobalt, C.: Real-time shading-based refinement for consumer depth cameras. ACM Trans. Graph. (TOG) 33(6) (2014). doi:10.1145/2661229.2661232
57. Zollhöfer, M., Dai, A., Innmann, M., Wu, C., Stamminger, M., Theobalt, C., Nießner, M.: Shading-based refinement on volumetric signed distance functions. ACM Trans. Graph. (TOG) 34 (2015). doi:10.1145/2766887
58. DeVito, Z., Mara, M., Zollhöfer, M., Bernstein, G., Theobalt, C., Hanrahan, P., Fisher, M., Nießner, M.: Opt: a domain-specific language for non-linear least squares optimization in graphics and imaging. arXiv preprint arXiv:1604.06525 (2016)