Abstract
Real-world surfaces such as clothing, water, and the human body deform in complex ways. Estimating deformation parameters accurately and reliably is hard due to the high-dimensional and non-convex nature of the problem. Optimization-based approaches require good initialization, while regression-based approaches need a large amount of training data. Recently, to achieve globally optimal estimation, data-driven descent (Tian and Narasimhan in Int J Comput Vis, 98:279–302, 2012) applies nearest-neighbor estimators trained on a particular distribution of training samples to obtain a globally optimal and dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure that first applies nearest-neighbor estimators to the entire image iteratively to obtain a rough estimate, and then applies estimators with local image support to refine it. Compared to its non-hierarchical version, our approach retains the theoretical guarantees with significantly fewer training samples, is faster by several orders of magnitude, provides a better metric for deciding whether a given image requires more (or fewer) samples, and can handle more complex scenes that mix global motion with local deformation. We demonstrate in both simulated and real experiments that the proposed algorithm successfully tracks a broad range of non-rigid scenes including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.
Notes
 1.
Note that here the parameter norm \(\Vert \cdot \Vert \) can be any norm, since if a certain norm is \(\epsilon \)-small, so are the others.
 2.
Please check http://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/index.php for their source code.
References
Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56, 221–255.
Barnes, C., Shechtman, E., Finkelstein, A., & Goldman, D. (2009). PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics (TOG), 28(3), 24.
Barnes, C., Shechtman, E., Goldman, D. B., & Finkelstein, A. (2010). The generalized PatchMatch correspondence algorithm. In ECCV (pp. 29–43). Berlin: Springer.
Beauchemin, S. S., & Barron, J. L. (1995). The computation of optical flow. ACM Computing Surveys (CSUR), 27(3), 433–466.
Bookstein, F. L. (1989). Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis & Machine Intelligence, 11(6), 567–585.
Cao, X., Wei, Y., Wen, F., & Sun, J. (2012). Face alignment by explicit shape regression. In CVPR.
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In IJCAI (pp. 674–679).
Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60, 135–164.
Moll, M., & Gool, L. V. (2012). Optimal templates for nonrigid surface reconstruction. In ECCV.
Rueckert, D., Sonoda, L., Hayes, C., Hill, D., Leach, M., & Hawkes, D. (1999). Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18, 712–721.
Salzmann, M., Hartley, R., & Fua, P. (2007). Convex optimization for deformable surface 3D tracking. In ICCV.
Salzmann, M., Moreno-Noguer, F., Lepetit, V., & Fua, P. (2008). Closed-form solution to non-rigid 3D surface registration. In ECCV.
Serre, T., Wolf, L., & Poggio, T. (2005). Object recognition with features inspired by visual cortex. In CVPR (Vol. 2, pp. 994–1000).
Shi, J., & Tomasi, C. (1994). Good features to track. In CVPR.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In CVPR.
Tan, D. J., Holzer, S., Navab, N., & Ilic, S. (2014). Deformable template tracking in 1 ms. In ECCV.
Taylor, J., Jepson, A., & Kutulakos, K. (2010). Non-rigid structure from locally-rigid motion. In CVPR.
Tian, Y., & Narasimhan, S. G. (2012). Globally optimal estimation of nonrigid image distortion. International Journal of Computer Vision, 98, 279–302.
Zhang, S., Zhan, Y., Zhou, Y., Uzunbas, M., & Metaxas, D. (2012). Shape prior modeling using sparse representation and online dictionary learning. In Medical image computing and computer-assisted intervention, Lecture Notes in Computer Science (Vol. 7512, pp. 435–442). Berlin: Springer.
Acknowledgments
This research was supported in part by ONR grant N00014-11-1-0295, a Microsoft Research PhD fellowship, a University Transportation Center TSET grant, and a gift from TONBO Imaging.
Additional information
Yuandong Tian is now at Facebook AI Research.
Communicated by Phil Torr, Steve Seitz, Yi Ma, and Kiriakos Kutulakos.
Appendices
Appendix 1: Correctness of Algorithm 1
Without loss of generality and for notational simplicity, we omit the subscript j and set \(r_j = 1\). We first define the following quantities:
Definition 1
(Allowable set of A and \(\varGamma \)) Given \(\alpha \), allowable set \(\tilde{A}(\alpha )\) is defined as:
Intuitively, \(\tilde{A}(\alpha )\) captures all plausible As that satisfy Eq. 10 for a given \(\alpha \). Similarly, given \(\gamma \), the allowable set \({\tilde{\varGamma }}(\gamma )\) is defined as:
Intuitively, \({\tilde{\varGamma }}(\gamma )\) captures all plausible \(\varGamma \)s that satisfy Eq. 11.
The two allowable sets have the following properties:
Lemma 1
If \(\alpha ' > \alpha \), then \({\tilde{A}}(\alpha ') \subset {\tilde{A}}(\alpha )\). Similarly, if \(\gamma ' < \gamma \), then \({\tilde{\varGamma }}(\gamma ') \subset {\tilde{\varGamma }}(\gamma )\).
Proof
The proof is simply by definition of the two sets. \(\square \)
Then we proceed to analyze the two arrays \(\varDelta I^+_m = \max _{1\le l \le m} \varDelta I_l\) and \(\varDelta I^-_m = \min _{m\le l \le M} \varDelta I_l\) constructed in Algorithm 1.
Lemma 2
(Properties of \(\varDelta I^+\) and \(\varDelta I^-\)) The two arrays \(\varDelta I^+\) and \(\varDelta I^-\) constructed in Algorithm 1 are monotonically increasing with respect to m, and \(\varDelta I^-_m \le \varDelta I^+_m\) for every \(1\le m\le M\) (Fig. 6b). Moreover, we have:
Proof
Both \(\varDelta I^+_m\) and \(\varDelta I^-_m\) are monotonically increasing since, when m increases, \(\varDelta I^+_m\) is the maximal value over a larger set and \(\varDelta I^-_m\) is the minimal value over a smaller set. Also \(\varDelta I^-_m \le \varDelta I_m \le \varDelta I^+_m\).
Prove \(\varDelta I^{+}_m \in {\tilde{A}}(\varDelta \mathbf {p}_m)\): For any \(\varDelta \mathbf {p}_l \le \varDelta \mathbf {p}_m\), since the list \(\{\varDelta \mathbf {p}_m\}\) is sorted in ascending order, we have \(l \le m\). By the definition of \(\varDelta I^+_m\), we have \(\varDelta I_l \le \varDelta I^+_m\). Thus \(\varDelta I^{+}_m \in {\tilde{A}}(\varDelta \mathbf {p}_m)\).
Prove that \(\varDelta I^+_m \le A\) for any \(A\in {\tilde{A}}(\varDelta \mathbf {p}_m)\): For any \(1\le l\le m\), since \(\varDelta \mathbf {p}_l \le \varDelta \mathbf {p}_m\), by the definition of A, we have \(\varDelta I_l\le A\), and thus \(\varDelta I^{+}_m = \max _{1\le l\le m}\varDelta I_l \le A\).
Therefore, \(\varDelta I^+_m = \min {\tilde{A}}(\varDelta \mathbf {p}_m)\). Similarly we can prove \(\varDelta I^{-}_m = \max {\tilde{\varGamma }}(\varDelta \mathbf {p}_m)\). \(\square \)
Theorem 6
For each m and \(\alpha = \varDelta \mathbf {p}_m\), Algorithm 1 always gives the globally optimal solution to the following linear programming:
which has at least one feasible solution \((A \rightarrow +\infty , \gamma \rightarrow \infty , \varGamma \rightarrow \infty )\) for any \(\alpha \).
Proof
(a) First we prove that every solution given by Algorithm 1 is a feasible solution to the optimization (Eq. 54). Indeed, for any \(\alpha = \varDelta \mathbf {p}_m\), according to Lemma 2, we set the solution to be the output of Algorithm 1:
Then since \(A = \varDelta I^+_m \in {\tilde{A}}(\alpha )\) and \(\varGamma = \varDelta I^-_{l^*} \in {\tilde{\varGamma }}(\gamma )\), such a tuple satisfies Eqs. 55 and 56. From the construction of Algorithm 1, \(A + 2\eta < \varGamma \). Thus, Algorithm 1 gives a feasible solution to Eq. 54.
(b) Then we prove Algorithm 1 gives the optimal solution. Suppose there is a better solution \((\alpha , \gamma ', A', \varGamma ')\). Obviously \(A' = A = \min {\tilde{A}}(\alpha )\). Note that any optimal solution of \(\gamma \) must align with some \(\varDelta \mathbf {p}_l\). If there exists \(l' < l^*\) so that \(\gamma ' = \varDelta \mathbf {p}_{l'} < \varDelta \mathbf {p}_{l^*} = \gamma \) is part of a better solution, then we have:
Therefore, we have \(\varDelta I^-_{l'} = \max {\tilde{\varGamma }}(\gamma ') \le \max {\tilde{\varGamma }}(\gamma ) = \varDelta I^-_{l^*}\). Since \(\varDelta I^-_{l'} \le \varDelta I^-_{l^*}\), there are two cases:
 1.
\(A \le \varDelta I^+_m +2\eta < \varDelta I^-_{l'} < \varDelta I^-_{l^*}\). This is not possible, since the algorithm, searching from m, stops at the minimal \(l^*\) that satisfies \(\varDelta I^+_m +2\eta < \varDelta I^-_{l^*}\).
 2.
\(A \le \varDelta I^+_m +2\eta < \varDelta I^-_{l'} = \varDelta I^-_{l^*}\). Then, according to the algorithm and the monotonicity of \(\varDelta I^-\), \(l' = l^*\).
Therefore, \(l' = l^*\) and \((\alpha , \gamma ', A', \varGamma ')\) is exactly the solution given by Algorithm 1. \(\square \)
From Theorem 6, for every \(\alpha \), Algorithm 1 always outputs the smallest \(\gamma \) that satisfies the Relaxed Lipschitz Conditions (Eqs. 10 and 11). Therefore, it outputs the curve \(\gamma = \gamma ^*(\alpha )\).
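As a concrete illustration, the prefix-max/suffix-min construction and the search for the minimal feasible \(l^*\) can be sketched as follows (a minimal sketch with hypothetical variable names, not the authors' implementation):

```python
import numpy as np

def gamma_star(dp, dI, eta):
    """Sketch of the Algorithm 1 search. Assumes dp is the sorted
    (ascending) list of parameter distances and dI[l] is the appearance
    distance of the l-th training pair. For each m it returns the
    smallest feasible gamma = dp[l*], or None when no l* exists."""
    dI = np.asarray(dI, dtype=float)
    dI_plus = np.maximum.accumulate(dI)               # prefix max: dI^+_m
    dI_minus = np.minimum.accumulate(dI[::-1])[::-1]  # suffix min: dI^-_m
    out = []
    for m in range(len(dp)):
        # minimal l* >= m with dI_plus[m] + 2*eta < dI_minus[l*];
        # dI_minus is non-decreasing, so the first hit is optimal
        hits = np.nonzero(dI_minus[m:] > dI_plus[m] + 2 * eta)[0]
        out.append(dp[m + int(hits[0])] if hits.size else None)
    return out
```

Because both arrays are built in a single pass, the whole curve \(\gamma ^*(\alpha )\) is obtained in near-linear time in the number of training pairs.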
Appendix 2: Local Pullback Operation
Similar to the pullback operation introduced in data-driven descent (Tian and Narasimhan 2012), we can also introduce a local pullback operation:
In particular, for deformed image \(I_\mathbf {q}\) and the moving region \(R_j=R_j(\mathbf {q})\), we have
which gives back the template content. Similar to the pullback inequality, we also have the following local pullback inequality:
Theorem 7
For the jth patch with template region \(R_{j0}\) and radius \(r_j\), if \(\Vert \mathbf {p} - \mathbf {q}\Vert _\infty \le r_j\) and \(\Vert \mathbf {q}\Vert _\infty \le c_q\), then
where \(\eta _j = c_B c_q c_G Area_j\). Note \(c_G = \max _\mathbf {x}\Vert \nabla I_\mathbf {p}(\mathbf {x})\Vert _1\) and \(c_B\) is a smoothness constant so that:
To prove this, we start with the following lemma.
Lemma 3
(Unity bound) For any \(\mathbf {x}\) and any \(\mathbf {p}\), we have \(\Vert B(\mathbf {x})\mathbf {p}\Vert _\infty \le \Vert \mathbf {p}\Vert _\infty \).
Proof
using the fact that \(\sum _i b_i(\mathbf {x}) = 1\) and \(b_i(\mathbf {x}) \ge 0\) for any \(\mathbf {x}\). \(\square \)
We now prove Theorem 7.
Proof
For any \(\mathbf {y}\in R_{j0}\), by the definitions of Eqs. 60 and 1, we have:
Now we need to bound the pixel distance between \(\mathbf {u}= W(\mathbf {y}; \mathbf {q})\) and \(\mathbf {v}= W(W^{-1}(\mathbf {y}; \mathbf {p}-\mathbf {q}); \mathbf {p})\). Note that both are pixel locations on the distorted image \(I_\mathbf {p}\). If we can bound \(\Vert \mathbf {u}-\mathbf {v}\Vert _\infty \), then from \(I_\mathbf {p}\)’s appearance we can obtain the bound for \(I_\mathbf {p}(R_j(\mathbf {q}))(\mathbf {y}) - I_{\mathbf {p}-\mathbf {q}}(\mathbf {y})\).
Denote \(\mathbf {z}= W^{-1}(\mathbf {y}; \mathbf {p}-\mathbf {q})\), which is a pixel on the template. By definition we have:
then by Lemma 3 we have:
On the other hand, the difference between \(\mathbf {u}\) and \(\mathbf {v}\) has the following simple form:
Thus, by the definition of \(c_B\) (Eq. 63), we have:
Thus:
where \(\xi \) lies on the line segment connecting \(\mathbf {u}\) and \(\mathbf {v}\). Collecting Eq. 73 over the entire region \(R_j(\mathbf {p})\) gives the bound. \(\square \)
In practice, \(\eta _j\) is very small and can be neglected.
From Eq. 60, there is a relationship between the (global) pullback operation \(H(I, \mathbf {q})\equiv I(W(\mathbf {x}; \mathbf {q}))\) defined in Tian and Narasimhan (2012) and the local pullback operation \(I(R_j(\mathbf {q}))\):
Therefore, to compute \(I(R_j(\mathbf {q}))\) for all patches, one only needs to compute the global pullback image \(H(I, \mathbf {q})\) once and extract the region \(R_{j0}\) for every jth patch from the pulled-back image.
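A minimal sketch of this strategy, assuming the warp is stored as a dense per-pixel coordinate field and using SciPy's `map_coordinates` for interpolation (function and variable names are hypothetical):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def global_pullback(I, Wq):
    """Compute H(I, q)(x) = I(W(x; q)) on the full image grid.
    I  : 2-D image array.
    Wq : array of shape (2, H, W) holding the warped (row, col)
         location W(x; q) for every pixel x."""
    return map_coordinates(I, Wq, order=1, mode='nearest')

def local_pullbacks(H_img, regions):
    """Extract I(R_j(q)) for every patch j by cropping the template
    region R_{j0} (given as (r0, r1, c0, c1)) from the single
    pulled-back image."""
    return [H_img[r0:r1, c0:c1] for (r0, r1, c0, c1) in regions]
```

The point of the identity in the text is the cost saving: one dense warp followed by cheap crops, instead of one warp per patch.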
Appendix 3: Sampling in High-dimensional Subspace
Here we show how to count the number of \(\epsilon \)-balls required (i.e., the sample complexity) to cover a hypercube \([-r, r]^D\) in a D-dimensional parameter space. Then we discuss how to compute the sample complexity when the parameters lie on a d-dimensional subspace within the hypercube. Both cases are shown in Fig. 19.
Covering a D-dimensional Space
Lemma 4
(Sampling theorem, sufficient condition) With \(\lceil 1/\alpha \rceil ^D\) samples (\(\alpha < 1\)), for any \(\mathbf {p}\) in the hypercube \([-r, r]^D\), there exists at least one sample \({\hat{\mathbf {p}}}\) so that \(\Vert {\hat{\mathbf {p}}} - \mathbf {p}\Vert _\infty \le \alpha r\).
Proof
A uniform distribution of the training samples within the hypercube suffices. In particular, let
Thus we have \(1/n = 1 / \lceil 1/\alpha \rceil \le 1 / (1 / \alpha ) = \alpha \). For every multi-index \((i_1, i_2, \ldots , i_D)\) with \(1\le i_k\le n\), we put one training sample at the D-dimensional coordinates:
Therefore, along each dimension, the first sample is r/n away from \(-r\), the second sample is 2r/n away from the first, and so on, until the last sample, which is r/n away from the boundary r (Fig. 19c). Then for any \(\mathbf {p}\in [-r, r]^D\), there exists \(i_k\) so that
This holds for \(1\le k\le D\). As a result, we have
and the total number of samples needed is \(n^D = \lceil 1/\alpha \rceil ^D\). \(\square \)
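The grid construction in the proof can be sketched as follows (a hypothetical helper in Python/NumPy, not from the paper):

```python
import numpy as np
from itertools import product

def sample_hypercube(r, D, alpha):
    """Uniform grid achieving the bound of Lemma 4.
    Places n = ceil(1/alpha) samples per axis at the coordinates
    -r + (2*i - 1) * r / n, i = 1..n, i.e. spaced 2r/n apart and r/n
    away from each boundary, so every p in [-r, r]^D is within
    alpha * r of some sample in the infinity norm."""
    n = int(np.ceil(1.0 / alpha))
    axis = -r + (2 * np.arange(1, n + 1) - 1) * r / n
    return np.array(list(product(axis, repeat=D)))   # n^D samples in total
```

This also makes the curse of dimensionality explicit: the sample count \(n^D\) is exponential in D, which is what motivates the manifold case below.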
Covering a Manifold in D-dimensional Space
Now we consider the case where \(\mathbf {p}\) lies on a manifold \(\mathcal {M}\) embedded in the D-dimensional space. This means that there exists a function f (linear or nonlinear) so that for every \(\mathbf {p}\) on the manifold and within the hypercube \([-r, r]^D\), there exists a d-dimensional vector \(\mathbf {v}\in [-r, r]^d\) with \(\mathbf {p}= f(\mathbf {v})\). For example, this happens if we use overcomplete local bases to represent the deformation. Note that the function f is onto:
In this case, we do not need to fill the entire hypercube \([-r, r]^D\), but only the d-dimensional hypercube \([-r, r]^d\), which requires a number of samples exponential in d rather than in D. To prove this, we first define the expanding factor c of the mapping:
Definition 2
(Expanding factor c ) The expanding factor c for a mapping f is defined as:
We thus have the following sampling theorem for deformation parameters \(\mathbf {p}\) on a manifold:
Theorem 8
(Sampling theorem, sufficient condition in the manifold case) With \(c_{SS}\lceil 1/\alpha \rceil ^d\) samples distributed in the hypercube \([-r, r]^d\), for any \(\mathbf {p}\in \mathcal {M}\), there exists at least one sample \({\hat{\mathbf {p}}} = f({\hat{\mathbf {v}}})\) so that \(\Vert {\hat{\mathbf {p}}} - \mathbf {p}\Vert _\infty \le \alpha r\). Note \(c_{SS} = \lceil c\rceil ^d\).
Proof
We first apply Lemma 4 to the hypercube \([-r, r]^d\). Then with \(\lceil \frac{c}{\alpha }\rceil ^d\) samples, for any \(\mathbf {v}\in [-r, r]^d\), there exists a training sample \(\mathbf {v}^{(i)}\) so that
We then build the training samples \(\{\mathbf {p}^{(i)}\}\) by setting \(\mathbf {p}^{(i)} = f(\mathbf {v}^{(i)})\). For any \(\mathbf {p}\in [-r, r]^D\) on the manifold, there exists a \(\mathbf {v}\in [-r, r]^d\) so that \(\mathbf {p}= f(\mathbf {v})\). By the sampling procedure, there exists \(\mathbf {v}^{(i)}\) so that \(\Vert \mathbf {v} - \mathbf {v}^{(i)}\Vert _\infty \le \frac{\alpha }{c} r\), and therefore:
setting \({\hat{\mathbf {p}}} = \mathbf {p}^{(i)}\) thus suffices. Finally, since \(\lceil ab\rceil \le \lceil a \rceil \lceil b \rceil \), we have:
and the conclusion follows. \(\square \)
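The Theorem 8 construction amounts to gridding the low-dimensional parameter space at a rate tightened by c and pushing the grid through f. A hypothetical sketch, where both f and its expanding factor c must be supplied by the caller:

```python
import numpy as np
from itertools import product

def sample_manifold(f, r, d, alpha, c):
    """Sketch of the Theorem 8 sampling scheme.
    Grids [-r, r]^d finely enough that every v is within (alpha/c)*r of
    a grid point (infinity norm), then maps each grid point through f;
    the expanding-factor bound then gives ||f(v) - f(v_i)||_inf <= alpha*r."""
    n = int(np.ceil(c / alpha))
    axis = -r + (2 * np.arange(1, n + 1) - 1) * r / n
    return [f(np.array(v)) for v in product(axis, repeat=d)]
```

The total count is \(\lceil c/\alpha \rceil ^d\), exponential in d only, as the theorem states.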
We can see from the proof that c plays an important role in the number of samples needed. To make c smaller, d need not be the intrinsic dimension of the manifold, but can be slightly larger. Theorem 8 applies to any parameterization of the manifold \(\mathcal {M}\).
Let us compute c for some global deformation fields. For example, an affine deformation field within a rectangle, parameterized by D/2 landmarks, is always 6-dimensional. To make c smaller, we (meta-)parameterize the field by 4 meta control points (\(d=8\)) sitting at the four corners of the rectangle (Fig. 20a). In this case, any landmark displacement \(\mathbf {p}(k)\) within this rectangle can be linearly represented by the locations of the four corners in a convex manner:
Here \(\mathbf {v}\) is the concatenation of the four deformation vectors [\(\mathbf {p}(\mathrm {top\_left}), \mathbf {p}(\mathrm {bottom\_left}), \mathbf {p}(\mathrm {top\_right}), \mathbf {p}(\mathrm {bottom\_right})\)] of the four corners, with \(0 \le a_{kj} \le 1\) and \(\sum _j a_{kj} = 1\). For any \(\mathbf {p}\in [-r, r]^D\), \(\mathbf {v}\) can be found by simply reading off the deformation at the four corners, and thus \(\Vert \mathbf {v}\Vert _\infty \le r\). Furthermore, for \(\mathbf {v}_1, \mathbf {v}_2\in [-r, r]^d\) we have:
Therefore, \(c = 1\). The reason we pick the four corners is to ensure that all weights in the linear combination lie between 0 and 1.
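The four-corner construction can be sketched as follows (hypothetical helper names; the bilinear weights play the role of the \(a_{kj}\) above):

```python
import numpy as np

def corner_weights(x, y, w, h):
    """Bilinear (convex) weights of an interior point (x, y) of a
    w-by-h rectangle w.r.t. the corners ordered [top-left, bottom-left,
    top-right, bottom-right]. Weights lie in [0, 1] and sum to 1,
    which is exactly what yields the expanding factor c = 1."""
    u, v = x / w, y / h
    return np.array([(1 - u) * (1 - v), (1 - u) * v,
                     u * (1 - v), u * v])

def landmark_displacement(x, y, w, h, corner_disp):
    """p(k) as the convex combination of the four corner displacements
    (corner_disp is a 4x2 array of 2-D displacement vectors)."""
    return corner_weights(x, y, w, h) @ corner_disp
```

Because every weight is in [0, 1] and the weights sum to 1, the combined displacement can never exceed the largest corner displacement in magnitude along each axis.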
Similarly, for the 3-dimensional deformation with pure translation and rotation used in Sect. 8, we can pick \(\mathbf {v}\) as the concatenation of two landmark displacements: the rotation center \(\mathbf {p}(\text {center})\) and the corner point \(\mathbf {p}(\text {corner})\) whose rest location is farthest from the center among all landmarks (Fig. 20b). Denote this distance by \(r_\mathbf{corner }\). For any other landmark displacement \(\mathbf {p}(k)\), whose index k can be parametrized by polar coordinates \((r, \theta )\), we have:
where Id is the identity matrix, \(R(\theta )\) is the 2D rotation matrix, and \(r_\mathbf{corner }\) is the distance from the rest location of the center to that of the corner. Therefore, for two different \(\mathbf {v}_1\) and \(\mathbf {v}_2\), since \(r \le r_\mathbf{corner }\), we have:
since \(\cos (\theta ) + \sin (\theta ) \le \sqrt{2}\). Therefore,
So \(c = 2+\sqrt{2}\le 3.5\). This constant is used in Sect. 8 to compute the number of samples required by the theory.
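The equation referenced above is not reproduced in this excerpt, so the following is a hypothetical reconstruction of the meta-parameterization, consistent with the surrounding text (it uses Id, \(R(\theta )\), and the factor \(r/r_\mathbf{corner }\)); treat the exact formula as an assumption:

```python
import numpy as np

def landmark_from_meta(p_center, p_corner, r, theta, r_corner):
    """Hypothetical sketch: a landmark at polar coordinates (r, theta)
    relative to the rotation center (theta measured from the
    center-to-corner direction) gets the displacement
        p(center) + (r / r_corner) * R(theta) @ (p(corner) - p(center)).
    This reproduces pure translations and rotations exactly."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])   # 2-D rotation matrix R(theta)
    delta = np.asarray(p_corner, float) - np.asarray(p_center, float)
    return np.asarray(p_center, float) + (r / r_corner) * R @ delta
```

For a pure translation, every landmark inherits the center's displacement; for a rotation about the center, the corner displacement is rotated and scaled down by \(r/r_\mathbf{corner } \le 1\), matching the bound that yields \(c = 2+\sqrt{2}\).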
Cite this article
Tian, Y., Narasimhan, S.G. Theory and Practice of Hierarchical Data-driven Descent for Optimal Deformation Estimation. Int J Comput Vis 115, 44–67 (2015). https://doi.org/10.1007/s11263-015-0838-5
Keywords
 Deformation modeling
 Globally optimal solutions
 Non-rigid deformation
 Data-driven approach
 Nonlinear optimization
 Non-convex optimization
 Image deformation
 High-dimensional regression