Theory and Practice of Hierarchical Data-driven Descent for Optimal Deformation Estimation


Real-world surfaces such as clothing, water and the human body deform in complex ways. Estimating deformation parameters accurately and reliably is hard because the problem is high-dimensional and non-convex. Optimization-based approaches require good initialization, while regression-based approaches need a large amount of training data. Recently, to achieve globally optimal estimation, data-driven descent (Tian and Narasimhan in Int J Comput Vis, 98:279–302, 2012) applies nearest neighbor estimators trained on a particular distribution of training samples to obtain a globally optimal and dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure that first applies nearest neighbor estimators on the entire image iteratively to obtain a rough estimate, and then applies estimators with local image support to refine it. Compared to its non-hierarchical version, our approach retains the theoretical guarantees with significantly fewer training samples, is faster by several orders of magnitude, provides a better metric for deciding whether a given image requires more (or fewer) samples, and can handle more complex scenes that include a mixture of global motion and local deformation. We demonstrate in both simulation and real experiments that the proposed algorithm successfully tracks a broad range of non-rigid scenes including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.




  1. Note that here the parameter norm \(\Vert \cdot \Vert \) can be any norm, since if a certain norm is \(\epsilon \)-small, so are the others.

  2. Please check for their source code.


  1. Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56, 221–255.

  2. Barnes, C., Shechtman, E., Finkelstein, A., & Goldman, D. (2009). PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics-TOG, 28(3), 24.

  3. Barnes, C., Shechtman, E., Goldman, D. B., & Finkelstein, A. (2010). The generalized PatchMatch correspondence algorithm. In ECCV, 2010 (pp. 29–43). Berlin: Springer.

  4. Beauchemin, S. S., & Barron, J. L. (1995). The computation of optical flow. ACM Computing Surveys (CSUR), 27(3), 433–466.

  5. Bookstein, F. L. (1989). Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis & Machine Intelligence, 6, 567–585.

  6. Cao, X., Wei, Y., Wen, F., & Sun, J. (2012). Face alignment by explicit shape regression. In CVPR.

  7. Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.

  8. Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR.

  9. Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.

  10. Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In IJCAI (pp. 674–679).

  11. Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60, 135–164.

  12. Moll, M., & Gool, L. V. (2012). Optimal templates for non-rigid surface reconstruction. In ECCV.

  13. Rueckert, D., Sonoda, L., Hayes, C., Hill, D., Leach, M., & Hawkes, D. (1999). Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18, 712–721.

  14. Salzmann, M., Hartley, R., & Fua, P. (2007). Convex optimization for deformable surface 3-d tracking. In ICCV.

  15. Salzmann, M., Moreno-Noguer, F., Lepetit, V., & Fua, P. (2008). Closed-form solution to non-rigid 3d surface registration. In ECCV.

  16. Serre, T., Wolf, L., & Poggio, T. (2005). Object recognition with features inspired by visual cortex. In CVPR (Vol. 2, pp. 994–1000).

  17. Shi, J., & Tomasi, C. (1994). Good features to track. In CVPR.

  18. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In CVPR.

  19. Tan, D. J., Holzer, S., Navab, N., & Ilic, S. (2014). Deformable template tracking in 1 ms. In ECCV.

  20. Taylor, J., Jepson, A., & Kutulakos, K. (2010). Non-rigid structure from locally-rigid motion. In CVPR.

  21. Tian, Y., & Narasimhan, S. G. (2012). Globally optimal estimation of nonrigid image distortion. International Journal of Computer Vision, 98, 279–302.

  22. Zhang, S., Zhan, Y., Zhou, Y., Uzunbas, M., & Metaxas, D. (2012). Shape prior modeling using sparse representation and online dictionary learning. In Medical Image Computing and Computer-Assisted Intervention (Lecture Notes in Computer Science, Vol. 7512, pp. 435–442). Berlin: Springer.


This research was supported in parts by ONR grant N00014-11-1-0295, a Microsoft Research PhD fellowship, a University Transportation Center T-SET grant and a gift from TONBO Imaging.

Author information



Corresponding author

Correspondence to Yuandong Tian.

Additional information

Yuandong Tian is now at Facebook AI Research.

Communicated by Phil Torr, Steve Seitz, Yi Ma, and Kiriakos Kutulakos.


Appendix 1: Correctness of Algorithm 1

Without loss of generality and for notational simplicity, we omit the subscript j and set \(r_j = 1\). We first define the following quantities:

Definition 1

(Allowable set of A and \(\varGamma \)) Given \(\alpha \), allowable set \(\tilde{A}(\alpha )\) is defined as:

$$\begin{aligned} \tilde{A}(\alpha ) = \{A: \forall l\ \varDelta \mathbf {p}_l \le \alpha \implies \varDelta I_l \le A\} \end{aligned}$$

Intuitively, \(\tilde{A}(\alpha )\) captures all plausible As that satisfy Eq. 10 for a given \(\alpha \). Similarly, given \(\gamma \), the allowable set \({\tilde{\varGamma }}(\gamma )\) is defined as:

$$\begin{aligned} {\tilde{\varGamma }}(\gamma ) = \{\varGamma : \forall l\ \varDelta \mathbf {p}_l \ge \gamma \implies \varDelta I_l \ge \varGamma \} \end{aligned}$$

Intuitively, \({\tilde{\varGamma }}(\gamma )\) captures all plausible \(\varGamma \)s that satisfy Eq. 11.

The two allowable sets have the following properties:

Lemma 1

If \(\alpha ' > \alpha \), then \({\tilde{A}}(\alpha ') \subset {\tilde{A}}(\alpha )\). Similarly, if \(\gamma ' < \gamma \), then \({\tilde{\varGamma }}(\gamma ') \subset {\tilde{\varGamma }}(\gamma )\).


The proof follows directly from the definitions of the two sets. \(\square \)

Then we proceed to analyze the two arrays \(\varDelta I^+_m = \max _{1\le l \le m} \varDelta I_l\) and \(\varDelta I^-_m = \min _{m\le l \le M} \varDelta I_l\) constructed in Algorithm 1.

Lemma 2

(Properties of \(\varDelta I^+\) and \(\varDelta I^-\) ) The two arrays \(\varDelta I^+\) and \(\varDelta I^-\) constructed in Algorithm 1 are monotonically non-decreasing functions of m, and \(\varDelta I^-_m \le \varDelta I^+_m\) for every \(1\le m\le M\) (Fig. 6b). Moreover, we have:

$$\begin{aligned} \varDelta I^{+}_m= & {} \min {\tilde{A}}(\varDelta \mathbf {p}_m) \end{aligned}$$
$$\begin{aligned} \varDelta I^{-}_m= & {} \max {\tilde{\varGamma }}(\varDelta \mathbf {p}_m) \end{aligned}$$


Both \(\varDelta I^+_m\) and \(\varDelta I^-_m\) are monotonically non-decreasing since, when m increases, \(\varDelta I^+_m\) is the maximal value over a larger set and \(\varDelta I^-_m\) is the minimal value over a smaller set. Also \(\varDelta I^-_m \le \varDelta I_m \le \varDelta I^+_m\).

Prove \(\varDelta I^{+}_m \in {\tilde{A}}(\varDelta \mathbf {p}_m)\): For any \(\varDelta \mathbf {p}_l \le \varDelta \mathbf {p}_m\), since the list \(\{\varDelta \mathbf {p}_m\}\) was ordered, we have \(l \le m\). By the definition of \(\varDelta I^+_m\), we have \(\varDelta I_l \le \varDelta I^+_m\). Thus \(\varDelta I^{+}_m \in {\tilde{A}}(\varDelta \mathbf {p}_m)\).

Prove that \(\varDelta I^+_m \le A\) for any \(A\in {\tilde{A}}(\varDelta \mathbf {p}_m)\): For any \(1\le l\le m\), since \(\varDelta \mathbf {p}_l \le \varDelta \mathbf {p}_m\), by the definition of \({\tilde{A}}(\varDelta \mathbf {p}_m)\), we have \(\varDelta I_l\le A\), and thus \(\varDelta I^{+}_m = \max _{1\le l\le m}\varDelta I_l \le A\).

Therefore, \(\varDelta I^+_m = \min {\tilde{A}}(\varDelta \mathbf {p}_m)\). Similarly we can prove \(\varDelta I^{-}_m = \max {\tilde{\varGamma }}(\varDelta \mathbf {p}_m)\). \(\square \)

Theorem 6

For each m and \(\alpha = \varDelta \mathbf {p}_m\), Algorithm 1 always gives the globally optimal solution to the following linear program:

$$\begin{aligned} \min&\gamma \end{aligned}$$
$$\begin{aligned} \mathrm {s.t.}&A\in {\tilde{A}}(\alpha ) \end{aligned}$$
$$\begin{aligned}&\varGamma \in {\tilde{\varGamma }}(\gamma ) \end{aligned}$$
$$\begin{aligned}&A + 2\eta < \varGamma \end{aligned}$$

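which has at least one feasible solution \((A = \min {\tilde{A}}(\alpha ), \gamma \rightarrow +\infty , \varGamma \rightarrow +\infty )\) for any \(\alpha \): once \(\gamma \) exceeds every \(\varDelta \mathbf {p}_l\), the condition defining \({\tilde{\varGamma }}(\gamma )\) is vacuous, so \(\varGamma \) can be taken arbitrarily large.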


(a) First we prove that every solution given by Algorithm 1 is a feasible solution to the optimization (Eq. 54). Indeed, for any \(\alpha = \varDelta \mathbf {p}_m\), according to Lemma 2, if we set the solution to be the output of Algorithm 1:

$$\begin{aligned} (\alpha , \gamma , A, \varGamma ) = (\varDelta \mathbf {p}_m, \varDelta \mathbf {p}_{l^*}, \varDelta I^+_m, \varDelta I^-_{l^*}) \end{aligned}$$

Then since \(A = \varDelta I^+_m \in {\tilde{A}}(\alpha )\) and \(\varGamma = \varDelta I^-_{l^*} \in {\tilde{\varGamma }}(\gamma )\), such a tuple satisfies Eqs. 55 and 56. From the construction of Algorithm 1, \(A + 2\eta < \varGamma \). Thus, Algorithm 1 gives a feasible solution to Eq. 54.

(b) Then we prove Algorithm 1 gives the optimal solution. Suppose there is a better solution \((\alpha , \gamma ', A', \varGamma ')\). Obviously \(A' = A = \min {\tilde{A}}(\alpha )\). Note that any optimal solution of \(\gamma \) must align with some \(\varDelta \mathbf {p}_l\). If there exists \(l' < l^*\) so that \(\gamma ' = \varDelta \mathbf {p}_{l'} < \varDelta \mathbf {p}_{l^*} = \gamma \) is part of a better solution, then we have:

$$\begin{aligned} A' + 2\eta < \varGamma ' \le \max {\tilde{\varGamma }}(\gamma ') \le \max {\tilde{\varGamma }}(\gamma ) = \varDelta I^-_{l^*} \end{aligned}$$

Therefore, we have \(\varDelta I^-_{l'} = \max {\tilde{\varGamma }}(\gamma ') \le \max {\tilde{\varGamma }}(\gamma ) = \varDelta I^-_{l^*}\). Since \(\varDelta I^-_{l'} \le \varDelta I^-_{l^*}\), there are two cases:

  • \(A \le \varDelta I^+_m +2\eta < \varDelta I^-_{l'} < \varDelta I^-_{l^*}\). This is not possible since the algorithm searching from m will stop at the minimal \(l^*\) that satisfies \(\varDelta I^+_m +2\eta < \varDelta I^-_{l^*}\).

  • \(A \le \varDelta I^+_m +2\eta < \varDelta I^-_{l'} = \varDelta I^-_{l^*}\). Then according to the algorithm and monotonicity of \(\varDelta I^-, l' = l^*\).

Therefore, \(l' = l^*\) and \((\alpha , \gamma ', A', \varGamma ')\) is given by Algorithm 1. \(\square \)

From Theorem 6, for every \(\alpha \), Algorithm 1 always outputs the smallest \(\gamma \) that satisfies the Relaxed Lipschitz Conditions (Eqs. 10 and 11). Therefore, it outputs the curve \(\gamma = \gamma ^*(\alpha )\).
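Algorithm 1 itself is stated in the main text and is not reproduced in this appendix. The following is a minimal Python sketch reconstructed only from the description above (the prefix-maximum array \(\varDelta I^+\), the suffix-minimum array \(\varDelta I^-\), and the search for the smallest \(l^*\) with \(\varDelta I^+_m + 2\eta < \varDelta I^-_{l^*}\)). The function name `gamma_curve`, the argument names, and the convention of returning `None` when no feasible \(l^*\) exists are assumptions of this sketch, not part of the original algorithm. It also assumes the pairs are already sorted by \(\varDelta \mathbf {p}\) in ascending order.

```python
from itertools import accumulate


def gamma_curve(dp, dI, eta):
    """Sketch of the gamma*(alpha) curve described in Appendix 1.

    dp  : parameter distances, sorted in ascending order
    dI  : image distances, dI[l] paired with dp[l]
    eta : noise level appearing in the constraint A + 2*eta < Gamma
    Returns a list of (alpha, gamma) pairs; gamma is None when no index
    satisfies the constraint (a convention of this sketch).
    """
    M = len(dp)
    # Delta I^+_m = max of dI over indices <= m (prefix maximum).
    prefix_max = list(accumulate(dI, max))
    # Delta I^-_m = min of dI over indices >= m (suffix minimum).
    suffix_min = list(accumulate(reversed(dI), min))[::-1]

    curve = []
    l = 0
    for m in range(M):
        # The search starts from m; because prefix_max is non-decreasing,
        # the smallest feasible index never moves backwards.
        l = max(l, m)
        while l < M and not (prefix_max[m] + 2 * eta < suffix_min[l]):
            l += 1
        curve.append((dp[m], dp[l] if l < M else None))
    return curve
```

For example, `gamma_curve([1, 2, 3, 4], [1, 2, 5, 6], 0.5)` returns `[(1, 3), (2, 3), (3, None), (4, None)]`: for the two smallest values of \(\alpha \) the smallest admissible \(\gamma \) is 3, and for the remaining ones no index satisfies the gap constraint.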

Appendix 2: Local Pullback Operation

Similar to the pull-back operation introduced in data-driven descent (Tian and Narasimhan 2012), we can also introduce a local pull-back operation:

$$\begin{aligned} I(R_j(\mathbf {q})) \equiv I(W(R_{j0}; \mathbf {q})) \end{aligned}$$

In particular, for deformed image \(I_\mathbf {q}\) and the moving region \(R_j=R_j(\mathbf {q})\), we have

$$\begin{aligned} I_\mathbf {q}(R_j(\mathbf {q})) = I_\mathbf {q}(W(R_{j0}; \mathbf {q})) = I_0(R_{j0}) \end{aligned}$$

which gives back the template content. Similar to the pull-back inequality, we also have the following local pull-back inequality:

Theorem 7

For the j-th patch with template region \(R_{j0}\) and radius \(r_j\), if \(\Vert \mathbf {p}- \mathbf {q}\Vert _\infty \le r_j\) and \(\Vert \mathbf {q}\Vert _\infty \le c_q\), then

$$\begin{aligned} \Vert I_\mathbf {p}(R_j(\mathbf {q})) - I_{\mathbf {p}-\mathbf {q}}(R_{j0})\Vert \le \eta _j r_j \end{aligned}$$

where \(\eta _j = c_B c_q c_G \mathrm {Area}_j\). Note \(c_G = \max _\mathbf {x}|\nabla I_\mathbf {p}(\mathbf {x})|_1\) and \(c_B\) is a smoothness constant so that:

$$\begin{aligned} \Vert (B(\mathbf {x}) - B(\mathbf {y}))\mathbf {p}\Vert _\infty \le c_B \Vert \mathbf {x}-\mathbf {y}\Vert _\infty \Vert \mathbf {p}\Vert _\infty \end{aligned}$$

To prove this, we start with the following lemma.

Lemma 3

(Unity bound) For any \(\mathbf {x}\) and any \(\mathbf {p}\), we have \(\Vert B(\mathbf {x})\mathbf {p}\Vert _\infty \le \Vert \mathbf {p}\Vert _\infty \).


$$\begin{aligned}&\Vert B(\mathbf {x})\mathbf {p}\Vert _\infty = \max \left\{ \left| \sum _i b_i(\mathbf {x})\mathbf {p}^x(i)\right| , \left| \sum _i b_i(\mathbf {x})\mathbf {p}^y(i)\right| \right\} \nonumber \\&\quad \le \max \left\{ \max _i |\mathbf {p}^x(i)|\sum _i b_i(\mathbf {x}), \max _i |\mathbf {p}^y(i)|\sum _i b_i(\mathbf {x})\right\} \nonumber \\&\quad = \Vert \mathbf {p}\Vert _\infty \end{aligned}$$

using the fact that \(\sum _i b_i(\mathbf {x}) = 1\) and \(b_i(\mathbf {x}) \ge 0\) for any \(\mathbf {x}\). \(\square \)
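As a quick numerical sanity check of the unity bound, the sketch below draws random non-negative weights that sum to one (standing in for \(b_i(\mathbf {x})\) at a single pixel) and random landmark displacements, and records the largest observed ratio \(\Vert B(\mathbf {x})\mathbf {p}\Vert _\infty / \Vert \mathbf {p}\Vert _\infty \). The helper names, the random ranges and the seed are illustrative assumptions; the check does not replace the proof.

```python
import random


def apply_basis(weights, p):
    """Return B(x) p at one pixel: a convex combination of landmark displacements.

    weights : values b_i(x) >= 0 that sum to 1
    p       : list of (px, py) landmark displacements
    """
    ux = sum(b * px for b, (px, _) in zip(weights, p))
    uy = sum(b * py for b, (_, py) in zip(weights, p))
    return ux, uy


def inf_norm(pairs):
    """Infinity norm over all x and y components of a list of 2D vectors."""
    return max(max(abs(a), abs(b)) for a, b in pairs)


def random_convex_weights(n, rng):
    """Random non-negative weights normalized to sum to 1."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


rng = random.Random(0)  # fixed seed so the check is repeatable
worst_ratio = 0.0
for _ in range(1000):
    n = rng.randint(1, 8)
    p = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(n)]
    w = random_convex_weights(n, rng)
    u = apply_basis(w, p)
    worst_ratio = max(worst_ratio, inf_norm([u]) / inf_norm(p))

# Lemma 3 predicts that the ratio never exceeds 1 (up to rounding error).
print(worst_ratio)
```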

We now show Theorem 7 is correct.


For any \(\mathbf {y}\in R_{j0}\), by definitions of Eqs. 60 and 1, we have:

$$\begin{aligned} I_\mathbf {p}(R_j(\mathbf {q}))(\mathbf {y})= & {} I_\mathbf {p}(W(\mathbf {y}; \mathbf {q})) \end{aligned}$$
$$\begin{aligned} I_{\mathbf {p}-\mathbf {q}}(\mathbf {y})= & {} T(W^{-1}(\mathbf {y}; \mathbf {p}-\mathbf {q}))\nonumber \\= & {} I_\mathbf {p}(W(W^{-1}(\mathbf {y}; \mathbf {p}-\mathbf {q}), \mathbf {p})) \end{aligned}$$

Now we need to check the pixel distance between \(\mathbf {u}= W(\mathbf {y}; \mathbf {q})\) and \(\mathbf {v}= W(W^{-1}(\mathbf {y}; \mathbf {p}-\mathbf {q}), \mathbf {p})\). Note that both are pixel locations on distorted image \(I_\mathbf {p}\). If we can bound \(\Vert \mathbf {u}-\mathbf {v}\Vert _\infty \), then from \(I_\mathbf {p}\)’s appearance, we can obtain the bound for \(|I_\mathbf {p}(R_j(\mathbf {q}))(\mathbf {y}) - I_{\mathbf {p}-\mathbf {q}}(\mathbf {y})|\).

Denote \(\mathbf {z}= W^{-1}(\mathbf {y}; \mathbf {p}-\mathbf {q})\) which is a pixel on the template. By definition we have:

$$\begin{aligned} \mathbf {y}= W(\mathbf {z}; \mathbf {p}- \mathbf {q}) = \mathbf {z}+ B(\mathbf {z})(\mathbf {p}-\mathbf {q}) \end{aligned}$$

then by Lemma 3 we have:

$$\begin{aligned} \Vert \mathbf {y}-\mathbf {z}\Vert _\infty = \Vert B(\mathbf {z})(\mathbf {p}-\mathbf {q})\Vert _\infty \le \Vert \mathbf {p}-\mathbf {q}\Vert _\infty \le r_j \end{aligned}$$

On the other hand, the difference between \(\mathbf {u}\) and \(\mathbf {v}\) has the following simple form:

$$\begin{aligned} \mathbf {u}- \mathbf {v}&= W(\mathbf {y}; \mathbf {q}) - W(\mathbf {z}; \mathbf {p}) = \mathbf {y}+ B(\mathbf {y})\mathbf {q}- \mathbf {z}- B(\mathbf {z})\mathbf {p}\\&= B(\mathbf {z})(\mathbf {p}-\mathbf {q}) - B(\mathbf {z})\mathbf {p}+ B(\mathbf {y})\mathbf {q}= (B(\mathbf {y}) - B(\mathbf {z}))\mathbf {q}\end{aligned}$$

Thus, by the definition of \(c_B\) (Eq. 63), we have:

$$\begin{aligned} \Vert \mathbf {u}-\mathbf {v}\Vert _\infty \le c_B\Vert \mathbf {y}-\mathbf {z}\Vert _\infty \Vert \mathbf {q}\Vert _\infty \le (c_B \Vert \mathbf {q}\Vert _\infty ) r_j \end{aligned}$$


$$\begin{aligned}&|I_\mathbf {p}(R_j(\mathbf {q}))(\mathbf {y}) - I_{\mathbf {p}-\mathbf {q}}(\mathbf {y})|\nonumber \\&\quad = |I_\mathbf {p}(W(\mathbf {y}; \mathbf {q})) - I_\mathbf {p}(W(W^{-1}(\mathbf {y}; \mathbf {p}-\mathbf {q}), \mathbf {p}))|\nonumber \\&\quad = |I_\mathbf {p}(\mathbf {u}) - I_\mathbf {p}(\mathbf {v})| \end{aligned}$$
$$\begin{aligned}&\quad \le |\nabla I_\mathbf {p}(\xi )|_1 \Vert \mathbf {u}-\mathbf {v}\Vert _\infty \le c_B |\nabla I_\mathbf {p}(\xi )|_1 \Vert \mathbf {q}\Vert _\infty r_j \end{aligned}$$

where \(\xi \) lies on the line segment connecting \(\mathbf {u}\) and \(\mathbf {v}\). Since \(|\nabla I_\mathbf {p}(\xi )|_1 \le c_G\) and \(\Vert \mathbf {q}\Vert _\infty \le c_q\), collecting Eq. 73 over all pixels \(\mathbf {y}\in R_{j0}\) gives the bound. \(\square \)

Practically, \(\eta _j\) is very small and can be neglected.

From Eq. 60, there is a relationship between the (global) pull-back operation \(H(I, \mathbf {q})\equiv I(W(\mathbf {x}; \mathbf {q}))\) defined in Tian and Narasimhan (2012) and the local pull-back operation \(I(R_j(\mathbf {q}))\):

$$\begin{aligned} H(I, \mathbf {q})(R_{j0}) = I(W(R_{j0}; \mathbf {q})) = I(R_j(\mathbf {q})) \end{aligned}$$

Therefore, to compute \(I(R_j(\mathbf {q}))\) for all patches, just compute the global pull-back image \(H(I, \mathbf {q})\) once and extract region \(R_{j0}\) for every j-th patch on the pulled-back image.
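The "pull back once, then crop" strategy above can be sketched as follows. This is an illustrative stand-in under stated assumptions: the image is a plain list of rows, the callable `displacement(x, y)` plays the role of \(B(\mathbf {x})\mathbf {q}\), sampling is nearest-neighbour with border clamping, and regions are axis-aligned boxes `(x0, y0, x1, y1)`. None of these choices or names come from the paper.

```python
def pull_back(image, displacement):
    """Global pull-back H(I, q)(x) = I(W(x; q)) with W(x; q) = x + displacement(x).

    Uses nearest-neighbour sampling and clamps out-of-range coordinates to the
    image border (both are simplifications made for this sketch).
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = displacement(x, y)
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out


def extract_patch(image, region):
    """Crop the axis-aligned box (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = region
    return [row[x0:x1] for row in image[y0:y1]]


def local_pull_backs(image, displacement, template_regions):
    """Compute I(R_j(q)) for every patch j from a single global pull-back."""
    pulled = pull_back(image, displacement)
    return [extract_patch(pulled, r) for r in template_regions]


# Toy example: the template holds 10*y + x, and the observed image is that
# template shifted one pixel to the right.
observed = [[10 * y + (x - 1) for x in range(5)] for y in range(4)]
patches = local_pull_backs(
    observed,
    lambda x, y: (1, 0),           # q = pure translation by one pixel in x
    [(0, 0, 2, 2), (2, 2, 4, 4)],  # two template regions R_{j0}
)
```

In this toy case the two extracted patches equal the corresponding template content, `[[0, 1], [10, 11]]` and `[[22, 23], [32, 33]]`, and the warp is evaluated only once regardless of the number of patches.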

Appendix 3: Sampling in High-dimensional Subspace

Here we show how to count the number of \(\epsilon \)-balls required (i.e., the sample complexity) to cover a hypercube \([-r, r]^D\) in a D-dimensional parameter space. Then we discuss how to compute the sample complexity if the parameters lie on a d-dimensional subspace within the hypercube. Both cases are shown in Fig. 19.

Fig. 19

Sampling strategies in D-dimensional space. a Uniform sampling within a hypercube \([-r, r]^{D}\) so that for any \(\mathbf {p}(S) \in [-r, r]^{2|S|}\), there exists at least one training sample that is \(\alpha r\) close to \(\mathbf {p}(S)\). b If \(d < D\), then just sampling the subspace within the hypercube suffices. c Uniform sampling per dimension (See Lemma 4)

Covering a D-dimensional Space

Lemma 4

(Sampling theorem, sufficient conditions) With \(\lceil 1/\alpha \rceil ^D\) number of samples (\(\alpha < 1\)), for any \(\mathbf {p}\) in the hypercube \([-r, r]^D\), there exists at least one sample \({\hat{\mathbf {p}}}\) so that \(\Vert {\hat{\mathbf {p}}} - \mathbf {p}\Vert _\infty \le \alpha r\).


A uniform distribution of the training samples within the hypercube suffices. In particular, let

$$\begin{aligned} n = \left\lceil \frac{1}{\alpha } \right\rceil \end{aligned}$$

Thus we have \(1/n = 1 / \lceil 1/\alpha \rceil \le 1 / (1 / \alpha ) = \alpha \). For every multi-index \((i_1, i_2, \ldots , i_D)\) with \(1\le i_k\le n\), we put one training sample on D-dimensional coordinates:

$$\begin{aligned} {\hat{\mathbf {p}}}_{i_1, i_2, \ldots , i_D} = r\left[ -1 + \frac{2i_1 - 1}{n}, -1 + \frac{2i_2-1}{n}, \ldots , -1 + \frac{2i_D -1}{n}\right] \end{aligned}$$

Therefore, along each dimension, the first sample is r / n distance away from \(-r\), then the second sample is 2r / n distance to the first sample, until the last sample that is r / n distance away from the boundary r (Fig. 19c). Then for any \(\mathbf {p}\in [-r, r]^D\), there exists \(i_k\) so that

$$\begin{aligned} \left| \mathbf {p}(k) - r\left( -1 + \frac{2i_k-1}{n}\right) \right| \le \frac{1}{n}r \le \alpha r \end{aligned}$$

This holds for \(1\le k\le D\). As a result, we have

$$\begin{aligned} \Vert \mathbf {p}- {\hat{\mathbf {p}}}_{i_1, i_2, \ldots , i_D}\Vert _\infty \le \alpha r \end{aligned}$$

and the total number of samples needed is \(n^D = \lceil 1/\alpha \rceil ^D\). \(\square \)
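The grid construction in this proof can be written down directly. The sketch below builds the per-dimension sample positions \(r(-1 + (2i-1)/n)\) with \(n = \lceil 1/\alpha \rceil \), forms their Cartesian product, and measures the \(\infty \)-norm distance from a query point to its nearest sample. Function names are illustrative assumptions.

```python
import itertools
import math


def grid_1d(r, alpha):
    """Sample positions along one dimension: r * (-1 + (2i - 1) / n), i = 1..n."""
    n = math.ceil(1 / alpha)
    return [r * (-1 + (2 * i - 1) / n) for i in range(1, n + 1)]


def grid_samples(r, alpha, D):
    """All n**D training samples in the hypercube [-r, r]^D."""
    axis = grid_1d(r, alpha)
    return list(itertools.product(axis, repeat=D))


def nearest_distance(p, r, alpha):
    """Infinity-norm distance from p to the nearest grid sample.

    Because the grid is a Cartesian product, the nearest sample can be found
    independently in each dimension.
    """
    axis = grid_1d(r, alpha)
    return max(min(abs(pk - a) for a in axis) for pk in p)
```

With `r = 2.0`, `alpha = 0.3` and `D = 3`, we get \(n = 4\), so `grid_samples` returns 64 samples, and every point of the cube lies within \(r/n = 0.5 \le \alpha r = 0.6\) of a sample, as the lemma states.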

Covering a Manifold in D-dimensional Space

Now we consider the case that \(\mathbf {p}\) lies on a manifold \(\mathcal {M}\) embedded in D-dimensional space. This means that there exists a function f (linear or nonlinear) so that for every \(\mathbf {p}\) on the manifold and within the hypercube \([-r, r]^D\), there exists a d-dimensional vector \(\mathbf {v}\in [-r, r]^d\) with \(\mathbf {p}= f(\mathbf {v})\). For example, this happens if we use over-complete local bases to represent the deformation. Note that the function f is onto:

$$\begin{aligned} \left( [-r, r]^D \cap \mathcal {M}\right) \subset f([-r, r]^d) \end{aligned}$$

In this case, we do not need to fill the entire hypercube \([-r, r]^D\), but rather fill the d-dimensional hypercube \([-r, r]^d\), which requires the number of samples to be exponential with respect to only d rather than D. To prove this, we first define the expanding factor c of the mapping:

Definition 2

(Expanding factor c ) The expanding factor c for a mapping f is defined as:

$$\begin{aligned} c \equiv \sup _{\mathbf {v}_1, \mathbf {v}_2\in [-r, r]^d} \frac{\Vert f(\mathbf {v}_1) - f(\mathbf {v}_2)\Vert _\infty }{\Vert \mathbf {v}_1 - \mathbf {v}_2\Vert _\infty } \end{aligned}$$

We thus have the following sampling theorem for deformation parameters \(\mathbf {p}\) on a manifold:

Theorem 8

(Sampling theorem, sufficient condition in the manifold case) With \(c_{SS}\lceil 1/\alpha \rceil ^d\) samples distributed in the hypercube \([-r, r]^d\), for any \(\mathbf {p}\in \mathcal {M}\), there exists at least one sample \({\hat{\mathbf {p}}} = f({\hat{\mathbf {v}}})\) so that \(\Vert {\hat{\mathbf {p}}} - \mathbf {p}\Vert _\infty \le \alpha r\). Note \(c_{SS} = \lceil c\rceil ^d\).


We first apply Lemma 4 to the hypercube \([-r, r]^d\), with \(\alpha \) replaced by \(\alpha / c\). Then with \(\lceil \frac{c}{\alpha }\rceil ^d\) samples, for any \(\mathbf {v}\in [-r, r]^d\), there exists a training sample \(\mathbf {v}^{(i)}\) so that

$$\begin{aligned} \Vert \mathbf {v}- \mathbf {v}^{(i)}\Vert _\infty \le \frac{\alpha r}{c} \end{aligned}$$

We then build the training samples \(\{\mathbf {p}^{(i)}\}\) by setting \(\mathbf {p}^{(i)} = f(\mathbf {v}^{(i)})\). For any \(\mathbf {p}\in [-r, r]^D \cap \mathcal {M}\), there exists a \(\mathbf {v}\in [-r, r]^d\) so that \(\mathbf {p}= f(\mathbf {v})\). By the sampling procedure, there exists \(\mathbf {v}^{(i)}\) so that \(\Vert \mathbf {v}- \mathbf {v}^{(i)}\Vert _\infty \le \frac{\alpha }{c} r\), and therefore, by the definition of the expanding factor c:

$$\begin{aligned} \Vert \mathbf {p}- \mathbf {p}^{(i)}\Vert _\infty \le c \Vert \mathbf {v}- \mathbf {v}^{(i)}\Vert _\infty \le \alpha r \end{aligned}$$

Setting \({\hat{\mathbf {p}}} = \mathbf {p}^{(i)}\) thus suffices. Finally, since \(\lceil ab\rceil \le \lceil a \rceil \lceil b \rceil \) for non-negative a and b, we have:

$$\begin{aligned} \left\lceil \frac{c}{\alpha }\right\rceil ^d \le \left\lceil c\right\rceil ^d \left\lceil \frac{1}{\alpha }\right\rceil ^d \end{aligned}$$

and the conclusion follows. \(\square \)
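To make the gap between the two sample counts concrete, the following sketch evaluates the count from Lemma 4 for the full hypercube, the count \(\lceil c/\alpha \rceil ^d\) used inside the proof, and the looser bound \(\lceil c\rceil ^d\lceil 1/\alpha \rceil ^d\) stated in Theorem 8. The function names and the example values of \(\alpha \), D, d and c are illustrative assumptions, not figures from the experiments.

```python
import math


def samples_full_cube(alpha, D):
    """Lemma 4: ceil(1/alpha)**D samples cover the hypercube [-r, r]^D."""
    return math.ceil(1 / alpha) ** D


def samples_manifold(alpha, d, c):
    """Count used inside the proof of Theorem 8: ceil(c/alpha)**d."""
    return math.ceil(c / alpha) ** d


def samples_manifold_bound(alpha, d, c):
    """Bound stated in Theorem 8: ceil(c)**d * ceil(1/alpha)**d."""
    return math.ceil(c) ** d * math.ceil(1 / alpha) ** d
```

For instance, with \(\alpha = 0.25\), covering a cube with \(D = 20\) needs \(4^{20} = 1099511627776\) samples, whereas a manifold with \(d = 8\) and \(c = 1\) needs only \(4^8 = 65536\). With \(d = 4\) and \(c = 2 + \sqrt{2}\), the proof's count is \(14^4 = 38416\), below the stated bound \(4^4 \cdot 4^4 = 65536\).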

We can see from the proof that c plays an important role in the number of samples needed. To make c smaller, d need not be the intrinsic dimension of the manifold, but can be slightly larger. Theorem 8 applies to any parameterization of the manifold \(\mathcal {M}\).

Let us compute c for some global deformation fields. For example, an affine deformation field within a rectangle parameterized by D / 2 landmarks is always 6-dimensional. To make c smaller, we (meta)-parameterize the field by 4 meta control points (\(d=8\)) sitting at the four corners of the rectangle (Fig. 20a). In this case, any landmark displacement \(\mathbf {p}(k)\) within this rectangle can be linearly represented by the locations of the four corners in a convex manner:

$$\begin{aligned} \mathbf {p}(k) = A_k\mathbf {v}= \sum _{j = 1}^4 a_{kj}\mathbf {v}(j) \end{aligned}$$

Here \(\mathbf {v}\) is the concatenation of four deformation vectors [\(\mathbf {p}(\mathrm {top\_left}), \mathbf {p}(\mathrm {bottom\_left}), \mathbf {p}(\mathrm {top\_right}), \mathbf {p}(\mathrm {bottom\_right})\)] from the four corners, \(0 \le a_{kj} \le 1\) and \(\sum _j a_{kj} = 1\). For any \(\mathbf {p}\in [-r, r]^D, \mathbf {v}\) can be found by just picking up the deformation of its four corners, and thus \(\Vert \mathbf {v}\Vert _\infty \le r\). Furthermore, we have for \(\mathbf {v}_1, \mathbf {v}_2\in [-r, r]^d\):

$$\begin{aligned}&\Vert \mathbf {p}_1 - \mathbf {p}_2\Vert _\infty = \Vert f(\mathbf {v}_1) - f(\mathbf {v}_2)\Vert _\infty \nonumber \\&\quad \le \max _k \sum _{j=1}^4 a_{kj} \Vert \mathbf {v}_1(j) - \mathbf {v}_2(j)\Vert _\infty \le \Vert \mathbf {v}_1 - \mathbf {v}_2\Vert _\infty \quad \quad \end{aligned}$$

Therefore, \(c = 1\). We pick the four corners to ensure that all weights in the linear combinations are between 0 and 1.
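The convex representation above can be checked numerically. The sketch below assumes the weights \(a_{kj}\) are the standard bilinear interpolation weights of a landmark at normalized position \((s, t)\) inside the rectangle; the text only requires convex weights, so this specific choice, together with the function names and the example affine parameters, is an assumption. Bilinear interpolation reproduces any affine field exactly, which is what the comparison verifies.

```python
def bilinear_weights(s, t):
    """Convex weights for a landmark at normalized position (s, t) in [0, 1]^2.

    Order: top_left, bottom_left, top_right, bottom_right, with s increasing
    to the right and t increasing downward.
    """
    return [(1 - s) * (1 - t), (1 - s) * t, s * (1 - t), s * t]


def affine_displacement(params, x, y):
    """Displacement of an affine field (a*x + b*y + c, d*x + e*y + f) at (x, y)."""
    a, b, c, d, e, f = params
    return (a * x + b * y + c, d * x + e * y + f)


def landmark_from_corners(corner_disps, s, t):
    """p(k) = sum_j a_kj v(j): convex combination of the four corner displacements."""
    w = bilinear_weights(s, t)
    px = sum(wj * v[0] for wj, v in zip(w, corner_disps))
    py = sum(wj * v[1] for wj, v in zip(w, corner_disps))
    return (px, py)


# Rectangle [0, W] x [0, H] and an arbitrary affine field (illustrative values).
W, H = 4.0, 2.0
params = (0.1, -0.2, 0.5, 0.05, 0.3, -1.0)
corners = [
    affine_displacement(params, 0.0, 0.0),  # top_left
    affine_displacement(params, 0.0, H),    # bottom_left
    affine_displacement(params, W, 0.0),    # top_right
    affine_displacement(params, W, H),      # bottom_right
]

x, y = 1.0, 1.5
exact = affine_displacement(params, x, y)
recovered = landmark_from_corners(corners, x / W, y / H)
```

Here `exact` and `recovered` agree up to rounding error, and every weight returned by `bilinear_weights` lies in \([0, 1]\) with the four summing to one, which is the property used to obtain \(c = 1\).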

Fig. 20

Finding (meta)-control points to parameterize the deformation manifold. a Affine deformation within a rectangle. The intrinsic dimension is 6 while we pick four points (\(d=8\)) to represent the landmark displacement with expanding factor \(c = 1\). Note that picking just three points will also characterize affine, but with a much larger expanding factor c. b Deformation that includes translation and rotation. We pick two points (\(d=4\)), leading to an expanding factor \(c \le 2 + \sqrt{2}\)

Similarly, for 3-dimensional deformation with pure translation and rotation, as used in Sect. 8, we can just pick \(\mathbf {v}\) as the concatenation of two landmark displacements: the rotation center \(\mathbf {p}(\text {center})\) and the corner point \(\mathbf {p}(\text {corner})\), whose rest location is farther from the center than that of any other landmark (Fig. 20b). Denote this distance by \(r_{\mathrm {corner}}\). For any other landmark displacement \(\mathbf {p}(k)\) whose index k can be parametrized by polar coordinates \((r, \theta )\), we have:

$$\begin{aligned} \mathbf {p}(r, \theta )&= \mathbf {p}(\text {center})+ \frac{r}{r_{\mathrm {corner}}}R(\theta )(\mathbf {p}(\text {corner})- \mathbf {p}(\text {center}))\\&= \left( Id - \frac{r}{r_{\mathrm {corner}}}R(\theta )\right) \mathbf {p}(\text {center})+ \frac{r}{r_{\mathrm {corner}}}R(\theta )\mathbf {p}(\text {corner}) \end{aligned}$$

where Id is the identity matrix, \(R(\theta )\) is the 2D rotation matrix and \(r_{\mathrm {corner}}\) is the distance between the rest location of the center and that of the corner. Therefore, for two different \(\mathbf {v}_1\) and \(\mathbf {v}_2\), since \(r \le r_{\mathrm {corner}}\), we have:

$$\begin{aligned}&\Vert \mathbf {p}_1(r, \theta ) - \mathbf {p}_2(r, \theta )\Vert _\infty \nonumber \\&\quad \le \left\| \left( Id - \frac{r}{r_{\mathrm {corner}}}R(\theta )\right) (\mathbf {p}_1(\text {center})- \mathbf {p}_2(\text {center}))\right\| _\infty \nonumber \\&\qquad + \left\| \frac{r}{r_{\mathrm {corner}}}R(\theta )(\mathbf {p}_1(\text {corner})- \mathbf {p}_2(\text {corner}))\right\| _\infty \nonumber \\&\quad \le 2\Vert \mathbf {p}_1(\text {center})- \mathbf {p}_2(\text {center})\Vert _\infty + \sqrt{2}\Vert \mathbf {p}_1(\text {corner})- \mathbf {p}_2(\text {corner})\Vert _\infty \nonumber \\&\quad \le (2+\sqrt{2}) \Vert \mathbf {v}_1 - \mathbf {v}_2\Vert _\infty \end{aligned}$$

since \(|\cos (\theta )| + |\sin (\theta )| \le \sqrt{2}\). Therefore,

$$\begin{aligned}&\Vert \mathbf {p}_1 - \mathbf {p}_2\Vert _\infty = \max _{r, \theta } \Vert \mathbf {p}_1(r, \theta ) - \mathbf {p}_2(r, \theta )\Vert _\infty \nonumber \\&\quad \le (2+\sqrt{2}) \Vert \mathbf {v}_1 - \mathbf {v}_2\Vert _\infty \end{aligned}$$

So \(c \le 2+\sqrt{2}\le 3.5\). This constant is used in Sect. 8 to compute the number of samples required by the theory.


Cite this article

Tian, Y., Narasimhan, S.G. Theory and Practice of Hierarchical Data-driven Descent for Optimal Deformation Estimation. Int J Comput Vis 115, 44–67 (2015).



  • Deformation modeling
  • Globally optimal solutions
  • Non-rigid deformation
  • Data-driven approach
  • Non-linear optimization
  • Non-convex optimization
  • Image deformation
  • High-dimensional regression