# Theory and Practice of Hierarchical Data-driven Descent for Optimal Deformation Estimation

## Abstract

Real-world surfaces such as clothing, water, and the human body deform in complex ways. Estimating deformation parameters accurately and reliably is hard because the problem is high-dimensional and non-convex. Optimization-based approaches require good initialization, while regression-based approaches need a large amount of training data. Recently, to achieve globally optimal estimation, data-driven descent (Tian and Narasimhan in Int J Comput Vis, 98:279–302, 2012) applies nearest neighbor estimators trained on a particular distribution of training samples to obtain a globally optimal, dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure that first applies nearest neighbor estimators to the entire image iteratively to obtain a rough estimate, and then applies estimators with local image support to refine it. Compared to its non-hierarchical version, our approach has theoretical guarantees while requiring significantly fewer training samples, is several orders of magnitude faster, provides a better metric for deciding whether a given image requires more (or fewer) samples, and can handle more complex scenes that include a mixture of global motion and local deformation. We demonstrate in both simulation and real experiments that the proposed algorithm successfully tracks a broad range of non-rigid scenes, including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.
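The two-level scheme summarized above (whole-image nearest neighbor estimation iterated from coarse to fine, followed by refinement with local image support) can be illustrated with a toy 1-D sketch. Everything in it is an assumption for illustration: the signal `template`, the piecewise-linear control-point warp, the sampling radii, and all helper names are hypothetical. Unlike the method described, which trains nearest neighbor estimators on samples drawn offline from a specific distribution, this sketch renders random candidates online around the current estimate, and it carries none of the paper's optimality guarantees.

```python
import math
import random

def template(x):
    # Hypothetical 1-D "template image": any smooth analytic signal works.
    return math.sin(6.0 * x) + 0.5 * math.cos(17.0 * x)

def hat_weight(x, k, num_ctrl):
    # Piecewise-linear (hat) basis of control point k, evenly spaced on [0, 1].
    spacing = 1.0 / (num_ctrl - 1)
    return max(0.0, 1.0 - abs(x - k * spacing) / spacing)

def render(params, xs):
    # Distorted signal I(x) = T(x + d(x)), with d interpolated from control displacements.
    n = len(params)
    return [template(x + sum(p * hat_weight(x, k, n) for k, p in enumerate(params)))
            for x in xs]

def residual(a, b, idx=None):
    # Sum of squared differences, optionally restricted to a local patch.
    idx = range(len(a)) if idx is None else idx
    return sum((a[i] - b[i]) ** 2 for i in idx)

def global_stage(observed, xs, num_ctrl, radius, iters, samples, rng):
    # Whole-image nearest neighbor: keep the candidate whose rendering is closest.
    est = [0.0] * num_ctrl
    for _ in range(iters):
        # The current estimate is always a candidate, so the residual never grows.
        candidates = [est] + [[e + rng.uniform(-radius, radius) for e in est]
                              for _ in range(samples)]
        est = min(candidates, key=lambda p: residual(observed, render(p, xs)))
        radius *= 0.5  # shrink the sampling box: coarse to fine
    return est

def local_stage(observed, xs, est, radius, iters, samples, rng):
    # Per-control-point nearest neighbor using only that point's image support.
    est = list(est)
    n = len(est)
    for _ in range(iters):
        for k in range(n):
            # Samples outside this patch do not depend on est[k].
            patch = [i for i, x in enumerate(xs) if hat_weight(x, k, n) > 0.0]
            offsets = [0.0] + [rng.uniform(-radius, radius) for _ in range(samples)]

            def local_cost(delta):
                trial = est[:k] + [est[k] + delta] + est[k + 1:]
                return residual(observed, render(trial, xs), patch)

            est[k] += min(offsets, key=local_cost)
        radius *= 0.5
    return est

def hierarchical_estimate(observed, xs, num_ctrl, seed=0):
    # Hypothetical driver: global stage first, then local refinement.
    rng = random.Random(seed)
    coarse = global_stage(observed, xs, num_ctrl, 0.08, 6, 200, rng)
    fine = local_stage(observed, xs, coarse, 0.01, 4, 30, rng)
    return coarse, fine
```

Because each candidate set includes the current estimate (a zero offset), and a local update changes only the samples inside that control point's support, the image residual cannot increase at either stage in this toy. The local stage also searches one parameter at a time instead of the full parameter vector, which hints at why local image support can reduce the number of samples needed.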

### Keywords

Deformation modeling, Globally optimal solutions, Non-rigid deformation, Data-driven approach, Non-linear optimization, Non-convex optimization, Image deformation, High-dimensional regression

## Notes

### Acknowledgments

This research was supported in part by ONR grant N00014-11-1-0295, a Microsoft Research PhD fellowship, a University Transportation Center T-SET grant, and a gift from TONBO Imaging.

### References

- Baker, S., & Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. *International Journal of Computer Vision*, *56*, 221–255.
- Barnes, C., Shechtman, E., Finkelstein, A., & Goldman, D. (2009). PatchMatch: A randomized correspondence algorithm for structural image editing. *ACM Transactions on Graphics (TOG)*, *28*(3), 24.
- Barnes, C., Shechtman, E., Goldman, D. B., & Finkelstein, A. (2010). The generalized PatchMatch correspondence algorithm. In *ECCV* (pp. 29–43). Berlin: Springer.
- Beauchemin, S. S., & Barron, J. L. (1995). The computation of optical flow. *ACM Computing Surveys (CSUR)*, *27*(3), 433–466.
- Bookstein, F. L. (1989). Principal warps: Thin-plate splines and the decomposition of deformations. *IEEE Transactions on Pattern Analysis & Machine Intelligence*, *6*, 567–585.
- Cao, X., Wei, Y., Wen, F., & Sun, J. (2012). Face alignment by explicit shape regression. In *CVPR*.
- Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In *NIPS*.
- Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In *CVPR*.
- Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. *International Journal of Computer Vision*, *60*, 91–110.
- Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In *IJCAI* (pp. 674–679).
- Matthews, I., & Baker, S. (2004). Active appearance models revisited. *International Journal of Computer Vision*, *60*, 135–164.
- Moll, M., & Gool, L. V. (2012). Optimal templates for non-rigid surface reconstruction. In *ECCV*.
- Rueckert, D., Sonoda, L., Hayes, C., Hill, D., Leach, M., & Hawkes, D. (1999). Nonrigid registration using free-form deformations: Application to breast MR images. *IEEE Transactions on Medical Imaging*, *18*, 712–721.
- Salzmann, M., Hartley, R., & Fua, P. (2007). Convex optimization for deformable surface 3-D tracking. In *ICCV*.
- Salzmann, M., Moreno-Noguer, F., Lepetit, V., & Fua, P. (2008). Closed-form solution to non-rigid 3D surface registration. In *ECCV*.
- Serre, T., Wolf, L., & Poggio, T. (2005). Object recognition with features inspired by visual cortex. In *CVPR* (Vol. 2, pp. 994–1000).
- Shi, J., & Tomasi, C. (1994). Good features to track. In *CVPR*.
- Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In *CVPR*.
- Tan, D. J., Holzer, S., Navab, N., & Ilic, S. (2014). Deformable template tracking in 1 ms. In *ECCV*.
- Taylor, J., Jepson, A., & Kutulakos, K. (2010). Non-rigid structure from locally-rigid motion. In *CVPR*.
- Tian, Y., & Narasimhan, S. G. (2012). Globally optimal estimation of nonrigid image distortion. *International Journal of Computer Vision*, *98*, 279–302.
- Zhang, S., Zhan, Y., Zhou, Y., Uzunbas, M., & Metaxas, D. (2012). Shape prior modeling using sparse representation and online dictionary learning. In *Medical Image Computing and Computer-Assisted Intervention* (Vol. 7512, pp. 435–442), Lecture Notes in Computer Science. Berlin: Springer.