Deep Feedback Inverse Problem Solver

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12350)

Abstract

We present an efficient, effective, and generic approach to solving inverse problems. The key idea is to leverage the feedback signal provided by the forward process and learn an iterative update model. Specifically, at each iteration, the neural network takes the feedback as input and outputs an update on the current estimate. Our approach places no restrictions on the forward process, nor does it require any prior knowledge. Through the feedback information, our model not only produces accurate estimates that are consistent with the input observation but is also capable of recovering from early incorrect predictions. We verify the performance of our model over a wide range of inverse problems, including 6-DoF pose estimation, illumination estimation, and inverse kinematics. Compared to traditional optimization-based methods, we achieve comparable or better performance while being two to three orders of magnitude faster. Compared to deep learning-based approaches, our model consistently improves performance on all metrics.
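
The feedback loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual architecture: the forward model `forward_process`, the tiny MLP standing in for the learned update model, and all dimensions are assumptions made purely to show the structure of the iteration (run the forward process, form the feedback from the observation, let the network output an update).

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_process(x):
    # Hypothetical forward model y = g(x); a placeholder nonlinearity.
    return x ** 2 + 0.5 * x

def update_net(x, feedback, W1, W2):
    # Tiny two-layer MLP standing in for the learned update model.
    # In the paper's setting its weights would be trained; here they
    # are random, so only the loop structure is meaningful.
    h = np.maximum(0.0, np.concatenate([x, feedback]) @ W1)  # ReLU
    return h @ W2

def solve(y_obs, dim=4, n_iters=5):
    W1 = rng.standard_normal((2 * dim, 16)) * 0.1
    W2 = rng.standard_normal((16, dim)) * 0.1
    x = np.zeros(dim)                      # initial estimate
    for _ in range(n_iters):
        y_hat = forward_process(x)         # run the forward process
        feedback = y_obs - y_hat           # residual acts as feedback signal
        x = x + update_net(x, feedback, W1, W2)  # network outputs an update
    return x
```

Because the update is conditioned on the current feedback rather than only on the raw input, a poor early estimate produces a large residual at the next iteration, which is exactly the mechanism that lets the model recover from early mistakes.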

Supplementary material

504441_1_En_14_MOESM1_ESM.pdf — Supplementary material 1 (PDF, 3.1 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Uber Advanced Technologies Group, Pittsburgh, USA
  2. Massachusetts Institute of Technology, Cambridge, USA
  3. University of Toronto, Toronto, Canada
  4. University of California San Diego, San Diego, USA
