Learning Task-Specific Generalized Convolutions in the Permutohedral Lattice

  • Anne S. Wannenwetsch
  • Martin Kiefel
  • Peter V. Gehler
  • Stefan Roth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11824)


Abstract
Dense prediction tasks typically employ encoder-decoder architectures, but the prevalent convolutions in the decoder are not image-adaptive and can lead to boundary artifacts. Different generalized convolution operations have been introduced to counteract this. We go beyond these by leveraging guidance data to redefine their inherent notion of proximity. Our proposed network layer builds on the permutohedral lattice, which performs sparse convolutions in a high-dimensional space allowing for powerful non-local operations despite small filters. Multiple features with different characteristics span this permutohedral space. In contrast to prior work, we learn these features in a task-specific manner by generalizing the basic permutohedral operations to learnt feature representations. As the resulting objective is complex, a carefully designed framework and learning procedure are introduced, yielding rich feature embeddings in practice. We demonstrate the general applicability of our approach in different joint upsampling tasks. When adding our network layer to state-of-the-art networks for optical flow and semantic segmentation, boundary artifacts are removed and the accuracy is improved.
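The permutohedral lattice accelerates high-dimensional Gaussian filtering: each output is a distance-weighted average of inputs, where distances are measured in a learned feature space rather than only in image coordinates. The following is a minimal brute-force sketch of that underlying operation, not the paper's sparse lattice implementation (which splats, blurs, and slices to avoid the quadratic cost); the function name and toy data are illustrative only.

```python
import numpy as np

def highdim_gaussian_filter(values, features, sigma=1.0):
    """Brute-force high-dimensional Gaussian filter.

    Each output value is a Gaussian-weighted average of all input
    values, with weights determined by pairwise distances in feature
    space. This O(n^2) version only illustrates the definition that
    lattice-based methods compute efficiently.
    """
    # Pairwise feature differences: shape (n, n, d)
    diff = features[:, None, :] - features[None, :, :]
    # Gaussian weights from squared feature distances: shape (n, n)
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2)
    # Normalize rows so each output is a convex combination
    w /= w.sum(axis=1, keepdims=True)
    return w @ values  # shape (n, channels)

# Toy example: two well-separated clusters in a 1-D feature space.
# Values mix within a cluster but not across it, mimicking the
# edge-preserving behavior that guidance features enable.
feats = np.array([[0.0], [0.1], [5.0], [5.1]])
vals = np.array([[1.0], [0.0], [10.0], [12.0]])
out = highdim_gaussian_filter(vals, feats, sigma=0.5)
```

With spatial positions and color as features this reduces to a bilateral filter; the paper's contribution is to learn such features per task instead of hand-crafting them.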

Supplementary material

Supplementary material 1 (PDF, 6.4 MB)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. TU Darmstadt, Darmstadt, Germany
  2. Amazon, Tübingen, Germany
