Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12354)


In modern computer vision, images are typically represented as a fixed uniform grid with some stride and processed via a deep convolutional neural network. We argue that deforming the grid to better align with the high-frequency image content is a more effective strategy. We introduce Deformable Grid (DefGrid), a learnable neural network module that predicts location offsets of vertices of a 2-dimensional triangular grid, such that the edges of the deformed grid align with image boundaries. We showcase DefGrid in a variety of use cases by inserting it as a module at various levels of processing. We utilize DefGrid as an end-to-end learnable geometric downsampling layer that replaces standard pooling methods for reducing feature resolution when feeding images into a deep CNN. For the task of semantic segmentation, we show significantly improved results at the same grid resolution compared to using CNNs on uniform grids. We also utilize DefGrid at the output layers for the task of object mask annotation, and show that reasoning about object boundaries on our predicted polygonal grid leads to more accurate results than existing pixel-wise and curve-based approaches. Finally, we showcase DefGrid as a standalone module for unsupervised image partitioning, showing superior performance over existing approaches. Project website: jungao/def-grid.
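To make the geometric idea concrete, the following is a minimal sketch of a uniform triangular grid whose vertices are moved by per-vertex offsets. It is not the authors' implementation: the function names, the unit-square coordinates, the two-triangles-per-cell layout, the per-axis clamp `max_shift`, and the hand-written offsets (standing in for a network's predictions) are all assumptions made for illustration.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]
Triangle = Tuple[int, int, int]


def make_uniform_grid(rows: int, cols: int) -> Tuple[List[Vertex], List[Triangle]]:
    """Build a rows x cols vertex lattice on the unit square, two triangles per cell."""
    vertices = [(c / (cols - 1), r / (rows - 1))
                for r in range(rows) for c in range(cols)]
    triangles: List[Triangle] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c  # index of the cell's top-left vertex
            triangles.append((i, i + 1, i + cols))
            triangles.append((i + 1, i + cols + 1, i + cols))
    return vertices, triangles


def deform(vertices: List[Vertex], offsets: List[Vertex], max_shift: float) -> List[Vertex]:
    """Move each vertex by its offset, clamped per axis to [-max_shift, max_shift]."""
    def clamp(v: float) -> float:
        return max(-max_shift, min(max_shift, v))
    return [(x + clamp(dx), y + clamp(dy))
            for (x, y), (dx, dy) in zip(vertices, offsets)]


def signed_area(a: Vertex, b: Vertex, c: Vertex) -> float:
    """Signed area of triangle abc; the sign flips if the triangle is inverted."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


# A 3 x 3 lattice: 9 vertices, 8 triangles, grid spacing 0.5.
vertices, triangles = make_uniform_grid(3, 3)

# Hypothetical offsets: only the centre vertex (index 4) is asked to move.
offsets = [(0.0, 0.0)] * len(vertices)
offsets[4] = (0.3, -0.05)  # the x component exceeds max_shift and is clamped

deformed = deform(vertices, offsets, max_shift=0.1)
areas = [signed_area(deformed[i], deformed[j], deformed[k]) for i, j, k in triangles]
```

In this example the topology (the triangle index list) is untouched; only vertex positions change. The clamp is a simple guard chosen for the sketch, and the signed areas give a quick check that no triangle has been inverted by the deformation.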



This work was supported by NSERC. SF acknowledges the Canada CIFAR AI Chair award at the Vector Institute. We thank Frank Shen, Wenzheng Chen and Huan Ling for helpful discussions, and also thank the anonymous reviewers for valuable comments.

Supplementary material

Supplementary material 1: 504446_1_En_7_MOESM1_ESM.pdf (PDF, 36.1 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Toronto, Toronto, Canada
  2. Vector Institute, Toronto, Canada
  3. NVIDIA, Toronto, Canada
  4. Peking University, Beijing, China
