SegFix: Model-Agnostic Boundary Refinement for Segmentation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12357)


We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model. Motivated by the empirical observation that the label predictions of interior pixels are more reliable, we propose to replace the originally unreliable predictions of boundary pixels by the predictions of interior pixels. Our approach processes only the input image through two steps: (i) localize the boundary pixels and (ii) identify the corresponding interior pixel for each boundary pixel. We build the correspondence by learning a direction away from the boundary pixel to an interior pixel. Our method requires no prior information of the segmentation models and achieves nearly real-time speed. We empirically verify that our SegFix consistently reduces the boundary errors for segmentation results generated from various state-of-the-art models on Cityscapes, ADE20K and GTA5. Code is available at:


Semantic segmentation Instance segmentation Boundary refinement Model agnostic 



This work is partially supported by Natural Science Foundation of China under contract No. 61390511, and Frontier Science Key Research Project CAS No. QYZDJ-SSW-JSC009.

Supplementary material

504453_1_En_29_MOESM1_ESM.pdf (1.9 mb)
Supplementary material 1 (pdf 1899 KB)


  1. 1.
    Acuna, D., Kar, A., Fidler, S.: Devil is in the edges: learning semantic boundaries from noisy annotations. In: CVPR (2019)Google Scholar
  2. 2.
    Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. PAMI 33, 898–916 (2010)CrossRefGoogle Scholar
  3. 3.
    Bai, M., Urtasun, R.: Deep watershed transform for instance segmentation. In: CVPR (2017)Google Scholar
  4. 4.
    Bertasius, G., Shi, J., Torresani, L.: High-for-low and low-for-high: efficient boundary detection from deep object features and its applications to high-level vision. In: ICCV (2015)Google Scholar
  5. 5.
    Bertasius, G., Shi, J., Torresani, L.: Semantic segmentation with boundary neural fields. In: CVPR (2016)Google Scholar
  6. 6.
    Bischke, B., Helber, P., Folz, J., Borth, D., Dengel, A.: Multi-task learning for segmentation of building footprints with deep neural networks. In: ICIP (2019)Google Scholar
  7. 7.
    Caesar, H., Uijlings, J., Ferrari, V.: Coco-stuff: thing and stuff classes in context. In: CVPR (2018)Google Scholar
  8. 8.
    Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. IJCV 22, 61–79 (1997). Scholar
  9. 9.
    Chen, L.C., Barron, J.T., Papandreou, G., Murphy, K., Yuille, A.L.: Semantic image segmentation with task-specific edge detection using CNNS and a discriminatively trained domain transform. In: CVPR (2016)Google Scholar
  10. 10.
    Chen, L.C., Hermans, A., Papandreou, G., Schroff, F., Wang, P., Adam, H.: Masklab: instance segmentation by refining object detection with semantic and direction features. In: CVPR (2018)Google Scholar
  11. 11.
    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional Nets, Atrous convolution, and fully connected CRFs. PAMI 40, 834–848 (2017)CrossRefGoogle Scholar
  12. 12.
    Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking Atrous convolution for semantic image segmentation. arXiv:1706.05587 (2017)
  13. 13.
    Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with Atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018). Scholar
  14. 14.
    Chen, X., Williams, B.M., Vallabhaneni, S.R., Czanner, G., Williams, R., Zheng, Y.: Learning active contour models for medical image segmentation. In: CVPR (2019)Google Scholar
  15. 15.
    Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., Zhang, L.: Bottom-up higher-resolution networks for multi-person pose estimation. arXiv preprint arXiv:1908.10357 (2019)
  16. 16.
    Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)Google Scholar
  17. 17.
    Dangi, S., Yaniv, Z., Linte, C.: A distance map regularized CNN for cardiac cine MR image segmentation. arXiv:1901.01238 (2019)
  18. 18.
    Ding, H., Jiang, X., Liu, A.Q., Thalmann, N.M., Wang, G.: Boundary-aware feature propagation for scene segmentation. In: ICCV (2019)Google Scholar
  19. 19.
    Ding, H., Jiang, X., Shuai, B., Liu, A.Q., Wang, G.: Semantic correlation promoted shape-variant context for segmentation. In: CVPR (2019)Google Scholar
  20. 20.
    Ding, H., Jiang, X., Shuai, B., Qun Liu, A., Wang, G.: Context contrasted feature and gated multi-scale aggregation for scene segmentation. In: CVPR (2018)Google Scholar
  21. 21.
    Dollár, P., Zitnick, C.L.: Fast edge detection using structured forests. ArXiv (2014)Google Scholar
  22. 22.
    Fieraru, M., Khoreva, A., Pishchulin, L., Schiele, B.: Learning to refine human pose estimation. In: CVPRW (2018)Google Scholar
  23. 23.
    Fu, J., et al.: Dual attention network for scene segmentation. In: CVPR (2019)Google Scholar
  24. 24.
    Gidaris, S., Komodakis, N.: Detect, replace, refine: deep structured prediction for pixel wise labeling. In: CVPR (2017)Google Scholar
  25. 25.
    Hayder, Z., He, X., Salzmann, M.: Boundary-aware instance segmentation. In: CVPR (2017)Google Scholar
  26. 26.
    He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)Google Scholar
  27. 27.
    Huang, L., Yuan, Y., Guo, J., Zhang, C., Chen, X., Wang, J.: Interlaced sparse self-attention for semantic segmentation. arXiv preprint arXiv:1907.12273 (2019)
  28. 28.
    Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: CCNet: criss-cross attention for semantic segmentation. In: ICCV (2019)Google Scholar
  29. 29.
    Islam, M.A., Naha, S., Rochan, M., Bruce, N., Wang, Y.: Label refinement network for coarse-to-fine semantic segmentation. arXiv:1703.00551 (2017)
  30. 30.
    Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: NIPS (2015)Google Scholar
  31. 31.
    Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. IJCV 1, 321–331 (1988). Scholar
  32. 32.
    Ke, T.-W., Hwang, J.-J., Liu, Z., Yu, S.X.: Adaptive affinity fields for semantic segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 605–621. Springer, Cham (2018). Scholar
  33. 33.
    Kim, Y., Kim, S., Kim, T., Kim, C.: CNN-based semantic segmentation using level set loss. In: WACV (2019)Google Scholar
  34. 34.
    Kimmel, R., Kiryati, N., Bruckstein, A.M.: Sub-pixel distance maps and weighted distance transforms. JMIV 6, 223–233 (1996)MathSciNetCrossRefGoogle Scholar
  35. 35.
    Kirillov, A., Wu, Y., He, K., Girshick, R.: Pointrend: image segmentation as rendering. arXiv:1912.08193 (2019)
  36. 36.
    Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with gaussian edge potentials. In: NIPS (2011)Google Scholar
  37. 37.
    Kuo, W., Angelova, A., Malik, J., Lin, T.Y.: Shapemask: learning to segment novel objects by refining shape priors. In: ICCV (2019)Google Scholar
  38. 38.
    Li, K., Hariharan, B., Malik, J.: Iterative instance segmentation. In: CVPR (2016)Google Scholar
  39. 39.
    Li, X., Zhong, Z., Wu, J., Yang, Y., Lin, Z., Liu, H.: Expectation-maximization attention networks for semantic segmentation. In: ICCV (2019)Google Scholar
  40. 40.
    Liang, J., Homayounfar, N., Ma, W.C., Xiong, Y., Hu, R., Urtasun, R.: Polytransform: Deep polygon transformer for instance segmentation. arXiv:1912.02801 (2019)
  41. 41.
    Lin, G., Milan, A., Shen, C., Reid, I.: Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: CVPR (2017)Google Scholar
  42. 42.
    Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: CVPR (2018)Google Scholar
  43. 43.
    Liu, S., De Mello, S., Gu, J., Zhong, G., Yang, M.H., Kautz, J.: Learning affinity via spatial propagation networks. In: NIPS (2017)Google Scholar
  44. 44.
    Liu, T., et al.: Devil in the details: towards accurate single and multiple human parsing. arXiv:1809.05996 (2018)
  45. 45.
    Liu, Y., Cheng, M.M., Hu, X., Wang, K., Bai, X.: Richer convolutional features for edge detection. In: CVPR (2017)Google Scholar
  46. 46.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)Google Scholar
  47. 47.
    Mazzini, D.: Guided upsampling network for real-time semantic segmentation. arXiv preprint arXiv:1807.07466 (2018)
  48. 48.
    Mazzini, D., Schettini, R.: Spatial sampling network for fast scene understanding. In: CVPRW (2019)Google Scholar
  49. 49.
    Neuhold, G., Ollmann, T., Rota Bulo, S., Kontschieder, P.: The mapillary vistas dataset for semantic understanding of street scenes. In: ICCV (2017)Google Scholar
  50. 50.
    Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)MathSciNetCrossRefGoogle Scholar
  51. 51.
    Papandreou, G., Zhu, T., Chen, L.-C., Gidaris, S., Tompson, J., Murphy, K.: PersonLab: person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 282–299. Springer, Cham (2018). Scholar
  52. 52.
    Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: ground truth from computer games. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). Scholar
  53. 53.
    Rota Bulò, S., Porzi, L., Kontschieder, P.: In-place activated batchnorm for memory-optimized training of DNNs. In: CVPR (2018)Google Scholar
  54. 54.
    Sun, K., et al.: High-resolution representations for labeling pixels and regions. arXiv:1904.04514 (2019)
  55. 55.
    Takikawa, T., Acuna, D., Jampani, V., Fidler, S.: Gated-SCNN: gated shape CNNs for semantic segmentation. In: ICCV (2019)Google Scholar
  56. 56.
    Wang, Z., Acuna, D., Ling, H., Kar, A., Fidler, S.: Object instance annotation with deep extreme level set evolution. In: CVPR (2019)Google Scholar
  57. 57.
    Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2. (2019)
  58. 58.
    Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
  59. 59.
    Yu, Z., Feng, C., Liu, M.Y., Ramalingam, S.: CASENet: deep category-aware semantic edge detection. In: CVPR (2017)Google Scholar
  60. 60.
    Yu, Z., et al.: Simultaneous edge alignment and learning. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 400–417. Springer, Cham (2018). Scholar
  61. 61.
    Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. arXiv preprint arXiv:1909.11065 (2019)
  62. 62.
    Yuan, Y., Wang, J.: OCNet: object context network for scene parsing. arXiv:1809.00916 (2018)
  63. 63.
    Zhang, H., et al.: Context encoding for semantic segmentation. In: CVPR (2018)Google Scholar
  64. 64.
    Zhang, H., Zhang, H., Wang, C., Xie, J.: Co-occurrent features in semantic segmentation. In: CVPR (2019)Google Scholar
  65. 65.
    Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network (2017)Google Scholar
  66. 66.
    Zhao, H., et al.: PSANet: point-wise spatial attention network for scene parsing. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 270–286. Springer, Cham (2018). Scholar
  67. 67.
    Zheng, S., et al.: Conditional random fields as recurrent neural networks. In: ICCV (2015)Google Scholar
  68. 68.
    Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ade20k dataset. In: CVPR (2017)Google Scholar
  69. 69.
    Zhu, Y., et al.: Improving semantic segmentation via video propagation and label relaxation. In: CVPR (2019)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CASBeijingChina
  2. 2.University of Chinese Academy of SciencesBeijingChina
  3. 3.University of Science and Technology of ChinaHefeiChina
  4. 4.Microsoft Research AsiaBeijingChina

Personalised recommendations