Boundary-Preserving Mask R-CNN

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)


Tremendous efforts have been made to improve mask localization accuracy in instance segmentation. Modern instance segmentation methods relying on fully convolutional networks perform pixel-wise classification, which ignores object boundaries and shapes, leading to coarse and indistinct mask predictions and imprecise localization. To remedy these problems, we propose a conceptually simple yet effective Boundary-preserving Mask R-CNN (BMask R-CNN) that leverages object boundary information to improve mask localization accuracy. BMask R-CNN contains a boundary-preserving mask head in which object boundaries and masks are mutually learned via feature fusion blocks. As a result, the predicted masks are better aligned with object boundaries. Without bells and whistles, BMask R-CNN outperforms Mask R-CNN by a considerable margin on the COCO dataset. On the Cityscapes dataset, where more accurate boundary ground truths are available, BMask R-CNN obtains remarkable improvements over Mask R-CNN. Moreover, as expected, BMask R-CNN obtains larger improvements when the evaluation criterion requires better localization (e.g., AP\(_{75}\)), as shown in Fig. 1. Code and models are available at
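The abstract makes two technical claims. Boundaries carry localization information, and stricter thresholds such as AP\(_{75}\) reward precise localization. Both can be illustrated with a small pure-Python sketch.

This is not the BMask R-CNN implementation. Several choices here are hypothetical and made only for illustration: the helper names `square`, `mask_to_boundary` and `mask_iou`, and the 4-neighbour rule used to derive a boundary map from a binary mask. The paper may derive its boundary targets differently. The sketch compares a ground-truth mask with a prediction shifted by one pixel. It checks mask IoU against the 0.5 and 0.75 thresholds and also measures how many boundary pixels still coincide.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = object pixel, 0 = background


def square(h: int, w: int, top: int, left: int, size: int) -> Mask:
    """Hypothetical helper: an h x w mask containing one filled square."""
    return [[1 if top <= y < top + size and left <= x < left + size else 0
             for x in range(w)] for y in range(h)]


def mask_to_boundary(mask: Mask) -> Mask:
    """Keep foreground pixels that have at least one background 4-neighbour.

    Pixels outside the image count as background, so an object touching
    the image border also gets boundary pixels along that border.
    (Assumed rule for illustration, not necessarily the paper's operator.)
    """
    h, w = len(mask), len(mask[0])
    boundary = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    boundary[y][x] = 1
                    break
    return boundary


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    inter = sum(pa & pb for pa, pb in pairs)
    union = sum(pa | pb for pa, pb in pairs)
    return inter / union if union else 1.0


if __name__ == "__main__":
    gt = square(10, 10, top=2, left=2, size=6)    # 6x6 ground-truth object
    pred = square(10, 10, top=2, left=3, size=6)  # same shape, shifted 1 px right
    iou = mask_iou(gt, pred)
    print(f"mask IoU {iou:.3f}")               # → mask IoU 0.714
    print("match at 0.50:", iou >= 0.50)       # → match at 0.50: True
    print("match at 0.75:", iou >= 0.75)       # → match at 0.75: False
    b_iou = mask_iou(mask_to_boundary(gt), mask_to_boundary(pred))
    print(f"boundary-pixel IoU {b_iou:.3f}")   # → boundary-pixel IoU 0.333
```

The one-pixel shift lowers mask IoU only to about 0.71. That is enough to count as a match at the 0.5 threshold but not at 0.75. Meanwhile, only a third of the boundary pixels remain aligned. This gap gives one intuitive reason why explicit boundary information would help most under stricter criteria such as AP\(_{75}\). The 4-neighbour rule is just one way to obtain boundary maps; a morphological or Laplacian-style operator would serve the same illustrative purpose.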


Keywords: Instance segmentation, Object detection, Boundary-preserving, Boundary detection



This work was in part supported by NSFC (No. 61733007 and No. 61876212), Zhejiang Lab (No. 2019NB0AB02), and HUST-Horizon Computer Vision Research Center.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Huazhong University of Science and Technology, Wuhan, China
  2. Horizon Robotics Inc., Beijing, China
