Highly Accurate Dichotomous Image Segmentation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13678)

Abstract

We present a systematic study on a new task called dichotomous image segmentation (DIS), which aims to segment highly accurate objects from natural images. To this end, we collected the first large-scale DIS dataset, called DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images covering camouflaged, salient, or meticulous objects in various backgrounds. DIS5K is annotated with extremely fine-grained labels. In addition, we introduce a simple intermediate supervision baseline (IS-Net) using both feature-level and mask-level guidance for DIS model training. IS-Net outperforms various cutting-edge baselines on the proposed DIS5K, making it a general self-learned supervision network that can facilitate future research in DIS. Further, we design a new metric called human correction efforts (HCE), which approximates the number of mouse-click operations required to correct the false positives and false negatives. HCE is used to measure the gap between models and real-world applications and thus can complement existing metrics. Finally, we conduct the largest-scale benchmark, evaluating 16 representative segmentation models, providing a more insightful discussion regarding object complexities, and showing several potential applications (e.g., background removal, art design, 3D reconstruction). We hope these efforts can open up promising directions for both academia and industry. Project page: https://xuebinqin.github.io/dis/index.html.
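
To make the intuition behind HCE concrete, the following is a minimal sketch of a much cruder proxy, not the HCE definition from the paper: it counts the connected false-positive and false-negative regions between a predicted mask and a ground-truth mask, on the assumption that each separate error region costs at least one correction. The function names `count_regions` and `error_region_counts`, the use of 4-connectivity, and the toy masks are all illustrative assumptions.

```python
from collections import deque


def count_regions(mask):
    """Count 4-connected regions of True cells in a 2D list of booleans."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New unvisited error pixel: flood-fill its whole region.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def error_region_counts(pred, gt):
    """Return (false-positive regions, false-negative regions).

    Hypothetical proxy only: the paper's HCE is a different, more
    elaborate measure; this just counts separate error blobs.
    """
    fp = [[bool(p) and not bool(g) for p, g in zip(pr, gr)]
          for pr, gr in zip(pred, gt)]
    fn = [[bool(g) and not bool(p) for p, g in zip(pr, gr)]
          for pr, gr in zip(pred, gt)]
    return count_regions(fp), count_regions(fn)


# Toy 3x5 masks (1 = foreground), invented for illustration.
gt = [[0, 1, 1, 0, 0],
      [0, 1, 1, 0, 0],
      [0, 0, 0, 0, 0]]
pred = [[0, 1, 0, 0, 1],
        [0, 1, 1, 0, 1],
        [1, 0, 0, 0, 0]]

print(error_region_counts(pred, gt))  # (2, 1)
```

In this toy case the prediction has two separate false-positive blobs and one missed pixel, so the proxy reports two regions to erase and one to fill in. Unlike pixel-overlap scores, a count of this kind does not shrink just because the erroneous regions are small, which is the sense in which a correction-effort metric can complement existing ones.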

We would like to thank Jiayi Zhu for his efforts in re-organizing the dataset and codes.

Notes

  1. Images with the license of “Commercial use & mods allowed”.

  2. Since the long-term goal of this research is to facilitate the “safe” and “efficient” interaction between the machines and our living/working environments, these keywords are mainly related to the common targets (e.g., bicycle, chair, bag, cable, tree, etc.) in our daily lives.

  3. https://www.gimp.org/.

  4. It is worth noting that only R-PASCAL and the BIG datasets are included here because they target highly accurate segmentation, and most of their images contain one or two objects, which is comparable to the listed tasks and datasets.

Author information

Corresponding author

Correspondence to Deng-Ping Fan.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Qin, X., Dai, H., Hu, X., Fan, DP., Shao, L., Van Gool, L. (2022). Highly Accurate Dichotomous Image Segmentation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_3

  • DOI: https://doi.org/10.1007/978-3-031-19797-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19796-3

  • Online ISBN: 978-3-031-19797-0

  • eBook Packages: Computer Science (R0)
