Box-Supervised Instance Segmentation with Level Set Evolution

Li, Wentong; Liu, Wenyu; Zhu, Jianke; Cui, Miaomiao; Hua, Xian-Sheng; Zhang, Lei

doi:10.1007/978-3-031-19818-2_1

Wentong Li¹²,
Wenyu Liu¹²,
Jianke Zhu¹²,
Miaomiao Cui¹³,
Xian-Sheng Hua¹³ &
…
Lei Zhang¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13689))

Included in the following conference series:

European Conference on Computer Vision

2276 Accesses
15 Citations

Abstract

In contrast to the fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of the simple box annotations, which has recently attracted a lot of research attentions. In this paper, we propose a novel single-shot box-supervised instance segmentation approach, which integrates the classical level set model with deep neural network delicately. Specifically, our proposed method iteratively learns a series of level sets through a continuous Chan-Vese energy-based function in an end-to-end fashion. A simple mask supervised SOLOv2 model is adapted to predict the instance-aware mask map as the level set for each instance. Both the input image and its deep features are employed as the input data to evolve the level set curves, where a box projection function is employed to obtain the initial boundary. By minimizing the fully differentiable energy function, the level set for each instance is iteratively optimized within its corresponding bounding box annotation. The experimental results on four challenging benchmarks demonstrate the leading performance of our proposed approach to robust instance segmentation in various scenarios. The code is available at: https://github.com/LiWentomng/boxlevelset.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://competitions.codalab.org/competitions/17094.

References

Adalsteinsson, D., Sethian, J.A.: A fast level set method for propagating interfaces. J. Comput. Phys. 118(2), 269–277 (1995)
Article MathSciNet Google Scholar
Arun, A., Jawahar, C.V., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12373, pp. 254–270. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58604-1_16
Chapter Google Scholar
Bilic, P., et al.: The liver tumor segmentation benchmark (LiTS). arXiv preprint arXiv:1901.04056 (2019)
Bolya, D., Zhou, C., Xiao, F., Lee, Y.J.: YOLACT: real-time instance segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9157–9166 (2019)
Google Scholar
Bolya, D., Zhou, C., Xiao, F., Lee, Y.J.: YOLACT++: better real-time instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2020)
Google Scholar
Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vision 22(1), 61–79 (1997)
Article Google Scholar
Chan, T., Vese, L.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
Article Google Scholar
Chen, K., et al.: MMDetection: open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
Cheng, T., Wang, X., Huang, L., Liu, W.: Boundary-preserving mask R-CNN. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 660–676. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58568-6_39
Chapter Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Google Scholar
Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision 88(2), 303–338 (2010)
Article Google Scholar
Hariharan, B., Arbeláez, P., Bourdev, L., Maji, S., Malik, J.: Semantic contours from inverse detectors. In: Proceedings of IEEE International Conference on Computer Vision, pp. 991–998. IEEE (2011)
Google Scholar
He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: Proceedings of IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Homayounfar, N., Xiong, Y., Liang, J., Ma, W.-C., Urtasun, R.: LevelSet R-CNN: a deep variational method for instance segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 555–571. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_33
Chapter Google Scholar
Hsu, C.C., Hsu, K.J., Tsai, C.C., Lin, Y.Y., Chuang, Y.Y.: Weakly supervised instance segmentation using the bounding box tightness prior. In: Proceedings of Advances in Neural Information Processing Systems, vol. 32, pp. 6582–6593 (2019)
Google Scholar
Hu, P., Shuai, B., Liu, J., Wang, G.: Deep level sets for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 540–549 (2017)
Google Scholar
Huang, Z., Huang, L., Gong, Y., Huang, C., Wang, X.: Mask scoring R-CNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6409–6418 (2019)
Google Scholar
Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vision 1(4), 321–331 (1988)
Article Google Scholar
Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1665–1674 (2017)
Google Scholar
Kim, B., Ye, J.C.: Mumford-shah loss functional for image segmentation with deep learning. IEEE Trans. Image Process. 29, 1856–1866 (2019)
Article MathSciNet Google Scholar
Kirillov, A., Wu, Y., He, K., Girshick, R.: Pointrend: image segmentation as rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9799–9808 (2020)
Google Scholar
Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with gaussian edge potentials. In: Proceedings of Advances in Neural Information Processing Systems, vol. 24 (2011)
Google Scholar
Kulharia, V., Chandra, S., Agrawal, A., Torr, P., Tyagi, A.: Box2Seg: attention weighted loss and discriminative feature learning for weakly supervised segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12372, pp. 290–308. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58583-9_18
Chapter Google Scholar
Lan, S., et al.: Discobox: weakly supervised instance segmentation and semantic correspondence from box supervision. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3406–3416 (2021)
Google Scholar
Lee, J., Yi, J., Shin, C., Yoon, S.: BBAM: bounding box attribution map for weakly supervised semantic and instance segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2643–2652 (2021)
Google Scholar
Liang, Z., Wang, T., Zhang, X., Sun, J., Shen, J.: Tree energy loss: towards sparsely annotated semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 16907–16916 (2022)
Google Scholar
Liao, S., Sun, Y., Gao, C., KP, P.S., Mu, S., Shimamura, J., Sagata, A.: Weakly supervised instance segmentation using hybrid networks. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1917–1921. IEEE (2019)
Google Scholar
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Liu, S., Peng, Y.: A local region-based Chan-Vese model for image segmentation. Pattern Recogn. 45(7), 2769–2779 (2012)
Article Google Scholar
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
Malladi, R., Sethian, J.A., Vemuri, B.C.: Shape modeling with front propagation: a level set approach. IEEE Trans. Pattern Anal. Mach. Intell. 17(2), 158–175 (1995)
Article Google Scholar
Maška, M., Daněk, O., Garasa, S., Rouzaut, A., Munoz-Barrutia, A., Ortiz-de Solorzano, C.: Segmentation and shape tracking of whole fluorescent cells based on the Chan-Vese model. IEEE Trans. Med. Imaging 32(6), 995–1006 (2013)
Article Google Scholar
Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of International Conference on 3D Vision (3DV), pp. 565–571 (2016)
Google Scholar
Mumford, D.B., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. (1989)
Google Scholar
Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988)
Article MathSciNet Google Scholar
Peng, S., Jiang, W., Pi, H., Li, X., Bao, H., Zhou, X.: Deep snake for real-time instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8533–8542 (2020)
Google Scholar
Pont-Tuset, J., Arbelaez, P., T.Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Trans. Pattern Anal. Mach. Intell. 39(1), 128–140 (2017)
Google Scholar
Rother, C., Kolmogorov, V., Blake, A.: Grabcut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. (TOG) 23(3), 309–314 (2004)
Article Google Scholar
Song, L., Li, Y., Li, Z., Yu, G., Sun, H., Sun, J., Zheng, N.: Learnable tree filter for structure-preserving feature transform. In: Proceedings of Advances in Neural Information Processing Systems, vol. 32 (2019)
Google Scholar
Sun, Y., et al.: Weakly supervised instance segmentation based on two-stage transfer learning. IEEE Access 8, 24135–24144 (2020)
Article Google Scholar
Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 282–298. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_17
Chapter Google Scholar
Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: high-performance instance segmentation with box annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5443–5452 (2021)
Google Scholar
Vese, L.A., Chan, T.F.: A multiphase level set framework for image segmentation using the mumford and shah model. Int. J. Comput. Vision 50(3), 271–293 (2002)
Article Google Scholar
Wang, X.F., Huang, D.S., Xu, H.: An efficient local Chan-Vese model for image segmentation. Pattern Recogn. 43(3), 603–618 (2010)
Article Google Scholar
Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10225–10235 (2021)
Google Scholar
Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: dynamic and fast instance segmentation. In: Proceedings of Advances in Neural Information Processing Systems, vol. 33, pp. 17721–17732 (2020)
Google Scholar
Wang, Z., Acuna, D., Ling, H., Kar, A., Fidler, S.: Object instance annotation with deep extreme level set evolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7500–7508 (2019)
Google Scholar
Waqas Zamir, S., et al.: iSAID: a large-scale dataset for instance segmentation in aerial images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 28–37 (2019)
Google Scholar
Xie, E., et al.: Polarmask: single shot instance segmentation with polar representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12193–12202 (2020)
Google Scholar
Xu, L., Lu, C., Xu, Y., Jia, J.: Image smoothing via L 0 gradient minimization. In: Proceedings of the SIGGRAPH Asia Conference, pp. 1–12 (2011)
Google Scholar
Yu, F., Wang, D., Shelhamer, E., Darrell, T.: Deep layer aggregation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2403–2412 (2018)
Google Scholar
Yuan, J., Chen, C., Li, F.: Deep variational instance segmentation. In: Proceedings of Advances in Neural Information Processing Systems, vol. 33, pp. 4811–4822 (2020)
Google Scholar
Zhang, G., et al.: Refinemask: towards high-quality instance segmentation with fine-grained features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6861–6869 (2021)
Google Scholar

Download references

Acknowledgments

This work is supported by National Natural Science Foundation of China under Grants (61831015) and Alibaba-Zhejiang University Joint Institute of Frontier Technologies.

Author information

Authors and Affiliations

Zhejiang University, Hangzhou, China
Wentong Li, Wenyu Liu & Jianke Zhu
Alibaba Damo Academy, Hangzhou, China
Miaomiao Cui & Xian-Sheng Hua
The Hong Kong Polytechnic University, Hong Kong, China
Lei Zhang

Authors

Wentong Li
View author publications
You can also search for this author in PubMed Google Scholar
Wenyu Liu
View author publications
You can also search for this author in PubMed Google Scholar
Jianke Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Miaomiao Cui
View author publications
You can also search for this author in PubMed Google Scholar
Xian-Sheng Hua
View author publications
You can also search for this author in PubMed Google Scholar
Lei Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jianke Zhu .

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, W., Liu, W., Zhu, J., Cui, M., Hua, XS., Zhang, L. (2022). Box-Supervised Instance Segmentation with Level Set Evolution. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13689. Springer, Cham. https://doi.org/10.1007/978-3-031-19818-2_1

Download citation

DOI: https://doi.org/10.1007/978-3-031-19818-2_1
Published: 22 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19817-5
Online ISBN: 978-3-031-19818-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Box-Supervised Instance Segmentation with Level Set Evolution