Abstract
In contrast to fully supervised methods that rely on pixel-wise mask labels, box-supervised instance segmentation uses only simple box annotations and has recently attracted considerable research attention. In this paper, we propose a novel single-shot box-supervised instance segmentation approach that integrates the classical level set model with a deep neural network. Specifically, our method iteratively learns a series of level sets through a continuous Chan-Vese energy-based function in an end-to-end fashion. A simple mask-supervised SOLOv2 model is adapted to predict the instance-aware mask map as the level set for each instance. Both the input image and its deep features are used as input data to evolve the level set curves, where a box projection function is employed to obtain the initial boundary. By minimizing the fully differentiable energy function, the level set for each instance is iteratively optimized within its corresponding bounding box annotation. Experimental results on four challenging benchmarks demonstrate the leading performance of our approach for robust instance segmentation in various scenarios. The code is available at: https://github.com/LiWentomng/boxlevelset.
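To make the two ingredients named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the textbook region-based Chan-Vese energy with a soft mask `phi` in [0, 1] standing in for the level set (the length regularizer is omitted), and it assumes a max-based projection of the mask onto the two image axes as one plausible form of a box projection function. The function names `region_means`, `chan_vese_energy`, and `box_projection` are hypothetical and chosen only for illustration.

```python
def region_means(image, phi, eps=1e-8):
    """Return (c1, c2): mean intensity inside and outside the soft region phi."""
    in_sum = in_mass = out_sum = out_mass = 0.0
    for img_row, phi_row in zip(image, phi):
        for value, p in zip(img_row, phi_row):
            in_sum += value * p
            in_mass += p
            out_sum += value * (1.0 - p)
            out_mass += 1.0 - p
    # eps guards against division by zero when a region is empty.
    return in_sum / (in_mass + eps), out_sum / (out_mass + eps)


def chan_vese_energy(image, phi):
    """Region-based Chan-Vese energy (assumed form, no length term).

    E = sum_x (I(x) - c1)^2 * phi(x) + (I(x) - c2)^2 * (1 - phi(x))
    """
    c1, c2 = region_means(image, phi)
    energy = 0.0
    for img_row, phi_row in zip(image, phi):
        for value, p in zip(img_row, phi_row):
            energy += (value - c1) ** 2 * p + (value - c2) ** 2 * (1.0 - p)
    return energy


def box_projection(phi):
    """Project a mask onto the vertical and horizontal axes with max.

    Returns one value per row and one value per column (assumed form).
    """
    per_row = [max(row) for row in phi]
    per_col = [max(col) for col in zip(*phi)]
    return per_row, per_col


# Toy image: a bright 2x2 object on a dark background.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# A mask that matches the object exactly.
tight_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# A mask that simply fills a box wider than the object.
box_mask = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

energy_tight = chan_vese_energy(image, tight_mask)
energy_box = chan_vese_energy(image, box_mask)
rows, cols = box_projection(tight_mask)
```

In this toy case the mask that follows the object boundary has a lower energy than the one that fills the wider box, which is the behavior an energy-minimizing evolution relies on. In a learned setting the same quantities would be computed on tensors so that gradients can flow back to the network that predicts `phi`.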
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant 61831015 and by the Alibaba-Zhejiang University Joint Institute of Frontier Technologies.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, W., Liu, W., Zhu, J., Cui, M., Hua, XS., Zhang, L. (2022). Box-Supervised Instance Segmentation with Level Set Evolution. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13689. Springer, Cham. https://doi.org/10.1007/978-3-031-19818-2_1
DOI: https://doi.org/10.1007/978-3-031-19818-2_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19817-5
Online ISBN: 978-3-031-19818-2
eBook Packages: Computer Science, Computer Science (R0)