SOLO: Segmenting Objects by Locations

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12363)

Abstract

We present a new, embarrassingly simple approach to instance segmentation. Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that has made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the “detect-then-segment” strategy (e.g., Mask R-CNN), or first predict embedding vectors and then use clustering techniques to group pixels into individual instances. We view the task of instance segmentation from a completely new perspective by introducing the notion of “instance categories”, which assigns categories to each pixel within an instance according to the instance’s location and size, thus nicely converting instance segmentation into a single-shot, classification-solvable problem. We demonstrate a much simpler and more flexible instance segmentation framework with strong performance, achieving accuracy on par with Mask R-CNN and outperforming recent single-shot instance segmenters in accuracy. We hope that this simple and strong framework can serve as a baseline for many instance-level recognition tasks besides instance segmentation. Code is available at https://git.io/AdelaiDet.
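
As a rough illustration of the “instance categories” idea, the sketch below assigns a category to an instance from its location alone. The abstract does not say how location is quantized, so the uniform grid, the use of the mask's center of mass, and the names `location_category` and `grid_size` are all assumptions made here for illustration; instance size is not modeled at all.

```python
import numpy as np


def location_category(mask: np.ndarray, grid_size: int = 5) -> int:
    """Return a hypothetical location category for a binary instance mask.

    Assumption (not stated in the abstract): the image is split into a
    grid_size x grid_size grid, and the category is the row-major index
    of the cell that contains the mask's center of mass.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")

    height, width = mask.shape
    center_y, center_x = ys.mean(), xs.mean()

    # Quantize the center into grid coordinates, clamping to the last cell.
    row = min(int(center_y / height * grid_size), grid_size - 1)
    col = min(int(center_x / width * grid_size), grid_size - 1)
    return row * grid_size + col


# Example: an object near the top right of a 100 x 100 image.
example_mask = np.zeros((100, 100), dtype=bool)
example_mask[10:20, 70:90] = True
example_category = location_category(example_mask)  # cell (row 0, col 3)
```

Under these assumptions, every pixel of the example object would be classified into the same category (cell 3 of 25), which is how a per-pixel classifier could separate instances that sit in different cells.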

Keywords

Instance segmentation; Location category

Notes

Acknowledgement

We would like to thank Dongdong Yu and Enze Xie for the discussion about maskness and dice loss. We also thank Chong Xu and the ByteDance AI Lab team for technical support. Correspondence should be addressed to CS. This work was in part supported by ARC DP ‘Deep learning that scales’.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. The University of Adelaide, Adelaide, Australia
  2. ByteDance AI Lab, Beijing, China