Affinity Derivation and Graph Merge for Instance Segmentation

  • Yiding Liu
  • Siyu Yang
  • Bin Li
  • Wengang Zhou
  • Jizheng Xu
  • Houqiang Li
  • Yan Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

We present an instance segmentation scheme based on pixel affinity information, which describes whether two pixels belong to the same instance. Our scheme uses two neural networks with similar structures: one predicts pixel-level semantic scores, and the other derives pixel affinities. Regarding pixels as vertices and affinities as edges, we then propose a simple yet effective graph merge algorithm to cluster pixels into instances. Experiments show that our scheme generates fine-grained instance masks. With Cityscapes training data, the proposed scheme achieves 27.3 AP on the test set.
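
The idea of treating pixels as vertices and affinities as edges can be pictured with a toy sketch. The code below is not the graph merge algorithm of the paper, whose details are not given in this abstract. It swaps in a plain thresholded grouping with union-find as an illustration. The function name `merge_pixels`, the edge dictionary layout, and the threshold of 0.5 are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col); hypothetical layout for this sketch


def merge_pixels(
    height: int,
    width: int,
    affinities: Dict[Tuple[Pixel, Pixel], float],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[List[int]]:
    """Group pixels by joining every pair whose affinity is at least `threshold`.

    Returns a label map in which pixels of the same group share an integer id.
    """
    parent = list(range(height * width))  # union-find forest, one node per pixel

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (p, q), score in affinities.items():
        if score < threshold:
            continue  # weak edge: leave the two pixels in separate groups
        root_p = find(p[0] * width + p[1])
        root_q = find(q[0] * width + q[1])
        if root_p != root_q:
            parent[root_q] = root_p  # merge the two groups

    # Relabel roots with consecutive ids in scan order.
    ids: Dict[int, int] = {}
    labels = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            root = find(r * width + c)
            labels[r][c] = ids.setdefault(root, len(ids))
    return labels


# A 1x4 strip: strong affinity inside each half, weak affinity across the middle.
edges = {
    ((0, 0), (0, 1)): 0.9,
    ((0, 1), (0, 2)): 0.1,
    ((0, 2), (0, 3)): 0.8,
}
print(merge_pixels(1, 4, edges))  # [[0, 0, 1, 1]]
```

In this toy strip the weak middle edge keeps the two halves apart, so the result contains two groups. A full pipeline would also need the semantic scores from the first network to assign a class to each group, which this sketch leaves out.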

Keywords

Instance segmentation, Pixel affinity, Graph merge, Proposal-free

Notes

Acknowledgement

Yiding Liu, Wengang Zhou and Houqiang Li’s work was supported in part by 973 Program under Contract 2015CB351803, Natural Science Foundation of China (NSFC) under Contract 61390514 and Contract 61632019.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yiding Liu (1)
  • Siyu Yang (2)
  • Bin Li (3)
  • Wengang Zhou (1)
  • Jizheng Xu (3)
  • Houqiang Li (1)
  • Yan Lu (3)

  1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China
  2. Beihang University, Beijing, China
  3. Microsoft Research, Beijing, China
