Dense RepPoints: Representing Visual Objects with Dense Point Sets

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12366)


We present a new object representation, called Dense RepPoints, that utilizes a large set of points to describe an object at multiple levels, including both box level and pixel level. Techniques are proposed to efficiently process these dense points, maintaining near-constant complexity with increasing point numbers. Dense RepPoints is shown to represent and learn object segments well, with the use of a novel distance transform sampling method combined with set-to-set supervision. The distance transform sampling combines the strengths of contour and grid representations, leading to performance that surpasses counterparts based on contours or grids. Code is available at



We thank Jifeng Dai and Bolei Zhou for discussion and comments about this work. Jifeng Dai was involved in early discussions of the work and gave up authorship after he joined another company.

Supplementary material

504479_1_En_14_MOESM1_ESM.pdf (678 kb)
Supplementary material 1 (pdf 677 KB)


  1. 1.
    Acuna, D., Ling, H., Kar, A., Fidler, S.: Efficient interactive annotation of segmentation datasets with polygon-RNN++ (2018)Google Scholar
  2. 2.
    Alp Güler, R., Neverova, N., Kokkinos, I.: DensePose: dense human pose estimation in the wild. In: CVPR, pp. 7297–7306 (2018)Google Scholar
  3. 3.
    Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: CVPR, pp. 7291–7299 (2017)Google Scholar
  4. 4.
    Castrejon, L., Kundu, K., Urtasun, R., Fidler, S.: Annotating object instances with a polygon-RNN. In: CVPR, pp. 5230–5238 (2017)Google Scholar
  5. 5.
    Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)CrossRefGoogle Scholar
  6. 6.
    Chen, L.C., Hermans, A., Papandreou, G., Schroff, F., Wang, P., Adam, H.: MaskLAB: instance segmentation by refining object detection with semantic and direction features. In: CVPR, pp. 4013–4022 (2018)Google Scholar
  7. 7.
    Chen, X., Girshick, R.B., He, K., Dollár, P.: TensorMask: a foundation for dense object segmentation. In: ICCV (2019)Google Scholar
  8. 8.
    Cheng, D., Liao, R., Fidler, S., Urtasun, R.: Darnet: Deep active ray network for building segmentatio. arXiv preprint arXiv:1905.05889 (2019)
  9. 9.
    Dai, J., He, K., Li, Y., Ren, S., Sun, J.: Instance-sensitive fully convolutional networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 534–549. Springer, Cham (2016). Scholar
  10. 10.
    Dai, J., He, K., Sun, J.: Instance-aware semantic segmentation via multi-task network cascades. In: CVPR, pp. 3150–3158 (2016)Google Scholar
  11. 11.
    Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: NeurIPS, pp. 379–387 (2016)Google Scholar
  12. 12.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database (2009)Google Scholar
  13. 13.
    Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: object detection with keypoint triplets. arXiv preprint arXiv:1904.08189 (2019)
  14. 14.
    Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. IJCV 88(2), 303–338 (2010)CrossRefGoogle Scholar
  15. 15.
    Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 605–613 (2017)Google Scholar
  16. 16.
    Girshick, R.: Fast R-CNN. In: ICCV, pp. 1440–1448 (2015)Google Scholar
  17. 17.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp. 580–587 (2014)Google Scholar
  18. 18.
    He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)Google Scholar
  19. 19.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)Google Scholar
  20. 20.
    Huang, C., Ai, H., Li, Y., Lao, S.: High-performance rotation invariant multiview face detection. PAMI 29(4), 671–686 (2007)CrossRefGoogle Scholar
  21. 21.
    Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. IJCV 1(4), 321–331 (1988)CrossRefGoogle Scholar
  22. 22.
    Kirillov, A., Levinkov, E., Andres, B., Savchynskyy, B., Rother, C.: InstanceCut: from edges to instances with multicut. In: CVPR, pp. 5008–5017 (2017)Google Scholar
  23. 23.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NeurIPS, pp. 1097–1105 (2012)Google Scholar
  24. 24.
    Kuznetsova, A., et al.: The open images dataset V4: unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982 (2018)
  25. 25.
    Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 765–781. Springer, Cham (2018). Scholar
  26. 26.
    Li, Y., Qi, H., Dai, J., Ji, X., Wei, Y.: Fully convolutional instance-aware semantic segmentation. In: CVPR, pp. 2359–2367 (2017)Google Scholar
  27. 27.
    Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: ICCV, pp. 2117–2125 (2017)Google Scholar
  28. 28.
    Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)Google Scholar
  29. 29.
    Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). Scholar
  30. 30.
    Ling, H., Gao, J., Kar, A., Chen, W., Fidler, S.: Fast interactive object annotation with curve-GCN. In: CVPR (2019)Google Scholar
  31. 31.
    Moreira, A., Santos, M.Y.: Concave hull: A k-nearest neighbours approach for the computation of the region occupied by a set of points (2007)Google Scholar
  32. 32.
    Palmer, S.E.: Vision Science: Photons to Phenomenology. MIT Press, Cambridge (1999)Google Scholar
  33. 33.
    Peng, S., Jiang, W., Pi, H., Bao, H., Zhou, X.: Deep snake for real-time instance segmentation. arXiv preprint arXiv:2001.01629 (2020)
  34. 34.
    Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)Google Scholar
  35. 35.
    Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS, pp. 5099–5108 (2017)Google Scholar
  36. 36.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NeurIPS, pp. 91–99 (2015)Google Scholar
  37. 37.
    Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. IJCV 40(2), 99–121 (2000)CrossRefGoogle Scholar
  38. 38.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)Google Scholar
  39. 39.
    Srinivasan, P., Zhu, Q., Shi, J.: Many-to-one contour matching for describing and discriminating object shape. In: CVPR (2010)Google Scholar
  40. 40.
    Toshev, A., Taskar, B., Daniilidis, K.: Shape-based object detection via boundary structure segmentation. IJCV 99(2), 123–146 (2012)MathSciNetCrossRefGoogle Scholar
  41. 41.
    Wang, X., Bai, X., Ma, T., Liu, W., Latecki, L.J.: Fan shape model for object detection. In: CVPR, pp. 151–158. IEEE (2012)Google Scholar
  42. 42.
    Wang, X., Kong, T., Shen, C., Jiang, Y., Li, L.: Solo: segmenting objects by locations. arXiv preprint arXiv:1912.04488 (2019)
  43. 43.
    Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: CVPR (2016)Google Scholar
  44. 44.
    Wu, Y., He, K.: Group normalization. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 3–19. Springer, Cham (2018). Scholar
  45. 45.
    Xie, E., et al.: PolarMask: single shot instance segmentation with polar representation. arXiv preprint arXiv:1909.13226 (2019)
  46. 46.
    Yang, J., Price, B., Cohen, S., Lee, H., Yang, M.H.: Object contour detection with a fully convolutional encoder-decoder network. In: CVPR, pp. 193–202 (2016)Google Scholar
  47. 47.
    Yang, Z., Liu, S., Hu, H., Wang, L., Lin, S.: RepPoints: point set representation for object detection. In: CVPR (2019)Google Scholar
  48. 48.
    Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. arXiv preprint arXiv:1912.02424 (2019)
  49. 49.
    Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)
  50. 50.
    Zhou, X., Zhuo, J., Krähenbühl, P.: Bottom-up object detection by grouping extreme and center points. In: CVPR (2019)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Peking UniversityBeijingChina
  2. 2.Zhejiang LabHangzhouChina
  3. 3.Zhejiang UniversityHangzhouChina
  4. 4.The Chinese University of Hong KongHong KongChina
  5. 5.Shanghai Jiao Tong UniversityShanghaiChina
  6. 6.University of TorontoTorontoCanada
  7. 7.Microsoft Research AsiaBeijingChina

Personalised recommendations