Advertisement

Learning to Refine Object Segments

  • Pedro O. Pinheiro
  • Tsung-Yi Lin
  • Ronan Collobert
  • Piotr Dollár
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9905)

Abstract

Object segmentation requires both object-level information and low-level pixel data. This presents a challenge for feedforward networks: lower layers in convolutional nets capture rich spatial information, while upper layers encode object-level knowledge but are invariant to factors such as pose and appearance. In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach. The resulting bottom-up/top-down architecture is capable of efficiently generating high-fidelity object masks. Similarly to skip connections, our approach leverages features at all layers of the net. Unlike skip connections, our approach does not attempt to output independent predictions at each layer. Instead, we first output a coarse ‘mask encoding’ in a feedforward pass, then refine this mask encoding in a top-down pass utilizing features at successively lower layers. The approach is simple, fast, and effective. Building on the recent DeepMask network for generating object proposals, we show accuracy improvements of 10–20% in average recall for various setups. Additionally, by optimizing the overall network architecture, our approach, which we call SharpMask, is 50 % faster than the original DeepMask network (under .8 s per image).

Keywords

Object Segmentation Average Recall Convolutional Layer Segmentation Mask Object Mask 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. 1.
    Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. PAMI 32, 1627–1645 (2010)CrossRefGoogle Scholar
  2. 2.
    Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: Overfeat: integrated recognition, localization and detection using conv nets. In: ICLR (2014)Google Scholar
  3. 3.
    Szegedy, C., Reed, S., Erhan, D., Anguelov, D.: Scalable, high-quality object detection. arXiv:1412.1441 (2014)
  4. 4.
    He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10578-9_23 Google Scholar
  5. 5.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)Google Scholar
  6. 6.
    Girshick, R.: Fast R-CNN. In: ICCV (2015)Google Scholar
  7. 7.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)Google Scholar
  8. 8.
    Bell, S., Zitnick, C.L., Bala, K., Girshick, R.: Inside-outside net: detecting objects in context with skip pooling and recurrent neural nets. In: CVPR (2016)Google Scholar
  9. 9.
    Lin, T.Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., Dollár, P.: Microsoft COCO: common objects in context. arXiv:1405.0312 (2015)
  10. 10.
    Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. IJCV 88, 303–338 (2010)CrossRefGoogle Scholar
  11. 11.
    Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: CVPR (2009)Google Scholar
  12. 12.
    Shotton, J., Johnson, M., Cipolla, R.: Semantic texton forests for image categorization and segmentation. In: CVPR (2008)Google Scholar
  13. 13.
    Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. PAMI 35, 1915–1929 (2013)CrossRefGoogle Scholar
  14. 14.
    Pinheiro, P.O., Collobert, R.: Recurrent conv. neural networks for scene labeling. In: ICML (2014)Google Scholar
  15. 15.
    Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: ICCV (2015)Google Scholar
  16. 16.
    Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, B., Su, Z., Du, D., Huang, C., Torr, P.: Conditional random fields as recurrent neural nets. In: ICCV (2015)Google Scholar
  17. 17.
    Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep conv. nets and fully connected CRFs. In: ICLR (2015)Google Scholar
  18. 18.
    Schwing, A.G., Urtasun, R.: Fully connected deep structured networks. arXiv:1503.02351 (2015)
  19. 19.
    Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: ICCV (2015)Google Scholar
  20. 20.
    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE (1998)Google Scholar
  21. 21.
    Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Simultaneous detection and segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 297–312. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10584-0_20 Google Scholar
  22. 22.
    Pinheiro, P.O., Collobert, R., Dollár, P.: Learning to segment object candidates. In: NIPS (2015)Google Scholar
  23. 23.
    Dai, J., He, K., Sun, J.: Instance-aware semantic segmentation via multi-task network cascades. In: CVPR (2016)Google Scholar
  24. 24.
    Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: CVPR (2015)Google Scholar
  25. 25.
    Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)Google Scholar
  26. 26.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)Google Scholar
  27. 27.
    Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)Google Scholar
  28. 28.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  29. 29.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)Google Scholar
  30. 30.
    Xie, S., Tu, Z.: Holistically-nested edge detection. In: ICCV (2015)Google Scholar
  31. 31.
    Sermanet, P., Kavukcuoglu, K., Chintala, S., LeCun, Y.: Pedestrian detection with unsupervised multi-stage feature learning. In: CVPR (2013)Google Scholar
  32. 32.
    Alexe, B., Deselaers, T., Ferrari, V.: Measuring the objectness of image windows. PAMI 34, 2189–2202 (2012)CrossRefGoogle Scholar
  33. 33.
    Uijlings, J., van de Sande, K., Gevers, T., Smeulders, A.: Selective search for object recog. IJCV 104, 154 (2013)CrossRefGoogle Scholar
  34. 34.
    Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10602-1_26 Google Scholar
  35. 35.
    Pont-Tuset, J., Arbeláez, P., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping for image segmentation and object proposal gen. PAMI PP(99), 1 (2015)Google Scholar
  36. 36.
    Krähenbühl, P., Koltun, V.: Geodesic object proposals. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 725–739. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10602-1_47 Google Scholar
  37. 37.
    Humayun, A., Li, F., Rehg, J.M.: RIGOR: reusing inference in graph cuts for generating object regions. In: CVPR (2014)Google Scholar
  38. 38.
    Hosang, J., Benenson, R., Dollár, P., Schiele, B.: What makes for effective detection proposals? PAMI 38(4), 814–830 (2015)CrossRefGoogle Scholar
  39. 39.
    Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., v.d. Smagt, P., Cremers, D., Brox, T.: Flownet: learning optical flow with convolutional networks. In: ICCV (2015)Google Scholar
  40. 40.
    Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: CVPR (2010)Google Scholar
  41. 41.
    Zagoruyko, S., Lerer, A., Lin, T.Y., Pinheiro, P.O., Gross, S., Chintala, S., Dollár, P.: A multipath network for object detection. In: BMVC (2016)Google Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Pedro O. Pinheiro
    • 1
    • 2
  • Tsung-Yi Lin
    • 1
    • 3
  • Ronan Collobert
    • 1
  • Piotr Dollár
    • 1
  1. 1.Facebook AI ResearchMenlo ParkUSA
  2. 2.Idiap Research Institute and Ecole Polytechnique Fédérale de LausanneLausanneSwitzerland
  3. 3.Cornell TechCornell UniversityNew YorkUSA

Personalised recommendations