Segmentation over Detection by Coupled Global and Local Sparse Representations

  • Wei Xia
  • Zheng Song
  • Jiashi Feng
  • Loong-Fah Cheong
  • Shuicheng Yan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7576)


Motivated by the improving performance of object detection algorithms, we investigate how to precisely segment the objects within the detectors' output bounding boxes. The task is formulated as a unified optimization problem that pursues a unique latent object mask in a non-parametric manner. For a given test image, objects are first localized by detectors. Then, for each detected bounding box, objects of the same category, along with their object masks, are extracted from the training set. The latent mask of the object within the bounding box is inferred from three objectives: 1) the latent mask should be coherent, up to sparse errors caused by within-category diversity, with the global bounding-box-level mask inferred by sparse representation over the bounding boxes of the same category in the training set; 2) the latent mask should be coherent with the local patch-level mask inferred by sparse representation of each individual patch over all spatially nearby patches of the same category in the training set (using nearby patches handles local deformations); and 3) the mask property within each sufficiently small super-pixel should be consistent. These three objectives are integrated into a unified optimization problem, and the sparse representation coefficients and the latent mask are optimized alternately: the coefficients by Lasso optimization, and the mask by smooth approximation followed by the Accelerated Proximal Gradient method. Extensive experiments on the Pascal VOC object segmentation datasets, VOC2007 and VOC2010, show that the proposed algorithm achieves results competitive with state-of-the-art learning-based algorithms and is superior to other detection-based object segmentation algorithms.
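
The first objective can be illustrated with a minimal sketch, which is not the authors' implementation. A detected box's feature vector is sparsely coded over training boxes of the same category by Lasso, and the training masks are combined with the same coefficients to give a soft global mask estimate. The function names `lasso_fista` and `transfer_mask`, the feature and mask dimensions, the regularisation weight, and the random toy data are all assumptions made for illustration. FISTA, one accelerated proximal gradient scheme, is used here only as a convenient Lasso solver; the abstract applies the Accelerated Proximal Gradient method to the latent-mask update, which this sketch omits together with the patch-level and super-pixel terms.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_fista(D, y, lam, n_iter=500):
    """Minimise 0.5 * ||y - D a||^2 + lam * ||a||_1 with FISTA.

    D: (d, n) dictionary, one training-box feature per column (assumed layout).
    y: (d,) feature of the detected test box.
    """
    lipschitz = np.linalg.norm(D, 2) ** 2  # squared spectral norm of D
    a = np.zeros(D.shape[1])
    z = a.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)  # gradient of the smooth term at z
        a_next = soft_threshold(z - grad / lipschitz, lam / lipschitz)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = a_next + ((t - 1.0) / t_next) * (a_next - a)  # momentum step
        a, t = a_next, t_next
    return a


def transfer_mask(D, masks, y, lam=0.05):
    """Code y over D, then combine training masks with the same coefficients.

    masks: (p, n) array, one flattened binary training mask per column.
    Returns the sparse coefficients and a soft mask clipped to [0, 1].
    """
    a = lasso_fista(D, y, lam)
    soft_mask = np.clip(masks @ a, 0.0, 1.0)
    return a, soft_mask


# Toy data (hypothetical): 6 training boxes, 20-d features, 4x4 masks.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 6))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary columns
masks = (rng.random((16, 6)) > 0.5).astype(float)
y = 0.7 * D[:, 1] + 0.3 * D[:, 4]        # test box built from two exemplars

coeffs, soft_mask = transfer_mask(D, masks, y, lam=0.01)
print("coefficients:", np.round(coeffs, 3))
print("soft mask (4x4):")
print(np.round(soft_mask.reshape(4, 4), 2))
```

In this toy setup the test feature is an exact mixture of two training columns, so the recovered coefficients should concentrate on those two exemplars and the soft mask becomes a weighted blend of their masks. In the full method the latent mask would additionally be allowed to deviate from this global estimate by a sparse error.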


Training Image, Object Detection, Sparse Representation, Foreground Object, Object Segmentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Wei Xia (1)
  • Zheng Song (1)
  • Jiashi Feng (1)
  • Loong-Fah Cheong (1)
  • Shuicheng Yan (1)
  1. Department of ECE, National University of Singapore, Singapore
