Multimodal Mixed Conditional Random Field Model for Category-Independent Object Detection

  • Jian-Hua Zhang
  • Jian-Wei Zhang
  • Sheng-Yong Chen
  • Ying Hu
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 215)

Abstract

Category-independent object detection is extremely useful for many robot vision tasks. Most existing methods rank a large number of candidate regions by their object-likeness. However, to achieve a sufficient object coverage rate, they must sample too many regions. In this paper, we present a novel method that directly detects and localizes category-independent objects. We develop a new model, the “mixed robust higher-order conditional random field” model, which combines 2D and 3D data in a unified framework. A set of novel features is developed based on 2D and 3D saliency and oversegments. The potentials used in this model are computed from these features. Extensive experiments are carried out on a public RGB-D dataset. Compared with state-of-the-art ranking methods, the proposed method achieves comparable category-independent object detection performance without sampling a large number of extra regions.

Keywords

  • Multimodality
  • Category-independent
  • Object segmentation
  • Conditional random fields

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Jian-Hua Zhang (1)
  • Jian-Wei Zhang (1)
  • Sheng-Yong Chen (2)
  • Ying Hu (3)

  1. Group TAMS, Department of Informatics, University of Hamburg, Hamburg, Germany
  2. College of Computer Science, Zhejiang University of Technology, Hangzhou, China
  3. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
