
FPM: Fine Pose Parts-Based Model with 3D CAD Models

  • Joseph J. Lim
  • Aditya Khosla
  • Antonio Torralba
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)

Abstract

We introduce a novel approach to the problem of localizing objects in an image and estimating their fine pose. Given exact CAD models and a few real training images with aligned models, we propose to leverage geometric information from the CAD models and appearance information from the real images to learn a model that accurately estimates fine pose in real images. Specifically, we propose FPM, a fine pose parts-based model that combines geometric information, in the form of shared 3D parts in deformable part-based models, with appearance information, in the form of objectness, to achieve fast and accurate fine pose estimation. Our method significantly outperforms current state-of-the-art algorithms in both accuracy and speed.
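The abstract does not spell out FPM's equations, so the following is only a minimal sketch, under our own assumptions, of the two ingredients it names. First, a 3D part is defined once on the CAD model and projected into every candidate pose with a pinhole camera, so the same part is reused across poses. Second, a DPM-style score sums root and part responses, subtracts a simplified quadratic deformation cost, and adds a weighted objectness term. The functions `rotation_y`, `project_points`, and `pose_score`, the focal length, the part location, and all weights are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def rotation_y(theta: float) -> np.ndarray:
    """3x3 rotation about the vertical (y) axis; theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project_points(points_3d, R, t, f, principal):
    """Pinhole projection of Nx3 CAD-model points into Nx2 pixel coordinates."""
    cam = np.asarray(points_3d, dtype=float) @ R.T + t  # model frame -> camera frame
    return f * cam[:, :2] / cam[:, 2:3] + principal     # perspective divide, then principal point


def pose_score(root_score, part_scores, displacements, deform_w, objectness, obj_w):
    """Hypothetical detection score for one candidate pose.

    Root and part filter responses are summed, a quadratic penalty is charged for
    each part's displacement (in pixels) from its projected 3D anchor, and a
    weighted objectness term is added. Real DPMs learn a separate weight for each
    of (dx, dy, dx^2, dy^2); a single scalar weight is used here for brevity.
    """
    d = np.asarray(displacements, dtype=float)
    deform_cost = deform_w * np.sum(d ** 2)
    return root_score + float(np.sum(part_scores)) - deform_cost + obj_w * objectness


# One 3D part (e.g. a chair-seat corner, assumed location) defined once on the CAD model...
part_3d = np.array([[0.4, 0.0, 0.4]])
t = np.array([0.0, 0.0, 5.0])          # object placed 5 units in front of the camera
principal = np.array([320.0, 240.0])   # assumed principal point for a 640x480 image

# ...gives a different 2D anchor under each candidate pose, so its filter can be shared.
for deg in (0, 30, 60):
    anchor = project_points(part_3d, rotation_y(np.deg2rad(deg)), t, 500.0, principal)
    print(deg, anchor.round(1))
```

Because the part lives in 3D, adding a new candidate pose only requires a new projection rather than a new part; how FPM actually parameterizes and trains these parts is described in the full paper, not here.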

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Joseph J. Lim (1)
  • Aditya Khosla (1)
  • Antonio Torralba (1)
  1. Massachusetts Institute of Technology, USA