Indoor Objects and Outdoor Urban Scenes Recognition by 3D Visual Primitives

  • Conference paper
Computer Vision - ACCV 2014 Workshops (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9008)

Abstract

Object detection, recognition and pose estimation in 3D images have gained momentum due to the availability of 3D sensors (RGB-D) and the increase of large-scale 3D data, such as city maps. The most popular approach is to extract and match 3D shape descriptors, which encode local scene structure but omit visual appearance. Visual appearance can be problematic due to imaging distortions, but the assumption that local shape structures are sufficient to recognise objects and scenes is largely invalid in practice, since objects may have similar shape but different texture (e.g., grocery packages). In this work, we propose an alternative appearance-driven approach that first extracts 2D primitives justified by Marr’s primal sketch; these are “accumulated” over multiple views, and the most stable ones are “promoted” to 3D visual primitives. The promoted 3D primitives represent both structure and appearance. For recognition, we propose a fast and effective correspondence matching using random sampling. For quantitative evaluation, we construct a semi-synthetic benchmark dataset using a public 3D model dataset of 119 kitchen objects and another benchmark of challenging street-view images from 4 different cities. In the experiments, our method utilises only a stereo view for training. As a result, on the kitchen objects dataset our method achieved an almost perfect recognition rate for \(\pm 10^\circ \) camera viewpoint change and nearly 80 % for \(\pm 20^\circ \); on the street-view benchmarks it achieved 75 % accuracy for 160 street-view image pairs, 80 % for 96 street-view image pairs, and 92 % for 48 street-view image pairs.
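The abstract does not detail the matching step beyond "correspondence matching using random sampling". Purely as an illustration of that general idea, and not as the authors' algorithm, the sketch below shows a generic RANSAC-style loop that estimates a rigid transform from putative 3D point correspondences. The function names (`ransac_rigid`, `apply_rigid`), the three-point frame construction, and all parameter defaults are assumptions made for this example.

```python
import math
import random


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _frame(p0, p1, p2):
    """Right-handed orthonormal frame built from three points, or None if degenerate."""
    u = _sub(p1, p0)
    nu = math.sqrt(sum(x * x for x in u))
    if nu < 1e-9:
        return None
    e1 = [x / nu for x in u]
    n = _cross(e1, _sub(p2, p0))
    nn = math.sqrt(sum(x * x for x in n))
    if nn < 1e-9:  # the three points are (nearly) collinear
        return None
    e3 = [x / nn for x in n]
    return [e1, _cross(e3, e1), e3]


def apply_rigid(R, t, p):
    """Return R * p + t for a 3x3 matrix R and 3-vectors t, p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def ransac_rigid(model, scene, pairs, iters=200, tol=1e-3, seed=0):
    """Hypothetical helper: pairs is a list of (model_index, scene_index) candidates.

    Returns (R, t, inliers) for the hypothesis supported by the most pairs,
    or (None, None, []) if every sample was degenerate.
    """
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(iters):
        sample = rng.sample(pairs, 3)
        fm = _frame(*(model[i] for i, j in sample))
        fs = _frame(*(scene[j] for i, j in sample))
        if fm is None or fs is None:
            continue
        # Rotation that maps each model frame axis onto the matching scene axis.
        R = [[sum(fs[k][i] * fm[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
        # Translation that aligns the first sampled pair after rotation.
        p0, q0 = model[sample[0][0]], scene[sample[0][1]]
        t = _sub(q0, apply_rigid(R, [0.0, 0.0, 0.0], p0))
        inliers = [(i, j) for i, j in pairs
                   if math.dist(apply_rigid(R, t, model[i]), scene[j]) < tol]
        if len(inliers) > len(best[2]):
            best = (R, t, inliers)
    return best
```

In this sketch the transform is taken directly from the three sampled pairs, which keeps the example dependency-free. A practical system would more likely refit the transform to all inliers with a least-squares method and would draw the candidate pairs from descriptor matches.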

Notes

  1. http://i61p109.ira.uka.de/ObjectModelsWebUI/

  2. http://i61p109.ira.uka.de/ObjectModelsWebUI/

Acknowledgement

The authors would like to thank Dr. Lixin Fan for the valuable discussions.

Author information

Corresponding author

Correspondence to Junsheng Fu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Fu, J., Kämäräinen, JK., Buch, A.G., Krüger, N. (2015). Indoor Objects and Outdoor Urban Scenes Recognition by 3D Visual Primitives. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9008. Springer, Cham. https://doi.org/10.1007/978-3-319-16628-5_20

  • DOI: https://doi.org/10.1007/978-3-319-16628-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16627-8

  • Online ISBN: 978-3-319-16628-5

  • eBook Packages: Computer Science (R0)
